The CORDS multidatabase project
IBM Systems Journal
In virtually every organization, data are stored in a variety of ways and managed by different database and file systems. Applications requiring data from multiple sources must recognize and deal with the specifics of each data source and must also perform any necessary data integration. The objective of a multidatabase system is to provide application developers and end users with an integrated view of, and a uniform interface to, all the required data. The view and the interface should be independent of where the data are stored and how the data are managed. CORDS is a research project focused on distributed applications. As part of this project, we are designing and prototyping a multidatabase system. This paper provides an overview of the system architecture and describes the approaches taken in the following areas: management of catalog information, schema integration, global query optimization, (distributed) transaction management, and interactions with component data sources. The prototype system gives application developers the view of a single relational database system. Currently supported component data sources include several relational database systems, a hierarchical database system, and a network database system.

Almost every large organization faces a data integration problem in which applications require access to data stored in a variety of data sources, possibly distributed over multiple platforms. The data sources may be diverse, consisting of, for example, file systems, relational database systems, or nonrelational database systems. Typically, each type of data source has its own interface and protocols for retrieving and updating data. Applications that require data from multiple data sources therefore become complex, expensive to develop and maintain, and directly dependent on the specific data sources.

Consider an application program, running on one machine, that needs to access data in two different database systems. Furthermore, assume that each database system runs on a different machine and that different communication protocols are required to communicate with those machines. The complexity of the application program depends on the level of support provided for connectivity and for data integration. Most modern database systems provide support for remote clients; that is, an application running on a separate machine can access the database system transparently.
Copyright 1995 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems.

Remote access capability provides connectivity, a necessary prerequisite for distributed applications. However, the application program still has to deal with two different