Data warehouse architectures

The technical architecture of a data warehouse resembles that of other systems but has some special characteristics. Data warehouse architectures range between two extremes: the single-layer architecture and the N-layer architecture. They differ in the number of middleware layers between the operational systems and the analytical tools. The architectures described here are high-level: each part mentioned is a full-bodied system rather than a component of one.

Single-layer architecture

The simplest architecture is the single-layer architecture. There is no physical data warehouse or data mart between the operational data and the analytical tools. The middleware in this type of system should be regarded as a virtual data warehouse: a software layer rather than a data storage layer. The single-layer model is lightweight because it minimises redundancy and thereby the amount of data stored. It does not, however, separate analytical from operational processing: analyses are run directly on the operational data[1].
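As a rough illustration, a virtual data warehouse can be approximated by a database view defined directly over an operational table. The sketch below uses Python's built-in sqlite3 module; the table `orders`, the view `sales_by_region` and the sample rows are hypothetical and not taken from the source.

```python
import sqlite3

def build_single_layer():
    """Create an operational table plus a view acting as the virtual warehouse."""
    conn = sqlite3.connect(":memory:")
    # Operational data (hypothetical schema).
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("north", 100.0), ("south", 50.0), ("north", 25.0)],
    )
    # The "middleware" is only a software layer: a view, with no copied data.
    conn.execute(
        "CREATE VIEW sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
    )
    return conn

def sales_report(conn):
    """Analytical query; it is evaluated against the operational table itself."""
    return dict(conn.execute("SELECT region, total FROM sales_by_region").fetchall())

conn = build_single_layer()
before = sales_report(conn)
# A new operational transaction is visible to the analysis immediately.
conn.execute("INSERT INTO orders (region, amount) VALUES ('south', 10.0)")
after = sales_report(conn)
```

Nothing is stored twice, but every report competes with the operational workload for the same table, which is the lack of separation noted above.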

Two-layer architecture

The two-layer model consists of operational (and external) data in the source layer and a data warehouse layer on top of it. An ETL (extract, transform, load) system sits between the source layer and the data warehouse layer. The analytical tools in this architecture work on the data loaded into the data warehouse or, possibly, into data marts. Storing the data redundantly gives a more stable source of information, because heavy load or failure in the operational systems has no effect on the analytical tools, and vice versa. The data warehouse layer also makes it possible to structure the data to fit the multidimensional model used by analytical tools, which in turn makes analysis faster. Such an architecture is, however, more resource-intensive to build and maintain.
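The separation can be sketched with two independent in-memory SQLite databases and a small ETL function between them. This is only a minimal sketch under stated assumptions: the schemas (`orders`, `dim_region`, `fact_sales`), the cleaning rules and the full-reload strategy are all hypothetical choices made for illustration.

```python
import sqlite3

def make_source():
    """Operational source system (hypothetical schema, amounts in cents)."""
    src = sqlite3.connect(":memory:")
    src.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount_cents INTEGER)"
    )
    src.executemany(
        "INSERT INTO orders (region, amount_cents) VALUES (?, ?)",
        [(" North", 10000), ("south ", 5000)],
    )
    return src

def make_warehouse():
    """Separate warehouse store with a simple dimension and fact table."""
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    dw.execute(
        "CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, region_key INTEGER, amount REAL)"
    )
    return dw

def run_etl(src, dw):
    """Extract all orders, clean them, and load them into the warehouse."""
    rows = src.execute("SELECT id, region, amount_cents FROM orders").fetchall()  # extract
    for order_id, region, cents in rows:
        name = region.strip().lower()  # transform: normalise region names
        dw.execute("INSERT OR IGNORE INTO dim_region (name) VALUES (?)", (name,))
        (key,) = dw.execute(
            "SELECT region_key FROM dim_region WHERE name = ?", (name,)
        ).fetchone()
        # load: re-running the ETL replaces rows instead of duplicating them
        dw.execute(
            "INSERT OR REPLACE INTO fact_sales VALUES (?, ?, ?)",
            (order_id, key, cents / 100),
        )
    dw.commit()

def totals(dw):
    """Analytical query that touches only the warehouse."""
    query = (
        "SELECT d.name, SUM(f.amount) FROM fact_sales f "
        "JOIN dim_region d USING (region_key) GROUP BY d.name"
    )
    return dict(dw.execute(query).fetchall())

src, dw = make_source(), make_warehouse()
run_etl(src, dw)
snapshot = totals(dw)
# New operational activity does not reach the warehouse until the next ETL run.
src.execute("INSERT INTO orders (region, amount_cents) VALUES ('north', 2500)")
stale = totals(dw)
run_etl(src, dw)
fresh = totals(dw)
```

Here `stale` still equals `snapshot` after the source changes, which shows both the isolation and the loss of immediacy that the redundant copy brings.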

Three-layer architecture

The three-layer architecture consists of the source layer (containing multiple source systems), the reconciled layer and the data warehouse layer (containing both data warehouses and data marts). The reconciled layer sits between the source data and the data warehouse. It is populated with data from the source systems through one ETL process, and the data stored in it is passed on to the data warehouse through a second ETL process. In the reconciled layer, data from the different source systems has already been cleaned and integrated into a common, standardised form. The ETL process that feeds the data warehouse therefore receives only integrated data, which needs less transformation. This architecture is especially useful for very large, enterprise-wide systems[1]. Its disadvantages are the additional storage space taken up by the redundant reconciled layer and the extra processing step, which moves the analytical tools a little further away from real-time data.
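The two ETL stages can be sketched in plain Python. In this illustrative example, two hypothetical source systems (`crm_orders` and `shop_orders`) use different field names and units; `reconcile` produces the common standardised form, and `load_warehouse` only has to aggregate it. All names and sample values are assumptions made for the sketch.

```python
from decimal import Decimal

# Source layer: two hypothetical systems with inconsistent formats.
crm_orders = [
    {"cust": "ACME ", "total": "120.50"},   # amount as a decimal string
    {"cust": "Globex", "total": "80.00"},
]
shop_orders = [
    {"customer_name": "acme", "cents": 3050},  # amount as integer cents
]

def reconcile(crm, shop):
    """First ETL stage: clean and integrate both sources into one common form."""
    rows = []
    for r in crm:
        rows.append({
            "customer": r["cust"].strip().lower(),
            "amount_cents": int(Decimal(r["total"]) * 100),
            "source": "crm",
        })
    for r in shop:
        rows.append({
            "customer": r["customer_name"].strip().lower(),
            "amount_cents": r["cents"],
            "source": "shop",
        })
    return rows

def load_warehouse(reconciled_rows):
    """Second ETL stage: input is already integrated, so it only aggregates."""
    totals = {}
    for row in reconciled_rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount_cents"]
    return totals

reconciled = reconcile(crm_orders, shop_orders)  # stored copy: the reconciled layer
warehouse = load_warehouse(reconciled)           # the data warehouse layer
```

Keeping `reconciled` as a stored intermediate result is what costs the extra space, and running two stages in sequence is what adds latency before new source data reaches the analysis.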


References

  1. Golfarelli, Matteo; Rizzi, Stefano (2009). Data Warehouse Design: Modern Principles and Methodologies. New York: McGraw-Hill.

Wikimedia Foundation. 2010.


