Snowflake schema

Snowflake schema

A snowflake schema is a logical arrangement of tables in a relational database such that the entity relationship diagram resembles a snowflake in shape. Closely related to the star schema, the snowflake schema is represented by centralized fact tables which are connected to multiple dimensions. In the snowflake schema, however, dimensions are normalized into multiple related tables whereas the star schema's dimensions are denormalized with each dimension being represented by a single table. When the dimensions of a snowflake schema are elaborate, having multiple levels of relationships, and where child tables have multiple parent tables ("forks in the road"), a complex snowflake shape starts to emerge. The "snowflaking" effect only affects the dimension tables and not the fact tables.

Common uses

The star and snowflake schema are most commonly found in dimensional data warehouses and data marts where speed of data retrieval is more important than the efficiency of data manipulations. As such, the tables in these schema are not normalized much, and are frequently designed at a level of normalization short of third normal form.

The decision on whether to employ a star schema or a snowflake schema should consider the relative strengths of the database platform in question and the query tool to be employed. Star schema should be favored with query tools that largely expose users to the underlying table structures, and in environments where most queries are simpler in nature. Snowflake schema are often better with more sophisticated query tools that isolate users from the raw table structures and for environments having numerous queries with complex criteria.

Data normalization and storage

Normalization splits up data to avoid redundancy (duplication) by moving commonly repeating groups of data into a new table. Normalization therefore tends to increase the number of tables that need to be joined in order to perform a given query, but reduces the space required to hold the data and the number of places where it needs to be updated if the data changes.

From a space storage point of view, the size of the dimensional tables are typically small compared to that of the fact tables. This often removes the storage space benefit of snowflaking the dimension tables.

Some database developers compromise by creating an underlying snowflake schema with views built on top of it that perform many of the necessary joins to simulate a star schema. This provides the storage benefits achieved through the normalization of dimensions with the ease of querying that the star schema provides. The tradeoff is that requiring the server to perform the underlying joins automatically can result in a performance hit when querying as well as extra joins to tables that may not be necessary to fulfill certain queries.

Benefits of "snowflaking"

* Some OLAP multidimensional database modeling tools that use dimensional data marts as a data source are optimized for snowflake schemas.
* If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snowflaking may be appropriate.
* A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalized. A snowflake schema will hence be easier to implement.
* A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database.
* Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snowflaking would typically be required to permit simple query tools to form such a query, especially if provision for these forms of query weren't anticipated when the data warehouse was first designed.

Examples

The example schema shown to the right is a snowflaked version of the star schema example provided in the star schema article.

The following example query is the snowflake schema equivalent of the star schema example code which returns the total number of TV units sold by brand and by country for 1997. Notice that the snowflake schema query requires many more joins than the star schema version in order to fulfill even a simple query. The benefit of using the snowflake schema in this example is that the storage requirements are lower since the snowflake schema eliminates many duplicate values from the dimensions themselves.SELECT B.Brand, G.Country, SUM (F.Units_Sold)FROM Fact_Sales FINNER JOIN Dim_Date D ON F.Date_Id = D.IdINNER JOIN Dim_Store S ON F.Store_Id = S.IdINNER JOIN Dim_Geography G ON S.Geography_Id = G.IdINNER JOIN Dim_Product P ON F.Product_Id = P.IdINNER JOIN Dim_Product_Category C ON P.Product_Category_Id = C.IdINNER JOIN Dim_Brand B ON P.Brand_Id = B.IdWHERE D.Year = 1997 AND C.Product_Category = 'tv'GROUP BY B.Brand, G.Country

Using the Open Source tool [http://revj.sourceforge.net/ Reverse Snowflake Joins] , it is possible to parse the SQL statement above and generate a diagram showing not only the joins, but also the SUM, WHERE and GROUP BY clauses.

See also

*Star schema
*Data Warehouse
*OLAP

References

*
*

External links

*" [http://www.dcs.bbk.ac.uk/~mark/download/star.pdf Why is the Snowflake Schema a Good Data Warehouse Design?] " by Mark Levene and George Loizou
*Article " [http://intelligententerprise.com/010629/warehouse1_1.jhtml A Trio of Interesting Snowflakes] " (Beat three common modeling challenges with extensions of the dimensional model) by Ralph Kimball
* [http://learndatamodeling.com/snow_flake.htm Designing Snowflake Schema]
* [http://sourceforge.net/projects/revj/ Reverse Snowflake Joins]


Wikimedia Foundation. 2010.

Игры ⚽ Поможем написать курсовую

Look at other dictionaries:

  • Snowflake-Schema — Das Schneeflockenschema ist eine Weiterführung des Sternschemas, das beim OLAP und Data Warehousing eingesetzt wird. Beim Sternschema liegen die Dimensionstabellen denormalisiert vor, was eine bessere Verarbeitungsgeschwindigkeit zu Lasten der… …   Deutsch Wikipedia

  • Snowflake Schema — Das Schneeflockenschema ist eine Weiterführung des Sternschemas, das beim OLAP und Data Warehousing eingesetzt wird. Beim Sternschema liegen die Dimensionstabellen denormalisiert vor, was eine bessere Verarbeitungsgeschwindigkeit zu Lasten der… …   Deutsch Wikipedia

  • Snowflake — may refer to:* A particle of snow * Snowflake (albino gorilla) * Snowflake (plant), flowering plants in the genus Leucojum * Snowflake, Arizona * Snowflake schema, a concept in data management * Snowflake children, a child conceived through in… …   Wikipedia

  • Star schema — The star schema (sometimes referenced as star join schema) is the simplest style of data warehouse schema. The star schema consists of a few fact tables (possibly only one, justifying the name) referencing any number of dimension tables . The… …   Wikipedia

  • Schneeflockenschema — Schneeflockenschema: Fakt und Dimensionstabellen bilden eine schneeflockenförmige Struktur und werden im Unterschied zum Star Schema weiter verfeinert und normalisiert Das Schneeflockenschema ist eine Weiterführung des Sternschemas, das beim OLAP …   Deutsch Wikipedia

  • Dimensional database — A dimensional database is one which, rather than representing data in multiple relations (as a relational database does), represents key data entities as different dimensions. That is, multidimensional database systems offer an extension to the… …   Wikipedia

  • Data model — Overview of data modeling context: A data model provides the details of information to be stored, and is of primary use when the final product is the generation of computer software code for an application or the preparation of a functional… …   Wikipedia

  • Data warehouse — Overview In computing, a data warehouse (DW) is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations… …   Wikipedia

  • Online analytical processing — In computing, online analytical processing, or OLAP (  /ˈoʊlæ …   Wikipedia

  • Data mart — A data mart (DM) is the access layer of the data warehouse (DW) environment that is used to get data out to the users. The DM is a subset of the DW, usually oriented to a specific business line or team. Contents 1 Terminology 2 Design schemas 3… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”