Data warehouse appliance



In computing, a data warehouse appliance consists of an integrated set of servers, storage, operating system(s), DBMS and software specifically pre-installed and pre-optimized for data warehousing (DW). Alternatively, the term can also apply to similar software-only systems,[1] which are purportedly very easy to install on specific recommended hardware configurations, or to systems preconfigured as a complete unit: a true appliance.[2][3]

DW appliances provide solutions for the mid-to-large volume data warehouse market, offering low-cost performance most commonly on data volumes in the terabyte to petabyte range.


Appliance technology

Most DW appliance vendors use massively parallel processing (MPP) architectures to provide high query performance and platform scalability. MPP architectures consist of independent processors or servers executing in parallel. Most MPP architectures implement a "shared-nothing architecture", in which each server operates self-sufficiently and controls its own memory and disk. Shared-nothing architectures have a proven record[citation needed] for high scalability and little contention. DW appliances distribute data onto dedicated disk storage units connected to each server in the appliance. This distribution allows a DW appliance to resolve a relational query by scanning the data on each server in parallel. The divide-and-conquer approach delivers high performance and scales nearly linearly as new servers are added to the architecture.

Other DW appliance vendors use specialized hardware and advanced software instead of MPP architectures. This approach can achieve MPP-level performance in a much smaller form factor.[4] The first vendor to market a data warehouse appliance with specialized SQL hardware was Netezza, in 2003, which used FPGA technology as projection and restriction filters to minimize data movement and I/O within the system. Kickfire followed in 2008 with what it calls a dataflow "SQL chip".[5][citation needed]
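The shared-nothing, divide-and-conquer scan described above can be sketched in Python. This is a minimal single-process illustration, not any vendor's implementation: the cluster size, the `order_id`/`store`/`color`/`amount` columns and all helper names are hypothetical, `zlib.crc32` stands in for whatever distribution hash a real appliance uses, and a thread pool stands in for independent servers. Each simulated server applies the restriction (row filter) and projection (only two columns) to its own slice before anything is sent back, the same data-movement saving that hardware filters aim for.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SERVERS = 4  # hypothetical appliance size


def owner(key: str) -> int:
    """Map a distribution key to the server that stores the row."""
    # crc32 is stable across runs, unlike Python's per-process salted hash() for str.
    return zlib.crc32(key.encode("utf-8")) % NUM_SERVERS


def distribute(rows, key_col):
    """Split rows into per-server slices; each slice models one server's private disk."""
    slices = [[] for _ in range(NUM_SERVERS)]
    for row in rows:
        slices[owner(row[key_col])].append(row)
    return slices


def local_scan(slice_rows, predicate, group_col, sum_col):
    """Work done on one server: restrict, project and pre-aggregate its own rows."""
    partial = defaultdict(float)
    for row in slice_rows:
        if predicate(row):  # restriction: non-matching rows never leave the server
            partial[row[group_col]] += row[sum_col]  # projection: only two columns used
    return partial


def parallel_query(slices, predicate, group_col, sum_col):
    """Scan every slice concurrently, then merge the small per-server partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(
            pool.map(lambda s: local_scan(s, predicate, group_col, sum_col), slices)
        )
    merged = defaultdict(float)
    for partial in partials:
        for group, value in partial.items():
            merged[group] += value
    return dict(merged)


rows = [
    {"order_id": "o1", "store": "north", "color": "red", "amount": 10.0},
    {"order_id": "o2", "store": "south", "color": "blue", "amount": 5.0},
    {"order_id": "o3", "store": "north", "color": "red", "amount": 7.5},
    {"order_id": "o4", "store": "south", "color": "red", "amount": 2.5},
]
slices = distribute(rows, "order_id")
totals = parallel_query(slices, lambda r: r["color"] == "red", "store", "amount")
print(sorted(totals.items()))  # [('north', 17.5), ('south', 2.5)]
```

Because only the per-group partial sums cross the (simulated) interconnect, the merge step stays small no matter how many rows each server scans.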

MPP database architectures have a long pedigree. Teradata, Tandem, Britton Lee, and Sequent offered MPP SQL-based architectures in the 1980s. Open-source and commodity components have aided a re-emergence of MPP data warehouses. Advances in technology have reduced costs and improved performance in storage devices, multi-core CPUs and networking components. Open-source RDBMS products, such as Ingres and PostgreSQL, reduce software-license costs and allow DW-appliance vendors to focus on optimization rather than providing basic database functionality. Open-source Linux provides a stable, well-implemented operating system for DW appliances.

History

Some consider Teradata's initial product, or Britton Lee's, to be the first DW appliance.[6][7] (Teradata acquired Britton Lee, by then renamed ShareBase, in June 1990.[8]) Others disagree, regarding appliances as a "disruptive technology" for Teradata.[9] Interest in the data warehouse appliance category is generally dated[by whom?] to the emergence of Netezza in the early 2000s.

As of 2009, a second generation of DW appliances has emerged, marking the move to mainstream vendor integration. IBM integrated its InfoSphere Warehouse (formerly DB2 Warehouse) with its own servers and storage to create the IBM InfoSphere Balanced Warehouse. Netezza introduced its TwinFin platform based on commodity IBM hardware. Other DW appliance vendors have also partnered with major hardware vendors to help bring their appliances to market. DATAllegro, prior to its acquisition by Microsoft, partnered with EMC and Dell and implemented open-source Ingres on Linux. Greenplum has a partnership with Sun Microsystems and implements Greenplum Database (based on PostgreSQL) on Solaris using the ZFS file system. HP Neoview is a wholly HP-owned solution and uses HP NonStop SQL. XtremeData offers an FPGA-based data warehousing appliance, built on commodity hardware and an open-source operating system, for "deep analytics" and data mining.

Kognitio offers a row-based "virtual" data warehouse appliance, while Vertica, EXASOL and ParAccel offer column-based "virtual" data warehouse appliances. Like Greenplum, ParAccel partners with Sun Microsystems. These are software-only solutions deployed on clusters of commodity hardware. Kognitio’s homegrown WX2 database runs on several blade configurations. Other players in the DW appliance space include Calpont and Kickfire. Kickfire combines a MySQL-compatible column-store storage engine, for ease of deployment and use, with specialized hardware for high performance.[4]
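The difference between the row-based and column-based products above comes down to physical layout. The following minimal Python sketch, using a hypothetical three-column table, counts how many stored values a `SUM(amount)` query must read under each layout. It deliberately ignores compression, encoding and I/O block sizes, all of which matter in real column stores.

```python
# Hypothetical table: (order_id, store, amount)
records = [
    (1, "north", 10.0),
    (2, "south", 5.0),
    (3, "north", 7.5),
]

# Row store: the fields of each record are stored together.
row_store = [list(r) for r in records]

# Column store: each column is stored as its own contiguous array.
column_store = {
    "order_id": [r[0] for r in records],
    "store": [r[1] for r in records],
    "amount": [r[2] for r in records],
}


def sum_amount_row_store(store):
    """SUM(amount) on a row store: every field of every record is read."""
    total, values_read = 0.0, 0
    for record in store:
        values_read += len(record)  # the whole record comes off disk together
        total += record[2]
    return total, values_read


def sum_amount_column_store(store):
    """SUM(amount) on a column store: only the amount column is read."""
    amounts = store["amount"]
    return sum(amounts), len(amounts)


print(sum_amount_row_store(row_store))        # (22.5, 9)
print(sum_amount_column_store(column_store))  # (22.5, 3)
```

With wide fact tables holding dozens of columns, the same ratio is what lets a column store skip most of the data for queries that touch only a few columns.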

The market has also seen the emergence of data warehouse bundles, in which vendors combine their hardware and database software into a data warehouse platform. The Oracle Optimized Warehouse Initiative combines the Oracle Database with hardware from various computer manufacturers (Dell, EMC, HP, IBM, SGI and Sun Microsystems). Oracle's Optimized Warehouses offer pre-validated configurations, and the database software comes pre-installed. In 2008 Oracle began offering a more classic appliance, the HP Oracle Database Machine, a jointly developed and co-branded platform that Oracle sells and supports and that HP builds in configurations specifically for Oracle.[10][11] In 2009, Oracle released a second-generation Exadata system,[12] based on its newly acquired Sun Microsystems hardware.

Benefits

The total cost of ownership (TCO) of a data warehouse consists of initial entry costs, ongoing maintenance costs and the cost of changing capacity as the data warehouse grows. DW appliances offer low entry and maintenance costs. Initial costs depend on the size of the appliance installed.

The resource cost for monitoring and tuning the data warehouse makes up a large part of the TCO, often as much as 80%. DW appliances reduce administration for day-to-day operations, setup and integration. Many also offer low costs for expanding processing power and capacity.
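The TCO breakdown above can be written as a simple sum. The sketch below is a hypothetical model, not a standard formula: splitting maintenance into administration (monitoring and tuning) and vendor support, and every example figure, are assumptions chosen only to show the arithmetic and how the administration share of TCO is computed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TcoInputs:
    entry_cost: float        # initial purchase and installation
    annual_admin: float      # staff cost of monitoring and tuning, per year
    annual_support: float    # vendor maintenance contract, per year
    years: int               # evaluation period
    expansions: List[float] = field(default_factory=list)  # cost of each capacity upgrade

    def total(self) -> float:
        """Entry cost + ongoing maintenance + cost of changing capacity."""
        ongoing = (self.annual_admin + self.annual_support) * self.years
        return self.entry_cost + ongoing + sum(self.expansions)

    def admin_share(self) -> float:
        """Fraction of TCO spent on monitoring and tuning."""
        return self.annual_admin * self.years / self.total()


# Hypothetical three-year example (currency units are arbitrary).
dw = TcoInputs(entry_cost=100_000, annual_admin=150_000,
               annual_support=20_000, years=3, expansions=[30_000])
print(dw.total())        # 640000
print(dw.admin_share())  # 0.703125
```

In this made-up example administration is about 70% of TCO, so halving tuning effort would save more than the entire entry cost.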

With an increased focus on cost control and tight IT budgets, data warehouse managers sometimes need to reduce and manage expenses while still making the most of their technology; DW appliances offer one way to do so.

Parallel performance

Many DW appliances support mixed workloads, in which a broad range of ad hoc queries and reports run simultaneously with data loading. DW appliance vendors use several distribution and partitioning methods to provide parallel performance. Some DW appliances scan data using partitioning and sequential I/O instead of indexes; others use standard database indexing.[citation needed]
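Scanning by partition instead of by index can be sketched as follows. This minimal Python example assumes hypothetical monthly range partitions on an `order_date` column: a date-range query first prunes partitions whose range cannot match, then reads the remaining partitions sequentially, row by row, without consulting any index. Real appliances combine this with distribution of data across servers; that part is omitted here.

```python
import bisect
from datetime import date

# Hypothetical monthly partitions: partition i holds dates below BOUNDARIES[i]
# (and at or above BOUNDARIES[i - 1]); the last partition holds everything later.
BOUNDARIES = [date(2009, 2, 1), date(2009, 3, 1), date(2009, 4, 1)]


def partition_for(d: date) -> int:
    """Index of the range partition that holds date d."""
    return bisect.bisect_right(BOUNDARIES, d)


def load(rows):
    """Place each row in its range partition."""
    parts = [[] for _ in range(len(BOUNDARIES) + 1)]
    for row in rows:
        parts[partition_for(row["order_date"])].append(row)
    return parts


def range_sum(parts, lo: date, hi: date):
    """SUM(amount) for lo <= order_date <= hi using pruning plus sequential scans."""
    scanned = list(range(partition_for(lo), partition_for(hi) + 1))  # partition pruning
    total = 0.0
    for p in scanned:
        for row in parts[p]:  # sequential scan of one partition, no index
            if lo <= row["order_date"] <= hi:
                total += row["amount"]
    return total, scanned


orders = [
    {"order_date": date(2009, 1, 15), "amount": 10.0},
    {"order_date": date(2009, 2, 10), "amount": 5.0},
    {"order_date": date(2009, 2, 20), "amount": 2.5},
    {"order_date": date(2009, 3, 5), "amount": 7.0},
    {"order_date": date(2009, 4, 2), "amount": 1.0},
]
parts = load(orders)
print(range_sum(parts, date(2009, 2, 1), date(2009, 2, 28)))  # (7.5, [1])
```

Only one of the four partitions is read for the February query; the pruning step replaces the lookups an index would otherwise perform.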

With high performance on highly granular data, DW appliances can address analytics that previously could not meet performance requirements.

Reduced administration

DW appliances provide a single-vendor solution, and the vendor takes responsibility for optimizing the parts and software within the appliance. This eliminates the customer's costs for integration and regression testing of the DBMS, storage and OS at terabyte scale and avoids some of the compatibility issues that arise with multi-vendor solutions. A single point of support also provides a single source for problem resolution and a simplified upgrade path for software and hardware.[citation needed]

Built-in high availability

MPP DW appliance vendors provide built-in high availability through redundant components within the appliance. Many offer warm-standby servers, dual networks, dual power supplies, disk mirroring with failover, and mechanisms for handling server failure.
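Disk mirroring with failover can be illustrated with a minimal Python sketch. The layout is an assumption for illustration only: each data segment has a primary copy on one server and a mirror on the next server in a ring, so losing any single server leaves every segment readable. Real appliances differ in how they place mirrors and detect failures.

```python
NUM_SERVERS = 4  # hypothetical appliance size


def placement(segment: int):
    """Return (primary, mirror) servers for a segment; the mirror is the next server."""
    primary = segment % NUM_SERVERS
    mirror = (segment + 1) % NUM_SERVERS
    return primary, mirror


def serving_server(segment: int, failed: set) -> int:
    """Pick the server that answers reads for a segment, failing over to the mirror."""
    primary, mirror = placement(segment)
    if primary not in failed:
        return primary
    if mirror not in failed:
        return mirror  # failover: the mirrored copy takes over
    raise RuntimeError(f"segment {segment} unavailable: primary and mirror both down")


print([serving_server(s, failed={2}) for s in range(NUM_SERVERS)])  # [0, 1, 3, 3]
```

After server 2 fails, server 3 serves two segments and carries twice its normal read load, which is one reason a warm-standby server that can replace the failed one is useful.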

Scalability

DW appliances scale for both capacity and performance. Many DW appliances implement a modular design to which database administrators can add incrementally, eliminating the up-front cost of over-provisioning. In contrast, architectures that do not support incremental expansion can require hours of production downtime while database administrators export and reload terabytes of data. In MPP architectures, adding servers increases performance as well as capacity; this is not always the case with alternative architectures.
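The claim that adding servers increases both capacity and performance can be made concrete with back-of-the-envelope arithmetic. This Python sketch assumes an idealized shared-nothing appliance in which data is spread evenly and each server scans only its own share at a fixed rate; the per-server disk size (12 TB) and scan rate (500 MB/s) are hypothetical figures, and real systems lose some efficiency to data skew and network coordination.

```python
def capacity_tb(servers: int, tb_per_server: float) -> float:
    """Raw capacity grows with the number of servers."""
    return servers * tb_per_server


def full_scan_seconds(data_tb: float, servers: int, mb_per_s_per_server: float) -> float:
    """Idealized full-table scan time: each server reads only 1/servers of the data."""
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return data_mb / (servers * mb_per_s_per_server)


for n in (8, 16):
    print(n, capacity_tb(n, 12), full_scan_seconds(10, n, 500))
# 8 96 2500.0
# 16 192 1250.0
```

Doubling the server count doubles capacity and halves the idealized scan time; adding only larger disks behind the same processors would grow the first number without improving the second.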

Rapid time-to-value

Companies increasingly expect business analytics to deliver improvements within the current business cycle.[citation needed] DW appliances allow fast implementations without the need for regression and integration testing. In some cases, reduced tuning, reduced index creation, fast loading and a reduced need for aggregation make rapid prototyping possible.

Application uses

DW appliances provide solutions for many analytic application uses, including:

  • enterprise data warehousing
  • super-sized sandboxes that isolate power users with resource-intensive queries
  • pilot projects or projects requiring rapid prototyping and rapid time-to-value
  • off-loading projects from the enterprise data warehouse, such as large analytical query projects that affect the overall workload of the enterprise data warehouse
  • applications with specific performance or loading requirements
  • data marts that have outgrown their present environment
  • turnkey data warehouses or data marts
  • solutions for applications with high data-growth and high-performance requirements
  • applications requiring data warehouse encryption

Trends

As it evolves, the DW appliance market has started to shift[citation needed] in[weasel words] many areas:

  • Vendors have started moving toward using commodity technologies rather than proprietary assembly of commodity components[citation needed]
  • Implemented applications show usage expansion from tactical and data-mart solutions to strategic and enterprise data-warehouse use.
  • Mainstream vendor participation has become apparent as of 2009[citation needed].
  • With a lower total cost of ownership, reduced maintenance and high performance for business analytics on growing data volumes,[weasel words] most analysts believe[citation needed] that DW appliances will gain market share, though Teradata maintains its leadership position.[13]
  • Vendors have begun providing the ability to incorporate 'in-database' analytic algorithms that take advantage of their MPP architectures, eliminating the need to extract large datasets into traditional analytic and data mining platforms such as SAS.
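The in-database idea can be sketched with a statistic that parallelizes cleanly. In this minimal Python example, each node computes a small summary (count, mean and sum of squared deviations) of its own slice using Welford's online algorithm, and the summaries are combined with the pairwise merge formula of Chan, Golub and LeVeque; only the summaries, not the rows, leave the nodes. The choice of algorithm and the data are illustrative assumptions, not how any particular vendor implements in-database analytics.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Moments:
    n: int       # number of values
    mean: float  # running mean
    m2: float    # sum of squared deviations from the mean


EMPTY = Moments(0, 0.0, 0.0)


def local_moments(values) -> Moments:
    """Runs next to the data on one node: Welford's single-pass update."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Moments(n, mean, m2)


def merge(a: Moments, b: Moments) -> Moments:
    """Combine two nodes' partial results (Chan et al. pairwise formula)."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


# Hypothetical per-node slices of one numeric column.
node_slices = [[2.0, 4.0], [4.0, 4.0, 5.0], [5.0, 7.0, 9.0]]
total = reduce(merge, (local_moments(s) for s in node_slices), EMPTY)
print(total.n, round(total.mean, 6), round(total.m2 / total.n, 6))  # 8 5.0 4.0
```

The same pattern (compute a mergeable summary per node, then combine) underlies many aggregate-style algorithms, which is what makes pushing them into an MPP database attractive.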


References

  1. ^ Queries From Hell blog: "When is an appliance not an appliance?"
  2. ^ DBMS2 (DataBase Management System Services) blog: "Data warehouse appliances – fact and fiction"
  3. ^ Omer Trajman, Alain Crolotte, David Steinhoff, Raghunath Nambiar, Meikel Poess: "Databases Are Not Toasters: A Framework for Comparing Data Warehouse Appliances"
  4. ^ a b [1]
  5. ^ [2]
  6. ^ Kobielus, James (April 22, 2008). "Teradata Goes Appliance, Officially". http://blogs.forrester.com/james_kobielus/08-04-22-teradata_goes_appliance_officially. Retrieved 2011-01-14. "Teradata effectively established the DW appliance market a quarter-century ago when it rolled out the first in a long line of preconfigured, preoptimized solutions that combine CPUs, storage, software, and database to address the most demanding analytical and decision support requirements" 
  7. ^ "Database machines and data warehouse appliances – the early days". Monash Research. September 15, 2008. http://www.softwarememories.com/2008/09/15/database-machines/. Retrieved 2011-01-15. "But for all practical purposes, the first two significant “database machine” vendors were Britton-Lee and Teradata. And since Britton-Lee eventually sold out to Teradata (after a brief name change to ShareBase), Teradata is entitled to whatever historical glory accrues from having innovated the database management appliance category." 
  8. ^ Todd White (November 5, 1990). "Teradata Corp. suffers first quarterly loss in four years". Los Angeles Business Journal. http://www.allbusiness.com/north-america/united-states-california-metro-areas/123633-1.html. Retrieved 2008-07-14. 
  9. ^ All, Ann (Apr 6, 2007). "Will a Data Warehouse Appliance Work for You?". http://www.itbusinessedge.com/cm/community/features/interviews/blog/will-a-data-warehouse-appliance-work-for-you/?cs=22447. Retrieved 2011-01-14. "DATAllegro has a site at Sears. Sears uses [the appliance] as a front end to their Teradata warehouse to calculate aggregates. So when they want to do slice-and-dice, how many we sold in which stores and of what color, they use the appliance...I think [appliances] could be a disruptive technology for Teradata" 
  10. ^ Oracle Performance Architect Kevin Closson: "Oracle Exadata Storage Server"
  11. ^ "Oracle Exadata: What is the benefit?"
  12. ^ [3]
  13. ^ Gartner 2007 Magic Quadrant for Data Warehouse Database Management Systems



Wikimedia Foundation. 2010.
