- Open Data
Open Data is a philosophy and practice requiring that certain data are freely available to everyone, without restrictions from copyright, patents or other mechanisms of control. It has a similar ethos to a number of other "Open" movements and communities such as open source and open access. However, these are not logically linked and many combinations of practice are found. The practice and ideology itself is well established (for example in the Mertonian tradition of science) but the term "Open Data" itself is recent. Much of the emphasis in this entry is on data from scientific research and from the data-driven web. In some cases Open Data may be considered as more properly Open Metadata and there is not yet a consistent formalisation. This article uses recent publications and activities to define the scope of the concept and term.
The concept of Open Data is not new, but although the term is currently in frequent use, there are no commonly agreed definitions (unlike, for example, Open Access, where several formal declarations have been made and signed).
Open Data is often focussed on non-textual material such as maps, genomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data are controlled by organisations, both public and private. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of Open Data argue that these restrictions are against the communal good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by license.
A typical depiction of the need for Open Data:
"Numerous scientists have pointed out the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery…..we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge"[ [http://sciencecommons.org/ Science Commons] ] John Wilbanks, Executive Director, Science Commons
Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use. For example, many scientists do not regard the published data arising from their work to be theirs to control, and consider the act of publication in a journal to be an implicit release of the data into the commons. However, the lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an Open spirit. Because of this uncertainty it is also possible for public or private organizations such as IEEE to aggregate such data, protect it with copyright and then resell it.
Under "Toward Open Data" Connolly (2005, v.i.) gives two quotations:
* I want my data back. (Jon Bosak circa 1997)
* I've long believed that customers of any application own the data they enter into it. [ [http://www.veen.com/jeff/archives/000810.html Jeffrey Veen] ] (This quote refers to Veen's own heart-rate data.)
These quotations suggest that Openness refers to the metadata (formats, licenses, ontologies) rather than the data themselves.
Keith Jeffery writes:
Although the term open data is rather new, the concept is rather old. The International Geophysical Year of 1957-8 caused the setting up of several world data centres and - more importantly - set standards for descriptive metadata to be used for data exchange and utilisation. [Keith G Jeffery on [http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=32 Peter Murray-Rust's blog] ]
In 1995 GCDIS (US) put the position clearly in "On the Full and Open Exchange of Scientific Data" (A publication of the Committee on Geophysical and Environmental Data - National Research Council):
"The Earth's atmosphere, oceans, and biosphere form an integrated system that transcends national boundaries. To understand the elements of the system, the way they interact, and how they have changed with time, it is necessary to collect and analyze environmental data from all parts of the world. Studies of the global environment require international collaboration for many reasons:[ [http://globalchange.gov/policies/nas-fando.html GCDIS] ]
*to address global issues, it is essential to have global data sets and products derived from these data sets;
*it is more efficient and cost-effective for each nation to share its data and information than to collect everything it needs independently; and
*the implementation of effective policies addressing issues of the global environment requires the involvement from the outset of nearly all nations of the world.
International programs for global change research and environmental monitoring crucially depend on the principle of full and open data exchange (i.e., data and information are made available without restriction, on a non-discriminatory basis, for no more than the cost of reproduction and distribution)."
The last phrase highlights the traditional cost of disseminating information by print and post. It is the removal of this cost through the Internet which has made data vastly easier to disseminate technically. It is correspondingly cheaper to create, sell and control many data resources and this has led to the current concerns over non-Open data.
More recent uses of the term include:
* SAFARI 2000 (South Africa, 2001) used a license informed by ICSU and NASA policies [http://mercury.ornl.gov/safari2k/s2kpolicy.pdf]
* the human genome [ [http://www.oreillynet.com/pub/a/network/2002/04/05/kent.html Jim Kent 2002] ] (Kent, 2002)
* An Open Data Consortium on geospatial data [ [http://www.opendataconsortium.org/newsbdy.htm Open Data Consortium ca. 2003] ] (2003)
* The Blue Obelisk group in chemistry (mantra: Open Data, Open Source, Open Standards) [http://www.blueobelisk.org Blue Obelisk, 2004] (2004), doi:10.1021/ci050400b
* Manifesto for Open Chemistry [ [http://www.w3.org/2004/10/swls/Murray-Rust/communal/manifesto.html Peter Murray-Rust, Henry Rzepa 2004] ] (Murray-Rust and Rzepa, 2004)
* Presentations to JISC and OAI under the title "Open Data" [ [http://oai4.web.cern.ch/OAI4/ "Open Data" at CERN Workshop on Innovations in Scholarly Communication (OAI4) Peter Murray-Rust, 2005] ] (Murray-Rust, 2005)
* Science Commons launch [ [http://www.biomedcentral.com/openaccess/archive/?page=features&issue=23 Report on Science Commons Dec 2004] ] (2004)
* The Petition for Open Data in Crystallography is launched by the Crystallography Open Database Advisory Board. [http://www.crystallography.net/] (2005)
* XML Conference & Exposition 2005 [ [http://www.w3.org/2002/12/cal/mash/slides#(1) Semantic Web Data Integration with hCalendar and GRDDL; Dan Connolly; From Syntax to Semantics (XML 2005), Atlanta, GA, USA] ] (Connolly 2005)
* SPARC Open Data mailing list [ [http://www.arl.org/sparc/opendata/ SPARC Open Data Mailing list] ] (2005)
* XTech [ [http://times.usefulinc.com/2005/01/05-xtech-open-data XTech 2005] ] (Dumbill, 2005), [ [http://www.tbray.org/ongoing/When/200x/2006/07/28/Open-Data Tim Bray and Tim O'Reilly] ] (Bray and O'Reilly 2006)
In 2004, the Science Ministers of all nations of the OECD (Organisation for Economic Co-operation and Development), which includes most developed countries of the world, signed a declaration which essentially states that all publicly-funded archive data should be made publicly available. [ [http://www.oecd.org/document/0,2340,en_2649_34487_25998799_1_1_1_1,00.html OECD Declaration on Open Access to publicly-funded data] ] Following a request and an intense discussion with data-producing institutions in member states, the OECD published in 2007 the "OECD Principles and Guidelines for Access to Research Data from Public Funding" as a "soft-law" recommendation. [ [http://www.oecd.org/document/55/0,3343,en_2649_201185_38500791_1_1_1_1,00.html OECD Principles and Guidelines for Access to Research Data from Public Funding] ]
In 2005 Edd Dumbill introduced an "Open Data" theme in XTech, including:
* Public web services.
* Grassroots data.
* Scientific and academic publishing.
* Blogging and personal content.
In 2006 Science Commons [ [http://www.spatial.maine.edu/icfs/ Science Commons in Washington 2006] ] ran a 2-day conference in Washington where the primary topic could be described as Open Data. It was reported that the amount of micro-protection of data (e.g. by license) in areas such as biotechnology was creating a tragedy of the anticommons, in which the costs of obtaining licenses from a large number of owners made it uneconomic to do research in the area.
In 2007 SPARC and Science Commons announced a consolidation and enhancement of their author addenda. [ [https://mx2.arl.org/Lists/SPARC-OAForum/Message/3767.html SPARC-OAF forum] ]
Fundamental Open Rights
Arguments made on behalf of Open Data include:
* "Data belong to the human race". Typical examples are genomes, data on organisms, medical science, environmental data.
* Public money was used to fund the work and so it should be universally available.
* It was created by or at a government institution (this is common in US National Laboratories and government agencies)
* Facts cannot legally be copyrighted.
* Sponsors of research do not get full value unless the resulting data are freely available
* Restrictions on data re-use create an anticommons
* Data are required for the smooth process of running communal human activities (map data, public institutions)
* In scientific research, the rate of discovery is accelerated by better access to data. [ [http://www.jstage.jst.go.jp/article/dsj/6/0/6_S116/_article How to Make the Dream Come True] argues in one research area (Astronomy) that access to open data increases the rate of scientific discovery.]
It is generally held that factual data cannot be copyrighted. [ [http://sciencecommons.org/about/towards Towards a Science Commons] includes an overview of the basis of Openness in science data.] However publishers frequently add their copyright statements (often forbidding re-use) to scientific data accompanying (supporting, supplementing) a publication. It is also usually unclear whether the factual data embedded in full text are part of the copyright.
While the human abstraction of facts from paper publications is normally accepted as legal there is often an implied restriction on the machine extraction by robots.
As the term Open Data is relatively new it is difficult to collect arguments against it. Unlike Open Access, where groups of publishers have stated their concerns, Open Data is normally challenged by individual institutions. Their arguments may include:
* this is a non-profit organisation and the revenue is necessary to support other activities (e.g. learned society publishing supports the society)
* the government gives specific legitimacy for certain organisations to recover costs (NIST in US, Ordnance Survey in UK)
* government funding may not be used to duplicate or challenge the activities of the private sector (e.g.
Relation to Open Access
Much data is made available through scholarly publication, which now attracts intense debate under "Open Access". The Budapest Open Access Initiative (2001) coined this term:
By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.
The logic of the declaration permits re-use of the data although the term "literature" has connotations of human-readable text and can imply a scholarly publication process. In Open Access discourse the term "full-text" is often used which does not emphasize the data contained within or accompanying the publication.
Some Open Access publishers do not require the authors to assign copyright and the data associated with these publications can normally be regarded as Open Data. Some publishers have Open Access strategies where the publisher requires assignment of the copyright and where it is unclear that the data in publications can be truly regarded as Open Data.
The ALPSP and STM publishers have issued a statement about the desirability of making data freely available [http://www.alpsp.org/ForceDownload.asp?id=129] :
Publishers recognise that in many disciplines data itself, in various forms, is now a key output of research. Data searching and mining tools permit increasingly sophisticated use of raw data. Of course, journal articles provide one ‘view’ of the significance and interpretation of that data – and conference presentations and informal exchanges may provide other ‘views’ – but data itself is an increasingly important community resource. Science is best advanced by allowing as many scientists as possible to have access to as much prior data as possible; this avoids costly repetition of work, and allows creative new integration and reworking of existing data.
and
We believe that, as a general principle, data sets, the raw data outputs of research, and sets or sub-sets of that data which are submitted with a paper to a journal, should wherever possible be made freely accessible to other scholars. We believe that the best practice for scholarly journal publishers is to separate supporting data from the article itself, and not to require any transfer of or ownership in such data or data sets as a condition of publication of the article in question.
However, this statement has had no effect on the open availability of primary data related to publications in journals of ALPSP and STM members. Data tables provided by the authors as a supplement to a paper are still available to subscribers only.
Relation to other Open Activities
There are a number of other "Open" philosophies which are similar to, but not synonymous with Open Data but which may overlap, be supersets, or subsets. Here they are briefly listed and compared.
* Open Source (Software) is concerned with the licenses under which computer programs can be distributed and is not normally concerned primarily with data.
* Open Content has similarities to Open Data and may be seen as a superset but differs in that it emphasizes creative works while Open Data is more oriented towards factual data and the output of the scientific research process.
* Open Notebook Science refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data. [http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html creation of term]
* Open Knowledge. [http://www.okfn.org The Open Knowledge Foundation] argues for Openness in a range of issues including, but not limited to, those of Open Data. It covers (a) scientific, historical, geographic or otherwise; (b) Content such as music, films, books; (c) Government and other administrative information.
Several funding bodies which mandate Open Access also mandate Open Data. A good expression of requirements (truncated in places) is given by the Canadian Institutes of Health Research (CIHR) [ [https://mx2.arl.org/Lists/SPARC-OpenData/Message/34.html SPARC-OpenData@arl.org Mailing List Archive ] ] :
* to deposit bioinformatics, atomic and molecular coordinate data, experimental data into the appropriate public database immediately upon publication of research results.
* to retain original data sets for a minimum of five years after the grant. This applies to all data, whether published or not.
Note the fundamental requirement to be able to replicate the experiment.
Other bodies active in promoting the deposition of data as well as fulltext include the
Several intentional or unintentional mechanisms exist for restricting access to or re-use of data. They include:
* compilation in databases or websites to which only registered members or customers can have access.
* use of a proprietary or closed technology or encryption which creates a barrier for access.
* copyright forbidding (or obfuscating) re-use of the data.
* license forbidding (or obfuscating) re-use of the data (such as
* patent forbidding re-use of the data (for example the 3-dimensional coordinates of some experimental protein structures have been patented)
* restriction of robots to websites, with preference to certain search engines
* aggregating factual data into "databases" which may be covered by "database rights" or "database directives" (e.g. Directive on the legal protection of databases)
* time-limited access to resources such as e-journals (which on traditional print were available to the purchaser indefinitely)
* political, commercial or legal pressure on the activity of organisations providing Open Data (for example, the American Chemical Society lobbied the US Congress to limit funding to the National Institutes of Health for its Open PubChem data [http://osc.universityofcalifornia.edu/news/acs_pubchem.html Review of history and positions by the University of California])
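One of the mechanisms listed above, restricting robots with a preference for certain search engines, is commonly expressed in a site's robots.txt file. The following is a minimal sketch of how such a policy reads to a crawler, using Python's standard `urllib.robotparser`. The crawler names (`FavouredBot`, `OtherBot`), the `/data/` path and the `example.org` URLs are hypothetical and chosen only for illustration; they do not describe any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler may fetch everything,
# while every other robot is kept out of the /data/ area.
ROBOTS_TXT = """\
User-agent: FavouredBot
Allow: /

User-agent: *
Disallow: /data/
"""


def can_fetch(agent: str, url: str) -> bool:
    """Return True if the policy above lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    data_url = "https://example.org/data/set.csv"
    print("FavouredBot:", can_fetch("FavouredBot", data_url))
    print("OtherBot:", can_fetch("OtherBot", data_url))
```

Under this assumed policy the factual data remain readable by a human visitor, but only the favoured crawler is permitted to harvest them mechanically, which is the kind of restriction on machine extraction described earlier in the article.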
Organisations promoting Open Data
* Scholarly Publishing and Academic Resources Coalition
* [http://www.freeourdata.org.uk/index.php "Free our data"] (The Guardian technology section)
* [http://www.okfn.org/ The Open Knowledge Foundation]
* [http://www.talis.com/ Talis]
* [http://www.web2express.org/ Web2Express.org, Open data on semantic web]
* [http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData Linking Open Data on the Semantic Web]
Budapest Open Access Initiative
* [http://www.talis.com/tdn/tcl Talis Community License]
* [http://www.opencontentlawyer.com/open-data/open-database-licence/ Open Data Commons Database Licence (an update to the Talis Community License)]
Wikimedia Foundation. 2010.