General Architecture for Text Engineering

Infobox Software
name = GATE
caption = General Architecture for Text Engineering
developer = [http://gate.ac.uk/ GATE research team], Dept. of Computer Science, University of Sheffield
released = 1996
frequently_updated = yes
programming language = Java
operating system = Cross-platform
language = English
genre = Text mining, information extraction
license = LGPL
website = http://gate.ac.uk/

General Architecture for Text Engineering, or GATE, is a Java software toolkit developed at the University of Sheffield since 1995. It is used worldwide by scientists, companies, teachers and students for a wide range of natural language processing tasks, including information extraction in many languages.

GATE comprises an architecture, a free and open-source framework and API, and a graphical development environment.

The GATE community and its research are involved in several European research projects, including TAO and SEKT.

Features

GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System), a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part-of-speech tagger, a named-entity transducer and a coreference tagger.
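The idea of chaining such modules can be illustrated with a minimal, self-contained Java sketch. It does not use the GATE API: the classes `ToySpan` and `ToyPipeline`, the token pattern and the one-entry gazetteer are all assumptions made for illustration. It only shows the general shape of a pipeline in which a tokenizer produces typed spans over the text and a gazetteer stage adds lookup spans on top of them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy stand-off annotation: a typed span of character offsets over the text.
// Illustrative only; this is not a GATE class.
final class ToySpan {
    final String type;   // "Token" or "Lookup"
    final int start;     // inclusive character offset
    final int end;       // exclusive character offset
    final String value;  // token string for Token, gazetteer category for Lookup

    ToySpan(String type, int start, int end, String value) {
        this.type = type;
        this.start = start;
        this.end = end;
        this.value = value;
    }
}

final class ToyPipeline {
    // Assumed token pattern: a run of letters, a run of digits,
    // or a single punctuation character.
    private static final Pattern TOKEN =
            Pattern.compile("\\p{L}+|\\p{N}+|[^\\s\\p{L}\\p{N}]");

    // Stage 1: tokenizer. Emits one Token span per match.
    static List<ToySpan> tokenize(String text) {
        List<ToySpan> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(new ToySpan("Token", m.start(), m.end(), m.group()));
        }
        return tokens;
    }

    // Stage 2: gazetteer. Emits a Lookup span for every token found in the list.
    static List<ToySpan> lookup(List<ToySpan> tokens, Map<String, String> gazetteer) {
        List<ToySpan> lookups = new ArrayList<>();
        for (ToySpan t : tokens) {
            String category = gazetteer.get(t.value);
            if (category != null) {
                lookups.add(new ToySpan("Lookup", t.start, t.end, category));
            }
        }
        return lookups;
    }

    // Runs the stages in order; each stage reads what earlier stages produced.
    static List<ToySpan> run(String text, Map<String, String> gazetteer) {
        List<ToySpan> tokens = tokenize(text);
        List<ToySpan> all = new ArrayList<>(tokens);
        all.addAll(lookup(tokens, gazetteer));
        return all;
    }

    public static void main(String[] args) {
        Map<String, String> gazetteer = new HashMap<>();
        gazetteer.put("Sheffield", "location"); // hypothetical gazetteer entry
        for (ToySpan s : run("GATE was built in Sheffield.", gazetteer)) {
            System.out.println(s.type + " [" + s.start + "," + s.end + ") " + s.value);
        }
    }
}
```

Keeping annotations as offsets rather than inline markup lets later stages, such as a sentence splitter or a named-entity transducer, add further spans without modifying the text itself.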

Languages currently handled in GATE include English, Spanish, Chinese, Arabic, French, German, Hindi, Cebuano, Romanian and Russian.

A large set of plugins is available: for machine learning with Weka, RASP, MAXENT and SVM Light; for managing ontologies such as WordNet; for querying search engines such as Google or Yahoo; for part-of-speech tagging with Brill or TreeTagger; and many more.

GATE can handle input in various formats, such as plain text, HTML, XML, Doc and PDF documents. Documents can be stored as Java serialized objects, in Lucene-based stores, or in PostgreSQL and Oracle databases through RDBMS storage over JDBC.

GATE also uses the JAPE (Java Annotation Patterns Engine) language to write rules that annotate documents with tags. A debugger, a corpus benchmark tool and an annotation comparison tool are also included.
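To give a feel for what a pattern-based annotation rule does, here is a minimal Java sketch. It is not JAPE syntax and not the GATE API: the classes `ToyAnnotation` and `ToyPatternRule`, the list of titles and the rule itself are assumptions chosen for illustration. The rule has a left-hand side that matches a sequence of tokens (a known title followed by a capitalised word) and a right-hand side that creates a new "Person" annotation over the matched span.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy annotation: a type plus character offsets. Illustrative only, not a GATE class.
final class ToyAnnotation {
    final String type;
    final int start; // inclusive
    final int end;   // exclusive

    ToyAnnotation(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

final class ToyPatternRule {
    // Hypothetical title list standing in for a gazetteer.
    private static final Set<String> TITLES =
            new HashSet<>(Arrays.asList("Dr", "Prof", "Mr", "Ms"));

    // Words are runs of letters; punctuation between words is ignored.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    static List<ToyAnnotation> apply(String text) {
        // Collect word offsets as {start, end} pairs.
        List<int[]> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(new int[] {m.start(), m.end()});
        }

        List<ToyAnnotation> people = new ArrayList<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            int[] first = words.get(i);
            int[] second = words.get(i + 1);
            // Left-hand side: a title followed by a word with an upper-case initial.
            boolean isTitle = TITLES.contains(text.substring(first[0], first[1]));
            boolean upperInitial = Character.isUpperCase(text.charAt(second[0]));
            if (isTitle && upperInitial) {
                // Right-hand side: annotate the whole matched span as a Person.
                people.add(new ToyAnnotation("Person", first[0], second[1]));
                i++; // the second word is consumed by this match
            }
        }
        return people;
    }

    public static void main(String[] args) {
        String text = "Yesterday Dr. Smith met Prof Jones in Sheffield.";
        for (ToyAnnotation a : apply(text)) {
            System.out.println(a.type + ": " + text.substring(a.start, a.end));
        }
    }
}
```

A rule language separates the pattern from the action, so new rules can be added without changing the matching code; this sketch hard-codes a single rule to keep the example short.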

Description of the graphical user interface

Each processing resource and language resource can have its own associated visual resource. When a resource is double-clicked, its visual resource appears in the GATE GUI. The GUI is divided into three visible parts (see figure). One contains a tree showing the loaded resource instances. The part below it serves various purposes, such as displaying document features and indicating that execution is in progress; it is referred to as "small". The third and largest part is referred to as "large". In the figure, the central pane shows the document with annotations highlighted in yellow, and the vertical pane lists the annotations used in the document.

References

* [http://gate.ac.uk/ GATE website] at [http://nlp.shef.ac.uk/ University of Sheffield Natural Language Processing Group]

See also

* Unstructured Information Management Architecture (UIMA)


Wikimedia Foundation. 2010.
