
Netflix / Metacat: Data catalog

Metacat is a flexible, open-source metadata catalog and data repository that targets scientific data, particularly from ecology and environmental science. Metacat accepts XML as a common syntax for representing the many metadata content standards that are relevant to ecology and other sciences. Metacat is therefore a generic XML database: it allows storage, query, and retrieval of arbitrary XML documents without prior knowledge of the XML schema.

Metacat is implemented as a Java servlet application that uses a relational database management system (RDBMS) to store XML and associated meta-level information. The recommended installation uses Apache Tomcat for servlet management and PostgreSQL as the underlying RDBMS, although other configurations are possible. Metacat provides a rich client Application Programming Interface (API) that supports a variety of languages, including Java, Python, and Perl.
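To make the idea of schema-agnostic XML storage in a relational database more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Metacat's actual implementation: the table name `xml_node`, its columns, and the helper functions `create_store`, `store_document`, and `find_documents` are all hypothetical, and SQLite stands in for PostgreSQL so that the example runs without a server. Each document is broken into (document id, element path, text value) rows, so documents that follow different schemas can share one table and be searched with the same query.

```python
import sqlite3
import xml.etree.ElementTree as ET


def create_store():
    """Open an in-memory SQLite database with one generic node table.

    Hypothetical layout for illustration; not Metacat's real schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE xml_node ("
        " doc_id TEXT NOT NULL,"
        " path TEXT NOT NULL,"
        " value TEXT)"
    )
    return conn


def store_document(conn, doc_id, xml_text):
    """Break an XML document into (doc_id, path, value) rows.

    No schema is consulted: element names are taken from the document itself.
    """
    insert = "INSERT INTO xml_node (doc_id, path, value) VALUES (?, ?, ?)"

    def walk(element, parent_path):
        path = parent_path + "/" + element.tag
        text = (element.text or "").strip()
        if text:
            conn.execute(insert, (doc_id, path, text))
        # Attributes are stored under a path ending in "/@name".
        for name, attr_value in element.attrib.items():
            conn.execute(insert, (doc_id, path + "/@" + name, attr_value))
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    conn.commit()


def find_documents(conn, path_suffix, term):
    """Return sorted ids of documents that have a node whose path ends with
    path_suffix and whose value contains term (SQL LIKE matching)."""
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM xml_node"
        " WHERE path LIKE ? AND value LIKE ?"
        " ORDER BY doc_id",
        ("%" + path_suffix, "%" + term + "%"),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = create_store()
    # Two documents with unrelated structures go into the same table.
    store_document(
        conn, "doc.1",
        "<dataset><title>Kelp forest survey</title></dataset>",
    )
    store_document(
        conn, "doc.2",
        '<record id="r1"><metadata><title>Soil moisture</title></metadata></record>',
    )
    print(find_documents(conn, "/title", "kelp"))
```

The design choice illustrated here is that the relational schema describes XML nodes in general rather than any particular metadata standard, which is what allows new document types to be stored without changing the database. A production system would also need to handle namespaces, document versions, and access control, none of which this sketch attempts.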

Metacat is being used extensively throughout the world to manage environmental data. It is a key infrastructure component for the NCEAS data catalog, the Knowledge Network for Biocomplexity (KNB) data catalog, and for the DataONE system, among others.


* Data abstraction and interoperability.
* Business and user-defined metadata storage.
* Data discovery.
* Data change auditing and notifications.
* Hive metastore optimizations.

