Apache Atlas


Apache Atlas is an open-source metadata management and governance system designed to help you find, organize, and manage data assets.

Atlas was incubated by Hortonworks under the umbrella of the Data Governance Initiative (DGI). It entered the Apache Software Foundation Incubator in May 2015 and graduated as a top-level Apache project in June 2017. Its initial focus was the Apache Hadoop environment, although Apache Atlas has no dependencies on the Hadoop platform itself.

Features

Metadata types & instances:
* Pre-defined types for various Hadoop and non-Hadoop metadata
* Ability to define new types for the metadata to be managed
* Types can have primitive attributes, complex attributes, and object references, and can inherit from other types
* Instances of types, called entities, capture metadata object details and their relationships
* REST APIs for working with types and instances make integration easier
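As a minimal sketch of how types, entities, and the REST APIs fit together, the Python below builds (but does not send) the JSON bodies for defining a new entity type and creating one entity of that type. It assumes the Atlas v2 endpoints `/api/atlas/v2/types/typedefs` and `/api/atlas/v2/entity`, a server at `http://localhost:21000`, and a hypothetical type named `ml_dataset` that inherits from the built-in `DataSet` type. Authentication is omitted, and field names should be checked against the documentation for your Atlas version.

```python
import json
import urllib.request

# Assumption: Atlas listening on its usual default port; change for your deployment.
ATLAS_URL = "http://localhost:21000"


def build_typedef_payload():
    """Body for POST /api/atlas/v2/types/typedefs defining a hypothetical 'ml_dataset' type."""
    return {
        "entityDefs": [
            {
                "name": "ml_dataset",  # hypothetical type name
                "superTypes": ["DataSet"],  # inherit from Atlas's built-in DataSet type
                "attributeDefs": [
                    # primitive attribute
                    {"name": "row_count", "typeName": "long", "isOptional": True},
                    # complex (collection) attribute
                    {"name": "feature_names", "typeName": "array<string>", "isOptional": True},
                ],
            }
        ]
    }


def build_entity_payload(qualified_name, name, row_count):
    """Body for POST /api/atlas/v2/entity creating one entity (instance) of 'ml_dataset'."""
    return {
        "entity": {
            "typeName": "ml_dataset",
            "attributes": {
                "qualifiedName": qualified_name,  # unique name inherited via DataSet
                "name": name,
                "row_count": row_count,
            },
        }
    }


def make_request(path, payload):
    """Build, but do not send, a JSON POST request to the Atlas REST API."""
    return urllib.request.Request(
        ATLAS_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage: the request is only constructed here; sending it needs credentials.
req = make_request(
    "/api/atlas/v2/entity",
    build_entity_payload("sales.customers@prod", "customers", 1200),
)
```

To actually submit the request, add credentials (for example an `Authorization` header) and pass it to `urllib.request.urlopen`. Keeping payload construction separate from sending makes the JSON easy to inspect and test.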

Classification:
* Ability to dynamically create classifications, such as PII, EXPIRES_ON, DATA_QUALITY, and SENSITIVE
* Classifications can include attributes, such as an expiry_date attribute in the EXPIRES_ON classification
* Entities can be associated with multiple classifications, enabling easier discovery and security enforcement
* Propagation of classifications via lineage automatically ensures that classifications follow the data as it goes through various processing steps
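The classification features above can be sketched as two JSON bodies: one declaring a classification type with an attribute, and one attaching it to an entity. This assumes the Atlas v2 endpoints `POST /api/atlas/v2/types/typedefs` and `POST /api/atlas/v2/entity/guid/{guid}/classifications`, and that a `propagate` flag on the attached classification controls propagation along lineage. The helper names are hypothetical, and sending date values as epoch milliseconds is an assumption to verify against your Atlas version.

```python
def build_classification_def(name, attributes=None):
    """Body for POST /api/atlas/v2/types/typedefs declaring a classification type.

    'attributes' maps attribute names to Atlas type names, e.g. {"expiry_date": "date"}.
    """
    return {
        "classificationDefs": [
            {
                "name": name,
                "attributeDefs": [
                    {"name": attr, "typeName": type_name, "isOptional": True}
                    for attr, type_name in (attributes or {}).items()
                ],
            }
        ]
    }


def build_classification_assignment(name, attributes=None, propagate=True):
    """Body for POST /api/atlas/v2/entity/guid/{guid}/classifications.

    The endpoint takes a JSON list, so several classifications can be attached in one call.
    propagate=True asks Atlas to carry the classification to entities derived via lineage.
    """
    return [{"typeName": name, "attributes": attributes or {}, "propagate": propagate}]


# Hypothetical usage: declare EXPIRES_ON with an expiry_date attribute, then tag an entity.
expires_on_def = build_classification_def("EXPIRES_ON", {"expiry_date": "date"})
# Assumption: date attributes are sent as epoch milliseconds (here 2026-01-01 UTC).
assignment = build_classification_assignment("EXPIRES_ON", {"expiry_date": 1767225600000})
```

Because the assignment body is a list, attaching PII and SENSITIVE to the same entity is a single request with two elements.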

Lineage:
* Intuitive UI to view lineage of data as it moves through various processes
* REST APIs to access and update lineage
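Lineage is returned as a graph rather than a flat list. The sketch below assumes the Atlas v2 endpoint `GET /api/atlas/v2/lineage/{guid}` with `direction` (INPUT, OUTPUT, or BOTH) and `depth` query parameters, and a response whose `relations` entries carry `fromEntityId` and `toEntityId`. It walks those edges backwards, breadth-first, to collect everything upstream of an entity. The GUIDs in `sample` are made up for illustration.

```python
from collections import deque
from urllib.parse import urlencode


def lineage_url(base_url, guid, direction="BOTH", depth=3):
    """URL for GET /api/atlas/v2/lineage/{guid}."""
    return f"{base_url}/api/atlas/v2/lineage/{guid}?" + urlencode({"direction": direction, "depth": depth})


def upstream_guids(lineage, start_guid):
    """Breadth-first walk over 'relations' edges, following them from target back to source."""
    parents = {}
    for rel in lineage.get("relations", []):
        parents.setdefault(rel["toEntityId"], []).append(rel["fromEntityId"])
    seen, queue = set(), deque([start_guid])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:  # guards against revisiting nodes in cyclic graphs
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical response: raw table t1 -> ETL process p1 -> cleaned table t2.
sample = {
    "baseEntityGuid": "t2",
    "relations": [
        {"fromEntityId": "t1", "toEntityId": "p1"},
        {"fromEntityId": "p1", "toEntityId": "t2"},
    ],
}
```

Processes appear as nodes between datasets, so `upstream_guids(sample, "t2")` includes both the process `p1` and the source table `t1`.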

Search/Discovery:
* Intuitive UI to search entities by type, classification, attribute value or free-text
* Rich REST APIs to search by complex criteria
* SQL-like query language to search entities: Domain Specific Language (DSL)
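Both search styles can be sketched as request builders. This assumes the Atlas v2 endpoints `GET /api/atlas/v2/search/dsl` (taking a `query` string) and `POST /api/atlas/v2/search/basic` (taking a JSON body with `typeName`, `classification`, and `query` fields). The `hive_table` and `hive_column` type names come from the Hive integration and are used here only as examples; the helper functions are hypothetical.

```python
from urllib.parse import urlencode


def dsl_search_url(base_url, query, limit=25):
    """URL for GET /api/atlas/v2/search/dsl running a DSL query."""
    return f"{base_url}/api/atlas/v2/search/dsl?" + urlencode({"query": query, "limit": limit})


def basic_search_body(type_name=None, classification=None, text=None, limit=25):
    """Body for POST /api/atlas/v2/search/basic; only the criteria that are given are included."""
    body = {"excludeDeletedEntities": True, "limit": limit}
    if type_name:
        body["typeName"] = type_name
    if classification:
        body["classification"] = classification
    if text:
        body["query"] = text  # free-text part of the search
    return body


# Hypothetical usage: find a Hive table by name, and list PII-tagged Hive columns.
url = dsl_search_url("http://localhost:21000", 'hive_table where name = "customers"')
body = basic_search_body(type_name="hive_column", classification="PII")
```

`urlencode` takes care of escaping the spaces, quotes, and `=` in the DSL string, which is easy to get wrong when building the URL by hand.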

Security & Data Masking:
* Fine-grained security for metadata access, enabling control over access to entity instances and over operations such as adding, updating, or removing classifications
* Integration with Apache Ranger enables authorization and data masking on data access, based on the classifications associated with entities in Apache Atlas. For example:
  * controlling who can access data classified as PII or SENSITIVE
  * letting customer-service users see only the last 4 digits of columns classified as NATIONAL_ID
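Tag-based masking is configured and enforced in Ranger, not in Atlas. The following is therefore only a conceptual model of the NATIONAL_ID example above: a mask applies to a column when one of the column's classifications matches a policy that targets one of the user's groups. The policy structure, the first-match rule, and all function names are hypothetical and do not reflect Ranger's actual policy format or evaluation order.

```python
def mask_show_last_4(value):
    """Replace every character except the last four with 'x'."""
    return "x" * max(len(value) - 4, 0) + value[-4:]


def apply_tag_policies(row, column_tags, user_groups, policies):
    """Return a masked copy of row; the original row is left unchanged.

    column_tags maps column name -> set of classifications (as Atlas would supply them).
    Each hypothetical policy is {"tag": str, "groups": set, "mask": callable}.
    """
    masked = dict(row)
    for column, value in row.items():
        tags = column_tags.get(column, set())
        for policy in policies:
            if policy["tag"] in tags and policy["groups"] & user_groups:
                masked[column] = policy["mask"](str(value))
                break  # simplifying assumption: first matching policy wins
    return masked


# Hypothetical policy: customer-service sees only the last 4 digits of NATIONAL_ID columns.
policies = [{"tag": "NATIONAL_ID", "groups": {"customer-service"}, "mask": mask_show_last_4}]
row = {"name": "Ana", "ssn": "123-45-6789"}
column_tags = {"ssn": {"NATIONAL_ID", "PII"}}
```

The key design point the sketch illustrates is that the policy refers to the classification, not to a specific table or column, so any column later tagged NATIONAL_ID in Atlas is masked without changing the policy.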

Official website

Tutorial and documentation
