Amundsen is a data discovery and metadata engine for improving the productivity of data analysts, data scientists, and engineers when interacting with data. It does that today by indexing data resources (tables, dashboards, streams, etc.) and powering a PageRank-style search based on usage patterns (e.g. highly queried tables show up earlier than less queried tables). Think of it as Google search for data. The project is named after Norwegian explorer Roald Amundsen, the first person to reach the South Pole.
1. Discover trusted data: Search for data within your organization with a simple text search. A PageRank-inspired search algorithm recommends results based on names, descriptions, tags, and querying/viewing activity on each table or dashboard.
2. See automated and curated metadata: Build trust in data using automated and curated metadata: descriptions of tables and columns, other frequent users, when the table was last updated, statistics, a preview of the data if permitted, etc. Triage issues more easily by following links to the ETL job and code that generated the data.
3. Share context with co-workers: Update tables and columns with descriptions to reduce unnecessary back-and-forth about which table to use and what a column contains.
4. Learn from others: See what data your co-workers frequently use, own, or have bookmarked. Learn what the most common queries for a table look like by seeing dashboards built on that table.
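
To make the usage-based ranking idea concrete, here is a minimal Python sketch of how text relevance and usage could be combined so that heavily queried tables rank higher. It is an illustration only, not Amundsen's actual search implementation: the `Table` class, the `query_count` field, and the log-scaled boost formula are all assumptions made for this example.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Table:
    # Hypothetical metadata record; field names are assumptions for this sketch.
    name: str
    description: str
    tags: tuple[str, ...]
    query_count: int  # assumed usage signal, e.g. number of recent queries


def text_score(table: Table, query: str) -> float:
    """Count how many query terms appear in the name, description, or tags."""
    terms = query.lower().split()
    haystack = " ".join([table.name, table.description, *table.tags]).lower()
    return float(sum(term in haystack for term in terms))


def rank(tables: list[Table], query: str) -> list[Table]:
    """Order matching tables by text relevance boosted by log-scaled usage."""
    scored = []
    for table in tables:
        relevance = text_score(table, query)
        if relevance == 0:
            continue  # drop tables that match no query term
        # Log scaling keeps very popular tables from drowning out everything else.
        boost = 1.0 + math.log1p(table.query_count)
        scored.append((relevance * boost, table))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [table for _, table in scored]


tables = [
    Table("rides_raw", "Raw ride events", ("rides",), 12),
    Table("rides_daily", "Daily ride aggregates", ("rides", "core"), 4800),
    Table("users", "User accounts", ("core",), 9000),
]
print([t.name for t in rank(tables, "rides")])
# prints ['rides_daily', 'rides_raw']
```

Both `rides_*` tables match the query equally well on text, so the more heavily queried `rides_daily` comes first. The `users` table is the most queried overall but is excluded because it does not match the search term.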