Petastorm is an open-source data access library developed at Uber ATG. It enables single-machine or distributed training and evaluation of deep learning models directly from datasets in Apache Parquet format. Petastorm supports popular Python-based machine learning (ML) frameworks such as TensorFlow, PyTorch, and PySpark, and it can also be used from pure Python code.


To support different training scenarios for autonomous driving algorithms, Petastorm provides efficient data sharding, row filtering, shuffling, access to a subset of fields, and support for time-series data.
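As a rough illustration of how sharding, row filtering, shuffling, and field selection might be combined in pure Python, here is a minimal sketch built on Petastorm's `make_reader` and `in_lambda`. The field names `id` and `image`, the even-id filter, and the helper names `reader_kwargs` and `read_shard` are assumptions made for this example. They are not part of the library or of any particular dataset. The sketch also assumes a dataset written with Petastorm's own schema metadata; plain Parquet stores are normally read with `make_batch_reader` instead.

```python
def reader_kwargs(shard_index, shard_count, fields=("id", "image")):
    """Build keyword arguments for petastorm.make_reader.

    The default field names are hypothetical; substitute the fields of
    your own dataset.
    """
    return {
        "schema_fields": list(fields),   # read only a subset of fields
        "cur_shard": shard_index,        # which shard this worker reads
        "shard_count": shard_count,      # total number of shards
        "shuffle_row_groups": True,      # shuffle the order of row groups
        "num_epochs": 1,                 # one pass over the data
    }


def read_shard(dataset_url, shard_index, shard_count):
    """Yield the ids of the rows in one shard that pass the filter."""
    # Imported lazily so reader_kwargs works without petastorm installed.
    from petastorm import make_reader
    from petastorm.predicates import in_lambda

    # Row filtering: keep rows whose (assumed) integer 'id' field is even.
    keep_even_ids = in_lambda(["id"], lambda id_value: id_value % 2 == 0)

    kwargs = reader_kwargs(shard_index, shard_count)
    with make_reader(dataset_url, predicate=keep_even_ids, **kwargs) as reader:
        for row in reader:
            yield row.id
```

A worker that handles the first of two shards could then iterate over `read_shard(dataset_url, 0, 2)`, where `dataset_url` is the `file://` or `hdfs://` location of an existing Petastorm dataset.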

Official website

Tutorial and documentation
