With SETL, an ETL application is represented by a Pipeline. A Pipeline contains multiple Stages, and each Stage holds one or several Factories.
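To make the nesting concrete, here is a minimal plain-Scala sketch of the Pipeline > Stage > Factory hierarchy. The types `SketchPipeline`, `SketchStage` and `SketchFactoryRef`, as well as the factory names, are hypothetical stand-ins invented for illustration; they are not SETL classes, and the sketch assumes stages are visited in declaration order.

```scala
// Hypothetical stand-ins that mirror the hierarchy only; these are NOT SETL classes.
final case class SketchFactoryRef(name: String)

final case class SketchStage(factories: Seq[SketchFactoryRef])

final case class SketchPipeline(stages: Seq[SketchStage]) {
  // Assumption: stages are visited in the order they were declared.
  def describe: Seq[String] =
    stages.zipWithIndex.flatMap { case (stage, index) =>
      stage.factories.map(factory => s"stage $index: ${factory.name}")
    }
}

object PipelineShape {
  // One pipeline, two stages; the second stage holds two factories.
  val pipeline: SketchPipeline = SketchPipeline(Seq(
    SketchStage(Seq(SketchFactoryRef("LoadOrders"))),
    SketchStage(Seq(SketchFactoryRef("AggregateOrders"), SketchFactoryRef("ExportReport")))
  ))
}
```

Calling `PipelineShape.pipeline.describe` lists each factory together with the index of the stage it belongs to.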
The class Factory[T] is an abstraction of a data transformation that produces an object of type T. It has four methods (read, process, write and get) that the developer must implement.
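The four-method contract can be sketched in plain Scala as follows. `SketchFactory` is a simplified stand-in written for this example, not the SETL `Factory[T]` class itself; the sketch assumes that read, process and write return the factory so that calls can be chained, and `UpperCaseFactory` is a hypothetical implementation that works on an in-memory list instead of a real datastore.

```scala
// Simplified stand-in for the Factory[T] contract; NOT the SETL class itself.
trait SketchFactory[T] {
  def read(): this.type    // load the input data
  def process(): this.type // transform the input into the output
  def write(): this.type   // persist the output
  def get(): T             // expose the output to the caller
}

// Hypothetical factory that upper-cases a list of words held in memory.
class UpperCaseFactory(source: Seq[String]) extends SketchFactory[Seq[String]] {
  private var input: Seq[String] = Seq.empty
  private var output: Seq[String] = Seq.empty
  var written: Boolean = false // stands in for a real write to a datastore

  override def read(): this.type = { input = source; this }
  override def process(): this.type = { output = input.map(_.toUpperCase); this }
  override def write(): this.type = { written = true; this }
  override def get(): Seq[String] = output
}

object FactoryDemo {
  // Chains the four steps in the order read, process, write, get.
  def run(): Seq[String] =
    new UpperCaseFactory(Seq("a", "b")).read().process().write().get()
}
```

Separating the steps this way keeps I/O (read, write) apart from the transformation logic (process), so the transformation can be tested without touching a datastore.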
The class SparkRepository[T] is a data access layer abstraction. It can be used to read a Dataset[T] from, or write one to, a datastore. Each repository should be defined in a configuration file. You can have as many SparkRepositories as you want.
The entry point of a SETL project is the object io.github.setl.Setl, which handles the instantiation of the pipeline and the SparkRepositories.