DataPlunger

DataPlunger is a prototype ETL processing toolchain.

The goal is modular code that extracts records from multiple backing stores, applies an arbitrary number of transformation steps to those records, and loads the final output into a new format.

A workflow, or processing pipeline, is defined via a JSON configuration file containing the following information:

  • Connection information for the source data to be processed.
  • Processing steps to be applied to the records extracted from that source.

Source code for this project can be found at: https://github.com/mattmakesmaps/DataPlunger

Configuration

Processing pipelines are described using a JSON configuration file.
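
The exact schema is project-specific; the hypothetical configuration below only illustrates the two pieces of information listed above. All keys and values ("connection", "type", "processing_steps", and so on) are assumptions for illustration, not the actual schema.

    {
        "connection": {
            "type": "csv",
            "path": "/data/input.csv",
            "delimiter": ","
        },
        "processing_steps": [
            {"processor": "strip_whitespace", "fields": ["name"]},
            {"processor": "write_csv", "path": "/data/output.csv"}
        ]
    }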

Main Modules

dataplunger.core module - Code for parsing a JSON configuration file, building a processing pipeline, and executing it.
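
As a rough sketch of that flow, assuming the hypothetical configuration keys shown above (the function and registry names here are assumptions, not the actual dataplunger.core API): parse the configuration, resolve a reader and the requested processors, then push each record through the chain.

    # Hypothetical driver; the real dataplunger.core API may differ.
    import json

    def run_pipeline(config_path, readers, processors):
        """readers and processors map names to callables."""
        with open(config_path) as f:
            config = json.load(f)
        conn = config["connection"]
        records = readers[conn["type"]](conn)
        # Step parameters from the config would be bound in beforehand,
        # e.g. with functools.partial.
        steps = [processors[s["processor"]] for s in config["processing_steps"]]
        for record in records:
            for step in steps:
                record = step(record)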

dataplunger.readers module - Connections to backing datastores (Postgres, CSV, SHP, etc.).
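
For example, a CSV reader's job is simply to stream records out of the store as dictionaries. The sketch below uses only the standard library; the actual dataplunger.readers classes are likely richer.

    import csv

    def csv_reader(path, delimiter=","):
        """Yield one dict per row, keyed by the CSV header row."""
        with open(path, newline="") as f:
            for row in csv.DictReader(f, delimiter=delimiter):
                yield row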

dataplunger.processors module - Tools designed to operate on either a single record or an aggregate of records.
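
A minimal illustration of both kinds (the names are illustrative, not the actual dataplunger.processors API): a per-record processor transforms one record at a time, while an aggregate processor consumes the whole record stream.

    def strip_whitespace(record, fields):
        """Per-record processor: trim the named string fields."""
        out = dict(record)
        for field in fields:
            if isinstance(out.get(field), str):
                out[field] = out[field].strip()
        return out

    def count_records(records):
        """Aggregate processor: consume a record stream, return a count."""
        return sum(1 for _ in records)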
