Data Saluki

Building a Data Platform: Part 4 - Ingesting data

Introduction

Having been through the concepts, architecture and team required to build a modern data platform, it's time to focus on the various stages of the data flow, beginning with ingestion. As in previous posts, the focus will be on building the example architecture from part 2.

Ingestion Workflow

The aim of the ingestion stage is to load data from a source into the data lake so that it can be queried. Transformation should be kept to the bare minimum and should consist of:

- Translating the data into a storage format that is efficient for data lake usage (e.g. Parquet files)
- Standardising field names (e.g. converting all field names to lowercase and using underscores to separate words)
- Type conversion (e.g. if loading data from a CSV, convert any numeric fields to an appropriate numeric data type)
- Adding metadata to the records for lineage

The metadata to include on each record can include:

- The date and time the data was ingested
- The sou...
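As a rough illustration of the minimal transformations listed under Ingestion Workflow, the sketch below standardises field names, converts numeric strings from a CSV, and stamps each record with an ingestion timestamp. It uses only the Python standard library. The function names and the `_ingested_at` field name are assumptions made for this example, not taken from the original post, and writing the result out as Parquet is deliberately left out.

```python
import csv
import io
import re
from datetime import datetime, timezone


def standardise_field_name(name: str) -> str:
    """Lowercase a field name and separate words with underscores."""
    # Insert an underscore at camelCase boundaries, e.g. "orderId" -> "order_Id"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def convert_value(value: str):
    """Convert a CSV string to int or float where possible, else keep it."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def ingest_csv(text: str, ingested_at: datetime) -> list[dict]:
    """Parse CSV text into records with clean names, types and metadata."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {standardise_field_name(k): convert_value(v) for k, v in raw.items()}
        # Hypothetical lineage field: when this record was ingested
        row["_ingested_at"] = ingested_at.isoformat()
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = "Order ID,Total Amount\n1,9.99\n"
    print(ingest_csv(sample, datetime.now(timezone.utc)))
```

Inferring types by attempting casts is a simplification; a real pipeline would more likely apply an explicit schema per source so that values such as identifiers with leading zeros are not converted by accident.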
