Introduction

Once the data has been ingested into the data lake, it's time to make it available for reporting by transforming it into structures that are simple for a BI tool to query. This part of the series will describe the data modelling approach, the transformation tool (DBT) and how to transform the data ready for reporting. The example architecture diagram is shown below.

Data Model

There are a variety of data modelling approaches that can be used to model a data warehouse; popular approaches include:

- Dimensional
- Data Vault
- Third-Normal Form (3NF)

A comparison of the different approaches is beyond the scope of this article; however, for our example architecture we will be using dimensional modelling. Dimensional modelling has been the dominant data warehouse modelling approach in my personal experience, and there are a good number of reasons to use it, the most important being ease of use for analysts and business users, and performance. Dimensional modelling has two main...
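To make the fact/dimension split concrete, here is a minimal sketch in Python (pandas) of how a flat staging table could be broken into a customer dimension and an orders fact table. In the example architecture this logic would be expressed as DBT models rather than Python; the table and column names below are illustrative assumptions only.

```python
# A minimal sketch of the fact/dimension split, using pandas for brevity.
# Table and column names are illustrative assumptions, not part of the
# example architecture itself.
import pandas as pd

# Ingested (staging) order lines, roughly as they might land in the lake
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_name": ["Acme", "Acme", "Globex"],
    "customer_country": ["UK", "UK", "US"],
    "order_amount": [100.0, 250.0, 75.0],
})

# Dimension: one row per customer, descriptive attributes plus a surrogate key
dim_customer = (
    orders[["customer_name", "customer_country"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
dim_customer["customer_key"] = dim_customer.index + 1

# Fact: one row per order, keeping only measures and foreign keys to dimensions
fact_orders = orders.merge(dim_customer, on=["customer_name", "customer_country"])
fact_orders = fact_orders[["order_id", "customer_key", "order_amount"]]

print(dim_customer)
print(fact_orders)
```

The dimension carries the descriptive attributes and a surrogate key, while the fact table keeps only the measures and the keys that point at dimensions; this separation is what keeps the model simple for analysts and BI tools to query.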
Introduction

Having been through the concepts, architecture and the team required to build a modern data platform, it's time to focus on the various stages of the data flow, beginning with ingestion. As in previous posts, the focus will be on building the example architecture from part 2.

Ingestion Workflow

The aim of the ingestion stage is to load data from a source into the data lake so that it can be queried. The amount of transformation should be kept to the bare minimum and should consist of the following (a code sketch of these steps is shown below):

- Translating the data into a storage format that is efficient for data lake usage (e.g. Parquet files)
- Standardising field names (e.g. converting all field names to lowercase and using underscores to separate words)
- Type conversion (e.g. if loading data from a CSV, convert any numeric fields to an appropriate numeric data type)
- Adding metadata to the records for lineage

The metadata added to each record can include:

- The date and time the data was ingested
- The sou...
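As a rough illustration of these ingestion steps, here is a minimal sketch using pandas (with pyarrow for the Parquet write). The file paths, source name and metadata column names are assumptions made for the example, not part of the architecture itself.

```python
# Illustrative ingestion sketch: CSV in, standardised Parquet out.
from datetime import datetime, timezone
import re

import pandas as pd


def standardise_name(name: str) -> str:
    """Lowercase a field name and replace spaces/hyphens with underscores."""
    return re.sub(r"[\s\-]+", "_", name.strip()).lower()


def ingest_csv_to_parquet(source_path: str, target_path: str, source_name: str) -> None:
    df = pd.read_csv(source_path)

    # Standardise field names (e.g. "Order Date" -> "order_date")
    df.columns = [standardise_name(c) for c in df.columns]

    # Type conversion: coerce string columns that are actually numeric
    for column in df.select_dtypes(include="object").columns:
        try:
            df[column] = pd.to_numeric(df[column])
        except (ValueError, TypeError):
            pass  # leave genuinely non-numeric columns unchanged

    # Add lineage metadata to every record
    df["_ingested_at"] = datetime.now(timezone.utc)
    df["_source_system"] = source_name

    # Write in a columnar format that is efficient for data lake queries
    df.to_parquet(target_path, index=False)


# Hypothetical usage: land a CSV extract of an orders table in the lake
ingest_csv_to_parquet("orders.csv", "lake/raw/orders.parquet", "sales_db")
```

In practice this step would more likely run on Spark or a managed ingestion service than on pandas, but the shape of the work is the same: rename, cast, stamp lineage metadata, and write to an efficient lake format.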