Skip to main content

Posts

Showing posts from August, 2022

Building a Data Platform: Part 2 - Example Data Platform

Introduction In  part 1  we walked through a conceptual model of a basic data platform that covered off the key capabilities and requirements for each capability. In this part we'll discuss an example architecture that provides the minimum capabilities for a modern data platform. Conceptual to Logical architecture The diagram below shows a potential logical architecture for building our conceptual data platform. Before we dive into discussing the elements of it, note that the logical architecture is always going to depend on your situation and decisions will be affected by: The skills you have within your organisation Which cloud provider you use (assuming you're even in the cloud) Existing technology stack The appetite for change to the technical estate The key things to note are: All conceptual capabilities have a technology to provide them. Some technologies provide multiple conceptual capabilities (e.g. Redshift providing the transformed data storage and query engine) This...

Building a Data Platform: Part 1 - Concepts

Introduction This is the first part of a series about how to build a basic modern data platform that should be sufficient for most small-medium sized businesses. The series will focus on the key architecture elements, why they are needed and potential technologies that can be used. I'll only discuss batch processing initially, but I may cover real-time/streaming ingestion in a future post. This post will cover the conceptual elements of a modern data platform which can be used as a guide if you're starting out and help when considering which services or tools will meet your requirements. There isn't a "best" technology stack when building a data platform, it's all trade-offs. For example, ETL tools are easier to use than writing code but often hard to integrate within CI/CD pipelines. Conceptual Data Platform Conceptually, let's define a data platform as a collection of technologies that enables an organisation to use its data to the fullest. Generally the...