Introduction
In Part 2 we walked through an example modern data architecture (shown below).
In this third part we will discuss the people required to build that architecture and the types of skills they need. The discussion focuses on three key areas of the platform:
- Ingestion of data into the lake
- Transformation of the data so it is ready for consumers
- The infrastructure to support these two processes
This means that with the exception of analysts, roles that are focused on data sources (e.g. DBAs) or data consumers (e.g. Data Scientists) will not be discussed in this part.
Required Skills
The skills required for the example architecture can be broken down roughly into the following:
- Infrastructure - Terraform, IAM, Networking & Security, ECS, S3, Glue
- Data Ingestion - Python, S3, Glue, Lambda, ECS
- Data Transformation - Redshift, SQL, DBT, Python, ECS
Some areas will overlap and the level of skill required will vary across the team. For example, the Python knowledge needed to run DBT is significantly less than that needed to build ingestion pipelines.
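To make the overlap concrete, here is a minimal Python sketch that holds the three skill areas listed above as sets and reports which skills each pair of areas shares. The names `SKILL_AREAS` and `shared_skills` are made up for illustration. The sketch only records whether a skill appears in an area, not the depth of knowledge each area demands.

```python
from itertools import combinations

# Skill areas and their skills, copied from the list above.
SKILL_AREAS = {
    "Infrastructure": {"Terraform", "IAM", "Networking & Security", "ECS", "S3", "Glue"},
    "Data Ingestion": {"Python", "S3", "Glue", "Lambda", "ECS"},
    "Data Transformation": {"Redshift", "SQL", "DBT", "Python", "ECS"},
}


def shared_skills(areas):
    """Return the skills each pair of areas has in common, sorted by name."""
    return {
        (first, second): sorted(areas[first] & areas[second])
        for first, second in combinations(areas, 2)
    }


overlaps = shared_skills(SKILL_AREAS)
for (first, second), skills in overlaps.items():
    print(f"{first} / {second}: {', '.join(skills)}")
```

ECS appears in every pairing, while Python is shared only between ingestion and transformation. This is one way to see why a single engineer can often contribute to more than one area.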
Roles
This section discusses the roles required to build the example architecture. Note that role definitions can vary widely across industry and can be subjective to an extent. As such, this is only my definition of the roles.
The table below shows a mapping of roles to the skills required. Note that this is not exhaustive and it is very common for individuals to have a mix of these skills rather than fitting neatly into one profile or another.
On the surface, it would appear that all you need are Data Engineers to be successful but that isn’t necessarily the case and should become clear as we look at the roles in more detail.
Platform Engineer
Platform Engineers are cloud and infrastructure specialists who focus on making an organisation's cloud platforms available for use - this encompasses accounts, networking, security and other infrastructure related tasks. Platform Engineers are often responsible for any shared infrastructure (e.g. networking that links accounts together, CI/CD platforms, etc.) as well as provisioning new cloud accounts. Platform Engineers have a deep knowledge of cloud providers and how they interconnect and are essential within an organisation.
Data Engineer
Data Engineers are a specialised form of Software Engineer who focus on building infrastructure and processes which integrate and store data. As such, Data Engineers have a very broad skillset across data technologies, frameworks and programming languages. Most data engineers will also have some form of platform engineering skills including IaC. However, data engineers are data specialists, not cloud specialists, so while they can build out a lot of infrastructure, it’s rare for them to set up a completely new cloud environment (i.e. entirely new AWS organisation/accounts) without support from a platform engineer.
A data engineer may not know DBT specifically, but with their skills they will be able to pick it up very quickly, which is why it has been included in their role profile (as DBT is a data transformation tool).
Analytics Engineer
The analytics engineer is a relatively new role profile but is growing in popularity thanks to tools such as DBT. The analytics engineer is focussed on transforming data so that it can be used by data consumers. Analytics engineers sit between the analyst and data engineer skillsets; they have enough engineering knowledge to be able to build and deploy automated data transformation processes and enough analytical knowledge to know how the data is likely to be used and therefore how to model it effectively.
Analyst
Analysts are focused on consuming data for reports and analyses. However, they have the core skills which make them great candidates for cross-training towards analytics engineering. DBT uses SQL to define transformations, which makes it easier for analysts with some programming knowledge to cross-train into the analytics engineer skillset.
Additional Key Roles
There are several additional roles that are key to a data team that don’t directly build the infrastructure or datasets:
- Delivery Manager - Accountable for the team’s output and overall performance. Runs the various ceremonies, helps with blockers and assists with collaboration across teams.
- Product Owner/Manager - Defines the roadmap for the data products and works with stakeholders to build a prioritised backlog for the engineers to implement. Sometimes combined with the Business Analyst Role.
- Business Analyst - This role gathers the detailed requirements from the data consumers and presents them as user stories, process maps and other documentation to guide the engineers in implementation. Sometimes combined with the Product Owner role.
- Quality Assurance Engineer - Responsible for defining the testing strategy and approach. Often responsible for creation and execution of test scripts and automated testing processes.
Minimum Squad
Now that all the key roles have been defined, a minimum squad can be defined.
Multiple Squads
There are two main approaches to having multiple squads working on the example data architecture:
- Vertical
- Horizontal
By contrast, the horizontal approach has cross-discipline teams that build processes across all parts of the architecture but are defined by some form of domain or product-set. For example, there could be one team responsible for ingesting and transforming customer and order data and another responsible for warehousing and shipping data.
The approach to use depends on the context of your organisation and on a variety of factors, including:
- The number of domains/data products
- The role mix the organisation has / can hire
- Where bottlenecks are (it could be far quicker and easier to ingest data than transform it or vice versa)
Summary
This article outlined the skills required to build the example data architecture and mapped them to roles. The article then defined what a sensible minimum squad structure would be in terms of roles and the number of individuals required for each role. We also discussed ways to scale the engineering function beyond a single team.
The key takeaway is that Data Engineers are essential for building a modern data architecture but other roles are equally important and which roles you need will depend on the skills your organisation already has available.