Data engineering has many facets. One of the most common projects a data engineer takes on is building an ETL pipeline from an operational database to a data warehouse. Our team wanted to cover the overarching design of an ETL pipeline.
What are its typical components, stages, and considerations?
We started by writing Creating An ETL Part 1 (more to come), and we have now made the video below, which walks through the process. In it, we discuss why each stage is important, what happens when data moves from raw to stage, why we need a raw database at all, and so on.
Data engineering is a complex discipline that combines automation, programming, system design, databases, and analytics to ensure that analysts, data scientists, and end users have access to clean data.
This all starts with the basic ETL design.