From the course: Data Engineering: dbt for SQL

Supply chain outage: SQL spaghetti

- SQL, while immensely powerful, can often become unwieldy and challenging to manage, especially as companies deal with massive amounts of data and hard-coded table and column names. Let me share a true story from my early consulting days to illustrate this point.

I was hired by a supply chain company that relied heavily on a set of source tables for making distribution decisions. These tables contained inventory data for various goods, and based on this information, the company would create shipping and distribution manifests. To kickstart their data pipeline, they had a mammoth SQL file with over 10,000 lines of code. The file was actually called Mammoth.SQL. This single file held all the logic and instructions to create and update the inventory data sets, which were then used to generate core shipping manifests through subsequent pipelines. In essence, it was the linchpin for every later step in the pipeline.

As you can imagine, dealing with a SQL file of this magnitude was far from easy. The code was complex, error-prone, and universally feared within the company. Support tickets would get tossed from one team to another as engineers tried to avoid handling it.

One fateful day, on November 9th in fact, after onboarding a new inventory data set, disaster struck: the pipeline failed. Engineers attempted retries, but after three attempts, it remained broken. Little did they know that this marked the beginning of a 24-hour outage across the entire organization. Engineering teams toiled throughout the night, with the CEO and CTO personally getting involved. After much effort, one engineering team finally found the root cause of the issue. Unfortunately, this discovery came at the cost of a day of missed shipments and unhappy customers.

In the aftermath of this incident, the engineering leadership prioritized dismantling this gigantic SQL file. It took three weeks of engineering time to revamp it. In the process, they uncovered outdated table names, references to incorrect columns, and, shockingly, a reference to a staging table in production.

This story is not an isolated case but rather a common scenario at many organizations. Critical pipelines are initially designed as functional masterpieces but quickly become outdated and challenging to comprehend. Only a few engineers possess the internal knowledge of these pipelines.

In this chapter, we'll dive into a powerful solution to manage SQL more effectively and prevent such headaches in the future. We'll explore ways to improve SQL organization and structure, ensuring robustness, maintainability, and scalability. By mastering these techniques, you'll become a SQL champion and steer your data engineering efforts towards smoother waters.
