Query Banks

That query bank should be a semantic layer I’ve done it myself: “Oh we should just cache all these commonly used queries somewhere central, so we always have a nice definition for someone to use.” You could even extend the functionality of this “Query Bank” and train an LLM, or perform code reviews, or implement search, etc… Sweet, now anyone can grab some code and use it!
Read more →

Table Creation

Collection of thoughts on table creation. This is specifically regarding tables for analytics (OLAP queries, Data Warehouse, etc…). This is also specific to a larger organization, I don’t necessary apply these when working locally with duckDB per se. Get a good understanding of who’s going to use the table, what they need, and when they’ll need that data by. This will help inform your data model and manage dependencies. If you are working directly with another team have them list out all of the requirements and set a date when this list will be final.
Read more →

Stop building on top of bad decisions

One step back, two steps forward: Stop building on top off bad decisions and fix the bad decisions. Lets say you work with a bunch of people that are setting weekly goals. Each year goals are appended in a particular format to a table referenced by everyone to present goals throughout the organization. Long long ago in a galaxy far far away someone decided set the goal date column as type VARCHAR() so now you have goals associated with strings that range from ‘2025-05’ to ‘2025-02-05’ to ‘02/05/2025’.
Read more →

DRY Analytics

I was talking to one of our partner teams in a developing market. They are a newer analytics department of savvy Business Analysts & SCMs (Supply Chain Managers). Long story short, they are working on this new metric and I was like, How are you filtering for our specific products in your marketplace? And they were like: Oh we have this case statement that we include in all of our jobs…
Read more →

Wide Tables

A few considerations before building that extremely wide table. Every once in a while, an idea comes along to build a table with a wide array of columns at a particular granularity. This is usually associated with “self-serve analytics” and/or an attempt to define a “source of truth” table. Wide tables, self-serve analytics, and source of truth tables aren’t inherently good or bad… but there are some considerations to this type of table that I’m going to rant about.
Read more →

[6 / 4 = 1 = 1.5]: Type Coercion and Precision in SQL

An overview of what’s to come: I was originally writing this as a high-level introduction to send people when they encounter things like 6/4 returning 1 instead of 1.5 in some Database Management Systems (DBMS)… overtime its sprawled far beyond that. The points below serve as a TLDR, with a sprawling discussion expanding afterward. Operations like division in SQL can be confusing for novices or anyone without a programming background
Read more →

Temporary Tables

The database management system (DBMS) I’ll reference today is Redshift (AWS Cloud Datawarehouse offering) which is based on Postgres. I’ll dive into some of the anti-patterns around Temporary Tables I’ve seen abused during my work experience, why they don’t make sense, and how to fix them. Lets get it out of the way… Temporary Tables are not “better” than Common Table Expressions (CTEs) and CTEs are not “better” than temporary tables.
Read more →