What I learned this week

Software

ACID

GIT_CURL_VERBOSE=1 git pull --rebase prints interesting verbose info about the HTTPS traffic

A few more Linux commands:

  • seq: generates a series of numbers

  • rev: reverses each line of input and prints it to stdout

  • wc: outputs the count of lines, words, and characters
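A quick sketch of the three commands side by side, assuming a POSIX shell where seq, rev, and wc are all installed (the sample strings are made up for illustration):

```shell
# Generate the numbers 1 through 5, one per line
seq 1 5

# Reverse the characters of each input line
echo "hello" | rev

# With no flags, wc prints line, word, and byte counts;
# -l, -w, and -m select lines, words, or characters on their own
printf 'one two\nthree\n' | wc

# Chain them: count how many lines seq produced
line_count=$(seq 1 100 | wc -l | tr -d ' ')
echo "$line_count"
```

The tr -d ' ' is only there because some wc implementations pad the count with leading spaces.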

I was attempting to pull changes from a remote repository with git pull --rebase and, out of nowhere, got this response:

BUG: remote-curl.c:1528: The entire rpc->buf should be larger than LARGE_PACKET_MAX
error: git-remote-https died of signal 6
fatal: expected flush after ref listing

I’d never gotten this before.

  1. I tried increasing http.postBuffer: git config --global http.postBuffer 524288000… nah
  2. git fsck --full && git gc… nah
  3. brew upgrade git from 2.46.0 -> 2.47.0… nah
  4. OK… no more HTTPS, let’s use SSH instead… works fine
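Step 4 boils down to pointing the remote at its SSH URL. A minimal sketch, run against a throwaway repo so it is safe to try. Both URLs are placeholders (the host and the example-user/example-repo path are assumptions, not the repository from these notes):

```shell
# Work in a scratch repo so nothing real is touched
repo_dir=$(mktemp -d)
cd "$repo_dir"
git init --quiet

# Placeholder HTTPS remote (hypothetical URL)
git remote add origin https://github.com/example-user/example-repo.git

# Point the same remote at its SSH URL instead (hypothetical URL)
git remote set-url origin git@github.com:example-user/example-repo.git

# Confirm which URL fetch and push will use from now on
git remote get-url origin
```

In a real clone only the set-url line is needed, and it assumes an SSH key is already registered with the host.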

Not great info online, and I’m not sure exactly why this randomly occurred. Also, the data I was fetching and merging was… tiny:

git pull --rebase 
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 9 (delta 7), reused 7 (delta 5), pack-reused 0 (from 0)
Unpacking objects: 100% (9/9), 850 bytes | 44.00 KiB/s, done.

Reviewed DBT:

  • Model: Just SQL file, added in models folder

  • Version Control: Connect to Git or GitLab, allows analytics folks who don’t get Git to easily create branches, merge, push, and pull

  • Preview: Run model and return limited result set back to DBT cloud IDE

  • Connection: Underlying DW that DBT uses for compute (Snowflake, Databricks, Redshift)

  • Jinja: Templating engine that allows you to reference other models and define how you want your model to be materialized in your DW [running Compile in the IDE shows the actual SQL]

  • DBT Run: Takes all your models, looks through all the code, wraps it in the correct DDL, and constructs the objects in the DW in DAG order

  • DBT build: just Run + Test for each step in DAG

  • Lineage: Shows dependencies [in Jinja, this is the ref function used in the downstream model]

  • Sources [From Fivetran, Stitch, etc…] -> Staging -> First Layer models -> Fact Models / Dimension models

  • dbt_project.yml: Can tell DBT to build all models under a certain directory as tables instead of setting this with {{ config() }} in each model

  • Sources: Document raw tables in YAML, then reference these tables using Jinja, so if the names change you only need to update this file, not every single model. Can also monitor source freshness with dbt source freshness

  • Tests: Defined inside YAML that lives within a models directory; supports simple tests against columns out of the box and custom SQL tests by adding SQL files under the tests directory

  • Docs: Add descriptions to your YAML as well; this can be, and probably is better, done via markdown. dbt docs generate builds the docs and dbt docs serve spins up a small web page showing them

  • Deploying: Deploy models to production with a dedicated branch, schema, and scheduled dbt commands. To do this, create a prod env in the Deploy tab
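To make the sources, ref, tests, and dbt_project.yml bullets concrete, here is a rough sketch that scaffolds a tiny project layout in a scratch directory. Every name in it (example_project, raw_shop, orders, stg_orders, customer_totals, and the columns) is hypothetical, the dbt_project.yml is a fragment rather than a complete config, and newer dbt versions also accept data_tests: in place of tests:. The script only writes files, so it runs without dbt installed:

```shell
# Scratch directory for a hypothetical dbt project
project_dir=$(mktemp -d)
mkdir -p "$project_dir/models/staging" "$project_dir/models/marts"

# Sources: document the raw table once, in YAML
cat > "$project_dir/models/staging/sources.yml" <<'EOF'
version: 2
sources:
  - name: raw_shop        # hypothetical source name
    tables:
      - name: orders      # hypothetical raw table
EOF

# Staging model: plain SQL that reads the raw table through source()
cat > "$project_dir/models/staging/stg_orders.sql" <<'EOF'
select order_id, customer_id, amount
from {{ source('raw_shop', 'orders') }}
EOF

# Tests and descriptions live in YAML next to the models
cat > "$project_dir/models/staging/schema.yml" <<'EOF'
version: 2
models:
  - name: stg_orders
    description: "One row per order"
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
EOF

# Downstream model: ref() is what creates the edge in the lineage DAG
cat > "$project_dir/models/marts/customer_totals.sql" <<'EOF'
select customer_id, sum(amount) as total_amount
from {{ ref('stg_orders') }}
group by customer_id
EOF

# Project config fragment: build everything under models/marts as tables
cat > "$project_dir/dbt_project.yml" <<'EOF'
name: example_project
profile: example_profile   # hypothetical profile name
models:
  example_project:
    marts:
      +materialized: table
EOF

# Show what was created
find "$project_dir" -type f | sort

# With dbt installed and a working connection, the commands from the
# notes would then be run from inside the project directory:
#   dbt build              # run + test each model in DAG order
#   dbt source freshness   # check how stale the sources are
#   dbt docs generate      # build the documentation artifacts
```

Because customer_totals selects from ref('stg_orders'), dbt builds stg_orders first, and renaming the raw table means touching only sources.yml.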

Business/Finance

Great tutorial for anyone brand new to building websites

Math/Stats

Travel

Other

Why is the speed of light so fast