What I learned this week⌗

Software⌗

GIT_CURL_VERBOSE=1 git pull --rebase: makes Git print curl's verbose output, which gives interesting info on what happens over HTTPS

A few more Linux commands:
seq: generates a sequence of numbers
rev: reverses the characters of each input line and prints the result to stdout
wc: outputs counts of lines, words, and bytes (wc -m counts characters instead)
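To make the three commands concrete, here is a rough Python analogue. The function names just mirror the commands; this is an illustration of their default behavior, not how the real tools are implemented. In a shell, seq 1 10 | wc should report the same 10 lines, 10 words, and 21 bytes as the last line below.

```python
# A rough Python analogue of seq, rev, and wc, for illustration only;
# the real command-line tools support many more options.

def seq(first: int, last: int) -> str:
    """Like `seq FIRST LAST`: one number per line, inclusive."""
    return "".join(f"{n}\n" for n in range(first, last + 1))


def rev(text: str) -> str:
    """Like `rev`: reverse the characters of each line."""
    return "".join(line[::-1] + "\n" for line in text.splitlines())


def wc(text: str) -> tuple[int, int, int]:
    """Like plain `wc`: newline count, word count, byte count."""
    return (
        text.count("\n"),
        len(text.split()),
        len(text.encode("utf-8")),
    )


if __name__ == "__main__":
    print(seq(8, 12), end="")       # 8 through 12, one per line
    print(rev(seq(8, 12)), end="")  # 8, 9, 01, 11, 21
    print(wc(seq(1, 10)))           # (10, 10, 21)
```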

I was attempting to pull changes from a remote repository by running git pull --rebase and, seemingly at random, got this response:

BUG: remote-curl.c:1528: The entire rpc->buf should be larger than LARGE_PACKET_MAX
error: git-remote-https died of signal 6
fatal: expected flush after ref listing

I’d never gotten this before.

I tried increasing http.postBuffer (git config --global http.postBuffer 524288000)… nah
git fsck --full && git gc… nah
brew upgrade git from 2.46.0 -> 2.47.0… nah
Ok… no more HTTPS, let's use SSH instead… works fine

There isn't much good info online, and I'm not sure exactly why this randomly occurred. Also, the data I was fetching and merging was… tiny:

git pull --rebase 
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 9 (delta 7), reused 7 (delta 5), pack-reused 0 (from 0)
Unpacking objects: 100% (9/9), 850 bytes | 44.00 KiB/s, done.
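For reference, the switch to SSH boils down to pointing the existing remote at the SSH form of its URL with git remote set-url origin NEW_URL. Below is a small Python sketch of that URL rewrite. It assumes a GitHub/GitLab-style host where the SSH user is git, a remote shaped like https://HOST/OWNER/REPO, and an SSH key already registered with the host; the helper name and example repository are hypothetical.

```python
# Hypothetical helper: turn an HTTPS Git remote URL into the scp-like SSH form.
# Assumes the remote looks like https://HOST/OWNER/REPO(.git) and that the SSH
# user is "git" (true for GitHub/GitLab); other shapes are rejected.
from urllib.parse import urlparse


def https_to_ssh(url: str, user: str = "git") -> str:
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"not an HTTPS remote: {url}")
    path = parsed.path.strip("/")
    if not path:
        raise ValueError(f"no repository path in: {url}")
    if not path.endswith(".git"):
        path += ".git"
    # scp-like syntax: user@host:path (no scheme, colon before the path)
    return f"{user}@{parsed.hostname}:{path}"


if __name__ == "__main__":
    new_url = https_to_ssh("https://github.com/example-owner/example-repo")
    # Then point the existing remote at it:
    #   git remote set-url origin <new_url>
    print(new_url)  # git@github.com:example-owner/example-repo.git
```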

Reviewed dbt:

Model: Just a SQL file, added in the models folder
Version Control: Connect to Git or GitLab; lets analytics folks who don't usually work with Git easily create branches, merge, push, and pull
Preview: Runs the model and returns a limited result set to the dbt Cloud IDE
Connection: The underlying data warehouse (DW) that dbt uses for compute (Snowflake, Databricks, Redshift)
Jinja: Templating engine that lets you reference other models and define how each model should be materialized in your DW [running Compile in the IDE shows the actual SQL]
dbt run: Takes all your models, compiles the code, wraps it in the correct DDL, and builds the objects in the DW in DAG order
dbt build: Basically run + test for each step in the DAG (it also loads seeds and takes snapshots)
Lineage: Shows dependencies between models [in Jinja, these come from the ref function used in the downstream model]
Sources [From Fivetran, Stitch, etc…] -> Staging -> First Layer models -> Fact Models / Dimension models
dbt_project.yml: Can tell dbt to materialize all models under a certain directory as tables, instead of setting it in each model with {{ config(...) }}
Sources: Declare raw tables in YAML, then reference them in Jinja with the source function, so if the names change you only update this file, not every single model. You can also monitor source freshness with dbt source freshness
Tests: Defined in YAML files that live alongside the models; supports simple column tests out of the box (e.g. unique, not_null) and custom SQL tests added as SQL files under the tests directory
Add descriptions in the YAML as well; longer ones are probably better written in Markdown. dbt docs generate builds a small documentation website (dbt docs serve hosts it locally)
Deploying: Deploy models to production with a dedicated branch and schema, and schedule dbt commands to run. To do this, create a production environment in the Deploy tab
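The Jinja, lineage, and dbt run notes above fit together roughly like this: each ref() call is both an edge in the DAG and a placeholder that compiles into a real table name. The toy Python sketch below assumes only single-quoted {{ ref('name') }} calls and table materialization; the model names and the analytics schema are made up, and dbt's real compiler renders full Jinja.

```python
# A toy model of what "dbt run" does: find ref() calls, order models by the DAG,
# swap each ref for a real relation name, and wrap the SELECT in DDL.
# Illustration only; dbt's real compiler is far more involved.
import re
from graphlib import TopologicalSorter

REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")

# Hypothetical project: model name -> SQL as written in the models folder
MODELS = {
    "stg_customers": "select id, name from raw.customers",
    "stg_orders": "select id, customer_id, amount from raw.orders",
    "dim_customers": "select id, name from {{ ref('stg_customers') }}",
    "fct_orders": (
        "select o.id, o.amount, c.name "
        "from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
    ),
}


def dependencies(sql: str) -> set[str]:
    """Lineage: the models this SQL refs."""
    return set(REF_PATTERN.findall(sql))


def compile_sql(sql: str, schema: str) -> str:
    """Like Compile in the IDE: replace each ref with schema.model."""
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)


def run(models: dict[str, str], schema: str = "analytics") -> list[str]:
    """Return DDL statements ordered so parents are built before children."""
    graph = {name: dependencies(sql) for name, sql in models.items()}
    order = TopologicalSorter(graph).static_order()  # raises CycleError on cycles
    return [
        f"create table {schema}.{name} as ({compile_sql(models[name], schema)})"
        for name in order
    ]


if __name__ == "__main__":
    for statement in run(MODELS):
        print(statement)
```

Python's graphlib.TopologicalSorter handles the DAG ordering here; it raises CycleError if two models ref each other, which is also a case dbt refuses to run.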
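To make the Tests note above concrete: in dbt each test compiles to a query that selects failing rows, and the test passes when that query returns zero rows. The Python sketch below mimics the built-in unique and not_null tests against a list of dicts standing in for a table; the function names and sample rows are hypothetical.

```python
# Toy versions of two built-in generic tests, unique and not_null.
# Each function returns the "failing rows"; an empty result means the test passes,
# mirroring how a dbt test passes when its SQL returns zero rows.
from collections import Counter
from typing import Any


def not_null_failures(rows: list[dict[str, Any]], column: str) -> list[dict[str, Any]]:
    """Rows where the column is missing or None (SQL: where col is null)."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows: list[dict[str, Any]], column: str) -> list[Any]:
    """Non-null values appearing more than once (SQL: group by col having count(*) > 1)."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


if __name__ == "__main__":
    orders = [
        {"id": 1, "customer_id": 10},
        {"id": 2, "customer_id": None},
        {"id": 2, "customer_id": 11},
    ]
    print("not_null(customer_id):", "PASS" if not not_null_failures(orders, "customer_id") else "FAIL")
    print("unique(id):", "PASS" if not unique_failures(orders, "id") else "FAIL")
```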
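The source freshness idea from the Sources note above boils down to: find the newest load timestamp in a raw table and compare its age to a warning threshold and an error threshold. A minimal Python sketch, assuming the timestamps are already fetched; the thresholds echo the warn_after / error_after settings in a sources YAML file, but the function itself is hypothetical and not dbt's API.

```python
# Toy version of the check behind "dbt source freshness": take the newest
# loaded_at timestamp of a raw table and compare its age to two thresholds.
# This function and its arguments are hypothetical, not dbt's API.
from datetime import datetime, timedelta, timezone


def freshness_status(
    loaded_at: list[datetime],
    warn_after: timedelta,
    error_after: timedelta,
    now: datetime,
) -> str:
    """Return "pass", "warn", or "error" based on the age of the latest load."""
    if not loaded_at:
        return "error"  # this sketch's choice: an empty table counts as stale
    age = now - max(loaded_at)
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    loads = [now - timedelta(hours=30), now - timedelta(hours=14)]
    # Latest load is 14 hours old: past the 12h warning, inside the 24h error limit
    print(freshness_status(loads, timedelta(hours=12), timedelta(hours=24), now))  # warn
```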

What I learned this week 10-12-2024

What I learned this week⌗

Software⌗

Business/Finance⌗

Interesting Links⌗

Math/Stats⌗

Travel⌗

Other⌗