What I learned this week 10-12-2024
Software⌗
GIT_CURL_VERBOSE=1 git pull --rebase
prints interesting verbose curl output for the HTTPS transport
A few more Linux commands:
seq
generates a sequence of numbers
rev
reverses each line of input, prints to stdout
wc
outputs counts of lines, words, and bytes (characters with -m)
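A quick sketch of the three commands above chained together. The variable names are just for illustration, and the inputs are made up.

```shell
# seq: print the integers 1 through 5, one per line
numbers=$(seq 1 5)
echo "$numbers"

# rev: reverse the characters of each input line
reversed=$(echo "hello" | rev)
echo "$reversed"

# wc -l: count lines (tr strips the padding some wc builds add)
line_count=$(seq 1 5 | wc -l | tr -d ' ')
echo "$line_count"

# wc -w: count words
word_count=$(echo "one two three" | wc -w | tr -d ' ')
echo "$word_count"
```

Piping seq into wc is a handy way to sanity-check a range before feeding it to a loop.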
I was attempting to pull changes from a remote repository, running git pull --rebase
and just randomly got the response:
BUG: remote-curl.c:1528: The entire rpc->buf should be larger than LARGE_PACKET_MAX
error: git-remote-https died of signal 6
fatal: expected flush after ref listing
I’d never gotten this before.
- I tried increasing http.postBuffer: git config --global http.postBuffer 524288000 … nah
- git fsck --full && git gc … nah
- brew upgrade git, from 2.46.0 -> 2.47.0 … nah
- Ok… no more HTTPS, let’s use SSH instead… works fine
There isn’t great info online, and I’m not sure exactly why this randomly occurred. Also, the data I was fetching and merging was… tiny:
git pull --rebase
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 9 (delta 7), reused 7 (delta 5), pack-reused 0 (from 0)
Unpacking objects: 100% (9/9), 850 bytes | 44.00 KiB/s, done.
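For reference, switching a remote from HTTPS to SSH can look roughly like the sketch below. It runs in a throwaway repository, and the remote URLs are placeholders I made up, not the repository from this post.

```shell
# Work in a throwaway repository so nothing real is modified
repo_dir=$(mktemp -d)
git init --quiet "$repo_dir"
cd "$repo_dir"

# Hypothetical HTTPS remote (placeholder URL)
git remote add origin https://github.com/example-user/example-repo.git

# Point the same remote at the SSH form of the URL
git remote set-url origin git@github.com:example-user/example-repo.git

# Confirm which URL git will now use
remote_url=$(git remote get-url origin)
echo "$remote_url"
```

In a real checkout only the set-url line is needed, and it assumes an SSH key is already registered with the host.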
Reviewed DBT:
-
Model: Just a SQL file, added in the models folder
-
Version Control: Connect to Git or GitLab; lets analytics folks who aren’t comfortable with Git easily create branches, merge, push, and pull
-
Preview: Runs the model and returns a limited result set to the DBT Cloud IDE
-
Connection: The underlying data warehouse (DW) that DBT uses for compute (Snowflake, Databricks, Redshift)
-
Jinja: Templating engine that lets you reference other models and define how you want your model to be materialized in your DW [running Compile in the IDE shows the actual SQL]
-
DBT Run: Takes all your models, looks through all the code, wraps it in the correct DDL, and constructs the objects in the DW in the order of the DAG
-
DBT Build: Just Run + Test for each step in the DAG (it also covers seeds and snapshots)
-
Lineage: Shows dependencies [in Jinja, these come from the ref function used in the downstream model]
-
Sources [From Fivetran, Stitch, etc…] -> Staging -> First Layer models -> Fact Models / Dimension models
-
dbt_project.yml: Can tell DBT to build all models under a certain directory as tables instead of setting this in each model with {{ config(materialized='table') }}
-
Sources: Document raw tables in YAML, then reference these tables using Jinja, so if the names change you only need to update this file, not every single model. Can also monitor source freshness with
dbt source freshness
-
Tests: Defined inside YAML that lives in the models directory; supports simple tests against columns out of the box and custom SQL tests by adding SQL files under the tests directory
-
Add descriptions to your YAML as well; this can be, and probably is better, done via Markdown.
dbt docs generate
builds the docs (with dbt Core, dbt docs serve then spins up a small web page detailing them)
-
Deploying: Deploy models to production with a dedicated branch, schema, and schedule for running dbt commands. To do this, create a prod environment in the Deploy tab
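To make the notes above a bit more concrete, here is a minimal sketch that writes out a tiny project layout: a source definition, a staging model using source(), a downstream model using ref(), column tests, and a directory-level materialization in dbt_project.yml. Every name in it (example_project, raw_shop, stg_orders, fct_customer_orders, the columns) is hypothetical, and the exact YAML keys can vary between dbt versions.

```shell
# Build a throwaway project layout; all names below are hypothetical
project_dir=$(mktemp -d)
mkdir -p "$project_dir/models/staging" "$project_dir/models/marts"

# dbt_project.yml: build everything under models/marts as tables
# (the per-model alternative is {{ config(materialized='table') }})
cat > "$project_dir/dbt_project.yml" <<'EOF'
name: example_project
version: '1.0.0'
profile: example_profile
models:
  example_project:
    marts:
      +materialized: table
EOF

# Sources: document the raw table once in YAML
cat > "$project_dir/models/staging/sources.yml" <<'EOF'
version: 2
sources:
  - name: raw_shop
    tables:
      - name: orders
EOF

# Staging model: reads the raw table through source()
cat > "$project_dir/models/staging/stg_orders.sql" <<'EOF'
select id as order_id, customer_id, amount
from {{ source('raw_shop', 'orders') }}
EOF

# Column tests and a description live in YAML next to the model
cat > "$project_dir/models/staging/stg_orders.yml" <<'EOF'
version: 2
models:
  - name: stg_orders
    description: One row per order from the raw shop data.
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
EOF

# Downstream model: ref() is what creates the lineage edge to stg_orders
cat > "$project_dir/models/marts/fct_customer_orders.sql" <<'EOF'
select customer_id, sum(amount) as total_amount
from {{ ref('stg_orders') }}
group by customer_id
EOF

# List what was written under models/
model_files=$(find "$project_dir/models" -type f | sort)
echo "$model_files"
```

The script only writes files; with a working connection in place, running dbt build from that directory would be the next step, building stg_orders before fct_customer_orders because of the ref().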
Business/Finance⌗
Interesting Links⌗
Great tutorial for anyone brand new to building websites