Tag: data engineering
Data Advantage Matrix
You should definitely consider reading about the ‘Data Advantage Matrix’ before starting your next data project: https://towardsdatascience.com/data-advantage-matrix-a-new-way-to-think-about-data-strategy-4178cd2f520a
Orchestration Tools – Comparisons
Use Apache Airflow if you want the most full-featured, mature tool and you can dedicate time to learning how it works, setting it up, and maintaining it. Use Luigi if you need something with an easier learning curve than Airflow: it has fewer features, but it’s easier to get off the ground. Use Argo if you’re already deeply invested […]
gluestick: an ETL library for Python
A rival to petl? Maybe. Let's test it 🙂 Original article where I came across the library: https://towardsdatascience.com/how-to-write-etl-operations-in-python-baffbceeadf4 Library documentation: https://github.com/hotgluexyz/gluestick
Performance Benchmark of Different File Formats vs Pandas
Comparison table: Performance comparison articles covering read and write speeds, memory consumption, compression, etc.: https://towardsdatascience.com/stop-persisting-pandas-data-frames-in-csvs-f369a6440af5 https://towardsdatascience.com/the-best-format-to-save-pandas-data-414dca023e0d
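To get a feel for this kind of comparison on your own data, here is a minimal sketch that times a write and a read for two formats. It is not the methodology of the linked articles: the function name `benchmark_formats`, the sample DataFrame, and the choice of CSV and pickle (the two formats pandas handles without extra dependencies) are all assumptions made for illustration.

```python
import os
import tempfile
import time

import pandas as pd


def benchmark_formats(df, directory):
    """Time one write and one read of df per format (hypothetical helper)."""
    # Only formats that need nothing beyond pandas itself (an assumption;
    # Parquet or Feather would need an extra engine such as pyarrow).
    formats = {
        "csv": (lambda path: df.to_csv(path, index=False), pd.read_csv),
        "pickle": (df.to_pickle, pd.read_pickle),
    }
    rows = []
    for name, (write, read) in formats.items():
        path = os.path.join(directory, "sample." + name)

        start = time.perf_counter()
        write(path)
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        loaded = read(path)
        read_seconds = time.perf_counter() - start

        rows.append({
            "format": name,
            "write_seconds": write_seconds,
            "read_seconds": read_seconds,
            "size_bytes": os.path.getsize(path),
            "rows_read": len(loaded),
        })
    return pd.DataFrame(rows)


# Made-up sample data, only to have something to measure.
sample = pd.DataFrame({
    "id": range(10_000),
    "value": [i * 0.5 for i in range(10_000)],
    "label": ["row"] * 10_000,
})

with tempfile.TemporaryDirectory() as tmp:
    results = benchmark_formats(sample, tmp)

print(results)
```

A single timed run like this is noisy, so for a real comparison it is worth repeating each measurement several times and using data shaped like your own.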