Core Responsibilities:
• Designed and implemented ELT pipelines using Airflow, Snowflake, DBT, and AWS services such as Glue and S3 (see the sketch below).
• Optimized SQL and Jinja code for data transformations in DBT, improving API response times by 25% through streamlined data processing and efficient queries.
• Developed data models that support business requirements, improving query performance by 20% while decreasing resource utilization by 10%.
• Implemented data quality checks and DBT tests to ensure data accuracy and completeness, achieving a 15% increase in data reliability.
Key Technologies and Tools: Python, Spark/PySpark, Snowflake, Airflow, AWS services, SQL, DBT, Leadership, ETL/ELT, CI/CD, REST APIs, Redis, MongoDB, PostgreSQL, MySQL, Docker, Kubernetes, Terraform, Tableau.
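A minimal sketch of an Airflow (2.x) DAG of this shape, orchestrating a DBT build and its tests against Snowflake. The DAG id, schedule, and project paths are illustrative assumptions, not the production configuration:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical DAG: runs dbt models against Snowflake, then dbt tests
# as the data quality gate. Paths and schedule are placeholders.
with DAG(
    dag_id="elt_dbt_snowflake",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Transform raw S3/Glue-landed data inside Snowflake via dbt
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt --profiles-dir /opt/dbt",
    )
    # Enforce data quality checks after the build
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt --profiles-dir /opt/dbt",
    )
    dbt_run >> dbt_test  # tests gate downstream consumers
```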
Core Responsibilities:
• Developed OLAP cubes and deployed an Azure Machine Learning project, incorporating TensorFlow and Pandas for predictive modeling, and applied MLOps practices to improve AI model deployment efficiency. Focused on improving predictive accuracy for score analysis of legal entities at a Brazilian bank, reducing response time from 2 days to under 2 hours.
• Integrated secure data access protocols with OAuth, used Postman for robust API testing, and managed data security with Azure Identity, reducing data processing times by 30%.
• Developed ETL routines using PySpark, SQL, and Hadoop to streamline data processing and integration for the bank’s data engineering team, resulting in a 25% reduction in data processing time (see the sketch below).
• Contributed to the integration of generative AI models into the data pipeline using Databricks on Azure and AWS EMR.
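A minimal sketch of the kind of PySpark ETL routine described above. The bucket paths, table layout, and column names are hypothetical stand-ins, not the bank's actual schema:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative ETL job: extract raw records, clean and aggregate them,
# and load curated output for downstream scoring. Names are placeholders.
spark = SparkSession.builder.appName("bank_etl").getOrCreate()

# Extract: read raw transactions from the data lake
raw = spark.read.parquet("s3://bank-raw/transactions/")

# Transform: drop invalid rows, derive an event date per legal entity
clean = (
    raw.filter(F.col("amount").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)
daily = clean.groupBy("entity_id", "event_date").agg(
    F.sum("amount").alias("total_amount"),
    F.count("*").alias("tx_count"),
)

# Load: write partitioned results for the scoring pipeline
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://bank-curated/daily_entity_totals/"
)
```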
Core Responsibilities:
• Cleaned, transformed, and prepared complex data for analysis and created Power BI visualizations for data exploration and storytelling, improving data comprehension and decision-making for an edtech company's analytics project.
• Implemented high-throughput data processing solutions with Python's psycopg2 and PySpark for PostgreSQL databases, achieving a 15% reduction in processing time; this improved data accessibility and strengthened the data science team's analytical capabilities (see the sketch below).
• Refactored on-premises pipelines into Azure Cloud infrastructure, improving scalability and reliability for the data engineering project and reducing pipeline processing time by 25%.
Key Technologies and Tools: Spark/PySpark, Azure Databricks, Azure Services, SQL, ETL/ELT, APIs, Hadoop, MongoDB, PostgreSQL, MySQL, Docker, Power BI.
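A minimal sketch of a high-throughput psycopg2 write path of the sort referenced above, using batched inserts via execute_values. The DSN, table, and rows are hypothetical examples:

```python
import psycopg2
from psycopg2.extras import execute_values

# Placeholder rows standing in for the project's actual data
rows = [(1, "2023-01-01", 42.0), (2, "2023-01-01", 17.5)]

conn = psycopg2.connect("dbname=analytics user=etl")
with conn, conn.cursor() as cur:
    # execute_values batches many rows into a single INSERT statement,
    # which is far faster than calling cur.execute once per row.
    execute_values(
        cur,
        "INSERT INTO metrics (id, day, value) VALUES %s",
        rows,
        page_size=1000,
    )
# Exiting the `with conn` block commits the transaction on success.
```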
Core Responsibilities:
• Processed, manipulated, and prepared data for analysis and created Power BI visualizations for data exploration, improving data comprehension and decision-making for an industrial company's marketing analytics project.
• Structured relational and non-relational databases using Microsoft SQL Server and Apache HBase; developed new features for, and maintained, an application built with Python and Spark.
• Supported data preparation in Azure and GCP environments using Databricks, improving data quality and accessibility.
• Deployed and ran SQL Server Integration Services (SSIS) packages in Azure Data Factory to automate data extraction, transformation, and loading, resulting in a 25% reduction in data processing time (see the sketch below).
Key Technologies and Tools: Python, Spark/PySpark, Azure Databricks, AWS Services, SQL, ETL/ELT, APIs, Hadoop, PostgreSQL, MySQL, Power BI.
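A minimal sketch of triggering an Azure Data Factory pipeline (such as one wrapping an SSIS package) from the Python SDK. The subscription, resource group, factory, and pipeline names are assumptions, not the company's actual resources:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Hypothetical names; replace with real subscription and ADF resources.
credential = DefaultAzureCredential()
adf = DataFactoryManagementClient(credential, "<subscription-id>")

# Kick off the pipeline that executes the SSIS-based ETL package
run = adf.pipelines.create_run(
    resource_group_name="rg-data",
    factory_name="adf-marketing",
    pipeline_name="run_ssis_etl",
    parameters={},
)

# Check the run's status (poll until it reports Succeeded/Failed)
status = adf.pipeline_runs.get("rg-data", "adf-marketing", run.run_id)
print(status.status)
```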