Aug.2014 - Dec.2019
5yrs 4mos
Analyst for Data Quality - Big Data & Machine Learning
- Built automation tools for data balancing and comparison using Python, PySpark, Apache Hive & shell scripting to validate the reliability of financial data duplicated across multiple data centers in a Hadoop environment, increasing productivity by over 75%.
- Deployed machine learning models in development and production settings to detect anomalies in streaming transaction data over a distributed database, reducing false alerts by over 5% compared to statistical methods.
- Implemented post-deployment monitoring and logging of model performance metrics using the ELK stack and built reporting dashboards in Grafana to assess model quality and improve continual learning capabilities.
- Developed ETL pipelines using Informatica and performed data engineering tasks using advanced SQL queries to migrate a legacy system that processed banking data used for business analytics.
- Coordinated with large teams following an agile development process to brainstorm, plan and execute innovative solutions and to implement systems with high code quality through regular, systematic code reviews.
- Led a team of 3 developers, mentoring them and supervising tasks that fulfilled Jira user stories.