Project Portfolio
Client: Confidential Dec 2022 – Present
Role: Lead Data Engineer
Project Description:
Designed and implemented scalable data solutions for a healthcare organization, ensuring high availability, low
latency, and compliance with healthcare regulations such as HIPAA. The project involved real-time data ingestion,
processing, and analytics, migration of on-premises data solutions to AWS, and integration across multiple
platforms to provide actionable insights through data visualization tools.
Roles & Responsibilities:
● Developed and optimized complex PL/SQL procedures, functions, packages, and triggers to support data processing,
transformation, and reporting requirements.
● Designed and maintained efficient relational database schemas in PostgreSQL, MySQL, SQL Server, and Oracle, ensuring
high performance, scalability, and data integrity.
● Optimized SQL queries using indexing strategies, materialized views, and partitioning techniques to enhance database
performance and reduce execution times.
● Implemented ETL processes using PL/SQL, AWS Glue, Apache Airflow, and Apache NiFi to extract, transform, and load
data from multiple sources into centralized storage (an illustrative sketch follows this list).
● Automated recurring data processing tasks by developing PL/SQL scripts, improving operational efficiency, and
reducing manual workload.
● Diagnosed and resolved database performance bottlenecks by analyzing execution plans, optimizing queries, and
refactoring inefficient code.
● Ensured data security and compliance with regulatory standards such as HIPAA and GDPR by implementing encryption,
role-based access control (RBAC), and data masking techniques.
● Designed and built high-performance data pipelines for seamless integration of structured and semi-structured data
across multiple platforms.
● Created and managed stored procedures, triggers, and views to streamline data aggregation, validation, and
transformation processes.
● Built efficient SQL-based data models and analytical solutions to support business intelligence reporting with Power BI
and Tableau.
● Led and mentored a team of 3-5 data engineers, providing technical guidance on PL/SQL development, query
optimization, and data pipeline management.
● Worked closely with client architects and technical teams to design robust database solutions, ensuring alignment with
business requirements and industry best practices.
● Collaborated with cross-functional teams to analyze complex data requirements and implement scalable and optimized
database solutions.
● Documented technical processes, best practices, and database architectures, ensuring knowledge sharing and
maintainability of data solutions.
● Designed and optimized data integration workflows for large-scale data migration projects, ensuring minimal downtime
and data consistency.
● Developed efficient Python scripts to automate data extraction, transformation, and loading (ETL) processes, reducing
manual intervention.
● Wrote and optimized complex SQL queries for data manipulation, reporting, and analytics, improving system efficiency
and response times.
● Created, optimized, and maintained database indexing strategies to enhance query performance and support high-
volume data transactions.
● Integrated data from multiple sources using PL/SQL and Python-based ETL pipelines, ensuring seamless data availability
for analytics and reporting.
● Conducted performance tuning of SQL queries, database structures, and PL/SQL procedures to enhance the
responsiveness of data-driven applications.
● Monitored and troubleshot database performance issues, proactively identifying and resolving bottlenecks to maintain
system reliability.
● Provided strategic recommendations for database design, ensuring scalability and future growth for business-critical
applications.
● Developed and maintained CI/CD pipelines for database code deployment, ensuring smooth and efficient code releases
with minimal disruptions.
● Configured and optimized AWS RDS, PostgreSQL, and Oracle environments, ensuring high availability and fault
tolerance.
● Led database migration projects from on-premises systems to AWS, leveraging cloud-native services like AWS Glue,
Redshift, and Snowflake.
● Implemented robust data governance strategies, including access control policies, encryption mechanisms, and audit
trails, to enhance data security.
● Designed and implemented real-time data processing pipelines using Apache Kafka and Spark, enabling low-latency
data processing and analysis.
● Created and maintained technical documentation for database designs, ETL workflows, and data integration processes,
improving knowledge sharing across teams.
● Provided technical leadership in data architecture discussions, ensuring alignment with industry best practices and
emerging technologies.
● Conducted regular code reviews, enforcing best practices in SQL, PL/SQL, and Python development, and ensuring high-
quality code delivery.
● Collaborated with business analysts, data scientists, and application developers to understand data requirements and
develop optimized database solutions.
● Designed and implemented high-availability and disaster recovery strategies, including replication, backups, and
failover mechanisms, to ensure business continuity.
● Developed robust data validation and quality control mechanisms to ensure data accuracy and consistency across
multiple systems.
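Illustrative sketch (referenced above): a minimal Apache Airflow DAG showing the extract → transform → load pattern used in pipelines of this kind. All names (DAG id, fields, masking rule) are hypothetical placeholders assuming Airflow 2.x; this is not the client's actual code or configuration.
```python
# Minimal sketch of an orchestrated ETL flow, assuming Apache Airflow 2.x.
# All names (DAG id, fields, masking rule) are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull raw records from a source system (placeholder data).
    return [{"patient_id": 12345, "event": "admission"}]


def transform(**context):
    # Apply validation and masking before load; a HIPAA-style pipeline would
    # enforce its real masking/validation rules here.
    records = context["ti"].xcom_pull(task_ids="extract")
    return [{**r, "patient_id": "***MASKED***"} for r in records]


def load(**context):
    # Write transformed records to centralized storage (e.g., S3/Redshift).
    records = context["ti"].xcom_pull(task_ids="transform")
    print(f"Loading {len(records)} records")


with DAG(
    dag_id="healthcare_etl_sketch",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> transform_task >> load_task
```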
Client: Broadridge Jul 2015 – Dec 2022
Title: Data Solutions Analyst
Project Description:
This project involved designing and implementing scalable data processing pipelines using Apache Spark on
Databricks, spanning Azure, AWS, and GCP environments. The objective was to optimize data processing
efficiency, ensure seamless data integration across cloud platforms, and enhance data retrieval performance.
The project also included cloud data warehouse optimization, ETL process automation, and support for
business intelligence initiatives.
Roles & Responsibilities:
● Developed and optimized large-scale data processing pipelines using Apache Spark on Databricks, ensuring
efficient data ingestion and transformation across Azure, AWS, and GCP (an illustrative sketch follows this list).
● Designed and managed data lakes on Azure Data Lake, AWS S3, and GCP Cloud Storage, improving data
accessibility, reducing query response times, and enhancing storage efficiency.
● Implemented and optimized cloud-based data warehouses, including Snowflake, AWS Redshift, and Google
BigQuery, achieving significant reductions in data retrieval time through query optimization and indexing
strategies.
● Integrated multiple AWS services (AWS Glue, S3, Redshift, EMR) to support a hybrid cloud data
infrastructure, improving data processing speed and scalability.
● Led end-to-end data migration projects from on-premises systems to cloud environments, utilizing Azure
Synapse Analytics, AWS Redshift, and BigQuery to enhance analytical capabilities.
● Developed and deployed machine learning models in collaboration with data scientists, enabling predictive
analytics and business insights generation.
● Designed and implemented ETL workflows using Apache NiFi, Talend, Azure Data Factory, and AWS Glue,
ensuring seamless data ingestion, transformation, and loading across platforms.
● Optimized PL/SQL queries and stored procedures, improving database performance, reducing execution
times, and ensuring data integrity across systems.
● Automated data validation and quality checks using AWS Glue DataBrew and implemented error-handling
mechanisms for improved ETL process reliability.
● Managed and resolved critical month-end and quarter-end financial data discrepancies, ensuring accurate
financial reporting and compliance.
● Collaborated with business analysts and end-users to gather and analyze requirements, translating business
needs into scalable and efficient data solutions.
● Utilized JIRA for Agile project management, tracking tasks, sprints, and issue resolution, while maintaining
version control and collaborative development using GitHub.
● Provided technical leadership and mentorship to junior developers, conducting code reviews, knowledge-
sharing sessions, and training workshops to improve data engineering best practices.
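Illustrative sketch (referenced above): a minimal PySpark job showing the ingest, cleanse, and land pattern used on Databricks. Paths, bucket names, and column names are hypothetical placeholders, not project data.
```python
# Minimal PySpark sketch of the ingest -> cleanse -> land pattern.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ingest_sketch").getOrCreate()

# Read raw semi-structured data from cloud object storage (S3 shown here;
# ADLS or GCS paths follow the same pattern).
raw = spark.read.json("s3://example-bucket/raw/trades/")

# Basic cleansing and enrichment before landing in the curated zone.
curated = (
    raw.dropDuplicates(["trade_id"])
       .filter(F.col("amount").isNotNull())
       .withColumn("ingest_date", F.current_date())
)

# Write partitioned Parquet for downstream warehouse loads
# (Snowflake / Redshift / BigQuery in the project described above).
curated.write.mode("overwrite").partitionBy("ingest_date").parquet(
    "s3://example-bucket/curated/trades/"
)
```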
Client: Johnson Controls Feb 2014 – Jun 2015
Title: ETL Developer
Project Description:
The project involved developing and maintaining robust data processing pipelines, utilizing a combination of
Hadoop, SQL Server, Apache Kafka, and modern data visualization tools like Tableau and Power BI. The project
aimed to streamline data processing across multiple domains, ensuring real-time data availability, efficient
database management, and insightful data visualization for stakeholders.
Roles & Responsibilities:
• Developed and maintained Hadoop-based data pipelines, processing petabytes of data across various
domains.
• Automated data extraction, transformation, and loading processes, reducing manual intervention.
• Enhanced data processing capabilities using Apache Kafka for real-time data streaming (an illustrative sketch follows this list).
• Implemented network infrastructure using AWS services such as VPC, security groups, VPC peering, VPC
endpoints, PrivateLink, CloudTrail, CloudFront, and API Gateway.
• Implemented cloud data solutions using Azure Data Factory, Azure Data Lake, and Snowflake.
• Created data visualization dashboards in Tableau and Power BI, enabling stakeholders to derive
actionable insights.
• Implemented CI/CD pipelines using Jenkins and Docker, improving deployment efficiency and reducing
downtime.
• Created new database objects like Procedures, Functions, Packages, Triggers, Indexes and Views using T-
SQL in SQL Server 2008.
• Validated change requests and made appropriate recommendations; standardized data implementation
procedures.
• Promoted database objects from test/develop to production. Coordinated and communicated production
schedules within development team.
• Created DTS packages as part of the ETL process for vendors, extracting records from flat file and Excel
sources and loading them to the server daily.
• Involved in the development of a normalized database using Data Definition Language (DDL) in T-SQL.
• Performed job scheduling in the SQL Server Environment.
• Used Data Manipulation Language (DML) to insert and update data, satisfying referential integrity
constraints and ACID properties.
• Modified database structures as directed by developers for test/development environments and assisted
with coding, design, and performance tuning.
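Illustrative sketch (referenced above): a minimal real-time streaming consumer, assuming the kafka-python client. Topic, broker, and consumer-group names are hypothetical placeholders.
```python
# Minimal sketch of a real-time streaming consumer, assuming the
# kafka-python client. Topic/broker/group names are hypothetical placeholders.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "sensor-events",                       # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
    group_id="etl-loader",
)

for message in consumer:
    event = message.value
    # In a real pipeline this record would be validated and written to the
    # downstream store (SQL Server / Hadoop); here it is simply echoed.
    print(f"offset={message.offset} event={event}")
```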
Client: JPMorgan Chase Jul 2012 – Dec 2013
Role: Test Data Management Analyst
Project Overview:
The primary goal of this project was to design and develop a robust Data Warehouse (DW) and DataMart
infrastructure to support advanced reporting and business intelligence needs. This involved optimizing existing ETL
processes, developing dynamic and scalable solutions, and ensuring efficient data management across multiple
environments.
Roles & Responsibilities:
• Involved in the end-to-end design and development of the DW and DataMarts for reporting
purposes.
• Fine-tuned existing code to make it more dynamic, enabling it to load data from multiple sources into one or
more destinations based on requirements.
• Optimized queries that previously took more than 2 hours to move data from different sources into the
destination, reducing run times to minutes by rewriting the code with more current techniques.
• Involved in designing and developing reports, packages, and cubes using T-SQL.
• Assigned work to a junior developer and helped him complete his tasks on time and effectively.
• Wrote extensive dynamic SQL code to load data from sources such as Excel, flat files, and tables into
staging.
• Created processes to automatically detect incoming files, load them when available, and notify the
concerned stakeholders with file details at each step (an illustrative sketch follows this list).
• Recorded load-balancing details for audit purposes.
• Created configuration files to move SSIS packages between environments under the package deployment
model.
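Illustrative sketch (referenced above): a minimal Python polling loop for the detect-file, load, and notify pattern. Paths and the notification hook are hypothetical placeholders; the original work was implemented with T-SQL and SSIS.
```python
# Minimal sketch of the "detect file -> load -> notify" pattern, written in
# Python for illustration; the original work used T-SQL/SSIS. All paths and
# hooks are hypothetical placeholders.
import time
from pathlib import Path

INCOMING = Path("/data/incoming")      # hypothetical landing folder
PROCESSED = Path("/data/processed")    # hypothetical archive folder


def notify(step: str, filename: str) -> None:
    # Placeholder for the email/alert sent to stakeholders at each step.
    print(f"[notify] {step}: {filename}")


def load_to_staging(path: Path) -> None:
    # Placeholder for the staging load (bulk insert / package execution).
    notify("load started", path.name)
    # ... actual load logic would run here ...
    notify("load finished", path.name)


if __name__ == "__main__":
    PROCESSED.mkdir(parents=True, exist_ok=True)
    while True:
        for f in sorted(INCOMING.glob("*.csv")):
            notify("file detected", f.name)
            load_to_staging(f)
            f.rename(PROCESSED / f.name)
        time.sleep(60)  # poll every minute
```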