Karthik Raj
Lead Data Engineer
Big Data Engineer with expertise in ETL pipelines, real-time analytics, and cloud data platforms.
About
I’m a data engineer passionate about designing scalable, efficient data pipelines and real-time analytics solutions. I enjoy solving complex problems by combining technical expertise with creative thinking, building systems that are robust, performant, and deliver measurable impact.
Currently, I’m a Lead Data Engineer at Softborne, where I work on building multi-platform data connectors and managing large-scale data ingestion using GCP services like BigQuery, Pub/Sub, and Cloud Run. My work focuses on optimizing data processes and enabling seamless analytics for diverse use cases.
In the past, I’ve contributed to companies across various industries, including healthcare, e-commerce personalization, and outsourcing. My experience spans building Snowflake data models, migrating systems to distributed platforms like Spark, and developing recommendation engines for global brands. Additionally, I’ve worked on multiple freelance and personal projects, showcasing my adaptability and curiosity for exploring new technologies.
Outside of work, I love customizing development environments, exploring Linux systems, and spending time on open-source projects. You might also find me diving into challenging tech puzzles, refining my Vim setup, or smashing the final boss on my PS5!
Experience
2023 — Present Developed data connectors and enhanced a Python client for gRPC-based ingestion; used GCP services such as GCS, Pub/Sub, BigQuery, Cloud Run, and Logs Explorer for data processing.
- Python
- Scala
- Flask
- GCP
- Snowflake
- BigQuery
2020 — 2023 Designed a snowflake-schema data model for the US Provider Network and developed a real-time analytics dashboard. Built a live data pipeline on Azure Cosmos DB, integrating it with Django and React for seamless end-user visualization.
- Python
- React
- Django
- AWS
- Azure
2019 — 2020 Migrated code from SQL to Spark SQL for distributed processing, warehoused data from AWS Athena and Netezza, and converted SAS code to Python. Configured AWS services including Lambda, Kinesis, EC2, and S3 for scalable data processing and integration.
- Python
- Spark
- SQL
- AWS
2018 — 2019 Developed dynamic ETL pipelines on a distributed cloud platform with real-time analytics using Apache Kafka and Spark. Created visualizations in ThoughtSpot and Metabase, built a recommendation model for e-commerce, and configured AWS services while following an agile development process.
- Scala
- Spark
- AWS
- Cassandra
- ThoughtSpot
- Jenkins
- Docker
2017 — 2018 Developed a desktop application for report analysis and an analytics platform to improve production efficiency. Used Electron.js, Video.js, DC.js, and Material Design on the frontend, and Django REST Framework with serializers and mixins on the backend.
- Python
- Django
- Node.js
- Electron.js
- React
- AWS
Projects
FomoRadioAI - Crypto-based Radio Jockey
Built a real-time analytics dashboard for cryptocurrency trends by scraping social media data (Twitter), using Snowflake, AWS, Python, Kafka, Django REST, and MongoDB to deliver up-to-date user insights.
MakeMyTweet - Crypto-Miner using tweets
Neuroliten - Connected Clinical Platform
Developed secure, real-time analytics pipelines for healthcare, aligned with FDA-approved big data solutions and FHIR standards. Implemented with Snowflake, AWS, Kafka, Scala, Python, Django REST, and RNCryptor.
