Position Title: Data Engineering Lead

Job Description:

Location: Bangalore (with occasional travel)
Experience: 5 to 6 years
Qualification: B.E./B.Tech in CSE, ECE, or a related technical field

About CDPG

At CDPG, we are committed to democratizing data. Our mission is to harness its power by creating sovereign data exchange platforms that serve the Public Good. By ensuring that data exchange is ethical, secure, and privacy-centric, we strive to make the benefits of data accessible to all, promoting regional inclusivity and informed decision-making.

Scope of Work

As the Data Engineering Lead, you will own the end-to-end strategy, architecture, and execution of data onboarding across the CDPG ecosystem. You will lead a high-performing team of engineers to integrate complex datasets from national agencies, government departments, and system integrators. Your mission is to manage the full data lifecycle, ensuring all integrations are scalable, secure, and fully compliant with international data sovereignty policies.

Responsibilities

  • Lead a team of data engineers, oversee resource allocation, and drive technical upskilling in AI-driven development and automation.
  • Act as the primary technical liaison for high-level government officials and agency heads to define data protocols and strategic integration roadmaps.
  • Evaluate APIs and datasets to design robust data models (JSON-LD, GeoJSON), and architect containerized, deployment-ready ETL modules.
  • Oversee the development of Python-based ETL modules for data ingestion via REST APIs and streaming protocols (AMQP, MQTT).
  • Own the integration delivery lifecycle using Agile methodologies, ensuring high-quality, on-time delivery of data flows.
  • Monitor data flows to identify and resolve integration errors, ensuring high availability and performance.
  • Create and maintain documentation for integration procedures, data maps, and technical specifications.
  • Enforce software development best practices, ensuring rigorous unit/functional testing and maintaining the integrity of the Data Catalogue.

Technical Skills

  • Expert-level software design in Python and Java/JavaScript, with deep proficiency in data modeling (JSON Schema, JSON-LD).
  • Strong analytical and problem-solving skills to troubleshoot complex data issues, including evaluating data availability, update frequency, and systemic problems such as stream repetitions or schema drift.
  • Proven experience building ETL modules for real-time ingestion using REST, AMQP, and MQTT protocols.
  • Expertise in building resilient data processing workflows, using PySpark for heavy-lift transformations and Airflow or NiFi for orchestration.
  • Strong knowledge of modern data warehouse architectures, such as Medallion (Bronze/Silver/Gold refinement layers) and Data Mesh (decentralized, domain-owned data products).
  • Ability to leverage AI technologies and LLMs within data integration frameworks to streamline and accelerate the integration lifecycle.
  • Good command of geospatial data (vector/raster) and GIS tools such as QGIS for analyzing and transforming spatial datasets is desirable.
  • Proficient in Linux environments and Git version control. Familiarity with Docker and Kubernetes for deployment orchestration is desirable.
  • Experience with monitoring/logging stacks such as Prometheus, Grafana, and Logstash to ensure system health and reliability.
