We invite you to join our team as Lead Data Engineer.
Responsibilities: design and deploy an on-prem data lakehouse and build data flows (stream/batch) for CV/RecSys/Forecast, with guaranteed quality and availability, observability, and controlled cost.
Areas of responsibility:
- Data architecture and modeling, zoning (raw/curated/feature), data contracts;
- Stream and batch pipelines, serving layers (feature store/data marts), SLA/SLO;
- Integration of sources: POS/ERP/WMS/e-com/mobile, CDC (Debezium), schemas/catalog;
- Data quality/lineage/metadata: DQ rules, automated tests, cataloging, PII control;
- Performance/reliability: near-real-time pipelines, storage/compute optimization, cost-aware design;
- Vector layer for personalization: embedding versioning, update SLAs, compatibility with online serving;
- Collaboration with DS/MLOps: feature requirements, versioning, service levels.
OKR (examples):
- 99% DQ-rule stability on critical tables;
- Feature-availability SLA for inference met 99.5% of the time;
- Zero SRM (sample ratio mismatch) incidents in experiments; full traceability.
Requirements (must-have):
- 5+ years in data engineering, 2+ years designing and operating on-prem platforms;
- Production experience in streaming (Kafka/Redpanda, CDC via Debezium) and batch processing;
- Lakehouse design on Iceberg/Delta/Hudi with ACID, schema evolution, and time travel;
- Orchestration (Airflow or Dagster), dbt Core transformations;
- Data marts on ClickHouse and a SQL layer (PostgreSQL/Trino/Presto); DQ practices (Great Expectations or similar), lineage (OpenLineage), catalog/metadata (OpenMetadata or DataHub);
- Infrastructure: Kubernetes/OpenShift, Docker/containerd, Terraform/Ansible, GitLab CI; observability: Prometheus/Grafana/Loki, OpenTelemetry;
- Advanced SQL and query optimization;
- Experience with access control, PII handling, and audit.
Would be a plus:
- Food retail/FMCG experience, SLOs for checkout/price events, integration with ERP/WMS;
- ClickHouse replication/sharding, data-contracts-as-code, FinOps (unit-economics at the table/job level);
- Vector indexes (pgvector/FAISS/Milvus) for personalization.
Technical stack (on-prem):
Storage and formats
- Object storage: MinIO | Ceph
- Lakehouse tables: Apache Iceberg | Delta Lake | Apache Hudi
- File formats: Parquet | ORC
Processing and Transformations
- Compute engines: Apache Spark | Apache Flink | Apache Beam
- Orchestration: Apache Airflow | Dagster
- SQL Transformations: dbt Core
Streaming and Integrations
- Event Bus: Apache Kafka | Redpanda
- CDC: Debezium
Data marts and SQL layer
- Analytical DBMS: ClickHouse
- Operational/OLTP and time-series: PostgreSQL or TimescaleDB
- Federated SQL Engine: Trino | Presto
Data quality, catalog and provenance
- Data quality: Great Expectations or Soda
- Lineage: OpenLineage
- Catalog/metadata: OpenMetadata or DataHub
Infrastructure and operation
- Containers and orchestration: Docker, Kubernetes or OpenShift
- Infrastructure as code: Terraform | Ansible
- CI/CD: GitLab CI
- Observability: Prometheus, Grafana, Loki, OpenTelemetry
Security and Access Control
- Secrets: HashiCorp Vault | Sealed Secrets
- Access policies: policy-as-code (OPA/Gatekeeper or Kyverno)
The company offers:
- remote or hybrid work;
- employment under a gig contract or as a staff employee (reservation is possible);
- paid annual leave of 24 calendar days, paid sick leave;
- salary paid on time and in full, regular salary reviews;
- opportunities for professional and career growth;
- training courses.
Contact person: Kateryna, tel. 0984567857 (t.me/KaterynaB_HR)