
woxa group

Senior Data Engineer

  • Posted 14 hours ago
  • Be among the first 10 applicants

Job Description

Key Responsibilities:

Point-in-Time Research Data Lake

  • Design and build a point-in-time research data lake that records the state of market data as it was known at each historical moment, eliminating look-ahead bias in research and backtesting.

Reproducible Feature Engineering Pipelines

  • Build version-controlled, reproducible feature engineering pipelines. Every model output must be traceable back to the data state and transformations that produced it.
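One common way to get this traceability — sketched here with illustrative field choices, not a prescribed design — is to stamp every feature batch with a deterministic hash of its input snapshot, code version, and transform parameters:

```python
import hashlib
import json

def lineage_id(input_snapshot: str, code_version: str, params: dict) -> str:
    """Deterministic ID tying a feature batch to the exact data state
    and transformations that produced it (fields are illustrative)."""
    payload = json.dumps(
        {"snapshot": input_snapshot, "code": code_version, "params": params},
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

Identical inputs always yield the same ID, so a model output tagged with it can be reproduced; any change to data, code, or parameters yields a new ID.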

Handle the realities of equity market data:

  • Outliers and anomalous prints
  • Corporate actions (splits, dividends, mergers, spin-offs)
  • Symbol mapping and ticker changes across vendors
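As a small illustration of the corporate-actions item, a stock split is typically handled by back-adjusting prices before the ex-date by the split ratio so the series stays continuous (a minimal sketch; real pipelines also chain multiple actions):

```python
def adjust_for_split(prices, ex_date, ratio):
    """Back-adjust closes for a split.

    prices: list of (date, close) sorted by date (ISO date strings)
    ratio:  e.g. 2.0 for a 2-for-1 split effective on ex_date
    """
    return [(d, p / ratio if d < ex_date else p) for d, p in prices]
```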

Optimized Data Retrieval Layer

  • Optimize the data retrieval layer for both research and production access patterns: columnar formats, partitioning, caching, and query tuning.
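For instance, a Hive-style partition layout (the bucket, table, and scheme below are hypothetical) lets engines like Athena prune whole date directories instead of scanning the full lake:

```python
def partition_path(base: str, table: str, trade_date: str) -> str:
    """Build a Hive-style partitioned object path for one trading day."""
    year, month, _ = trade_date.split("-")
    return f"{base}/{table}/year={year}/month={month}/date={trade_date}/data.parquet"
```

A query filtered to one month then touches only that month's prefix, which is the main lever behind partition pruning.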

Monitor production data flow and quality end-to-end:

  • Real-time data flow from exchanges
  • Data availability checks for model predictions
  • Continuous data quality monitoring in production
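The kind of quality gate implied above can be as simple as per-record invariant checks run before data reaches production models (rules and field names here are assumptions for illustration):

```python
def check_bar(bar: dict) -> list[str]:
    """Validate one OHLC bar; return a list of violated invariants."""
    issues = []
    if bar["low"] > bar["high"]:
        issues.append("low exceeds high")
    if not (bar["low"] <= bar["open"] <= bar["high"]):
        issues.append("open outside range")
    if not (bar["low"] <= bar["close"] <= bar["high"]):
        issues.append("close outside range")
    if bar.get("volume", 0) < 0:
        issues.append("negative volume")
    return issues
```

An empty result means the bar passed; anything else can be routed to alerting or quarantine.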

Qualifications

Education:

  • Bachelor's degree or higher in Computer Science, Engineering, Mathematics, or a related quantitative field.

Experience:

  • Minimum 5 years in Data Engineering, with at least 2 years on production AWS infrastructure.
  • Track record of designing data systems from greenfield through to production.
  • Strong computer-science fundamentals (data structures, algorithms, distributed systems).
  • Able to manage multiple parallel projects independently with strong ownership.

Must-Have Skills:

  • Production-grade Python (not notebook-only).
  • AWS production experience: S3, Glue, Athena or EMR, Lake Formation, Lambda.
  • Workflow orchestration: Airflow, Dagster, or Prefect.
  • Time-series & columnar storage: Parquet, Arrow, ClickHouse, or equivalent.
  • Data lakehouse formats: Delta Lake, Iceberg, or Hudi.
  • Advanced SQL — both analytical and transactional databases.
  • Database performance: indexing, partitioning, query optimization.
  • Containerization & IaC: Docker, Terraform (or equivalent).
  • Data architecture — designing systems that scale from day one.

Nice-to-Have:

  • Real-time streaming: Kafka, AWS Kinesis, or MSK - comfortable either building or operating.
  • Equity market data experience — has worked with messy financial data (tick data, OHLC, corporate actions, vendor reconciliation).
  • Vendor integration — onboarding and reconciling data from multiple market-data vendors.
  • kdb+ or other specialized time-series database experience.
  • Experience inside a systematic trading team or hedge fund.
  • ML lifecycle support — feature stores, model registry, experiment tracking.

Personality:

  • Self-driven, takes ownership of the full loop from design to production.
  • Systems thinker, attentive to data-quality detail.
  • Clear technical communicator who writes readable documentation.
  • Collaborates well with research and trading teams.
  • Comfortable with short feedback cycles and rapid iteration.


Job ID: 147384055