Key Responsibilities:
Point-in-Time Research Data Lake
- Design and build a point-in-time research data lake that records the state of market data as it was known at each historical moment, eliminating look-ahead bias in research and backtesting.
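The point-in-time idea can be sketched with a minimal, hypothetical store (names and schema are illustrative, not part of this role's stack): every record carries the timestamp at which the value became *known*, and lookups return only what was known at the query time.

```python
from bisect import bisect_right

# Hypothetical point-in-time store: each record is keyed by knowledge_time,
# the moment the value became known, not the event time it describes.
class PointInTimeStore:
    def __init__(self):
        self._records = {}  # symbol -> sorted list of (knowledge_time, value)

    def insert(self, symbol, knowledge_time, value):
        self._records.setdefault(symbol, []).append((knowledge_time, value))
        self._records[symbol].sort()

    def as_of(self, symbol, query_time):
        """Return the latest value known at or before query_time,
        so a backtest can never see data from the future."""
        rows = self._records.get(symbol, [])
        i = bisect_right(rows, (query_time, float("inf")))
        return rows[i - 1][1] if i else None

store = PointInTimeStore()
store.insert("AAPL", 1, 100.0)   # price as first reported at t=1
store.insert("AAPL", 5, 101.5)   # restated value arrives at t=5
print(store.as_of("AAPL", 3))    # -> 100.0 (restatement not yet known)
print(store.as_of("AAPL", 6))    # -> 101.5
```

Querying by knowledge time rather than event time is what eliminates look-ahead bias: a backtest run "as of" t=3 sees the original print, not the later restatement.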
Reproducible Feature Engineering Pipelines
- Build version-controlled, reproducible feature engineering pipelines. Every model output must be traceable back to the data state and transformations that produced it.
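One common way to make outputs traceable, sketched here with hypothetical names: stamp every feature output with a deterministic hash of the input-data snapshot and the transformation config, so any model output maps back to exactly one (data state, transform) pair.

```python
import hashlib
import json

# Hypothetical lineage tag: a stable hash over the input snapshot id and the
# transformation config. Identical inputs always yield the identical tag.
def lineage_id(snapshot_id, transform_config):
    payload = json.dumps(
        {"snapshot": snapshot_id, "config": transform_config},
        sort_keys=True,  # canonical key order -> deterministic hash
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

a = lineage_id("s3://lake/2024-01-02", {"window": 20, "feature": "momentum"})
b = lineage_id("s3://lake/2024-01-02", {"feature": "momentum", "window": 20})
print(a == b)  # -> True: dict key order does not change the lineage id
```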
Handle the realities of equity market data:
- Outliers and anomalous prints
- Corporate actions (splits, dividends, mergers, spin-offs)
- Symbol mapping and ticker changes across vendors
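As one concrete illustration of handling corporate actions, a minimal back-adjustment sketch for a stock split (function and field names are hypothetical): closes before the split date are divided by the split ratio so the price series stays continuous.

```python
# Hypothetical sketch: back-adjust historical closes for an N-for-1 split.
def adjust_for_split(bars, split_date, ratio):
    """bars: list of (date, close); ratio: e.g. 4.0 for a 4-for-1 split.
    Dates are ISO strings, so lexicographic comparison is chronological."""
    return [
        (d, close / ratio if d < split_date else close)
        for d, close in bars
    ]

# AAPL's 4-for-1 split took effect 2020-08-31; values are illustrative.
bars = [("2020-08-28", 499.2), ("2020-08-31", 129.0)]
adjusted = adjust_for_split(bars, "2020-08-31", 4.0)
print(adjusted)  # pre-split close divided by 4 (~124.8); post-split unchanged
```

Real adjustment logic also has to compound multiple actions (dividends, spin-offs) and re-run when vendors restate history, which is why it pairs naturally with the point-in-time lake above.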
Optimized Data Retrieval Layer
- Optimize the data retrieval layer for both research and production access patterns: columnar formats, partitioning, caching, and query performance.
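The partitioning part of this can be sketched in a few lines (paths and keys are illustrative): with Hive-style `key=value` paths, a query engine such as Athena can prune partitions by path alone instead of scanning the whole lake.

```python
# Illustrative Hive-style partition layout and pruning; no data is read,
# matching is done purely on the partition keys embedded in the path.
def partition_path(base, symbol, date):
    """Lay files out as base/symbol=SYM/date=YYYY-MM-DD/part-0.parquet."""
    return f"{base}/symbol={symbol}/date={date}/part-0.parquet"

def prune(paths, symbol):
    """Keep only paths whose partition key matches the predicate."""
    return [p for p in paths if f"/symbol={symbol}/" in p]

paths = [partition_path("s3://lake/bars", s, "2024-01-02")
         for s in ("AAPL", "MSFT")]
print(prune(paths, "AAPL"))  # only the AAPL partition survives
```

Choosing partition keys that match the dominant access patterns (by symbol for research, by date for production replay) is the core of the tuning work described above.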
Monitor production data flow and quality end-to-end:
- Real-time data flow from exchanges
- Data availability checks for model predictions
- Continuous data quality monitoring in production
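A minimal sketch of the kind of data-quality check implied here (thresholds and field names are illustrative): flag missing fields and stale timestamps before model predictions consume the feed.

```python
import time

# Hypothetical feed health check: returns human-readable issues, empty
# list means the feed is healthy enough for predictions to consume.
def check_feed(rows, max_staleness_s=60, required=("symbol", "price", "ts")):
    issues = []
    now = time.time()
    for i, row in enumerate(rows):
        missing = [f for f in required if row.get(f) is None]
        if missing:
            issues.append(f"row {i}: missing fields {missing}")
        elif now - row["ts"] > max_staleness_s:
            issues.append(f"row {i}: stale by {now - row['ts']:.0f}s")
    return issues

rows = [{"symbol": "AAPL", "price": 190.1, "ts": time.time()},
        {"symbol": "MSFT", "price": None, "ts": time.time()}]
print(check_feed(rows))  # -> ["row 1: missing fields ['price']"]
```

In production the same checks typically run continuously and feed alerting, rather than being called ad hoc.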
Qualifications:
Education:
- Bachelor's degree or higher in Computer Science, Engineering, Mathematics, or a related quantitative field.
Experience:
- Minimum 5 years in Data Engineering, with at least 2 years on production AWS infrastructure.
- Track record of designing data systems from greenfield through to production.
- Strong computer-science fundamentals (data structures, algorithms, distributed systems).
- Able to manage multiple parallel projects independently with strong ownership.
Must-Have Skills:
- Production-grade Python (not notebook-only).
- AWS production experience: S3, Glue, Athena or EMR, Lake Formation, Lambda.
- Workflow orchestration: Airflow, Dagster, or Prefect.
- Time-series & columnar storage: Parquet, Arrow, ClickHouse, or equivalent.
- Data lakehouse formats: Delta Lake, Iceberg, or Hudi.
- Advanced SQL across both analytical and transactional databases.
- Database performance: indexing, partitioning, query optimization.
- Containerization & IaC: Docker, Terraform (or equivalent).
- Data architecture: designing systems that scale from day one.
Nice-to-Have:
- Real-time streaming: Kafka, AWS Kinesis, or MSK; comfortable either building or operating these systems.
- Equity market data experience: familiarity with messy financial data (tick data, OHLC bars, corporate actions, vendor reconciliation).
- Vendor integration: onboarding and reconciling data from multiple market-data vendors.
- kdb+ or other specialized time-series database experience.
- Experience inside a systematic trading team or hedge fund.
- ML lifecycle support — feature stores, model registry, experiment tracking.
Personality:
- Self-driven, takes ownership of the full loop from design to production.
- Systems thinker, attentive to data-quality detail.
- Clear technical communicator who writes readable documentation.
- Collaborates well with research and trading teams.
- Comfortable with short feedback cycles and rapid iteration.