Pipeline Engineer
AI Pipeline Engineer — Extraction Lead
Location: Remote
Contract: Full-time, with founding-team equity
Reports to: the CEO (technical) + Chief of Staff (operational)
Our mission
We're building the AI data infrastructure that makes Vietnamese-language models world-class. Every dataset we deliver makes Vietnamese AI stronger.
We believe Vietnamese AI deserves world-class infrastructure — and deserves to be built by Vietnamese people, for Vietnamese people. We want Vietnamese AI to lead, not follow other markets.
Tinh Lọc turns Vietnam's media archives (audio, video, documents, broadcasts — historical and modern) into verified, structured, model-ready datasets. Every dataset we ship expands Vietnamese AI capability across the industry.
We have an anchor customer with a multi-year scope of work, and we launch operations within the next 30 days. This is a chance to join at the founding-team stage rather than to join a settled company.
Why this role matters
You own our Vietnamese extraction pipeline end-to-end — the ASR, OCR, speaker-diarisation, and entity-extraction stack that turns raw audio, video, and documents into clean, structured, machine-readable data. It runs first in our workflow, and everything after it depends on it: our annotators review and correct the pipeline's output, so the higher its first-pass accuracy, the less manual correction each dataset needs.
That makes this the highest-leverage engineering role in the company. Your pipeline's accuracy directly sets our cost per processed hour of media and our overall throughput — improve the pipeline and the whole operation gets faster and cheaper at once. You own this system outright: you choose the models, design the architecture, and ship to production.
In Year 2, if execution meets expectations, you expand into our data flywheel — the loop that feeds accumulated human corrections back into better automated models (fine-tuning pipelines, evaluation harnesses, active learning). This is the work that compounds our unit economics over time.
Who we're looking for
You are someone who:
- Has 3-7 years of production ML or production data engineering experience. You've shipped real systems that ran in production, not just notebooks, not just research prototypes. We care about the quality of what you've built and operated, not the title you held.
- Knows the Vietnamese ASR / OCR / NLP landscape and forms evidence-based opinions about it. Vietnamese speech and text are their own discipline: generic English-first tools often underperform on them, and a field of Vietnamese-specific models has grown up in response (ChunkFormer, PhoWhisper, Vintern, PaddleOCR, NomNaOCR for Sino-Nôm, PhoBERT, GLiNER, and newer entrants). You know this landscape, you have informed views on where each option fits, and, crucially, you benchmark rather than assume. You'll make the actual model calls: evaluating and choosing the stack is the job, not a decision we've made for you.
- Has built the LLM cost layer the right way. At thousands of media-hours per month, the difference between a naive LLM integration and an engineered one is the difference between a viable business and a bankrupt one. You think in per-task cost-routing, prompt-caching, vLLM batch inference, and batching (the 2026 cost differentiators), and you can be concrete about the trade-offs and the numbers.
- Has genuine passion for Vietnamese AI specifically. This isn't a role for someone who sees Vietnamese AI as a quick opportunity to pad a CV. We need someone who believes Vietnamese AI must be excellent, and is willing to invest the next 3-5 years of their career to help make that happen. You'll be asked about this specifically in the interview.
- Can demo a production system end-to-end. You can walk us through a real ASR / OCR / document-understanding / NER system you built: the architecture, the metrics (WER / CER / F1), what broke, and how you fixed it. Theory is not enough; we want the practitioner who has shipped.
- Is comfortable at founding-team stage. You're comfortable owning a system with no existing scaffolding; you build the scaffolding. You're comfortable with the stack evolving as Vietnamese AI moves (the May 2026 SOTA will have moved by Q4 2026), and with shipping to a deadline.
Who we're NOT looking for
To save your time and ours, here are profiles that won't fit:
- Someone who recommends a generic stack (Whisper + Tesseract) without acknowledging Vietnamese-specific tools. If your first instinct on Vietnamese audio is vanilla Whisper, you haven't done the work — and you'll lose months relearning what the Vietnamese AI community already solved.
- Someone dismissive of Vietnamese-language sources or community. "Reddit and HN are enough" is the wrong cultural fit. The best Vietnamese AI research lives on Viblo, in VLSP papers, and in Vietnamese AI groups. You need to read where the work actually is.
- An over-engineer. If your plan for Day 1 is to rebuild ChunkFormer from scratch, this isn't the role. We ship on top of the best available open and internal models, route intelligently, and improve — we don't rebuild foundations for ego.
- Someone too academic to ship. Strong publication record but never owned a production system that ran reliably under load? You'll do better in a research lab. We need the system live, observable, and cheap.
- A short-termist. Someone who sees Vietnamese AI as a quick opportunity rather than a long-term industry won't stick through 6-18 month customer cycles. We want the engineer who wants to build this industry for the long run.
What you'll do
First 2 weeks
- Read the strategic + technical documentation (business plan, data flywheel doctrine, extraction-pipeline research + exec plan, decision log); 1-on-1s with the CEO + Chief of Staff + co-founders
- Set up your dev environment and reproduce the existing bench-test pipeline locally on a real Vietnamese broadcast sample
- Benchmark the leading Vietnamese ASR options (ChunkFormer, PhoWhisper, and newer models) and establish your model-evaluation method — what makes the cut, and why
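As an illustration of the kind of scoring a model-evaluation method rests on, here is a minimal sketch of WER and CER for Vietnamese text. It is not our benchmark harness: the function names and the normalisation choices (NFC plus lowercasing) are assumptions made for this example, and a real evaluation would also settle punctuation, numerals, and dialect handling.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion from the reference
                curr[j - 1] + 1,         # insertion into the hypothesis
                prev[j - 1] + (r != h),  # substitution, or free match
            ))
        prev = curr
    return prev[-1]


def normalise(text):
    """Assumed normalisation: NFC + lowercase, so that composed and
    decomposed Vietnamese diacritics are scored as the same character."""
    return unicodedata.normalize("NFC", text).lower().strip()


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref = normalise(reference).split()
    hyp = normalise(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def cer(reference, hypothesis):
    """Character error rate: character-level edits over reference length."""
    ref = list(normalise(reference))
    hyp = list(normalise(hypothesis))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


# One dropped diacritic costs a whole word in WER but one character in CER.
print(wer("tiếng Việt rất hay", "tieng Việt rất hay"))  # 0.25
print(cer("tiếng Việt rất hay", "tieng Việt rất hay"))
```

The gap between the two numbers on the same sentence is why diacritic errors are worth reporting per metric rather than as a single score.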
By Day 30 — first end-to-end sample batch
- Stand up the extraction stack end-to-end — ASR, OCR (incl. a Sino-Nôm / Hán-Nôm sub-layer), speaker diarisation, and NER (starting candidates: Vintern-1B / PaddleOCR / NomNaOCR; pyannote / Sortformer; PhoBERT / GLiNER — you benchmark and decide)
- Run the first end-to-end sample batch on real content — your proof of throughput and quality
- Stand up the LLM layer (per-task cost-routing, prompt-caching, vLLM batch inference, batching) and prove out the cost-per-processed-hour target
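To make "per-task cost-routing" and "cost per processed hour" concrete, the sketch below routes each task to a model tier and totals the LLM spend for one hour of media. Everything in it is hypothetical: the tier names, the prices, the task names, and the cached-input price ratio are placeholders invented for illustration, not our figures or any provider's price list. Prompt caching is modelled only as a discount on cached input tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_million_input: float
    usd_per_million_output: float
    cached_input_price_ratio: float  # share of input price paid on cache hits


# Hypothetical tiers and prices, for illustration only.
TIERS = {
    "small": ModelTier("small", 0.10, 0.40, 0.5),
    "large": ModelTier("large", 3.00, 12.00, 0.1),
}

# Hypothetical routing table: cheap tier for easy tasks, large tier otherwise.
ROUTES = {
    "punctuation_restore": "small",
    "entity_normalise": "small",
    "ambiguous_speaker_resolve": "large",
}


def route(task):
    """Pick the tier for a task; unknown tasks fall back to the large tier."""
    return TIERS[ROUTES.get(task, "large")]


def call_cost_usd(tier, input_tokens, output_tokens, cached_tokens=0):
    """Cost of one call, charging cached input tokens at the reduced ratio."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh = input_tokens - cached_tokens
    billable_input = fresh + cached_tokens * tier.cached_input_price_ratio
    return (billable_input * tier.usd_per_million_input
            + output_tokens * tier.usd_per_million_output) / 1_000_000


def cost_per_media_hour(calls):
    """Sum LLM cost for one media hour.

    calls: iterable of (task, input_tokens, output_tokens, cached_tokens).
    """
    return sum(call_cost_usd(route(t), i, o, c) for t, i, o, c in calls)


# Hypothetical workload for one hour of broadcast audio.
hour = [
    ("punctuation_restore", 20_000, 20_000, 2_000),
    ("entity_normalise", 15_000, 3_000, 5_000),
    ("ambiguous_speaker_resolve", 4_000, 500, 3_000),
]
print(f"{cost_per_media_hour(hour):.4f} USD per media hour")
```

The point of the exercise is the shape of the calculation: the routing table and the cache hit rate are the levers, and the per-hour total is the number to defend.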
Day 30-90 — pilot + production
- Deliver the first full pilot batch to the customer (~Day 60)
- Ship the production runbook + observability dashboard so the pipeline runs reliably and its accuracy is measurable per task
- Run the weekly sync with the Annotation Programme Manager: your extraction output is their input — the most consequential interface in the company
6-12 months
- Continuously benchmark and replace layers as new Vietnamese AI tools release — own the stack's currency
- Drive cost optimisation: GPU utilisation, spot/preemptible strategy, batching, cost per processed hour of media
- Maintain relationships with model authors and the Vietnamese AI community (license negotiation, academic partnerships)
Year 2 — flywheel infrastructure
- If execution meets expectations: expand into flywheel-infrastructure ownership
- Build the automated fine-tuning (SFT) pipeline that turns accumulated corrections into V1.x → V2.x improved pre-labellers
- Build the evaluation harness (per-task held-out benchmarks, regression prevention) and ML linters (diacritic / NER-consistency / dialect-normalisation checkers)
- Build the active learning system that picks the most informative content to label next — maximising information gain per labelling hour
- Bring on a Junior Pipeline Engineer as the layer scales
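One simple way to picture the active-learning step is uncertainty sampling under a labelling-time budget. The sketch below is an assumption-laden stand-in, not a design we have chosen: it uses mean binary entropy of the pipeline's per-token confidences as a rough proxy for information gain, and the `Segment` fields and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    segment_id: str
    token_confidences: tuple   # pipeline's per-token probabilities, 0..1
    est_label_minutes: float   # estimated annotator time, must be > 0


def mean_uncertainty(seg):
    """Mean binary entropy in bits: 0 means certain, 1 means a coin flip."""
    def entropy(p):
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    if not seg.token_confidences:
        return 0.0
    total = sum(entropy(p) for p in seg.token_confidences)
    return total / len(seg.token_confidences)


def select_for_labelling(segments, budget_minutes):
    """Greedy pick: highest uncertainty per labelling minute first,
    skipping any segment that would exceed the remaining budget."""
    ranked = sorted(
        segments,
        key=lambda s: mean_uncertainty(s) / s.est_label_minutes,
        reverse=True,
    )
    chosen, used = [], 0.0
    for seg in ranked:
        if used + seg.est_label_minutes <= budget_minutes:
            chosen.append(seg)
            used += seg.est_label_minutes
    return chosen


pool = [
    Segment("a", (0.99, 0.98), 2.0),
    Segment("b", (0.50, 0.60), 2.0),
    Segment("c", (0.50,), 10.0),
]
print([s.segment_id for s in select_for_labelling(pool, budget_minutes=4.0)])
```

Dividing by estimated labelling time is what ties the ranking to "per labelling hour"; a production system would likely replace the entropy proxy with a measured gain on the held-out benchmarks.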
Compensation
| Component | Detail |
|---|---|
| Cash base | $3,000-4,500 USD/month (71-107M VND) — top of band reserved for an exceptional candidate |
| Performance bonus | $500-1,000 USD/month, target-based |
| Founding-level equity | Meaningful founding-level equity, drafted by Vietnamese counsel and convertible into real equity if/when a holding entity is established |
| Vesting | 4 years, 1-year cliff (industry standard) |
| Statutory benefits | BHXH 17.5% + BHYT 3% + BHTN 1% (employer-side, fully compliant with Vietnamese law) |
| 13th-month salary (lương tháng 13) | Standard, accrued throughout the year |
| Tết bonus | Minimum 1 month salary; can be more based on performance |
| Conference / learning budget | $1,000-2,000/year (VLSP, courses, books) |
| Equipment | Company-provided laptop; remote-work equipment budget |
Compute is a company resource, not personal pay: you'll have the GPU and inference/token budget the pipeline needs to hit its cost and throughput targets — provisioned by the company as the work requires.
Nice-to-haves
- Vietnamese native or fluent speaker
- Direct relationships in the Vietnamese AI community (VinAI / VinBigData / FPT.AI / Viettel AI / Zalo AI / VietAI / 5CD-AI / Viblo authors)
- Publication record at VLSP / Interspeech / ICASSP on Vietnamese-specific work
- Experience with academic partnerships (Sino-Nôm / Hán-Nôm work is a high-value partnership area)
- Past work on long-tail languages (Indonesian, Khmer, Thai practitioners often have transferable instincts)
- Long-form document parsing experience (Marker, MinerU, Docling)
How to apply
Send via email to contact@coreywilton.org (or through the advisor who introduced you to this role):
- CV — concise, focused on production ML / data engineering experience
- A short note (200-400 words, Vietnamese or English) answering these 3 questions:
  - Describe a specific ASR / OCR / NLP model you built or fine-tuned for a low-resource or messy-input case: your WER / CER / F1, what broke, and how you fixed it.
  - Given thousands of media-hours per month, how would you architect the LLM layer (routing / caching / batching) to control cost without sacrificing accuracy? Be concrete about the trade-offs and the numbers.
  - What specifically excites you about turning Vietnamese media into model-ready data (vs a generic AI role), and what's the single hardest technical problem you anticipate here?
We'll respond to all qualified candidates within 5 business days. The process is a Round-1 interview with the CEO (60 min, video), followed by a Round-2 interview with the CEO + co-founder (90 min, video). If there's a fit, we extend an offer within 7 days of Round-2.