
THE FACTORY

Five stages. One factory.

Most vendors run marketplaces. Marketplaces produce inconsistent output because the unit of accountability is the individual annotator. We run a factory — the unit of accountability is the pipeline.

Vietnamese audio and video pass through specialised ASR, diarisation, and OCR. The output is fast but raw — humans correct it next.

OUR EXPERT POOL

A working pool across domains.

Workers we can verify. Experts who can verify the work. Both recruited inside Vietnam and paid for their time.

Vietnamese History Professors
Linguistics Researchers
Sino-Nôm Specialists
Native Vietnamese Annotators
Regional Dialect Reviewers
Classical Vietnamese Scholars
Archive Conservators
Broadcast Audio Engineers
Vietnamese Language Editors

THE MARKET GAP

500B+ tokens needed
100–200B tokens in public corpora
50,000 undigitised VFI titles

FACTORY ANATOMY

The five stages, end to end.

Each stage has its own tooling, its own worker tier, and its own quality gate. A batch cannot move on to the next stage until it clears the current stage's gate.

  1. Extraction

    FFmpeg for media processing, faster-whisper for ASR, pyannote for speaker diarisation, PaddleOCR for on-screen text. First-pass machine output.

  2. Correction

    Vietnamese-native L1–L3 annotators. Diacritic-accuracy gate.

  3. Structuring

    Senior L3+ annotators. Entities, scenes, topic taxonomy.

  4. Expert Eval

    Independent Vietnamese subject-matter experts. Cross-validation of contentious content.

  5. QA

    Gold tasks, reviewer ladder, measured accuracy report per batch.
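The stage-and-gate rule can be pictured as a short loop. The Python below is a minimal sketch for illustration only and does not represent our production tooling. `Stage`, `run_pipeline`, `record`, the metric names, and every threshold are hypothetical. They are chosen to show that a batch failing one gate never reaches the stages after it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

Batch = dict  # hypothetical: a batch carries its data plus measured quality metrics


@dataclass
class Stage:
    name: str
    work: Callable[[Batch], Batch]  # hypothetical stand-in for the stage's tooling or annotators
    gate: Callable[[Batch], bool]   # quality gate the stage's output must clear


def run_pipeline(batch: Batch, stages: list[Stage]) -> tuple[Batch, Optional[str]]:
    """Run stages in order; a batch that fails a gate never reaches the next stage.

    Returns the last batch produced and the name of the stage whose gate
    held it back, or None if every gate was cleared.
    """
    for stage in stages:
        batch = stage.work(batch)
        if not stage.gate(batch):
            return batch, stage.name
    return batch, None


def record(metric: str, value: float) -> Callable[[Batch], Batch]:
    """Hypothetical work step that only records a measured metric on the batch."""
    return lambda b: {**b, metric: value}


# Illustrative metric values and thresholds only; real gates are agreed per project.
stages = [
    Stage("Extraction", record("machine_cer", 0.028), lambda b: b["machine_cer"] < 0.10),
    Stage("Correction", record("diacritic_accuracy", 0.995), lambda b: b["diacritic_accuracy"] >= 0.99),
    Stage("Structuring", record("entity_precision", 0.91), lambda b: b["entity_precision"] >= 0.95),
    Stage("Expert Eval", record("expert_agreement", 1.0), lambda b: b["expert_agreement"] >= 0.9),
    Stage("QA", record("gold_accuracy", 0.98), lambda b: b["gold_accuracy"] >= 0.97),
]

batch, held_at = run_pipeline({"batch_id": "demo-001"}, stages)
print(held_at)  # Structuring: the batch never reaches Expert Eval or QA
```

In this sketch each gate is a plain predicate on the batch. Under that assumption, a project can change a threshold without touching the stage that produces the output.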

FREQUENTLY ASKED

Questions you're probably asking.

How do you handle Vietnamese diacritics where Whisper fails?
Vietnamese-specialised ASR plus a Vietnamese-native human correction layer. Every output passes a diacritic-accuracy QA gate before delivery. In pilots, the character error rate (CER) on news content is under three percent before human correction.
What about old-broadcast audio and regional dialects?
Pre-1990 audio and regional dialects are exactly where global vendors fail. Our annotator pool includes Vietnamese-native speakers across regions, and our expert layer includes independent Vietnamese linguistics specialists for dialectal review. We don't outsource what we don't have the expertise to verify.
Can we keep our taxonomy and methodology under IP control?
Yes. Every contract assigns full ownership of delivered datasets to you. We retain only our internal methodology — never your specific data, never your taxonomy, never your edge-case handling. We're structurally vendor-neutral because we don't train competing models.
How is sensitive or politically careful content handled?
Our team signs per-project NDAs covering all client content. Worker access is scoped per project, with no cross-project visibility unless explicitly authorised. Data stays resident in Vietnam by default.
What if we need to scale from 100 hours per month to 5,000?
Our capacity ramp is calibrated to reach 230–400 FTE-equivalents over 24 months, corresponding to a $5 million annual run rate. We don't increase volume past quality thresholds: every batch reports against the agreed CER and entity-precision targets before we scale to the next tier.
How does expert validation actually work?
Tier 3 outputs pass through independent Vietnamese history and linguistics subject-matter experts. Each expert works five to fifteen hours per week under a formal consulting agreement. Contentious historical content is cross-validated by two experts.
Can we test you before committing to a long contract?
Yes. Our default engagement starts with a paid twenty-hour pilot, fixed price, four-week delivery. The pilot produces real measurable output you can evaluate against your own criteria. No multi-year commitment required to see the work.
How fast can a project actually start?
Scoping call within 48 hours of first contact. Sample processing begins within one to two weeks. Proposal in your hands four to five weeks after the first call. First production batch can begin within nine weeks.
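The CER figure in the diacritics answer can be made concrete with a minimal Python sketch. It uses character error rate as it is commonly defined: character-level edit distance divided by reference length. It assumes both strings are NFC-normalised before comparison. Three further helpers are hypothetical and illustrate the per-batch CER and entity-precision reporting described in the scaling answer: the diacritic split (CER with and without tone marks), `entity_precision`, and `batch_passes`. They are not our QA code, and the thresholds are placeholders.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length.

    Both strings are NFC-normalised first so that precomposed and
    decomposed Vietnamese characters compare as equal.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    return levenshtein(ref, hyp) / max(len(ref), 1)


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese tone and vowel marks; đ/Đ has no decomposition, so map it by hand."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return base.replace("đ", "d").replace("Đ", "D")


def entity_precision(predicted: set, gold: set) -> float:
    """Hypothetical helper: share of predicted entities that appear in the gold set."""
    return len(predicted & gold) / len(predicted) if predicted else 0.0


def batch_passes(batch_cer: float, batch_precision: float,
                 max_cer: float, min_precision: float) -> bool:
    """Hypothetical scaling check: a batch clears only if both agreed targets hold."""
    return batch_cer <= max_cer and batch_precision >= min_precision


ref, hyp = "Hà Nội", "Ha Noi"
total = cer(ref, hyp)                                     # 2 substitutions / 6 characters
base = cer(strip_diacritics(ref), strip_diacritics(hyp))  # no errors once marks are ignored
print(round(total, 3), round(base, 3))  # 0.333 0.0
```

The gap between the two CER values is only a rough indicator of diacritic errors. Stripping marks can also hide differences that are not purely diacritic.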