Tinh Loc — Vietnamese AI Data Refinery

What if every Vietnamese broadcast since 1945 could train the next generation of AI?

Six decades of VTV reporting. VOV radio archives that stretch back to the founding. Documentary collections at the Vietnam Film Institute. Manuscripts in Sino-Nôm that no global vendor can read.

The content exists. The digital form doesn't.

We turn one into the other — Vietnamese audio, video, documents, broadcasts, books, and archives into verified, structured, model-ready datasets.

Expert-checked. Source-grounded. Delivered in the format your team already uses.

Vietnamese LLMs need 500B+ tokens of training data. Public corpora top out at 100–200B tokens. That gap of 300B tokens or more is where Vietnamese AI succeeds or fails.

We exist to close it.

Join us.

START A CONVERSATION
SEE WHAT WE SHIP