CV

Thura Aung

Low-resource Southeast Asian NLP researcher

thuraaung.ai.mdy@gmail.com
+66 82-994-8011
Bangkok, TH

Summary

NLP researcher working on low-resource Southeast Asian languages, with a focus on Burmese, multilingual multimodal LLMs, and dataset creation and evaluation for under-resourced languages.

Education

  • B.Eng. in Software Engineering, specializing in Artificial Intelligence
    Present
    King Mongkut's Institute of Technology Ladkrabang

Work Experience

  • AI Engineer Intern
    2025-04-01 - Present
    AI Singapore & VISTEC
    Part-time research on Southeast Asian language evaluation and Burmese NLP.
    • Translated instruction-following datasets into Burmese for LLM evaluation.
    • Built evaluation datasets for seven Burmese NLP tasks.
    • Evaluated LLMs on Burmese NLP benchmarks.
  • Machine Learning Engineer
    2025-06-01 - 2025-12-01
    Looloo Technology
    Part-time ML work on Thai speech and text systems.
    • Fine-tuned speech-augmented language models for Thai.
    • Created synthetic Thai speech datasets totaling 25k sentences using Thai TTS.
    • Built evaluation pipelines for Thai TTS systems.
  • Lab Member
    2022-06-01 - Present
    Language Understanding Laboratory
    Research on Burmese corpora, OCR, language modeling, and evaluation.
    • Built large-scale Myanmar corpora for OCR and language modeling.
    • Supervised the creation of a medical-domain speech corpus.
    • Fine-tuned transformer models for sequence and token classification tasks.
    • Reproduced experiments for myNLP, the first Burmese NLP toolkit.

Skills

Language Proficiency

  • Burmese (Native)
  • English (Professional Fluency)

Programming Languages

  • Python
  • Rust
  • C/C++
  • Java
  • JavaScript
  • SQL
  • Prolog

Data Tools

  • NumPy
  • Pandas
  • SciPy

Developer Tools

  • Git
  • Docker
  • VS Code
  • AWS SageMaker

Frameworks and Libraries

  • PyTorch
  • TensorFlow
  • OpenCV
  • spaCy
  • NLTK
  • LangChain
  • OpenNMT
  • FastAPI
  • MLflow

Publications

  • myMedi-Whisper: Construction of Burmese Medical Speech Corpus and Whisper Fine-Tuning for Clinical Dialogue ASR
    2025
    Under Review
    Construction of a Burmese medical speech corpus and Whisper fine-tuning for clinical dialogue ASR.
  • SEA-BED: How Do Embedding Models Represent Southeast Asian Languages?
    2026
    Annual Meeting of the Association for Computational Linguistics (ACL 2026)
    A study of how embedding models represent Southeast Asian languages.
  • Burmese-SAN: Burmese NLP Benchmark for Evaluating Large Language Models
    2026
    Language Resources and Evaluation Conference (LREC 2026)
    Benchmarking large language models on Burmese NLP tasks.
  • Enhancing Burmese News Classification using Kolmogorov-Arnold Head Finetuning
    2025
    20th IEEE International Joint Symposium on Artificial Intelligence and Natural Language Processing
    Burmese news classification with Kolmogorov-Arnold head finetuning.
  • ASR Error Correction in Low-Resource Burmese with Alignment-Enhanced Transformers using Phonetic Features
    2025
    20th IEEE International Joint Symposium on Artificial Intelligence and Natural Language Processing
    Low-resource Burmese ASR error correction with phonetic features.
  • myNER: Contextualized Burmese Named Entity Recognition with Bidirectional LSTM and fastText Embeddings via Joint Training with POS Tagging
    2025
    4th IEEE International Conference on Cybernetics and Innovations
    Contextualized Burmese named entity recognition with joint POS tagging.
  • myOCR: Optical Character Recognition for Myanmar Language with Post OCR Correction
    2024
    19th IEEE International Joint Symposium on Artificial Intelligence and Natural Language Processing
    Myanmar OCR with post-OCR correction.
  • myContradict: Semi-supervised Contradictory Sentence Generation for Myanmar Language
    2024
    19th IEEE International Joint Symposium on Artificial Intelligence and Natural Language Processing
    Semi-supervised contradictory sentence generation for Myanmar language.
  • Neural Sequence Labeling Based Sentence Segmentation for Myanmar Language
    2023
    12th CITA 2023
    Sequence labeling for Myanmar sentence segmentation.
  • mySentence: Sentence Segmentation for Myanmar Language using Neural Machine Translation Approach
    2023
    Journal of Intelligent Informatics and Smart Technology
    Sentence segmentation for Myanmar language using NMT.
  • KAConvText: Novel Approach to Burmese Sentence Classification using Kolmogorov-Arnold Convolution
    2025
    arXiv
    Burmese sentence classification with Kolmogorov-Arnold convolution.

Teaching

  • Elementary System Programming (Rust)
    2024
    King Mongkut's Institute of Technology Ladkrabang
    Role: Teaching Assistant
    Supported 30+ first-year students in systems programming with Rust, assisted with weekly labs, and ran pre-session workshops on Rust syntax, compiler tools, and Cargo workflows.