CV
Thura Aung
Low-resource Southeast Asian NLP researcher
Summary
Low-resource Southeast Asian NLP researcher focused on Burmese, multilingual multimodal LLMs, and dataset creation and evaluation for under-resourced languages.
Education
- B.Eng. in Software Engineering, Specialized in Artificial Intelligence (Present). King Mongkut's Institute of Technology Ladkrabang
Work Experience
- AI Engineer Intern (2025-04-01 -). AI Singapore & VISTEC. Part-time research on Southeast Asian language evaluation and Burmese NLP.
- Translated instruction-following datasets for Burmese LLM evaluation.
- Built evaluation datasets for seven Burmese NLP tasks.
- Evaluated Burmese NLP benchmarks for LLMs.
- Machine Learning Engineer (2025-06-01 - 2025-12-01). Looloo Technology. Part-time ML work on Thai speech and text systems.
- Fine-tuned speech-augmented language models for Thai.
- Created synthetic Thai speech datasets with 25k sentences using Thai TTS.
- Built evaluation pipelines for Thai TTS systems.
- Lab Member (2022-06-01 -). Language Understanding Laboratory. Research on Burmese corpora, OCR, language modeling, and evaluation.
- Built large-scale Myanmar corpora for OCR and language modeling.
- Supervised speech corpus creation for the medical domain.
- Fine-tuned transformer models for sequence and token classification tasks.
- Reproduced experiments for myNLP, the first Burmese NLP toolkit.
Skills
Language Proficiency
- Burmese (Native)
- English (Professional Fluency)
Programming Languages
- Python
- Rust
- C/C++
- Java
- JavaScript
- SQL
- Prolog
Data Tools
- NumPy
- Pandas
- SciPy
Developer Tools
- Git
- Docker
- VS Code
- AWS SageMaker
Frameworks and Libraries
- PyTorch
- TensorFlow
- OpenCV
- spaCy
- NLTK
- LangChain
- OpenNMT
- FastAPI
- MLflow
Publications
- myMedi-Whisper: Construction of Burmese Medical Speech Corpus and Whisper Fine-Tuning for Clinical Dialogue ASR (2025). Under Review. Construction of a Burmese medical speech corpus and Whisper fine-tuning for clinical dialogue ASR.
- SEA-BED: How Do Embedding Models Represent Southeast Asian Languages? (2026). Annual Meeting of the Association for Computational Linguistics (ACL 2026). Studying how embedding models represent Southeast Asian languages.
- Burmese-SAN: Burmese NLP Benchmark for Evaluating Large Language Models (2026). Language Resources and Evaluation Conference (LREC 2026). Benchmarking large language models on Burmese NLP tasks.
- Enhancing Burmese News Classification using Kolmogorov-Arnold Head Finetuning (2025). 20th IEEE International Joint Symposium on Artificial Intelligence and Natural Language Processing. Burmese news classification with Kolmogorov-Arnold head finetuning.
- ASR Error Correction in Low-Resource Burmese with Alignment-Enhanced Transformers using Phonetic Features (2025). 20th IEEE International Joint Symposium on Artificial Intelligence and Natural Language Processing. Low-resource Burmese ASR error correction with phonetic features.
- myNER: Contextualized Burmese Named Entity Recognition with Bidirectional LSTM and fastText Embeddings via Joint Training with POS Tagging (2025). 4th IEEE International Conference on Cybernetics and Innovations. Contextualized Burmese named entity recognition with joint POS tagging.
- myOCR: Optical Character Recognition for Myanmar Language with Post OCR Correction (2024). 19th IEEE International Joint Symposium on Artificial Intelligence and Natural Language Processing. Myanmar OCR with post-OCR correction.
- myContradict: Semi-supervised Contradictory Sentence Generation for Myanmar Language (2024). 19th IEEE International Joint Symposium on Artificial Intelligence and Natural Language Processing. Semi-supervised contradictory sentence generation for Myanmar language.
- Neural Sequence Labeling Based Sentence Segmentation for Myanmar Language (2023).
- mySentence: Sentence Segmentation for Myanmar Language using Neural Machine Translation Approach (2023). Journal of Intelligent Informatics and Smart Technology. Sentence segmentation for Myanmar language using NMT.
- KAConvText: Novel Approach to Burmese Sentence Classification using Kolmogorov-Arnold Convolution (2025).
Teaching
- Elementary System Programming (Rust), 2024. King Mongkut's Institute of Technology Ladkrabang. Role: Teaching Assistant. Supported 30+ freshmen in systems programming using Rust, assisted weekly labs, and conducted pre-session workshops on Rust syntax, compiler tools, and Cargo workflows.