Alham Fikri Aji / Curriculum Vitae
I am an assistant professor at MBZUAI. I obtained my PhD from the University of Edinburgh’s Institute for Language, Cognition and Computation, where I focused on improving the training and inference speed of machine translation under the supervision of Dr. Kenneth Heafield and Dr. Rico Sennrich. My current research centers on multilingual, low-resource, and low-compute NLP. I have been developing multilingual large language models and building multilingual NLP resources for underrepresented languages, especially Indonesian. I have also gained industry experience at Amazon, Google, and Apple.
Education
- PhD, University of Edinburgh (2016 - 2020)
Thesis: Approximating Neural Machine Translation for Efficiency. Making NMT training faster using distributed and asynchronous training, sparse gradients, and transfer learning.
- MSc Artificial Intelligence, University of Edinburgh (2014 - 2015)
With distinction. Final project: haiku generator using a word vector model.
- BSc Computer Science, Universitas Indonesia (2010 - 2014)
Final project: earthquake detection from smartphone accelerometer readings.
Experience
- Assistant Professor, MBZUAI (2023 - Present)
Teaching and supervising master’s and doctoral students, visiting researchers, and postdocs across a wide range of NLP research topics.
- Applied Scientist, Amazon Alexa AI (2021 - 2023)
Utilized knowledge graphs to create a truthful, multilingual question-answering system for Alexa.
- Postdoctoral Research Associate, University of Edinburgh (2020 - 2021)
Developed a fast, privacy-focused offline neural machine translation system by distilling large models.
- Research Scientist, Kata.ai (2019 - 2021)
Worked on, and mentored junior researchers in, various Indonesian NLP projects such as machine translation and formality style transfer.
- Engineering Intern, Google Research (2017)
Worked on integrating context from Google search to improve neural machine translation.
- Language Engineer, Apple Siri (2015 - 2016)
Designed Malay language rules and trained models to optimize Siri’s performance in Malay.
Service
- Reviewer and Program Committee:
  - Conferences: ACL, COLING, ICML, ICLR, NeurIPS
  - Workshops: WNGT, TL4NLP
- Area Chair: ACL (2023), EMNLP (2023)
- Organizer: South-East Asia Language Processing (SEALP) 2023
- Informatics Olympiad:
  - Problem Setter: OSN Indonesia (2013, 2014, 2015), ACM-ICPC (2014, 2015), APIO (2015), Gemastik (2016)
  - Committee: Gemastik (2016), TOKI-Open (2018), IOI (2022)
Publications
See my Google Scholar profile for an up-to-date list of publications.
- Crosslingual Generalization through Multitask Finetuning. Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff and Colin Raffel (ACL, 2023)
- On “Scientific Debt” in NLP: A Case for More Rigour in Language Model Pre-Training Research. Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Alham Fikri Aji, Genta Indra Winata, Radityo Eko Prasojo, Phil Blunsom and Adhiguna Kuncoro (ACL, 2023)
- WebIE: Faithful and Robust Information Extraction on the Web. Chenxi Whitehouse, Clara Vania, Alham Fikri Aji, Christos Christodoulopoulos and Andrea Pierleoni (ACL, 2023)
- The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges. Genta Indra Winata, Alham Fikri Aji, Zheng Xin Yong and Thamar Solorio (ACL, 2023)
- NusaCrowd: Open Source Initiative for Indonesian NLP Resources. Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Fajri Koto, Rahmad Mahendra, et al. (ACL, 2023)
- Direct Fact Retrieval from Knowledge Graphs without Entity Linking. Jinheon Baek, Alham Fikri Aji, Jens Lehmann and Sung Ju Hwang (ACL, 2023)
- Multi-lingual and Multi-cultural Figurative Language Understanding. Anubha Kabra, Emmy Liu, Simran Khanuja, Alham Fikri Aji, Genta Indra Winata, Samuel Cahyawijaya, Anuoluwapo Aremu, Perez Ogayo and Graham Neubig (ACL, 2023)
- NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages. Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder (EACL, 2023) -- Outstanding Paper Award 🏅
- Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering. Priyanka Sen, Alham Fikri Aji, Amir Saffari (COLING, 2022)
- REDTab: A Relation Extraction Dataset for Knowledge Extraction from Web Tables. Siffi Singh, Alham Fikri Aji, Gaurav Singh, Christos Christodoulopoulos (COLING, 2022)
- One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia. Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder (ACL, 2022)
- IndoNLI: A Natural Language Inference Dataset for Indonesian. Rahmad Mahendra, Alham Fikri Aji, Samuel Louvan, Fahrurrozi Rahman, Clara Vania (EMNLP, 2021)
- ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair. Alham Fikri Aji, Radityo Eko Prasojo, Tirana Noor Fatyanosa, Philip Arthur, Suci Fitriany, Salma Qonitah, Nadhifa Zulfa, Tomi Santoso, Mahendra Data (PACLIC, 2021)
- IndoCollex: A Testbed for Morphological Transformation of Indonesian Word Colloquialism. Haryo Akbarianto Wibowo, Made Nindyatama Nityasya, Afra Feyza Akyürek, Suci Fitriany, Alham Fikri Aji, Radityo Eko Prasojo, Derry Tanti Wijaya (ACL-IJCNLP, 2021)
- In Neural Machine Translation, What Does Transfer Learning Transfer? Alham Fikri Aji, Nikolay Bogoychev, Kenneth Heafield, Rico Sennrich (ACL, 2020)
- Combining Global Sparse Gradients with Local Gradients in Distributed Neural Network Training. Alham Fikri Aji, Kenneth Heafield, Nikolay Bogoychev (EMNLP, 2019)
- Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation. Nikolay Bogoychev, Marcin Junczys-Dowmunt, Kenneth Heafield, Alham Fikri Aji (EMNLP, 2018)
- Marian: Fast Neural Machine Translation in C++. Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, Andre Martins, Alexandra Birch (ACL, 2018)
- Sparse Communication for Distributed Gradient Descent. Alham Fikri Aji, Kenneth Heafield (EMNLP, 2017)
- Semi-Supervised Low-Resource Style Transfer of Indonesian Informal to Formal Language with Iterative Forward-Translation. Haryo Akbarianto Wibowo, Tatag Aziz Prawiro, Muhammad Ihsan, Alham Fikri Aji, Radityo Eko Prasojo, Rahmad Mahendra, Suci Fitriany (IALP, 2020)
- Toward a Standardized and More Accurate Indonesian Part-of-Speech Tagging. Kemal Kurniawan, Alham Fikri Aji (IALP, 2018)
- Can Smartphones Be Used to Detect an Earthquake? Using a Machine Learning Approach to Identify an Earthquake Event. Alham Fikri Aji, I Putu Edy Suardiyana Putra, Petrus Mursanto, Setiadi Yazid (SysCon, 2014)
- Towards better structured and less noisy Web data: Oscar with Register annotations. Veronika Laippala, Anna Salmela, Samuel Rönnqvist, Alham Fikri Aji, Li-Hsin Chang, Asma Dhifallah, Larissa Goulart, Henna Kortelainen, Marc Pàmies, Deise Prina Dutra, Valtteri Skantsi, Lintang Sutawika, Sampo Pyysalo (W-NUT at COLING, 2022)
- Efficient Machine Translation with Model Pruning and Quantization. Maximiliana Behnke, Nikolay Bogoychev, Alham Fikri Aji, Kenneth Heafield, Graeme Nail, Qianqian Zhu, Svetlana Tchistiakova, Jelmer van der Linde, Pinzhen Chen, Sidharth Kashyap, Roman Grundkiewicz (WMT at EMNLP, 2021)
- The University of Edinburgh's Bengali-Hindi Submissions to the WMT21 News Translation Task. Proyag Pal, Alham Fikri Aji, Pinzhen Chen, Sukanta Sen (WMT at EMNLP, 2021)
- BERT Goes Brrr: A Venture Towards the Lesser Error in Classifying Medical Self-Reporters on Twitter. Alham Fikri Aji, Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Radityo Eko Prasojo, Tirana Fatyanosa (SMM4H at NAACL, 2021)
- Edinburgh's Submissions to the 2020 Machine Translation Efficiency Task. Nikolay Bogoychev, Roman Grundkiewicz, Alham Fikri Aji, Maximiliana Behnke, Kenneth Heafield, Sidharth Kashyap, Emmanouil-Ioannis Farsarakis, Mateusz Chudyk (WNGT at ACL, 2020)
- Compressing Neural Machine Translation Models with 4-bit Precision. Alham Fikri Aji, Kenneth Heafield (WNGT at ACL, 2020)
- Benchmarking Multidomain English-Indonesian Machine Translation. Tri Wahyu Guntara, Alham Fikri Aji, Radityo Eko Prasojo (BUCC, 2020)
- Making Asynchronous Stochastic Gradient Descent Work for Transformers. Alham Fikri Aji, Kenneth Heafield (WNGT at EMNLP, 2019) -- Outstanding Contribution Award 🏅
- From Research to Production and Back: Ludicrously Fast Neural Machine Translation. Young Jin Kim, Marcin Junczys-Dowmunt, Hany Hassan, Alham Fikri Aji, Kenneth Heafield, Roman Grundkiewicz, Nikolay Bogoychev (WNGT at EMNLP, 2019)
Teaching and Talks
- NLP801 Deep Learning for Language Processing (MBZUAI, Fall 2023)
- Q2AI: A Quick Course to Quick AI
PRICAI, Tutorial (Upcoming, 17th November 2023)
- Current Status of NLP in South East Asia with Insights from Multilingualism and Language Diversity
AACL, Tutorial (Upcoming, 1st November 2023)
- Surviving your PhD Study
Telkom University, Invited Talk (2nd August 2023)
- Generative AI with Large Language Models Workshop
Institut Teknologi Bandung, Invited Talk (1st August 2023)
- Multilingual and Low-Resource NLP
Universitas Indonesia & Tokopedia AI Center, Invited Talk (25th May 2023)
- Can AI Complete My Academic Writings?
Doctrine UK, Online Talk (14th May 2023)
- Multilingual NLP through Collaborative Research
The 2nd Composable, Automatic and Scalable Learning Workshop (CASL), Invited Talk (23rd February 2023)
- Sequence-to-Sequence and Neural Machine Translation Model
Universitas Indonesia, Guest Lecture (28th April 2021)
Awards
- Outstanding Paper Award, EACL 2023
- Outstanding Contribution Award, WNGT 2019
- World Finalist, ACM-ICPC 2014
- Silver Medal, IOI 2010