Alham Fikri Aji / Curriculum Vitae
alham.fikri@mbzuai.ac.ae
Education
- PhD, University of Edinburgh Nov 2016 - Jun 2020
Thesis: Approximating Neural Machine Translation for Efficiency.
Supervised by Kenneth Heafield and Rico Sennrich.
Examiners: Graham Neubig and Barry Haddow
- MSc Artificial Intelligence, University of Edinburgh Sep 2014 - Aug 2015
With distinction. Final project: Haiku generator with word vector model.
- BSc Computer Science, Universitas Indonesia Aug 2010 - Jul 2014
Final project: Earthquake detector from phone’s accelerometer reading.
Working Experience
- Adjunct Assistant Professor, Monash Indonesia Jan 2024 - Current
- Assistant Professor, MBZUAI Jan 2023 - Current
- Applied Scientist, Amazon Alexa AI Oct 2021 - Jan 2023
- Postdoctoral Research Associate, University of Edinburgh Jun 2020 - Jul 2021
- Research Scientist, Kata.ai Nov 2019 - Sep 2021
- Engineering Intern, Google Research Jul 2017 - Nov 2017
- Language Engineer, Apple Siri Oct 2015 - Oct 2016
Awards
- Best Resource Paper Award, EACL 2024
- Best Resource Paper Award, AACL 2023
- Outstanding Paper Award, EACL 2023
- Outstanding Contribution Award, WNGT 2019
- World Finalists, ACM-ICPC 2014
- Silver Medalists, International Olympiad in Informatics (IOI) 2010
Professional Activities
Services to Scientific Communities
- Reviewer and Program Committee Member
- Conferences: ARR, ACL, COLING, ICML, ICLR, NeurIPS, LREC
- Workshops: WNGT, TL4NLP
- Area Chair: ACL (2023), EMNLP (2023), COLM (2024)
- Local Chair: COLING (2025)
- Organizer: South-East Asia Language Processing (SEALP) 2023, SemEval shared tasks (2024, 2025)
University Services
- MBZUAI HPC Committee, MBZUAI 2023
- MBZUAI PhD Qualifying Exam Committee, MBZUAI 2023
- MBZUAI Executive Education Program advisor, 2023
- MBZUAI PhD Candidacy Exam Committee: 4 students
- MBZUAI MSc Thesis Defence Committee: 7 students
- Problem Setter: OSN Indonesia (2013, 2014, 2015), ACM-ICPC (2014, 2015), APIO (2015), Gemastik (2016)
- Committee: Gemastik (2016), TOKI-Open (2018), IOI (2022)
- Training: Indonesia’s Pre-OSN Distance training (2009, 2010), Indonesia’s National Camp (2011, 2012, 2013), University of Edinburgh ACM-ICPC preparation (2014), Saudi Arabia National Team (2020)
Publications
I mainly publish at ACL conferences. You may also refer to my Google Scholar for an updated list of publications. My papers have accumulated over 4,000 citations and I have achieved an h-index of 20.
● denotes my role as (Co-)senior author(s), whereas ■ denotes my role as main author(s).
Peer-Reviewed Conferences
- Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages. Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Rifki Afina Putri, Emmanuel Dave, Jhonson Lee, Nuur Shadieq, Wawan Cenggoro, Salsabil Maulana Akbar, Muhammad Ihza Mahendra, Dea Annisayanti Putri, Bryan Wilie, Genta Indra Winata, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung (ACL, 2024)
- M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection. Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov (ACL, 2024)
- SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 14 Languages. Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Abinew Ali Ayele, Pavan Baswani, Meriem Beloucif, Chris Biemann, Sofia Bourhim, Christine De Kock, Genet Shanko Dekebo, Oumaima Hourrane, Gopichand Kanumolu, Lokesh Madasu, Samuel Rutunda, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Hailegnaw Getaneh Tilaye, Krishnapriya Vishnubhotla, Genta Winata, Seid Muhie Yimam, Saif M Mohammad (ACL, 2024)
- Copal-ID: Indonesian Language Reasoning with Local Culture and Nuances. Haryo Akbarianto Wibowo, Erland Hilman Fuadi, Made Nindyatama Nityasya, Radityo Eko Prasojo, Alham Fikri Aji (NAACL, 2024)
- M4: Multi-generator, Multi-domain, and Multi-lingual Black-box Machine-generated Text Detection. Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Chenxi Whitehouse, Osama Mohammed Afzal, Tarek Mahmoud, Toru Sasaki, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov (EACL, 2024) -- Best Resource Paper🏅
- A Paradigm Shift: The Future of Machine Translation Lies with Large Language Models. Chenyang Lyu, Zefeng Du, Jitao Xu, Yitao Duan, Minghao Wu, Teresa Lynn, Alham Fikri Aji, Derek F Wong, Longyue Wang (LREC, 2024)
- Lamini-LM: A Diverse Herd of Distilled Models from Large-scale Instructions. Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham Fikri Aji (EACL, 2024)
- LLM-powered Data Augmentation for Enhanced Crosslingual Performance. Chenxi Whitehouse, Monojit Choudhury, Alham Fikri Aji (EMNLP, 2023)
- Multilingual Large Language Models Are Not (Yet) Code-Switchers. Ruochen Zhang, Samuel Cahyawijaya, Jan Christian Blaise Cruz, Alham Fikri Aji (EMNLP, 2023)
- GlobalBench: A benchmark for global progress in natural language processing. Yueqi Song, Catherine Cui, Simran Khanuja, Pengfei Liu, Fahim Faisal, Alissa Ostapenko, Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Yulia Tsvetkov, Antonios Anastasopoulos, Graham Neubig (EMNLP, 2023)
- Nusawrites: Constructing high-quality corpora for underrepresented and extremely low-resource languages. Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Maulana Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Wahyuning Linuwih, Bryan Wilie, Galih Pradipta Muridan, Genta Indra Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung (AACL, 2023) -- Best Resource Paper🏅
- Crosslingual Generalization through Multitask Finetuning. Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff and Colin Raffel (ACL, 2023)
- On “Scientific Debt” in NLP: A Case for More Rigour in Language Model Pre-Training Research. Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Alham Fikri Aji, Genta Indra Winata, Radityo Eko Prasojo, Phil Blunsom and Adhiguna Kuncoro (ACL, 2023)
- WebIE: Faithful and Robust Information Extraction on the Web. Chenxi Whitehouse, Clara Vania, Alham Fikri Aji, Christos Christodoulopoulos and Andrea Pierleoni (ACL, 2023)
- BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting. Zheng-Xin Yong, Hailey Schoelkopf, Niklas Muennighoff, Alham Fikri Aji, David Ifeoluwa Adelani, Khalid Almubarak, M Saiful Bari, Lintang Sutawika, Jungo Kasai, Ahmed Baruwa, Genta Indra Winata, Stella Biderman, Edward Raff, Dragomir Radev, Vassilina Nikoulina (ACL, 2023)
- The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges. Genta Indra Winata, Alham Fikri Aji, Zheng Xin Yong and Thamar Solorio (ACL, 2023)
- NusaCrowd: Open Source Initiative for Indonesian NLP Resources. Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Fajri Koto, Rahmad Mahendra, et al. (ACL, 2023)
- Direct Fact Retrieval from Knowledge Graphs without Entity Linking. Jinheon Baek, Alham Fikri Aji, Jens Lehmann and Sung Ju Hwang (ACL, 2023)
- Multi-lingual and Multi-cultural Figurative Language Understanding. Anubha Kabra, Emmy Liu, Simran Khanuja, Alham Fikri Aji, Genta Indra Winata, Samuel Cahyawijaya, Anuoluwapo Aremu, Perez Ogayo and Graham Neubig (ACL, 2023)
- NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages. Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder (EACL, 2023) -- Outstanding Paper Award🏅
- Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering. Priyanka Sen, Alham Fikri Aji, Amir Saffari (COLING, 2022)
- REDTab: A Relation Extraction Dataset for Knowledge Extraction from Web Tables. Siffi Singh, Alham Fikri Aji, Gaurav Singh, Christos Christodoulopoulos (COLING, 2022)
- One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia. Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder (ACL, 2022)
- IndoNLI: A Natural Language Inference Dataset for Indonesian. Rahmad Mahendra, Alham Fikri Aji, Samuel Louvan, Fahrurrozi Rahman, Clara Vania (EMNLP, 2021)
- ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair. Alham Fikri Aji, Radityo Eko Prasojo, Tirana Noor Fatyanosa, Philip Arthur, Suci Fitriany, Salma Qonitah, Nadhifa Zulfa, Tomi Santoso, Mahendra Data (PACLIC, 2021)
- IndoCollex: A Testbed for Morphological Transformation of Indonesian Word Colloquialism. Haryo Akbarianto Wibowo, Made Nindyatama Nityasya, Afra Feyza Akyürek, Suci Fitriany, Alham Fikri Aji, Radityo Eko Prasojo, Derry Tanti Wijaya (ACL-IJCNLP, 2021)
- In Neural Machine Translation, What Does Transfer Learning Transfer? Alham Fikri Aji, Nikolay Bogoychev, Kenneth Heafield, Rico Sennrich (ACL, 2020)
- Semi-Supervised Low-Resource Style Transfer of Indonesian Informal to Formal Language with Iterative Forward-Translation. Haryo Akbarianto Wibowo, Tatag Aziz Prawiro, Muhammad Ihsan, Alham Fikri Aji, Radityo Eko Prasojo, Rahmad Mahendra, Suci Fitriany (IALP, 2020)
- Combining Global Sparse Gradients with Local Gradients in Distributed Neural Network Training. Alham Fikri Aji, Kenneth Heafield, Nikolay Bogoychev (EMNLP, 2019)
- Accelerating asynchronous stochastic gradient descent for neural machine translation. Nikolay Bogoychev, Marcin Junczys-Dowmunt, Kenneth Heafield, Alham Fikri Aji (EMNLP, 2018)
- Marian: Fast neural machine translation in C++. Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Grundkiewicz, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, Andre Martins, Alexandra Birch (ACL, 2018)
- Toward a standardized and more accurate Indonesian part-of-speech tagging. Kemal Kurniawan, Alham Fikri Aji (IALP, 2018)
- Sparse communication for distributed gradient descent. Alham Fikri Aji, Kenneth Heafield (EMNLP, 2017)
Peer-Reviewed Workshops
- Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages. Zheng Xin Yong, Ruochen Zhang, Jessica Forde, Skyler Wang, Arjun Subramonian, Holy Lovenia, Samuel Cahyawijaya, Genta Winata, Lintang Sutawika, Jan Christian Blaise Cruz, Yin Lin Tan, Long Phan, Rowena Garcia, Thamar Solorio, Alham Fikri Aji (CALCS at EMNLP, 2023)
- Low-Resource Clickbait Spoiling for Indonesian via Question Answering. Ni Putu Intan Maharani, Ayu Purwarianti, Alham Fikri Aji (SEALP at NAACL, 2023)
- Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering. Jinheon Baek, Alham Fikri Aji, Amir Saffari (NLRSE at ACL, 2023)
- Towards better structured and less noisy Web data: Oscar with Register annotations. Veronika Laippala, Anna Salmela, Samuel Rönnqvist, Alham Fikri Aji, Li-Hsin Chang, Asma Dhifallah, Larissa Goulart, Henna Kortelainen, Marc Pàmies, Deise Prina Dutra, Valtteri Skantsi, Lintang Sutawika, Sampo Pyysalo (W-NUT at COLING, 2022)
- Efficient Machine Translation with Model Pruning and Quantization. Maximiliana Behnke, Nikolay Bogoychev, Alham Fikri Aji, Kenneth Heafield, Graeme Nail, Qianqian Zhu, Svetlana Tchistiakova, Jelmer van der Linde, Pinzhen Chen, Sidharth Kashyap, Roman Grundkiewicz (WMT at EMNLP, 2021)
- The University of Edinburgh's Bengali-Hindi Submissions to the WMT21 News Translation Task. Proyag Pal, Alham Fikri Aji, Pinzhen Chen, Sukanta Sen (WMT at EMNLP, 2021)
- BERT Goes Brrr: A Venture Towards the Lesser Error in Classifying Medical Self-Reporters on Twitter. Alham Fikri Aji, Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Radityo Eko Prasojo, Tirana Fatyanosa (SMM4H at NAACL, 2021)
- Edinburgh's Submissions to the 2020 Machine Translation Efficiency Task. Nikolay Bogoychev, Roman Grundkiewicz, Alham Fikri Aji, Maximiliana Behnke, Kenneth Heafield, Sidharth Kashyap, Emmanouil-Ioannis Farsarakis, Mateusz Chudyk (WNGT at ACL, 2020)
- Compressing Neural Machine Translation Models with 4-bit Precision. Alham Fikri Aji, Kenneth Heafield (WNGT at ACL, 2020)
- Benchmarking Multidomain English-Indonesian Machine Translation. Tri Wahyu Guntara, Alham Fikri Aji, Radityo Eko Prasojo (BUCC, 2020)
- Making Asynchronous Stochastic Gradient Descent Work for Transformers. Alham Fikri Aji, Kenneth Heafield (WNGT at EMNLP, 2019) -- Outstanding Contribution 🏅
- From Research to Production and Back: Ludicrously Fast Neural Machine Translation. Young Jin Kim, Marcin Junczys-Dowmunt, Hany Hassan, Alham Fikri Aji, Kenneth Heafield, Roman Grundkiewicz, Nikolay Bogoychev (WNGT at EMNLP, 2019)
- Can smartphones be used to detect an earthquake? Using a machine learning approach to identify an earthquake event. Alham Fikri Aji, I Putu Edy Suardiyana Putra, Petrus Mursanto, Setiadi Yazid (SysCon, 2014)
Supervision and Mentorship
Current Students
In addition to my students at MBZUAI, I co-supervise students from Indonesian universities and commit to meeting them weekly.
Past Students
- Jonibek Mansurov, MSc at MBZUAI Jan 2023 - Jun 2024
Role: Main Advisor; with Preslav Nakov
Current position: PhD at MBZUAI
- Ni Putu Intan Maharani, MSc at Institut Teknologi Bandung Aug 2022 - Mar 2024
Role: Co-Advisor; with Ayu Purwarianti
Current position: Deloitte
- Jalaluddin Al-Mursyidy Fadhlurrahman, MSc at Institut Teknologi Bandung Aug 2022 - Mar 2024
Role: Co-Advisor; with Ayu Purwarianti
- Muhammad Razif Rizqullah, MSc at Institut Teknologi Bandung Aug 2022 - Mar 2024
Role: Co-Advisor; with Ayu Purwarianti
- Minghao Wu, Visiting PhD Student Apr 2023 - Oct 2023
Role: Research Advisor
Current position: PhD at Monash University
- Chenxi Whitehouse, Visiting PhD Student Apr 2023 - Oct 2023
Role: Research Advisor
Current position: Research Scientist at Amazon
- Muhammad Ravi Sulthan Habibi, BSc at Universitas Indonesia Aug 2022 - Aug 2023
Role: Co-Advisor; with Rahmad Mahendra
Research Advisorship
- Chenyang Lyu, Postdoctoral Researcher at MBZUAI Aug 2023 - Present
Role: Research Advisor
- Mohamed Fazli Mohamed Imam, Research Assistant at MBZUAI Jul 2024 - Present
Role: Research Advisor
- Muhammad Farid Adilauzarda, Research Assistant at MBZUAI Aug 2023 - Present
Role: Research Advisor
- Rendi Chevi, Research Assistant at MBZUAI Aug 2023 - Present
Role: Research Advisor
- Jinheon Baek, Research Intern at Amazon Alexa Oct 2022 - Jan 2023
Role: Research Advisor
Current position: PhD at KAIST
- Tirana Noor Fatyanosa, Research Intern at Kata.ai Nov 2020 - Apr 2021
Role: Research Advisor
Current position: Assistant Professor at Universitas Brawijaya
Grants and Funding
- Microsoft Research: “Developing Robust Methodology and Datasets for Holistic Evaluation of Cultural Awareness and Bias in Foundation Models” (Co-PI)
Amount: 20,000 USD
- Cohere For AI research grants: “SEACrowd: Consolidating South-east Asia NLP dataset” (Co-PI)
Amount: 3,000 USD
- IBM: “Question Answering for Arabic Dialects”
Amount: 100,000 USD in postdoctoral support for Chenyang Lyu
Teachings
- FIT5145: Intro to Data Science (for MSc) - Monash Indonesia Term 4 2024
Main instructor. Introduction to Python, data science, and AI.
- NLP702: Advanced Natural Language Processing (for MSc) - MBZUAI Spring 2024
Co-instructor. Covered efficient and large-scale NLP, including LLM, distributed training, distillation, parameter-efficient fine-tuning, and linear Transformers.
- NLP801: Deep Learning for Language Processing (for PhD) - MBZUAI Fall 2023
Main instructor. Designed and taught the module, covering various recent research topics and trends in NLP.
Talks
- Training Lightweight Model via Knowledge Distillation and Parameter Efficient Finetuning
Mexican NLP Summer School, Co-located with NAACL (14-15th June 2024)
- Consolidating NLP Resources for South-East Asian Languages
Google Singapore, Invited Talk (27th May 2024)
- Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
Google Singapore, Invited Talk (21st November 2023)
- Building Multilingual & Multicultural LLMs: Methods and Challenges
AI Singapore, Invited Talk (20th November 2023)
- Q2AI: A Quick Course to Quick AI
PRICAI, Tutorial (17th November 2023)
- Current Status of NLP in South East Asia with Insights from Multilingualism and Language Diversity
AACL, Tutorial (1st November 2023)
- Surviving your PhD Study
Telkom University, Invited Talk (2nd August 2023)
- Generative AI with Large Language Models Workshop
Institut Teknologi Bandung, Invited Talk (1st August 2023)
- Multilingual and Low-Resource NLP
Universitas Indonesia & Tokopedia AI Center, Invited Talk (25th May 2023)
- Can AI Complete My Academic Writings?
Doctrine UK, Online Talk (14th May 2023)
- Multilingual NLP through Collaborative Research
The 2nd Composable, Automatic and Scalable Learning Workshop (CASL), Invited Talk (23rd February 2023)
- Sequence-to-Sequence and Neural Machine Translation Model
Universitas Indonesia, Guest Lecture (28th April 2021)