Research Statement

NLP technology has progressed significantly over the years, yet the focus remains heavily English-centric, leaving many languages behind. My research focuses on NLP for underrepresented languages. However, the English-centric nature of AI research is not only a matter of models or data: AI resources are also concentrated in a small number of communities, leaving compute scarce for many others. With the field moving towards large language models, participating in NLP research and deployment has become even more prohibitive for many communities.

My research goal can be summarized as “making NLP technology inclusive and accessible”. Realizing this vision requires a multifaceted approach: we must not only expand model capabilities to support diverse languages and cultural contexts, but also democratize access by lowering computational barriers and refine how systems interact with users around the world. To address these interconnected challenges, I pursue the following research directions:

  • Multilingual and Cultural NLP
  • Lightweight NLP Systems
  • Efficient Training of NLP Systems
  • Multimodal-Multicultural NLP
  • Human-Computer Interaction of NLP Systems

I primarily publish in *CL venues, maintaining an h-index of 35 and close to 8,000 citations according to Semantic Scholar. I have received 5 paper awards at these conferences and was recently honored with the 2025 MBZUAI Early Career Researcher Award, which recognizes assistant professors with exceptional research promise.

Multilingual and Cultural NLP

Most of my recent work focuses on multilingual and culturally grounded NLP, covering topics that range from resource building to interpretability. Most of my research awards also fall into this area.

Multilingual NLP Resources and Benchmarks A persistent challenge in multilingual NLP is the scarcity of high-quality datasets for both training and nuanced evaluation. My primary area of depth is currently multilingual and culturally grounded data construction. I address this scarcity not merely through collection, but by developing robust methodologies for resource building and evaluation, including protocols for human annotation, quality control, and cultural relevance. Some of the resources I have worked on are highlighted in the table below.

Overview of contributed multilingual datasets and resources.
| Resource | Description & Scope | Ref. |
|---|---|---|
| IndoNLI | Natural Language Inference (NLI) for Indonesian. | (Mahendra et al., 2021) |
| Mintaka | Complex Question Answering across 9 languages. | (Sen et al., 2022) |
| NusaX | Sentiment analysis and MT covering 10 Indonesian local languages. | (Winata et al., 2023) |
| NusaWrites | Generation benchmarks for 12 Indonesian languages. | (Cahyawijaya et al., 2023) |
| SEACrowd | Multilingual multimodal data hub and benchmark suite for Southeast Asian languages. | (Lovenia et al., 2024) |
| SemRel | Semantic relatedness of Asian and African languages. Powered SemEval-2024 Task 1 (163 participants). | (Ousidhoum et al., 2024) |
| M4 | Multilingual machine-generated text detection. Powered SemEval-2024 Task 8 (285 participants). | (Wang et al., 2024) |
| CVQA | Culturally diverse multilingual Visual Question Answering across 39 language-country pairs. | (Romero et al., 2024) |
| COPAL-ID | Culturally specific causal reasoning for Indonesian. | (Wibowo et al., 2024) |
| Stingray | Multilingual word-sense disambiguation benchmark. | (Cahyawijaya et al., 2025) |
| BRIGHTER | Emotion classification for low-resource languages. Powered SemEval-2025 Task 11 (800 participants). | (Muhammad et al., 2025) |
| WangchanThaiInstruct | Instruction-following dataset for Thai culture and domains. | (Limkonchotiwat et al., 2025) |
| NusaAksara | OCR and translation benchmark for Indonesian languages in local scripts. | (Adilazuarda et al., 2025) |
| LoraxBench | Multitask benchmark for 20 low-resource Indonesian languages. | (Aji and Cohn, 2025) |

Culturally-Nuanced NLP Beyond language coverage, my work examines cultural representation and evaluation, since covering a language alone is not enough. Translated benchmarks illustrate this problem: even when the language is translated, the questions often remain irrelevant to the local context. Several of the benchmarks mentioned above therefore cover not only language but also cultural understanding.

In Adilazuarda et al. (2024), we survey the research on culture in large language models and find that most studies use narrow proxies such as demographics or semantics without defining culture itself. We propose a taxonomy of these approaches and highlight gaps in contextual and robust evaluations of cultural representation.

In Adilazuarda et al. (2025), we explore how to adapt large language models to better reflect diverse cultural values by moving beyond survey-based data such as the World Values Survey (WVS) and incorporating narrative sources. Our results show that this mixed-source approach produces models that are more culturally distinct and balanced than those relying on survey data alone.

Indonesian NLP Having grown up with Indonesian languages and cultures, I devote part of my work specifically to them. As we reported in Aji et al. (2022), Indonesia is one of the most culturally and linguistically diverse countries, with over 700 languages spoken and a population of more than 200 million, yet NLP research for Indonesian languages remains underrepresented. In that work, we lay out the challenges and opportunities for Indonesian NLP, and it is now widely cited as a reference in Indonesian NLP studies.

In another work (Adilazuarda et al., 2025), we studied how current models fail to handle Indonesian indigenous scripts and released the NusaAksara benchmark to track progress. Similarly, in Farhansyah et al. (2025), we evaluated how well models handle the Javanese honorific system, showing that many of them struggle.

In a collaborative effort with the Indonesian NLP community, we built NusaCrowd, a resource catalog that standardizes NLP resources for Indonesian languages (Cahyawijaya et al., 2023); I was part of the core team that initiated and designed the project from the very beginning, and it has gained more than 270 stars on GitHub. A follow-up for Southeast Asian languages, SEACrowd (Lovenia et al., 2024), was also released. I similarly served as a core initiator of this expansion, which grew into a Southeast Asian NLP community of the same name, where I now serve on the advisory board.

Code-Switching and Code-Mixing Code-switching (CS) and code-mixing (CM) are phenomena commonly observed in multilingual cultures, making them closely aligned with my research direction. In Winata et al. (2023), we provide a systematic survey of code-switching research in NLP, tracing its evolution from linguistic theories to modern machine learning and analyzing decades of progress to highlight key trends, challenges, and future directions.

Shortly after the release of ChatGPT, we noted in Yong et al. (2023) that it struggled to generate and understand CS/CM, although current models have improved significantly. This work led to an interview for a Wired article on the multilingual limitations of early ChatGPT, specifically in South-East Asia. In Cahyawijaya et al. (2025), we find that multilingual LLMs consistently fail to distinguish the meanings of false friends across languages, revealing major gaps in cross-lingual sense understanding.

Multilingual LLMs In parallel with dataset and benchmark building, I have collaborated on building multilingual models, including mT0 and BLOOMZ (Muennighoff et al., 2023), the Arabic-centric Jais (Sengupta et al., 2023), and the Indonesian LLM Cendol (Cahyawijaya et al., 2024). BLOOMZ in particular gained significant traction, with more than 1M downloads to date, and it is still widely used. Since joining Google as a visiting researcher, I have also worked on the multilinguality aspects of Gemini.

Interpretability and Understanding of Multilingual Models Recently, I have explored the intersection of multilinguality and interpretability. In Rahmanisa et al. (2025), we find that amplifying language-specific neurons in multilingual models boosts performance in the corresponding languages, particularly low-resource ones, but often harms cross-lingual generalization. Separately, in Andrylie et al. (2025) we use sparse autoencoders to identify interpretable neurons associated with particular languages, showing that multilingual models encode clear language-specific representations within their internal layers.
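To make the neuron-amplification idea concrete, the sketch below scales a chosen set of hidden units at inference time via a forward hook. It is only an illustration of the general mechanism, not the exact procedure from Rahmanisa et al. (2025); the layer name and neuron indices are hypothetical placeholders that would come from a prior analysis of language-specific activations.

```python
import torch

def amplify_language_neurons(model, layer_name, neuron_ids, scale=1.5):
    """Scale a chosen set of hidden units whenever `layer_name` runs.

    Illustrative sketch only: the layer and neuron indices are assumed to come
    from a prior language-activation analysis; the paper's procedure may differ.
    """
    idx = torch.as_tensor(neuron_ids, dtype=torch.long)

    def hook(module, inputs, output):
        # output: (batch, seq_len, hidden_dim); amplify the selected dimensions.
        output[..., idx] = output[..., idx] * scale
        return output

    layer = dict(model.named_modules())[layer_name]
    return layer.register_forward_hook(hook)

# Hypothetical usage: boost neurons previously associated with Indonesian, then generate.
# handle = amplify_language_neurons(model, "model.layers.10.mlp", indonesian_neuron_ids)
# ... run generation ...
# handle.remove()
```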

Lightweight NLP Systems

Fast Machine Translation Systems During my PhD, I worked on fast machine translation systems. In Aji and Heafield (2020), we explored quantization techniques for neural machine translation and reached 4-bit precision using a log-based quantization approach. Building on that, I joined a collaborative submission to a shared task on efficient machine translation; by combining quantization, knowledge distillation, and model pruning, we achieved the best overall performance (Bogoychev et al., 2020). Although I no longer work exclusively on machine translation, my current research on lightweight models continues in the same direction.
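As a rough illustration of log-based quantization, the sketch below snaps weight magnitudes onto a power-of-two grid with a small number of exponent levels. It conveys the general idea only; the actual scheme in Aji and Heafield (2020), including how scaling factors and retraining are handled, differs in its details.

```python
import torch

def log_quantize(weight: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Snap weight magnitudes onto a power-of-two grid (illustrative sketch).

    Conceptually stores one sign bit and (bits - 1) bits for an exponent index
    anchored at the tensor's largest exponent; not the paper's exact scheme.
    """
    sign = torch.sign(weight)
    magnitude = weight.abs().clamp(min=1e-12)
    exponent = torch.round(torch.log2(magnitude))
    max_exp = exponent.max().item()
    num_levels = 2 ** (bits - 1)                      # representable exponent levels
    exponent = exponent.clamp(min=max_exp - num_levels + 1, max=max_exp)
    return sign * torch.pow(2.0, exponent)

# Example: simulate 4-bit log quantization of a linear layer's weights.
# layer.weight.data = log_quantize(layer.weight.data, bits=4)
```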

Lightweight Models via Distillation In the early days of ChatGPT, our LaMini-LM project (Wu et al., 2024) distilled ChatGPT into several models under 1B parameters that achieved, at the time, reasonable performance. The released LaMini models remain among the most downloaded models in MBZUAI’s HuggingFace repository, and the project has gained more than 800 GitHub stars. Some of these lightweight-model efforts focus on multilingual capabilities: Bactrian-X, for example, is a distilled multilingual model covering 52 languages (Li et al., 2023), and we have also attempted to distill a large multilingual encoder model for low-resource languages (Cruz, 2025).

Sink-Free Attention Transformers In ongoing work (Zuhri et al., 2025), we propose Softpick, a softmax replacement designed to remove the attention sink. Eliminating the sink yields sparser attention and avoids massive activations, which in turn makes the model easier to quantize and therefore more efficient.
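A minimal sketch of a rectified, softmax-like normalization in this spirit is shown below. It is my paraphrase of the idea rather than the exact formulation from the Softpick paper, and it omits the numerical-stability handling a real implementation would need.

```python
import torch

def rectified_softmax(scores: torch.Tensor, dim: int = -1, eps: float = 1e-6) -> torch.Tensor:
    """Softmax-like normalization that can assign exactly zero attention.

    Paraphrased sketch: rectify the numerator so positions with non-positive
    scores get zero weight, instead of forcing every row to sum to one (which
    is what produces the attention sink). The paper's formulation may differ.
    """
    shifted = torch.exp(scores) - 1.0
    num = torch.relu(shifted)                      # exactly zero for scores <= 0
    den = shifted.abs().sum(dim=dim, keepdim=True)
    return num / (den + eps)

# Illustrative drop-in for softmax over attention scores:
# attn = rectified_softmax(q @ k.transpose(-2, -1) / k.size(-1) ** 0.5)
```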

Efficient Inference Memory via KV Sharing In (missing reference), we address the memory bottleneck of large-scale inference by introducing multi-layer key-value sharing. By extending key-value sharing across the depth dimension, rather than only across attention heads within a layer, we reduce the KV cache footprint and significantly lower the memory barrier for deploying large models.
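The sketch below illustrates the general mechanism under simple assumptions: a group of consecutive layers reuses one set of key/value projections, so only one KV entry per group needs to be cached. The module and parameter names are hypothetical, and the details differ from the actual design.

```python
import torch
from torch import nn

class SharedKVAttention(nn.Module):
    """Single-head attention layer that can reuse another layer's K/V.

    Illustrative sketch of multi-layer KV sharing: only the first layer of a
    group computes (and caches) keys and values; later layers attend with
    their own queries over the shared K/V. Causal masking and multi-head
    details are omitted for brevity.
    """

    def __init__(self, dim: int, compute_kv: bool):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        self.compute_kv = compute_kv
        if compute_kv:
            self.k_proj = nn.Linear(dim, dim)
            self.v_proj = nn.Linear(dim, dim)

    def forward(self, x, shared_kv=None):
        q = self.q_proj(x)
        if self.compute_kv:
            shared_kv = (self.k_proj(x), self.v_proj(x))  # the only KV cached for the group
        k, v = shared_kv
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        return self.out_proj(attn @ v), shared_kv

# Hypothetical 4-layer stack: layers 0 and 2 compute K/V, layers 1 and 3 reuse them.
# layers = nn.ModuleList(SharedKVAttention(512, compute_kv=(i % 2 == 0)) for i in range(4))
# kv = None
# for layer in layers:
#     x, kv = layer(x, shared_kv=kv)
```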

Knowledge Distillation Studies In Aji et al. (2020), we show that, in transfer learning for neural machine translation, copying the parent model's inner layers is essential for quality gains. Our recent work (Wibowo et al., 2025) similarly investigates model copying in knowledge distillation in multilingual settings. We also study the potential harms of knowledge distillation: in Mansurov et al. (2025), we show that contaminated data (such as benchmark test data seen by the teacher) can be passed on to student models through distillation, artificially boosting their results. We are currently investigating whether PII or poisoned data can leak through distillation as well.

Efficient Training of NLP Systems

Inference cost is not the only barrier to accessible NLP systems; the lack of training data is another. Some of my work therefore explores faster and more data-efficient learning.

Effective Language Extension of NLP Models Given the lack of training data for many languages, we investigate various ways to extend models to new languages. In Adilazuarda et al. (2024), we introduce a typological alignment objective that bridges neural representations and discrete linguistic knowledge: the method uses URIEL vectors to explicitly supervise the model’s language embedding space through an additional training loss. This allows the model to synthesize representations for unseen languages by interpolating their linguistic features, enabling zero-shot generalization and significantly improving performance on several unseen languages such as Amharic.
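A minimal sketch of such an auxiliary alignment loss is shown below: a pooled sentence representation is projected into the URIEL space and pulled towards the URIEL vector of the input language, on top of the usual task loss. The projection, distance function, and loss weight are placeholder choices, not necessarily those used in the paper.

```python
import torch
from torch import nn

class UrielAlignmentLoss(nn.Module):
    """Auxiliary loss pulling pooled representations towards URIEL vectors.

    Illustrative sketch: project the model's pooled sentence representation
    into the URIEL typological space and penalize its distance to the URIEL
    vector of the input language, in addition to the task loss.
    """

    def __init__(self, hidden_dim: int, uriel_dim: int, weight: float = 0.1):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, uriel_dim)
        self.weight = weight

    def forward(self, pooled_hidden: torch.Tensor, uriel_vectors: torch.Tensor) -> torch.Tensor:
        # pooled_hidden: (batch, hidden_dim); uriel_vectors: (batch, uriel_dim)
        predicted = self.proj(pooled_hidden)
        return self.weight * nn.functional.mse_loss(predicted, uriel_vectors)

# Hypothetical training step:
# loss = task_loss + uriel_loss(pooled_hidden, uriel_vectors_for_batch_languages)
```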

In Elshabrawy et al. (2025), we enable zero-shot task and language generalization for encoder-only models by training them on true/false statements across languages, effectively giving encoders a form of ‘prompting’. This project was part of MBZUAI’s 2024 internship program, with talented undergraduate interns visiting for a month, and it received the best team award among the internship projects.
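To illustrate the statement-tuning recipe, the sketch below recasts an NLI example as binary true/false statements; any task converted this way can then be handled by a single encoder with a true/false head. The template wording is a hypothetical example, not the exact templates used in the paper.

```python
def nli_to_statements(premise: str, hypothesis: str, label: str):
    """Recast an NLI example as binary true/false statements (illustrative).

    Sketch of statement tuning: every task is rewritten as natural-language
    statements, so one encoder with a true/false head can handle new tasks at
    inference time. The template here is a hypothetical example.
    """
    template = "Premise: {p} Hypothesis: {h} The hypothesis is {r} given the premise."
    return [
        (template.format(p=premise, h=hypothesis, r="true"), label == "entailment"),
        (template.format(p=premise, h=hypothesis, r="false"), label == "contradiction"),
    ]

# Each (statement, is_true) pair then trains a standard encoder classifier with
# a binary true/false output head.
```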

Multi-Token Learning In ongoing work (Zuhri et al., 2025), we propose a learning objective that learns from multiple upcoming tokens at once, with the aim of training the model more effectively: specifically, the model is trained to predict the ordering of upcoming tokens. Manifold Labs has supported this work with computational resources valued at approximately $50k.
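The sketch below shows one simple way to build such ordering targets: for each position, tokens that appear sooner in a fixed upcoming window receive higher scores, and an auxiliary head is fit to these graded targets alongside the standard next-token loss. This is an illustration under my own simplifying assumptions, not the exact objective from the paper.

```python
import torch

def ordering_targets(token_ids: torch.Tensor, vocab_size: int, window: int = 8) -> torch.Tensor:
    """Graded targets ranking upcoming tokens by how soon they appear.

    Illustrative sketch: tokens appearing sooner in the next `window` steps
    receive higher scores; the paper's actual objective may differ.
    """
    seq_len = token_ids.size(0)
    targets = torch.zeros(seq_len, vocab_size)
    for t in range(seq_len):
        upcoming = token_ids[t + 1 : t + 1 + window].tolist()
        # Walk from farthest to nearest so the earliest occurrence of a token wins.
        for rank in reversed(range(len(upcoming))):
            targets[t, upcoming[rank]] = float(window - rank)
    return targets

# The auxiliary head's logits can be fit to these graded targets with a ranking
# or regression loss, added to the standard cross-entropy objective.
```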

Multimodal-Multicultural NLP

Most of my work so far has focused purely on text. As AI technology moves beyond text, multimodality is a natural next direction for my overarching goal, and one I have recently begun to explore.

Multimodal-Multicultural Datasets and Benchmarks I have been working on dataset construction for a while, so multimodal datasets are a natural extension. CVQA (Romero et al., 2024) is one of the largest human-made multimodal multilingual datasets. I served as the main lead and organizer of this initiative, conceptualizing the project and spearheading a collaboration of over 70 authors to construct culturally relevant visual question answering across more than 30 language-country pairs.

In Winata et al. (2025), where I served as a senior advisor working closely with the core team, we gathered and annotated images of food and cuisine from around the world. In a recent follow-up (Irawan et al., 2025), we adversarially edit such images, for example by replacing the background with landmarks of a different country or adding another country’s flag, and find that VLMs are easily distracted by these cues.

Multimodal-Multicultural Models In an ongoing project with the SEACrowd community, we are building SEA-VL, a multimodal language model for Southeast Asia. We started with the dataset (Cahyawijaya et al., 2025) and are now working on the model. Another ongoing project focuses on building multilingual, multimodal reward models.

Human-Computer Interaction of NLP Systems

I have recently initiated a new line of research into Human-AI interaction. This is particularly important for inclusive technology, as different demographics and cultural backgrounds significantly influence how users perceive and expect AI to behave.

Bias in Human Preferences In Wu and Aji (2025), we explored typical human-preference evaluations used in standard leaderboards. We noted that humans exhibit a bias towards output length and grammatical correctness to such a degree that they often prefer hallucinated outputs, provided they are long and grammatically polished. In our follow-up work (Chevi et al., 2025), we found that this preference correlates with the user’s personality traits. Specifically, users with different personality profiles prioritize distinct aspects of model responses, suggesting that a single universal reward model is insufficient to capture the diversity of human preferences. To extend this line of exploration into practical applications, we have been awarded a $450k grant by Etihad for persuasive LLMs, where the relationship between user demographics and the susceptibility to different persuasive strategies is relevant.

Future Research Agenda

My long-term goal remains to democratize NLP technology. Having established strong foundations in data-centric NLP and model efficiency, I will focus over the next five years on converging these streams into a unified framework for accessible and inclusive AI technology.

From Static Resources to Dynamic Simulation. While my previous work established static benchmarks, the future of evaluation lies in dynamic environments. I aim to transition from fixed datasets to interactive simulations and games. By using scenarios where models engage in culturally grounded games, role-play, or debates, we can create self-evolving benchmarks that resist contamination. Furthermore, this simulation-based approach will serve as a data synthesis engine, generating high-quality training signals for underrepresented languages where natural data is scarce.

Operationalizing Efficient Multimodality. The move toward multimodal models comes with a significant increase in cost. These models are far more resource-intensive than text-only baselines, making them prohibitive for many communities to use. They are also extremely data-hungry, which exacerbates the challenge for low-resource cultures where paired visual-linguistic data is exceptionally scarce. To bridge this gap, I will connect my research on efficiency, spanning both training and deployment, with multimodality. My goal is to develop methods that maximize learning from scarce signals while reducing the computational burden, so that systems capable of capturing complex cultural visual nuances remain accessible to train and deploy on consumer-grade hardware.

Deepening Cross-Cultural Human-Computer Interaction. Moving beyond preliminary preference analysis, I plan to establish a rigorous HCI research agenda focused on the global user experience. Rather than purely optimizing model parameters, I aim to conduct empirical studies on how cultural backgrounds and diverse demographics shape mental models, trust, and interaction patterns with AI systems. By investigating these dynamics through a user-centric lens, I seek to uncover how distinct communities perceive and utilize AI, providing the foundational insights needed to design interfaces and workflows that are truly intuitive and inclusive for a global population.

References

  1. Aji, Alham Fikri and Bogoychev, Nikolay and Heafield, Kenneth and Sennrich, Rico. 2020. In Neural Machine Translation, What Does Transfer Learning Transfer?. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics .
  2. Aji, Alham Fikri and Heafield, Kenneth. 2020. Compressing Neural Machine Translation Models with 4-bit Precision. Proceedings of the Fourth Workshop on Neural Generation and Translation .
  3. Bogoychev, Nikolay and Grundkiewicz, Roman and Aji, Alham Fikri and Behnke, Maximiliana and Heafield, Kenneth and Kashyap, Sidharth and Farsarakis, Emmanouil-Ioannis and Chudyk, Mateusz. 2020. Edinburgh’s Submissions to the 2020 Machine Translation Efficiency Task. Proceedings of the Fourth Workshop on Neural Generation and Translation .
  4. Mahendra, Rahmad and Aji, Alham Fikri and Louvan, Samuel and Rahman, Fahrurrozi and Vania, Clara. 2021. IndoNLI: A Natural Language Inference Dataset for Indonesian. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing .
  5. Aji, Alham Fikri and Winata, Genta Indra and Koto, Fajri and Cahyawijaya, Samuel and Romadhony, Ade and Mahendra, Rahmad and Kurniawan, Kemal and Moeljadi, David and Prasojo, Radityo Eko and Baldwin, Timothy and Lau, Jey Han and Ruder, Sebastian. 2022. One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) .
  6. Sen, Priyanka and Aji, Alham Fikri and Saffari, Amir. 2022. Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering. Proceedings of the 29th International Conference on Computational Linguistics .
  7. Cahyawijaya, Samuel and Lovenia, Holy and Aji, Alham Fikri and Winata, Genta Indra and Wilie, Bryan and Mahendra, Rahmad and Wibisono, Christian and Romadhony, Ade and Vincentio, Karissa and Koto, Fajri and Santoso, Jennifer and Moeljadi, David and Wirawan, Cahya and Hudi, Frederikus and Parmonangan, Ivan Halim and Alfina, Ika and Wicaksono, Muhammad Satrio and Putra, Ilham Firdausi and others. 2023. NusaCrowd: Open Source Initiative for Indonesian NLP Resources. Findings of the Association for Computational Linguistics: ACL 2023 .
  8. Cahyawijaya, Samuel and Lovenia, Holy and Koto, Fajri and Adhista, Dea and Dave, Emmanuel and Oktavianti, Sarah and Akbar, Salsabil and Lee, Jhonson and Shadieq, Nuur and Cenggoro, Tjeng Wawan and Linuwih, Hanung Wahyuning and Wilie, Bryan and Muridan, Galih Pradipta and Winata, Genta Indra and Moeljadi, David and Aji, Alham Fikri and Purwarianti, Ayu and Fung, Pascale. 2023. NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages. Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) .
  9. Li, Haonan and Koto, Fajri and Wu, Minghao and Aji, Alham Fikri and Baldwin, Timothy. 2023. Bactrian-x: Multilingual replicable instruction-following models with low-rank adaptation. arXiv preprint arXiv:2305.15011 .
  10. Muennighoff, Niklas and Wang, Thomas and Sutawika, Lintang and Roberts, Adam and Biderman, Stella and Scao, Teven Le and Bari, M Saiful and Shen, Sheng and Yong, Zheng Xin and Schoelkopf, Hailey and Tang, Xiangru and Radev, Dragomir and Aji, Alham Fikri and Almubarak, Khalid and Albanie, Samuel and Alyafeai, Zaid and Webson, Albert and Raff, Edward and Raffel, Colin. 2023. Crosslingual Generalization through Multitask Finetuning. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) .
  11. Sengupta, Neha and Sahu, Sunil Kumar and Jia, Bokang and Katipomu, Satheesh and Li, Haonan and Koto, Fajri and Afzal, Osama Mohammed and Kamboj, Samta and Pandit, Onkar and Pal, Rahul and Pradhan, Lalit and Mujahid, Zain Muhammad and Baali, Massa and Aji, Alham Fikri and Liu, Zhengzhong and Hock, Andy and Feldman, Andrew and Lee, Jonathan and Jackson, Andrew and Nakov, Preslav and Baldwin, Timothy and Xing, Eric. 2023. Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models. Preprint .
  12. Wibowo, Haryo Akbarianto and Aji, Alham Fikri and Wijaya, Derry Tanti. 2023. Do Language Models Understand Honorific Systems in Javanese?. Preprint .
  13. Winata, Genta and Aji, Alham Fikri and Yong, Zheng Xin and Solorio, Thamar. 2023. The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges. Findings of the Association for Computational Linguistics: ACL 2023 .
  14. Winata, Genta Indra and Aji, Alham Fikri and Cahyawijaya, Samuel and Mahendra, Rahmad and Koto, Fajri and Romadhony, Ade and Kurniawan, Kemal and Moeljadi, David and Prasojo, Radityo Eko and Fung, Pascale and Baldwin, Timothy and Lau, Jey Han and Sennrich, Rico and Ruder, Sebastian. 2023. NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics .
  15. Yong, Zheng Xin and Zhang, Ruochen and Forde, Jessica and Wang, Skyler and Subramonian, Arjun and Lovenia, Holy and Cahyawijaya, Samuel and Winata, Genta and Sutawika, Lintang and Cruz, Jan Christian Blaise and Tan, Yin Lin and Phan, Long and Phan, Long and Garcia, Rowena and Solorio, Thamar and Aji, Alham Fikri. 2023. Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages. Proceedings of the 6th Workshop on Computational Approaches to Linguistic Code-Switching .
  16. Adilazuarda, Muhammad Farid and Cahyawijaya, Samuel and Winata, Genta Indra and Purwarianti, Ayu and Aji, Alham Fikri. 2024. LinguAlchemy: Fusing Typological and Geographical Elements for Unseen Language Generalization. Findings of the Association for Computational Linguistics: EMNLP 2024 .
  17. Adilazuarda, Muhammad Farid and Mukherjee, Sagnik and Lavania, Pradhyumna and Singh, Siddhant Shivdutt and Aji, Alham Fikri and O’Neill, Jacki and Modi, Ashutosh and Choudhury, Monojit. 2024. Towards Measuring and Modeling “Culture” in LLMs: A Survey. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing .
  18. Cahyawijaya, Samuel and Lovenia, Holy and Koto, Fajri and Putri, Rifki Afina and Dave, Emmanuel and Lee, Jhonson and Shadieq, Nuur and Cenggoro, Wawan and Akbar, Salsabil Maulana and Mahendra, Muhammad Ihza and Putri, Dea Annisayanti and Wilie, Bryan and Winata, Genta Indra and Aji, Alham Fikri and Purwarianti, Ayu and Fung, Pascale. 2024. Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) .
  19. Lovenia, Holy and Mahendra, Rahmad and Cahyawijaya, Samuel and Winata, Genta Indra and Aji, Alham Fikri and others. 2024. SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing .
  20. Ousidhoum, Nedjma and Muhammad, Shamsuddeen and Abdalla, Mohamed and Abdulmumin, Idris and Ahmad, Ibrahim and Ahuja, Sanchit and Aji, Alham and Araujo, Vladimir and Ayele, Abinew and Baswani, Pavan and Beloucif, Meriem and Biemann, Chris and Bourhim, Sofia and Kock, Christine and Dekebo, Genet and Hourrane, Oumaima and Kanumolu, Gopichand and Madasu, Lokesh and Rutunda, Samuel and Shrivastava, Manish and Solorio, Thamar and Surange, Nirmal and Tilaye, Hailegnaw and Vishnubhotla, Krishnapriya and Winata, Genta and Yimam, Seid and Mohammad, Saif. 2024. SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 13 Languages. Findings of the Association for Computational Linguistics: ACL 2024 .
  21. Romero, David and Lyu, Chenyang and Wibowo, Haryo Akbarianto and ... and Solorio, Thamar and Aji, Alham Fikri. 2024. CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark. Advances in Neural Information Processing Systems 37 (NeurIPS 2024) .
  22. Wang, Yuxia and Aji, Alham Fikri and Shelmanov, Artem and Whitehouse, Chenxi and Ivanov, Petar and Mansurov, Jonibek and Su, Jinyan and Mahmoud, Tarek and Afzal, Osama Mohammed and Tsvigun, Akim and Sasaki, Toru and Arnold, Thomas and Habash, Nizar and Gurevych, Iryna and Nakov, Preslav. 2024. SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection. Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024) .
  23. Wang, Yuxia and Mansurov, Jonibek and Ivanov, Petar and Su, Jinyan and Shelmanov, Artem and Tsvigun, Akim and Whitehouse, Chenxi and Mohammed Afzal, Osama and Mahmoud, Tarek and Sasaki, Toru and Arnold, Thomas and Aji, Alham Fikri and Habash, Nizar and Gurevych, Iryna and Nakov, Preslav. 2024. M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) .
  24. Wibowo, Haryo and Fuadi, Erland and Nityasya, Made and Prasojo, Radityo Eko and Aji, Alham. 2024. COPAL-ID: Indonesian Language Reasoning with Local Culture and Nuances. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) .
  25. Wu, Minghao and Waheed, Abdul and Zhang, Chiyu and Abdul-Mageed, Muhammad and Aji, Alham Fikri. 2024. LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) .
  26. Adilazuarda, Farid and Liu, Chen Cecilia and Gurevych, Iryna and Aji, Alham Fikri. 2025. From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing .
  27. Adilazuarda, Muhammad Farid and Wijanarko, Musa Izzanardi and Susanto, Lucky and Nur’aini, Khumaisa and Wijaya, Derry Tanti and Aji, Alham Fikri. 2025. NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) .
  28. Aji, Alham Fikri and Cohn, Trevor. 2025. LORAXBENCH: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing .
  29. Andrylie, Lyzander Marciano and Rahmanisa, Inaya and Ihsani, Mahardika Krisna and Wicaksono, Alfan Farizki and Wibowo, Haryo Akbarianto and Aji, Alham Fikri. 2025. Sparse Autoencoders Can Capture Language-Specific Concepts Across Diverse Languages. Preprint .
  30. Cahyawijaya, Samuel and Lovenia, Holy and Moniz, Joel Ruben Antony and Wong, Tack Hwa and Farhansyah, Mohammad Rifqi and Maung, Thant Thiri and Hudi, Frederikus and Anugraha, David and Habibi, Muhammad Ravi Shulthan and Qorib, Muhammad Reza and Agarwal, Amit and Imperial, Joseph Marvin and Patel, Hitesh Laxmichand and Feliren, Vicky and Nasution, Bahrul Ilmi and Rufino, Manuel Antonio and Winata, Genta Indra and Rajagede, Rian Adam and Catalan, Carlos Rafael and Imam, Mohamed Fazli Mohamed and Pattnayak, Priyaranjan and Pranida, Salsabila Zahirah and Pratama, Kevin and Bangera, Yeshil and Na-Thalang, Adisai and Monderin, Patricia Nicole and Song, Yueqi and Simon, Christian and Ng, Lynnette Hui Xian and Sapan, Richardy Lobo and Rafi, Taki Hasan and Wang, Bin and Supryadi and Veerakanjana, Kanyakorn and Ittichaiwong, Piyalitt and Roque, Matthew Theodore and Vincentio, Karissa and Kreangphet, Takdanai and Artkaew, Phakphum and Palgunadi, Kadek Hendrawan and Yu, Yanzhi and Hastuti, Rochana Prih and Nixon, William and Bangera, Mithil and Lim, Adrian Xuan Wei and Khine, Aye Hninn and Zhafran, Hanif Muhammad and Ferdinan, Teddy and Izzani, Audra Aurora and Singh, Ayushman and Evan, Evan and Krito, Jauza Akbar and Anugraha, Michael and Ilasariya, Fenal Ashokbhai and Li, Haochen and Daniswara, John Amadeo and Tjiaranata, Filbert Aurelian and Yulianrifat, Eryawan Presma and Udomcharoenchaikit, Can and Ansori, Fadil Risdian and Ihsani, Mahardika Krisna and Nguyen, Giang and Barik, Anab Maulana and Velasco, Dan John and Genadi, Rifo Ahmad and Saha, Saptarshi and Wei, Chengwei and Flores, Isaiah Edri W. and Han, Kenneth Chen Ko and Santos, Anjela Gail D. and Lim, Wan Shen and Phyo, Kaung Si and Santos, Tim and Dwiastuti, Meisyarah and Luo, Jiayun and Cruz, Jan Christian Blaise and Hee, Ming Shan and Hanif, Ikhlasul Akmal and Hakim, M.Alif Al and Sya’ban, Muhammad Rizky and Kerdthaisong, Kun and Miranda, Lester James Validad and Koto, Fajri and Fatyanosa, Tirana Noor and Aji, Alham Fikri and Rosal, Jostin Jerico and Kevin, Jun and Wijaya, Robert and Kampman, Onno P. and Zhang, Ruochen and Karlsson, Börje F. and Limkonchotiwat, Peerat. 2025. Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) .
  31. Cahyawijaya, Samuel and Zhang, Ruochen and Cruz, Jan Christian Blaise and Lovenia, Holy and Gilbert, Elisa and Nomoto, Hiroki and Aji, Alham Fikri. 2025. Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Senses. Findings of the Association for Computational Linguistics: NAACL 2025 .
  32. Chevi, Rendi and Inui, Kentaro and Solorio, Thamar and Aji, Alham Fikri. 2025. How Individual Traits and Language Styles Shape Preferences In Open-ended User-LLM Interaction: A Preliminary Study. Preprint .
  33. Cruz, Jan Christian Blaise. 2025. Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation. Proceedings of the First Workshop on Language Models for Low-Resource Languages .
  34. Elshabrawy, Ahmed and Nguyen, Thanh-Nhi and Kang, Yeeun and Feng, Lihan and Jain, Annant and Shaikh, Faadil Abdullah and Mansurov, Jonibek and Imam, Mohamed Fazli Mohamed and Ortiz-Barajas, Jesus-German and Chevi, Rendi and Aji, Alham Fikri. 2025. Statement-Tuning Enables Efficient Cross-lingual Generalization in Encoder-only Models. Findings of the Association for Computational Linguistics: ACL 2025 .
  35. Farhansyah, Mohammad Rifqi and Darmawan, Iwan and Kusumawardhana, Adryan and Winata, Genta Indra and Aji, Alham Fikri and Wijaya, Derry Tanti. 2025. Do Language Models Understand Honorific Systems in Javanese?. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) .
  36. Irawan, Patrick Amadeus and Hanif, Ikhlasul Akmal and Kautsar, Muhammad Dehan Al and Winata, Genta Indra and Koto, Fajri and Aji, Alham Fikri. 2025. Vision Language Models are Confused Tourists. Preprint .
  37. Limkonchotiwat, Peerat and Tuchinda, Pume and Lowphansirikul, Lalita and Nonesung, Surapon and Tasawong, Panuthep and Aji, Alham Fikri and Udomcharoenchaikit, Can and Nutanong, Sarana. 2025. WangchanThaiInstruct: An instruction-following Dataset for Culture-Aware, Multitask, and Multi-domain Evaluation in Thai. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing .
  38. Mansurov, Jonibek and Sakip, Akhmed and Aji, Alham Fikri. 2025. Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) .
  39. Muhammad, Shamsuddeen Hassan and Ousidhoum, Nedjma and Abdulmumin, Idris and Wahle, Jan Philip and Ruas, Terry and Beloucif, Meriem and de Kock, Christine and Surange, Nirmal and Teodorescu, Daniela and Ahmad, Ibrahim Said and Adelani, David Ifeoluwa and Aji, Alham Fikri and Ali, Felermino D. M. A. and Alimova, Ilseyar and Araujo, Vladimir and Babakov, Nikolay and Baes, Naomi and Bucur, Ana-Maria and Bukula, Andiswa and Cao, Guanqun and Tufiño, Rodrigo and Chevi, Rendi and Chukwuneke, Chiamaka Ijeoma and Ciobotaru, Alexandra and Dementieva, Daryna and Gadanya, Murja Sani and Geislinger, Robert and Gipp, Bela and Hourrane, Oumaima and Ignat, Oana and Lawan, Falalu Ibrahim and Mabuya, Rooweither and Mahendra, Rahmad and Marivate, Vukosi and Panchenko, Alexander and Piper, Andrew and Ferreira, Charles Henrique Porto and Protasov, Vitaly and Rutunda, Samuel and Shrivastava, Manish and Udrea, Aura Cristina and Wanzare, Lilian Diana Awuor and Wu, Sophie and Wunderlich, Florian Valentin and Zhafran, Hanif Muhammad and Zhang, Tianhui and Zhou, Yi and Mohammad, Saif M.. 2025. BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) .
  40. Muhammad, Shamsuddeen Hassan and Ousidhoum, Nedjma and Abdulmumin, Idris and Yimam, Seid Muhie and Wahle, Jan Philip and Lima Ruas, Terry and Beloucif, Meriem and De Kock, Christine and Belay, Tadesse Destaw and Ahmad, Ibrahim Said and Surange, Nirmal and Teodorescu, Daniela and Adelani, David Ifeoluwa and Aji, Alham Fikri and Ali, Felermino Dario Mario and Araujo, Vladimir and Ayele, Abinew Ali and Ignat, Oana and Panchenko, Alexander and Zhou, Yi and Mohammad, Saif. 2025. SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection. Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025) .
  41. Rahmanisa, Inaya and Andrylie, Lyzander Marciano and Ihsani, Mahardika Krisna and Wicaksono, Alfan Farizki and Wibowo, Haryo Akbarianto and Aji, Alham Fikri. 2025. Unveiling the Influence of Amplifying Language-Specific Neurons. Preprint .
  42. Wibowo, Haryo Akbarianto and Song, Haiyue and Tanaka, Hideki and Utiyama, Masao and Aji, Alham Fikri and Dabre, Raj. 2025. IteRABRe: Iterative Recovery-Aided Block Reduction. Preprint .
  43. Winata, Genta Indra and Hudi, Frederikus and Irawan, Patrick Amadeus and Anugraha, David and Putri, Rifki Afina and Wang, Yutong and Nohejl, Adam and Prathama, Ubaidillah Ariq and Ousidhoum, Nedjma and Amriani, Afifa and Rzayev, Anar and Das, Anirban and Pramodya, Ashmari and Adila, Aulia and Wilie, Bryan and Mawalim, Candy Olivia and Cheng, Ching Lam and Abolade, Daud and Chersoni, Emmanuele and Santus, Enrico and Ikhwantri, Fariz and Kuwanto, Garry and Zhao, Hanyang and Wibowo, Haryo Akbarianto and Lovenia, Holy and Cruz, Jan Christian Blaise and Putra, Jan Wira Gotama and Myung, Junho and Susanto, Lucky and Machin, Maria Angelica Riera and Zhukova, Marina and Anugraha, Michael and Adilazuarda, Muhammad Farid and Santosa, Natasha and Limkonchotiwat, Peerat and Dabre, Raj and Audino, Rio Alexander and Cahyawijaya, Samuel and Zhang, Shi-Xiong and Salim, Stephanie Yulia and Zhou, Yi and Gui, Yinxuan and Adelani, David Ifeoluwa and Lee, En-Shiun Annie and Okada, Shogo and Purwarianti, Ayu and Aji, Alham Fikri and Watanabe, Taro and Wijaya, Derry Tanti and Oh, Alice and Ngo, Chong-Wah. 2025. WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) .
  44. Wu, Minghao and Aji, Alham Fikri. 2025. Style Over Substance: Evaluation Biases for Large Language Models. Proceedings of the 31st International Conference on Computational Linguistics .
  45. Zuhri, Zayd M. K. and Fuadi, Erland Hilman and Aji, Alham Fikri. 2025. Predicting the Order of Upcoming Tokens Improves Language Modeling. Preprint .
  46. Zuhri, Zayd M. K. and Fuadi, Erland Hilman and Aji, Alham Fikri. 2025. Softpick: No Attention Sink, No Massive Activations with Rectified Softmax. Preprint .