Increasing Southeast Asian Representation in AI

As artificial intelligence (AI) continues to make strides, the Tech For Good Institute sat down with Leslie Teo, Senior Director of AI Products at AI Singapore, to unravel the intricacies of AI development, ethical considerations, and ongoing initiatives.

TFGI: Leslie, it’s a pleasure to have you here. Could you share a bit about your background and the role you play at AI Singapore?

My background is in economics and finance, which led to years of work in macroeconomic policy-making and investment strategy. A decade ago, recognising the transformative potential of big data, fast networks, and powerful computing, I pivoted towards data science and AI. About a year and a half ago, I took on the role of leading the product work at AI Singapore, a national programme funded by Singapore’s National Research Foundation to enhance the country’s AI capabilities.

TFGI: For those less familiar with AI terminology, could you simplify what Large Language Models (LLMs) are and explain their role in the broader spectrum of Generative AI?

Generative AI creates new content, such as text, images, and music, rather than just analysing existing data. Large Language Models (LLMs), such as those behind ChatGPT, focus on generating text and serve as versatile foundation models that require tuning for specific applications.

LLMs are designed to understand and generate human language, trained on vast amounts of text data. However, existing models often display bias due to training data sourced predominantly from Western societies.

TFGI: Moving on to your recent initiatives, what prompted the focus on developing multimodal and localised LLMs specifically tailored for Singapore and Southeast Asia?

Recognising the potential economic and social impact of LLMs, we turned our attention to the scarcity of Southeast Asian data in existing models. Most LLMs are trained on Western data, leading to biases and performance gaps for Southeast Asian languages. Our goal is to bridge this gap by developing open-source LLMs like SEA-LION, focusing on representing the region’s 600 million citizens and their diverse cultures.

TFGI: AI Singapore’s SEA-LION (Southeast Asian Languages in One Network) is designed as a family of open-source LLMs for SEA use cases. Can you delve into the significance of this model and how it contributes to representing the cultural contexts of Southeast Asia?

SEA-LION, trained on 11 languages used in Southeast Asia, stands out with its emphasis on regional tasks. It addresses language nuances that many other models overlook, contributing to a better understanding of Southeast Asian languages and cultures. By open-sourcing data and measures, we aim to encourage collaborative efforts to enhance LLMs in the region.

At the heart of understanding the SEA region lies the issue of language tokenisation. Existing tokenisers of popular LLMs are often English-centric. We created a custom SEABPETokeniser for optimal model performance after testing various approaches. Our SEABPETokeniser is designed to balance fertility and proportion of continued words, considering linguistic nuances.

Source: AI Singapore (2024)
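
The two measures named in the tokenisation answer can be made concrete with a small sketch. It assumes the common definitions: fertility is the average number of tokens a tokeniser produces per word, and a continued word is one that gets split into two or more tokens. Everything below is hypothetical and for illustration only: the greedy tokeniser, the two tiny vocabularies, and the sample sentence are stand-ins, not the SEABPETokeniser or its data.

```python
from typing import Callable, List, Set

Tokeniser = Callable[[str], List[str]]


def greedy_tokenise(word: str, vocab: Set[str]) -> List[str]:
    """Toy subword tokeniser: greedy longest match, single characters as fallback."""
    tokens: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first; a single character always matches.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


def fertility(words: List[str], tokenise: Tokeniser) -> float:
    """Average number of tokens per word (assumes a non-empty word list)."""
    return sum(len(tokenise(w)) for w in words) / len(words)


def continued_word_proportion(words: List[str], tokenise: Tokeniser) -> float:
    """Share of words that are split into two or more tokens."""
    return sum(1 for w in words if len(tokenise(w)) > 1) / len(words)


# Hypothetical vocabularies: one with few useful pieces for the sample text,
# one extended with regional subwords.
ENGLISH_VOCAB: Set[str] = {"an", "as", "go"}
REGIONAL_VOCAB: Set[str] = ENGLISH_VOCAB | {"saya", "suka", "makan", "nasi", "reng"}


def english_centric(word: str) -> List[str]:
    return greedy_tokenise(word, ENGLISH_VOCAB)


def regional(word: str) -> List[str]:
    return greedy_tokenise(word, REGIONAL_VOCAB)


# Indonesian sample sentence: "I like eating fried rice".
WORDS = "saya suka makan nasi goreng".split()

for name, tok in [("english-centric", english_centric), ("regional", regional)]:
    print(name, fertility(WORDS, tok), continued_word_proportion(WORDS, tok))
```

In this toy setup the English-centric vocabulary breaks every word into several pieces, while the regional one keeps most words whole. Lower values on both measures mean shorter token sequences for the same text, which is the kind of trade-off a tokeniser designer would weigh against vocabulary size.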

TFGI: As part of the National Multimodal Large Language Model (LLM) Programme, how does SEA-LION contribute to advancing Singapore’s research and engineering capabilities, especially in the realm of multimodal LLMs?

Given the rapidly evolving technology, there’s a strategic need to develop sovereign capabilities in LLMs, considering the unique cultural contexts of Singapore and the Southeast Asian region. This initiative emphasises the development of multimodal and localised LLMs for understanding the diverse cultures and languages of Southeast Asia. SEA-LION plays a crucial role in building expertise among our AI engineers and scientists, contributing to local and regional LLMs. As part of A*STAR’s efforts, SEA-LION will expand to include audio capabilities, further advancing Singapore’s capabilities in multimodal LLMs.

 

TFGI: Looking ahead, what developments and outcomes can we anticipate from the National Multimodal LLM Programme (NMLP) over the next two years?

Over the next two years, our goal is to expand data sets for text and speech, releasing sets with three trillion tokens. We plan to build better-performing models, incorporating speech and exploring approaches like a mixture of experts. Through partnerships with the commercial, public, and NGO sectors, we aim to validate the utility of these models and foster a vibrant open-source community. Ongoing work across research, engineering, governance, computing, and enterprise adoption will advance both the models and their use cases. Building on early outcomes of AI Singapore’s SEA-LION model, which is more representative of Southeast Asia’s cultural contexts and linguistic nuances, SEA-LION will evolve and extend into a multimodal speech-text model, offering a cost-effective and efficient option for enterprises in Southeast Asia to incorporate AI into their workflows.

 

About the organisation

Launched in May 2017, AI Singapore brings together all Singapore-based research institutions and the vibrant ecosystem of AI start-ups and companies developing AI products to perform use-inspired research, grow knowledge, create tools, and develop the talent to power Singapore’s AI efforts.
