The Korea Economic Daily Global


NCSOFT unveils AI dataset to rival hyperscale language models


2022-04-14 14:32:21



NCSOFT building in Pangyo, near the capital Seoul. Courtesy of NCSOFT Corp.

NCSOFT Corp. on Thursday unveiled an artificial intelligence (AI) conversation dataset it developed with a Korea University research center. The South Korean game developer and publisher, headquartered in Pangyo, hopes the dataset could become the much-awaited rival to the hyperscale language models dominating the natural language processing (NLP) field.

Lim Hui-seok, professor of computer science and engineering at the university, led the research. Lim also heads the university’s NLP and AI research center.

The collection is named FoCus Dataset, short for For Customized Conversation Dataset. The research team says it is the first such dataset to encompass both user persona and outside knowledge. As it stands, it comprises more than 15,000 conversations on some 8,000 subjects.

An AI equipped with the FoCus Dataset will be able to understand the experience, preferences, and tastes of the person it is conversing with. It will also be able to source and learn the latest information available on Wikipedia in real time.

The collection and use of language data for AI falls under NLP. The goal of the machine learning technology is to program computers to process and analyze large amounts of human language for seamless communication between machines and people. In this process, a persona is a profile that represents a large segment of data; it is easier to test a given strategy against a persona, i.e. the average of different individuals, than against thousands of individuals.

What sets FoCus Dataset apart from other data collections is that it can enable sophisticated conversations without the help of hyperscale language models.
Even though typical large-scale language models take a long time to learn from and deduce meaning from data, they still hit a bottleneck when it comes to drawing inferences from real-time data and reflecting personal experiences.

In late February, NCSOFT and Korea University jointly published a paper on the dataset at the AAAI 2022 conference. Founded in 1979, the Association for the Advancement of Artificial Intelligence (AAAI) is one of the best-regarded scientific societies in the AI community. This October, the two will host the first workshop on customized chat technology at COLING 2022, an international conference on computational linguistics.

“Recently in the NLP academic circle, the need for alternative conversation technologies that will rival hyperscale language models has risen – due to financial and environmental reasons,” Lee Yeon-soo, director of NCSOFT’s Language AI Lab, said. The lead scientist at NCSOFT elaborated that he hopes the dataset will spark vibrant conversation and technological development in the NLP sector.

NCSOFT is best known for massively multiplayer online role-playing games (MMORPGs) such as Lineage and Guild Wars. In recent years, it has been expanding its foothold in other tech sectors.

By Jee Abbey Lee
jal@hankyung.com
