How To Make Computers Understand Idioms

Updated: May 29, 2023

Natural Language Processing (NLP for short) is the kind of computing behind programs like Google Translate and Siri: it is all about turning human language into something a computer can work with. Although the term is not widely known, NLP has many applications in our lives, and we depend on them more than we might imagine. But like all things, the mechanisms behind NLP aren’t perfect. One of its major shortcomings is its rather underdeveloped ability to process idioms correctly.


What do I mean? Let’s take an example. Think of the expression “break a leg.” Usually, when you tell someone to break a leg, you are wishing them good luck. However, you could also have malicious intentions and quite literally tell someone to go break a leg. How can a language model tell a multiword expression (MWE) used in its literal sense apart from the same expression used in its idiomatic sense? Janelle Shane answers this question from a broader perspective in her book “You Look Like a Thing and I Love You,” but I will paraphrase her answer to avoid writing a 2000-word article: AI models and programs need to “observe” lots of data in order to improve their processing skills. The more data researchers “feed” into linguistic models, the better the models’ interpretations become, so the developers of these models are always in search of vast quantities of data: sentences that contain MWEs used in either their literal or their figurative sense. But acquiring this data isn’t easy: the people behind the models cannot spend their time writing example sentences when they have other important tasks to attend to, such as maintaining and debugging the programs. So how can they obtain so much data? Associate Professor Gülşen Eryiğit and Master’s candidate Ali Şentaş from Istanbul Technical University have answered this question with a creative approach.
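To make “data” a little more concrete, here is a minimal sketch of what labeled idiom examples could look like. The record layout, the label names “idiomatic” and “literal,” and the sample sentences are my own assumptions for illustration; they are not taken from the research or from any real corpus.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledSentence:
    expression: str  # the multiword expression being illustrated
    sentence: str    # a sentence that uses it
    usage: str       # "idiomatic" or "literal" (hypothetical label names)


# Hand-written toy examples, not taken from any real corpus.
corpus = [
    LabeledSentence("break a leg", "Break a leg at your recital tonight!", "idiomatic"),
    LabeledSentence("break a leg", "He fell off the ladder and managed to break a leg.", "literal"),
    LabeledSentence("break a leg", "You will do great on stage, break a leg!", "idiomatic"),
]


def usage_counts(samples, expression):
    """Count how many samples of one expression carry each usage label."""
    return Counter(s.usage for s in samples if s.expression == expression)


counts = usage_counts(corpus, "break a leg")
print(dict(counts))
```

A model can only learn the difference between the two senses if it sees enough examples of both, which is why the balance between the labels matters as much as the total number of sentences.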


Their research, titled “Gamified Crowdsourcing for Idiom Corpora Construction” and published in the Natural Language Engineering journal by Cambridge University Press, covers every technical detail a computational linguist would need to understand the scope of the project. But I am pretty sure that going over technical terms and graphs is not everyone’s cup of tea, so I will summarize the research in simpler words.


Crowdsourcing basically means asking a dispersed group of people to do you a favor in exchange for a reward, and this concept is the essence of the research. Gamification means making a task feel like a video game. Combined, gamified crowdsourcing means using game-like software to collect multiword expression (MWE) data from many different people in exchange for prizes. And that’s what the researchers at ITU have developed: a chatbot with a game-like interface that prompts people to submit sentences containing MWEs in exchange for digital gift cards from online services like Wolt. The Telegram-based chatbot’s name is Dodiom, the bird who wants to learn idioms.
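The idea of trading submissions for rewards can be sketched in a few lines of code. This is only a toy illustration, not how Dodiom actually works: the class name, the point value, the player names, and the simple word-matching check are all hypothetical, and the check would miss inflected forms such as “broke a leg.”

```python
from collections import defaultdict

POINTS_PER_SUBMISSION = 10  # hypothetical reward value


class SubmissionGame:
    """Toy model of a game that rewards players for submitting sentences."""

    def __init__(self):
        self.sentences = []             # accepted (player, sentence) pairs
        self.scores = defaultdict(int)  # points earned by each player

    def submit(self, player, sentence, expression_words):
        """Accept a sentence only if it contains every word of the expression."""
        words = {w.strip('.,!?"').lower() for w in sentence.split()}
        if not all(w in words for w in expression_words):
            return False
        self.sentences.append((player, sentence))
        self.scores[player] += POINTS_PER_SUBMISSION
        return True

    def leaderboard(self):
        """Players sorted by score (highest first), then by name."""
        return sorted(self.scores.items(), key=lambda item: (-item[1], item[0]))


game = SubmissionGame()
target = ["break", "leg"]
game.submit("ayse", "Break a leg tonight!", target)
game.submit("mert", "Good luck tonight!", target)  # rejected: expression missing
game.submit("ayse", "I hope you break a leg.", target)
game.submit("mert", "Do not break your leg on the ice.", target)
print(game.leaderboard())
```

In a setup like this, the scores are what could later be exchanged for prizes, while the accepted sentences become the data that researchers are really after.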


By Batuhan Yeltekin
