Dystemo: Distant Supervision for Emotion Recognition in Tweets

Abstract

Emotion recognition in text has become an important research objective. It involves building classifiers capable of detecting human emotions for a specific application, e.g. analyzing reactions to product launches, monitoring emotions at sports events, or discerning opinions in political debates. Most successful approaches rely heavily on costly manual annotation. To alleviate this burden, we propose a distant supervision method—Dystemo—for automatically producing emotion classifiers from tweets labeled using existing or easy-to-produce emotion lexicons. The goal is to obtain emotion classifiers that work more accurately for specific applications than available emotion lexicons.

Keywords

Emotion Recognition, Social Media Analysis, Distant Supervision, Twitter

Years

2013-2016

Main researchers

Valentina Sintsova and Pearl Pu

Publications

  • Valentina Sintsova and Pearl Pu. Dystemo: Distant Supervision Method for Multi-Category Emotion Recognition in Tweets. ACM Transactions on Intelligent Systems and Technology (TIST), 8(1):Article No.13, 2016

Available Resources

Dystemo framework – source distribution

We provide the research community with the source code of the distant learning framework we developed for emotion recognition. It is shared under the GNU General Public License (GPL 3.0) and is accessible online at https://bitbucket.org/vsintsova/dystemo. We ask researchers who use this code to cite the Dystemo paper listed above.

Dystemo-produced emotion lexicons for sports events

We also make available, for research purposes, six emotion lexicons produced with the Dystemo framework and designed to recognize emotions in the domain of sports events. They associate unigrams and bigrams with the 20 emotion categories of the Geneva Emotion Wheel (GEW), version 2.0. The construction process pseudo-labeled unlabeled in-domain data (Olympic tweets) with emotions by applying an initial emotion lexicon. The six shared lexicons were trained with the Balanced Weighted Voting and PMI-based methods, each initialized from one of the initial lexicons (GALC-R, OlympLex-1.1, or PMI-Hash). Each was trained with the parameters optimized for its case.

These lexicons are distributed under the following terms of use:

  • The lexicons can be used for research purposes only.
  • Do not redistribute the lexicons further. Instead, please refer interested parties to Valentina Sintsova (valentinasintsova@gmail.com) or to the web page of the Dystemo project: http://hci.epfl.ch/research-projects/dystemo/
  • Please kindly cite the Dystemo paper above in your publications if you use any of these lexicons in your research.

To request these lexicons, please send an e-mail to pearl.pu [at] epfl.ch and valentinasintsova [at] gmail.com.
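As a rough illustration of the construction process described in this section, the sketch below pseudo-labels tweets with a seed lexicon and then scores unigrams and bigrams by pointwise mutual information (PMI) with the assigned emotions. It is a simplified toy, not the Dystemo implementation: the seed lexicon, the tweets, the tie-handling rule, the `min_count` threshold, and all function names are assumptions made for the example, and Balanced Weighted Voting is not covered.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy seed lexicon (term -> emotion). The real initial lexicons
# (GALC-R, OlympLex-1.1, PMI-Hash) are much larger and are not reproduced here.
SEED_LEXICON = {"proud": "Pride", "congrats": "Pride", "sad": "Sadness", "cry": "Sadness"}

# Invented example tweets, for illustration only.
TOY_TWEETS = [
    "so proud of team gb",
    "proud gold medal for team gb",
    "congrats on the gold medal",
    "sad to see her cry",
    "so sad about the injury",
    "watching the games tonight",
]


def ngrams(text):
    """Return the set of lowercase unigrams and bigrams in a tweet."""
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return set(tokens) | set(bigrams)


def pseudo_label(text, lexicon):
    """Return the emotion matched most often by lexicon terms, or None."""
    votes = Counter(lexicon[t] for t in text.lower().split() if t in lexicon)
    if not votes:
        return None
    (top, top_n), *rest = votes.most_common(2)
    # Assumption: tweets where two emotions tie are left unlabeled.
    if rest and rest[0][1] == top_n:
        return None
    return top


def pmi_lexicon(tweets, lexicon, min_count=2):
    """Build a term -> {emotion: PMI} lexicon from pseudo-labeled tweets."""
    term_emotion = defaultdict(Counter)
    term_count, emotion_count = Counter(), Counter()
    n_labeled = 0
    for text in tweets:
        emotion = pseudo_label(text, lexicon)
        if emotion is None:
            continue  # tweets without a pseudo-label are ignored
        n_labeled += 1
        emotion_count[emotion] += 1
        for term in ngrams(text):
            term_count[term] += 1
            term_emotion[term][emotion] += 1

    learned = {}
    for term, per_emotion in term_emotion.items():
        if term_count[term] < min_count:
            continue  # drop rare terms
        scores = {}
        for emotion, joint in per_emotion.items():
            # PMI(t, e) = log(P(t, e) / (P(t) * P(e))), estimated over labeled tweets
            pmi = math.log(joint * n_labeled / (term_count[term] * emotion_count[emotion]))
            if pmi > 0:  # keep only positive associations
                scores[emotion] = pmi
        if scores:
            learned[term] = scores
    return learned


learned = pmi_lexicon(TOY_TWEETS, SEED_LEXICON)
print(sorted(learned))
```

In this toy run, the bigram "gold medal" never appears in the seed lexicon but becomes associated with Pride because it co-occurs with pseudo-labeled Pride tweets. This is the general idea behind extending an initial lexicon to in-domain vocabulary; the weighting and parameter choices of the actual framework are described in the Dystemo paper.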

Initial emotion lexicons

We used three emotion lexicons for initial pseudo-labeling of tweets: GALC-R, OlympLex-1.1, and PMI-Hash. You can find more information on these lexicons and their terms of use on this page.