Goals: Curate a benchmark dataset for testing novel evaluation metrics of conversational chatbots
Assistant: Ekaterina Svikhnushina (ekaterina DOT svikhnushina AT epfl DOT ch)
Keywords: Dataset curation; conversational chatbots; natural language processing
Evaluation of conversational chatbots is an open research problem within the NLP community. Previous studies have tested various automatic metrics as proxies for human evaluation of a chatbot's naturalness. While several popular automatic metrics correlated poorly with human judgment (Liu et al., 2016), perplexity demonstrated promising results (Adiwardana et al., 2020). However, the notion of naturalness in the latter study did not cover a set of essential human-like conversation attributes, e.g., entertainment or empathy, as suggested by the PEACE model (Svikhnushina and Pu, 2021). The aim of this project is to create a benchmark dataset of conversations with sufficient coverage of the PEACE constructs, which could then be used both for evaluating and comparing different conversational models and for testing novel evaluation metrics.
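For context, perplexity measures how well a language model predicts held-out text: it is the exponentiated average negative log-likelihood per token, so lower values mean the model finds the text less surprising. A minimal sketch (the per-token log-probabilities below are hypothetical, not taken from any cited model):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-(mean log-probability per token))."""
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-avg_logprob)

# Hypothetical log-probabilities a model assigns to each token of a response
logprobs = [-1.2, -0.3, -2.1, -0.8]
print(perplexity(logprobs))  # exp(1.1), roughly 3.0
```

In practice these log-probabilities would come from the chatbot's language model scored on reference conversations; here the point is only the definition of the metric.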
Related Skills: Knowledge of natural language processing, data mining, and machine learning; strong analytical skills; programming skills (knowledge of Python is essential; basic web development skills are a plus).
Suitable for: Master students. Interested students should contact Ekaterina Svikhnushina (ekaterina DOT svikhnushina AT epfl DOT ch) and Pearl Pu (pearl DOT pu AT epfl DOT ch), attaching a copy of their CV.