SemEval 2020 Task 9
SentiMix: Sentiment Analysis for Code-Mixed Social Media Text

Important Dates

* All deadlines are calculated at 11:59 pm UTC-12 hours

Trial Data Ready	~~Jul 31 (Wed), 2019~~
Training Data Ready	~~Sep 4 (Wed), 2019~~
Test Data Ready	~~Feb 19 (Wed), 2020~~
Evaluation Start	~~Feb 19 (Wed), 2020~~
Evaluation End	~~Mar 11 (Wed), 2020~~
Results Posted	~~Mar 18 (Wed), 2020~~
System Description Paper Submission Due	~~May 1 (Fri), 2020~~
Task Description Paper Submission Due	~~May 8 (Fri), 2020~~
Notification to Authors	~~Jun 24 (Wed), 2020~~
Camera-ready Due	Jul 8 (Wed), 2020
Workshop	12-13 December 2020

Competition

We will be using CodaLab for the competition. Here are the links:

Trial datasets

Data Format

We follow the CoNLL format for both datasets. Every token in a tweet has its own line, and next to the token you will find its corresponding language identification label, separated by a tab. The first line of a tweet contains tab-separated metadata giving the index and the sentiment of the tweet. Tweets are separated from one another by an empty line. Here is an example of two tweets from the Spanglish dataset, one positive and one neutral:

meta	1	positive
So	lang1
that	lang1
means	lang1
tomorrow	lang1
cruda	lang2
segura	lang2
lol	lang1

meta	2	neutral
Tonight	lang1
peda	lang2
segura	lang2
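
The format above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the names `Tweet` and `read_tweets` are hypothetical, and it assumes that every tweet begins with a three-field `meta` line and that token lines carry the token and its language label separated by a single tab.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Tweet(NamedTuple):
    index: str                      # tweet index from the meta line
    sentiment: str                  # e.g. "positive", "negative", "neutral"
    tokens: List[Tuple[str, str]]   # (token, language label) pairs


def read_tweets(lines: Iterable[str]) -> Iterator[Tweet]:
    """Yield one Tweet per blank-line-separated block (hypothetical helper)."""
    index = sentiment = None
    tokens: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # An empty line closes the current tweet, if one is open.
            if index is not None:
                yield Tweet(index, sentiment, tokens)
            index, sentiment, tokens = None, None, []
            continue
        fields = line.split("\t")
        if index is None and fields[0] == "meta" and len(fields) == 3:
            # First line of a tweet: meta <TAB> index <TAB> sentiment
            _, index, sentiment = fields
        else:
            # Token line: token <TAB> language label (label assumed present)
            label = fields[1] if len(fields) > 1 else ""
            tokens.append((fields[0], label))
    # The last tweet may not be followed by an empty line.
    if index is not None:
        yield Tweet(index, sentiment, tokens)


SAMPLE = (
    "meta\t1\tpositive\nSo\tlang1\nthat\tlang1\nmeans\tlang1\n"
    "tomorrow\tlang1\ncruda\tlang2\nsegura\tlang2\nlol\tlang1\n"
    "\n"
    "meta\t2\tneutral\nTonight\tlang1\npeda\tlang2\nsegura\tlang2\n"
)

tweets = list(read_tweets(SAMPLE.splitlines()))
for tweet in tweets:
    print(tweet.index, tweet.sentiment, [tok for tok, _ in tweet.tokens])
```

For a file on disk, the same function can be called on an open file handle, since it only iterates over lines.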

Official Competition Metric for the Task

The participating systems will be evaluated as follows. We will use the F1 score averaged across the positive, negative, and neutral classes, and the final ranking will be based on this average F1 score. For further theoretical discussion, we will also release macro-averaged recall (recall averaged across the three classes), since the latter has better theoretical properties than the former and provides better consistency.

Each participating team will initially have access to the training data only. Later, the unlabelled test data will be released. After SemEval-2020, the labels for the test data will be released as well. We will ask participants to submit their predictions in a specified format (within 24 hours), and the organizers will calculate the results for each participant. We will make no distinction between constrained and unconstrained systems, but participants will be asked to report what additional resources they used for each submitted run.
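
To make the two scores concrete, here is a minimal sketch of how they could be computed from gold and predicted labels. It is not the official scorer: it assumes that "averaged across the classes" means an unweighted mean of the per-class values, and it assumes a score of 0 whenever a denominator is zero. The function names and the toy labels are hypothetical.

```python
LABELS = ("positive", "negative", "neutral")


def per_class_scores(gold, pred, label):
    """Return (precision, recall, F1) for one class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    # Assumption: a zero denominator yields a score of 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_scores(gold, pred, labels=LABELS):
    """Return (average F1, macro-averaged recall) over the given classes."""
    scores = [per_class_scores(gold, pred, label) for label in labels]
    avg_f1 = sum(s[2] for s in scores) / len(labels)
    macro_recall = sum(s[1] for s in scores) / len(labels)
    return avg_f1, macro_recall


# Toy labels for illustration only; not taken from the task data.
gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "neutral"]
avg_f1, macro_recall = macro_scores(gold, pred)
print(f"average F1 = {avg_f1:.3f}, macro recall = {macro_recall:.3f}")
```

In this toy case the positive class scores 0.5, the negative class 1.0, and the neutral class 0.0 on both measures, so both averages come out at 0.5.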
