6th Social Media Mining for Health (SMM4H) Shared Tasks at NAACL 2021

Event Dates

Jun 10, 2021 - Jun 10, 2021

Location

Mexico City, Mexico and Online

Submission Deadline

Mar 01, 2021

The Social Media Mining for Health Applications (#SMM4H) Shared Task addresses the natural language processing (NLP) challenges of using social media data for health research, including informal, colloquial expressions and misspellings of clinical concepts, noise, data sparsity, ambiguity, and multilingual posts. The proceedings of the shared task will be presented at the 6th SMM4H Workshop hosted by NAACL (North American Chapter of the Association for Computational Linguistics). This year’s shared task comprises eight NLP challenges, the most hosted at SMM4H to date.

For each of the eight tasks listed below, participating teams will be provided with a set of annotated tweets for developing systems, followed by a three-day window during which they will run their systems on unlabeled test data and upload their predictions to Codalab. For additional details about the tasks and information about registration, data access, paper submissions, and presentations, go to https://healthlanguageprocessing.org/smm4h-shared-task-2021/

For information about workshop paper submissions, topics, and deadlines, go to https://healthlanguageprocessing.org/smm4h-2021/

This year’s SMM4H Shared Task features tasks across text classification, named entity recognition (NER), and entity normalization problems on social media datasets. Researchers will have the opportunity to apply state-of-the-art methods to multiple tasks and further validate their research on data from social media. For students and educators, this is an excellent opportunity to teach and apply NLP techniques by participating in the shared tasks. Each participating team will have the opportunity to submit a two-page system description, which will be published as part of the shared task proceedings.

This year’s shared tasks are listed below. If you are interested in one or more of these tasks, please register your team at: https://forms.gle/1qs3rdNLDxAph88n6

1) Classification, Extraction and Normalization of Adverse Effect mentions in English tweets

https://healthlanguageprocessing.org/smm4h-2021/task-1/

2) Classification of Russian tweets for detecting presence of Adverse Effect mentions

https://healthlanguageprocessing.org/smm4h-2021/task-2/

3) Classification of change in medications regimen in tweets

https://healthlanguageprocessing.org/smm4h-2021/task-3/

4) Classification of tweets self-reporting adverse pregnancy outcomes

https://healthlanguageprocessing.org/smm4h-2021/task-4/

5) Classification of tweets self-reporting potential cases of COVID-19

https://healthlanguageprocessing.org/smm4h-2021/task-5/

6) Classification of COVID19 tweets containing symptoms

https://healthlanguageprocessing.org/smm4h-2021/task-6/

7) Identification of professions and occupations (ProfNER) in Spanish tweets

https://healthlanguageprocessing.org/smm4h-2021/task-7/

8) Classification of self-reported breast cancer posts on Twitter

https://healthlanguageprocessing.org/smm4h-2021/task-8/

Important dates for shared tasks:

Training set release : Dec 15, 2020

Codalab link available : Feb 1, 2021

Validation set submission due : Feb 15, 2021

Test set release : Feb 26 – Mar 1, 2021

Test set predictions due : Mar 1 – Mar 4, 2021

Test set evaluation scores release : Mar 8, 2021

System descriptions due : Mar 15, 2021

Acceptance notification : Apr 1, 2021

Camera ready system descriptions : Apr 12, 2021

Workshop : Jun 10, 2021

Organizers:

Graciela Gonzalez-Hernandez, University of Pennsylvania, USA

Arjun Magge, University of Pennsylvania, USA

Davy Weissenbacher, University of Pennsylvania, USA

Ari Z. Klein, University of Pennsylvania, USA

Karen O’Connor, University of Pennsylvania, USA

Abeed Sarker, Emory University, USA

Mohammed Ali Al-Garadi, Emory University, USA

Elena Tutubalina, Kazan Federal University, Russia

Zulfat Miftahutdinov, Kazan Federal University, Russia

Ilsear Alimova, Kazan Federal University, Russia

Martin Krallinger, Barcelona Supercomputing Center, Spain

Juan Banda, Georgia State University, USA

Contact Information:

Arjun Magge (Arjun.Magge@pennmedicine.upenn.edu)