Invited speakers!

Andrew Crouse, University of Alabama at Birmingham
Marti Hearst, University of California Berkeley
Maria Liakata, Queen Mary, University of London
Zhiyong Lu, National Center for Biotechnology Information
Jevin D. West, University of Washington

SciNLP 2021: 2nd Workshop on Natural Language Processing for Scientific Text

The Workshop on Natural Language Processing for Scientific Text, SciNLP 2021, will once again be hosted at AKBC, this year at AKBC 2021 as an online event. This year’s SciNLP focus is on Understanding Scientific Text.

Important Dates


Registration for SciNLP will be through the AKBC 2021 conference.


Date / Time: October 8th from 7:50AM - 12:30PM PDT (UTC-7).

| Session | West Coast (UTC-7) | East Coast (UTC-4) | Europe (UTC+2) | Beijing (UTC+8) | Length |
|---|---|---|---|---|---|
| Open session | 7:50 AM | 10:50 | 16:50 | 22:50 | 0:10 |
| Maria Liakata | 8:00 AM | 11:00 | 17:00 | 23:00 | 0:40 |
| Andrew Crouse | 8:40 AM | 11:40 | 17:40 | 23:40 | 0:40 |
| Break | 9:20 AM | 12:20 | 18:20 | 0:20 | 0:05 |
| Zhiyong Lu | 9:25 AM | 12:25 | 18:25 | 0:25 | 0:40 |
| Jevin D. West | 10:05 AM | 13:05 | 19:05 | 1:05 | 0:40 |
| Break | 10:45 AM | 13:45 | 19:45 | 1:45 | 0:05 |
| Marti Hearst | 10:50 AM | 13:50 | 19:50 | 1:50 | 0:40 |
| Poster session | 11:30 AM | 14:30 | 20:30 | 2:30 | 1:00 |
| Closing | 12:30 PM | 15:30 | 21:30 | 3:30 | 0:05 |

Each invited talk runs roughly 35 minutes, followed by 5 minutes for Q&A and buffer time for transitions.

Invited Talks

Andrew Crouse

What is another word for synonym? The value of NLP data and its challenges: a user’s perspective

Dr. Crouse has worked in biomedical research for a decade and a half, spanning industry, non-profit, and now academic settings. His background is in molecular biology and genetics. As director of research for the Hugh Kaul Precision Medicine Institute at the University of Alabama at Birmingham, he has been assisting in the development of programs that use knowledge graphs of biomedical information to help find therapeutic options for patients with rare genetic disorders. These programs rely heavily on a knowledge set developed by applying NLP to all of PubMed. Dr. Crouse will discuss the value of this NLP data and the challenges associated with using it.

Marti Hearst

Broadening Access to Scientific Research via NLP-Enhanced User Interfaces

Key to understanding the latest developments in AI and other branches of science is being able to comprehend what is written in research papers. Even experienced researchers are having trouble keeping up with the literature, and the mathematical content of machine learning papers can be especially challenging for those less familiar with technical content. To help broaden access to the scientific literature, the ScholarPhi project (a collaboration between UC Berkeley, AI2, and U Washington) has been developing new user interfaces for improving the reading of scientific articles. This includes context-relevant explanations of technical terms and notation, and eventually, tools to improve the writing of such papers. Scientific scholarly text presents numerous challenges to NLP algorithms, many of which do not work with high accuracy on this complex data source. Our efforts have uncovered interesting language phenomena and led to innovations in NLP algorithm design. They have also underscored the especially difficult task of annotating language phenomena in this domain. In this talk I will discuss these and other challenges to help inspire research in this important and exciting domain. This research is supported by a grant from the Alfred P. Sloan Foundation and by AI2.

Marti Hearst is a Professor at UC Berkeley in the School of Information and the Computer Science Division. Her research encompasses user interfaces with a focus on search, information visualization with a focus on text, computational linguistics, and educational technology. She is the author of Search User Interfaces, the first academic book on that topic. She is a former President of the Association for Computational Linguistics, a member of the CHI Academy, the SIGIR Academy, an ACM Fellow, and has received four Excellence in Teaching Awards from the students of UC Berkeley. She received her PhD, MS, and BA degrees in Computer Science from UC Berkeley and was a member of the research staff at Xerox PARC.

Maria Liakata

Towards automatically understanding and measuring the contributions of scientific work

Researchers in NLP have been working on the automatic extraction of information from scientific articles for over two decades. A key aspect in this line of research is capturing how scientists discuss their work, the scientific discourse. I will give a brief overview of early work on identifying the scientific discourse and how this can improve downstream tasks involving the extraction of information from the scientific literature. I will then present more recent work on the relation between the scientific discourse and the way it is represented in the news as a step towards understanding the more comprehensive (non-academic) impact of scientific work.

Maria Liakata is a Turing AI fellow and Professor in Natural Language Processing (NLP) at the School of Electronic Engineering and Computer Science, Queen Mary University of London. She is also an honorary Professor at the Department of Computer Science, University of Warwick. At the Turing she founded and co-leads the NLP and data science for mental health special interest groups and supervises PhD students. Maria holds a five-year EPSRC/UKRI Turing AI Fellowship, which involves developing new methods for NLP and multi-modal data to allow the creation of longitudinal personalized language monitoring. She is also the co-PI of several projects: Mobile Sensing of Altered EveryDay Function in Early Alzheimer’s Disease (MEDEA), “Language sensing for dementia monitoring & diagnosis”, “Opinion summarization from social media”, and “PANACEA: An AI-enabled evidence-driven framework for claim veracity assessment during pandemics”. She leads a team of 5 RAs and 8 PhD students.

Zhiyong Lu

Biomedical Text Mining for Knowledge Discovery

The explosion of biomedical big data and information in the past decade or so has created new opportunities for discoveries to improve the treatment and prevention of human diseases. But the large body of knowledge, which mostly exists as free text in journal articles for humans to read, presents a grand new challenge: individual scientists around the world are increasingly finding themselves overwhelmed by the sheer volume of research literature and are struggling to keep up to date and to make sense of this wealth of textual information. Our research aims to break down this barrier and to empower scientists towards accelerated knowledge discovery. In this talk, I will present our work on developing large-scale, machine-learning-based tools for better understanding scientific text in the biomedical literature. Moreover, I will demonstrate their uses in some real-world applications such as improving PubMed searches, scaling up data curation with PubTator, and taming the COVID-19 pandemic paper tsunami with LitCovid.

Dr. Zhiyong Lu is a Senior Investigator at the National Library of Medicine’s (NLM) Intramural Research Program, leading research in biomedical text and image processing, information retrieval, and machine learning. As Deputy Director for Literature Search at the National Center for Biotechnology Information (NCBI), Dr. Lu also directs the overall R&D efforts to improve literature search and information access (e.g. PubMed Search; LitCovid). Over the years, Dr. Lu has mentored over 40 trainees and is a highly cited author with ~300 peer-reviewed articles. Dr. Lu is a Fellow of the American College of Medical Informatics (ACMI), Associate Editor of Bioinformatics, and Organizer of the BioCreative challenge series.

Jevin D. West

Mapping Latent Knowledge

The collective knowledge of humankind is assembled in an academic literature composed of tens of millions of articles and scholarly books. This primary knowledge is immeasurably valuable if one knows where to look. However, navigating the literature can be extraordinarily difficult due to its vast volume and antiquated systems of organization. To leverage this primary knowledge, researchers require a higher level of latent knowledge: familiarity with the key documents in a research area; an understanding of how to interpret individual documents in light of broader research paradigms; a mental map of the relations among fields and their core concepts; and a knowledge of the specialist jargon needed for effective retrieval of particular research threads. To date, a scholar could acquire this sort of understanding only through long experience within a field. It is not laid out clearly in any classic text, nor can it be, for it maps a continually changing terrain. Given these challenges, how do we develop a semantic search that both captures and expedites the construction of this latent knowledge? Using examples from my own research and the research of others, I will explore this question and point to potential future directions within the SciNLP community.

Jevin D. West is an Associate Professor in the Information School at the University of Washington. He is the Director of the new Center for an Informed Public at UW aimed at resisting strategic misinformation, promoting an informed society and strengthening democratic discourse. He is also the co-founder of the DataLab at UW, a Data Science Fellow at the eScience Institute, and Affiliate Faculty for the Center for Statistics & Social Sciences. His research and teaching focus on the impact of technology on science and society, with a focus on slowing the spread of misinformation. He develops methods for mining the scientific literature to study the origins of disciplines, examine the social and economic biases that drive these disciplines, and measure the impact of the current publication system on the health of science. He is also the co-author of the new book, Calling Bullshit: The Art of Skepticism in a Data-Driven World, which helps non-experts question numbers, data, and statistics without an advanced degree in data science.

Accepted Abstracts

Accepted abstracts will be presented during the workshop in a poster session held on Gathertown. You can also watch the associated videos at any time. Listed below are the links to the abstracts and YouTube videos:

| Title | Authors | Abstract | Video (YouTube Link) |
|---|---|---|---|
| End-to-End NLP Knowledge Graph Construction | Ishani Mondal, Yufang Hou, Charles Jochim | pdf | vid |
| Categorising Scientific Uncertainty in Papers | Iana Atanassova, Francois-C. Rey | pdf | vid |
| Social Bias in Masked LMs Pre-trained on Scientific Corpora | Kejian Shi, Leyi Yan, Chuwei Xu | pdf | vid |
| Automatic Error Analysis for Document-level Information Extraction from Scientific Text | Aliva Das, Xinya Du, Barry Wang, Jiayuan Gu, Kejian Shi, Thomas Porter, Claire Cardie | pdf | vid |
| Teaching BERT Mathematics | Anja Reusch, Maik Thiele, Wolfgang Lehner | pdf | vid |
| AI-powered résumé-job matching based on document semantic similarity and deep neural networks | Sima Rezaeipourfarsangi, Evangelos Milios | pdf | vid |
| A Novel Dataset of Peer Reviews and Scientific Articles with Links | Jan Buchmann, Ilia Kuznetsov, Iryna Gurevych | pdf | vid |
| A Search Engine for Discovery of Biomedical Challenges and Directions | Dan Lahav, Jon Saad Falcon, Bailey Kuehl, Sophie Johnson, Sravanthi Parasa, Noam Shomron, Duen Horng Chau, Diyi Yang, Eric Horvitz, Daniel S. Weld, Tom Hope | pdf | vid |
| SCICO: Hierarchical Cross-Document Coreference for Scientific Concepts | Arie Cattan, Sophie Johnson, Daniel Weld, Ido Dagan, Iz Beltagy, Doug Downey, Tom Hope | pdf | vid |
| Automating the screening of articles for a review on suicide research | Osiris Rankin, Daniel M. Low, Jordyn R. Ricard, Franchesca Castro-Ramirez, Pedro Garcia, Skylar Smith, Narise Ramlal, Rediet Alemu, Ariel Ervin, Margaret Vo, Anastasia Carney, Matthew K. Nock | pdf | vid |
| Context-aware Citation Recommendation Based on BERT-based Bi-Ranker | Kaito Sugimoto, Akiko Aizawa | pdf | vid |
| Citation Context-Aware Citation Network Embeddings Based on Pre-trained Transformer | Masaya Ohagi, Akiko Aizawa | pdf | vid |
| Quality Over Quantity: Assessing the Effect of Corpus Quality and Size on Rhetorical Classification of Biomedical Abstracts | Mengfei Lan, Halil Kilicoglu | pdf | vid |
| Improving Automatic Citation Text Generation using Self-supervised Pre-trained Model | Guoao Wei, Nadia Ghobadipasha | pdf | vid |
| Summarizing scientific literature on the basis of deconstructed systematic reviews and meta-analyses | Anders McIlquham-Schmidt, Leon Derczynski | pdf | vid |
| Systematic Extraction of Covid-19 Risk Factors and Vaccine Side Effects | Francis Wolinski | pdf | vid |
| Semi-supervised ontology linking for food system research papers | Elina Gundyreva, Lidia Pivovarova | pdf | vid |
| Representing the disciplinary structure of physics: a comparative evaluation of graph and text embedding methods | Isabel Constantino, Sadamori Kojaku, Santo Fortunato, Yong-Yeol Ahn | pdf | vid |
| Mining Acknowledgement Texts in Web of Science (MinAck) | Nina Smirnova, Philipp Mayr | pdf | vid |
| The Delayed Recognition of Scientific Novelty | Yiling Lin, James Evans, Lingfei Wu | pdf | vid |
| SBDH and Suicide: A Multi-task Learning Framework for SBDH in Electronic Health Record | Avijit Mitra, Bhanu Pratap Singh Rawat, Emily B. Druhl, Heather Keating, Raelene Goodwin, Wen Hu, Weisong Liu, Hong Yu | pdf | vid |
| Extracting Material Synthesis Procedure: A Research on Relation-Level | Shanshan Liu, Tatsuya Ishigaki, Yui Uehara, Hiroya Takamura, Chowdhury Mohammad Mahir Asef, Mutsunori Uenuma, Hiroyuki Shindo, Yuji Matsumoto | pdf | vid |
| DisamBERT: Author name disambiguation with BERT | Sadamori Koujaku, Xiaoran Yan, Jisung Yoon, Filipi N. Silva, Vincent Lariviere, Yong-Yeol Ahn | pdf | vid |
| Annotating Natural Language Processing Shared Task Descriptions | Anna Martin, Jennifer D’Souza, and Ted Pedersen | pdf | vid |
| Domain-adaptation of spherical embeddings | Mihalis Gongolidis, Jeremy Minton, Ronin Wu, Valentin Stauber, Jason Hoelscher-Obermaier, Viktor Botev | pdf | vid |
| CORWA: A Citation-Oriented Related Work Annotation Dataset | Xiangci Li, Jessica Ouyang | pdf | vid |
| Using Document Classification to Map ‘Disease Research State’ across Rare Diseases | Gully Burns, Michaela Torkar, Ana-Maria Istrate, Hana Zaydens, Lia Prins, Ellaine Chou, Donghui Li, Samantha Scovanner | pdf | vid |
| Assessing Readability of Scientific Texts for English as a Second Language Learners | Yo Ehara | pdf | vid |


Feel free to contact us at or on Twitter via #SciNLP!

Join the mailing list to receive announcements.

Workshop Organizers

Previous Workshops

SciNLP 2020 at AKBC 2020
