SciNLP 2021: 2nd Workshop on Natural Language Processing for Scientific Text

The Workshop on Natural Language Processing for Scientific Text, SciNLP 2021, will once again be hosted at AKBC 2021, as an online event. This year’s SciNLP focus is on Understanding Scientific Text.

Important Dates

~~August 20, 2021~~ September 1, 2021: Submission deadline
September 10, 2021: Notification of workshop acceptance
October 1, 2021: Deadline to upload accepted posters and videos
October 8, 2021: Workshop

Registration

Registration for SciNLP will be through the AKBC 2021 conference.

Schedule

Date / Time: October 8, 2021, from 7:50 AM to 12:30 PM PDT (UTC-7).

	West Coast (UTC-7)	East Coast (UTC-4)	Europe (UTC+2)	Beijing (UTC+8)	Length
Open session	7:50 AM	10:50	16:50	22:50	0:10
Maria Liakata	8:00 AM	11:00	17:00	23:00	0:40
Andrew Crouse	8:40 AM	11:40	17:40	23:40	0:40
Break	9:20 AM	12:20	18:20	0:20	0:05
Zhiyong Lu	9:25 AM	12:25	18:25	0:25	0:40
Jevin D. West	10:05 AM	13:05	19:05	1:05	0:40
Break	10:45 AM	13:45	19:45	1:45	0:05
Marti Hearst	10:50 AM	13:50	19:50	1:50	0:40
Poster session	11:30 AM	14:30	20:30	2:30	1:00
Closing	12:30 PM	15:30	21:30	3:30	0:05

Each invited talk is roughly 35 min (plus 5 min after for Q&A and buffer time for transitions).

Invited Talks

Andrew Crouse

What is another word for synonym? The value of NLP data and its challenges: a user’s perspective

Dr. Crouse has worked in biomedical research for a decade and a half, spanning industry, non-profit, and now academic settings. His background is in molecular biology and genetics. Currently, as director of research for the Hugh Kaul Precision Medicine Institute at the University of Alabama at Birmingham, he assists in developing programs that use knowledge graphs of biomedical information to help find therapeutic options for patients with rare genetic disorders. These programs rely heavily on a knowledge set developed by applying NLP to all of PubMed. Dr. Crouse will discuss the value of this NLP data and the challenges associated with using it.

Marti Hearst

Broadening Access to Scientific Research via NLP-Enhanced User Interfaces

Key to understanding the latest developments in AI and other branches of science is being able to comprehend what is written in research papers. Even experienced researchers are having trouble keeping up with the literature, and the mathematical content of machine learning papers can be especially challenging for those less familiar with technical content. To help broaden access to the scientific literature, the ScholarPhi project – a collaboration between UC Berkeley, AI2, and U Washington – has been developing new user interfaces for improving the reading of scientific articles. This includes context-relevant explanations of technical terms and notation, and eventually, tools to improve the writing of such papers. Scientific scholarly text presents numerous challenges to NLP algorithms, many of which do not work with high accuracy on this complex data source. Our efforts have uncovered interesting language phenomena and led to innovations in NLP algorithm design. They have also underscored the especially difficult task of annotating language phenomena in this domain. In this talk I will discuss these and other challenges to help inspire research in this important and exciting domain. This research is supported by a grant from the Alfred P. Sloan Foundation and by AI2.

Marti Hearst is a Professor at UC Berkeley in the School of Information and the Computer Science Division. Her research encompasses user interfaces with a focus on search, information visualization with a focus on text, computational linguistics, and educational technology. She is the author of Search User Interfaces, the first academic book on that topic. She is a former President of the Association for Computational Linguistics, a member of the CHI Academy and the SIGIR Academy, and an ACM Fellow, and she has received four Excellence in Teaching Awards from the students of UC Berkeley. She received her PhD, MS, and BA degrees in Computer Science from UC Berkeley and was a member of the research staff at Xerox PARC.

Maria Liakata

Towards automatically understanding and measuring the contributions of scientific work

Researchers in NLP have been working on the automatic extraction of information from scientific articles for over two decades. A key aspect in this line of research is capturing how scientists discuss their work, the scientific discourse. I will give a brief overview of early work on identifying the scientific discourse and how this can improve downstream tasks involving the extraction of information from the scientific literature. I will then present more recent work on the relation between the scientific discourse and the way it is represented in the news as a step towards understanding the more comprehensive (non-academic) impact of scientific work.

Maria Liakata is a Turing AI Fellow and Professor in Natural Language Processing (NLP) at the School of Electronic Engineering and Computer Science, Queen Mary University of London. She is also an honorary Professor at the Department of Computer Science, University of Warwick. At the Turing she founded and co-leads the NLP and data science for mental health special interest groups and supervises PhD students. Maria holds a five-year EPSRC/UKRI Turing AI Fellowship, which involves developing new methods for NLP and multi-modal data to allow the creation of longitudinal personalized language monitoring. She is also the co-PI of projects on Mobile Sensing of Altered EveryDay Function in Early Alzheimer’s Disease (MEDEA), “Language sensing for dementia monitoring & diagnosis”, “Opinion summarization from social media”, and “PANACEA: An AI-enabled evidence-driven framework for claim veracity assessment during pandemics”. She leads a team of 5 RAs and 8 PhD students.

Zhiyong Lu

Biomedical Text Mining for Knowledge Discovery

The explosion of biomedical big data and information in the past decade or so has created new opportunities for discoveries to improve the treatment and prevention of human diseases. But the large body of knowledge — which mostly exists as free text in journal articles for humans to read — presents a grand new challenge: individual scientists around the world are increasingly finding themselves overwhelmed by the sheer volume of research literature and are struggling to keep up to date and to make sense of this wealth of textual information. Our research aims to break down this barrier and to empower scientists towards accelerated knowledge discovery. In this talk, I will present our work on developing large-scale, machine-learning-based tools for better understanding scientific text in the biomedical literature. Moreover, I will demonstrate their uses in some real-world applications such as improving PubMed searches, scaling up data curation with PubTator, and taming the COVID-19 pandemic paper tsunami with LitCovid.

Dr. Zhiyong Lu is a Senior Investigator at the National Library of Medicine’s (NLM) Intramural Research Program, leading research in biomedical text and image processing, information retrieval, and machine learning. As Deputy Director for Literature Search at the National Center for Biotechnology Information (NCBI), Dr. Lu also directs the overall R&D efforts to improve literature search and information access (e.g. PubMed Search; LitCovid). Over the years, Dr. Lu has mentored over 40 trainees and is a highly cited author with ~300 peer-reviewed articles. Dr. Lu is a Fellow of the American College of Medical Informatics (ACMI), Associate Editor of Bioinformatics, and Organizer of the BioCreative challenge series.

Jevin D. West

Mapping Latent Knowledge

The collective knowledge of humankind is assembled in an academic literature composed of tens of millions of articles and scholarly books. This primary knowledge is immeasurably valuable if one knows where to look. However, navigating the literature can be extraordinarily difficult due to its vast volume and antiquated systems of organization. To leverage this primary knowledge, researchers require a higher level of latent knowledge: familiarity with the key documents in a research area; an understanding of how to interpret individual documents in light of broader research paradigms; a mental map of the relations among fields and their core concepts; and a knowledge of the specialist jargon needed for effective retrieval of particular research threads. To date, a scholar could acquire this sort of understanding only through long experience within a field. It is not laid out clearly in any classic text, nor can it be, for it maps a continually changing terrain. Given these challenges, how do we develop a semantic search that both captures and expedites the construction of this latent knowledge? Using examples from my own research and the research of others, I will explore this question and point to potential future directions within the SciNLP community.

Jevin D. West is an Associate Professor in the Information School at the University of Washington. He is the Director of the new Center for an Informed Public at UW aimed at resisting strategic misinformation, promoting an informed society and strengthening democratic discourse. He is also the co-founder of the DataLab at UW, a Data Science Fellow at the eScience Institute, and Affiliate Faculty for the Center for Statistics & Social Sciences. His research and teaching focus on the impact of technology on science and society, with a focus on slowing the spread of misinformation. He develops methods for mining the scientific literature to study the origins of disciplines, examine the social and economic biases that drive these disciplines, and measure the impact of the current publication system on the health of science. He is also the co-author of the new book, Calling Bullshit: The Art of Skepticism in a Data-Driven World, which helps non-experts question numbers, data, and statistics without an advanced degree in data science.

Accepted Abstracts

Accepted abstracts will be presented in a poster session on Gathertown during the workshop. You can also watch the associated videos at any time. The links to the abstracts and YouTube videos are listed below:

Title	Authors	Abstract	Video (YouTube Link)
End-to-End NLP Knowledge Graph Construction	Ishani Mondal, Yufang Hou, Charles Jochim	pdf	vid
Categorising Scientific Uncertainty in Papers	Iana Atanassova, Francois-C. Rey	pdf	vid
Social Bias in Masked LMs Pre-trained on Scientific Corpora	Kejian Shi, Leyi Yan, Chuwei Xu	pdf	vid
Automatic Error Analysis for Document-level Information Extraction from Scientific Text	Aliva Das, Xinya Du, Barry Wang, Jiayuan Gu, Kejian Shi, Thomas Porter, Claire Cardie	pdf	vid
Teaching BERT Mathematics	Anja Reusch, Maik Thiele, Wolfgang Lehner	pdf	vid
AI-powered résumé-job matching based on document semantic similarity and deep neural networks	Sima Rezaeipourfarsangi, Evangelos Milios	pdf	vid
A Novel Dataset of Peer Reviews and Scientific Articles with Links	Jan Buchmann, Ilia Kuznetsov, Iryna Gurevych	pdf	vid
A Search Engine for Discovery of Biomedical Challenges and Directions	Dan Lahav, Jon Saad Falcon, Bailey Kuehl, Sophie Johnson, Sravanthi Parasa, Noam Shomron, Duen Horng Chau, Diyi Yang, Eric Horvitz, Daniel S. Weld, Tom Hope	pdf	vid
SCICO: Hierarchical Cross-Document Coreference for Scientific Concepts	Arie Cattan, Sophie Johnson, Daniel Weld, Ido Dagan, Iz Beltagy, Doug Downey, Tom Hope	pdf	vid
Automating the screening of articles for a review on suicide research	Osiris Rankin, Daniel M. Low, Jordyn R. Ricard, Franchesca Castro-Ramirez, Pedro Garcia, Skylar Smith, Narise Ramlal, Rediet Alemu, Ariel Ervin, Margaret Vo, Anastasia Carney, Matthew K. Nock	pdf	vid
Context-aware Citation Recommendation Based on BERT-based Bi-Ranker	Kaito Sugimoto, Akiko Aizawa	pdf	vid
Citation Context-Aware Citation Network Embeddings Based on Pre-trained Transformer	Masaya Ohagi, Akiko Aizawa	pdf	vid
Quality Over Quantity: Assessing the Effect of Corpus Quality and Size on Rhetorical Classification of Biomedical Abstracts	Mengfei Lan, Halil Kilicoglu	pdf	vid
Improving Automatic Citation Text Generation using Self-supervised Pre-trained Model	Guoao Wei, Nadia Ghobadipasha	pdf	vid
Summarizing scientific literature on the basis of deconstructed systematic reviews and meta-analyses	Anders McIlquham-Schmidt, Leon Derczynski	pdf	vid
Systematic Extraction of Covid-19 Risk Factors and Vaccine Side Effects	Francis Wolinski	pdf	vid
Semi-supervised ontology linking for food system research papers	Elina Gundyreva, Lidia Pivovarova	pdf	vid
Representing the disciplinary structure of physics: a comparative evaluation of graph and text embedding methods	Isabel Constantino, Sadamori Kojaku, Santo Fortunato, Yong-Yeol Ahn	pdf	vid
Mining Acknowledgement Texts in Web of Science (MinAck)	Nina Smirnova, Philipp Mayr	pdf	vid
The Delayed Recognition of Scientific Novelty	Yiling Lin, James Evans, Lingfei Wu	pdf	vid
SBDH and Suicide: A Multi-task Learning Framework for SBDH in Electronic Health Record	Avijit Mitra, Bhanu Pratap Singh Rawat, Emily B. Druhl, Heather Keating, Raelene Goodwin, Wen Hu, Weisong Liu, Hong Yu	pdf	vid
Extracting Material Synthesis Procedure: A Research on Relation-Level	Shanshan Liu, Tatsuya Ishigaki, Yui Uehara, Hiroya Takamura, Chowdhury Mohammad Mahir Asef, Mutsunori Uenuma, Hiroyuki Shindo, Yuji Matsumoto	pdf	vid
DisamBERT: Author name disambiguation with BERT	Sadamori Koujaku, Xiaoran Yan, Jisung Yoon, Filipi N. Silva, Vincent Lariviere, Yong-Yeol Ahn	pdf	vid
Annotating Natural Language Processing Shared Task Descriptions	Anna Martin, Jennifer D’Souza, Ted Pedersen	pdf	vid
Domain-adaptation of spherical embeddings	Mihalis Gongolidis, Jeremy Minton, Ronin Wu, Valentin Stauber, Jason Hoelscher-Obermaier, Viktor Botev	pdf	vid
CORWA: A Citation-Oriented Related Work Annotation Dataset	Xiangci Li, Jessica Ouyang	pdf	vid
Using Document Classification to Map ‘Disease Research State’ across Rare Diseases	Gully Burns, Michaela Torkar, Ana-Maria Istrate, Hana Zaydens, Lia Prins, Ellaine Chou, Donghui Li, Samantha Scovanner	pdf	vid
Assessing Readability of Scientific Texts for English as a Second Language Learners	Yo Ehara	pdf	vid

Contact

Feel free to contact us at scinlp@googlegroups.com or on Twitter via #SciNLP!

Join the mailing list to receive announcements.

Workshop Organizers

Arman Cohan @armancohan, Allen Institute for AI
Pradeep Dasigi @pdasigi, Allen Institute for AI
Tom Hope, Allen Institute for AI
Kyle Lo @kylelostat, Allen Institute for AI
Sunil Mohan, Chan Zuckerberg Initiative
Alex Wade @alexwade, Allen Institute for AI
Lucy Lu Wang @lucyluwang, Allen Institute for AI
Ivana Williams, Chan Zuckerberg Initiative
Dongxu Zhang, University of Massachusetts, Amherst

Previous Workshops

SciNLP 2020 at AKBC 2020
