Seventh International Workshop on SCIentific DOCument Analysis
(SCIDOCA 2023)
associated with JSAI-isAI 2023

Workshop: June 5 - 6, 2023

Venue: Kumamoto, Japan

Workshop photos

Prof. Nguyen opens the workshop

Dr. Vu Tran: Towards Enhancing Information Extraction via Public Discussions on Reddit about COVID-19 Research

Mr. An Dao: A Comparative Study of Language Models for Chemical Entity Recognition

Mr. Ryoma Hosokawa: Reference classification using BERT models to support scientific-document writing

Mr. Yasuaki Ito: Directional Generative Networks

Prof. Minh Nguyen: Semantic Parsing for Questions and Answering Over DBLP Database

Invited talk 1 by Dr. Danilo Carvalho: Learning Disentangled Representations for Natural Language Definitions

Invited talk 2 by Prof. Yoshinobu Kano: Capabilities and limitations of LLMs: from viewpoint of our research in dialog system, medical, legal and political applications and in human-like NLP system

Mr. Van-Thuy Phi: Constructing a Polymer Corpus for Named Entity Recognition and Relation Extraction

Dr. Yo Ehara: Evaluating How Readable Paper Abstracts Are in Diverse Subcategories under Computer Science for Non-Native English Speakers

Mr. Chau Nguyen: Enhancing Legal Text Entailment with Prompt-Based ChatGPT: An Empirical Study

Invited talk 3 by Prof. Akiko Aizawa: Natural Language Processing for Scientific Paper Analysis

Aims and Scope

The recent proliferation of scientific papers and technical documents has become an obstacle to the efficient acquisition of new information in various fields. It is almost impossible for individual researchers to check and read all related documents, and even retrieving relevant documents is becoming increasingly difficult. This workshop gathers researchers and experts who work on scientific document analysis from various perspectives, and invites technical paper presentations and system demonstrations that cover any aspect of scientific document analysis.

Important Dates (Time zone: AOE (Anywhere on Earth))

Submission Deadline for Long Papers: April 25, 2023
Submission Deadline for Short Papers: May 8, 2023
Notification of Acceptance (Long+Short Papers): May 15, 2023
Camera-Ready due (Long+Short Papers): May 20, 2023
Workshop Date: June 5 - 6, 2023

Registration

Please register for the workshop at the registration page of JSAI International Symposia on AI 2023.

Program

Day 1 (June 5, 2023)

09:50-10:00: Opening
10:00-11:00: Session 1 (SC: Prof. Le-Minh Nguyen)

10:00-10:30: Towards Enhancing Information Extraction via Public Discussions on Reddit about COVID-19 Research
Vu Tran, Tomoko Matsui and Minh Le Nguyen

10:30-11:00: A Comparative Study of Language Models for Chemical Entity Recognition
An Dao, Yuji Matsumoto and Akiko Aizawa

11:00-14:00: Lunch
14:00-15:30: Session 2 (SC: Dr. Vu Tran)

14:00-14:30: Reference classification using BERT models to support scientific-document writing
Ryoma Hosokawa, Junji Yamato, Ryuichiro Higashinaka, Genichiro Kikui and Hiroaki Sugiyama

14:30-15:00: Directional Generative Networks
Yasuaki Ito and Le-Minh Nguyen

15:00-15:30: Semantic Parsing for Questions and Answering Over DBLP Database
Minh Nguyen, Khang Le, Anh Kieu and Y. Nagai

16:00-17:00: Invited talk 1 by Dr. Danilo Carvalho (SC: Dr. Vu Tran)
Title: Learning Disentangled Representations for Natural Language Definitions
Abstract: While large language models (LLMs) have been pushing the state-of-the-art in a variety of Natural Language Processing tasks, they have an important drawback, which is the difficulty in interpreting their internal representations and controlling them. This fact raises practical and ethical concerns about their use. To alleviate this problem, we propose a method for incorporating information about the role of each word in a dictionary definition towards the meaning of the defined term. This promotes disentanglement of said roles in the internal representations of the models, allowing better interpretation of their properties and a higher degree of control over the generated text.
Bio: Danilo Carvalho is a Research Associate at the Department of Computer Science at the University of Manchester, working on Safe and Explainable Artificial Intelligence (XAI) architectures. He holds a Ph.D. in Information Science from Japan Advanced Institute of Science and Technology (JAIST). He previously worked as a systems analyst at the Brazilian state oil company (Petrobras) on job safety analysis (JSA) and environmental licensing control systems, followed by several scientific and technological projects in the fields of Artificial Intelligence, Computational Linguistics and Information Systems, with applications to economics, legal informatics, media literacy and healthcare. His other research interests include Parallel and Distributed Computing and Software Engineering.

Day 2 (June 6, 2023)

10:00-12:00: Session 3 (SC: Prof. Le-Minh Nguyen)

10:00-11:00: Invited talk 2 by Prof. Yoshinobu Kano
Title: Capabilities and limitations of LLMs: from viewpoint of our research in dialog system, medical, legal and political applications and in human-like NLP system
Abstract: Recent advances in LLMs (Large Language Models, such as ChatGPT) affect not only the NLP community but also society at large, including ordinary people. One question is what sorts of NLP problems LLMs can solve, which is still under investigation. Another question is to what extent LLMs are "similar" to humans, given that they apparently differ in architecture and training data size. I introduce a couple of related NLP projects that I organize, such as automatic medical diagnosis, the AIWolf project (dialog systems in conversation games), Legal NLP (a legal bar exam solver), and an SNS analysis project for public opinions, and discuss the abilities and limitations of LLMs along with future directions.
Bio: Yoshinobu Kano is an Associate Professor in the Faculty of Informatics at Shizuoka University, Japan. He received a BS in physics (2001), and an MSc (2003) and a PhD (2011) in information science and technology, from the University of Tokyo. He was a research associate at the University of Tokyo (2009) and a JST PRESTO researcher (2011), and has been an associate professor (PI) at Shizuoka University since 2014. He is interested in human-like natural language processing and its applications in medical, legal, political and conversational issues. See more here.

11:30-12:00: Constructing a Polymer Corpus for Named Entity Recognition and Relation Extraction
Van-Thuy Phi and Yuji Matsumoto

12:00-13:30: Lunch
13:30-14:30: Session 4 (SC: Dr. Vu Tran)

13:30-14:00: Evaluating How Readable Paper Abstracts Are in Diverse Subcategories under Computer Science for Non-Native English Speakers
Yo Ehara

14:00-14:30: Enhancing Legal Text Entailment with Prompt-Based ChatGPT: An Empirical Study
Chau Nguyen and Le-Minh Nguyen

15:00-16:00: Invited talk 3 by Prof. Akiko Aizawa (SC: Prof. Yuji Matsumoto)
Title: Natural Language Processing for Scientific Paper Analysis
Abstract: The rapid development of recent language models has greatly improved the performance of various natural language processing tasks such as information extraction, summarization, or question answering. This talk will present some of the recent techniques in various areas of scientific text analysis to explore the challenges and future perspectives.
Bio: Akiko Aizawa is a professor at the National Institute of Informatics (NII). She is also an adjunct professor at the University of Tokyo and at the Graduate University for Advanced Studies. Her research interests include natural language understanding, dialogue systems, and scholarly document processing.
16:00-16:10: Closing

Topics

Relevant topics include, but are not limited to, the following:

text analysis

document structure analysis

logical structure analysis

figure and table analysis

citation analysis of scientific and technical documents

scientific information assimilation

summarization and visualization

knowledge discovery/mining from scientific papers and data

similar document retrieval

entity and relation linking between documents and knowledge base

survey generation

resources for scientific documents analysis

document understanding in general

NLP systems aiming for scientific documents including tagging, parsing, coreference, etc.

Submissions
There are two classes of submissions:

Long paper on original and completed work, including concrete evaluation and analysis wherever appropriate; and

Short paper on a small, focused contribution, work in progress, a negative result, or an opinion piece.

The page limits are up to 14 pages including references for long papers, and up to 7 pages including references for short papers. (Reviewers will be told that there is no penalty for writing a shorter submission.)

All submissions should be written in English and submitted as PDF files formatted according to the Springer-Verlag LNCS style, which can be obtained from here. Papers should be anonymized. If you use a Word file, please follow the formatting instructions, then convert the file to PDF and submit it at the paper submission page.

For both classes, in addition to original unpublished work, we also accept papers that have already been published or presented at other venues. Such submissions should also be anonymized, and will be reviewed by the program committee.

In general, accepted papers will not be archived. The papers are distributed to the workshop participants on a USB flash drive. If the authors wish to make their paper publicly available, we will also provide a link to the PDF on this webpage. Otherwise, we do not upload the papers to the web. Unpublished submissions in both the long and short paper tracks are considered as candidates for the LNAI post-proceedings (the authors can decline the invitation if they wish). Papers will be archived only in these post-proceedings.

You can submit your paper at https://easychair.org/conferences/?conf=scidoca2023 . If you have trouble submitting a paper through the EasyChair system, please send email to "nguyenml[at]jaist.ac.jp".

If a paper is accepted, at least one author of the paper must register for the workshop and present it. Please register for the workshop at the registration page.

Post Proceedings
Selected papers will be published as post-proceedings in the Springer-Verlag "Lecture Notes in Artificial Intelligence" series after a second round of review following the workshop.

Workshop Chairs

Minh Le Nguyen, Japan Advanced Institute of Science and Technology
Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project (Advisor)

Program Committee Members

Nguyen Le Minh, Japan Advanced Institute of Science and Technology
Noriki Nishida, RIKEN Center for Advanced Intelligence Project
Vu Tran, The Institute of Statistical Mathematics
Yusuke Miyao, The University of Tokyo
Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project
Yoshinobu Kano, Shizuoka University
Akiko Aizawa, National Institute of Informatics
Ken Satoh, National Institute of Informatics and Sokendai
Junichiro Mori, The University of Tokyo
Kentaro Inui, Tohoku University
Nguyen Ha Thanh, National Institute of Informatics
Nguyen Minh Phuong, Japan Advanced Institute of Science and Technology

For any inquiries concerning the workshop, please send email to "nguyenml[at]jaist.ac.jp".

SCIDOCA 2023 home page https://www.jaist.ac.jp/event/SCIDOCA/2023/

Previous SCIDOCA workshops

SCIDOCA 2022 (6th workshop)

SCIDOCA 2021 (5th workshop)

SCIDOCA 2020 (4th workshop)

SCIDOCA 2018 (3rd workshop)

SCIDOCA 2017 (2nd workshop)

SCIDOCA 2016 (1st workshop)
