Understanding Language by Computer

Laboratory on Text Analytics
Professor：SHIRAI Kiyoaki

［Research areas］
Natural Language Processing, Machine Learning, Artificial Intelligence
［Keywords］
Statistical Natural Language Processing, Support for Web Access, NLP Application

Skills and background we are looking for in prospective students

Interest in human language, a desire to learn natural language processing, and fundamental knowledge of algorithms and automata

What you can expect to learn in this laboratory

How to find new problems in natural language processing by conducting a comprehensive survey of previous work. How to explore solutions to your own research questions by learning the necessary fundamental techniques and methods of natural language processing. Writing and presentation skills for communicating your research outcomes to others, developed by publishing papers at domestic and international conferences and by giving presentations both in the university and at conferences.

【Job category of graduates】 Information Technology

Research outline

Natural Language Processing (NLP) is a technology that enables computers to understand the language we use every day, process huge amounts of text, and provide new services. NLP has great potential to enrich our lives, but it is difficult for a computer to understand language. Our laboratory tackles such difficult problems.

Major research themes in our laboratory can be summarized as follows.

(1) Natural language analysis based on a large corpus

“Natural language analysis” is the process of understanding the meaning of a sentence. In general, a huge amount of knowledge and rules is required to understand sentences, but it is difficult to prepare such knowledge exhaustively. We study techniques for acquiring statistical information from a large collection of texts (a corpus) and using it for accurate natural language analysis.
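As a rough illustration of what "acquiring statistical information from a corpus" can mean, the sketch below estimates how likely one word is to follow another by simple relative frequency. The toy corpus, the function name `bigram_probabilities`, and the `<s>`/`</s>` boundary markers are assumptions made for this example; it is not the laboratory's actual method, and real systems use far larger corpora and more robust models.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(next word | previous word) by relative frequency."""
    context_counts = Counter()  # how often each word appears as the previous word
    bigram_counts = Counter()   # how often each (previous, next) pair appears
    for sentence in sentences:
        # Whitespace tokenization is a simplification for this sketch.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, nxt)] += 1
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


# Hypothetical three-sentence corpus, for illustration only.
toy_corpus = [
    "time flies like an arrow",
    "fruit flies like a banana",
    "time flies fast",
]
probs = bigram_probabilities(toy_corpus)
print(probs[("flies", "like")])  # relative frequency of "like" after "flies"
```

Statistics of this kind can stand in for hand-written rules: instead of listing which word sequences are plausible, the counts observed in the corpus supply that preference.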

(2) Support for Web Access

This is a technology that helps people search the Web. For example, since information on the Web is not always true, we aim to help users judge whether the information they find is reliable. Specifically, we extract the “Web writer”, the person or organization that created a webpage, and show it to the user. When we want to know something about a disease, webpages written by a medical doctor or a hospital are likely to be reliable and useful. By seeing the Web writer of a webpage, a user can more easily judge whether its information is correct, and may be able to obtain correct information more easily.

(3) Opinion Mining

Nowadays, people often post reviews of products and services on the Web, for example on blogs and social media. The opinions of others are useful for people who want to buy a new product. In opinion mining, for a given target (a product or service), we analyze users' reviews, judge whether each user expresses a positive or negative opinion, and reveal the reputation of the target. In addition, we focus on analyzing opinions not about a whole review but about an “aspect”, which is a feature of a product such as the “design”, “interface”, or “battery” of a mobile phone. By analyzing the pros and cons of individual aspects, we can understand the reputation of a product more precisely.
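The idea of aspect-level analysis can be sketched with a very small lexicon-based example. Everything here is an assumption made for illustration: the aspect keywords, the positive and negative word lists, the sample reviews, and the function name `aspect_polarity` are hypothetical, and the scoring rule (positive words minus negative words in each sentence that mentions an aspect) is a deliberately naive stand-in for the learned models used in actual research.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists, chosen only for this example.
ASPECT_TERMS = {
    "design": {"design", "look"},
    "interface": {"interface", "menu"},
    "battery": {"battery", "charge"},
}
POSITIVE = {"great", "good", "beautiful", "intuitive"}
NEGATIVE = {"bad", "poor", "short", "confusing"}


def aspect_polarity(reviews):
    """Sum a naive sentence-level polarity score for each aspect mentioned."""
    scores = defaultdict(int)
    for review in reviews:
        # Split each review into sentences on ., ! and ?
        for sentence in re.split(r"[.!?]+", review.lower()):
            words = set(re.findall(r"[a-z']+", sentence))
            polarity = len(words & POSITIVE) - len(words & NEGATIVE)
            for aspect, terms in ASPECT_TERMS.items():
                if words & terms:
                    scores[aspect] += polarity
    return dict(scores)


sample_reviews = [
    "The design is beautiful. The battery life is short.",
    "Great design! The interface is confusing.",
    "The battery is poor.",
]
summary = aspect_polarity(sample_reviews)
print(summary)
```

A whole-review score would mix the praise for the design with the complaints about the battery, whereas scoring each aspect separately keeps them apart. A sentence that mentions two aspects with opposite opinions would still be mishandled by this sketch, which is one reason aspect-level analysis is a research problem.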

(4) NLP application

We develop a variety of NLP application systems. One example is a free conversation system, a computer system with which users can enjoy open-ended conversation or chat. Another example is machine translation. In particular, we focus on machine translation for low-resource languages, where only a small parallel corpus (examples of translations) is available, such as translation from a dialect to the standard language or from sign language to written language.

Key publications

Aye Aye Mar, Kiyoaki Shirai, Natthawut Kertkeidkachorn. Weakly Supervised Learning Approach for Implicit Aspect Extraction. Information 14(11), 612, 2023.
Daichi Haraguchi, Kiyoaki Shirai, Naoya Inoue, Natthawut Kertkeidkachorn. Discovering Highly Influential Shortcut Reasoning: An Automated Template-Free Approach. Findings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp.6401-6407, 2023.
Tu Dinh Tran, Kiyoaki Shirai, Natthawut Kertkeidkachorn. Text Generation Model Enhanced with Semantic Information in Aspect Category Sentiment Analysis. Findings of the 61st Annual Meeting of the Association for Computational Linguistics, pp.5256-5268, 2023.

Equipment

Computer server

Our lab's strength in Transdisciplinary Sciences

We carry out several activities in our laboratory to strengthen students’ ability to find and solve new problems, as well as their presentation and communication skills. First, we hold a regular seminar on previous work, in which a student introduces a related paper to the other laboratory members. We also hold a regular seminar to discuss students’ research. In addition, students frequently meet with their supervisor to discuss their research progress and future direction.

［Website］ URL：https://www.jaist.ac.jp/nlp/lab/?En/Top