Deep Learning, Natural Language Understanding, Legal Text Processing
NGUYEN Laboratory
Professor: NGUYEN, Minh Le
E-mail:
[Research areas]
Artificial Intelligence, Natural Language Processing, Machine Learning
[Keywords]
Natural Language Understanding, Text Summarization, Deep Learning, Knowledge Representation
Skills and background we are looking for in prospective students
Mathematics, programming (C++, Java, Python), statistical models, and a background in artificial intelligence (search algorithms, machine learning models). A background in natural language processing is a plus.
What you can expect to learn in this laboratory
We expect students to acquire the following through research activities in the lab: skills in finding problems and reading papers, and background knowledge of machine learning (deep learning) and natural language processing. We expect Ph.D. students to become independent researchers after graduation who know how to write scientific journal papers and how to present their work at international conferences. We expect master's students to gain skills in applying machine learning models to semi-structured data (big data). They will also learn how to formulate a problem using machine learning models and will obtain fundamental knowledge of machine learning and knowledge representation.
【Job category of graduates】 communication industry, software industry, service industry
Research outline
Logical parts in legal paragraph
Text Summarization: Sentence Reduction
Research Overview
Structural representations and machine learning models play a key role in artificial intelligence (AI). Our research focuses on how structural representations and machine learning can be used to formulate problems in AI, ranging from text summarization and natural language understanding to legal engineering and machine reading.
Machine Learning
Our research directions address fundamental problems in machine learning. We particularly study structured prediction models, which are used to recognize structured representations such as sequences, trees, and graphs. Designing feature spaces for machine learning is difficult and requires much human effort. To deal with this, we study how feature representations can be learned automatically from data; deep learning is a likely fit for this goal. We also study reinforcement learning, in which a model learns by interacting with its environment.
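As an illustration of structured prediction over sequences, the following is a minimal sketch of Viterbi decoding, a standard way to find the best label sequence under a first-order model. It is not taken from the lab's publications. The function name `viterbi`, the toy label set, and all scores are hypothetical and chosen only for the example.

```python
from typing import Dict, List, Tuple

def viterbi(
    observations: List[str],
    labels: List[str],
    start: Dict[str, float],
    trans: Dict[Tuple[str, str], float],
    emit: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring label sequence under additive (log-space) scores."""
    if not observations:
        return []
    default = -1e9  # score assigned to any pair missing from the tables
    # best[i][y]: best score of any labeling of observations[:i + 1] that ends in label y
    best = [{y: start.get(y, default) + emit.get((y, observations[0]), default)
             for y in labels}]
    back: List[Dict[str, str]] = []  # back[i][y]: best previous label for y at position i + 1
    for word in observations[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans.get((p, y), default))
            scores[y] = (best[-1][prev] + trans.get((prev, y), default)
                         + emit.get((y, word), default))
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical toy model: N = noun, V = verb; all scores are made up.
LABELS = ["N", "V"]
START = {"N": 0.0, "V": -2.0}
TRANS = {("N", "N"): -1.0, ("N", "V"): 0.0, ("V", "N"): 0.0, ("V", "V"): -2.0}
EMIT = {("N", "dogs"): 0.0, ("V", "dogs"): -3.0, ("N", "bark"): -2.0, ("V", "bark"): 0.0}

print(viterbi(["dogs", "bark"], LABELS, START, TRANS, EMIT))
```

In a learned model the score tables would come from training (for example, a conditional random field or a neural network) rather than being written by hand.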
Natural Language Understanding
One of the ultimate goals of AI is to enable computers to converse with humans in human languages. To achieve this goal, we pay particular attention to semantic computation, which supports computers in understanding natural language. Our initial work showed how synchronous grammars can be combined with structured learning models to transform a natural language sentence into a logical form representation [1]. We also investigate how natural language generation (NLG) can help computers produce a human-understandable sentence from its meaning representation. One research topic we pursue is how probabilistic models can be applied to generate natural language sentences from their underlying semantics, expressed in typed lambda calculus.
For legal engineering, our mission is to support people in reading legal documents. The first task is to recognize the logical parts of law sentences in a paragraph, and then to group related logical parts into logical structures of formulas, which describe the logical relations between logical parts [2].
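One common way to frame the recognition step is as sequence labeling with BIO tags, where each token is marked as beginning (B), inside (I), or outside (O) a logical part. The sketch below only shows how such tags could be turned into labeled spans; it does not cover the grouping step and is not the method of [2]. The function name `bio_to_spans`, the tag names `ANT` (antecedent) and `CON` (consequent), and the example sentence are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Convert per-token BIO tags into (part_type, text) spans."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span being built, if there is one.
        nonlocal current_type, current_tokens
        if current_tokens and current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            current_tokens.append(token)      # continue the open span
        elif tag.startswith("B-") or tag.startswith("I-"):
            flush()                           # B- tag, or an I- tag with no matching open span
            current_type, current_tokens = tag[2:], [token]
        else:
            flush()                           # "O": token belongs to no logical part
    flush()
    return spans

# Hypothetical example sentence and tags.
tokens = "A person who pays the fee may apply".split()
tags = ["B-ANT", "I-ANT", "I-ANT", "I-ANT", "I-ANT", "I-ANT", "B-CON", "I-CON"]
print(bio_to_spans(tokens, tags))
```

The tags themselves would be predicted by a sequence model; here they are written by hand to keep the example self-contained.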
Machine Reading
One of the directions in our lab is to study the fundamental problems of how to extract useful information from texts and how to build knowledge from texts. First, we are interested in text summarization [3], which is used to extract the gist of text documents.
We also study machine reading, which automatically extracts knowledge from a large number of documents by reading texts. We are also interested in communication between humans and machines when reading text. Our target outcome is a question answering system like IBM Watson.
Key publications
- [1] M.L. Nguyen, A. Shimazu, "A semi supervised learning model for mapping sentences to logical forms with ambiguous supervision", Data Knowl. Eng. 90, pp. 1-12, 2014
- [2] B.X. Ngo, M.L. Nguyen, T.T. Oanh, A. Shimazu, "A Two-Phase Framework for Learning Logical Structures of Paragraphs in Legal Articles", ACM TALIP, Volume 12(1), 2013
- [3] M.T. Nguyen, M.L. Nguyen, "SoRTESum: A Social Context Framework for Single-Document Summarization", ECIR 2016, LNCS 9626, pp. 1-12, 2016
Equipment
Mac Server, 64 GB
Windows Server, 64 GB RAM
Teaching policy
Our primary goal in teaching is to help students develop the ability to learn on their own. In supervising graduate students, we think one of the most important things is learning how to find problems to study. To support students, we discuss with them as much as possible to help them choose a research topic and discover problems. Reading skills are important for enriching students' knowledge, and they help students choose topics and find problems. For this reason, our lab organizes seminar courses covering state-of-the-art results. We think that reading and discussing state-of-the-art work improves not only students' knowledge but also their skills in writing papers. We also organize seminar courses covering background knowledge in both machine learning and linguistics.
[Website] URL: https://www.jaist.ac.jp/is/labs/nguyen-lab/home/