Hiroyasu Yamada's Web Site

[ TOP | Research Topics | Software | List of Publications ]

Research Topics

Statistical NLP
Statistical Language Analysis
Statistical Parser

Software

If you run this script with the -V option, it displays syntactic trees on your terminal in ASCII form. Non-terminal nodes are written as "Label<head child>". Terminal nodes are written as "word/part-of-speech".

(Example)

      Such/JJ-+            
              |               
         a/DT-+NP<3>-+    
              |       
  proposal/NN-+

where the number "3" in "NP<3>" is the index of the head child among the node's children. In the example above, "proposal/NN" is the head of "NP<3>". For the rules used to identify heads, see Head Rules. In the original Penn Treebank, gaps are annotated with the tag "-NONE-". To see trees with -NONE- tags, run this script with the -g option. (In the current implementation, a large input tree may not be displayed correctly. In that case, enlarge your terminal window or use a higher-resolution display.)
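
As an illustration of the node notation described above, here is a minimal Python sketch that splits a non-terminal such as "NP<3>" into its label and head child index, and a terminal such as "proposal/NN" into word and part-of-speech. The function names are hypothetical and are not part of ptbconv; the sketch only assumes the 1-based head index shown in the example.

```python
import re

# Matches non-terminal nodes of the form "Label<n>", e.g. "NP<3>".
NONTERMINAL = re.compile(r"^(?P<label>[^<>]+)<(?P<head>\d+)>$")


def parse_nonterminal(node):
    """Return (label, head_index) for a node written as "Label<n>"."""
    m = NONTERMINAL.match(node)
    if m is None:
        raise ValueError(f"not a non-terminal node: {node!r}")
    return m.group("label"), int(m.group("head"))


def parse_terminal(node):
    """Return (word, pos) for a node written as "word/part-of-speech"."""
    # Split on the last slash so that words containing "/" stay intact.
    word, pos = node.rsplit("/", 1)
    return word, pos


# The children of NP<3> from the example, in left-to-right order.
children = ["Such/JJ", "a/DT", "proposal/NN"]
label, head = parse_nonterminal("NP<3>")
# The head index is 1-based, so subtract 1 to index the Python list.
head_word, head_pos = parse_terminal(children[head - 1])
print(label, head, head_word, head_pos)
```

For the example tree this selects "proposal/NN" as the head of the NP, in agreement with the description above.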

CFG rules

Run this script with the -C option to extract the CFG rules of the input trees. For example,

S<2> -> NP VP

<n> gives the index of the head constituent; here, the head constituent of "S" is its 2nd constituent, "VP".
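
A rule in this form can be read back mechanically. The following Python sketch is one possible way to do it, assuming each rule is a single line with symbols separated by whitespace; `parse_cfg_rule` and `head_constituent` are hypothetical helper names, not functions provided by the script.

```python
def parse_cfg_rule(rule):
    """Split a rule like "S<2> -> NP VP" into (label, head_index, rhs)."""
    lhs, rhs = rule.split("->", 1)
    # The left-hand side looks like "S<2>": a label, then the head index.
    label, _, rest = lhs.strip().partition("<")
    head_index = int(rest.rstrip(">"))
    return label, head_index, rhs.split()


def head_constituent(rule):
    """Return the right-hand-side symbol that the head index points to."""
    _, head_index, rhs = parse_cfg_rule(rule)
    # Head indices are 1-based in the rule notation.
    return rhs[head_index - 1]


print(parse_cfg_rule("S<2> -> NP VP"))
print(head_constituent("S<2> -> NP VP"))
```

The second call returns "VP", the head constituent named in the text.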

Dependency Structure Trees

ASCII Tree Form

Each node is written in the form "word/part-of-speech".

(Example)

            I/PRP-+            
                  |               
                  +read/VBD-+
                  |       
  the/DT-+        |
         +book/NN-+

The root node of this sentence is "read/VBD".

List Form

The list output form of our converter consists of three columns: word, POS and num.

(Example)

    I     PRP   2
    read  VBD   -1
    the   DT    4
    book  NN    2
    .     .     2

where num means that the word modifies the num-th word. For example, in the sentence above, the first word "I" modifies the 2nd word "read". The root node of a dependency tree is marked with the negative number "-1".
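
To make the list form concrete, the Python sketch below reads the example sentence and recovers the root and the modifier-to-head pairs. It assumes one token per line with whitespace-separated columns, as in the example; the function names are hypothetical and the code is not part of the converter.

```python
SAMPLE = """\
I     PRP   2
read  VBD   -1
the   DT    4
book  NN    2
.     .     2
"""


def parse_list_form(text):
    """Parse lines of "word POS num" into (word, pos, num) tuples."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, pos, num = line.split()
        tokens.append((word, pos, int(num)))
    return tokens


def root_and_arcs(tokens):
    """Return the root word and a list of (modifier, head) word pairs."""
    root = None
    arcs = []
    for word, _pos, num in tokens:
        if num < 0:
            root = word  # a negative num marks the root
        else:
            # num is a 1-based position in the sentence.
            arcs.append((word, tokens[num - 1][0]))
    return root, arcs


tokens = parse_list_form(SAMPLE)
root, arcs = root_and_arcs(tokens)
print("root:", root)
for modifier, head in arcs:
    print(modifier, "->", head)
```

For the example sentence the root is "read", and "the" is attached to "book", which in turn is attached to "read".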

Head Rules

This script converts Penn Treebank phrase structures into dependency structures using simple head rules. These rules are almost the same as Collins' (see page 238 of his PhD dissertation), but we have not yet implemented some special rules for NP and CONJP. All our rules are expressed as priority lists, one for each non-terminal node. To see our head rules, run this script with the -H option.
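
The idea of choosing a head from a priority list can be sketched in Python as follows. The table entries here are hypothetical and chosen only for illustration; they are not the rules printed by the -H option, and the sketch ignores the search direction that Collins-style rules also specify. The fallback to the last child is likewise an assumption of this sketch.

```python
# Hypothetical priority lists, for illustration only; these are NOT the
# rules that the script prints with -H.
HEAD_PRIORITIES = {
    "S": ["VP", "S", "NP"],
    "VP": ["VBD", "VBZ", "VB", "VP"],
    "NP": ["NN", "NNS", "NNP", "NP"],
}


def find_head_index(parent, children):
    """Return the 1-based index of the head child of `parent`.

    Candidates are tried in priority order; the first child whose label
    equals the current candidate becomes the head.
    """
    for candidate in HEAD_PRIORITIES.get(parent, []):
        for i, child in enumerate(children, start=1):
            if child == candidate:
                return i
    # Fallback assumed for this sketch: take the last child.
    return len(children)


for parent, children in [("S", ["NP", "VP"]), ("NP", ["JJ", "DT", "NN"])]:
    head = find_head_index(parent, children)
    print(f"{parent}<{head}> -> {' '.join(children)}")
```

With these illustrative lists, the two calls reproduce the head indices used in the examples on this page, "S<2>" and "NP<3>".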

The current best statistical parser, Charniak's Maximum-Entropy-Inspired Parser (meip), uses some extended part-of-speech tags, AUX and AUXG. To convert the output of Charniak's parser into our dependency forms, use the "-R meip" option.

ptbconv.old

This is an older version of ptbconv, the one used in our IWPT2003 paper. I will not debug or support it in the future; it is provided only for comparison with our IWPT2003 results. (Note that its output dependency format differs from that of the latest version: in ptbconv.old, word offsets begin at 0, not 1. See readme.rd.)

dparser (released on 10 Feb. 2005)

pparser

List of Publications

Hiroyasu Yamada, Yuji Matsumoto, "Bottom-up Deterministic Analysis of Dependency Structure Using Support Vector Machines", Transactions of IPSJ, Vol. 45, No. 10, pp. 2416-2427, 2004 (in Japanese).
Hiroyasu Yamada, Yuji Matsumoto, "Statistical Dependency Analysis With Support Vector Machines", In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT 2003), pp. 195-206, 2003.
Hiroyasu Yamada, Taku Kudo, Yuji Matsumoto, "Japanese Named Entity Extraction using Support Vector Machine", Transactions of IPSJ, Vol. 43, No. 1, pp. 44-53, 2002 (in Japanese).

Contact: h-yamada@jaist.ac.jp


ptbconv-3.0

How to install

How to use

Output Formats

Phrase Structure Trees