"plain2 -README-" English version (original: Japanese)
June 7, 1990
by Akihiro Uchida@NEC

NOTICE: {...} is a comment from the translator.

{ The Japanese character set is encoded with a 2-byte system called "Kanji code". There are three main Kanji codes: JIS, Shift-JIS and EUC. JIS is a 7-bit code that uses escape sequences; the others are 8-bit codes. The plain2 sources contain EUC code. }

How to install

Run "make" in the directory that contains the source files.
The plain2 assumes an EUC Kanji code environment. By default, the plain2 source files use EUC internally. (The Kanji I/O code can be selected with an option.) Some of the source files contain Kanji strings as constants. If the Kanji code is converted improperly, the plain2 will not work correctly.

Installation for System V

The plain2 is distributed for 4.xBSD UNIX.
When you compile on System V, delete the "-DBSD" flag from "CFLAGS" in the Makefile.

before: CFLAGS= -DBSD -DDEBUG
after: CFLAGS= -DDEBUG

Installation for Shift-JIS environment

You can select Shift-JIS as the Kanji I/O code with the "-s" option, even if your plain2 was compiled in an EUC environment.
You can also change the internal Kanji code to Shift-JIS. This lets you edit and compile the source files on a Shift-JIS system such as MS-DOS.
To change the internal Kanji code to Shift-JIS:
1. Convert all source files from EUC to Shift-JIS.
2. Redefine "INTERNAL_CODE" in "plain2.h" as "CODE_SJIS".

CAUTION: A compiler that accepts Shift-JIS is required. You cannot select EUC as the Kanji I/O code when the plain2 is compiled in a Shift-JIS environment.

NTT TeX and ASCII TeX

The plain2 is tuned for the NTT version of TeX.

{ NTT: Nippon Telegraph and Telephone Corporation. ASCII: a software house. NTT and ASCII each adapted (La)TeX to the Japanese environment in a different way. }
To use the plain2 with the ASCII version of TeX, undefine the "NTT_TEX" flag in "plain2.h".
#define NTT_TEX /* NTT jlatex (use \jxxxsize) */
I do not know exactly how the plain2 behaves in the ASCII TeX environment.

How to start plain2

% plain2 [option] [input-files ...]

examples:

% plain2 -tex plain.pln > plain.tex
% plain2 -roff plain.pln > plain.t

Options for selection of output


    -roff
        selects `roff' output.  "\" and "'" become "\e" and "\'".
        "." at the head of a line becomes "\&.".
        The default output is `roff'.

    -tex
        selects `LaTeX' output.  Special characters are quoted one by
        one, e.g. "-", "=" and "-=" become "$-$", "$=$" and "$-$$=$".

    -texq
        selects `LaTeX' output.  With this option, 1-byte alphabetic
        characters are quoted with "\verb||".  This function is
        incomplete.

    -raw
        suppresses quoting.

    -nospace
        suppresses excessive blank lines.

    -nopre
        suppresses the header preamble.

    -hifi
        keeps the output as close to the input as possible, e.g. the
        format of list numbers.

    -index
        outputs an index.

    -acursec
        outputs the same section numbers as the input.  By default,
        sections are numbered continuously, regardless of the
        numbering in the input file.

    -j
        outputs JIS code.  The default code is EUC.

    -s
        assumes that the input is Shift-JIS.  The output also becomes
        Shift-JIS.

{ The following options are extracted from the body of this document. }


    -inline
        permits in-line operations (see section 6.3, "In-line
        operation").

    -full
        extends the reference label of a figure/table to its whole
        title (see subsection 6.3.3, "Reference (x)").

    -renum
        only renumbers the list/section numbers of the input text.

CAUTION: The above options may not be used in the option field of the input files.

Changing of analysis parameters

If the output differs from what you expect, you can adjust it by changing the analysis parameters.
The analysis parameters can be written in the option field of the input files.

    -table=ddd
        adjusts the parameter for detecting a table.  "ddd" ranges
        from 0 to 100.  The default is 50.  If a part that you expect
        to be a table is not output as a table, increase the value a
        little.

    -exam=ddd
        adjusts the parameter for detecting an example block.  "ddd"
        ranges from 0 to 100.  The default is 50.  If a part that you
        expect to be an example block is not output as an example
        block, increase the value a little.

    -notable
        suppresses the use of tables.

    -nopic
        suppresses the use of pictures (drawings).

    -ktable
        permits the use of the JIS special codes (Keisen) to describe
        a table.  This is not recommended, because Keisen codes may
        conflict with the detection of pictures.

    -indsec
        accepts indented section numbers.  Normally, section numbers
        must be placed at the beginning of a line.  This option is
        for documents written without such care.

    -rmpage
        removes new page codes (ctrl-L).  Headers and footers near a
        ctrl-L are also removed.

Miscellaneous options

    -v
        displays verbose messages during processing.  It also warns
        about discontinuous section/list numbers.

    -help
        displays a simple help message.

Options via an environment variable

Options can also be set via the environment variable "PLAIN2_INIT".
An example:
% setenv PLAIN2_INIT "-tex -raw"

sets the default output to LaTeX in raw mode.
All options mentioned above are available via PLAIN2_INIT.

Embedding options in the input files

You can set the analysis parameters in the option field of the input files (see 3.8.1, "Extension of title block").
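
As a rough illustration only, the following Python sketch shows how an option line such as "option -notable -exam=60" could be split into parameters. The function name `parse_option_line` and the returned dictionary are assumptions made for this example; they are not taken from the plain2 sources. The check for the 0 to 100 range follows the description of "-table" and "-exam" above.

```python
def parse_option_line(line):
    """Parse an embedded option line; return None for other lines."""
    words = line.split()
    if not words or words[0] != "option":
        return None
    options = {}
    for word in words[1:]:
        name, has_value, value = word.lstrip("-").partition("=")
        if has_value:
            number = int(value)  # raises ValueError for non-numeric text
            if not 0 <= number <= 100:
                raise ValueError("%s must be between 0 and 100" % name)
            options[name] = number
        else:
            options[name] = True  # a plain flag such as -notable
    return options
```

For the example line above, this sketch returns a dictionary with "notable" set to True and "exam" set to 60.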

Structure of input texts

The plain2 analyzes the following text structures by using indentation, blank-line delimiters, marks at the head of lines, and so on.
- Title
- Section title
- Example block (program output, a simple figure, etc.)
- Citation
- Flat text
- List
  - Item type
  - Enumerate type
  - Description type
- New page and blank line
- Table
- Picture with drawing characters

{ In this chapter, many examples are given using 2-byte Kanji characters. Because they are useless in the English version, I omitted a part of the contents. Sorry. }

Section titles

A line is recognized as a section title if it
- is not indented,
- has a section number, and
- is the first line of the file or comes after a blank line.
A section title is not crossed by other structures.
A section number is a string of numbers delimited by ".". At the top level, the section number must end with ".". At the second level and below, the trailing "." can be omitted when proper spaces follow the number.

In the following examples, "Gray" marks a case which may be confused with a list (described later). Putting a blank line before the section number ensures correct analysis.


>>      1   Mistake         NG   (A dot is required at the tail.)
>>      1.  Confusion       Gray (This may cause a confusion with a list.)
>>      1.1Sub Section      NG   (Space(s) are required after the number.)
>>      1.1 Sub Section     Good (A dot can be omitted.)
>>      1.1.Sub Section     Good (A dot can be omitted.)

If the section numbers in the input file start not with "1." but with something like "3.4" (as in one part of a large document), the plain2 uses the same section numbers in the output. In other cases, the section numbers in the input are not used as they are: only the section levels are used, and the sections are renumbered continuously (ignoring discontinuities such as "1." followed by "3.").
To check for discontinuous section numbers, use the "-v" option.
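
The conditions above can be approximated with two regular expressions. The following Python sketch is only an illustration: the names `TOP_LEVEL`, `NESTED` and `looks_like_section_title` are hypothetical, and the real plain2 analysis also has to weigh the "Gray" cases against the list rules, which this sketch does not attempt.

```python
import re

# Top level: digits followed by a mandatory dot ("1.").
TOP_LEVEL = re.compile(r"^\d+\.(?!\d)")
# Deeper levels: dotted digits ("1.1"), then either a dot or a space.
NESTED = re.compile(r"^\d+(?:\.\d+)+(?:\.|\s)")


def looks_like_section_title(line, previous_line=None):
    """Return True if `line` satisfies the three conditions above."""
    if previous_line is not None and previous_line.strip():
        return False  # must start the file or follow a blank line
    if line[:1].isspace():
        return False  # must not be indented
    return bool(TOP_LEVEL.match(line) or NESTED.match(line))
```

Applied to the five example lines above (without the ">>" prefix and indentation), the sketch rejects the two "NG" lines and accepts the other three, including the "Gray" one.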

List

There are three types of list: item, enumerate and description. Lists can be nested (within the limits of `troff' or `LaTeX').
Other structures, e.g. flat texts or example blocks, can appear in a list.
A list of item/enumerate type may be output with different head marks. For example, the input "(1)" is output as "1.". If you want the output to keep the same marks as the input, use the "-hifi" option.

Item type

An item type is a list which begins with one of the following marks.
- Bullet: "o ", "* ", "+ "
- Dash: "-"
`troff' distinguishes a bullet list from a dash list, but treats all the bullet marks as the same kind. `LaTeX' treats all bullets and dashes as the same kind.

Enumerate type

An enumerate type is a list which is numbered with alphanumeric characters or roman numerals. How to distinguish an enumerate type from a section number is described below.

    +--<left and right are parenthesis>--+  +--<numbers>--------+
    |                                    |  |                   |
  --+--<right is parenthesis>------------+--+--<alphabets>------+--
    |                                    |  |                   |
    +--<dot is in right>-----------------+  +--<roman numbers>--+

                Syntax of enumerate type.

Numbers must be followed by a dot, followed by a right parenthesis, or enclosed in parentheses on both sides.
Example of enumerate type list.
Because the above syntax rule conflicts with the section number definition,
- the form of "2-byte characters + dot" is not available, and
- the form of "number + dot" may be analyzed as a section number.
If the input contains no Kanji codes (alphabets only), the form of "alphabet + dot" is not available.
As with section numbers, the enumerate type ignores the exact values of the alphabets or numbers and pays attention only to their format.

In a sequence of lists, if the alphabets or numbers are reset to the first value ("A" or "1"), that line is regarded as the head of a new list.

With the "-v" option, a warning is given for discontinuous list values.
See the next example. Between "The second" and "The first again", the list number is reset to "1". Consequently, "The first again" is regarded as the beginning of another list.
The numbers continue from "2. The second again" to "3) The third". However, the two are handled as distinct lists, because they have different formats.


>>  1) The first
>>  2) The second
>>  1. The first again
>>  2. The second again
>>  3) The third
>>  4) The fourth

The first
The second
1. The first again
2. The second again
The third
The fourth
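
The grouping rule illustrated by the example above (a new list starts when the format changes or when the value is reset) can be sketched in Python as follows. This is only an illustration under stated assumptions: `ENUM_HEAD`, `parse_head` and `split_lists` are hypothetical names, lower-case "a" is treated as a first value in addition to "A" and "1", and the conflict with section numbers is not handled.

```python
import re

# Optional "(", then a number, a single letter or a roman numeral,
# then ")" or ".", then spaces and the item body.
ENUM_HEAD = re.compile(
    r"^\s*(\()?([0-9]+|[A-Za-z]|[ivx]+|[IVX]+)(\)|\.)\s+(.*)$"
)


def parse_head(line):
    """Return (style, value, body) for an enumerate head line, else None."""
    match = ENUM_HEAD.match(line)
    if match is None:
        return None
    left, value, right, body = match.groups()
    if left and right != ")":
        return None  # "(1." is not one of the three forms
    style = "(x)" if left else "x" + right
    return style, value, body


def split_lists(lines):
    """Group item bodies into lists by head format and value reset."""
    lists = []
    current_style = None
    for line in lines:
        head = parse_head(line)
        if head is None:
            continue  # continuation lines are ignored in this sketch
        style, value, body = head
        restarts = value in ("1", "A", "a")  # "a" is an assumption
        if not lists or style != current_style or restarts:
            lists.append([])
            current_style = style
        lists[-1].append(body)
    return lists
```

For the six input lines of the example, the sketch yields three lists of two items each, which agrees with the explanation above.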

Description type

You can define a list of description type by enclosing the item title in "[...]" or by appending ":" to it.


>>  [item1] Description type A
>>  [item2] Description type A
>>  itemA:  Description type B

[item1] Description type A
[item2] Description type A
itemA: Description type B

Line(s) after list

Generally speaking, it is not easy to judge whether the line(s) following the head line of a list are a part of the list or not.

In the following example, the first head line, "Item type list", does not continue to the next line, but the second one, "Enumerate type list is", does.


>>      * Item type list
>>        A list which begins with the following marks...
>>      * Enumerate type list is
>>        a list which is numbered with alpha-numeric...

A complete judgement would have to be made on semantic grounds.

In the plain2, the head line of a list continues to the next line when
- the head position of the next line is between the list mark and the list body, and
- the head line of the list is longer than half of the right margin.


>>  * Because the next line is indented similarly with the
>>    list body, this line is a part of the list.
>>  * Because the next line is indented similarly with the
>>  list mark, this line is also a part of the list.
>>  * Because the next lines are
>>      indented much more than the head line of a list,
>>      these lines are analyzed as other texts.
>>  * When the head line is
>>    shorter than the next (this) line, newline code is added.

A list that consists only of single lines is a special case. The final line of such a list does not continue to the next line, even if the next line has the same indentation as the list mark.


>> Do you know the words below?
>> minix:    mini size unix for PC
>> xinu:     xinu is a reversed image of unix
>> gnu:      gnu is not unix!
>> These are concerning on a UNIX concept ...

Do you know the words below?

minix: mini size unix for PC
xinu: xinu is a reversed image of unix
gnu: gnu is not unix!

These are concerning on a UNIX concept ...
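
The two continuation conditions given earlier in this section can be written down as a small Python sketch. It is only an illustration: the function names are hypothetical, the list mark is assumed to be a single character, the line length is assumed to include the indentation, and the right margin is passed in explicitly because this document does not state its value. The special case of single-line lists described just above is not covered.

```python
def indent_of(line):
    """Number of leading space characters."""
    return len(line) - len(line.lstrip(" "))


def head_continues(head_line, next_line, right_margin):
    """Return True if the head line of a list continues to next_line."""
    mark_col = indent_of(head_line)
    # Body column: first non-space character after the one-character mark.
    after_mark = head_line[mark_col + 1:]
    body_col = mark_col + 1 + indent_of(after_mark)
    next_col = indent_of(next_line)
    positioned = mark_col <= next_col <= body_col
    long_enough = len(head_line.rstrip()) > right_margin / 2
    return positioned and long_enough
```

With an assumed right margin of 72 columns, the sketch agrees with the four "*" examples in this section: the first two head lines continue, while the third (next line indented too far) and the fourth (head line too short) do not.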

Judgement basis as list

An item type list may consist of even a single line, because a bullet or dash always indicates a list format.
```
>>  * Item type
>>      - List which has only a single line
```
Ordinary text may be mistaken for an enumerate or description type.


>>  The circular constant is
>>  3.14159...


>>  From 3: 15 pm today, there is a presentation ...

Consequently, a single line that looks like an enumerate or description type does not form a list.

Example block

An example block is detected by evaluating the following conditions:
1. it includes many spaces (like a figure),
2. it consists of consecutive short lines, and
3. it has lines which consist of only alphabetic characters.
  { Remember that the plain2 is for Kanji systems. In its original usage, such lines are unusual. }
The threshold value of the detection can be changed with the "-exam" option.

The following cases are regarded as example blocks.


>> % ls /var
>> adm             log             spool           yp
>> crash           preserve        tmp
>> %


>> % ls -1 /var
>> adm
>> crash
>> log
>> preserve
>> spool
>> tmp
>> yp
>> %

Good indentation helps the plain2 detect an example block correctly. In the next case, the figure has the same indentation as the text, so it is divided into many blocks and is not output correctly.


>>The network is shown below. (This is a bad case.)
>>
>>       to XXNet
>>          |
>>      +---+---+
>>      | CISCO |
>>      +---+---+
>>          |
>>  ---+----+----+----------+----------+---- 131.130.29
>>     |         |          |          |
>>+----++     +--+---+   +--+---+   +--+---+
>>|ns-in|     |host-A|   |host-B|   |host-C|
>>+-----+     +------+   +------+   +------+
>>
>>This is connected with XXNet via 56Kbps private line.

Indenting the example block (by more than four spaces) solves this problem.


>>The network is shown below. (This is a good case.)
>>
>>           to XXNet
>>              |
>>          +---+---+
>>          | CISCO |
>>          +---+---+
>>              |
>>      ---+----+----+----------+----------+---- 131.130.29
>>         |         |          |          |
>>    +----++     +--+---+   +--+---+   +--+---+
>>    |ns-in|     |host-A|   |host-B|   |host-C|
>>    +-----+     +------+   +------+   +------+
>>
>>This is connected with XXNet via 56Kbps private line.

Picture with drawing characters

{ Kanji code has special characters for composing drawings. If the plain2 is properly ported to non-Japanese systems, the same picture analysis logic may be applicable to 8-bit graphic characters. But I have no way to show the shapes of the Kanji characters used in this section, so I have to omit the translation of this section. Sorry. }

Rules for connection among dash lines and thick lines

Special rules for slant lines

Ellipse (picture of floppy disk) and thick lines

Citation

Citations from mail or news are not filled automatically but are left as they are. The conditions for a citation are as follows.
- The text must not be indented.
- Every line of the text starts with ">>", and blank lines are before/after the text.
- Every line of the text starts with ">" or ":", and the 2nd character of the line is something (space, ":", and so on). This pattern continues for more than three lines.
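
One possible reading of these conditions is sketched below in Python. It is an interpretation, not the plain2 logic: the first condition is taken to apply always, the other two are treated as alternatives, "more than three lines" is read literally as four or more, and the input is assumed to be a block already delimited by blank lines. The names `QUOTE_PREFIX` and `is_citation` are hypothetical.

```python
import re

# ">" or ":" at column 0 (so the line is not indented), then any character.
QUOTE_PREFIX = re.compile(r"^[>:].")


def is_citation(block):
    """`block` is a list of lines with blank lines before and after it."""
    if not block:
        return False
    if all(line.startswith(">>") for line in block):
        return True
    return len(block) > 3 and all(QUOTE_PREFIX.match(line) for line in block)
```

Under this reading, two lines starting with ">>" form a citation, while three lines starting with "> " do not.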

Table

The plain2 detects a table when items are arranged at regular positions, separated by blank characters (SPC, TAB) or drawing characters ("-", "+", "|", "=").

The next example is a table with blank characters.


>>      A32     AirBus A320     166
>>      B6      Boeing 767      270
>>      F50     Fokker 50        56
>>      M87     MD-87           134
>>      YS      YS-11            64

The next example is a table with drawing characters.


>>      ------------------------------
>>      | Symbol |  Company          |
>>      |============================|
>>      |  A32   | AirBus            |
>>      |--------+-------------------|
>>      |  B6    | Boeing            |
>>      |--------+-------------------|
>>      |  F50   | Fokker            |
>>      |--------+-------------------|
>>      |  M87   | McDonnell Douglas |
>>      ------------------------------

The following example is a complicated table with drawing characters. Because the analysis in the plain2 has limitations, I do not recommend making complicated tables.


>>      -----------------------------------------
>>      |  Form  | Feature || Symbol | Material |
>>      |=======================================|
>>      | Solid  |         ||   Au   | Gold     |
>>      |--------| Metal   ||--------+----------+
>>      |        |         ||   Hg   | Mercury  |
>>      |        |---------||--------+----------+
>>      | Liquid | Drink   ||   H2O  | Water    |
>>      |        |---------||--------+----------+
>>      |        |         ||   Br   | Bromine  |
>>      |--------| Halogen ||--------+----------+
>>      | Gas    |         ||   Cl   | Chlorine |
>>      |----------------------------+-----------

For the plain2 to analyze a table completely, a table with blank characters requires more than two columns and two lines, and a table with drawing characters requires more than three lines.
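
To give an idea of how a table with blank characters might be recognized, the following Python sketch counts the text columns that are separated by gaps which are blank in every line. This is not the algorithm of the plain2, which uses a threshold adjustable with "-table"; the function names, the minimum gap width of two characters, and the reading of the size requirement as "at least two columns and two lines" are all assumptions made for this example.

```python
import re


def count_columns(lines, min_gap=2):
    """Count text columns separated by gaps that are blank in every line."""
    if not lines:
        return 0
    rows = [line.expandtabs() for line in lines]
    width = max(len(row) for row in rows)
    rows = [row.ljust(width) for row in rows]
    # "x" marks a character column that holds text in at least one row.
    mask = "".join(
        " " if all(row[i] == " " for row in rows) else "x"
        for i in range(width)
    )
    pieces = re.split(" {%d,}" % min_gap, mask.strip())
    return len([piece for piece in pieces if piece])


def looks_like_blank_table(lines, min_columns=2, min_lines=2):
    """Rough check for a table whose items are separated by blanks."""
    return len(lines) >= min_lines and count_columns(lines) >= min_columns
```

The minimum gap width keeps a single space inside an item, such as the one in "Boeing 767", from being counted as a column separator.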

Flat text

Patterns which do not match any of the above formats are classified as flat texts. The head line of a paragraph may be indented with two or fewer space characters.
The plain2 detects a paragraph border after blank line(s) or before an indented line.


>>   This transcript is from the file A56-7W, classified top-secret,
>> subject is AIR WOLF.  A Mach-1 plus attack helicopter with the most
>> advanced weapon system in the air today.
>>   It has been hidden somewhere in the western United States by its
>> test pilot, Stringfellow Hawke.   Hawke has promised to return AIR WOLF
>> only if we can find his brother St. John, an M.I.A. in Vietnam.
>>
>> We suspect that Arch Angel, Deputy Director of the agency that built
>> AIR WOLF, is secretly helping Hawke in return for Hawke's flying AIR
>> WOLF on missions of national concern. (TV program: AIR WOLF)

Title block

You can put a title block as the first block of a file.
A title block is detected by the following patterns. The patterns are listed in order of priority.

{ The patterns use Japanese words in 2-byte codes. The following translation in this section is not implemented in the English version. A good port of the plain2 would solve this problem, I think. }

Title name:
- a string after "Title:" to the line end, and
- a region enclosed with "[ ]" or "< >".
Distribution:
- a string after "Distribution:" to the line end, and
- a whole line including "Mr. and Ms. " or "To:".
Owner:
- a string after "Owner:" or "By:" to the line end.
Section:
- a string after "Section:" to the line end, and
- a whole line including "Headquarter:", "Laboratory:" or "Factory:".
Address:
- a string after "Address:" to the line end, and
- a whole line including "Tel:", "Voice:", "Fax:" or "E-mail:".
Date:
- a string after "Date:" to the line end,
- a whole line including months ("Jan.", "Feb." etc.),
- a whole line including days ("Mon.", "Tue." etc.),
- a whole line including years ("89/", "90/", ..., "95/"), and
- a whole line including years ("/89", "/90", ..., "/95").

CAUTION: These patterns are tested strictly. All lines in the first block of the file must match one of the above patterns. For instance, if a line contains "Data:" where "Date:" was intended, the block is not detected as a title block.
CAUTION: The expression `a string after "xxx"' means a region defined by the following rules. This allows a string in a title block to contain spaces.

- If there are two or more consecutive space characters in the line, the region is from the double space to the line end.
- If there are no consecutive space characters in the line, the region is from the line head or the first space to the line end.
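
The two region rules can be sketched in Python as follows. This is one possible interpretation only: the function name `title_region` is hypothetical, the first run of consecutive spaces is assumed to be the one that counts, and "from the line head or the first space" is read as "from the first space if there is one, otherwise from the line head".

```python
import re


def title_region(line):
    """Return the region of a title-block line, following the two rules."""
    line = line.rstrip()
    gap = re.search(r" {2,}", line)
    if gap:
        return line[gap.end():]  # text after the first run of 2+ spaces
    _head, space, rest = line.partition(" ")
    return rest if space else line
```

For instance, a line "By: Akihiro Uchida" has no double space, so the region starts after the first space and is "Akihiro Uchida".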

Extension of title block

There are two extensions for a title block.
- Comment:
  - a line which begins with "comment".
- Option:
  - a line which begins with "option".
```
>>comment Copyright (C) 1991, NEC Corporation.
>>option -notable -exam=60
```
"option" is used to control the plain2 functionality for each file. The available options are those for the analysis parameters (see 2.2, "Changing of analysis parameters"). The scope of the options is the file which contains the option line. If the plain2 processes two or more input files, an option in one file does not affect the other file(s).

New page and blank line

A line which begins with the "ctrl-L" code is regarded as a page break command.
Five or more consecutive blank lines are output as blank space of the equivalent length.
Four or fewer consecutive blank lines are compressed when the "-nospace" option is used.

Outputs

To make the output look the same as the input on the screen, special characters are output in quoted form.
For example, `\' is output as `\e' for `troff' and as `$\backslash$' for `LaTeX'.
To let you embed `troff' or `LaTeX' commands in the input files, there is an option which passes them through unchanged. This is called the "raw mode" of the plain2. { The option may be "-raw", I think. }
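
The quoting rules mentioned in this document can be collected into a small Python sketch. It covers only the characters named here ("\", "'" and a "." at the head of a line for `roff'; "\", "-" and "=" for `LaTeX'); the real plain2 certainly quotes more characters. The table and function names are hypothetical.

```python
ROFF_QUOTES = {"\\": "\\e", "'": "\\'"}
LATEX_QUOTES = {"\\": "$\\backslash$", "-": "$-$", "=": "$=$"}


def quote(line, table):
    """Quote each character once, so quoted output is not quoted again."""
    return "".join(table.get(ch, ch) for ch in line)


def quote_roff(line):
    quoted = quote(line, ROFF_QUOTES)
    if quoted.startswith("."):
        quoted = "\\&" + quoted  # protect a "." at the head of a line
    return quoted


def quote_latex(line):
    return quote(line, LATEX_QUOTES)
```

Quoting character by character in a single pass matters here: replacing "'" with "\'" first and "\" with "\e" afterwards would quote the newly inserted backslash a second time.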

Explicit control of plain2

If the plain2 does not work as you expect, you can control the analysis explicitly by enclosing texts with "[[X" and "]]X". Here, `X' is one of the control characters listed below. "[[X" and "]]X" must be at the beginning of a line.
Four control characters are available:

E: Example block (program output, a simple figure, etc.)
T: Table
P: Picture with drawing characters
R: Raw mode, which bypasses the plain2 processing
   (used for inserting markup commands, etc.)
These controls cannot be nested. For instance, the string "]]E" must not appear in an example block.
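
As an illustration of this mechanism, the following Python sketch separates explicitly controlled regions from the rest of the input. It is not the plain2 implementation: the names are hypothetical, and the text after a marker on the same line is assumed to be ignored.

```python
import re

OPEN = re.compile(r"^\[\[([ETPR])")
CLOSE = re.compile(r"^\]\]([ETPR])")


def split_controlled(lines):
    """Return (kind, lines) pairs; kind is None for normal analysis."""
    blocks = []
    kind = None
    buffer = []
    for line in lines:
        opened = OPEN.match(line) if kind is None else None
        closed = CLOSE.match(line) if kind is not None else None
        if opened:
            if buffer:
                blocks.append((None, buffer))
            kind, buffer = opened.group(1), []
        elif closed and closed.group(1) == kind:
            blocks.append((kind, buffer))
            kind, buffer = None, []
        else:
            buffer.append(line)
    if buffer:
        blocks.append((kind, buffer))  # also keeps an unclosed region
    return blocks
```

Because an opening marker is only looked for outside a controlled region, the sketch reflects the rule that the controls cannot be nested.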

Experimental functionality

This chapter describes functionalities which are implemented experimentally.
They may not work well in some cases, and they may be changed drastically in the future.

Option to remove paging boundary

The option "-rmpage" removes new page codes (ctrl-L).
Headers and footers near a ctrl-L are also removed.

Functionality to detect title of figure/table

The plain2 detects the title of a figure/table from a pattern which consists of one of the words below, a number, and a string (the title). The numbers are renumbered continuously in the output.
"Figure", "Fig.", "fig.", "Table", "HYOU" {a Kanji character (=table)}, and "ZU" {a Kanji character (=figure)}.
This functionality is incomplete: it does not keep the figure/table numbers consistent with the numbers in the text body.
Also, a page break may fall between a figure/table and its title.
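
The title pattern and the continuous renumbering can be sketched in Python as below. Several points are assumptions made for the example: the names `CAPTION` and `renumber_captions`, separate counters for figures and tables, the "word number. title" output form, and the omission of the two Kanji words. As the text above warns, a body line that happens to begin with "Figure 3 ..." would also be matched.

```python
import re

# indentation, keyword, old number, optional dot, title
CAPTION = re.compile(r"^(\s*)(Figure|Fig\.|fig\.|Table)\s*(\d+)\.?\s+(\S.*)$")


def renumber_captions(lines):
    """Renumber figure and table titles continuously, in order."""
    counters = {"figure": 0, "table": 0}
    result = []
    for line in lines:
        match = CAPTION.match(line)
        if match:
            indent, word, _old_number, title = match.groups()
            kind = "table" if word == "Table" else "figure"
            counters[kind] += 1
            line = "%s%s %d. %s" % (indent, word, counters[kind], title)
        result.append(line)
    return result
```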

In-line operation

By enclosing texts with "((x" and "))", the following in-line operations become available. Here, `x' is a control character. You must use the "-inline" option.
This functionality is limited to a single block of flat text and to the head line of a list. It is not allowed in section titles, example blocks, and so on.

Footnote (f)

A footnote is defined by enclosing text with "((f" and "))".

Raw mode (r)

Text enclosed with "((r" and "))" is output in raw mode. Raw mode text is not quoted.

Reference (x)

Text enclosed with "((x" and "))" is output as a reference, that is, "\ref{...}" in `LaTeX'.
With the "-full" option, the label of a figure/table is extended to the whole title. This enables a reference to a figure/table number as follows.
```
>>      Figure 3. Structure of Starship
>>
>>As shown in the Figure ((x Structure of Starship)), ...
```
This functionality is not supported for `roff' outputs.
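
The three in-line operations can be illustrated with the following Python sketch for `LaTeX' output. Only the "\ref{...}" form is stated in this document; the use of "\footnote{...}" for (f) is an assumption, as are the names `INLINE` and `expand_inline`. Quoting of the surrounding text is left out.

```python
import re

# "((", one control character, optional spaces, the body, then "))"
INLINE = re.compile(r"\(\(([frx])\s*(.*?)\)\)")


def expand_inline(text):
    """Expand ((f ..)), ((r ..)) and ((x ..)) for LaTeX output."""
    def render(match):
        operation, body = match.groups()
        if operation == "f":
            return "\\footnote{%s}" % body  # assumed LaTeX form
        if operation == "x":
            return "\\ref{%s}" % body
        return body  # raw mode: pass the text through unquoted
    return INLINE.sub(render, text)
```

For the example line above, the sketch gives "As shown in the Figure \ref{Structure of Starship}, ...".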

Font size designation

This functionality is under improvement.

Appendix

An appendix can be defined as a special case of section titles (see 3.1, "Section titles").
The conditions for an appendix are similar to those for a section title. That is to say, a line is regarded as an appendix title if it
- is not indented,
- has a string of "Appendix", "APPENDIX", or "FUROKU" {Kanji}, and
- is the beginning line of the file or comes after a blank line.


>>  Appendix        References
>>  APPENDIX A      Program lists

Renumbering

The "-renum" option renumbers list/section numbers which were numbered arbitrarily.

Local option for the software group of NEC C&C Lab.

{ This functionality depends on the local `troff'/`LaTeX' environment used in the software group of NEC C&C (Computer and Communication) Lab. The translation is omitted. }

Distribution of the plain2

The plain2 is distributed "as is". There is no warranty and no support for the operation of the plain2. No person or company is responsible for any defects caused by the plain2 or its usage.
The plain2 may be distributed freely, provided that the copyright is preserved.
There is no limitation on improving the plain2. However, it is prohibited to distribute an improved plain2 without the permission of the author. If you want to distribute an improved version, contact the author.
The author's address is "uchida@ccs.mt.nec.co.jp". Improvement reports and/or bug information are welcome at this address.
{ The translator's address is "kobayasi@pu-toyama.ac.jp". If you have any questions on this English document, please contact this address. }