"plain2 -README-" English version (original: Japanese)
June 7, 1990
by Akihiro Uchida@NEC
NOTICE: Text enclosed in {...} is a comment from the translator.
{ Japanese characters are encoded with a 2-byte system called
"Kanji code". There are three main Kanji codes: JIS, Shift-JIS and EUC.
JIS is a 7-bit code that uses escape sequences; the others are 8-bit
codes. The plain2 source files are written in EUC. }
How to install
- Do "make" in the directory in which source files exist.
- The plain2 assumes an EUC Kanji code environment. EUC is used as
the internal code of the plain2 source files by default.
(The Kanji I/O code can be selected by an option.)
Some of the source files contain Kanji strings as constants.
If their Kanji code is converted improperly, the plain2 will not work
correctly.
Installation for System V
- The plain2 is distributed for 4.xBSD UNIX.
- When you compile on System V, delete the "-DBSD" flag of "CFLAGS"
in the Makefile.
- before:
CFLAGS= -DBSD -DDEBUG
- after:
CFLAGS= -DDEBUG
Installation for Shift-JIS environment
- You can select Shift-JIS as the Kanji I/O code with the "-s" option,
even if your plain2 was compiled in an EUC environment.
- You can also change the internal Kanji code to Shift-JIS. Then you can
edit and compile the source files on a Shift-JIS system, such as MS-DOS.
- To change the internal Kanji code to Shift-JIS:
- convert all source files from EUC to Shift-JIS.
- redefine the variable "INTERNAL_CODE" of "plain2.h" as "CODE_SJIS".
- CAUTION:
A compiler that accepts Shift-JIS source code is required.
You cannot select EUC as the Kanji I/O code when the plain2 is
compiled in a Shift-JIS environment.
NTT TeX and ASCII TeX
How to start plain2
% plain2 [option] [input-files ...]
- examples:
% plain2 -tex plain.pln > plain.tex
% plain2 -roff plain.pln > plain.t
Options for selection of output
-roff
selects `roff' output. "\" and "'" become "\e" and "\'", and
"." at the beginning of a line becomes "\&.".
The default output is `roff'.
-tex
selects `LaTeX' output. Special characters are quoted one by
one, e.g. "-", "=" and "-=" become "$-$", "$=$" and "$-$$=$".
-texq
selects `LaTeX' output. With this option, 1-byte alphabetic
characters are quoted with "\verb||". This function is incomplete.
-raw
suppresses quoting.
-nospace
suppresses excessive blank lines.
-nopre
suppresses the header preamble.
-hifi
keeps the output as close to the input image as possible,
e.g. the format of list numbers.
-index
outputs an index.
-acursec
outputs section numbers as they appear in the input.
By default, sections are renumbered consecutively, regardless of
irregular numbering in the input file.
-j
outputs JIS code. The default code is EUC.
-s
assumes the input is Shift-JIS. The output is also Shift-JIS.
{ The following options are extracted from the body of this document. }
-inline
permits in-line operations (see section 6.3.).
-full
extends the reference labels of figures/tables (see subsection 6.3.3.).
-renum
only renumbers list/section numbers of input texts.
- CAUTION:
The above options cannot be used in the option field of the
input files.
Changing of analysis parameters
- If the output differs from what you expect, you can adjust it by
changing the analysis parameters.
- The analysis parameters can be described in the option field of the
input files.
-table=ddd
adjusts a parameter for detecting a table.
The "ddd" range is from 0 to 100. The default is 50.
If a part is not output as a table when you expect it to be,
increase the parameter value slightly.
-exam=ddd
adjusts a parameter for detecting an example block.
The "ddd" range is from 0 to 100. The default is 50.
If a part is not output as an example block when you expect it
to be, increase the parameter value slightly.
-notable
suppresses using tables.
-nopic
suppresses using pictures (drawings).
-ktable
permits the use of JIS special codes (Keisen, ruled-line
characters) to describe a table. This is not recommended because
Keisen codes may conflict with picture detection.
-indsec
accepts indented section numbers. Normally, section numbers
should be placed at the beginning of a line. This option is
for documents written without such care.
-rmpage
removes new page codes (ctrl-L). Headers and footers near a
ctrl-L are also removed.
Miscellaneous options
-v
displays verbose messages during processing. It also warns about
discontinuous section/list numbers.
-help
displays a short help message.
Options via an environment variable
Embedding options in the input files
- You can set the analysis parameters in the option field of the input
files. (See 3.8.1)
Structure of input texts
- The plain2 analyzes the following text structures using information
such as indentation, blank-line delimiters and marks at line heads.
- Title
- Section title
- Example block (program output, a simple figure, etc.)
- Citation
- Flat text
- List
- Item type
- Enumerate type
- Description type
- New page and blank line
- Table
- Picture with drawing characters
{ In this chapter, many examples are given using 2-byte Kanji characters.
Because they are meaningless in the English version, I omitted part of
the contents. Sorry. }
Section titles
- A line is recognized as a section title, if it is featured as follows:
- is not indented,
- has a section number, and
- is the beginning line of the file or comes after a blank line.
A section title never overlaps with other structures.
- A section number is a sequence of numbers separated by ".".
At the top level, the section number must end with ".".
At the 2nd (or deeper) level, the trailing "." can be omitted when
spaces follow the number.
- In the following examples, "Gray" means a case that may be confused
with a list (described later). Putting a blank line before the section
title ensures correct analysis.
>> 1 Mistake NG (A dot is required at the tail.)
>> 1. Confusion Gray (This may cause a confusion with a list.)
>> 1.1Sub Section NG (Space(s) are required after the number.)
>> 1.1 Sub Section Good (A dot can be omitted.)
>> 1.1.Sub Section Good (A dot can be omitted.)
- If the section numbering in the input files starts not with "1." but
with, e.g., "3.4" (as in a part of a large document), the plain2 uses
the same section numbers in its output. Otherwise, the section numbers
in the input are not used as-is: only their levels are used, and
sections are renumbered consecutively (discontinuities such as "1."
followed by "3." are ignored).
- To detect discontinuous section numbers, use the "-v" option.
List
- There are three types of a list; item, enumerate and description.
Lists can be nested (within the limits of `troff' or `LaTeX').
- Other structures, e.g. flat texts or example blocks, can appear
inside a list.
- A list of item/enumerate type may be output with different head
marks. For example, the input "(1)" is output as "1.".
If you want the output to keep the input marks, use the "-hifi" option.
Item type
- An item type is a list that begins with one of the following marks.
- Bullet: "o ", "* ", "+ "
- Dash: "-"
- `troff' distinguishes bullet lists from dash lists, but treats all
bullet marks as the same kind. `LaTeX' treats all bullets and dashes
as the same kind.
Enumerate type
- An enumerate type is a list numbered with alphanumeric characters or
roman numerals. The syntax below shows how an enumerate type is
distinguished from a section number.
  +--<left and right are parenthesis>--+  +--<numbers>--------+
  |                                    |  |                   |
--+--<right is parenthesis>------------+--+--<alphabets>------+--
  |                                    |  |                   |
  +--<dot is in right>-----------------+  +--<roman numbers>--+
Syntax of enumerate type.
- Numbers must be followed by a dot or by a right parenthesis, or be
enclosed in parentheses on both sides.
Example of enumerate type list.
- Because the above syntax rule conflicts with the section number
definition,
- the form of "2-byte characters + dot" is not available, and
- the form of "number + dot" may be analyzed as a section number.
- If the inputs contain no Kanji codes (alphabets only), the form of
"alphabet + dot" is not available.
- As with section numbers, the enumerate type ignores the exact values
of the alphabets or numbers and relies on their format.
- In a sequence of list items, if the alphabet or number is reset to
its first value ("A" or "1"), that line is regarded as the head of a
new list.
- Discontinuity of list values is reported by the "-v" option.
- See the next example. Between "the second" and "the first again",
the list number is reset to "1". Consequently, "the first again" is
regarded as the beginning of another list.
- The numbers continue from "2. the second again" to "3) the third".
However, they are handled as separate lists, because they have
different formats.
>> 1) The first
>> 2) The second
>> 1. The first again
>> 2. The second again
>> 3) The third
>> 4) The fourth
- The first
- The second
- The first again
- The second again
- The third
- The fourth
Description type
- You can define a description-type list by enclosing the list title in
"[...]" or by appending ":" to it.
>> [item1] Description type A
>> [item2] Description type A
>> itemA: Description type B
- [item1]
Description type A
- [item2]
Description type A
- itemA:
Description type B
Line(s) after list
- Generally speaking, it is not easy to judge whether the line(s)
following the head line of a list are part of the list or not.
- In the following example, the first item, "Item type list", does not
continue to the next line, but the second item, "Enumerate type list
is", does continue to the next line.
>> * Item type list
>> A list which begins with the following marks...
>> * Enumerate type list is
>> a list which is numbered with alpha-numeric...
- A complete judgement would require semantic analysis.
- In the plain2, the head line of a list continues to the next line
when
- the next line starts between the list mark and the list body, and
- the head line of the list is longer than half of the right margin.
>> * Because the next line is indented similarly with the
>> list body, this line is a part of the list.
>> * Because the next line is indented similarly with the
>> list mark, this line is also a part of the list.
>> * Because the next lines are
>> indented much more than the head line of a list,
>> these lines are analyzed as other texts.
>> * When the head line is
>> shorter than the next (this) line, newline code is added.
- A list consisting only of single-line items is a special case.
The final line of such a list does not continue to the next line,
even if the next line has the same indentation as the list mark.
>> Do you know the words below?
>> minix: mini size unix for PC
>> xinu: xinu is a reversed image of unix
>> gnu: gnu is not unix!
>> These are concerning on a UNIX concept ...
Do you know the words below?
- minix:
mini size unix for PC
- xinu:
xinu is a reversed image of unix
- gnu:
gnu is not unix!
These are concerning on a UNIX concept ...
Judgement basis as list
>> The circular constant is
>> 3.14159...
>> From 3: 15 pm today, there is a presentation ...
Consequently, a single line that looks like an enumerate or description
item does not form a list.
Example block
>>The network is shown below. (This is a bad case.)
>>
>> to XXNet
>> |
>> +---+---+
>> | CISCO |
>> +---+---+
>> |
>> ---+----+----+----------+----------+---- 131.130.29
>> | | | |
>>+----++ +--+---+ +--+---+ +--+---+
>>|ns-in| |host-A| |host-B| |host-C|
>>+-----+ +------+ +------+ +------+
>>
>>This is connected with XXNet via 56Kbps private line.
Indenting the example block by more than four spaces solves this
problem.
>>The network is shown below. (This is a good case.)
>>
>> to XXNet
>> |
>> +---+---+
>> | CISCO |
>> +---+---+
>> |
>> ---+----+----+----------+----------+---- 131.130.29
>> | | | |
>> +----++ +--+---+ +--+---+ +--+---+
>> |ns-in| |host-A| |host-B| |host-C|
>> +-----+ +------+ +------+ +------+
>>
>>This is connected with XXNet via 56Kbps private line.
Picture with drawing characters
{ Kanji code includes special characters for composing drawings. If the
plain2 is properly ported to non-Japanese systems, the same logic for
analyzing pictures may be applicable to 8-bit graphic characters.
However, I have no way to reproduce the Kanji characters used in this
section, so I have omitted its translation. Sorry. }
Rules for connection among dash lines and thick lines
Special rules for slant lines
Ellipse (picture of floppy disk) and thick lines
Citation
- Citations from mail or news are not filled automatically but are
left as they are. The conditions for a citation are as follows.
- The text must not be indented.
- Every line of the text starts with ">>", and there are blank lines
before and after the text.
- Every line of the text starts with ">" or ":" and has a second
character (a space, ":", and so on). This pattern continues for more
than three lines.
Table
- The plain2 detects a table when items are arranged in regular columns
separated by blank characters (SPC, TAB) or drawing characters ("-",
"+", "|", "=").
- The next example is a table with blank characters.
>> A32 AirBus A320 166
>> B6 Boeing 767 270
>> F50 Fokker 50 56
>> M87 MD-87 134
>> YS YS-11 64
- The next is a table with drawing characters.
>> -----------------------------
>> | Symbol | Company          |
>> |===========================|
>> | A32    | AirBus           |
>> |--------+------------------|
>> | B6     | Boeing           |
>> |--------+------------------|
>> | F50    | Fokker           |
>> |--------+------------------|
>> | M87    | Macdonell Duglas |
>> -----------------------------
- The following example is a complicated table with drawing characters.
Because the plain2's analysis has limitations, I don't recommend making
complicated tables.
>> -----------------------------------------
>> | Form   | Feature || Symbol | Material |
>> |=======================================|
>> | Solid  |         || Au     | Gold     |
>> |--------| Metal   ||--------+----------+
>> |        |         || Hg     | Mercury  |
>> |        |---------||--------+----------+
>> | Liquid | Drink   || H2O    | Water    |
>> |        |---------||--------+----------+
>> |        |         || Br     | Bromine  |
>> |--------| Halogen ||--------+----------+
>> | Gas    |         || Cl     | Chlorine |
>> |----------------------------+-----------
- For correct analysis by the plain2, a table with blank characters
requires more than two columns and more than two lines, and a table with
drawing characters requires more than three lines.
Flat text
- Text that does not match any of the above formats is classified as
flat text. The head line of a paragraph may be indented by up to two
spaces.
- The plain2 detects a paragraph boundary after blank line(s) or before
an indented line.
>> This transcript is from the file A56-7W, classified top-secret,
>> subject is AIR WOLF. A Mach-1 plus attack helicopter with the most
>> advanced weapon system in the air today.
>> It has been hidden somewhere in the western United States by its
>> test pilot, Stringfellow Hawke. Hawke has promised to return AIR WOLF
>> only if we can find his brother St. John, an M.I.A. in Vietnam.
>>
>> We suspect that Arch Angel, Deputy Director of the agency that built
>> AIR WOLF, is secretly helping Hawke in return for Hawke's flying AIR
>> WOLF on missions of national concern. (TV program: AIR WOLF)
Title block
- You can put a title block as the first block of a file.
- A title block is detected by the following patterns, listed in order
of priority.
{ The patterns use Japanese words in 2-byte codes. The translations in
this section do not work in the English version. A good port of the
plain2 would solve this problem, I think. }
- Title name:
- a string after "Title:" to the line end, and
- a region enclosed with "[ ]" or "< >".
- Distribution:
- a string after "Distribution:" to the line end, and
- a whole line including "Mr. and Ms. " or "To:".
- Owner:
- a string after "Owner:" or "By:" to the line end.
- Section:
- a string after "Section:" to the line end, and
- a whole line including "Headquarter:", "Laboratory:" or
"Factory:".
- Address:
- a string after "Address:" to the line end, and
- a whole line including "Tel:", "Voice:", "Fax:" or "E-mail:".
- Date:
- a string after "Date:" to the line end,
- a whole line including months ("Jan.", "Feb." etc.),
- a whole line including days ("Mon.", "Tue." etc.),
- a whole line including years ("89/", "90/", ..., "95/"), and
- a whole line including years ("/89", "/90", ..., "/95").
- CAUTION:
These patterns are tested strictly. All lines in the first block of
the file must match the above patterns. For instance, if there is a
line including "Data:" (a misspelling of "Date:"), the block is not
detected as a title block.
- CAUTION:
The expression `a string after "xxx"' means a region defined by the
following rules, which allow the region to contain spaces.
- If the line contains two or more consecutive spaces, the region
extends from those spaces to the line end.
- Otherwise, the region extends from the line head or the first space
to the line end.
Extension of title block
New page and blank line
- A line that begins with the "ctrl-L" code is regarded as a paging command.
- Five or more blank lines are output as blank space of the same length.
- Four or fewer blank lines are compressed when the "-nospace" option is used.
Outputs
- To reproduce the screen image, special characters are output in
quoted form.
- For example, `\' is outputted as `\e' for `troff' and as `$\backslash$'
for `LaTeX'.
- To embed `troff' or `LaTeX' commands in the input files, there is an
option to pass them through unquoted. This is called the "raw mode" of the plain2.
{ The option may be "-raw", I think. }
Explicit control of plain2
- If the plain2 does not work as you expect, you can control its
analysis explicitly by enclosing text with "[[X" and "]]X", where `X' is
one of the control characters listed below. "[[X" and "]]X" must be
at the beginning of a line.
- Four control characters are available:
- E:
Example block (program output, a simple figure, etc.)
- T:
Table
- P:
Picture with drawing characters
- R:
Raw mode, which bypasses the plain2 processing
(used for inserting markup commands, etc.).
- These controls cannot be nested. For instance, an example block must
not contain the string "]]E".
Experimental functionality
- This chapter describes functionalities implemented experimentally.
- They may not work well in some cases, and they may be changed
drastically in the future.
Option to remove paging boundary
- The "-rmpage" option removes new page codes (ctrl-L).
- Headers and footers near a ctrl-L are also removed.
Functionality to detect title of figure/table
- The plain2 detects the title of a figure/table from a pattern that
consists of one of the following words, a number, and a title string.
The number is renumbered consecutively in the output.
"Figure", "Fig.", "fig.", "Table",
"HYOU" {a Kanji character (=table)}, and
"ZU" {a Kanji character (=figure)}.
- This functionality is incomplete: it does not keep the figure/table
numbers consistent with the numbers used in the text body.
- A new page may also fall between a figure/table and its title.
In-line operation
- By enclosing text with "((x" and "))", where `x' is a control
character, the following in-line operations become available. You must
use the "-inline" option.
- This functionality is limited to a single block of flat text and to
the head line of a list. It is not allowed in section titles, example
blocks, and so on.
Foot note (f)
- A foot note is defined by enclosing texts with "((f" and "))".
Raw mode (r)
- An output in raw mode is given by enclosing texts with "((r" and "))".
The raw mode texts are not quoted.
Reference (x)
Font size designation
- This functionality is under improvement.
Appendix
>> Appendix References
>> APPENDIX A Program lists
Renumbering
- The "-renum" option renumbers list/section numbers that were
numbered arbitrarily.
Local option for the software group of NEC C&C Lab.
{ This functionality depends on the local `troff'/`LaTeX' environment
used in the software group of NEC C&C (Computer and Communication) Lab.
The translation is omitted. }
Distribution of the plain2