Paul DuBois
dubois@primate.wisc.edu
Wisconsin Regional Primate Research Center
Revision date: 20 May 1997
tblcvt reads troff input and converts the tbl-related parts to a format that troffcvt can understand more easily than raw tbl output.
This document describes tblcvt ("tbl convert"),
a program that assists the process of using troffcvt to
convert troff documents into other formats. It's assumed
here that you're familiar with tbl. If you don't have the
standard tbl documentation (Tbl - A Program to Format
Tables, by M. E. Lesk), check the archive site from which
you obtained the troffcvt distribution.
tblcvt exists because tables written in the tbl
input language present a problem for troffcvt. troffcvt
understands only the troff language and knows nothing of
the tbl language, so input files containing tables need
to be run through some sort of preprocessor before being given
to troffcvt. In theory, you could run your troff
files through tbl (since tbl generates output written
in troff), and feed the result to troffcvt for processing.
In practice, tbl output is generally arcane and incomprehensible,
and troffcvt doesn't do a very good job with it. The purpose
of tblcvt is to convert the parts of troff input
files that are intended for tbl into something that's easier
for troffcvt to understand. This makes it more likely that
troffcvt will generate output that its postprocessors will
be able to put back together into something that looks like a
table in the target format. Not every table will look great, but
the tables in this document are simple enough that they should
look reasonably good when the document is formatted with troff2html,
troff2rtf, or unroff.
tblcvt is intended as a drop-in replacement for tbl.
Suppose you'd normally format a document using a command like
this:
    % tbl file ... | troff [options]

The analogous command using tblcvt and troffcvt looks something
like this:

    % tblcvt file ... | troffcvt [options] | postprocessor

Or, if you use one of the front ends like troff2html that invoke
troffcvt and the appropriate postprocessor for you, the command
might look like this:

    % tblcvt file ... | troff2html [options]

If it seems that troffcvt or a front end is not reading the output
from tblcvt, specify - after the option list to tell it explicitly
to read the standard input after processing its other options:

    % tblcvt file ... | troffcvt [options] - | postprocessor
    % tblcvt file ... | troff2html [options] -
tblcvt ignores its input except for those parts between
corresponding pairs of .TS (table start) and .TE
(table end) requests. For each table, tblcvt digests its
specification, figures out the table structure, and produces troff
output that indicates the structure using a special set of requests.
The output format has the property that it explicitly indicates
the beginning and end of each table, each row within a table,
and each cell within a row. The general form of table information
written by tblcvt looks like this:
    .T*table*begin [table options]
    .T*column*info [column 1 options]
    ...options for remaining columns...
    .T*row*begin
    .T*cell*info [cell layout options]
    ...options for remaining cells in row...
    .T*cell*begin [cell formatting options]
    ...cell contents...
    .T*cell*end
    ...remaining cells in row...
    .T*row*end
    ...remaining rows in table...
    .T*table*end

Shortcut requests are used in certain circumstances. If a cell
is empty, tblcvt writes the single request .T*empty*cell rather
than .T*cell*begin, .T*cell*end, and the cell data between them.
Similarly, if a cell of the table matrix is part of the area spanned
by an earlier cell, tblcvt writes .T*spanned*cell. If an entire
row consists of a table-width line, tblcvt writes the single request
.T*row*line rather than .T*row*begin, .T*row*end, and the cell
information between them.
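The nesting of table, row, and cell requests can be illustrated
with a short Python sketch. It is not part of the troffcvt
distribution: the function name group_tables and the tuple layout
are assumptions made for this illustration, and the sample input
is hand-written and abbreviated (the .T*column*info and .T*cell*info
lines are left out), not captured tblcvt output.

```python
def group_tables(lines):
    """Group tblcvt-style requests into tables, rows, and cells."""
    tables, table, row, cell = [], None, None, None
    for line in lines:
        name, _, args = line.partition(" ")
        if cell is not None and name != ".T*cell*end":
            cell.append(line)                 # line of cell contents
        elif name == ".T*table*begin":
            table = []
        elif name == ".T*table*end":
            tables.append(table)
            table = None
        elif name == ".T*row*begin":
            row = []
        elif name == ".T*row*end":
            table.append(row)
            row = None
        elif name == ".T*row*line":           # whole row is a line
            table.append(("line", int(args)))
        elif name == ".T*cell*begin":
            cell = []
        elif name == ".T*cell*end":
            row.append(("data", cell))
            cell = None
        elif name == ".T*empty*cell":
            row.append(("empty",))
        elif name == ".T*spanned*cell":
            row.append(("spanned",))
        elif name == ".T*cell*line":
            row.append(("line", int(args)))
        # anything else (info requests, text outside tables) is skipped
    return tables


# Hand-written sample: a heading row, a table-width line, a data row.
SAMPLE = """\
.T*table*begin 3 2 0 L n n n n
.T*row*begin
.T*cell*begin 0 0 0
Name
.T*cell*end
.T*cell*begin 0 0 0
Size
.T*cell*end
.T*row*end
.T*row*line 1
.T*row*begin
.T*cell*begin 0 0 0
tblcvt
.T*cell*end
.T*empty*cell
.T*row*end
.T*table*end
""".splitlines()

TABLES = group_tables(SAMPLE)
```

Because every table, row, and cell is explicitly delimited, a
reader of this format never needs to look ahead more than one
request to know what kind of element it is dealing with.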
Note that since tblcvt output uses long request names,
you can't use compatibility mode (-C option) with troffcvt
or a troffcvt front end.
Each table begins with a .T*table*begin request, which
has the following form:
    .T*table*begin rows cols header-rows align expand box allbox doublebox

rows and cols are the number of rows and columns in the table.
(A row that draws a line is considered a data row.)
For tables that are specified to have a header (using .TS
H and .TH), tblcvt writes a non-zero value
for the header-rows value. Otherwise header-rows
is 0.
align is L or C to indicate the table is
left-justified or centered.
expand is y if the table is expanded to the full
line width, n otherwise.
The box, allbox, and doublebox values are
each y or n, depending on whether or not box,
allbox, and doublebox were given in the table specification.
(Note that allbox and doublebox both imply box.)
Each table is terminated by a .T*table*end request.
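As an illustration of the argument layout, the following Python
sketch splits a .T*table*begin line into named fields. The class
and function names are hypothetical, and the sample line is
hand-written rather than actual tblcvt output.

```python
from dataclasses import dataclass


@dataclass
class TableBegin:
    rows: int
    cols: int
    header_rows: int
    align: str        # "L" or "C"
    expand: bool
    box: bool
    allbox: bool
    doublebox: bool


def parse_table_begin(line):
    """Parse '.T*table*begin rows cols header-rows align expand box allbox doublebox'."""
    fields = line.split()
    if len(fields) != 9 or fields[0] != ".T*table*begin":
        raise ValueError("not a .T*table*begin request: %r" % line)
    rows, cols, header_rows = (int(f) for f in fields[1:4])
    align = fields[4]
    if align not in ("L", "C"):
        raise ValueError("align must be L or C, got %r" % align)
    flags = []
    for f in fields[5:]:
        if f not in ("y", "n"):
            raise ValueError("expected y or n, got %r" % f)
        flags.append(f == "y")
    return TableBegin(rows, cols, header_rows, align, *flags)


# Hand-written example: 4 rows, 3 columns, 1 header row, centered, allbox.
INFO = parse_table_begin(".T*table*begin 4 3 1 C n y y n")
```

Since allbox and doublebox both imply box, a consumer only needs
to test the box field to decide whether an outer frame is drawn.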
Following the .T*table*begin request, tblcvt writes
one .T*column*info line for each column of the table, in
the format:
    .T*column*info width sep equal

The column number is not specified; .T*column*info lines will
appear in consecutive order.
width is the minimum required width of the column. The
value is non-zero if any entry in the given column specified a
w option. If more than one entry specified w, the
last one is used. If width is 0, no entry in the column
specified w and the width is determined from the data values
in the column.
sep is the column separation value.
The equal value is y if any entry in the column
specified the e option, and n otherwise. All columns
with an equal value of y should be made the same
width.
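A postprocessor might collect the column information as in the
Python sketch below. The function name and dictionary keys are
made up for this illustration. The width and sep values are kept
as strings because their units are not stated here, and the
non-zero width in the sample is a placeholder, not real tblcvt
output.

```python
def parse_column_info(lines):
    """Collect .T*column*info lines; column numbers follow from their order."""
    columns = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[0] != ".T*column*info":
            continue                        # not a column-info request
        _, width, sep, equal = fields
        columns.append({
            "width": width,                 # minimum width, "0" if no w option
            "auto_width": width == "0",     # width comes from the column data
            "sep": sep,                     # column separation value
            "equal": equal == "y",          # column used the e option
        })
    return columns


# Hand-written sample for a three-column table.
COLUMNS = parse_column_info([
    ".T*column*info 0 3 n",
    ".T*column*info 144 3 y",
    ".T*column*info 0 3 y",
])

# Indexes (0-based) of the columns that should share one width.
EQUAL_COLUMNS = [i for i, col in enumerate(COLUMNS) if col["equal"]]
```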
If a table row does not consist of a table-width line, the row
begins and ends with .T*row*begin and .T*row*end
requests. Information for the individual cells is written between
these two requests (see "Cell Information
Requests").
If a row consists of a table-width single or double line, the
.T*row*begin and .T*row*end requests are not used.
Instead, the row is specified completely by a single .T*row*line
request, written using one of the following forms:
    .T*row*line 1    Table-width single line
    .T*row*line 2    Table-width double line
Between each pair of .T*row*begin and .T*row*end
requests, tblcvt writes out the information for each cell
(column) in the row. First a set of .T*cell*info lines
is written, one for each cell. These requests provide basic layout
parameters. Then the contents of the cells are written. For the
usual case, a cell is written using .T*cell*begin and .T*cell*end
requests, with the cell data appearing between the requests. Empty,
spanned, or line-drawing cells are written using .T*empty*cell,
.T*spanned*cell, and .T*cell*line requests.
This means that cells begin with any of .T*cell*begin,
.T*empty*cell, .T*spanned*cell, or .T*cell*line,
and end with any of .T*cell*end, .T*empty*cell,
.T*spanned*cell, or .T*cell*line.
The .T*cell*info request has the following form:
    .T*cell*info type vspan hspan vadjust border

The column number of the cell is not specified; .T*cell*info lines
will appear in consecutive order.
type is the cell type:
    L    Left-justified
    R    Right-justified
    C    Centered
    N    Numeric (align to decimal point)
    A    Alphanumeric

vspan and hspan are the number of rows and columns spanned by the
cell, including itself. Interpret these values as follows:

                 hspan = 0            hspan > 0
    vspan = 0    spanned both ways    spanned from above
    vspan > 0    spanned from left    not spanned
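The interpretation of the vspan and hspan pair can be written as
a small Python function. The function name is hypothetical; the
returned strings are the four cases listed above.

```python
def span_status(vspan, hspan):
    """Classify a cell from its vspan and hspan values."""
    if vspan > 0 and hspan > 0:
        return "not spanned"
    if vspan > 0:                 # hspan == 0
        return "spanned from left"
    if hspan > 0:                 # vspan == 0
        return "spanned from above"
    return "spanned both ways"    # vspan == 0 and hspan == 0
```

A cell that is "not spanned" may itself be the origin of a span:
its vspan and hspan then give the size of the area it covers.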
border is the border value. If the value is 0, there is
no border. Otherwise, the value is a bitmap with the following
fields:
    Bits    Value    Meaning
    0-1     1        Left border, single line
            2        Left border, double line
    2-3     1        Right border, single line
            2        Right border, double line
    4-5     1        Top border, single line
            2        Top border, double line
    6-7     1        Bottom border, single line
            2        Bottom border, double line
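The border bitmap can be unpacked two bits at a time, as in the
Python sketch below. It assumes that bit 0 is the least significant
bit of the value, which is not stated explicitly here, and the
function name is made up for this illustration.

```python
SIDES = ("left", "right", "top", "bottom")      # bits 0-1, 2-3, 4-5, 6-7
STYLES = {0: None, 1: "single", 2: "double"}    # field value -> line style


def decode_border(border):
    """Return a dict mapping each side to None, 'single', or 'double'."""
    result = {}
    for i, side in enumerate(SIDES):
        field = (border >> (2 * i)) & 0b11      # extract one 2-bit field
        if field not in STYLES:
            raise ValueError("undefined border field %d for %s" % (field, side))
        result[side] = STYLES[field]
    return result


# 73 = 1 (left single) + 2*4 (right double) + 1*64 (bottom single)
BORDERS = decode_border(73)
```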
The .T*cell*begin request has the following form:
    .T*cell*begin font ptsize vspace

font is the font to use for formatting the cell, 0 if no font was
specified.
ptsize is the point size to use for formatting the cell,
0 if no size was specified.
vspace is the vertical spacing to use for formatting the
cell, 0 if no spacing was specified.
The .T*cell*end request has no arguments:
    .T*cell*end

If a cell is empty or spanned or draws a line, the .T*cell*begin
and .T*cell*end requests are not used. Instead, the cell is specified
using one of the following requests:

    .T*empty*cell

The cell is empty. This is equivalent to:

    .T*cell*begin
    .T*cell*end

except that it's not necessary to scan ahead to the second request
to find out that the cell is empty.

    .T*spanned*cell

The cell is part of the area spanned by an earlier cell.

    .T*cell*line 0    Column-data-width single line
    .T*cell*line 1    Column-width single line
    .T*cell*line 2    Column-width double line

A column-data-width line is a single line as wide as the contents
of the column. It does not extend the full width of the column.
This type of cell results from a \_ data value in the table
specification.
The .T*xxx requests are defined in the default
actions file that troffcvt reads when it starts
up. The actions for the requests cause troffcvt to perform
a relatively simple mapping:
    tblcvt output               troffcvt output
    .T*table*begin arguments    \table-begin arguments
    .T*table*end                \table-end
    .T*column*info arguments    \table-column-info arguments
    .T*row*begin                \table-row-begin
    .T*row*end                  \table-row-end
    .T*cell*info arguments      \table-cell-info arguments
    .T*cell*begin arguments     \table-cell-begin
    .T*cell*end                 \table-cell-end
    .T*row*line argument        \table-row-line argument
    .T*cell*line argument       \table-cell-line argument
    .T*spanned*cell             \table-spanned-cell
    .T*empty*cell               \table-empty-cell

When a request written by tblcvt has arguments, the corresponding
control written by troffcvt has arguments that are similar, but
not necessarily identical. The primary exception is that the font,
ptsize, and vspace arguments to .T*cell*begin are converted directly
by the troffcvt actions file into font and size troff directives,
then translated into the troffcvt intermediate language. The font
and size controls appear in troffcvt output immediately following
the \table-cell-begin control. See troffcvt Output Format and
PostProcessor Writing for the exact format of the \table- controls.
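For reference, the name mapping can be expressed as a lookup table.
The Python sketch below covers only the request and control names;
it does not attempt to reproduce the argument conversion that the
actions file performs, and the names CONTROL_FOR_REQUEST and
control_for are invented for this illustration.

```python
# tblcvt request name -> troffcvt control name (names only, no arguments).
CONTROL_FOR_REQUEST = {
    ".T*table*begin": r"\table-begin",
    ".T*table*end": r"\table-end",
    ".T*column*info": r"\table-column-info",
    ".T*row*begin": r"\table-row-begin",
    ".T*row*end": r"\table-row-end",
    ".T*cell*info": r"\table-cell-info",
    ".T*cell*begin": r"\table-cell-begin",
    ".T*cell*end": r"\table-cell-end",
    ".T*row*line": r"\table-row-line",
    ".T*cell*line": r"\table-cell-line",
    ".T*spanned*cell": r"\table-spanned-cell",
    ".T*empty*cell": r"\table-empty-cell",
}


def control_for(request_line):
    """Return the control name for a tblcvt request line, or None."""
    parts = request_line.split(None, 1)
    if not parts:
        return None
    return CONTROL_FOR_REQUEST.get(parts[0])
```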
In addition to the .T*xxx request names used
by tblcvt, troffcvt uses the register names T*cell*ft,
T*cell*ps, and T*cell*vs for internal purposes.
Table specifications may indicate that a table element spans multiple
rows or columns, or both. However, not all spanning specifications
are legal, and tblcvt tries to catch those that are malformed.
The spanning constraints enforced by tblcvt are:
    .TS          .TS
    s .          ^ .
    data         data
    .TE          .TE

Assuming the first two constraints are satisfied, the smallest
illegal table specifications that include spans are shown below
(l is used here, but any non-spanning column type may be substituted):
    .TS          .TS
    l l          l s
    ^ s .        l ^ .
    data         data
    .TE          .TE

The first table is illegal by the following reasoning. The cells
in the first column form a single vertically-spanned element. The
second column could be part of that element if both cells spanned
to the left, since the resulting spanned area would be rectangular.
However, since only one of the cells spans to the left, the spanned
area is L-shaped, which is illegal. The second table is illegal
by similar reasoning. The top two cells form a single element.
The bottom two cells could be part of that element if they both
spanned upward, but only one of them does.
tblcvt uses the strategy outlined below to determine the
extent of spanned elements and to discover non-rectangularities
in cell spanning. The strategy works by operating on a matrix
with one column for each column specified in the format section
of the table specification, and one row for each row of table
data given in the data section of the specification. Working from
left to right and top to bottom, each cell of the matrix is visited
and the following checks are applied:
Example 1: The table shown below is illegal.
    .TS
    l s s l
    ^ s s s .
    data
    .TE

Beginning at the upper left, we see that the vertical and horizontal
spans are 2 and 3. The remaining cells in this 2 x 3 block are
the second and third cells in the second row. They both span into
the first cell of the second row, so they are part of the span
block. Therefore, the upper 2 x 3 block is okay. The next unvisited
cell is the fourth cell in the first row. This cell is a standalone
cell, so it's okay. The last unvisited cell is the fourth cell
of the second row. This cell is a spanned cell, but it can't span
into the block to the left without forming a non-rectangular block.
The table specification is bad.
Example 2: Here's a table that appears at first glance
as though it may be illegal. Is it?
    .TS
    l s s s
    ^ s ^ ^
    ^ ^ s s .
    data
    .TE

Beginning at the upper left, we see that the vertical and horizontal
spans are 3 and 4. This means 12 cells should be in the span block.
We know that the three s cells to the right of the corner cell
and the two ^ cells below the corner cell are part of the block,
so the next step is to examine the remaining 2 x 3 block at the
lower right. In the second row of the block, the second cell spans
left into the first column (and is thus part of the span block),
and the third and fourth cells span up into the first row (and
are thus part of the span block). In the third row, the second
cell spans up into the second row (and is thus part of the span
block since that row has already been determined to be part of
the block), and the third and fourth cells span into the second
cell (which, since that cell has just been determined to be part
of the block, makes the last two cells part of the block as well).
Therefore, in spite of its unusual specification, the table is
legal. It consists of a single 3 x 4 spanned entry.
Example 3: Span calculations are performed with separate
matrices for vertical and horizontal spans that initially assume
all spans are 1. Suppose we have a table specification that looks
like this:
    .TS
    l s l
    l s l
    ^ s l .
    a1 a2
    b1 b2
    c
    d
    .TE

There are three format columns. There are three format rows but
four data rows, so the last format line is used for the third and
fourth data rows. The vertical and horizontal span matrices are
4 x 3, and start out like this:

    vertical    horizontal
    1 1 1       1 1 1
    1 1 1       1 1 1
    1 1 1       1 1 1
    1 1 1       1 1 1

After calculating spans, the matrices end up like this:

    vertical    horizontal
    1 1 1       2 0 1
    3 3 1       2 0 1
    0 0 1       2 0 1
    0 0 1       2 0 1
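
The Python sketch below shows one way to compute such span matrices
and to reject non-rectangular spans. It is an independent
illustration written for this document, not the code tblcvt uses,
and the function name span_matrices is hypothetical. It assumes
each format row is given as a list of column types with any options
already stripped, and that all rows have the same number of columns.

```python
SPAN_TYPES = ("s", "^")


def span_matrices(format_rows, data_rows):
    """Return (vspan, hspan) matrices; raise ValueError on a bad span."""
    rows = [list(r) for r in format_rows]
    while len(rows) < data_rows:          # last format row is reused
        rows.append(list(rows[-1]))
    n_rows, n_cols = len(rows), len(rows[0])
    vspan = [[1] * n_cols for _ in range(n_rows)]
    hspan = [[1] * n_cols for _ in range(n_rows)]
    claimed = [[False] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if rows[r][c] in SPAN_TYPES:
                if not claimed[r][c]:     # span cell outside any block
                    raise ValueError("bad span at row %d, column %d"
                                     % (r + 1, c + 1))
                continue
            # Non-span cell: upper left corner of a (possibly 1 x 1) block.
            h = 1
            while c + h < n_cols and rows[r][c + h] == "s":
                h += 1
            v = 1
            while r + v < n_rows and rows[r + v][c] == "^":
                v += 1
            for dr in range(v):
                for dc in range(h):
                    cell = rows[r + dr][c + dc]
                    if (dr, dc) != (0, 0) and cell not in SPAN_TYPES:
                        raise ValueError("non-rectangular span at row %d, "
                                         "column %d" % (r + 1, c + 1))
                    claimed[r + dr][c + dc] = True
                    vspan[r + dr][c + dc] = v if dr == 0 else 0
                    hspan[r + dr][c + dc] = h if dc == 0 else 0
    return vspan, hspan


# Format section of Example 3, applied to four data rows.
EXAMPLE3 = [["l", "s", "l"], ["l", "s", "l"], ["^", "s", "l"]]
VSPAN, HSPAN = span_matrices(EXAMPLE3, 4)
```

For the Example 3 format this yields the two matrices shown above.
The Example 1 format raises ValueError, and the Example 2 format
is accepted as a single 3 x 4 block.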