\documenttype{article}
% Use gellmu-latex-faq
% This is GELLMU source for the didactic "article" document type.
% Revised from the submitted preprint
\surtitle{W. F. Hammond: Presentations: Bridge from LaTeX to XML}
\latexcommand{\bsl;hyphenation\{gell-mu mark-up new-com-mand\}}
\newcommand{\gellmu}{\abbr{GELLMU}}
\newcommand{\gnu}{\abbr{GNU}}
\newcommand{\info}{\softw{Info}}
\newcommand{\html}{\abbr{HTML}}
\newcommand{\pdf}{\abbr{PDF}}
\newcommand{\sgml}{\abbr{SGML}}
\newcommand{\self}{%
http://math.albany.edu:8000/math/pers/hammond/Presen/tug2001}
\newcommand{\xml}{\abbr{XML}}
\newcommand{\txi}{\softw{Texinfo}}
\newcommand{\href}[2]{\anch[href="#1"]{#2}}
\title{GELLMU: A Bridge for Authors\\
from \latex; to XML}
\subtitle{Presentation at TUG 2001, University of Delaware}
\author{William F. Hammond}
\address{Department of Mathematics \& Statistics\\
University at Albany\\
Albany, New York 12222 (USA)\\
\eaddr{hammond@math.albany.edu}\\
\urlanch{http://www.albany.edu/\tld;hammond/}
}
\date{August 2001\\
(slightly revised: January 2003)}
\nobanner
\begin{document}
\begin{abstract}
\gellmu, which stands for \quophrase{Generalized Extensible \latex;-Like
Markup}, is a system for using \latex;-like markup, though not \latex;
itself, to write consciously for a markup language in the \sgml category
or in its popular \xml subcategory.
The \emph{basic} level of \gellmu offers a way to use \latex;-like notation
together with a \latex;-like \emph{newcommand} (with arguments) macro
facility to write web pages.
The \emph{advanced} level of \gellmu enables one additionally to
incorporate certain \latex;-like features, such as the use of a blank
line for a new paragraph, in writing for an \sgml language.
The didactic \gellmu production system provides an \quophrase{article}
\xml language, with some resemblance to \latex; itself, that is a
rigorous domain for translation to other formats.
\end{abstract}
\tableofcontents
\section{Author Level Markup}
Inasmuch as the World Wide Web is becoming an important library
resource, one wants one's publications to be accessible online, and one
wants web-crawling robots to be able to catalogue them properly.
Despite the popularity of Adobe's Portable Document Format (\pdf),
the distribution of \pdf reading software is not as widespread as
the distribution of web browsing software, and web-crawling robots
often do not scan the contents of \pdf documents.
What is available for the \latex author toward this end? More
specifically, consider the following situations:
\begin{description}
\item[Online publication archives]
Specifically, I would like to cite the \tex;/\latex;-based e-print archive
begun at Los Alamos in the early 1990s by Paul Ginsparg, known today as
\href{http://www.arxiv.org/}{\quophrase{ArXiv}}, which is now a
participant in the \href{http://www.openarchives.org/}{Open Archives
Initiative}. While in its early days the term \emph{e-print} was
understood to mean \quophrase{electronic pre-print}, ArXiv has more
recently become a repository for established journals, including,
for example, the highly regarded
\href{http://www.math.princeton.edu/\tld;annals/}{\emph{Annals of
Mathematics}}, which was founded in 1884 by Ormond Stone of the
University of Virginia. Ginsparg now tells us that the term
\emph{e-print} denotes \quophrase{self-archiving by the author} under
a new overall academic publication\footnote{
Paul Ginsparg, \quophrase{Electronic Clones vs. the Global Research
Archive}, \urlanch{http://arXiv.org/blurb/pg00bmc.html}.
}
design.
\item[Course handouts]
How can a college teacher prepare course handouts for both paper and online
distribution? If the teacher writes \latex, some manual intervention will
likely be needed in order to obtain correct \html. If the teacher
writes \html, then the paper distribution\footnote{If the \html is correctly
written, then robust translation to \latex is possible.}
will be limited by what can be expressed in \html, which is not as rich
a markup as \latex.
\item[TUG articles]
Before preparing a TUG 2001 article an author is asked to read
\href{http://www.tug.org/TUGboat/Contents/contents20-4.html}{\emph{Preparation
of documents for multiple modes of delivery}}
by Ross Moore, which is available on the web only as a two-column
\pdf printed page image. From this
article one might conclude that carefully prepared \latex; may be
suitable for translation to \html although no \html version of the
article seems to be available.
\item[GNU documentation]
While working for TUG on the \tex; Directory System (\abbr{TDS}) guidelines
--- see
\href{http://ctan.tug.org/tex-archive/tds/standard/}{\path{/tds/standard/}}
at \abbr{CTAN} ---
in January 1998, Ulrik Vieth produced a \latex; document
and a tailored
program for translating that document into \txi, the language of the
\href{http://www.gnu.org/}{GNU} documentation system.
\txi is a \tex;-based system that pre-dates \html. Its original
purpose was to provide both print and (early online hypertext) \info
versions of \gnu software project documentation. When \html came
along, it was possible to provide fairly reliable translation from \txi
to \html because \txi is a well-structured markup. In fact, \txi is
very nearly equivalent to an \sgml language, and in August 2000
Daniele Giacomini produced an effort in that direction:
\href{http://master.swlibero.org/\tld;daniele/software/sgmltexi/
}{\softw{Sgmltexi}}.
\end{description}
Although programs are available for translating carefully structured
\latex; into \html and sometimes into \xml extensions of \html, this
method of generating online content for the basic level of the web
sometimes requires manual intervention. A more direct approach to the
world of \sgml offers better prospects for long-term access to new web
formats without sacrificing access to the quality of print typesetting
that is available through \latex;.
\section{The basics of \emph{basic} \gellmu}
In looking over Vieth's set-up for the \abbr{TDS} document in the late
spring of 1998, I arrived at the idea of using \latex;-like notation
for conscious writing in document languages under \sgml, and I have
written a program in the GNU Emacs Lisp language, the \gellmu
syntactic translator, for converting this \latex;-like markup to
\sgml markup.
The advantage of \sgml markup is that each markup language (formally
document type) under the \sgml umbrella constitutes a structured
domain for the application of automatic processors that are easy to
create under any of a number of structured processing frameworks. There
are frameworks accessible in standard computing languages, and there
is also a recent framework
\href{http://www.dcarlisle.demon.co.uk/xmltex/}{\softw{xmltex}} by
David Carlisle for writing \tex; typesetting routines for \xml
document types.
The root idea in using \latex;-like markup for the conscious writing of
markup under \sgml is the simple syntactic correspondence between
markup such as
\begin{verbatim}
some \em{emphasized} text
\end{verbatim}
on the one hand, and the markup
\begin{verbatim}
some <em>emphasized</em> text
\end{verbatim}
on the other.
Most \latex; commands are analogous to \sgml elements. Moreover, the
attribute list associated with an \sgml element can be made to correspond
with a \latex; command option. For example,
\begin{verbatim}
\a[href="http://foo.dom/"
]{The Foo Domain}
\end{verbatim}
matches
\begin{verbatim}
<a href="http://foo.dom/">The Foo Domain</a>
\end{verbatim}
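As a further illustration, assuming that the correspondence nests in the
obvious way and that one is writing for \html, whose element names
\emph{p}, \emph{a}, and \emph{em} are used here, a small paragraph
might be written
\begin{verbatim}
\p{See \a[href="http://foo.dom/"]{The
\em{Foo} Domain}.}
\end{verbatim}
which would be expected to match
\begin{verbatim}
<p>See <a href="http://foo.dom/">The
<em>Foo</em> Domain</a>.</p>
\end{verbatim}
This is a sketch of the intended correspondence rather than verified
translator output.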
\section{Basic \gellmu enhanced with \emph{\bsl;newcommand}}
The idea of using \latex;-like syntax for conscious writing under an
\sgml document type gains power when one realizes that although
the notion of \sgml entity provides, among other things, simple macro
expansions, there is no provision under \sgml for macros that take
arguments. Moreover, there is no obvious method of extending \sgml
systems to accommodate macros with arguments apart from the idea
of extending a document type\footnote{
While document type extensions require enough work that they cannot be
spontaneous, they provide a sound way to avoid the tangles that can
arise working with \tex; or \latex; when attempting the simultaneous
use of conflicting macro packages.
}.
\gellmu provides a \latex;-like meta-command\footnote{
In \gellmu while a \emph{command} corresponds to an \sgml element, a
\emph{meta-command} is something having the same syntax
as a command that does not correspond to an \sgml element and instead
receives resolution into other \sgml markup under the syntactic translator.
} called \emph{newcommand} that may be invoked with arguments.
For example, if one writes
\begin{verbatim}
\newcommand{\afoo}[2]{%
\a[href="http://www.foo.org/#1"]{#2}}
\end{verbatim}
then a subsequent invocation
\begin{verbatim}
\afoo{tex-archive/tds/}{TDS at Foo}
\end{verbatim}
will yield (without line breaks):\footnote{
The syntactic translator maintains line number alignment between its
input and its \sgml output so that line numbers used by \sgml
parsers in flagging errors match those in source markup.
}
\begin{verbatim}
<a href="http://www.foo.org/tex-archive/tds/">TDS at Foo</a>
\end{verbatim}
This \emph{newcommand} markup differs from that of \latex; in that it
is classical macro substitution rather than vocabulary expansion.
Since the syntax of a \emph{newcommand} invocation is very similar to
that of an \sgml element, the use of \emph{newcommand} can, apart
from its on-the-fly convenience, be a help in the development of
\sgml document type extensions. A new name in a test document can be
moved from being that of a macro to being that of an element simply
with the removal of a \emph{newcommand} definition.
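A minimal sketch of that migration, using the hypothetical name
\emph{pkg}, which is not part of any document type mentioned here:
with the definition
\begin{verbatim}
\newcommand{\pkg}[1]{\em{#1}}
\end{verbatim}
in force, the invocation
\begin{verbatim}
\pkg{xmltex}
\end{verbatim}
should expand by macro substitution to
\begin{verbatim}
<em>xmltex</em>
\end{verbatim}
whereas once the definition is removed the same source line would
instead be taken as an element,
\begin{verbatim}
<pkg>xmltex</pkg>
\end{verbatim}
which is valid only if the document type has been extended to
declare \emph{pkg}.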
\section{\sgml vs. \xml}
Basic \gellmu as enhanced by its macro facility is as far as one can
sensibly go toward conscious writing under a language in the
restricted subfamily of \sgml document types known as \xml.
From one viewpoint the differences between \sgml and \xml are not
very important since most documents under the larger category
can, if correct, be automatically translated into equivalent documents
under \xml.
For example, classical \html that passes validation can be translated
into the newer \xml form of \html using either James Clark's classical
\sgml library
\href{http://www.jclark.com/sp/}{\softw{SP}} or Dave Raggett's program
\href{http://www.w3.org/People/Raggett/tidy/}{\softw{tidy}}.
However, the rules for \xml were designed to make things easy for
processors rather than for humans, and for that reason an author
writing toward an ultimate \xml document type usually is well-advised
to write for a version of the document type under more author-friendly
\sgml rules.
For example, if in an \sgml language a forced linebreak is represented
by the defined-empty element \emph{brk}, then the markup
\quostr{<brk>} is sufficient, whereas under the more restrictive \xml
version of the same language, either the markup \quostr{<brk></brk>}
or its abbreviated form \quostr{<brk/>}\footnote{
Moreover, some confusion may arise from the fact that under the \sgml
syntax (formally \sgml declaration) specified for \html neither of these
\xml forms of markup would be permitted.
}
must be used. For \gellmu this means that the markup \quostr{\bsl;brk}
and the markup \quostr{\bsl;brk\{\}} are interchangeable under \sgml,
except for the case of \quostr{\bsl;brk} abutting a following
character without intervening whitespace, but not equivalent under
\xml. \gellmu provides the form \quostr{\bsl;brk;} to represent the
abbreviated form \quostr{<brk/>} of an element that is defined as
empty.
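For instance, assuming a document type in which \emph{brk} is defined
as empty, the source line
\begin{verbatim}
first line\brk;second line
\end{verbatim}
would be expected to yield
\begin{verbatim}
first line<brk/>second line
\end{verbatim}
Here the terminating semicolon also marks the end of the command name,
so no intervening whitespace is needed before the text that follows.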
\section{Advanced \gellmu}
Basic \gellmu deals with markup languages more or less at the level
of syntax without getting to the level of grammar.
Advanced \gellmu may be used to roll language-independent grammatical
concepts into the picture.
The first of these is \latex;-like multiple argument/option syntax.
For example under advanced \gellmu
the markup
\begin{verbatim}
\frac{a x + b}{c x + d}
\end{verbatim}
is converted in syntactic translation to
\begin{verbatim}
<frac><ag0>a x + b</ag0><ag0>c x + d</ag0></frac>
\end{verbatim}
That is, a chain of \quochar{\{}, \quochar{\}} pairs and
\quochar{\lsb}, \quochar{\rsb} pairs following a command, with no
intervening white space either between the command name and the first
delimiter or between a close delimiter and the next open delimiter in
the chain, constitutes an \sgml element whose content begins with a
sequence of generic positional arguments (tag name \emph{ag0}) and
options (tag name \emph{op0}). Without knowledge of the document
type it cannot be determined if a name used with multiple argument/option
syntax has only \emph{ag0}, \emph{op0} content. The syntactic translator
provides a list variable consisting of names that have only this type
of content and that, therefore, should be given close tags after the
sequence of arguments and options. Absent that, the author must provide
a close tag unless an \sgml parser can infer it. Even in that case,
if the element can appear in the mixed content model of another element
such as, for example, a paragraph, then the parser's automatic placement
of a close tag could lead to the unwanted collapse of a word boundary
similar to that which occurs in \latex; when an author's careless markup
\display{|\TeX benchmark|} gets typeset as \quophrase{\tex;benchmark}
instead of as \quophrase{\tex; benchmark}.
If multiple argument/option syntax is used, then there is ambiguity
about the nature of the first pair of chained delimiters if it is the
pair \quochar{\lsb}, \quochar{\rsb} --- whether it represents an
\emph{op0} or an attribute list. Therefore, in this case the pair is
taken to be an attribute list if the first character after its
\quochar{\lsb} opening delimiter is a colon (\quochar{:}).
In basic \gellmu the following four of the ten \latex;-special characters
are special:
\display{\quostr{\bsl\ \ \lbr\ \ \rbr\ \ \pct}\ .}
Additionally, the character \quochar{\hsh} is special when used in the
definition of a \emph{newcommand}, the characters \quochar{\lsb} and
\quochar{\rsb} are special when used for \latex;-like option syntax,
and the character \quochar{\amp} is special when followed immediately
by a letter since then it is the introducer for \sgml entity
invocations.
Advanced \gellmu provides for the possibility of giving traditional \latex;
meaning to \quochar{\amp} when not followed immediately by a letter
and also to the other four \latex;-special characters, which are
\display{\quostr{\und;\ \ \crt;\ \ \dol;\ \ \tld;}\ .}
Additionally, it provides for the possibility of giving traditional
\latex;-like meaning to other short forms of markup such as
\qquostr{\bsl;( . . . \bsl;)} for inline math,
\qquostr{\bsl;\lsb; . . . \bsl;\rsb;} for displayed math,
\qquostr{\hyp;\hyp;} for a range-dash,
\qquostr{\hyp;\hyp;\hyp;} for a punctuation-dash,
\qquostr{\bsl;\spc;} for an inter-word space,
\qquostr{\bsl;,} for a short space, and others including also,
if desired, the use of blank lines, as appropriate, for paragraph
boundaries.
\section{The \gellmu didactic production system}
The conversion of both \emph{basic} and \emph{advanced} \gellmu source
markup to \sgml is performed by my program called the \emph{syntactic
translator}.
If one wishes to write consciously for a public document type such
as \html or the
\href{http://www.tei-c.org}{Text Encoding Initiative}'s \abbr{TEI}
using \gellmu's \latex;-like syntax, the syntactic translator is the
only part of \gellmu that will be of interest.
The optional features of advanced \gellmu described above can only be
used when one is writing for a document type that provides markup
in which the corresponding concepts have representation. For example,
\latex;-like use of the character \quochar{\tld;} for non-breaking
space requires a markup that provides non-breaking space.
Moreover, if blank lines are going to be paragraph boundaries, then
the syntactic translator will need a list of element names before
which a new paragraph does not make sense, and, since there is no
separate provision for a list of names after which a paragraph must
end, the document type cannot be \xml.
The \gellmu didactic production system provides such a document type
and also provides tools, which can be used as interchangeable
components, for working with that document type.
The didactic production system consists of the syntactic translator
and the following additional components:
\begin{enumerate}
\item An \sgml document type called \quophrase{article}.
\item Its \xml cousin, also called \quophrase{article}.
\item A program for translating \sgml article to \xml article.
\item A program for translating \xml article to \html.
\item A program for translating \xml article to \latex.
\end{enumerate}
The document type is intended to be comfortable for authors with
past experience in \latex;. The document type and the components
are didactic. They are intended to illustrate how such a system
can be assembled from interchangeable components. They are not
finished in any sense, and each has shortcomings.
They do serve, I hope, to demonstrate to the community of \latex;
authors that it will find no limitations in this approach to
document production.
At the same time it is intended to provide a whole new way of thinking
about the subjects of package design and class design.
Its unfinished nature is intended to make it relatively easy for those
who are so inclined to move in various ways to finish such a system
that fits their needs.
\section{Production of this Document}
This document and the slides used during its presentation were
prepared with the \gellmu didactic production system. Pre-publication
versions of the sources and various automatic formattings are
available in \href{\self/}{the author's web}.
Subsequent to the \gellmu run on this document a copy of its \latex
output was manually modified for conformance with TUG guidelines. If I
were going to submit a number of such TUG articles, it would be
worthwhile to make another variant of the \latex formatter for TUG.
\end{document}