% LaTeX
\documentclass[leqno]{article}
\usepackage{url}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{bm}
\usepackage{gellmu}
\usepackage[margin=100bp,nohead]{geometry}
\setlength{\parskip}{6bp}
\setlength{\parindent}{0bp}
\thispagestyle{empty}
\hyphenation{gell-mu mark-up new-com-mand}
\title{GELLMU: A Bridge for Authors\\[0.25\baselineskip] from \LaTeX{} to XML}
\date{August 2001 \\{} (slightly revised: January 2003)}
\newlength{\centerskip}
\setlength{\centerskip}{\topsep}
\newcommand{\hsf}{\hspace*{\fill}}
\newcommand{\tdbc}[1]{\hsf\textbf{#1}\hsf}
\newenvironment{menulist}{
\begin{list}{}{
\setlength{\topsep}{0bp}
\setlength{\labelwidth}{0.03\linewidth}
\setlength{\leftmargin}{0.06\linewidth}
\setlength{\itemindent}{0bp}
\setlength{\itemsep}{-6bp}
\setlength{\parsep}{6bp}}
}{\end{list}}
\newenvironment{Menulist}{
\begin{list}{}{
\setlength{\topsep}{0bp}
\setlength{\labelwidth}{0.03\linewidth}
\setlength{\leftmargin}{0.06\linewidth}
\setlength{\itemindent}{0bp}
\setlength{\itemsep}{3bp}
\setlength{\parsep}{6bp}}
}{\end{list}}
\newenvironment{toclist}{\normalsize
\begin{list}{}{
}}{\end{list}}
\newenvironment{Toclist}{\large
\begin{list}{}{
}}{\end{list}}
\newenvironment{citations}{
\begin{list}{}{
\setlength{\topsep}{0bp}
\setlength{\labelwidth}{0bp}
\setlength{\leftmargin}{0.04\linewidth}
\setlength{\labelsep}{0bp}
\setlength{\itemindent}{-0.2\leftmargin}
\setlength{\itemsep}{3bp}
\setlength{\parsep}{0bp}}
}{\end{list}}
\author{William F. Hammond\\{}Department of Mathematics \& Statistics \\{} University at Albany \\{} Albany, New York 12222 (USA) \\{} \texttt{hammond@math.albany.edu} \\{} \url{http://www.albany.edu/~hammond/}}
\begin{document}
\begin{center}\LARGE\bfseries{}
GELLMU: A Bridge for Authors\\[0.25\baselineskip] from \LaTeX{} to XML
\end{center}
\begin{center}\large\bfseries{}
Presentation at TUG 2001, University of Delaware
\end{center}
\begin{center}\Large\bfseries{}
\textsl{William F. Hammond}
\end{center}
\begin{center}\large
Department of Mathematics \& Statistics \\{} University at Albany \\{} Albany, New York 12222 (USA) \\{} \texttt{hammond@math.albany.edu} \\{} \url{http://www.albany.edu/~hammond/}
\end{center}
\begin{center}
\large\bfseries{}
August 2001 \\{} (slightly revised: January 2003)
\end{center}
\medskip
\begin{abstract}
\par{\textsc{GELLMU}, which stands for ``Generalized Extensible \LaTeX{}-Like %
Markup'', is a system for using \LaTeX{}-like markup, though not \LaTeX{} %
itself, to write consciously for a markup language in the \textsc{SGML} category %
or in its popular \textsc{XML} subcategory. \
}
\par{The \emph{basic} level of \textsc{GELLMU} offers a way to use \LaTeX{}-like notation %
together with a \LaTeX{}-like \emph{newcommand} (with arguments) macro %
facility to write web pages. \
}
\par{The \emph{advanced} level of \textsc{GELLMU} enables one additionally to %
incorporate certain \LaTeX{}-like features, such as the use of a blank %
line for a new paragraph, in writing for an \textsc{SGML} language. \
}
\par{The didactic \textsc{GELLMU} production system provides an ``article'' %
\textsc{XML} language, with some resemblance to \LaTeX{} itself, that is a %
rigorous domain for translation to other formats. \ %
}
\end{abstract}
\section*{Table of Contents}
\begin{Toclist}
\item[]{1\ \ Author Level Markup\dotfill{}~\pageref{SU-1}}
\item[]{2\ \ The basics of \emph{basic} \textsc{GELLMU}\dotfill{}~\pageref{SU-2}}
\item[]{3\ \ Basic \textsc{GELLMU} enhanced with \emph{\textbackslash{}newcommand}\dotfill{}~\pageref{SU-3}}
\item[]{4\ \ \textsc{SGML} vs. \textsc{XML}\dotfill{}~\pageref{SU-4}}
\item[]{5\ \ Advanced \textsc{GELLMU}\dotfill{}~\pageref{SU-5}}
\item[]{6\ \ The \textsc{GELLMU} didactic production system\dotfill{}~\pageref{SU-6}}
\item[]{7\ \ Production of this Document\dotfill{}~\pageref{SU-7}}
\end{Toclist}
\section*{1\ \ \label{SU-1}Author Level Markup}
\par{Inasmuch as the World Wide Web is becoming an important library %
resource, one wants one's publications to be accessible online, and one %
wants web-crawling robots to be able to catalogue them properly. \
}
\par{Despite the popularity of Adobe's Portable Document Format (\textsc{PDF}), %
the distribution of \textsc{PDF} reading software is not as widespread as %
the distribution of web browsing software, and web-crawling robots %
often do not scan the contents of \textsc{PDF} documents. \
}
\par{What is available for the \LaTeX{} author toward this end? \ More %
specifically, consider the following situations: %
%
\begin{description}
\item[{Online publication archives}]
Specifically, I would like to cite the \TeX{}/\LaTeX{}-based e-print archive %
begun at Los Alamos in the early 1990s by Paul Ginsparg, now known as %
``arXiv''\footnote{URI: http://www.arxiv.org/}, which is now a %
participant in the Open Archives %
Initiative\footnote{URI: http://www.openarchives.org/}. \ While in its early days the term \emph{e-print} was %
understood to mean ``electronic pre-print'', arXiv has more %
recently become a repository for established journals including now, %
for example, the highly regarded %
\emph{Annals of %
Mathematics}\footnote{URI: http://www.math.princeton.edu/\textasciitilde{}annals/}, which was founded in 1884 by Ormond Stone of the %
University of Virginia, and Ginsparg now tells us that the term %
\emph{e-print} denotes ``self-archiving by the author'' under %
a new overall academic publication\footnote{Paul Ginsparg, ``Electronic Clones vs. the Global Research %
Archive'', \url{http://arXiv.org/blurb/pg00bmc.html}. \ } %
design. \ %
\item[{Course handouts}]
How can a college teacher prepare course handouts for both paper and online %
distribution? \ If the teacher writes \LaTeX{}, some manual intervention will %
likely be needed in order to obtain correct \textsc{HTML}. \ If the teacher %
writes \textsc{HTML}, then the paper distribution\footnote{If the \textsc{HTML} is correctly %
written, then robust translation to \LaTeX{} is possible.} %
will be limited by what can be expressed in \textsc{HTML}, which is not as rich %
a markup as \LaTeX{}. \ %
\item[{TUG articles}]
Before preparing a TUG 2001 article an author is asked to read %
\emph{Preparation %
of documents for multiple modes of delivery}\footnote{URI: http://www.tug.org/TUGboat/Contents/contents20-4.html} %
by Ross Moore, which is available on the web only as a two-column %
\textsc{PDF} printed page image. \ From this %
article one might conclude that carefully prepared \LaTeX{} may be %
suitable for translation to \textsc{HTML} although no \textsc{HTML} version of the %
article seems to be available. \ %
\item[{GNU documentation}]
While working for TUG on the \TeX{} Directory System (\textsc{TDS}) guidelines %
--- see %
\url{/tds/standard/}\footnote{URI: http://ctan.tug.org/tex-archive/tds/standard/} %
at \textsc{CTAN} --- %
in January 1998, Ulrik Vieth produced a \LaTeX{} document %
and a tailored %
program for translating that document into \textsl{Texinfo}, the language of the %
GNU\footnote{URI: http://www.gnu.org/} documentation system. \ %
\par{\textsl{Texinfo} is a \TeX{}-based system that pre-dates \textsc{HTML}. \ Its original %
purpose was to provide both print and (early online hypertext) \textsl{Info} %
versions of \textsc{GNU} software project documentation. \ When \textsc{HTML} came %
along, it was possible to provide fairly reliable translation from \textsl{Texinfo} %
to \textsc{HTML} because \textsl{Texinfo} is a well-structured markup. \ In fact, \textsl{Texinfo} is %
very nearly equivalent to an \textsc{SGML} language, and in %
August 2000 Daniele Giacomini came up with an effort in that direction: %
\textsl{Sgmltexi}\footnote{URI: http://master.swlibero.org/\textasciitilde{}daniele/software/sgmltexi/ }. \
}
\end{description}
}
\par{Although programs are available for translating carefully structured %
\LaTeX{} into \textsc{HTML} and sometimes into \textsc{XML} extensions of \textsc{HTML}, this %
method of generating online content for the basic level of the web %
sometimes requires manual intervention. \ A more direct approach to the %
world of \textsc{SGML} offers better prospects for long-term access to new web %
formats without sacrificing access to the quality of print typesetting %
that is available through \LaTeX{}. \ %
}
\section*{2\ \ \label{SU-2}The basics of \emph{basic} GELLMU}
\par{In looking over Vieth's set-up for the \textsc{TDS} document in the late %
spring of 1998, I arrived at the idea of using \LaTeX{}-like notation %
for conscious writing in document languages under \textsc{SGML} and I have %
written a program in the GNU Emacs Lisp language, the \textsc{GELLMU} %
syntactic translator, for converting this \LaTeX{}-like markup to %
\textsc{SGML} markup. \
}
\par{The advantage of \textsc{SGML} markup is that each markup language (formally %
document type) under the \textsc{SGML} umbrella constitutes a structured %
domain for the application of automatic processors that are easy to %
create under any of a number of structured processing frameworks. \ There %
are frameworks accessible in standard computing languages, and there %
is also a recent framework %
\textsl{xmltex}\footnote{URI: http://www.dcarlisle.demon.co.uk/xmltex/} by %
David Carlisle for writing \TeX{} typesetting routines for \textsc{XML} %
document types. \
}
\par{The root idea in using \LaTeX{}-like markup for the conscious writing of %
markup under \textsc{SGML} is the simple syntactic correspondence between %
markup such as %
\begin{menulist}
\item\texttt{some\ }\verb+\+\texttt{em}\verb+{+\texttt{emphasized}\verb+}+\texttt{\ text}
\end{menulist} %
on the one hand, and the markup %
\begin{menulist}
\item\texttt{some\ \string<em\string>emphasized\string</em\string>\ text}
\end{menulist} %
on the other. \
}
\par{Most \LaTeX{} commands are analogous to \textsc{SGML} elements. \ Moreover, the %
attribute list associated with an \textsc{SGML} element can be made to correspond %
with a \LaTeX{} command option. \ For example, %
\begin{menulist}
\item\texttt{}\verb+\+\texttt{a[href=\texttt{"}http://foo.dom/\texttt{"}}
\item\texttt{]}\verb+{+\texttt{The\ Foo\ Domain}\verb+}+\texttt{}
\end{menulist} %
matches %
\begin{menulist}
\item\texttt{\string<a\ href=\texttt{"}http://foo.dom/\texttt{"}\string>The\ Foo\ Domain\string</a\string>}
\end{menulist} %
}
\section*{3\ \ \label{SU-3}Basic GELLMU enhanced with \emph{\textbackslash{}newcommand}}
\par{The idea of using \LaTeX{}-like syntax for conscious writing under an %
\textsc{SGML} document type gains power when one realizes that although %
the notion of \textsc{SGML} entity provides, among other things, simple macro %
expansions, there is no provision under \textsc{SGML} for macros that take %
arguments. \ Moreover, there is no obvious method of extending \textsc{SGML} %
systems to accommodate macros with arguments apart from the idea %
of extending a document type\footnote{While document type extensions require enough work that they cannot be %
spontaneous, they provide a sound way to avoid the tangles that can %
arise when working with \TeX{} or \LaTeX{} and attempting the simultaneous %
use of conflicting macro packages. \ }. \
}
\par{\textsc{GELLMU} provides a \LaTeX{}-like meta-command\footnote{In \textsc{GELLMU} while a \emph{command} corresponds to an \textsc{SGML} element, a %
\emph{meta-command} is something having the same syntax %
as a command that does not correspond to an \textsc{SGML} element and instead %
receives resolution into other \textsc{SGML} markup under the syntactic translator. \ } called \emph{newcommand} that may be invoked with arguments. \ %
For example, if one writes %
\begin{menulist}
\item\texttt{}\verb+\+\texttt{newcommand}\verb+{+\texttt{}\verb+\+\texttt{afoo}\verb+}+\texttt{[2]}\verb+{+\texttt{\%}
\item\texttt{}\verb+\+\texttt{a[href=\texttt{"}http://www.foo.org/\#1\texttt{"}]}\verb+{+\texttt{\#2}\verb+}+\texttt{}\verb+}+\texttt{}
\end{menulist} %
then a subsequent invocation %
\begin{menulist}
\item\texttt{}\verb+\+\texttt{afoo}\verb+{+\texttt{tex-archive/tds/}\verb+}+\texttt{}\verb+{+\texttt{TDS\ at\ Foo}\verb+}+\texttt{}
\end{menulist} %
will yield (without line breaks):\footnote{The syntactic translator maintains line number alignment between its %
input and its \textsc{SGML} output so that line numbers used by \textsc{SGML} %
parsers in flagging errors match those in source markup. \ } %
\begin{menulist}
\item\texttt{\string<a\ href=\texttt{"}http://www.foo.org/tex-archive/tds/\texttt{"}\string>TDS\ at\ Foo\string</a\string>}
\end{menulist}
}
\par{This \emph{newcommand} markup differs from that of \LaTeX{} in that it %
is classical macro substitution rather than vocabulary expansion. \ %
Since the syntax of a \emph{newcommand} invocation is very similar to %
that of an \textsc{SGML} element, the use of \emph{newcommand} can, apart %
from its on-the-fly convenience, be a help in the development of %
\textsc{SGML} document type extensions. \ A new name in a test document can be %
moved from being that of a macro to being that of an element simply %
with the removal of a \emph{newcommand} definition. \ %
}
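\par{For comparison, a roughly analogous definition in \LaTeX{} itself might %
look like the following sketch. \ It assumes the \emph{hyperref} package for %
\texttt{\textbackslash{}href}, which is an assumption made here for %
illustration and is not part of \textsc{GELLMU}. \
}
\begin{verbatim}
% LaTeX analogue of the GELLMU macro \afoo (illustrative sketch).
% Assumes \usepackage{hyperref} in the preamble for \href.
\newcommand{\afoo}[2]{\href{http://www.foo.org/#1}{#2}}
% Usage: links the text "TDS at Foo" to the expanded URL.
\afoo{tex-archive/tds/}{TDS at Foo}
\end{verbatim}
\par{In \LaTeX{} the new name joins the command vocabulary handled by the %
typesetter, whereas under \textsc{GELLMU} the invocation is replaced by its %
expansion in the syntactic translator, so that only the resulting \textsc{SGML} %
markup is seen by later processing. \
}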
\section*{4\ \ \label{SU-4}SGML vs. XML}
\par{Basic \textsc{GELLMU} as enhanced by its macro facility is as far as one can %
sensibly go toward conscious writing under a language in the %
restricted subfamily of \textsc{SGML} document types known as \textsc{XML}. \
}
\par{From one viewpoint the differences between \textsc{SGML} and \textsc{XML} are not %
very important since most correct documents under the larger category %
can be automatically translated into equivalent documents %
under \textsc{XML}. \ %
For example, classical \textsc{HTML} that passes validation can be translated %
into the newer \textsc{XML} form of \textsc{HTML} using either James Clark's classical %
\textsc{SGML} library %
\textsl{SP}\footnote{URI: http://www.jclark.com/sp/} or Dave Raggett's program %
\textsl{tidy}\footnote{URI: http://www.w3.org/People/Raggett/tidy/}. \
}
\par{However, the rules for \textsc{XML} were designed to make things easy for %
processors rather than for humans, and for that reason an author %
writing toward an ultimate \textsc{XML} document type usually is well-advised %
to write for a version of the document type under more author-friendly %
\textsc{SGML} rules. \
}
\par{For example, if in an \textsc{SGML} language a forced linebreak is represented %
by the defined-empty element \emph{brk}, then the markup %
\texttt{\string<brk\string>} is sufficient, whereas under the more restrictive \textsc{XML} %
version of the same language, either the markup \texttt{\string<brk\string>\string</brk\string>} %
or its abbreviated form \texttt{\string<brk/\string>}\footnote{Moreover, some confusion may arise from the fact that under the \textsc{SGML} %
syntax (formally \textsc{SGML} declaration) specified for \textsc{HTML} neither of these %
\textsc{XML} forms of markup would be permitted. \ } %
must be used. \ For \textsc{GELLMU} this means that the markup \texttt{}\verb+\+\texttt{brk} %
and the markup \texttt{}\verb+\+\texttt{brk}\verb+{+\texttt{}\verb+}+\texttt{} are interchangeable under \textsc{SGML}, %
except for the case of \texttt{}\verb+\+\texttt{brk} abutting a following %
character without intervening whitespace, but not equivalent under %
\textsc{XML}. \ \textsc{GELLMU} provides the form \texttt{}\verb+\+\texttt{brk;} to represent the %
abbreviated form \texttt{\string<brk/\string>} of an element that is defined as %
empty. \ %
}
\section*{5\ \ \label{SU-5}Advanced GELLMU}
\par{Basic \textsc{GELLMU} deals with markup languages more or less at the level %
of syntax without getting to the level of grammar. \
}
\par{Advanced \textsc{GELLMU} may be used to roll language-independent grammatical %
concepts into the picture. \
}
\par{The first of these is \LaTeX{}-like multiple argument/option syntax. \ %
For example under advanced \textsc{GELLMU} %
the markup %
\begin{menulist}
\item\texttt{}\verb+\+\texttt{frac}\verb+{+\texttt{a\ x\ +\ b}\verb+}+\texttt{}\verb+{+\texttt{c\ x\ +\ d}\verb+}+\texttt{}
\end{menulist} %
is converted in syntactic translation to %
\begin{menulist}
\item\texttt{\string<frac\string>\string<ag0\string>a\ x\ +\ b\string</ag0\string>\string<ag0\string>c\ x\ +\ d\string</ag0\string>}
\end{menulist}
}
\par{That is, a chain of \texttt{`}\verb+{+\texttt{'}, \texttt{`}\verb+}+\texttt{'} pairs and %
\texttt{`['}, \texttt{`]'} pairs following a command without %
intervening white space between the command name and the first %
delimiter nor between a close delimiter and the next open delimiter in %
the chain, constitutes an \textsc{SGML} element whose content begins with a %
sequence of generic positional arguments (tag name \emph{ag0}) and %
options (tag name \emph{op0}). \ Without knowledge of the document %
type it cannot be determined if a name used with multiple argument/option %
syntax has only \emph{ag0}, \emph{op0} content. \ The syntactic translator %
provides a list variable consisting of names that have only this type %
of content and that, therefore, should be given close tags after the %
sequence of arguments and options. \ Absent that, the author must provide %
a close tag unless an \textsc{SGML} parser can infer it, and even in that case, %
if the element can appear in the mixed content model of another element %
such as, for example, a paragraph, then the parser's automatic placement %
of a close tag could lead to the unwanted collapse of a word boundary %
similar to that which occurs in \LaTeX{} when an author's careless markup %
\begin{center}
\texttt{}\verb+\+\texttt{TeX\ benchmark}
\end{center} gets typeset as ``\TeX{}benchmark'' %
instead of as ``\TeX{} benchmark''. \
}
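\par{The \LaTeX{} behavior mentioned above can be seen in the following short %
illustration in ordinary \LaTeX{} source. \
}
\begin{verbatim}
\TeX benchmark    % typeset as "TeXbenchmark": the space only ends the name
\TeX{} benchmark  % typeset as "TeX benchmark": {} ends the name first
\TeX\ benchmark   % typeset as "TeX benchmark": explicit inter-word space
\end{verbatim}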
\par{If multiple argument/option syntax is used, then there is ambiguity %
on the nature of the first pair of chained delimiters if it is the %
pair \texttt{`['}, \texttt{`]'} --- whether it represents an %
\emph{op0} or an attribute list. \ To resolve this, the pair is treated as %
an attribute list if the first character after its %
\texttt{`['} opening delimiter is a colon (\texttt{`:'}). \
}
\par{In basic \textsc{GELLMU} the following four of the ten \LaTeX{}-special characters %
are special: %
\begin{center}
\texttt{}\verb+\+\texttt{\ \ }\verb+{+\texttt{\ \ }\verb+}+\texttt{\ \ \%}\ .
\end{center} %
Additionally, the character \texttt{`\#'} is special when used in the %
definition of a \emph{newcommand}, the characters \texttt{`['} and %
\texttt{`]'} are special when used for \LaTeX{}-like option syntax, %
and the character \texttt{`\&'} is special when followed immediately %
by a letter since then it is the introducer for \textsc{SGML} entity %
invocations. \
}
\par{Advanced \textsc{GELLMU} provides for the possibility of giving traditional \LaTeX{} %
meaning to \texttt{`\&'} when not followed immediately by a letter %
and also to the other four \LaTeX{}-special characters, which are %
\begin{center}
\texttt{\_\ \ \string^\ \ \$\ \ \string~}\ .
\end{center}
}
\par{Additionally, it provides for the possibility of giving traditional %
\LaTeX{}-like meaning to other short forms of markup such as %
\texttt{"}\verb+\+\texttt{( . . . }\verb+\+\texttt{)"} for inline math, %
\texttt{"}\verb+\+\texttt{[ . . . }\verb+\+\texttt{]"} for displayed math, %
\texttt{"--"} for a range-dash, %
\texttt{"---"} for a punctuation-dash, %
\texttt{"}\verb+\+\texttt{\ "} for an inter-word space, %
\texttt{"}\verb+\+\texttt{,"} for a short space, and others including also, %
if desired, the use of blank lines, as appropriate, for paragraph %
boundaries. \ %
}
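\par{As a reminder of what these short forms look like, here is a small %
fragment of ordinary \LaTeX{} source using several of them. \ Whether a given %
form is accepted in \textsc{GELLMU} source depends on the options in force and %
on the document type, so this is only an illustration. \
}
\begin{verbatim}
The ratio \(a/b\) is inline math, while

\[ \frac{a x + b}{c x + d} \]

is displayed math.  See pages 3--5 of Vol.\ 2.
\end{verbatim}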
\section*{6\ \ \label{SU-6}The GELLMU didactic production system}
\par{The conversion of both \emph{basic} and \emph{advanced} \textsc{GELLMU} source %
markup to \textsc{SGML} is performed by my program called the \emph{syntactic %
translator}. \
}
\par{If one wishes to write consciously for a public document type such %
as \textsc{HTML} or the %
Text Encoding Initiative\footnote{URI: http://www.tei-c.org}'s \textsc{TEI} %
using \textsc{GELLMU}'s \LaTeX{}-like syntax, the syntactic translator is the %
only part of \textsc{GELLMU} that will be of interest. \
}
\par{The optional features of advanced \textsc{GELLMU} described above can only be %
used when one is writing for a document type that provides markup %
in which the corresponding concepts have representation. \ For example, %
\LaTeX{}-like use of the character \texttt{`\string~'} for non-breaking %
space requires a markup that provides non-breaking space. \
}
\par{Moreover, if blank lines are going to be paragraph boundaries, then %
the syntactic translator will need a list of element names before %
which a new paragraph does not make sense, and, since there is no %
separate provision for a list of names after which a paragraph must %
end, the document type cannot be \textsc{XML}. \
}
\par{The \textsc{GELLMU} didactic production system provides such a document type %
and also provides tools, which can be used as inter-changeable %
components, for working with that document type. \
}
\par{The didactic production system consists of the syntactic translator %
and the following additional components: %
\begin{enumerate}
\item An \textsc{SGML} document type called ``article''. \
\item Its \textsc{XML} cousin, also called ``article''. \
\item A program for translating \textsc{SGML} article to \textsc{XML} article. \
\item A program for translating \textsc{XML} article to \textsc{HTML}. \
\item A program for translating \textsc{XML} article to \LaTeX{}. \
\end{enumerate}
}
\par{The document type is intended to be comfortable for authors with %
past experience in \LaTeX{}. \ The document type and the components %
are didactic. \ They are intended to illustrate how such a system %
can be assembled from inter-changeable components. \ They are not %
finished in any sense, and each has shortcomings. \
}
\par{They do serve, I hope, to demonstrate to the community of \LaTeX{} %
authors that it will find no limitations in this approach to %
document production. \
}
\par{At the same time it is intended to provide a whole new way of thinking %
about the subjects of package design and class design. \
}
\par{Its unfinished nature is intended to make it relatively easy for those %
who are so inclined to move in various ways to finish such a system %
that fits their needs. \ %
}
\section*{7\ \ \label{SU-7}Production of this Document}
\par{This document and the slides used during its presentation were %
prepared with the \textsc{GELLMU} didactic production system. Pre-publication %
versions of the sources and various automatic formattings are %
available on the author's web site\footnote{URI: http://math.albany.edu:8000/math/pers/hammond/Presen/tug2001/}. \
}
\par{Subsequent to the \textsc{GELLMU} run on this document a copy of its \LaTeX{} %
output was manually modified for conformance with TUG guidelines. If I %
were going to submit a number of such TUG articles, it would be %
worthwhile to make another variant of the \LaTeX{} formatter for TUG. \ %
}
\end{document}