% LaTeX
\documentclass[leqno]{article}
\usepackage{url}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{bm}
\usepackage{gellmu}
\usepackage[margin=100bp,nohead]{geometry}
\setlength{\parskip}{6bp}
\setlength{\parindent}{0bp}
\thispagestyle{empty}
\hyphenation{gell-mu mark-up new-com-mand}
\title{GELLMU: A Bridge for Authors\\[0.25\baselineskip] from \LaTeX{} to XML}
\date{August 2001 \\{} (slightly revised: January 2003)}
\newlength{\centerskip}
\setlength{\centerskip}{\topsep}
\newcommand{\hsf}{\hspace*{\fill}}
\newcommand{\tdbc}[1]{\hsf\textbf{#1}\hsf}
\newenvironment{menulist}{
 \begin{list}{}{
  \setlength{\topsep}{0bp}
  \setlength{\labelwidth}{0.03\linewidth}
  \setlength{\leftmargin}{0.06\linewidth}
  \setlength{\itemindent}{0bp}
  \setlength{\itemsep}{-6bp}
  \setlength{\parsep}{6bp}}
}{\end{list}}
\newenvironment{Menulist}{
 \begin{list}{}{
  \setlength{\topsep}{0bp}
  \setlength{\labelwidth}{0.03\linewidth}
  \setlength{\leftmargin}{0.06\linewidth}
  \setlength{\itemindent}{0bp}
  \setlength{\itemsep}{3bp}
  \setlength{\parsep}{6bp}}
}{\end{list}}
\newenvironment{toclist}{\normalsize \begin{list}{}{ }}{\end{list}}
\newenvironment{Toclist}{\large \begin{list}{}{ }}{\end{list}}
\newenvironment{citations}{
 \begin{list}{}{
  \setlength{\topsep}{0bp}
  \setlength{\labelwidth}{0bp}
  \setlength{\leftmargin}{0.04\linewidth}
  \setlength{\labelsep}{0bp}
  \setlength{\itemindent}{-0.2\leftmargin}
  \setlength{\itemsep}{3bp}
  \setlength{\parsep}{0bp}}
}{\end{list}}
\author{William F. Hammond\\{}Department of Mathematics \& Statistics \\{} University at Albany \\{} Albany, New York 12222 (USA) \\{} \texttt{hammond@math.albany.edu} \\{} \url{http://www.albany.edu/~hammond/}}
\begin{document}
\begin{center}\LARGE\bfseries{} GELLMU: A Bridge for Authors\\[0.25\baselineskip] from \LaTeX{} to XML \end{center}
\begin{center}\large\bfseries{} Presentation at TUG 2001, University of Delaware \end{center}
\begin{center}\Large\bfseries{} \textsl{William F.
Hammond} \end{center}
\begin{center}\large Department of Mathematics \& Statistics \\{} University at Albany \\{} Albany, New York 12222 (USA) \\{} \texttt{hammond@math.albany.edu} \\{} \url{http://www.albany.edu/~hammond/} \end{center}
\begin{center} \large\bfseries{} August 2001 \\{} (slightly revised: January 2003) \end{center}
\medskip
\begin{abstract}
\par{\textsc{GELLMU}, which stands for ``Generalized Extensible \LaTeX{}-Like Markup'', is a system for using \LaTeX{}-like markup, though not \LaTeX{} itself, to write consciously for a markup language in the \textsc{SGML} category or in its popular \textsc{XML} subcategory. \ }
\par{The \emph{basic} level of \textsc{GELLMU} offers a way to use \LaTeX{}-like notation together with a \LaTeX{}-like \emph{newcommand} (with arguments) macro facility to write web pages. \ }
\par{The \emph{advanced} level of \textsc{GELLMU} enables one additionally to incorporate certain \LaTeX{}-like features, such as the use of a blank line for a new paragraph, in writing for an \textsc{SGML} language. \ }
\par{The didactic \textsc{GELLMU} production system provides an ``article'' \textsc{XML} language, with some resemblance to \LaTeX{} itself, that is a rigorous domain for translation to other formats. \ }
\end{abstract}
\section*{Table of Contents}
\begin{Toclist}
\item[]{1\ \ Author Level Markup\dotfill{}~\pageref{SU-1}}
\item[]{2\ \ The basics of \emph{basic} \textsc{GELLMU}\dotfill{}~\pageref{SU-2}}
\item[]{3\ \ Basic \textsc{GELLMU} enhanced with \emph{\textbackslash{}newcommand}\dotfill{}~\pageref{SU-3}}
\item[]{4\ \ \textsc{SGML} vs.
\textsc{XML}\dotfill{}~\pageref{SU-4}}
\item[]{5\ \ Advanced \textsc{GELLMU}\dotfill{}~\pageref{SU-5}}
\item[]{6\ \ The \textsc{GELLMU} didactic production system\dotfill{}~\pageref{SU-6}}
\item[]{7\ \ Production of this Document\dotfill{}~\pageref{SU-7}}
\end{Toclist}
\section*{1\ \ \label{SU-1}Author Level Markup}
\par{Inasmuch as the World Wide Web is becoming an important library resource, one wants one's publications to be accessible online, and one wants web-crawling robots to be able to catalogue them properly. \ }
\par{Despite the popularity of Adobe's Portable Document Format (\textsc{PDF}), the distribution of \textsc{PDF} reading software is not as widespread as the distribution of web browsing software, and web-crawling robots often do not scan the contents of \textsc{PDF} documents. \ }
\par{What is available for the \LaTeX{} author toward this end? \ More specifically, consider the following situations:
\begin{description}
\item[{Online publication archives}] Specifically, I would like to cite the \TeX{}/\LaTeX{}-based e-print archive begun at Los Alamos in the early 1990s by Paul Ginsparg, now known as ``arXiv''\footnote{URI: http://www.arxiv.org/}, which is now a participant in the Open Archives Initiative\footnote{URI: http://www.openarchives.org/}. \ While in its early time the term \emph{e-print} was understood to mean ``electronic pre-print'', arXiv has more recently become a repository for established journals including now, for example, the highly regarded \emph{Annals of Mathematics}\footnote{URI: http://www.math.princeton.edu/\textasciitilde{}annals/}, which was founded in 1884 by Ormond Stone of the University of Virginia, and Ginsparg now tells us that the term \emph{e-print} denotes ``self-archiving by the author'' under a new overall academic publication\footnote{Paul Ginsparg, ``Electronic Clones vs. the Global Research Archive'', \url{http://arXiv.org/blurb/pg00bmc.html}. \ } design.
\ \item[{Course handouts}] How can a college teacher prepare course handouts for both paper and online distribution? \ If the teacher writes \LaTeX{}, some manual intervention will likely be needed in order to obtain correct \textsc{HTML}. \ If the teacher writes \textsc{HTML}, then the paper distribution\footnote{If the \textsc{HTML} is correctly written, then robust translation to \LaTeX{} is possible.} will be limited by what can be expressed in \textsc{HTML}, which is not as rich a markup as \LaTeX{}. \ 
\item[{TUG articles}] Before preparing a TUG 2001 article an author is asked to read \emph{Preparation of documents for multiple modes of delivery}\footnote{URI: http://www.tug.org/TUGboat/Contents/contents20-4.html} by Ross Moore, which is available on the web only as a two-column \textsc{PDF} printed page image. \ From this article one might conclude that carefully prepared \LaTeX{} may be suitable for translation to \textsc{HTML} although no \textsc{HTML} version of the article seems to be available. \ 
\item[{GNU documentation}] While working for TUG on the \TeX{} Directory Structure (\textsc{TDS}) guidelines --- see \url{/tds/standard/}\footnote{URI: http://ctan.tug.org/tex-archive/tds/standard/} at \textsc{CTAN} --- in January 1998, Ulrik Vieth produced a \LaTeX{} document and a tailored program for translating that document into \textsl{Texinfo}, the language of the GNU\footnote{URI: http://www.gnu.org/} documentation system. \ 
\par{\textsl{Texinfo} is a \TeX{}-based system that pre-dates \textsc{HTML}. \ Its original purpose was to provide both print and (early online hypertext) \textsl{Info} versions of \textsc{GNU} software project documentation. \ When \textsc{HTML} came along, it was possible to provide fairly reliable translation from \textsl{Texinfo} to \textsc{HTML} because \textsl{Texinfo} is a well-structured markup.
\ In fact, \textsl{Texinfo} is very nearly equivalent to an \textsc{SGML} language, and in August 2000 Daniele Giacomini made an effort in that direction: \textsl{Sgmltexi}\footnote{URI: http://master.swlibero.org/\textasciitilde{}daniele/software/sgmltexi/ }. \ }
\end{description} }
\par{Although programs are available for translating carefully structured \LaTeX{} into \textsc{HTML} and sometimes into \textsc{XML} extensions of \textsc{HTML}, this method of generating online content for the basic level of the web sometimes requires manual intervention. \ A more direct approach to the world of \textsc{SGML} offers better prospects for long-term access to new web formats without sacrificing access to the quality of print typesetting that is available through \LaTeX{}. \ }
\section*{2\ \ \label{SU-2}The basics of \emph{basic} GELLMU}
\par{In looking over Vieth's set-up for the \textsc{TDS} document in the late spring of 1998, I arrived at the idea of using \LaTeX{}-like notation for conscious writing in document languages under \textsc{SGML}, and I have written a program in the GNU Emacs Lisp language, the \textsc{GELLMU} syntactic translator, for converting this \LaTeX{}-like markup to \textsc{SGML} markup. \ }
\par{The advantage of \textsc{SGML} markup is that each markup language (formally document type) under the \textsc{SGML} umbrella constitutes a structured domain for the application of automatic processors that are easy to create under any of a number of structured processing frameworks. \ There are frameworks accessible in standard computing languages, and there is also a recent framework \textsl{xmltex}\footnote{URI: http://www.dcarlisle.demon.co.uk/xmltex/} by David Carlisle for writing \TeX{} typesetting routines for \textsc{XML} document types.
\ }
\par{The root idea in using \LaTeX{}-like markup for the conscious writing of markup under \textsc{SGML} is the simple syntactic correspondence between markup such as
\begin{menulist}
\item\texttt{some\ }\verb+\+\texttt{em}\verb+{+\texttt{emphasized}\verb+}+\texttt{\ text}
\end{menulist}
on the one hand, and the markup
\begin{menulist}
\item\texttt{some\ \string<em\string>emphasized\string</em\string>\ text}
\end{menulist}
on the other. \ }
\par{Most \LaTeX{} commands are analogous to \textsc{SGML} elements. \ Moreover, the attribute list associated with an \textsc{SGML} element can be made to correspond with a \LaTeX{} command option. \ For example,
\begin{menulist}
\item\texttt{}\verb+\+\texttt{a[href=\texttt{"}http://foo.dom/\texttt{"}}
\item\texttt{]}\verb+{+\texttt{The\ Foo\ Domain}\verb+}+\texttt{}
\end{menulist}
matches
\begin{menulist}
\item\texttt{\string<a\ href=\texttt{"}http://foo.dom/\texttt{"}\string>The\ Foo\ Domain\string</a\string>}
\end{menulist}
}
\section*{3\ \ \label{SU-3}Basic GELLMU enhanced with \emph{\textbackslash{}newcommand}}
\par{The idea of using \LaTeX{}-like syntax for conscious writing under an \textsc{SGML} document type gains power when one realizes that although the notion of \textsc{SGML} entity provides, among other things, simple macro expansions, there is no provision under \textsc{SGML} for macros that take arguments. \ Moreover, there is no obvious method of extending \textsc{SGML} systems to accommodate macros with arguments apart from the idea of extending a document type\footnote{While document type extensions require enough work that they cannot be spontaneous, they provide a sound way to avoid the tangles that can arise working with \TeX{} or \LaTeX{} when attempting the simultaneous use of conflicting macro packages. \ }.
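\par{As a small illustration of that limitation, consider the following sketch. \ The entity name \texttt{gellmu} and the element \texttt{p} are hypothetical, chosen only for this example; the declaration would sit in the document type definition or in the document's internal subset.
\begin{verbatim}
<!ENTITY gellmu "Generalized Extensible LaTeX-Like Markup">
...
<p>The name stands for &gellmu;.</p>
\end{verbatim}
Every reference \texttt{\&gellmu;} expands to the same fixed text; the entity has no slot through which an author could pass, say, a path or a link label. \ }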
\ }
\par{\textsc{GELLMU} provides a \LaTeX{}-like meta-command\footnote{In \textsc{GELLMU} while a \emph{command} corresponds to an \textsc{SGML} element, a \emph{meta-command} is something having the same syntax as a command that does not correspond to an \textsc{SGML} element and instead receives resolution into other \textsc{SGML} markup under the syntactic translator. \ } called \emph{newcommand} that may be invoked with arguments. \ For example, if one writes
\begin{menulist}
\item\texttt{}\verb+\+\texttt{newcommand}\verb+{+\texttt{}\verb+\+\texttt{afoo}\verb+}+\texttt{[2]}\verb+{+\texttt{\%}
\item\texttt{}\verb+\+\texttt{a[href=\texttt{"}http://www.foo.org/\#1\texttt{"}]}\verb+{+\texttt{\#2}\verb+}+\texttt{}\verb+}+\texttt{}
\end{menulist}
then a subsequent invocation
\begin{menulist}
\item\texttt{}\verb+\+\texttt{afoo}\verb+{+\texttt{tex-archive/tds/}\verb+}+\texttt{}\verb+{+\texttt{TDS\ at\ Foo}\verb+}+\texttt{}
\end{menulist}
will yield (without line breaks):\footnote{The syntactic translator maintains line number alignment between its input and its \textsc{SGML} output so that line numbers used by \textsc{SGML} parsers in flagging errors match those in source markup. \ }
\begin{menulist}
\item\texttt{\string<a\ href=\texttt{"}http://www.foo.org/tex-archive/tds/\texttt{"}\string>TDS\ at\ Foo\string</a\string>}
\end{menulist} }
\par{This \emph{newcommand} markup differs from that of \LaTeX{} in that it is classical macro substitution rather than vocabulary expansion. \ Since the syntax of a \emph{newcommand} invocation is very similar to that of an \textsc{SGML} element, the use of \emph{newcommand} can, apart from its on-the-fly convenience, be a help in the development of \textsc{SGML} document type extensions. \ A new name in a test document can be moved from being that of a macro to being that of an element simply with the removal of a \emph{newcommand} definition. \ }
\section*{4\ \ \label{SU-4}SGML vs.
XML}
\par{Basic \textsc{GELLMU} as enhanced by its macro facility is as far as one can sensibly go toward conscious writing under a language in the restricted subfamily of \textsc{SGML} document types known as \textsc{XML}. \ }
\par{From one viewpoint the differences between \textsc{SGML} and \textsc{XML} are not very important since most correct documents under the larger category can be automatically translated into equivalent documents under \textsc{XML}. \ For example, classical \textsc{HTML} that passes validation can be translated into the newer \textsc{XML} form of \textsc{HTML} using either James Clark's classical \textsc{SGML} library \textsl{SP}\footnote{URI: http://www.jclark.com/sp/} or Dave Raggett's program \textsl{tidy}\footnote{URI: http://www.w3.org/People/Raggett/tidy/}. \ }
\par{However, the rules for \textsc{XML} were designed to make things easy for processors rather than for humans, and for that reason an author writing toward an ultimate \textsc{XML} document type usually is well-advised to write for a version of the document type under more author-friendly \textsc{SGML} rules. \ }
\par{For example, if in an \textsc{SGML} language a forced linebreak is represented by the defined-empty element \emph{brk}, then the markup \texttt{\string<brk\string>} is sufficient, whereas under the more restrictive \textsc{XML} version of the same language, either the markup \texttt{\string<brk\string>\string</brk\string>} or its abbreviated form \texttt{\string<brk/\string>}\footnote{Moreover, some confusion may arise from the fact that under the \textsc{SGML} syntax (formally \textsc{SGML} declaration) specified for \textsc{HTML} neither of these \textsc{XML} forms of markup would be permitted. \ } must be used.
\ For \textsc{GELLMU} this means that the markup \texttt{}\verb+\+\texttt{brk} and the markup \texttt{}\verb+\+\texttt{brk}\verb+{+\texttt{}\verb+}+\texttt{} are interchangeable under \textsc{SGML}, except for the case of \texttt{}\verb+\+\texttt{brk} abutting a following character without intervening whitespace, but not equivalent under \textsc{XML}. \ \textsc{GELLMU} provides the form \texttt{}\verb+\+\texttt{brk;} to represent the abbreviated form \texttt{\string<brk/\string>} of an element that is defined as empty. \ }
\section*{5\ \ \label{SU-5}Advanced GELLMU}
\par{Basic \textsc{GELLMU} deals with markup languages more or less at the level of syntax without getting to the level of grammar. \ }
\par{Advanced \textsc{GELLMU} may be used to roll language-independent grammatical concepts into the picture. \ }
\par{The first of these is \LaTeX{}-like multiple argument/option syntax. \ For example, under advanced \textsc{GELLMU} the markup
\begin{menulist}
\item\texttt{}\verb+\+\texttt{frac}\verb+{+\texttt{a\ x\ +\ b}\verb+}+\texttt{}\verb+{+\texttt{c\ x\ +\ d}\verb+}+\texttt{}
\end{menulist}
is converted in syntactic translation to
\begin{menulist}
\item\texttt{\string<frac\string>\string<ag0\string>a\ x\ +\ b\string</ag0\string>\string<ag0\string>c\ x\ +\ d\string</ag0\string>}
\end{menulist} }
\par{That is, a chain of \texttt{`}\verb+{+\texttt{'}, \texttt{`}\verb+}+\texttt{'} pairs and \texttt{`['}, \texttt{`]'} pairs following a command, with no intervening white space between the command name and the first delimiter or between a close delimiter and the next open delimiter in the chain, constitutes an \textsc{SGML} element whose content begins with a sequence of generic positional arguments (tag name \emph{ag0}) and options (tag name \emph{op0}). \ Without knowledge of the document type it cannot be determined if a name used with multiple argument/option syntax has only \emph{ag0}, \emph{op0} content.
\ The syntactic translator provides a list variable consisting of names that have only this type of content and that therefore should be given close tags after the sequence of arguments and options. \ Absent that, the author must provide a close tag unless an \textsc{SGML} parser can infer it, and even in that case, if the element can appear in the mixed content model of another element such as, for example, a paragraph, then the parser's automatic placement of a close tag could lead to the unwanted collapse of a word boundary similar to that which occurs in \LaTeX{} when an author's careless markup
\begin{center} \texttt{}\verb+\+\texttt{TeX\ benchmark} \end{center}
gets typeset as ``\TeX{}benchmark'' instead of as ``\TeX{} benchmark''. \ }
\par{If multiple argument/option syntax is used, then there is ambiguity about the nature of the first pair of chained delimiters if it is the pair \texttt{`['}, \texttt{`]'} --- whether it represents an \emph{op0} or an attribute list. \ Therefore, in this case the pair is treated as an attribute list if the first character after its \texttt{`['} opening delimiter is a colon (\texttt{`:'}). \ }
\par{In basic \textsc{GELLMU} the following four of the ten \LaTeX{}-special characters are special:
\begin{center} \texttt{}\verb+\+\texttt{\ \ }\verb+{+\texttt{\ \ }\verb+}+\texttt{\ \ \%}\ . \end{center}
Additionally, the character \texttt{`\#'} is special when used in the definition of a \emph{newcommand}, the characters \texttt{`['} and \texttt{`]'} are special when used for \LaTeX{}-like option syntax, and the character \texttt{`\&'} is special when followed immediately by a letter since then it is the introducer for \textsc{SGML} entity invocations.
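\par{A short fragment may make the role of these characters concrete. \ It is only a sketch: it assumes an \textsc{HTML}-like document type with an element \texttt{p} and a character entity \texttt{eacute}, neither of which is prescribed by \textsc{GELLMU} itself.
\begin{verbatim}
\p{The caf&eacute; opens at nine.}
\end{verbatim}
By the correspondence described in section 2, one would expect the syntactic translator to produce
\begin{verbatim}
<p>The caf&eacute; opens at nine.</p>
\end{verbatim}
with the entity reference left in place for the \textsc{SGML} parser to resolve. \ }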
\ }
\par{Advanced \textsc{GELLMU} provides for the possibility of giving traditional \LaTeX{} meaning to \texttt{`\&'} when not followed immediately by a letter and also to the other four \LaTeX{}-special characters, which are
\begin{center} \texttt{\_\ \ \string^\ \ \$\ \ \string~}\ . \end{center} }
\par{Additionally, it provides for the possibility of giving traditional \LaTeX{}-like meaning to other short forms of markup such as \texttt{"}\verb+\+\texttt{( . . . }\verb+\+\texttt{)"} for inline math, \texttt{"}\verb+\+\texttt{[ . . . }\verb+\+\texttt{]"} for displayed math, \texttt{"--"} for a range-dash, \texttt{"---"} for a punctuation-dash, \texttt{"}\verb+\+\texttt{\ "} for an inter-word space, \texttt{"}\verb+\+\texttt{,"} for a short space, and others including also, if desired, the use of blank lines, as appropriate, for paragraph boundaries. \ }
\section*{6\ \ \label{SU-6}The GELLMU didactic production system}
\par{The conversion of both \emph{basic} and \emph{advanced} \textsc{GELLMU} source markup to \textsc{SGML} is performed by my program called the \emph{syntactic translator}. \ }
\par{If one wishes to write consciously for a public document type such as \textsc{HTML} or the Text Encoding Initiative\footnote{URI: http://www.tei-c.org}'s \textsc{TEI} using \textsc{GELLMU}'s \LaTeX{}-like syntax, the syntactic translator is the only part of \textsc{GELLMU} that will be of interest. \ }
\par{The optional features of advanced \textsc{GELLMU} described above can only be used when one is writing for a document type that provides markup in which the corresponding concepts have representation. \ For example, \LaTeX{}-like use of the character \texttt{`\string~'} for non-breaking space requires a markup that provides non-breaking space.
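\par{To illustrate, a source fragment that relies on several of these optional features might look like the following sketch. \ The wording is invented for the example, and how each short form is rendered depends on the target document type.
\begin{verbatim}
The value of \( a x + b \) is tabulated on pages 10--12 --- see
also Table~3.

A blank line, as above, starts a new paragraph.
\end{verbatim}
For such a fragment to be meaningful, the target markup must offer inline math, both kinds of dash, non-breaking space and paragraphs. \ }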
\ }
\par{Moreover, if blank lines are going to be paragraph boundaries, then the syntactic translator will need a list of element names before which a new paragraph does not make sense, and, since there is no separate provision for a list of names after which a paragraph must end, the document type cannot be \textsc{XML}. \ }
\par{The \textsc{GELLMU} didactic production system provides such a document type and also provides tools, which can be used as interchangeable components, for working with that document type. \ }
\par{The didactic production system consists of the syntactic translator and the following additional components:
\begin{enumerate}
\item An \textsc{SGML} document type called ``article''. \ 
\item Its \textsc{XML} cousin, also called ``article''. \ 
\item A program for translating \textsc{SGML} article to \textsc{XML} article. \ 
\item A program for translating \textsc{XML} article to \textsc{HTML}. \ 
\item A program for translating \textsc{XML} article to \LaTeX{}. \ 
\end{enumerate} }
\par{The document type is intended to be comfortable for authors with past experience in \LaTeX{}. \ The document type and the components are didactic. \ They are intended to illustrate how such a system can be assembled from interchangeable components. \ They are not finished in any sense, and each has shortcomings. \ }
\par{They do serve, I hope, to demonstrate to the community of \LaTeX{} authors that it will find no limitations in this approach to document production. \ }
\par{At the same time the system is intended to provide a whole new way of thinking about the subjects of package design and class design. \ }
\par{Its unfinished nature is intended to make it relatively easy for those who are so inclined to finish, in various ways, a system that fits their needs.
\ }
\section*{7\ \ \label{SU-7}Production of this Document}
\par{This document and the slides used during its presentation were prepared with the \textsc{GELLMU} didactic production system. Pre-publication versions of the sources and various automatic formattings are available on the author's web site\footnote{URI: http://math.albany.edu:8000/math/pers/hammond/Presen/tug2001/}. \ }
\par{Subsequent to the \textsc{GELLMU} run on this document a copy of its \LaTeX{} output was manually modified for conformance with TUG guidelines. If I were going to submit a number of such TUG articles, it would be worthwhile to make another variant of the \LaTeX{} formatter for TUG. \ }
\end{document}