Serving Information to the Network

GELLMU

The GELLMU project has many facets. For mathematicians the most interesting point is that regular GELLMU provides a way to use LaTeX-Like Markup to write in an author-level XML document type that admits reliable automatic translation both to PDF (via regular LaTeX) and to the modern form of HTML extended with MathML.

More generally, the GELLMU project provides a way to use LaTeX-Like markup to write directly for most author-level XML document types, with newcommand-like macros that take arguments.

Q. Why use a LaTeX-Like interface to XML rather than a LaTeX translator?
A. LaTeX translates easily only to the DVI and PDF formats. Translating LaTeX to formats such as HTML and other SGML or XML document types requires Herculean effort.

GELLMU is not LaTeX, nor is it an HTML or XML generator (but see below); rather, it is a general-purpose SGML authoring language based on traditional LaTeX syntax (to the extent possible).

When used beyond regular GELLMU, it is like XML in that there is no fixed set of tags (i.e., LaTeX-like commands). That feature is both good and bad. It is good because it creates flexibility. It is bad because it puts upon the author the responsibilities of (1) having a coherent set of tags and (2) writing or obtaining codelet packages (or style processors) to provide translation to standard formats.
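
To make "no fixed set of tags" concrete, here is a toy Python sketch. It is not GELLMU's syntax or its translator; it simply assumes, for illustration, that any command `\name{argument}` becomes an element `<name>argument</name>`, whatever the name. The function `to_xml` is hypothetical.

```python
import re

# Toy rule (an assumption, not GELLMU's actual rule): any \name{...}
# becomes <name>...</name>. There is no list of allowed names.
# The pattern only matches braces with no braces inside them, so
# repeated passes translate the innermost commands first.
COMMAND = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")

def to_xml(source: str) -> str:
    """Translate a toy LaTeX-like string into XML-style markup.

    XML special characters in the text are not escaped; a real
    translator would have to handle them.
    """
    previous = None
    while previous != source:  # stop when a pass changes nothing
        previous = source
        source = COMMAND.sub(r"<\1>\2</\1>", source)
    return source

print(to_xml(r"\section{Intro \emph{toy} text}"))
# prints <section>Intro <emph>toy</emph> text</section>
```

Because the rule accepts any command name, the tag set is whatever the author writes. Keeping it coherent, and supplying processors that turn those elements into LaTeX or HTML, is exactly the responsibility described above.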

In exchange for the extra effort the author gets to pick the target formats.

A well-tuned GELLMU authoring system might admit translation not only to LaTeX and HTML/MathML but also to other formats.

Some have said that SGML is not "rich enough" to encompass the needs of real mathematicians. That is certainly a correct statement about the language in the SGML family that most of us know best, namely HTML. One needs to understand that SGML is about the organization of automatic processing. A single dialect may not be rich enough for everything. We want to think about the category of markup languages (modulo a somewhat elusive notion of equivalence) and the clever use of morphisms in that category.

Of course, the class of markup languages will not give rise to a category unless one knows what a morphism is. A morphism is any translation from one markup language to another (or to itself). That is, a morphism takes a document in one language as input and produces a document in the other as output. Composing two translations gives another translation. There is certainly the identity translation for each markup language, and there is a null translation in any case where the target language contains the empty document.

Strictly speaking, an isomorphism in this category would then be given by a pair of mutually inverse morphisms.

With this notion of isomorphism, there could easily be infinitely many documents in a markup language that are reasonably equivalent to the empty document.
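
The definitions above can be modeled in a few lines of Python. This is a toy model under our own assumptions: a document is just a string, and a morphism is any function from documents to documents. `strip_comments` is a hypothetical translation chosen to show how many distinct documents can collapse to the empty one, so that no inverse translation exists.

```python
from typing import Callable

# Toy model (an assumption): a document is a string, and a morphism
# (translation) between markup languages is any function on documents.
Translation = Callable[[str], str]

def identity(doc: str) -> str:
    """Identity translation: returns the document unchanged."""
    return doc

def null(doc: str) -> str:
    """Null translation: sends every document to the empty document."""
    return ""

def compose(g: Translation, f: Translation) -> Translation:
    """Composite translation 'g after f'."""
    return lambda doc: g(f(doc))

def strip_comments(doc: str) -> str:
    """Hypothetical translation: drop lines that begin with '%'."""
    return "\n".join(
        line for line in doc.splitlines() if not line.startswith("%")
    )

doc = "text\n% a comment\nmore text"
# Identity law: composing with the identity changes nothing.
assert compose(identity, strip_comments)(doc) == strip_comments(doc)
# Anything followed by the null translation yields the empty document.
assert compose(null, strip_comments)(doc) == ""
# Different documents map to the same output, so strip_comments
# cannot be half of a pair of mutually inverse translations.
assert strip_comments("% one") == strip_comments("% one\n% two") == ""
```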

So one wants the objects in the category to be equivalence classes of markup languages, for some notion of equivalence. Even then, it is not clear that one would arrive at a small category.

Ultimately, the restrictions imposed in SGML systems amount to the requirement that the morphisms in the category work fast. That understood, one can get where one wants by (1) trying to find a language that enjoys suitable initiality properties relative to the markups in which one is interested and (2) composing morphisms.
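
As a rough illustration of the second point, here is a Python sketch in which a small XML document type acts as a hub, loosely in the spirit of the author-level XML type that regular GELLMU translates to LaTeX and HTML. The hub's three tags (`doc`, `para`, `emph`) and the functions `hub_to_html`, `hub_to_latex`, and `compose` are hypothetical; only `xml.etree.ElementTree` is a real library. Each target needs just one translation out of the hub, and the full translations are obtained by composition with the same parsing step.

```python
import xml.etree.ElementTree as ET

# Hypothetical hub document type with only <doc>, <para>, <emph>.
# Text is not escaped on output; a real translator would escape it.
HTML_TAGS = {"doc": "body", "para": "p", "emph": "em"}

def hub_to_html(elem: ET.Element) -> str:
    """Translate a hub element and its subtree to HTML."""
    inner = (elem.text or "") + "".join(
        hub_to_html(child) + (child.tail or "") for child in elem
    )
    tag = HTML_TAGS[elem.tag]  # KeyError for tags outside the hub
    return f"<{tag}>{inner}</{tag}>"

def hub_to_latex(elem: ET.Element) -> str:
    """Translate a hub element and its subtree to LaTeX."""
    inner = (elem.text or "") + "".join(
        hub_to_latex(child) + (child.tail or "") for child in elem
    )
    if elem.tag == "emph":
        return f"\\emph{{{inner}}}"
    if elem.tag == "para":
        return inner + "\n\n"  # paragraphs separated by a blank line
    return inner  # "doc": the concatenated paragraphs

def compose(g, f):
    """Composite translation 'g after f'."""
    return lambda x: g(f(x))

# Both full translations share the same first morphism (parsing).
to_html = compose(hub_to_html, ET.fromstring)
to_latex = compose(hub_to_latex, ET.fromstring)

source = "<doc><para>A <emph>toy</emph> example.</para></doc>"
print(to_html(source))  # prints <body><p>A <em>toy</em> example.</p></body>
```

Adding a further target in this sketch means writing one more function out of the hub; any source language that can be translated into the hub then reaches every target by composition.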

As an example, I offer an HTML document, a brief introduction (not the GELLMU entrance document), that was translated automatically from a GELLMU input document (plain text with tags). There is also a LaTeX document (plain text with tags) that was produced automatically from the same source.

Computing Goal: one (virtual) operating system, one editor, one mailer, and one authoring language, regardless of where I sit.

Draft Material

Much of what we need today for mathematical research is available electronically at our desks. The New York Journal of Mathematics is just one example. Many mathematics journals are now available online, although often not freely, and various mathematical preprint archives are also available.

The online appearance of Mathematical Reviews in the guise of MathSciNet (American Mathematical Society) enables a mathematician to accomplish in perhaps half an hour what formerly would have consumed days of work in the library gathering references.

The JSTOR project even brings to our screens crisp images of the pages of selected journals from bygone years.

Unfortunately, the mathematical community has lacked a format for presenting electronic articles that is (1) robust for notation-based searching, (2) satisfactory for efficient network delivery, and (3) easily renderable in various presentation formats by widely available inexpensive tools. Publishers might still wish to withhold high-quality typeset forms from free network delivery while providing free (slightly ugly) forms with all content intact.

What may not work well is for publishers to provide only "search hits" and "indexing information" to the network for free, without also providing free (slightly ugly) forms with content intact. In that case, cataloging and indexing sites might fear that their customers will feel poorly served when all they receive are pointers that cannot be followed to verify that what was retrieved is sound and actually serves the customers' interest. This reasoning might lead cataloging and indexing sites to ignore such publications.

That would be a disservice to the mathematical community: international network support for digging out the state of information about a specific topic, beyond what is now possible through MathSciNet (which is enormously better than going to a paper library for Math Reviews), would be at best suboptimal.

Much, much more could soon be possible with this level of cooperation from publishers.

Beyond that, free (slightly ugly) forms of research articles and research monographs with content intact give publishers the hope of generating interest and thereby increasing receipts. There will be a relation between what can be charged for a high-quality print copy and the dollar cost, not to mention the labor and the "mess", of printing from the web. For good articles and monographs, once the price falls below about 1.5 times the dollar cost of printing from the web, the number of international sales should be enormous compared to the mere 2500 subscriptions that a good contemporary mathematics print journal hopes to have today.

One might even imagine an increase in the quality of the average research article as publishers, as well as editors, become sensitive to the quality of the individual article as opposed to the running average quality of several years' worth of articles in a journal.

The widespread distribution, beginning around 1995, of a free viewer for Adobe's Portable Document Format (PDF) made it possible for the majority of PC users, not just those with special mathematically oriented tools (principally "TeX"), to view typeset mathematics on their screens.

From the standpoint of a mathematician wishing to serve typeset mathematics on the network, the "PDF" format is not the final answer. Indeed, while it greatly expands the viewing audience, it still, in the typical situation, requires both a web browser and a PDF viewer, and it does not push the realm of possibilities much beyond what had been accomplished with Knuth's TeX-related "DVI" format. "DVI" is desirable because it is a public standard. (Indeed, the complete definition of "basic DVI" is relatively short.) Finally, the use of PDF requires the author to acquire, at a cost, a tool for generating PDF.

An early draft for version 3 of HTML made provision for mathematical text in Web pages. This was adequate for casual mathematical needs, but this simple math markup was later excluded from HTML. Today most observers regard HTML as a "closed" language. Nonetheless, the addition of a single tag, "<lg>" ("lg" for "logical group"), could make a very big difference if MathML does not take hold in the way that has been promised.

After the mathematical provisions in early HTML-3 were discarded from HTML, the World Wide Web Consortium (W3C) revisited the question of provision for mathematics on the web. This led to the development of Mathematical Markup Language (MathML), which acquired the status of a W3C recommendation in the spring of 1998.

Beyond the issues of electronic presentation the mathematical community needs formats that are (1) efficient for human authoring, (2) robust for notation-based searching and (3) satisfactory for perpetual archiving. Such formats must be powerful enough to admit subsequent automatic processing in many different directions.

Since 1991 I have been working, in part, as a provider to the network of information related to mathematical research. I have been involved in maintaining a public electronic archive since 1992. This archive became free-standing as a gopher server in 1993, and began serving HTML hypertext and links to HTTP (HyperText Transfer Protocol) servers in early 1994. In 1995 it began to operate in parallel with an HTTP server (working from the same database), as part of the mathematics web, when it became apparent that it would be a while longer before most WWW browsers were brought up to date (as of late 1993) on the gopher protocol. Indeed, by 1996 the use of gopher in the mathematical community had declined steeply.

I encourage those with an interest in the InterNet as a medium of information exchange to become acquainted with the UNIX (tm) philosophy. The principles therein have application as well to the design of information exchange protocols and mechanisms on the InterNet. Mathematicians tend to gravitate toward the UNIX philosophy because they understand how a magnificent structure can be made from the assembly of many carefully crafted nuggets.
