From: email@example.com (Tad McClellan)
Newsgroups: comp.text.sgml
Subject: Re: A novice needs help or at least pointers
Date: Fri, 30 Apr 1999 22:01:36 -0400
Organization: TMI news testing
Lines: 103
Message-ID: <firstname.lastname@example.org>
References: <email@example.com>
NNTP-Posting-Host: 188.8.131.52
X-Newsreader: TIN [version 1.2 PL2]

: Can anyone help me out? Here is the problem:
: A data supplier used to send us data in a proprietary format. I was able to
: read the file, according to the format, and take out the fields that I needed
: for further work.
: Now, this data supplier has decided to change to SGML. No one here has ever
: come across this before, so we are, so to speak, SGML-idiots.
Well, just in case you haven't gotten this yet: SGML (the ISO Standard) does not define any tags at all!
SGML is a meta-language. That is, it is a language for describing other (markup) languages.
See below for the "part" of SGML that contains this description.
: For example, in
: the old days, the file said:
: Number: 05704062
: Author: Olivastro; Dominic
: and so on. Now I get something like this:
: <ENTDOC>
: <SDOBI>
: <B100>
: <DNUM>
: <PDAT>
: 05704062
: </PDAT>
: </DNUM>
: </B100>
: and so on (and on and on and on). The problem is that I can not dream up any
: clear programmatic way to extract the data I need.
: Are there tools for this?
"down converting" from SGML into other formats is its reason for existing.
: Ideally, I want a program that will just take this
: file and change it to something like the first file. Any ideas or pointers?
SGML requires a DTD.
Without a DTD, it is not (legal) SGML.
So that is not SGML there :-)
The DTD, which is named (or included) in the document's DOCTYPE declaration, describes the grammar of the document. If you are lucky, it also documents the semantics assigned to the various element names (usually in comments or in an additional document).
The grammar allows nested constructs to be used (e.g., book contains chapters, chapters contain paragraphs, paragraphs contain words), which may not be amenable to "flattening out" as you show above.
Since it is supposed to be just another format equivalent to what you have been getting, then you probably _can_ flatten it out though.
Can't tell without seeing the DTD.
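If it does turn out to be flattenable, the job might look roughly like the Python sketch below. Treat it as an illustration only, not a real SGML parser. It assumes every element has an explicit end tag (no tag omission or other markup minimization), that attributes can be ignored, and that the data contains no declarations, comments or entity references. The function name flatten is made up for this example, and the closing </SDOBI> and </ENTDOC> tags in the sample are my addition, since your excerpt stops before them.

```python
import re

# Matches a start tag, an end tag, or a run of character data between tags.
# Anything after the element name (attributes) is skipped, not interpreted.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*>|([^<]+)")


def flatten(text):
    """Return (path, data) pairs for every non-blank run of character data.

    Hypothetical helper: assumes fully tagged input with explicit end tags.
    """
    stack = []  # names of the elements that are currently open
    rows = []
    for closing, name, data in TOKEN.findall(text):
        if name:
            name = name.upper()  # assumes element names are case-insensitive
            if not closing:
                stack.append(name)
            elif stack and stack[-1] == name:
                stack.pop()
            else:
                raise ValueError("unexpected end tag </%s>" % name)
        elif data.strip():
            # Collapse line breaks and runs of spaces inside the data.
            rows.append(("/".join(stack), " ".join(data.split())))
    return rows


# The excerpt from the question, with the two outer end tags added (assumed).
sample = """
<ENTDOC>
<SDOBI>
<B100>
<DNUM>
<PDAT>
05704062
</PDAT>
</DNUM>
</B100>
</SDOBI>
</ENTDOC>
"""

for path, data in flatten(sample):
    print("%s: %s" % (path, data))
```

For the sample this prints a single line, ENTDOC/SDOBI/B100/DNUM/PDAT: 05704062. Turning that path into a label like "Number" still requires knowing what the element names mean, which is exactly what the DTD and its documentation are for. If the supplier's data uses any minimization, a sketch like this will break and you will want a real SGML parser working from the DTD.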
You cannot talk sensibly about SGML data without having access to the DTD.
So ask your supplier to supply it :-)
We cannot really help you without the DTD, because we don't know what, for instance, SDOBI _means_.
All things SGML can eventually be found out by starting at:
--
    Tad McClellan                          SGML Consulting
    firstname.lastname@example.org         Perl programming
    Fort Worth, Texas