A simple text extract is an example of one possible way to present a short text with a simple paragraph structure. Once I had developed, after multiple trial-and-error iterations, a structure that I found satisfactory, I tried to use it consistently throughout the entire web site. My first experiments were inevitably manual; later I wrote a program in PowerBASIC to convert scanned text into the HTML paragraph structure of a simple extract.
While each text extract is individual and for the most part requires some adjustment after automatic conversion, the basic principles of presenting a simple text extract are the same for the format I'm using. Some of them, such as the overall HTML layout, the use of indentation, navigation bars, font usage, background setup, and the structure of the HTML HEAD, are common to almost all pages of this web site. Those principles can be summarized as follows:
The text starts with 3 lines of descriptive title (3 separate lines of HTML comment). This title is an internal text description that helps me when I'm switching between a dozen texts in my QuickEdit session (the way I'm doing it now, while my train slowly approaches Hoboken). The internal title also helps me handle the multitude of files comprising this web site (about 800 as of October 2000). I use ZTreeWin (a 32-bit shareware reincarnation of XTreeGold) to manage my web site files, and the power of 4DOS scripts to handle incremental uploads (via good old FTP). This choice of tools also calls for internal descriptive titles that facilitate the management of a large number of small files.
Title consists of an underlined title text, text file URL, and three dates:
Modification dates are maintained manually when the text is changed (not often enough to become a burden). The second date reflects any change to the text, while the third date reflects a change of content and drives an automatic system that colors dates in various site navigation lists to highlight the most recently modified pages.
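A minimal sketch of what such a three-line internal title might look like. The exact layout, the URL, the date format, and the meaning of the first date (assumed here to be the creation date) are all assumptions made for illustration, not the actual lines used on this site:

```html
<!-- Extract Title Goes Here                                               -->
<!-- =======================                                               -->
<!-- http://www.example.com/extract.htm   2000.01.15 2000.09.30 2000.10.05 -->
```

The underline is drawn with "=" rather than "-" because a double hyphen inside an HTML comment can end the comment prematurely in some parsers.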
The TITLE in the HTML HEAD is displayed by the browser in the top bar of its window. Here it consists of the extract title followed by a reference to the section of the web site and the name of the site. The TITLE is also what a search engine is most likely to display when it finds your page, presenting the individual page to the person browsing the search results. The title should be kept short and as precise as possible.
The LINK line that immediately follows the TITLE line specifies the site FavIcon image for the page (it's the same throughout the entire site).
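As an illustration, the start of such a HEAD might look roughly like this. The title wording, the separator, and the icon path are placeholders, not the values used on this site:

```html
<HEAD>
<!-- Extract title, then site section, then site name (placeholder text) -->
<TITLE>Extract Title - Section Name - Site Name</TITLE>
<!-- One icon file shared by every page of the site (hypothetical path) -->
<LINK REL="SHORTCUT ICON" HREF="/favicon.ico">
</HEAD>
```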
The CharSet specification here matches the standard Windows default. However, for a browser (Netscape Navigator at least) to interpret special symbols correctly, the CharSet should be specified explicitly. E.g., the section sign is displayed correctly as § because the text you are looking at now has an explicit CharSet specification (to check this, hit [Ctrl]+[U] and look at the HTML source line just below the TITLE line). Without an explicit CharSet specification the browser might fail to interpret special symbols and show question marks instead. Another reason to have a CharSet specification in every HTML source is structural consistency with the Russian section of my web site, where a Cyrillic character set specification is a must.
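A sketch of an explicit CharSet line, assuming the "standard default for Windows" means the Western European code page windows-1252; the value actually used on this site is not shown in the text:

```html
<HEAD>
<TITLE>Extract Title - Section Name - Site Name</TITLE>
<!-- Assumed value: windows-1252 is the Western European Windows code page -->
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
</HEAD>
```

For a Cyrillic page the same line would name a Cyrillic character set instead (for example windows-1251 or koi8-r; which one the Russian section uses is likewise an assumption).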
The HEAD elements that follow provide information for search engine spiders. Good search engines retrieve this information themselves and ask you only for a URL when you register your site with them (lesser engines ask you to enter all or part of that data manually).
The Description is most likely to be displayed right after the title (whatever fits into 1–2 lines) when a search engine finds your page. It can be viewed as an abstract that presents the content of the individual page to the person browsing the search results. The Description should be kept reasonably short and as precise as possible. Since my site is a mixture of many things, I split my Description into Page: and Site: parts, so that, if the Page: part is small, some of the list of site sections in the Site: part might get displayed.
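The Page:/Site: split described above could be written along these lines. The wording of both parts is placeholder text, not the actual Description of this page:

```html
<HEAD>
<TITLE>Extract Title - Section Name - Site Name</TITLE>
<!-- Page: part first, so it is the part most likely to fit in the result list -->
<META NAME="description"
      CONTENT="Page: one-sentence abstract of this extract. Site: first section, second section, third section.">
</HEAD>
```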
Keywords are used for the search itself: the search engine looks for the closest match between the requested set of keywords and the keywords from the page (retrieved by a spider and saved in the search engine database). A careful choice of keywords is very important. Ideally they should form a distinctive key that associates the page content reasonably closely with the intended target of a potential searcher. The following keywords reflect, IMHO, the content of the extract presented:
By contrast, the word Joy, for example, though it is used several times in the text, including the text title, would be a wrong choice, because it would place the title and URL of this page in an avalanche of search results that have nothing to do with programming. The page would be lost anyway, since there are so many other "joys" that are much more popular than that of programming.
I didn't use the word Mainframe either. Though the book was written by the manager of one of the largest mainframe projects ever, and to a great extent describes that project, the extract presented is about the craft of programming in general and contains no specific references to mainframes. The same applies to the entire book: while it is inevitably based on mainframe material, it is greatly generalized, and most of its insightful ideas can be used successfully as guidelines for development on any computer architecture and in any software environment.
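For illustration only, a Keywords line that follows this reasoning might look like the sketch below. The keywords shown are stand-ins chosen to match the stated subject (the craft of programming); they are not the list actually used on this page:

```html
<HEAD>
<TITLE>Extract Title - Section Name - Site Name</TITLE>
<!-- Specific to the subject; deliberately no "joy" and no "mainframe" -->
<META NAME="keywords"
      CONTENT="programming, craft of programming, software development">
</HEAD>
```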
Note: Some search engines ignore the Keywords altogether and try to derive them from the text of the page. The reason is that Keywords can easily be abused by the author of a web page: in order to get more traffic, an author might put in Keywords that don't reflect the content of the page. In the "best" traditions of our "politically correct" age, in order to prevent abuse by some, those search engines prohibit use by everybody. IMHO, it would be better (and not much trouble) to check whether the Keywords match the text and, if not, derive them from the text (better yet, reject the submission altogether, to punish only the guilty instead of punishing everybody for somebody's potential fraud attempt). I still believe in common sense, and do my best to put out Keywords that reflect the content of my pages (as shown above, that's not always straightforward).
The next line identifies the author of this HTML page.
The last block of lines in the HEAD is style information.
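One common way to record the page author is a META element like the one sketched here; whether this page uses exactly this form is an assumption, and the name is a placeholder:

```html
<HEAD>
<TITLE>Extract Title - Section Name - Site Name</TITLE>
<!-- Author of the HTML page, not of the extract itself (placeholder name) -->
<META NAME="author" CONTENT="Page Author Name">
</HEAD>
```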
The BODY line starts the HTML text body and sets the background, foreground and hyperlink colors. There are browser defaults for all of them, but I prefer not to rely on these defaults, because I simply don't know what they are for each individual browser used to view my page, and colors are an essential part of page design. (The user can still, if he/she wishes, override any document colors with the browser's own defaults by choosing the appropriate browser settings.)
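A sketch of a BODY line that sets every color explicitly instead of relying on browser defaults. The color values are arbitrary examples, not the palette of this site:

```html
<!-- BGCOLOR: background, TEXT: foreground, LINK/VLINK/ALINK: unvisited, visited, active links -->
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#800080" ALINK="#FF0000">
<P>Page content goes here.</P>
</BODY>
```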
The comment right after BODY replicates the Description. The reason is the unfortunate fact that some search engines ignore the Description and put in its place the first couple of lines of the text itself (I guess they do this for the same reason they ignore the Keywords). The texts of all my pages begin with a standard navigation bar, which would look senseless as a page abstract. Placing a comment that replicates the Description is my attempt (not always successful) to give a search engine what it expects to get from the starting lines of the text: a page abstract.
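In markup terms the idea is simply a comment placed before any visible content. The wording below is placeholder text standing in for a copy of the Description:

```html
<BODY>
<!-- Page: one-sentence abstract of this extract. Site: first section, second section, third section. -->
<P>The navigation bar and the text itself follow here.</P>
</BODY>
```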
FONT FACE and SIZE are specified for the same reason as the colors, and can be overridden by the user in the same way.
The top-of-text Anchor enables a jump from the bottom of the text to the top.
I've used Italicised text for the entire extract. It looks less formal to me that way.
The top navigation bar contains several essential links within the site to which the user can jump before reading the text. It is centered, and is separated from the text by a horizontal line and a line space. The top navigation bar provides hyperlinks to:
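A sketch of how a global font setting and the italic style might wrap the text. The face list and size are assumed values, not the ones used on this site:

```html
<!-- One FONT element around the whole text sets face and size explicitly -->
<FONT FACE="Arial, Helvetica, sans-serif" SIZE="3">
<!-- The extract itself is set in italics -->
<I>Text of the extract goes here.</I>
</FONT>
```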
The extract title is CENTERed, Bolded and Underlined. It is COLORed Red and set one font SIZE step larger than the rest of the text. The title is followed by a line space.
The extract author name is CENTERed, Bolded and COLORed Navy.
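The title and author formatting described above could be expressed roughly as follows. The title and name are placeholders, and the exact nesting of tags is an assumption:

```html
<!-- Title: centered, bold, underlined, red, one size step larger -->
<CENTER><FONT SIZE="+1" COLOR="Red"><B><U>Extract Title</U></B></FONT></CENTER>
<!-- Line space after the title -->
<BR>
<!-- Author: centered, bold, navy -->
<CENTER><FONT COLOR="Navy"><B>Author Name</B></FONT></CENTER>
```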
The body of the extract is a series of Paragraphs. All paragraphs are presented in the same way:
Paragraph sentences are separated by 2 spaces. Since an HTML browser collapses any number of successive spaces into one space, an &nbsp; (non-breakable space) is appended to the end of every sentence to get the additional separating space.
Paragraph text is justified (i.e., ALIGNed both on the Left and on the Right).
The first line of a paragraph is indented by 7 &nbsp;-s (non-breakable spaces). A mixture of &nbsp;-s and regular spaces won't work properly for paragraph indentation: since paragraphs are justified, I don't want justification spaces to be inserted into the indent, changing its size from paragraph to paragraph.
The first letter of the first word of the first line of the paragraph is Bolded and COLORed Red. I saw this style some time ago on a web site, liked it, and have been using it ever since.
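Putting the four paragraph rules together, a single paragraph might be marked up like this. It is a sketch with placeholder sentences; the actual converter output may differ in detail:

```html
<P ALIGN="JUSTIFY">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<B><FONT COLOR="Red">F</FONT></B>irst sentence of the paragraph.&nbsp;
Second sentence of the paragraph.&nbsp;
Last sentence of the paragraph.
</P>
```

The line break after each trailing &nbsp; is rendered as one ordinary space, so every sentence ends up followed by two spaces: the non-breakable one and the regular one.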
The text extract is followed by the name of the author and the name of the book containing the extract. Normally I use a simple two-line DIVision with right ALIGNment for this. This specific extract, however, has a counterpart, the page The Woes of the Craft, and it was quite natural to provide a direct link to it at the end of the extract text. I wanted to have the author name and the book name ALIGNed to the Right as usual, and the link ALIGNed to the Left (the default) while keeping it on the same line as the book name. So I use a TABLE with the corresponding left and right ALIGNment of data in its two columns. Please note that specifying WIDTH=100% is essential for the whole method to work as intended.
The bottom navigation bar contains several essential links within the site to which the user can jump after reading the text. It is centered, and is separated from the text by a line space and a horizontal line. The bottom navigation bar provides hyperlinks to:
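A sketch of the two-column TABLE approach. The file name woes.htm, the author name, and the book title are placeholders; only the WIDTH and ALIGN attributes come from the description above:

```html
<!-- WIDTH="100%" stretches the table so the right column reaches the right margin -->
<TABLE WIDTH="100%" BORDER="0" CELLSPACING="0" CELLPADDING="0">
<TR>
  <TD ALIGN="LEFT">&nbsp;</TD>
  <TD ALIGN="RIGHT">Author Name</TD>
</TR>
<TR>
  <!-- The link shares a line with the book name but stays on the left -->
  <TD ALIGN="LEFT"><A HREF="woes.htm">The Woes of the Craft</A></TD>
  <TD ALIGN="RIGHT">Book Title</TD>
</TR>
</TABLE>
```

Without the WIDTH attribute the table would be only as wide as its content, and the right-aligned cells would not reach the right edge of the text.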
The natural question at this point is: why not replace those two navigation bars with a single one that is always present on the screen in a separate small horizontal frame, while the text scrolls in the big frame occupying the rest of the screen? Been there, tried that... I found frames too much trouble to maintain and to use for the minor convenience of having navigation always on the screen. And this convenience is itself somewhat questionable: even a small frame pinches the main content window, while for most of the time spent browsing the text it is of no use. It seems to me that, for a simple text, frames add complexity that is not justified by any significant improvement. Thus I've ended up with navigation bars at both top and bottom as a compromise between convenience and complexity. This pair lets the user jump elsewhere at both critical points of page browsing: before reading the text and after it. A user who wants to leave somewhere in between has to scroll to either the top or the bottom of the text; this is a slight inconvenience of the top/bottom bar scheme (in most modern browsers [Ctrl]+[Home]/[Ctrl]+[End] jumps directly to the text top/bottom). It's not much of a burden in any case, since most texts on my web site are short, and the [Back] button along with the History window is always close at hand. My own experiments with frames, as well as the opinions about them that I've picked up from the web, have made me quite sceptical about their universal usefulness. I resort to them on my site only when there is a serious functional justification.
The bottom-of-text Anchor enables a jump from the top of the text to the bottom.
The last 3 lines close the global text FONT, the HTML text BODY, and the entire HTML itself.
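A skeleton showing how the two anchors and the two navigation bars might fit around the text. The anchor names, link targets, link labels, and the placement of the jump links inside the bars are all assumptions made for illustration:

```html
<A NAME="top"></A>
<!-- Top bar: centered links, then a horizontal line and a line space -->
<CENTER><A HREF="index.htm">Home</A> | <A HREF="section.htm">Section</A> | <A HREF="#bottom">Bottom</A></CENTER>
<HR>
<BR>
<P>Text of the extract goes here.</P>
<!-- Bottom bar: a line space and a horizontal line, then centered links -->
<BR>
<HR>
<CENTER><A HREF="index.htm">Home</A> | <A HREF="section.htm">Section</A> | <A HREF="#top">Top</A></CENTER>
<A NAME="bottom"></A>
```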
Note: To keep this document properly formatted, some excessively long lines of code have been split into parts (for presentation purposes only). The split points are indicated either by a light green highlighted space (line splicing is optional) or by a light red highlighted space (line splicing is required for the code to be valid). Regardless of the indentation of the split line parts, lines should be spliced so that the first character of the next part immediately follows the split point indicator (the highlighted space) of the previous part.