HTML 2: Text processing, Characters & Fonts
Word processors use formatting languages such as Rich Text Format (rtf), Microsoft Word (doc) and Lotus Word Pro (lwp) to specify the precise format or layout of text. Markup languages, on the other hand, are concerned with the structure of text-based information rather than the precise layout of individual characters.
The idea behind markup languages is to establish an information management framework that structures data so that software from different vendors can present and manipulate the information across a range of platforms and application packages.
HTML is a mixture of a formatting language and a markup language. HTML tags may represent structural entities such as:
or specify particular layout characteristics:
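As a rough illustration of the distinction (the tags chosen here are examples only, not a complete list), the first group below marks up what the text is, while the second group only says how the text should look:

```html
<!-- Structural tags: describe the role of the text -->
<H1>Chapter title</H1>
<P>A paragraph of body text.</P>
<UL>
  <LI>First list item</LI>
  <LI>Second list item</LI>
</UL>

<!-- Layout tags: describe the appearance of the text -->
<P><B>Bold text</B>, <I>italic text</I> and
<FONT SIZE="5" COLOR="red">large red text</FONT>.</P>
```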
Text processing refers to the ability to manipulate words, lines, and pages. Typically, the term text refers to text stored as ASCII codes (that is, without any formatting). Objects that are not text include graphics, numbers (if they're not stored as ASCII characters), and program code. (http://www.webopedia.com/TERM/T/text.html)
All text consists of characters. The set of legal document characters is referred to as the character set.
The set of legal document characters together with their representation at the binary level is referred to as the character encoding.
ASCII (American Standard Code for Information Interchange) is a 7-bit code with 128 characters (2^7 = 128). This is just enough for all the English upper and lower case letters, the digits 0-9, some special characters, and control characters such as line breaks.
In ASCII the English characters are represented as numbers, with each letter assigned a number from 0 to 127. Most computers use ASCII codes to represent text, which makes it possible to transfer data from one computer to another. (http://www.webopedia.com/TERM/T/text.html)
Text files stored in ASCII format are sometimes called ASCII files. Text editors and word processors are usually capable of storing data in ASCII format, although ASCII format is not always the default storage format. Most data files, particularly if they contain numeric data, are not stored in ASCII format. Executable programs are never stored in ASCII format. (http://www.webopedia.com/TERM/T/text.html)
Extended ASCII & ISO-Latin 1
There are several larger character sets that use 8 bits, which gives them 128 additional characters (2^8 = 256). The extra characters are used to represent non-English characters, graphics symbols, and mathematical symbols. Several companies and organizations have proposed extensions for these 128 characters. The DOS operating system uses a superset of ASCII called extended ASCII or high ASCII. A more universal standard is the ISO Latin 1 set of characters, which is used by many operating systems, as well as Web browsers. (http://www.webopedia.com/TERM/T/text.html)
A comprehensive list, which includes the extended ASCII character set, can be found at: http://www.webopedia.com/quick_ref/asciicode.asp
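A short sketch of how characters from the upper half of ISO Latin 1 (values 128 to 255) can be written in a page by decimal value. The sample words are illustrative only. The values 233, 241 and 169 are the Latin 1 codes for e-acute, n-tilde and the copyright sign:

```html
<!DOCTYPE html>
<HTML>
<HEAD>
  <TITLE>Latin 1 characters</TITLE>
</HEAD>
<BODY>
  <!-- 233 = e with acute accent, 241 = n with tilde, 169 = copyright sign -->
  <P>caf&#233;, ma&#241;ana, &#169;</P>
</BODY>
</HTML>
```

Because the characters are written as numeric escape sequences, the file itself contains only plain ASCII and can be saved from any text editor.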
The Extended Binary Coded Decimal Interchange Code (EBCDIC) is used by some IBM mainframes. EBCDIC provides for 256 characters.
Windows 1252 provides a basic character encoding for Microsoft Windows.
Non-English languages such as Chinese, Japanese and Korean cannot be adequately represented using only 256 characters, so a different character encoding was developed. Unicode includes ASCII as a subset but also caters for many thousands of additional characters. The original 16-bit design could represent about 65,000 characters, and the standard has since been extended to allow more than a million. There has even been a proposal to get Unicode to support the full set of Star Trek characters!
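A minimal sketch of Unicode characters in a web page, assuming the file is saved as UTF-8. The two Chinese characters are an arbitrary example. Each one is written first with its decimal value and then with its hexadecimal value:

```html
<!DOCTYPE html>
<HTML>
<HEAD>
  <!-- Assumption: the file is saved in the UTF-8 encoding of Unicode -->
  <META CHARSET="utf-8">
  <TITLE>Unicode characters</TITLE>
</HEAD>
<BODY>
  <!-- 20013 and 25991 are the decimal Unicode values of two Chinese characters -->
  <P>Decimal: &#20013;&#25991;</P>
  <!-- The same two characters written with hexadecimal values 4E2D and 6587 -->
  <P>Hexadecimal: &#x4E2D;&#x6587;</P>
</BODY>
</HTML>
```

Both paragraphs should display the same two characters, provided the browser has a font that contains them.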
Writing HTML Using ASCII
You can get a web browser to display any character, including HTML special characters and plain English text, by giving its decimal ASCII value in an escape sequence of the form &#nnn; where nnn is the decimal value. I can write
in a paragraph with the space and the exclamation mark like this.
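As a minimal sketch, assuming the phrase is Hello World! (an illustrative choice), every character, including the space (32) and the exclamation mark (33), is written as &#, then its decimal ASCII value, then a semicolon. The second paragraph shows the same technique applied to the HTML special characters < (60) and > (62):

```html
<!-- H=72 e=101 l=108 l=108 o=111 space=32 W=87 o=111 r=114 l=108 d=100 !=33 -->
<P>&#72;&#101;&#108;&#108;&#111;&#32;&#87;&#111;&#114;&#108;&#100;&#33;</P>

<!-- 60 is the less-than sign and 62 is the greater-than sign, so the browser
     displays the tag as text instead of treating it as markup -->
<P>&#60;B&#62; is the bold tag</P>
```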
Paste the code into an HTML file and try it.
Make a copy of a blank HTML page and rename it char_fonts.htm. Put in a level 1 heading and title:
Title and H1: HTML Characters and Fonts
followed by a level 2 heading:
H2: Using ASCII Characters
then a paragraph in which you write your name using the ASCII characters
<P>Write your first name and surname here in ASCII, and also use the ASCII sequence for the space</P>
You can use the ASCII escape sequences shown at: http://www.webopedia.com/quick_ref/asciicode.asp
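The finished page might look something like the sketch below. The name Ann Lee is hypothetical and stands in for your own name. Everything else follows the title and headings given above:

```html
<!DOCTYPE html>
<HTML>
<HEAD>
  <TITLE>HTML Characters and Fonts</TITLE>
</HEAD>
<BODY>
  <H1>HTML Characters and Fonts</H1>
  <H2>Using ASCII Characters</H2>
  <!-- Hypothetical name "Ann Lee": A=65 n=110 n=110 space=32 L=76 e=101 e=101 -->
  <P>&#65;&#110;&#110;&#32;&#76;&#101;&#101;</P>
</BODY>
</HTML>
```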
Fonts & Typefaces
In recent times the term font has come to be used to mean a typeface, which is a prescriptive definition of how to present the various characters available in a particular character set.
Characters conforming to any given typeface can be presented in a range of:
There are two main groups of typefaces (Computing Studies - GK Powers p. 273)
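A small sketch of how typefaces can be tried out in HTML, assuming the two groups meant here are serif and sans-serif typefaces and that Times New Roman and Arial are installed (the generic names serif and sans-serif act as fallbacks). The last paragraph shows one typeface presented in a different size and style:

```html
<!-- Serif typeface: letters have small finishing strokes -->
<P><FONT FACE="Times New Roman, serif" SIZE="4">The quick brown fox</FONT></P>

<!-- Sans-serif typeface: letters have plain stroke ends -->
<P><FONT FACE="Arial, sans-serif" SIZE="4">The quick brown fox</FONT></P>

<!-- The same sans-serif typeface, larger and in bold italic -->
<P><FONT FACE="Arial, sans-serif" SIZE="6"><B><I>The quick brown fox</I></B></FONT></P>
```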
Task Fonts Part B
Open your HTML file char_fonts.htm and after your name in ASCII put another level 2 heading:
Now make a table having 26 normal rows (plus a header row). Each row must represent one letter of the alphabet as shown.
The body of the document should also contain the following notes:
The following code was used to render the first two rows in the above table
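A sketch of what such code could look like. The column layout (the letter, then the same letter in a serif and a sans-serif typeface) and the font names are assumptions made for illustration. Adjust them to match the table you are asked to reproduce:

```html
<TABLE BORDER="1">
  <!-- Header row -->
  <TR>
    <TH>Letter</TH>
    <TH>Times New Roman</TH>
    <TH>Arial</TH>
  </TR>
  <!-- Row for the letter A: upper and lower case in each typeface -->
  <TR>
    <TD>A</TD>
    <TD><FONT FACE="Times New Roman, serif">A a</FONT></TD>
    <TD><FONT FACE="Arial, sans-serif">A a</FONT></TD>
  </TR>
  <!-- Row for the letter B -->
  <TR>
    <TD>B</TD>
    <TD><FONT FACE="Times New Roman, serif">B b</FONT></TD>
    <TD><FONT FACE="Arial, sans-serif">B b</FONT></TD>
  </TR>
</TABLE>
```

Each further row is a copy of the TR block with the letter changed.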
Once you have finished the alphabet, add 5 extra rows for the digit pairs 1,2; 3,4; 5,6; 7,8; and 9,0.
Task Fonts Part C
In the HTML file char_fonts.htm, directly underneath the above table, make a level 2 heading:
H2: Text Styles
and then enter the following text:
<P>There are over 20 tags which determine Text style. These can be classified as either Logical or Physical styles. Logical styles are concerned purely with the purpose of the style, while physical styles specify the way the text is meant to look.</P>
You must also add to the file all of the logical and physical tags shown below. The word describing the style should be formatted in the relevant style. For example, you would code line 1 like this:
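A minimal sketch of one such line, assuming the first style in the list is emphasis (the EM tag). The word naming the style is wrapped in the tag it describes:

```html
<!-- The word "Emphasis" is itself displayed in the emphasis style -->
<P><EM>Emphasis</EM></P>
```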
H3: Logical Tags
H3: Physical Tags
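The two sections could be laid out along the lines of the sketch below. The tags shown are a small, assumed selection of standard HTML logical and physical style tags, not the full list required by the task:

```html
<H3>Logical Tags</H3>
<!-- Logical styles: say what the text is for; the browser chooses the look -->
<P><EM>Emphasis</EM><BR>
<STRONG>Strong</STRONG><BR>
<CITE>Citation</CITE><BR>
<CODE>Code</CODE><BR>
<KBD>Keyboard</KBD><BR>
<VAR>Variable</VAR></P>

<H3>Physical Tags</H3>
<!-- Physical styles: say exactly how the text should look -->
<P><B>Bold</B><BR>
<I>Italic</I><BR>
<U>Underline</U><BR>
<TT>Teletype</TT><BR>
<SUB>Subscript</SUB><BR>
<SUP>Superscript</SUP></P>
```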
The XML Handbook. CF Goldfarb, P Prescod. Prentice Hall (2001). ISBN 0 13 055068-X