American Standard Code for Information Interchange


A chart of ASCII from a 1972 printer manual. 16x8.


The American Standard Code for Information Interchange (ASCII) is a character encoding scheme originally based on the `English alphabet`_. It encodes 128 specified characters: the digits 0-9, the letters "a" to "z" and "A" to "Z", some basic punctuation symbols, a blank space, and some control codes that originated with Teletype machines.

In the 1980s, almost all personal computers were 8-bit, meaning that bytes could hold values ranging from 0 to 255. ASCII codes only went up to 127, so some machines assigned values between 128 and 255 to accented characters. Different machines had different codes, however, which led to problems exchanging files. Eventually, various commonly used sets of values for the 128-255 range emerged. Some were true standards, defined by the International Organization for Standardization, and some were de facto conventions that were invented by one company or another and managed to catch on. [1]
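
The exchange problem described above can be illustrated with a minimal sketch that decodes one and the same byte under two legacy 8-bit conventions. It assumes Python's built-in `latin_1` and `koi8_r` codecs as stand-ins for two such conventions; the byte value and variable names are chosen only for illustration.

```python
# One byte value above 127, as it might appear in a file from the 1980s.
raw = bytes([0xE9])

# The same byte decoded under two different 8-bit conventions.
as_latin1 = raw.decode("latin_1")  # a Western European convention
as_koi8r = raw.decode("koi8_r")    # a Russian convention

print(repr(as_latin1), repr(as_koi8r))

# The two results are different characters, so a reader has to know
# which convention the writer used before the byte means anything.
print(as_latin1 == as_koi8r)  # False
```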

A single byte offers only 256 values, which isn't very many. For example, you can't fit both the accented characters used in Western Europe and the Cyrillic alphabet used for Russian into the 128-255 range because there are more than 128 such characters. [1]

You could write files using different codes (all your Russian files in a coding system called KOI8, all your French files in a different coding system called Latin1), but what if you wanted to write a French document that quotes some Russian text? In the 1980s people began to want to solve this problem, and the Unicode standardization effort began. [1]
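
As a rough illustration of the mixed-language problem, the sketch below checks whether a French sentence quoting Russian text can be written in each of three encodings. The sample sentence and the helper `can_encode` are hypothetical; `latin_1`, `koi8_r`, and `utf_8` are Python's built-in codec names, with UTF-8 standing in for a Unicode encoding.

```python
# A French sentence quoting a Russian word (illustrative text only).
text = "André a écrit : привет"

def can_encode(s, encoding):
    """Return True if every character of s exists in the given encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

results = {name: can_encode(text, name) for name in ("latin_1", "koi8_r", "utf_8")}
print(results)
# {'latin_1': False, 'koi8_r': False, 'utf_8': True}
```

Neither single-byte encoding covers both scripts, which is the gap that Unicode was meant to close.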

Most modern character-encoding schemes are based on ASCII, though they support many additional characters.
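
One way to see what "based on ASCII" means in practice is to encode pure ASCII text under several later encodings and compare the bytes. This is only a sketch: the sample string is arbitrary, and the four codec names are Python built-ins picked as examples.

```python
text = "Hello, ASCII!"
ascii_bytes = text.encode("ascii")

# Encode the same text with several ASCII-based encodings.
encodings = ("utf_8", "latin_1", "koi8_r", "cp1252")
same_bytes = {name: text.encode(name) == ascii_bytes for name in encodings}

print(same_bytes)             # every entry is True
print(list(ascii_bytes[:5]))  # [72, 101, 108, 108, 111]
```

The encodings only diverge once a character outside the 0-127 range appears.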

ASCII developed from telegraphic codes. Its first commercial use was as a 7-bit teleprinter code promoted by Bell data services. Work on the ASCII standard began on October 6, 1960, with the first meeting of the American Standards Association's (ASA) X3.2 subcommittee. The first edition of the standard was published during 1963,[3][4] a major revision during 1967,[5] and the most recent update during 1986.

Compared to earlier telegraph codes, the proposed Bell code and ASCII were both ordered for more convenient sorting (i.e., alphabetization) of lists, and added features for devices other than teleprinters.
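
The effect of this ordering can be sketched in Python, which compares strings by code value. The word list is made up for illustration. Note that sorting by raw code value puts all uppercase letters before lowercase ones, so it matches dictionary order only within a single case.

```python
import string

# Letters occupy consecutive codes in alphabetical order.
lowercase = "".join(chr(c) for c in range(ord("a"), ord("z") + 1))
print(lowercase == string.ascii_lowercase)  # True

# Sorting by code value therefore alphabetizes words of the same case.
words = ["pear", "Apple", "fig", "Banana", "apple"]
by_code = sorted(words)
print(by_code)  # ['Apple', 'Banana', 'apple', 'fig', 'pear']

# Each lowercase letter sits exactly 32 codes above its uppercase form.
case_offset = ord("a") - ord("A")
print(case_offset)  # 32
```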


1   Function

2   Substance

ASCII defined numeric codes for various characters, with the numeric values running from 0 to 127. For example, the lowercase letter ‘a’ is assigned 97 as its code value. [1]
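
The mapping between characters and code values can be checked with Python's built-in `ord` and `chr` functions. This is a small sketch; the variable names are illustrative.

```python
code_a = ord("a")    # character -> code value
char_97 = chr(97)    # code value -> character
print(code_a, char_97)  # 97 a

# Seven bits are enough for the whole range 0-127.
num_codes = 2 ** 7
print(num_codes)  # 128

# Every character of an ASCII string has a code value below 128.
in_range = all(0 <= ord(ch) <= 127 for ch in "Hello, world!")
print(in_range)  # True
```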

ASCII includes definitions for 128 characters. Of these, 33 are non-printing control characters (many now obsolete) that affect how text and space are processed, and 95 are printable characters, including the space (which is considered an invisible graphic).

ASCII reserves the first 32 codes (numbers 0–31 decimal) for control characters: codes originally intended not to represent printable information, but rather to control devices (such as printers) that make use of ASCII, or to provide meta-information about data streams such as those stored on magnetic tape. For example, character 10 represents the "line feed" function (which causes a printer to advance its paper), and character 8 represents "backspace". The 33rd control character, "delete", has code 127.
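
The split into control and printable characters can be reproduced with a short sketch. It assumes that the one control character outside the 0-31 range is code 127 (delete), which is what brings the total to 33; the list names are illustrative.

```python
# Codes 0-31 plus 127 (delete) are control characters.
control = [c for c in range(128) if c < 32 or c == 127]

# Codes 32-126 are printable, starting with the space at 32.
printable = [c for c in range(128) if 32 <= c <= 126]

print(len(control), len(printable))  # 33 95
print(chr(printable[0]) == " ")      # True

# Line feed and backspace fall inside the control range.
print(10 in control, 8 in control)   # True True
```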

ASCII was an American-developed standard, so it only defined unaccented characters. There was an ‘e’, but no ‘é’ or ‘Í’. This meant that languages which required accented characters couldn’t be faithfully represented in ASCII. (Actually the missing accents matter for English, too, which contains words such as ‘naïve’ and ‘café’, and some publications have house styles which require spellings such as ‘coöperate’.) [1]
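
A quick way to see this limitation is to test whether each character of a word falls in the 0-127 range. The helper `is_ascii` below is a hypothetical name used only for this sketch.

```python
def is_ascii(word):
    """Return True if every character has a code value below 128."""
    return all(ord(ch) < 128 for ch in word)

samples = ["cafe", "café", "naïve", "cooperate", "coöperate"]
report = {word: is_ascii(word) for word in samples}
print(report)

# Trying to encode an accented word as ASCII fails outright.
try:
    "café".encode("ascii")
    encoded_ok = True
except UnicodeEncodeError:
    encoded_ok = False
print(encoded_ok)  # False
```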

2.1   Control characters

2.1.1   Carriage return

The carriage return is a control character in ASCII code, Unicode, EBCDIC, and many other codes. It commands a printer, or other output system such as a display, to move the position of the cursor to the first position on the same line.

\r denotes the carriage return in the C programming language and many other languages influenced by it.

Carriage return by itself provided the ability to overprint the line with new text. Writing \r will move the cursor back to the beginning of the line. For example, this can be used to display a percentage counter:

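import sys
import time

# Redraw a percentage counter in place on a single line.
for i in range(101):
    # "\r" returns the cursor to the start of the line, so each value overwrites the last.
    sys.stdout.write("\r%d%%" % i)
    sys.stdout.flush()  # push the partial line out, since stdout may be buffered
    time.sleep(0.05)    # pause briefly so the counter is visible as it advances
sys.stdout.write("\n")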
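
The overprinting behavior can also be modelled without a terminal. The sketch below is a simplified simulation, not how any real display is implemented: the hypothetical function `render_line` treats "\r" as "move to column 0" and lets every other character overwrite whatever is already in its column.

```python
def render_line(s):
    """Simulate one display line in which '\\r' moves the cursor to column 0."""
    cells = []  # characters currently shown on the line
    col = 0     # current cursor column
    for ch in s:
        if ch == "\r":
            col = 0
        else:
            if col < len(cells):
                cells[col] = ch   # overwrite an existing character
            else:
                cells.append(ch)  # extend the line
            col += 1
    return "".join(cells)

print(ord("\r"))                # 13
print(render_line("10%\r25%"))  # 25%
print(render_line("hello\rJ"))  # Jello
```

The last call shows that a carriage return does not erase the line: text that is not overwritten stays visible.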

3   History

The American Standard Code for Information Interchange (ASCII) was developed under the auspices of a committee of the American Standards Association, called the X3 committee, by its X3.2 (later X3L2) subcommittee, and later by that subcommittee's X3.2.4 working group. The ASA became the United States of America Standards Institute or USASI[15] and ultimately the American National Standards Institute.

ASCII was the most commonly used character encoding on the World Wide Web until December 2007, when it was surpassed by UTF-8, which includes ASCII as a subset.

ASCII was standardized in 1968. [1]

4   References

[1] Unicode HOWTO.