CLC-INTERCAL Reference

Character sets

EBCDIC

Normally, the compiler requires the program source to be in EBCDIC, although there are compiler options to translate from ASCII or Baudot. Since there is no such thing as a standard EBCDIC, we have designed our own non-standard one. The principle is simple: for each character, we selected a code which was used for that character by at least one IBM terminal. However, to guarantee incompatibility, our set differs in at least one character from that of any IBM hardware for which we have been able to find documentation.

Here's the character table:

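+   0   1   2   3   4   5   6   7   8   9   a   b   c   d   e   f
00                                  BSP TAB LF          CR
10
20
30
40  SP                                      ¢   .   <   (   +   !
50  &                                       ]   $   *   )   ;   ¬
60  -   /               xor                 |   ,   %   _   >   ?
70                                          :   #   @   '   =   "
80      a   b   c   d   e   f   g   h   i
90      j   k   l   m   n   o   p   q   r           {       [
a0      ~   s   t   u   v   w   x   y   z                       ®
b0  ^   £           ©
c0      A   B   C   D   E   F   G   H   I
d0      J   K   L   M   N   O   P   Q   R           }
e0          S   T   U   V   W   X   Y   Z
f0  0   1   2   3   4   5   6   7   8   9                       DEL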
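As a rough illustration of how the table above could be used, here is a minimal Python sketch that transcribes it into a lookup and converts text in both directions. The names `TABLE`, `put`, `to_ebcdic` and `from_ebcdic` are hypothetical and not part of CLC-INTERCAL; reading BSP as backspace is an assumption, and the "xor" cell at 65 is left out because the table only gives its name.

```python
# Sparse copy of the table: code point -> character.
TABLE = {}


def put(start, chars):
    """Assign consecutive code points, beginning at `start`, to `chars`."""
    for offset, ch in enumerate(chars):
        TABLE[start + offset] = ch


# Named cells (BSP is assumed to mean backspace).
put(0x08, "\b\t\n")        # BSP, TAB, LF
put(0x0D, "\r")            # CR
put(0x40, " ")             # SP
put(0xFF, "\x7f")          # DEL

# Single-character cells, row by row.
put(0x4A, "¢.<(+!")
put(0x50, "&")
put(0x5A, "]$*);¬")
put(0x60, "-/")
put(0x6A, "|,%_>?")
put(0x7A, ":#@'=\"")
put(0x81, "abcdefghi")
put(0x91, "jklmnopqr")
put(0x9C, "{")
put(0x9E, "[")
put(0xA1, "~stuvwxyz")
put(0xAF, "®")
put(0xB0, "^£")
put(0xB4, "©")
put(0xC1, "ABCDEFGHI")
put(0xD1, "JKLMNOPQR")
put(0xDC, "}")
put(0xE2, "STUVWXYZ")
put(0xF0, "0123456789")

# Reverse lookup: character -> code point.
CODES = {ch: code for code, ch in TABLE.items()}


def to_ebcdic(text):
    """Encode text; raises KeyError for characters missing from the table."""
    return bytes(CODES[ch] for ch in text)


def from_ebcdic(data):
    """Decode bytes; raises KeyError for unassigned code points."""
    return "".join(TABLE[code] for code in data)
```

For example, `to_ebcdic("DO")` yields the two bytes c4 and d6, matching rows c0 and d0 of the table.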

Baudot

While the compiler and runtime accept ASCII and EBCDIC for input/output, internally everything is represented in extended Baudot. The "letters" and "figures" sets are identical to standard Baudot, but we have a nonstandard convention that shifting to letters while already in letters causes a shift to lowercase letters, and shifting to figures while already in figures causes a shift to a set containing special characters. Thus, to guarantee uppercase letters, one would first shift to figures and then to letters. If this extended Baudot is sent to a teletype which understands standard Baudot, the result will be a text in ALL CAPS, with some of the symbols it cannot print replaced by others it can.

Here's the character table:

Code  Uppercase  Lowercase  Figures    Symbols
00    Invalid code
01    E          e          3          ¢
02    Line Feed
03    A          a          -          +
04    Space
05    S          s          Bell       \
06    I          i          8          #
07    U          u          7          =
08    Carriage Return
09    D          d          $          *
10    R          r          4          {
11    J          j          '          ~
12    N          n          ,          xor
13    F          f          !          |
14    C          c          :          ^
15    K          k          (          <
16    T          t          5          [
17    Z          z          "          }
18    W          w          )          >
19    L          l          2          ]
20    H          h          Invalid    backspace
21    Y          y          6          @
22    P          p          0          Invalid
23    Q          q          1          £
24    O          o          9          ¬
25    B          b          ?          delete
26    G          g          &          Invalid
27    Figures    Figures    Symbols    Symbols
28    M          m          .          %
29    X          x          /          _
30    V          v          ;          Invalid
31    Lowercase  Lowercase  Uppercase  Uppercase
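The shift rules can be sketched as a small state machine. The Python below is only an illustration under stated assumptions, not the CLC-INTERCAL implementation: the names `decode`, `CHARS`, `COMMON` and `XOR` are hypothetical, the decoder is assumed to start in the uppercase set unless told otherwise, and the "xor" cell is represented by a placeholder string because the table gives only its name.

```python
UPPER, LOWER, FIGURES, SYMBOLS = range(4)

XOR = "<xor>"  # placeholder: the table names this cell but shows no glyph

# code -> (uppercase, lowercase, figures, symbols); None marks an invalid cell
CHARS = {
    1: ("E", "e", "3", "¢"),     3: ("A", "a", "-", "+"),
    5: ("S", "s", "\a", "\\"),   6: ("I", "i", "8", "#"),
    7: ("U", "u", "7", "="),     9: ("D", "d", "$", "*"),
    10: ("R", "r", "4", "{"),    11: ("J", "j", "'", "~"),
    12: ("N", "n", ",", XOR),    13: ("F", "f", "!", "|"),
    14: ("C", "c", ":", "^"),    15: ("K", "k", "(", "<"),
    16: ("T", "t", "5", "["),    17: ("Z", "z", '"', "}"),
    18: ("W", "w", ")", ">"),    19: ("L", "l", "2", "]"),
    20: ("H", "h", None, "\b"),  21: ("Y", "y", "6", "@"),
    22: ("P", "p", "0", None),   23: ("Q", "q", "1", "£"),
    24: ("O", "o", "9", "¬"),    25: ("B", "b", "?", "\x7f"),
    26: ("G", "g", "&", None),   28: ("M", "m", ".", "%"),
    29: ("X", "x", "/", "_"),    30: ("V", "v", ";", None),
}

# Codes that mean the same thing in every set.
COMMON = {2: "\n", 4: " ", 8: "\r"}


def decode(codes, state=UPPER):
    """Turn a sequence of 5-bit codes into text, tracking the shift state."""
    out = []
    for code in codes:
        if code == 27:
            # Figures shift: a second one in a row selects the symbols set.
            state = FIGURES if state in (UPPER, LOWER) else SYMBOLS
        elif code == 31:
            # Letters shift: a second one in a row selects lowercase.
            state = LOWER if state in (UPPER, LOWER) else UPPER
        elif code in COMMON:
            out.append(COMMON[code])
        else:
            row = CHARS.get(code)
            if row is None or row[state] is None:
                raise ValueError(f"invalid code {code} in set {state}")
            out.append(row[state])
    return "".join(out)
```

With these assumptions, `decode([27, 31, 1], state=LOWER)` gives "E": shifting to figures and then to letters lands in uppercase whatever the previous set was.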

Hollerith

CLC-INTERCAL 1.-94 introduces support for the "Hollerith" character set, for compatibility with punched card devices and similar. A column in a punched card corresponds to 12 bits, so tail registers can store one character per element (with 4 bits wasted); similarly, a Hollerith file requires two bytes per character. The first byte contains punch lines 12, 0, 2, 4, 6, 8; the second byte contains lines 11, 1, 3, 5, 7, 9. The 12-bit number corresponding to one column is therefore the interleave of the two bytes. The two most significant bits in each byte are ignored; when producing Hollerith, CLC-INTERCAL will clear bit 7 and set bit 6 to the complement of bit 5: the result will be printable on an ASCII terminal, although it is unlikely to be easy to read.
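The two-byte layout described above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the bit order within each byte is an assumption (the first line listed for a byte is placed in bit 5 and the last in bit 0), since the text does not spell it out.

```python
FIRST_BYTE_LINES = (12, 0, 2, 4, 6, 8)
SECOND_BYTE_LINES = (11, 1, 3, 5, 7, 9)
CARD_ORDER = (12, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)


def _pack(lines, punched):
    """Pack six punch lines into bits 5..0 (assumed order), then add bit 6."""
    value = 0
    for line in lines:
        value = (value << 1) | int(line in punched)
    if not value & 0x20:
        value |= 0x40  # bit 6 is the complement of bit 5; bit 7 stays clear
    return value


def column_to_bytes(punched):
    """Encode the set of punched lines of one column as two bytes."""
    return bytes([_pack(FIRST_BYTE_LINES, punched),
                  _pack(SECOND_BYTE_LINES, punched)])


def bytes_to_column(pair):
    """Decode two bytes into a set of punched lines, ignoring bits 7 and 6."""
    first, second = pair
    punched = set()
    for lines, byte in ((FIRST_BYTE_LINES, first), (SECOND_BYTE_LINES, second)):
        for index, line in enumerate(lines):
            if byte & (0x20 >> index):
                punched.add(line)
    return punched


def column_number(punched):
    """12-bit value of a column: the interleave of the two 6-bit halves."""
    value = 0
    for line in CARD_ORDER:
        value = (value << 1) | int(line in punched)
    return value
```

Under these assumptions a column punched 12-1 is stored as the bytes 0x20 and 0x50, and an unpunched column as 0x40 0x40; every possible byte falls in the range 0x20 to 0x5f, which is why the output is printable.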

The Hollerith encoding used by CLC-INTERCAL is an extension of one of the many character sets used for punched cards; lowercase letters are added by overpunching the corresponding uppercase character with a single extra hole. Some extra characters useful for INTERCAL programs have also been added.
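As an illustration of the lowercase rule, the following Python sketch computes the punches for a letter. It is based on a reading of the letter card shown later in this section, where the extra hole for a lowercase letter appears in the line just above the letter's digit punch; the function name `letter_punches` is hypothetical, and the pattern should be checked against the card itself.

```python
import string

# (zone line, letters in that zone, digit line of the first letter)
ZONES = (
    (12, "ABCDEFGHI", 1),
    (11, "JKLMNOPQR", 1),
    (0, "STUVWXYZ", 2),
)


def letter_punches(ch):
    """Return the set of punched lines for one ASCII letter."""
    if len(ch) != 1 or ch not in string.ascii_letters:
        raise ValueError("expected a single ASCII letter")
    upper = ch.upper()
    for zone, letters, first_digit in ZONES:
        if upper in letters:
            digit = first_digit + letters.index(upper)
            punched = {zone, digit}
            if ch.islower():
                # Single extra hole, one line above the digit punch.
                punched.add(digit - 1)
            return punched
    raise ValueError("letter not found")  # unreachable for ASCII letters
```

For instance, "A" gives lines 12 and 1, while "a" gives 12, 0 and 1.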

Overpunches, where two different characters are punched on the same column, are fully supported: when converting from Hollerith to another character set, these may result in sequences of characters.

The following three cards summarise the encoding. The third card shows two examples of overpunch and some control characters which do not exist on real punched cards, but may be useful for storing virtual punched cards in a file (real punched cards would just use a new card for a carriage return, newline sequence).

  ' !"#$%&()*+,-./:;<=>?@[\]^_`{|}~¢¥0123456789
12   *   * * *  *     *    * *   * *           12
11     *    *  *    * **    **  * * *          11
 0  *   * *   *  ****    *     *   ***         0
 1               *                 *  *        1
 2*  *  *              *               *       2
 3    **      * *                  *    *      3
 4        ***       * * ** *   * *       *     4
 5       *        *  *          *   *     *    5
 6                 *        * ** *         *   6
 7  *                    ***      *         *  7
 8*  ******** * * ******* * * * *            * 8
 9  *                             *           *9

  ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
12*********                 *********                 12
11         *********                 *********        11
 0                  *********        *        ********0
 1*        *                **       **       *       1
 2 *        *       *        **       **      **      2
 3  *        *       *        **       **      **     3
 4   *        *       *        **       **      **    4
 5    *        *       *        **       **      **   5
 6     *        *       *        **       **      **  6
 7      *        *       *        **       **      ** 7
 8       *        *       *        **       **      **8
 9        *        *       *        *        *       *9

    []  ".  NL  CR  HT
12  *   *   *           12
11              *       11
 0  *               *   0
 1          *   *   *   1
 2          *   *   *   2
 3      *   *   *   *   3
 4  *       *   *   *   4
 5          *   *   *   5
 6          *   *   *   6
 7  *   *   *   *   *   7
 8      *   *   *   *   8
 9          *   *   *   9

PLEASE NOTE: Versions of CLC-INTERCAL before 1.-94.-2 have a bug which causes a rabbit to be represented as 12-3-2-8 instead of 12-3-7-8. Cards punched with such older versions, and containing rabbits, will need to be copied with one of the rabbit holes moved from row 2 to row 7.