Indian Script Code for Information Interchange

Coding scheme for Indian writing systems

Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing-system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.

ISCII has not been widely used outside certain government institutions, although a variant without the ATR mechanism, Mac OS Devanagari, was used on classic Mac OS.[1] It has since been rendered largely obsolete by Unicode, which uses a separate block for each Indic writing system and largely preserves the ISCII layout within each block.

Background

Because the Brahmi-derived writing systems share a similar structure, ISCII encodes letters with the same phonetic value at the same code point, overlaying the various scripts. For example, the ISCII byte sequence 0xB3 0xDB represents [ki]. It is rendered as കി in Malayalam, कि in Devanagari, ਕਿ in Gurmukhi, and கி in Tamil. The writing system can be selected in rich text by markup, or in plain text by means of the ATR code described below.
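The overlay idea can be sketched in Python. This is a minimal illustration under a simplifying assumption: that a character sits at the same offset inside each script's Unicode block as its Devanagari counterpart. That holds for KA and vowel sign I, but not for every code in every script (Assamese RA, for instance, is U+09F0 rather than the offset-equivalent U+09B0). The names `render`, `BLOCK_START` and `ISCII_TO_DEVANAGARI` are hypothetical.

```python
# Hypothetical partial ISCII -> Devanagari table: only KA (0xB3) and
# vowel sign I (0xDB), with code points taken from the table in this article.
ISCII_TO_DEVANAGARI = {0xB3: 0x0915, 0xDB: 0x093F}

# First code point of each script's Unicode block.
BLOCK_START = {
    "Devanagari": 0x0900,
    "Gurmukhi": 0x0A00,
    "Tamil": 0x0B80,
    "Malayalam": 0x0D00,
}


def render(iscii: bytes, script: str) -> str:
    """Decode ISCII bytes as if `script` had been selected.

    Assumes the target character has the same offset within its block as
    the Devanagari one; this is NOT true for every ISCII code.
    """
    shift = BLOCK_START[script] - BLOCK_START["Devanagari"]
    return "".join(chr(ISCII_TO_DEVANAGARI[b] + shift) for b in iscii)


# The same two bytes, shown in four scripts.
for name in BLOCK_START:
    print(name, render(b"\xb3\xdb", name))
```

The same bytes yield कि, ਕਿ, கி and കി, which is why the writing system must be chosen by markup or by ATR rather than by the bytes themselves.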

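One motivation for using a single encoding is the idea that it would allow easy transliteration from one writing system to another. However, the scripts differ enough that this is not practical in general; for example, Tamil has no letters for aspirated consonants such as KHA and GHA, so text cannot simply be relabelled from Devanagari to Tamil.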

ISCII is an 8-bit encoding: the lower 128 code points are plain ASCII and the upper 128 are ISCII-specific. In addition to the code points representing characters, ISCII has a code point with the mnemonic ATR, which indicates that the following byte contains one of two kinds of information. One set of values changes the writing system until the next writing-system indicator or the end of the line; another set selects display modes such as bold and italic. ISCII does not provide a means of indicating the default writing system.
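The line-scoped script switching can be sketched as a small Python tokenizer. It is a hypothetical helper, not taken from the standard: it labels each run with the ATR parameter byte that selected it, using 0x40 (DEF) for the default script because ISCII cannot say which script that is. Parameters 0x40 to 0x4B and 0x71 to 0x76 are treated as writing-system switches, following the ATR tables below; display-mode parameters are consumed and ignored.

```python
ATR, NEWLINE = 0xEF, 0x0A
DEF = 0x40  # ATR parameter meaning "the default script"


def is_writing_system(param: int) -> bool:
    """True for ATR parameters that switch script (ISCII or PASCII)."""
    return 0x40 <= param <= 0x4B or 0x71 <= param <= 0x76


def split_runs(data: bytes) -> list[tuple[int, bytes]]:
    """Split an ISCII byte stream into (script parameter, bytes) runs.

    Runs in the default script are labelled DEF; the caller decides which
    script that actually is. Display modes (0x30-0x39) are skipped.
    """
    runs: list[tuple[int, bytes]] = []
    script, buf = DEF, bytearray()

    def flush() -> None:
        # Emit the pending bytes under the script that was active for them.
        if buf:
            runs.append((script, bytes(buf)))
            buf.clear()

    i = 0
    while i < len(data):
        b = data[i]
        if b == ATR:
            if i + 1 < len(data) and is_writing_system(data[i + 1]):
                flush()
                script = data[i + 1]
            i += 2  # skip ATR and its parameter byte
            continue
        buf.append(b)
        if b == NEWLINE:  # a script switch lasts only until end of line
            flush()
            script = DEF
        i += 1
    flush()
    return runs
```

For example, `split_runs(b"ab\xef\x42\xb3\xdb\ncd")` returns `[(0x40, b"ab"), (0x42, b"\xb3\xdb\n"), (0x40, b"cd")]`: the Devanagari switch (0x42) ends at the newline.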

Codepage layout

The following table shows the character set for Devanagari. The code sets for Assamese, Bengali, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu are similar, with each Devanagari form replaced by the equivalent form in each writing system. The Unicode equivalents for every script are listed in the table of code points for all languages below.

ISCII Devanagari
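0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
1x DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
2x SP ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~ DEL
8x – – – – – – – – – – – – – – – –
9x – – – – – – – – – – – – – – – –
Ax – ँ ं ः अ आ इ ई उ ऊ ऋ ऎ ए ऐ ऍ ऒ
Bx ओ औ ऑ क ख ग घ ङ च छ ज झ ञ ट ठ ड
Cx ढ ण त थ द ध न ऩ प फ ब भ म य य़ र
Dx ऱ ल ळ ऴ व श ष स ह INV ा ि ी ु ू ृ
Ex ॆ े ै ॅ ॊ ो ौ ॉ ् ़ । – – – – ATR
Fx EXT ० १ २ ३ ४ ५ ६ ७ ८ ९ – – – – –
– Undefined
ATR, EXT Lead byte (followed by a parameter byte)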

Special code points

INV character—code point D9 (217)
The INV (invisible consonant) character is used as a pseudo-consonant to display combining elements in isolation. For example, क (ka) + ् (halant) + INV = क्‍ (half ka). The Unicode equivalent is U+200D ZERO WIDTH JOINER (ZWJ). However, as described below, ISCII already expresses the effects of Unicode ZWNJ and ZWJ by doubling the halant or following it with a nukta. Because halant + nukta already converts to ZWJ, mapping INV to ZWJ as well would make conversion back to ISCII ambiguous; for this reason, Apple maps the ISCII INV character to the Unicode left-to-right mark instead, so as to guarantee round-tripping.[1]
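The two mapping choices for INV can be sketched in Python. The decoder below is hypothetical and deliberately tiny: it knows only KA, halant and INV, and the `apple_style` flag is an illustration of the alternative mapping, not part of any real converter's API.

```python
KA, HALANT, INV = 0xB3, 0xE8, 0xD9
ZWJ, LRM = "\u200D", "\u200E"  # ZERO WIDTH JOINER, LEFT-TO-RIGHT MARK

# Hypothetical partial table covering only the example's characters.
DEVANAGARI = {KA: "\u0915", HALANT: "\u094D"}


def decode_inv(data: bytes, apple_style: bool = False) -> str:
    """Map INV to ZWJ (the usual equivalent) or, Apple-style, to LRM."""
    inv = LRM if apple_style else ZWJ
    return "".join(inv if b == INV else DEVANAGARI[b] for b in data)


half_ka = bytes([KA, HALANT, INV])
print([hex(ord(c)) for c in decode_inv(half_ka)])                    # ['0x915', '0x94d', '0x200d']
print([hex(ord(c)) for c in decode_inv(half_ka, apple_style=True)])  # ['0x915', '0x94d', '0x200e']
```

With the ZWJ mapping, the output for KA + halant + INV is indistinguishable from the output a converter would produce for halant + nukta, which is the ambiguity the LRM mapping avoids.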
ATR character—code point EF (239)
The ATR (attribute) character followed by a byte code is used to switch to a different font attribute (such as bold) or to a different ISCII or PASCII language (such as Bengali), up to the next ATR sequence or the end of the line. This has no direct Unicode equivalent, as font attributes are not part of Unicode, and each script has a distinct set of code points.
Presentational attributes
ATR + byte Mnemonic Formatting option
0x30 BLD Bold
0x31 ITA Italics
0x32 UL Underlining
0x33 EXP Expanded
0x34 HLT Highlight
0x35 OTL Outline
0x36 SHD Shadow
0x37 TOP Top half of character (used with LOW to create double-height characters)
0x38 LOW Bottom half of character (used with TOP to create double-height characters)
0x39 DBL Entire row double-width and double-height
Shifts to ISCII scripts
ATR + byte Mnemonic ISCII script
0x40 DEF Default script (i.e. the script which will be switched back to after a line break)
0x41 RMN Romanised transliteration
0x42 DEV Devanagari
0x43 BNG Bengali script
0x44 TML Tamil script
0x45 TLG Telugu script
0x46 ASM Assamese script
0x47 ORI Odia script
0x48 KND Kannada script
0x49 MLM Malayalam script
0x4A GJR Gujarati script
0x4B PNJ Gurmukhī
Shifts to PASCII
ATR + byte Mnemonic PASCII locale
0x71 ARB Arabic alphabet
0x72 PES Persian alphabet
0x73 URD Urdu alphabet
0x74 SND Sindhi alphabet
0x75 KSM Kashmiri alphabet
0x76 PST Pashto alphabet
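The three tables above can be turned into a lookup for the byte that follows ATR. This is a hypothetical helper (the name `describe_atr` and the category labels are illustrative); values not listed in the tables raise an error rather than being guessed.

```python
PRESENTATION = {0x30: "BLD", 0x31: "ITA", 0x32: "UL", 0x33: "EXP",
                0x34: "HLT", 0x35: "OTL", 0x36: "SHD", 0x37: "TOP",
                0x38: "LOW", 0x39: "DBL"}
ISCII_SCRIPTS = {0x40: "DEF", 0x41: "RMN", 0x42: "DEV", 0x43: "BNG",
                 0x44: "TML", 0x45: "TLG", 0x46: "ASM", 0x47: "ORI",
                 0x48: "KND", 0x49: "MLM", 0x4A: "GJR", 0x4B: "PNJ"}
PASCII_LOCALES = {0x71: "ARB", 0x72: "PES", 0x73: "URD", 0x74: "SND",
                  0x75: "KSM", 0x76: "PST"}


def describe_atr(param: int) -> tuple[str, str]:
    """Return (category, mnemonic) for the byte that follows ATR (0xEF)."""
    for category, table in (("presentation", PRESENTATION),
                            ("iscii-script", ISCII_SCRIPTS),
                            ("pascii-locale", PASCII_LOCALES)):
        if param in table:
            return category, table[param]
    raise ValueError(f"undefined ATR parameter: {param:#04x}")


print(describe_atr(0x43))  # ('iscii-script', 'BNG')
```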
EXT character—code point F0 (240)
The EXT (extensions for Vedic) character followed by a byte code indicates a Vedic accent. This has no direct Unicode equivalent, as Vedic accents are assigned to distinct code points.
Halant character ्—code point E8 (232)
The halant character removes the implicit vowel from a consonant and is used between consonants to represent conjunct consonants. For example, क (ka) + ् (halant) + त (ta) = क्त (kta). The sequence ् (halant) + ् (halant) displays a conjunct with an explicit halant, for example क (ka) + ् (halant) + ् (halant) + त (ta) = क्‌त. The sequence ् (halant) + ़ (nukta) displays a conjunct with half consonants, if available, for example क (ka) + ् (halant) + ़ (nukta) + त (ta) = क्‍त.
Correspondences between ISCII and Unicode halant/virama behaviour
ISCII Unicode
single halant E8 halant 094D
halant + halant E8 E8 halant + ZWNJ 094D 200C
halant + nukta E8 E9 halant + ZWJ 094D 200D
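The correspondences in this table amount to one byte of lookahead after E8. A minimal Python sketch follows; the decoder is hypothetical and knows only KA and TA besides halant and nukta.

```python
HALANT, NUKTA = 0xE8, 0xE9
VIRAMA, ZWNJ, ZWJ = "\u094D", "\u200C", "\u200D"

# Hypothetical partial table: only KA and TA, enough for the examples.
CONSONANTS = {0xB3: "\u0915", 0xC2: "\u0924"}


def decode_halant(data: bytes) -> str:
    """Decode ISCII bytes, translating the three halant patterns to Unicode."""
    out: list[str] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == HALANT:
            nxt = data[i + 1] if i + 1 < len(data) else None
            if nxt == HALANT:      # E8 E8: conjunct with explicit halant
                out.append(VIRAMA + ZWNJ)
                i += 2
            elif nxt == NUKTA:     # E8 E9: conjunct with half consonant
                out.append(VIRAMA + ZWJ)
                i += 2
            else:                  # E8 alone: ordinary virama
                out.append(VIRAMA)
                i += 1
        else:
            out.append(CONSONANTS[b])  # KeyError for codes outside this toy table
            i += 1
    return "".join(out)
```

So KA + halant + halant + TA (B3 E8 E8 C2) decodes to U+0915 U+094D U+200C U+0924.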
Nukta character ़—code point E9 (233)
The nukta character after another ISCII character is used for a number of rarer characters which do not exist in the main ISCII set. For example, क (ka) + ़ (nukta) = क़ (qa). Each of these sequences corresponds to a single Unicode character, as shown in the following table.
Single Unicode characters corresponding to ISCII nukta sequences
ISCII code point Original character Character with nukta Unicode code point
A1 (161) ँ ॐ 0950
A6 (166) इ ऌ 090C
A7 (167) ई ॡ 0961
AA (170) ऋ ॠ 0960
B3 (179) क क़ 0958
B4 (180) ख ख़ 0959
B5 (181) ग ग़ 095A
BA (186) ज ज़ 095B
BF (191) ड ड़ 095C
C0 (192) ढ ढ़ 095D
C9 (201) फ फ़ 095E
DB (219) ि ॢ 0962
DC (220) ी ॣ 0963
DF (223) ृ ॄ 0944
EA (234) । ऽ 093D
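The table above can drive a decoder that folds "base + nukta" into a single code point. A minimal Python sketch; `decode_nukta` and the tiny `BASE` table are hypothetical, and halant + nukta (E8 E9) is assumed to be handled separately, as described under the halant character.

```python
NUKTA = 0xE9

# Base ISCII byte followed by nukta -> single Unicode code point (table above).
NUKTA_FORMS = {0xA1: 0x0950, 0xA6: 0x090C, 0xA7: 0x0961, 0xAA: 0x0960,
               0xB3: 0x0958, 0xB4: 0x0959, 0xB5: 0x095A, 0xBA: 0x095B,
               0xBF: 0x095C, 0xC0: 0x095D, 0xC9: 0x095E, 0xDB: 0x0962,
               0xDC: 0x0963, 0xDF: 0x0944, 0xEA: 0x093D}

# Hypothetical partial ISCII -> Devanagari table: KA, TA and the danda.
BASE = {0xB3: "\u0915", 0xC2: "\u0924", 0xEA: "\u0964"}


def decode_nukta(data: bytes) -> str:
    out: list[str] = []
    i = 0
    while i < len(data):
        b = data[i]
        has_nukta = i + 1 < len(data) and data[i + 1] == NUKTA
        if has_nukta and b in NUKTA_FORMS:
            out.append(chr(NUKTA_FORMS[b]))  # one code point for the pair
            i += 2
        elif has_nukta:
            out.append(BASE[b] + "\u093C")   # fall back to combining nukta
            i += 2
        else:
            out.append(BASE[b])
            i += 1
    return "".join(out)
```

One design caveat: U+0958 to U+095F are Unicode composition exclusions, so normalizing the output to NFC turns characters such as क़ back into the base letter followed by U+093C.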

Code pages for ISCII conversion

To convert between Unicode and ISCII, the following Windows code page identifiers may be used:

  • 57002: Devanagari (Hindi, Marathi, Sanskrit, Konkani)
  • 57003: Bengali
  • 57004: Tamil
  • 57005: Telugu
  • 57006: Assamese
  • 57007: Odia
  • 57008: Kannada
  • 57009: Malayalam
  • 57010: Gujarati
  • 57011: Punjabi (Gurmukhi)

Code points for all languages

Code set for all abugidas using ISCII
Hex Official listing ISO 15919 Devanagari Bengali Assamese Gurmukhi Gujarati Oriya Tamil Telugu Kannada Malayalam
A0 Sign OM Ōm̐ 0950 0AD0
A1 Vowel-modifier CHANDRABINDU 0901 0981 0981 0A01 0A81 0B01 0C01
A2 Vowel-modifier ANUSWARAM 0902 0982 0982 0A02 0A82 0B02 0B82 0C02 0C82 0D02
A3 Vowel-modifier VISARGAM 0903 0983 0983 0A03 0A83 0B03 0B83 0C03 0C83 0D03
A4 Vowel A a 0905 0985 0985 0A05 0A85 0B05 0B85 0C05 0C85 0D05
A5 Vowel AA ā 0906 0986 0986 0A06 0A86 0B06 0B86 0C06 0C86 0D06
A6 Vowel I i 0907 0987 0987 0A07 0A87 0B07 0B87 0C07 0C87 0D07
A6* Vowel LI (Sanskrit) 090C 098C 098C 0A8C 0B0C 0C0C 0C8C 0D0C
A7 Vowel II ī 0908 0988 0988 0A08 0A88 0B08 0B88 0C08 0C88 0D08
A7* Vowel LII (Sanskrit) 0961 09E1 09E1 0AE1 0B61 0C61 0CE1 0D61
A8 Vowel U u 0909 0989 0989 0A09 0A89 0B09 0B89 0C09 0C89 0D09
A9 Vowel UU ū 090A 098A 098A 0A0A 0A8A 0B0A 0B8A 0C0A 0C8A 0D0A
AA Vowel RI 090B 098B 098B 0A8B 0B0B 0C0B 0C8B 0D0B
AA* Vowel RII (Sanskrit) 0960 09E0 09E0 0AE0 0B60 0C60 0CE0 0D60
AB Vowel E (Southern Scripts) e 090E 0B8E 0C0E 0C8E 0D0E
AC Vowel EY ē 090F 098F 098F 0A0F 0A8F 0B0F 0B8F 0C0F 0C8F 0D0F
AD Vowel AI ai 0910 0990 0990 0A10 0A90 0B10 0B90 0C10 0C90 0D10
AE Vowel AYE (Devanagari Script) ê 090D 0A8D
AF Vowel O (Southern Scripts) o 0912 0B92 0C12 0C92 0D12
B0 Vowel OW ō 0913 0993 0993 0A13 0A93 0B13 0B93 0C13 0C93 0D13
B1 Vowel AU au 0914 0994 0994 0A14 0A94 0B14 0B94 0C14 0C94 0D14
B2 Vowel AWE (Devanagari Script) ô 0911 0A91
B3 Consonant KA k 0915 0995 0995 0A15 0A95 0B15 0B95 0C15 0C95 0D15
B3* Consonant QA (Urdu) q क़ 0958
B4 Consonant KHA kh 0916 0996 0996 0A16 0A96 0B16 0C16 0C96 0D16
B4* Consonant KHHA (Urdu) kh ख़ 0959 ਖ਼ 0A59
B5 Consonant GA g 0917 0997 0997 0A17 0A97 0B17 0C17 0C97 0D17
B5* Consonant GHHA (Urdu) ġ ग़ 095A ਗ਼ 0A5A
B6 Consonant GHA gh 0918 0998 0998 0A18 0A98 0B18 0C18 0C98 0D18
B7 Consonant NGA 0919 0999 0999 0A19 0A99 0B19 0B99 0C19 0C99 0D19
B8 Consonant CHA c 091A 099A 099A 0A1A 0A9A 0B1A 0B9A 0C1A 0C9A 0D1A
B9 Consonant CHHA ch 091B 099B 099B 0A1B 0A9B 0B1B 0C1B 0C9B 0D1B
BA Consonant JA j 091C 099C 099C 0A1C 0A9C 0B1C 0B9C 0C1C 0C9C 0D1C
BA* Consonant ZA (Urdu) z ज़ 095B ਜ਼ 0A5B
BB Consonant JHA jh 091D 099D 099D 0A1D 0A9D 0B1D 0C1D 0C9D 0D1D
BC Consonant JNA ñ 091E 099E 099E 0A1E 0A9E 0B1E 0B9E 0C1E 0C9E 0D1E
BD Consonant Hard TA 091F 099F 099F 0A1F 0A9F 0B1F 0B9F 0C1F 0C9F 0D1F
BE Consonant Hard THA ṭh 0920 09A0 09A0 0A20 0AA0 0B20 0C20 0CA0 0D20
BF Consonant Hard DA 0921 09A1 09A1 0A21 0AA1 0B21 0C21 0CA1 0D21
BF* Consonant Flapped DA ड़ 095C ড় 09DC ড় 09DC 0A5C ଡ଼ 0B5C
C0 Consonant Hard DHA ḍh 0922 09A2 09A2 0A22 0AA2 0B22 0C22 0CA2 0D22
C0* Consonant Flapped DHA ṛh ढ़ 095D ঢ় 09DD ঢ় 09DD ଢ଼ 0B5D
C1 Consonant Hard NA 0923 09A3 09A3 0A23 0AA3 0B23 0BA3 0C23 0CA3 0D23
C2 Consonant Soft TA t 0924 09A4 09A4 0A24 0AA4 0B24 0BA4 0C24 0CA4 0D24
C3 Consonant Soft THA th 0925 09A5 09A5 0A25 0AA5 0B25 0C25 0CA5 0D25
C4 Consonant Soft DA d 0926 09A6 09A6 0A26 0AA6 0B26 0C26 0CA6 0D26
C5 Consonant Soft DHA dh 0927 09A7 09A7 0A27 0AA7 0B27 0C27 0CA7 0D27
C6 Consonant Soft NA n 0928 09A8 09A8 0A28 0AA8 0B28 0BA8 0C28 0CA8 0D28
C7 Consonant NA (Tamil) 0929 0BA9
C8 Consonant PA p 092A 09AA 09AA 0A2A 0AAA 0B2A 0BAA 0C2A 0CAA 0D2A
C9 Consonant PHA ph 092B 09AB 09AB 0A2B 0AAB 0B2B 0C2B 0CAB 0D2B
C9* Consonant FA (Urdu) f फ़ 095E ਫ਼ 0A5E 0CDE
CA Consonant BA b 092C 09AC 09AC 0A2C 0AAC 0B2C 0C2C 0CAC 0D2C
CB Consonant BHA bh 092D 09AD 09AD 0A2D 0AAD 0B2D 0C2D 0CAD 0D2D
CC Consonant MA m 092E 09AE 09AE 0A2E 0AAE 0B2E 0BAE 0C2E 0CAE 0D2E
CD Consonant YA y 092F 09AF 09AF 0A2F 0AAF 0B2F 0BAF 0C2F 0CAF 0D2F
CE Consonant JYA (Bengali, Assamese & Oriya) य़ 095F য় 09DF য় 09DF 0B5F
CF Consonant RA 0930 09B0 ৰ︎ 09F0 0A30 0AB0 0B30 0BB0 0C30 0CB0 0D30
D0 Consonant Hard RA (Southern Scripts) 0931 0BB1 0C31 0CB1 0D31
D1 Consonant LA l 0932 09B2 09B2 0A32 0AB2 0B32 0BB2 0C32 0CB2 0D32
D2 Consonant Hard LA 0933 ਲ਼ 0A33 0AB3 0B33 0BB3 0C33 0CB3 0D33
D3 Consonant ZHA (Tamil & Malayalam) 0934 0BB4 0D34
D4 Consonant VA v 0935 09AC 09F1 0A35 0AB5 0B35 0BB5 0C35 0CB5 0D35
D5 Consonant SHA ś 0936 09B6 09B6 ਸ਼ 0A36 0AB6 0B36 0BB6 0C36 0CB6 0D36
D6 Consonant Hard SHA 0937 09B7 09B7 0AB7 0B37 0BB7 0C37 0CB7 0D37
D7 Consonant SA s 0938 09B8 09B8 0A38 0AB8 0B38 0BB8 0C38 0CB8 0D38
D8 Consonant HA h 0939 09B9 09B9 0A39 0AB9 0B39 0BB9 0C39 0CB9 0D39
D9 Consonant INVISIBLE
DA Vowel Sign AA ā 093E 09BE 09BE 0A3E 0ABE 0B3E 0BBE 0C3E 0CBE 0D3E
DB Vowel Sign I i ि 093F ি 09BF ি 09BF ਿ 0A3F િ 0ABF ି 0B3F ி 0BBF ి 0C3F ಿ 0CBF ി 0D3F
DB* Vowel Sign LI (Sanskrit) 0962 09E2 09E2 0AE2 0B62 0C62 0CE2 0D62
DC Vowel Sign II ī 0940 09C0 09C0 0A40 0AC0 0B40 0BC0 0C40 0CC0 0D40
DC* Vowel Sign LII (Sanskrit) 0963 09E3 09E3 0AE3 0B63 0C63 0CE3 0D63
DD Vowel Sign U u 0941 09C1 09C1 0A41 0AC1 0B41 0BC1 0C41 0CC1 0D41
DE Vowel Sign UU ū 0942 09C2 09C2 0A42 0AC2 0B42 0BC2 0C42 0CC2 0D42
DF Vowel Sign RI 0943 09C3 09C3 0AC3 0B43 0C43 0CC3 0D43
DF* Vowel Sign RII (Sanskrit) 0944 09C4 09C4 0AC4 0B44 0C44 0CC4 0D44
E0 Vowel Sign E (Southern Scripts) e 0946 0BC6 0C46 0CC6 0D46
E1 Vowel Sign EY ē 0947 09C7 09C7 0A47 0AC7 0B47 0BC7 0C47 0CC7 0D47
E2 Vowel Sign AI ai 0948 09C8 09C8 0A48 0AC8 0B48 0BC8 0C48 0CC8 0D48
E3 Vowel Sign AYE (Devanagari Script) ê 0945 0AC5
E4 Vowel Sign O (Southern Scripts) o 094A 0BCA 0C4A 0CCA 0D4A
E5 Vowel Sign OW ō 094B 09CB 09CB 0A4B 0ACB 0B4B 0BCB 0C4B 0CCB 0D4B
E6 Vowel Sign AU au 094C 09CC 09CC 0A4C 0ACC 0B4C 0BCC 0C4C 0CCC 0D4C
E7 Vowel Sign AWE (Devanagari Script) ô 0949 0AC9
E8 Vowel Omission Sign (Halant) 094D 09CD 09CD 0A4D 0ACD 0B4D 0BCD 0C4D 0CCD 0D4D
E9 Diacritic Sign (Nuktam) 093C 09BC 09BC 0A3C 0ABC 0B3C 0CBC
EA Full Stop (Viram, Northern Scripts) 0964
EA* Vowel Stress Sign AVAGRAH 093D 09BD 09BD 0ABD 0B3D 0C3D 0CBD 0D3D
EB Unused
EC Unused
ED Unused
EE Unused
EF Attribute Code
F0 Extension Code
F1 Digit 0 0966 09E6 09E6 0A66 0AE6 0B66 0BE6 0C66 0CE6 0D66
F2 Digit 1 0967 09E7 09E7 0A67 0AE7 0B67 0BE7 0C67 0CE7 0D67
F3 Digit 2 0968 09E8 09E8 0A68 0AE8 0B68 0BE8 0C68 0CE8 0D68
F4 Digit 3 0969 09E9 09E9 0A69 0AE9 0B69 0BE9 0C69 0CE9 0D69
F5 Digit 4 096A 09EA 09EA 0A6A 0AEA 0B6A 0BEA 0C6A 0CEA 0D6A
F6 Digit 5 096B 09EB 09EB 0A6B 0AEB 0B6B 0BEB 0C6B 0CEB 0D6B
F7 Digit 6 096C 09EC 09EC 0A6C 0AEC 0B6C 0BEC 0C6C 0CEC 0D6C
F8 Digit 7 096D 09ED 09ED 0A6D 0AED 0B6D 0BED 0C6D 0CED 0D6D
F9 Digit 8 096E 09EE 09EE 0A6E 0AEE 0B6E 0BEE 0C6E 0CEE 0D6E
FA Digit 9 096F 09EF 09EF 0A6F 0AEF 0B6F 0BEF 0C6F 0CEF 0D6F
FB Unused
FC Unused
FD Unused
FE Unused
FF Unused
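Because each script's Unicode digits form a contiguous run, decoding ISCII digits (F1 to FA) reduces to an offset from the digit-zero code point in the F1 row of the table above. A minimal Python sketch; `decode_digits` and `DIGIT_ZERO` are hypothetical names.

```python
# Code point of digit zero in each script, read from the F1 row above.
DIGIT_ZERO = {"Devanagari": 0x0966, "Bengali": 0x09E6, "Assamese": 0x09E6,
              "Gurmukhi": 0x0A66, "Gujarati": 0x0AE6, "Oriya": 0x0B66,
              "Tamil": 0x0BE6, "Telugu": 0x0C66, "Kannada": 0x0CE6,
              "Malayalam": 0x0D66}


def decode_digits(data: bytes, script: str) -> str:
    """Decode a run of ISCII digit bytes (0xF1-0xFA) into `script`."""
    out: list[str] = []
    for b in data:
        if not 0xF1 <= b <= 0xFA:
            raise ValueError(f"not an ISCII digit: {b:#04x}")
        out.append(chr(DIGIT_ZERO[script] + (b - 0xF1)))
    return "".join(out)
```

Python's `int()` accepts any Unicode decimal digits, so `int(decode_digits(b"\xf2\xf1\xf5", "Malayalam"))` evaluates to 104.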

References

  1. Apple (2005-04-05) [1998-02-05]. "Map (external version) from Mac OS Devanagari encoding to Unicode 2.1 and later". Unicode Consortium.

External links

  • Converters from/to ISCII to/from various fonts
  • The ISCII 1991 standard (PDF)
  • Padma – Mozilla extension for transforming ISCII to Unicode Archived 2019-10-01 at the Wayback Machine
  • Padma – Transformer from ISCII to Unicode for Telugu
  • PHP script for ISCII to and from Unicode