Declaring The XHTML File Encoding
XHTML files, like any other text files, are saved using a particular
character encoding. Since there are many different character encodings
in the world, and you have no idea what your visitors' browser default settings
are, it's always a good idea to explicitly declare which encoding you used to
make your Web page. One common way to declare it is with a <meta> tag in the
document's <head> that names the charset, for instance UTF-8, a Unicode encoding.
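A minimal sketch of such a declaration, assuming a page saved as UTF-8 and served as text/html (the title text is only a placeholder):

```html
<head>
  <!-- Assumption: the file itself is really saved as UTF-8 -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Example page</title>
</head>
```

Placing the <meta> tag first inside <head> means the browser learns the encoding before it reads any other text, such as the title.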
Now, when a browser sees this <meta> tag, it will know that the page was
encoded using UTF-8, and will display it properly (provided that you really did
encode the page in UTF-8). XHTML requires that you declare the encoding if it
is anything other than the default UTF-8 or UTF-16 encodings. You can also use
the XML declaration to specify the character encoding.
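As a rough sketch, an XML declaration that names the encoding might look like the following, again assuming UTF-8. If used, it has to be the very first thing in the file, with nothing (not even whitespace) before it:

```html
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumption: UTF-8 is shown here; substitute the encoding the file was actually saved in -->
```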
International Character Set Codes
A note on XHTML Internationalization
Internationalization, the process of making a Web site available in different languages
for the global Internet community, is sometimes referred to in shorthand as simply i18n.
The 18 corresponds to the 18 characters between the starting "I" and the ending "n" in the
word "Internationalization".
The World Wide Web Consortium (W3C) strongly recommends using UTF-8
wherever possible: it can represent all languages and is the recommended
charset on the Internet. Support for it is rapidly increasing. That being said,
here is a partial listing of languages, their language codes, and the older
charsets typically used for them:
Language (code)           Charset
Afrikaans (AF)            iso-8859-1, windows-1252
Albanian (SQ)             iso-8859-1, windows-1252
Arabic (AR)               iso-8859-6
Basque (EU)               iso-8859-1, windows-1252
Bulgarian (BG)            iso-8859-5
Byelorussian (BE)         iso-8859-5
Catalan (CA)              iso-8859-1, windows-1252
Croatian (HR)             iso-8859-2, windows-1250
Czech (CS)                iso-8859-2
Danish (DA)               iso-8859-1, windows-1252
Dutch (NL)                iso-8859-1, windows-1252
English (EN)              iso-8859-1, windows-1252
Esperanto (EO)            iso-8859-3*
Estonian (ET)             iso-8859-15
Faroese (FO)              iso-8859-1, windows-1252
Finnish (FI)              iso-8859-1, windows-1252
French (FR)               iso-8859-1, windows-1252
Galician (GL)             iso-8859-1, windows-1252
German (DE)               iso-8859-1, windows-1252
Greek (EL)                iso-8859-7
Hebrew (IW)               iso-8859-8
Hungarian (HU)            iso-8859-2
Icelandic (IS)            iso-8859-1, windows-1252
Inuit (Eskimo) languages  iso-8859-10*
Irish (GA)                iso-8859-1, windows-1252
Italian (IT)              iso-8859-1, windows-1252
Japanese (JA)             shift_jis, iso-2022-jp, euc-jp
Korean (KO)               euc-kr
Lapp                      iso-8859-10* **
Latvian (LV)              iso-8859-13, windows-1257
Lithuanian (LT)           iso-8859-13, windows-1257
Macedonian (MK)           iso-8859-5, windows-1251
Maltese (MT)              iso-8859-3*
Norwegian (NO)            iso-8859-1, windows-1252
Polish (PL)               iso-8859-2
Portuguese (PT)           iso-8859-1, windows-1252
Romanian (RO)             iso-8859-2
Russian (RU)              koi8-r, iso-8859-5
Scottish (GD)             iso-8859-1, windows-1252
Serbian (SR) Cyrillic     windows-1251, iso-8859-5***
Serbian (SR) Latin        iso-8859-2, windows-1250
Slovak (SK)               iso-8859-2
Slovenian (SL)            iso-8859-2, windows-1250
Spanish (ES)              iso-8859-1, windows-1252
Swedish (SV)              iso-8859-1, windows-1252
Turkish (TR)              iso-8859-9, windows-1254
Ukrainian (UK)            iso-8859-5
* = scarce support in browsers.
** = Lapp doesn't have a two-letter code; a three-letter code (lap) is proposed in NISO Z39.53.
*** = Serbian can be written in Latin (most commonly used) and Cyrillic (mostly windows-1251).
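
To illustrate how a charset from the listing might be declared, here is a minimal sketch of a Polish page using iso-8859-2. The XHTML 1.0 Strict doctype, the title, and the sample sentence are assumptions chosen for illustration. Because the encoding is not UTF-8 or UTF-16, it is declared in both the XML declaration and the <meta> tag:

```html
<?xml version="1.0" encoding="iso-8859-2"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- Assumption: this file is actually saved as iso-8859-2 by the editor -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="pl" lang="pl">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2" />
  <title>Przykład</title>
</head>
<body>
  <!-- Sample text using Polish letters that iso-8859-2 can represent -->
  <p>Zażółć gęślą jaźń.</p>
</body>
</html>
```

The declared charset only describes the file; if the editor saves it in a different encoding, the accented letters will still display incorrectly.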