

Help:Multilingual support

For help for visitors to the English-language Wikipedia, see Wikipedia:Local embassy.

Articles on the English Wikipedia may contain words or text written in different languages and scripts. Viewing and editing these articles correctly requires the appropriate fonts and a correctly configured operating system and browser. This guide explains how to set them up.

Overview

Unicode

Articles on Wikipedia are encoded using Unicode (specifically UTF-8)[1], an industry standard designed to allow text and symbols from all of the writing systems of the world to be consistently represented and manipulated by computers. Because UTF-8 is backwards compatible with ASCII, and most modern browsers have at least basic Unicode support, most users will experience little difficulty reading and editing Wikipedia.
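To see why ASCII text is unaffected by UTF-8, the sketch below (Python standard library only; `utf8_hex` is an illustrative helper, not part of MediaWiki) encodes one character from each UTF-8 length class. ASCII characters keep their single ASCII byte, while other characters take two to four bytes.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated uppercase hex."""
    return text.encode("utf-8").hex(" ").upper()

# Pure ASCII text encodes to exactly the same bytes in UTF-8 and in ASCII.
assert "Wikipedia".encode("utf-8") == "Wikipedia".encode("ascii")

# One character per UTF-8 length class: M, S-circumflex, an Ethiopic syllable,
# and Avestan letter A (outside the Basic Multilingual Plane).
for ch in ["M", "\u015C", "\u1234", "\U00010B00"]:
    print(f"U+{ord(ch):04X}: {utf8_hex(ch)}")
```

The loop prints `4D`, `C5 9C`, `E1 88 B4` and `F0 90 AC 80` respectively: one, two, three and four bytes.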

For older browsers, MediaWiki, the Wikipedia software, serves the wikitext in a safe mode when editing. Characters that cannot be represented in ASCII are temporarily converted to hexadecimal character references such as &#x1234;. Existing hexadecimal character references get an additional leading zero, as in &#x01234;, so that they are not converted to actual characters when the page is saved. Likewise, to create a hexadecimal character reference in safe mode, rather than the character itself, add a leading zero. To check whether safe mode is in use, edit this section: if the character reference for M (&#x4D;) appears in the edit box as &#x04D;, safe mode is in use.
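The safe-mode rules described above can be illustrated with a short Python sketch. It is a hypothetical reimplementation of those rules (non-ASCII characters become hexadecimal references, existing references gain a leading zero, and both steps are undone on save), not MediaWiki's actual code; the function names are made up for this example.

```python
import re

# An existing hexadecimal character reference, e.g. "&#x1234;".
HEX_REF = re.compile(r"&#x([0-9A-Fa-f]+);")

def to_safe_mode(wikitext: str) -> str:
    """Hypothetical sketch: prepare wikitext for a safe-mode edit box."""
    # 1. Protect existing references with a leading zero: &#x1234; -> &#x01234;
    protected = HEX_REF.sub(lambda m: "&#x0" + m.group(1) + ";", wikitext)
    # 2. Replace every non-ASCII character with a hexadecimal reference.
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in protected)

def from_safe_mode(edited: str) -> str:
    """Hypothetical sketch: turn safe-mode text back into wikitext on save."""
    def restore(m):
        digits = m.group(1)
        if digits.startswith("0"):
            # Leading zero: keep a literal reference, minus one zero.
            return "&#x" + digits[1:] + ";"
        # No leading zero: the reference stood for a real character.
        return chr(int(digits, 16))
    return HEX_REF.sub(restore, edited)

original = "\u1234 and &#x1234;"
safe = to_safe_mode(original)
print(safe)                                # &#x1234; and &#x01234;
assert from_safe_mode(safe) == original    # the round trip is lossless
```

Protecting existing references before converting characters matters: the references generated in step 2 never start with zero, so the two kinds stay distinguishable on save.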

Font

Most computers running Microsoft Windows, Apple's OS X and many Linux distributions already have fonts installed that support Latin, Greek, Cyrillic, Hebrew, Arabic, Chinese, Japanese, Korean and the International Phonetic Alphabet. Many mobile devices, such as the iPhone and iPad, also include such fonts. However, some historic and accented characters (used in the transliteration of foreign scripts) may be missing.
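Whether a particular font covers a piece of text can be checked against the font's character map. The minimal Python sketch below assumes the map is a dictionary from integer code points to glyph names; `missing_characters` and the toy map are hypothetical. Keep in mind that browsers fall back to other installed fonts, so a character missing from one font may still display.

```python
import string

def missing_characters(text: str, cmap: dict) -> list:
    """Return the distinct non-space characters of `text` absent from `cmap`.

    `cmap` is assumed to map integer code points to glyph names.
    """
    missing = []
    for ch in text:
        if ch.isspace() or ord(ch) in cmap or ch in missing:
            continue
        missing.append(ch)
    return missing

# Toy character map covering only the unaccented ASCII letters.
toy_cmap = {ord(c): c for c in string.ascii_letters}
print(missing_characters("Ŝi ŝatas kafon", toy_cmap))  # ['Ŝ', 'ŝ']
```

With the third-party fontTools package installed, the character map of a real font file can be obtained with `TTFont(path).getBestCmap()` from `fontTools.ttLib`, which returns a dictionary of this shape.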

Microsoft fonts

Font | Scripts | Description
Arial Unicode MS | Western, Japanese, Hangul, Johab, Big5, GB 2312, Hebrew, Arabic, Greek, Turkish, Baltic, Central European, Celtic, Cyrillic, Thai and Vietnamese | Supports a wide range of scripts, but is of slightly lower quality than Arial because it lacks kerning and is not smoothed. A minor bug causes double-wide diacritics to be placed on the wrong characters.
Lucida Sans Unicode | Western, Hebrew, Greek, Turkish, Baltic, Central European, Cyrillic | Has a much smaller character repertoire than Arial Unicode MS, but is more legible.
Tahoma | Western, Hebrew, Arabic, Greek, Turkish, Baltic, Central European, Celtic, Cyrillic, Thai and Vietnamese | Has a much smaller character repertoire than Arial Unicode MS, but is more legible, especially for Arabic and Persian characters (according to Meta).
Microsoft Sans Serif (not to be confused with MS Sans Serif) | Western, Hebrew, Arabic, Greek, Turkish, Celtic, Baltic, Central European, Cyrillic, Thai, Vietnamese | Has better support for historical and accented Latin characters.

Other available Unicode fonts


Font | Typeface | License | Format | Encoding
Aboriginal | Sans-serif, serif | Freeware | OpenType | Unicode 5.2
Charis SIL | Serif | Open source | OpenType | Unicode 5.1
Code2002 | | Freeware (must not be altered) | TrueType | Unicode, plane 2
Code2001 0.919 | | Freeware (must not be altered) | TrueType | Unicode, plane 1
Code2000 1.171 | Sans-serif | Shareware (unrestricted) | TrueType | Unicode, plane 0
DejaVu | Sans-serif, Sans Mono (monospaced sans-serif), serif | Open source | OpenType | Unicode 5.1
Doulos SIL | Serif | Open source | OpenType | Unicode 5.1
Everson Mono 3.2b4 | Monospaced | Shareware | TrueType | Unicode
Fonts for Ancient Scripts (Greek, Egyptian, cuneiform, ...) | Aegean, Aegyptus, Akkadian, Alexander, Analecta, ... | No license, but may be used for any purpose | TrueType | Unicode
Google Noto (project to support all Unicode scripts) | Sans-serif, serif | Open source | OpenType | Unicode 6.2
Hanazono (supports 80,000+ Chinese characters) | Ming (comparable to serif typefaces) | Freeware | TrueType | Unicode
TITUS Cyberbit Basic | Serif | Non-commercial | TrueType (installation requires Windows) | Unicode 4.0

Browsers

Internet Explorer
supports Latin (though not all extended sets), Greek, Cyrillic, Arabic and Hebrew. Support for East Asian and some Indic scripts is available if the corresponding language support has been installed in Windows. For other scripts, Internet Explorer uses only the default font, so they usually display only if that font covers them.
Firefox
tries to render each character using all the fonts available on the system, so multilingual support is generally good. However, the default rendering engine does not support complex script rendering. Some Linux distributions ship a Pango-based rendering engine that does, though it may currently cause display glitches with justified text.
Opera
tries to render any character using all the fonts available on the system so multilingual support is also good.[2] Opera uses the operating system to perform contextual glyph selection, ligature forming, character stacking, combining character support and other character shaping tasks.[3]
Chrome
renders many characters but has poor support for the scripts of India: of the examples below, it renders Sinhala, Gurmukhi and Tibetan, but not Devanagari (used for Hindi), Bengali, or the scripts of most other official languages of India.

Scripts

Avestan

The Avestan alphabet is used to write the Avestan language. It is supported by the following fonts:

Correct rendering | Your computer
(image) | 𐬯𐬭𐬀𐬊𐬔𐬁

Canadian Aboriginal Syllabics

Canadian Aboriginal syllabics are an abugida used to write a number of First Nations languages in Canada, including Cree, Ojibwe, Naskapi, Inuktitut, Blackfoot, Sayisi, and Carrier. It is supported by the following fonts:

Correct rendering | Your computer
(image) | ᓀᐦᐃᔭᐍᐏᐣ

Cherokee

Cherokee is supported by the following fonts:

Correct rendering | Your computer
(image) | ᎠᏂᏴᏫᏯ

Coptic

The Coptic alphabet is used to write Coptic, the language of Egypt before Arabic. Coptic is now used only as a liturgical language. The alphabet is supported by the following fonts:

  • Alphabetum, a commercial Unicode font; it is the only font that provides Bohairic rather than Sahidic Coptic letter forms.
  • GNU FreeSerif
  • Noto Sans Coptic, a font made by Google.
  • Segoe UI Symbol (Microsoft Windows font, available in Windows 7 and later)
  • Quivira, which gives the best Coptic letter and word spacing and sizing, and supports all Coptic letters in Unicode.
Correct rendering | Your computer
(image) | ⲙⲛⲧⲣⲙⲛⲕⲏⲙⲉ

Cuneiform

The cuneiform script was used primarily to write Akkadian (including its Assyrian and Babylonian dialects) and Sumerian. It is supported by the following fonts:

Correct rendering | Your computer
(image) | 𒅎𒀝𒂵𒌈

Deseret

Deseret is supported by the following fonts:

Correct rendering | Your computer
(image) | 𐐔𐐯𐑅𐐨𐑉𐐯𐐻 𐐈𐑊𐑁𐐩𐐺𐐯𐐻

East Asian

Script | Correct rendering | Your computer
Traditional Chinese | (image) |

人人生來自由,
在尊嚴和權利上一律平等。
他們有理性和良心,
請以手足關係的精神相對待。

Simplified Chinese | (image) |

人人生来自由,
在尊严和权利上一律平等。
他们有理性和良心,
请以手足关系的精神相对待。

Japanese | (image) |

すべての人間は、生まれながらにして自由であり、
かつ、尊厳と権利とについて平等である。
人間は、理性と良心とを授けられており、
互いに同胞の精神をもって行動しなければならない。

Korean | (image) |

모든 인간은 태어날 때부터
자유로우며 그 존엄과 권리에
있어 동등하다. 인간은 천부적으로
이성과 양심을 부여받았으며 서로
형제애의 정신으로 행동하여야 한다.

Ethiopic

The Ethiopic syllabary is used in East Africa for Amharic, Bilen, Oromo, Tigré, Tigrinya and other languages. It evolved from the script of classical Ge'ez, which is now used only as a liturgical language. It is supported by the following fonts:

Correct rendering | Your computer
(image) | ኢትዮጵያ

Indic

The following table compares the correct rendering of each script with how your computer renders it:

Script | Correct rendering | Your computer | Help page
Bengali | (image) | ক + ি → কি | Wikipedia:Bangla script display help
Devanāgarī | (image) | क + ि → कि | Template:Devfonthelp
Gujarati | (image) | ક + િ → કિ |
Gurmukhī | (image) | ਕ + ਿ → ਕਿ |
Kannada | (image) | ಕ + ಿ → ಕಿ |
Malayalam | (image) | ക + െ → കെ |
Oriya | (image) | କ + େ → କେ |
Sinhala | (image) | ඵ + ේ → ඵේ |
Tibetan | (image) | ར + ྐ + ྱ → རྐྱ |
Tamil | (image) | க + ே → கே |
Telugu | (image) | య + ీ → యీ |
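The "+" notation in the table reflects how these scripts are stored: the vowel sign is a separate code point that follows the consonant in memory, even when it is drawn to the consonant's left, and the font and rendering engine must reorder it. This Python sketch uses only the standard `unicodedata` module to list the code points of the Bengali and Devanagari examples; the helper name `describe` is our own.

```python
import unicodedata

def describe(cluster: str) -> list:
    """List (code point, general category, name) for each character in `cluster`."""
    return [(f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c))
            for c in cluster]

# Bengali and Devanagari "ki": the vowel sign I is stored after the consonant
# even though it is displayed to the consonant's left.
for sample in ["\u0995\u09BF", "\u0915\u093F"]:
    for row in describe(sample):
        print(*row)
```

If a browser or font lacks complex script support, the vowel sign tends to appear after the consonant or next to a dotted circle instead of being reordered.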

Old Persian cuneiform

The Old Persian cuneiform script was used to write the Old Persian language. The script is encoded in block "Old Persian", code points 103A0–103DF (Unicode.org chart). It is supported by the following fonts:

Correct rendering | Your computer | Transliteration
(image) | | kabaujiiya: Kambujiya (Cambyses II)

Syriac/Aramaic script

Syriac and Aramaic, like most Semitic scripts, are written from right to left, which can cause letters to appear in the wrong order. The {{rtl-lang}} template fixes this issue.
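Whether a run of text is right-to-left can be read from each character's Unicode bidirectional class. The Python sketch below finds the first strongly directional character, a simplification of the Unicode Bidirectional Algorithm, and wraps the text in an HTML span with an explicit `dir` attribute. The `wrap_rtl` helper and the `syc` (Classical Syriac) language code are assumptions for illustration; the markup actually produced by {{rtl-lang}} may differ.

```python
import html
import unicodedata

def is_rtl(text: str) -> bool:
    """Simplified check: is the first strongly directional character right-to-left?"""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew-type and Arabic-type letters (Syriac is AL)
            return True
        if bidi == "L":           # left-to-right letter, e.g. Latin
            return False
    return False                  # no strongly directional character found

def wrap_rtl(text: str, lang: str) -> str:
    """Hypothetical helper: mark the direction of a run explicitly in HTML."""
    direction = "rtl" if is_rtl(text) else "ltr"
    return f'<span dir="{direction}" lang="{lang}">{html.escape(text)}</span>'

print(wrap_rtl("\u0710\u0712\u0713", "syc"))  # Syriac letters alaph, beth, gamal
```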

Most operating systems support Syriac natively,[citation needed] but only the Madnḥāyā variety is rendered correctly. Rendering the Serṭā and Estrangelo varieties requires additional fonts. These scripts are supported by the following fonts:

Script | Correct rendering | Your computer
Madnḥāyā | (image) |
Serṭā | (image) |
Estrangelo | (image) |

Tifinagh script

The Tifinagh alphabet is used to write the Berber languages. IRCAM (Institut Royal de la Culture Amazighe) has developed a software suite for Windows XP that contains a Tifinagh keyboard and a font, both available for download from IRCAM. The script is supported by the following fonts:

Correct rendering | Your computer
(image) | ⵜⵉⴼⵉⵏⴰⵖ

Southeast Asian

Balinese

The Balinese script is used to write the Balinese language. The script is encoded in block "Balinese", code points 1B00–1B7F (Unicode.org chart). It is supported by the following fonts:

Correct rendering | Your computer | Transliteration
(image) | ᬩᬮᬶ᭞᭑᭞ᬚᬸᬮᬶ᭞᭑᭙᭘᭒᭟ | Bali, 1 Juli 1982.
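The block ranges given in this guide for Old Persian, Balinese, Burmese (Myanmar), Lontara (Buginese) and Sundanese can be used to check which script a character belongs to. The Python sketch below hard-codes those ranges; `block_of` is a hypothetical helper. It also shows that Balinese digits are Unicode decimal digits, so Python's `int()` reads the year in the Balinese example directly.

```python
from typing import Optional

# Inclusive code point ranges of the blocks named in this guide.
BLOCKS = {
    "Myanmar": (0x1000, 0x109F),
    "Buginese": (0x1A00, 0x1A1F),
    "Balinese": (0x1B00, 0x1B7F),
    "Sundanese": (0x1B80, 0x1BBF),
    "Old Persian": (0x103A0, 0x103DF),
}

def block_of(ch: str) -> Optional[str]:
    """Return the name of the listed block containing `ch`, or None."""
    cp = ord(ch)
    for name, (first, last) in BLOCKS.items():
        if first <= cp <= last:
            return name
    return None

sample = "ᬩᬮᬶ᭞᭑᭞ᬚᬸᬮᬶ᭞᭑᭙᭘᭒᭟"          # the Balinese example above
print({block_of(ch) for ch in sample})    # {'Balinese'}

# Balinese digits 1, 9, 8, 2 are decimal digits, so int() accepts them.
print(int("\u1B51\u1B59\u1B58\u1B52"))    # 1982
```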

Burmese

The Burmese alphabet is used to write the Burmese language. The script is encoded in block "Myanmar", code points 1000–109F (Unicode.org chart). It is supported by the following fonts:

Correct rendering | Your computer
(image) | ဃ + ြ → ဃြ

Javanese

The Javanese script is used to write the Javanese language. It has been encoded in Unicode since version 5.2. The Tuladha Jejeg font for Javanese is available as a webfont on the English Wikipedia, so no fonts need to be installed. However, the script can only be displayed in a browser that supports the Graphite technology; as of July 2013 the only such browser was Firefox. Graphite has been enabled by default since Firefox 22; in versions 11 to 21 it had to be enabled through the gfx.font_rendering.graphite.enabled setting in about:config. The script is supported by the following fonts:

Correct rendering | (image)
Your computer (Tuladha Jejeg) | ꧋ꦱꦸꦒꦼꦁꦫꦮꦸꦃꦮꦺꦴꦤ꧀ꦠꦼꦤ꧀ꦲꦶꦁꦮꦶꦏꦶꦥꦺꦝꦶꦪꦃꦗꦮꦶ꧉
Transliteration | Sugeng Rawuh Wonten ing Wikipédia Jawi

Lontara

The Lontara script is used to write the Buginese, Makassarese and Mandar languages. The script is encoded in block "Buginese", code points 1A00–1A1F (Unicode.org chart). It is supported by the following fonts:

Correct rendering | Your computer | Transliteration
(image) | ᨅᨔ ᨕᨘᨁᨗ | Basa Ugi

Old Tagalog/Baybayin

Baybayin (also known as Alibata, and as the Tagalog script in Unicode) is a pre-Spanish Philippine writing system from which modern minority scripts of the Philippines descend. It is supported by the following fonts:

Correct rendering | Your computer
(image) |

ᜀᜅ᜔ ᜊᜏᜆ᜔ ᜆᜂ ᜀᜌ᜔ ᜁᜐᜒᜈᜒᜎᜅ᜔ ᜈ ᜋᜌ᜔ ᜃᜍᜉᜆᜈ᜔,
ᜀᜆ᜔ ᜉᜈ᜔ᜆᜌ᜔ ᜐ ᜇᜒᜄ᜔ᜈᜒᜇᜇ᜔,
ᜀᜆ᜔ ᜃᜍᜉᜆᜈ᜔ ᜀᜅ᜔ ᜆᜂ ᜀᜌ᜔ ᜊᜒᜈᜒᜌᜌᜀᜈ᜔ ᜅ᜔ ᜉᜄᜒᜁᜐᜒᜉ᜔,
ᜀᜆ᜔ ᜃᜍᜓᜈᜓᜅᜈ᜔ ᜈ ᜃᜁᜎᜅᜅ᜔ ᜋᜄ᜔ᜃᜁᜐ ᜐ ᜃᜉᜆᜒᜍᜈ᜔

Sundanese

The Sundanese script is used to write the Sundanese language. The script is encoded in block "Sundanese", code points 1B80–1BBF (Unicode.org chart). It is supported by the following fonts:

Correct rendering: (image)

Your computer:

ᮜᮓᮢᮀ

ᮃᮚ ᮠᮤᮏᮤ ᮛᮥᮕ ᮞᮒᮧ ᮜᮩᮒᮤᮊ᮪,
ᮆᮀᮊᮀ-ᮆᮀᮊᮀ, ᮆᮀᮊᮀ-ᮆᮀᮊᮀ,
ᮞᮧᮊ᮪ ᮜᮥᮜᮥᮔ᮪ᮎᮒᮔ᮪ ᮓᮤ ᮎᮄ,
ᮃᮛᮤ ᮘᮍᮥᮔ᮪ ᮃᮛᮦᮊ᮪ ᮞᮛᮥᮕ ᮏᮩᮀ
ᮜᮔ᮪ᮎᮂ.

Transliteration:

Ladrang

Aya hiji rupa sato leutik,
Éngkang-éngkang, éngkang-éngkang,
Sok luluncatan di cai,
Ari bangun arék sarupa jeung lancah.

Special cases

Esperanto

In edit box | In database and output
S | S
Sx | Ŝ
Sxx | Sx
Sxxx | Ŝx
Sxxxx | Sxx
Sxxxxx | Ŝxx

MediaWiki installations configured for Esperanto use UTF-8 for storage and display. During editing, however, the text is converted to a form designed to be easier to type on a standard keyboard.

The affected characters are Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ, ĉ, ĝ, ĥ, ĵ, ŝ and ŭ. You may enter these directly in the edit box if you have the means to do so, but when you edit the page again you will see them encoded with a trailing x (for example, Ŝ appears as Sx). This form is referred to as "x-sistemo" or "x-kodo". To preserve round-trip capability when one or more x's follow these characters or their unaccented forms (C, G, H, J, S, U, c, g, h, j, s, u), each such x is doubled in the edit box: the edit box contains twice as many x's as the stored article text.

For example, the interlanguage link [[en:Luxury car]] to en:Luxury car has to be entered in the edit box as [[en:Luxxury car]] on eo:. This has caused problems with interwiki update bots in the past.
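The doubling rule can be captured in two small functions. This Python sketch is our own reading of the rule and the table above, not MediaWiki's implementation; the function names are hypothetical, and it assumes only lowercase x takes part in the scheme.

```python
import re

# Esperanto base letters mapped to their accented forms, and back.
ACCENTED = dict(zip("CGHJSUcghjsu", "ĈĜĤĴŜŬĉĝĥĵŝŭ"))
BASE = {accented: base for base, accented in ACCENTED.items()}

# Assumption: only lowercase "x" participates in this sketch.
STORED_RUN = re.compile("([CGHJSUcghjsuĈĜĤĴŜŬĉĝĥĵŝŭ])(x*)")
EDIT_RUN = re.compile("([CGHJSUcghjsu])(x*)")

def to_edit_box(stored: str) -> str:
    """Stored article text -> x-kodo shown in the edit box."""
    def encode(m):
        letter, xs = m.group(1), m.group(2)
        prefix = BASE[letter] + "x" if letter in BASE else letter  # Ŝ -> Sx
        return prefix + xs * 2  # every literal x after the letter is doubled
    return STORED_RUN.sub(encode, stored)

def from_edit_box(edited: str) -> str:
    """x-kodo typed in the edit box -> stored article text."""
    def decode(m):
        letter, n = m.group(1), len(m.group(2))
        # An odd run means the first x marks the accent; the rest were doubled.
        head = ACCENTED[letter] if n % 2 else letter
        return head + "x" * (n // 2)
    return EDIT_RUN.sub(decode, edited)

print(to_edit_box("[[en:Luxury car]]"))  # [[en:Luxxury car]]
```

Because every literal x is doubled, an odd number of x's after a base letter can only mean that the letter carries a circumflex or breve, which is what makes the round trip unambiguous.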

Romanian

The Romanian alphabet contains an S-comma (Ș ș) and T-comma (Ț ț). These characters were added to Unicode 3.0 at the request of the Romanian standardization institute. As font support for these characters has been poor in the past, many computer users use the similar characters S-cedilla (Ş ş) and T-cedilla (Ţ ţ) instead. However, on Wikipedia it is recommended to use the correct characters with comma below.
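Text typed with the cedilla look-alikes can be corrected mechanically. A minimal Python sketch using `str.translate`; `fix_romanian` is a hypothetical name. Apply it only to Romanian text, since S-cedilla is the correct letter in Turkish and other languages.

```python
# Map the cedilla look-alikes to the correct comma-below letters.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # Ş -> Ș
    "\u015F": "\u0219",  # ş -> ș
    "\u0162": "\u021A",  # Ţ -> Ț
    "\u0163": "\u021B",  # ţ -> ț
})

def fix_romanian(text: str) -> str:
    """Replace S/T with cedilla by S/T with comma below (Romanian text only)."""
    return text.translate(CEDILLA_TO_COMMA)

print(fix_romanian("\u015Etefan cel Mare"))  # Ștefan cel Mare
```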


Notes

  1. Until June 2005, when MediaWiki 1.5 came into use on the Wikimedia projects, articles on the English Wikipedia were encoded using ISO/IEC 8859-1 (although the additional characters from the Windows-1252 character set were used in practice). All characters from the ISO/IEC 10646 Universal Character Set could be accessed through numeric character references, as specified by the HTML 4.01 specification. Since then, nearly all pages have been converted to use Unicode directly.
  2. http://www.opera.com/support/kb/view/435/
  3. http://www.opera.com/docs/specs/#text
