The l2h help pages   © Leon van Dommelen 
The latest version of this document is online at eng.famu.fsu.edu or at dommelen.net.

Internationalization

Simply put, this web page is concerned with dealing with characters that are not the normal English (i.e. plain ASCII) ones. So if you write only in English or Dutch, you can skip this web page completely.

If you write in a Western European language other than Greek, it is not much different. Any of the example documents 1 through 4 (as found on the examples.html web page) will probably tell you all you need to deal with those characters.

That then leaves the people who do have to deal with Greek or non-Western European characters. To keep it simple, such characters will be called "nonlatin" on this web page.

The first thing they should note is that the author of this page knows almost nothing about internationalization. He was born Dutch and now writes in English. However, he did figure out how to get various characters to show up using l2h. And that is all you will find on this web page. Do not ask the author about any other internationalization issue and expect more than a "Huh?" response.

There are three possible ways to deal with nonlatin characters:

  1. Base your document on one of the examples 1 through 4 and use some trick to put in a few nonlatin characters.
  2. Use an old style encoding supported by LaTeX, (and by LaTeX2HTML, if you want to make web pages).
  3. Base your document on example 5 or 6 and put in any character in any language you want without tricks.
The following three major sections address each of these possibilities in turn.

1. Use example 1 through 4 with a few nonlatin characters

This discusses how, if you base your document on one of the examples 1 through 4, you can put in some nonlatin characters using a trick.

All l2h examples, including also 5 and 6, assume that you enter your international characters in "UTF-8" form into index.tex. So the first thing you need to know is how to do that.

As an example, consider character "ñ" (n-tilde). (You can of course use the standard LaTeX encoding for this character, where it is written as "\~n". But you are not going to get nonlatin characters that way.) To enter it as "UTF-8", the first thing you need to know is that its (hex) "unicode" number is "F1", (often written as "00F1"). To figure out what unicode number a given character has, you can simply use a browser. There are extensive tables on the web to tell you what character has what unicode number. The "Character Map" program mentioned below can also tell you those numbers.
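
To make the relation between the unicode number and the "UTF-8" form a bit more concrete: ñ, number F1, ends up in the file as the two bytes C3 and B1. The sketch below is not part of l2h. It assumes a linux (or similar) terminal with the standard printf, od, and tr commands, and simply writes those two bytes and dumps them back in hex. The variable name is made up for the illustration.

```shell
# n-tilde (unicode number F1) is the two bytes C3 B1 in UTF-8.
# Octal escapes (\303 = C3, \261 = B1) are used because every
# POSIX printf understands them.
ntilde_hex=$(printf '\303\261' | od -An -tx1 | tr -d ' \n')
echo "$ntilde_hex"   # prints c3b1
```

Characters with larger unicode numbers, like most nonlatin ones, take two to four bytes in the same way.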

Next the documentation of your editor should tell you how to put these characters in if you know their unicode number. (If not, upgrade to a more recent editor.)

For example, to enter the example character ñ, number F1, in Windows, you usually type "F 1 Alt+x", without the quotes or spaces. Here "Alt+x" means that you press "x" while holding down the Alt key. This method works for WordPad, for example. Unfortunately, it does not work for Notepad. For Notepad, you can use the "Character Map" method described below. For TeXstudio, use "Ctrl+Alt+u F 1 Return".

To enter the example character ñ, number F1, in linux you usually type "Ctrl+Shift+u f 1 Return", without the quotes or spaces. Here "Ctrl+Shift+u" means that you press "u" while holding down the Ctrl and Shift keys. Note the lowercase f. This method works for gedit as well as for Firefox, the gimp, and gnome terminal. For TeXstudio, use "Ctrl+Alt+u" instead of "Ctrl+Shift+u". For emacs, use "Ctrl+q" instead of "Ctrl+Shift+u" and make sure that your "read-quoted-char-radix" option is set to 16, hex.

You can also pick the characters you need out of a list using the "Character Map" (charmap) program that comes with MS Windows or Linux. Open character map. (In MS Windows, it is at "Start", "All Programs", "Accessories", "System Tools". Or search for "charmap".) Select the font that has the characters you need. On Windows more recent than XP, "Times New Roman" and "Arial Unicode MS" have large amounts of Greek and Cyrillic characters. On linux, first install "Linux Libertine O" for such characters. For CJK (Chinese-Japanese-Korean) characters, you will need to look at specialized fonts. "SimSun" on Microsoft is a typical choice for Chinese. With the right font selected, and maybe the appropriate "Script" or "Unicode Subrange" too, double-click the characters you want. They will accumulate in a textbox that you can "Copy" and "Paste" into your editor window with the latex source. (Paste is normally on the "Edit" menu of an editor.)

Windows Notepad users only: If you create a document from scratch using Windows Notepad, you need to be careful how you save it for the first time. From the "File" menu, choose "Save As...". Then make the name to save as "index.tex" and make sure that it says "All Files", not "Text Files" in the drop down box. Otherwise stupid Notepad is going to save it as "index.tex.txt". And make sure that it says "UTF-8", not "Unicode" or something like that, in the encoding drop down box. When that is all OK, you can save. After that, you should be able to save further changes simply using "Save", or better, by pressing "s" while holding down "Ctrl".
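
Whatever editor you used, one way to double-check that the saved file really is UTF-8 is to run it through iconv. This is only a sketch, assuming a linux (or similar) terminal in which the standard iconv and mktemp commands are available; it is not something l2h does for you. The sample file and the variable names are made up; in practice you would check your own index.tex.

```shell
# Make a small sample file containing n-tilde as UTF-8 (bytes C3 B1).
sample=$(mktemp)
printf 'ma\303\261ana\n' > "$sample"

# Converting from UTF-8 to UTF-8 changes nothing, but iconv exits
# with an error if the input is not valid UTF-8 to begin with.
if iconv -f UTF-8 -t UTF-8 "$sample" > /dev/null 2>&1; then
   utf8_check="valid UTF-8"
else
   utf8_check="not UTF-8"
fi
echo "$utf8_check"
rm -f "$sample"
```

A file saved in some other encoding, such as an old Western European one, will normally fail this check as soon as it contains an accented character.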

But knowing how to enter the UTF-8 characters into index.tex, or elsewhere, is only half the story. You want them to eventually show up in the final document. The next subsections tell you how to do that for examples 1 through 4.

1.1 Getting UTF-8 characters into the text of the web pages.

That is easy: just put them in the latex source file index.tex using your editor. Any UTF-8 characters, in any language, in the text pass through unmolested onto the web pages. There your browser should display them correctly (if it has the needed fonts).

However, this does not work inside figures, (except in their captions), mathematics, or similar. The reason is that images are made of these. Images are made by plain latex, which does not understand nonlatin UTF-8 characters, except a few. So if the nonlatin character must be inside an equation or figure or whatever, you are out of luck. The method of the next subsection will have to be used.

One exception might be worth mentioning: If the nonlatin text is a simple separate part inside a figure, you may be able to avoid having an image made of it using the "makeimage" command. For example, if you have a figure with a few lines of text in between two plots or pictures, you would use something like:

\begin{figure}
   \centering
   \begin{makeimage}
      \putpicture{FIRST_PICTURE_OR_PLOT}
   \end{makeimage}
   \\
   Nonlatin blah blah, any UTF-8 characters OK.
   \\
   \begin{makeimage}
      \putpicture{SECOND_PICTURE_OR_PLOT}
   \end{makeimage}
   \caption{More nonlatin blah blah, any UTF-8 characters OK.}
\end{figure}
In the above example, images will only be made of the parts inside the makeimage commands, not of the text in between the two makeimage environments.

1.2 Getting UTF-8 characters into the pdf as well.

If you want nonlatin characters to show up in the pdf version of the document as well, you will need to make an image of the text. You can do that with an image manipulation program like the commercial Adobe Photoshop, or its free clone the gimp. Make the text big; you would want at least 300 pixels per inch when resized to normal text size using \resizebox. Save in an appropriate format.

As an alternative way to make the image, you could load a web page with the desired text in your web browser. Magnify the text to high resolution using the "View" menu of the browser or using "Ctrl+Shift+=" (i.e. hit the "+/=" key while holding down "Ctrl" and "Shift".) Then press the "Print Screen" key. Usually that will take a picture of the entire computer screen. You can then use a variety of image manipulation programs, including the two above, but also the simpler paint, xpaint, ..., to crop the image down to the text you want.

As still another alternative, the way I understand it, gnuplot will handle UTF-8 characters if you use the "pdfcairo" terminal instead of "postscript". Create a plot with no axes, no curve, no tick marks, just a label? This might solve the resolution problem. However, I have not tried it.

Whatever method you choose, the image can be put in index.tex using the \putpicture (i.e. \epsffile or \includegraphics) command.

If you have a jpg, pdf, or gif image but you need an eps, (examples 1 and 2), put the image inside the "any_eps" subfolder inside the "convert" folder inside "l2h". Then double-click the appropriate "convert_..." icon. An eps will be made.

If you have an eps image but you also need a non-eps one, (examples 3 and 4), put the image inside the "eps_pdf" subfolder inside the "convert" folder inside "l2h". Then double-click the appropriate "convert_eps" icon. A pdf will be made.

2. Use an old style encoding

The current preferred way to enter nonlatin characters is "UTF-8" encoding. However, there are many older encoding schemes, and LaTeX supports a number of them, as does LaTeX2HTML. You might want to use one of these. For example, LaTeX2HTML supports the hebrew encoding, nowadays known as ISO-8859-8. So maybe you want to use this encoding in your documents.

Why would you want that? Maybe because you have existing documents that use this encoding. Or maybe it is easier for you to type in hebrew characters in ISO-8859-8 than in UTF-8. (If so, time to upgrade your editor?)

Note that neither of the reasons above is compelling. If you have an ISO-8859-8 document, you can put a copy in the "ISO-8859-8_UTF-8" folder inside the "convert" folder inside "l2h" and double-click the convert_tex icon. This will convert the ISO-8859-8 characters to UTF-8. Then all you would need to do is replace \usepackage[hebrew]{inputenc} by the internationalization header of the appropriate example 1 through 6.
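
If you prefer a terminal over the convert_tex icon, the standard iconv command can do the same kind of ISO-8859-8 to UTF-8 conversion. Note that this is a swapped-in alternative, not necessarily what the l2h converter itself runs. The sketch assumes a linux (or similar) terminal with iconv, mktemp, od, and tr; the temporary files and variable names are made up for the illustration.

```shell
# Hypothetical stand-ins for the old and the new document.
old=$(mktemp)
new=$(mktemp)

# Sample input: the single byte E0 (octal 340), which is the Hebrew
# letter alef in ISO-8859-8.
printf '\340' > "$old"

# Convert from ISO-8859-8 to UTF-8.
iconv -f ISO-8859-8 -t UTF-8 "$old" > "$new"

# In UTF-8, alef (unicode number 5D0) is the two bytes D7 90.
alef_hex=$(od -An -tx1 "$new" | tr -d ' \n')
echo "$alef_hex"   # prints d790
rm -f "$old" "$new"
```

For a real document, the conversion line would read something like "iconv -f ISO-8859-8 -t UTF-8 old.tex > index.tex", with old.tex a copy of your original file.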

However, if you do not want to do that, this section is for you.

First a disclaimer. The author has not actually tried anything described in this section. It is all second-hand info.

There should not be any particular problem in making a pdf of the document using l2h. All l2h does here is simply run plain latex, or pdflatex if set in \pdfengine. If [pdf]latex runs fine outside l2h, it should run fine inside.

However, in case you want to make web pages, the first thing is to check whether LaTeX2HTML supports your encoding. Look inside the "versions" folder in "l2h" to see whether the encoding is there. If it is, you will need to go into file settings.pl (in the document folder where index.tex is, at least after you have selected a theme) and change the line

$HTML_VERSION = '4.0,latin1,unicode';
Leave the 4.0, but replace latin1,unicode by, say, hebrew. See the latex2html manual for more details. But note that you will need to modify file settings.pl, not .latex2html-init as the manual says.
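
For those who like to see the change spelled out, here is a rough sketch of the same replacement done with sed. It assumes a linux (or similar) terminal; the temporary file only stands in for settings.pl, and like the rest of this section, it has not been tried on a real l2h document.

```shell
# Stand-in for settings.pl containing only the line of interest.
settings=$(mktemp)
printf '%s\n' "\$HTML_VERSION = '4.0,latin1,unicode';" > "$settings"

# Keep the 4.0, replace latin1,unicode by hebrew.
# This prints the changed line; it does not modify the file.
new_line=$(sed "s/'4\.0,latin1,unicode'/'4.0,hebrew'/" "$settings")
echo "$new_line"
rm -f "$settings"
```

Since sed only prints the result here, you would still make the actual change to settings.pl in your editor.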

Next, latex removal in the l2h menu will no longer work correctly. If you want to fix this up, do the following: Look inside the "convert" folder and find the appropriate UTF-8 conversion. For example, hebrew.pl is ISO-8859-8. So open up the "UTF-8_ISO-8859-8" folder. You need to copy the files in this folder in with your document. To do so, click the first file in the folder. Then click the last file in the folder while holding down Shift. This should select all three files in the folder. Right-click any one and select "Copy". Find your document folder, right-click an empty spot in it, and select "Paste". This will copy the files in with your document. Next right-click the just-pasted file "convert.sub" and rename it to "lremcnvbck.sub". Similarly, rename "convertb.sub" to "lremcnv.sub".

Hyphenating or finalizing your web pages will not work. However, if you take the web pages out of folder "web-pages" in your document folder, put them into "ISO-8859-8_UTF-8" inside "convert", and double-click "convert_html", they will be converted to UTF-8. You can then put them back in folder "web-pages" and hyphenate and finalize them now.

3. Use example 5 or 6

Examples 5 and 6 allow you to use any character you want, in any language. But to use these examples, you must have XeLaTeX installed, not just plain LaTeX. If you want to make web pages, you also need pdftk and pdftops. (For Microsoft Windows, l2h will install the latter two for you, but not XeLaTeX. But modern LaTeX distributions come with XeLaTeX. XeLaTeX is ready to go in TeX Live, while MiKTeX will fetch and install the needed packages when it sees they are needed.) You also need to have some font that contains the characters you want. Microsoft Windows versions more recent than Windows XP probably already have a font with the characters you want. And it is not difficult to install another font.

Examples 5 and 6 require that you type in your nonlatin characters in "UTF-8" form. If you write in a nonlatin language, you may already know how to do that. If not, see the beginning of section 1.

But knowing how to enter the UTF-8 characters into index.tex is only half the story. You want them to eventually show up in the final document. The comments in the Internationalization section in examples 5 and 6 explain how to do this. See fontspec.html for more information on the fontspec package used. (I do not think this documentation is very clear; try to find a more recent version on the web.) But the key problem is to have fonts with the characters you need, and to know the names of these fonts. This section provides some further elaboration on that issue.

3.1 General observations

For Microsoft Windows, the available Greek and Cyrillic characters in, say, font "Times New Roman" have been increasing over time. My Windows XP version missed most Greek characters in the opening verses of Homer's Iliad. However, Windows Vista was fine. On Vista and later, "Arial Unicode MS" also has large numbers of these characters, in sans-serif. For CJK (Chinese-Japanese-Korean) characters, try, say, "SimSun", which is targeted towards Chinese.

For linux, the "Times New Roman" font is apparently the Windows XP version, pretty useless. I suggest you install the "Linux Libertine O" fonts using your package manager. That seems to take care of most Greek and Cyrillic characters. For CJK (Chinese-Japanese-Korean) characters, things are not so clear to me. For what I had, "WenQuanYi Micro Hei" worked. Searching through your fonts, or installing additional fonts, as described below, may work.

3.2 Searching your fonts for the characters you need

The most convenient way to look at your fonts in Windows or linux is to use the "Character Map" program. (This program was discussed at the beginning of section 1. See there for more.) You can see the characters in each font. Or, since that may be a lot, these programs allow you to select a subset of the characters to view.

Warning: The latter also means that you might think a font does not have the needed characters while actually it does. The wrong subset may be selected.

On linux, the "fc-list" command in a terminal may also be useful.
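
As a sketch of how that might look: fc-list accepts a pattern such as ":lang=el" to list only fonts that claim to cover a given language. The small wrapper function below is made up for the illustration; it assumes that fontconfig (which provides fc-list) is installed, and says so if it is not.

```shell
# List the font families that claim to cover a given language.
# "el" is the language code for Greek; "ru" would be Russian,
# "he" Hebrew, "zh-cn" simplified Chinese.
fonts_for_lang() {
   if command -v fc-list > /dev/null 2>&1; then
      fc-list ":lang=$1" family | sort -u
   else
      echo "fc-list not found; install fontconfig first" >&2
      return 1
   fi
}

fonts_for_lang el || echo "could not list fonts"
```

The names in the output are the font names to try in examples 5 and 6. Keep in mind that a font claiming a language may still lack some rarer characters, so a look with Character Map remains a good idea.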

3.3 Installing additional fonts

If you do not have the characters you want, installing additional fonts may be the way to go.

First of course, you need to obtain the fonts to be installed. You can buy fonts. Or you can find fonts on the internet.

For example, if you need a large variety of Greek, Cyrillic, Hebrew, Arabic, Devanagari, Thai, Ethiopic, Runic, Coptic, Hiragana, Katakana, Bopomofo, etcetera, characters, noncommercial users may go to http://titus.fkidg1.uni-frankfurt.de/unicode/tituut.asp and download "TITUS Cyberbit Basic". This comes in a zip file. Double-click it to open it and drag the font out of it to your desktop. Or get the font out of the zip file some other way. Usually you cannot install it if it is inside a zip file. If you are a linux user, make sure to rename the font from TITUSCBZ.TTF to tituscbz.ttf, i.e. to lowercase.

If you want a large number of CJK (Chinese-Japanese-Korean) characters, get Cyberbit.zip from http://ftp.netscape.com/pub/communicator/extras/fonts/windows/ and unzip as above.

Linux users can of course also swipe fonts from their Windows partition, if they have one. The fonts should be in C:\Windows\Fonts. If Windows users want a linux font for some reason, they might look in /etc/fonts/, /usr/local/share/fonts, /usr/share/fonts/truetype/, or wherever. In any case, by the time you read this, the linux powers that be have no doubt decided that all that is no longer supported and that the fonts, really, really, must be in /usr/lib/local/share/etc/tools/display.d3/graphics/text/nonfree/fonts/microsoft. I have been told that Mac uses /Library/Fonts.

See http://en.wikipedia.org/wiki/List_of_CJK_fonts and http://tex.stackexchange.com/questions/53599/ for more font suggestions.

But having the font is not enough. You need to install it.

Windows 7 users simply right-click each font and select "Install".

Windows XP and Vista users need to put the font inside the Windows folder on the C: disk in "My Computer". Vista will whine, but be strong. Then click "Start", "Settings", "Control Panel", "Fonts", "File" and select "Install New Font". Double-click each font. Or "Select all" and "OK".

Linux users may want to first install "gnome-font-manager" using their package manager. Then they can use that to install the fonts. In the Graphical Installation on Ubuntu 13.04 log, there is a section "Internationalization" that shows you graphically how to do it. Basically, you start up Font Manager, click the "Manage Fonts" button, (has gears on it), and select "Install Fonts". Then browse down to the fonts on your Desktop and install.

Alternatively, linux users can install the fonts from the command line. The following works on Ubuntu 12.04 LTS. Open a terminal and issue the commands:

   cd Desktop
   sudo cp FONTNAME.ttf /usr/share/fonts/truetype/
   sudo chmod u=rw,go=r /usr/share/fonts/truetype/FONTNAME.ttf
   sudo fc-cache -fv
   fc-cache -fv
Here FONTNAME might be Cyberbit (for Bitstream Cyberbit), tituscbz (for TITUS Cyberbit Basic), etcetera. To check that the font is now installed, use
   fc-list -v > tmp.txt
Then edit tmp.txt with a text editor and search for the filename.
