2012-05-01

Proper Unicode Serbo-Croatian digraphs

Here in ex-Yugoslavia we have three letters that are almost never spelled out properly: the digraphs "DŽ", "LJ" and "NJ" (the corresponding Cyrillic letters are "Џ", "Љ" and "Њ"). In fact, the standard Yugoslav keyboard doesn't even have corresponding keys.

Though in most fonts "Ǌ" (digraph, single letter) is visually [nearly] indistinguishable from "NJ" (letters "N" and "J"), some practical concerns still apply. E.g., the length of the word "injekcija" will be counted correctly in any case, as it has two separate letters "n" and "j" (Cyr. "инјекција"), but the length of the word "njen" (Cyr. "њен") will depend on the way it is entered: if it is typed with two letters instead of the digraph, it will be reported as one letter longer and will get improperly sorted.
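The length difference is easy to reproduce. Here is a minimal Python sketch; the variable names are mine, and the escape sequences are used only to make the code points explicit:

```python
import unicodedata

two_letters = "njen"                # "n" + "j" + "e" + "n"
digraph = "\u01ccen"                # U+01CC (single-letter "nj") + "e" + "n"
cyrillic = "\u045a\u0435\u043d"     # "њен"

print(len(two_letters))  # 4
print(len(digraph))      # 3
print(len(cyrillic))     # 3

# Compatibility normalization folds the digraph back into two letters,
# so text that has passed through NFKC loses the distinction again.
folded = unicodedata.normalize("NFKC", digraph)
print(folded == two_letters)  # True
```

Only the digraph spelling gives the same letter count as the Cyrillic one.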

With the introduction of Unicode the digraphs received their own range, U+01C4–U+01CC, so in theory we got a chance to fix the problem. Still, until the yet-to-be-released Windows 8, Microsoft's operating systems simply didn't render the single-letter Unicode versions of the digraphs, so using them in the wild was problematic.
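To see what exactly lives in that range, the standard Python unicodedata module can list the nine code points with their official names (a small sketch; the `rows` list is just a helper of mine):

```python
import unicodedata

# Collect (code point, official Unicode name) for U+01C4..U+01CC.
rows = [(cp, unicodedata.name(chr(cp))) for cp in range(0x01C4, 0x01CC + 1)]

for cp, name in rows:
    print(f"U+{cp:04X}  {name}")
```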

The situation was quite a bit better in X11 environments: the "Unicode" family of XKB keymaps for the Serbo-Croatian language [1] maps these digraphs to the keys where their Cyrillic counterparts can be found in the standard layout. Still, this approach isn't very useful, as it abandons one of the design goals of the Yugoslav keyboard layout standard: the ability to enter text in virtually any language. The digraphs replace the letters "Q", "W" and "X", which are used quite extensively in English and many other languages.

As I also need to be able to write in English, French, German and Russian, I had three options:

  1. use two separate Latin layouts (one for digraphs, one for "Q", "W" and "X");
  2. create a custom XKB layout with some kind of magic;
  3. use the X11 Compose mechanism to create the key mappings.

The first option was rejected from the very beginning: as I already have a Cyrillic layout for typing in Russian (no native Latin alphabet to date, unfortunately), switching between three layouts would certainly create more confusion than the saved keystrokes are worth. The second option, while generally appealing (I already had to create a layout for typing Russian on a Yugoslav keyboard without pain), had a significant flaw: unlike the rest of the Latin letters, the digraphs have 3 (yes, three) cases: lower, upper and title ("njen", "NJEN" and "Njen" respectively), so the letters can't simply be mapped to the third level of "L", "N" and "D".
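The three cases are real code points rather than a typographic convention, which the following Python sketch illustrates (variable names are mine; `str.title()` is used because it applies the Unicode titlecase mapping to the first letter of a word):

```python
import unicodedata

word = "\u01ccen"        # "njen" typed with the single-letter digraph

upper = word.upper()     # "NJEN": first letter becomes U+01CA
title = word.title()     # "Njen": first letter becomes U+01CB

for form in (word, upper, title):
    first = form[0]
    print(f"U+{ord(first):04X}", unicodedata.category(first))
# U+01CC Ll
# U+01CA Lu
# U+01CB Lt
```

The "Lt" (titlecase letter) category is what a two-state Shift level cannot express.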

So I was left with the only remaining option, defining my custom ~/.XCompose list:

<Multi_key> <D> <Zcaron> : U01c4
<Multi_key> <D> <zcaron> : U01c5
<Multi_key> <d> <zcaron> : U01c6
<Multi_key> <L> <J>      : U01c7
<Multi_key> <L> <j>      : U01c8
<Multi_key> <l> <j>      : U01c9
<Multi_key> <N> <J>      : U01ca
<Multi_key> <N> <j>      : U01cb
<Multi_key> <n> <j>      : U01cc
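As a sanity check, the list can be run through a few lines of Python to confirm that each sequence lands on the intended letter. This is only a sketch: the `parse` helper and its regular expression are my own and handle just the simple "keys : Uxxxx" form used here, not the full Compose file syntax.

```python
import re
import unicodedata

COMPOSE_LINES = """\
<Multi_key> <D> <Zcaron> : U01c4
<Multi_key> <D> <zcaron> : U01c5
<Multi_key> <d> <zcaron> : U01c6
<Multi_key> <L> <J>      : U01c7
<Multi_key> <L> <j>      : U01c8
<Multi_key> <l> <j>      : U01c9
<Multi_key> <N> <J>      : U01ca
<Multi_key> <N> <j>      : U01cb
<Multi_key> <n> <j>      : U01cc
"""

# Hypothetical helper: matches "<key> <key> ... : Uxxxx" and nothing else.
ENTRY = re.compile(r"^((?:<\w+>\s*)+):\s*U([0-9a-fA-F]{4,6})$")

def parse(line):
    """Return (list of keysym names, resulting character) for one entry."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"unsupported Compose line: {line!r}")
    keys = re.findall(r"<(\w+)>", match.group(1))
    return keys, chr(int(match.group(2), 16))

for line in COMPOSE_LINES.splitlines():
    keys, char = parse(line)
    # Skip the leading Multi_key when printing the sequence.
    print(" ".join(keys[1:]), "->", unicodedata.name(char))
```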

Given that this solution let me avoid relearning my Serbo-Croatian typing skills (I just have to press the Compose [2] key once before entering a digraph), I would say it is nearly perfect. The only caveat is the need to amend the ~/.xsession script with export GTK_IM_MODULE=xim (a similar variable exists for Qt, which I don't have).

Similarly, support for other weird Latin letters can be added if needed, simply by amending this file.

P.S.: I'm still somewhat uncomfortable with two features of the standard Yugoslav keyboard. Pressing 3 keys to emit the "^" character drives me nuts. And though I don't use the "Ł" and "ł" letters, having them on different keys ("K" and "L" respectively) also seems insane. Probably it's time to make a Dvorak-like layout for Serbo-Croatian...


[1] The Special Olympics naming conventions for the Serbo-Croatian language include "Bosnian", "Croatian", "Montenegrin" or "Serbian".

[2] On my ThinkPad Edge E325 there is a PrtSc key next to AltGr; as I don't take screenshots regularly, using the "compose:prsc" XKB option was a natural choice.