Tag Archives: bengali-l10n

Ra-Jhaphala in Qt Applications

While writing text in many Indian languages, we encounter composite characters made up of various combinations of more than one consonant and/or dependent vowel sign. Generally, these are written as:

1. Consonant + Joiner (the virama, or hasanta, which is U+09CD in Bengali) + Consonant (+ Dependent Vowel Sign)
2. Consonant + Dependent Vowel Sign(s), which determine the vowel sound used to pronounce the consonant

However, there are exceptions where a straightforward implementation of the writing rules cannot be used for text input in an internationalized (i18n) application. An example is the curious case of two letters: র (aka RA, Unicode: U+09B0) and য (aka YA, Unicode: U+09AF). These two consonants, written in the same order, can form two different composite characters[1].

Sequence 1:

র্য = To write words like আর্য (pronounced ‘Ar-j-ya’; the ‘j’ reflects a pronunciation exception practised in Bengali)

Sequence 2:

র‍্য = To write words like র‍্যান্ডম (i.e. transliterated version of the word ‘random’ that is pronounced as ‘rya-n-dom’ and hence has to be transliterated appropriately)

In both cases, র and য combine in the same order, so the simple method of writing them as র + joiner + য cannot distinguish between the two. Because it occurs more frequently in Bengali words, this plain sequence has been assigned to Sequence 1. For Sequence 2, an additional character, ZWNJ (U+200C), had to be used. However, this changed with Unicode 5.0: ZWJ (U+200D) is now to be used instead of ZWNJ to write Sequence 2.

“…Unicode Standard adopts the convention of placing the character U+200D ZWJ immediately after the ra to obtain the ra-yaphaala…”

– from the Unicode 5.0 book, pg. 316 (afaik the online version is not available)
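To make the two spellings concrete, here is a small Python sketch that builds both sequences from code points and prints them. It uses only the standard `unicodedata` module; the constant names are made up for illustration, and the ra-yaphala sequence follows the Unicode 5.0 convention quoted above (ZWJ immediately after ra), not the older ZWNJ-based one.

```python
import unicodedata

# Hypothetical constant names; the code points are the ones discussed in the text.
RA = "\u09B0"      # BENGALI LETTER RA
YA = "\u09AF"      # BENGALI LETTER YA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (hasanta), the "joiner" of the writing rules
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Sequence 1 (reph + ya-phala, as in the word for 'Arya'): plain RA + VIRAMA + YA.
REPH_YAPHALA = RA + VIRAMA + YA

# Sequence 2 (ra-yaphala, as in the transliterated 'random'):
# ZWJ placed immediately after RA, per the Unicode 5.0 convention.
RA_YAPHALA = RA + ZWJ + VIRAMA + YA


def code_points(text: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    for label, seq in (("reph + ya-phala", REPH_YAPHALA), ("ra-yaphala", RA_YAPHALA)):
        print(f"{label}: {code_points(seq)}")
        for ch in seq:
            print(f"    U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Running it prints `reph + ya-phala: U+09B0 U+09CD U+09AF` and `ra-yaphala: U+09B0 U+200D U+09CD U+09AF`, each followed by the character names. The only difference between the two strings is the invisible U+200D, which is why renderers (and any comparison or search code) have to treat it as significant.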

The next challenge was to ensure that this sequence was rendered correctly in documents. While it was displayed correctly by Pango, ICU and Uniscribe, rendering in Qt was badly broken [bug links: KDE Bugs, Qt, Fedora/Red Hat Bugzilla]. After prolonged deliberation, Pravin managed to push a patch fixing this issue into Harfbuzz, which will also make its way into Qt. This fixes the rendering issue.

The review discussion for this patch (which is also expected to resolve a few other issues) is happening here. However, the delay in updating the badly outdated entry in the Unicode FAQ has led to a lot of confusion about whether U+200C had indeed been discontinued in favour of U+200D. This needs prompt action from whoever maintains that FAQ. (Sayamindu had also mentioned this on his blog earlier.)

[1] The same two consonants can be used to write two different composite characters, distinguished only by the sequence of code points used to input them.

The other major issue under discussion in the same review is whether multiple split dependent vowel signs may be input as an alternative to a single valid dependent vowel sign.

E.g. ক (U+0995) + ে (U+09C7) + া (U+09BE) to be allowed as an alternative input sequence for ক (U+0995) + ো (U+09CB)

The Devanagari equivalent would be:

क (U+0915) + े (U+0947) + ा (U+093E) to be allowed as an alternative input sequence for क (U+0915) + ो (U+094B)

In general practice, a dependent vowel sign written after a consonant completes the composite character, and multiple dependent vowels are not allowed for a single consonant. While the pictorial representation in the above example may look the same, the split vowel sequence may lead to incorrect rendering across applications (and, in future, in URLs as well) if the code points are stored as such. In applications using Pango, the second vowel sign is displayed to the user as an unattached vowel sign on a dotted circle, which automatically warns the user about an invalid input sequence.
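For what it is worth, Unicode normalization already treats the two scripts differently here: in the Unicode Character Database, BENGALI VOWEL SIGN O (U+09CB) has a canonical decomposition into U+09C7 + U+09BE, whereas DEVANAGARI VOWEL SIGN O (U+094B) has none. The Python sketch below (standard library only) shows what NFC does to both split sequences. It says nothing about what Qt or Uniscribe actually do internally; it only illustrates one standard way such input could be converted before storage.

```python
import unicodedata

# Split input vs. single vowel sign, as in the examples above.
PAIRS = {
    "Bengali": ("\u0995\u09C7\u09BE", "\u0995\u09CB"),     # KA + E + AA vs KA + O
    "Devanagari": ("\u0915\u0947\u093E", "\u0915\u094B"),  # KA + E + AA vs KA + O
}


def code_points(text: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def nfc_merges(split: str, single: str) -> bool:
    """True if NFC normalization turns the split sequence into the single sign."""
    return unicodedata.normalize("NFC", split) == single


if __name__ == "__main__":
    for script, (split, single) in PAIRS.items():
        nfc = unicodedata.normalize("NFC", split)
        print(f"{script}: {code_points(split)} -> NFC {code_points(nfc)}; "
              f"same as single sign: {nfc_merges(split, single)}")
```

This reports True for Bengali and False for Devanagari: storing NFC text would collapse the Bengali split form into U+09CB, but would leave the Devanagari input as two separate vowel signs.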

Since Qt (and, it seems, Uniscribe too) follows this practice, perhaps a specification exists somewhere that describes how such input sequences are converted and stored. Any pointers would be very helpful. At present I am keeping an eye on the review discussion, and hopefully the issues will be resolved so that a uniform standard persists across all platforms.

Compiz Translation Bugs

While searching for some related material, I came across the Bengali India (bn_IN) translation for compiz. A quick look through the file turned up some bits and pieces which, imho, may need another look from the translator/reviewer.

Some examples here:

en: “Stick”
bn_IN: “চেটে যাক”
retranslated en: “Let it lick”

en: “Unstick”
bn_IN: “না চাটবে না”
retranslated en: “No, it won’t be licked”

en: “Make Above”
bn_IN: “উপরে আনোন”
retranslated en: “Bring above”

en: “Forcing this application to quit will cause you to lose any unsaved changes.”
bn_IN: “এপ্লিকেছন টি যোরকরে বন্ধ করতে হবে আর আপনার সকল unsaved হারাটে হবে |”
retranslated en: “The application must be forcibly closed and you will have to lose all your unsaved”

etc. etc.

Most of the strings in these files are general terms related to desktop graphics, and can be easily fixed. So if the compiz translators need a hand, then do feel free to holler around here.

The Short(cut) story

The following is an email I wrote a long time ago, justifying the use of English shortcut keys in localized Bengali applications.

Hi everyone,

Deepayan Sarkar wrote:

I’m slightly confused about this. There are two types of shortcuts,
one in menu items etc (indicated by _ or & in the translated strings),
and one like CTRL-Q to quit an application. Which ones are we talking
about? The first type are activated by pressing ALT. Where does the
CTRL key come in? If we are talking about the second, I didn’t even
know that they could be translated. Can they? How?

First, to clarify matters: this issue concerns the alt+hotkey combinations. As far as I am aware, the ctrl+key combinations cannot be translated; at least, I have never come across it. Hence, for saving a file, alt+f+s is different from ctrl+s.

Currently, I use a system with a Bengali locale and interface as my primary production system. The input method I use is IIIMF, which is an application-specific input method switcher. The same goes for SCIM, i.e. neither of these changes the keyboard for the entire system. Earlier I used setxkbmap, which is a system-level keyboard switcher, and it looks like most people in the thread are familiar with this method.

Here is what I have observed about alt+hotkeys during my regular work [along with details]:

#1. Dysfunctional

I would not call the hotkeys non-functional, but rather dysfunctional, because:


@ IIIMF and SCIM

-> en shortcuts work as alt+en key even when the active keyboard is a Bengali keyboard [I use Probhat].

-> bn shortcuts like alt+bn key do not work, even when the active keyboard is a Bengali keyboard.

@ setxkbmap

-> bn shortcuts work as alt+bn key or alt+shift+bn key [in case of a character like ফ] with an active bn keyboard.

-> en shortcuts work as alt+key with an active en keyboard

Additional info: ctrl+s type shortcuts [which cannot be translated] did not work with an active setxkbmap bn keyboard, but did work with an active bn keyboard on IIIMF and SCIM.

#2. Inconsistent

Hotkeys are duplicated among top-level menu items as well as among submenu items under the same top-level menu item. E.g. in gnome-games mines, বৈশিষ্ট্য (Settings) -> পূর্ণপর্দা (Fullscreen) and বৈশিষ্ট্য (Settings) -> পছন্দ (Preferences) both have the hotkey alt+প.

#3. Partial Implementation

There are two offshoots of this one.

@ gtk overrides [for gnome]:

This is specific to the GNOME desktop. As golum was somewhat confused about it, let me explain in detail. Currently, in the gedit.po file, Cut (_C) is translated as কাট করুন (_C), whereas in the gtk+ file it is translated as কাট(_ট)। However, when the menu items for gedit are populated, in some cases the translation from the gtk+.po file is used instead of the one from the gedit.po file [http://runa.randomink.org/AnkurBangla/gedit1.png]. When I use a .mo file compiled from the original .po file, the menu looks like this image: [http://runa.randomink.org/AnkurBangla/gedit2.png]

@ application related:

This concerns applications that do not have a text entry field on their primary interface. Again, take gnome-games mines [or, alternatively, same game] as the example: this application uses alt+bn key hotkeys.

** IIIMF and SCIM: the bn keyboard cannot be activated for this particular application, because there is no text entry box on the primary interface. Hence alt+bn hotkeys do not work, while alt+en hotkeys do. A text entry box appears only when the user gets to enter his/her name for the score.

** setxkbmap: the alt+bn hotkey combination works on the main interface, as setxkbmap sets the system-level keyboard to bn.


The reasons for the above-mentioned behaviour are unknown to me, and I can only comment on them as observations from a user's perspective. Geeky or not, one cannot assume what a user requires. Some time back, while doing an installation on a test system, I had to resort to the hotkeys due to a malfunctioning mouse. At that point, the shortcuts in anaconda did not work with the en keyboard used during installation. And unlike the gedit solution I mentioned earlier, hacking on installers is not really an option. This issue has since been resolved; I mention it only to highlight that user requirements can vary and at times arise from unexpected circumstances.

Secondly, there is the issue of consistency between KDE and GNOME. Context aside, I guess issues for the two desktops ought to be dealt with separately. Yet known issues in GNOME can be used as a reference for similar issues arising in KDE, and vice versa. kcontrol would be an apt example here, since multiple backend files are used and consistency is a key element, similar to the gedit+gtk scenario.

Given that we have observed such mixed results, it might be a good idea to go behind the scenes and figure out where exactly things are going wrong. Whether using en shortcuts is a step backward is unclear as of now. Currently, the bn shortcuts are comparatively more dysfunctional and inconsistent. If we want to implement bn hotkeys successfully, we first need to do our homework and fix the inconsistencies. Secondly, given that most distros ship IIIMF and SCIM as the default input method frameworks for localized versions, can we afford to promote a Bengali desktop that shows stark flaws on the primary desktop interface? [refer #1]. To conclude, imho, it is always better to provide a functional interface that is open to change and improvement in the future than to restrict usage in the present.
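The duplicate hotkeys described under #2 are the kind of inconsistency that can be caught mechanically before a translation ships. The sketch below is a hypothetical helper, not part of GTK or gnome-games: it assumes GTK-style labels in which `_` marks the mnemonic character and `__` stands for a literal underscore, and the Bengali labels in the example are illustrative rather than copied from the actual mines .po file.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def mnemonic(label: str) -> Optional[str]:
    """Return the mnemonic character of a GTK-style label, or None.

    The character right after the first single '_' is the mnemonic;
    '__' is treated as an escaped, literal underscore.
    """
    i = 0
    while i < len(label) - 1:
        if label[i] == "_":
            if label[i + 1] == "_":
                i += 2  # skip the escaped underscore
                continue
            return label[i + 1].casefold()
        i += 1
    return None


def duplicate_mnemonics(labels: Iterable[str]) -> Dict[str, List[str]]:
    """Group the labels of one menu by mnemonic and keep only the clashes."""
    by_key: Dict[str, List[str]] = defaultdict(list)
    for label in labels:
        key = mnemonic(label)
        if key is not None:
            by_key[key].append(label)
    return {key: group for key, group in by_key.items() if len(group) > 1}


if __name__ == "__main__":
    # Hypothetical Bengali labels for two items under the same menu.
    settings_menu = ["_পূর্ণপর্দা", "_পছন্দ"]
    print(duplicate_mnemonics(settings_menu))
```

For these two labels the helper reports one clash on প (U+09AA). It compares single code points, so a mnemonic placed inside a conjunct would need more careful handling than this sketch attempts.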


Eight annas…

Looks like the Unicode Chart for Bengali does not contain the symbol for Bengali 8 annas (half a rupee).

However, the following three symbols in this series for currency calculation are available:

4 annas (a quarter of a rupee) = U+09F7 (described as “BENGALI CURRENCY NUMERATOR FOUR”)
12 annas (three quarters of a rupee) = U+09F8 (described as “BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR”)
16 annas (one rupee) = U+09F9 (described as “BENGALI CURRENCY DENOMINATOR SIXTEEN”)

Could someone please confirm whether this is indeed the case?
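One quick way to check this locally is to scan the Bengali block (U+0980 to U+09FF) in the Unicode Character Database that ships with Python. The sketch below lists every character whose name mentions CURRENCY or RUPEE. The numeric values it prints depend on the Unicode version bundled with your Python, so treat them as informational only.

```python
import unicodedata

BENGALI_BLOCK = range(0x0980, 0x0A00)  # U+0980..U+09FF


def currency_chars():
    """Yield (code point, character, name) for currency-related Bengali characters."""
    for cp in BENGALI_BLOCK:
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # unassigned code points have no name
        if "CURRENCY" in name or "RUPEE" in name:
            yield cp, ch, name


if __name__ == "__main__":
    for cp, ch, name in currency_chars():
        value = unicodedata.numeric(ch, None)
        print(f"U+{cp:04X} {ch} {name} (numeric value: {value})")
```

The printed names can be compared directly with the three listed above.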

Happens.. but I want to focus ahead

Much has happened over the past few days that distressed quite a few of us. The West Bengal government announced a Bengali Linux distro called Baishakhi Linux. That's the news. The story behind the news has already been blogged by Sankarshan[1][2] and Sayamindu[1][2]. As of now, the source code is still not available. As I had already tweeted, it hurt us all, but it did not surprise me.

Moving along… some time back I blogged about the glossaries for Bengali localization that I was working on. That work has now moved to a more permanent place: ankur.org.in/wiki/WordCollections. Content is still being added. From our 6-7 years of experience translating user interface messages, one of the primary requirements that emerges for such efforts is an understanding of the contextual importance of terminology. The current focus of this effort rests on the following criteria:

  • Identification of context specific terminology
  • Application genre specific (as against specific applications) content
  • Reusability of terminology across multiple projects
  • Modular segments for ease of extension and distribution

At this point, we (more hands are always welcome) are classifying the available (English) terms into appropriate sections and mapping translations. Along with Translation-filter (Kushal r0cks!!), we intend to ensure complete standardization of the bn_IN localized content.
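As a rough illustration of context-specific, reusable terminology, a glossary can be modelled as entries keyed by source term and context, and translated strings can then be checked against it. This is a hypothetical sketch: it is not how Translation-filter works, the context names are invented, and the two Bengali entries are sample data rather than entries from the WordCollections wiki.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GlossaryEntry:
    term: str         # English source term (lowercase)
    context: str      # hypothetical genre/section label, e.g. "file-management"
    translation: str  # approved bn_IN rendering for that context


# Sample data for illustration only.
GLOSSARY = [
    GlossaryEntry("file", "file-management", "ফাইল"),
    GlossaryEntry("save", "file-management", "সংরক্ষণ"),
]


def missing_terms(msgid: str, msgstr: str, context: str,
                  glossary: Iterable[GlossaryEntry] = GLOSSARY) -> List[str]:
    """List glossary terms used in msgid whose approved translation
    does not appear in msgstr, considering only entries for `context`."""
    words = set(re.findall(r"[a-z]+", msgid.lower()))
    return [e.term for e in glossary
            if e.context == context and e.term in words
            and e.translation not in msgstr]


if __name__ == "__main__":
    print(missing_terms("Save the file", "ফাইলটি সংরক্ষণ করুন", "file-management"))
    print(missing_terms("Save the file", "ফাইলটি সেভ করুন", "file-management"))
```

The first call finds nothing missing; the second flags "save", because the approved term is absent from the translation. Keeping entries scoped by context is what lets the same English term map to different approved translations in different application genres. The substring test is deliberately naive and would need refinement for inflected Bengali forms.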

Motivation is not always the keyword

A recent article published in a leading daily here in India touched quite a few raw nerves. Besides mentioning that Gujarati and Punjabi localized versions are available in the latest version of Firefox, the article rather prominently highlighted the absence of quite a few other languages, attributing it to “a lack of motivation”. Needless to say, there was quite a flutter amongst the volunteer-driven localization communities, who were quick to exchange notes on various mailing lists. So much so that Chris Hofmann had to come forward with his apologies. Given all the hullaballoo, I too wanted to add my 2 paisa to the episode, in my own way.

First up, it's important to understand how the Firefox localization process works. It is rather different from most other localization projects and can pose quite a challenge even for old hands. I shall try to summarize it as best I can.

During Firefox localization, localized components surface in two variants:

  • A Language Pack: This is essentially the translation of the user interface messages and can be downloaded as an add-on. In essence, it is like an additional appendage to the Firefox version that one is running.
  • An Official Build: This consists of the translations plus extra components (like the translations of the various default pages displayed by Firefox), shipped as part and parcel of a Firefox release. I.e. (to extend the earlier analogy) it is like an arm that has been part of the body since birth.

Each language shipped with Firefox goes from the “Language Pack” stage to the “Official Build” stage in a phased manner. Unlike other translation projects (e.g. GNOME, Fedora), a new language is not included in the official development version right away. Rather, one first has to work on the previously released stable version (so for Firefox 3, one needs to get the Firefox 2 source), complete all the translations and other tidbits, and only then is the language accepted as official, or “productized” as it's called.

This is where most of the fun starts. The tidbits include quite a few default pages for Firefox (the complete list is here). The pages are in English and serve as templates for the localized versions. Yet, while translating, one has to ensure that the local flavour is maintained. For example, take the Firefox First-Run page. Notice the links to “Hype Machine” and “Yelp”. Now try figuring out the Indian equivalent for each of them. Yelp could be mapped to burrp.com. But after hunting all over the place for something similar to Hype Machine, it was finally decided to go with an internet radio station instead. Enter RadioVeRVe. The same goes for search plugins etc. These bits and pieces of productization are tracked and submitted through various bugs, which are handled by different Mozilla developers. I have been working on the Bengali India productization for quite some time now (bug id: 398992). As mentioned earlier, it's a Firefox 2 productization bug, and there has been quite a bit of back-and-forth on it. There is a bug for Firefox 3 as well (bug id: 415575), but it will only come into effect once the Firefox 2 productization is complete. So it might be a while before all the nitty-gritty is worked out and the issues are finally ironed out. Since Rajesh and I have been working closely with each other, it is quite understandable why the newspaper article came as a mighty blow to him.

However, what I do find disturbing is the nonchalance of the mainstream media in making such blanket comments. A little knowledge is always a dangerous thing, especially in matters like community-driven projects, which to this day remain an unfamiliar or at best hazy idea for most people watching from outside the perimeter of the action. Perhaps it would have helped if the commentator's credentials had been assessed before the interview, so that the message could have been framed in a form the audience could comprehend. Damage control is not always a way out, and neither are single points of coordination and failure.

The Indian community of localization volunteers is probably one of the most closely knit groups of people. With a common culture and geographic proximity bonding us at a personal level, friendships have been forged, experiences shared, and help is always at hand. More than anything, it's personal for us. Very, very personal, and it's about time people understood that.

The complete process of making an official Firefox build.

Dhvani needs more love – in Bengali

Santhosh has been looking for volunteers to help him out with Dhvani in Bengali. You need to be fairly skilled at programming in C, a native Bengali speaker (with reading, writing and comprehension skills, and an understanding of correct pronunciation) and, of course, excited enough to work on an award-winning project. Catch hold of Santhosh and let him know if you would like to participate in the project.