Tag Archives: bengali-l10n

Translation Sprint for Gaia

Last Saturday, 29th December 2012, we had a translation sprint for Ankur India with a specific focus on Gaia localization. Over the last few weeks several volunteers had introduced themselves, wanting to take part in translation and localization. Firefox OS seemed to be a popular project with them, and one that was also easy to translate. However, the reigning confusion over the tool of choice was not easy to work around. The new translators were given links to the files they could translate and then send over to the mailing list or a mentor for review. Going back and forth in the review process was taking time, so we quickly settled, on the mailing list, on a date for a translation sprint.

We used the IRC channel #ankur.org.in and gathered there from 11 in the morning to 4 in the evening. The first hour was spent setting up the repository and deciding how we were going to divide the tasks among ourselves. Two of us had commit rights on the Mozilla Mercurial repository. Of the 5 translators, two were very new to translation work, so it was essential to help them with constant reviews. By the end of the second hour we were crunching strings fast and hard: translators were announcing which modules they were picking up (after some initial overlooking of this, probably due to all the excitement), and the files were then being pushed into the Mercurial repository.

We shut shop at closing time, but had a clear process in place that allowed people to continue their work and carry on the communication over email. All it needed was an IRC channel and a fundamental understanding of the content translation and delivery cycle.

SUMMARY

Participants:

  • Biraj Karmakar (biraj),
  • Priyanka Nag (priyanka_nag),
  • Runa Bhattacharjee (arrbee/runa_b)
  • Samrat Bhattacharya (samratb),
  • Sayak Sarkar (sayak)

Translation Statistics:

  • At Mozilla Dashboard – 39% translated. (Does not include the files still being reviewed)

What worked:

  • Communication was live
  • Faster turnaround of translation -> reviews -> revision
  • Queries were resolved faster
  • Commits were immediately made into the repository
  • Workflow was established to ensure the committers were being notified of files ready to go into the repository
  • No overlapping of translation

What could have been nicer:

  • A simpler tool to track the translation, through *one* interface. (Discussed many times earlier, and comments can be directed to the earlier post)
  • Pre-decided work assignments to start things off (this was rather hastily put up)
  • More time

Follow up:

  • There is still more to do and the translation has to continue. Not just for Gaia, but for other projects as well.
  • A review session for all the translated content. Besides catching errors and omissions of various nature, this can be of particular benefit to the new translators who can gauge the onscreen context of the content that they had to blindly translate

“The Sun Goes Around The Earth”

“THE SUN GOES AROUND THE EARTH”

Anyone who grew up in the city of Kolkata in the 1980s and 90s would be familiar with the above graffiti, painted on innumerable walls and lampposts. The graffiti, and the adamant proponent of this theory, are a legend that a generation will remember.

I was reminded of this, by a rather unfortunate turn of events that happened, on a mailing list of much repute. Just this morning, I was speaking with a colleague about how often and unknowingly we are drawn into stressful situations which make us lose focus from the task at hand. After having responded to a mail thread now crossing the 80+ mark, I wanted to step back, summarize and review this entire situation.

It all started when someone, who by his own admission is not a native speaker of the Bangla/Bengali language, wanted to transcribe Sanskrit shlokas (hymns) written in the Bangla script into a digital format and requested modifications to an in-use keymap. To what final end is, however, unclear. This is not an unusual practice, as there are numerous Sanskrit books and texts that have been written in the Bengali script, and this effort can be seen as a natural progression towards digitizing texts of this nature. What stands out is the unusual demand for the addition of a certain character, which is not part of the Bengali script, to a Bengali keymap (much in use) that this gentleman wanted to use for the transcription. The situation is complicated further because this character is not a random one: it belongs to the Assamese script.

The character in question is the Assamese character RA, written as ৰ, with the Unicode code point U+09F0. It is part of the Unicode chart for the Bengali script, which is used to write Bengali, Assamese, and Manipuri (although Meitei is now the primary script for Manipuri). Although now used exclusively for Assamese, this character does have a historical connection with the Bengali script: ৰ was also used as the Bengali character RA before the modern form র (Unicode code point U+09B0) came into practice. At exactly what point in time this change happened is somewhat unclear to me, but references to both forms can be found as early as 1778, when Nathaniel Brassey Halhed published A Grammar of the Bengali Language. Dr. Fiona Ross’ extensively researched The Printed Bengali Character: Its Evolution contains excerpts from texts where the ancient form of র, i.e. ৰ, has been used. However, this is not the main area of concern.

Given its pan-Indian nature, Sanskrit has been written in numerous regional scripts. I remember that Sanskrit was a mandatory third language of study while I was at school, and the prescribed book for the syllabus used the Devanagari script. On the other hand, the Sanskrit books that I saw at home were in the Bengali script (some of my ancestors, including my maternal grandfather, were priests and Sanskrit teachers who had their own tol). Anyway, I digress here. The main concern is around the two characters ‘BA’ and ‘VA’. In Devanagari, ‘BA’ i.e. ब and ‘VA’ i.e. व are two very distinct characters with distinct pronunciations. While ‘BA’ is used for words that need a pronunciation such as बालक (phonetic: baa-lak), ‘VA’ is used for words such as विद्या (phonetic: weedh-ya). In Bengali, these two variations are respectively known as ‘Borgiyo BA’ and ‘Antastya BA’. However, unlike Devanagari, they do not have separate characters, so both of them are represented by ব (U+09AC in the Unicode chart). Earlier they held two different positions in the alphabet chart, but even that has been relinquished. The pronunciation varies as per the word, a practice not dissimilar to the behaviour of the letters ‘C’ and ‘T’ in English.

This is where it starts getting muddled. The gentleman in question requested that the Devanagari separation of BA and VA be represented in Bengali as well. The reason stated was that the appropriate pronunciation of the Sanskrit words was not possible without this distinction. So as a “solution” he suggested the use of the Assamese RA glyph in place of the Borgiyo BA sounds, with the Bengali BA to be reserved exclusively for the lesser used Antastya BA, i.e. VA, sounds. Depicted below as a diagram for ease of reference.

On what legacy this link is to be established, or how the pronunciations of the two characters have been determined, meets a dead end in the historical references of the Bengali script[1].

To support his claims he also produces a set of documents[1][2] which proudly announces itself as the “New Bengali character set” (নূতন বর্ণপরিচয়/Nutan Barnaparichay) at the top of the pages. The New Bengali character set seems quite clandestine and no record of it is present in the publications from the Paschimbanga Bangla Academy, Bangla Academy Dhaka or any of the other organisations that are considered as significant contributors for the development and regulation of the language. Along with the New character set, there are also scanned images from books where the use of this character variation can be seen. However the antecedents of these books have not been clearly identified. In one of them, the same word (বজ্র) has been spelt differently in two sentences, which imho adds more confusion to the melee.

On my part, I have also collected some excerpts from Sanskrit content written in Bengali, with particular emphasis on the use of ব. Among them is one from the almanacs (ponjika) which are widely popular amongst householders and priests for everyday reference of religious shlokas and hymns.

The character in the eye of the storm, i.e. the Assamese RA, and its Bengali counterpart are very special characters. Each of them forms two different conjuncts with the ‘YA’ (U+09AF, which is shared by both the scripts) without changing the sequence of the characters:

র + য = র্য
র + য = র‍্য (uses ZWJ)

ৰ + য = ৰ্য
ৰ + য = ৰ‍্য (uses ZWJ)
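The four sequences above can be spelled out at the code point level. The following is a minimal Python sketch using only the standard `unicodedata` module; the helper names `describe` and `conjuncts` are made up for illustration, and the sketch only shows how the sequences are stored, not how any rendering engine shapes them.

```python
import unicodedata

RA_BENGALI = "\u09b0"   # র
RA_ASSAMESE = "\u09f0"  # ৰ
YA = "\u09af"           # য
VIRAMA = "\u09cd"       # ্ (the joiner between consonants)
ZWJ = "\u200d"          # zero width joiner


def describe(text):
    """Return each code point of text as a 'U+XXXX NAME' string."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def conjuncts(ra):
    """Build the two RA + YA conjunct sequences for a given RA character."""
    default_form = ra + VIRAMA + YA     # first form listed above
    zwj_form = ra + ZWJ + VIRAMA + YA   # second form, with ZWJ after RA
    return default_form, zwj_form


for ra in (RA_BENGALI, RA_ASSAMESE):
    for seq in conjuncts(ra):
        print(seq, describe(seq))
```

The visible order of র and য is identical in both forms; only the invisible ZWJ after the RA tells them apart.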

The Bengali character set as we know it today was created by Ishwar Chandra Bidyasagar, in the form of the বর্ণপরিচয়/Barnaparichay written by him. Well before that, the script had already seen modern advancements, mostly to cater to the requirements of the printing industry. His reforms added a finality to this. The বর্ণপরিচয়/Barnaparichay still remains the first book that Bengali children read while learning the alphabet. This legacy is the bedrock of the printed character and, coupled with grammar rules, has defined how Bengali is written and used for the last 160 years. The major reform that happened after his time was the removal of the character ঌ (U+098C) from everyday use. Other than this, the script has remained unchanged. In such a situation, a New Barnaparichay with no antecedents and no endorsements from the governing organisations cannot shake the solid foundations of the language. The way the language is practised allows for some amount of liberty, mostly in terms of spellings and mainly due to the legacy and origins of the words. Some organisations or publication houses prefer to use the conservative spellings while others recommend reforms for ease of use. The resulting inconsistencies cannot be avoided, but in most cases the system of use is documented for the reader’s reference. Bengali as a language has seen a turbulent legacy. An entire nation was created from a revolution centered around the language.

During this entire fiasco the inputs from the Bengali-speaking crowd (me included) were astutely questioned. Besides the outright violation of the Bengali script, the complications arising out of non-standard internationalized implementations, which had been highlighted, were waved off. What is more disappointing is the way the representatives from IndLinux handled the situation. As one of the pioneering organisations in the field of Indic localization, they have guided the rest of the Indic localization groups in later years. By suggesting that the above requests be implemented in the Private Use Area of the fonts (which may be a risky proposition if the final content, font and keymap are widely distributed) and by providing customized keymaps, they essentially risked undoing critical implementational aspects of Bengali and Assamese internationalization. Whether or not the claims from the original requestor are validated and sorted out, I am personally very concerned about the advice that was meted out (and may also have been implemented) by refuting the judgement of the Bengali localization teams without adequate vetting.

Note: A similar situation was seen with the Devanagari implementation of Kashmiri. Like the Bengali Unicode chart, the Devanagari chart caters to multiple languages including Hindi, Marathi, Konkani, Maithili, Bodo, Kashmiri and a few others. Not all characters are used for all the languages. While implementing Kashmiri, a few of the essential characters were not present in the Devanagari chart. However, similar looking characters were present in the Gurumukhi chart and were used while writing Kashmiri. This was rectified through discussions with Unicode, and the appropriate code points were allotted in the Devanagari chart for exclusive use in Kashmiri.

Indic Typing Booster – Bengali

My colleagues Pravin Satpute and Anish Patil have been working for some time on a cool tool called the Indic Typing Booster. The premise of this tool is to aid users who are new to typing in Indian languages. Using a normal US English keyboard (i.e. the widely available generic keyboard around here), users begin typing a word in a keyboard sequence of their choice, and after a couple of key presses the typing booster prompts the user with a series of words that match the key sequence typed so far.

For instance, if the user wanted to type the word ‘कोमल’ (pronounced as: komal) in a phonetic keyboard sequence that maps क to k and ो to o, they could start by pressing ‘k’ and ‘o’ and lo and behold (no not Baba Yaga, but) a drop down menu opens up with possible words starting with ‘को’. From this list the user may then choose one to complete the word they had intended to type. A list of words from a backend database feeds this menu. Each language gets a database of its own, compiled from available text in that language. Users can add new words to the list as well.
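The lookup described above can be illustrated with a toy sketch in Python. This is not the Indic Typing Booster's actual implementation: the keymap, the word list and the function names below are all made up for illustration, and a plain list stands in for the per-language database.

```python
# Hypothetical phonetic keymap: only the keys needed for this example.
KEYMAP = {"k": "क", "o": "ो", "m": "म", "l": "ल"}

# Hypothetical stand-in for the per-language word database.
WORDS = ["कोमल", "कोयल", "कमल", "कोना"]


def transliterate(keys):
    """Map typed Latin keys to Devanagari via KEYMAP (unknown keys pass through)."""
    return "".join(KEYMAP.get(key, key) for key in keys)


def suggest(keys, words, limit=9):
    """Return up to `limit` words that start with the transliterated prefix."""
    prefix = transliterate(keys)
    return [word for word in words if word.startswith(prefix)][:limit]


def add_word(word, words):
    """Add a user-supplied word to the list if it is not already there."""
    if word not in words:
        words.append(word)


print(suggest("ko", WORDS))
```

Pressing ‘k’ and ‘o’ yields the prefix ‘को’, so only the words beginning with that prefix are offered; the user would then pick one by its position in the list.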

The typing booster requires that the IBus Input Method is installed in the system. The other necessary packages to get Indic Typing Booster working are:

  • ibus-indic-table
  • <language-name>-typing-booster-<keymap-name> (i.e. for Bengali Probhat you would be looking for the bengali-typing-booster-probhat package)

If you are using Fedora, then all these packages can be easily installed with yum. If you are not, then the necessary information for download and installation is available at the Project Home page: https://fedorahosted.org/indic-typing-booster

Besides erasing the need for looking for appropriate keys while maneuvering through the inherent complications of Indic text, the typing booster could evolve into the much needed solution for Indic typing on tablets and smartphones.

After Marathi, Gujarati and Hindi, the Indic Typing Booster is now available for Bengali (yay!). The Bengali database is by far the biggest store so far, thanks to the hunspell list that was created through an earlier effort of Ankur. Pravin announces the new release here.

This is what it looks like.

So to write কিংকর্ত্যবিমূঢ়, I could either type r/f/ZbimwX or just press 4 to complete it.

Do please give the Indic Typing Booster a go and if you’d like to contribute then head over to the mailing list – indic-typing-booster-devel AT lists.fedorahosted.org or IRC channel – #typing-booster channel (FreeNode).

হ য ব র ল – Level up

A couple of days back the following announcement was made by the Government of India through the PTI:

In a bid to overcome problems posed by difficult Hindi words, Government has asked section officers to use their “hinglish” replacements for easy understanding and better promotion of the language.

official circular here.

Excuse me while I whoop with joy for a moment here. The reason being, it’s a clear endorsement of something that I have always followed in Bengali (India) translations. I have argued, fought and been occasionally berated for not coming up with innovative Bengali words for the various technical terms that I have translated. My steady answer has been something to the tune of – ‘don’t fix it, if it ain’t broken’.

At conferences and other places, when I used to interact with people who had suddenly taken an interest in localization, they were often pretty upset that things like ‘files’, ‘keyboards’, ‘cut’, ‘print’ etc. were simply transliterated in Bengali. (I am sure they did not hold very high opinions about the bunch of Bengali localizers.) So we got suggestions like – “you could consider translating ‘paste’ as ‘লেপন’” (similar to গোবর লেপা, I suspect), or “you need to write মুদ্রণযন্ত in place of a printer”. There were more bizarre examples, which were more like words constructed from several other words (for things like URL, UTC etc.). I held my ground at that time, and hopefully this announcement has at last put my doubts (well, I did have second thoughts about whether I was being too adamant while “compromising authenticity for practicality”) to rest.

After the necessary i18n bits were fixed, Bengali localization for desktop applications primarily came about circa 2000. However, computer usage among the Bengali speaking/reading population had been happening for decades before that. By the time the first few desktop applications started to peek through in Bengali, there already were a good many users who had familiarized themselves with the various terms on the desktop. Users were well-familiar with:

  • ‘clicking’ on ‘buttons’, or
  • going to a link, or
  • ‘printing’ a ‘document’,
  • ‘cutting’ and ‘pasting’,
  • ‘pointing’ with a ‘mouse’ etc.

Subjecting them to barely relatable or artificially constructed terms would have squeezed in another learning phase. It just did not make sense.

In response, the other question that crept in was – ‘then why do you need to localize at all?’ It is a justified query, especially in a place like India, which inherited English from centuries of British rule. However, familiarity with a language is not synonymous with comfort. Language has been a hindrance for many things for ages. Trying to read a language one is not fully comfortable with can be a cumbersome experience. For example, I can speak and understand Hindi quite well, but lack the fluency to read it. Similarly, there were a good number of people who did not learn English as their primary language of communication[1]. Providing a desktop which people can read faster would have gotten rid of one hurdle that had probably kept away a lot of potential users.

There were also people who knew the terms indirectly, perhaps someone like a clerk in the office who did not handle a computer but regularly needed to collect printouts from the office printer. This group of people could mouth the words but did not read them often, and if the language on the desktop was not the primary language of everyday business, they probably did not even know what the word looked like. When getting them to migrate their work desks to a desktop, it is essential to ensure that the migration is seamless, giving prime importance to the following:

  • Familiar terms should not be muddled up, and
  • Readability of the terms is not compromised

Point 1 is also required to ensure that the terminology is not lost in translation when common issues are discussed across geographies and locales, for example in institutes of higher education or global business houses. Integrating transliterated terminology for highly technical terms that were already prevalent seemed like the optimum solution. It has not worked badly for Bengali (India) localization so far. We have been able to preserve a high degree of consistency across desktop applications, primarily because the core technical terminology never needed to be artificially created. This also allows new translators (already familiar with desktops in most cases) to get started without too much groundwork.

Note: it is not unusual to find people in India speak fluently in 2-3 languages and not always in a pure form of any. Mixing words from several languages while conversing is quite a prevalent practice these days.

Not Legal, But Safe?

For quite some time now, much discussion has happened about how the commits made to the Fedora packages through http://fedora.transifex.net do not preserve (among other things) translation credits.

Explained better below:

Downloaded version from Transifex: (All original credits have been removed)

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR Red Hat, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR , YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: Anaconda\n"
"Report-Msgid-Bugs-To: http://bugzilla.redhat.com/\n"
"POT-Creation-Date: 2011-05-06 14:41-0400\n"
"PO-Revision-Date: 2011-05-06 18:08+0000\n"
"Last-Translator: clumens \n"
"Language-Team: Bengali (India) \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: bn_IN\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"

Local Update (with manual addition of past credits):

# translation of anaconda.master.po to Bengali INDIA
# Bangla INDIA translation of Anaconda.
# Copyright (C) 2003, 2004, Red Hat, Inc.
# This file is distributed under the same license as the anaconda package.
#
# Deepayan Sarkar , 2003.
# Jamil Ahmed , 2003.
# Progga , 2003, 2004.
# Runa Bhattacharjee , 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011.
# Runa Bhattacharjee , 2007.
# Runa Bhattacharjee , 2008, 2009, 2011.
msgid ""
msgstr ""
"Project-Id-Version: Anaconda\n"
"Report-Msgid-Bugs-To: http://bugzilla.redhat.com/\n"
"POT-Creation-Date: 2011-05-06 14:41-0400\n"
"PO-Revision-Date: 2011-05-12 11:51+0530\n"
"Last-Translator: Runa Bhattacharjee \n"
"Language-Team: Bengali (India) \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: bn_IN\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"
"X-Generator: Lokalize 1.1\n"

Committed version on Transifex (credit and user information deleted again by Transifex after the locally updated file was committed):

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR Red Hat, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR , YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: Anaconda\n"
"Report-Msgid-Bugs-To: http://bugzilla.redhat.com/\n"
"POT-Creation-Date: 2011-05-06 14:41-0400\n"
"PO-Revision-Date: 2011-05-12 06:27+0000\n"
"Last-Translator: runa \n"
"Language-Team: Bengali (India) \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: bn_IN\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"

All this while, none of us worked on Fedora 15 modules waiting for a resolution. A query to Fedora Legal is also waiting to be answered. After some discussions to review the current status of things, it was decided to stop translation work for Bengali-India for all Fedora modules until this situation is rectified in some way or the legal status of things is established with clarity. We have earlier been victims of credit related violations, and feel very strongly about it and would not like to endorse similar violations in any way.

This has been filed as a ticket and is being worked upon by the Transifex team.

Meanwhile, I am trying to keep a record of all the past credits (Bengali & Bengali-India translators) for all the Fedora modules translated for Bengali-India. It may take a little time to track them through the upstream repositories as a lot of modules have already gotten updated into their respective repositories with stripped .PO files (due to automatic merges/updates).
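Keeping such a record can be partly automated. The following is a rough Python sketch, assuming the credit lines follow the usual `# Name <email>, YEAR, YEAR.` comment convention seen in the header above (with or without the email part). The regular expression and the function names are my own for illustration and are not part of Transifex or gettext.

```python
import re

# Matches comment lines such as "# Name <email>, 2003, 2004." (email optional).
CREDIT_RE = re.compile(
    r"^#\s*(?P<name>[^<,]+?)\s*(?:<[^>]*>)?\s*,\s*"
    r"(?P<years>\d{4}(?:\s*,\s*\d{4})*)\.\s*$"
)


def extract_credits(po_text):
    """Collect (name, years) pairs from the comment block above the first msgid."""
    credits = []
    for line in po_text.splitlines():
        if line.startswith("msgid"):
            break  # header comments end where the first entry begins
        match = CREDIT_RE.match(line)
        if match:
            years = [int(y) for y in re.findall(r"\d{4}", match.group("years"))]
            credits.append((match.group("name"), years))
    return credits


def merge_credits(credits):
    """Merge duplicate names, keeping a sorted list of years per translator."""
    merged = {}
    for name, years in credits:
        merged.setdefault(name, set()).update(years)
    return {name: sorted(years) for name, years in merged.items()}


# Shortened header in the style of the local file shown earlier.
sample = """\
# Copyright (C) 2003, 2004, Red Hat, Inc.
# FIRST AUTHOR , YEAR.
# Deepayan Sarkar , 2003.
# Progga , 2003, 2004.
# Runa Bhattacharjee , 2007.
# Runa Bhattacharjee , 2008, 2009, 2011.
msgid ""
msgstr ""
"""

print(merge_credits(extract_credits(sample)))
```

The placeholder `FIRST AUTHOR , YEAR.` line and the copyright line are skipped because neither ends in a plain list of four-digit years, so only real credit lines are recorded.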


The title for this post is derived from the now (in)famous twitter hashtag that came about after a tweet from a ‘well-known’ Indian blogger.

Not good enough..

So while the world ‘rejoices’ with the release of a state-of-the-art desktop, here I am sobbing in my lonely corner. Bengali-India (bn_IN) goes unsupported for GNOME 3.0 as the mandatory 80% string count was not met. Among many other reasons, here’s one that I noticed pretty late.

The damned lies statistics page was redone to display a reduced number of strings, so teams could choose to skip unnecessary strings and work on the ones that would be more prominent on the user interface. That’s good news. Both the total and reduced string statistics for the user interface are now displayed on the status pages. However, what somehow escaped my notice is that the ‘reduced’ string count percentage on the front page (80%)…,

…differs from the number on the Ui-part page (76%)

As a result, while I was quite happy to move on to some other pressing matters with a supported status under the belt, turns out the joke’s on me this time.

Instead of arbitrary string count statistics as the criterion for Supported Status, personally I’d rather like something along the lines of what KDE marks as the ‘Essential Packages’. Friedel explains it much better though.

So until the time GNOME decides to rethink its Supported Status criteria, I guess I’ll just need to gear up to face the string nazi for v 3.2

xkb keyboards on iBus

A few days ago, Dipankarda forwarded a request from a friend who was trying to figure out why certain fonts looked garbled on Fedora 14. In the process, we found that there were a couple of other keyboards for Bengali that could be added from iBus Preferences – ‘India-Bengali’ and ‘Bangladesh’. With a little tinkering around with Parag Nemade, it turned out to be an enhancement, which allows keyboards from xkbmaps to be integrated with iBus.

The ‘India-Bengali’ keymap is the default/first layout in the /usr/share/X11/xkb/symbols/in file, while the ‘Bangladesh’ keyboard is generated from the /usr/share/X11/xkb/symbols/bd file. The first is a version of Inscript, while the latter seems to be based upon the Bijoy layout. However, there is one more layout for Bengali (ben_probhat) in the same file (/usr/share/X11/xkb/symbols/in) which does not get added to the list of available keyboards. Could be a bug or a feature. To get these keyboards, it’s necessary to have the xkeyboard-config package installed as well. However, both of them may need to be reviewed.
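To see which layouts a symbols file defines, one can list its `xkb_symbols "name"` sections. The following is a small Python sketch of that idea; the function names are made up, and the sample text is an invented excerpt in the style of a symbols file (only the `ben_probhat` name comes from the file discussed above), not the file's real contents.

```python
import re
from pathlib import Path

# Each layout section in a symbols file is introduced by: xkb_symbols "name"
SECTION_RE = re.compile(r'^\s*xkb_symbols\s+"([^"]+)"', re.MULTILINE)


def list_layout_sections(symbols_text):
    """Return the names of all xkb_symbols sections found in the text."""
    return SECTION_RE.findall(symbols_text)


def list_sections_in_file(path):
    """Read a symbols file and list its sections (empty list if it is missing)."""
    symbols_file = Path(path)
    if not symbols_file.is_file():
        return []
    text = symbols_file.read_text(encoding="utf-8", errors="replace")
    return list_layout_sections(text)


# Invented excerpt for illustration; section bodies are elided.
sample = '''
default partial alphanumeric_keys
xkb_symbols "ben" {
    name[Group1]= "Bengali";
};

partial alphanumeric_keys
xkb_symbols "ben_probhat" {
    name[Group1]= "Bengali (Probhat)";
};
'''

print(list_layout_sections(sample))
print(list_sections_in_file("/usr/share/X11/xkb/symbols/in"))
```

Comparing this list with what iBus Preferences offers would show which sections, such as ben_probhat, are not being picked up.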

A more detailed version of things is available in Fujiwara’s blog post.

Ra-Jhaphala in Qt Applications

While writing text in many Indian languages we encounter composite characters made up of various combinations of more than one consonant and/or dependent vowels. Generally, these are written as:

1. Consonant + Joiner + Consonant (+ Dependent Vowel Sign)
2. Consonant + Dependent Vowel Sign (which determines what vowel sound is used to pronounce the consonant)

However, there are exceptions where a straight implementation of the writing rules cannot be used for text input in an i18n-ized application. An example is the curious case of the two letters – র (aka RA, Unicode: U+09B0) and য (aka YA, Unicode: U+09AF). These two consonants allow two different composite characters to be written, in the same sequence of usage[1].

Sequence 1:

র্য = To write words like আর্য (pronounced as ‘Ar-j-ya’; the ‘j’ is an exception in pronunciation practised in Bengali)

Sequence 2:

র‍্য = To write words like র‍্যান্ডম (i.e. transliterated version of the word ‘random’ that is pronounced as ‘rya-n-dom’ and hence has to be transliterated appropriately)

In both the above cases, র and য need to combine in the same order, so the simple method of writing them as র + joiner + য cannot produce both forms. Because it occurs more frequently in Bengali words, this plain combination has been assigned to Sequence 1. For Sequence 2, an additional character, ZWNJ (U+200C), had to be used. However, since Unicode 5.0 this has changed: ZWJ (U+200D), instead of ZWNJ, is to be used to write Sequence 2.

” …Unicode Standard adopts the convention of placing the character U+200D ZWJ immediately after the ra to obtain the ra-yaphaala…”

– from the Unicode 5.0 book, pg. 316 (afaik the online version is not available)
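Content typed under the older convention can be brought in line with the quoted rule by a plain string replacement. The Python sketch below assumes that the legacy spelling placed ZWNJ in the same position that ZWJ now occupies (RA + ZWNJ + joiner + YA); that ordering is my assumption and should be checked against the actual legacy content before use. The function name `modernize` is made up.

```python
RA = "\u09b0"      # র
YA = "\u09af"      # য
VIRAMA = "\u09cd"  # ্ (the joiner between consonants)
ZWNJ = "\u200c"    # zero width non-joiner
ZWJ = "\u200d"     # zero width joiner

# Assumed legacy spelling of Sequence 2 (ZWNJ right after RA).
LEGACY_SEQUENCE = RA + ZWNJ + VIRAMA + YA
# Spelling per the Unicode 5.0 quote: ZWJ immediately after RA.
UNICODE_5_SEQUENCE = RA + ZWJ + VIRAMA + YA


def modernize(text):
    """Rewrite the assumed legacy Sequence 2 spelling to the ZWJ-based one."""
    return text.replace(LEGACY_SEQUENCE, UNICODE_5_SEQUENCE)


legacy_word = LEGACY_SEQUENCE + "\u09be\u09a8\u09cd\u09a1\u09ae"  # 'random', old style
print([f"U+{ord(ch):04X}" for ch in modernize(legacy_word)])
```

Sequence 1 (RA + joiner + YA, with no zero width character) is left untouched, since it does not contain the legacy pattern.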

The next challenge was to ensure that this sequence was rendered correctly when used in a document. While it was correctly displayed on Pango, ICU and Uniscribe, Qt majorly broke [bug links: KDE Bugs, Qt, Fedora/Red Hat Bugzilla]. After much prolonged contemplation, Pravin managed to push in a patch to fix this issue in Harfbuzz that’ll also make it to Qt. This fixes the issue of rendering.

The review discussion for this patch (which is also expected to resolve a few other issues) is happening here. However, the delay in updating the much outdated entry in the Unicode FAQ led to a lot of confusion about whether the usage of U+200C had indeed been discontinued in favour of U+200D. This needs some kind of prompt action on the part of whoever maintains that FAQ. (Sayamindu had also mentioned it in his blog earlier)

[1] Two consonants can be used to write two different composite characters, varied by different sequence of usage.


The other major issue that is underway in the same review discussion is about allowing the input of multiple split dependent vowel signs as a single valid dependent vowel.

E.g. ক (U+0995) + ে (U+09C7) + া (U+09BE) to be allowed as an alternative input sequence for ক (U+0995) + ো (U+09CB)

The Devanagari equivalent would be:

क (U+0915) +े (U+0947) +ा (U+093E) to be allowed as an alternative input sequence for क (U+0915) +ो (U+094B)

In general practice, when a dependent vowel is written after a consonant it completes the composite character. Multiple dependent vowels are not allowed to be written for one single consonant. While the pictorial representation in the above examples may be similar, in reality the split vowel sequence may lead to incorrect rendering across applications (and in future for URLs as well) if the code points are stored as such. In applications using Pango, the second vowel input is displayed to the user as an unattached vowel sign with a dotted circle. This automatically warns the user about an invalid sequence entry.
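One way to see what is at stake in storage is to compare the split and single forms under Unicode normalization. This Python sketch uses the standard `unicodedata` module, with U+09C7 for the Bengali vowel sign ে. It only illustrates canonical equivalence as defined in the Unicode data; it says nothing about what Qt, Harfbuzz or Pango actually do with such input, and the helper name `same_after_nfc` is made up.

```python
import unicodedata

bengali_split = "\u0995\u09c7\u09be"     # ক + ে + া
bengali_single = "\u0995\u09cb"          # ক + ো
devanagari_split = "\u0915\u0947\u093e"  # क + े + ा
devanagari_single = "\u0915\u094b"       # क + ो


def same_after_nfc(first, second):
    """Compare two strings after NFC normalization."""
    normalize = unicodedata.normalize
    return normalize("NFC", first) == normalize("NFC", second)


print("Bengali:", same_after_nfc(bengali_split, bengali_single))
print("Devanagari:", same_after_nfc(devanagari_split, devanagari_single))
```

In the Unicode character data, the Bengali sign ো has a canonical decomposition into ে + া, so the two Bengali spellings compare equal after normalization. The Devanagari sign ो has no such decomposition, so the split Devanagari sequence stays a different string, and an application would need its own rule to treat the two as the same.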

Since Qt (and, it looks like, Uniscribe too) uses this practice, perhaps a specification is floating around somewhere about how the conversion and storage of such input sequences is handled. Any pointers to this would be very helpful. At present I am keeping an eye on the review discussion, and hopefully the issues will be resolved to ensure that a uniform standard persists across all platforms.

Compiz Translation Bugs

While searching around for some related stuff, I came across the Bengali India (bn_IN) Translation for compiz. Gave a hurried look through the file and came across some bits and pieces, which imho may need a relook from the translator/reviewer.

Some examples here:

en: “Stick”
bn_IN: “চেটে যাক”
retranslated en: “Let it lick”

en: “Unstick”
bn_IN: “না চাটবে না”
retranslated en: “No, it won’t be licked”

en: “Make Above”
bn_IN: “উপরে আনোন”

en: “Forcing this application to quit will cause you to lose any unsaved changes.”
bn_IN: “এপ্লিকেছন টি যোরকরে বন্ধ করতে হবে আর আপনার সকল unsaved হারাটে হবে |”

etc. etc.

Most of the strings in these files are general terms related to desktop graphics, and can be easily fixed. So if the compiz translators need a hand, then do feel free to holler around here.

The Short(cut) story

The following is a mail written a long time ago, justifying the usage of English shortcut keys in localized Bengali applications.


Hi everyone,

Deepayan Sarkar wrote:

I’m slightly confused about this. There are two types of shortcuts,
one in menu items etc (indicated by _ or & in the translated strings),
and one like CTRL-Q to quit an application. Which ones are we talking
about? The first type are activated by pressing ALT. Where does the
CTRL key come in? If we are talking about the second, I didn’t even
know that they could be translated. Can they? How?


First up, to clarify matters, this issue concerns the alt+hotkey combinations. The ctrl+key combinations, as far as I am aware, cannot be translated. At least I have never come across it. Hence for saving a file, alt+f+s is different from ctrl+s.

Currently, I use a system with bengali locale and interface as my primary production system. The input method I use is IIIMF. This is an application specific input method switcher. Same goes for SCIM. i.e. these two do not change the keyboard for the entire system. Earlier I have used setxkbmap which was a system level input method switcher and looks like most people in the thread are familiar with this method.

What I have come across regarding alt+hotkeys during my regular work are as follows [along with details]:

==================
#1. Dysfunctional
==================

I will not call the hotkeys non-functional, but dysfunctional. Reason being:

@ IIIMF and SCIM

-> en shortcuts work as alt+en key even when the active keyboard is a bengali keyboard [I use probhat]

-> bn shortcuts like alt+bn key do not work even when the active keyboard is a bengali keyboard.

@ setxkbmap

-> bn shortcuts work as alt+bn key or alt+shift+bn key [in case of a character like ফ] with an active bn keyboard.

-> en shortcuts work as alt+key with an active en keyboard

Additional info: ctrl+s type shortcuts [which cannot be translated] did not function with an active setxkbmap bn keyboard, but functioned with an active bn keyboard on IIIMF and SCIM.

===============
#2. Inconsistent
===============

Hotkeys are duplicated among the top-level menus, as well as among submenu items under the same top-level menu item. For example, in gnome-games mines, বৈশিষ্ট্য -> পূর্ণপর্দা and বৈশিষ্ট্য -> পছন্দ both have alt+প as their hotkey.
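A check for this kind of duplication can be sketched in a few lines of Python. This is only an illustration: it assumes the gtk-style "_" marker in front of the hotkey character, and the marker positions in the sample labels below (as well as the "Quit (_Q)" entry) are made up for the example rather than taken from the actual gnome-games translation.

```python
import re
from collections import defaultdict

# Captures the character that follows a single "_" hotkey marker.
ACCEL_RE = re.compile(r"_(\w)")


def accelerator(label):
    """Return the casefolded hotkey character of a label, or None if it has none."""
    # "__" stands for a literal underscore, so drop it before searching.
    match = ACCEL_RE.search(label.replace("__", ""))
    return match.group(1).casefold() if match else None


def duplicate_accelerators(labels):
    """Group the labels of one menu by hotkey and keep only the clashing groups."""
    by_key = defaultdict(list)
    for label in labels:
        key = accelerator(label)
        if key is not None:
            by_key[key].append(label)
    return {key: items for key, items in by_key.items() if len(items) > 1}


# Hypothetical submenu: marker positions are assumed for illustration.
submenu = ["_পূর্ণপর্দা", "_পছন্দ", "Quit (_Q)"]
print(duplicate_accelerators(submenu))
```

Running such a check once per menu (the top-level bar, then each submenu separately) would flag the alt+প clash described above, since a hotkey only has to be unique within the menu it appears in.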

========================
#3. Partial Implementation
========================

Now there are two offshoots from this one.

@ gtk overrides [for gnome]:

This is specifically for the gnome desktop. As golum was kind of confused about it, let me explain in detail. Currently, in the gedit.po file, Cut (_C) is translated as কাট করুন (_C), whereas in the gtk+ file it is translated as কাট(_ট)। But when populating the menu items for gedit, in some cases the translation from the gtk+.po file is used instead of the one from the gedit.po file. [http://runa.randomink.org/AnkurBangla/gedit1.png]. When I use a .mo file compiled from the original .po file, the same thing looks like this image. [http://runa.randomink.org/AnkurBangla/gedit2.png]
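The mismatch between the two files can be spotted mechanically. The snippet below is a rough sketch, not a description of how gedit or gtk+ actually resolve their strings: it assumes the "_" marker convention and simply compares the hotkey character of the English string with that of each translation quoted above. The function names are invented for this example.

```python
import re

# Captures the character that follows a single "_" hotkey marker.
ACCEL_RE = re.compile(r"_(\w)")


def accelerator(text):
    """Return the casefolded hotkey character of a string, or None if it has none."""
    match = ACCEL_RE.search(text.replace("__", ""))
    return match.group(1).casefold() if match else None


def keeps_source_accelerator(msgid, msgstr):
    """True when the translation exposes the same hotkey as the English string."""
    source = accelerator(msgid)
    return source is not None and source == accelerator(msgstr)


# Strings written the way they are quoted in this mail.
english = "Cut (_C)"
from_gedit_po = "কাট করুন (_C)"
from_gtk_po = "কাট(_ট)"

print(keeps_source_accelerator(english, from_gedit_po))
print(keeps_source_accelerator(english, from_gtk_po))
```

With these inputs the gedit.po string keeps the en hotkey and the gtk+ string does not, which is exactly the mix that shows up when one menu is populated from both files.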

@ application related:

This is for applications that do not have a text editor on their primary interface. Again, take the example of gnome-games mines [alternatively same game]. This application uses alt+bn key hotkeys.

** IIIMF and SCIM: the bn keyboard cannot be activated for this particular application, because there is no text entry box on the primary interface. Hence alt+bn hotkeys do not work, while alt+en hotkeys work. A text entry box appears only when the user is allowed to write in his/her name for the score.

** setxkbmap: alt+bn hotkey combination works on the main interface as setxkbmap sets the system level keyboard to bn.

================================

The reasons for the above-mentioned behaviour are unknown to me, and I can only comment on them as observations from the perspective of a user. Whether geeky or not, one cannot assume the requirements of a user. Some time back, while doing an installation on a test system, I had to resort to the hotkeys due to a malfunctioning mouse. At that point of time the shortcuts on anaconda did not function as they were in English and the keyboard used during installation is en. Unlike the gedit solution I mentioned earlier, hacking on installers is not really an available option. This issue has been resolved, and I mention it only to highlight the fact that requirements from users can be varied and at times may arise from unexpected circumstances.

Secondly, the issue regarding consistency between KDE and Gnome. Barring contexts, I guess issues for the two desktops ought to be dealt with separately. Yet known issues in Gnome can be used as a reference for any similar issues arising in KDE, and vice versa. I guess kcontrol would be an apt example in this case, where multiple backend files are being used and consistency is a key element, similar to the gedit+gtk scenario.

Given that we have come across multiple results, it might be a good idea to go behind the scenes to figure out where exactly things are going wrong. Whether using en shortcuts is a regressive step is somewhat fuzzy as of now. Currently, the bn shortcuts are comparatively more dysfunctional and inconsistent. If we need to implement bn hotkeys successfully, first we need to get our homework done and keep the inconsistency factor in check. Secondly, given that most distros are shipping with IIIMF and SCIM as the default input method frameworks for localized versions, can we afford to promote a Bengali desktop that shows stark flaws on the primary desktop interface [refer #1]? To conclude, imho, it is always better to provide a functional interface that would be open to change and improvement in the future rather than restricting usage in the present.

regards
Runa