Category Archives: planetarium

Indic Typing Booster – Bengali

My colleagues Pravin Satpute and Anish Patil have been working for some time on a cool tool called the Indic Typing Booster. The premise of this tool is to aid users who are new to typing in Indian languages. Using a normal US English keyboard (i.e. the widely available generic keyboard around here), users begin typing a word in a keyboard sequence of their choice, and after a couple of key presses the typing booster prompts them with a series of words that match the key sequence typed so far.

For instance, if the user wanted to type the word ‘कोमल’ (pronounced as: komal) in a phonetic keyboard sequence that maps क to k and ो to o, they could start by pressing ‘k’ and ‘o’ and lo and behold (no not Baba Yaga, but) a drop-down menu opens up with possible words starting with ‘को’. From this list the user may then choose one to complete the word they had intended to type. A backend database of words feeds this list. Each language gets a database of its own, compiled from available text in that language. Users can add new words to the list as well.
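The lookup described above can be sketched roughly as follows. This is only an illustration in Python, not the Indic Typing Booster's actual code: the word list, the `suggest` and `add_word` helpers, and the limit of nine candidates are all assumptions made for the example, and a plain sorted list stands in for the per-language database.

```python
from bisect import bisect_left

# Hypothetical mini word list; the real tool reads a per-language database.
WORDS = sorted(["कोमल", "कोयल", "कोना", "कमल", "किताब"])


def suggest(prefix, words=WORDS, limit=9):
    """Return up to `limit` words from the sorted list that start with `prefix`."""
    # In a sorted list, all words sharing a prefix sit next to each other,
    # beginning at the position where the prefix itself would be inserted.
    start = bisect_left(words, prefix)
    matches = []
    for word in words[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
        if len(matches) == limit:
            break
    return matches


def add_word(word, words=WORDS):
    """Add a user-supplied word, keeping the list sorted and free of duplicates."""
    i = bisect_left(words, word)
    if i == len(words) or words[i] != word:
        words.insert(i, word)
```

Calling `suggest("को")` here returns the candidates that begin with को, which is the kind of list the drop-down would show after ‘k’ and ‘o’ have been mapped to क and ो.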

The typing booster requires that the IBus Input Method is installed in the system. The other necessary packages to get Indic Typing Booster working are:

  • ibus-indic-table
  • <language-name>-typing-booster-<keymap-name> (e.g. for Bengali Probhat you would be looking for the bengali-typing-booster-probhat package)

If you are using Fedora, then all these packages can be easily installed with yum. If you are not, then the necessary information for download and installation is available at the Project Home page: https://fedorahosted.org/indic-typing-booster

Besides erasing the need for looking for appropriate keys while maneuvering through the inherent complications of Indic text, the typing booster could evolve into the much needed solution for Indic typing on tablets and smartphones.

After Marathi, Gujarati and Hindi, the Indic Typing Booster is now available for Bengali (yay!). The Bengali database is by far the biggest store so far, thanks to the hunspell list that was created through an earlier effort of Ankur. Pravin announces the new release here.

This is what it looks like.

So to write কিংকর্ত্যবিমূঢ়, I could either type r/f/ZbimwX or just press 4 to complete it.

Do please give the Indic Typing Booster a go, and if you’d like to contribute then head over to the mailing list – indic-typing-booster-devel AT lists.fedorahosted.org or the IRC channel – #typing-booster (FreeNode).


হ য ব র ল – Level up

A couple of days back the following announcement was made by the Government of India through the PTI:

In a bid to overcome problems posed by difficult Hindi words, Government has asked section officers to use their “hinglish” replacements for easy understanding and better promotion of the language.

official circular here.

Excuse me while I whoop with joy for a moment here. Reason being, it’s a clear endorsement of something that I have forever followed in Bengali (India) Translations. I have argued, fought and have been occasionally berated for not coming up with innovative Bengali words for the various technical terminology that I have translated. My steady answer has been something to the tune of – ‘don’t fix it, if it ain’t broken’.

At conferences and other places, when I used to interact with people who had suddenly taken an interest in localization, they were often pretty upset that things like ‘files’, ‘keyboards’, ‘cut’, ‘print’ etc. were simply transliterated in Bengali. (I am sure they did not hold very high opinions about the bunch of Bengali localizers.) So we got suggestions like “you could consider translating ‘paste’ as ‘লেপন’” (similar to গোবর লেপা, I suspect), or “you need to write মুদ্রণযন্ত in place of a printer”. There were more bizarre examples, which were more like words constructed from several other words (for things like URL, UTC etc.). I held my ground at that time, and hopefully this announcement has at last put my doubts (well, I did have second thoughts about whether I was being too adamant while “compromising authenticity for practicality”) to rest.

After the necessary i18n bits were fixed, Bengali localization for desktop applications primarily came about around 2000. However, computer usage among the Bengali speaking/reading population had been happening for decades before that. By the time the first few desktop applications started to peek through in Bengali, there already were a good many users who had familiarized themselves with the various terms on the desktop. Users were well-familiar with:

  • ‘clicking’ on ‘buttons’, or
  • going to a link, or
  • ‘printing’ a ‘document’,
  • ‘cutting’ and ‘pasting’,
  • ‘pointing’ with a ‘mouse’ etc.

Subjecting them to barely relatable or artificially constructed terms would have squeezed in another learning phase. It just did not make sense.

In response, the other question that crept in was: ‘then why do you need to localize at all?’ It is a justified query, especially in a place like India, which inherited English from centuries of British rule. However, familiarity with a language is not synonymous with comfort. Language has been a hindrance for many things for ages. Trying to read a language one is not fully comfortable with can be a cumbersome experience. For example, I can speak and understand Hindi quite well, but lack the fluency to read it. Similarly, there were a good number of people who did not learn English as their primary language of communication[1]. Providing a desktop that people can read faster would have gotten rid of one hurdle that had probably kept away a lot of potential users.

There were also people who knew the terms indirectly, perhaps someone like a clerk in the office who did not handle a computer but regularly needed to collect printouts from the office printer. This group of people could mouth the words but did not read them often, and if the language on the desktop was not the primary language of everyday business, they probably did not even know what the words looked like. When getting them to migrate their work desks to a desktop, it was essential to ensure that the migration was seamless, and we gave prime importance to the following:

  • Familiar terms are not muddled up, and
  • Readability of the terms is not compromised

Point 1 is also required to ensure that the terminology is not lost in translation when common issues are discussed across geographies and locales, for example in institutes of higher education or global business houses. Integrating transliterated terminology for highly technical terms that were already in prevalence seemed like the optimum solution. It has not worked badly for Bengali (India) localization so far. We have been able to preserve a high level of consistency across desktop applications, primarily because the core technical terminology never needed to be artificially created, which also allows new translators (already familiar with desktops in most cases) to get started without too much groundwork.

Note: it is not unusual to find people in India who speak 2-3 languages fluently, and not always a pure form of any. Mixing words from several languages while conversing is quite a prevalent practice these days.

MIA

Just a general FYI (instead of emailing across a few dozen mailing lists) that I’ll be mostly offline for the next two weeks. So any bug, ticket, e-mail etc. waiting on me during this time may go unanswered. Thanks.

Fedora Translation – update about the translation credit loss problem

Looks like the problem related to the loss of translation credits in Fedora translations via Transifex.net has been resolved (as announced by diegobz).

The translators’ names are now kept/written in the PO files headers. It might take a while (hours) for all resources to be affected.

However, just to be sure, it may be better to do a few trial runs first before restarting full-fledged commits. 🙂

Along with that, .POT files have also been returned:

Users can now download POT file for PO based resources

(They may need locating though. One can get them as the ‘original .pot’ from the same dialog that is displayed when the language name/module is clicked on for online translation or download.)

More information about these may be coming in via the trac tickets: [1] [2]

That was total yayness! from the Transifex team. Thank you!

Not Legal, But Safe?

For quite some time now, much discussion has happened about how the commits made to the Fedora packages through http://fedora.transifex.net do not preserve (among other things) translation credits.

Explained better below:

Downloaded version from Transifex: (All original credits have been removed)

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR Red Hat, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR , YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: Anaconda\n"
"Report-Msgid-Bugs-To: http://bugzilla.redhat.com/\n"
"POT-Creation-Date: 2011-05-06 14:41-0400\n"
"PO-Revision-Date: 2011-05-06 18:08+0000\n"
"Last-Translator: clumens \n"
"Language-Team: Bengali (India) \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: bn_IN\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"

Local Update (with manual addition of past credits):

# translation of anaconda.master.po to Bengali INDIA
# Bangla INDIA translation of Anaconda.
# Copyright (C) 2003, 2004, Red Hat, Inc.
# This file is distributed under the same license as the anaconda package.
#
# Deepayan Sarkar , 2003.
# Jamil Ahmed , 2003.
# Progga , 2003, 2004.
# Runa Bhattacharjee , 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011.
# Runa Bhattacharjee , 2007.
# Runa Bhattacharjee , 2008, 2009, 2011.
msgid ""
msgstr ""
"Project-Id-Version: Anaconda\n"
"Report-Msgid-Bugs-To: http://bugzilla.redhat.com/\n"
"POT-Creation-Date: 2011-05-06 14:41-0400\n"
"PO-Revision-Date: 2011-05-12 11:51+0530\n"
"Last-Translator: Runa Bhattacharjee \n"
"Language-Team: Bengali (India) \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: bn_IN\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"
"X-Generator: Lokalize 1.1\n"

Committed version on Transifex (credit and user information deleted again by Transifex after the locally updated file was committed):

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR Red Hat, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR , YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: Anaconda\n"
"Report-Msgid-Bugs-To: http://bugzilla.redhat.com/\n"
"POT-Creation-Date: 2011-05-06 14:41-0400\n"
"PO-Revision-Date: 2011-05-12 06:27+0000\n"
"Last-Translator: runa \n"
"Language-Team: Bengali (India) \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: bn_IN\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"

All this while, none of us worked on Fedora 15 modules, as we were waiting for a resolution. A query to Fedora Legal is also waiting to be answered. After some discussions to review the current status of things, it was decided to stop translation work for Bengali-India for all Fedora modules until this situation is rectified in some way or the legal status of things is established with clarity. We have earlier been victims of credit-related violations, feel very strongly about it, and would not like to endorse similar violations in any way.

This has been filed as a ticket and is being worked upon by the Transifex team.

Meanwhile, I am trying to keep a record of all the past credits (Bengali & Bengali-India translators) for all the Fedora modules translated for Bengali-India. It may take a little time to track them through the upstream repositories as a lot of modules have already gotten updated into their respective repositories with stripped .PO files (due to automatic merges/updates).
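For that record-keeping, a rough sketch of how the credit lines could be pulled out of a .PO header is given below. It is Python, relies only on the comment layout seen in the headers above (a name, then a comma-separated list of years), and the `extract_credits` helper is a made-up name, not part of gettext or Transifex.

```python
import re

# Matches header comment lines such as "# Name <email>, 2003, 2004."
# Template lines like "# FIRST AUTHOR , YEAR." do not match (no numeric year).
CREDIT_RE = re.compile(
    r"^#\s*(?P<who>.+?)\s*,\s*(?P<years>\d{4}(?:\s*,\s*\d{4})*)\.?\s*$"
)


def extract_credits(po_text):
    """Return a list of (translator, [years]) from the leading comment block."""
    credits = []
    for line in po_text.splitlines():
        if not line.startswith("#"):
            break  # the header comments end at the first non-comment line
        match = CREDIT_RE.match(line)
        if match:
            years = [int(y) for y in re.split(r"\s*,\s*", match.group("years"))]
            credits.append((match.group("who"), years))
    return credits
```

Run against a stripped file, this returns an empty list, which is a quick way to spot the modules whose credits need to be restored from an older upstream copy.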


The title for this post is derived from the now (in)famous twitter hashtag that came about after a tweet from a ‘well-known’ Indian blogger.

Not good enough..

So while the world ‘rejoices’ with the release of a state-of-the-art desktop, here I am sobbing in my lonely corner. Bengali-India (bn_IN) goes unsupported for GNOME 3.0 as the mandatory 80% string count was not met. Among many other reasons, here’s one that I noticed pretty late.

The damned lies statistics page was redone to display a reduced number of strings, so teams could choose to avoid unnecessary strings and work on the ones that would be more prominent on the user interface. That’s good news. Both the total and reduced string statistics for the user interface are now displayed on the status pages. However, what somehow escaped my notice is that the ‘reduced’ string count percentage on the front page (80%)…

…differs from the number on the Ui-part page (76%)

As a result, while I was quite happy to move on to some other pressing matters with a supported status under the belt, turns out the joke’s on me this time.

Instead of the arbitrary string count statistics as a criterion for Supported Status, personally I’d rather like something on the lines of what KDE marks as the ‘Essential Packages’. Friedel explains it much better though.

So until the time GNOME decides to rethink its Supported Status criteria, I guess I’ll just need to gear up to face the string nazi for v 3.2.