Translation – dive in!

The reason I started writing this post is the recent rise in interest in things related to translation and localization. Everywhere one turns, there is someone evangelising this revolution from atop a soapbox and gathering people around for quick-win localization projects. It may be reasonable to ask whether I consider this inundation of localizers an unhappy turn of events. Hardly. After having toiled alone for ages, at times through uncharitable sneers, it is indeed a welcome change. However, I have some grave reservations about how this is being done.

Of late there has been a rising impetus on forming geography-based communities around some of the significant (eyeball-grabbing) FOSS projects. With the proliferation of the projects’ user base this is a natural progression in the scheme of things. When communities are based on geographies one of the first things they tend to find commonality in is their language. Thus, enter localization. So far so good. However, this is where the slightly disruptive butterfly starts to flutter its wings.

The localization projects are also a major entry point for new contributors to be lured into the projects. It has forever been a perception that translation was the easiest way to start contributing to any open-source project. And why not? Everyone seemed to be able to read and comprehend English – the original language used in most components and the same ‘everyone’ also knew how to read and write the language that they were going to translate into. Fair enough, come join. All Hail Crowdsourcing!!

This is where the fluttering starts to get serious. Most of these localization projects were not new discoveries. Depending upon the maturity of their localization sub-projects, there are established norms of translation, review, terminology and validation, including certain methods to groom new translators. Teams are formed around a language to ensure that translations are consistently updated and polished to attain a high degree of consistency and perfection. Conventions evolve and rules are honoured.

Does that make it difficult for new entrants to join? Marginally, yes. But then which other projects do not have this barrier? If it is acceptable for projects to validate and audit code before accepting it, why should localized content be considered an open field for experiments? Especially when, compared to code, errors in localized content are far more difficult to trace and rectify.

The following is an excerpt from an interview with Sue Gardner, Executive Director of the Wikimedia Foundation, where she answers a query about whether new contributors were finding it difficult to work their way around the policies:

We queried her take on this second area, pointing out that all publishers that aim to present high-quality information find they need complex rules, whether explicit or via accepted standards of writing and scholarship. Could she give specific examples of areas where we could simplify policy without sacrificing standards?

Yes, the premise of this question is absolutely correct. The analogy I often use is the newsroom. Anybody who’s curious and reasonably intelligent can be a good journalist, but you do need some orientation and guidance. Just like a newsroom couldn’t invite in 100 random people off the street and expect them to make an immediate high-quality contribution, neither can Wikipedia expect that.

What most of these populist programs tend to miss are the percolations that are felt elsewhere. For languages with a large amount of published localized content that has been filtered through long periods of (mostly) manual validation, experiments on ancillary components introduce inconsistency and, worse, errors. For instance, non-validated translations in add-on components ruin the user interface of the main component, which in most cases is an extremely prominent application and often part of enterprise-level products. These errors can be resolved through the usual bug tracking systems, but how does one chase up volunteers who had turned up for localization sprints and have since moved on?

Crowdsourcing is here to stay. So will crowdsourced contributions. With more flexibility in translation tools, the new-age translators do not have to go through the rigorous grooming process that was prevalent until a few years back and that shaped a lot of the veteran translators. They can get their contributions into the main projects without any delay, often with the blessings of the sponsoring projects, which do not have to wait for their translation assets to multiply and their local communities to expand. With some amount of experience both as a translator and as a homemaker, the one thing that I can vouch for is that technical translation is not unlike housework – everyone has an opinion on how easy it is, but you don’t know how many corners you end up cleaning until you are down on your knees doing it.

Indic Typing Booster – Bengali

My colleagues Pravin Satpute and Anish Patil have been working for some time on a cool tool called the Indic Typing Booster. The premise of this tool is to aid users new to typing in Indian languages. Using a normal US English keyboard (i.e. the widely available generic keyboard around here), users begin typing a word in a keyboard sequence of their choice, and after a couple of key presses the typing booster prompts the user with a series of words that match the key sequence typed so far.

For instance, if the user wanted to type the word ‘कोमल’ (pronounced as: komal) in a phonetic keyboard sequence that maps क to k and ो to o, they could start by pressing ‘k’ and ‘o’ and lo and behold (no not Baba Yaga, but) a drop-down menu opens up with possible words starting with ‘को’. From this list the user may then choose one to complete the word they had intended to type. A list of words from a backend database feeds this menu. Each language gets a database of its own, compiled from available text in that language. Users can add new words to the list as well.
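
As a rough illustration of the lookup described above, here is a small Python sketch that converts a phonetic key sequence into its Devanagari prefix and lists the matching words from a sorted word list. The keymap, the word list and the function names are all hypothetical; none of this is taken from the Indic Typing Booster code, which may well work differently.

```python
from bisect import bisect_left

# Hypothetical, tiny phonetic keymap: Latin key -> Devanagari character.
KEYMAP = {"k": "क", "o": "ो", "m": "म", "l": "ल"}

# Hypothetical word list standing in for the per-language database.
WORDS = sorted(["कोमल", "कोयल", "कोना", "कमल"])


def to_script(keys):
    """Convert typed keys to target-script text using KEYMAP."""
    return "".join(KEYMAP.get(k, k) for k in keys)


def suggest(keys, words=WORDS, limit=9):
    """Return up to `limit` words that start with the typed prefix."""
    prefix = to_script(keys)
    # In a sorted list, all words sharing a prefix sit next to each other,
    # starting at the first entry that is >= the prefix.
    start = bisect_left(words, prefix)
    matches = []
    for word in words[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
        if len(matches) == limit:
            break
    return matches


def add_word(word, words=WORDS):
    """Add a user-supplied word, keeping the list sorted and free of duplicates."""
    pos = bisect_left(words, word)
    if pos == len(words) or words[pos] != word:
        words.insert(pos, word)
```

Calling suggest("ko") on this toy list returns the three words beginning with ‘को’, and a word added through add_word() shows up in later suggestions. The limit of 9 is only a guess at how many candidates a menu selectable by number keys might show.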

The typing booster requires that the IBus Input Method is installed in the system. The other necessary packages to get Indic Typing Booster working are:

  • ibus-indic-table
  • <language-name>-typing-booster-<keymap-name> (i.e. for Bengali Probhat you would be looking for the bengali-typing-booster-probhat package)

If you are using Fedora, then all these packages can be easily installed with yum. If you are not, then the necessary information for download and installation is available at the Project Home page: https://fedorahosted.org/indic-typing-booster

Besides erasing the need for looking for appropriate keys while maneuvering through the inherent complications of Indic text, the typing booster could evolve into the much needed solution for Indic typing on tablets and smartphones.

After Marathi, Gujarati and Hindi, the Indic Typing Booster is now available for Bengali (yay!). The Bengali database is by far the biggest store so far, thanks to the hunspell list that was created through an earlier effort of Ankur. Pravin announces the new release here.

This is what it looks like.

So to write কিংকর্ত্যবিমূঢ়, I could either type r/f/ZbimwX or just press 4 to complete it.

Do please give the Indic Typing Booster a go and if you’d like to contribute then head over to the mailing list – indic-typing-booster-devel AT lists.fedorahosted.org or the IRC channel – #typing-booster (FreeNode).

যাহা পাও তাই লও, হাসিমুখে ফিরে যাও।
কারে চাও, কেন চাও– তোমার আশা কে পূরাতে পারে॥
সবে চায়, কেবা পায় সংসার চলে যায়–
যে বা হাসে, যে বা কাঁদে, যে বা প’ড়ে থাকে দ্বারে॥

হ য ব র ল – Level up

A couple of days back the following announcement was made by the Government of India through the PTI:

In a bid to overcome problems posed by difficult Hindi words, Government has asked section officers to use their “hinglish” replacements for easy understanding and better promotion of the language.

official circular here.

Excuse me while I whoop with joy for a moment here. The reason being, it’s a clear endorsement of something that I have forever followed in Bengali (India) translations. I have argued, fought and been occasionally berated for not coming up with innovative Bengali words for the various technical terms that I have translated. My steady answer has been something to the tune of – ‘don’t fix it, if it ain’t broken’.

At conferences and other places when I used to interact with people who had suddenly taken an interest in localization, they were often pretty upset that things like ‘files’, ‘keyboards’, ‘cut’, ‘print’ etc. were simply transliterated in Bengali. (I am sure they did not hold very high opinions about the bunch of Bengali localizers.) So we got suggestions like – “you could consider translating ‘paste’ as ‘লেপন’” (similar to গোবর লেপা, I suspect), or “you need to write মুদ্রণযন্ত্র in place of a printer”. There were more bizarre examples, which were more like words constructed with several other words (for things like URL, UTC etc.). I held my ground at that time, and hopefully this announcement has at last put my doubts (well, I did have second thoughts about whether I was being too adamant while “compromising authenticity for practicality”) to rest.

After the necessary i18n bits were fixed, Bengali localization for desktop applications came about around 2000. However, computer usage among the Bengali speaking/reading population had been happening for decades before that. By the time the first few desktop applications started to peek through in Bengali, there already were a good many users who had familiarized themselves with the various terms on the desktop. Users were well familiar with:

  • ‘clicking’ on ‘buttons’, or
  • going to a ‘link’, or
  • ‘printing’ a ‘document’,
  • ‘cutting’ and ‘pasting’,
  • ‘pointing’ with a ‘mouse’ etc.

Subjecting them to barely relatable or artificially constructed terms would have squeezed in another learning phase. It just did not make sense.

In response, the other question that crept in was – ‘then why do you need to localize at all?’ It is a justified query, especially in a place like India, which inherited English from centuries of British rule. However, familiarity with a language is not synonymous with comfort. Language has been a hindrance for many things for ages. Trying to read a language one is not fully comfortable with can be a cumbersome experience. For example, I can speak and understand Hindi quite well, but lack the fluency to read it. Similarly, there were a good number of people who did not learn English as their primary language of communication[1]. Providing a desktop which people can read faster would have gotten rid of one hurdle that had probably kept away a lot of potential users.

There were also people who knew the terms indirectly, perhaps someone like a clerk in the office who did not handle a computer but regularly needed to collect printouts from the office printer. This group of people could mouth the words but did not read them often, and if the language on the desktop was not the primary language of everyday business, they probably did not even know what the words looked like. When getting them to migrate their work desks to a desktop, it is essential to ensure that the migration is seamless and gives prime importance to the following:

  • Familiar terms should not be muddled up, and
  • Readability of the terms is not compromised

Point 1 is also required to ensure that the terminology is not lost in translation when common issues are discussed across geographies and locales, for example in institutes of higher education or global business houses. Getting it done by integrating transliterated terminology for highly technical terms that were already in prevalence seemed like the optimum solution. It has not worked badly for Bengali (India) localization so far. We have been able to preserve a high degree of consistency across desktop applications primarily because the core technical terminology never needed to be artificially created, which also allows new translators (already familiar with desktops in most cases) to get started without too much groundwork.

Note: it is not unusual to find people in India speak fluently in 2-3 languages and not always in a pure form of any. Mixing words from several languages while conversing is quite a prevalent practice these days.

Pride and Prejudice

One of the first things that you would notice if you walk into one of the mammoth old buildings around Dalhousie Square in Kolkata is the rows and rows of electrical cables that hang from various corners of the ceiling. The tangles would put to shame a highly intricate streamer decoration at a party. (see some here) They are dangerous, yet every day people walk in and out or sit for hours under them without a flinch.

In some kind of graphical representation, that is probably what our country looks like. A monstrosity that’s bursting at its seams, waiting to spill out its contents and held together by a network of flimsy patches at various places. Yet, it stays in place. Just like inside those old buildings, people carry on with their lives nonchalantly. More as an existential pattern they have known for a lifetime. Any alternative is unknown or doesn’t seem to work (and I am guessing here) mostly due to a lack of familiarity. With a billion other people to fight against for a share of food, jobs, a berth on the train and everything else, life as we know it here in India is a constant challenge that most of us don’t really sign up for, but nevertheless accept because otherwise we may risk losing what we have managed to gather.

What breaks this mad rush are incidents induced by nature’s fury or misguided human fury. Like the other day. Bombs, in Mumbai (yet again). What followed was the usual round of calling up friends, family and other folks to check if things were ok. When things settled without the detection of any cause for alarm, one could divert their attention to the messages of wrath that started pouring onto various timelines. Some called for an attack on the perpetrators, while others lamented the lack of tooth and nail within the general populace. Honestly, even I have felt the same way when confronted by a situation grave enough to rattle me in some way. However, in most other cases I prefer to maintain a reserve. Not because I do not empathize, but rather because I have inherited a trait from a parent who describes it as – unless there is a fully informed solution that has any practical implementation in a conducive environment, it is never a good idea to ramble on with opinions about sensitive matters. Well... not in gentle company at least.

A lot of people have questioned the effectiveness of our intelligence agencies and how porous our defences are that terrorists can make a serious attack with the least of efforts. Personally, I am not in a position or informed enough to provide a serious analysis of where the failure was and how things could be strengthened. Instead what I see is an unmanageable chaos. Stop for a moment and look around. What you’ll see is an unstructured mass – not just of tangible objects like people, vehicles, buildings, but a carefully nurtured cultural shroud that binds all of these. Call it rich Indian heritage, difference in castes, inequality of the classes, regional biases, the all-encompassing ‘jugaad’ – in short the cultural fibre that dictates how the people of the land live with each other. And one of the things that rarely finds itself on this list is perhaps ‘respect’.

It’s probably hard to describe how I come to that conclusion, except through the various instances that I see around me. Being a microscopic instance of a billion+ population, it comes down essentially to the equation of demand and supply. The more in number, the more devalued it is. In this case, human lives. No one really cares about another person, because they have to struggle to ensure that at least that one human life still gets a bit of importance – their own. Stretch it maybe a little further to family, children, parents, someone-who-matters. As long as this cocooned bunch is taken care of, nothing else matters. Trains can burn, young children can beg, a hapless guard can be yelled at, plastic bottles can be thrown into rivers, walls can be defaced, red traffic signals can be run, a bribe paid, examinations cheated in, or the next-door neighbour called a vile racist term.

Seriously, where is that element of respect that drives a community to stand up with pride and reclaim its glory? I find it really funny when people mouth the cock and bull statements about a ‘country that is unified in its diversity’. Bull crap. Define diversity – the politically correct regional culture or things that create differences worse than plague – religious rigidity, caste-based divisions, financial demarcations, occupational supremacy…you name it and we have it. There is always a reason to disrespect the other person standing next to you. How would anyone be able to collaborate in harmony with people they don’t feel good about? Even if it’s for their own safety? I seriously don’t know. These differences have been passed on for generations and I don’t see that changing very soon.

It’s probably like working at a place where you don’t care much about the work, but you get your paycheck at the end of the month and go home happy as long as you get to buy that perfect pair of shoes or a crate of poison. Well... as long as the next bomb doesn’t get you.

MIA

Just a general FYI (instead of emailing across a few dozen mailing lists) that I’ll be mostly offline for the next two weeks. So any bug, ticket, e-mail etc. waiting on me during this time may go unanswered. Thanks.

Fedora Translation – update about the translation credit loss problem

Looks like the problem related to the loss of translation credits in Fedora translations via Transifex.net has been resolved (as announced by diegobz).

The translators’ names are now kept/written in the PO files headers. It might take a while (hours) for all resources to be affected.

However, just to be sure, it may be better to do a few trial runs first before restarting full-fledged commits. 🙂

Along with that, .POT files have also been returned:

Users can now download POT file for PO based resources

(They may need locating though: one can get them as the ‘original .pot’ from the same dialog that is displayed when the language name/module is clicked on for online translation or download.)

More information about these may be coming in via the trac tickets: [1] [2]

That was total yayness! from the Transifex team. Thank you!

Not Legal, But Safe?

For quite some time now, much discussion has happened about how the commits made to the Fedora packages through http://fedora.transifex.net do not preserve (among other things) translation credits.

Explained better below:

Downloaded version from Transifex: (All original credits have been removed)

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR Red Hat, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR , YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: Anaconda\n"
"Report-Msgid-Bugs-To: http://bugzilla.redhat.com/\n"
"POT-Creation-Date: 2011-05-06 14:41-0400\n"
"PO-Revision-Date: 2011-05-06 18:08+0000\n"
"Last-Translator: clumens \n"
"Language-Team: Bengali (India) \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: bn_IN\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"

Local Update (with manual addition of past credits):

# translation of anaconda.master.po to Bengali INDIA
# Bangla INDIA translation of Anaconda.
# Copyright (C) 2003, 2004, Red Hat, Inc.
# This file is distributed under the same license as the anaconda package.
#
# Deepayan Sarkar , 2003.
# Jamil Ahmed , 2003.
# Progga , 2003, 2004.
# Runa Bhattacharjee , 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011.
# Runa Bhattacharjee , 2007.
# Runa Bhattacharjee , 2008, 2009, 2011.
msgid ""
msgstr ""
"Project-Id-Version: Anaconda\n"
"Report-Msgid-Bugs-To: http://bugzilla.redhat.com/\n"
"POT-Creation-Date: 2011-05-06 14:41-0400\n"
"PO-Revision-Date: 2011-05-12 11:51+0530\n"
"Last-Translator: Runa Bhattacharjee \n"
"Language-Team: Bengali (India) \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: bn_IN\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"
"X-Generator: Lokalize 1.1\n"

Committed version on Transifex (credit and user information deleted again by Transifex after the locally updated file was committed):

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR Red Hat, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR , YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: Anaconda\n"
"Report-Msgid-Bugs-To: http://bugzilla.redhat.com/\n"
"POT-Creation-Date: 2011-05-06 14:41-0400\n"
"PO-Revision-Date: 2011-05-12 06:27+0000\n"
"Last-Translator: runa \n"
"Language-Team: Bengali (India) \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: bn_IN\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"

All this while, none of us worked on Fedora 15 modules while waiting for a resolution. A query to Fedora Legal is also waiting to be answered. After some discussions to review the current status of things, it was decided to stop translation work for Bengali-India for all Fedora modules until this situation is rectified in some way or the legal status of things is established with clarity. We have earlier been victims of credit-related violations, feel very strongly about it, and would not like to endorse similar violations in any way.

This has been filed as a ticket and is being worked upon by the Transifex team.

Meanwhile, I am trying to keep a record of all the past credits (Bengali & Bengali-India translators) for all the Fedora modules translated for Bengali-India. It may take a little time to track them through the upstream repositories as a lot of modules have already gotten updated into their respective repositories with stripped .PO files (due to automatic merges/updates).
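
A record like this could be collected with a short script. The Python sketch below is only an assumption about how one might go about it, using nothing beyond the standard library: it reads the comment block at the top of a .PO file and picks out credit lines of the form "# Name <email>, YEAR, YEAR.", skipping the template placeholder and the copyright line. The function name and the regular expression are my own and are not part of Transifex or the gettext tools.

```python
import re

# Matches credit comments such as "# Some Name <email>, 2004, 2005."
# The e-mail part is optional because it is sometimes stripped from files.
CREDIT_RE = re.compile(
    r"^#\s*(?P<name>[^<,]+?)\s*(?:<(?P<email>[^>]*)>)?\s*,\s*"
    r"(?P<years>\d{4}(?:\s*,\s*\d{4})*)\s*\.?\s*$"
)


def extract_credits(po_text):
    """Return (name, email, [years]) tuples from the leading comment block."""
    credits = []
    for line in po_text.splitlines():
        if not line.startswith("#"):
            break  # the header comment block ends at the first non-comment line
        match = CREDIT_RE.match(line)
        if match:
            years = [int(y) for y in re.findall(r"\d{4}", match.group("years"))]
            credits.append((match.group("name"), match.group("email"), years))
    return credits
```

Run over a copy of the file taken from an upstream repository before the credits were stripped, this gives a list that can be saved and later pasted back into the header by hand. It does not merge duplicate entries for the same person, and it expects the comment block to be the very first thing in the file.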


The title for this post is derived from the now (in)famous twitter hashtag that came about after a tweet from a ‘well-known’ Indian blogger.

रंग्रेज़ मेरे

ये बात बता रंग्रेज़ मेरे
ये कौनसे पानि मे तुने कौनसा रंग घोला है
के दिल बन गया सौदाइ और मेरा बसंति चोला है

अब तुम से क्या मे शिक़वा करु
मैंने हि कहा था ज़िद करके, रंग दे चुनरि पि के रंग मे
करमुहे कपास पर रंग ये ना रुके
रंग इत्ना गेहरा तेरा कि जानो जिगर तक को भि रंग दे

Rangrez Mere from Tanu Weds Manu – sung by the Wadali Brothers

(There could be spelling errors as I am not very familiar with the written form of Hindi.)