Tag Archives: kde-l10n

Traditions And Technology – Talk transcript from KDE India conference

The following is the transcript of my talk at the KDE India conference.

Hello, my name is Runa and I have been working as a Localization Specialist for the past 8-9 years. I often get asked why I do what I do, and most of the time I end up saying “I am making way for choices”. Well... don’t we all love choices? Candies, clothes, books, shoes, gadgets, food. Anything, as long as compulsion can be kept at bay. Choice equates to liberation.

However, there are times when compulsions emerge as a necessary evil that one has to live with. Sometimes they are not really evils, just things outside our familiar circuit, which it is natural to resist. The history of humankind has enough examples of events that irreversibly changed ordinary lives. Thankfully, most of us in this room have not had to live through the great wars, or perhaps even the most important historical event of our country – the turmoil of independence, partition and the migration of populations. Nevertheless, we have lived through events that are significant in current times – the dotcom bust, global economic turmoil, unemployment, inflation, population growth and, most importantly, the rise of information-based technology and remote communication.

During my schooldays, when dinosaurs roamed the earth, people socialized over tea parties. Very few homes had a telephone. Mobile phones were unheard of, and telex machines were the most advanced technology used in offices. During my college years it was still a miracle if one could actually have an email ID of one’s own. Even then it was just one of the fancy things to play around with... nothing mainstream. This is probably not just my story; quite a number of us could vouch for a similar chain of events. When we add all these stories up, we see a much larger, interdependent impact, one that spans multiple generations, academic backgrounds and professions.

The wheel changed human civilization, the steam engine brought in the Industrial Revolution, and steel ushered in contemporary industrial advancement and economic development. Advances in remote communication brought in what we know as globalization. All of these have influenced the lifestyles of human civilization, in varying degrees and at varying speeds of course.

Socio-economic factors have always left a significant impact. We live in bigger, wider cities and travel longer and farther. The houses we live in don’t just need to be protected against earthquakes and fire and equipped with special corners for refrigerators and washing machines. They also need network ports, a place where the computer table will go, and electric sockets for the mobile charging ‘stations’, considering there’ll be quite a few in every household, plus some for visitors. Well, these are signs of a natural evolution over time. But this time, the evolution was so fast and hurried that it seemed to have happened overnight.

Modern communication methods seem to have wrapped us in an invisible web. There are probably multiple studies in various business schools about the expansion of the mobile phone consumer market and the impact it created. As a general observer of modern Indian society, I see it happening in two ways: compulsion exerted by one segment of society, and the eventual cost/benefit ratio. This is how I’d explain the first factor. There was an initial group of users who had the means to use the technology and also saw some merit in using it. But their real-life communication was never limited to people of similar means... it included a lot of other people, who were then compelled to adopt the same technology out of necessity. With affordable plans arriving soon after, it did not take long for this chain reaction to spread. Now individual service providers like cab drivers, grocers, milkmen and others, who deal with consumers armed with mobile phones, have to use this technology not just to expand their businesses but to survive!

Similarly, the analogy extends to web- and internet-based communication methods, where the visual medium plays an advantageous role. Imagine a scenario like this one. You have moved into a new house and you are getting some furniture customized for your rooms. However, being an extremely busy person with very limited time, you find it impossible to go and check the designs at the carpenter’s workshop. As a result your furniture is not complete and the impasse continues. This situation could easily have been avoided if you and your carpenter could have connected without either party needing to be physically present. For example, the designs could have been sent to you over email, and then you and your carpenter could have worked out the details over the phone. This scenario is not unusual now, but it was until a few years back. Not all carpenters have equipped themselves to communicate with their customers this way, but the ones who haven’t do risk losing business, because their customer base is changing. The younger, technology-charged generation in households is coming of age, and they are the ones who will be doing most of the spending.

That, however, is the business perspective. Homes too have seen a significant change in lifestyle. With industries and employment opportunities being created around information-based technology, and related industries like real estate, hospitality, banking and retail expanding around it, a large part of the employable population is seeking work in the cities considered to be the hubs. Children are leaving home far more than earlier generations did, and with communication no longer restricted to a postcard or the odd trunk call as in earlier times, migrating is not seen as a major upheaval in the household. What we do see is the way the older generation is being touched by these gadgets of modern communication. Stories about their resistance to using them are what household urban legends are made of.

Well, we can go on for hours about similar stories and analogies, but I am assuming that most of us in this room have been through similar situations, so further emphasis is not needed. Cutting to the bottom line, the fact that emerges is that THIS is the world we now live in, and we are not going backwards. It is somewhat similar to how things are in the place where I live. I live in a city called Pune, and the first thing anyone would notice on landing there is the huge number of crawling two-wheelers. Motorbikes and scooters jostle for space on the narrow roads. Most car owners also have a bike or a scooter. The city, however, does not have much public transport. This was essentially a bicycle town. The bicycles graduated to scooters and motorbikes, and as the city expanded horizontally and its population grew in large numbers, the two-wheeler population grew too. Road space is at a premium in most places, with limited scope for expansion. As a result the jostling on the road continues, but we are not going back to the era of the bicycle. In both cases, this is the foundation that everyone needs to adapt to.

Well, this is not necessarily a bad situation, but it certainly is a largely unprecedented one. The most important difference is that we have a reversal of traditional roles in the dispersal of knowledge. Traditionally, knowledge has been passed on in a linear format. The older generation learns, gathers experience and mentors the new generation. The new generation adds new knowledge to the pool, gathers further experience and passes it on to the next generation. This is how workplaces and households have functioned for ages. Now that linear chain is broken, and the information flow has shifted to start from the newer generation and work its way upwards. The older generation is faced with the task of putting aside its traditional tools and learning how to function with the new ones. Many of us have probably encountered this at home, where our parents, uncles, aunts and other elders have suddenly been confronted with a computer on their office desks and made to take mandatory lessons to learn to work with it.

If, like me, you were born sometime in the late 1970s or 1980s in India, then you would be somewhere in the middle of all this turmoil. We saw the world before the communication revolution happened and, being young learners, adapted rather fast to the change. Children born sometime after the mid-1990s have woken up to a wired world and do not feel lost when confronted with a desktop computer or a smartphone.

Given the comparatively modern nature of the technologies being used, the workforce engaged in their development, proliferation and maintenance belongs to the younger generation. What this group of creators brings in is an infusion of modern culture: their language and its fusions, and their terminology. Most importantly, their slang; by this I mean the term as used in linguistics, i.e. colloquial terms and also local flavours. Local flavours, primarily because of the global nature of the workforce and their consumers.

Infusing local flavours, whether in language or culture, into a basic substance is not a new phenomenon. It has been widely used in areas like advertising, where the same films and prints have been used with dubbed scripts or redone to be more identifiable to the target audience. Even television shows (like Idols, Dancing with the Stars, Who Wants to Be a Millionaire etc.) have been redone to suit viewer tastes. When it comes to modern technology, it is assumed to be based on English, which is not completely untrue. Compounded by a second assumption, that it has a very steep learning curve, this adds to a fear of the unknown among a large part of the population. The second assumption is understandable and can be a cause for worry, although it is not insurmountable, but it’s the first one that poses a very interesting problem.

When we considered the user base earlier, I mentioned that the role reversal has pushed the older generation into the role of learners. Without prejudice, I’d like to state from personal experience that I do sympathise with them, having come to terms with the fact that learning new things, whether a language or a skill like driving, does become comparatively difficult after one hits a certain age. That may differ between individuals, so I would not put a number to it.

A good percentage of this group has not been academically trained to use English as their primary language of communication. They may have picked up the skills from frequently using it at their workplace, or elsewhere while communicating with this very large group of multilingual people otherwise known as their countrymen. Perhaps this is also the legacy that we will continue under the name of ‘Indian English’, and we already have several dialects of it in various corners of the country. This group also includes a large number of the present generation, who have adapted spoken and written English to be used alongside the multiple other languages they speak, because in India children grow up learning at least 2 to 3 languages. Here I’d like to mention a student who stays with my family back home in Kolkata. This young lad comes from a small town outside Kolkata and is an undergraduate at a city college. Ever since the local cable operators started providing the Nat Geo channel with Bengali dubbing, it has been a trying task to get him away from the television. For a couple of hours every night he is engrossed in a world that he probably did not know existed earlier. And for once, no one is really complaining about him spending too much time in front of the telly.

Anyway, back to our demographics. There is another group of people who have learnt a version of English that does not include the contemporary flavour that has evolved primarily out of American and even Australian slang (again, I use the word in its linguistic sense). Hence for these people, ‘default’ translates into ‘breaking a rule’, and the word they would have used to mean ‘standard value’ would be ‘de facto’. There are more such examples where conflicts exist.

Another group of people are the vocationally trained service providers. This is the group where our neighbourhood carpenter comes into the picture. They are highly skilled people, but we might be stretching our expectations a tad too much if we assume that they would be able to parse l33t if they encountered it on the interface of a desktop application. With repeated usage, perhaps yes. Even so, the substance may not be interpreted with full appreciation.

As part of localizing desktop applications, I have been reading user interface messages from a very large number of applications, including office suites, file browsers, web browsers, chat clients, games etc. Now one of the fun things that people like to do with their workspaces and gadgets is to decorate them, like setting a nice wallpaper or ringtone. Wallpaper names and colours often pose a very serious problem, because they are steeped in cultural connotations. Buildings, landmarks, plants, flowers, fruits, food, fauna, dances, musical instruments, scenes from festivals and sports are very often used as images. The same goes for colours. I had a very tough time some years back when trying to translate the names of some colours named after speciality wines from various regions of France. Another example I can recall is a set of shades of black and white named after the stages of a snowstorm. Translating that for the audience of a tropical country is a serious challenge. In this regard, I really have to mention, and I can’t say this enough, that the user interface messages of the Mozilla Firefox browser are perhaps among the best examples of extremely culture-neutral UI messages. They are precise and convey the matter with extreme brevity without being flowery.

Anyway, going back to tailoring products to local requirements: what we have seen so far is an attempt to create localized versions based on language rather than on identifiable cultural parallels. One of the basic methods for helping people adapt to a new technique or technology is to create an environment that is familiar for them to play around with. Familiarity creates a comfortable footing for further exploration. Choices and options add flexibility to the process, and with additional handholding it gets better and easier to learn. And what better way to create a comfortable learning environment than to use local and time-tested analogies from the mainstream?

This is where a good number of us get to play a part, as creators, either primary or secondary (i.e. localizers like me). In the first place, we need to settle why we need localized or localization-ready applications. Right from education to social networking, digital libraries, information gathering like the ongoing census, GIS and disaster management, the user base across the globe is going one way: up. We need production-ready software for users who may not be in a position to learn it at leisure, or who may have to learn it before they are put off enough to give up. In such cases, uptake can be facilitated on multiple fronts by offering convenient choices, including a choice of languages and/or familiar scripts.

Also, in the process, power users can be converted into creators. With a better understanding of the operational domain, they can interpret the functional aspects of the applications with more precision. In domains like disaster management, where local inhabitants of the affected areas are deployed, the choice of language plays an important part in ensuring that participants can work seamlessly and that the operation runs without hindrance.

Let’s take the example of GPS devices, which are increasingly used by people who drive on highways and within cities. In earlier days, one just had to roll down the window and holler at a passerby to ask for directions. If one had ventured into an area where the local language was unfamiliar, a local driver or guide could be hired to do away with that problem. This option has not really vanished, but for all practical purposes people do not mind having added features in their GPS devices that bridge this gap as well, so that they are never stuck without an option. Localization is often referred to as a low-hanging fruit for entrants into the Open Source Software world. Maybe it is. The way I see it, the lower it hangs, the more mouths it feeds.

The primary aim of all application developers is to have people use the products they create. Some probably start their projects to solve a nagging itch of a problem that they encounter. Most open source projects gather people along the way, from nearly every possible corner of the world. While we here in India ponder over what a ‘Bordeaux’ coloured wallpaper would look like, someone in South Africa would probably be posting on WordPress using Lekhonee, a blogging application with a Bengali name. With an open gate for new entrants to contribute to every possible aspect of creating the tools of the modern, wired world, each one of us has the potential to bring the learnings of our individual cultures and homes to better equip these tools and resources.

There have always been passionate people who do not hesitate to dive headlong into what they believe will take their ideals and beliefs forward. I remember the blogger greatbong writing about the 10-paisa poet, who came from a district town in West Bengal to the Kolkata Book Fair and used to roam all over the fair grounds reciting his poems. During those 10 days the man lived on the pavements, all for the love of his creations. In most cases passion emerges during adversity and thrives in stability. Our positions are enviable in many ways. We live in a free culture where social mouthpieces like microblogging can create a direct impact. One does not have to wait for newspapers or politicians to take up cudgels on one’s behalf. Armed with academic qualifications, a better understanding of modern technology and global culture, and the freedom to create, we hold the key to taking on the role of makers and setting a direction for society. How we do it, though, is something each of us has to discover.

Lastly, I’d like to conclude with this message in memory of a fellow member of the Pune Linux Users Group and Blender artist, Zoyd aka Vinay Paway. He passed away last year in an unfortunate road accident. He had worked on the movie Sintel, and a couple of days before he passed away he was working on getting the film’s English dialogue translated into various Indian languages. We could only finish the subtitling work in Bengali and Hindi. Translations into more languages have been pledged in his memory; please do come forward if you’d like to join in and help complete them.

Ra-Jhaphala in Qt Applications

While writing text in many Indian languages, we encounter composite characters made up of various combinations of more than one consonant and/or dependent vowel sign. Generally, these are written as:

1. Consonant + Joiner (the hasanta/virama, U+09CD in Bengali) + Consonant (+ Dependent Vowel Sign)
2. Consonant + Dependent Vowel Sign (which determines the vowel sound with which the consonant is pronounced)
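To make the two patterns concrete, here is a minimal Python sketch that builds Bengali strings from explicit code points and lists what is actually stored. The helper names (conjunct, with_vowel, describe) and the example letters KA, SSA and the AA vowel sign are my own choices for illustration; only the standard unicodedata module is used.

```python
import unicodedata

VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (hasanta), the "joiner" in pattern 1

KA = "\u0995"       # BENGALI LETTER KA
SSA = "\u09B7"      # BENGALI LETTER SSA
AA_SIGN = "\u09BE"  # BENGALI VOWEL SIGN AA (a dependent vowel sign)


def conjunct(first: str, second: str, vowel_sign: str = "") -> str:
    """Pattern 1: consonant + virama + consonant (+ optional dependent vowel sign)."""
    return first + VIRAMA + second + vowel_sign


def with_vowel(consonant: str, vowel_sign: str) -> str:
    """Pattern 2: consonant + dependent vowel sign."""
    return consonant + vowel_sign


def describe(text: str) -> list[str]:
    """List each code point as 'U+XXXX NAME' so the stored sequence is visible."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for line in describe(conjunct(KA, SSA)):      # the conjunct KA-SSA
        print(line)
    for line in describe(with_vowel(KA, AA_SIGN)):  # KA with the AA sign
        print(line)
```

The conjunct is stored as three code points, with the virama doing the "joining", while the consonant plus vowel sign is just two; the rendering engine is what turns these sequences into the composite shapes.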

However, there are exceptions where a straight implementation of the writing rules cannot be used for text input in an i18n-ized application. An example is the curious case of the two letters র (aka RA, Unicode: U+09B0) and য (aka YA, Unicode: U+09AF). These two consonants allow two different composite characters to be written using the same order of the two letters[1].

Sequence 1:

র্য (U+09B0 U+09CD U+09AF) = To write words like আর্য (pronounced ‘Ar-j-ya’; the ‘j’ is an exception in pronunciation practised in Bengali)

Sequence 2:

র‍্য (U+09B0 U+200D U+09CD U+09AF) = To write words like র‍্যান্ডম (i.e. the transliterated version of the word ‘random’, which is pronounced ‘rya-n-dom’ and hence has to be transliterated appropriately)

In both cases above, র and য combine in the same order. Hence the simple method of writing them as র + joiner + য cannot work for both cases. Because of its higher frequency in Bengali words, that plain combination has been assigned to Sequence 1. For Sequence 2, an additional character, ZWNJ (U+200C), had to be used. However, this changed with Unicode 5.0: ZWJ (U+200D) is now to be used instead of ZWNJ to write Sequence 2.

“…Unicode Standard adopts the convention of placing the character U+200D ZWJ immediately after the ra to obtain the ra-yaphaala…”

– from the Unicode 5.0 book, pg. 316 (afaik the online version is not available)
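The difference between the two sequences is a single invisible code point, which is why it is so easy to get wrong. A minimal Python sketch that spells out both forms as code points, following the Unicode 5.0 convention quoted above; the constant and function names are hypothetical, and it makes no attempt to handle the older ZWNJ-based form.

```python
RA = "\u09B0"      # BENGALI LETTER RA
YA = "\u09AF"      # BENGALI LETTER YA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (hasanta), the usual "joiner"
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Sequence 1: RA + VIRAMA + YA, as in the word for 'Arya'
SEQ1 = RA + VIRAMA + YA
# Sequence 2 (Unicode 5.0 convention): ZWJ placed immediately after RA
SEQ2 = RA + ZWJ + VIRAMA + YA

ARYA = "\u0986" + SEQ1                            # AA + sequence 1
RANDOM = SEQ2 + "\u09BE\u09A8\u09CD\u09A1\u09AE"  # transliterated 'random'


def codepoints(text: str) -> str:
    """Spell out a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def ra_ya_kind(text: str) -> str:
    """Say which ra + ya sequence a string contains, if any."""
    if SEQ2 in text:
        return "sequence 2"
    if SEQ1 in text:  # SEQ1 is not a substring of SEQ2, so order is not critical
        return "sequence 1"
    return "none"


if __name__ == "__main__":
    for word in (ARYA, RANDOM):
        print(ra_ya_kind(word), codepoints(word))
```

Note that if the ZWJ is dropped or ignored anywhere along the way, what remains is exactly Sequence 1, so the two words collapse into the same shape.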

The next challenge was to ensure that this sequence was rendered correctly when used in a document. While it was displayed correctly by Pango, ICU and Uniscribe, Qt broke badly [bug links: KDE Bugs, Qt, Fedora/Red Hat Bugzilla]. After prolonged deliberation, Pravin managed to push in a patch fixing this issue in Harfbuzz, which will also make its way into Qt. This fixes the rendering issue.

The review discussion for this patch (which is also expected to resolve a few other issues) is happening here. However, the delay in updating the badly outdated entry in the Unicode FAQ led to a lot of confusion about whether U+200C had indeed been discontinued in favour of U+200D. This needs prompt action on the part of whoever maintains that FAQ. (Sayamindu had also mentioned this on his blog earlier.)

[1] i.e. the same two consonants, used in the same order, can form two different composite characters depending on the word.


The other major issue under way in the same review discussion is about allowing a sequence of split dependent vowel signs to be input as an alternative to a single valid dependent vowel sign.

Eg. ক (U+0995) + ে (U+09C7) + া (U+09BE) to be allowed as an alternative input sequence for ক (U+0995) + ো (U+09CB)

The Devanagari equivalent would be:

क (U+0915) + े (U+0947) + ा (U+093E) to be allowed as an alternative input sequence for क (U+0915) + ो (U+094B)

In general practice, when a dependent vowel is written after a consonant, it completes the composite character; multiple dependent vowels are not allowed for a single consonant. While the visual representation in the above example may be similar, in reality the split vowel sequence may lead to incorrect rendering across applications (and in future, in URLs as well) if the code points are stored as such. In applications using Pango, the second vowel input is displayed to the user as an unattached vowel sign with a dotted circle, which automatically warns the user about an invalid sequence.

Since Qt (and, it seems, Uniscribe too) follows this practice, perhaps there is a specification floating around somewhere about how such input sequences are converted and stored. Any pointers to this would be very helpful. At present I am keeping an eye on the review discussion, and hopefully the issues will be resolved so that a uniform standard persists across all platforms.
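A data point that may be relevant here (my addition, not from the review discussion): in the Unicode Character Database, the Bengali vowel sign O (U+09CB) has a canonical decomposition into U+09C7 + U+09BE, so Unicode normalization (NFC, UAX #15) turns the split Bengali input into the single vowel sign. The Devanagari vowel sign O (U+094B) has no such decomposition, so the split Devanagari sequence is not canonically equivalent to it. A minimal Python sketch using the standard unicodedata module; store() and has_stacked_vowel_signs() are hypothetical helpers, and I am not claiming that Qt or Uniscribe handle input this way.

```python
import unicodedata

# Bengali: KA + E sign + AA sign vs KA + precomposed O sign
BN_SPLIT = "\u0995\u09C7\u09BE"
BN_O = "\u0995\u09CB"
# Devanagari: KA + E sign + AA sign vs KA + O sign
DEV_SPLIT = "\u0915\u0947\u093E"
DEV_O = "\u0915\u094B"


def store(text: str) -> str:
    """Normalize to NFC before storage, so canonically equivalent input is stored alike."""
    return unicodedata.normalize("NFC", text)


def canonically_equivalent(a: str, b: str) -> bool:
    """True if both strings have the same NFC form."""
    return store(a) == store(b)


def has_stacked_vowel_signs(text: str) -> bool:
    """Rough heuristic: after NFC, two dependent vowel signs in a row suggest an
    invalid split-vowel input (the case Pango shows with a dotted circle)."""
    normalized = store(text)
    return any(
        "VOWEL SIGN" in unicodedata.name(a, "") and "VOWEL SIGN" in unicodedata.name(b, "")
        for a, b in zip(normalized, normalized[1:])
    )


if __name__ == "__main__":
    print(canonically_equivalent(BN_SPLIT, BN_O))    # True
    print(canonically_equivalent(DEV_SPLIT, DEV_O))  # False
    print(has_stacked_vowel_signs(DEV_SPLIT))        # True
```

In other words, for Bengali the "conversion and storage" question has at least one standard answer (normalize), while for Devanagari the split sequence stays a distinct, and by this heuristic suspicious, string.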

Of many things and one

Long time since this page saw some activity. *sigh*. This could have been a post about many things, like:

  • How stress induced fatigue (my dad’s words not mine) caused me to sleep for nearly 36 hours at a stretch
  • The updates to the Gnome Mango system by Olav Vitters and the account system documentation by Christian Rose, which have made things so easy for us translators
  • The mad rush for KDE 4.1 Translations
  • The LC Python workshop conducted by Ramkrsna at our office in Pune. Rahul Sundaram followed up with a talk on contributing opportunities in Fedora
  • Our new car
  • The huge power and water shortage that happened in Pune and messed up our daily schedules
  • The much-delayed fun trip to Mumbai and the time spent with Barkha and her family, the ride on the Deccan Queen, the boat ride to Elephanta, the visit to Mahesh Lunch Home, getting soaked in the rain at Juhu beach, and riding back to Pune in an Ambassador taxi amidst pouring rain
  • My views on why overt channel admins (the pronounced green medals, not the access lists) on IRC channels in some open-source projects create unwanted hierarchical levels.
  • Mozilla 3.0.2 translation sprint. I am waiting for a few bug responses at the moment, but hopefully that should not stop the inclusion of bn-IN this time.

    But then let me talk about something that’s really much more important. The other day Ani showed me the search feature on the KDE Translation Project website, which allows searching for a term/string in translated content. The setup gets the content from a selected directory of the SVN repository, runs a query for the search string and presents the results (each string and its translated version) with a direct link to the source documents. A database is also involved somewhere in the process.

    So a few of us were talking about having a similar tool that would let us search for strings in user-defined content locations and present the matching strings, the corresponding translated content and pointers to the source documents. And so evolved Translation-Filter, by Kushal. A nifty little tool that does just what we need. It’s still being worked on, but at the moment what you can do with it is:

  • Define a custom location with multiple .po files
  • Provide a string to search in the defined location
  • Get an output with the original English string containing the search item, the corresponding translated string and the source file name
  • Provide a list of strings to search via a plain text file
  • Save search results as .html pages
  • Use the tool from the command line or through a basic GUI dialog box
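For the curious, the core idea behind such a search (not Translation-Filter’s actual code, which I have not reproduced here) can be sketched in a few dozen lines of Python. Everything below is hypothetical: the function names, the simplified PO parsing (comments and obsolete entries skipped, msgctxt ignored, only msgstr[0] kept for plural entries) and the use of ast.literal_eval to decode quoted strings, which assumes PO’s C-style escapes behave like Python’s.

```python
import ast
import html
from pathlib import Path


def parse_po(text: str) -> list[tuple[str, str]]:
    """Return (msgid, msgstr) pairs from PO text (simplified sketch)."""
    entries: list[tuple[str, str]] = []
    fields: dict[str, str] = {}
    current = None

    def flush() -> None:
        # The header entry has an empty msgid, so it is skipped here.
        if fields.get("msgid"):
            translation = fields.get("msgstr", fields.get("msgstr[0]", ""))
            entries.append((fields["msgid"], translation))
        fields.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith('"'):
            # Continuation line: append to the keyword seen last.
            if current is not None:
                fields[current] += ast.literal_eval(line)
            continue
        keyword, _, rest = line.partition(" ")
        if keyword == "msgid":
            flush()  # a new msgid starts a new entry
        current = keyword
        fields[keyword] = ast.literal_eval(rest.strip())
    flush()
    return entries


def search_po_dir(directory: str, term: str) -> list[tuple[str, str, str]]:
    """Case-insensitive search of English msgids in every .po file under directory.

    Returns (msgid, msgstr, file name) triples."""
    needle = term.lower()
    hits = []
    for path in sorted(Path(directory).rglob("*.po")):
        for msgid, msgstr in parse_po(path.read_text(encoding="utf-8")):
            if needle in msgid.lower():
                hits.append((msgid, msgstr, path.name))
    return hits


def to_html(hits: list[tuple[str, str, str]]) -> str:
    """Render search results as a minimal HTML table."""
    rows = "\n".join(
        "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in hit) + "</tr>"
        for hit in hits
    )
    header = "<tr><th>English</th><th>Translation</th><th>File</th></tr>"
    return f"<table>\n{header}\n{rows}\n</table>"
```

A list of search terms kept in a plain text file could be handled by looping over its lines and calling search_po_dir() once per term; a real tool would also want proper PO parsing (for example, msgctxt support) rather than this sketch.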

    The project is a part of Fedora already and Kushal has packaged it.

    At this moment the benefits look huge. Primarily, it will allow us to ensure consistency of bn-IN translated content across projects (at least the ones using .po files). Perhaps (as Sayam thinks) we can soon make a web-based version of it too. So right now… kushal++ 😀

  • Translating Strings with Plural forms in .po files

    Claude Paroz reported on the gnome-i18n mailing list today about the msgfmt errors showing up on the Damned Lies page against some files in a few languages. Bengali_India also has a few red medallions, and the problem was baffling me, since msgfmt checks on the files locally were not showing any errors. The msgfmt error messages only showed up after the files were submitted to the Gnome svn. After discussions on the gnome-i18n mailing list and on IRC, the following seems to be the correct solution for plural-form-related errors in languages that do not have any plural forms.

    Open the .po file in a text editor (like gedit) and insert the following line in the header section:

    "Plural-Forms: nplurals=1; plural=0;\n"
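If many files need the same header line, the manual edit can be scripted. A minimal Python sketch, assuming a conventional header entry (msgid "" immediately followed by msgstr "" and its quoted continuation lines); the function names are hypothetical, and the inserted line is the one given above.

```python
import re

# The header line from above, exactly as it should appear in the .po file.
PLURAL_LINE = '"Plural-Forms: nplurals=1; plural=0;\\n"'


def has_plural_forms(po_text: str) -> bool:
    """True if some quoted header line already starts with Plural-Forms."""
    return re.search(r'^"Plural-Forms:', po_text, flags=re.MULTILINE) is not None


def add_plural_forms(po_text: str, line: str = PLURAL_LINE) -> str:
    """Append the Plural-Forms line to the end of the header entry, if missing."""
    if has_plural_forms(po_text):
        return po_text
    lines = po_text.splitlines(keepends=True)
    for i in range(1, len(lines)):
        # The header is the entry whose msgid is empty: msgid "" then msgstr "".
        if lines[i].startswith('msgstr ""') and lines[i - 1].startswith('msgid ""'):
            end = i + 1
            while end < len(lines) and lines[end].startswith('"'):
                end += 1  # skip the header's quoted continuation lines
            lines.insert(end, line + "\n")
            return "".join(lines)
    raise ValueError("no header entry found")
```

The function leaves files that already declare Plural-Forms untouched, so it is safe to run repeatedly; running msgfmt afterwards, as described below, is still the real check.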


    Once you encounter a string with a plural form, there are two ways to go about it.

    1. If both strings contain a variable (like %d) indicating a number, then the translation can be done as per the language’s requirements for handling singular and plural numbers. e.g.

    msgid "%d file has been modified"
    msgid_plural "%d files have been modified"
    msgstr[0] ""
    msgstr[1] ""

    In this case, after translation the string should look like the following in a text editor:

    msgid "%d file has been modified"
    msgid_plural "%d files have been modified"
    msgstr[0] "%d-টি ফাইল পরিবর্তন করা হয়েছে"

    Note: The %d has been retained in the msgstr.

    2. The second way is the more important of the two and requires caution. If the singular string does not have a variable but the plural string does, then only the plural string has to be translated. e.g.

    msgid "one file has been modified"
    msgid_plural "%d files have been modified"
    msgstr[0] ""
    msgstr[1] ""

    In this case, after translation the string should look like the following in a text editor:

    msgid "one file has been modified"
    msgid_plural "%d files have been modified"
    msgstr[0] "%d-টি ফাইল পরিবর্তন করা হয়েছে"

    Note: The %d has to be present in the msgstr; when the program runs, it is replaced by the relevant number of files modified.

    Also note that in both cases above, the msgstr[1] "" line has to be removed manually.

    As a final check, please do ensure that the following command does not throw any errors:

    msgfmt -vc -o /dev/null filename.po
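As a rough extra pre-check, the two pitfalls above (a leftover msgstr[1] and a %d missing from the translation) can be scanned for with a short Python script. This is my own simplified sketch, not how msgfmt works internally; it assumes entries are separated by blank lines and that msgid_plural and each msgstr[n] fit on one line, and msgfmt -c remains the authoritative check.

```python
import re


def nplurals(po_text: str) -> int:
    """Read nplurals from the Plural-Forms header."""
    match = re.search(r"Plural-Forms:\s*nplurals\s*=\s*(\d+)", po_text)
    if match is None:
        raise ValueError("no Plural-Forms header found")
    return int(match.group(1))


def check_plurals(po_text: str) -> list[str]:
    """List problems in plural entries: the wrong number of msgstr[n] lines,
    or a translated form that drops the %d found in msgid_plural."""
    expected = nplurals(po_text)
    problems = []
    for block in re.split(r"\n\s*\n", po_text):
        plural = re.search(r'^msgid_plural "(.*)"$', block, flags=re.MULTILINE)
        if plural is None:
            continue  # not a plural entry (or the header)
        plural_id = plural.group(1)
        forms = re.findall(r'^msgstr\[(\d+)\] "(.*)"$', block, flags=re.MULTILINE)
        if len(forms) != expected:
            problems.append(f"{plural_id}: {len(forms)} msgstr forms, expected {expected}")
        for index, text in forms:
            # Untranslated (empty) forms are skipped; only real translations are checked.
            if text and "%d" in plural_id and "%d" not in text:
                problems.append(f"{plural_id}: msgstr[{index}] drops %d")
    return problems
```

With nplurals=1, an entry that still carries msgstr[1] "" is reported as having 2 forms where 1 is expected, which is the situation the manual removal above avoids.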

    For more information about plural forms, this document is extremely helpful.