
Testing Multilingual Applications – Talk Summary from Wikimania 2014

It's been a while since I managed to write something important on this blog. My blogging efforts these days are mostly concentrated around my day job, which, by the way, has been extremely enriching. I have had the opportunity to widen my perspective of global communities working in the multilingual digital world. People from diverse cultures come together in the Wikimedia projects to collaborate on social issues like education, digital freedom, tolerance of expression and lifestyles, health, equal opportunities and more. What better place to see this happen than at Wikimania, the annual conference of the Wikimedia movement? The 10th edition of the conference was held this year in London, UK. I was fortunate to participate and also present, along with my team-mate Kartik Mistry. This was our first presentation at a Wikimania.

For the past few years, I have tried to publish the talking points from my presentations. This was my first major presentation in a long time. Kartik and I presented the challenges we face every day when testing the applications that our team creates and maintains for the 300+ languages in Wikimedia projects. We have been working actively to make our testing processes better equipped to handle these challenges, and to blend them into our development workflow. The slides and the talking points are presented below. I will add the link to the video when it's available. Feedback is most welcome.

Talk Abstract

Unlike traditional testing, an important challenge in testing internationalized applications is to verify the accuracy of the content they deliver. When we talk about applications developed for different scripts and languages, key functions like the display and input of content may require several levels of verification before the application can be signed off as adequately capable of handling a particular language. A substantial part of this verification process is visual verification of the text, which requires extensive collaboration between the language speakers and the developers. For content on Wikimedia, this can stretch to more than 300 languages, counting websites that are active or waiting in the incubator. In this session, we would like to present our current best practices and solutions like tofu detection (a way to identify when scripts are not being displayed), which can narrow down the long-drawn manual testing stages to specific problems as they are identified. Talk Submission.
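The abstract mentions tofu detection only in passing. As a rough illustration of the idea, here is a minimal Python sketch that flags characters a font has no glyph for, assuming you already have the set of code points in the font's character map (for example, the keys of fontTools' `TTFont(path).getBestCmap()`). The function name `find_tofu` is hypothetical, and this coverage check is a simplified stand-in, not the detection used in Wikimedia's tools.

```python
import unicodedata


def find_tofu(text, covered_codepoints):
    """Return the characters in `text` that a font has no glyph for.

    `covered_codepoints` is assumed to be the set of code points in the
    font's character map; how you obtain it is outside this sketch.
    """
    missing = []
    for ch in text:
        # Whitespace and invisible format characters (e.g. ZWJ, ZWNJ)
        # need no visible glyph of their own, so skip them.
        if ch.isspace() or unicodedata.category(ch) == "Cf":
            continue
        if ord(ch) not in covered_codepoints and ch not in missing:
            missing.append(ch)  # keep first-seen order, no duplicates
    return missing


# A toy "font" that covers printable ASCII only.
latin_only = set(range(0x20, 0x7F))
print(find_tofu("Hello বাংলা", latin_only))  # → ['ব', 'া', 'ং', 'ল']
```

Passing this check only means the glyphs exist; it says nothing about whether they combine correctly on screen, so it narrows down the manual checks rather than replacing them.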


Talk Summary

Slide 2

As we know, the mission of the Wikimedia Projects is to share the sum of all human knowledge. In the Wikimedia universe we have active projects in over 300 languages, while the multilingual resources have the capability to support more than 400 languages.

To use these languages, we rely on extra tools and resources all the time (sometimes without even knowing it). But these are not developed as widely as we would like them to be.

You may know them already…

Slide 3

Fonts, input methods, dictionaries, and the resources used for spell checking, grammar and everything else needed to address the special rules of a language. These are what make a language work on most systems the same way English does.

Slide 4

The applications that we develop to handle multilingual content are tested in the same way as other applications. Code sanity, functionality and everything else needed to validate the correctness of the application's design are tested during the development process. (Kartik described this in some detail.)

Slide 5

However, this is only one part. The other part brings in the language's requirements, to make sure that what gets delivered through the applications is what the language needs.

So the question we are trying to answer as developers is: my code works, but does the content look good too?

Slide 6

At this point, what becomes important is a visual verification of the content. Are the t's being crossed and the i's being dotted? Only here, it happens in more complex ways.

Let's look at some examples that help explain what we mean:

  • Example 1 and Example 2: Fonts are entirely missing, so the text is displayed as tofu (empty blocks)
  • Example 3: Partially available text, which makes it hard to understand what the user interface wants you to do
  • Example 4: Input methods in VisualEditor don't capture the sequence of typed characters
  • Example 5: Otherwise functional braces lose their position when used with RTL text
  • Example 6: Dependent vowels in complex scripts appear broken with a particular font
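Examples 5 and 6 still need a human to look at the screen, but a script can help pick the strings worth looking at. The following Python sketch uses only the standard `unicodedata` module. The helper names and the two heuristics are assumptions for illustration: mirrored characters such as brackets in text that also contains RTL letters, and combining marks (such as dependent vowel signs) with no base character, which usually render with a dotted circle.

```python
import unicodedata


def needs_rtl_check(text):
    """True if `text` mixes RTL letters with mirrored characters such as
    brackets, whose on-screen position depends on the bidi algorithm."""
    has_rtl = any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)
    has_mirrored = any(unicodedata.mirrored(ch) for ch in text)
    return has_rtl and has_mirrored


def orphan_marks(text):
    """Return the indexes of combining marks (e.g. dependent vowel signs)
    that do not follow a letter or another mark."""
    orphans = []
    prev = None
    for i, ch in enumerate(text):
        if unicodedata.category(ch).startswith("M"):
            if prev is None or not unicodedata.category(prev).startswith(("L", "M")):
                orphans.append(i)
        prev = ch
    return orphans


print(needs_rtl_check("שלום (1)"))  # → True
print(orphan_marks("\u09bf\u0995"))  # vowel sign I typed before KA → [0]
```

Strings flagged this way are good candidates for a must-check list; strings that pass can still render badly with a particular font.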

Slide 13

There are always more interesting things coming up. The takeaway is that we haven't yet found a way to avoid manual tests when developing applications that are expected to handle multilingual content.

Slide 14

For now, we have been trying to make it less painful and more organised. Let's go over a checklist that we have been using as a guideline.

  1. Standard Tests – These are the tests that developers run all the time, such as unit tests. They are part of the development plan.
  2. Identify must-check items – Once you are through with the standard tests, identify the issues and checks that matter most for individual languages or for groups of similar languages. For instance, in languages with complex scripts you may want to check some character combinations that should never break.
  3. Note the new and recurring bugs – This list should by no means be rigid. If problems recur during tests, or new bugs of major impact surface, add them to your set of must-checks so that you remember to test them again for the next release.
  4. Predictable regression tests – The idea is to keep the regression tests organised to some extent, so that you don't miss the really important things.
  5. Ad-hoc testing – At the same time, the hunt for hidden bugs should never stop. Explore as far as you can. Be a little careful, though: you might find a really ugly bug and not remember how you ended up there. Retracing your steps can be a challenge, but that shouldn't be a major blocker. Once you find a bug, note it down.
  6. Track the results – For precisely this purpose, we keep the tests that we want to run regularly in a test tracking system. We use TestLink, where you can organise the tests, the steps that a user can follow, and the expected results. Successes and failures can be noted, and tests can be repeated very easily across releases.
  7. Seek expert help – One of the two most important things to keep in mind is to speak to native speakers of the language, and perhaps to an expert even if you are already a native speaker. There may be situations where your understanding of a language will be challenged. For instance, ancient scripts may have to be tested for projects like Wikisource, and these may be unfamiliar even to regular users of the modern version of the script.
  8. Testing Environments – The second is to make sure you have stable testing environments in place, where people can come and test the applications in their own time.
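Items 2 to 6 of the checklist describe a small data model: must-check items per language, results recorded per release, and failures carried forward. We use TestLink for this; the Python sketch below is not TestLink's API, just a hypothetical minimal tracker (`MustCheck`, `ReleaseTracker`) showing how the pieces could fit together.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MustCheck:
    language: str     # language code, e.g. "bn"
    description: str  # what to look at, e.g. a conjunct that must not break
    expected: str     # what the tester should see


@dataclass
class ReleaseTracker:
    checks: list[MustCheck] = field(default_factory=list)
    # (release, check id) -> True for pass, False for fail
    results: dict[tuple[str, int], bool] = field(default_factory=dict)

    def add_check(self, check: MustCheck) -> int:
        """Items 2 and 3: must-checks and recurring bugs join the permanent set."""
        self.checks.append(check)
        return len(self.checks) - 1

    def record(self, release: str, check_id: int, passed: bool) -> None:
        """Item 6: note success or failure for this release."""
        self.results[(release, check_id)] = passed

    def pending(self, release: str, language: str | None = None) -> list[int]:
        """Item 4: checks not yet run for `release`, optionally for one language."""
        return [
            i for i, c in enumerate(self.checks)
            if (release, i) not in self.results
            and (language is None or c.language == language)
        ]

    def failures(self, release: str) -> list[MustCheck]:
        """Failed checks for `release`, to retest before sign-off."""
        return [
            self.checks[i]
            for (r, i), ok in self.results.items()
            if r == release and not ok
        ]
```

Because checks are never removed, every new release starts with the full must-check list pending, which is what keeps the regression runs predictable.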

So that’s all we are currently doing to keep things organised. However, we would also like to explore options that can cut down this Herculean effort.

Contact Us

We had a blooper moment during the presentation, when we realised that the screenshot for Example 6 had been accidentally removed. We did not plan for it, but the audience got a glimpse of how manual tests can save the day on more serious occasions.

Of many things and one

Long time since this page saw some activity. *sigh*. This could have been a post of many things, like:

  • How stress-induced fatigue (my dad's words, not mine) caused me to sleep for nearly 36 hours at a stretch
  • The updates to the Gnome Mango system by Olav Vitters and the account system documentation by Christian Rose have made things so easy for us translators
  • The mad rush for KDE 4.1 Translations
  • The LC Python workshop conducted by Ramkrsna at our office in Pune. Rahul Sundaram followed up with a talk on contributing opportunities in Fedora
  • Our new car
  • The huge power and water shortage that happened in Pune and messed up our daily schedules
  • The much-delayed fun trip to Mumbai and the time spent with Barkha and her family: the ride on the Deccan Queen, the boat ride to Elephanta, the visit to Mahesh Lunch Home, getting soaked in the rain at Juhu beach, and riding back to Pune in an Ambassador taxi amidst pouring rain
  • My views on why overt channel admins (the pronounced green medals, not the access lists) on IRC channels in some open-source projects create unwanted hierarchical levels.
  • The Mozilla 3.0.2 translation sprint. I am waiting for a few bug responses at the moment, but hopefully that should not stop the inclusion of bn-IN this time.

But then let me talk about something that's really much more important. The other day Ani showed me the search feature on the KDE Translation Project website, which allows searching for a term or string in translated content. The setup gets the content from a selected directory in SVN, runs a query for the search string, and presents the results (the string and its translated version) with a direct link to the source documents. A database is also involved somewhere in the process.

So a few of us were talking about having a similar tool that would allow us to search for strings in user-defined content locations, and present the matching strings, the corresponding translated content and pointers to the source documents. And so evolved Translation-Filter, by Kushal: a nifty little tool that does just what we need. It's still being worked upon, but at the moment what you can do with it is:

  • Define a custom location with multiple .po files
  • Provide a string to search in the defined location
  • Get an output with the original English string containing the search item, the corresponding translated string and the source file name
  • Provide a list of strings to search via a plain text file
  • Save search results as .html pages
  • Use the tool from the command line and a basic GUI dialog box
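Translation-Filter's own code isn't shown here. As a hedged sketch of the same idea, the Python below walks a directory for .po files, matches a search term against the English `msgid` strings, and renders the hits as an HTML table. The PO parsing is deliberately minimal (it assumes UTF-8 files and ignores `msgctxt` and plural forms beyond `msgstr[0]`), and all function names are hypothetical; a real tool would more likely use a library such as polib.

```python
import html
import os
import re

# Escape sequences commonly found in PO string literals.
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def _unquote(token):
    """Strip the quotes from a PO string literal and undo its escapes."""
    body = token.strip()[1:-1]
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), body)


def parse_po(text):
    """Return (msgid, msgstr) pairs from the text of a .po file."""
    entries, current, key = [], {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments (#:, #., #, ...)
        if line.startswith('"'):
            if key:  # continuation of a multi-line string
                current[key] += _unquote(line)
            continue
        keyword, _, value = line.partition(" ")
        if keyword == "msgid":
            if "msgid" in current:
                entries.append(current)
            current = {}
        if keyword in ("msgid", "msgstr", "msgstr[0]"):
            key = "msgid" if keyword == "msgid" else "msgstr"
            current[key] = _unquote(value)
        else:
            key = None  # msgctxt, msgid_plural, msgstr[1], ... are ignored
    if "msgid" in current:
        entries.append(current)
    return [(e["msgid"], e.get("msgstr", "")) for e in entries]


def search(directory, term):
    """Find `term` (case-insensitively) in the English strings of every .po file."""
    hits = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if not name.endswith(".po"):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as f:
                for msgid, msgstr in parse_po(f.read()):
                    if msgid and term.lower() in msgid.lower():
                        hits.append((msgid, msgstr, path))
    return hits


def to_html(hits):
    """Render search hits as a simple HTML table."""
    rows = "".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            *(html.escape(cell) for cell in hit))
        for hit in hits
    )
    return ("<table><tr><th>Original</th><th>Translation</th><th>File</th></tr>"
            + rows + "</table>")
```

Searching for a list of terms from a plain text file would just loop over its lines and call `search` for each one.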

The project is already a part of Fedora, and Kushal has packaged it.

At this moment the benefits look huge. Primarily, it will allow us to ensure consistency of bn-IN translated content across projects (at least the ones using .po files). Perhaps (as Sayam thinks) we can soon make a web-based version of it too. So right now… kushal++ 😀
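A consistency check across projects could build on search results like these. This hypothetical Python sketch groups entries by English source string and reports those translated in more than one way across files. It applies Unicode NFC normalization first, on the assumption that strings differing only in code point sequence (for example, a precomposed Bengali letter versus its base-plus-nukta form) should not count as inconsistent.

```python
import unicodedata
from collections import defaultdict


def inconsistent_translations(entries):
    """Report English strings that are translated in more than one way.

    `entries` is an iterable of (msgid, msgstr, source_file) tuples.
    Returns {msgid: {translation: [source files]}} for conflicts only.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for msgid, msgstr, source in entries:
        if not msgid or not msgstr:
            continue  # skip the PO header and untranslated strings
        # NFC so that canonically equivalent spellings compare equal.
        seen[msgid][unicodedata.normalize("NFC", msgstr)].append(source)
    return {
        msgid: dict(variants)
        for msgid, variants in seen.items()
        if len(variants) > 1
    }
```

A difference flagged here is not necessarily a mistake, since context can justify two translations, but it tells the translation team where to look.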

Bitter Mango

Dear Lazyweb,

I have been stuck with a problem for quite some time now. The new account system for getting Gnome SVN accounts, Mango, behaves a bit oddly at times. These are the problems that I have seen happening in the past:

#1: There was no option to request an account as a Translator. It got fixed eventually.
#2: A Translator requesting an SVN account needs a voucher. The voucher is essentially the Translation Team co-ordinator, and to ensure that happens, the person requesting the account has to select the Translation Team from a dropdown menu on the New Accounts page. Sounds easy? Well, not exactly, if your team is not present in the dropdown list.
#3: So next, one has to file a bug against Mango to get the team listed.
#4: The team is not listed in the dropdown menu if the team's co-ordinator does not have an account. Pretty valid. But then how does the co-ordinator request an account, if Mango is the way and the language team will not be listed until she/he actually has an account? Chicken and egg… gaah!
#5: And even if the co-ordinator does have an account, things might not be all that rosy. (Kannada, Gujarati and Bengali-India have valid co-ordinators with existing accounts. In the case of Kannada, the account stopped working for Pramod one fine day. Marathi has a somewhat similar situation.)

To be honest, I am completely frustrated. It is understandable that with a limited number of volunteer sysadmins, things might run slowly. However, the complaints need to be addressed one way or the other. Here in India, most of the language groups are close to each other, so the problems come across all too starkly.

I don't know if anyone from Gnome actually reads my mails to the mailing list (most of them linked above). I was really hoping that someone who could help out would read these words and do something about it. It would be much appreciated around here.