Learning yet another new skill

About 3 weeks ago, when the autumn festival was in full swing and I was away from home in Bangalore, I made my way to a nearby maker space to spend a weekend learning something new. Besides the prospect of spending a lonely weekend doing something new, I was egged on by a wellness initiative at my workplace that encouraged us to find some space away from work. I signed up for a 2-day beginner's carpentry workshop.

 

[Image: the workshop floor]
When I was little, I often saw my Daddy working on small pieces of wood with improvised carving tools to make little figurines or cigarette holders. The cigarette holders were lovely, but they were given away many years ago, when he (thankfully) stopped smoking. Some of the little figurines are still around the house, and a few larger pieces made out of driftwood remain in the family home. However, I do not recall him making anything like a chair or a shelf that could be used around the house. In India, it is the norm to get such items made, but by the friendly neighborhood carpenter. The same goes for many other things, like fixing leaking taps, repairing broken electrical switches, or painting a room. There is always someone nearby with the requisite skills who can be hired. As a result, many of us lack basic skills in these matters, unlike people elsewhere in the world.

 

I did not expect to become an expert carpenter overnight, and hence went with the hope that my carpentry skills would improve from 0 to maybe 2, on a scale of 100. The class had 3 other people – a student, a man working at a startup, and a doctor. The instructor had been an employee at a major Indian technology services company, and now ran his own carpentry business and these classes. He had an assistant. The space was quite large (the entire ground floor of the building) and housed an electronics lab and a woodwork section.

 

We started off with an introduction to several types of softwood, hardwood, and plywood. Some of them were available in the lab as they were going to be used during the class, or were stored in the workshop. Rarer woods like mahogany and teak were displayed as small wooden blocks. We were going to use rubber wood and some plywood for our projects. Next, we were introduced to some of the tools – with and without motors. We learnt to use the circular saw, table saw, drop saw, jigsaw, power drill, and wood router. Being more petite than most and unaccustomed to such tools, I found the 400-600 W saws quite terrifying at the beginning.

 

[Image: the wall clock]
The first thing I made was a wall clock shaped like the beloved deer – Bambi. On a 9” x 9” block of rubber wood, I first traced the shape. I then used a jigsaw to cut off the edges and form the outline, and a drill to make some holes and create the shapes for the eyes and spots. The sander was eventually used to smoothen the edges. This clock is now proudly displayed on a wall at my Daddy's home, very much like my drawings from age 6.

 

[Image: the wall shelf]
Next, we made a small shelf with dado joints that can be hung on the wall. We started off with a block of rubber wood about 1’6’’ x 1’. The measurements for the various parts of this shelf were provided on a piece of paper, and we had to cut the pieces using the table saw, set to the appropriate width and angle. The places where the shelves connected with the sides were chiseled out and smoothed with a wood router. The pieces were glued together and nailed. The plane and sander were used to round the edges.

 

The last project for the day was to prepare the base for a coffee table. The material was a block of pinewood 2 inches thick and 2’ x 1’. We first had to cut these blocks from a bigger one, using the circular saw. Next, they were taken to the table saw to make 5 long strips of 2-inch width. One of these strips had about 1/2 inch from the edges narrowed down into square-ish pegs to fit into the legs of the table. The legs had some bits of the center hollowed out so they could be glued together into X shapes. These were left overnight to dry, and the next morning, with a hammer and chisel, the holes were made into which the pegs of the central bar could be fitted. Finally, the drop saw was used to chop off the edges to make the table stand correctly. I was hoping to place a plywood sheet on top of this base to use as a standing desk. However, it may need some more chopping to bring it to the right height.

 

[Image: the plywood tray]
The final project was an exercise for the participants to design and execute an item using a 2’ x 1’ piece of plywood. I chose to make a tray with straight edges using as much of the plywood as I could. I used the table saw to cut the base and sides. The smaller sides were tapered down and handles shaped out with a drill and jigsaw. These were glued together and then nailed firmly in place.

 

By the end of the 2nd day, I felt more confident handling the terrifying, but surprisingly safe, pieces of machinery. Identifying different types of wood or making an informed decision when selecting wood may need more practise and learning. The biggest challenge I think I will face if I do more of this is workspace. Like many other small families in urban India, I live high up in an apartment building, with limited space. This means that setting up an isolated area for a carpentry workbench would not only take up space but, without an enclosure, would also send enough particulate matter floating around the living area. For the near future, I do not expect to acquire any motorized tools, but rather a few manual tools that can be used to make small items (like storage boxes) with relative ease and very little disruption.

Testing Multilingual Applications – Talk Summary from Wikimania 2014

It's been a while since I managed to write something substantial on this blog. My blogging efforts these days are mostly concentrated around the day job, which, by the way, has been extremely enriching. I have had the opportunity to widen my perspective of global communities working in the multilingual digital world. People from diverse cultures come together in the Wikimedia projects to collaborate on social issues like education, digital freedom, tolerance of expression & lifestyles, health, equal opportunities, and more. What better place to see this happen than at Wikimania – the annual conference of the Wikimedia movement. The 10th edition of the conference was held this year in London, UK. I was fortunate to participate and also present, along with my team-mate Kartik Mistry. This was our first presentation at a Wikimania.

For the past few years, I have tried to publish the talking points from my presentations. This was my first major presentation in a long time. Kartik and I presented about the challenges we face every day when testing the applications that our team creates and maintains for the 300+ languages in Wikimedia projects. We have been working actively to make our testing processes better equipped to handle these challenges, and to blend them into our development workflow. The slides and the talking points are presented below. I will add the link to the video when it's available. Feedback is most welcome.

Talk Abstract

As opposed to traditional testing methodologies, an important challenge in testing internationalized applications is to verify the preciseness of the content delivered through them. When we talk about applications developed for different scripts and languages, key functionalities like display and input of content may require several levels of verification before an application can be signed off as adequately capable of handling a particular language. A substantial part of this verification process is visual verification of the text, which requires extensive collaboration between the language speakers and the developers. For content on Wikimedia this can stretch to more than 300 languages, for websites that are active or waiting in the incubator. In this session, we would like to present our current best practices and solutions like tofu-detection – a way to identify whether a script is not being displayed – that can narrow down the long-drawn manual testing stages to address specific problems as they are identified. Talk Submission.
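To give a flavour of the tofu-detection idea, here is a minimal sketch (not our actual tooling) that checks whether every codepoint of a sample string is covered by a font's character map, assuming the fontTools library is available; the font path and sample text are placeholders.

```python
# Minimal sketch of a tofu check: report characters of a sample string that
# a given font cannot display (i.e. are missing from its cmap table).
# The font path and the sample string below are illustrative placeholders.
from fontTools.ttLib import TTFont

def find_tofu(font_path, text):
    """Return the characters of `text` that the font has no glyph for."""
    font = TTFont(font_path)
    cmap = font["cmap"].getBestCmap()  # Unicode codepoint -> glyph name
    return [ch for ch in text if ord(ch) not in cmap]

missing = find_tofu("NotoSansBengali-Regular.ttf", "ইঁট")
if missing:
    print("Likely tofu for:", missing)
```

A check like this only tells us that a glyph exists; whether it actually renders correctly still needs the visual verification described below.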

Slides

Talk Summary

Slide 2

As we know, the mission of the Wikimedia Projects is to share the sum of all human knowledge. In the Wikimedia universe we have active projects in over 300 languages, while the multilingual resources have the capability to support more than 400 languages.

To use these languages, we rely on extra tools and resources all the time (sometimes even without our knowledge). But these are not as widely developed as we would like them to be.

You may know them already…

Slide 3

Fonts, input methods, dictionaries, the different resources used for spell checking, grammar, and everything else that is needed to address the special rules of a language – so that we can use it the same way we can use English in most systems.

Slide 4

The applications that we develop to handle multilingual content are tested the same way other applications are tested. Code sanity, functionality, and everything else that needs to be checked to validate the correctness of the application's design are tested during the development process. (Kartik described this in some detail.)

Slide 5

However, this is one part. The other part combines the language’s requirements to make sure that what gets delivered through the applications is what the language needs.

So the question we are trying to answer as developers is – my code works but does the content look good too?

Slide 6

At this point, what becomes important is a visual verification of the content. Are the t's being crossed and the i's being dotted, but in more complex ways?

Let's look at some examples here to help explain better what we are trying to say:

  • Example 1 and Example 2: Fonts entirely missing – displays tofu, or blocks
  • Example 3: Partially available text – makes it hard to understand what the user interface wants you to do
  • Example 4: The input method in VisualEditor doesn't capture the sequence of typed characters
  • Example 5: The otherwise functional braces lose their position when used with RTL text
  • Example 6: Dependent vowels in complex scripts appear broken with a particular font

Slide 13

There are always more interesting things that keep coming up. The takeaway is that we haven't yet found a way to escape manual tests when developing applications that are expected to handle multilingual content.

Slide 14

For now, what we have been trying to do is to make it less painful and more organised. Let's go over a checklist that we have been using as a guideline.

  1. Standard Tests – These are the tests that the developers do all the time: unit tests, etc. It's part of the development plan.
  2. Identify must-check items – Once you are through with the standard tests, try to identify the issues and checks that are most important for individual languages or for groups of languages of a similar type. For instance, in languages with complex scripts you may want to check some combinations that should never break (a minimal sketch of how such items could be organised appears after this list).
  3. Note the new and recurring bugs – This list should by no means be rigid. If problems recur during tests, or new bugs of major impact surface, add them to your set of must-checks so that you know they need to be tested again when you make the next release.
  4. Predictable regression tests – The idea is to keep the regression tests organised to some extent so that you don't miss the really important things.
  5. Ad-hoc testing – That said, by no means should the hunt for hidden bugs stop. Explore as far as you can. You may have to be a little careful, though, because you might find a really ugly bug and not remember how you ended up there. Retracing your steps can be a challenge, but that shouldn't be a major blocker – once you find it, note it down.
  6. Track the results – For precisely this purpose we keep the tests that we regularly want to run in a test tracking system. We use TestLink, where you can organise the tests, the steps a user should follow, and the expected results. Successes and failures can be noted, and tests can be repeated very easily across releases.
  7. Seek expert help – One of the most important things to keep in mind is to speak to native speakers of the language, and maybe to an expert if you are already a native speaker yourself. There may be situations where your understanding of a language will be challenged. For instance, ancient scripts may have to be tested for projects like Wikisource, and these may be unfamiliar even to regular users of the modern version of the script.
  8. Testing environments – Equally important, make sure you have stable testing environments in place where people can come and test the applications in their own time.
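As mentioned in item 2 above, here is a minimal sketch of how a per-language set of must-check items could be kept and walked through; the language codes and sample strings are illustrative placeholders, and the real cases live in the test tracking system rather than in code.

```python
# Illustrative sketch only: a tiny per-language collection of must-check
# strings and a loop that turns it into a manual checklist. The languages
# and samples are placeholders; the real test cases are kept in TestLink.
MUST_CHECK = {
    "bn": ["ক্ষ", "হ্ম"],   # conjuncts that should never render broken
    "ml": ["ക്ഷ"],          # complex-script combination to eyeball
    "ar": ["(مرحبا)"],      # braces around RTL text should keep position
}

def checklist(must_check):
    """Yield one human-readable line per item for a tester to verify."""
    for lang, samples in must_check.items():
        for sample in samples:
            yield f"[{lang}] verify rendering and input of: {sample!r}"

for line in checklist(MUST_CHECK):
    print(line)
```

Recurring or newly found bugs (item 3) simply become new entries in such a set, so they are not forgotten at the next release.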

So that’s all we are currently doing to keep things organised. However, we would also like to explore options that can cut down this Herculean effort.

Contact Us


We had a blooper moment when, during the presentation, we realised that the screenshot for Example 6 had been accidentally removed. We did not plan for it, but the audience got a glimpse of how manual tests can save the day on more serious occasions.

Planet FLOSS India – 10th anniversary

It's been 10 years since Planet FLOSS India (PFI) was started by Sayamindu and Sankarshan.

Over the years, many people have been syndicated on this planet, and we have thoroughly enjoyed reading about their journey from being individuals enthusiastic about a project to growing into successful professionals, entrepreneurs, and mature contributors with ideas that define much of what the Indian FLOSS world stands for today.

We wanted to celebrate this occasion with what PFI stands for best – blog posts. We would be super happy if, anytime during the next few weeks, you could write a post on your syndicated blog about being part of PFI all these years and how you would like to see the project evolve.

Do let us know if your blog has moved and we shall make the changes to get you back on the planet. Thanks for your support and do keep writing.

(Writing on behalf of the PFI team)

PFI is open for syndication all-year-round. So let us know if you would like your blog to be added to the planet.

Hum bolta ko bolta bolta hain…

The findings of the recently conducted survey of the languages of India, under the aegis of the People's Linguistic Survey of India, have been a talking point over the past few days. The survey results are yet to be published in their entirety, but parts have been released through the mainstream media. The numbers on scheduled and non-scheduled languages, scripts, and speakers are fascinating.

Besides the statistics from the census, this independent survey has identified languages which are spoken in remote corners of the country and by as few as 4 people. From some of the reports[1][2] that have been published, what one can gather is that there are ~780 languages and ~66 scripts presently in use in India, of which the North Eastern states of India have the largest per capita density of languages and contribute more than 100 (closer to ~200 if one sums things up). It has also emerged that in the last 50 years, ~250 languages have been lost, which I assume means that no speakers of these languages remain.

This and some other things have led to a few conversations around the elements of language diversity that creep into everyday Indian life. Things that we assume to be normal, yet are so diametrically different from monolingual cultures. To demonstrate, we picked names of acquaintances/friends/co-workers and put 2 or more of them together to find what was the common language for each group. In quite a few cases we had to settle on English as the only language a group of randomly picked people could converse in. Well, if one was born in (mostly) urban India anytime from the 1970s onwards (or maybe even earlier), this wouldn't be much of a surprise. The bigger cities have various degrees of cosmopolitan pockets. From a young age, people are dragged through these either as part of their own social circle (like school) or their parents'. Depending upon the location and social circumstances, English is often the first choice.

When, at age 10, I had to change schools for the very first time, I came home open-mouthed and narrated to my mother that in the new school the children spoke to each other in Bengali! Until that time, Bengali was the exotic language that was only spoken at home and was heard very infrequently on the telly on Sunday afternoons. The conservative convent school where I went was a melting pot of cultures, with students from local North East Indian tribes, Nepalis (both from India and Nepal), Tibetans, Chinese, Bhutanese, and Indians from all possible regions where Government and Armed Forces personnel are recruited from. Even the kid next door who went to the same school spoke in English with me at school and in Bengali at the playground in the evening.

The alternative would be the pidgin that people have to practice out of necessity. Like me and the vegetable vendor in the Sunday market. I don't know her language fluently enough to speak it (especially due to the variation in dialect), she probably hasn't even heard of mine, and we both speak laughable Hindi. What we use is part Hindi and part Marathi and a lot of hand movements to transact business. I do not know what I would do if I were living further south, where Hindi is spoken much less. But it would be fun to find out how that works.

An insanely popular comic strip has been running for the past year – Guddu and Gang, by Garbage Bin Studios. The stories are a throwback to our growing-up years in the late 80s and 90s and touch so many chords on a personal level. The conversations are in Hindi, but the script they use is English. Like so many thousands of other people, I have been following it and even purchased the book that came out. But maybe it wouldn't have been the same amount of fun if the script had been Devanagari. I don't read it fast enough. And no, in this case translating the text won't make any sense. There is Chacha Chaudhary for that. Or even Tintin comics. Thanks to Anandamela, most people my age have grown up reading Tintin and Aranyadeb (The Phantom) comics in Bengali. There also exist juicy versions of Captain Haddock's abuses.

Last year I gave a talk at Akademy touching on some of these aspects of living in a multi-cultural environment. TL;DR version: the necessities that require people to embrace so many languages – whether for sheer existence or at the fringes – and how we can build optimized software and technical content. For me, it's still an area of curiosity and learning, especially the balance between practical needs and cultural preservation.

** Note about the title: bolta – Hindi:’saying’, Bengali:’wasp’. Go figure!

Building a Standardized Colour Reference Set

Among the various conversations that happened over the last week, one that caught my attention was about having a standardized list of colours for translation glossaries. This has been on my mind for a long time, and I have often broached the subject in various talks. Besides the obvious primary hurdle of figuring out how best to create this list, what I consider more important is the way such a reference set ought to be presented. For word-based terminology, a standard mapping like:

key terms -> translated key terms (with context information, strongly recommended)

is easy to adopt.

However, for colours this becomes difficult to execute, for one very important reason. Colours have names which have been made to sound interesting with cultural or local references, nature (again, maybe local or widespread), popular themes, or general creativity. This makes them hard to translate. To translate colour names like ‘Salmon’ or ‘Bordeaux’ or <colour-of-sea-caused-by-local-mineral-in-the-water-no-one-outside-an-island-in-the-pacific-has-heard-of>, one has to be able to understand what they refer to, which may be hard if one has never come across the fish or the wine or the water. To work around that, I have been using a 2-step method for a while, which is probably what everyone does anyway (but never really talks about):

colour-name -> check actual colour -> create new name (unless of course it's some basic colour like Red)

So, a natural progression towards standardizing this would involve having the actual colour squeezed in somewhere as the context. Something along the lines of:

Colour Name -> Sample of the colour -> Colour Translation

like:

Salmon -> [salmon colour swatch] -> ইঁট

It would be good to have something like this set up for not just translation, but general reference.
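As a rough illustration of the three-column idea, here is a minimal sketch of how such a reference entry might be stored, with a hex value standing in for the visual colour sample; the hex code and the structure are assumptions made for the example, not part of any existing glossary.

```python
# Sketch of a colour reference entry: the hex sample carries the context
# that a plain name -> translation glossary cannot. Values are illustrative.
from dataclasses import dataclass

@dataclass
class ColourEntry:
    name: str         # source colour name, e.g. "Salmon"
    sample: str       # hex value standing in for the visual swatch
    translation: str  # translated or newly created name

GLOSSARY = [
    ColourEntry("Salmon", "#FA8072", "ইঁট"),
]

for entry in GLOSSARY:
    print(f"{entry.name}\t{entry.sample}\t{entry.translation}")
```

A shared file in this shape (name, sample, translation) could double up as the general reference mentioned above.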

Translation Sprint for Gaia

Last Saturday, i.e. 29th December 2012, we had a translation sprint for Ankur India with specific focus on Gaia localization. The last few weeks saw some volunteers introducing themselves to participate in translation and localization. Firefox OS seemed like a popular project with them, and one that was also easy to translate. However, the reigning confusion about the tool of choice was not easy to work around. The new translators were given links to the files they could translate and send over to the mailing list/mentor for review. Going back and forth in the review process was taking time, so we quickly decided on the mailing list on a date for a translation sprint. We used the IRC channel #ankur.org.in and gathered there from 11 in the morning to 4 in the evening. The initial hour was spent setting up the repository and deciding how we were going to manage the tasks between ourselves. Two of us had commit rights on the Mozilla Mercurial repository. Of the 5 translators, two participants were very new to translation work, so it was essential to help them with constant reviews. By the end of the second hour, we were string crunching fast and hard; translators were announcing which modules they were picking (after some initial oversight of this, probably due to all the excitement) and then pushing them into the Mercurial repository. We shut shop at closing time, but had a clear process in place that allowed people to continue their work and the communication over email. All it needed was an IRC channel and a fundamental understanding of the content translation and delivery cycle.

SUMMARY

Participants:

  • Biraj Karmakar (biraj),
  • Priyanka Nag (priyanka_nag),
  • Runa Bhattacharjee (arrbee/runa_b)
  • Samrat Bhattacharya (samratb),
  • Sayak Sarkar (sayak)

Translation Statistics:

  • At Mozilla Dashboard – 39% translated. (Does not include the files still being reviewed)

What worked:

  • Communication was live
  • Faster turnaround of translation -> reviews -> revision
  • Queries were resolved faster
  • Commits were immediately made into the repository
  • Workflow was established to ensure the committers were being notified of files ready to go into the repository
  • No overlapping of translation

What could have been nicer:

  • A simpler tool to track the translation, through *one* interface. (Discussed many times earlier, and comments can be directed to the earlier post)
  • Pre-decided work assignments to start things off (this was rather hastily put up)
  • More time

Follow up:

  • There is still more to do and the translation has to continue. Not just for Gaia, but for other projects as well.
  • A review session for all the translated content. Besides catching errors and omissions of various kinds, this can be of particular benefit to the new translators, who can gauge the on-screen context of the content that they had to translate blindly.

Mozilla L10n – Discussion Notes

During the recently held Language Summit at Pune, we got an opportunity to discuss a long-standing issue related to the localization process. Several discussions over various media have been happening constantly for the past couple of years, and yet clarity on the dynamics was sorely missing. A few months back a generic bug was also filed, which helped collate the points of these dispersed discussions.

Last week, we had Arky from Mozilla with us, who helped us get an insight into how things currently stand on the Mozilla localization front. Old hands like me, who have been working on the localization of Mozilla products for a long time (for instance, I started sometime around Firefox 1.x), had been initiated and trained to use the elaborate method of translation submissions using file comparison in the version control system. During each Firefox release, besides the core component there are also ancillary components like web-parts that need to be translated on other version control systems or through bugs. Thankfully there is now the shipping dashboard that lists some of these bits at one URL.

However, there have recently been quite a few announcements from various quarters about Mozilla products being made available for translation through several hosts/tools – Verbatim, Narro, Locamotion, even Transifex. Translators could gather files and submit translations via these tools, yet none of them deprecated the earlier method of direct submission to the servers through the version control systems. The matter was further compounded by a spate of translations coming in from new translators who were being familiarized with translation work at various local camps as part of Mozilla's community outreach programs.

During the above-mentioned session, we sought to find some clarity on this matter and also to understand the plans being made to reconcile the situation. First, we created a list of all the tools and translation processes that are presently active.

 

1. Direct Submission into Mercurial or SVN

2. https://localize.mozilla.org/ – aka ‘Verbatim‘, essentially an instance of Pootle running on a server hosted by Mozilla. Used to translate web-parts, snippets, SUMO content, etc.

3. mozilla.locamotion.org – Hosted by the http://translate.sourceforge.net group, and runs an advanced version of Pootle. Used to translate Firefox, main.lang, etc.

4. Narro – The Narro tool, which allows translation of Firefox, components of Thunderbird, Gaia, etc.

5. Pontoon Project – To localize web content. More details from the developers here.

6. Transifex – Primarily the Gaia project

7. Babelzilla – Mozilla plug-ins

There could be more beyond the above.

The reason given for the existence of all these tools is to allow translators to choose a tool that they are ‘comfortable with‘. This, however, gives rise to quite a few complications: syncing between tools that evidently provide duplicate platforms for some of the projects, and maintaining a trace of translations for the translation coordinators. Especially when direct submission into the VCS is still pretty much an open option for translators (and coordinators) who may not be aware of a parallel translation group working on the same project on another translation platform.

A new project called ELMO is aimed at rectifying this situation. It would host the top-level URI of the Mozilla localization project, with direct links to each language's home page. The home page intends to list the translation team details and URLs for the projects. However, there is one big difference that seemed apparent: unlike other translation projects, which provide one umbrella translation team, each of the Mozilla products can have different translation teams and coordinators, independent of each other. It may be a scalable solution for manpower management, but it leaves a big chance of products going out of sync in terminology and translation. However, it may be a good idea to wait and see how ELMO fixes the situation.

Meanwhile, a few action items were agreed upon during the discussion (thanks, Arky!). These were:

1. A page on the mozilla wiki listing *all* the translation tools/hosts that are active and the projects that they host

2. Follow up on the discussion bug for “Process Modification”

3. A request to have automatic merging of strings modified in the source content into the l10n modules in Mercurial (i.e. the strings identified via compare-locales). For instance, the comparison between the bn-IN and en-US modules for the Aurora branch can be found here (cron output); a rough sketch of this kind of key comparison appears after this list.

4. Explore the possibility of a consolidated project calendar for all the Mozilla l10n projects. (Reference comment here)
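To illustrate the kind of comparison mentioned in item 3, here is a rough sketch of finding keys that exist in an en-US .properties file but are missing from its bn-IN counterpart; the file paths are placeholders, and this is only the idea behind such a report, not the actual compare-locales implementation.

```python
# Rough sketch of comparing a source module with a locale module: list keys
# present in the en-US .properties file but missing from the translation.
# File paths are placeholders; this is not the real compare-locales tool.
def read_properties(path):
    """Parse a simple key=value .properties file into a dict."""
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(("#", "!")):
                continue
            key, _, value = line.partition("=")
            entries[key.strip()] = value.strip()
    return entries

source = read_properties("en-US/browser/chrome/browser.properties")
target = read_properties("bn-IN/browser/chrome/browser.properties")

missing = sorted(set(source) - set(target))
obsolete = sorted(set(target) - set(source))
print(f"{len(missing)} missing keys, {len(obsolete)} obsolete keys")
for key in missing:
    print("missing:", key)
```

A cron job producing this kind of report is roughly what the output linked in item 3 shows for the Aurora branch.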

As Arky mentioned during the discussion, there were plans already underway for implementation, and I am quite excited to wait and see how things go. Some blog posts or updates from the Mozilla L10n administration team would be really helpful, and I hope those come in thick and fast.

Attendees:

Arky
Amir Aharoni
Sayak Sarkar
Ankit Gadgil
Ani Peter
Sweta Kothari
Jaswinder Singh
Rajesh Ranjan
Shankar Prasad
Nilamdyuti Goswami
Shantha Kumar
Manoj Giri
Krishnababu Krothrapalli
…(Please leave a comment if I missed your name)