Internationalization & Localization Tips – Concatenation
Following my previous article, where I talked about Plurals, this week I’ll bring you Concatenation.
Thou shalt not concatenate!
Concatenation is not your friend... believe me…
I’ll split this subject into two parts:
Concatenating to create (or modify) a word
Many languages use the letter “S” at the end of a noun to make it plural, especially with regular nouns. In English, for example, we have one cat or many cats, one apple or many apples, and I could easily waste a good chunk of this article adding more examples.
This is sometimes misused: a letter “S” is concatenated as the only means of turning a word into its plural form.
Well… it actually works in some cases… if you never expect to translate and you always know which word you will be pluralizing. But hey, this is about internationalization ;) so don’t expect a regular English noun to be regular in other languages, and don’t even expect an "S" to be what marks the plural in other languages.
So, to make this crystal clear: if you want to display a noun in singular and plural forms, create a separate string for each form so that it works in other languages. (Psst! And if you are thinking about plurals, you should also read my previous article, if you haven’t yet.)
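Here is a minimal Python sketch of the difference. The `MESSAGES` catalog, its key names and the `noun_for` helper are hypothetical, made up for this illustration rather than taken from any real library, and the two-form (one/other) split is itself a simplification, since some languages need more plural forms than that.

```python
def naive_plural(noun: str, count: int) -> str:
    """Anti-pattern: builds the plural by concatenating an "s"."""
    return noun if count == 1 else noun + "s"


# Hypothetical catalog: one complete string per form, per language.
MESSAGES = {
    "en": {"apple.one": "apple", "apple.other": "apples"},
    "de": {"apple.one": "Apfel", "apple.other": "Äpfel"},
}


def noun_for(lang: str, key: str, count: int) -> str:
    """Pick the translated string for the form that matches the count."""
    form = "one" if count == 1 else "other"
    return MESSAGES[lang][f"{key}.{form}"]


print(naive_plural("apple", 2))    # apples (happens to work in English)
print(naive_plural("Apfel", 2))    # Apfels (not the German plural)
print(noun_for("de", "apple", 2))  # Äpfel
```

With separate strings, the translator decides what the plural looks like, and the code never has to know how a given language builds it.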
Concatenating to create a sentence (or string)
Let me introduce you to the concept of sentence structure:
SVO (Subject-Verb-Object) and SOV (Subject-Object-Verb) are the most common sentence structures, but there are others (you can read more about it here).
Now think of a string that is built dynamically, like the ones you read in your Facebook or LinkedIn feeds, indicating that someone liked your post:
John liked your article Internationalization & Localization Tips – Plurals
In code, this might look something like this:
{user} + {action} + “your” + {object} + {subject}
{user} is the name of the user doing the action (this is a correct usage of the variable).
{action} in our example is “liked” but it could be “commented” or another action, depending on what the system allows you to do.
{object} is “article” in my example and it could be “document”, “post” and a lot more, depending, again, on the system where it shows up.
{subject} is the title of the content, and it’s fine that we use a variable to display it.
Let’s put on our translator hat, and we’ll get a list of single items to translate (going by the examples above):
Your, liked, commented, article, document, post.
“Your”, in Spanish, for example, could be translated as “su” or “suyo”, depending on context.
“Liked” and “Commented” could each be either a participle or a past tense, so don’t assume they will translate the same way in every language.
“Article”, “Document” and “Post” could each be either a verb or a noun. Same issue: they can translate in different ways depending on the target language.
Yes, we could be lucky, and match the right options for all those… but there’s Murphy’s Law too.
When we create a sentence programmatically, we move the linguistic structure into the code, potentially making translations read like machine translation (and sometimes even worse). Translations are not a word-by-word replacement!
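As a rough sketch of the alternative, the whole sentence can live in one translatable string, with named placeholders only for the values we can't know in advance. The `TEMPLATES` catalog, the `liked.article` key and the `render` helper below are hypothetical names for this example, and the Spanish sentence is just one possible translation.

```python
# Hypothetical catalog: one complete sentence per language, so the translator
# controls word order and can add or drop words around the placeholders.
TEMPLATES = {
    "en": {"liked.article": "{user} liked your article {title}"},
    "es": {"liked.article": "A {user} le gustó tu artículo {title}"},
}


def render(lang: str, key: str, **params: str) -> str:
    """Look up the full sentence, then fill in only the unknown values."""
    return TEMPLATES[lang][key].format(**params)


title = "Internationalization & Localization Tips – Plurals"
for lang in ("en", "es"):
    print(render(lang, "liked.article", user="John", title=title))
```

Note that the Spanish sentence starts with a word that has no counterpart in the English one, which is exactly the kind of change a concatenated string would never allow.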
On top of that, when we concatenate, the translator has no way to know whether we want to say "Click here to turn" or "Click here to turn OFF". (Note that I imagined ON/OFF being concatenated at the end of the string; the first part on its own, without knowing what comes after it, or even that something comes after it, has a totally different meaning.)
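For the ON/OFF case, a small sketch of the fix could be two complete strings, one per state, instead of a shared prefix plus a suffix. The key names and the `toggle_label` helper are hypothetical, chosen only for this example.

```python
# Each state gets its own full string, so the translator sees the whole meaning.
STRINGS = {
    "toggle.turn_on": "Click here to turn ON",
    "toggle.turn_off": "Click here to turn OFF",
}


def toggle_label(currently_on: bool) -> str:
    """Return the complete label for the action that is available right now."""
    key = "toggle.turn_off" if currently_on else "toggle.turn_on"
    return STRINGS[key]


print(toggle_label(True))   # Click here to turn OFF
print(toggle_label(False))  # Click here to turn ON
```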
Sometimes the partial strings that we use to build a complete string through concatenation mean something different on their own than they do as part of the bigger sentence. Not to mention that sometimes they are total nonsense.
Concatenation (with some logic) helps us keep our code clean (a programming best practice), but that simply doesn't work with I18N. Think of activity streams, where we mostly say:
{actor} + {action} + [{objectType}] + {objectSubject} + [“in” {place}] + [“on” {date}]
(square brackets [] indicate optional parts of the sentence)
That is a string with a bit of logic behind it to determine who did it, what they did, and where and when it was done. We also have a few sub-strings with the available actions (created, modified, commented, replied, liked, marked, voted, joined, etc.) and object types (document, discussion, question, blog post, comment, poll, etc.).
In a perfect I18N world you'll have a separate string for every possible combination, using variables only for the elements we can't know beforehand (actor, place and date).
Activity streams are particularly tricky for this, and the only way to get natural language with decent I18N is to spell out every combination, which will definitely increase translation cost and which developers will surely not like much… Taking just the actions and object types given as examples above (and forgetting about the conditional parts of the string), 8 actions times 6 object types means we need 48 strings to cover all possible options.
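To see where the 48 comes from, here is a short Python sketch that enumerates one catalog key per action and object type pair, using the lists from the examples above. The `activity.<action>.<object>` key format is a hypothetical naming convention, not something any particular framework requires.

```python
from itertools import product

ACTIONS = ["created", "modified", "commented", "replied",
           "liked", "marked", "voted", "joined"]
OBJECT_TYPES = ["document", "discussion", "question",
                "blog post", "comment", "poll"]

# One key per (action, object type) pair; each key would map to a complete,
# separately translated sentence with placeholders for actor, place and date.
keys = [
    f"activity.{action}.{object_type.replace(' ', '_')}"
    for action, object_type in product(ACTIONS, OBJECT_TYPES)
]

print(len(keys))  # 48
print(keys[0])    # activity.created.document
```

Generating the list of keys is the cheap part; every one of those entries still needs a human-written sentence in each target language.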
Sometimes it’s best to modify the UI so that we show the same info without natural language. That gives us more freedom to use substrings on their own, though it is still challenging.
This article continues a series I started writing with the goal of bringing different Internationalization and Localization topics to the general public. Stay tuned for the next one in a few days.
Please comment if you liked the article or have any questions, and share with your Dev friends to save your I18N guy some headaches ;)
Thanks for reading!
Very insightful article Daniel, I like that you establish that "Translations are not a word-by-word replacement!". People, and in particular developers for i18n purposes, need to be reminded of that because of course that's what makes it hard for all of us, but that's also what makes it challenging, interesting, and ultimately satisfactory.
Thanks for this article. I believe "your" represents a much bigger problem than you gave it credit for, due to plural AND gender needs. I find the use of possessive determiners and modifiers in English strings one of the main I18N issues. It might deserve a separate article :)