Programming Multi-Language PHP Applications

Best practice multi language website

Topic's premise

There are three distinct aspects in a multilingual site:

  • interface translation
  • content
  • url routing

While they all interconnected in different ways, from CMS point of view they are managed using different UI elements and stored differently. You seem to be confident in your implementation and understanding of the first two. The question was about the latter aspect - "URL Translation? Should we do this or not? and in what way?"

What the URL can be made of?

A very important thing is, don't get fancy with IDN. Instead favor transliteration (also: transcription and romanization). While at first glance IDN seems viable option for international URLs, it actually does not work as advertised for two reasons:

  • some browsers will turn the non-ASCII chars like 'ч' or 'ž' into '%D1%87' and '%C5%BE'
  • if user has custom themes, the theme's font is very likely to not have symbols for those letters

I actually tried to IDN approach few years ago in a Yii based project (horrible framework, IMHO). I encountered both of the above mentioned problems before scraping that solution. Also, I suspect that it might be an attack vector.

Available options ... as I see them.

Basically you have two choices, that could be abstracted as:

  • http://site.tld/[:query]: where [:query] determines both language and content choice

  • http://site.tld/[:language]/[:query]: where [:language] part of URL defines the choice of language and [:query] is used only to identify the content

Query is Α and Ω ..

Lets say you pick http://site.tld/[:query].

In that case you have one primary source of language: the content of [:query] segment; and two additional sources:

  • value $_COOKIE['lang'] for that particular browser
  • list of languages in HTTP Accept-Language (1), (2) header

First, you need to match the query to one of defined routing patterns (if your pick is Laravel, then read here). On successful match of pattern you then need to find the language.

You would have to go through all the segments of the pattern. Find the potential translations for all of those segments and determine which language was used. The two additional sources (cookie and header) would be used to resolve routing conflicts, when (not "if") they arise.

Take for example: http://site.tld/blog/novinka.

That's transliteration of "блог, новинка", that in English means approximately "blog", "latest".

As you can already notice, in Russian "блог" will be transliterated as "blog". Which means that for the first part of [:query] you (in the best case scenario) will end up with ['en', 'ru'] list of possible languages. Then you take next segment - "novinka". That might have only one language on the list of possibilities: ['ru'].

When the list has one item, you have successfully found the language.

But if you end up with 2 (example: Russian and Ukrainian) or more possibilities .. or 0 possibilities, as a case might be. You will have to use cookie and/or header to find the correct option.

And if all else fails, you pick the site's default language.

Language as parameter

The alternative is to use URL, that can be defined as http://site.tld/[:language]/[:query]. In this case, when translating query, you do not need to guess the language, because at that point you already know which to use.

There is also a secondary source of language: the cookie value. But here there is no point in messing with Accept-Language header, because you are not dealing with unknown amount of possible languages in case of "cold start" (when user first time opens site with custom query).

Instead you have 3 simple, prioritized options:

  1. if [:language] segment is set, use it
  2. if $_COOKIE['lang'] is set, use it
  3. use default language

When you have the language, you simply attempt to translate the query, and if translation fails, use the "default value" for that particular segment (based on routing results).

Isn't here a third option?

Yes, technically you can combine both approaches, but that would complicate the process and only accommodate people who want to manually change URL of http://site.tld/en/news to http://site.tld/de/news and expect the news page to change to German.

But even this case could probable be mitigated using cookie value (which would contain information about previous choice of language), to implement with less magic and hope.

Which approach to use?

As you might already guessed, I would recommend http://site.tld/[:language]/[:query] as the more sensible option.

Also in real word situation you would have 3rd major part in URL: "title". As in name of the product in online shop or headline of article in news site.

Example: http://site.tld/en/news/article/121415/EU-as-global-reserve-currency

In this case '/news/article/121415' would be the query, and the 'EU-as-global-reserve-currency' is title. Purely for SEO purposes.

Can it be done in Laravel?

Kinda, but not by default.

I am not too familiar with it, but from what I have seen, Laravel uses simple pattern-based routing mechanism. To implement multilingual URLs you will probably have to extend core class(es), because multilingual routing need access to different forms of storage (database, cache and/or configuration files).

It's routed. What now?

As a result of all you would end up with two valuable pieces of information: current language and translated segments of query. These values then can be used to dispatch to the class(es) which will produce the result.

Basically, the following URL: http://site.tld/ru/blog/novinka (or the version without '/ru') gets turned into something like

$parameters = [
'language' => 'ru',
'classname' => 'blog',
'method' => 'latest',
];

Which you just use for dispatching:

$instance = new {$parameter['classname']};
$instance->{'get'.$parameters['method']}( $parameters );

.. or some variation of it, depending on the particular implementation.

Methods for multi-language web applications in PHP

Use the printf()/sprintf() family of functions.

printf, sprintf, vsprintf, (and so on...) functions are used for formatting strings and as such they enable you to utilize a pretty standardized set of placeholder logic.


What you want to have stored in your database is something like this:

%s has decapitated %s.

In PHP you can format this using, for example, the printf() function (which directly outputs the result) or the sprintf() function (which returns the result).

$translated = sprintf("%s has decapitated %s.", "red", "blue");
echo $translated;

red has decapitated blue.

If you need to specify the order of the arguments passed in, you can do it by specifying the position. Ie. in english $format = "%1$s has decapitated %2$s." and in some other language something like $format = "%2$s has been decapitated by %1$s.".

You can use this when you want to have different order of inserted words, but you want to keep the order same in your source code.

Both of the these $format strings will be correctly formatted via the same sprintf($format, "red", blue") call:

  • red has decapitated blue.
  • blue has been decapitated by red.

Possible formatting options are nicely presented here: http://php.net/manual/en/function.sprintf.php

With regards to creating multi-language site

There is a normalized database approach.

a) One table for keeping language information with columns (id, languageName)

b) In the ad table you would add a column for language id

Then when you would search for the language Mandarin you would query the ad table with either the "WHERE languageID = 2".

Now when you save the post you add the language id for the corresponding language.

Best practice for multi-lingual structure of a PHP web app

Everybody has different opinions about how best to go about setting up an internationally-friendly website. However, I try not to reinvent the wheel by making my own system. Rather, I use the built in internationalisation and localisation tools in frameworks such as CakePHP.

From the CakePHP book;

One of the best ways for your applications to reach a larger audience is to cater for multiple languages. This can often prove to be a daunting task, but the internationalization and localization features in CakePHP make it much easier.

First, it’s important to understand some terminology. Internationalization refers to the ability of an application to be localized. The term localization refers to the adaptation of an application to meet specific language (or culture) requirements (i.e., a "locale"). Internationalization and localization are often abbreviated as i18n and l10n respectively; 18 and 10 are the number of characters between the first and last character.

http://book.cakephp.org/1.3/view/1228/Internationalization-Localization

Using the built-in tools, for me, offers an efficient way to translate applications without URL rewrites. It also means that a user can configure their localisation preferences and have them automatically applied every time they log in.

Such a method will also be considered more search-engine friendly because you won't get multilingual duplicates of the same content.

Hope this helps out.

Best system for multiple language support in PHP?

Similar Questions:

  • what is the best method to build “multilingual” script in php?
  • Programming Multi-Language PHP applications

Here's a benchmark comparing gettext with string-array: Benchmarking PHP Localization - Is gettext fast enough?

How would you transform a pre-existing web app into a multilingual one?

There are a number of ways of tackling this. None of them "the best way" and all of them with problems in the short term or the long term.
The very first thing to say is that multi lingual sites are not easy, translators and lovely people but hard to work with and most programmers see the problem as a technical one only. There is also another dimension, outside the scope of this answer, as to whether you are translating or localising. This involves looking at the target audiences cultural mores and then tailoring language, style, layout, colour, typeface etc., to that culture. Finally do not use MT, Machine Translation, for anything serious or if it needs to be accurate and when acquiring translators ensure that they are translating from a foreign language into their native language which means that they understand all the nuances of the target language.

Right. Solutions. On the basis that you do not want to rewrite the site then simply clone the site you have and translate the copies to the target language. Assuming the code base is stable you can use a VCS to manage any code changes. You can tweak individual parts of the site to fit the target language, for example French text is on average 30% larger than the equivalent English text so using one site to deliver this means you may (will) have formatting problems and need to swap a different css file in and out depending on the language. It might seem a clunky way to do it but then how long are the sites going to exist? The management overhead of doing it this way may well be less than other options.

Second way without rebuilding. Replace all content in the current site with tags and then put the different language in file or db tables, sniff the users desired language (do you have registered users who can make a preference or do you want to get the browser language tag, or is it going to be URL dot-com dot-fr, dot-de that make the choice) and then replace the tags with the target language. Then you need to address the sizing issues and the image issues separately. This solution is in effect when frameworks like Symfony and Zend do to implement l10n.

Then you could rebuild with a framework or with gettext and possibly have a cleaner solution but remember frameworks were designed to solve other problems, not translation and the translation component has come into the framework as partial solution not the full one.

The big problem with all the solutions is ongoing maintenance. Because not only do you have a code base but also multiple language bases to maintain. Unless you all in one solution is really clever and effective then to ongoing task will be difficult.



Related Topics



Leave a reply



Submit