View Issue Details

ID: 0010768
Project: mantisbt
Category: localization
View Status: public
Last Update: 2011-08-05 02:30
Reporter: quasipedia
Assigned To: grangeway
Priority: none
Severity: major
Reproducibility: N/A
Status: closed
Resolution: fixed
Product Version: 1.2.0rc1
Summary: 0010768: [i18n] - Switching to gettext (POT/PO files)
Description

Hi, and congratulations on the awesome product.

I would like to propose that the language folder containing the translations follow the gettext convention, with .pot files (the string templates) and .po files (the translated strings).

This would allow users to complete or change translations on their own website "on the fly", using any of the many gettext-compliant editors.

Besides, I think it would be beneficial if MantisBT could choose a more translator-friendly platform. Translatewiki is nice, but the overall impression is one of "unfriendliness": I went there intending to contribute back to the community, but the impression I got was that letting me enter new translations would have been a concession TW granted me, rather than a contribution I was making to the community.

Just a thought... Mantis rocks! :)

Tags: No tags attached.

Relationships

related to 0010669 (new) Document localization process in Developers manual

Activities

vboctor

2009-07-26 23:57

manager   ~0022562

Changed category to Localization and assigned to Siebrand, who is our localization lead. He also happens to run TranslateWiki, so he would be the best person to address your feedback.

I generally prefer using standard approaches, and gettext was on our horizon a long time ago. However, translatewiki has worked really well for us. It would be fantastic if we could get the best of both worlds!

giallu

2009-07-27 03:10

reporter   ~0022565

AFAICT the strength of translatewiki is the huge community behind it (and of course Siebrand's great skills...), but I agree that most open source products, especially those in the desktop field, use gettext for translations, and translators are used to that system.

I think moving to gettext could also solve the problem of strings being scattered between core and plugins.

siebrand

2009-07-27 06:54

reporter   ~0022566

Translatewiki.net supports gettext export, including translation hints, and you can submit the resulting updated translations by e-mail. This is a non-issue as far as I'm concerned; we can leave things as they are.

quasipedia

2009-07-27 07:31

reporter   ~0022567

Wow, I wish the company I work for could be as fast as you are in giving feedback on issues! :)

I'm happy to know that the present system is highly compatible with the gettext standard. I still believe, though, that the present situation can be vastly improved with these two simple steps:

  1. Keep the TW <--> POT/PO conversions behind the scenes (i.e. the devs could use whatever system works best for them to get the translations done, but the distributed version of Mantis should ship .po files, as per the standard).

  2. Update the explanation file in the language directory with more extensive directions, for example on how the "translation by e-mail" feature works. Indeed, if the files were POT/PO, the instructions could be as simple as "If you update / correct / improve translation files, you can send your updated PO file to translate@example.com".

vboctor

2009-07-27 07:50

manager   ~0022568

What @quasipedia is requesting is similar to issue 0010669, which refers to adding more documentation about the translation process to the manual.

There is some discussion about gettext in 0004227.

grangeway

2009-07-28 13:21

reporter   ~0022585

I've been looking at the MediaWiki logic a bit recently (I had a brief conversation with siebrand about it last week and plan to send a mail on this point soon), partly with a view to converting language strings to an array as a first step.

However, some notes:
a) RE giallu's comment:
"I think moving to gettext could also solve the problem of strings being scattered between core and plugins."

We would need either to support multiple .po files or to build a single .po file as part of the install, for the simple reason that if I write a plugin that's not part of core and define a string 'fred', how would the .po file we distribute get updated to contain 'fred'?

Therefore, the difference we are talking about here is whether we have multiple .txt/.php files or multiple .po files.
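To make the "build it as part of the install" option concrete, here is a minimal sketch that merges the core strings with each plugin's strings into one table per language. It assumes that catalogs are plain PHP arrays keyed by string name. `lang_merge_catalogs` is a hypothetical helper, and the rule that core strings win on conflicts is an illustrative assumption, not MantisBT behaviour.

```php
<?php
// Hypothetical sketch: consolidate core strings and each plugin's strings
// into one lookup table per language (e.g. when building at install time).
// Assumption: core strings win on conflicts; plugins can only add new keys.
function lang_merge_catalogs( array $core, array $plugin_catalogs ): array {
    $merged = $core;
    foreach ( $plugin_catalogs as $strings ) {
        // The array union operator keeps existing keys and only adds new ones.
        $merged += $strings;
    }
    return $merged;
}

$core = [ 'close_bug' => 'Close', 'wilma' => 'Core Wilma' ];
$plugins = [
    'MyPlugin' => [ 'fred' => 'Fred', 'wilma' => 'Plugin Wilma' ],
];
print_r( lang_merge_catalogs( $core, $plugins ) );
// 'fred' is added from the plugin; 'wilma' keeps the core value.
```

Whether a plugin should be allowed to override a core string is a policy decision. Swapping the operands of the union would let plugins win instead.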

b)

If we did use .po files, given that gettext is a separate, optional PHP module, I'd personally want a fallback to the legacy files for systems that don't have the gettext PHP module installed.

Therefore, from our point of view, the reason for switching to gettext would purely be performance optimisation.

Therefore, I'd be more inclined to look at the recent MediaWiki logic changes. Siebrand knows the guy on IRC behind them, and translatewiki is aligned with MediaWiki, so I guess there might be some benefit (even if it's just for Siebrand) in aligning ourselves along that route.

quasipedia

2009-07-28 16:02

reporter   ~0022586

Last edited: 2009-07-28 16:06

I am not sure I managed to follow grangeway completely, but if I understand correctly, there are two concerns: 1) multiple .po files and 2) legacy code.

I think both problems can be elegantly and effectively solved by adopting a solution "à la Drupal" (http://drupal.org). For those of you who might not know Drupal, here's a short explanation of the logic.

There is a core function t() in Drupal that translates a string. You do not have to declare a string as a variable to get it translated; you simply wrap it with t(), as in "return t('This bug has been closed');".

t() also accepts optional parameters for strings with variable content, so you can write "t('Bug number !number has been closed on date !date', array('!number' => $bnum, '!date' => $date));". This makes it possible not only to include the data in the translated string, but also to place it at the right point in the sentence.
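Here is a minimal sketch of a t()-style helper. It assumes the current language's catalog is a plain PHP array; `$g_catalog` is hypothetical, and the French string is only an illustration. This is not Drupal's code: Drupal's real t() also treats `!`, `@` and `%` placeholders differently when escaping values, which this sketch ignores.

```php
<?php
// Hypothetical catalog for the user's language (French, for illustration):
// source string => translated string.
$g_catalog = [
    'Bug number !number has been closed on date !date'
        => 'Le bogue numéro !number a été fermé le !date',
];

// Minimal t()-style helper: translate first, then substitute placeholders.
function t( string $text, array $args = [] ): string {
    global $g_catalog;
    // Fall back to the untranslated source string if there is no entry.
    $translated = $g_catalog[$text] ?? $text;
    // strtr() replaces every placeholder key (e.g. '!number') with its value,
    // so each translation can put the values wherever its grammar needs them.
    return $args ? strtr( $translated, $args ) : $translated;
}

echo t( 'Bug number !number has been closed on date !date',
        [ '!number' => '42', '!date' => '2009-07-28' ] ), "\n";
// prints: Le bogue numéro 42 a été fermé le 2009-07-28
echo t( 'This bug has been closed' ), "\n";
// prints: This bug has been closed (no translation available)
```

Substituting after the lookup is the key design choice. The catalog keys stay stable, and translators can reorder the placeholders freely.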

t() is in the core. It first checks whether there is a "lang" directory in the directory of the module that t() was called from. If so, it checks whether there is a .po file for the language selected by the user. If so, it finally looks up the translation.

If any of these conditions is false, the function returns the original, untranslated string.
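The lookup chain just described can be sketched as follows. This assumes a hypothetical `<module_dir>/lang/<language>.po` layout and only understands single-line `msgid`/`msgstr` pairs. `read_po_catalog` and `module_t` are illustrative names, not Drupal or MantisBT functions. The sketch parses the .po file directly instead of relying on PHP's optional gettext extension.

```php
<?php
// Assumption: entries are single-line msgid "..." / msgstr "..." pairs
// (no multi-line strings, message contexts or plural forms).
function read_po_catalog( string $po_file ): array {
    $catalog = [];
    $msgid = null;
    foreach ( file( $po_file, FILE_IGNORE_NEW_LINES ) as $line ) {
        $line = rtrim( $line );  // tolerate Windows line endings
        if ( preg_match( '/^msgid "(.*)"$/', $line, $m ) ) {
            $msgid = stripcslashes( $m[1] );
        } elseif ( $msgid !== null && preg_match( '/^msgstr "(.*)"$/', $line, $m ) ) {
            $msgstr = stripcslashes( $m[1] );
            // Skip the header entry (empty msgid) and untranslated strings.
            if ( $msgid !== '' && $msgstr !== '' ) {
                $catalog[$msgid] = $msgstr;
            }
            $msgid = null;
        }
    }
    return $catalog;
}

// Hypothetical layout: <module_dir>/lang/<language>.po
function module_t( string $module_dir, string $language, string $text ): string {
    $lang_dir = $module_dir . '/lang';
    $po_file = $lang_dir . '/' . $language . '.po';
    // No lang directory, or no .po file for this language: return the original.
    if ( !is_dir( $lang_dir ) || !is_file( $po_file ) ) {
        return $text;
    }
    $catalog = read_po_catalog( $po_file );
    // No entry for this string: return the original as well.
    return $catalog[$text] ?? $text;
}
```

Each missing step degrades to the untranslated string rather than an error. This mirrors the fallback behaviour described in the comment.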

As you can see, this solution addresses both of the concerns grangeway raised. Multiple files are organised logically and can coexist peacefully. Legacy code that does not use t() is simply a different case that does not bother the core infrastructure and does not raise exceptions.

From a programmer's point of view, that means the end of all problems.

Given the standard syntax of t(), it is possible to use a script to automatically generate the .pot files for any given code (which is what Drupal does).
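As a rough illustration of such a script, the sketch below scans PHP source for `t('...')` calls and emits a POT body. It assumes single-quoted literals without embedded quotes. `extract_pot` is a hypothetical name, and a real POT file would also carry a header entry. Dedicated tools such as GNU xgettext cover many more cases.

```php
<?php
// Rough sketch: build .pot entries from t() calls. Assumption: the first
// argument is a single-quoted literal with no quote characters inside;
// concatenation, double-quoted strings and escaped quotes are not handled.
function extract_pot( string $php_source ): string {
    // Captures the literal in t('...'); \b prevents matches inside names
    // that merely end in "t", such as format(.
    preg_match_all( '/\bt\(\s*\'([^\']*)\'/', $php_source, $matches );
    $pot = '';
    foreach ( array_unique( $matches[1] ) as $msgid ) {
        // Escape backslashes and double quotes for PO string syntax.
        $escaped = addcslashes( $msgid, '\\"' );
        $pot .= "msgid \"$escaped\"\nmsgstr \"\"\n\n";
    }
    return $pot;
}

$source = <<<'SRC'
return t('This bug has been closed');
echo t('Bug number !number has been closed on date !date', array('!number' => $bnum, '!date' => $date));
echo t('This bug has been closed');
echo format('not a translatable string');
SRC;

echo extract_pot( $source );
```

For the sample source this yields two entries, one per distinct string, and the format() call is skipped.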

Cherry on the cake: Drupal is free software (GPLv2) written in PHP, so if the core developers wanted to adopt such a feature, they could easily port the relevant bits of code from Drupal (which is, by the way, an extremely well designed and well written piece of software).

PS: Drupal goes the extra mile and also comes with a caching mechanism, so it doesn't have to look up the strings every time t() is called. In any case, the performance hit of t() on a site like Mantis would be minimal compared to a rich CMS like Drupal.

HTH,
Mac.

quasipedia

2009-07-28 16:17

reporter   ~0022587

It's been a while since I hacked on Drupal core, but thinking about it again, I seem to remember that t() actually "imports" all the strings from the .po files into a DB table, consolidating all the strings for a given language in one table...

...I'm not quite sure whether I imagined that or really saw it in the code. Regardless, Mantis could definitely do so if performance is a concern.

grangeway

2009-07-28 16:52

reporter   ~0022589

You did kind of miss my point.

The shortened version is:

  1. You need multiple .po files for each plugin (just as you need multiple .txt files atm). Drupal appears to do this too; for example, see the translations folder in the views module: http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/views/translations/

  2. gettext support is an optional module in PHP, so I was making the point that we'd need to keep .txt files for people who don't have gettext available. However, having looked at Drupal's code, it seems they don't use PHP's gettext support and instead parse the file manually (http://cvs.drupal.org/viewvc.py/drupal/drupal/includes/locale.inc?view=markup).

In which case, I prefer MediaWiki's previous approach. If we wanted performance, their new approach has potential (it uses a cdb-style database, http://cr.yp.to/cdb.html), but that's probably going a bit too far.

Either way, Mantis already localises strings (via lang_get), so the format of the data file is largely irrelevant to most people. I'm quite confident that anyone who has a gettext editor can also find a text editor...

Is there actually anything that .po would 'fix' that is broken atm?

Paul

quasipedia

2009-07-28 20:16

reporter   ~0022591

Hi!

  1. Now I get what you mean by "multiple .po's", but I still don't see what the problem with that approach is. Can you explain?

  2. Any pointer/link to "mediawiki's previous approach"? :)

As for "broken atm": I believe I filed this under "feature", not "bug". If I did not, apologies; this was meant to be a feature request, not a bug report.

As for gettext editor vs. text editor, I can't really take that seriously... but I acknowledge that you are the developers and I am not, so ultimately you can do whatever keeps you happier! :)

grangeway

2009-07-29 15:25

reporter   ~0022602

The new stuff is spread between at least:

http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/includes/LocalisationCache.php
http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/includes/Cdb.php
http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/includes/Cdb_PHP.php
http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/languages/

dhx

2010-01-18 23:07

reporter   ~0024196

gettext is not thread-safe, so we can't use it without forcing people to run PHP in FastCGI mode.

Thus we have to implement our own array-based translations (it's not that much slower than gettext from benchmarks I've seen).

One thing we do need to work out with a new format is how to handle plurals. See http://www.gnu.org/software/hello/manual/gettext/Plural-forms.html for a bit of a primer on how this works with GNU gettext.

I imagine our implementation may look something like this:

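function lang_get_plural( $p_string, $p_number, $p_language ) {
    global $g_lang;
    switch( $p_language ) {
        case 'english':
            $t_form = ( $p_number != 1 ) ? 1 : 0;
            break;  # without break, execution would fall through to the next case
        case 'french':
            $t_form = ( $p_number > 1 ) ? 1 : 0;
            break;
        # ... other languages ...
        default:
            # assumption: fall back to the English rule
            $t_form = ( $p_number != 1 ) ? 1 : 0;
    }
    # fill in the %d placeholder of the selected form
    return sprintf( $g_lang[$p_string][$t_form], $p_number );
}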

where $g_lang looks something like this for English:

$g_lang = array(
'number_files' => array(
'1 file',
'%d files'
)
);

And for another language like Polish with more complex plural rules:

$g_lang = array(
'number_files' => array(
'1 plik',
'%d pliki',
'%d plików'
)
);
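Here is a runnable sketch of how a rule table could drive these arrays. The plural expressions are the ones the GNU gettext manual lists for English, French and Polish. `$g_plural_rules`, `$g_lang_tables` and `lang_plural_lookup` are hypothetical names, and the arrow functions assume PHP 7.4 or later.

```php
<?php
// Hypothetical per-language plural rules (expressions as listed in the GNU
// gettext manual), each mapping a count to an index into the forms array.
$g_plural_rules = [
    'english' => fn( int $n ): int => $n != 1 ? 1 : 0,
    'french'  => fn( int $n ): int => $n > 1 ? 1 : 0,
    'polish'  => fn( int $n ): int => $n == 1 ? 0
        : ( ( $n % 10 >= 2 && $n % 10 <= 4 && ( $n % 100 < 10 || $n % 100 >= 20 ) ) ? 1 : 2 ),
];

// String tables in the shape proposed above, keyed by language.
$g_lang_tables = [
    'english' => [ 'number_files' => [ '1 file', '%d files' ] ],
    'polish'  => [ 'number_files' => [ '1 plik', '%d pliki', '%d plików' ] ],
];

function lang_plural_lookup( string $key, int $n, string $language ): string {
    global $g_plural_rules, $g_lang_tables;
    $form = $g_plural_rules[$language]( $n );
    // sprintf() fills in %d; forms without a placeholder ignore the extra argument.
    return sprintf( $g_lang_tables[$language][$key][$form], $n );
}

foreach ( [ 1, 2, 5, 12, 22 ] as $n ) {
    echo lang_plural_lookup( 'number_files', $n, 'polish' ), "\n";
}
```

For Polish this prints 1 plik, 2 pliki, 5 plików, 12 plików and 22 pliki. The numbers 12 to 14 take the third form even though they end in 2 to 4.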

siebrand

2010-11-01 07:33

reporter   ~0027207

Set to resolved, although this may be an incorrect resolution, as the issue summary was not what was resolved. Instead of PHP variables, MantisBT now uses PHP arrays with key-value pairs. See:

http://git.mantisbt.org/?p=mantisbt.git;a=blob;f=lang/strings_english.txt;h=ae5e2710e743acd8b1fd59c6ee8132b5856ed256;hb=ec53b72a0b4c201a90ec191ea3e837ef7414822b