Upgrading to a Mantis UTF8
Author: Alexey Chumakov
Introduction
This HOWTO describes the issues involved in upgrading from 1.0.x releases (or older ones) to 1.1.0x releases. Starting with the 1.1.0x releases, Mantis uses UTF-8 for its language files and for the data saved to and read from the database. This Wiki topic is based on an email sent by Alexey Chumakov to the Mantis mailing lists.
Details
AFAICR, the upgrade routine shown below will be sufficient.
Why hasn't it been implemented yet? Although I am experienced in localization, I am not a PHP programmer (though I can read and understand PHP code), so we need somebody to help with the actual implementation :)
Going further (somewhat later), we could consider learning from the phpBB team's experience and UTF-8 code (phpBB 3, now in beta, will have one of the cleanest UTF-8 libraries I know). If our strategy and licensing allow, I'd suggest using their UTF-8 library as the basis for string processing.
Now, the upgrade routine:
1. Display a notice like this:
“Dear Mantis administrator,
Mantis 1.1.a2 and above now use UTF-8 encoding by default. If any non-Latin characters are used in your Mantis installation, you have to perform the conversion to keep them readable. Latin-only databases do not require conversion.
The following conversion options are available:
A. If you have not previously set up any custom default encoding other than utf-8 or 8859-1 in your database server config files, AND your users do not actively use multiple encodings within a single Mantis installation, you should simply run the conversion routine provided below.
B. (For expert users) If your database was previously set to any custom encoding, you should export the Mantis database to a text file manually, then convert this file to utf-8 using iconv (a), then re-create the database in utf-8 or 8859-1, then import the converted file. Alternatively, you could use the conversion tools provided with your database (b).
Example for (a):
- After downloading and installing iconv, create a new database:
CREATE DATABASE `bugtracker` DEFAULT CHARACTER SET utf8;
- Convert the file contents using iconv:
iconv -f latin1 -t utf-8 backup_latin1.sql > backup_utf8.sql
- Import the contents of the converted file into the new database:
mysql -uUser -pPassword bugtracker < backup_utf8.sql
Example for (b): To convert the database encoding using the MySQL tools, set the character set to UTF-8 when dumping the previous database with mysqldump:
mysqldump -uUSER -pPASSWORD --default-character-set=utf8 INSTANCE > mantis-db.sql
You will get a dump file encoded in UTF-8. After setting up the new database with the default character set utf8, you can import this SQL script to create the new database schema.
C. (If multiple encodings are used in Mantis, or you have other reasons not to perform the conversion) You can skip the conversion and just specify a non-utf8 language as the default in config_inc.php. In this case you cannot use the new multilanguage features of Mantis, but your installation will keep running as before.
ANYWAY, BACKUP YOUR DATABASE FIRST!”
2. Display the following dialog:
Database strings conversion to UTF-8. Warning: some very long strings in the database could be truncated on conversion.
Current encoding: [dropdown list here]
[Encode to utf-8 button]
3. Convert the database:
- resize some fields subject to truncation (optional, see explanation below)
- convert all the strings in the database from source_encoding to utf-8 using iconv.
Ah, yes, we should also keep some utf-8 flag in the database so that we do not convert twice.
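The core of step 3 and the "do not convert twice" flag can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary standing in for the database, the key utf8_converted and both function names are hypothetical, and a real implementation would live in Mantis' PHP code and work on actual tables.

```python
# Hypothetical sketch: the in-memory "database" and all names are illustrative.

def to_utf8(raw, source_encoding):
    """Re-encode bytes from source_encoding to UTF-8 (what iconv does for a file)."""
    return raw.decode(source_encoding).encode("utf-8")


def convert_database(db, source_encoding):
    """Convert every stored string once; return False if already converted."""
    if db.get("utf8_converted"):
        return False  # the flag protects against a second, garbling pass
    db["strings"] = [to_utf8(s, source_encoding) for s in db["strings"]]
    db["utf8_converted"] = True
    return True


db = {
    "strings": ["Ошибка".encode("cp1251"), b"plain ascii"],
    "utf8_converted": False,
}
first_run = convert_database(db, "cp1251")   # performs the conversion
second_run = convert_database(db, "cp1251")  # skipped because the flag is set
```

Setting the flag only after all strings are converted means an interrupted run is not mistaken for a finished one, although a real routine would also need a transaction or a backup to recover from a partial conversion.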
The following minor problems could arise (AFAIK, rather rarely):
1. If the database physically uses a non-utf8 encoding, some long strings could be truncated (e.g. very long bug 'short descriptions'). This is likely to be a real problem only for Cyrillic-based languages (byte length effectively doubles). Latin-based scripts and CJK languages are probably unaffected (small or no byte length increase).
Ultimate solution: convert the physical Mantis database to utf-8. This could be planned for future Mantis releases; see phpBB's experience.
Medium solution: enlarge critical fields on conversion.
Minimal solution: just ignore.
2. Strings in the database could be garbled on conversion if the user specifies an incorrect source encoding.
Ultimate solution: add another button to revert the conversion.
Medium solution: fetch and display some sample to-be-converted data from the database.
Minimal solution: warn to backup.
3. Export to Excel/Word could look garbled with utf-8 database
Needs to be confirmed - my installation works fine. If the problem exists, the existing utf-8 installations should experience it, too.
4. Hard-coded (in Mantis) characters could show incorrectly after utf-8 conversion
Needs to be confirmed - my installation works fine.
5. Database uses custom encoding, characters are garbled after conversion
AFAIK, Mantis doesn't specify the database encoding on creation, so it is likely set to the DB default, which is probably 8859-1 or utf-8 for MySQL (I haven't actually tested different databases).
The 8859-1 MySQL encoding doesn't need to be changed at the moment; moreover, it's recommended to run UTF-8 databases in MySQL <= 5 as 8859-1 (e.g. MediaWiki runs this way).
If the Mantis administrator hasn't changed the db default encoding, we (probably) have nothing to do here.
If he did a custom db setup, we have to warn him to export the Mantis database to a text file manually, convert this file to utf-8, then re-create the database with utf-8 encoding. This should be a rather rare case: did anybody here ever really change the system-wide DB encoding?
Ultimate solution: unknown
Medium solution: warn to convert manually
Minimal solution: ignore (as such an admin probably knows what to do)
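Problems 1 and 2 above can both be checked before converting anything. The following Python sketch measures how many bytes a string needs before and after conversion, and previews a stored value under several candidate source encodings (the "medium solution" for problem 2). The function names, sample strings and candidate encodings are assumptions chosen for illustration; they are not part of Mantis.

```python
# Illustrative only: sample strings, encodings and names are assumptions.

def byte_growth(text, legacy_encoding):
    """Return (bytes in the legacy encoding, bytes in UTF-8) for one string."""
    return len(text.encode(legacy_encoding)), len(text.encode("utf-8"))


def preview(raw, candidates):
    """Decode raw bytes under each candidate encoding; None marks a failure."""
    decoded = {}
    for encoding in candidates:
        try:
            decoded[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            decoded[encoding] = None
    return decoded


# Problem 1: compare byte lengths for a Cyrillic and a plain ASCII sample.
cyrillic = byte_growth("Ошибка", "cp1251")
ascii_only = byte_growth("Bug report", "latin-1")

# Problem 2: show the same stored bytes under several guesses so the
# administrator can pick the encoding that produces readable text.
stored = "Ошибка".encode("koi8-r")  # stands in for a value read from the db
choices = preview(stored, ["koi8-r", "cp1251", "utf-8"])
for encoding, text in choices.items():
    print(encoding, "->", text if text is not None else "(cannot decode)")
```

With these samples the Cyrillic string takes twice as many bytes in UTF-8 while the ASCII string keeps its length. In the preview, a wrong single-byte guess still decodes without an error but yields unreadable text, which is why a visual check of sample data is useful.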
Web resources
Since this procedure still needs to be refined, we would like to collect links here to other pages with information and how-tos. Feel free to add your own, or to comment on the results if you use one of them to perform the upgrade:
- http://www.phpwact.org/php/i18n/charsets - Really useful document on the issue
- Convert a db to UTF8 after upgrading to MySQL 4.1 - 4 steps procedure
- Turning MySQL data in latin1 to utf-8 - very detailed howto complete with troubleshooting steps
Enable CJK Text Entering
additional notes by tomyjwu
Introduction
This hack was done and tested on Ubuntu 8.04 with:
- mantis 1.1.2+dfsg-8~hardy1
- mysql 5.0.51a-3ubuntu5.4
The goal is to enable CJK text input on mantis bug descriptions and notes. Two issues must be resolved:
- Mantis database must store descriptions in utf8 by default
- Mantis must communicate with MySQL in utf8, according to the book by herongyang
Please check the Web resources section for the book URL.
Enable utf8 communications
- Edit the config file /etc/mysql/my.cnf
- add the default encoding to both the server and client sections
[client]
default-character-set=utf8
[mysqld]
default-character-set=utf8
- check setting in mysql command line (mysql -u root -p)
SHOW VARIABLES LIKE 'character_set_%';
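Before restarting MySQL, the edited config text can be sanity-checked with a short script. The Python sketch below parses my.cnf-style text with the standard configparser module and reports which of the two sections lack the utf8 setting. The function name and the sample texts are hypothetical, and the parser does not follow include directives, so treat it as a rough check rather than a replacement for SHOW VARIABLES.

```python
import configparser


def sections_missing_utf8(cnf_text, sections=("client", "mysqld")):
    """Return the section names that do not set default-character-set=utf8."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare options without a value
        strict=False,          # tolerate repeated sections or options
        interpolation=None,    # treat '%' in values literally
    )
    parser.read_string(cnf_text)
    return [
        name
        for name in sections
        if parser.get(name, "default-character-set", fallback=None) != "utf8"
    ]


# Hypothetical sample configs, not taken from a real server.
COMPLETE = """[client]
default-character-set=utf8
[mysqld]
default-character-set=utf8
"""

PARTIAL = """[client]
default-character-set=utf8
[mysqld]
skip-external-locking
"""

complete_result = sections_missing_utf8(COMPLETE)
partial_result = sections_missing_utf8(PARTIAL)
```

To check the real file, pass its contents, for example the text read from /etc/mysql/my.cnf.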
Upgrade the mantis database
- Upgrade at three levels (database, table and column), following the book by herongyang about MySQL and PHP.
- Manage the mantis database by issuing the following commands in the mysql command line
mysql -u root -p mantis
- database level
- altering commands
alter database mantis character set utf8;
- verify by
SHOW CREATE DATABASE mantis;
- table and column level on mantis_bug_text_table and mantis_bugnote_text_table
- altering commands
alter table mantis_bug_text_table default character set utf8;
alter table mantis_bug_text_table convert to character set utf8;
alter table mantis_bugnote_text_table default character set utf8;
alter table mantis_bugnote_text_table convert to character set utf8;
- verify by
SHOW CREATE TABLE mantis_bug_text_table;
SHOW CREATE TABLE mantis_bugnote_text_table;
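If more tables need the same treatment, the statements above can be generated instead of typed by hand. The Python sketch below only builds the SQL text for a list of table names; the function name is hypothetical, and the output still has to be reviewed and pasted into the mysql command line (after a backup).

```python
import re


def utf8_alter_statements(tables):
    """Build the two ALTER TABLE statements used above for each table name."""
    statements = []
    for table in tables:
        # Accept only plain identifiers so nothing unexpected ends up in the SQL.
        if not re.fullmatch(r"[A-Za-z0-9_]+", table):
            raise ValueError("unexpected table name: %r" % table)
        statements.append("ALTER TABLE %s DEFAULT CHARACTER SET utf8;" % table)
        statements.append("ALTER TABLE %s CONVERT TO CHARACTER SET utf8;" % table)
    return statements


statements = utf8_alter_statements(
    ["mantis_bug_text_table", "mantis_bugnote_text_table"]
)
print("\n".join(statements))
```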
Results
You can now add a new bug entry in Mantis using the web interface, and the entered CJK text is preserved in utf8. The resulting entries are displayed correctly in the web browser (tested with Firefox).