====== Upgrading Mantis to UTF-8 ======
**Author:** Alexey Chumakov
===== Introduction =====
This how-to describes the issues involved in upgrading from 1.0.x releases (or older ones) to 1.1.0x releases. Starting with the 1.1.0x releases, Mantis uses UTF-8 language files and UTF-8 encoding for data saved to and read from the database. This wiki topic is based on an email sent by Alexey Chumakov to the Mantis mailing lists.
===== Details =====
AFAICR, the upgrade routine shown below will be sufficient.
Why hasn't it been implemented yet? Although I am experienced in localization, I am
not a PHP programmer (though I can read and understand PHP code), so we
need somebody to actually help with the implementation :)
Going further (somewhat later), we could consider learning from the phpBB team's
experience and UTF-8 code (phpBB 3, now in beta, will have one of the cleanest
UTF-8 libraries I know). If our strategy and licensing issues allow, I'd
suggest using their UTF-8 library as the base of string processing.
Now, the upgrade routine:
1. Display a notice like this:
"Dear Mantis administrator,
Mantis 1.1.a2 and above now use UTF-8 encoding by default.
If any non-Latin characters are used in your Mantis installation, you have
to perform the conversion to keep them readable. Latin-only databases
do not require conversion.
The following conversion options are available:
A. If you have not previously set up a custom default encoding other than
utf-8 or 8859-1 in your database server config files, AND your users do not
actively use multiple encodings within a single Mantis installation, you
should simply run the conversion routine provided below.
B. (For expert users) If your database was previously set to a
custom encoding, you should export the Mantis database to a text file
manually, convert this file to utf-8 using iconv (a), re-create the
database in utf-8 or 8859-1, and then import the converted file. Alternatively,
you could use the conversion tools provided with your database (b).
Example for (a):
- Download and install iconv, then create a new database:
CREATE DATABASE `bugtracker` DEFAULT CHARACTER SET utf8;
- Convert the file contents using iconv:
iconv -f latin1 -t utf-8 backup_latin1.sql > backup_utf8.sql
- Import the contents of the converted file into the new database:
mysql -uUser -pPassword bugtracker < backup_utf8.sql
Example for (b):
To convert the database encoding using the MySQL tools, set the character set to UTF-8 when dumping the previous database with mysqldump:
mysqldump -uUSER -pPASSWORD --default-character-set=utf8 INSTANCE > mantis-db.sql
This produces a dump file encoded in UTF-8. After setting up the new database with default character set utf8, you can import this SQL script into it.
C. (If multiple encodings are used in Mantis, or you have other
reasons not to perform the conversion) You could skip the conversion and just
specify a non-utf8 language as the default in config_inc.php. In this case, you
will not be able to use the new multilanguage features of Mantis, but your
installation will keep running as before.
ANYWAY, BACKUP YOUR DATABASE FIRST!"
2. Display the following dialog:
Database strings conversion to UTF-8.
Warning: some very long strings in the database could be truncated on
conversion.
Current encoding: [dropdown list here]
[Encode to utf-8 button]
3. Convert the database:
* resize some fields subject to truncation (optional, see explanation below)
* convert all the strings in the database from source_encoding to utf-8
using iconv.
Ah, yes, we should keep some utf-8 flag in the database so that we do not
convert twice.
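
The conversion in step 3, combined with the flag suggested above, can be sketched as follows. This is a minimal illustration in Python rather than Mantis code: `convert_once`, the `settings` dictionary and the `utf8_converted` key are hypothetical names, and database fields are modeled as raw byte strings.

```python
def convert_once(rows, source_encoding, settings):
    """Re-encode byte strings to UTF-8 unless the flag says it was already done.

    rows            -- list of bytes values as stored in the old encoding
    source_encoding -- encoding chosen by the administrator, e.g. "cp1251"
    settings        -- dict standing in for a config table (hypothetical)
    """
    if settings.get("utf8_converted"):
        # Flag already set: converting again would garble the data.
        return list(rows)

    # Decode everything first so that any decoding error is raised
    # before the flag is set.
    decoded = [raw.decode(source_encoding) for raw in rows]
    converted = [text.encode("utf-8") for text in decoded]
    settings["utf8_converted"] = True
    return converted


settings = {}
stored = ["Ошибка".encode("cp1251"), b"Login fails"]
first = convert_once(stored, "cp1251", settings)
second = convert_once(first, "cp1251", settings)  # no-op thanks to the flag
print(first[0].decode("utf-8"), second == first)
```

Without the flag, the second call would decode UTF-8 bytes as cp1251 and produce garbled text, which is exactly the double conversion the flag is meant to prevent.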
The following minor problems could arise (AFAIK, rather rarely):
1. If the database physically uses a non-utf8 encoding, some long strings
could be truncated (e.g. very long bug 'short descriptions'). This could
really be a problem only for Cyrillic-based languages (length effectively
doubles). Latin-based scripts and CJK languages are probably unaffected
(small or no byte-length increase).
Ultimate solution: convert the physical Mantis database to utf-8. Could be
planned for future Mantis releases, see phpBB's experience.
Medium solution: enlarge critical fields on conversion.
Minimal solution: just ignore.
2. Strings in the database could be garbled on conversion if the user specifies
an incorrect source encoding.
Ultimate solution: add another button to revert the conversion.
Medium solution: fetch and display some sample to-be-converted data from the
database.
Minimal solution: warn to back up.
3. Export to Excel/Word could look garbled with a utf-8 database.
Needs to be confirmed - my installation works fine. If the problem exists,
the existing utf-8 installations should experience it, too.
4. Hard-coded (in Mantis) characters could show incorrectly after utf-8
conversion.
Needs to be confirmed - my installation works fine.
5. The database uses a custom encoding and characters are garbled after conversion.
AFAIK, Mantis doesn't specify the database encoding on creation, so it is
likely set to the DB default, which is probably 8859-1 or utf-8 for MySQL
(I haven't actually tested different databases).
The 8859-1 MySQL encoding doesn't need to be changed at the moment; moreover,
it's recommended to run UTF-8 databases in MySQL <= 5 as 8859-1 (e.g.
MediaWiki is running this way).
If the Mantis administrator hasn't changed the DB default encoding, we
(probably) have nothing to do here.
If the administrator did a custom DB setup, we have to warn them to export the
Mantis database to a text file manually, convert this file to utf-8, then
re-create the database with utf-8 encoding.
This should be a rather rare case - did anybody here ever really change the
system-wide DB encoding?
Ultimate solution: unknown
Medium solution: warn to convert manually
Minimal solution: ignore (as such an admin probably knows what to do)
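
Problems 1 and 2 above can be made concrete with a short Python sketch. It assumes Windows-1251 (`cp1251`), KOI8-R and Big5 as example legacy encodings, and the helper names `utf8_growth` and `preview_decodings` are hypothetical; none of this is part of Mantis. The first helper compares byte lengths before and after conversion. The second shows how the same stored bytes read under several candidate source encodings, in the spirit of the "medium solution" for problem 2.

```python
def utf8_growth(text, legacy_encoding):
    """Return (legacy_bytes, utf8_bytes) for the same text."""
    return len(text.encode(legacy_encoding)), len(text.encode("utf-8"))


def preview_decodings(raw, candidates):
    """Decode raw bytes under each candidate encoding for the admin to compare.

    An encoding that cannot decode the bytes at all maps to None.
    """
    previews = {}
    for encoding in candidates:
        try:
            previews[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            previews[encoding] = None
    return previews


# Problem 1: byte length before and after conversion.
samples = [("Error", "latin-1"), ("Ошибка", "cp1251"), ("中文", "big5")]
for sample, encoding in samples:
    print(sample, encoding, utf8_growth(sample, encoding))

# Problem 2: the same bytes under the right and the wrong source encoding.
raw = "Привет".encode("koi8-r")
for encoding, text in preview_decodings(raw, ["koi8-r", "cp1251", "utf-8"]).items():
    print(encoding, repr(text))
```

With these samples the Latin text keeps its size, the Cyrillic text doubles from 6 to 12 bytes, and the Big5 text grows from 4 to 6 bytes, so fields holding double-byte CJK encodings may also need some headroom. In the preview, only the correct source encoding yields readable text, which gives the administrator a chance to catch a wrong choice before converting.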
===== Web resources =====
Since this procedure still needs to be refined, we would like to collect links here to other pages with information and how-tos.
Feel free to add your own or comment about the results if you use one of them to perform the upgrade:
* http://www.phpwact.org/php/i18n/charsets - Really useful document on the issue
* [[http://textsnippets.com/posts/show/84|Convert a db to UTF8 after upgrading to MySQL 4.1]] - 4 steps procedure
* http://dev.mysql.com/tech-resources/articles/4.1/unicode.html
* http://dev.mysql.com/doc/refman/4.1/en/charset-conversion.html
* [[http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html|Turning MySQL data in latin1 to utf-8]] - very detailed howto complete with troubleshooting steps
* [[http://www.w3.org/International/questions/qa-forms-utf-8.en.php|Perl regexp to test a string for UTF-8 encoding]]
* [[http://www.herongyang.com/PHP-Chinese/PHP-MySQL-Character-Set-on-Text-Columns.html]]
====== Enable CJK Text Entering ======
**additional notes by tomyjwu**
===== Introduction =====
This hack was carried out and tested on Ubuntu 8.04 with:
* mantis 1.1.2+dfsg-8~hardy1
* mysql 5.0.51a-3ubuntu5.4
The goal is to enable CJK text input in Mantis bug descriptions and notes. Two issues must be resolved:
* The Mantis database must store descriptions in utf8 by default
* Mantis must communicate with MySQL in utf8, as described in the book by herongyang
Please check the Web resources section for the book URL.
===== Enable utf8 communications =====
* Edit the config file /etc/mysql/my.cnf
* Add the default encoding to both the client and the server section:
[client]
default-character-set=utf8
[mysqld]
default-character-set=utf8
* Check the setting from the mysql command line (mysql -u root -p):
* SHOW VARIABLES LIKE 'character_set_%';
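
Before restarting MySQL, the edited file can be sanity-checked with a small script. The following Python sketch is only an illustration: `sections_missing_utf8` and `SAMPLE_MY_CNF` are hypothetical names, the sample text mirrors the settings listed above, and a real my.cnf may contain directives that this simple parser does not interpret.

```python
import configparser

# Sample text mirroring the two settings above (not read from a real system).
SAMPLE_MY_CNF = """
[client]
default-character-set=utf8

[mysqld]
default-character-set=utf8
"""


def sections_missing_utf8(cnf_text, sections=("client", "mysqld")):
    """Return the sections that do not set default-character-set=utf8."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare options without a value
        strict=False,          # tolerate repeated sections or options
        interpolation=None,    # treat '%' literally
    )
    parser.read_string(cnf_text)
    missing = []
    for section in sections:
        value = parser.get(section, "default-character-set", fallback=None)
        if value != "utf8":
            missing.append(section)
    return missing


print(sections_missing_utf8(SAMPLE_MY_CNF))
```

An empty list means both sections carry the setting. The `SHOW VARIABLES` query above remains the authoritative check, since it reports what the running server actually uses.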
===== Upgrade the mantis database =====
* Upgrade the three levels (database, table and column) as described in the book by herongyang on MySQL and PHP.
* Issue the following commands from the mysql command line, started with mysql -u root -p mantis
* database level
* altering command: alter database mantis character set utf8;
* verify by SHOW CREATE DATABASE mantis;
* table and column level on mantis_bug_text_table and mantis_bugnote_text_table
* altering commands
alter table mantis_bug_text_table default character set utf8;
alter table mantis_bug_text_table convert to character set utf8;
alter table mantis_bugnote_text_table default character set utf8;
alter table mantis_bugnote_text_table convert to character set utf8;
* verify by
SHOW CREATE TABLE mantis_bug_text_table;
SHOW CREATE TABLE mantis_bugnote_text_table;
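
If more tables need the same treatment, the statements above can be generated instead of typed by hand. This is a minimal Python sketch under stated assumptions: `build_utf8_statements` and `TEXT_TABLES` are hypothetical names, and the script only prints SQL; it does not connect to MySQL.

```python
import re

# Tables named in the steps above.
TEXT_TABLES = ["mantis_bug_text_table", "mantis_bugnote_text_table"]


def build_utf8_statements(database, tables, charset="utf8"):
    """Return the ALTER statements for the database and each table, in order."""
    for name in [database, charset, *tables]:
        # Identifiers are pasted into SQL, so accept only plain names.
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            raise ValueError(f"unexpected identifier: {name!r}")

    statements = [f"ALTER DATABASE {database} CHARACTER SET {charset};"]
    for table in tables:
        statements.append(f"ALTER TABLE {table} DEFAULT CHARACTER SET {charset};")
        statements.append(f"ALTER TABLE {table} CONVERT TO CHARACTER SET {charset};")
    return statements


for statement in build_utf8_statements("mantis", TEXT_TABLES):
    print(statement)
```

The printed statements match the ones listed above and can be pasted into the mysql command line. As always, back up the database before running them.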
===== Results =====
You can now add a new bug entry through the Mantis web interface, and the entered CJK text is preserved in utf8. The resulting entries are displayed correctly in the web browser. Tested with Firefox.