View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0020431||mantisbt||db schema||public||2015-12-28 05:22||2017-10-16 08:35|
|Target Version||2.8.0||Fixed in Version|
|Summary||0020431: Use utf8mb4 charset for new MySQL installations|
We currently create the database with 'utf8' charset and 'general_ci' collation.
In MySQL, utf8 charset uses up to 3 bytes, which means some characters can't be stored properly. Since MySQL 5.5.3 the 'utf8mb4' charset is available, and does not have this limitation.
We should use utf8mb4 for new installation.
In addition, we default to 'general' collation which sometimes yield incorrect sort order for special (multibyte) characters. See MySQL documentation  for examples and more explanation.
|Tags||No tags attached.|
|related to||0008017||closed||dregad||Increase the size of the username field|
|parent of||0020465||closed||dregad||Reduce size of username and email fields to allow utf8mb4 charset|
|related to||0021101||closed||dregad||Issues with emoji's are truncated before getting saved|
|related to||0021841||closed||dregad||Minimum requirements for 2.x releases|
Switching to utf8mb4 charset may not be as simple as I thought initially...
MySQL limits the maximum index key size depending on the engine used:
Considering 4 bytes per characters, the maximum length of a character field used as index key is:
Schema update step 196 updated the user table's username field to 255 chars in length (see 0008017). Converting the table to utf8mb4 or trying to create it from scratch with that charset triggers the following error
1071: Specified key was too long; max key length is 767 bytes
We have the following options:
Option 3 is probably acceptable, since it seems highly unlikely that we would have email addresses longer than 191 chars to begin with, and even less to have 2 of them differ only on or after the 192nd char. Nevertheless there is still the technical possiblity that this situation will occur.
I would recommend number 2 as the safest option, and assuming we don't really need 255 chars for the username. I don't think we do; the original requirement from 0008017 was to allow storing email addresses as user identifier; 255 was set arbitrarily.
Reminder sent to: atrol, vboctor
Your opinion on 0020431:0052209 would be appreciated.
Looking at the code under admin/check/ it seems that our current minimum requirement for MySQL is 5.0.8 and hence if we are planning to use utf8mb4, then we need to up such requirement to 5.5.3. Not sure how common is 5.5.3. Though if we end up waiting, then we could make the move directly to 5.7.7 where it wouldn't be necessary to reduce the field size.
I don't have a problem with reducing the max size for username. I assume based on the above, this is the only offending field.
To wait for 5.7.7 is no option.
Ubuntu LTS versions that come with 5 years of security updates 
Red Hat Enterprise Linux is quite another story 
So requiring 5.5.3 would rule out Red Hat.
It seems we should stay with utf8 at the moment.
Thanks to both of you for your comments.
Actually my intention was not to enforce utf8mb4 across the board, but rather to detect the mysql version at install time, and define the charset accordingly (utf8 if < 5.5.3, utf8mb4 otherwise), allowing instances running recent software to benefit from better unicode support..
We can keep MySQL 5.0.8 as minimal support, even though that's effectively end-of-life since 2013.
With regards to MariaDB, at least until version 5.5 it's supposed to be a "drop-in replacement" for MySQL, so (in theory) RHEL 7.2 should be just fine and use utf8mb4.
Not sure how things will evolve with MariaDB 10.x / MySQL 5.7 though, but that's another topic.
Pull request https://github.com/mantisbt/mantisbt/pull/699
@dregad How about the option is using a prefix index and updating the API(s) that looks up by username to potentially handle more than one match? That is assuming the DBMS won't filter these independent of the index anyway. Which I think it should.
The problem is not with filtering, it is about ensuring the key's uniqueness, which can't be guaranteed with a prefix index.
Updating the API would not resolve this issue, and sounds like overengineering for an issue that is anyway quite unlikely to occur (have you ever heard of a 191-char long e-mail address ?)
Interesting note on Drupal's approach to handle index size limitation on utf8mb4 fields https://www.drupal.org/node/1314214
In order to (greatly) simplify the implementation of utf8mb4 charset support, including upgrade steps and future maintenance, I would like to propose that we increase the minimum version requirement for MySQL to 5.5.3 in 2.0.x.
5.5.3 was released in March 2010. It was actually a milestone release, the General availability is 5.5.8 (released 2010-12-03), but I don't see the need to require a higher version than we actually need.
As pointed out by atrol, this will prevent some distros from running our software, but I think that's an acceptable trade-off.
For the record, Drupal followed a similar approach, as documented in the link I referenced in my previous post.
I would prefer this instead of implementing workarounds to fix 0021101.
I agree, but I also believe that we do need a solution for 1.x as well, since depending on MySQL settings, use of any 4-byte char will either result in
The workaround I propose in 0021101 / PR https://github.com/mantisbt/mantisbt/pull/797 is fairly simple, and prevents an error which is more and more likely to occur as people using Mantis on smartphones are used to inserting emojis.
Of course will need to be reverted once utf8mb4 support has been implemented.
This blog post suggests a different and interesting workaround compared to the pull requests referred to in the previous post: https://roartindon.blogspot.de/2015/04/hacking-utf16-to-work-around-mysqls.html
Changing target version since we'll have MySQL 5.5.3 as minimum requirement there.
There is some generic info about UTF8mb4 here: https://www.everipedia.com/UTF8/
@travm1 what is your point ?
MantisBT: master-1.3.x 805ef0cb
2016-06-18 16:42:22Details Diff
|New database API function db_mysql_fix_utf8()
This new function replaces 4-byte UTF-8 chars by Unicode U+FFFD
character for MySQL databases.
This is a temporary workaround to avoid data getting truncated on MySQL
databases using native utf8 encoding which only supports 3 bytes chars,
until we're able to support utf8mb4 charset (see issue 0020431).
|mod - core/database_api.php||Diff File|
|2015-12-28 05:22||dregad||New Issue|
|2015-12-30 18:21||dregad||Note Added: 0052209|
|2015-12-30 18:21||dregad||Relationship added||related to 0008017|
|2015-12-30 18:23||dregad||Note Added: 0052210|
|2015-12-31 05:25||vboctor||Note Added: 0052211|
|2015-12-31 12:25||atrol||Note Added: 0052214|
|2015-12-31 18:21||dregad||Note Added: 0052215|
|2015-12-31 18:27||dregad||Relationship added||parent of 0020465|
|2015-12-31 18:28||dregad||Assigned To||=> dregad|
|2015-12-31 18:28||dregad||Status||new => assigned|
|2015-12-31 18:28||dregad||Category||installation => db schema|
|2015-12-31 18:28||dregad||Target Version||=> 1.3.0-rc.2|
|2016-01-01 19:25||dregad||Note Added: 0052230|
|2016-01-03 20:43||vboctor||Note Added: 0052239|
|2016-01-03 20:46||vboctor||Note Edited: 0052239||View Revisions|
|2016-01-04 10:39||dregad||Note Added: 0052240|
|2016-06-12 02:37||atrol||Target Version||1.3.0-rc.2 => 1.3.0|
|2016-06-13 06:10||dregad||Relationship added||related to 0021101|
|2016-06-13 06:17||dregad||Note Added: 0053358|
|2016-06-18 16:32||dregad||Note Added: 0053413|
|2016-06-18 17:12||atrol||Note Added: 0053416|
|2016-06-18 17:35||dregad||Note Added: 0053418|
|2016-07-03 05:23||dregad||Changeset attached||=> MantisBT master-1.3.x 805ef0cb|
|2016-07-10 07:57||atroladmin||Target Version||1.3.0 => 1.3.1|
|2016-08-28 10:37||atrol||Target Version||1.3.1 => 1.3.2|
|2016-10-02 19:36||atrol||Target Version||1.3.2 => 1.3.3|
|2016-10-30 23:23||vboctor||Target Version||1.3.3 => 1.3.4|
|2016-11-02 06:09||dregad||Relationship added||related to 0021841|
|2016-11-02 10:37||j_schultz||Note Added: 0054374|
|2016-11-24 09:01||dregad||Target Version||1.3.4 => 2.0.0-rc.2|
|2016-11-24 09:01||dregad||Note Added: 0054574|
|2016-11-24 09:47||atrol||Note Added: 0054577|
|2016-11-27 08:20||dregad||Target Version||2.0.0-rc.2 => 2.0.0|
|2016-12-30 15:56||vboctor||Target Version||2.0.0 => 2.0.1|
|2017-02-01 22:49||vboctor||Target Version||2.0.1 => 2.2.0|
|2017-02-26 21:19||vboctor||Target Version||2.2.0 => 2.3.0|
|2017-04-01 00:20||vboctor||Target Version||2.3.0 => 2.4.0|
|2017-04-07 00:09||travm1||Note Added: 0056429|
|2017-04-07 03:21||dregad||Note Added: 0056430|
|2017-04-30 14:53||vboctoradmin||Target Version||2.4.0 => 2.5.0|
|2017-06-04 16:19||atrol||Target Version||2.5.0 => 2.6.0|
|2017-09-03 18:49||vboctor||Target Version||2.6.0 => 2.7.0|
|2017-10-08 23:55||vboctor||Target Version||2.7.0 => 2.8.0|