View Issue Details

ID: 0021101
Project: mantisbt
Category: bugtracker
View Status: public
Last Update: 2017-10-12 05:59
Reporter: vboctor
Assigned To: dregad
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: 1.3.0-rc.2
Target Version: 1.3.0
Fixed in Version: 1.3.0
Summary: 0021101: Issues with emoji's are truncated before getting saved
Description

The following line is expected to be truncated at the emoji after "Z3", both when it is saved to the database and when the email notification is sent, even though the text I pasted in continues after that point.

Not compatible with my Xperia Z3

Tags: mantishub

Relationships

related to 0020431 (assigned, dregad): Use utf8mb4 charset for new MySQL installations

Activities

dregad

2016-06-13 05:58

developer  

emoji_text.txt (97 bytes)
Not compatible with my Xperia Z3 😢😢 any help would be great as this game looks amazing 👍

dregad

2016-06-13 06:10

developer   ~0053357

I added the text you sent by e-mail as a UTF-8 text file attachment for the record.

Emoji are encoded as 4-byte sequences in UTF-8, so I would guess that the issue is a side effect of our using MySQL's 'UTF8' charset, which only supports characters of up to 3 bytes. See 0020431 and more specifically my note 0020431:0052209:

"1. [...] (eventually, someone will face issues as they try to store 4-byte unicode chars, e.g. emoji or some CJK characters)".
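
To illustrate the byte-length point, here is a minimal sketch (not MantisBT code) that checks whether a string contains any character needing 4 bytes in UTF-8. The function name `has_4byte_utf8()` is hypothetical, and the sketch assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper (not part of MantisBT): returns true if $p_string
// contains at least one character in U+10000..U+10FFFF, i.e. one that
// needs 4 bytes in UTF-8 and does not fit MySQL's 3-byte 'utf8' charset.
// Assumes $p_string is valid UTF-8.
function has_4byte_utf8( $p_string ) {
    return preg_match( '/[\x{10000}-\x{10FFFF}]/u', $p_string ) === 1;
}

$t_euro = "\xE2\x82\xAC";      // U+20AC (euro sign), 3 bytes in UTF-8
$t_cry  = "\xF0\x9F\x98\xA2";  // U+1F622 (crying face), 4 bytes in UTF-8

// strlen() counts bytes, not characters
printf( "%d %d\n", strlen( $t_euro ), strlen( $t_cry ) );  // prints "3 4"

var_dump( has_4byte_utf8( 'Xperia Z3 ' . $t_euro ) );  // bool(false)
var_dump( has_4byte_utf8( 'Xperia Z3 ' . $t_cry ) );   // bool(true)
```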

vboctor

2016-06-13 11:17

manager   ~0053366

I've tried this string with other services, and some of them replace the emoji with ??? but don't truncate the text. Can we do a similar workaround until the db support is done?

dregad

2016-06-18 09:34

developer   ~0053407

"Can we do a similar workaround until the db support is done?"

Certainly.

I believe the simplest approach would be to replace any character at or above U+10000 (i.e. any character that takes 4 bytes in UTF-8) by a given character or string (I'd suggest we use U+FFFD - �).
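
A minimal sketch of that replacement, assuming valid UTF-8 input. The function name `replace_4byte_utf8()` is hypothetical and this is not necessarily how the workaround was eventually implemented.

```php
<?php
// Hypothetical helper: replaces every character in U+10000..U+10FFFF
// (4 bytes in UTF-8) with U+FFFD, the Unicode replacement character.
// Note: with the /u modifier, preg_replace() returns null if $p_string
// is not valid UTF-8; callers may want to handle that case.
function replace_4byte_utf8( $p_string ) {
    return preg_replace( '/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $p_string );
}

$t_cry   = "\xF0\x9F\x98\xA2";  // U+1F622, crying face
$t_input = 'Not compatible with my Xperia Z3 ' . $t_cry . $t_cry . ' any help would be great';

// Each emoji becomes one U+FFFD; the rest of the text is left untouched
echo replace_4byte_utf8( $t_input ), "\n";
```

Characters of up to 3 bytes (accented Latin letters, the euro sign, most CJK text) fall outside the matched range and pass through unchanged.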

The question is, do we also need or want to store the original character somehow? E.g. for the crying face example you reported, we could replace it with something like '�[U+1f622]'. I'm not sure it's worth the effort.

That could make the display look bad if echoed as-is, especially if there are a lot of "invalid" characters (e.g. a sentence in Chinese), but on the other hand it would allow us to:

  • display the original character (at the expense of an extra preg_replace() call for each text display, of course; this could be done in MantisCoreFormatting)
  • convert any occurrence found in the DB back to the original character (by means of an upgrade function) once utf8mb4 support has been implemented
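
The marker idea above could be sketched roughly as follows. Both function names are hypothetical, the '�[U+xxxxx]' format is taken from the example in this note, and the sketch assumes valid UTF-8 input; it is an illustration, not a proposed patch.

```php
<?php
// Hypothetical encode step: each 4-byte character becomes U+FFFD followed
// by its code point, e.g. the crying face becomes '�[U+1f622]'.
function encode_4byte_utf8( $p_string ) {
    return preg_replace_callback(
        '/[\x{10000}-\x{10FFFF}]/u',
        function( $p_match ) {
            $t_char = $p_match[0];
            // 4-byte sequence: 3 payload bits in the lead byte,
            // 6 payload bits in each of the 3 continuation bytes
            $t_code = ( ( ord( $t_char[0] ) & 0x07 ) << 18 )
                | ( ( ord( $t_char[1] ) & 0x3F ) << 12 )
                | ( ( ord( $t_char[2] ) & 0x3F ) << 6 )
                | ( ord( $t_char[3] ) & 0x3F );
            return "\xEF\xBF\xBD" . sprintf( '[U+%x]', $t_code );
        },
        $p_string
    );
}

// Hypothetical decode step (display time or upgrade function): turns
// each marker back into the original 4-byte character.
function decode_4byte_utf8( $p_string ) {
    return preg_replace_callback(
        '/\x{FFFD}\[U\+([0-9a-f]{5,6})\]/u',
        function( $p_match ) {
            $t_code = hexdec( $p_match[1] );
            if( $t_code < 0x10000 || $t_code > 0x10FFFF ) {
                return $p_match[0];  // not a 4-byte code point, leave as-is
            }
            return chr( 0xF0 | ( $t_code >> 18 ) )
                . chr( 0x80 | ( ( $t_code >> 12 ) & 0x3F ) )
                . chr( 0x80 | ( ( $t_code >> 6 ) & 0x3F ) )
                . chr( 0x80 | ( $t_code & 0x3F ) );
        },
        $p_string
    );
}

$t_cry     = "\xF0\x9F\x98\xA2";  // U+1F622, crying face
$t_encoded = encode_4byte_utf8( 'Xperia Z3 ' . $t_cry );
echo $t_encoded, "\n";
var_dump( decode_4byte_utf8( $t_encoded ) === 'Xperia Z3 ' . $t_cry );
```

One caveat of this format: text that a user typed literally in the same '�[U+xxxxx]' form would also be converted by the decode step.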

Since this is a workaround, to minimize the impact on the code base I would also limit it to a few key fields; I would say: bug summary, description, steps to reproduce, additional info and bugnote text.

Let me know your thoughts.

dregad

2016-06-18 09:44

developer   ~0053408

Proof-of-concept: see attached screenshot 'Selection_002.png'

dregad

2016-06-18 09:45

developer  

Selection_002.png (10,486 bytes)

vboctor

2016-06-18 10:28

manager   ~0053409

Looks good. I would go with the simple approach of replacing 4-byte Unicode characters with �, similar to what you did in the proof of concept.

dregad

2016-06-18 16:33

developer   ~0053414

OK then. I'll submit a pull request after applying the workaround to the 3 bug fields.

I will also need to check whether this causes issues in the history and bug_revision tables as well.

dregad

2016-06-18 17:22

developer   ~0053417

PR https://github.com/mantisbt/mantisbt/pull/797

dregad

2016-06-18 18:27

developer   ~0053422

For the record, here are a couple of helper functions I used while testing:

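// Returns the UTF-8 encoded character for the given Unicode code point.
// Note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2;
// mb_chr() (PHP 7.2+) is the built-in equivalent.
function utf8_chr( $ordinal ) {
    return mb_convert_encoding( '&#' . (int)$ordinal . ';', 'UTF-8', 'HTML-ENTITIES' );
}

// Returns the Unicode code point of the first character of a UTF-8 string.
function utf8_ord( $p_char ) {
    $char = mb_substr( $p_char, 0, 1, 'UTF-8' );
    $size = strlen( $char );

    // Lead byte: mask off the bits that encode the sequence length
    $ordinal = ord( $char[0] ) & ( 0xFF >> $size );
    // Each continuation byte (10xxxxxx) contributes 6 payload bits
    for( $i = 1; $i < $size; $i++ ) {
        $ordinal = ( $ordinal << 6 ) | ( ord( $char[$i] ) & 0x3F );
    }
    return $ordinal;
}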
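
For comparison, on PHP 7.2 or later with the mbstring extension, the built-in mb_ord() and mb_chr() cover the same ground as the helpers above. A short sketch, assuming that environment:

```php
<?php
// Requires PHP 7.2+ with the mbstring extension.
$t_cry = "\xF0\x9F\x98\xA2";  // crying face emoji, 4 bytes in UTF-8

// Code point of the first character of the string
$t_code = mb_ord( $t_cry, 'UTF-8' );
printf( "U+%X\n", $t_code );  // prints "U+1F622"

// Converting the code point back yields the original 4-byte sequence
var_dump( mb_chr( $t_code, 'UTF-8' ) === $t_cry );  // bool(true)

// At or above U+10000 means 4 bytes in UTF-8
var_dump( $t_code >= 0x10000 );  // bool(true)
```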

Related Changesets

MantisBT: master-1.3.x 805ef0cb

2016-06-18 16:42:22

dregad

New database API function db_mysql_fix_utf8()

This new function replaces 4-byte UTF-8 chars by Unicode U+FFFD
character for MySQL databases.

This is a temporary workaround to avoid data getting truncated on MySQL
databases using native utf8 encoding which only supports 3 bytes chars,
until we're able to support utf8mb4 charset (see issue 0020431).

Fixes 0021101
mod - core/database_api.php

MantisBT: master-1.3.x 4dcb16cc

2016-06-18 16:48:59

dregad

Fix 4-byte UTF-8 chars issues on MySQL

This applies the new db_mysql_fix_utf8() function to the following
fields:

- bug.summary
- bug.description
- bug.steps_to_reproduce
- bug.additional_information
- bugnote.text
- custom fields

Fixes 0021101
mod - core/bug_api.php
mod - core/bugnote_api.php
mod - core/cfdefs/cfdef_standard.php
mod - core/custom_field_api.php

Issue History

Date Modified Username Field Change
2016-06-13 00:49 vboctor New Issue
2016-06-13 00:52 vboctor Tag Attached: mantishub
2016-06-13 05:58 dregad File Added: emoji_text.txt
2016-06-13 06:10 dregad Note Added: 0053357
2016-06-13 06:10 dregad Relationship added related to 0020431
2016-06-13 11:17 vboctor Note Added: 0053366
2016-06-18 09:34 dregad Note Added: 0053407
2016-06-18 09:44 dregad Note Added: 0053408
2016-06-18 09:45 dregad File Added: Selection_002.png
2016-06-18 10:28 vboctor Note Added: 0053409
2016-06-18 16:33 dregad Note Added: 0053414
2016-06-18 17:22 dregad Note Added: 0053417
2016-06-18 17:35 dregad Assigned To => dregad
2016-06-18 17:35 dregad Status new => assigned
2016-06-18 18:27 dregad Note Added: 0053422
2016-07-03 05:23 dregad Changeset attached => MantisBT master-1.3.x 805ef0cb
2016-07-03 05:23 dregad Changeset attached => MantisBT master-1.3.x 4dcb16cc
2016-07-03 05:23 dregad Status assigned => resolved
2016-07-03 05:23 dregad Resolution open => fixed
2016-07-03 05:23 dregad Fixed in Version => 1.3.0
2016-07-09 19:28 vboctor Status resolved => closed