It’s interesting to see where setting Community Aims will get you. QS-E has a few so-called Community Aims: goals we must attain to support some of the most requested features in Quantum Star SE Evolved. A close second to the ideal of being able to transparently add new content to the game was the idea of a full translation feature with native support for all multi-byte character encodings. We have had translation offers before for Hindi, Romanian, Russian, and more.

Following my prior blog post on a translation system, I have been looking at that system as a test implementation, and at the broader area of I18N (internationalisation). Further to my earlier post today (man, I write too much on this blog!), I found out from a Planet-PHP.net blog entry that Harry Fuecks recently released a godsend: version 0.1 of a PHP UTF-8 library which does NOT rely on the mbstring extension being available. This is fantastic news! Using this library, one can operate on UTF-8 multi-byte characters safely, with no corruption of multi-byte strings.

I modified a test included in the library download. It produces the following results, with my name finally represented accurately!

Operating on a 20-character string of both single-byte and multi-byte characters…

The following shows string operations on a default PHP installation without the mbstring extension.

String is: Iñtërnâtiônàlizætiøn
This is a UTF-8 string – standard PHP manipulations follow (no mbstring enabled):
Num chars: 27 (it’s actually 20; strlen() counts bytes, not characters)
Uppercase: IñTëRNâTIôNàLIZæTIøN (PHP ignored all the multi-byte characters)
Reversed: n��it��zil��n��it��nr��t��I (PHP corrupted the multi-byte characters: the two bytes of each were swapped, leaving a pair of invalid bytes)
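
To make the byte-versus-character point concrete, here is a small sketch of my own (not part of the library’s test). It assumes the source file is saved as UTF-8 and that PCRE was built with UTF-8 support; the variable names are purely illustrative.

```php
<?php
// "ñ" is U+00F1, which UTF-8 encodes as the two bytes 0xC3 0xB1.
$char = "\xC3\xB1";

echo strlen($char), "\n";            // 2: strlen() counts bytes, not characters
echo bin2hex($char), "\n";           // c3b1
echo bin2hex(strrev($char)), "\n";   // b1c3: strrev() swapped the two bytes

// With the /u modifier PCRE refuses input that is not valid UTF-8,
// so the byte-reversed character no longer passes the check.
var_dump(preg_match('//u', $char) === 1);          // bool(true)
var_dump(preg_match('//u', strrev($char)) === 1);  // bool(false)

// 20 characters, 7 of which take two bytes each: 13 + 14 = 27 bytes.
$string = "Iñtërnâtiônàlizætiøn";
echo strlen($string), "\n";          // 27
```

Every accented letter in the test string is a two-byte sequence, which is why the byte count comes out seven higher than the character count.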

The following shows the same string operations on a default PHP installation, again without the mbstring extension. This time, however, the test uses a PHP implementation of UTF-8-safe string functions.

String is: Iñtërnâtiônàlizætiøn
It’s well formed UTF-8
Num chars: 20
Uppercase: IÑTËRNÂTIÔNÀLIZÆTIØN
Reversed: nøitæzilànôitânrëtñI

My Name:
Maugrim’s real name: Pádraic
Reversed (Default PHP): ciard��P
Reversed (with UTF-8 Lib): ciardáP
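
For anyone curious how results like these are possible without mbstring, here is a minimal sketch of my own. It is not the library’s code or API: the `demo_utf8_*` function names are hypothetical, and it assumes PHP 7 or later with PCRE built with UTF-8 support (the `/u` pattern modifier).

```php
<?php
// Hypothetical helpers for illustration only; these are NOT the library's functions.

// True when $str is well-formed UTF-8: with /u, PCRE rejects invalid input.
function demo_utf8_is_valid(string $str): bool
{
    return preg_match('//u', $str) === 1;
}

// Split a string into characters (code points) instead of bytes.
function demo_utf8_chars(string $str): array
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return $chars === false ? [] : $chars; // false means invalid UTF-8
}

// Count characters rather than bytes.
function demo_utf8_strlen(string $str): int
{
    return count(demo_utf8_chars($str));
}

// Reverse character by character, keeping each multi-byte sequence intact.
function demo_utf8_strrev(string $str): string
{
    return implode('', array_reverse(demo_utf8_chars($str)));
}

$string = "Iñtërnâtiônàlizætiøn";
var_dump(demo_utf8_is_valid($string));      // bool(true)
echo demo_utf8_strlen($string), "\n";       // 20
echo demo_utf8_strrev($string), "\n";       // nøitæzilànôitânrëtñI
echo demo_utf8_strrev("Pádraic"), "\n";     // ciardáP
```

Note that this reverses by code point, so text containing combining marks (an accent stored separately from its base letter) would still come out wrong; the strings here use precomposed characters.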
