The thePHP.cc site has a quick post today that looks ahead to PHP 8 and one planned change: the deprecation of a setting used for multibyte character handling.
Since the attempt to create a Unicode-based PHP implementation has failed, PHP 7 – just like PHP 5 – does not handle Unicode strings natively. The commonly used UTF-8 encoding, for example, is a multibyte encoding, as opposed to ASCII, where each character is represented by one single byte.[...] UTF-8 is a variable-length encoding and each character (code point, to be exact) is represented by one to four bytes. For ASCII characters, everything works smoothly, because UTF-8 is a superset of ASCII. The problems start with non-ASCII characters.
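To make the byte-versus-character distinction from the quote concrete, here is a minimal sketch. It is not taken from the post; it assumes PHP 7 or later (for the `\u{...}` escape) and that the mbstring extension is loaded, and the variable names are purely illustrative.

```php
<?php
// "ä" is written as a code point escape (PHP 7+), so the example does
// not depend on the encoding of this source file.
$word = "B\u{00E4}r"; // "Bär"

// strlen() counts bytes: "B" and "r" take one byte each, "ä" takes two in UTF-8.
$bytes = strlen($word);

// mb_strlen() counts code points when told the string is UTF-8
// (assumes ext/mbstring is available).
$chars = mb_strlen($word, 'UTF-8');

echo $bytes, "\n"; // 4
echo $chars, "\n"; // 3
```

For a pure ASCII string both calls return the same number, which is why the problem tends to go unnoticed until non-ASCII input shows up.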
The post covers some of the common issues with multi-byte Unicode characters in PHP and the role that the iconv and mbstring functions play in their handling. It shows how the mbstring handling allows developers to "cheat a little" and where, when PHP 8 comes around, the main issue will lie: the deprecation of the mbstring.func_overload setting in the php.ini.
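Code that relies on the overload setting would need to call the multibyte functions explicitly instead. The following is only a rough sketch of what that looks like, not the approach described in the post; it assumes PHP 7 or later with the mbstring extension loaded, and the sample string and variable names are made up for illustration.

```php
<?php
$text = "na\u{00EF}ve caf\u{00E9}"; // "naïve café"

// The byte-oriented substr() cuts after three bytes, which splits the
// two-byte "ï" and leaves an invalid UTF-8 sequence behind.
$brokenPrefix = substr($text, 0, 3);

// The explicit multibyte calls behave the same regardless of any ini
// setting, because the encoding is passed in directly.
$prefix = mb_substr($text, 0, 3, 'UTF-8');  // first three characters
$upper  = mb_strtoupper($text, 'UTF-8');    // case mapping aware of "ï" and "é"

var_dump(mb_check_encoding($brokenPrefix, 'UTF-8')); // bool(false)
echo $prefix, "\n"; // naï
echo $upper, "\n";  // NAÏVE CAFÉ
```

Passing the encoding on every call is verbose, but it keeps the behavior visible in the code rather than hidden in configuration.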