
PHPMaster.com:
Working with Multibyte Strings
Jul 18, 2013 @ 10:12:55

On PHPMaster.com there's a tutorial posted that helps you understand how to work with multibyte strings in PHP. Multibyte strings are strings in which a single character can take up more than one byte, such as text written in many non-English languages. They have to be handled differently from single-byte strings, for example with the functions provided by the mbstring extension.

A written language, whether it’s English, Japanese, or whatever else, consists of a number of characters, so an essential problem when working with a language digitally is to find a way to represent each character in a digital manner. Back in the day we only needed to represent English characters, but it’s a whole different ball game today and the result is a bewildering number of character encoding schemes used to represent the characters of many different languages. How does PHP relate to and deal with these different schemes?

He starts with an introduction to multibyte strings: how they're represented internally, character encoding schemes and Unicode. He also looks at PHP's support for them, noting that the core string functions aren't designed to handle them by default, and at the two extensions you might use instead: iconv and mbstring. He shows how to enable the latter and introduces some of the most common functions you'll use with it (complete with code examples).
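As a rough illustration of the difference (not taken from the tutorial itself), the snippet below compares PHP's byte-oriented core functions with their mbstring counterparts on a UTF-8 string. It assumes the mbstring extension is enabled and that the file is saved as UTF-8; the sample text is arbitrary.

```php
<?php
// Assumes the mbstring extension is enabled and this file is saved as UTF-8.
$text = 'naïve café 日本';

// strlen() counts bytes; mb_strlen() counts characters (code points).
$bytes = strlen($text);
$chars = mb_strlen($text, 'UTF-8');

// mb_substr() counts in characters, so it never cuts a multibyte character in half.
$firstFive = mb_substr($text, 0, 5, 'UTF-8');

// Case conversion that also understands non-ASCII letters.
$upper = mb_strtoupper($text, 'UTF-8');

echo "$bytes bytes, $chars characters\n"; // 19 bytes, 13 characters
echo $firstFive, "\n";                    // naïve
echo $upper, "\n";                        // NAÏVE CAFÉ 日本
```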

tagged: multibyte strings tutorial mbstring introduction unicode

Link: http://phpmaster.com/working-with-multibyte-strings

Project:
Patchwork-UTF8 - UTF8 Support for PHP
Jan 27, 2012 @ 11:38:40

Nicolas Grekas has shared another component he's pulled out of his "Patchwork" framework and turned into a stand-alone tool: the Patchwork-UTF8 helper. It provides counterparts to the string functions PHP already has, made smart enough to handle UTF-8 correctly.

The Patchwork\Utf8 class implements the quasi complete set of string functions that need UTF-8 grapheme clusters awareness. These functions are all static methods of the Patchwork\Utf8 class. The best way to use them is to add a use Patchwork\Utf8 as u; at the beginning of your files, then when UTF-8 awareness is required, prefix by u:: when calling them.
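To see why grapheme-cluster awareness matters, here's a small sketch that doesn't use Patchwork-UTF8 at all. It takes an "é" written as a plain "e" followed by U+0301 COMBINING ACUTE ACCENT and counts it with PCRE: '.' with the /u modifier matches one code point, while \X matches one extended grapheme cluster, which is what a reader perceives as a single character. It assumes PHP 7+ for the \u{...} escape; with the intl extension, grapheme_strlen() is another way to get the grapheme count.

```php
<?php
// "é" composed as 'e' + U+0301 (two code points, one visible character).
$decomposed = "e\u{0301}";

// Count code points: '.' with the /u modifier matches one UTF-8 code point.
$codePoints = preg_match_all('/./u', $decomposed);

// Count grapheme clusters: \X matches one extended grapheme cluster.
$graphemes = preg_match_all('/\X/u', $decomposed);

// Prints: 3 bytes, 2 code points, 1 grapheme cluster(s)
printf("%d bytes, %d code points, %d grapheme cluster(s)\n",
    strlen($decomposed), $codePoints, $graphemes);
```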

In the README for the tool he covers the functions in the current release that mirror PHP's string functions, as well as some additional methods like "isUtf8", "bestFit" and "strtocasefold". It uses the mbstring, iconv and intl extensions when they're installed and falls back to alternative implementations when they aren't (the README lists the methods involved).
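The fallback idea can be sketched in a few lines. The function below is hypothetical and is not Patchwork-UTF8's code: it rejects invalid UTF-8 (a rough "isUtf8" check), then prefers mbstring, then iconv, and finally falls back to a PCRE-based count that needs neither extension.

```php
<?php
/**
 * Hypothetical helper: count the code points in a UTF-8 string using whichever
 * extension is available. Returns false if the input is not valid UTF-8.
 */
function utf8_length(string $s)
{
    // preg_match() with the /u modifier returns false on malformed UTF-8.
    if (preg_match('//u', $s) !== 1) {
        return false;
    }
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }
    if (function_exists('iconv_strlen')) {
        return iconv_strlen($s, 'UTF-8');
    }
    // Last resort: with /u, '.' matches one code point; /s lets it match newlines too.
    return preg_match_all('/./su', $s);
}

var_dump(utf8_length('日本語'));   // int(3)
var_dump(utf8_length("caf\xE9")); // bool(false): a Latin-1 byte, not UTF-8
```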

tagged: utf8 support string patchwork framework helper mbstring iconv intl

Link:

Yannick's Blog:
mbstring vs iconv benchmarking
Oct 06, 2008 @ 12:50:20

Recently on his blog Yannick has posted some benchmarks comparing mbstring and iconv as of the PHP 5.2.4 release.

Following up on my previous post about the differences between the mbstring and iconv international characters libraries (which resulted in a tentative conclusion that nobody knew anything about those differences), and particularly the comments by Nicola, we have combined forces (mostly efforts from Nicola, actually) to provide you with a little benchmarking, if that can help you decide.

His test script is included (so you can gather your own results), along with a full listing of his results, which compare the effects of possible caching over up to ten executions. The text file he ran the script against is also available for download from the post.
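A comparison along these lines could be sketched as a simple timing loop. This is not Yannick's script (his is in the post); the sample text, the iteration count and the choice of mb_strlen/iconv_strlen as the functions under test are arbitrary assumptions, and any function whose extension isn't installed is skipped.

```php
<?php
/** Minimal timing helper: runs $fn $iterations times and returns elapsed seconds. */
function time_it(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Arbitrary sample text; a real benchmark should use representative data.
$text = str_repeat('Ça va? 日本語のテキスト. ', 200);
$iterations = 1000;

$candidates = [
    'mb_strlen'    => function () use ($text) { return mb_strlen($text, 'UTF-8'); },
    'iconv_strlen' => function () use ($text) { return iconv_strlen($text, 'UTF-8'); },
];

$results = [];
foreach ($candidates as $name => $fn) {
    if (!function_exists($name)) {
        continue; // extension not installed, so skip it
    }
    $results[$name] = time_it($fn, $iterations);
    printf("%-13s %.4f s\n", $name, $results[$name]);
}
```

Single runs are noisy, so repeating the whole script several times (as Yannick did with up to ten executions) gives a better picture.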

tagged: mbstring iconv benchmark php5 text file statistic

Link:

Vinu Thomas' Blog:
mbstring Functions by default in PHP
Jul 18, 2008 @ 07:57:16

In a new post to his blog, Vinu Thomas talks about a set of functions that can make your life easier when handling Unicode strings: the mb_* functions of the mbstring extension.

When dealing with multiple languages and internationalization in PHP, some of the default functions in PHP end up mangling the unicode characters in PHP. This is evident when you have a lot of funny looking characters coming up on your web page instead of the actual characters. [...] There is an extension called mbstring which you can install in PHP which gives you a set of functions which are unicode (actually multibyte) ready.

He mentions some of the replacements, like mb_send_mail instead of mail and mb_strlen instead of the usual strlen. Thankfully, there's a simple way to use these functions without rewriting a lot of code: a php.ini setting (mbstring.func_overload) that transparently swaps the standard functions for their mb_* counterparts behind the scenes.
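Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so on current PHP the replacement has to be explicit. The sketch below (assuming mbstring is enabled and the file is saved as UTF-8) shows the kind of mangling Vinu describes: byte-based substr() can split a multibyte character and leave invalid UTF-8 behind, while mb_substr() cuts on character boundaries.

```php
<?php
// Assumes mbstring is enabled and this file is saved as UTF-8.
$title = 'Überraschung';

// substr() counts bytes: 'Ü' is two bytes, so cutting after one byte leaves half a character.
$broken = substr($title, 0, 1);

// mb_substr() counts characters, so the first character comes back intact.
$intact = mb_substr($title, 0, 1, 'UTF-8');

var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
var_dump($intact);                              // string(2) "Ü"
```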

tagged: mbstring function utf8 unicode multibyte replace

Link:

Dokeos Blog:
mbstring vs iconv
Apr 24, 2008 @ 11:18:08

In this post on the Dokeos blog, there's a comparison of the mbstring extension and the iconv library as they pertain to handling multibyte strings.

I was wondering today why use mbstring rather than iconv in Dokeos, and honestly I didn't remember exactly why I had chosen mbstring in the past, but finding information about the *differences* between the two. [...] Searching a bit more, I found a PPT presentation from Carlos Hoyos on Google.

Essentially, it boils down to how each one is integrated: mbstring is bundled with PHP, while iconv relies on an external iconv implementation provided by the system. So, if you're looking for maximum portability, he recommends mbstring.
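For everyday conversions the two APIs overlap quite a bit. As a rough sketch, assuming both extensions are loaded, here's the same ISO-8859-1 to UTF-8 conversion done with each; note that the argument order differs.

```php
<?php
// "café" encoded as ISO-8859-1: 'é' is the single byte 0xE9.
$latin1 = "caf\xE9";

// iconv(from_encoding, to_encoding, string)
$viaIconv = iconv('ISO-8859-1', 'UTF-8', $latin1);

// mb_convert_encoding(string, to_encoding, from_encoding)
$viaMb = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

var_dump($viaIconv === $viaMb); // bool(true)
echo $viaMb, "\n";              // café
```

iconv also accepts suffixes such as //TRANSLIT and //IGNORE on the target encoding, but their behavior depends on the underlying iconv implementation (glibc and GNU libiconv differ), which is part of the portability concern.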

tagged: mbstring iconv multibyte character string compare internal external

Link:

Alessandro Crugnola's Blog:
AMFPHP and mbstring
Oct 12, 2007 @ 09:23:00

Alessandro Crugnola was struggling with an application he was developing (with Flex and PHP) where his local PHP installation worked just fine but his remote system errored on the same code:

Connecting to the service browser I was receiving the error "Channel.Ping.Failed" error and investigating a bit more in the fault message I discovered that the source error was: "The class {Amf3Broker} could not be found under the class path {/var/htdocs/amfphp/services/amfphp/Amf3Broker.php}" and the Amf3Broker php class does not exist anywhere in amfphp!

Even with some default settings he found, things still weren't loading correctly. Finally, he found the culprit: mbstring. One server had function overloading (mbstring.func_overload) enabled and the other didn't, so the amfphp stream came back with corrupted data.
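Binary protocols like AMF need byte counts, and with mbstring.func_overload enabled, strlen() and substr() silently become character-based, which is one way a stream ends up corrupted. A common defensive idiom, sketched here with hypothetical helper names, is to ask mbstring for lengths and substrings in the '8bit' encoding, which always counts bytes whether or not overloading is on.

```php
<?php
/** Hypothetical helper: byte length that stays correct even if strlen() is overloaded. */
function byte_length(string $data): int
{
    if (function_exists('mb_strlen')) {
        // The '8bit' encoding treats every byte as one character.
        return mb_strlen($data, '8bit');
    }
    return strlen($data);
}

/** Hypothetical helper: byte-based substring, for the same reason. */
function byte_substr(string $data, int $start, ?int $length = null): string
{
    if (function_exists('mb_substr')) {
        return mb_substr($data, $start, $length, '8bit');
    }
    return $length === null ? substr($data, $start) : substr($data, $start, $length);
}

$packet = "\x00\x03" . 'héllo'; // two-byte header followed by a UTF-8 payload
var_dump(byte_length($packet));                // int(8)
var_dump(bin2hex(byte_substr($packet, 0, 2))); // string(4) "0003"
```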

tagged: amfphp mbstring flex application error

Link:

SitePoint PHP Blog:
Hot PHP UTF-8 tips
Aug 10, 2006 @ 14:50:03

Following up on some of his previous posts to the SitePoint PHP Blog, Harry Fuecks has posted this quick guide with some "hot UTF-8 tips" to share with the community.

As a result of all the noise about UTF-8, got an email from Marek Gayer with some very smart tips on handling UTF-8. What follows is a discussion illustrating what happens when you get obsessed with performance and optimizations (be warned - may be boring, depending on your perspective).

He mainly talks about sticking with PHP's native string functions to avoid potential mbstring issues, by restricting locale-dependent behavior and using a fast case-conversion function so strings are still handled correctly. The other tip covers serving clients that can't receive UTF-8 content: check which character sets they accept and respond accordingly.
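Both tips can be sketched under our own assumptions; this is not Marek's or Harry's code. utf8_strtolower() takes a cheap byte-level path for pure-ASCII input and only calls mbstring when non-ASCII bytes are present; strtr() with explicit ASCII ranges is used instead of strtolower() because strtolower() was locale-dependent before PHP 8.2. client_accepts_utf8() is a hypothetical, deliberately simplified Accept-Charset check: an explicit utf-8 entry wins over a "*" wildcard, and aliases such as "utf8" are not handled.

```php
<?php
/** ASCII-only lowercasing that ignores the current locale. */
function ascii_strtolower(string $s): string
{
    return strtr($s, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

/** Lowercase UTF-8 text, using the cheap byte-level path when the input is pure ASCII. */
function utf8_strtolower(string $s): string
{
    // No byte in 0x80-0xFF means there are no multibyte sequences to worry about.
    if (!preg_match('/[\x80-\xFF]/', $s)) {
        return ascii_strtolower($s);
    }
    return mb_strtolower($s, 'UTF-8'); // assumes mbstring for the non-ASCII path
}

/** Simplified check: does this Accept-Charset header value allow UTF-8? */
function client_accepts_utf8(?string $header): bool
{
    // A missing header means any charset is acceptable.
    if ($header === null || trim($header) === '') {
        return true;
    }
    $utf8Q = null;
    $wildcardQ = null;
    foreach (explode(',', $header) as $item) {
        $parts = array_map('trim', explode(';', $item));
        $charset = ascii_strtolower($parts[0]);
        $q = 1.0; // default quality when no q parameter is given
        foreach (array_slice($parts, 1) as $param) {
            if (strncasecmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($charset === 'utf-8') {
            $utf8Q = $q;
        } elseif ($charset === '*') {
            $wildcardQ = $q;
        }
    }
    // q=0 means "not acceptable"; an explicit utf-8 entry overrides the wildcard.
    if ($utf8Q !== null) {
        return $utf8Q > 0;
    }
    return $wildcardQ !== null && $wildcardQ > 0;
}

// Pick the response charset; the non-UTF-8 fallback here is an arbitrary choice.
$charset = client_accepts_utf8($_SERVER['HTTP_ACCEPT_CHARSET'] ?? null) ? 'UTF-8' : 'ISO-8859-1';
```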

tagged: utf8 tips mbstring native locale behavior case conversion character set

Link:

Matthew Weier O'Phinney's Blog:
mbstring comes to the rescue
May 17, 2006 @ 05:49:23

In PHP, character encodings can be a pain, to say the least, especially when dealing with XML. Matthew Weier O'Phinney found this out first-hand when a script he was working with had a mixed character set in one of its strings, which caused problems for the XML parser behind SimpleXML.

I tried a number of solutions, hoping actually to automate it via mbstring INI settings; these schemes all failed. iconv didn't work properly. The only thing that did work was to convert the encoding to latin1 -- but this wreaked havoc with actual UTF-8 characters.

Then, through a series of trial-and-error, all-or-nothing shots, I stumbled on a simple solution.

The discovery was to detect the encoding of the string itself (not really its content) and convert everything in it to that encoding. How, you might ask? With the handy mb_detect_encoding and mb_convert_encoding functions. Of course, mbstring has to be compiled into PHP for this to work, but it's well worth it if it's exactly what you need.
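Matthew's own code isn't reproduced here. The following is a minimal sketch of the detect-then-convert idea under our own assumptions: the input is either UTF-8 or a legacy encoding from a hypothetical $fallbacks list (ISO-8859-1 by default), everything is normalized to UTF-8, and the mbstring and SimpleXML extensions are enabled. The UTF-8 check comes first because single-byte encodings like ISO-8859-1 accept any byte sequence and would otherwise always "match".

```php
<?php
/**
 * Hypothetical helper: normalize a string to UTF-8 before handing it to an XML parser.
 */
function to_utf8(string $s, array $fallbacks = ['ISO-8859-1']): string
{
    // Already valid UTF-8: leave it untouched.
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    // Strict mode (third argument true) only accepts an encoding the bytes are valid in.
    $encoding = mb_detect_encoding($s, $fallbacks, true);
    // If nothing matched, return the input unchanged; callers may want to reject it instead.
    return $encoding === false ? $s : mb_convert_encoding($s, 'UTF-8', $encoding);
}

$xml = '<title>' . to_utf8("Caf\xE9 menu") . '</title>';
$doc = simplexml_load_string($xml);
echo $doc, "\n"; // Café menu
```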

tagged: mbstring xml simplexml encoding utf-8 detect convert

Link: