
thePHP.cc:
Putting PHP 8 on the Roadmap
Feb 02, 2018 @ 15:30:07

On thePHP.cc site today they have a quick post that looks ahead to PHP 8 and one planned change in particular: the deprecation of some multibyte character handling.

Since the attempt to create a Unicode-based PHP implementation has failed, PHP 7 – just like PHP 5 – does not handle Unicode strings natively. The commonly used UTF-8 encoding, for example, is a multibyte encoding, as opposed to ASCII, where each character is represented by one single byte.

[...] UTF-8 is a variable-length encoding and each character (code point, to be exact) is represented by one to four bytes. For ASCII characters, everything works smoothly, because UTF-8 is a superset of ASCII. The problems start with non-ASCII characters.
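
To make the byte/character distinction concrete, here is a minimal sketch, assuming PHP 7.0+ (for the \u{...} escapes) and an installed mbstring extension. It compares strlen(), which counts bytes, with mb_strlen(), which counts code points, for characters that need one to four bytes in UTF-8.

```php
<?php
// Characters that need 1, 2, 3 and 4 bytes respectively in UTF-8.
$samples = [
    'A'         => 'A',          // U+0041, plain ASCII
    'e-acute'   => "\u{E9}",     // U+00E9
    'euro sign' => "\u{20AC}",   // U+20AC
    'emoji'     => "\u{1F600}",  // U+1F600, outside the Basic Multilingual Plane
];

foreach ($samples as $name => $char) {
    // strlen() counts bytes; mb_strlen() counts code points.
    printf(
        "%-10s bytes=%d code points=%d\n",
        $name,
        strlen($char),
        mb_strlen($char, 'UTF-8')
    );
}
```

Only the ASCII sample has matching counts; every other character reports one code point but two to four bytes.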

The post covers some of the common issues with multibyte Unicode characters in PHP and the role the iconv and mbstring functions play in handling them. It shows how mbstring lets developers "cheat a little" and where the main issue will lie when PHP 8 comes around: the deprecation of the mbstring.func_overload setting in php.ini.
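
Without func_overload, code has to call the multibyte functions explicitly. A small sketch of why that matters, assuming mbstring is installed: byte-based substr() can cut a UTF-8 character in half, while mb_substr() counts whole characters.

```php
<?php
$word = "h\u{E9}llo"; // "héllo": the é takes two bytes in UTF-8

// substr() counts bytes, so the "first two characters" are really
// 'h' plus the first byte of 'é', which is an invalid UTF-8 fragment.
$bytes = substr($word, 0, 2);

// mb_substr() counts characters when given the encoding explicitly.
$chars = mb_substr($word, 0, 2, 'UTF-8');

var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false)
var_dump($chars === "h\u{E9}");               // bool(true)
```

Passing 'UTF-8' explicitly keeps the result independent of whatever default encoding the server is configured with.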

tagged: php8 roadmap unicode character mbstring overload setting deprecation

Link: https://thephp.cc/news/2018/02/putting-php-8-on-the-roadmap

MyBuilder Tech Blog:
Managing Newlines and Unicode within JavaScript and PHP
Dec 28, 2016 @ 16:07:46

On the MyBuilder.com Tech blog, Edd Mann shares some advice about dealing with newlines and Unicode characters in both JavaScript and PHP.

We were recently sent a tweet in-regard to a text-area client/server-side length validation not correlating. After some detective work we were able to find two issues that could have caused this to occur. In this post I wish to discuss our findings, and how we resolved each issue.

The first issue they found involved newline characters in text-area input: a line break counted as a single character on the client but as two on the server, a difference they later discovered is defined in the HTML5 spec. The second, Unicode-related issue involved characters outside the Basic Multilingual Plane and how JavaScript measures their length. The post then explains their solution to each issue: some string replacement for the newlines and a different function for measuring string length.
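
Both mismatches can be reproduced on the PHP side. This is a hedged sketch, not the post's actual code: the helper names are hypothetical, and it assumes the client counts a line break as a single "\n" and, like JavaScript's String.length, counts UTF-16 code units.

```php
<?php
// Hypothetical helper: normalize CRLF (and lone CR) line breaks to LF so a
// submitted line break counts as one character, as it does client-side.
function normalizeNewlines(string $text): string
{
    // "\r\n" is replaced first, so any "\r" left afterwards is a lone CR.
    return str_replace(["\r\n", "\r"], "\n", $text);
}

// Length in Unicode code points.
function codePointLength(string $text): int
{
    return mb_strlen(normalizeNewlines($text), 'UTF-8');
}

// Length in UTF-16 code units, which is what JavaScript's String.length
// reports. Characters outside the BMP (e.g. emoji) become surrogate pairs,
// i.e. two code units each.
function utf16Length(string $text): int
{
    $utf16 = mb_convert_encoding(normalizeNewlines($text), 'UTF-16LE', 'UTF-8');
    return intdiv(strlen($utf16), 2);
}

$input = "Hi\r\n\u{1F600}";
echo strlen($input), "\n";          // 8 (bytes)
echo codePointLength($input), "\n"; // 4
echo utf16Length($input), "\n";     // 5
```

Which length to enforce is a design choice; the point is that client and server have to agree on both the newline form and the unit being counted.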

tagged: newline javascript unicode html5 mbstring length

Link: http://tech.mybuilder.com/managing-newlines-and-unicode-within-javascript-and-php/

PHPMaster.com:
Working with Multibyte Strings
Jul 18, 2013 @ 15:12:55

On PHPMaster.com there's a tutorial that helps you understand how to work with multibyte strings in PHP. Multibyte strings, such as text in many non-English languages, have to be treated differently from single-byte strings, for example by using the mbstring functions.

A written language, whether it’s English, Japanese, or whatever else, consists of a number of characters, so an essential problem when working with a language digitally is to find a way to represent each character in a digital manner. Back in the day we only needed to represent English characters, but it’s a whole different ball game today and the result is a bewildering number of character encoding schemes used to represent the characters of many different languages. How does PHP relate to and deal with these different schemes?

He starts with an introduction to multibyte strings: how they're represented internally, character encoding schemes and Unicode. He then covers PHP's support for them, noting that PHP isn't really built to handle them by default, and the two extensions you might use: iconv and mbstring. He shows how to enable the latter and introduces some of the most common functions you'll use with it, complete with code examples.
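
A quick sketch of a few commonly used mbstring functions (standard mbstring calls; the sample string is only an illustration and PHP 7.0+ is assumed for the \u{...} escapes):

```php
<?php
// Set the default encoding once so the mb_* calls below can omit it.
mb_internal_encoding('UTF-8');

$text = "h\u{E9}llo w\u{F6}rld"; // "héllo wörld"

echo strlen($text), "\n";         // 13 (bytes)
echo mb_strlen($text), "\n";      // 11 (characters)
echo mb_strtoupper($text), "\n";  // HÉLLO WÖRLD
echo mb_strpos($text, 'w'), "\n"; // 6 (strpos() would report the byte offset 7)

// Validate input before processing it: "\xC3\x28" is not valid UTF-8.
var_dump(mb_check_encoding("\xC3\x28", 'UTF-8')); // bool(false)
```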

tagged: multibyte strings tutorial mbstring introduction unicode

Link: http://phpmaster.com/working-with-multibyte-strings

Project:
Patchwork-UTF8 - UTF8 Support for PHP
Jan 27, 2012 @ 17:38:40

Nicolas Grekas has shared another tool that he's pulled out of his "Patchwork" framework and released as a standalone library: the Patchwork-UTF8 helper, which provides counterparts to PHP's existing string functions that work correctly with UTF-8.

The Patchwork\Utf8 class implements the quasi complete set of string functions that need UTF-8 grapheme clusters awareness. These functions are all static methods of the Patchwork\Utf8 class. The best way to use them is to add a use Patchwork\Utf8 as u; at the beginning of your files, then when UTF-8 awareness is required, prefix by u:: when calling them.

In the README he describes the functions in the current release that mirror PHP's string functions, as well as some additional methods like "isUtf8", "bestFit" and "strtocasefold". The library uses the mbstring, iconv and intl extensions when they're installed and falls back to alternative implementations when they aren't (the README lists these).
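
The "use an extension if present, otherwise fall back" idea and the difference between code points and grapheme clusters can be sketched in plain PHP. This is a hypothetical illustration, not Patchwork-UTF8's own code; the function names are made up, and the grapheme count relies on PCRE's \X (one extended grapheme cluster) rather than the intl extension.

```php
<?php
// Hypothetical helper: count code points using whichever tool is available.
function utf8Length(string $s): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }
    if (function_exists('iconv_strlen')) {
        return iconv_strlen($s, 'UTF-8');
    }
    // Pure PCRE fallback: in /u mode, "." matches one code point
    // ("s" lets it match newlines too).
    return preg_match_all('/./us', $s);
}

// Hypothetical helper: count user-perceived characters (grapheme clusters).
function graphemeCount(string $s): int
{
    return preg_match_all('/\X/u', $s);
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT displays as a single "é".
$decomposed = "e\u{301}";
echo utf8Length($decomposed), "\n";    // 2 code points
echo graphemeCount($decomposed), "\n"; // 1 grapheme cluster
```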

tagged: utf8 support string patchwork framework helper mbstring iconv intl

Link:

Yannick's Blog:
mbstring vs iconv benchmarking
Oct 06, 2008 @ 17:50:20

On his blog, Yannick has posted some benchmarks comparing mbstring and iconv on the PHP 5.2.4 release.

Following up on my previous post about the differences between the mbstring and iconv international characters libraries (which resulted in a tentative conclusion that nobody knew anything about those differences), and particularly the comments by Nicola, we have combined forces (mostly efforts from Nicola, actually) to provide you with a little benchmarking, if that can help you decide.

The post includes the test script (so you can gather your own results) and a full listing of the results, which compare the possible effects of caching over up to ten executions. The text file the script was run against is also available for download from the post.
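
For anyone who wants to run a similar comparison today, here is a rough micro-benchmark sketch. It is not the post's script: the sample text and iteration count are arbitrary assumptions, it needs PHP 7.3+ for hrtime(), and the numbers it prints will depend heavily on PHP version, build and hardware.

```php
<?php
// Arbitrary sample text (assumption): "Grüße aus Köln. " repeated.
$text = str_repeat("Gr\u{FC}\u{DF}e aus K\u{F6}ln. ", 10000);
$iterations = 20;

$timings = [];
foreach (['mb_strlen', 'iconv_strlen'] as $fn) {
    $start = hrtime(true); // monotonic clock in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $length = $fn($text, 'UTF-8');
    }
    $timings[$fn] = (hrtime(true) - $start) / 1e6; // milliseconds
    printf("%-13s length=%d  %.2f ms\n", $fn, $length, $timings[$fn]);
}
```

Both functions should report the same length; only the timings are of interest.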

tagged: mbstring iconv benchmark php5 text file statistic

Link:

Vinu Thomas' Blog:
mbstring Functions by default in PHP
Jul 18, 2008 @ 12:57:16

In a new post to his blog, Vinu Thomas talks about a set of functions that can make your life easier when handling Unicode strings: the mb_* functions of the mbstring extension.

When dealing with multiple languages and internalization in PHP, some of the default functions in PHP end up mangling up the unicode characters in PHP. This is evident when you have a lot of funny looking characters coming up on your web page instead of the actual characters. [...] There is an extensions called mbstring which you can install in PHP which gives you a set of functions which are unicode ( actually multibyte ) ready.

He mentions some of the replacements, like mb_send_mail instead of mail and mb_strlen instead of the usual strlen. Thankfully, there's a simple way to use these functions without rewriting a lot of code: the mbstring.func_overload setting in your php.ini, which makes PHP transparently swap in the mb_* versions when your code calls the standard functions.
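
Because func_overload silently changes what strlen() returns, byte-oriented code had to guard against it. A sketch of one common idiom, assuming mbstring is installed: mb_strlen() with the '8bit' encoding always returns a byte count, whether or not overloading is enabled. The helper names are hypothetical; the value 2 is the setting's bit for the string functions.

```php
<?php
// True if mbstring.func_overload is redirecting the string functions
// (bit value 2). On PHP 8 the setting no longer exists, so ini_get()
// returns false, which casts to 0, and this reports false.
function stringOverloadActive(): bool
{
    return ((int) ini_get('mbstring.func_overload') & 2) !== 0;
}

// Byte length that is unaffected by func_overload.
function byteLength(string $s): int
{
    return mb_strlen($s, '8bit');
}

$s = "caf\u{E9}"; // "café"
var_dump(stringOverloadActive());
echo byteLength($s), "\n";         // 5 bytes
echo mb_strlen($s, 'UTF-8'), "\n"; // 4 characters
```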

tagged: mbstring function utf8 unicode multibyte replace

Link:

Dokeos Blog:
mbstring vs iconv
Apr 24, 2008 @ 16:18:08

In this post on the Dokeos blog, there's a comparison of the mbstring extension and the iconv library as they relate to handling multibyte strings.

I was wondering today why use mbstring rather than iconv in Dokeos, and honestly I didn't remember exactly why I had chosen mbstring in the past, but finding information about the *differences* between the two. [...] Searching a bit more, I found a PPT presentation from Carlos Hoyos on Google.

Essentially, it comes down to how each library is integrated: mbstring is bundled with PHP, while iconv relies on an external library. So, if you're looking for maximum portability, he recommends mbstring.
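
In practice, code that has to run on servers with different setups can prefer mbstring and fall back to iconv. A minimal sketch of that pattern for encoding conversion; the helper name and the choice of ISO-8859-1 as the target are illustrative assumptions.

```php
<?php
// Hypothetical helper: convert UTF-8 text to ISO-8859-1 (Latin-1), preferring
// the bundled mbstring extension and falling back to iconv if it is missing.
function toLatin1(string $utf8): string
{
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    }
    // iconv() delegates to the system's iconv implementation, so supported
    // charsets and error behavior can differ between servers.
    $converted = iconv('UTF-8', 'ISO-8859-1', $utf8);
    if ($converted === false) {
        throw new RuntimeException('iconv conversion failed');
    }
    return $converted;
}

echo bin2hex(toLatin1("caf\u{E9}")), "\n"; // 636166e9
```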

tagged: mbstring iconv multibyte character string compare internal external

Link:

Alessandro Crugnola's Blog:
AMFPHP and mbstring
Oct 12, 2007 @ 14:23:00

Alessandro Crugnola was struggling with an application he was developing (with Flex and PHP) where his local PHP installation worked just fine but his remote system errored on the same code:

Connecting to the service browser I was receiving the error "Channel.Ping.Failed" error and investigating a bit more in the fault message I discovered that the source error was: "The class {Amf3Broker} could not be found under the class path {/var/htdocs/amfphp/services/amfphp/Amf3Broker.php}" and the Amf3Broker php class does not exists anywhere in amfphp!

Even after trying some default settings he came across, things still weren't loading correctly. Finally, he found the culprit: mbstring. One server had string function overloading enabled and the other didn't, so the amfphp stream returned corrupted data.
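
The symptom is typical of binary protocols: length fields count bytes, so an overloaded strlen() or substr() (which count characters) reads or writes the wrong number of bytes. A sketch of byte-safe encoding and decoding of a length-prefixed UTF-8 string; the wire format (16-bit big-endian length, then the bytes) and the function names are hypothetical, not amfphp's code.

```php
<?php
// Hypothetical wire format: 16-bit big-endian byte length, then UTF-8 bytes.
// The '8bit' encoding makes mb_strlen()/mb_substr() count raw bytes, so this
// behaves the same whether or not mbstring.func_overload is enabled.
function encodeString(string $utf8): string
{
    return pack('n', mb_strlen($utf8, '8bit')) . $utf8;
}

function decodeString(string $data): string
{
    $length = unpack('n', mb_substr($data, 0, 2, '8bit'))[1];
    return mb_substr($data, 2, $length, '8bit');
}

$packet = encodeString("h\u{E9}llo");
echo bin2hex($packet), "\n";                      // 000668c3a96c6c6f
var_dump(decodeString($packet) === "h\u{E9}llo"); // bool(true)
```

With a character count instead, the prefix for "héllo" would say 5 while 6 bytes follow, and every field after it would be read from the wrong offset.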

tagged: amfphp mbstring flex application error

Link:

SitePoint PHP Blog:
Hot PHP UTF-8 tips
Aug 10, 2006 @ 19:50:03

Following up on some of his previous posts to the SitePoint PHP Blog, Harry Fuecks has posted this quick guide with some "hot UTF-8 tips" to share with the community.

As a result of all the noise about UTF-8, got an email from Marek Gayer with some very smart tips on handling UTF-8. What follows is a discussion illustrating what happens when you get obsessed with performance and optimizations (be warned - may be boring, depending on your perspective).

He mainly talks about using PHP's native string functions to avoid the issues that can come with mbstring, which works if you restrict locale-dependent behavior and use a fast case-conversion function that handles strings correctly. The other tip covers delivering content to clients that can't receive UTF-8: check which character sets they accept and respond accordingly.
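
The first tip can be sketched as follows. This is our reading of the idea, not the code from the post: the locale is pinned so strtolower() only touches ASCII bytes, and strtr() with a lookup table handles non-ASCII letters. The table is a tiny hypothetical excerpt; a real one would cover every uppercase letter you need. Note that setlocale() changes process-wide state.

```php
<?php
// Pin LC_CTYPE to "C" so strtolower() only changes ASCII A-Z and never
// rewrites individual bytes of a multibyte UTF-8 sequence.
setlocale(LC_CTYPE, 'C');

// Hypothetical, deliberately incomplete uppercase -> lowercase table.
// strtr() with an array replaces whole byte sequences.
const UTF8_LOWER_MAP = [
    "\u{C0}" => "\u{E0}", // À -> à
    "\u{C9}" => "\u{E9}", // É -> é
    "\u{D6}" => "\u{F6}", // Ö -> ö
    "\u{DC}" => "\u{FC}", // Ü -> ü
];

function utf8ToLowerFast(string $s): string
{
    return strtr(strtolower($s), UTF8_LOWER_MAP);
}

echo utf8ToLowerFast("\u{C9}COLE \u{DC}BER"), "\n"; // école über
```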

tagged: utf8 tips mbstring native locale behavior case conversion character set

Link:

