<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>PHPDeveloper.org</title>
    <link>http://www.phpdeveloper.org</link>
    <description>Up-to-the Minute PHP News, views and community</description>
    <language>en-us</language>
    <pubDate>Wed, 19 Jun 2013 13:46:17 -0500</pubDate>
    <ttl>30</ttl>
    <item>
      <title><![CDATA[Reddit.com: Let's talk Character Encoding]]></title>
      <guid>http://www.phpdeveloper.org/news/17680</guid>
      <link>http://www.phpdeveloper.org/news/17680</link>
      <description><![CDATA[<p>
On Reddit.com there's <a href="http://www.reddit.com/r/PHP/comments/qxacr/rphp_lets_talk_character_encoding/">a recent post</a> with a growing discussion about character encodings in PHP applications (with various recommendations).
</p>
<blockquote>
I would rather not have to convert these weird characters to the HTML character entities, if possible. I'd rather be able to use these characters directly on the web page. If this is for some reason a bad idea, let me know. This might be more of a general web design question (i already posted it there), but I figured it is still appropriate to post here as well since PHP is used to pull an entry from the database, and I figured a lot of you here would know the answer to the question. 
</blockquote>
<p>
The general consensus is to use UTF8 in this case, but there are a few reminders for the poster too:
</p>
<ul>
<li>Don't forget to make the database UTF8 too</li>
<li>Be sure you're sending the right Content-Type for the UTF8 data</li>
<li>a <a href="http://www.joelonsoftware.com/articles/Unicode.html">link to an article</a> about what "developers must know about unicode/charactersets"</li>
</ul>]]></description>
      <pubDate>Thu, 15 Mar 2012 11:07:07 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Nikita Popov's Blog: htmlspecialchars() improvements in PHP 5.4]]></title>
      <guid>http://www.phpdeveloper.org/news/17462</guid>
      <link>http://www.phpdeveloper.org/news/17462</link>
      <description><![CDATA[<p>
In <a href="http://nikic.github.com/2012/01/28/htmlspecialchars-improvements-in-PHP-5-4">this new post</a> to his blog <i>Nikita Popov</i> looks at an update that might have gotten lost in the shuffle of new features coming in PHP 5.4 - some updates to <a href="http://php.net/htmlspecialchars">htmlspecialchars</a>.
</p>
<blockquote>
One set of changes that I think is particularly important was largely overlooked: For PHP 5.4 cataphract (Artefacto on StackOverflow) heroically rewrote large parts of htmlspecialchars thus fixing various quirks and adding some really nice new features. Here a quick summary of the most important changes: UTF-8 as the default charset, improved error handling (ENT_SUBSTITUTE) and Doctype handling (ENT_HTML401,...).
</blockquote>
<p>
He goes into each of these three main features in a bit more detail, providing code to illustrate the improved error handling and the new flags for Doctype handling (covering HTML 4.01, HTML 5, XML 1 and XHTML).
</p>]]></description>
      <pubDate>Mon, 30 Jan 2012 09:55:24 -0600</pubDate>
    </item>
    <item>
      <title><![CDATA[Project: Patchwork-UTF8 - UTF8 Support for PHP]]></title>
      <guid>http://www.phpdeveloper.org/news/17458</guid>
      <link>http://www.phpdeveloper.org/news/17458</link>
      <description><![CDATA[<p>
<i>Nicolas Grekas</i> has shared another tool that he's pulled out of his "Patchwork" framework to make it a stand-alone tool: the <a href="https://github.com/nicolas-grekas/Patchwork-UTF8">Patchwork-UTF8 helper</a>, which provides counterparts to the functions PHP already has for regular strings, made a little smarter so that they handle UTF8 correctly.
</p>
<blockquote>
The PatchworkUtf8 class implements the quasi complete set of string functions that need UTF-8 grapheme clusters awareness. These functions are all static methods of the PatchworkUtf8 class. The best way to use them is to add a use PatchworkUtf8 as u; at the beginning of your files, then when UTF-8 awareness is required, prefix by u:: when calling them.
</blockquote>
<p>
In <a href="https://github.com/nicolas-grekas/Patchwork-UTF8/blob/master/README.md">the README</a> for the tool he talks about the functions included in the current release that match PHP's string functions as well as some additional methods like "isUtf8", "bestFit" and "strtocasefold". It relies on the mbstring, iconv and intl extensions being installed, and if they aren't, it falls back to other functionality (a list of those methods is included).
</p>]]></description>
      <pubDate>Fri, 27 Jan 2012 11:38:40 -0600</pubDate>
    </item>
    <item>
      <title><![CDATA[Ahmed Shreef's Blog: iconv misunderstands UTF-16 strings with no BOM]]></title>
      <guid>http://www.phpdeveloper.org/news/15035</guid>
      <link>http://www.phpdeveloper.org/news/15035</link>
      <description><![CDATA[<p>
<i>Ahmed Shreef</i> has <a href="http://shreef.com/2010/08/iconv-misunderstands-utf-16-strings-with-no-bom/">a recent post</a> to his blog about an issue he had converting UTF-16 strings over to UTF-8 with the <a href="http://php.net/iconv">iconv</a> functionality in PHP. Specifically, he ended up with "rubbish unreadable characters" after the conversion.
</p>
<blockquote>
I had a problem last week with converting UTF-16 encoded strings to UTF-8 using PHP's iconv library on a Linux server. my code worked fine on my machine but the same code resulted in a rubbish unreadable characters on our production server.
</blockquote>
<p>
In his example (a basic "Hello World" in Arabic) he notes that there's no <a href="http://en.wikipedia.org/wiki/Byte-order_mark">byte order mark</a> on the string and, because of this, the iconv feature tries to guess if it's big-endian or little-endian. This guessing varies from machine to machine, resulting in the inconsistencies he saw. The solution is to specify the "from" and "to" encodings for the conversion explicitly, including the byte order (for example, UTF-16LE or UTF-16BE), rather than letting it guess.
</p>]]></description>
      <pubDate>Fri, 27 Aug 2010 13:36:56 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Danne Lundqvist's Blog: Detecting UTF BOM - byte order mark]]></title>
      <guid>http://www.phpdeveloper.org/news/14435</guid>
      <link>http://www.phpdeveloper.org/news/14435</link>
      <description><![CDATA[<p>
In a new post to his blog <i>Danne Lundqvist</i> looks at <a href="http://www.dotvoid.com/2010/04/detecting-utf-bom-byte-order-mark/">a common pitfall</a> that could trip you up if you're not careful with your UTF-8 data - not looking for the UTF byte order mark that tells the application if it needs to be handled as UTF content.
</p>
<blockquote>
One such thing is the occurrence of the UTF byte order mark, or BOM. [...] For UTF-8, especially on Windows, it has become more and more common to use it to indicate that the file is indeed UTF. Most text editors handle this well and you won't ever see these bytes. As it should be.
</blockquote>
<p>
He points out where this can cause an issue (when using <a href="http://php.net/strcmp">strcmp</a> or <a href="http://php.net/substr">substr</a>, for example) and notes that it can be prevented by checking for those first three bytes and removing them if needed. He includes a snippet of code that does just that.
</p>]]></description>
      <pubDate>Thu, 29 Apr 2010 11:47:03 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Evert Pot's Blog: basename() is locale-aware]]></title>
      <guid>http://www.phpdeveloper.org/news/14269</guid>
      <link>http://www.phpdeveloper.org/news/14269</link>
      <description><![CDATA[<p>
<i>Evert Pot</i> found out an interesting thing about the <a href="http://php.net/basename">basename</a> function in PHP - it's more than just a handy shortcut for paths, it's <a href="http://www.rooftopsolutions.nl/article/271">also locale aware</a>.
</p>
<blockquote>
It turns out basename does a bit more than just splicing the string at the last slash, because it's locale aware. In my case I was dealing with a multi-byte UTF-8 string. It took me quite some time figuring out what was going on, because I was testing from the console which had the en_US.UTF-8 locale, and the bug was appearing on Apache, which defaults to the C locale.
</blockquote>
<p>
He includes an example snippet of code showing how it behaves under the "C" locale (the default, well, for Apache anyway) versus the "UTF-8" locale, returning different results for the same <a href="http://php.net/urldecode">urldecoded</a> information.
</p>]]></description>
      <pubDate>Tue, 30 Mar 2010 12:04:35 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Elliot Haughin's Blog: Building UTF8 Compatible CodeIgniter Applications]]></title>
      <guid>http://www.phpdeveloper.org/news/14248</guid>
      <link>http://www.phpdeveloper.org/news/14248</link>
      <description><![CDATA[<p>
<i>Elliot Haughin</i> has written up a post for developers either already using <a href="http://codeigniter.com">CodeIgniter</a> or wanting to use it for their applications - a look at making a <a href="http://www.haughin.com/2010/02/23/building-utf8-compatible-codeigniter-applications/">UTF-8 compatible site</a> with the help of a few custom libraries and form helpers.
</p>
<blockquote>
UTF8 allows your site to represent characters other than those in the basic english alphabet. More often than not, your CodeIgniter Application will contain methods where users can enter their name. [...] This guide assumes you are reasonably competent in installing php extensions, adding config variables to your php.ini, and using MY_ CodeIgniter overloading. If you're not sure about any of these, please make sure you consult a professional.
</blockquote>
<p>
You'll need to install the mbstring extension for PHP to be able to follow along with his example. He shows how to override the basic form functionality with custom functions to change the display of the form and how it handles the submitted information. He also looks at how to update the XML-RPC library that comes with the framework and the creation of a new helper to allow you to convert, check, compare and sort UTF-8 data.
</p>]]></description>
      <pubDate>Thu, 25 Mar 2010 12:13:43 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Phil Sturgeon's Blog: UTF-8 support for CodeIgniter]]></title>
      <guid>http://www.phpdeveloper.org/news/13068</guid>
      <link>http://www.phpdeveloper.org/news/13068</link>
      <description><![CDATA[<p>
<i>Phil Sturgeon</i> has <a href="http://philsturgeon.co.uk/news/2009/08/UTF-8-support-for-CodeIgniter">posted a quick guide</a> on working with UTF-8 support in the <a href="http://codeigniter.com">CodeIgniter</a> framework.
</p>
<blockquote>
Writing an application is easy. Writing an application that supports all characters from multiple languages? Not so easy. [...] To make your <a href="http://codeigniter.com/">CodeIgniter</a> application play nicely with UTF-8 you have a few things to think about.
</blockquote>
<p>
He shows how to set the headers up correctly, change the framework's configuration to include UTF-8 as a character set and set up the database connection to use it too. If you're already using a non-UTF-8 database structure, he also includes an example of how to make the conversion.
</p>]]></description>
      <pubDate>Wed, 19 Aug 2009 09:34:30 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Pablo Viquez's Blog: JSON, ISO 8859-1 and UTF-8 - Part]]></title>
      <guid>http://www.phpdeveloper.org/news/12906</guid>
      <link>http://www.phpdeveloper.org/news/12906</link>
      <description><![CDATA[<p>
After spotting some null values in a few of his form fields following an Ajax request, <i>Pablo Viquez</i> decided to track down his issue:
</p>
<blockquote>
While I was looking at some AJAX calls, I started to have a problem, for some reason, when I tried to query a JSON service I did using JQuery, the result was null for some fields. Going a little deeper, I notice that the records from the DB were OK, and the JavaScript was OK to, so what was the problem? The JSON Encode!
</blockquote>
<p>
His issue stemmed from character encoding: <a href="http://php.net/json_encode">json_encode</a> expects its input strings to be UTF-8, but the string being passed in was coming from a PHP script saved in a page encoded as ISO-8859-1. You can <a href="http://www.pabloviquez.com/demo_files/Encoding-JSON.zip">download the files</a> he's come up with to illustrate the point.
</p>]]></description>
      <pubDate>Mon, 20 Jul 2009 12:42:36 -0500</pubDate>
    </item>
    <item>
      <title><![CDATA[Rob Allen's Blog: UTF8, PHP and MySQL]]></title>
      <guid>http://www.phpdeveloper.org/news/12167</guid>
      <link>http://www.phpdeveloper.org/news/12167</link>
      <description><![CDATA[<p>
<i>Rob Allen</i> <a href="http://akrabat.com/2009/03/18/utf8-php-and-mysql/">had a problem</a> - he needed to get the "pound" (as in the British monetary unit) into his MySQL database. His database didn't seem to want to comply:
</p>
<blockquote>
Everyone else probably already knows this stuff, but I hit an issue today to that took a while to sort out. Fortunately, some kind folks on IRC helped me, but as it's embarrassing to ask for help on the same issue twice, I'm writing down what I've learned! The problem: Get a &pound; character stored to MySQL, retrieved and then displayed without any weird characters in front of it using UTF8.
</blockquote>
<p>
His solution? Make sure you're using UTF-8 everywhere, not just when trying to insert into the database - in the browser's headers (both going in and coming out) and in the MySQL database insert. He gives code examples for each, including database examples for PDO and the Zend_Db component of the <a href="http://framework.zend.com">Zend Framework</a>.
</p>]]></description>
      <pubDate>Thu, 19 Mar 2009 08:43:19 -0500</pubDate>
    </item>
  </channel>
</rss>
