SitePoint PHP Blog:
Character Encoding Issues with Cultural Integration
September 10, 2008 @ 12:07:06

On the SitePoint PHP Blog, Troels Knak-Nielsen points out some "cultural integration issues" he's seen with character encoding in his PHP applications.

The gold standard solution is to convert everything to utf-8. Since utf-8 covers the entire unicode range, it is capable of representing any character that latin1 can. Unfortunately, that's a lot easier to do from the outset, than with a big, running application. And even then, there may be third party code and extensions, which assume latin1. I'd much rather continue with latin1 being the default, and only jump through hoops at the few places where I actually need full utf-8 capacity.

He came up with a (relatively) simple solution: keep the data stored in the latin1 he already has, but serve the pages as UTF-8, embedding UTF-8 inside the latin1 where needed. He gives the code for both, using output buffering and PHP's utf8 encoding functions to make it all work.
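
As a rough illustration of the output-buffering half of that idea (not the code from the SitePoint post), the sketch below converts everything the script prints from latin1 to UTF-8 just before it leaves PHP. The helper name latin1_to_utf8 is hypothetical, and mb_convert_encoding is used here on the assumption that the mbstring extension is available; the older utf8_encode function did the same job but is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: treat the buffered output as ISO-8859-1 (latin1)
// and return the same text encoded as UTF-8.
function latin1_to_utf8(string $buffer): string
{
    return mb_convert_encoding($buffer, 'UTF-8', 'ISO-8859-1');
}

// Tell the browser the page is UTF-8 (skipped if output has already started).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Everything echoed from here on passes through the callback when flushed.
ob_start('latin1_to_utf8');
echo "caf\xE9"; // "café" with a single latin1 byte for the accented letter
ob_end_flush(); // the browser receives the two-byte UTF-8 form instead
```

The application code keeps working with single-byte strings internally, so the conversion stays in one place rather than being spread across every template.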

character encoding cultural integration utf8 latin1 tutorial



Vinu Thomas' Blog:
mbstring Functions by default in PHP
July 18, 2008 @ 07:57:16

In a new post to his blog, Vinu Thomas talks about a set of functions that can make your life easier when handling Unicode strings - the mb_* functions of the mbstring extension.

When dealing with multiple languages and internationalization in PHP, some of the default functions in PHP end up mangling up the unicode characters in PHP. This is evident when you have a lot of funny looking characters coming up on your web page instead of the actual characters. [...] There is an extension called mbstring which you can install in PHP which gives you a set of functions which are unicode ( actually multibyte ) ready.

He mentions some of the replacements, like mb_send_mail instead of mail and mb_strlen instead of the usual strlen. Thankfully, there's a simple way to use these functions without replacing a lot of code - a setting in your php.ini (mbstring.func_overload) that tells PHP to swap in the multibyte versions behind the scenes.
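
To see why the replacements matter, here is a minimal sketch (not taken from Vinu's post) comparing strlen and mb_strlen on a UTF-8 string. It assumes the mbstring extension is loaded; the sample word is written with byte escapes so the result does not depend on how the file is saved. Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so on current PHP versions the mb_* functions have to be called explicitly, as below.

```php
<?php
// "naïve" as UTF-8: the ï is stored as the two bytes C3 AF.
$word = "na\xC3\xAFve";

// strlen counts bytes, so the accented letter is counted twice.
$byteCount = strlen($word);

// mb_strlen counts characters once it is told the string is UTF-8.
$charCount = mb_strlen($word, 'UTF-8');

echo $byteCount, "\n"; // 6
echo $charCount, "\n"; // 5
```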

mbstring function utf8 unicode multibyte replace


ThinkPHP Blog:
Multilingual Websites with PHP
July 15, 2008 @ 07:55:38

On the ThinkPHP blog, Florian Eibeck has posted an overview of some key things to consider when internationalizing your application/website.

The biggest problem is that most developers lack knowledge about Internationalisation, Localisation, Character encodings, Unicode and all those terms connected with multilingualism. The following article should give you a basic understanding and show you how to avoid those funny characters.

He defines a few terms - internationalization, ASCII, Unicode and the UTF-8/ISO-8859 character sets. He also covers how to accept UTF-8 strings in your application, how to work with them in PHP and how to store them in a MySQL database.

multilingual website internationalization i18n utf8 unicode


Padraic Brady's Blog:
ZF Blog Tutorial Addendum #1 Base URL, Magic Quotes, Database Schema & UTF-8
May 29, 2008 @ 16:12:03

Padraic Brady has posted an addendum to his "making a blogging application with the Zend Framework" series, dealing with a few issues that came up along the way.

The interesting thing about live publishing of a long tutorial series is that it's not flawless. In fact it's the opposite. [...] To cover all these I'll occasionally highlight the more important ones both in notes to new entries, or where they slip past me, in Addendum entries like this one.

There are four sections in this update - one dealing with the referencing of base URLs, another on magic_quotes settings, an updated database schema for the project and a final one on removing non-English characters from the title URLs.

addendum base url magicquotes database schema utf8


PHPWACT.org:
Handling UTF-8 with PHP
January 24, 2008 @ 07:51:00

Ed Finkler has pointed out a handy resource for those trying to cope with using the UTF-8 support included in several of PHP's functions - this page on the Web Application Component Toolkit wiki.

This page is intended as a reference for functionality PHP provides which can either help with handling UTF-8 or should be regarded as a risk when used in conjunction with UTF-8 encoded strings. Further information can be found on the Internationalization (I18N) and Character Sets / Character Encoding Issues pages.

It talks about the "dangerous" functionality PHP has (issues that the language has in current functions) when using things like the PCRE extension, the string extension, the array methods, handling variables, the XML extensions (DOM and SAX), image manipulation, and URL parsing functionality.

utf8 dangerous functionality pcre xml string array image url


Nessa's Blog:
Convert Database to UTF-8
December 13, 2007 @ 10:23:00

Nessa has posted a quick way to convert a database from whatever character set it currently uses to UTF-8 with a handy PHP script.

When you're dealing with special characters in a database, you have to make sure that the charset and collation are dumped *with* the database, so that when you move it to another server the tables and data create properly. The biggest annoyance so far is converting tables back to UTF-8, as when this is done through the MySQL shell or phpmyadmin it had to be done table-by-table.

The script logs into the database, pulls out the list of tables (which could be lengthy, depending on the database) and runs an ALTER TABLE on each one to change its character set to 'UTF8'.
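
The general shape of such a script can be sketched as follows. This is not Nessa's code: the function name build_convert_statements is hypothetical, the table names are placeholders, and the use of MySQL's ALTER TABLE ... CONVERT TO CHARACTER SET form is an assumption about how the conversion is done. The sketch only builds the statements, so it can be tried without a database; in a real run the names would come from a SHOW TABLES query and each statement would be executed over the connection.

```php
<?php
// Hypothetical helper: build one ALTER TABLE statement per table name.
function build_convert_statements(array $tables, string $charset = 'utf8'): array
{
    // Only allow simple charset names, since the value goes straight into SQL.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $charset)) {
        throw new InvalidArgumentException('Unexpected character set name');
    }

    $statements = [];
    foreach ($tables as $table) {
        // Quote the identifier with backticks, doubling any embedded backtick.
        $quoted = '`' . str_replace('`', '``', $table) . '`';
        $statements[] = "ALTER TABLE $quoted CONVERT TO CHARACTER SET $charset";
    }
    return $statements;
}

// Placeholder table names standing in for the result of SHOW TABLES.
foreach (build_convert_statements(['posts', 'comments']) as $sql) {
    echo $sql, ";\n";
}
```

Back up the database before running statements like these, since converting a column changes the stored bytes.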

characterset utf8 convert alter database table


Felix Geisendorfer's Blog:
Enforce utf8 for multiple db connections
November 12, 2007 @ 12:55:00

Felix Geisendorfer has another quick CakePHP tip - an update from a previous tip for forcing utf8 connections on multiple databases:

This is just a quick update for Dessert #6 - MySql & UTF-8. I've been using the approach outlined in that old post pretty much until today, when I realized that it has two major flaws: It does not work when using multiple db connections (i.e. using load balancing or connecting to a 3rd party db), and it might interfere with other databases that don't need this utf8 thing to be set.

He includes some code (a quick 13-liner) to take care of this small issue. Check out the comments on the post for an even easier way, too.

cakephp framework multiple database connection utf8 encoding


Maggie Nelson's Blog:
When PHP and Oracle assume the worst about each other
June 13, 2007 @ 10:10:00

As mentioned by Ben Ramsey today, Maggie Nelson bumped into an issue in one of her applications with character sets and the incorrect storage/retrieval of information:

Even Oracle, which usually gets storing of data right on the money has had issues with character sets. [...] Needless to say, even when you *know* you set up your database correctly for supporting UTF8, the path to debug issues may be frustrating and full of red herrings.

She describes the setup the application uses (NLS_CHARACTERSET AL32UTF8, NLS_NCHAR_CHARACTERSET AL16UTF16), but something wasn't right. The problem popped up when they stored Chinese characters in the database and got invalid data back on a select.

After following several different leads, they finally came upon the culprit - the Apache process didn't have the access it needed to a directory in the ORACLE_HOME. In the end, the fix for a very frustrating issue came down to three easy steps.

oracle utf8 characterset nls unicode


Aaron Wormus' Blog:
Migrating to Unicode
May 18, 2007 @ 07:08:54

Aaron Wormus has dug up some old notes that he made at the International PHP Conference back in 2005 on the topic of Unicode in PHP that he wanted to share.

He mentions:

  • A case study of an online survey generator
  • Info about Unicode and UTF-8
  • Iconv and mbstring
  • The migration of the application to support Unicode characters (a nine-step process)

And also, a word of warning:

Everything was much more complex than expected. Don't do this because you think that UTF-8 is cool, it's difficult, not well supported in PHP, and don't do it without needing it. Don't do this without a CVS.

migration unicode casestudy survey generator utf8


David Sklar's Blog:
Visiting each character in a string
April 26, 2007 @ 07:01:00

In a new post today, David Sklar demonstrates how he solved a simple problem - looping through all of the characters in a string in a UTF-8 enabled environment.

So I've got this string (in PHP) and I need to scan through it character by character. I can't scan byte by byte because it's 2007, our users write in all sorts of languages, and the string is UTF-8.

To remedy the situation, he falls back on an old standby - the mb_* functions, specifically mb_substr and mb_strlen. His benchmarks show that, with a 1,500-character string, his sample script manages around 61 scans per second. (The PHP 6 version with TextIterator works much faster, though - 450 scans per second.)
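
A character-by-character scan along those lines might look like the sketch below. It is an illustration rather than David's benchmark code: the function name utf8_chars is hypothetical, and the mbstring extension is assumed to be loaded. Each mb_substr call has to find its position by walking the string from the start, so the time per call grows with the index, which fits the modest scan rate quoted above.

```php
<?php
// Hypothetical helper: return the characters of a UTF-8 string one by one.
function utf8_chars(string $text): array
{
    $chars = [];
    $length = mb_strlen($text, 'UTF-8'); // length in characters, not bytes

    for ($i = 0; $i < $length; $i++) {
        // Take exactly one character starting at character offset $i.
        $chars[] = mb_substr($text, $i, 1, 'UTF-8');
    }
    return $chars;
}

// "hé!" with the é written as its two UTF-8 bytes.
foreach (utf8_chars("h\xC3\xA9!") as $char) {
    echo $char, "\n";
}
```

On PHP 7.4 and later, mb_str_split($text, 1, 'UTF-8') returns the same array in a single call.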

string loop utf8 mbstrlen mbsubstr benchmark textiterator




All content copyright, 2008 PHPDeveloper.org :: info@phpdeveloper.org - Powered by the Solar PHP Framework