PHP Roundtable:
046: Character Encoding and UTF-8 in PHP
Jun 03, 2016 @ 14:13:13

The PHP Roundtable podcast, hosted by PHP community member Sammy K Powers, has posted their latest episode today: Episode #46 - Character Encoding and UTF-8 in PHP.

If you've ever gotten a number of weird looking characters in your database or on your website like "?" and didn't know why, then this episode is for you. Those bizarre characters, called "mojibake", rear their ugly heads when we don't account for a consistent character encoding. Today we discuss what character encoding is, how to accommodate it in HTML, PHP & your database, and how we can ensure we'll never encounter an unexpected alien character in our web apps again.
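As a rough illustration of how mojibake arises (this sketch is not taken from the episode, and it uses Python rather than PHP purely to keep the example short), the snippet below stores a string as UTF-8 bytes and then reads those bytes back as ISO-8859-1. The sample word and the variable names are assumptions made for the example.

```python
# A sample string with one non-ASCII character (U+00E9, "e" with acute accent).
original = "caf\u00e9"

# Writer side: the text is stored as UTF-8, so the accented letter becomes two bytes.
stored = original.encode("utf-8")

# Reader side, wrong assumption: the same bytes are interpreted as ISO-8859-1,
# where every byte maps to exactly one character.
garbled = stored.decode("iso-8859-1")

# Reader side, correct assumption: the bytes are interpreted as UTF-8.
restored = stored.decode("utf-8")

print(len(original), len(stored), len(garbled))  # 4 characters, 5 bytes, 5 characters
print(garbled == original)   # False: one letter turned into two stray characters
print(restored == original)  # True: a consistent encoding round-trips cleanly
```

The bytes themselves are never damaged here; only the interpretation differs, which is why agreeing on one encoding across HTML, application code and database avoids the problem.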

For this episode Sammy is joined by Andreas Heigl and Evert Pot, two developers more than familiar with Unicode woes. You can watch this latest episode through either the in-page audio or video player or by grabbing the audio file. If you enjoy the show, be sure to subscribe to their feed and follow them on Twitter for updates when future shows are released.

tagged: phproundtable podcast character encoding utf8 andreasheigl evertpot

Link: https://www.phproundtable.com/episode/character-encoding-and-utf-8-in-php

Toptal.com:
Data Encoding: A Guide to UTF-8 for PHP and MySQL
Jan 28, 2016 @ 19:22:56

The Toptal.com blog has posted a guide to data encoding in PHP and MySQL looking specifically at the use of UTF-8 and related handling. They talk about some of the updates you'll need to make to configurations, code and the MySQL settings to fully support this character set.

As a MySQL or PHP developer, once you step beyond the comfortable confines of English-only character sets, you quickly find yourself entangled in the wonderfully wacky world of UTF-8.

[...] Indeed, navigating through UTF-8 related data encoding issues can be a frustrating and hair-pulling experience. This post provides a concise cookbook for addressing these issues when working with PHP and MySQL in particular, based on practical experience and lessons learned (and with thanks, in part, to information discovered here and here along the way).

They start on the PHP side, covering the INI settings that make UTF-8 the default character set and the functions you'll need to update or replace. With those changes out of the way they move to the MySQL side, covering settings in the my.cnf file and a few other things to consider in the database (including that MySQL's "utf8" character set supports only part of UTF-8).
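To make the "partial character set" point concrete: MySQL's legacy "utf8" character set stores at most three bytes per character, while UTF-8 itself needs four bytes for characters outside the Basic Multilingual Plane, which is what "utf8mb4" is for. The sketch below is not from the Toptal article and uses Python only to count bytes; the sample characters and the helper name utf8_length are assumptions chosen for illustration.

```python
# Hypothetical sample characters, written as escapes so the file itself stays ASCII.
samples = {
    "latin letter a": "a",
    "e with acute accent": "\u00e9",
    "euro sign": "\u20ac",
    "grinning face emoji": "\U0001F600",
}

def utf8_length(ch: str) -> int:
    """Return how many bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for label, ch in samples.items():
    size = utf8_length(ch)
    # A three-byte-per-character column cannot hold four-byte sequences.
    verdict = "fits in 3 bytes" if size <= 3 else "needs 4 bytes"
    print(f"{label}: {size} byte(s), {verdict}")
```

Only the last sample needs four bytes, which is the kind of character a three-byte character set cannot store.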

tagged: toptal data encoding mysql utf8 update configuration code

Link: http://www.toptal.com/php/a-utf-8-primer-for-php-and-mysql

Three Devs & A Maybe Podcast:
Understanding Character Sets and Encodings
May 14, 2014 @ 18:12:06

The Three Devs & A Maybe podcast (with hosts Michael Budd, Fraser Hart, Lewis Cains and Edd Mann) has posted their latest episode (#24) talking about character sets and encodings.

Having only just recently been bit by the character encoding issue again, we thought it would be a good time to bring it up on the podcast. Starting from the beginning with ASCII, we move on to discuss how 8-bit compatible machines gave way to the ISO-8859-* standards. This leads us on to Unicode, with the goal to develop a single character-set encoding standard that could support all of the world's scripts. Finally, we discuss the de facto character encoding implementation used on the web today, 'UTF-8', and reasons why this is the case.

Lots of different topics are mentioned including reversing a Unicode string in PHP using UTF-16BE/LE, portable UTF-8 and a YouTube video covering Pragmatic Unicode. You can listen to this new episode through the in-page player, by downloading the mp3 or subscribing to their feed.

tagged: threedevsandamaybe podcast ep24 unicode character set encoding utf8

Link: http://threedevsandamaybe.com/posts/understanding-character-sets-and-encodings/

PerishablePress.com:
Encoding & Decoding PHP Code
Jun 08, 2012 @ 15:56:26

On the PerishablePress.com site there's a recent article showing you how to encode your PHP project's code (though some of the methods are more obfuscation than actual encryption).

There are many ways to encode and decode PHP code. From the perspective of site security, there are three PHP functions - str_rot13(), base64_encode(), and gzinflate - that are frequently used to obfuscate malicious strings of PHP code. For those involved in the securing of websites, understanding how these functions are used to encode and decode encrypted chunks of PHP data is critical to accurate monitoring and expedient attack recovery.

They show examples of several methods of encoding/obfuscation of the code including rot13, base64, gzinflate/gzdeflate and links to some other resources.
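For readers who want to see why these layers are obfuscation rather than encryption, here is a minimal sketch of a rot13 + base64 + raw DEFLATE chain and its reversal. It is written in Python, not PHP, and is not the article's code: the function names, the layer order and the sample payload are all assumptions for illustration. Raw DEFLATE with no header is used because that is the stream format PHP's gzdeflate()/gzinflate() pair works with.

```python
import base64
import codecs
import zlib

def raw_deflate(data: bytes) -> bytes:
    """Compress to a raw DEFLATE stream (negative wbits means no zlib header)."""
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()

def raw_inflate(data: bytes) -> bytes:
    """Decompress a raw DEFLATE stream."""
    return zlib.decompress(data, wbits=-15)

def obfuscate(source: str) -> str:
    """Apply the layers in order: deflate, then base64, then rot13."""
    packed = raw_deflate(source.encode("utf-8"))
    b64_text = base64.b64encode(packed).decode("ascii")
    return codecs.encode(b64_text, "rot_13")

def deobfuscate(blob: str) -> str:
    """Undo the layers in reverse order: rot13, then base64, then inflate."""
    b64_text = codecs.decode(blob, "rot_13")
    packed = base64.b64decode(b64_text)
    return raw_inflate(packed).decode("utf-8")

# Hypothetical payload standing in for a suspicious string found on a site.
payload = "echo 'hello from an obfuscated string';"
hidden = obfuscate(payload)
print(hidden)
print(deobfuscate(hidden) == payload)  # True: no key is needed to reverse it
```

Because every step is a fixed, keyless transformation, anyone who recognizes the layers can peel them off in reverse order.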

tagged: encoding source obfuscate tutorial

Link:

Reddit.com:
Let's talk Character Encoding
Mar 15, 2012 @ 16:07:07

On Reddit.com there's a recent post with a growing discussion about character encodings in PHP applications (with various recommendations).

I would rather not have to convert these weird characters to the HTML character entities, if possible. I'd rather be able to use these characters directly on the web page. If this is for some reason a bad idea, let me know. This might be more of a general web design question (i already posted it there), but I figured it is still appropriate to post here as well since PHP is used to pull an entry from the database, and I figured a lot of you here would know the answer to the question.

The general consensus is to use UTF8 in this case, but there are a few reminders for the poster too:

  • Don't forget to make the database UTF8 too
  • Be sure you're sending the right Content-Type for the UTF8 data
  • A link to an article about what "developers must know about unicode/charactersets"

tagged: character encoding advice reddit utf8 contenttype unicode

Link:

James Cohen's Blog:
How to Avoid Character Encoding Problems in PHP
Apr 25, 2011 @ 19:13:14

James Cohen has a recent post to his blog looking at a way you can avoid some of the character encoding problems in PHP that can come with working with multiple character sets.

Character sets can be confusing at the best of times. This post aims to explain the potential problems and suggest solutions. Although this is applied to PHP and a typical LAMP stack you can apply the same principles to any multi-tier stack.

He includes a "boring history" session (and recommends skipping if you just want the good stuff) that talks a bit about character sets and their history in computer system handling. All that said, he recommends using UTF-8 to ease your character encoding woes. He talks about configuring your editor to support it, making sure your browsers understand it and setting up your MySQL database connection to use it.

tagged: character encoding issue mysql browser editor ide

Link:

Brian Swan's Blog:
SQL Server Driver for PHP Connection Options: CharacterSet
Feb 28, 2011 @ 18:15:33

Brian Swan has posted another in his series looking at connection options for the SQL Server driver for PHP. In his latest he looks at the "CharacterSet" setting, an easy way to define the encoding of the data being sent to the server.

One thing that helped me understand the CharacterSet option was to realize that its name is a bit misleading (although it seems to be in line with other uses of CharacterSet or charset). It is used to specify the encoding of data that is being sent to the server, not the character set. With that in mind, the possible values for the option begin to make sense: SQLSRV_ENC_CHAR, SQLSRV_ENC_BINARY, and UTF-8.

He looks at each of these three options in more detail - SQLSRV_ENC_CHAR being the default, SQLSRV_ENC_BINARY when binary data is needed and UTF-8 when, obviously, you need UTF-8 data transfer between the client and server.

tagged: sqlserver connection option characterset encoding

Link:

WebReference.com:
Create a Localized Web Page with PHP
Oct 21, 2010 @ 18:21:23

On WebReference.com there's a new tutorial posted about localizing your website by defining a character set to use for your content.

The process of making your applications/websites usable in many different locales is called internationalization, while customizing your code for different locales is called localization. Localization is the process of making your applications or websites local to where it is being viewed. For example, you can make a website more local to a particular place by converting its text to the predominant language of that location and by displaying the local time (e.g. German for people living in Germany or French for people living in France).

They show how to define constants that can be used in your application for the character set and language encoding. They use two major encodings - UTF-8 and ISO-8859-1 - in their examples of showing a sample "welcome" message in different languages. There's also a simple page to show you how to switch between languages if you'd like to give your visitors the option.

tagged: localize tutorial language encoding character

Link:

Kevin Schroeder's Blog:
You want to do WHAT with PHP? Chapter 3
Aug 31, 2010 @ 18:44:32

Kevin Schroeder has posted another excerpt from his "You Want to Do WHAT with PHP?" book to his blog today. This time it's from the third chapter that looks at character encodings like UTF-8 or ISO-8859-1.

I realized that while this 3.5-year PHP consultant knew Unicode, UTF-8, character encodings such as ISO-8859-1 or ISO-8859-7, I didn't understand them as well as I thought I had. With that I threw this chapter in the book. Knowing about character encoding is what many developers have. Not as many truly understand it. In this chapter I try to de-mystify character encoding as a whole.

The excerpt introduces character encoding and what it really is - a translation for the computer to be able to handle the human language. The problem comes in when multiple tools try to define the same sort of letters/characters in different ways. He gives an example of a "hello world" string in a normal ASCII format versus one from the EBCDIC format and how it would be rendered by an ASCII-understanding browser.
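The ASCII-versus-EBCDIC comparison can be reproduced in a few lines. This is a sketch in Python rather than the book's own PHP example, and it assumes the "cp037" codec (one common EBCDIC code page) as a stand-in for whichever EBCDIC variant the chapter uses.

```python
message = "hello world"

# The same text produces completely different bytes in the two encodings.
ascii_bytes = message.encode("ascii")
ebcdic_bytes = message.encode("cp037")  # assumption: cp037 as the EBCDIC variant

print(ascii_bytes.hex())
print(ebcdic_bytes.hex())

# A reader that only understands ASCII cannot make sense of the EBCDIC bytes;
# errors="replace" substitutes a placeholder for every byte outside ASCII.
misread = ebcdic_bytes.decode("ascii", errors="replace")
print(misread)

# Decoding with the encoding that was actually used recovers the text.
print(ebcdic_bytes.decode("cp037"))
```

Both byte strings have the same length, since each encoding uses one byte per character here; only the mapping between bytes and letters differs.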

tagged: character encoding book excerpt ascii example

Link:

Evert Pot's Blog:
Filesystem encoding and PHP
Apr 21, 2010 @ 17:19:39

Evert Pot has a new post to his blog about working with files in your applications, more specifically in dealing with filesystem encodings other than some of the defaults.

Many PHP applications save files to a local filesystem. Most of the times for the bulk of readers here you'll likely only ever store files using US-ASCII encoding, either because your filenames are simply based on database fields (as you should try in most cases), or simply because most of your users never have a need for non-English characters. When you do though, it's important to know how operating systems cope with these characters. Unsurprisingly, all of them do this differently.

He talks about encoding issues in three major operating system types - Windows, OS X and Linux - with some code snippets included to illustrate how each handles the different encodings.
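One well-known way filenames can differ between systems is Unicode normalization: the same visible name can be stored precomposed (NFC) or decomposed (NFD), and classic OS X filesystems are commonly described as favoring the decomposed form. The sketch below is an assumption-laden illustration in Python, not code from Evert's post; the filename is hypothetical.

```python
import unicodedata

# Hypothetical filename containing one accented letter.
name = "r\u00e9sum\u00e9.txt"

# Precomposed form: each accented letter is a single code point.
nfc = unicodedata.normalize("NFC", name)

# Decomposed form: each accented letter is a base letter plus a combining accent.
nfd = unicodedata.normalize("NFD", name)

print(len(nfc), len(nfd))                                  # different code point counts
print(len(nfc.encode("utf-8")), len(nfd.encode("utf-8")))  # different byte lengths
print(nfc == nfd)                                          # False: a plain comparison fails

# Normalizing both sides to one form before comparing avoids the mismatch.
print(unicodedata.normalize("NFC", nfd) == nfc)            # True
```

Comparing or looking up filenames byte for byte can therefore fail even when both names look identical on screen.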

tagged: filesystem encoding osx linux windows

Link:

