Tip for handling UTF8 encoding when using scripts

 

This tip is for those poor souls who are writing UTF8 data (non-English characters) to a MySQL database from a script.

In my experience, loading UTF8 data from a script (a dump script) has always been a problem: it just refuses to come out right the first time. This is especially the case with phpMyAdmin. So here are the key points to watch out for:

1. Check that your database is in UTF8 if the script has a statement that creates the database

In the script you may see something like:

--
-- Database: `db`
--
CREATE DATABASE `db` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
USE `db`;
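
If the database already exists, the CREATE DATABASE statement above will not change anything, so it may help to check what the server actually has. A minimal sketch, assuming the database is called `db` as in the snippet above:

```sql
-- Show the statement MySQL would use to recreate the database,
-- including its current default character set.
SHOW CREATE DATABASE `db`;

-- If the default is not utf8, change it.
-- This only affects tables created afterwards, not existing ones.
ALTER DATABASE `db` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
```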

2. Check that your tables are all in UTF8 if the script has table creation statements

In the script you may see something like:

DROP TABLE IF EXISTS `tbl`;
CREATE TABLE IF NOT EXISTS `tbl` (
  `id` int(11) NOT NULL auto_increment,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;
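
For tables that were created earlier with a different charset, something along these lines should show the current state and convert it. This is only a sketch: the names `db` and `tbl` come from the snippets above, and converting a table whose data is already stored with the wrong encoding can make things worse, so take a backup first.

```sql
-- Show the table definition, including DEFAULT CHARSET.
SHOW CREATE TABLE `tbl`;

-- The Collation column shows the table's default collation.
SHOW TABLE STATUS FROM `db` LIKE 'tbl';

-- Convert the table default and all character columns to utf8.
ALTER TABLE `tbl` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
```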

3. Make the script use the charset that you want when it is run

Add the following to the script:

SET NAMES utf8;
SET CHARACTER SET utf8;

These lines should be the first statements in your script.
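
To see what the session is really using, you can list the character set variables right after setting them. A small sketch: as far as I know, SET NAMES sets the client, connection and results character sets together, while SET CHARACTER SET sets client and results but puts the connection character set back to that of the default database, so the output depends on which one runs last.

```sql
SET NAMES utf8;

-- character_set_client, character_set_connection and
-- character_set_results should all report utf8 here.
SHOW VARIABLES LIKE 'character_set%';

-- The matching collations for the session.
SHOW VARIABLES LIKE 'collation%';
```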

4. Oh boy, you are in trouble if you keep on reading. From here on it is a MySQL server configuration issue

You need to check the default character set in the MySQL configuration file (my.cnf or my.ini):

[mysql]
default-character-set=utf8

and handshaking, a server option (it belongs in the [mysqld] section) that defines whether the client can override the server's default character set:

character-set-client-handshake=true
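
After changing the configuration and restarting the server, you can check from any client whether the server picked the settings up. A minimal sketch, assuming you are allowed to read the global variables:

```sql
-- The server-wide default character set.
SHOW GLOBAL VARIABLES LIKE 'character_set_server';

-- All global character set defaults, for comparison with the
-- session values reported by SHOW VARIABLES.
SHOW GLOBAL VARIABLES LIKE 'character_set%';
```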

 

Hope this helps...

This page was last updated on: 21/07/2009 11:37