Sorry this page looks weird. It was automatically migrated from my old blog, which had a different layout and different CSS.

Fixing MySQL’s Illegal Mix of Collations

I recently started to encounter MySQL’s dreaded illegal-mix-of-collations error. The occasional query would result in Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='.

MySQL’s default character set is latin1 with an adventurous Swedish collation, which presumably seemed like a good idea at the time. However Rails uses UTF-8 everywhere and from time to time the two collations collide.

Fortunately this problem is not new and clever people have already solved it: see Getting out of MySQL Character Set Hell and Fixing a MySQL Character Encoding Mismatch.

Those two articles advise more or less the same process. Here I’m just jotting down some notes where I found something confusing or ambiguous in case I ever have to do this again.

Order of events

After exporting the data and hacking the dump file, I wasn’t sure whether to configure MySQL for UTF-8 and then import the data, or the other way round.

I chose to import the data and then configure MySQL to avoid double-encoding the data. It seems to have worked. It silently truncated all my accented text but I was able to fix that.

Take stock

First figure out where you stand. In MySQL do this:

mysql> show variables like 'char%';

The result you want is this:

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

At this stage several of these values will be latin1.

Similarly for collation:

mysql> show variables like 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+

The utf8_general_ci collation will do but ideally we want utf8_unicode_ci. The former basically ignores all accents: it treats ‘ΓΌ’ the same as ‘u’. The latter pays attention to accents; it’s a tiny bit slower but more accurate.

To see what your tables are using:

mysql> show table status \G

…and look at the Collation field in each table.

Dumping your data as latin1

Here’s the command I used:

mysqldump -uUSER -pPASSWORD --quick --single-transaction --create-options --skip-set-charset --default-character-set=latin1 DATABASE > dump.sql

…where USER, PASSWORD, and DATABASE should be replaced appropriately.

My SQL dump was only 40MB so I was able to edit the tables' character sets in Vim (set :syn off for speed). In Vim:

:%s/DEFAULT CHARSET=latin1;/DEFAULT CHARSET=utf8;/

Configuring MySQL to use UTF8

I added this configuration to /etc/mysql/my.cnf (on Ubuntu 10.04):

[client]:
default-character-set=utf8

[mysqld]
# MySQL 5.5.3+
character-set-server=utf8
collation-server=utf8_general_ci
# MySQL <5.5.3
#default-character-set=utf8
#default-collation=utf8_general_ci

The default-character-set and default-collation options were replaced in MySQL 5.5.3 by character-set-server and collation-server.

After I had done all this and restarted mysql, I found all the character_set_* variables were correct except character_set_database. To fix this:

mysql> alter database DATABASE default character set utf8;
mysql> alter database DATABASE default collate utf8_general_ci;

Verify with show create database DATABASE. (Documentation.)

Verifying everything worked

In the MySQL console I ran these:

mysql> show variables like 'char%';
mysql> show variables like 'collation%';
mysql> show create database DATABASE;
mysql> show table status \G

And then in the Rails console:

>> ActiveRecord::Base.connection.collation
"utf_general_ci"

Andrew Stewart • 24 January 2012 • MySQLRails
You can reach me by email or on Twitter.