Sorry this page looks weird. It was automatically migrated from my old blog, which had a different layout and different CSS.

Fixing MySQL’s Illegal Mix of Collations

I recently started to encounter MySQL’s dreaded illegal-mix-of-collations error. The occasional query would result in Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='.

MySQL’s default character set is latin1 with an adventurous Swedish collation, which presumably seemed like a good idea at the time. However Rails uses UTF-8 everywhere and from time to time the two collations collide.

Fortunately this problem is not new and clever people have already solved it: see Getting out of MySQL Character Set Hell and Fixing a MySQL Character Encoding Mismatch.

Those two articles advise more or less the same process. Here I’m just jotting down some notes where I found something confusing or ambiguous in case I ever have to do this again.

Order of events

After exporting the data and hacking the dump file, I wasn’t sure whether to configure MySQL for UTF-8 and then import the data, or the other way round.

I chose to import the data and then configure MySQL to avoid double-encoding the data. It seems to have worked. It silently truncated all my accented text but I was able to fix that.

Take stock

First figure out where you stand. In MySQL do this:

mysql> show variables like 'char%';

The result you want is this:

| Variable_name            | Value                      |
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |

At this stage several of these values will be latin1.

Similarly for collation:

mysql> show variables like 'collation%';
| Variable_name        | Value           |
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |

The utf8_general_ci collation will do but ideally we want utf8_unicode_ci. The former basically ignores all accents: it treats ‘ΓΌ’ the same as ‘u’. The latter pays attention to accents; it’s a tiny bit slower but more accurate.

The utf8_general_ci collation will do, though utf8_unicode_ci is more accurate and there may well be a better one yet for the languages you want to support. See Unicode Character Sets in the MySQL manual for the details. In my case supporting English, French, and German, I am happy with utf8_unicode_ci.

The most robust choice is the utf8mb4 character set and the utf8mb4_unicode_520_ci collation, which will handle 4-byte UTF-8 needed by, for example, emoji.

Thanks to Rick James for his feedback on character sets and collations.

To see what your tables are using:

mysql> show table status \G

…and look at the Collation field in each table.

Dumping your data as latin1

Here’s the command I used:

mysqldump -uUSER -pPASSWORD --quick --single-transaction --create-options --skip-set-charset --default-character-set=latin1 DATABASE > dump.sql

…where USER, PASSWORD, and DATABASE should be replaced appropriately.

My SQL dump was only 40MB so I was able to edit the tables' character sets in Vim (set :syn off for speed). In Vim:


Configuring MySQL to use UTF8

I added this configuration to /etc/mysql/my.cnf (on Ubuntu 10.04):


# MySQL 5.5.3+
# MySQL <5.5.3

The default-character-set and default-collation options were replaced in MySQL 5.5.3 by character-set-server and collation-server.

After I had done all this and restarted mysql, I found all the character_set_* variables were correct except character_set_database. To fix this:

mysql> alter database DATABASE default character set utf8;
mysql> alter database DATABASE default collate utf8_general_ci;

Verify with show create database DATABASE. (Documentation.)

Verifying everything worked

In the MySQL console I ran these:

mysql> show variables like 'char%';
mysql> show variables like 'collation%';
mysql> show create database DATABASE;
mysql> show table status \G

And then in the Rails console:

>> ActiveRecord::Base.connection.collation

Andrew Stewart • 24 January 2012 • MySQLRails
You can reach me by email or on Twitter.