Sorry this page looks weird. It was automatically migrated from my old blog, which had a different layout and different CSS.

Fixing Truncated, Accented Text In MySQL

So when I loaded my data into MySQL the other day, MySQL silently truncated every piece of text at the first accented character. This affected about 13,000 records. The truncation was bad enough, though I suppose MySQL just couldn’t figure out what the byte stream meant at that point. But to do it silently?

Not cool.

Why did MySQL truncate my text? I have no idea.

Update: Rick James maintains an exhaustive list of causes and solutions. See also his Stack Overflow answer to a similar question.

The correct solution

Move to PostgreSQL.

The solution in the meantime

I couldn’t find the affected records in the new database because the truncated data was truly lost. How do you query what’s not there?

So I loaded my data dump into a fresh development database. Strangely this worked despite having gone pear-shaped in production.

I then looked for all texty fields in the database containing a French or German accent (based on my customers' languages) and generated a SQL script with their values. The accent query below just wouldn’t work until I used the binary keyword.

Here’s the Rake task I wrote:

task :fix_accents => :environment do
  accents = %w[ ß ä ë ö ü é à è â ô ]
  ActiveRecord::Base.connection.tables.
    reject { |name| %w[delayed_jobs schema_migrations].include? name }.
    each do |table|
      klass = table.classify.constantize
      texty_columns = klass.columns_hash.
        select { |name,col| col.type == :string || col.type == :text }.
        map    { |name,col| name }.
        each do |name|
          accents.each do |accent|
            klass.select("id, `#{name}`").where("binary `#{name}` like '%#{accent}%'").each do |row|
              puts "UPDATE #{table} SET `#{name}` = #{row.send(name).inspect} WHERE id = #{row.id};"
            end
          end
        end
    end
end

I piped the output into a new file, tested a few key lines on the production database, then piped the whole thing into the production database. Problem solved.

I’m still moving to PostreSQL though.

Andrew Stewart • 27 January 2012 • MySQLRails
You can reach me by email or on Twitter.