mysql utf8 and latin1 string comparison in legacy databases

When comparing strings in MySQL, using characters like “é”, you might end up with some wierd results. If you have a legacy database where you can’t change collations, charsets and the like, it gets harder. And to make all things worse, if you can’t change your connection or do a “set names”, you might start pulling some hair.

Let us set the environment: I have some strings in my database, in a latin1 table, with some accented characters in them. One of them is “César”, another one is “Si j’étais jardinier”. When doing a SELECT title FROM mytable WHERE title like '%étais%', nothing popped up. However, when I searched for “etais” (without the accent), this gave me the results, with the accents. My connection was latin1, the stored data was UTF8 (in a latin1 table, but don’t ask my why, see the first paragraph). The words “collation”, “charsets” and “not again” flashed before my eyes.

But, no worries, there is a solution. It was buried deeply in the less-visited section of my brain, but I managed to dig it out this morning.

The solution is rather simple, if you know it. You can prepend the string your searching with the charset, to force the collation. Just prepend it to the string, and add an underscore in front of it (underscore charset string); just take a look at the where-statement below:

SELECT title FROM mytable WHERE title like _utf8'%étais%'

My real-world example goes like this:

The pain:

mysql> select title from mytable where title like "%étais%";
Empty set (0.09 sec)


mysql> select title from mytable where title like "%etais%";
+-----------------------------------------------+
| title |
+-----------------------------------------------+
| Je n'aimais plus j'étais triste et réveur |
| J'étais seule en un bocage |
...

The solution:

mysql> select title from mytable where title like _utf8"%étais%";
+-----------------------------------------------+
| title |
+-----------------------------------------------+
| Je n'aimais plus j'étais triste et réveur |
| J'étais seule en un bocage |

Proof:

mysql> select title from mytable where title like _latin1"%étais%";
Empty set (0.09 sec)

It is not the best solution, but it works (and don’t come complaining about the crappy ‘%something%’-searching, I know it is there, and I’m deeply ashamed).

Tags: , ,

One Response to “mysql utf8 and latin1 string comparison in legacy databases”

  1. Sam Frans Vloeberghs Says:

    Interesting post! :) Bookmarked it!

    I’ve never bezen into a situation where I could not at least set names to UTF8, but I have no doubt I ever will..

    Thx!