MySQL REGEXP with UTF-8 characters

I am trying to select data from a MySQL database with REGEXP so that it matches words with or without special (accented) UTF-8 characters.

Let me explain with an example:

If the user enters a word like sirena, it should return rows that contain words like sirena, siréna, šíreňá, and so on. It should also work in reverse: when the user enters siréná, it should return the same results.

I am trying to do this with REGEXP. My query looks like this:

SELECT * FROM `content` WHERE `text` REGEXP '[sšŠ][iíÍ][rŕŔřŘ][eéÉěĚ][nňŇ][AaáÁäÄ0]'

It works only when the database contains the word sirena, but not when it contains siréňa.

Is this something to do with UTF-8 and MySQL? (The column's collation is utf8_general_ci.)

Thank you!


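MySQL's regular expression library does not support UTF-8. It works byte by byte, so a multi-byte character such as é (two bytes in UTF-8) inside a bracket expression like [eé] is treated as separate single bytes, and the bracket expression can only ever match one byte of it.

See Bug #30241 "Regular expression problems", which has been open since 2007. MySQL will have to change the regular expression library it uses before that can be fixed, and I haven't found any announcement of when, or whether, they will do this.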
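A minimal sketch of why the bracket expressions fail, using Python's re module as a stand-in for MySQL (an analogy, not MySQL itself): a pattern compiled from UTF-8 bytes is matched byte by byte, as MySQL's REGEXP does, while the same pattern compiled from a str is matched character by character.

```python
import re

# The question's pattern. Python's re is used here only as an analogy for MySQL.
PATTERN = "[sšŠ][iíÍ][rŕŔřŘ][eéÉěĚ][nňŇ][AaáÁäÄ0]"
text = "siréňa"

# Byte-wise matching (like MySQL's REGEXP): each bracket expression
# becomes a set of single bytes taken from the UTF-8 encoding.
byte_match = re.search(PATTERN.encode("utf-8"), text.encode("utf-8"))

# Character-wise matching: each bracket expression is a set of characters.
char_match = re.search(PATTERN, text)

print("byte-wise:", byte_match is not None)  # byte-wise: False
print("char-wise:", char_match is not None)  # char-wise: True
```

In byte mode, [eéÉěĚ] matches only the first byte of é (0xC3), so [nňŇ] is then tested against é's second byte (0xA9) and the match fails.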

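The only workaround I've seen is to apply REGEXP to HEX(`text`) and search for the UTF-8 bytes of the accented characters written as hex strings (here C3A9 is é and C588 is ň):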

mysql> SELECT * FROM `content` WHERE HEX(`text`) REGEXP 'C3A9C588';
+----------+
| text     |
+----------+
| siréňa   |
+----------+
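The hex strings for other characters can be derived from their UTF-8 bytes. Below is a minimal Python sketch; utf8_hex and hex_regexp are hypothetical helper names. The ^(..)* prefix is an extra safeguard not used in the query above: HEX() writes two hex digits per byte, so an unanchored search such as 'C3A9' can also match starting halfway through a byte, and the prefix forces every match to begin on a byte boundary.

```python
import re


def utf8_hex(s):
    """Upper-case hex of the UTF-8 bytes of s, in the same form MySQL's HEX() prints."""
    return s.encode("utf-8").hex().upper()


def hex_regexp(positions):
    """Hypothetical helper: build a pattern for `HEX(text) REGEXP ...`.

    positions holds one string per letter, listing the characters allowed
    at that position. '^(..)*' skips whole bytes (two hex digits each),
    so the match can only start on a byte boundary.
    """
    groups = ("(" + "|".join(utf8_hex(ch) for ch in chars) + ")" for chars in positions)
    return "^(..)*" + "".join(groups)


print(utf8_hex("éň"))                # C3A9C588
print(hex_regexp(["eé", "nň"]))      # ^(..)*(65|C3A9)(6E|C588)

# Sanity check in Python (not MySQL): the aligned pattern finds é followed by ň.
print(bool(re.search(hex_regexp(["eé", "nň"]), utf8_hex("siréňa"))))  # True
```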

Re your comment:

No, I don't know of any solution with MySQL.

You might have to switch to PostgreSQL, because its regular expression syntax supports \u escape codes for Unicode characters.


Try alternation instead of a bare bracket expression, something like ... REGEXP '(a|b|[ab])':

SELECT * FROM `content` WHERE `text` REGEXP '(s|š|Š|[sšŠ])(i|í|Í|[iíÍ])(r|ŕ|Ŕ|ř|Ř|[rŕŔřŘ])(e|é|É|ě|Ě|[eéÉěĚ])(n|ň|Ň|[nňŇ])(A|a|á|Á|ä|Ä|0|[AaáÁäÄ0])'

It works for me!
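The alternation works because each alternative is compared as a complete byte sequence, so é is matched as both of its bytes. To avoid writing these groups by hand, here is a minimal Python sketch that builds the pattern from a fold table. VARIANTS, base_letter and alternation_pattern are hypothetical names, the table only covers the letter groups from the question, and the input is assumed to be plain letters (regex metacharacters are not escaped).

```python
import re

# Hypothetical fold table: only the letter groups from the question's pattern.
VARIANTS = {
    "s": "sšŠ",
    "i": "iíÍ",
    "r": "rŕŔřŘ",
    "e": "eéÉěĚ",
    "n": "nňŇ",
    "a": "AaáÁäÄ",
}


def base_letter(ch):
    """Return the key of the group that contains ch (or ch itself)."""
    for base, chars in VARIANTS.items():
        if ch in chars:
            return base
    return ch


def alternation_pattern(word):
    """One (x|y|z) group per letter; letters are assumed, nothing is escaped."""
    groups = []
    for ch in word.lower():
        chars = VARIANTS.get(base_letter(ch), ch)
        groups.append("(" + "|".join(chars) + ")")
    return "".join(groups)


pattern = alternation_pattern("siréná")
print(pattern)  # (s|š|Š)(i|í|Í)(r|ŕ|Ŕ|ř|Ř)(e|é|É|ě|Ě)(n|ň|Ň)(A|a|á|Á|ä|Ä)

# Python bytes regex as a stand-in for a byte-wise engine such as MySQL's REGEXP.
print(bool(re.search(pattern.encode("utf-8"), "siréňa".encode("utf-8"))))  # True
```

Because every variant of a letter maps to the same group, sirena, siréná and šíreňá all produce the same pattern, which covers the requirement that the search also works in reverse. Pass the generated pattern to REGEXP as a bound query parameter rather than pasting it into the SQL string.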


Use the lib_mysqludf_preg library from the MySQL UDF repository.

Although MySQL's own regular expression library does not support UTF-8, lib_mysqludf_preg lets you run UTF-8-compatible PCRE regular expressions directly in MySQL.

http://www.mysqludf.org/
https://github.com/mysqludf/lib_mysqludf_preg#readme
