Stripping MySQL queries of foreign accents

2018-06-24 05:00:11

I must admit that I am ignorant of PHP, and that my current script was inherited...

It queries a MySQL database with a city name and returns all instances it finds of that city.

I had a couple of problems: the first to do with hyphens (eg Stratford-upon-Avon); that has been solved with the addition of

$searchq = str_replace( '-', ' ', $searchq );

which allows me to enter the data in the database without hyphens.

My remaining problem has to do with foreign accents (in particular: acute, grave, circumflex, cedilla, tilde). I have tried a million functions, many of which I found on this site, and haven't managed to get any of them to work.

The main PHP code of my current page is this:

$searchq = filter_var("%{$_POST['keyword']}%", FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH); // Sanitize the string

$searchq = str_replace( '-', ' ', $searchq );

$sql = "SELECT Image, Chain, Country, City, Top as '', Medium as '', Low as '' FROM Chains WHERE Country LIKE ? OR City LIKE ?"; // Your query string

$prepare = $mysqli->prepare($sql); // Prepare your query string
$prepare->bind_param('ss', $searchq, $searchq); // Bind the placeholders to your search variables
// s = string | i = integer | d = double | b = blob
$prepare->execute(); // Execute the prepared statement
$prepare->store_result(); // Store the results for later checking

I have avoided coming to this forum as I understand that it is meant for advanced developers, and I am not one of them...

All the above code does is DELETE the accented letter, rather than replacing it with the same letter without the accent.

EDIT

How do I get Ollie Jones' attention again?

I am stuck, not knowing how to handle the script part

Another EDIT: when I enter this SQL for the table

ALTER TABLE Chains CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8_general_ci;

I get this warning and nothing gets done...

#1253 - COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'utf8mb4'

MySQL's character set and collation features are designed to handle this sort of thing correctly without the need for extra search columns.

For example, observe this little query:

select _utf8'résumé' COLLATE utf8_general_ci = _utf8'resume'

or, using the more modern utf8mb4 character set,

select _utf8mb4'résumé' COLLATE utf8mb4_general_ci = _utf8mb4'resume'

Both these queries find that résumé and resume are equal. It works for almost every European-language accented character.

Each of these queries contains two character string constants explicitly created as Unicode strings, and then compares them using a case-insensitive collation (that is what the _ci suffix means). In that collation, the upper and lower case forms of e, e-acute and e-grave are all considered the same.
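To see that it is the collation, not the character set, that decides the outcome, the same comparison can be repeated under a binary collation. This is only a small sketch; it assumes a server where utf8mb4 is available and a client connection that sends the literals as UTF-8.

```sql
-- Accent- and case-insensitive collation: the strings compare as equal.
SELECT _utf8mb4'RÉSUMÉ' COLLATE utf8mb4_general_ci = _utf8mb4'resume' AS general_ci_equal;  -- 1

-- Binary collation: compares code points, so the accents make the strings differ.
SELECT _utf8mb4'résumé' COLLATE utf8mb4_bin = _utf8mb4'resume' AS bin_equal;  -- 0
```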

How do you get this to work with your database?

1. Make sure the character set of your place-name columns (City, Country) is set to utf8 or, better yet, the more robust and modern utf8mb4.

2. Make sure the default collation for those columns is the case-insensitive collation for the character set you choose.

3. Just do your queries. You don't need anything special. For example, WHERE City = 'Sèvres' and WHERE City = 'sevres' will yield identical results. This is perfect for users accustomed to Google-type search.
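Before changing anything, it may help to check what character set and collation the columns have right now. The statements below are a sketch that assumes the table is named Chains, as in the question, and that you are connected to the database that holds it.

```sql
-- Lists every column of the table; the Collation column shows the current setting.
SHOW FULL COLUMNS FROM Chains;

-- The same information from the data dictionary, limited to the current database.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
  FROM information_schema.COLUMNS
 WHERE TABLE_SCHEMA = DATABASE()
   AND TABLE_NAME = 'Chains';
```

If City and Country already report a _ci collation of utf8 or utf8mb4, no ALTER is needed for them.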

Before altering your table make a backup copy in case you screw something up.

CREATE TABLE Chains_backup SELECT * FROM Chains;

Then use this sort of command to change the columns in your table.

  alter table Chains
       modify City  varchar(255)
                    character set utf8mb4
                    collate utf8mb4_general_ci;

In place of varchar(255) you need to use the actual data type of the column. You didn't tell us what that is, so I'm guessing.
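The #1253 error quoted in the question comes from pairing a utf8 collation with the utf8mb4 character set; a collation name has to start with the name of the character set it belongs to. A corrected form of that statement, offered as a sketch and best tried on the backup copy first, would be:

```sql
-- Converts every character column of the table in one go.
-- The collation prefix (utf8mb4_) now matches the character set (utf8mb4).
ALTER TABLE Chains
  CONVERT TO CHARACTER SET utf8mb4
  COLLATE utf8mb4_general_ci;
```

Unlike the MODIFY form, CONVERT TO does not require you to restate each column's data type.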

The default collation you choose for each column is baked into the indexes. So not only will your diacritic-insensitive searches be accurate, they'll be fast.

Notice that the Spanish-language ñ is an odd case. With the general collation, ñ and n are equal. But in Spanish lexicography, ñ is a different letter. So if you want correct alphabetizing of Spanish place names, you need the utf8_spanish_ci or utf8mb4_spanish_ci collation.
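The difference between the two collations can be checked directly. The sketch below assumes the utf8mb4 variants are available and, for the last statement, that the City column has already been converted to utf8mb4.

```sql
-- General collation: ñ and n are treated as the same letter.
SELECT _utf8mb4'ñ' COLLATE utf8mb4_general_ci = _utf8mb4'n' AS general_equal;  -- 1

-- Spanish collation: ñ is a separate letter that sorts after n.
SELECT _utf8mb4'ñ' COLLATE utf8mb4_spanish_ci = _utf8mb4'n' AS spanish_equal;  -- 0

-- Apply Spanish ordering to a single query without altering the column.
SELECT City FROM Chains ORDER BY City COLLATE utf8mb4_spanish_ci;
```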

The good news for the code shown in your question is this: you don't need that just_clean function at all when you use the case-insensitive collation.
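For the collation to do its work, the search term has to reach MySQL with its accents intact. That means dropping the FILTER_FLAG_STRIP_HIGH flag from the filter_var call in the question, since that flag removes every byte above 127 and is what deletes the accented letters. It also means the connection should use the same character set as the columns. A sketch of the connection side, assuming the form is submitted as UTF-8:

```sql
-- Tell the server that this client sends and expects utf8mb4.
SET NAMES utf8mb4;

-- Confirm what the connection is using.
SHOW VARIABLES LIKE 'character_set_connection';
```

From PHP, calling $mysqli->set_charset('utf8mb4') right after connecting is the usual way to get the same effect.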

You may want to use WHERE City LIKE 'stratford%' rather than WHERE City = 'stratford' to search. This will allow your queries to match the first few characters of a search term. The LIKE construct will match Stratford-upon-Avon as well as Stratfordshire.
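LIKE follows the same collation rules as =, so prefix searches are accent-insensitive too. A short sketch, assuming the Chains table from the question with City converted to a utf8mb4 _ci collation:

```sql
-- Prefix match: finds Stratford-upon-Avon and Stratfordshire alike.
SELECT City FROM Chains WHERE City LIKE 'stratford%';

-- An unaccented, lower-case prefix still finds Sèvres.
SELECT City FROM Chains WHERE City LIKE 'sevr%';

-- The same check on literals, with no table needed.
SELECT _utf8mb4'Sèvres' COLLATE utf8mb4_general_ci LIKE _utf8mb4'sevr%' AS matches;  -- 1
```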
