Stripping MySQL queries of foreign accents
I must admit that I am ignorant of PHP, and that my current script was inherited...
It queries a MySQL database with a city name and returns all instances it finds of that city.
I had a couple of problems: the first to do with hyphens (eg Stratford-upon-Avon); that has been solved with the addition of
$searchq = str_replace( '-', ' ', $searchq );
which allows me to enter the data in the database without hyphens.
My remaining problem has to do with foreign accents (in particular: acute, grave, circumflex, cedilla, tilde). I have tried a million functions, many of which I found on this site, and have not managed to get any of them to work.
The main PHP code of my current page is this:
$searchq = filter_var("%{$_POST['keyword']}%", FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH); // Sanitize the string
$searchq = str_replace( '-', ' ', $searchq );
$sql = "SELECT Image, Chain, Country, City, Top as '', Medium as '', Low as '' FROM Chains WHERE Country LIKE ? OR City LIKE ?"; // Your query string
$prepare = $mysqli->prepare($sql); // Prepare your query string
$prepare->bind_param('ss', $searchq, $searchq); // Bind the placeholders to your search variables
// s = string | i = integer | d = double | b = blob
$prepare->execute(); // Execute the prepared statement
$prepare->store_result(); // Store the results for later checking
I have avoided coming to this forum as I understand that it is meant for advanced developers, and I am not one of them...
All the above code does is DELETE the accented letter, rather than replacing it with the same letter without the accent.
EDIT
How do I get Ollie Jones' attention again?
I am stuck, not knowing how to handle the script part
Another EDIT
When I enter this in the table SQL:
ALTER TABLE Chains CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8_general_ci;
I get this warning and nothing gets done...
#1253 - COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'utf8mb4'
MySQL's character set and collation features are designed to handle this sort of thing correctly without the need for extra search columns.
For example, observe this little query:
select _utf8'résumé' COLLATE utf8_general_ci = _utf8'resume'
or, using the more modern utf8mb4 character set,
select _utf8mb4'résumé' COLLATE utf8mb4_general_ci = _utf8mb4'resume'
Both these queries find that résumé and resume are equal. It works for almost every European-language accented character.
Each of these queries contains two character string constants explicitly created as Unicode strings, and compares them using a case-insensitive (_ci) collation. In that collation, e, e-acute and e-grave, in both upper and lower case, are all considered the same.
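To see that the collation is what makes the strings match, the same comparison can be repeated under a binary collation. This is only a sketch: it assumes the client connection really sends UTF-8 (the SET NAMES line declares that), and the column aliases are made up for illustration. The first query should return 1 and the second 0.

```sql
-- Assumption: the client sends its statements as UTF-8, so the accented
-- literals reach the server intact.
SET NAMES utf8mb4;

-- Accent- and case-insensitive collation: the two strings compare equal.
SELECT _utf8mb4'résumé' COLLATE utf8mb4_general_ci = _utf8mb4'resume' AS general_ci_match;

-- Binary collation: strings are compared code point by code point, so they differ.
SELECT _utf8mb4'résumé' COLLATE utf8mb4_bin = _utf8mb4'resume' AS bin_match;
```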
How do you get this to work with your database?
1. Make sure the character set of your place-name columns (City, Country) is set to utf8 or, better yet, the more robust and modern utf8mb4.
2. Make sure the default collation for those tables is the case-insensitive collation for the character set you choose.
3. Just do your queries. You don't need anything special. For example, WHERE City = 'Sèvres' and WHERE City = 'sevres' will yield identical results. This is perfect for users accustomed to Google-type search.
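Before changing anything, it may help to check what the table currently uses. A minimal sketch, assuming the table is named Chains as in the question and that you are connected to the database that contains it:

```sql
-- Column-level character set and collation for the two place-name columns.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'Chains'
  AND COLUMN_NAME IN ('City', 'Country');

-- Table-level default collation (see the Collation column of the output).
SHOW TABLE STATUS LIKE 'Chains';

-- Character set and collation used by the current client connection.
SHOW VARIABLES LIKE 'character_set_connection';
SHOW VARIABLES LIKE 'collation_connection';
```

If the columns report something like latin1 with a _bin or _cs collation, that would explain why accented and unaccented spellings do not match.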
Before altering your table make a backup copy in case you screw something up.
CREATE TABLE Chains_backup AS SELECT * FROM Chains;
Then use this sort of command to change the columns in your table.
alter table Chains
modify City varchar(255)
character set utf8mb4
collate utf8mb4_general_ci;
In place of varchar(255) you need to use the actual data type of the column. You didn't tell us what that is, so I'm guessing.
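If you would rather convert the whole table in one statement, the collation has to belong to the character set you name. Error #1253 appears when the two are mismatched, for example utf8mb4 paired with utf8_general_ci. A sketch of the matching pair, assuming the table is called Chains and that you have taken a backup first:

```sql
-- Converts every character column in Chains to utf8mb4 and sets the table default.
-- The collation name starts with the name of its character set:
-- utf8mb4 pairs with utf8mb4_general_ci, not with utf8_general_ci.
ALTER TABLE Chains
  CONVERT TO CHARACTER SET utf8mb4
  COLLATE utf8mb4_general_ci;
```

Unlike the per-column form, this touches all character columns at once, so you do not have to restate each column's data type.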
The default collation you choose for each column is baked into the indexes. So not only will your diacritic-insensitive searches be accurate, they'll be fast.
Notice that Spanish-language ñ is an odd case. With the general collation, ñ and n are equal. But in Spanish lexicography, ñ is a different letter. So if you want to alphabetize Spanish place names correctly, you need the utf8_spanish_ci or utf8mb4_spanish_ci collation.
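The difference can be checked directly. In this sketch the first comparison should return 1 and the second 0. The last query shows one way to apply the Spanish ordering to a single query without changing the column definition. It assumes the City column already uses the utf8mb4 character set, and the aliases are illustrative only.

```sql
-- Assumption: the client sends its statements as UTF-8.
SET NAMES utf8mb4;

-- General collation: ñ and n are treated as the same letter.
SELECT _utf8mb4'ñ' COLLATE utf8mb4_general_ci = _utf8mb4'n' AS general_ci_equal;

-- Spanish collation: ñ is a separate letter that sorts between n and o.
SELECT _utf8mb4'ñ' COLLATE utf8mb4_spanish_ci = _utf8mb4'n' AS spanish_ci_equal;

-- Per-query override of the sort order (assumes City is a utf8mb4 column).
SELECT City FROM Chains ORDER BY City COLLATE utf8mb4_spanish_ci;
```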
The good news for the code shown in your question is this: you don't need that just_clean function at all when you use the case-insensitive collation.
You may want to use WHERE City LIKE 'stratford%' rather than WHERE City = 'stratford' to search; this will allow your queries to match the first few characters of a search term. The LIKE construct will match Stratford-upon-Avon as well as Stratfordshire.