What is the best collation to use for MySQL with PHP?

2018-05-30 11:04:36

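I'm wondering whether there is a "best" choice of collation in MySQL for a general website where you aren't 100% sure what will be entered. I understand that all the encodings should be the same: MySQL, Apache, the HTML, and anything inside PHP.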

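In the past I have set PHP to output in "UTF-8", but which collation does this match in MySQL? I'm thinking it's one of the UTF-8 ones, but I have used utf8_unicode_ci, utf8_general_ci, and utf8_bin before.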
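To see which character set and collation a connection is actually using before choosing one, something like the following can help. This is only a sketch: the statements are standard MySQL, but the choice of utf8_unicode_ci in `SET NAMES` is an assumption for illustration, not something the question settles.

```sql
-- Show the character sets and collations in effect for this session and server
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

-- Make the client, connection, and result character sets all utf8
-- (the collation named here is just an example)
SET NAMES 'utf8' COLLATE 'utf8_unicode_ci';

-- List the utf8 collations this server offers
SHOW COLLATION LIKE 'utf8%';
```

Running the two `SHOW VARIABLES` statements again after `SET NAMES` shows which of the session values changed.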

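The main differences are sorting accuracy (when comparing characters in a given language) and performance. The only special one is utf8_bin, which compares characters by their binary values.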

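utf8_general_ci is somewhat faster than utf8_unicode_ci, but less accurate (for sorting). The language-specific utf8 collations (such as utf8_swedish_ci) contain additional language rules that make them the most accurate for sorting in those languages. Most of the time I use utf8_unicode_ci (I prefer accuracy to small performance improvements), unless I have a good reason to prefer a specific language.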
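A quick way to see these differences is to compare the same pair of strings under each collation. The sketch below assumes a utf8 connection and uses 'ä' versus 'a' as an arbitrary test pair of my own choosing; each column returns 1 where the collation treats the two as equal and 0 where it does not.

```sql
-- Assumes the client really sends UTF-8 bytes
SET NAMES 'utf8';

-- COLLATE binds tighter than '=', so each comparison runs under the named collation
SELECT
    'ä' = 'a' COLLATE utf8_general_ci AS general_ci,
    'ä' = 'a' COLLATE utf8_unicode_ci AS unicode_ci,
    'ä' = 'a' COLLATE utf8_swedish_ci AS swedish_ci,
    'ä' = 'a' COLLATE utf8_bin        AS bin;
```

As far as I know, the general and Unicode collations treat the pair as equal, while the Swedish collation treats 'ä' as a separate letter and utf8_bin compares the raw bytes.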

You can read more about the specific Unicode character sets in the MySQL manual: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html

Be very, very aware of this problem that can occur when using utf8_general_ci.

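If the utf8_general_ci collation is used, MySQL will not distinguish between some characters in SELECT statements. This can lead to very nasty bugs, especially where usernames are involved. Depending on the implementation that uses the database tables, this problem could allow malicious users to create a username that matches an administrator account.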

This problem shows up at least in early 5.x versions; I'm not sure whether this behaviour changed later.

I'm no DBA, but to avoid this problem I always go with utf8_bin instead of a case-insensitive collation.
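The binary collation does not have to apply to the whole table. A minimal sketch, assuming a hypothetical `users` table (the table, column, and index names are made up for illustration), puts utf8_bin only on the column where exact matching matters:

```sql
CREATE TABLE `users` (
    -- exact, byte-wise matching for logins
    `username` VARCHAR(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
    -- inherits the table default collation below
    `display_name` VARCHAR(64),
    UNIQUE KEY `uniq_username` (`username`)
) CHARACTER SET utf8 COLLATE utf8_unicode_ci;

INSERT INTO `users` (`username`, `display_name`) VALUES ('admin', 'Administrator');

-- With utf8_bin on the column, this lookup should not match 'admin'
SELECT * FROM `users` WHERE `username` = 'ádmin';
```

The trade-off is that utf8_bin is also case sensitive, so 'Admin' and 'admin' become two different usernames unless the application normalizes case itself.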

The script below illustrates the problem by example.

-- first, create a sandbox to play in
CREATE DATABASE `sandbox`;
use `sandbox`;

-- next, make sure that your client connection is of the same 
-- character/collate type as the one we're going to test next:
SET NAMES 'utf8' COLLATE 'utf8_general_ci';

-- now, create the table and fill it with values
CREATE TABLE `test` (`key` VARCHAR(16), `value` VARCHAR(16) )
    CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO `test` VALUES ('Key ONE', 'value'), ('Key TWO', 'valúe');

-- (verify)
SELECT * FROM `test`;

-- now, expose the problem/bug:
SELECT * FROM test WHERE `value` = 'value';

--
-- Note that we get BOTH keys here! Case-insensitive utf8 collations
-- (ending with _ci) such as utf8_general_ci also ignore the accent
-- here, so they do not distinguish between the two values!
--
-- The utf8_bin collation doesn't have this problem, as I'll show next:
--

-- first, reset the client connection charset/collate type
SET NAMES 'utf8' COLLATE 'utf8_bin';

-- next, convert the values that we've previously inserted in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;

-- now, re-check for the bug
SELECT * FROM test WHERE `value` = 'value';

--
-- Note that we get just one key now, as you'd expect.
--
-- This problem appears to be specific to utf8. Next, I'll try to 
-- do the same with the 'latin1' charset:
--

-- first, reset the client connection charset/collate type
SET NAMES 'latin1' COLLATE 'latin1_general_ci';

-- next, convert the values that we've previously inserted
-- in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET latin1 COLLATE latin1_general_ci;

-- now, re-check for the bug
SELECT * FROM test WHERE `value` = 'value';

--
-- Again, only one key is returned (expected). This shows
-- that the problem with utf8/utf8_general_ci isn't present
-- in latin1/latin1_general_ci
--
-- To complete the example, I'll check with the binary collate
-- of latin1 as well:

-- first, reset the client connection charset/collate type
SET NAMES 'latin1' COLLATE 'latin1_bin';

-- next, convert the values that we've previously inserted in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET latin1 COLLATE latin1_bin;

-- now, re-check for the bug
SELECT * FROM test WHERE `value` = 'value';

--
-- Again, only one key is returned (expected).
--
-- Finally, I'll re-introduce the problem in the exact same 
-- way (for any sceptics out there):

-- first, reset the client connection charset/collate type
SET NAMES 'utf8' COLLATE 'utf8_general_ci';

-- next, convert the values that we've previously inserted in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

-- now, re-check for the problem/bug
SELECT * FROM test WHERE `value` = 'value';

--
-- Two keys.
--

DROP DATABASE sandbox;

Actually, you probably want to use utf8_unicode_ci or utf8_general_ci.

utf8_general_ci sorts by stripping away all accents and sorting as if the text were ASCII.

utf8_unicode_ci uses the Unicode sort order, so it sorts correctly in more languages.

However, if you are only using it to store English text, the two shouldn't differ.
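If you want to check how the two sort your own data, the collation can be overridden per query with a COLLATE clause in ORDER BY. The following is a sketch only: the temporary `words` table and the sample strings are assumptions chosen for illustration.

```sql
-- Assumes the client really sends UTF-8 bytes
SET NAMES 'utf8';

-- Hypothetical scratch table; it disappears when the session ends
CREATE TEMPORARY TABLE `words` (`w` VARCHAR(16))
    CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO `words` VALUES ('Strasse'), ('Straße'), ('Strb');

-- Same rows, sorted under each collation in turn
SELECT `w` FROM `words` ORDER BY `w` COLLATE utf8_general_ci;
SELECT `w` FROM `words` ORDER BY `w` COLLATE utf8_unicode_ci;
```

Any difference between the two result orders comes from the collation alone, since the stored data is identical.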
