What is the best collation to use for MySQL with PHP?

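I'm wondering whether there is a "best" choice of MySQL collation for a general website where you aren't 100% sure what will be entered. I understand that all the encodings should be the same: MySQL, Apache, the HTML and anything inside PHP.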

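In the past I have set PHP to output in "UTF-8", but which collation does that match in MySQL? I assume it's one of the UTF-8 ones, but I have used utf8_unicode_ci, utf8_general_ci, and utf8_bin before.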


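The main differences are sorting accuracy (when comparing characters in a given language) and performance. The only special one is utf8_bin, which compares characters by their binary values.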

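utf8_general_ci is somewhat faster than utf8_unicode_ci, but less accurate (for sorting). The language-specific utf8 collations (such as utf8_swedish_ci) contain additional language rules that make them the most accurate for sorting those languages. Most of the time I use utf8_unicode_ci (I prefer accuracy to small performance improvements), unless I have a good reason to prefer a specific language.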
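
One way to see the accuracy difference is to compare the same two strings under both collations. The sketch below relies on a difference the MySQL manual documents: utf8_general_ci compares "ß" as equal to "s", while utf8_unicode_ci compares it as equal to "ss". It assumes the client really sends UTF-8 and that the server still accepts the utf8 character set name.

```sql
-- Assumption: the client sends UTF-8 bytes; SET NAMES tells the server so.
SET NAMES utf8;

-- Compare the same pair of strings under each collation.
-- Per the manual's description, the first column should be 0
-- and the second should be 1.
SELECT
    'ß' = 'ss' COLLATE utf8_general_ci AS general_ci_match,
    'ß' = 'ss' COLLATE utf8_unicode_ci AS unicode_ci_match;
```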

You can read more about the specific Unicode character sets in the MySQL manual: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html


Be very, very aware of this problem that can occur when using utf8_general_ci.

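If the utf8_general_ci collation is used, MySQL will not distinguish between some characters in SELECT statements. This can lead to very nasty bugs, especially where usernames are involved. Depending on the implementation that uses the database tables, this problem could allow a malicious user to create a username that matches an administrator account.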

This problem shows up at least in early 5.x versions; I'm not sure whether the behaviour changed later.

I'm no DBA, but to avoid this problem I always go with utf8_bin instead of a case-insensitive collation.
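
A minimal sketch of that approach, assuming a hypothetical `users` table that is not part of the original answer: only the username column is switched to utf8_bin, so lookups and the unique key compare usernames by their exact bytes, while other columns keep the table default.

```sql
-- Hypothetical table for illustration; names and sizes are assumptions.
CREATE TABLE `users` (
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -- Binary collation: 'admin' and 'admín' are different usernames.
    `username` VARCHAR(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
    UNIQUE KEY `uniq_username` (`username`)
) CHARACTER SET utf8 COLLATE utf8_general_ci;
```

Keep in mind that utf8_bin is also case-sensitive, so 'admin' and 'Admin' would count as different names; an application that wants case-insensitive usernames would have to normalize them itself.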

The script below demonstrates the problem by example.

-- first, create a sandbox to play in
CREATE DATABASE `sandbox`;
use `sandbox`;

-- next, make sure that your client connection is of the same 
-- character/collate type as the one we're going to test next:
SET NAMES utf8 COLLATE utf8_general_ci;

-- now, create the table and fill it with values
CREATE TABLE `test` (`key` VARCHAR(16), `value` VARCHAR(16) )
    CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO `test` VALUES ('Key ONE', 'value'), ('Key TWO', 'valúe');

-- (verify)
SELECT * FROM `test`;

-- now, expose the problem/bug:
SELECT * FROM test WHERE `value` = 'value';

--
-- Note that we get BOTH keys here! MySQL's case-insensitive utf8
-- collations (ending in _ci), such as utf8_general_ci, also ignore
-- accents, so they treat 'value' and 'valúe' as equal.
--
-- collate 'utf8_bin' doesn't have this problem, as I'll show next:
--

-- first, reset the client connection charset/collate type
SET NAMES utf8 COLLATE utf8_bin;

-- next, convert the values that we've previously inserted in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;

-- now, re-check for the bug
SELECT * FROM test WHERE `value` = 'value';

--
-- Note that we get just one key now, as you'd expect.
--
-- This problem appears to be specific to utf8. Next, I'll try to 
-- do the same with the 'latin1' charset:
--

-- first, reset the client connection charset/collate type
SET NAMES latin1 COLLATE latin1_general_ci;

-- next, convert the values that we've previously inserted
-- in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET latin1 COLLATE latin1_general_ci;

-- now, re-check for the bug
SELECT * FROM test WHERE `value` = 'value';

--
-- Again, only one key is returned (expected). This shows
-- that the problem with utf8/utf8_general_ci isn't present
-- in latin1/latin1_general_ci
--
-- To complete the example, I'll check with the binary collate
-- of latin1 as well:

-- first, reset the client connection charset/collate type
SET NAMES latin1 COLLATE latin1_bin;

-- next, convert the values that we've previously inserted in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET latin1 COLLATE latin1_bin;

-- now, re-check for the bug
SELECT * FROM test WHERE `value` = 'value';

--
-- Again, only one key is returned (expected).
--
-- Finally, I'll re-introduce the problem in the exact same 
-- way (for any sceptics out there):

-- first, reset the client connection charset/collate type
SET NAMES utf8 COLLATE utf8_general_ci;

-- next, convert the values that we've previously inserted in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

-- now, re-check for the problem/bug
SELECT * FROM test WHERE `value` = 'value';

--
-- Two keys.
--

DROP DATABASE sandbox;
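
If converting the whole table is not an option, the comparison can also be made binary for a single query with an explicit COLLATE clause. This is a minimal sketch, assuming the `test` table from the script above still exists with the utf8_general_ci collation (so it has to run before the final DROP DATABASE) and that the connection character set is utf8.

```sql
-- The explicit collation on the literal takes precedence over the
-- column's utf8_general_ci collation for this comparison only.
SELECT * FROM `test` WHERE `value` = 'value' COLLATE utf8_bin;
```

A comparison that overrides the column's collation may not be able to use an index on that column, so for frequently queried columns it is probably better to change the column's collation itself.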

Actually, you probably want to use utf8_unicode_ci or utf8_general_ci.

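  • utf8_general_ci sorts by stripping away all accents and sorting as if the text were ASCII.
  • utf8_unicode_ci uses the Unicode sort order, so it sorts correctly in more languages.
  • However, if you are only using it to store English text, the two shouldn't differ.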
