What is the best collation to use for MySQL with PHP?
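I'm wondering if there is a "best" choice of collation in MySQL for a general website where you aren't 100% sure what will be entered. I understand that all the encodings should be the same: MySQL, Apache, the HTML and anything inside PHP.
In the past I have set PHP to output in "UTF-8", but which collation does this match in MySQL? I think it's one of the UTF-8 ones, but I have used utf8_unicode_ci, utf8_general_ci and utf8_bin before.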
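The main differences are sorting accuracy (when comparing characters in a language) and performance. The only special one is utf8_bin, which compares characters by their binary values.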
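utf8_general_ci is somewhat faster than utf8_unicode_ci, but less accurate (for sorting). The language-specific utf8 collations (such as utf8_swedish_ci) contain additional language rules that make them the most accurate for sorting in those languages. Most of the time I use utf8_unicode_ci (I prefer accuracy to small performance improvements), unless I have a good reason to prefer a specific language.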
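To make the difference concrete, here is a small sketch you could run in a MySQL client. It assumes the connection character set is utf8 (set with SET NAMES below), and the word list is made up for illustration. The two ß comparisons are the ones the MySQL manual uses to contrast these collations, and it documents both as true.

```sql
-- Assumption: the client sends UTF-8, so make the connection match.
SET NAMES utf8;

-- utf8_general_ci compares character by character, so ß is treated like s.
-- utf8_unicode_ci supports expansions, so ß is treated like ss.
SELECT 'ß' = 's'  COLLATE utf8_general_ci AS general_eszett_eq_s,
       'ß' = 'ss' COLLATE utf8_unicode_ci AS unicode_eszett_eq_ss;

-- A language-specific collation applies that language's alphabet rules.
-- Swedish treats Ä as a separate letter placed after Z, so compare how
-- the same (made-up) words are ordered under each collation.
SELECT word FROM (
    SELECT 'Zebra' AS word UNION ALL SELECT 'Äpple' UNION ALL SELECT 'Apple'
) AS words
ORDER BY word COLLATE utf8_unicode_ci;

SELECT word FROM (
    SELECT 'Zebra' AS word UNION ALL SELECT 'Äpple' UNION ALL SELECT 'Apple'
) AS words
ORDER BY word COLLATE utf8_swedish_ci;
```

Using COLLATE in the query itself lets you try a collation without altering any table.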
You can read more about specific Unicode character sets in the MySQL manual - http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
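Be very, very aware of this problem that can occur when using utf8_general_ci.
MySQL will not distinguish between some characters in select statements if the utf8_general_ci collation is used. This can lead to very nasty bugs, especially where usernames are involved. Depending on the implementation that uses the database tables, this problem could allow malicious users to create a username that matches an administrator account.
This problem shows up at least in early 5.x versions; I'm not sure whether the behaviour changed later.
I'm no DBA, but to avoid this problem I always go with utf8_bin instead of a case-insensitive collation.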
The script below describes the problem by example.
-- first, create a sandbox to play in
CREATE DATABASE `sandbox`;
use `sandbox`;
-- next, make sure that your client connection is of the same
-- character/collate type as the one we're going to test next:
SET NAMES utf8 COLLATE utf8_general_ci;
-- now, create the table and fill it with values
CREATE TABLE `test` (`key` VARCHAR(16), `value` VARCHAR(16) )
CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO `test` VALUES ('Key ONE', 'value'), ('Key TWO', 'valúe');
-- (verify)
SELECT * FROM `test`;
-- now, expose the problem/bug:
SELECT * FROM test WHERE `value` = 'value';
--
-- Note that we get BOTH keys here! MySQL's UTF8 collations that are
-- case insensitive (ending with _ci) do not distinguish between
-- both values, because they also ignore the accent on the 'ú'!
--
-- collate 'utf8_bin' doesn't have this problem, as I'll show next:
--
-- first, reset the client connection charset/collate type
SET NAMES utf8 COLLATE utf8_bin;
-- next, convert the values that we've previously inserted in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
-- now, re-check for the bug
SELECT * FROM test WHERE `value` = 'value';
--
-- Note that we get just one key now, as you'd expect.
--
-- This problem appears to be specific to utf8. Next, I'll try to
-- do the same with the 'latin1' charset:
--
-- first, reset the client connection charset/collate type
SET NAMES latin1 COLLATE latin1_general_ci;
-- next, convert the values that we've previously inserted
-- in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET latin1 COLLATE latin1_general_ci;
-- now, re-check for the bug
SELECT * FROM test WHERE `value` = 'value';
--
-- Again, only one key is returned (expected). This shows
-- that the problem with utf8/utf8_general_ci isn't present
-- in latin1/latin1_general_ci
--
-- To complete the example, I'll check with the binary collate
-- of latin1 as well:
-- first, reset the client connection charset/collate type
SET NAMES latin1 COLLATE latin1_bin;
-- next, convert the values that we've previously inserted in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET latin1 COLLATE latin1_bin;
-- now, re-check for the bug
SELECT * FROM test WHERE `value` = 'value';
--
-- Again, only one key is returned (expected).
--
-- Finally, I'll re-introduce the problem in the exact same
-- way (for any sceptics out there):
-- first, reset the client connection charset/collate type
SET NAMES utf8 COLLATE utf8_general_ci;
-- next, convert the values that we've previously inserted in the table
ALTER TABLE `test` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
-- now, re-check for the problem/bug
SELECT * FROM test WHERE `value` = 'value';
--
-- Two keys.
--
DROP DATABASE sandbox;
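One way to apply the utf8_bin advice without giving up linguistic sorting everywhere is to set the collation per column. The sketch below is only an illustration: the users_demo table and its column names are hypothetical and are not part of the script above.

```sql
-- Assumption: the client sends UTF-8.
SET NAMES utf8;

-- Hypothetical table: the login name is compared byte by byte,
-- while the display name keeps a linguistic collation for sorting.
CREATE TABLE users_demo (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    username     VARCHAR(64) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
    display_name VARCHAR(64) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
    UNIQUE KEY uq_users_demo_username (username)
);

INSERT INTO users_demo (username, display_name)
VALUES ('admin', 'Administrator');

-- With utf8_bin the accented look-alike does not match the stored name.
SELECT id, username FROM users_demo WHERE username = 'admín';

-- A case-insensitive lookup can still be requested for a single query.
SELECT id, username FROM users_demo
WHERE username = 'ADMIN' COLLATE utf8_general_ci;

DROP TABLE users_demo;
```

Keep in mind that utf8_bin is also case-sensitive, so 'Admin' and 'admin' count as different usernames unless the application normalizes case before storing them.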
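Actually, you probably want to use utf8_unicode_ci or utf8_general_ci.
utf8_general_ci sorts by stripping away all accents and sorting as if the text were ASCII.
utf8_unicode_ci uses the Unicode sort order, so it sorts correctly in more languages.
However, if you are only using it to store English text, the two shouldn't differ.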
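A quick way to check this against your own data is to sort the same values under both collations and compare the results. The word list here is made up; for English-only text like this, the two queries should return the rows in the same order.

```sql
-- Assumption: the client sends UTF-8.
SET NAMES utf8;

-- Same made-up English words, sorted under each collation in turn.
SELECT word FROM (
    SELECT 'cherry' AS word UNION ALL SELECT 'Apple' UNION ALL SELECT 'banana'
) AS words
ORDER BY word COLLATE utf8_general_ci;

SELECT word FROM (
    SELECT 'cherry' AS word UNION ALL SELECT 'Apple' UNION ALL SELECT 'banana'
) AS words
ORDER BY word COLLATE utf8_unicode_ci;
```

Swapping in words from the languages you actually expect is a cheap way to see whether the choice matters for your site.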