PHP MySQL for permutations on MySQL Table
I have a mysql table with 7 columns, on which with each row contains integer values.
I have a simple site which receives values from the user and I have to try to see if the values sent by the user match or are similiar to any of the rows in the table.
So the user writes eg 1 2 3 4 5 6 7
as input.
I have to find out if any of the rows in my table are similar to it without order. So 1 2 3 4 5 6 7 = 7 6 5 4 3 2 1
and so on. The table my contain more than 40,000 rows of data.
I also have to see if they share at least 5
, 6
or 7
digits in common.
This means using permutations to find all possible combinations. However what is the best approach for such a problem?
Take the input from the user and get all permutations and match against first row, second row, etc and report if found? Alternatively, do the reverse, get a row from the table and get all permutations and do the match against the user input?
What about memory and CPUusage when going through such a big table with so many permutations?
Thanks for any tips on this! Souciance
In a full normalized schema this is a single having query
Let's suppose your table with pk as:
create table T1
( pk char (1), a1 int, a2 int, a3 int, a4 int, a5 int, a6 int, a7 int);
insert into T1 values
('a',1,2,3,4,5,6,7),
('b',2,3,4,5,6,7,8),
('z',10,11,12,13,14,15,16);
At this time, we can normalize data as:
select
pk,
case a
when 1 then a1
when 2 then a2
when 3 then a3
when 4 then a4
when 5 then a5
when 6 then a6
when 7 then a7
end
as v
from T1
cross join
(select 1 as a from dual union all
select 2 as a from dual union all
select 3 as a from dual union all
select 4 as a from dual union all
select 5 as a from dual union all
select 6 as a from dual union all
select 7 as a from dual ) T2
In the previous query, it is easy to match your requirements with a single having:
select pk
from
(
select
pk,
case a
when 1 then a1
when 2 then a2
when 3 then a3
when 4 then a4
when 5 then a5
when 6 then a6
when 7 then a7
end
as v
from T1
cross join
(select 1 as a from dual union all
select 2 as a from dual union all
select 3 as a from dual union all
select 4 as a from dual union all
select 5 as a from dual union all
select 6 as a from dual union all
select 7 as a from dual ) T2
) T
where
T.v in ( 4,5,6,7,8,9,10)
group by pk
having <-- The Having
count( pk ) > 4
Results:
| PK |
------
| b |
a light method might be to add an additional field in your database, which is a numerically ordered version of all 7 fields combined.
eg. if the data in the database was 2 4 7 6 5 1 3 , the combination field would be 1234567
Then when comparing, sort the users response numerically and compare against the combination field in the database.
Depending on what you are doing, you could write your query like this
select * from table where combination like '12%' or combination like '123%'
If you know what the minimum number of matching numbers needs to be , that would lighten up the query
To find out how similar what they wrote vs what is in the database. You could use the levenshtein PHP function: http://php.net/manual/en/function.levenshtein.php
$result = levenshtein($input,$combination);
I'm afraid you cannot build query on problem like this really efficiently.
You may build WHERE
clause like:
(`1` IN ARRAY(1,2,3,4,5,6,7)
AND `2` IN ARRAY(1,2,3,4,5,6,7)
AND `3` IN ARRAY(1,2,3,4,5,6,7)
AND `4` IN ARRAY(1,2,3,4,5,6,7)
AND `5` IN ARRAY(1,2,3,4,5,6,7))
OR
(`1` IN ARRAY(1,2,3,4,5,6,7)
AND `2` IN ARRAY(1,2,3,4,5,6,7)
AND `3` IN ARRAY(1,2,3,4,5,6,7)
AND `4` IN ARRAY(1,2,3,4,5,6,7)
AND `6` IN ARRAY(1,2,3,4,5,6,7))
-- Each combination
But that would be hell of a condition. On the other hand you may try using combination of:
IN()
IF()
HAVING
First of check if column 1
contains info:
IF( `1` IN ARRAY(1,2,3,4,5,6,7), 1, 0)
Then sum all those data:
SELECT (
IF( `1` IN ARRAY(1,2,3,4,5,6,7), 1, 0) +
IF( `2` IN ARRAY(1,2,3,4,5,6,7), 1, 0) +
IF( `3` IN ARRAY(1,2,3,4,5,6,7), 1, 0) +
IF( `4` IN ARRAY(1,2,3,4,5,6,7), 1, 0) +
IF( `5` IN ARRAY(1,2,3,4,5,6,7), 1, 0) +
IF( `6` IN ARRAY(1,2,3,4,5,6,7), 1, 0) +
IF( `7` IN ARRAY(1,2,3,4,5,6,7), 1, 0)
) AS `matches_cnt`
FROM t1
HAVING `matches_cnt` >= 5
This will iterate trough all rows and condition is quite complex (thus bed performance).
You may also try replacing values by binary string, for example:
1,2,7 = 01000011
And then calculate Hamming distance between checked record and database, but this will only decrease complexity of condition, but need to iterate trough all records will remain the same.
Implementation in mysql using:
XOR
BIT_COUNT
Will replace first part by:
SELECT (
$MAX_NUMBER$ - BIT_COUNT( XOR( `binary_representation`, $DATA_FROM_USER$))
) AS `matches_cnt`
链接地址: http://www.djcxy.com/p/66474.html
上一篇: 将二维数组转换为另一个二维数组Java