PHP MySQL for permutations on MySQL Table

I have a mysql table with 7 columns, on which with each row contains integer values.

I have a simple site which receives values from the user and I have to try to see if the values sent by the user match or are similiar to any of the rows in the table.

So the user writes eg 1 2 3 4 5 6 7 as input.

I have to find out if any of the rows in my table are similar to it without order. So 1 2 3 4 5 6 7 = 7 6 5 4 3 2 1 and so on. The table my contain more than 40,000 rows of data.

I also have to see if they share at least 5 , 6 or 7 digits in common.

This means using permutations to find all possible combinations. However what is the best approach for such a problem?

  • Take the input from the user and get all permutations and match against first row, second row, etc and report if found? Alternatively, do the reverse, get a row from the table and get all permutations and do the match against the user input?

  • What about memory and CPUusage when going through such a big table with so many permutations?

  • Thanks for any tips on this! Souciance


    In a full normalized schema this is a single having query

    Let's suppose your table with pk as:

    create table T1 
    ( pk char (1), a1 int, a2 int, a3 int, a4 int, a5 int, a6 int, a7 int);
    
    insert into T1 values 
    ('a',1,2,3,4,5,6,7),
    ('b',2,3,4,5,6,7,8),
    ('z',10,11,12,13,14,15,16);
    

    At this time, we can normalize data as:

    select
       pk, 
       case a
        when 1 then a1
        when 2 then a2
        when 3 then a3
        when 4 then a4
        when 5 then a5
        when 6 then a6
        when 7 then a7
       end
       as v
    from T1   
    cross join 
       (select 1 as a from dual union all
        select 2 as a from dual union all
        select 3 as a from dual union all
        select 4 as a from dual union all
        select 5 as a from dual union all
        select 6 as a from dual union all
        select 7 as a from dual ) T2
    

    In the previous query, it is easy to match your requirements with a single having:

    select pk
    from
    (
    select
       pk, 
       case a
        when 1 then a1
        when 2 then a2
        when 3 then a3
        when 4 then a4
        when 5 then a5
        when 6 then a6
        when 7 then a7
       end
       as v
    from T1   
    cross join 
       (select 1 as a from dual union all
        select 2 as a from dual union all
        select 3 as a from dual union all
        select 4 as a from dual union all
        select 5 as a from dual union all
        select 6 as a from dual union all
        select 7 as a from dual ) T2
    ) T
    where
       T.v in ( 4,5,6,7,8,9,10)
    group by pk
    having                                           <-- The Having
       count( pk ) > 4
    

    Results:

    | PK |
    ------
    |  b |
    

    a light method might be to add an additional field in your database, which is a numerically ordered version of all 7 fields combined.

    eg. if the data in the database was 2 4 7 6 5 1 3 , the combination field would be 1234567

    Then when comparing, sort the users response numerically and compare against the combination field in the database.

    Depending on what you are doing, you could write your query like this

    select * from table where combination like '12%' or combination like '123%' 
    

    If you know what the minimum number of matching numbers needs to be , that would lighten up the query

    To find out how similar what they wrote vs what is in the database. You could use the levenshtein PHP function: http://php.net/manual/en/function.levenshtein.php

    $result = levenshtein($input,$combination);
    

    I'm afraid you cannot build query on problem like this really efficiently.

    You may build WHERE clause like:

    (`1` IN ARRAY(1,2,3,4,5,6,7) 
        AND `2` IN ARRAY(1,2,3,4,5,6,7)
        AND `3` IN ARRAY(1,2,3,4,5,6,7)
        AND `4` IN ARRAY(1,2,3,4,5,6,7)
        AND `5` IN ARRAY(1,2,3,4,5,6,7))
    OR
    (`1` IN ARRAY(1,2,3,4,5,6,7) 
        AND `2` IN ARRAY(1,2,3,4,5,6,7)
        AND `3` IN ARRAY(1,2,3,4,5,6,7)
        AND `4` IN ARRAY(1,2,3,4,5,6,7)
        AND `6` IN ARRAY(1,2,3,4,5,6,7))
    -- Each combination
    

    But that would be hell of a condition. On the other hand you may try using combination of:

  • IN()
  • IF()
  • HAVING
  • First of check if column 1 contains info:

    IF( `1` IN ARRAY(1,2,3,4,5,6,7), 1, 0)
    

    Then sum all those data:

    SELECT (
        IF( `1` IN ARRAY(1,2,3,4,5,6,7), 1, 0) +
        IF( `2` IN ARRAY(1,2,3,4,5,6,7), 1, 0) +
        IF( `3` IN ARRAY(1,2,3,4,5,6,7), 1, 0) +
        IF( `4` IN ARRAY(1,2,3,4,5,6,7), 1, 0) +
        IF( `5` IN ARRAY(1,2,3,4,5,6,7), 1, 0) +
        IF( `6` IN ARRAY(1,2,3,4,5,6,7), 1, 0) +
        IF( `7` IN ARRAY(1,2,3,4,5,6,7), 1, 0)
    ) AS `matches_cnt`
    FROM t1
    HAVING `matches_cnt` >= 5
    

    This will iterate trough all rows and condition is quite complex (thus bed performance).

    You may also try replacing values by binary string, for example:

    1,2,7 = 01000011
    

    And then calculate Hamming distance between checked record and database, but this will only decrease complexity of condition, but need to iterate trough all records will remain the same.

    Implementation in mysql using:

  • XOR
  • BIT_COUNT
  • Will replace first part by:

    SELECT (
        $MAX_NUMBER$ - BIT_COUNT( XOR( `binary_representation`, $DATA_FROM_USER$))
    ) AS `matches_cnt`
    
    链接地址: http://www.djcxy.com/p/66474.html

    上一篇: 将二维数组转换为另一个二维数组Java

    下一篇: PHP MySQL用于MySQL表中的排列