How can I simplify/improve the performance of this MySQL query?

I am very new to MySQL and, thanks to the great support from the more experienced guys here, I am managing to struggle by while learning a lot in the process.

I have a query that does exactly what I want. However, it looks extremely messy to me and I am certain there must be a way to simplify it.

How can this query be improved and optimized for performance?

Many thanks

    $sQuery = "
        SELECT SQL_CALC_FOUND_ROWS ".str_replace(" , ", " ", implode(", ", $aColumns))."
        FROM $sTable b
        LEFT JOIN (
            SELECT COUNT(*) AS projects_count, a.songs_id
            FROM $sTable2 a
            GROUP BY a.songs_id
        ) bb ON bb.songs_id = b.songsID
        LEFT JOIN (
            SELECT AVG(rating) AS rating, COUNT(rating) AS ratings_count, c.songid
            FROM $sTable3 c
            GROUP BY c.songid
        ) bbb ON bbb.songid = b.songsID
        LEFT JOIN (
            SELECT c.songid, c.userid,
                CASE WHEN EXISTS (
                    SELECT songid
                    FROM $sTable3
                    WHERE songid = c.songid
                ) THEN 'User Voted'
                ELSE 'Not Voted'
                END AS voted
            FROM $sTable3 c
            WHERE c.userid = $userid
            GROUP BY c.songid
        ) bbbb ON bbbb.songid = b.songsID

EDIT: Here is a description of what the query is doing:-

I have three tables:

  • $sTable = a table of songs (songid, mp3link, artwork, useruploadid etc.)

  • $sTable2 = a table of projects with songs linked to them (projectid, songid, project name etc.)

  • $sTable3 = a table of song ratings (songid, userid, rating)

    All of this data is output to a JSON array and displayed in a table in my application, giving a list of songs combined with the projects and ratings data.

    The query itself does the following in this order:-

  • Collects all rows from $sTable
  • Joins to $sTable2 on songsID and counts the number of rows (projects) in this table which have the same songsID
  • Joins to $sTable3 on songsID and works out the average of the 'rating' column over the rows which have the same songsID
  • At this point it also counts the total number of rows in $sTable3 which have the same songsID, to provide a total number of votes.
  • Finally it checks, for each song, whether $userid (a variable containing the ID of the logged-in user) matches a 'userid' stored in $sTable3, in order to tell whether that user has already voted on a given songsID. If it matches it returns "User Voted"; if not it returns "Not Voted". It outputs this as a separate column in my JSON array, which I then check client-side in my app to add a class.

    If there is any more detail anyone needs, please just let me know. Thanks all.

    EDIT:

    Thanks to Aurimis' excellent first attempt I am closing in on a much simpler solution.

    This is the code I have tried based on that suggestion.

    SELECT SQL_CALC_FOUND_ROWS ".str_replace(" , ", " ", implode(", ", $aColumns))."
    
        FROM 
          (SELECT 
            $sTable.songsID, COUNT(rating) AS ratings_count, 
            AVG(rating) AS ratings
          FROM $sTable 
            LEFT JOIN $sTable2 ON $sTable.songsID = $sTable2.songs_id
            LEFT JOIN $sTable3 ON $sTable.songsID = $sTable3.songid
          GROUP BY $sTable.songsID) AS A
        LEFT JOIN $sTable3 AS B ON A.songsID = B.songid AND B.userid = $userid
    

    There are several problems however. I had to remove the first line of your answer as it caused a 500 internal server error:

    IF(B.userid = NULL, "Not voted", "User Voted") AS voted 
    

    Obviously now the 'voted check' functionality is lost.

    Also, and more importantly, it is not returning all the columns defined in my array, only the songsID. My JSON returns Unknown column 'song_name' in 'field list'. If I remove it from my $aColumns array it will of course move on to the next one.

    I am defining my columns at the beginning of my script as this array is used for filtering and putting together the output for the JSON encode. This is the definition of $aColumns:-

    $aColumns = array( 'songsID', 'song_name', 'artist_band_name', 'author', 'song_artwork', 'song_file', 'genre', 'song_description', 'uploaded_time', 'emotion', 'tempo', 'user', 'happiness', 'instruments', 'similar_artists', 'play_count', 'projects_count',  'rating', 'ratings_count', 'voted');
    

    In order to quickly test the rest of the query I modified the first line within the subquery to select $sTable.* rather than $sTable.songsID (remember $sTable is the songs table)

    Then the query worked, but with terrible performance of course, and it only returned 24 songs out of the 5000-song test dataset. Therefore I changed your first 'JOIN' to a 'LEFT JOIN' so that all 5000 songs were returned. To clarify, the query needs to return ALL of the rows in the songs table, with various extra bits of data from the projects and ratings tables for each song.
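
A minimal sketch of why the join type changes the row count, using made-up inline rows rather than the real tables (three hypothetical songs, only one of which has a project):

```sql
-- Inline derived tables stand in for the songs and projects tables;
-- the IDs are invented for illustration only.
SELECT 'inner join' AS kind, COUNT(*) AS songs_returned
FROM (SELECT 1 AS songsID UNION ALL SELECT 2 UNION ALL SELECT 3) AS s
JOIN (SELECT 1 AS songs_id) AS p ON p.songs_id = s.songsID
UNION ALL
SELECT 'left join', COUNT(*)
FROM (SELECT 1 AS songsID UNION ALL SELECT 2 UNION ALL SELECT 3) AS s
LEFT JOIN (SELECT 1 AS songs_id) AS p ON p.songs_id = s.songsID;
-- inner join returns 1 song, left join returns all 3
```

An inner join keeps only songs that have at least one matching project row, while a left join keeps every song and fills the project columns with NULL where nothing matches.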

    So we are getting there, and I am certain that this is a much better approach; it just needs some modification. Thanks for your help so far Aurimis.


    SELECT SQL_CALC_FOUND_ROWS
        songsID, song_name, artist_band_name, author, song_artwork, song_file,
        genre, song_description, uploaded_time, emotion, tempo,
        `user`, happiness, instruments, similar_artists, play_count,
        projects_count,
        rating, ratings_count,
        IF(user_ratings_count, 'User Voted', 'Not Voted') as voted
    FROM (
        SELECT
            sp.songsID, projects_count,
            AVG(rating) as rating,
            COUNT(rating) AS ratings_count,
            COUNT(IF(userid=$userid, 1, NULL)) as user_ratings_count
        FROM (
            SELECT songsID, COUNT(p.songs_id) as projects_count  -- COUNT(*) would report 1 for songs with no projects
            FROM $sTable s
            LEFT JOIN $sTable2 p ON s.songsID = p.songs_id
            GROUP BY songsID) as sp
        LEFT JOIN $sTable3 r ON sp.songsID = r.songid
        GROUP BY sp.songsID) as spr
    JOIN $sTable s USING (songsID);
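
The user_ratings_count column relies on conditional aggregation: COUNT() ignores NULLs, so counting an expression that is NULL for other users' rows counts only the given user's ratings. A small sketch with invented rating rows (user ID 42 is a hypothetical stand-in for $userid):

```sql
-- Inline rows stand in for the ratings table; values are illustrative only.
SELECT songid,
       COUNT(rating) AS ratings_count,
       COUNT(IF(userid = 42, 1, NULL)) AS user_ratings_count
FROM (
    SELECT 1 AS songid, 42 AS userid, 5 AS rating
    UNION ALL SELECT 1, 7, 3
    UNION ALL SELECT 2, 7, 4
) AS r
GROUP BY songid
ORDER BY songid;
-- song 1: ratings_count 2, user_ratings_count 1
-- song 2: ratings_count 1, user_ratings_count 0
```

A zero in user_ratings_count is what the outer IF() turns into 'Not Voted'.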
    

    You will need the following indexes:

  • (songs_id) on $sTable2
  • the composite (songid, rating, userid) on $sTable3
    The ideas behind the query:

  • subqueries operate on INTs only, so that the subquery results fit easily in memory
  • left joins are grouped separately to reduce the Cartesian product
  • user votes are counted in the same subquery as the other ratings to avoid an expensive correlated subquery
  • all other information is retrieved in the final join
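
The two indexes listed above could be created roughly as follows. This is a sketch only: `projects` and `ratings` are hypothetical stand-ins for $sTable2 and $sTable3, and the index names are made up.

```sql
-- Hypothetical table names; substitute the real names behind $sTable2 / $sTable3.
ALTER TABLE projects
    ADD INDEX idx_projects_songs_id (songs_id);

ALTER TABLE ratings
    ADD INDEX idx_ratings_song_rating_user (songid, rating, userid);

-- Inspect the plan to see whether the optimizer actually uses the new index.
EXPLAIN
SELECT songid, AVG(rating), COUNT(rating)
FROM ratings
GROUP BY songid;
```

Because the composite index contains every column the ratings subquery reads, MySQL may be able to answer that part from the index alone; the EXPLAIN output is the place to confirm this on the real data.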

    Let me try based on your description, not the query. For clarity, I'll use Songs for Table1, Projects for Table2 and Ratings for Table3.

    SELECT 
      /* [column list again] */,
      IF(B.userid IS NULL, 'Not voted', 'Voted') as voted 
    FROM 
      (SELECT 
        Songs.SongID, count(rating) as total_votes, 
        avg(rating) as average_rating /*[,.. other columns as you need them] */
      FROM Songs 
        JOIN Projects ON Songs.SongID = Projects.SongID
        LEFT JOIN Ratings ON Songs.SongID = Ratings.SongID
      GROUP BY Songs.SongID) as A
    LEFT JOIN Ratings as B ON A.SongID = B.SongID AND B.userid = ? /* your user id */
    

    As you see, you can get all the information on songs in one relatively simple query (just using GROUP BY and the count() / avg() functions). Finding out whether a song was rated by a particular user requires joining that subquery to Ratings once more: do a LEFT JOIN restricted to the user, and if the userid comes back NULL you know he has not voted.
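
One detail worth checking in a query like this is the NULL test itself. In MySQL a comparison written as `= NULL` yields NULL rather than true, so the unmatched rows of a LEFT JOIN have to be detected with `IS NULL`. A small sketch with invented rows (user ID 42 is hypothetical):

```sql
-- Comparing with = NULL gives NULL; IS NULL gives 1.
SELECT NULL = NULL AS eq_null, NULL IS NULL AS is_null;

-- Two made-up songs; user 42 has rated only song 1.
SELECT s.SongID,
       IF(r.userid IS NULL, 'Not voted', 'Voted') AS voted
FROM (SELECT 1 AS SongID UNION ALL SELECT 2) AS s
LEFT JOIN (SELECT 1 AS SongID, 42 AS userid) AS r
       ON r.SongID = s.SongID AND r.userid = 42
ORDER BY s.SongID;
-- song 1: Voted, song 2: Not voted
```

Single-quoted string literals are used here because, when the SQL is embedded in a double-quoted PHP string, unescaped double quotes would end the PHP string early.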

    Now, I did not go through your query in depth, as it really looks complicated. Could be that I missed something - if that's the case, please update the description and I can try again :)
