Mysql – How to optimize this MySQL query that involves two left joins

left-joinmysqloptimizationpolymorphic-associations

I cannot figure out why my query slows down. What it boils down to are four tables: team, player, equipment, and metadata. Records in player and equipment have a FK to team, making team a parent of player and equipment. And all three of those tables' rows each have a record in metadata which stores things like creation date, creator user id, etc.

What I would like to retrieve all at once are any player and equipment records that belong to a particular team, in order of creation date. I start from the metadata table and left join the player and equipment tables via the metadata_id FK, but when I try to filter the SELECT to only retrieve records for a certain team, the query slow down big time when there are lots of rows.

Here is the query:

SELECT metadata.creation_date, player.id, equipment.id
FROM
  metadata
  JOIN datatype       ON datatype.id           = metadata.datatype_id
  LEFT JOIN player    ON player.metadata_id    = metadata.id
  LEFT JOIN equipment ON equipment.metadata_id = metadata.id
WHERE
  datatype.name IN ('player', 'equipment')
  AND (player.team_id = 1 OR equipment.team_id = 1)
ORDER BY metadata.creation_date;

You'll need to add a lot of rows to really see the slow down, around 10,000 for each table. What I don't understand is why it is really quick if I only filter in the where clause on one table, for example: "… AND player.team_id = 1" But when I add the other to make it "… AND (player.team_id = 1 OR equipment.team_id = 1)" it takes much, much longer.

Here are the tables and datatypes. Note that one thing that seems to help a lot, but not all that much, is the combined keys on player and equipment for metadata_id and team_id.

CREATE TABLE `metadata` (
  `id` INT(4) unsigned NOT NULL auto_increment,
  `creation_date` DATETIME NOT NULL,
  `datatype_id` INT(4) unsigned NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB;
CREATE TABLE `datatype` (
  `id` INT(4) unsigned NOT NULL auto_increment,
  `name` VARCHAR(255) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB;
CREATE TABLE `team` (
  `id` INT(4) unsigned NOT NULL auto_increment,
  `metadata_id` INT(4) unsigned NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB;
CREATE TABLE `player` (
  `id` INT(4) unsigned NOT NULL auto_increment,
  `metadata_id` INT(4) unsigned NOT NULL,
  `team_id` INT(4) unsigned NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB;
CREATE TABLE `equipment` (
  `id` INT(4) unsigned NOT NULL auto_increment,
  `metadata_id` INT(4) unsigned NOT NULL,
  `team_id` INT(4) unsigned NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB;
ALTER TABLE  `metadata` ADD INDEX (  `datatype_id` ),
  ADD INDEX ( `creation_date` );
ALTER TABLE  `team`      ADD INDEX (  `metadata_id` );
ALTER TABLE  `player`    ADD INDEX `metadata_id` (  `metadata_id`,  `team_id` ),
  ADD INDEX ( `team_id` );
ALTER TABLE  `equipment` ADD INDEX `metadata_id` (  `metadata_id`,  `team_id` ),
  ADD INDEX ( `team_id` );
ALTER TABLE `metadata`  ADD CONSTRAINT `metadata_ibfk_1`  FOREIGN KEY (`datatype_id`) REFERENCES `datatype` (`id`);
ALTER TABLE `team`      ADD CONSTRAINT `team_ibfk_1`      FOREIGN KEY (`metadata_id`) REFERENCES `metadata` (`id`);
ALTER TABLE `player`    ADD CONSTRAINT `player_ibfk_1`    FOREIGN KEY (`metadata_id`) REFERENCES `metadata` (`id`);
ALTER TABLE `player`    ADD CONSTRAINT `player_ibfk_2`    FOREIGN KEY (`team_id`)     REFERENCES `team` (`id`);
ALTER TABLE `equipment` ADD CONSTRAINT `equipment_ibfk_1` FOREIGN KEY (`metadata_id`) REFERENCES `metadata` (`id`);
ALTER TABLE `equipment` ADD CONSTRAINT `equipment_ibfk_2` FOREIGN KEY (`team_id`)     REFERENCES `team` (`id`);
INSERT INTO `datatype` VALUES(1,'team'),(2,'player'),(3,'equipment');

Please note that I realize I could easily speed this up by doing a UNION of two SELECTS on player and equipment for a given team id, but the ORM I'm using does not natively support UNION's and so I'd much rather try and see if I can optimize this query instead. Also I'm just plain curious.

Best Solution

In MySQL it's hard to optimize "OR" conditions.

One common remedy is to split the query into two simpler queries and use UNION to combine them.

 (SELECT metadata.creation_date, datatype.name, player.id
  FROM metadata
    JOIN datatype ON datatype.id = metadata.datatype_id
    JOIN player ON player.metadata_id = metadata.id
  WHERE datatype.name = 'player' AND player.team_id = 1)
 UNION ALL
 (SELECT metadata.creation_date, datatype.name, equipment.id
  FROM metadata
    JOIN datatype ON datatype.id = metadata.datatype_id
    JOIN equipment ON equipment.metadata_id = metadata.id
  WHERE datatype.name = 'equipment' AND equipment.team_id = 1)
 ORDER BY creation_date;

You have to use the parentheses so that the ORDER BY applies to the result of the UNION instead of only to the result of the second SELECT.


update: What you're doing is called Polymorphic Associations, and it's hard to use in SQL. I even call it an SQL antipattern, despite some ORM frameworks that encourage its use.

What you really have in this case is a relationship between teams and players, and between teams and equipment. Players are not equipment and equipment are not players; they don't have a common supertype. It's misleading in both an OO sense and a relational sense that you have modeled them that way.

I'd say dump your metadata and datatype tables. These are anti-relational structures. Instead, use the team_id (which I assume is a foreign key to a teams table). Treat players and equipment as distinct types. Fetch them separately if you can't use UNION in your ORM. Then combine the result sets in your application.

You don't have to fetch everything in a single SQL query.

Related Question