Mysql – “SELECT COUNT(*)” is slow, even with where clause

mysqloptimizationperformance

I'm trying to figure out how to optimize a very slow query in MySQL (I didn't design this):

SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391';
+----------+
| COUNT(*) |
+----------+
|  3224022 |
+----------+
1 row in set (1 min 0.16 sec)

Comparing that to a full count:

select count(*) from change_event;
+----------+
| count(*) |
+----------+
|  6069102 |
+----------+
1 row in set (4.21 sec)

The explain statement doesn't help me here:

 explain SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: me
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 4120213
        Extra: Using where; Using index
1 row in set (0.00 sec)

OK, it still thinks it needs roughly 4 million entries to count, but I could count lines in a file faster than that! I don't understand why MySQL is taking this long.

Here's the table definition:

CREATE TABLE `change_event` (
  `change_event_id` bigint(20) NOT NULL default '0',
  `timestamp` datetime NOT NULL,
  `change_type` enum('create','update','delete','noop') default NULL,
  `changed_object_type` enum('Brand','Broadcast','Episode','OnDemand') NOT NULL,
  `changed_object_id` varchar(255) default NULL,
  `changed_object_modified` datetime NOT NULL default '1000-01-01 00:00:00',
  `modified` datetime NOT NULL default '1000-01-01 00:00:00',
  `created` datetime NOT NULL default '1000-01-01 00:00:00',
  `pid` char(15) default NULL,
  `episode_pid` char(15) default NULL,
  `import_id` int(11) NOT NULL,
  `status` enum('success','failure') NOT NULL,
  `xml_diff` text,
  `node_digest` char(32) default NULL,
  PRIMARY KEY  (`change_event_id`),
  KEY `idx_change_events_changed_object_id` (`changed_object_id`),
  KEY `idx_change_events_episode_pid` (`episode_pid`),
  KEY `fk_import_id` (`import_id`),
  KEY `idx_change_event_timestamp_ce_id` (`timestamp`,`change_event_id`),
  KEY `idx_change_event_status` (`status`),
  CONSTRAINT `fk_change_event_import` FOREIGN KEY (`import_id`) REFERENCES `import` (`import_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Version:

$ mysql --version
mysql  Ver 14.12 Distrib 5.0.37, for pc-solaris2.8 (i386) using readline 5.0

Is there something obvious I'm missing? (Yes, I've already tried "SELECT COUNT(change_event_id)", but there's no performance difference).

Best Solution

InnoDB uses clustered primary keys, so the primary key is stored along with the row in the data pages, not in separate index pages. In order to do a range scan you still have to scan through all of the potentially wide rows in data pages; note that this table contains a TEXT column.

Two things I would try:

  1. run optimize table. This will ensure that the data pages are physically stored in sorted order. This could conceivably speed up a range scan on a clustered primary key.
  2. create an additional non-primary index on just the change_event_id column. This will store a copy of that column in index pages which be much faster to scan. After creating it, check the explain plan to make sure it's using the new index.

(you also probably want to make the change_event_id column bigint unsigned if it's incrementing from zero)

Related Question