OSEC

Neohapsis is currently accepting applications for employment. For more information, please visit our website www.neohapsis.com or email hr@neohapsis.com
 
Help optimising a query

From: Al (nobroweircom.net)
Date: Thu Jan 08 2004 - 18:59:19 CST


Hi all. Im looking for some help to speed up some queries I have to run. Ill try to briefly describe the setup, but mostly try let my code explain itself.

There are three tables described below from a database that describes the execution of a Java program. class_loads records details of each class. methods records details of each method. Each method is associated with a record from class_loads via the class_id field. Finally method_executions contains info about method executions by mapping method_ids (which can be used to join with methods table) to event_numbers which represent time.

mysql> describe class_loads;
+--------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------------+------+-----+---------+-------+
| event_number | bigint(20) unsigned | | PRI | 0 | |
| class_id | int(10) unsigned | | PRI | 0 | |
| class_name | varchar(255) | | MUL | | |
| source_name | varchar(255) | | | | |
+--------------+---------------------+------+-----+---------+-------+
5 rows in set (0.08 sec)

mysql> describe methods;
+------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+-------+
| method_id | int(10) unsigned | | PRI | 0 | |
| class_id | int(10) unsigned | | MUL | 0 | |
| method_name | varchar(255) | | | | |
| method_signature | varchar(255) | | | | |
+------------------+------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> describe method_executions;
+--------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------------+------+-----+---------+-------+
| event_number | bigint(20) unsigned | | PRI | 0 | |
| method_id | int(10) unsigned | | | 0 | |
+--------------+---------------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

The objective is to compute what I would call an execution frequency matrix (time intervals as rows, methods as columns), i.e. for some given event_number/time interval [x, y), I need to determine how often each method with a name conforming to a specified regex is executed within that interval.

Heres what I have done. Being a novice Im sure its painfully ineffecient. The bottleneck is indicated below. To give an idea of the table sizes, in a fairly small example we have 199, 2423 and 2434194 rows in class_loads, methods and method_executions respectively.

CREATE TEMPORARY TABLE temp_class_loads (
 class_id INT NOT NULL,
 class_name VARCHAR(255) NOT NULL,
 source_name VARCHAR(255) NOT NULL)
 
CREATE TEMPORARY TABLE temp_methods (
 method_id INT NOT NULL,
 source_name VARCHAR(255) NOT NULL,
 class_name VARCHAR(255) NOT NULL,
 method_name VARCHAR(255) NOT NULL,
 method_signature VARCHAR(255) NOT NULL)

LOCK TABLES class_loads READ, methods READ, method_entries READ

INSERT INTO temp_class_loads (
 SELECT class_id, class_name, source_name
 FROM class_loads
 WHERE class_name LIKE "com.%")
 
INSERT INTO temp_methods (
 SELECT method_id, source_name, class_name, method_name, method_signature
  FROM temp_class_loads, methods
  WHERE temp_class_loads.class_id = methods.class_id
  GROUP BY method_id)

{do the following statements for various [x, y) event_number ranges - The third one is the bottleneck!}

 CREATE TEMPORARY TABLE interval_method_freqs (
  method_id INT NOT NULL,
  count INT NOT NULL)
  
 INSERT INTO interval_method_freqs (
  SELECT method_id, COUNT(*)
   FROM method_executions
   WHERE method_executions.event_number BETWEEN x AND y
   GROUP BY method_id)
 
 SELECT SUM(count)
  FROM temp_methods LEFT JOIN interval_method_freqs USING(method_id)
  GROUP BY source_name, class_name, method_name, method_signature
  
 DROP TABLE interval_method_freqs
 
{loop}

UNLOCK TABLES

DROP TABLE temp_methods

DROP TABLE temp_class_loads

If you got this far thanks for making the effort! I look forward to reading suggested improvements.

Regards,
A.