From: mos (mos99 fastmail.fm)
Date: Sun Jul 12 2009 - 23:13:54 CDT
At 08:06 PM 7/12/2009, Morten wrote:
>Mike, you're right - sorry.
>
>I've been reading High Performance MySQL today and got some great tips
>from that which will help a lot.
Yes it is a good book. I hope you have the 2nd edition.
> I think the fundamental challenge
>now, is that the table contains a lot of timestamps, and querying
>against these involves multiple range queries which makes indexing hard.
If you can get rid of the DateTime columns and switch to plain Date, the
indexes may get faster because the keys are smaller.
>The "actions" table has the following columns (of relevance to the
>example):
>
> status_id
> assignee_id
> company_id
> created_at
> assigned_at
> opened_at
> updated_at
> verified_at
> due_at
> solved_at
> closed_at
>
>Queries could be:
>
> "Show all actions which are assigned to Tom, were created in
>October and solved in November"
> "Show all open actions which were opened before August, do not have
>an assignee and were verified last week"
>
>These queries which involve easily indexable fields (status_id,
>assignee_id, company_id) and multiple conditions on different ranges
>are what's difficult. The table is about 2.500.000 records and grows
>at a daily rate of about 50.000 records (that number is growing
>though). Once an action has been closed, it gets status "closed" and
>is no longer of interest. 70% of the records in the table will be
>status "closed".
So why not have 2 tables: "Cases_Active" for the cases currently open, and
"Cases_Closed".
This will greatly reduce the number of rows in the table and the depth of
the index tree. Your application will select the appropriate table if the
question mentions "Active" or "Closed" cases. You can also define a Merge
table "Cases_All" that will logically combine both tables so you could do a
query on both closed and active cases if you need to, or just do a Union on
the two tables via 2 separate queries.
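
A minimal sketch of the active/closed split described above, written in
Python against SQLite rather than MySQL so that it runs self-contained.
SQLite has no MERGE storage engine, so a UNION ALL view named cases_all
stands in for the Merge table. The column list and the close_case helper
are illustrative assumptions, not something from this thread.

```python
import sqlite3

# Hypothetical, trimmed-down column set; the real table has many more dates.
COLUMNS = "id INTEGER PRIMARY KEY, assignee_id INTEGER, created_at TEXT, closed_at TEXT"

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE cases_active ({COLUMNS});
    CREATE TABLE cases_closed ({COLUMNS});
    -- Stand-in for a MySQL MERGE table: one logical view over both tables.
    CREATE VIEW cases_all AS
        SELECT * FROM cases_active
        UNION ALL
        SELECT * FROM cases_closed;
""")


def close_case(conn, case_id, closed_at):
    """Move one case from cases_active to cases_closed in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO cases_closed "
            "SELECT id, assignee_id, created_at, ? FROM cases_active WHERE id = ?",
            (closed_at, case_id),
        )
        conn.execute("DELETE FROM cases_active WHERE id = ?", (case_id,))


with conn:
    conn.executemany(
        "INSERT INTO cases_active VALUES (?, ?, ?, NULL)",
        [(1, 7, "2008-10-03"), (2, 7, "2008-10-20"), (3, 9, "2008-09-01")],
    )

close_case(conn, 2, "2008-11-05")

active_ids = [r[0] for r in conn.execute("SELECT id FROM cases_active ORDER BY id")]
closed_ids = [r[0] for r in conn.execute("SELECT id FROM cases_closed ORDER BY id")]
all_count = conn.execute("SELECT COUNT(*) FROM cases_all").fetchone()[0]
print(active_ids, closed_ids, all_count)
```

Queries about open cases then touch only the small cases_active table,
while the occasional report across everything goes through cases_all.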
You didn't mention how many queries per second you need to handle. MyISAM
can handle a couple hundred queries per second, helped by MySQL's query
cache, which can be tuned. You can also preload the MyISAM indexes into
memory (the key cache) if needed.
The main problem, as you mentioned, is trying to index all of the date
fields. I don't think this is necessary. I have noticed that MySQL 5.1
(because it was rushed out the door) does not always pick the best index,
and I often have to force it to use the proper one with "Force Index". So
run Explain on your slow queries (make sure you log them) and find out
which index they are using.
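
A rough sketch of that inspect-then-force workflow, again in Python with
SQLite as a stand-in so it runs anywhere. SQLite's EXPLAIN QUERY PLAN and
INDEXED BY play the roles of MySQL's EXPLAIN and FORCE INDEX here; the
table layout and index names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (
        id INTEGER PRIMARY KEY,
        assignee_id INTEGER,
        created_at TEXT,
        solved_at TEXT
    );
    -- Hypothetical single-column indexes on two of the date fields.
    CREATE INDEX idx_actions_created ON actions (created_at);
    CREATE INDEX idx_actions_solved ON actions (solved_at);
""")

# "Created in October and solved in November", as two half-open ranges.
QUERY = """
    SELECT id FROM actions {hint}
    WHERE created_at >= '2008-10-01' AND created_at < '2008-11-01'
      AND solved_at >= '2008-11-01' AND solved_at < '2008-12-01'
"""


def plan(conn, hint=""):
    """Return the planner's description of how it would run QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY.format(hint=hint)).fetchall()
    return [row[-1] for row in rows]  # the last column holds the detail text


default_plan = plan(conn)  # whichever index the planner picks on its own
forced_plan = plan(conn, "INDEXED BY idx_actions_solved")  # our choice
print(default_plan)
print(forced_plan)
```

In MySQL the corresponding forms are "EXPLAIN SELECT ..." to see the
chosen index and "FROM actions FORCE INDEX (idx_actions_solved)" to
override it.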
You can use Memory tables, but they don't work with Merge tables. I would
only consider this if the table gets updated every second or two, since
each update flushes the query cache. I'm not sure how much money you're
willing to throw at this project, but I know of some additional hardware
that can squeeze out more speed.
>I think what I'm looking for now, is some way to encode the different
>date values into a single column which can be indexed and the value of
>which gets calculated and updated by a background job. This will cost
>some precision, but I hope that can be done. Otherwise I'm back to
>considering alternative index/query-mechanisms.
>
>Does my problem make a little more sense now? Thanks.
Yup! :-)
Mike
>Morten
>
>
>
>Let's say I would like to see all actions that were created in October
>and solved in November.
>
>
>On Jul 12, 2009, at 3:54 PM, mos wrote:
>
>>Morten,
>> Perhaps you could also add how many rows are in the table, how
>>many rows are added each day, what are the column types, and what do
>>the search queries look like?
>>
>>Mike
>>
>>At 11:39 AM 7/12/2009, Morten wrote:
>>
>>>Hi,
>>>
>>>I'm working on a table that has about 12 columns against which
>>>arbitrary queries must perform really well. Currently there are a lot
>>>of indexes on the table, but I'm hitting some problems - and adding
>>>more indexes seems a slippery slope (there are ~15 multi-column
>>>indexes, I'd like that reduced).
>>>
>>>So I'm looking for a way out and I'm currently considering:
>>>
>>>* Building a memory table on top of the existing table
>>>* Sphinx indexing and then throw the queries against Sphinx instead
>>>* Using a different "in-memory-DB" like Tokyo Cabinet for the queries
>>>* Building a series of "reporting tables" which each handle a subset
>>>of the supported queries
>>>
>>>All of the solutions would maintain the current table for consistency,
>>>and a lag of a couple of minutes is acceptable.
>>>
>>>I'm tempted to go for the memory table and update that depending on
>>>which rows have been updated in the parent table since last update.
>>>Eliminating duplicates could be a challenge, unless I build a new
>>>table for each update and then "rename" the tables - but that's
>>>costly in terms of memory.
>>>
>>>What do people usually do in this situation? Any other solutions to
>>>consider?
>>>
>>>Thanks,
>>>
>>>Morten
>>>
>>>
>>>
>>>--
>>>MySQL General Mailing List
>>>For list archives: http://lists.mysql.com/mysql
>>>To unsubscribe: http://lists.mysql.com/mysql?unsub=mos99
fastmail.fm
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql