Neohapsis is currently accepting applications for employment. For more information, please visit our website www.neohapsis.com or email firstname.lastname@example.org
From: Gary Richardson (gary.richardsongmail.com)
Date: Thu Apr 03 2008 - 15:55:45 CDT
I've built several datamarts using perl and MySQL. The largest ones
have been up to about 30GB, so I'm not quite on your scale.
for #1, I have an etl_id in the fact table so I can track back any
particular ETL job. I typically make it a dimension and include date,
time, software version, etc. That doesn't help so much if you're
messing up your dimension tables, but I haven't typically run into
that problem based on the designs I've used.
For #2, I haven't built anything big enough for it to be a concern yet..
Also, LOAD DATA INFILE is your friend :)
On Thu, Apr 3, 2008 at 11:28 AM, Dre <abyfieldgmail.com> wrote:
> Hey folks,
> I'm currently deciding whether to build a decent sized (around 300-500GB,
> although honestly, I've got little to base that on at the moment) data
> warehouse in postgreSQL or MySQL. I've developed several in MS SQL and
> postgreSQL, but the client is comfortable with MySQL, and I'd prefer to use
> that as the platform since it will be less painful for them to manage when
> I'm gone. I'm hoping that someone with experience building a warehouse on
> MySQL will be able to answer two outstanding questions I have:
> 1) Several sources seem to suggest MyISAM is a good choice for data
> warehousing, but due to my lack of experience in a transaction-less world,
> this makes me a little nervous. How do you handle data inconsistency
> problems when ETL jobs fail? (For the record, I don't use a separate tool
> for the ETL; I usually use perl/shell scripts to interact with the file
> system, and pl/pgsql or transact-sql once the data is loaded into the
> staging database. For each file that is loaded, I'll identify steps that
> must be posted together, and wrap them in a transaction in the ETL job.) I
> can see doing something like manually cleaning out the necessary tables
> before you re-run, but that seems a bit messy to me. Anyone figure out a
> better approach?
> 2) Isn't the lack of bitmap indexes a problem in the warehouse? Most FKs in
> the fact tables will be low cardinality columns; queries that didn't use
> date would be very slow on large fact tables (MS SQL had this problem). Has
> anyone run into this with MySQL?
> Many thanks in advance!
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql