Neohapsis is currently accepting applications for employment. For more information, please visit our website www.neohapsis.com or email email@example.com
From: Vikram A (vikkiatbiplyahoo.in)
Date: Wed Jun 16 2010 - 01:58:12 CDT
In this case how the images of a book will be stored, a chapter may contain number of images with different size.
Or It deals only text?
From: Jerry Schwartz <jerrygii.co.jp>
To: Andy <listanandgmail.com>; mysqllists.mysql.com
Sent: Fri, 11 June, 2010 9:05:26 PM
Subject: RE: MySQL For Huge Collections
>From: Andy [mailto:listanandgmail.com]
>Sent: Friday, June 11, 2010 8:09 AM
>Subject: Re: MySQL For Huge Collections
>Thanks much for your replies.
>OK, so I realized that I may not have explained the problem clearly enough.
>I will try to do it now.
>I am a researcher in computational linguistics, and I am trying to research
>language usage and writing styles across different genres of books over the
>years. The system I am developing is not just to serve up e-book content
>(that will happen later possibly) but to help me analyze at micro-level the
>different constituent elements of a book ( say at chapter level or paragraph
>level). As part of this work, I need to break-up, store and repeatedly run
>queries across multiple e-books. Here are several additional sample queries:
>* give me books that use the word "ABC"
>* give me the first 10 pages of e-book "XYZ"
>* give me chapter 1 of all e-books
[JS] You pose an interesting challenge. Normally, my choice is to store "big
things" as normal files and maintain the index (with accompanying descriptive
information) in the database. You've probably seen systems like this, where
you assign "tags" to pictures. That would certainly handle the second two
cases (with some ancillary programming, of course).
Your first example is a bigger challenge. MySQL can do full text searches, but
from what I've read they can get painfully slow. I never encountered that
problem, but my databases are rather small (~100000 rows). For this technique,
you would want to store all of your text in LONGTEXT columns.
I've also read that there are plug-ins that do the same thing, only faster.
I'm not sure how you would define a "page" of an e-book, and I suspect you
would also deal with individual paragraphs or lines. My suggestion for that
would be to have a "book" table, with such things as the title and author and
perhaps ISBN; a "page" table identifying which paragraphs are on which page
(for a given book); a "paragraph" table identifying which lines are in which
paragraph; and then a "lines" table that contains the actual text of each
[book1, title, ...] <-> [book1, para1] <-> [para1, line1, linetext]
[book2, title, ...] [book1, para2] [para1, line2, linetext]
[book3, title, ...] [book1, para3] [para1, line3, linetext]
... [book1, para4] [para1, line4, linetext]
... [para1, line5, linetext]
This would let you have a full text index on the titles, and another on the
linetext, with a number of ways to limit your searches. Because the linetext
field would be relatively short, the search should be relatively fast even
though there might be a relatively large number of records returned if you
wanted to search entire books.
NOTE: Small test cases might yield surprising results because of the way full
text searches determine relevancy! This has bitten me more than once.
This was fun, I hope my suggestions make sense.
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032
860.674.8796 / FAX: 860.674.8341
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=vikkiatbiplyahoo.in