Neohapsis is currently accepting applications for employment. For more information, please visit our website www.neohapsis.com or email email@example.com
From: Kevin Hunter (huntekeearlham.edu)
Date: Fri May 16 2008 - 04:00:29 CDT
At 2:49a -0400 on Fri, 16 May 2008, Ananda Kumar wrote:
> it goes without saying
Eh, not to those who don't know. Hence the list and question. :-D
Krishna, the short answer is that it depends on your data, and the
queries against it that you run. Test/benchmark on your own DB and data
to see what solves your issue.
> 1. primary key and unique key would do unique scan which is
> fastest of all scan
First, let's get terminology correct: keys, indexes.
Terminology: keys are nothing more than data. Keys are used to identify
rows for manipulation, either directly by a SQL statement/request, or
indirectly through the DB SQL parser. A key is not necessarily unique.
SELECT name FROM person WHERE height = 70;
is a request for the names of people who are 5'10" tall. The
identifying key in this statement then, is height. All people who are
70 inches tall are "keyed" for selection.
A special case of a key is a primary key. This tells the DB that the
data in this column(s) uniquely identifies each row. Assuming that 'id'
was created as a PRIMARY KEY,
SELECT name FROM person WHERE id = 7;
will return exactly 1 or 0 rows because id is guaranteed to be unique by
the DB. Further, a primary key is also implicitly guaranteed to be not
A unique key is a bit of a misnomer, and I'm sorry the MySQL
documentation perpetuates it. It is more accurate to think of the
column as being constrained such that every row in that column(s) is
unique. Thus, unique is a /property/ of a column, not it's defining
Terminology: indexes are an /implementation/ of the DB, usually for
speed and to help maintain data integrity. When you designate a column
(or columns) as the primary key for the rows of a table, MySQL will
automatically create an unique index on that column. This way, for
example, when you insert data into the table, it won't have to scan the
entire table to make sure that the new data is unique. It can just walk
a couple of levels in the index to ensure uniqueness.
For speed however, it depends on what the DB is asked, and how the index
was defined. MySQL is somewhat limited in that you are pretty much
limited to the index type the DB has: b-tree index. But that's
implementation specific. Oracle, for example, has other index types,
(hash, bitmap). Modulo the internal specifics of the MySQL
implementation, the rough idea of a unique not null b-tree index (e.g. a
primary key column) is that the DB ostensibly need only walk O(lg(n))
steps to find a key. A hash index, then, would presumably be the
fastest at O(1).
> 2 Index would do range scan, which would be comparitivly slower.
Not exactly. This, again, depends on implementation and the data
against which queries are run. An index is an index. Remember, a
primary key is implemented by the MySQL engine as a unique not null
b-tree index. In fact, trying to get low-cardinality data from a table
via indexes could even be faster than the primary key. Consider a table
with 1,000,001 rows of favorite colors of people. Let's say 1 million
people like the color blue, but only 1 person who likes the color puce.
An index for that single person would be a huge win. Depending on the
DB (read: implementation) one could make this particular style of skewed
data even faster. Postgres, for example, has partial-indexes.
As I said at the beginning of this email, I suggest the OP do some
personal investigation and tests, because understanding indexes /and
your own data/ is not as simple as a checklist.
P.S. I highly suggest you Google some of the concepts about which I
wrote. Wikipedia is fairly good start. Also, don't be afraid to read
other DB's docs. I've learned a great deal from comparing different
DBs, documentation sources, and interacting with multiple communities.
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql