(Q) FullText (UTF8)

From: Little, Timothy (TLittleThomasGlobal.com)
Date: Thu Nov 20 2008 - 15:30:03 CST


We are using MySQL 5.0.22 on CentOS/Red Hat Linux. The table and database character sets are all utf8.

We have a database supporting numerous languages. Of course, full-text works beautifully with most of the languages.

But Chinese and Japanese are giving us problems, and there is NO reason why it should be a problem since we are taking measures to help the database see word-breaks.

When we insert the Chinese and Japanese passages, they have spaces (the normal ASCII space, hex 20 / decimal 32) between each word (verified). So basically, if you have two words like {APPLE}{DRUM}, we store {APPLE}, then a space, then {DRUM}. If you can view UTF-8, look at this sample: 三坐标测量机 固定架
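
For anyone who wants to double-check the separator on their own data, here is a minimal sketch using HEX(). It assumes the connection character set is also utf8 (otherwise the literal in the LIKE pattern may not match) and reuses the table and column names from the query further down.

```sql
-- Sketch only: dump the raw bytes of a stored value to inspect the separator.
-- In UTF-8 each of these CJK characters occupies three bytes, so a plain
-- ASCII space should appear as a single 20 between the two groups of bytes.
-- An ideographic space (U+3000) would instead appear as E38080.
SELECT value, HEX(value) AS value_hex
FROM category_attributes
WHERE value LIKE '%固定架%';
```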

When we try to match either {APPLE} or {DRUM} individually (or, technically, 三坐标测量机 or 固定架), MySQL fails to find any match, although it clearly should find those rows.

MySQL is only finding matches for Japanese and Chinese on exact full-string matches, which is clearly less than ideal.

I have already changed the full-text minimum word length setting (ft_min_word_len) to 1, to no avail.
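
A hedged sketch of how that setting can be checked from a client session. It assumes the table uses the MyISAM engine (the only engine with FULLTEXT support in MySQL 5.0). The MySQL manual notes that the value is read at server startup and that existing FULLTEXT indexes have to be rebuilt after it changes, so the REPAIR statement is included for completeness; whether that is related to the problem described here is not established.

```sql
-- Show the minimum word length the running server is actually using.
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Rebuild the FULLTEXT index so it is built with the current setting.
-- Assumption: category_attributes is a MyISAM table.
REPAIR TABLE category_attributes QUICK;
```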

What is going wrong, and how do I fix this?

Here is my sample query (selecting for ONE word):
select *
from category_attributes
where match ( value ) against ( '三坐标测量机' ) > 0

When I replace the word with 固定架, it still doesn't match anything, even though there is a row containing just
三坐标测量机 固定架 (the two words separated by a single space).

Tim...

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql