My question is a little similar to "Extract specific words from text field in mysql", but not the same.

I have a text field containing words. In my language, a word can have many different endings. I need to find these endings.

I use MySQL's fulltext search, but I would need access to the index data, where each field is "cut" into words and the words are counted. I could then search for "test*" and quickly find "test", "tested", "testing". My primary goal is a list of all the endings that exist in my database.

As it is, I can get the records that contain specific "test*" words, but I need more than the location of each occurrence in the field: I need to group the matches so that I get a list of all the words that, for example, start with "test". I don't need to know which record they are in, just a list, grouped so that "testing" appears only once rather than 10 times (a counter of how many times it was found would be nice, but is not necessary).

Is there a way to extract this information from the fulltext index, or should I explode all these fields into words, build an index table full of words, and then run a LIKE "word%" query and group by the different results? I am not sure how to do that in practice either, so please just point me in the right direction.

So to summarize: I have a text field and I need to find out which words in it start with "test", like "tested", "test", "testing", etc. It doesn't make much sense in English, but in my language it does, because the same word takes different endings and there are so many of them, sometimes 20. I need to find out which ones are there so I can make a synonyms table ;-)

UPDATE:

Database has columns ID (int), ingredients (text) and recipe (text).

Data in ingredients are cooking ingredients with different endings like:

1 egg 2 eggs

etc.

  • Can you provide some details, at least the DB structure or the query used? Commented Apr 19, 2011 at 7:12

1 Answer

You can dump all the words that are present in a fulltext index with myisam_ftdump. The dump also shows the frequency of each word, e.g. that "test" is used 200 times and "testing" is used 300 times.

Manual for that: http://dev.mysql.com/doc/refman/5.0/en/myisam-ftdump.html
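
If dumping the index is not practical, the alternative mentioned in the question (splitting the text into words and grouping them) could look roughly like the sketch below. It is only an illustration, not a MySQL feature: the function name `words_with_prefix` and the sample rows standing in for the `ingredients` column are made up, and the tokenization is a simple `\w+` match that may differ from how MySQL splits words.

```python
import re
from collections import Counter


def words_with_prefix(texts, prefix):
    """Count each distinct word that starts with `prefix` across all texts."""
    counts = Counter()
    for text in texts:
        # In Python 3, \w matches Unicode letters, so words containing
        # characters such as č or š are kept whole.
        for word in re.findall(r"\w+", text.lower()):
            if word.startswith(prefix):
                counts[word] += 1
    return counts


# Hypothetical sample rows standing in for the `ingredients` column.
ingredients = ["1 egg", "2 eggs", "3 eggs, beaten", "eggplant and 1 egg"]

result = words_with_prefix(ingredients, "egg")
for word, n in sorted(result.items()):
    print(word, n)
```

Each word form is listed once with its count, which is the grouped list the question asks for; the same idea could be moved into a word table in the database and queried with LIKE 'egg%' plus GROUP BY.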

4 Comments

That seems like a great idea, but can you dump this index to something other than a text file? I would need it as a database table so I can search for words in it, and reading the documentation I can only find a dump to text :-(
You will have to dump this to a text file and then import that file via "LOAD DATA INFILE": dev.mysql.com/doc/refman/5.1/en/load-data.html
So there is no direct MySQL way without using the command line, I guess... Thanks... I see a problem, though: my data is UTF-8 with our special characters, and in the exported text I get a mangled 2-byte sequence instead of č or š.
That is expected, since UTF-8 uses 2 or more bytes for pretty much all characters other than the English ones. The data will be fine once loaded into the DB, or if you view the file in an editor that lets you specify the charset. To load it correctly, specify UTF-8 in LOAD DATA via the "CHARACTER SET" option.
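
To illustrate the encoding point in the last comment, here is a small sketch showing that č occupies two bytes in UTF-8, and that reading those bytes under the wrong charset only changes how they are displayed, not the bytes themselves. Latin-1 is used here purely as an assumed example of a wrong charset; the terminal or editor in the comment above may have used a different one.

```python
text = "č š"

# One character, but two bytes once encoded as UTF-8.
print(len("č"), len("č".encode("utf-8")))

raw = text.encode("utf-8")

# Interpreting the UTF-8 bytes as Latin-1 (an assumed wrong charset)
# yields two junk characters per letter instead of č or š.
mangled = raw.decode("latin-1")

# The underlying bytes are intact, so decoding them as UTF-8 recovers the text.
restored = mangled.encode("latin-1").decode("utf-8")
print(restored == text)
```

This is why the dump only looks broken: as long as the import is told the file is UTF-8, the original characters come back.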
