mysql - extract specific words from text field using full text search

Question

My question is a little similar to "Extract specific words from text field in mysql", but not the same.

I have a text field containing words. In my language a word can have many different endings, and I need to find these endings.

I use MySQL's full-text search, but I would need access to the underlying index, where each field is split into words and the words are counted. I could then search for "test*" and quickly find "test", "tested", "testing". My primary goal is a list of all the endings that exist in my database.

As it is, I can get the records that contain words matching "test*", but locating the occurrences is not enough: I need to group them so that I get a list of all the words that, for example, start with "test". I don't need to know which record each word is in, just a list, grouped so that "testing" appears once rather than 10 times (a counter of how many times it was found would be nice, but is not necessary).

Is there a way to extract this information from the full-text index, or should I split all these fields into words, build an index table of words, and then run a LIKE 'word%' query grouped by the distinct results? I am not sure how to do that in practice either, so please just point me in the right direction.

So to summarize: I have a text field and I need to find out which words in it start with "test", like "tested", "test", "testing", etc. It doesn't make much sense in English, but in my language it does, because the same word takes different endings and there are many of them, sometimes 20. I need to find out which ones are present so I can build a synonyms table ;-)

UPDATE:

The table has columns ID (int), ingredients (text) and recipe (text).

The ingredients column holds cooking ingredients with different endings, like:

1 egg 2 eggs

etc.
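
The "split the fields into words and group them" idea from the question can be sketched roughly as follows. This is only an illustration, not a MySQL feature: the rows are hypothetical sample data standing in for the (ID, ingredients) columns, and the helper name `count_words_with_prefix` is made up for this example.

```python
import re
from collections import Counter

# Hypothetical sample rows standing in for (ID, ingredients) from the table.
rows = [
    (1, "1 egg, 200 g flour"),
    (2, "2 eggs, 1 cup of milk"),
    (3, "3 Eggs and an eggplant"),
]

def count_words_with_prefix(rows, prefix):
    """Count each distinct word in the text column that starts with prefix."""
    counts = Counter()
    prefix = prefix.lower()
    for _id, text in rows:
        # [^\W\d_]+ matches runs of Unicode letters, so letters such as
        # č and š stay inside a word instead of splitting it.
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            if word.startswith(prefix):
                counts[word] += 1
    return counts

result = count_words_with_prefix(rows, "egg")
for word, n in sorted(result.items()):
    print(word, n)
```

Each distinct word is listed once together with its count, which is the same result a GROUP BY over a separate words table would give.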

Can you provide some details, at least the DB structure or the query used?
– Harish, Commented Apr 19, 2011 at 7:12

Shamit Verma · Accepted Answer · 2011-04-19 09:30:57Z


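You can dump all the words that are present in a full-text index with the myisam_ftdump utility, which works on the FULLTEXT indexes of MyISAM tables. With its --count option the dump also shows the frequency of each word, e.g. that "test" is used 200 times and "testing" is used 300 times.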

Manual for that: http://dev.mysql.com/doc/refman/5.0/en/myisam-ftdump.html


4 Comments

Jerry2 Over a year ago

That seems like a great idea, but can this index be dumped to something other than a text file? I need it as a database table so I can search for words in it, and from the documentation I can only find a dump to text :-(

Shamit Verma Over a year ago

You will have to dump it to text and then import that text file via "LOAD DATA INFILE": dev.mysql.com/doc/refman/5.1/en/load-data.html
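
A rough sketch of the conversion step between the dump and the import. It assumes, without having checked it against a real run, that each data line of the counted dump consists of three whitespace-separated fields (count, weight, word); adjust the parsing if your output differs. The function name `ftdump_counts_to_tsv` and the sample text are hypothetical.

```python
import csv
import io

def ftdump_counts_to_tsv(dump_text):
    """Convert 'count weight word' lines into 'word<TAB>count' lines."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for line in dump_text.splitlines():
        parts = line.split()
        # Skip blank lines and anything that does not look like a data row.
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        count, _weight, word = parts
        writer.writerow([word, int(count)])
    return out.getvalue()

# Hypothetical sample in the assumed layout; this is not real tool output.
sample = """\
      200            1.2345678 test
      300            0.9876543 testing
       12            2.0000000 čaj
"""
tsv = ftdump_counts_to_tsv(sample)
print(tsv, end="")
```

The tab-separated result matches the default field and line terminators of LOAD DATA INFILE, so it could be written to a UTF-8 file and loaded into a two-column table (word, count) of your own choosing.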

Jerry2 Over a year ago

So there is no direct MySQL way without using the command line, I guess... Thanks... I see one problem: my data is UTF-8 with our special characters, and in the exported text I get mangled 2-byte sequences instead of č or š.

Shamit Verma Over a year ago

That is expected, since UTF-8 uses 2 or more bytes for pretty much every character outside the basic English (ASCII) set. It will work fine once loaded into the DB, or if you view it in an editor that lets you specify the charset. To load this data, specify UTF-8 in LOAD DATA, which is done via the "CHARACTER SET" option.
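
The effect described here can be reproduced in a few lines. This is only an illustration with a hypothetical sample word; Latin-1 is used as an example of a wrong single-byte charset, and the viewer that produced the mangled text may have used a different one.

```python
word = "čaj"  # hypothetical sample word containing č

raw = word.encode("utf-8")
print(len(word), len(raw))  # 3 characters, but 4 bytes: č takes two bytes

# Reading the same bytes with a single-byte charset shows two characters
# in place of č, which is the "mangled" text seen in the exported file.
wrong = raw.decode("latin-1")

# Decoding with the correct charset recovers the original word.
right = raw.decode("utf-8")
print(right == word)
```

The bytes in the file are unchanged in both cases; only the charset used to interpret them differs, which is why declaring UTF-8 at load time is enough.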
