2

I am using the twitter API to fill my database with tweets. I now want to be able to search for words in the tweets. I have realized that this is not as simple as using LIKE because LIKE will do the following:

If searching for 'pan' like would return tweets that have words that contain that string such as 'companion'.

I have decided I need to use regular expressions to solve this. Basically I want to be able to find only full words (not if they are contained in other words like the above) in the tweets I am searching. So how would I be able to do this?

The following is what I have so far:

SELECT tweet_id, text FROM tweets WHERE text REGEXP ''

I am just unsure about what the regular expression should contain.

2 Answers 2

9

You can use [[:<:]] and [[:>:]] as word boundaries that match the beginning and end of words, respectively: http://dev.mysql.com/doc/refman/5.1/en/regexp.html

SELECT tweet_id, text FROM tweets WHERE text REGEXP '[[:<:]]pan[[:>:]]'
Sign up to request clarification or add additional context in comments.

2 Comments

I wasn't actually aware of the word boundaries regex within MySQL - that does work more nicely than my own answer.
Thank you! And for the source also.
0

If you go to http://gskinner.com/RegExr/ you can enter in some test data and play around with your regex strings until you get one that works just right.

As a starting point, you probably want something like '.* pan .*'

. = a wildcard character .* = matches 1 or more of the wildcard character

This will work for most examples of your search but you will need to play with regexr to get one that works for all possibilities

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.