I need to write a SQL statement to find matching tags. The problem is that the words are stored in the database column like this:

¶Klimawandel¶Ökosystem¶Moose¶CO2¶Stickstoffkreislauf¶

So if I search for the word reis and a column contains the word Stickstoffkreislauf, it will match, because reis is part of Stickstoffkreislauf.

Therefore I want to write a regex to match the 3 cases:

  • starts with
  • ends with
  • starts and ends with

Unfortunately, I have no idea where to start with the regex because of the ¶ in the database. Does anyone have an idea how to approach this? Thank you!

Here is my statement so far:

SELECT DISTINCT csia.cID, csia.ak_tags, p.cParentID, cv.cvName 
FROM CollectionSearchIndexAttributes csia 
JOIN Pages p ON csia.cID = p.cID 
JOIN CollectionVersions cv ON csia.cID = cv.cID 
WHERE cv.cvisApproved = '1' 
AND csia.ak_tags like '%reis%'

The column I'm looking for is csia.ak_tags.

¶ is Unicode U+00B6.

  • which rdbms are you using? Commented Jun 30, 2014 at 17:20
  • I don't see any difference between your third case and the two above it. Commented Jun 30, 2014 at 17:20
  • @Andreas we use MyISAM Commented Jun 30, 2014 at 17:23
  • @AvinashRaj yes, the last case would be sufficient for this example. But I'm thinking about what happens when a column has no ¶ at the beginning or end of the data. Commented Jun 30, 2014 at 17:25
  • stackoverflow.com/questions/13287145/… may be of help Commented Jun 30, 2014 at 17:27

2 Answers

Try a LIKE clause akin to this:

csia.ak_tags like '%\\\\u00B6reis\\\\u00B6%'

Untested, but it feels like it may work. Once you get this going, adjusting for a missing trailing ¶ or a missing leading ¶ is a non-issue.
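One possible way to handle a missing leading or trailing delimiter, as a rough sketch: pad the stored value with the delimiter on both sides before comparing, so the first and last tag are enclosed just like the ones in the middle. This assumes the delimiter really is ¶ (U+00B6) and that the connection character set is utf8, so the literal arrives intact. The table and column names are the ones from the question.

```sql
-- Sketch: wrap the column value in delimiters so every tag, including the
-- first and the last one, has a delimiter on both sides.
-- Assumption: the delimiter is ¶ (U+00B6) and the connection uses utf8.
SELECT DISTINCT csia.cID, csia.ak_tags
FROM CollectionSearchIndexAttributes csia
WHERE CONCAT('¶', csia.ak_tags, '¶') LIKE '%¶reis¶%';
```

With this pattern, Stickstoffkreislauf should no longer match a search for reis, because reis is not enclosed by delimiters there. A value that already starts or ends with ¶ simply gets a doubled delimiter, which does not affect the match.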


update: http://sqlfiddle.com/#!2/a2269/33/0

select * from csia
where ak_tags like concat('%',0xC2B6,'reis',0xC2B6,'%');

The comments on "Unicode escape sequence in command line MySQL" were helpful.


Update #2: it turns out we were just looking for newlines:

select * from CollectionSearchIndexAttributes
where ak_tags like '%\nreis\n%';

http://sqlfiddle.com/#!2/02f3a/3/0
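
For the three cases from the question (tag at the start, at the end, or both), a regex variant might look like the sketch below. It is untested against the real data and assumes the tags are separated by a single newline, and that the server runs with the default sql_mode, in which '\n' inside a string literal is read as a newline. The rest of the statement is taken from the question.

```sql
-- Sketch: match "reis" only as a whole tag. (^|\n) accepts the start of the
-- value or a preceding newline; (\n|$) accepts a following newline or the
-- end of the value.
-- Assumption: tags are newline-delimited and '\n' is parsed as a newline.
SELECT DISTINCT csia.cID, csia.ak_tags, p.cParentID, cv.cvName
FROM CollectionSearchIndexAttributes csia
JOIN Pages p ON csia.cID = p.cID
JOIN CollectionVersions cv ON csia.cID = cv.cID
WHERE cv.cvisApproved = '1'
  AND csia.ak_tags REGEXP '(^|\n)reis(\n|$)';
```

If the search term can contain regex metacharacters, it would need escaping first. A LIKE-only alternative is to compare CONCAT('\n', csia.ak_tags, '\n') against '%\nreis\n%'.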

10 Comments

Can you confirm the Unicode value of the character the fields are delimited with? I was guessing 00B6, but I might be wrong. Also, what collation is the column/table under?
The collation is utf8_general_ci.
I'm sorry, I'm not quite sure what you mean by "can you confirm the Unicode value of the character the fields are delimited with?" Can you rephrase that? Thank you!
The ¶ symbol is being used to delimit tags in the ak_tags field. Is your ¶ \u00B6, or some other Unicode character that looks the same but has a different code?
perhaps this char code tool will help: babelstone.co.uk/Unicode/whatisit.html

Have you tried using the literal character? SQL is pretty amazing, and using LIKE '%reis%¶' should do the trick. Check out this SQL Fiddle that I made for testing purposes.
