1

In my database I have a field wich contains a html document. Now there must be a possibility to search in this document. However, the html tags may not be found. So when I have something like this:

<html>
  <head>
    <title>Bar</title>
  </head>
  <body>
   <p>
     this content my be found
   </p>
  </body>
</html>

It is possible that the document stored in the database is not xhtml. Can you tell me what the best way is to search in the content? Shall i use regular expressions? And of so, how would it look like? ANd if not, what should I use else?

2 Answers 2

2

You could try turning on Full-Text Search or use something like Lucene.Net to index the content for you.

Sign up to request clarification or add additional context in comments.

Comments

2

What volume of records are there? I expect you might have to use full-text search and an IFilter to do this efficiently. Html does not lend itself well to regex - it can quickly be very hard to do something very simple.

If the volume isn't huge, can you iterate over the records with an external parsing application, using something like the HTML Agility Pack (for .NET) - or any other DOM of your choice.

But the FTS/IFilter would be my first choice.

2 Comments

The search has to be done in 5 tables. Each table has a few 100 records. How do I use the FTS and IFilter?
Looks to be under the "Management" node in Management Studio.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.