
I've got a problem I need to solve with regular expressions: take a CSS selector and compile it into a regex that matches the string representation of the corresponding nodes inside an HTML document. The point is to avoid parsing the HTML as XML and then running XPath or DOM queries in order to apply style attributes.

Does anyone know of a project that already implements something like this in any language? The target platform would be .NET 3.5.

2 Answers


Html Agility Pack


1 Comment

Well, if the performance isn't too bad, this would be ideal for the job. Thanks!

Regular expressions seem like an amazingly bad way of matching those nodes. I'm not sure I follow your problem: why not just use something like jQuery to pick out those nodes? E.g., given the CSS selector 'div>span.red:first-child',

$('div>span.red:first-child')

would return a jQuery collection (an array-like object) of the matching nodes.

EDIT: Oh, wait, are you trying to do this 'offline', as it were, and not in a user's browser? Yeah, ignore my advice. (Even so, I'd still suggest that regular expressions aren't going to help you. Why are you against generating an XML document representation of the page?)
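
To make the concern concrete, here is a rough sketch of what a naive selector-to-regex compiler might look like for the simplest case, 'span.red', and where it breaks. The helper name naiveSelectorToRegex and the sample markup are made up for illustration; this is not taken from any real library, and a more careful regex would fix some of these cases while still missing others.

```javascript
// Hypothetical helper (illustration only): compiles a "tag.class" selector
// into a regex that looks for <tag class="cls" ...> in raw HTML text.
function naiveSelectorToRegex(selector) {
  const [tag, cls] = selector.split(".");
  return new RegExp("<" + tag + "\\s+class=\"" + cls + "\"[^>]*>", "g");
}

const re = naiveSelectorToRegex("span.red");

// Made-up sample markup; a real CSS engine would select a-d and ignore e.
const samples = [
  '<span class="red">a</span>',          // found
  '<span id="x" class="red">b</span>',   // missed: class is not the first attribute
  "<span class='red'>c</span>",          // missed: single-quoted attribute value
  '<span class="big red">d</span>',      // missed: more than one class
  '<!-- <span class="red">e</span> -->', // false positive: markup inside a comment
];

// true means the regex found a match in that sample.
const results = samples.map((html) => html.match(re) !== null);
console.log(results); // [ true, false, false, false, true ]
```

Even before getting to 'div>span' or ':first-child', which depend on where a node sits in the tree rather than on its own text, the pattern already disagrees with a real selector engine on four of the five samples.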

2 Comments

Seconding that translating CSS selectors into regular expressions sounds like a bad idea.
Yeah, it's neither my first choice nor my decision. But given the restrictions, I see no more efficient way, apart from the Html Agility Pack mentioned above.
