Consider the following queries:

select
    foo,
    bar,
    yourmom
from
    theTable

select top 10 *
from
    theTable

select distinct foo, bar + 1, yourmom from theTable

I want a regex query that would extract:

foo,
bar,
yourmom

*

foo, bar + 1, yourmom

respectively.

I tried ^\sselect\s(distinct\s)?(top\s\d*)?(?'columns'.*\s)from[\s.]*$, which I thought would work, but it doesn't. I've been playing with it for a while now and still cannot get it to match all three test cases. Can someone help me with their regex-fu?

2 Answers

Edit: First you need to make . match every character, including newlines. In Java you can set the DOTALL flag; in C# the equivalent is the RegexOptions.Singleline option.

Then this expression should work:

^\s*select\s+(?:distinct\s+)?(?:top\s+\d*\s+)?(?'columns'.*?)from.*$
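
For anyone who wants to check the expression against the three test queries, here is a minimal sketch in Python rather than C#. It assumes two translations: Python writes the named group as (?P<columns>...) instead of (?'columns'...), and re.DOTALL plays the role of RegexOptions.Singleline. The helper name extract_columns is made up for illustration, and the final strip() is my addition to drop the trailing whitespace the group captures.

```python
import re

# Port of the expression above. Python spells the named group
# (?P<columns>...); .NET accepts (?'columns'...).
# re.DOTALL lets "." match newlines, like RegexOptions.Singleline in C#.
COLUMNS_RE = re.compile(
    r"^\s*select\s+(?:distinct\s+)?(?:top\s+\d*\s+)?(?P<columns>.*?)from.*$",
    re.DOTALL,
)


def extract_columns(query):
    """Return the column list of a SELECT query, or None if nothing matches.

    Hypothetical helper, not part of any library.
    """
    match = COLUMNS_RE.match(query)
    if match is None:
        return None
    # The lazy group also captures the whitespace before "from"; trim it.
    return match.group("columns").strip()


# The three test queries from the question.
QUERIES = [
    "select\n    foo,\n    bar,\n    yourmom\nfrom\n    theTable",
    "select top 10 *\nfrom\n    theTable",
    "select distinct foo, bar + 1, yourmom from theTable",
]

if __name__ == "__main__":
    for query in QUERIES:
        print(repr(extract_columns(query)))
```

One caveat: the lazy .*? stops at the first occurrence of the letters "from", so a column called from_date would cut the match short. Requiring whitespace around the keyword would probably avoid that, but I have not tested that variation.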

2 Comments

This does not match on the test cases.
Ok. I fixed it and updated the answer. And I verified it this time!

I think that it would actually be easier to write a "proper" parser for SQL queries (check Irony: it's awesome and comes with a SQL example) than using regular expressions.

3 Comments

"Easier" is a relative term. The column list is fairly regular; I need nothing else, certainly not a whole new library to learn.
I think that Irony is easy to learn, and it will provide a flexibility that regular expressions will not give you in this case. Then, of course, it's your choice...
When I have the time to examine Irony, I will, but for now the easiest answer was provided by barsju.
