
I have an SQL string that looks something like this:

SELECT
    USER."ID", USER."NAME", USER."BIRTH", USER."GENDER",
    PACKAGE."type",
    PACKAGE."code"
FROM
    "DBNAME"."USER" USER,
    "DBNAME2"."PACKAGE" PACKAGE
WHERE
    USER."PACKAGE_ID" = PACKAGE."ID"
ORDER BY
    USER."NAME";

How should I write my regular expression in C# to extract all the column names between the SELECT and FROM keywords, and then the table names in the FROM clause?

The expected output should contain the following, so that I can put each set into a List and loop through it:

ColumnsList:

USER."ID"
USER."NAME"
USER."BIRTH"
USER."GENDER"
PACKAGE."type"
PACKAGE."code"

TablesList:

"DBNAME"."USER" USER
"DBNAME2"."PACKAGE" PACKAGE
  • Can you show an example of desired output? Commented Aug 14, 2014 at 2:17
  • Using a lexer is easier than regex for parsing SQL. Commented Aug 14, 2014 at 2:32
  • Regex is the most (mis)used tool for this type of thing. You need a parser; @hjpotter92's lexer may not be sufficient. Check out this implementation at CodeProject. It may be more than you currently need, but it will scale well with future demands: queries will grow complex, and you may need more than column and table names. Commented Aug 14, 2014 at 2:37
  • @inquisitive unfortunately, the implementation at codeproject doesn't support the select clause at this time. :( Commented Aug 14, 2014 at 7:40
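
To illustrate the lexer suggestion in the comments above, here is a minimal sketch of a quote-aware tokenizer. The class `SqlTokenSketch` and the method `Tokenize` are hypothetical names introduced for illustration, not part of any library. It assumes identifiers are either bare words or wrapped in double quotes, and it does not handle string literals, SQL comments, or escaped quotes inside identifiers.

```csharp
using System;
using System.Collections.Generic;

public static class SqlTokenSketch
{
    // Hypothetical helper: splits SQL into quoted identifiers, bare words,
    // and single punctuation characters. Whitespace is skipped.
    public static List<string> Tokenize(string sql)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < sql.Length)
        {
            char c = sql[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }

            int start = i;
            if (c == '"')
            {
                // Quoted identifier: consume up to and including the closing quote.
                i++;
                while (i < sql.Length && sql[i] != '"') i++;
                if (i < sql.Length) i++;
            }
            else if (char.IsLetterOrDigit(c) || c == '_')
            {
                // Bare word: a keyword or an unquoted identifier.
                while (i < sql.Length && (char.IsLetterOrDigit(sql[i]) || sql[i] == '_')) i++;
            }
            else
            {
                // Punctuation such as '.', ',', '=' or ';'.
                i++;
            }
            tokens.Add(sql.Substring(start, i - start));
        }
        return tokens;
    }
}
```

With tokens like these, the FROM keyword is simply the token that equals FROM (ignoring case), so names such as "FROM_COUNTRY" or TRAVEL_FROM_COUNTRY can no longer be mistaken for it.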

1 Answer

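This regex captures the column list and the table list:

  (?is)SELECT(.*?)(?<!\w*")FROM(?!\w*?")(.*?)(?=WHERE|ORDER|$)
  • Group[1] : the column list (everything between SELECT and FROM)
  • Group[2] : the table list (everything between FROM and WHERE, ORDER, or the end of the string)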

Code Samples:

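using System;
using System.Linq;
using System.Text.RegularExpressions;

string sql = @"SELECT
    USER.""ID"", USER.""NAME"", USER.""BIRTH"", USER.""GENDER"",
    PACKAGE.""type"",
    PACKAGE.""code""
FROM
    ""DBNAME"".""USER"" USER,
    ""DBNAME2"".""PACKAGE"" PACKAGE
WHERE
    USER.""PACKAGE_ID"" = PACKAGE.""ID""
ORDER BY
    USER.""NAME"";";

// Group 1 is the column list, group 2 is the table list.
var reg = new Regex(@"(?is)SELECT(.*?)(?<!\w*"")FROM(?!\w*?"")(.*?)(?=WHERE|ORDER|$)");
var match = reg.Match(sql); // match once and reuse the result

// Split each list on commas and trim the surrounding whitespace and newlines.
var columns = match.Groups[1].Value.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries).Select(s => s.Trim()).ToList();
var tables = match.Groups[2].Value.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries).Select(s => s.Trim()).ToList();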

5 Comments

Thanks! This one worked pretty well, except that SQL with column names containing the word "FROM" is extracted incorrectly. For example, Select USER."FROM_COUNTRY"... breaks the regex.
Thanks for the updates, but it still doesn't seem to work. I have a query like select USER."TRAVEL_FROM_COUNTRY"... and the regex stops right after TRAVEL_ because of the FROM keyword.
@Carven I tested USER."TRAVEL_FROM_COUNTRY" with my regex and it works well. If you post your test SQL, I can debug my regex.
Oh, I realised what happened. In a few of my SQL files the identifiers are not quoted, so they look like USER.TRAVEL_FROM_COUNTRY or USER.FROM_COUNTRY. The quoted ones are fine, but the unquoted ones cause the problem. How can I resolve this?
@Carven You should list all your rules when you ask a question. Here is the latest update: (?is)SELECT(.*?)(?<!\w*["_])FROM(?!\w*?["_])(.*?)(?=WHERE|ORDER|$)
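
As a rough check of the last pattern in the comments above, the sketch below runs it against a query that mixes quoted and unquoted names containing FROM. The class `SqlClauseSketch`, the method `Extract`, and the sample query are assumptions made up for illustration. The pattern skips a FROM only when it touches a double quote or an underscore, so an unquoted name such as FROMAGE, or a table name containing WHERE or ORDER, would still be split in the wrong place.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SqlClauseSketch
{
    // Final pattern from the comment thread: a FROM that is adjacent to a
    // double quote or an underscore is treated as part of an identifier.
    private static readonly Regex SelectFrom = new Regex(
        @"(?is)SELECT(.*?)(?<!\w*[""_])FROM(?!\w*?[""_])(.*?)(?=WHERE|ORDER|$)");

    // Hypothetical helper: returns the trimmed, comma-separated items of one
    // capture group (1 = columns, 2 = tables), or an empty array if no match.
    public static string[] Extract(string sql, int group)
    {
        Match m = SelectFrom.Match(sql);
        if (!m.Success) return new string[0];
        return m.Groups[group].Value
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        // Made-up query: one unquoted and one quoted column containing FROM.
        string sql = "SELECT USER.TRAVEL_FROM_COUNTRY, USER.\"FROM_CITY\" " +
                     "FROM DBNAME.USER USER WHERE USER.ID = 1";
        Console.WriteLine(string.Join(" | ", Extract(sql, 1)));
        Console.WriteLine(string.Join(" | ", Extract(sql, 2)));
    }
}
```

Here group 1 should yield the two column names and group 2 the single table entry. For SQL outside these assumptions, a tokenizer or parser is likely the safer route.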
