
I am trying to write a regex for this task, but I really can't get my head around regex beyond very simple cases :-(

The problem: I have this SQL-like query:

SELECT tcmcs003.*, tccom130.nama, tccom705.dsca, tcmcs052.dsca, tccom100.nama
FROM tcmcs003, tccom130,tccom705,tcmcs052,tccom100
WHERE  tcmcs003.cadr REFERS TO tccom130
 AND tcmcs003.casi REFERS TO tccom705
 AND tcmcs003.cprj REFERS TO tcmcs052
 AND tcmcs003.bpid REFERS TO tccom100
ORDER BY tcmcs003._index1

I want to "extract" all the table names and column names, and then simply append my own characters to them. For example, replace:

 SELECT tcmcs003.*, tccom130.nama

with:

 SELECT tcmcs003XXX.*, tccom130XXX.namaYYY

So far, the best regex I have is this:

(?<gselect>SELECT\s+)*(?<tname>\w{5}\d{3})*(?<spaces>[\.\,\s])+(?<colname>\w{4})*

And the replacement pattern:

${gselect}${tname}XXX${spaces}${colname}YYY

The output is really terrible :-(

SELECT tcmcs003.
 m130
.nama
 m705
.dsca
 s052
.dsca
 m100
.nama

FROM
 s003
 m130
,m705
,s052
,m100

WHER
 s003
.cadr
 REFE

 m130

 s003

How can I write the regex?
I want to repeatedly capture something like

[(any string)(table name)(\.a dot or not)(column name)(any string) ] (repeat N times)

EDIT

  1. I am writing in C#

  2. The pattern should be a bit more general than: \b(tc(?:mcs|com)\d{3}XXX.\w+)\b

in the sense that a table name is 5 characters (always a t followed by 4 random chars) followed by 3 random digits, and a column name is 4 random chars.

  • Something like this and this ? Commented Oct 2, 2015 at 9:18
  • internet a bit blocked... Commented Oct 2, 2015 at 10:11
  • @Mariano.. perfect! please respond so I can upvote ;-) Commented Oct 2, 2015 at 10:12
  • ehm.. a couple of problems, please see my edit Commented Oct 2, 2015 at 10:15

1 Answer

Instead of trying to match the whole statement, I'd simply match each table or column independently. Since table names contain digits, there's little chance the patterns will match anything else.

  1. Match qualified column names (table.column) with:

    \b(t\w{4}\d{3}\.\w{4})\b
    
  2. Match table names with:

    \b(t\w{4}\d{3})\b
    

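Then, we can replace each with the desired value: "$1YYY" and "$1XXX" respectively. The column replacement must run first: once XXX has been appended, the table name is no longer immediately followed by the dot, so the column pattern would no longer match. The patterns use these constructs:

  • \b Matches a word boundary (a word char on one side and not a word char on the other).
  • \w{4} Matches 4 word chars ([A-Za-z0-9_]; in .NET, \w also matches other Unicode letters and digits unless RegexOptions.ECMAScript is set).
  • \d{3} Matches 3 digits ([0-9]; in .NET, \d also matches other Unicode decimal digits unless RegexOptions.ECMAScript is set).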
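To see what the two patterns actually pick up, here is a small sketch that lists every match in a string. The class and method names (PatternDemo, FindAll) are made up for illustration; only Regex.Matches comes from the .NET library.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class PatternDemo
{
    // The two patterns from the answer, unchanged.
    public const string ColumnPattern = @"\b(t\w{4}\d{3}\.\w{4})\b";
    public const string TablePattern = @"\b(t\w{4}\d{3})\b";

    // Returns the text of every match of 'pattern' in 'input', in order.
    public static List<string> FindAll(string pattern, string input)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

For the input "tcmcs003.cadr, tcmcs003._index1, tcmcs003.*", FindAll with ColumnPattern should return only tcmcs003.cadr: in tcmcs003._index1 the four chars after the dot are followed by more word chars, so the trailing \b fails, and * is not a word char at all. TablePattern should match tcmcs003 three times.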

Code:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = @"SELECT tcmcs003.*, tccom130.nama, tccom705.dsca, tcmcs052.dsca, tccom100.nama
FROM tcmcs003, tccom130,tccom705,tcmcs052,tccom100
WHERE  tcmcs003.cadr REFERS TO tccom130
AND tcmcs003.casi REFERS TO tccom705
AND tcmcs003.cprj REFERS TO tcmcs052
AND tcmcs003.bpid REFERS TO tccom100
ORDER BY tcmcs003._index1";

        // pattern1: qualified column (table.column); pattern2: bare table name.
        string pattern1 = @"\b(t\w{4}\d{3}\.\w{4})\b";
        string pattern2 = @"\b(t\w{4}\d{3})\b";
        Regex r1 = new Regex(pattern1);
        Regex r2 = new Regex(pattern2);
        string replacement1 = "YYY";
        string replacement2 = "XXX";

        // Columns first: after XXX is appended, the dot no longer follows the digits.
        string result = r1.Replace(input, "$1" + replacement1);
        result = r2.Replace(result, "$1" + replacement2);

        Console.WriteLine(result);
    }
}

ideone Demo
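As an alternative to the two passes, the same result can probably be had in a single pass with named groups and a MatchEvaluator. This is only a sketch: the names SuffixAdder and AddSuffixes are invented here, and it assumes the same table and column shapes as the patterns above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical single-pass variant, not the answer's original code.
public static class SuffixAdder
{
    // A table name, optionally followed by ".<4-char column>".
    private static readonly Regex TableOrColumn = new Regex(
        @"\b(?<table>t\w{4}\d{3})\b(?:\.(?<col>\w{4})\b)?");

    public static string AddSuffixes(string query, string tableSuffix, string columnSuffix)
    {
        return TableOrColumn.Replace(query, m =>
        {
            // Always suffix the table name.
            string result = m.Groups["table"].Value + tableSuffix;
            // Suffix the column only when the optional part matched.
            if (m.Groups["col"].Success)
            {
                result += "." + m.Groups["col"].Value + columnSuffix;
            }
            return result;
        });
    }
}
```

Calling SuffixAdder.AddSuffixes(input, "XXX", "YYY") handles each table once, so there is no ordering between passes to get wrong. The evaluator's return value is also inserted literally, so a suffix containing $ is not interpreted as a substitution.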


1 Comment

Note: I used \w for the random chars. It matches [A-Za-z0-9_] (in .NET it also matches other Unicode letters and digits unless RegexOptions.ECMAScript is set). Feel free to change it accordingly.
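If \w turns out to be too permissive, here is a sketch of two ways to tighten the table pattern. The class name StrictPatterns is invented, and the explicit character class assumes the "random chars" are lowercase ASCII letters or digits, which the question does not actually state.

```csharp
using System.Text.RegularExpressions;

// Hypothetical variants of the table-name pattern.
public static class StrictPatterns
{
    // Default .NET behaviour: \w and \d are Unicode-aware.
    public static readonly Regex UnicodeTable = new Regex(@"\bt\w{4}\d{3}\b");

    // ECMAScript mode restricts \w to [a-zA-Z_0-9] and \d to [0-9].
    public static readonly Regex AsciiTable =
        new Regex(@"\bt\w{4}\d{3}\b", RegexOptions.ECMAScript);

    // Explicit classes (assumption: lowercase ASCII letters or digits only).
    public static readonly Regex ExplicitTable =
        new Regex(@"\bt[a-z0-9]{4}[0-9]{3}\b");
}
```

For example, UnicodeTable accepts "t_mcs003" because the underscore is a word char, while ExplicitTable rejects it. Both still accept "tcmcs003".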
