Hi, I have a file with a lot of bad data lines, and I've identified which lines are bad. The file is too big to clean up manually, and the problem may recur in the future, so I'm writing a small tool in Java that removes the bad segments matching an input regex.

An example of bad data is:

ABC*HIK*UG*XY\17

I'm trying to write a regex for the above string. So far, only "(^ABC)" works, and it removes just ABC.

When I use this one, nothing happens:

"(^ABC*.XY\17$)"

Please give your inputs.

EDITED:

The answer works perfectly, but if my input file contains this:

ABC
123
ABC*HIK*UG*XY\17
1025
KHJ*YU*789

I should get output like this:

ABC
123
1025
KHJ*YU*789

but I'm getting this instead (note the empty line left behind):

ABC
123

1025
KHJ*YU*789
  • Do you have access to Perl or Python? I'd use those over Java... Commented Jan 15, 2015 at 17:30
  • Can you provide a better explanation of 'bad data', and what you're trying to get from it? Commented Jan 15, 2015 at 17:30
  • I'm just trying to remove them. I don't need them. Commented Jan 15, 2015 at 17:33
  • I mostly work with Java, so Java code would be fine. Commented Jan 15, 2015 at 17:34
  • Your edit is a different question, and solving it requires seeing your Java code. I suggest that you ask it as a separate question (and remove your edit), because your original question is already answered. Commented Jan 15, 2015 at 18:01

1 Answer

Change your pattern to:

"^ABC.*XY\\\\17$"

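In a Java string literal, matching a single \ character takes four backslashes: the regex engine needs \\ to match one literal backslash, and each of those two must itself be escaped inside the string literal. The pattern that matches any character zero or more times is .* and not *. (In your attempt, the sequence *. makes the preceding C repeatable zero or more times and then matches exactly one arbitrary character.) You also don't need to put the pattern inside a capturing group.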
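To make the escaping levels concrete, here is a minimal sketch. The class and method names (BackslashEscapeDemo, isBadLine) are made up for illustration, and it assumes the pattern is written as a Java string literal; a regex read from a file or the command line would only need the two regex-level backslashes. It also shows why the original "\17" silently did something else: in Java source that is an octal escape for a single control character, not a backslash followed by 17.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the question.
class BackslashEscapeDemo {
    // Java source "XY\\\\17" -> regex XY\\17 -> matches the text XY\17
    static final Pattern BAD_LINE = Pattern.compile("^ABC.*XY\\\\17$");

    // Alternative: let Pattern.quote handle the regex-level escaping.
    // Only the string-literal escaping ("\\") is still needed.
    static final Pattern BAD_LINE_QUOTED =
            Pattern.compile("^ABC.*" + Pattern.quote("XY\\17") + "$");

    static boolean isBadLine(String line) {
        return BAD_LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        String line = "ABC*HIK*UG*XY\\17";   // the text ABC*HIK*UG*XY\17
        System.out.println(isBadLine(line));                         // prints true
        System.out.println(BAD_LINE_QUOTED.matcher(line).matches()); // prints true

        // "\17" in Java source is an octal escape: one character (code 15),
        // so this string has 3 characters, not 5.
        System.out.println("XY\17".length());                        // prints 3
    }
}
```

Pattern.quote is optional here; it mainly helps when the literal part of the bad segment contains several regex metacharacters.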

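// Sample input; "\\17" in Java source is the text \17 in the data.
String s = "ABC\n" + 
        "123\n" + 
        "ABC*HIK*UG*XY\\17\n" + 
        "1025\n" + 
        "KHJ*YU*789";
// (?m) lets ^ match at the start of every line; the optional \n also removes the line break.
System.out.println(s.replaceAll("(?m)^ABC.*XY\\\\17\n?", ""));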

Output:

ABC
123
1025
KHJ*YU*789

Because the regex uses the ^ anchor, which by default matches only at the very start of the input, we need the multi-line modifier (?m) so that ^ matches at the start of every line.
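The effect of the modifier can be checked side by side. The following is a sketch with hypothetical names (MultilineFlagDemo, removeBadLines, removeWithoutFlag). It swaps the \n? from the snippet above for \R?, the Java 8+ linebreak matcher, on the assumption that the file might use Windows \r\n line endings; if the data only ever contains \n, the original \n? behaves the same.

```java
// Hypothetical demo class; the names are illustrative, not from the question.
class MultilineFlagDemo {
    // With (?m), ^ matches at the start of every line.
    // \R? also consumes the line break (\n, \r\n, ...) after the bad line.
    static String removeBadLines(String text) {
        return text.replaceAll("(?m)^ABC.*XY\\\\17\\R?", "");
    }

    // Same pattern without (?m): ^ only matches at the start of the whole input.
    static String removeWithoutFlag(String text) {
        return text.replaceAll("^ABC.*XY\\\\17\\R?", "");
    }

    public static void main(String[] args) {
        String s = "ABC\n123\nABC*HIK*UG*XY\\17\n1025\nKHJ*YU*789";
        System.out.println(removeBadLines(s).equals("ABC\n123\n1025\nKHJ*YU*789")); // prints true
        System.out.println(removeWithoutFlag(s).equals(s));                         // prints true
    }
}
```

Without the flag the input comes back unchanged, because the bad line does not start at position 0 and . does not cross line breaks.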


11 Comments

Perfect!! Works like magic. Can you tell me how to remove the empty line??
To match an empty line, use the regex ^$. For the case above, I think you need "^ABC.*XY\\\\17\n", which also matches the newline character following the matched text.
Use \s* if the line may contain leading spaces: "^\\s*ABC.*XY\\\\17$". If you still have a problem, please add the exact input to your question along with the expected output.
@OracleNerd I already told you to add a newline character at the end.
It isn't working. If I add a newline char, the line isn't replaced at all.
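The asker's code is not shown, so the cause of that last symptom is a guess. Two common ones are a file with \r\n line endings (so a bare \n never directly follows \17) and a tool that reads the file line by line (so the matched string never contains a line break at all). For a very large file, one way to avoid both is to stream the lines and skip the bad ones instead of replacing text. Here is a minimal sketch under those assumptions; BadLineFilter, shouldKeep, filter, and the command-line arguments are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical tool; names and argument order are illustrative only.
class BadLineFilter {
    // A line is kept unless the "bad data" pattern is found in it.
    static boolean shouldKeep(Pattern bad, String line) {
        return !bad.matcher(line).find();
    }

    // Streams the input one line at a time, so the whole file is never in memory.
    static void filter(Path in, Path out, Pattern bad) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) { // readLine strips \n and \r\n
                if (shouldKeep(bad, line)) {
                    writer.write(line);
                    writer.newLine(); // platform line separator, also after the last line
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed): java BadLineFilter <input> <output> <regex>
        // A regex passed on the command line needs only regex-level escaping,
        // e.g. ^ABC.*XY\\17$ (subject to the shell's own quoting rules).
        filter(Paths.get(args[0]), Paths.get(args[1]), Pattern.compile(args[2]));
    }
}
```

Because whole lines are dropped, no empty line is left behind and the pattern does not need (?m) or a trailing newline. The encoding (UTF-8 here) is an assumption and should match the actual file.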