1

Yet another 'negation matching'/'match everything except' issue in JavaScript.

So here's what I want to do:

I have a huge text file and I want to remove everything from the file except the username/password lines. The following is a sample part from the text:

<property name="password">QWERTY</property>
....lots of similar tags......
<property name="username">Hello</property>
<property name="passive">1</property>
<property name="password">Test Password</property>
<property name="scheme">smb</property>
<property name="timeout">10000</property>
<property name="username">RANDOM USER</property>
....lots of similar tags......
<property name="username">Sid</property>

I want to remove each and every line which is not the password or the username.

I tried the following replace function to at least start off with the password but it didn't seem to work:

incomingString = incomingString.replace(/[\W\w]*?(?=<property name="password">[\W\w]*?</property).*?/g,"");

Looking back, I can see there are far too many issues with that regex, so I would like a working regex that removes all the other lines from the sample text above and leaves me with:

<property name="password">QWERTY</property>
<property name="username">Hello</property>
<property name="password">Test Password</property>
<property name="username">RANDOM USER</property>
<property name="username">Sid</property>

PS: It is important that their order in the document should be maintained

I went through a few questions on SO about this unending issue in JavaScript regex (this and lookbehinds), but the answers were very specific to each particular case.

Any help would be appreciated.

Thanks.

7
  • 1
    Parsing XML with regex is almost as bad as parsing HTML with regex. You might look at stackoverflow.com/questions/17604071/parse-xml-using-javascript Commented Sep 26, 2014 at 15:16
  • 1
    Also, rather than "replace everything that doesn't match", might it not be easier to "extract the part that does match". Commented Sep 26, 2014 at 15:19
  • @MattBurland: Thanks for the link. And I was almost sure I'd be told parsing XML with Regex is a terrible idea but I wasn't sure how else I'd have been able to do it. I'll use the link for that but I'd still like to know how JS regex deals with negations Commented Sep 26, 2014 at 15:20
  • @MattBurland: About extracting all that does match, that won't help me retain the order though, correct? I need to make sure if a username is say XYZ, his password must be ABC. That's actually the main reason behind this question. Commented Sep 26, 2014 at 15:22
  • 1
    I'd suggest not using regular expressions (though at least XML is semi-regular), which will come as no kind of surprise, I know; but given that, on first attempt, my non-regex best-effort seems to be atrociously verbose (JS Fiddle demo), honestly, why not? RegEx looks so much nicer in anubhava's answer... Commented Sep 26, 2014 at 15:39

2 Answers

3

You can use this regex for String#match call:

/<property[^>]*name="(username|password)[^>]*>[^<]*<\/property>/gi

RegEx Demo
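
As a rough sketch of how this pattern might be used with String#match (the sample input and the variable names `wanted` and `kept` are illustrative assumptions, not part of the original answer): with the `g` flag, match returns the matches in document order, so joining them keeps each username next to its password. Note that the `/` in `</property>` has to be escaped inside a regex literal.

```javascript
// Sample input based on the question; the "....lots of similar tags......"
// filler lines are left out here.
const incomingString = [
  '<property name="password">QWERTY</property>',
  '<property name="username">Hello</property>',
  '<property name="passive">1</property>',
  '<property name="password">Test Password</property>',
  '<property name="scheme">smb</property>',
  '<property name="timeout">10000</property>',
  '<property name="username">RANDOM USER</property>',
  '<property name="username">Sid</property>'
].join('\n');

// Same pattern as above, with the slash in the closing tag escaped.
const wanted = /<property[^>]*name="(username|password)[^>]*>[^<]*<\/property>/gi;

// match() returns null when nothing matches, so fall back to an empty array.
// Matches come back in the order they appear in the input.
const kept = (incomingString.match(wanted) || []).join('\n');

console.log(kept);
```

This rebuilds the string from the matches rather than deleting the non-matches, which sidesteps the negation problem entirely.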


7 Comments

Thanks again. I remember you helped me with negation matching last time around. I'm sorry I didn't mention that the XML could all be on one line too; I've seen that happen in some files. Secondly, this gives me the matches. Is there any way to get at the non-matched part, for use with the replace function?
Oh ok, is XML always using <property name="password">QWERTY</property> type tags?
Yes, every tag I need to extract will be either that with "password" or that with "username"
@Sid: Those .* probably ought to be non-greedy .*? to solve the problem of it matching the last </property> rather than the next </property> on the line.
@anubhava: You need to fix the case where a line starts with a property the OP doesn't want as well as the case where it is followed by a property that isn't wanted. Example: <property name="passive">1</property><property name="username">Hello</property>
1

Although I still think you are better off using an XML parser here, this should fix the one line problem:

<property[^>]*name="(username|password)".*?</property>

http://regex101.com/r/oM7aD2/1

You match the literal <property followed by any number of characters that aren't a literal > (this prevents a match from starting at the first tag of the line when that tag isn't username or password); the rest is the same as @anubhava's, although I took the liberty of adding the second literal " in case you encounter other properties that are prefixed with username or password, e.g. password_expires.
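
To illustrate the difference the closing quote makes, here is a small sketch on a hypothetical single-line input. The input string and the names `loose` and `strict` are assumptions made up for this example; only password_expires comes from the explanation above.

```javascript
// Hypothetical input with every tag on one line.
const oneLine =
  '<property name="passive">1</property>' +
  '<property name="username">Hello</property>' +
  '<property name="password_expires">30</property>' +
  '<property name="password">Test Password</property>';

// Without the closing quote, any name that merely starts with
// username/password is accepted.
const loose = /<property[^>]*name="(username|password)[^>]*>[^<]*<\/property>/gi;

// With the closing quote, only the exact names are accepted.
const strict = /<property[^>]*name="(username|password)".*?<\/property>/gi;

const looseMatches = oneLine.match(loose) || [];
const strictMatches = oneLine.match(strict) || [];

console.log(looseMatches.length);  // 3 (password_expires is included)
console.log(strictMatches.length); // 2
console.log(strictMatches.join('\n'));
```

Because [^>]* cannot cross a >, a match can never start at the unwanted passive tag and run on into the next tag.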

1 Comment

It was nice of you to consider the _expires case. While this does work, I personally thought replacing everything that doesn't match would be easier in my scenario. Is your suggested alternative as follows: use array = string.match with the said regex, then use the array to generate the string again?
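
For the replace-based route asked about here, one possible sketch (not taken from either answer) is to match either a wanted tag or any single character, and keep only the wanted tags in a replacer callback. The names `keepOrDrop`, `sample`, and `cleaned` are hypothetical, and the pattern assumes tag values never contain a < character.

```javascript
// Hypothetical input: an unwanted tag sharing a line with a wanted one.
const sample =
  '<property name="passive">1</property>' +
  '<property name="username">Sid</property>\n' +
  '<property name="password">ABC</property>';

// Alternative 1 captures a wanted tag; alternative 2 consumes any one
// character (including newlines) that is not part of a wanted tag.
const keepOrDrop =
  /(<property[^>]*name="(?:username|password)"[^>]*>[^<]*<\/property>)|[\s\S]/g;

// When the capture group took part in the match, emit the tag on its own
// line; otherwise drop the character.
const cleaned = sample
  .replace(keepOrDrop, (whole, wantedTag) => (wantedTag ? wantedTag + '\n' : ''))
  .trim();

console.log(cleaned);
```

This visits every character once and preserves document order, but it is usually slower than match plus join on very large inputs, since the callback runs for each discarded character.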
