Regex to match a React component tag by name, for two possible tags

Question

I need a regex to find <Field ...name="document"> or <FieldArray ...name="document"> and replace them with an empty string. The tags can span multiple lines.

This is not HTML or XHTML; it's just a text string containing <Field> and <FieldArray>.

Example with Field:

      <Field
        component={FormField}
        name="document"
        typeInput="selectAutocomplete"
      />

Example with FieldArray:

      <FieldArray
        component={FormField}
        typeInput="selectAutocomplete"
        name="document"
      />

They are inside a list of components. Example:

      <Field
        name="amount"
        component={FormField}
        label={t('form.amount')}
      />
      <Field
        name="datereception"
        component={FormField}
        label={t('form.datereception')}
      />
      <Field
        component={FormField}
        name="document"
        typeInput="selectAutocomplete"
      />
      <Field
        name="datedeferred"
        component={FormField}
        label={t('form.datedeferred')}
      />

I have read some solutions, like finding src in "Extract image src from a string", but their structure is different from what I'm looking for.

This is not HTML or XHTML; it's just a string with 2 properties.
– DDave, Commented Dec 19, 2017 at 12:16

The fourth bird · Accepted Answer · 2017-12-19 17:28:30Z

2

+50

It is not advisable to parse [X]HTML with regex. If you can use a DOM parser, I would advise using that instead of regex.

If there is no other way, you could use this approach to find and replace your data:

<Field(?:Array)?\b(?=[^\/>]+name="document")[^>]+\/>

Explanation

Match <Field with an optional "Array", followed by a word boundary: <Field(?:Array)?\b
A positive lookahead (?= ... )
that asserts that what follows, containing neither / nor >, includes name="document": [^\/>]+name="document"
Match one or more characters that are not >: [^>]+
Match the closing />: \/>

var str = `<Field
    name="amount"
    component={FormField}
    label={t('form.amount')}
  />
  <Field
    name="datereception"
    component={FormField}
    label={t('form.datereception')}
  />
  <Field
    component={FormField}
    name="document"
    typeInput="selectAutocomplete"
  />
  <Field
    name="datedeferred"
    component={FormField}
    label={t('form.datedeferred')}
  />
<FieldArray
    component={FormField}
    typeInput="selectAutocomplete"
    name="document"
  /><FieldArray
    component={FormField}
    typeInput="selectAutocomplete"
    name="document"
  />` ;
str = str.replace(/<Field(?:Array)?\b(?=[^\/>]+name="document")[^>]+\/>/g, "");
console.log(str);
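One side effect of the [^\/>]+ lookahead is worth knowing: because it refuses to cross any "/", a prop that contains a slash and comes before name hides name="document" from the lookahead. A minimal sketch, using the same regex and a hypothetical label value (not from the question) that contains a slash:

```javascript
// The regex from this answer, unchanged.
const re = /<Field(?:Array)?\b(?=[^\/>]+name="document")[^>]+\/>/g;

// Hypothetical input: the second tag has a "/" inside a prop that precedes name.
const source = [
  '<Field component={FormField} name="document" />',
  "<Field label={t('form/document')} name=\"document\" />",
].join("\n");

// The first tag is removed. The second survives: [^\/>]+ stops at the "/"
// in 'form/document' before it can reach name="document".
const result = source.replace(re, "");
console.log(result);
```

If props like that can occur, putting name first or using a quote-aware pattern avoids the problem.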

edited Dec 19, 2017 at 17:28

answered Dec 18, 2017 at 12:08

The fourth bird


7 Comments

DDave Over a year ago

I did not test your code in mine, but I think it's going to work. My code is not XHTML or HTML, just component tags <Tag />.

Adam Katz Over a year ago

Given the generous lookahead here, your optional (?:Array)? doesn't do anything. Maybe you intended to have a \b after it to denote the end of the tag name? Also, your [\s\S]+? (nongreedy expansion) is expensive. Why not use [^>]+ instead? <Field(?:Array)?\b(?=[^\/>]+name="document")[^>]+\/>. You might also be interested in using template literals for multi-line strings to clean up that example. I'm not sure why there's a -1 on this answer; it looks good to me.
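A small sketch of what the \b buys, comparing the pattern with and without it. <FieldDistractor> is a made-up tag name used only to show the difference:

```javascript
const withBoundary = /<Field(?:Array)?\b(?=[^\/>]+name="document")[^>]+\/>/g;
const withoutBoundary = /<Field(?:Array)?(?=[^\/>]+name="document")[^>]+\/>/g;

const source = '<FieldDistractor name="document" />\n<FieldArray name="document" />';

// Without \b, "<Field" also matches the start of "<FieldDistractor",
// so (?:Array)? adds nothing and both tags are removed.
const withoutResult = source.replace(withoutBoundary, "");
// With \b, the tag name must end right after "Field" or "FieldArray".
const withResult = source.replace(withBoundary, "");

console.log(JSON.stringify(withoutResult));
console.log(JSON.stringify(withResult));
```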

Adam Katz Over a year ago

@DDave – It looks like your code is XML, which has the same issue. You're still better off using an actual XML parser. DOM parsers can handle this.

The fourth bird Over a year ago

@AdamKatz Thank you for your comment! I have updated my answer.

user557597 Over a year ago

You may not believe this, but it's not good enough to use [^>]. Your regex matches <Field but = "name="document"/>, which is valid HTML but does not contain the name="document" attribute/value.
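A sketch of the kind of false positive this comment describes, using a hypothetical tag (not from the question) whose real name is "amount" but which carries name="document" inside another attribute's quoted value. It compares The fourth bird's regex with the quote-aware pattern from user557597's answer (g flag added to both):

```javascript
// The fourth bird's regex.
const simple = /<Field(?:Array)?\b(?=[^\/>]+name="document")[^>]+\/>/g;
// user557597's quote-aware regex: quoted values are consumed whole in the lookahead.
const quoteAware = /<Field(?:Array)?(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sname\s*=\s*(?:(['"])\s*document\s*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+\/>/g;

// Hypothetical tag: name="document" appears only inside the hint value.
const source = `<Field hint='name="document"' name="amount" />`;

const simpleResult = source.replace(simple, "");
const quoteAwareResult = source.replace(quoteAware, "");
console.log(JSON.stringify(simpleResult));     // "" (tag wrongly removed)
console.log(JSON.stringify(quoteAwareResult)); // tag kept unchanged
```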


Adam Katz · Accepted Answer · 2017-12-19 21:16:49Z

2

Here's an answer with actual XML parsing and no regular expressions:

var xml = document.createElement("xml");
xml.innerHTML = `
      <Field
        name="amount"
        component={FormField}
        label={t('form.amount')}
      />
      <FieldDistractor
        component={FormField}
        name="document"
        typeInput="selectAutocomplete"
      />
      <Field
        name="datereception"
        component={FormField}
        label={t('form.datereception')}
      />
      <Field
        component={FormField}
        name="document"
        typeInput="selectAutocomplete"
      />
      <Field
        name="datedeferred"
        component={FormField}
        label={t('form.datedeferred')}
      />
      <FieldArray
        component={FormField}
        typeInput="selectAutocomplete"
        name="document"
      /><FieldArray
        component={FormField}
        typeInput="selectAutocomplete"
        name="document"
      />
`;

var match = xml.querySelectorAll(
  `field:not([name="document"]), fieldarray:not([name="document"]),
    :not(field):not(fieldarray)`
);
var answer = "";
for (var m=0, ml=match.length; m<ml; m++) {
  // cloning the node removes children, working around the DOM bug
  answer += match[m].cloneNode().outerHTML + "\n";
}
console.log(answer);

In writing this answer, I found a bug in the DOM parser for both Firefox (Mozilla Core bug 1426224) and Chrome (Chromium bug 796305) that didn't allow creating empty elements via innerHTML. My original answer used regular expressions to pre- and post-process the code to make it work, but using regexes on XML is so unsavory that I later changed it to merely strip off children by using cloneNode() (with its implicit deep=false).

So we dump the XML into a dummy DOM element (which we don't need to place anywhere), then we run querySelectorAll() with a CSS selector that expresses your requirements:

field:not([name="document"]): "Field" elements lacking a name="document" attribute, or
fieldarray:not([name="document"]): "FieldArray" elements lacking that attribute, or
:not(field):not(fieldarray): any other element

edited Dec 19, 2017 at 21:16

answered Dec 19, 2017 at 17:28

Adam Katz


4 Comments

user557597 Over a year ago

This [^>] by itself isn't sufficient to parse html tags.

Adam Katz Over a year ago

I removed the regex code and used a non-regex workaround rather than dealing with ridiculously arcane XML-parsing issues (which are the reason for avoiding regexes in the first place).

user557597 Over a year ago

Yeah, but nobody's talking about parsing XML/XHTML/HTML. The issue is parsing tags or markup. Note that the specs given by the W3C are themselves written using regex. A typical use is a SAX parser. In case you don't think regex can be used, take a look at this, which strips all HTML markup and invisible content from any HTML source: regex101.com/r/4jvwsH/1

Robert Longson Over a year ago

This is not a bug in either Chrome's or Firefox's DOM parser. There are a limited number of empty elements in HTML; HTML is not XML.

score 0 · Accepted Answer · 2017-12-18 21:12:14Z

0

You can parse HTML tags with regex, because parsing the tags themselves is nothing special: they are the first thing parsed, as an atomic operation.

But you can't use regex to go beyond the atomic tag.
For example, you can't find the closing tag that balances an opening tag,
as this would put a tremendous strain on regex capability.

What a DOM parser does is use regex to parse the tags, then use internal
algorithms to create a tree and carry out processing instructions to interpret
and recreate an image.
And of course regex doesn't do that.
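To make that division of labor concrete, here is a minimal sketch (not how any real DOM parser is implemented): a regex that only recognizes individual tags, plus a stack that does the open/close pairing the regex cannot. TAG_RE and buildTree are hypothetical names; the sketch assumes well-formed input, ignores text, comments, and script content, and assumes no ">" inside {...} props.

```javascript
// One opening, closing, or self-closing tag. Quoted values are consumed whole,
// so a ">" inside quotes does not end the tag.
const TAG_RE = /<(\/?)([A-Za-z][\w.]*)((?:[^>"']|"[^"]*"|'[^']*')*?)(\/?)>/g;

// Hypothetical helper: the regex finds tags, a stack pairs them into a tree.
function buildTree(source) {
  const root = { name: "#root", children: [] };
  const stack = [root];
  for (const m of source.matchAll(TAG_RE)) {
    const [, closing, name, , selfClosing] = m;
    const parent = stack[stack.length - 1];
    if (closing) {
      // The pairing check the regex alone cannot do: a closing tag
      // must match the innermost open element.
      if (parent.name !== name) throw new Error(`Unexpected </${name}>`);
      stack.pop();
    } else {
      const node = { name, children: [] };
      parent.children.push(node);
      if (!selfClosing) stack.push(node); // only non-self-closing tags stay open
    }
  }
  if (stack.length !== 1) throw new Error(`Unclosed <${stack[stack.length - 1].name}>`);
  return root;
}

const tree = buildTree(`
  <Form>
    <Field name="amount" />
    <FieldArray name="document"></FieldArray>
  </Form>`);
console.log(JSON.stringify(tree));
```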

Even strictly parsing tags, including invisible content (like script),
is not that easy.
Content can hide or embed text that looks like a tag but that a search
for real tags shouldn't find.
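A small illustration of that point, as a sketch with hypothetical JSX: the only name="document" tag sits inside a JSX comment, yet a pattern that looks at one tag at a time (here The fourth bird's regex) still edits it, because nothing tells it that it is inside a comment.

```javascript
const re = /<Field(?:Array)?\b(?=[^\/>]+name="document")[^>]+\/>/g;

// Hypothetical source: there is no real name="document" element to remove.
const source = [
  '<Field name="amount" />',
  '{/* TODO: restore <Field name="document" /> */}',
].join("\n");

// The comment text is rewritten as if it were markup.
const result = source.replace(re, "");
console.log(result);
```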

So, in essence, you have to parse the entire HTML file to find the real
tag you're looking for.
There is a general regex that can do this, which I will not include here.
But if you need it, let me know.

So, if you want to jump straight into the fire without parsing all the
tags of the entire file, this is the regex to use.

It is essentially a cut-down version of the one that parses all tags.
This flavor finds the tag and any attribute=value pair that you need,
even when the attributes are out of order.
It can also be used to find multiple out-of-order attribute/value pairs within the same tag.
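A sketch of the multiple-attribute case, assuming a hypothetical helper tagWithAttrs (not part of this answer) that emits one quote-aware lookahead per required attribute. It uses an explicit "..."|'...' alternation instead of the \1 backreference, so several lookaheads can be chained without renumbering capture groups:

```javascript
// Escape a literal string so it can be embedded in a RegExp.
const escapeRe = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// One lookahead per required attribute. Quoted strings are consumed whole,
// so the attribute cannot be "found" inside another attribute's value.
function attrLookahead(attr, value) {
  const a = escapeRe(attr);
  const v = escapeRe(value);
  return `(?=(?:[^>"']|"[^"]*"|'[^']*')*?\\s${a}\\s*=\\s*(?:"\\s*${v}\\s*"|'\\s*${v}\\s*'))`;
}

// Hypothetical helper: self-closing tags named in `tags` that carry every
// attribute/value pair in `attrs`, in any order.
function tagWithAttrs(tags, attrs) {
  const names = tags.map(escapeRe).join("|");
  const lookaheads = Object.entries(attrs)
    .map(([attr, value]) => attrLookahead(attr, value))
    .join("");
  // The \s right after the name stops "<Field" from matching "<FieldDistractor".
  return new RegExp(`<(?:${names})${lookaheads}\\s(?:"[^"]*"|'[^']*'|[^>"'])*?\\/>`, "g");
}

const re = tagWithAttrs(["Field", "FieldArray"], {
  typeInput: "selectAutocomplete",
  name: "document",
});
const source = [
  '<Field name="amount" />',
  '<Field\n  name="document"\n  typeInput="selectAutocomplete"\n/>',
  '<FieldArray typeInput="other" name="document" />',
].join("\n");
console.log(source.replace(re, ""));
```

Only the middle tag has both pairs, so it is the only one removed, even though its attributes appear in a different order than in the attrs object.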

Here it is for your use case:

/<Field(?:Array)?(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sname\s*=\s*(?:(['"])\s*document\s*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+\/>/

Explained/Formatted

 < Field                # Field or  FieldArray  tag
 (?: Array )?

 (?=                    # Assertion (a pseudo atomic group)
      (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
      \s name \s* = \s* 
      (?:
           ( ['"] )               # (1), Quote
           \s* document \s*       # With name = "document"
           \1 
      )
 )
 \s+ 
 (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
 />

Running demo: https://regex101.com/r/ieEBj8/1
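The same pattern used from JavaScript, as a sketch. The g flag is an addition here (without it, replace() removes only the first match), and the sample mixes the question's tags with the <FieldDistractor> tag from Adam Katz's example to show that the \s+ after the tag name leaves similarly named tags alone:

```javascript
// The pattern from this answer, with the g flag added.
const re = /<Field(?:Array)?(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sname\s*=\s*(?:(['"])\s*document\s*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+\/>/g;

const source = `
<Field
  name="amount"
  component={FormField}
  label={t('form.amount')}
/>
<FieldDistractor
  component={FormField}
  name="document"
/>
<Field
  component={FormField}
  name="document"
  typeInput="selectAutocomplete"
/>
<FieldArray
  component={FormField}
  typeInput="selectAutocomplete"
  name="document"
/>`;

// The document Field and FieldArray are removed; the "amount" Field and
// <FieldDistractor> are kept.
const result = source.replace(re, "");
console.log(result);
```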

edited Dec 18, 2017 at 21:12

answered Dec 18, 2017 at 20:45

user557597

3 Comments

user557597 Over a year ago

Dave - This is grade A stuff. If I were you, I'd write it down so you don't lose it.

DDave Over a year ago

Thanks sln, I'm going to study your code. My code is not full HTML; it's just a string containing Field and FieldArray. I did not understand what you mean by 'write it down'.

user557597 Over a year ago

@DDave - If it were just a string containing Field and FieldArray then you can't tell where they begin and end compared to something else without using delimiter parsing rules. Especially when you're looking for a specific attribute / value (or ah, sub-expression I mean). Don't think you're fooling anybody. What I mean by write it down is, this regex form is a gold standard I developed years ago and has been used for big scraping projects. I disseminate it freely, but I don't often fully explain it (by design). This is custom for you, different for someone else, etc..
