1

I'm trying to match the domain example.com and delete it together with all the IPs beneath it.

Input:

[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
[example.net]
10.100.251.22
10.100.251.33

Desired output:

[example.net]
10.100.251.22
10.100.251.33

Here is what I have tried so far:

\[example.com\](\s+^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$)*

It works, but I'm not sure if that's efficient.

I'm doing my regex testing with Rubular; here is a sample:

http://rubular.com/r/cavVHWPvT2

3 Comments

  • This doesn't seem like a job for a regex. What do you mean by "delete"? Commented Oct 30, 2016 at 6:40
  • Well, I would like to target these entries for deletion. Commented Oct 30, 2016 at 6:41
  • Why don't you put the second part into an array, then loop over it and check whether each entry is contained in the first part? If it matches, delete it. Commented Oct 30, 2016 at 6:50

5 Answers

1

I wouldn't bother with a complex regex; I'd do it using Ruby's slice_before:

data = '[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
[example.net]
10.100.251.22
10.100.251.33
'

data.lines.slice_before(/\A\[/).select { |ary| ary.first[/example\.net/] }.join
# => "[example.net]\n10.100.251.22\n10.100.251.33\n"

Breaking it down:

data
  .lines # => ["[example.com]\n", "10.100.251.1\n", "10.100.251.2\n", "10.100.251.3\n", "[example.net]\n", "10.100.251.22\n", "10.100.251.33\n"]
  .slice_before(/\A\[/) # => #<Enumerator: #<Enumerator::Generator:0x007f987b8b4528>:each>
  .select { |ary| ary.first[/example\.net/] } # => [["[example.net]\n", "10.100.251.22\n", "10.100.251.33\n"]]
  .join # => "[example.net]\n10.100.251.22\n10.100.251.33\n"

Regular expressions are great, and I use them when necessary, but they're not always the best tool for a task. They can be fragile and treacherous, and they greatly increase the burden of maintaining code, especially as they get more complex.
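
If the goal is to delete the example.com block no matter what the other sections are called, the same slice_before idea can be inverted with reject. This is only a minimal sketch: the variable names are mine, and it assumes every section header starts at the beginning of a line with "[".

```ruby
data = <<~DATA
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
DATA

# Split the lines into blocks, starting a new block at every "[...]" header,
# then drop the block whose header is exactly "[example.com]".
kept = data.lines
           .slice_before { |line| line.start_with?('[') }
           .reject { |block| block.first.strip == '[example.com]' }
           .join

puts kept
```

Comparing the stripped header with a plain string avoids a regex entirely, which fits the point above about keeping patterns out of code when a simpler check will do.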

This could also be accomplished using a flip-flop, but explaining that is left to a different question: "What is a flip-flop operator?".

0

Try this:

Find:

\[example\.com\].*?(\[(?:(?!example\.com).)*?\])

Replace:

$1

Regex101
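
For anyone working in Ruby rather than an editor, the find/replace pair above might be translated roughly as follows. This is a sketch under two assumptions of mine: the /m flag is used so that "." also matches newlines, and the editor's $1 becomes '\1' in the replacement string.

```ruby
str = <<~DATA
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
DATA

# Match from "[example.com]" up to and including the next "[...]" header,
# capturing that header so it can be put back by the replacement.
find = /\[example\.com\].*?(\[(?:(?!example\.com).)*?\])/m

result = str.sub(find, '\1')
puts result
```

Because the pattern needs a following header to match, it leaves the string unchanged when the example.com block is the last one in the input.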

3 Comments

First of all, update your question with the tool you are using. My regex would work in a tool such as Notepad++, but perhaps not yours.
.* means match any character, zero or more times. .*? also means match any character zero or more times, but as a non-greedy (lazy) match.
Explore this regex using the link provided.
0

We are given

str =<<-END
[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
[example.net]
10.100.251.22
10.100.251.33
END
  #=> "[example.com]\n10.100.251.1\n10.100.251.2\n10.100.251.3\n[example.net]\n10.100..."

The question is a bit confusing in that the desired output is said to be

[example.net]
10.100.251.22
10.100.251.33

but that is also what is to be deleted. What follows returns the lines that are not deleted, but it would be a simple matter to change it to return the deleted bits. Also, the question doesn't make clear if the string "[example.net]" is known or if it's just an example of what might follow the "[example.com]" "block". Nor is it clear if there are exactly two "blocks", as in the example, or there could be one or more than two blocks.

If you know "[example.net]" immediately follows the "[example.com]" block, you could write

r = /
    \[example\.com\]     # match string
    .*?                  # match any number of characters, lazily
    (?=\[example\.net\]) # match string in positive lookahead
    /mx                  # multiline and free-spacing modes

puts str[r]
[example.com]
10.100.251.1
10.100.251.2
10.100.251.3

If you don't know what follows the "[example.com]" "block", except that the first line of the following block, if there is one, contains at least one character other than a digit or period, you could write

r = /
    \[example\.com\]\n  # match string
    .*?                 # match any number of any characters, lazily
    (?:[\d.]*\n)        # match a string containing >= 0 digits and periods,
                        # followed by a newline, in a non-capture group
    +                   # match the above non-capture group > 0 times
    /x                  # free-spacing mode

puts str[r]
[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
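
Whichever part is wanted, both can be obtained from the same pattern: str[r] returns the matched "[example.com]" block, and str.sub(r, '') returns everything else. The sketch below uses a compact variant of the second regex that I wrote for illustration (it requires at least one digit or period per line); the names matched and rest are also mine.

```ruby
str = <<~END
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
END

# Header line followed by one or more lines made up only of digits and periods.
r = /\[example\.com\]\n(?:[\d.]+\n)+/

matched = str[r]         # the "[example.com]" block
rest    = str.sub(r, '') # the string with that block removed

puts rest
```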

6 Comments

Nice regex...looks like mine ;-)
@TimBiegeleisen, there certainly are similarities, but differences too, as I'm returning the keepers and you're returning the removals.
Thanks @CarySwoveland and Tim. I'm having trouble running the example in my terminal. Do you think you could help me with a sample on rubular.com? Thanks
Sure, Dean, but for me it will have to wait until morning.
@CarySwoveland No, I am returning the keepers. Try it in Notepad++ and you will see.
0

Your regex is very close. What you are missing is a bit of grouping and a linebreak construct in the right place:

/^\[example\.com\]\R*(?:(?:\d{1,3}\.){3}\d{1,3}\R*)*/

See the Rubular demo

Details:

  • ^ - start of line
  • \[example\.com\] - [example.com] literal substring
  • \R* - zero or more linebreaks (for older Ruby versions, use (?:\r?\n|\r)*)
  • (?:(?:\d{1,3}\.){3}\d{1,3}\R*)* - zero or more sequences of
    • (?:\d{1,3}\.){3} - 3 sequences of 1 to 3 digits and a dot
    • \d{1,3} - 1 to 3 digits
    • \R* - 0+ linebreaks

And a Ruby demo:

str =<<DATA
[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
[example.net]
10.100.251.22
10.100.251.33
DATA
rx = /^\[example\.com\]\R*(?:(?:\d{1,3}\.){3}\d{1,3}\R*)*/
puts str[rx]
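
The demo above prints the matched block. To actually delete it, the same pattern could be passed to String#sub with an empty replacement. A minimal sketch, assuming a Ruby version that supports \R (the variable name remaining is mine):

```ruby
str = <<~DATA
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
DATA

rx = /^\[example\.com\]\R*(?:(?:\d{1,3}\.){3}\d{1,3}\R*)*/

# Remove the first "[example.com]" block, including its trailing linebreaks.
remaining = str.sub(rx, '')
puts remaining
```

Use gsub instead of sub if the header may appear more than once.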

2 Comments

We ended up with almost the same regular expressions, but I still think \s* is better than \R*. Either one insists on the explicit, precise format, in which case there should be no * matchers, or one allows spaces after the IPs :)
\s matches horizontal whitespace, so [example.com]78.78.89.67556.87.87.87 can also be matched. I understand they must be on the subsequent lines.
0

Treat Your Data Like an INI File: Scan for Sections

One way to deal with your data is to treat it like an INI file. A regex with the multi-line option enabled can break a string representation of your INI file into an array of sections as follows:

ini = <<~'EOF'
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
EOF

# Scan for INI section headers.
sections = ini.scan(/^\[.*?\]$[^\[]*/m)

You can then extract just the sections you want using Enumerable#grep. For example, to extract the example.net section:

section_title = 'example.net'
sections.grep(/\A\[#{Regexp.escape(section_title)}\]\s*$/)
#=> ["[example.net]\n10.100.251.22\n10.100.251.33\n"]
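
To delete a section instead of extracting one, the same section array can be filtered with Enumerable#grep_v (available since Ruby 2.3) and joined back into a string. A minimal sketch; the names unwanted and kept are illustrative only.

```ruby
ini = <<~'EOF'
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
EOF

# Break the string into sections, one per "[...]" header.
sections = ini.scan(/^\[.*?\]$[^\[]*/m)

# Keep every section whose header is NOT the unwanted one.
unwanted = 'example.com'
kept = sections.grep_v(/\A\[#{Regexp.escape(unwanted)}\]\s*$/).join

puts kept
```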

Caveats

  1. The multi-line regex above assumes you have the entire file loaded as a single String object. If you're doing something else, you may need a different approach.
  2. Note the importance of Regexp.escape, which ensures that your string is properly converted for use in a regex pattern. Otherwise, characters like [, ., and ] would not match as you might expect.
  3. INI files can be more complex than your sample data. You might consider writing a real INI parser, or using a gem like inifile, rather than trying to handle all the possible edge cases in one regular expression.
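
As an illustration of the first caveat, here is one possible line-by-line approach that does not need the whole file in a single String. It is a sketch, not a full INI parser: drop_section is a hypothetical helper name, and it assumes section headers start in the first column with "[".

```ruby
require 'stringio'

# Return the contents of io with the named section (header and body) removed.
def drop_section(io, name)
  skipping = false
  io.each_line.reject do |line|
    # Re-evaluate the flag only when a new section header is reached.
    skipping = (line.strip == "[#{name}]") if line.start_with?('[')
    skipping
  end.join
end

# StringIO stands in for a File object here.
input = StringIO.new("[example.com]\n10.100.251.1\n[example.net]\n10.100.251.22\n")
output = drop_section(input, 'example.com')
puts output
```

The same method should work with File.open('some_path') in place of the StringIO, since both respond to each_line.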
