I'm trying to sort values from a file. I'm stuck on finding the first attribute and am not sure why. I'm new to regex and Ruby, so I'm not sure how to approach the problem. I'm trying to find the values of a, b, c, d, and e, all of which are positive numbers.

Here's what the line will look like

length=<a> begin=(<b>,<c>) end=(<d>,<e>)

Here's what I'm using to find the values

current_line = file.gets
if current_line == nil then return end
while current_line = file.gets do
   if line =~ /length=<(\d+)> begin=((\d+),(\d+)) end=((\d+),(\d+))/
       length, begin_x, begin_y, end_x, end_y = $1, $2, $3, $4, $5
       puts("length:" + length.to_s + " begin:" + begin_x.to_s + "," + begin_y.to_s + " end:" + end_x.to_s + "," + end_y.to_s)
   end
end

For some reason it never prints anything, so I assume it never finds a match.

Sample input: length=4 begin=(0,0) end=(3,0)


A line with 0-4 decimals, separated by commas, after 2 integers. So it could be any of these:

2 4 1.3434324,3.543243,4.525324   
1 2     
18 3.3213,9.3233,1.12231,2.5435    
7 9 2.2,1.899990    
0 3 2.323    
  • @Jake Senior, I posted an answer that does what you want. Commented Feb 20, 2015 at 2:23

2 Answers

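Here is your regex, tried against the sample input from the question:

r = /length=<(\d+)> begin=((\d+),(\d+)) end=((\d+),(\d+))/
"length=4 begin=(0,0) end=(3,0)".scan(r)
  #=> []

First, we need to escape the parentheses that are meant literally; unescaped, they are treated as capture groups: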

r = /length=<(\d+)> begin=\((\d+),(\d+)\) end=\((\d+),(\d+)\)/
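To see why the escaping matters, here is a small sketch that compares an unescaped and an escaped pattern on just the "begin" part of the line. The method names are made up for this illustration.

```ruby
# Hypothetical helper names, used only for this illustration.

# Unescaped: every (...) is a capture group, so no literal "(" is expected.
def unescaped_pattern
  /begin=((\d+),(\d+))/
end

# Escaped: \( and \) match literal parentheses around the two captures.
def escaped_pattern
  /begin=\((\d+),(\d+)\)/
end

puts unescaped_pattern.match?("begin=(0,0)")     # false: "(" is not a digit
puts unescaped_pattern.match?("begin=0,0")       # true
puts escaped_pattern.match?("begin=(0,0)")       # true
p escaped_pattern.match("begin=(0,0)").captures  # ["0", "0"]
```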

Next, add the missing < and > after "begin" and "end".

r = /length=<(\d+)> begin=\(<(\d+)>,<(\d+)>\) end=\(<(\d+)>,<(\d+)>\)/

Now let's try it:

str = "length=<4779> begin=(<21>,<47>) end=(<356>,<17>)" 

str.scan(r)
  #=> [["4779", "21", "47", "356", "17"]]

Success!

Lastly (though probably not necessary), we might replace the single spaces with \s+, which permits one or more whitespace characters:

r = /length=<(\d+)>\s+begin=\(<(\d+)>,<(\d+)>\)\s+end=\(<(\d+)>,<(\d+)>\)/
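For completeness, here is a rough sketch of the loop from the question with the same escaping applied. It assumes the lines contain no literal angle brackets, as in the question's sample input (length=4 begin=(0,0) end=(3,0)); if yours do, keep the < and > from the regex above. It also avoids two other likely problems in the original loop: the extra file.gets before the while discards the first line, and the condition tests line although the variable is called current_line. The helper names segment_regex and parse_segments are made up, and StringIO stands in for the real file.

```ruby
require 'stringio'

# Matches lines such as "length=4 begin=(0,0) end=(3,0)".
# Literal parentheses are escaped; the other (...) are capture groups.
def segment_regex
  /length=(\d+)\s+begin=\((\d+),(\d+)\)\s+end=\((\d+),(\d+)\)/
end

# Hypothetical helper: reads every line of io and returns one array of
# five integers for each line that matches; other lines are skipped.
def parse_segments(io)
  results = []
  io.each_line do |line|
    m = segment_regex.match(line)
    results << m.captures.map(&:to_i) if m
  end
  results
end

# StringIO is used here in place of File.open so the sketch is self-contained.
input = StringIO.new("length=4 begin=(0,0) end=(3,0)\nnot a match\n")
parse_segments(input).each do |length, begin_x, begin_y, end_x, end_y|
  puts "length:#{length} begin:#{begin_x},#{begin_y} end:#{end_x},#{end_y}"
end
```

With a real file, something like File.open(path) { |f| parse_segments(f) } should work the same way, since each_line is available on any IO.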

Addendum

The OP has asked how this would be modified if some of the numeric values were floats. I do not understand precisely what has been requested, but the following could be modified as required. I've assumed all the numbers are non-negative. I've also illustrated one way to "build" a regex from strings, using Regexp.new.

  s1 = '<(\d+(?:\.\d+)?)>' # note single quotes
    #=> "<(\\d+(?:\\.\\d+)?)>" 
  s2 = "=\\(#{s1},#{s1}\\)"
    #=> "=\\(<(\\d+(?:\\.\\d+)?)>,<(\\d+(?:\\.\\d+)?)>\\)" 
  r = Regexp.new("length=#{s1} begin#{s2} end#{s2}")
    #=> /length=<(\d+(?:\.\d+)?)> begin=\(<(\d+(?:\.\d+)?)>,<(\d+(?:\.\d+)?)>\) end=\(<(\d+(?:\.\d+)?)>,<(\d+(?:\.\d+)?)>\)/ 

  str = "length=<47.79> begin=(<21>,<4.7>) end=(<0.356>,<17.999>)" 

  str.scan(r)
    #=> [["47.79", "21", "4.7", "0.356", "17.999"]] 
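Note that scan returns the captures as strings. If numeric values are needed, one possible follow-up is sketched below; it rebuilds the same regex inside a method and converts each capture with Float. The names float_line_regex and parse_floats are made up for this example.

```ruby
# Builds the same regex as above from string pieces.
def float_line_regex
  num  = '<(\d+(?:\.\d+)?)>'        # single quotes keep the backslashes as-is
  pair = "=\\(#{num},#{num}\\)"
  Regexp.new("length=#{num} begin#{pair} end#{pair}")
end

# Hypothetical helper: returns the five values as Floats,
# or nil when the line does not match.
def parse_floats(str)
  m = float_line_regex.match(str)
  return nil unless m
  m.captures.map { |s| Float(s) }
end

p parse_floats("length=<47.79> begin=(<21>,<4.7>) end=(<0.356>,<17.999>)")
  #=> [47.79, 21.0, 4.7, 0.356, 17.999]
```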
3 Comments

What's the \b for? And do you need \( and \) whenever you use parentheses in a regex? Any explanation would be great.
\b is a (zero-width) word boundary. It prevents a match on, say, "oddlength". (Probably not necessary.) Some characters, including parentheses, mean something in regexes ((..) is a capture group, (?:...) is a non-capture group, etc.), so they must be escaped to tell the parser that you are just referring to that character. It's a little complicated, though, as most of those characters don't need to be escaped when inside a character class (e.g., [()abc]).
Sorry, I found my solution to that. However, I just added an edit to my OP. Is there any way you could explain how I could match that format? I tried using (\w), but the commas throw it off.

Sample input:

length=4 begin=(0,0) end=(3,0)

data.txt:

length=3 begin=(0,0) end=(3,0)
length=4 begin=(0,1) end=(0,5)
length=2 begin=(1,3) end=(1,5)

Try this:

require 'pp'

Line = Struct.new(
  :length, 
  :begin_x,
  :begin_y,
  :end_x,
  :end_y,
)

lines = []

IO.foreach('data.txt') do |line|
  numbers = []

  line.scan(/\d+/) do |match|
    numbers << match.to_i
  end

  lines << Line.new(*numbers)
end

pp lines

puts lines[-1].begin_x

--output:--
[#<struct Line length=3, begin_x=0, begin_y=0, end_x=3, end_y=0>,
 #<struct Line length=4, begin_x=0, begin_y=1, end_x=0, end_y=5>,
 #<struct Line length=2, begin_x=1, begin_y=3, end_x=1, end_y=5>]
1

With this data.txt:

2 4 1.3434324,3.543243,4.525324   
1 2     
18 3.3213,9.3233,1.12231,2.5435    
7 9 2.2,1.899990    
0 3 2.323    

Try this:

require 'pp'

data = []

IO.foreach('data.txt') do |line|
  pieces = line.split
  csv_numbers = pieces[-1]

  next unless csv_numbers && csv_numbers.include?('.') # skip blank lines and lines with no floats

  floats = csv_numbers.split(',')
  data << floats.map(&:to_f)
end

pp data

--output:--
[[1.3434324, 3.543243, 4.525324],
 [3.3213, 9.3233, 1.12231, 2.5435],
 [2.2, 1.89999],
 [2.323]]
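
If the leading integers are needed as well, the same split-based idea can be extended. This is only a sketch under the assumption that fields are separated by whitespace and that the decimals, when present, form the last field; parse_record is a made-up name.

```ruby
# Hypothetical helper: splits one line into its leading integers and its
# trailing comma-separated floats. Returns nil for a blank line.
def parse_record(line)
  fields = line.split
  return nil if fields.empty?

  floats = []
  # Assumption: only the last field can hold decimals, and it contains a ".".
  if fields.last.include?('.')
    floats = fields.pop.split(',').map { |s| Float(s) }
  end

  { ints: fields.map { |s| Integer(s, 10) }, floats: floats }
end

p parse_record("2 4 1.3434324,3.543243,4.525324   ")
  #=> {:ints=>[2, 4], :floats=>[1.3434324, 3.543243, 4.525324]}
p parse_record("1 2     ")
  #=> {:ints=>[1, 2], :floats=>[]}
```

Integer and Float raise ArgumentError on malformed fields, so bad lines fail loudly instead of silently turning into 0.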
