Two alternatives:
s = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
s.split(/\s*\n\s*/).map{ |p| p.scan(/[^|\[\]]+/).map(&:strip) }
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]
irb> s.split(/\s*\n\s*/).map do |line|
line.sub(/^\s*\[\s*/,'').sub(/\s*\]\s*$/,'').split(/\s*\|\s*/)
end
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]
Both of them start by splitting on newlines (throwing away surrounding whitespace).
The first one then splits each chunk by looking for anything that is not a [, |, or ] and then throws away extra whitespace (calling strip on each).
The second one then throws away leading [ and trailing ] (with whitespace) and then splits on | (with whitespace).
You cannot get the final result you want with a single scan. About the closest you can get is this:
s.scan /\[(?:([^|\]]+)\|)*([^|\]]+)\]/
#=> [["test", " blah"], ["foo ", "bar bar bar"], ["123 ", " 456 789"]]
…which drops information, or this:
s.scan /\[((?:[^|\]]+\|)*[^|\]]+)\]/
#=> [["test| blah"], ["foo |bar bar bar"], ["test| abc |123 | 456 789"]]
…which captures the contents of each "array" as a single capture, or this:
s.scan /\[(?:([^|\]]+)\|)?(?:([^|\]]+)\|)?(?:([^|\]]+)\|)?([^|\]]+)\]/
#=> [["test", nil, nil, " blah"], ["foo ", nil, nil, "bar bar bar"], ["test", " abc ", "123 ", " 456 789"]]
…which is hardcoded to a maximum of four items, and inserts nil entries that you would need to .compact away.
There is no way to use Ruby's scan to take a regex like /(?:(aaa)b)+/ and get multiple captures for each time the repetition is matched.