I am trying to use a regex to validate a string. It should allow whitespace between a string and a boolean operator, as in (@string1 OR), but disallow whitespace inside a string, as in (string 1). The allowed boolean forms are:

(A AND B) AND (NOT C)
(A OR B) AND (NOT C)
(A AND B)
(A OR B)
(NOT C)

Examples of possible valid and invalid inputs are below.

Valid:

(@string1 OR @string2) AND ( NOT @string3)
(@string-1 AND @string.2) AND ( NOT @string_3)
(@string1 OR @string2 OR @string4) AND ( NOT @string3 AND NOT @string5)
(@string1    OR   @string2   OR    @string4)
(@string1 AND @string2 AND @string4)
( NOT @string1 AND NOT @string2 AND NOT @string4)
( NOT @string1 AND NOT @string2)

Invalid:

()
(string  1 OR @str ing2) AND ( NOT @tag3)
(@string 1 OR @tag 2) AND ( NOT @string 3)
(@string1  @string2) ( NOT @string3)
(@string1 OR @string12) AND (@string3)
(@string1 AND NOT @string2)

Is it better to parse the string first and then have multiple regexes check for the absence of whitespace, or can a single regex be written to check the entire string?

  • These queries can have nested (...), correct? Commented Apr 17, 2017 at 12:29
  • Also, what stops the user from writing NOT NOT, where the second NOT is a string? I guess what I am asking is, how do you tell what is and is not a string? Commented Apr 17, 2017 at 13:02
  • @WiktorStribiżew Yes that is correct. Commented Apr 17, 2017 at 13:02
  • @grail The user might be able to do a POST using NOT NOT, but the validation should catch that. Not sure if that answers your question. Commented Apr 17, 2017 at 13:04
  • Not really. How will the validation know the difference between NOT NOT as qualifier and string as opposed to NOT NOT string where 2 qualifiers have been used to turn the resulting value into a truth. So you have not answered the second part of my previous question, which is, how do you tell what is and is not a string? Commented Apr 17, 2017 at 13:19

2 Answers

This kind of sophisticated validation would be best solved with a grammar parser.

Just to get you started, here is an (incomplete) solution in parslet. As you can see, you build up from primitives and construct more and more complicated structures.

require 'parslet'

class Boolean < Parslet::Parser
  rule(:space)  { match[" "].repeat(1) }
  rule(:space?) { space.maybe }

  rule(:lparen) { str("(") >> space? }
  rule(:rparen) { str(")") >> space? }

  rule(:and_operator) { str("AND") >> space? }
  rule(:or_operator) { str("OR") >> space? }
  rule(:not_operator) { str("NOT") >> space? }

  rule(:token) { str("@") >> match["a-z0-9"].repeat >> space? }

  # The primary rule deals with parentheses.
  rule(:primary) { lparen >> expression >> rparen | token }

  rule(:and_expression) { primary >> and_operator >> primary }
  rule(:or_expression) { primary >> or_operator >> primary }
  rule(:not_expression) { not_operator >> primary }

  rule(:expression) { or_expression | and_expression | not_expression | primary }

  root(:expression)
end

You can test a string with this little helper method:

def parse(str)
  exp = Boolean.new
  exp.parse(str)
  puts "Valid!"
rescue Parslet::ParseFailed => failure
  puts failure.parse_failure_cause.ascii_tree
end

parse("@string AND (@string2 OR @string3)")
#=> Valid!
parse("(string1 AND @string2)")
#=> Expected one of [OR_EXPRESSION, AND_EXPRESSION, NOT_EXPRESSION, PRIMARY] at line 1 char 1.
#   ...
#   - Failed to match sequence ('@' [a-z0-9]{0, } SPACE?) at line 1 char 2.
#      - Expected "@", but got "s" at line 1 char 2.

Parsing this properly requires recursion, or a loop with a stack. Validating it with a regex alone would be very difficult, if not impossible.
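
As a rough illustration of the recursive approach, here is a minimal sketch of a hand-written recursive-descent validator that uses only Ruby's standard library (StringScanner). The class name QueryValidator and the grammar in the comments are assumptions inferred from the question's examples, not something the asker specified. In particular, it assumes a parenthesized group is either a list of two or more positive operands or a list of NOT operands joined by AND, that AND and OR may be mixed freely inside a positive list, and that groups may nest.

```ruby
require 'strscan'

# Hypothetical validator (not from the question or the answers).
# Assumed grammar, inferred from the valid/invalid examples:
#   query    := group ( "AND" group )*
#   group    := "(" ( negated | positive ) ")"
#   negated  := "NOT" operand ( "AND" "NOT" operand )*
#   positive := operand ( ("AND" | "OR") operand )+
#   operand  := token | group
#   token    := "@" [A-Za-z0-9_.-]+
class QueryValidator
  TOKEN = /@[A-Za-z0-9_.-]+/

  def self.valid?(input)
    new(input).valid?
  end

  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  def valid?
    @scanner.skip(/\s*/)
    return false unless group
    while eat(/AND\b/)
      return false unless group
    end
    @scanner.eos? # reject trailing garbage
  end

  private

  # Match pattern at the current position, then skip trailing whitespace.
  # Returns the matched text, or nil without consuming anything.
  def eat(pattern)
    matched = @scanner.scan(pattern)
    @scanner.skip(/\s*/) if matched
    matched
  end

  def group
    return false unless eat(/\(/)
    body_ok = eat(/NOT\b/) ? negated_list : positive_list
    body_ok && !eat(/\)/).nil?
  end

  # The first NOT has already been consumed by the caller.
  def negated_list
    return false unless operand
    while eat(/AND\b/)
      return false unless eat(/NOT\b/) && operand
    end
    true
  end

  # Requires at least two operands, so "(@a)" is rejected.
  def positive_list
    return false unless operand
    count = 1
    while eat(/(?:AND|OR)\b/)
      return false unless operand
      count += 1
    end
    count >= 2
  end

  # Recursion happens here: an operand may itself be a group.
  def operand
    eat(TOKEN) ? true : group
  end
end

puts QueryValidator.valid?("(@string1 OR @string2) AND ( NOT @string3)") # true
puts QueryValidator.valid?("(@string 1 OR @tag 2) AND ( NOT @string 3)") # false
puts QueryValidator.valid?("(@string1 AND NOT @string2)")                # false
```

No backtracking is needed, because every failure propagates straight up to a false result. If the real rules differ (for example, if AND and OR must not be mixed in one group), the corresponding method is the place to tighten the check.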
