I want to extract the part of a string that did not match a pattern.

My matching condition is that the string should have length 5 and contain only N or Y.

Ex:

NYYYY => valid

NY    => invalid, length is invalid

NYYSY => invalid, character at position 3 (zero-based) is invalid

If the string is invalid, I want to find out which particular character did not match. Ex: in NYYSY the 4th character did not match.

I tried pattern matching in Scala:

val Pattern = "([NY]{5})".r
paramList match {
  case Pattern(c) => true
  case _          => false
}
    What about something like "NYCYNX"? That's invalid for 3 different reasons. Do you need to report all 3 or just whatever test fails first? Commented Mar 26, 2019 at 5:29

5 Answers

3

Returns a String indicating validation status.

def validate(str :String, len :Int, cs :Seq[Char]) :String = {
  val checkC = cs.toSet
  val errs = str.zipAll(Range(0,len), 1.toChar, -1).flatMap{ case (c,x) =>
               if      (x < 0)     Some("too long")
               else if (checkC(c)) None
               else if (c == 1)    Some("too short")
               else                Some(s"'$c' at index $x")
             }
  str + ": " + (if (errs.isEmpty) "valid" else errs.distinct.mkString(", "))
}

testing:

validate("NTYYYNN", 4, "NY")  //res0: String = NTYYYNN: 'T' at index 1, too long
validate("NYC",     7, "NY")  //res1: String = NYC: 'C' at index 2, too short
validate("YNYNY",   5, "NY")  //res2: String = YNYNY: valid

1

Here's one approach that returns a list of (Char, Int) tuples of invalid characters and their corresponding positions in a given string:

def checkString(validChars: List[Char], validLength: Int, s: String) = {
  val Pattern = s"([${validChars.mkString}]{$validLength})".r

  s match {
    case Pattern(_) => Vector.empty[(Char, Int)]
    case s =>
      val invalidList = s.zipWithIndex.filter{case (c, _) => !validChars.contains(c)}
      if (invalidList.nonEmpty) invalidList else Vector(('\u0000', -1))
  }
}

List("NYYYY", "NY", "NNSYYTN").map(checkString(List('N', 'Y'), 5, _))
// res1: List(Vector(), Vector((?,-1)), Vector((S,2), (T,5)))

As shown above, an empty Vector represents a valid string, and a Vector containing only (null-char, -1) means the string has valid characters but an invalid length.

0

Here is one suggestion which might suit your needs:

"NYYSY".split("(?<=[^NY])|(?=[^NY])").foreach(println) 

NYY
S
Y

This solution splits the input string at every point where either the preceding or the following character is not Y or N. Each run of valid or invalid characters then appears on its own line of the output.

3 Comments

Thank you. I would like to know which particular character at which position is invalid. Ex: character at position 3 is invalid.
Then honestly you should just iterate over the string and check at each position whether it is N, Y, or some other invalid value. Regex can only do so much here.
The position of the invalid character can be determined by calling length on the prior group. Because length counts from 1, it equals the zero-based index of the character that follows, so the index offset is built in.
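
The idea in the last comment can be sketched without splitting at all: search for the first character outside the allowed set, and the length of the valid prefix before it is that character's zero-based index. This is only a minimal sketch; `firstInvalid` is a hypothetical helper name that does not appear in any of the answers, and it assumes the allowed characters are exactly N and Y.

```scala
// Hypothetical helper (not from any answer): finds the first character
// that is not N or Y.
// Returns the character and its zero-based index, or None when every
// character is allowed. It does not check the length of the string.
def firstInvalid(s: String): Option[(Char, Int)] = {
  val Invalid = "[^NY]".r
  Invalid.findFirstMatchIn(s).map(m => (s.charAt(m.start), m.start))
}

// firstInvalid("NYYSY") gives Some((S,3)): the 4th character is invalid
// firstInvalid("NYYYY") gives None
```

A length check would still have to be done separately, for example by comparing `s.length` with 5 before or after calling the helper.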
0

You can use additional regular expressions to detect the specific issue:

val Pattern = "([NY]{5})".r
val TooLong = "([NY]{5})(.+)".r
val WrongChar = "([NY]*)([^NY].*)".r

paramList match {
  case Pattern(c) => // Good
  case TooLong(head, rest) => // Extra character(s) in sequence
  case WrongChar(head, rest) => // Wrong character in sequence
  case _ => // Too short
}

In the WrongChar case, head.length gives the zero-based index of the error and rest.head is the failing character.
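
As a rough sketch of how the branches could be filled in, the same three patterns can be wrapped in a function that returns a description. The name `describe` and the message wording are assumptions made for illustration, not part of the answer.

```scala
val Pattern   = "([NY]{5})".r
val TooLong   = "([NY]{5})(.+)".r
val WrongChar = "([NY]*)([^NY].*)".r

// Hypothetical wrapper: returns a human-readable validation result.
// Note that a string with five valid characters followed by anything
// (even an invalid character) is reported as too long, because TooLong
// is tried before WrongChar.
def describe(s: String): String = s match {
  case Pattern(_)            => "valid"
  case TooLong(_, rest)      => s"too long by ${rest.length} character(s)"
  case WrongChar(head, rest) => s"invalid character '${rest.head}' at index ${head.length}"
  case _                     => "too short"
}

// describe("NYYSY") gives "invalid character 'S' at index 3"
// describe("NY")    gives "too short"
```

Only the first invalid character is reported, since `([NY]*)` stops at the first character that is neither N nor Y.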

0

You can achieve this by pattern matching on each character of the string, without using any regex or complex string manipulation.

def check(value: String): Unit = {
  if(value.length!=5) println(s"$value length is invalid.")
  else value.foldLeft((0, Seq[String]())){
    case (r, char) =>
      char match {
        case 'Y' | 'N' => r._1+1 -> r._2
        case c @ _ => r._1+1 -> {r._2 ++ List(s"Invalid character `$c` in position ${r._1}")}
      }
  }._2 match {
    case Nil => println(s"$value is valid.")
    case errors: List[String] => println(s"$value is invalid - [${errors.mkString(", ")}]")
  }
}

check("NYCNBNY")
NYCNBNY length is invalid.

check("NYCNB")
NYCNB is invalid - [Invalid character `C` in position 2, Invalid character `B` in position 4]

check("NYNNY")
NYNNY is valid.
