3

Problem Statement

Given a string s matching the regular expression [A-Za-z !,?._'@]+, split the string into tokens. A token is defined as one or more consecutive English alphabetic letters. Then, print the number of tokens, followed by each token on a new line.

Input Format

A single string, s. s is composed of English alphabetic letters, blank spaces, and any of the following characters: !,?._'@

Output Format

On the first line, print an integer, n, denoting the number of tokens in string s (they do not need to be unique). Next, print each of the n tokens on a new line in the same order as they appear in the input string s.

Sample Input

He is a very very good boy, isn't he?

Sample Output

10

He

is

a

very

very

good

boy

isn

t

he

My Code:

import java.io.*;
import java.util.*;
import java.util.regex.*; 
public class Solution {

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String s = scan.nextLine();
        scan.close();
        String[] splitString = s.replaceAll("^[\\W+\\s+]", "").split("[\\s!,?._'@]+");
        System.out.println(splitString.length);
        for (String string : splitString) {
            System.out.println(string);
        }
    }
}

This code works fine for the Sample Input but does not pass the following test case.

Test case:

Input:

       YES      leading spaces        are valid,    problemsetters are         evillllll

Expected Output:

8

YES

leading

spaces

are

valid

problemsetters

are

evillllll

What changes to the code will make it pass this test case?


8 Answers

2

Your regex for trimming non-word chars at the beginning of the string is not correct.

The ^[\\W+\\s+] pattern matches exactly 1 character at the beginning of a string: a non-word char (\W), a literal +, or a whitespace. Inside a character class, + is not a quantifier. Using replaceAll makes no sense here since only 1 char at the start of the string can ever be matched. Also, \W already matches whitespace characters, so there is no need to include \s in the same character class.

You may replace that .replaceAll("^[\\W+\\s+]", "") with .replaceFirst("^\\W+", ""). This will remove 1 or more non-word chars at the beginning of the string (see this regex demo).
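To make the difference concrete, here is a minimal sketch comparing the two patterns on a string with three leading spaces. The class and method names are made up for illustration only.

```java
class LeadingTrimDemo {
    // Original pattern: a character class, so it matches ONE character at the start.
    static String stripOneChar(String s) {
        return s.replaceAll("^[\\W+\\s+]", "");
    }

    // Suggested pattern: one or more non-word characters at the start.
    static String stripLeadingRun(String s) {
        return s.replaceFirst("^\\W+", "");
    }

    public static void main(String[] args) {
        String s = "   YES leading";
        System.out.println("[" + stripOneChar(s) + "]");    // [  YES leading]
        System.out.println("[" + stripLeadingRun(s) + "]"); // [YES leading]
    }
}
```

Only one of the three leading spaces is removed by the original pattern, which is why split() still sees leading delimiters afterwards.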

See this online Java demo yielding your expected output.

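NOTE: to split a sentence into word char chunks, you may actually use

String[] tokens = s.replaceFirst("^\\W+", "").split("\\W+");

Keep in mind that \W does not match an underscore, since _ counts as a word char. If _ must act as a delimiter, as it does in this problem, use [^A-Za-z]+ in both places instead of \\W+.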

Java demo:

String s = "       YES      leading spaces        are valid,    problemsetters are         evillllll";
String[] splitString = s.replaceFirst("^\\W+", "").split("\\W+");

Then,

System.out.println(splitString.length); // => 8
for (String string : splitString) {
    System.out.println(string);
}
// prints the 8 tokens, one per line: YES, leading, spaces, are, valid, problemsetters, are, evillllll

5 Comments

Thanks a lot for the online Java demo link. It passed the intended test case and all other test cases except for one. Test case: INPUT: (no input string). EXPECTED OUTPUT: 0
Just to make sure: did it also fail before? I think yes. So, the string = "", right? I suggest checking with if (s.isEmpty()) and if not use splitting.
No, it did not fail before, and yes, string = " ". Even this change did not pass the test case: if (s.isEmpty()) { System.out.println("0"); System.out.println(s); } else { String[] splitString = (s.replaceAll("^\\W+", "").split("[\\s!,?._'@]+")); System.out.println(splitString.length); for (String string : splitString) { System.out.println(string); } }
Here is your code with s = " ", and it shows 1. However, you should check a trimmed input with isEmpty(). See this demo
Voilà, all test cases passed with this change in the code: if (s.trim().isEmpty()) { System.out.println(0); } Thanks a lot.
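The behavior discussed in these comments can be reproduced with a short sketch. It assumes the split pattern from the question; the helper names are hypothetical. An empty string split by any pattern yields a one-element array holding the empty string, which is why the count comes out as 1 unless blank input is handled separately.

```java
class EmptyInputDemo {
    // Without a guard: a blank line ends up as "" and "".split(...) has length 1.
    static int countWithoutGuard(String s) {
        return s.replaceFirst("^\\W+", "").split("[\\s!,?._'@]+").length;
    }

    // With the guard from the comments: blank input is reported as 0 tokens.
    static int countTokens(String s) {
        if (s.trim().isEmpty()) {
            return 0;
        }
        return countWithoutGuard(s);
    }

    public static void main(String[] args) {
        System.out.println(countWithoutGuard(" ")); // 1
        System.out.println(countTokens(" "));       // 0
        System.out.println(countTokens("He is a very very good boy, isn't he?")); // 10
    }
}
```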
1

This will pass all test cases

import java.io.*;
import java.util.*;

public class Solution {

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String s = scan.nextLine().trim();
        scan.close();
        if (s.isEmpty()) {
            System.out.println(0);
        } else {
            // Split once and reuse the result for both the count and the tokens.
            String[] tokens = s.split("[!,?. @_']+");
            System.out.println(tokens.length);
            for (String a : tokens) {
                System.out.println(a);
            }
        }
    }
}


1

Try this one; it works:

import java.io.*;
import java.util.*;

public class Solution {

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String s = scan.nextLine();
        scan.close();
        
         s = s.trim();
        if (s.length() == 0) {
            System.out.println(0);
        } else {
            String[] strings = s.split("['!?,._@ ]+");
            System.out.println(strings.length);
            for (String str : strings)
                System.out.println(str);

        }
    }
}


1

You can trim the string before splitting it. In the given test case, the blank space at the start of the string matches the delimiter pattern, so split() returns an empty first element that gets counted as a token. Try this:

import java.util.*;

public class Solution {

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String s = scan.nextLine().trim();
        if(s.isEmpty()) 
            System.out.println("0");
        else {
        String[] S = s.split("[\\s!,?._'@]+");
        System.out.println(S.length);
        for(int i=0;i<S.length;i++) {
            System.out.println(S[i]);
        }
        }
        scan.close();
    }
}
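As a small illustration of why trimming matters, the sketch below (helper names are hypothetical) splits the same input with and without trim(). Note that trim() only strips whitespace, so a string starting with punctuation such as "!" would still yield an empty first element.

```java
import java.util.Arrays;

class LeadingDelimiterDemo {
    // Splitting the raw input: a delimiter match at index 0 yields a leading "".
    static String[] splitRaw(String s) {
        return s.split("[\\s!,?._'@]+");
    }

    // Trimming first removes the leading whitespace, so no empty element appears.
    static String[] splitTrimmed(String s) {
        return s.trim().split("[\\s!,?._'@]+");
    }

    public static void main(String[] args) {
        String s = "   YES leading spaces";
        System.out.println(Arrays.toString(splitRaw(s)));     // [, YES, leading, spaces]
        System.out.println(Arrays.toString(splitTrimmed(s))); // [YES, leading, spaces]
    }
}
```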


0
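import java.util.*;

public class Solution {

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String s = scan.nextLine();
        scan.close();
        // StringTokenizer takes a set of delimiter characters, not a regex,
        // so list the delimiters directly without brackets or escapes.
        StringTokenizer st = new StringTokenizer(s, "_@!?.', ");
        System.out.println(st.countTokens());
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }
    }
}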
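One point worth spelling out: the second argument of StringTokenizer is a plain set of delimiter characters rather than a regular expression, and the tokenizer never returns empty tokens. A minimal sketch, with a hypothetical helper name, showing how it copes with leading delimiters and blank input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Collects tokens using the delimiter characters allowed by the problem.
    static List<String> tokens(String s) {
        StringTokenizer st = new StringTokenizer(s, " !,?._'@");
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("  boy, isn't he?")); // [boy, isn, t, he]
        System.out.println(tokens("   "));              // []
    }
}
```

Because empty tokens are skipped, no trim() or empty-input check is needed with this approach.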


0
if(s.trim().isEmpty()){
   // No tokens: print only the count, without an extra blank line.
   System.out.println("0");
} else {
   String[] splitString = (s.replaceAll("^\\W+", "").split("[\\s!,?._'@]+"));
   System.out.println(splitString.length);
   for(String str: splitString) {
        System.out.println(str);
   }
}


0
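import java.io.*;
import java.util.*;

public class Solution {
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String s = scan.nextLine().trim();
        scan.close();
        // Split on runs of any delimiter allowed by the problem statement.
        // A single character class avoids the empty tokens that separate
        // alternatives produce when two different delimiters are adjacent.
        String[] arr = s.isEmpty() ? new String[0] : s.split("[\\s!,?._'@]+");
        System.out.println(arr.length);
        for (int i = 0; i < arr.length; i++) {
            System.out.println(arr[i]);
        }
    }
}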

1 Comment

It would be better to add some explanation to the code. At least to your regular expression
0

The following should help

  public static void regexTest() {
    String s="isn't he a good boy?";
    // Replace any non-alphabetic character with a space.
    // [^a-zA-Z]
    // [          - Start a custom character class
    //  ^         - Anything that is not
    //   a-zA-Z   - a lowercase or uppercase letter.
    //              For example, a-z means everything from 'a' up to
    //              and including 'z'.
    //         ]  - End the custom character class.
    // Given the input string, the single quote and question mark will be replaced
    // by a space character.
    s=s.replaceAll("[^a-zA-Z]", " ");
    // Split the string (which now contains only letters and spaces) into individual words.
    // trim() and " +" keep leading or repeated spaces from producing empty words.
    String[] array_s=s.trim().split(" +");
    for(int i=0;i<array_s.length;i++) {
        System.out.println(array_s[i]);
    }
  }
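An alternative to replacing and splitting is to match the tokens directly, which follows the problem's definition of a token (one or more consecutive English letters) literally. This is only a sketch, not part of the answer above; the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchTokensDemo {
    // A token is a maximal run of English letters.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z]+");

    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("  _isn't he a good boy?");
        System.out.println(tokens.size()); // 6
        for (String t : tokens) {
            System.out.println(t);
        }
    }
}
```

Since only matches are collected, leading delimiters, repeated delimiters, and blank input need no special handling.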

1 Comment

I've found it is a good idea to comment regular expressions construct by construct, as they are often somewhat cryptic to people. In your case, a simple comment on the replaceAll would help a lot.
