How to split a string (by matching a regular expression) into tokens and print each token in Java?

Question

Problem Statement

Given a string s matching the regular expression [A-Za-z !,?._'@]+, split the string into tokens. We define a token to be one or more consecutive English alphabetic letters. Then, print the number of tokens, followed by each token on a new line.

Input Format

A single string, s. s is composed of English alphabetic letters, blank spaces, and any of the following characters: !,?._'@

Output Format

On the first line, print an integer, n, denoting the number of tokens in string s (they do not need to be unique). Next, print each of the n tokens on a new line in the same order as they appear in the input string s.

Sample Input

He is a very very good boy, isn't he?

Sample Output

10
He
is
a
very
very
good
boy
isn
t
he

My Code:

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class Solution {

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String s = scan.nextLine();
        scan.close();
        String[] splitString = s.replaceAll("^[\\W+\\s+]", "").split("[\\s!,?._'@]+");
        System.out.println(splitString.length);
        for (String string : splitString) {
            System.out.println(string);
        }
    }
}

This code works for the sample input but does not pass the following test case.

Test case:

Input:
       YES      leading spaces        are valid,    problemsetters are         evillllll
Expected Output:

8
YES
leading
spaces
are
valid
problemsetters
are
evillllll

What changes to the code will make it pass this test case?

Wiktor Stribiżew · Accepted Answer · 2019-07-27 16:51:59Z

2

The regex you use to trim non-word characters from the beginning of the string is not correct.

The pattern ^[\\W+\\s+] matches a single character at the beginning of the string: a non-word character (\W), a literal +, or a whitespace character. Using replaceAll gains nothing here, because the ^ anchor lets the pattern match only once, at the start of the string. Also, \W already matches whitespace characters, so there is no need to include \s in the same character class as \W.

You may replace that .replaceAll("^[\\W+\\s+]", "") with .replaceFirst("^\\W+", ""). This will remove 1 or more non-word chars at the beginning of the string (see this regex demo).
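The difference between the two patterns can be seen in a short sketch. The class name `LeadingStripDemo` and its method names are made up for illustration; only standard `String` methods are used.

```java
class LeadingStripDemo {
    // Pattern from the question: the character class consumes exactly ONE
    // character, and ^ only lets it match at the very start of the input.
    static String stripOne(String s) {
        return s.replaceAll("^[\\W+\\s+]", "");
    }

    // Suggested fix: the + quantifier sits outside the class, so the whole
    // run of leading non-word characters is removed.
    static String stripRun(String s) {
        return s.replaceFirst("^\\W+", "");
    }

    public static void main(String[] args) {
        String s = "   YES leading"; // three leading spaces
        System.out.println("[" + stripOne(s) + "]"); // [  YES leading]
        System.out.println("[" + stripRun(s) + "]"); // [YES leading]
    }
}
```

With the original pattern, two of the three leading spaces survive, so the later split still sees a separator at position 0 and emits an empty first token.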

See this online Java demo yielding your expected output.

NOTE: to split a sentence into word char chunks, you may actually use

String[] tokens = s.replaceFirst("^\\W+", "").split("\\W+");

Java demo:

String s = "       YES      leading spaces        are valid,    problemsetters are         evillllll";
String[] splitString = s.replaceFirst("^\\W+", "").split("\\W+");

Then,

System.out.println(splitString.length); // => 8
for (String string : splitString) {
    System.out.println(string);
}
// prints each token on its own line: YES, leading, spaces, are, valid, problemsetters, are, evillllll
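One caveat worth checking: `\W` treats digits and the underscore as word characters, while the problem statement lists `_` among the separators. The sketch below compares splitting on `\W+` with splitting on `[^A-Za-z]+`. The second pattern is a swapped-in alternative, not part of the answer above, and the class and method names are hypothetical.

```java
import java.util.Arrays;

class UnderscoreSplitDemo {
    // Splits on runs of non-word characters; '_' counts as a word character.
    static String[] splitOnNonWord(String s) {
        return s.replaceFirst("^\\W+", "").split("\\W+");
    }

    // Splits on runs of anything that is not an English letter (assumed
    // alternative that follows the problem's definition of a token).
    static String[] splitOnNonLetters(String s) {
        return s.replaceFirst("^[^A-Za-z]+", "").split("[^A-Za-z]+");
    }

    public static void main(String[] args) {
        String s = "snake_case, isn't it?";
        System.out.println(Arrays.toString(splitOnNonWord(s)));    // [snake_case, isn, t, it]
        System.out.println(Arrays.toString(splitOnNonLetters(s))); // [snake, case, isn, t, it]
    }
}
```

If the judge's inputs contain underscores between letters, only the second variant separates them.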

edited Jul 27, 2019 at 16:51

answered Sep 29, 2016 at 12:26

Wiktor Stribiżew


5 Comments

Ganesh Ramachandran Over a year ago

Thanks a lot for the online Java demo link. It passed the intended test case and all other test cases except one. Test case: the input is empty (no input string); the expected output is 0.

Wiktor Stribiżew Over a year ago

Just to make sure: did it also fail before? I think yes. So, the string = "", right? I suggest checking with if (s.isEmpty()) and if not use splitting.

Ganesh Ramachandran Over a year ago

No, it did not fail before, and yes, string = " ". Even this change did not pass the test case:

if (s.isEmpty()) {
    System.out.println("0");
    System.out.println(s);
} else {
    String[] splitString = (s.replaceAll("^\\W+", "").split("[\\s!,?._'@]+"));
    System.out.println(splitString.length);
    for (String string : splitString) {
        System.out.println(string);
    }
}

Wiktor Stribiżew Over a year ago

Here is your code with s = " ", and it prints 1. You should call isEmpty() on the trimmed input instead. See this demo.

Ganesh Ramachandran Over a year ago

Voilà. All test cases passed with this change in the code: if (s.trim().isEmpty()) { System.out.println(0); }. Thanks a lot.
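The behaviour discussed in these comments can be reproduced in a short sketch: splitting an empty string still returns a one-element array, so blank input needs its own check. The class and method names are illustrative only.

```java
class BlankInputDemo {
    // No blank check: an all-whitespace input is stripped to "", and
    // "".split(...) returns an array holding that single empty string.
    static int naiveCount(String s) {
        return s.replaceFirst("^\\W+", "").split("[\\s!,?._'@]+").length;
    }

    // Checks the trimmed input first, as suggested in the comments.
    static int guardedCount(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("[\\s!,?._'@]+").length;
    }

    public static void main(String[] args) {
        System.out.println(naiveCount(" "));             // 1
        System.out.println(guardedCount(" "));           // 0
        System.out.println(guardedCount("He is a boy")); // 4
    }
}
```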

Reppin Frost · Accepted Answer · 2020-04-26 07:35:51Z

1

This will pass all test cases

import java.io.*;
import java.util.*;

public class Solution {

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String s = scan.nextLine();
        if (s.trim().isEmpty()) {
            System.out.println(0);
        } else {
            // Split once and reuse the result for both the count and the tokens.
            String[] tokens = s.trim().split("[!,?. @_']+");
            System.out.println(tokens.length);
            for (String a : tokens) {
                System.out.println(a);
            }
        }
        scan.close();
    }
}

answered Apr 26, 2020 at 7:35

Reppin Frost


Abdulnaser · Accepted Answer · 2021-01-06 21:29:44Z

1

Try this one; it works:

import java.io.*;
import java.util.*;

public class Solution {

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String s = scan.nextLine();
        scan.close();
        
        s = s.trim();
        if (s.length() == 0) {
            System.out.println(0);
        } else {
            String[] strings = s.split("['!?,._@ ]+");
            System.out.println(strings.length);
            for (String str : strings)
                System.out.println(str);

        }
    }
}

edited Jan 6, 2021 at 21:29

answered Jan 6, 2021 at 21:05

Abdulnaser


Suraj Rao · Accepted Answer · 2021-02-05 15:25:40Z

1

You can trim the string before splitting it. In the given test case, the leading blank spaces would otherwise produce an empty first token, which gets counted as well. Try this:

import java.util.*;

public class Solution {

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String s = scan.nextLine().trim();
        if (s.isEmpty()) {
            System.out.println("0");
        } else {
            String[] S = s.split("[\\s!,?._'@]+");
            System.out.println(S.length);
            for (int i = 0; i < S.length; i++) {
                System.out.println(S[i]);
            }
        }
        scan.close();
    }
}
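Note that `trim()` only removes whitespace, so an input that begins with one of the punctuation separators would still produce an empty first token. The sketch below illustrates this under the assumption that such inputs can occur. `LeadingPunctuationDemo`, `SEPARATORS` and both method names are hypothetical.

```java
class LeadingPunctuationDemo {
    static final String SEPARATORS = "[\\s!,?._'@]+";

    // trim() drops leading whitespace but leaves leading punctuation in place.
    static String[] trimThenSplit(String s) {
        return s.trim().split(SEPARATORS);
    }

    // Removes a leading run of ANY separator before splitting.
    static String[] stripThenSplit(String s) {
        String stripped = s.replaceFirst("^" + SEPARATORS, "");
        return stripped.isEmpty() ? new String[0] : stripped.split(SEPARATORS);
    }

    public static void main(String[] args) {
        System.out.println(trimThenSplit("!hi there").length);  // 3 (first token is empty)
        System.out.println(stripThenSplit("!hi there").length); // 2
    }
}
```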

edited Feb 5, 2021 at 15:25

Suraj Rao

answered Feb 5, 2021 at 15:24

Mahi Vijayvargiya


sreekar sunku · Accepted Answer · 2017-09-07 07:19:06Z

0

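public static void main(String[] args) {
    Scanner scan = new Scanner(System.in);
    String s = scan.nextLine();
    // StringTokenizer treats its second argument as a set of literal delimiter
    // characters, not as a regex, so brackets and backslashes are not needed.
    StringTokenizer st = new StringTokenizer(s, "_@!?.', ");
    System.out.println(st.countTokens());
    while (st.hasMoreTokens()) {
        System.out.println(st.nextToken());
    }
    scan.close();
}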
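`StringTokenizer` reads its delimiter argument as a plain list of characters rather than a regular expression, and it never returns empty tokens. A small sketch of both properties follows; the helper name `tokens` and the class name are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDelimiterDemo {
    // Collects all tokens of s, using every character of delimiters as a separator.
    static List<String> tokens(String s, String delimiters) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delimiters);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Leading separators and blank input need no special handling.
        System.out.println(tokens("  He is, isn't he?", "_@!?.', ")); // [He, is, isn, t, he]
        System.out.println(tokens("   ", "_@!?.', ").size());         // 0
        // Regex-style brackets are just two more literal delimiter characters.
        System.out.println(tokens("a[b]c", "[_\\@!?.', ]"));          // [a, b, c]
    }
}
```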

answered Sep 7, 2017 at 7:19

sreekar sunku


jrtapsell · Accepted Answer · 2017-09-07 08:56:58Z

0

if (s.trim().isEmpty()) {
    // Blank input: print only the count, with no extra line after it.
    System.out.println("0");
} else {
    String[] splitString = s.replaceFirst("^\\W+", "").split("[\\s!,?._'@]+");
    System.out.println(splitString.length);
    for (String str : splitString) {
        System.out.println(str);
    }
}

edited Sep 7, 2017 at 8:56

jrtapsell

answered Oct 17, 2016 at 17:44

myer


user10204287 · Accepted Answer · 2018-08-09 15:49:08Z

0

import java.io.*;
import java.util.*;

public class Solution {
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String s = scan.nextLine().trim();
        scan.close();
        if (s.isEmpty()) {
            System.out.println(0);
            return;
        }
        // A separator is one or more whitespace characters or any of ! , ? . _ ' @
        String[] arr = s.split("[\\s!,?._'@]+");
        System.out.println(arr.length);
        for (int i = 0; i < arr.length; i++) {
            System.out.println(arr[i]);
        }
    }
}

answered Aug 9, 2018 at 15:49

user10204287

1 Comment

Aleks Andreev Over a year ago

It would be better to add some explanation to the code. At least to your regular expression

GMc · Accepted Answer · 2019-05-09 03:51:22Z

0

The following should help:

public static void regexTest() {
    String s = "isn't he a good boy?";
    // Replace any non-alphabetic character with a space.
    // [^a-zA-Z]
    // [          - Start a custom character class
    //  ^         - Anything that is not
    //   a-zA-Z   - a lowercase or uppercase letter.
    //              For example, a-z means everything from 'a' up to
    //              and including 'z'
    //         ]  - End the custom character class.
    // Given the input string, the single quote and question mark will be replaced
    // by a space character.
    s = s.replaceAll("[^a-zA-Z]", " ");
    // Split the string (which now contains only letters and spaces) into individual words.
    // trim() and " +" keep runs of spaces from producing empty tokens.
    String[] array_s = s.trim().split(" +");
    for (int i = 0; i < array_s.length; i++) {
        System.out.println(array_s[i]);
    }
}
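A different way to reach the same tokens is to match the letter runs directly instead of splitting on what lies between them. This is a swapped-in technique using `java.util.regex.Pattern` and `Matcher`, not the approach of the answer above; the class name `LetterRunMatcher` and the method `tokens` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterRunMatcher {
    // One token = one or more consecutive English letters.
    private static final Pattern LETTERS = Pattern.compile("[A-Za-z]+");

    // Returns every letter run in s, in order of appearance.
    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = LETTERS.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("   isn't he a good boy?")); // [isn, t, he, a, good, boy]
        System.out.println(tokens("   ").size());              // 0
    }
}
```

Because only matches are collected, leading separators and blank input cannot produce empty tokens, so no trimming or special-casing is required.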

edited May 9, 2019 at 3:51

GMc

answered May 8, 2019 at 20:55

diksha smriti

1 Comment

GMc Over a year ago

I've found it is a good idea to comment regular expressions construct by construct, as they are often somewhat cryptic to readers. In your case, a simple comment on the replaceAll would help a lot.
