253

I need to split a string based on two delimiters, - and . (dash and dot). Below is my desired output.

AA.BB-CC-DD.zip ->

AA
BB
CC
DD
zip 

But the following code does not work.

private void getId(String pdfName){
    String[]tokens = pdfName.split("-\\.");
}
5
  • Based on what you said, it looks like it is working fine. What is your desired output? Commented May 13, 2011 at 14:59
  • 4
    @Jeff: He showed his desired output (AA / BB / CC ...) Commented May 13, 2011 at 15:02
  • 2
    Are you sure? I interpreted that as his current output, not his desired output. Maybe its time to stand up and walk around a little bit. Commented May 13, 2011 at 15:04
  • 1
    @Jeff: Sorry for the confusion, I updated my post to clear your misunderstand. Commented May 13, 2011 at 15:05
  • Regex will degrade your performance. I would recommend writing a method that goes character by character and splits the string where needed. You can optimize this further to get log(n) performance. Commented Feb 16, 2013 at 17:55
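For anyone curious what the character-by-character approach from the last comment might look like, here is a minimal sketch. The class and method names are made up for illustration. It makes a single pass over the input, so the work is linear in the length of the string.

```java
import java.util.ArrayList;
import java.util.List;

class ManualSplitDemo {
    // Hypothetical helper: walks the string once and cuts a token
    // every time it sees a dash or a dot.
    static List<String> splitOnDashOrDot(String input) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '-' || c == '.') {
                tokens.add(input.substring(start, i));
                start = i + 1; // next token begins after the delimiter
            }
        }
        tokens.add(input.substring(start)); // whatever follows the last delimiter
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitOnDashOrDot("AA.BB-CC-DD.zip")); // [AA, BB, CC, DD, zip]
    }
}
```

Unlike String.split, this sketch keeps empty tokens at the end of the input (for example after a trailing dot), so filter them out if you don't want them.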

15 Answers

363

I think you need to include the regex OR operator:

String[]tokens = pdfName.split("-|\\.");

What you have matches a dash immediately followed by a dot (the two-character sequence -.), not a dash or a dot on its own (- or .).
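A quick way to see the difference is to run both patterns against the file name from the question. This is only a small sketch; the class and method names are invented for the example.

```java
import java.util.Arrays;

class DashOrDotSplitDemo {
    // Splits on a single dash OR a single dot. The dot is escaped because
    // an unescaped . matches any character in a regex.
    static String[] splitOnDashOrDot(String pdfName) {
        return pdfName.split("-|\\.");
    }

    public static void main(String[] args) {
        String name = "AA.BB-CC-DD.zip";
        // The pattern from the question only matches the sequence "-.",
        // which never occurs here, so the string comes back in one piece.
        System.out.println(Arrays.toString(name.split("-\\."))); // [AA.BB-CC-DD.zip]
        System.out.println(Arrays.toString(splitOnDashOrDot(name))); // [AA, BB, CC, DD, zip]
    }
}
```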


5 Comments

Why do we require two backslashes?
The . character in regex means any character other than a newline. tutorialspoint.com/java/java_regular_expressions.htm In this case, however, they wanted the literal . character. The two backslashes indicate that you are referring to a literal dot: the backslash is the regex escape character, and it has to be doubled inside a Java string literal.
For normal cases it would be .split("match1|match2") (e.g. split("https|http")); the \\ is there to escape the special char . in the case above.
Or, more generally, you can use pdfName.split("\\W"), as in @Peter Knego's answer below.
Use [-.] instead of -|\\.
72

Try this regex: "[-.]+". The + at the end treats consecutive delimiter characters as one. Remove the plus if you do not want this.
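To illustrate what the + changes, here is a small sketch with an input that has doubled delimiters. The input string and the names are made up for the example.

```java
import java.util.Arrays;

class ConsecutiveDelimiterDemo {
    // Every single delimiter character is its own split point.
    static String[] splitEach(String s) {
        return s.split("[-.]");
    }

    // A run of delimiter characters counts as one split point.
    static String[] splitRuns(String s) {
        return s.split("[-.]+");
    }

    public static void main(String[] args) {
        String s = "AA..BB--CC";
        System.out.println(Arrays.toString(splitEach(s))); // [AA, , BB, , CC]
        System.out.println(Arrays.toString(splitRuns(s))); // [AA, BB, CC]
    }
}
```

Without the +, the empty strings between the doubled delimiters end up in the result.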

2 Comments

@Lurkers: The only reason Peter didn't have to escape that - was that it's the first thing inside the [], otherwise there would need to be a backslash in front of it (and of course, to put a backslash in front of it, we need two because this is a string literal).
I think this answer is better than the accepted one, because when you use the logical operator |, the problem is that one of your delimiters can be a part of your result 'tokens'. This will not happen with Peter Knego's [-.]+
32

You can use the regex "\W". This matches any non-word character. The required line would be:

String[] tokens=pdfName.split("\\W");

2 Comments

It doesn't work for me: String s = "id(INT), name(STRING),". Using \\W here creates an array of length 6, whereas it should be only 4.
This will also break when the input contains Unicode character. It's best to only include the actual delimiter, instead of a "grab all" with \W.
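Both comments can be reproduced with a short sketch. The accented sample string is my own. Also note that by default Java's \W means anything outside [a-zA-Z_0-9], so accented letters are treated as delimiters.

```java
import java.util.Arrays;

class NonWordSplitDemo {
    static String[] splitNonWord(String s) {
        return s.split("\\W");
    }

    static String[] splitNonWordRuns(String s) {
        return s.split("\\W+");
    }

    public static void main(String[] args) {
        String s = "id(INT), name(STRING),";
        // ")" + "," + " " are three separate delimiters, so two empty tokens appear.
        System.out.println(splitNonWord(s).length); // 6
        System.out.println(Arrays.toString(splitNonWordRuns(s))); // [id, INT, name, STRING]

        // \u00e9 is an accented e; \W treats it as a delimiter and eats it.
        String accented = "caf\u00e9-au.lait";
        System.out.println(Arrays.toString(splitNonWordRuns(accented))); // [caf, au, lait]
        System.out.println(accented.split("[-.]")[0].length()); // 4, the accented letter survives
    }
}
```

Listing the actual delimiters, as in [-.], avoids both surprises.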
20

The string you give split is the string form of a regular expression, so:

private void getId(String pdfName){
    String[] tokens = pdfName.split("[-.]");
    // ...
}

That means "split on any character within the []" (so, split on - and .). A couple of notes on that:

  1. Normally, you have to escape the dot (.) by putting a backslash in front of it because in a regular expression . means "any character." But you don't have to do that within a character class ([]).
  2. Normally, within a character class ([]), you have to escape the dash (-) because in that context it has special meaning (it indicates a range, like [0-9A-Fa-f] to match all hex digits). But when it's the first character after the [, we don't have to escape it.

If you did need to escape either of those, the way you'd do it is by having a backslash in front of it in the string. Since we're writing this as a string literal, to actually put a backslash in the string requires that we escape it, since otherwise it's an escape character (for instance, \n means newline, \t means tab, etc.). So we'd have to write \\ to put an actual backslash in the string for the regular expression engine to see it and use it to escape the next character (- or .). For instance, "[\\-.]" if we wanted to escape the - even though we don't need to.
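Here is a small sketch of why the position of the dash matters inside a character class. The helper name is made up, and the [.-z] class is just an illustration of what happens when the dash lands between two other characters.

```java
import java.util.Arrays;

class CharClassEscapeDemo {
    // Hypothetical helper: does the single-character string match the class?
    static boolean matchesClass(String charClass, String ch) {
        return ch.matches(charClass);
    }

    public static void main(String[] args) {
        String name = "AA.BB-CC-DD.zip";
        // Escaped or not, a leading dash is a literal dash.
        System.out.println(Arrays.equals(name.split("[-.]"), name.split("[\\-.]"))); // true

        // Between two characters the dash forms a range: [.-z] is "from . to z".
        System.out.println(matchesClass("[.-z]", "-")); // false, the dash itself is outside the range
        System.out.println(matchesClass("[.-z]", "a")); // true, letters fall inside the range
        // Escaping the dash turns it back into a literal.
        System.out.println(matchesClass("[.\\-z]", "-")); // true
    }
}
```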

Live example: https://ideone.com/PMA8d3

6 Comments

You don't need to escape the hyphen in this case, because [-.] couldn't possibly be interpreted as a range.
@Alan: Because it's the very first thing in the class, that's quite true. But I always do, it's too easy to go back later and add something in front of it without thinking. Escaping it costs nothing, so...
Do you know how to escape the brackets? I have the String "[200] Engineering" that I want to split into "200" , "Engineering"
Oh wow I got it...I had to use two backslashes instead of one. String[] strings = codes.get(x).split("\\[|\\]| "); <-- code for anyone interested
Can you explain why we need to "escape the backslash because this is a string?"
16

Using Guava you could do this:

Iterable<String> tokens = Splitter.on(CharMatcher.anyOf("-.")).split(pdfName);


10

For two character sequences as delimiters, such as "AND" and "OR", this should work. Don't forget to trim the tokens when using them. The \\b word boundaries keep the OR inside YORK from being treated as a delimiter.

 String text ="ISTANBUL AND NEW YORK AND PARIS OR TOKYO AND MOSCOW";
 String[] cities = text.split("\\b(?:AND|OR)\\b"); 

Result : cities = {"ISTANBUL ", " NEW YORK ", " PARIS ", " TOKYO ", " MOSCOW"}

1 Comment

How can I get output like {"ISTANBUL AND", " NEW YORK AND", " PARIS OR", " TOKYO AND", " MOSCOW"}
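One possible way to get the output asked for in that comment is to split on a zero-width position instead of on the keyword itself, using a lookbehind so the keyword stays attached to the preceding token. This is a sketch only. The method name is made up, and it assumes the keywords are always surrounded by whitespace.

```java
import java.util.Arrays;

class KeepDelimiterSplitDemo {
    // Splits at the empty position right after " AND" or " OR" and right
    // before the next whitespace, so nothing is consumed by the split.
    static String[] splitKeepingKeyword(String text) {
        return text.split("(?<=\\sAND|\\sOR)(?=\\s)");
    }

    public static void main(String[] args) {
        String text = "ISTANBUL AND NEW YORK AND PARIS OR TOKYO AND MOSCOW";
        System.out.println(Arrays.toString(splitKeepingKeyword(text)));
        // [ISTANBUL AND,  NEW YORK AND,  PARIS OR,  TOKYO AND,  MOSCOW]
    }
}
```

Requiring whitespace before the keyword also stops the OR inside YORK from matching.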
6

pdfName.split("[.-]+");

  • [.-] -> any one of the . or - can be used as delimiter

  • + sign signifies that if the aforementioned delimiters occur consecutively we should treat it as one.


4

I'd use Apache Commons:

import org.apache.commons.lang3.StringUtils;

private void getId(String pdfName){
    String[] tokens = StringUtils.split(pdfName, "-.");
}

It'll split on any of the specified separators, as opposed to StringUtils.splitByWholeSeparator(str, separator), which uses the complete string as a separator.


3
String[] token=s.split("[.-]");

1 Comment

Please help fighting the misunderstanding that StackOverflow is a free code-writing service, by augmenting your code-only answer with some explanation.
2

It's better to use something like this:

s.split("[\\s\\-\\.\\'\\?\\,\\_\\@]+");

I have added a few other characters as examples. This is the safest approach because of the way . and ' are treated.


2

Try this code (JavaScript, not Java):

var string = 'AA.BB-CC-DD.zip';
var array = string.split(/[-.]/);

1 Comment

Please help fighting the misunderstanding that StackOverflow is a free code-writing service, by augmenting your code-only answer with some explanation.
1

You can also pass a regular expression as the argument to the split() method. See the example below:

private void getId(String pdfName){
    String[] tokens = pdfName.split("-|\\.");
}


1
s.trim().split("[\\W]+") 

should work.

2 Comments

First, no, it does not work. Maybe you could try it before posting? Second, this answer is the same as yours, but working. Finally, you should check your formatting (should work.).
Please help fighting the misunderstanding that StackOverflow is a free code-writing service, by augmenting your code-only answer with some explanation.
-1

If you know the string will always be in the same format, first split the string based on . and store the string at the first index in a variable. Then split the string in the second index based on - and store indexes 0, 1 and 2. Finally, split index 2 of the previous array based on . and you should have obtained all of the relevant fields.

Refer to the following snippet:

String[] tmp = pdfName.split(".");
String val1 = tmp[0];
tmp = tmp[1].split("-");
String val2 = tmp[0];
...

3 Comments

It can be done in one step, so do it in one step. See the other replies.
pdfName.split(".") results in a zero-length array.
1) . Needs to be escaped as \\.
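The point made in these comments is easy to check. Below is a minimal sketch, with made-up names, comparing the unescaped dot with two ways of making it literal. Pattern.quote is the standard java.util.regex helper for quoting a string so it is matched literally.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DotSplitDemo {
    static String[] splitOnLiteralDot(String s) {
        return s.split("\\.");
    }

    public static void main(String[] args) {
        String name = "AA.BB-CC-DD.zip";
        // Unescaped, . matches every character, so every token is empty
        // and split() drops all the trailing empty strings.
        System.out.println(name.split(".").length); // 0
        System.out.println(Arrays.toString(splitOnLiteralDot(name))); // [AA, BB-CC-DD, zip]
        // Pattern.quote builds the literal pattern for you.
        System.out.println(Arrays.toString(name.split(Pattern.quote(".")))); // [AA, BB-CC-DD, zip]
    }
}
```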
-1

You can try this way, as split accepts varargs so we can pass multiple parameters as delimiters

 String[]tokens = pdfName.split("-",".");

you can pass as many parameters that you want.
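As far as I know, java.lang.String only declares split(String regex) and split(String regex, int limit), so the two-String call above should not compile. If a varargs-style call is what you are after, a helper along these lines might do. The name splitOnAny is hypothetical and not part of any library.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class VarargsSplitDemo {
    // Hypothetical helper: quotes each delimiter so it is matched literally,
    // then joins them with | into one alternation for String.split.
    // Assumes at least one non-empty delimiter is passed.
    static String[] splitOnAny(String input, String... delimiters) {
        String regex = Arrays.stream(delimiters)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return input.split(regex);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnAny("AA.BB-CC-DD.zip", "-", ".")));
        // [AA, BB, CC, DD, zip]
    }
}
```

Because every delimiter goes through Pattern.quote, callers don't have to worry about escaping . or other special characters themselves.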
