2

Consider the following as tokens:

  1. +, -, ), (
  2. alpha characters and underscore
  3. integer

Implement:

  1. getToken() - returns a string corresponding to the next token
  2. getTokPos() - returns the position of the current token in the input string

Example input: (a+b)-21)
Output: (| a| +| b| )| -| 21| )|

Note: Cannot use the java string tokenizer class

Work in progress - Successfully tokenized +,-,),(. Need to figure out characters and numbers:

OUTPUT: +|-|+|-|(|(|)|)|)|(| |

10
  • use split even if you can use a tokenizer. ;) Commented Oct 4, 2010 at 22:08
  • @Bozho I actually prefer the tokenizer--I can't figure out how to get String.split to return tokens and I'm distrustful of regular expressions anyway--but I'm sure if they seem natural to you then string.split makes a lot more sense. Commented Oct 4, 2010 at 22:10
  • 1
    Does anybody understand why educators give students assignments to do something and not let them use the libraries which are coming in the box? I mean 90% of the Java knowledge is knowing which class to use, not which operator to use. Well, it probably has something to do with learning the basics, but it still goes against the grain. Commented Oct 4, 2010 at 22:21
  • 3
    @Peter Tillermans It's not just about the basics; it also enables students to learn the much less basic material. An exercise like this can be a good place to see basic things like loops and string operations, yes, but it is also a starting point for the things that use tokenizers, such as recursive descent parsing, which then leads to LL(k) predictive parsing, and so forth into language construction. All of those things will seem like magic to a student who doesn't really grasp string tokenizing to begin with. Commented Oct 4, 2010 at 22:25
  • @Zoe I kind of know, but still... I did my share of parsers/interpreters, but I fail to see how implementing low-level code helps deepen the knowledge of LL(k) predictive parsing. If you do not fully understand it before you start, you simply dig yourself deeper into a hole, and being waist-high in broken code just leads to cognitive overload and feeling stupid, neither of which leads to better understanding (although it might cut hubris down to size). I guess I've seen too many people thinking that implementing functionality is faster than reading the API of a library providing it. Commented Oct 4, 2010 at 22:52

4 Answers 4

3

java.util.StringTokenizer is a legacy class; it is not formally deprecated, but its use is discouraged in new code.

Tokenizing Strings in Java has been much easier with String.split() since Java 1.4:

String[] tokens = "(a+b)-21)".split("[-+()]");

If this is homework, you probably have to reimplement a "split" method:

  • read the String character by character
  • if the character is not a special char, add it to a buffer
  • when you encounter a special char, add the buffer content to a list and clear the buffer

Since this is (probably) homework, I'll let you implement it.


3 Comments

I think I understand your implementation. However, this will give a,b,21 instead of (,a,+,b,),-,21,). I still can't figure it out... thank you for the replies.
Since the characters used as the split pattern are the delimiters of the split, they will not be returned in the results array. Unfortunately, String.split can't get everything here: if you split on any of the characters, you lose them. If you split on "", then 21 becomes two tokens, which is still not right (you end up with the same problem as before, since you have just reduced the string to an array of characters).
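To make the two failure modes in this comment concrete, here is a small sketch. The class and method names are made up for illustration, and the empty-string split result assumes Java 8 or later (older versions add a leading empty element).

```java
import java.util.Arrays;

class SplitDemo {
    // Splitting on the operator characters discards the operators themselves.
    // The hyphen comes first in the character class so it is read literally.
    static String[] splitOnOperators(String input) {
        return input.split("[-+()]");
    }

    // Splitting on the empty string yields one element per character,
    // so a multi-digit number such as 21 is broken apart.
    static String[] splitEveryChar(String input) {
        return input.split("");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnOperators("(a+b)-21)")));
        System.out.println(Arrays.toString(splitEveryChar("(a+b)-21)")));
    }
}
```

In the first result the parentheses and operators are gone (and empty strings appear where two delimiters were adjacent); in the second, 2 and 1 are separate elements.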
If you also need the delimiters, you just have to make a small adaptation to my algorithm: "if the character is a special char, add the buffer content to a list, add the 'special char' as the next element of the list, and only then clear the buffer".
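One possible reading of the adapted algorithm, as a minimal sketch rather than the answerer's own code. The names BufferSplitter, isSpecial and splitKeepingDelimiters are hypothetical, and the set of special characters is assumed to be the four from the question.

```java
import java.util.ArrayList;
import java.util.List;

class BufferSplitter {
    // Assumption: the "special" characters are exactly + - ( )
    static boolean isSpecial(char c) {
        return c == '+' || c == '-' || c == '(' || c == ')';
    }

    static List<String> splitKeepingDelimiters(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isSpecial(c)) {
                // Flush whatever was collected so far, skipping empty buffers...
                if (buffer.length() > 0) {
                    tokens.add(buffer.toString());
                    buffer.setLength(0);
                }
                // ...then keep the special character as its own token.
                tokens.add(String.valueOf(c));
            } else {
                buffer.append(c);
            }
        }
        // Flush a trailing token that is not followed by a special character.
        if (buffer.length() > 0) {
            tokens.add(buffer.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingDelimiters("(a+b)-21)"));
    }
}
```

Note that this sketch treats every non-special character the same way, so it does not separate letters from digits (ab1 would come out as one token) and it does not skip whitespace.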
1

Java lets you examine the characters in a String one by one with the charAt method. So use that in a for loop and examine each character. When you encounter a TOKEN you wrap that token with the pipes and any other character you just append to the output.

public static final char PLUS_TOKEN = '+';
// add the other tokens ('-', '(', ')') as constants in the same way

public String doStuff(String input)
{
    StringBuilder output = new StringBuilder();
    for (int index = 0; index < input.length(); index++)
    {
        char current = input.charAt(index);
        if (current == PLUS_TOKEN)
        {
            // when you see a token you need to append the pipes (|) around it
            output.append('|');
            output.append(current);
            output.append('|');
        }
        // else if (current == ...) compare the current character with the remaining tokens
        else
        {
            // any other character is just added to the output
            output.append(current);
        }
    }
    return output.toString();
}

4 Comments

You're amazing. You included the pieces I need for my assignment, and now I have all the tools to make it right. My prof expects us to learn Java on our own, so writing my first program is harder.
@Tom - All you needed was a nudge in the right direction, I hope it was not too big of a nudge. :)
Is there a syntax for integer numeric constants and characters? I need to define them as tokens also. Thanks!!
@Tom - Of course there is, but you have to do this one on your own. Hint: look at the Character class: download-llnw.oracle.com/javase/6/docs/api/java/lang/…
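As a hedged illustration of what the Character class hint might point to: Character.isDigit and Character.isLetter are real static methods, while the CharKinds class, the classify helper, and its string labels are invented for this sketch.

```java
class CharKinds {
    // Sorts a single character into one of the categories from the question.
    static String classify(char c) {
        if (Character.isDigit(c)) {
            return "digit";       // part of an integer token
        }
        if (Character.isLetter(c) || c == '_') {
            return "alpha";       // alpha character or underscore
        }
        if (c == '+' || c == '-' || c == '(' || c == ')') {
            return "operator";    // one of the four single-character tokens
        }
        return "other";           // whitespace or anything unrecognized
    }

    public static void main(String[] args) {
        for (char c : "(a+b)-21)".toCharArray()) {
            System.out.println(c + " -> " + classify(c));
        }
    }
}
```

Character.isLetter also accepts non-ASCII letters; if the assignment only means a-z and A-Z, an explicit range check may be the safer choice.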
0

If it's not a homework assignment, use String.split(). If it is a homework assignment, say so and tag it so that we can give the appropriate level of help (I did so for you, just in case...).

1 Comment

It is a homework assignment. It says: "You may not use jlex or the java string tokenizer class." Therefore, I assume we can use the split function. Thank you very much for the tips.
0

Because the string needs to be cut in several different ways, not just on whitespace or parens, using the String.split method with any of the symbols there will not work. Split removes the character used as a separator. You could try to split on the empty string, but this wouldn't keep compound symbols, like 21, together. To correctly parse this string, you will need to effectively implement your own tokenizer. Try thinking about how you could tell you had a complete token if you looked at the string one character at a time. You could probably start a string that collects the characters until you have identified a complete token, and then you can remove the characters from the original and return the string. Starting from this point, you can probably make a basic tokenizer.
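A minimal sketch of such a character-at-a-time tokenizer, using the getToken() and getTokPos() names from the question. Several things here are assumptions rather than part of the assignment: the class name SimpleTokenizer, returning null at end of input, skipping whitespace, and treating any unrecognized character as a one-character token. Instead of removing characters from the original string, this version advances an index, which has the same effect.

```java
class SimpleTokenizer {
    private final String input;
    private int next = 0;     // index of the first character not yet consumed
    private int tokPos = -1;  // start index of the most recently returned token

    SimpleTokenizer(String input) {
        this.input = input;
    }

    private static boolean isAlpha(char c) {
        return Character.isLetter(c) || c == '_';
    }

    // Returns the next token, or null once the input is exhausted (assumption).
    public String getToken() {
        // Skip whitespace between tokens (assumption: whitespace is not a token).
        while (next < input.length() && Character.isWhitespace(input.charAt(next))) {
            next++;
        }
        if (next >= input.length()) {
            return null;
        }
        tokPos = next;
        char c = input.charAt(next);
        if (Character.isDigit(c)) {
            // An integer is complete when the next character is not a digit.
            while (next < input.length() && Character.isDigit(input.charAt(next))) {
                next++;
            }
        } else if (isAlpha(c)) {
            // A word is complete when the next character is not a letter or underscore.
            while (next < input.length() && isAlpha(input.charAt(next))) {
                next++;
            }
        } else {
            // +, -, (, ) and anything unrecognized are single-character tokens.
            next++;
        }
        return input.substring(tokPos, next);
    }

    // Position in the input string of the token last returned by getToken().
    public int getTokPos() {
        return tokPos;
    }

    public static void main(String[] args) {
        SimpleTokenizer t = new SimpleTokenizer("(a+b)-21)");
        StringBuilder out = new StringBuilder();
        for (String tok = t.getToken(); tok != null; tok = t.getToken()) {
            out.append(tok).append("| ");
        }
        System.out.println(out.toString().trim());
    }
}
```

For the example input, the main method prints the tokens in the same pipe-separated form as the expected output in the question.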

If you'd rather learn how to make a full strength tokenizer, most of them are defined by creating a regular expression that only matches the tokens.
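As a rough sketch of the regular-expression approach, assuming the three token kinds from the question: the pattern lists one alternative per token kind, and Matcher.find() walks the input from match to match. The class name RegexTokenizer and the exact pattern are illustrative choices, not a standard recipe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenizer {
    // One alternative per token kind: a run of digits, a run of letters or
    // underscores, or a single operator/parenthesis.
    private static final Pattern TOKEN = Pattern.compile("[0-9]+|[A-Za-z_]+|[-+()]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // m.start() would give the token's position in the input if needed.
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(a+b)-21)"));
    }
}
```

One caveat with this sketch: find() silently skips characters that match none of the alternatives, so a stricter tokenizer would need to detect and report them.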
