
I'm tokenizing a text file in Java. I want to read an input file, tokenize it, and write certain tokens to an output file. This is what I've done so far:

package org.apache.lucene.analysis;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StreamTokenizer;

class StringProcessing {
    // Create BufferedReader class instance
    public static void main(String[] args) throws IOException {
        InputStreamReader input = new InputStreamReader(System.in);
        BufferedReader keyboardInput = new BufferedReader(input);
        System.out.print("Please enter a java file name: ");
        String filename = keyboardInput.readLine();
        if (!filename.endsWith(".DAT")) {
            System.out.println("This is not a DAT file.");
            System.exit(0);
        }
        File File = new File(filename);
        if (File.exists()) {
            FileReader file = new FileReader(filename);
            StreamTokenizer streamTokenizer = new StreamTokenizer(file);
            int i = 0;
            int numberOfTokensGenerated = 0;
            while (i != StreamTokenizer.TT_EOF) {
                i = streamTokenizer.nextToken();
                numberOfTokensGenerated++;
            }
            // Output number of characters in the line
            System.out.println("Number of tokens = " + numberOfTokensGenerated);
            // Output tokens
            for (int counter = 0; counter < numberOfTokensGenerated; counter++) {
                char character = file.toString().charAt(counter);
                if (character == ' ') { System.out.println(); } else { System.out.print(character); }
            }
        } else {
            System.out.println("File does not exist!");
            System.exit(0);
        }

        System.out.println("\n");
    }//end main
}//end class

When I run this code, this is what I get:

Please enter a java file name: D://eclipse-java-helios-SR1-win32/LexractData.DAT
Number of tokens = 129
java.io.FileReader@19821fException in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 25
    at java.lang.String.charAt(Unknown Source)
    at org.apache.lucene.analysis.StringProcessing.main(StringProcessing.java:40)

The input file will look like this:

-K1 Account 
--Op1 withdraw
---Param1 an
----Type Int
---Param2 amount
----Type Int
--Op2 deposit
---Param1 an
----Type Int
---Param2 Amount
----Type Int
--CA1 acNo
---Type Int
-K2 CheckAccount 
--SC Account
--CA1 credit_limit
---Type Int
-K3 Customer
--CA1 name
---Type String
-K4 Transaction
--CA1 date
---Type Date
--CA2 time
---Type Time
-K5 CheckBook
-K6 Check
-K7 BalanceAccount
--SC Account

I just want to read the strings that start with -K1, -K2, -K3, and so on. Can anyone help me?

  • When I run this code, I get java.io.FileReader@19821f followed by: Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 25 at java.lang.String.charAt(Unknown Source) at org.apache.lucene.analysis.StringProcessing.main(StringProcessing.java:40). I know the problem is in this line: char character = file.toString().charAt(counter); but I have no idea... Commented Jul 24, 2011 at 3:22

3 Answers

3

The problem is with this line:

char character = file.toString().charAt(counter);

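file is a reference to a FileReader, which does not override toString(), so the call falls through to Object.toString(). That method returns the class name, an @ sign, and the object's hash code in hexadecimal (here java.io.FileReader@19821f), which is 25 characters long. It never returns the contents of the file. Your loop runs up to the token count (129), not the length of that string, so it fails as soon as counter reaches 25: the valid indices are 0 to 24, and charAt(25) asks for a 26th character that does not exist. That is why you get a StringIndexOutOfBoundsException with index 25.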
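To see where the 25 comes from, here is a small sketch. It uses a StringReader as a stand-in for the FileReader so that it runs without a file on disk; that substitution and the names DefaultToStringDemo, defaultToString and isOutOfRange are my own, not from the question. Neither reader class overrides toString(), so both inherit the Object version.

```java
import java.io.Reader;
import java.io.StringReader;

// Hypothetical demo class; StringReader stands in for the question's FileReader.
class DefaultToStringDemo {

    // Rebuilds what Object.toString() returns when a class does not override it.
    static String defaultToString(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    // Returns true if charAt(index) would throw for the given string.
    static boolean isOutOfRange(String s, int index) {
        return index < 0 || index >= s.length();
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("-K1 Account");
        String text = reader.toString();

        // Prints the class name and a hash code, not the text "-K1 Account".
        System.out.println(text);
        System.out.println(text.equals(defaultToString(reader)));

        // The string from the question's output has 25 characters,
        // so index 25 is one past the last valid index (24).
        String fromQuestion = "java.io.FileReader@19821f";
        System.out.println(fromQuestion.length());
        System.out.println(isOutOfRange(fromQuestion, 25));
    }
}
```

The point of the sketch is that toString() on a reader describes the reader object itself; the file contents only become available by calling one of the read methods.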

To read the file correctly, wrap your FileReader in a BufferedReader and append each line returned by readLine() to a StringBuilder.

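// Read the whole file into memory, one line at a time
StringBuilder sb = new StringBuilder();
try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
    String s;
    while ((s = br.readLine()) != null) {
        // readLine() strips the line terminator, so add it back
        sb.append(s).append('\n');
    }
}

// Now use sb.toString() instead of file.toString(),
// and loop up to sb.length() rather than the token count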
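Once the text is read line by line, the -K1, -K2, ... entries from the question's sample file can be picked out without a tokenizer at all. The sketch below assumes that a class entry is any line that starts with a single dash followed by K; ClassLineFilter and classLines are made-up names for illustration. It accepts any Reader, so a FileReader can be passed for a real file.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the question's code.
class ClassLineFilter {

    // Returns the lines that start with "-K" (assumed to mark a class entry).
    // Lines such as "--Op1 withdraw" start with "--", so they are skipped.
    // The caller is responsible for closing the Reader.
    static List<String> classLines(Reader source) {
        BufferedReader br = new BufferedReader(source);
        return br.lines()
                 .map(String::trim) // some sample lines have trailing spaces
                 .filter(line -> line.startsWith("-K"))
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String sample = "-K1 Account \n--Op1 withdraw\n---Param1 an\n"
                      + "-K2 CheckAccount \n--SC Account\n";
        // For a real file, pass new FileReader(filename) instead of a StringReader.
        for (String line : classLines(new StringReader(sample))) {
            System.out.println(line);
        }
    }
}
```

Filtering whole lines keeps the name next to its -K marker, which is harder to do once the input has been flattened into a single string.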


2

If you want to tokenize the input file, the obvious choice is a Scanner. The Scanner class reads a given input source, splits it on whitespace by default, and can return either raw tokens (scanner.next()) or parsed values and whole lines (scanner.nextInt(), scanner.nextLine(), etc.).

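import java.io.File;
import java.io.IOException;
import java.util.Scanner;

// The class name is arbitrary; main() just needs to live inside a class
public class ScannerExample {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the Scanner (and the file) when done
        try (Scanner in = new Scanner(new File("filename.dat"))) {
            while (in.hasNext()) {
                String s = in.next(); // get the next whitespace-delimited token
                // Now s contains a token from the file
            }
        }
    }
}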

Check out Oracle's documentation of the Scanner class for more info.
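Applied to the sample input from the question, a Scanner can pull out the name that follows each -K marker. This is only a sketch, under the assumption that every -K token is followed by exactly one name token; KTokenScanner and classNames are hypothetical names. It scans a String so that it runs as-is, and a Scanner built from a File behaves the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not part of the original answer.
class KTokenScanner {

    // Collects the token that follows each token starting with "-K".
    // Assumes every such marker is followed by one name token.
    static List<String> classNames(String text) {
        List<String> names = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            while (in.hasNext()) {
                String token = in.next();
                if (token.startsWith("-K") && in.hasNext()) {
                    names.add(in.next());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String sample = "-K1 Account \n--Op1 withdraw\n---Param1 an\n"
                      + "-K5 CheckBook\n-K6 Check\n";
        for (String name : classNames(sample)) {
            System.out.println(name);
        }
    }
}
```

Tokens such as --Op1 and ---Param1 begin with two or more dashes, so the startsWith("-K") check ignores them.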


0

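import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.StringTokenizer;

public class FileTokenize {
    public static void main(String[] args) throws IOException {
        // Read every line of the input file into memory (Path.of needs Java 11+)
        final List<String> lines = Files.readAllLines(Path.of("myfile.txt"));

        // try-with-resources closes the writer even if an exception is thrown
        try (FileWriter writer = new FileWriter("output.txt")) {
            for (String data : lines) {
                // Split the line on whitespace and write one token per output line
                StringTokenizer token = new StringTokenizer(data);
                while (token.hasMoreTokens()) {
                    writer.write(token.nextToken() + "\n");
                }
            }
        }
    }
}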
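The question only wants the -K entries written out, not every token. A possible variation along the same lines is sketched here: it keeps a line when its first token starts with -K and writes the kept lines to the output file. KLineWriter, selectClassLines and writeClassLines are hypothetical names, and the rule "first token starts with -K" is my reading of the sample input.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical class; the file names are supplied by the caller.
class KLineWriter {

    // Keeps only the lines whose first whitespace-separated token starts with "-K".
    static List<String> selectClassLines(List<String> lines) {
        List<String> selected = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            if (tokens.hasMoreTokens() && tokens.nextToken().startsWith("-K")) {
                selected.add(line.trim());
            }
        }
        return selected;
    }

    // Reads the input file, filters it, and writes one selected line per output line.
    static void writeClassLines(Path input, Path output) throws IOException {
        Files.write(output, selectClassLines(Files.readAllLines(input)));
    }

    public static void main(String[] args) throws IOException {
        // Expects two arguments: the input path and the output path.
        if (args.length != 2) {
            System.err.println("usage: java KLineWriter <input> <output>");
            return;
        }
        writeClassLines(Path.of(args[0]), Path.of(args[1]));
    }
}
```

Keeping the filter separate from the file handling makes it possible to check the selection rule on an in-memory list before pointing it at a real file.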

