1019

I need to read a large text file of around 5-6 GB line by line using Java.

How can I do this quickly?

2
  • 5
    Here is a comparison of speed for six possible implementations. Commented Nov 15, 2016 at 9:51
  • After Shog's edit this is indeed a duplicate of stackoverflow.com/q/5800361/103167 but this one has gotten far more activity. Commented Jun 30, 2021 at 18:51

22 Answers

1258

A common pattern is to use

try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
       // process the line.
    }
}

You can read the data faster if you assume there is no character encoding to decode, e.g. 7-bit ASCII, but it won't make much difference. It is highly likely that what you do with the data will take much longer than reading it.

EDIT: A less common pattern, which keeps the line variable from leaking out of the loop's scope:

try(BufferedReader br = new BufferedReader(new FileReader(file))) {
    for(String line; (line = br.readLine()) != null; ) {
        // process the line.
    }
    // line is not visible here.
}

UPDATE: In Java 8 you can do

try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
        stream.forEach(System.out::println);
}

NOTE: You have to place the Stream in a try-with-resources block to ensure its close() method is called; otherwise the underlying file handle stays open until the garbage collector closes it much later.
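
The note above can be checked with a small sketch. It assumes a hypothetical class StreamCloseSketch and a temporary file created only for the demonstration; an onClose handler is registered so that the close becomes observable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

// Hypothetical demo class; the names are illustrative and not from the answer.
final class StreamCloseSketch {

    // Consumes the lines of a file and reports whether the stream had been
    // closed by the time the try-with-resources block finished.
    static boolean closedAfterTry(Path file) {
        AtomicBoolean closed = new AtomicBoolean(false);
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)
                .onClose(() -> closed.set(true))) {
            lines.count(); // terminal operation that consumes the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return closed.get();
    }

    // Writes a small temporary file for the demonstration.
    static Path writeTempFile(String content) {
        try {
            Path file = Files.createTempFile("lines-demo", ".txt");
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTempFile("one\ntwo\nthree\n");
        System.out.println("closed after try: " + closedAfterTry(file));
    }
}
```

Without the try-with-resources block, nothing in this sketch would call close(), so the handler would not run when the method returns.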


35 Comments

What does this pattern look like with proper exception handling? I note that br.close() throws IOException, which seems surprising -- what could happen when closing a file that is opened for read, anyway? FileReader's constructor might throw a FileNotFound exception.
If I have a 200MB file and it can read at 90MB/s then I expect it to take ~3s? Mine seem to take minutes, with this "slow" way of reading. I am on an SSD so read speeds should not be a problem?
@JiewMeng SO I would suspect something else you are doing is taking time. Can you try just reading the lines of the file and nothing else.
Why not for(String line = br.readLine(); line != null; line = br.readLine()) Btw, in Java 8 you can do try( Stream<String> lines = Files.lines(...) ){ for( String line : (Iterable<String>) lines::iterator ) { ... } } Which is hard not to hate.
@AleksandrDubinsky The problem I have with closures in Java 8 is that it very easily makes the code more complicated to read (as well as being slower) I can see lots of developers overusing it because it is "cool".
186

Look at this blog:

The buffer size may be specified, or the default size may be used. The default is large enough for most purposes.

// Open the file
FileInputStream fstream = new FileInputStream("textfile.txt");

// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));

String strLine;

//Read File Line By Line
while ((strLine = br.readLine()) != null)   {
  // Print the content on the console
  System.out.println (strLine);
}

//Close the input stream
in.close();

5 Comments

My file is 1.5 Gig and it's not possible to read the file using your answer!
@AboozarRajabi Of course it is possible. This code can read any text file.
Downvoted for poor quality link. There is a completely pointless DataInputStream, and the wrong stream is closed. Nothing wrong with the Java Tutorial, and no need to cite arbitrary third-party Internet rubbish like this.
I'd ditch the comments, you have 4 lines of 100% redundant comments for 6 lines of code.
@user207421 Can you explain your last comment?
134

Since Java 8 (released March 2014), you can use streams:

try (Stream<String> lines = Files.lines(Paths.get(filename), Charset.defaultCharset())) {
  lines.forEachOrdered(line -> process(line));
}

Printing all the lines in the file:

try (Stream<String> lines = Files.lines(file, Charset.defaultCharset())) {
  lines.forEachOrdered(System.out::println);
}

6 Comments

Use StandardCharsets.UTF_8, use Stream<String> for conciseness, and avoid using forEach() and especially forEachOrdered() unless there's a reason.
Why avoid forEach()? Is it bad?
If I use forEach instead of forEachOrdered, the lines might be printed out of order, mightn't they?
@steventrouble Take a look at: stackoverflow.com/questions/16635398/… It's not bad if you pass a short function reference like forEach(this::process), but it gets ugly if you write blocks of code as lambdas inside forEach().
@msayag, You're right, you need forEachOrdered in order to execute in-order. Be aware that you won't be able to parallelize the stream in that case, although I've found that parallelization doesn't turn on unless the file has thousands of lines.
40

Here is a sample with full error handling and supporting charset specification for pre-Java 7. With Java 7 you can use try-with-resources syntax, which makes the code cleaner.

If you just want the default charset you can skip the InputStream and use FileReader.

InputStream ins = null; // raw byte-stream
Reader r = null; // cooked reader
BufferedReader br = null; // buffered for readLine()
try {
    String s;
    if (true) {
        String data = "#foobar\t1234\n#xyz\t5678\none\ttwo\n";
        ins = new ByteArrayInputStream(data.getBytes());
    } else {
        ins = new FileInputStream("textfile.txt");
    }
    r = new InputStreamReader(ins, "UTF-8"); // leave charset out for default
    br = new BufferedReader(r);
    while ((s = br.readLine()) != null) {
        System.out.println(s);
    }
}
catch (Exception e)
{
    System.err.println(e.getMessage()); // handle exception
}
finally {
    if (br != null) { try { br.close(); } catch(Throwable t) { /* ensure close happens */ } }
    if (r != null) { try { r.close(); } catch(Throwable t) { /* ensure close happens */ } }
    if (ins != null) { try { ins.close(); } catch(Throwable t) { /* ensure close happens */ } }
}

Here is the Groovy version, with full error handling:

File f = new File("textfile.txt");
f.withReader("UTF-8") { br ->
    br.eachLine { line ->
        println line;
    }
}

2 Comments

What does a ByteArrayInputStream fed by a string literal have to do with reading a large text file?
absolutely useless closes. There is zero reason to close every stream. If you close any of those streams you automatically close all other streams...
29

I documented and tested 10 different ways to read a file in Java and then ran them against each other by making them read in test files from 1KB to 1GB. Here are the fastest 3 file reading methods for reading a 1GB test file.

Note that when running the performance tests I didn't output anything to the console since that would really slow down the test. I just wanted to test the raw reading speed.

1) java.nio.file.Files.readAllBytes()

Tested in Java 7, 8, 9. This was overall the fastest method. Reading a 1GB file was consistently just under 1 second.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ReadFile_Files_ReadAllBytes {
  public static void main(String [] pArgs) throws IOException {
    String fileName = "c:\\temp\\sample-1GB.txt";
    File file = new File(fileName);

    byte [] fileBytes = Files.readAllBytes(file.toPath());
    char singleChar;
    for(byte b : fileBytes) {
      singleChar = (char) b;
      System.out.print(singleChar);
    }
  }
}

2) java.nio.file.Files.lines()

This was tested successfully in Java 8 and 9, but it won't work in Java 7, which has neither Files.lines() nor lambda expressions. It took about 3.5 seconds to read in a 1GB file, which put it in second place for reading larger files.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines {
  public static void main(String[] pArgs) throws IOException {
    String fileName = "c:\\temp\\sample-1GB.txt";
    File file = new File(fileName);

    try (Stream<String> linesStream = Files.lines(file.toPath())) {
      linesStream.forEach(line -> {
        System.out.println(line);
      });
    }
  }
}

3) BufferedReader

Tested to work in Java 7, 8, 9. This took about 4.5 seconds to read in a 1GB test file.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_BufferedReader_ReadLine {
  public static void main(String [] args) throws IOException {
    String fileName = "c:\\temp\\sample-1GB.txt";
    FileReader fileReader = new FileReader(fileName);

    try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
      String line;
      while((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}

You can find the complete rankings for all 10 file reading methods here.
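
As a rough illustration of timing raw reading speed without console output, here is a minimal sketch. It is not the benchmark code used for the numbers above; the class name ReadTimingSketch and the generated sample file are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical timing harness; not the benchmark code behind the answer's numbers.
final class ReadTimingSketch {

    // Reads every line and only counts them, so console output is not timed.
    static long countLines(Path file) {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Creates a temporary file with the given number of short lines.
    static Path writeSampleFile(int lineCount) {
        try {
            Path file = Files.createTempFile("timing-demo", ".txt");
            StringBuilder content = new StringBuilder();
            for (int i = 0; i < lineCount; i++) {
                content.append("line ").append(i).append('\n');
            }
            Files.write(file, content.toString().getBytes(StandardCharsets.UTF_8));
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Use the path given on the command line, or a small generated sample.
        Path file = args.length > 0 ? Paths.get(args[0]) : writeSampleFile(100_000);
        long start = System.nanoTime();
        long lines = countLines(file);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lines + " lines in " + millis + " ms");
    }
}
```

A single run like this is only indicative: JIT warm-up and the operating system's file cache can change the result considerably between runs.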

6 Comments

Your guide is amazing :)
You are mostly timing System.out.print/println() here; you are also assuming the file will fit into memory in your first two cases.
Fair enough. Maybe I could've made those assumptions more explicit in my answer.
the question asked for reading line by line, only last method qualifies...
@eis Given that he tested 10 ways to read a file and the third fastest is line-by-line, it can be assumed reasonably that the third method shown here is also the fastest way to read a file line-by-line. I would argue then that he not only fully answered the question, but gave additional information as well which is quite useful to know.
22

What you can do is scan the entire text using Scanner and go through the text line by line. Of course you should import the following:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public static void readText() throws FileNotFoundException {
    Scanner scan = new Scanner(new File("samplefilename.txt"));
    while(scan.hasNextLine()){
        String line = scan.nextLine();
        //Here you can manipulate the string the way you want
    }
}

Scanner basically scans all the text. The while loop is used to traverse through the entire text.

The .hasNextLine() function is a boolean that returns true if there are still more lines in the text. The .nextLine() function gives you an entire line as a String which you can then use the way you want. Try System.out.println(line) to print the text.

Side Note: .txt is the file type text.

2 Comments

Shouldn't the method declaration be ´public static void readText() throws FileNotFoundException{´ rather than ´public static void readText throws FileNotFoundException(){´?
This is considerably slower than BufferedReader.readLine(), and he asked for the best-performing method.
20

In Java 8, you could do:

try (Stream<String> lines = Files.lines (file, StandardCharsets.UTF_8))
{
    for (String line : (Iterable<String>) lines::iterator)
    {
        // process the line
    }
}

Some notes: The stream returned by Files.lines (unlike most streams) needs to be closed. For the reasons mentioned here I avoid using forEach(). The strange code (Iterable<String>) lines::iterator casts a Stream to an Iterable.
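
The conversion can also be written without the cast by assigning the method reference to a local Iterable variable. The following is a minimal sketch of that variant; the class IterableLinesSketch, its method names, and the temporary file are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical demo class; the names are illustrative and not from the answer.
final class IterableLinesSketch {

    // Returns the length of the longest line, iterating with a plain for-each loop.
    static int longestLineLength(Path file) {
        int longest = 0;
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            // Assigning the method reference gives the same conversion as the cast.
            Iterable<String> iterable = lines::iterator;
            for (String line : iterable) {
                longest = Math.max(longest, line.length());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return longest;
    }

    // Writes a small temporary file for the demonstration.
    static Path writeTempFile(String content) {
        try {
            Path file = Files.createTempFile("iterable-demo", ".txt");
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTempFile("a\nbbbb\ncc\n");
        System.out.println(longestLineLength(file));
    }
}
```

The Iterable obtained this way can be iterated only once, because the underlying stream can be consumed only once.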

7 Comments

By not implementing Iterable this code is definitively ugly although useful. It needs a cast (ie (Iterable<String>)) to work.
How can I skip the first line with this method?
@qed for(String line : (Iterable<String>) lines.skip(1)::iterator)
If you’re not intending to actually use Stream features, using Files.newBufferedReader instead of Files.lines and repeatedly calling readLine() until null instead of using constructs like (Iterable<String>) lines::iterator seems to be much simpler…
@user207421 Why do you say it reads the file into memory? The javadoc says, Unlike readAllLines, [File.lines] does not read all lines into a List, but instead populates lazily as the stream is consumed... The returned stream encapsulates a Reader.
18

FileReader won't let you specify the encoding; use InputStreamReader instead if you need to specify it:

// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(filePath), "Cp1252"))) {
    String line;
    while ((line = br.readLine()) != null) {
        // process the line.
    }
} catch (IOException e) {
    e.printStackTrace();
}

If you imported this file from Windows, it might have ANSI encoding (Cp1252), so you have to specify the encoding.
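
To illustrate why the charset matters, here is a minimal sketch that writes the word "café" as windows-1252 bytes and reads it back. It uses Files.newBufferedReader (Java 7+) rather than the InputStreamReader shown above; the class CharsetReadSketch and its helpers are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; the names are illustrative and not from the answer.
final class CharsetReadSketch {

    // Reads the first line of a file, decoding it with the given charset.
    static String readFirstLine(Path file, Charset charset) {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes raw bytes to a temporary file for the demonstration.
    static Path writeTempFile(byte[] bytes) {
        try {
            Path file = Files.createTempFile("charset-demo", ".txt");
            Files.write(file, bytes);
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // In windows-1252, the letter e with an acute accent is the single byte 0xE9.
        Path file = writeTempFile("caf\u00e9".getBytes(cp1252));
        System.out.println(readFirstLine(file, cp1252));
    }
}
```

Decoding the same bytes as UTF-8 would not yield the original text, since a lone 0xE9 byte is not a valid UTF-8 sequence.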

Comments

16

In Java 7:

String folderPath = "C:/folderOfMyFile";
Path path = Paths.get(folderPath, "myFileName.csv"); //or any text file eg.: txt, bat, etc
Charset charset = Charset.forName("UTF-8");

try (BufferedReader reader = Files.newBufferedReader(path, charset)) {
  String line;
  while ((line = reader.readLine()) != null) {
    //separate all csv fields into string array
    String[] lineVariables = line.split(","); 
  }
} catch (IOException e) {
    System.err.println(e);
}

5 Comments

be aware! using line.split this way will NOT parse properly if a field contains a comma and it is surrounded by quotes. This split will ignore that and just separate the field in chunks using the internal comma. HTH, Marcelo.
CSV: Comma Separated Values file, thus you shouldn't use comma in a csv field, unless you mean to add another field. So, use split for comma token in java when parsing a CSV file is perfectly fine and right
Diego, this is not correct. The only CSV standard (RFC 4180) specifically says "Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes."
Use StandardCharsets.UTF_8 to avoid the checked exception in Charset.forName("UTF-8")
Thank you, "Diego Duarte", for your comment; I must say I agree with what "serg.nechaev" replied. I see commas embedded in CSV files all the time, and people expect them to be accepted. With all due respect, and with a big thanks to "serg.nechaev": IMHO you are right. Cheers, everyone.
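
The quoting issue raised in these comments can be sketched as follows. This is a simplified, hypothetical splitter (the class CsvSplitSketch is not from the answer) that honours double-quoted fields and doubled quotes on a single line; it does not handle fields that contain line breaks, which a full RFC 4180 parser would need to support.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a simplified sketch, not a complete CSV parser.
final class CsvSplitSketch {

    // Splits one CSV line on commas, keeping commas inside double quotes.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is an escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitCsvLine("a,\"b,c\",d"));
    }
}
```

For real data, a maintained CSV library is usually a safer choice than a hand-written splitter.
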
15

In Java 8, there is also an alternative to using Files.lines(). If your input source isn't a file but something more abstract like a Reader or an InputStream, you can stream the lines via the BufferedReader's lines() method.

For example:

try (BufferedReader reader = new BufferedReader(...)) {
  reader.lines().forEach(line -> processLine(line));
}

will call processLine() for each input line read by the BufferedReader.
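
As a self-contained variant of the snippet above, here is a minimal sketch that feeds a StringReader to BufferedReader.lines(). The class ReaderLinesSketch and the method countNonBlankLines are hypothetical names chosen for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are illustrative and not from the answer.
final class ReaderLinesSketch {

    // Counts the non-blank lines supplied by any Reader, not just a file.
    static long countNonBlankLines(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.lines()
                    .filter(line -> !line.trim().isEmpty())
                    .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The input has three lines, one of them empty.
        System.out.println(countNonBlankLines(new StringReader("alpha\n\nbeta\n"))); // prints 2
    }
}
```

The same method works unchanged with an InputStreamReader wrapped around a socket or other InputStream.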

Comments

13

For reading a file with Java 8

package com.java.java8;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

/**
 * The Class ReadLargeFile.
 *
 * @author Ankit Sood Apr 20, 2017
 */
public class ReadLargeFile {

    /**
     * The main method.
     *
     * @param args
     *            the arguments
     */
    public static void main(String[] args) {
        // try-with-resources closes the stream and the underlying file handle
        try (Stream<String> stream = Files.lines(Paths.get("C:\\Users\\System\\Desktop\\demoData.txt"))) {
            stream.forEach(System.out::println);
        }
        catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

Comments

10

You can use Scanner class

Scanner sc=new Scanner(file);
sc.nextLine();

5 Comments

@Tim 'Bomb horribly' is not a term I recognize in CS. What exactly do you mean?
Bog down, execute very slowly, most likely crash. I probably should avoid idioms on this site ;)
@Tim Why would it do so?
Using Scanner is fine, but this answer does not include the full code to use it properly.
@Tim This code will neither 'bomb horribly' nor 'bog down' nor 'execute very slowly' nor 'most likely crash'. As a matter of fact as written it will only read one line, almost instantaneously. You can read megabytes per second this way, although BufferedReader.readLine() is certainly several times as fast. If you think otherwise please provide your reasons.
8

Java 9:

try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    stream.forEach(System.out::println);
}

5 Comments

I think you have to System.getProperty("os.name").equals("Linux")
Don't compare strings with == !
This is the canonical Java 8 example, as already posted by others. Why do you claim that this is “Java-9”?
@Holger memory mapped files that he forgot to mention may be?
to process it line by line you can do try (Stream<String> stream = Files.lines(Paths.get(inputFile))) { stream.forEach((line) -> { System.out.println(line); }); }
7

You need to use the readLine() method of the BufferedReader class. Create a new object of that class, call this method on it, and store each returned line in a String.

BufferedReader Javadoc

Comments

6

Here is a clear way to achieve this. For example, if you have dataFile.txt in your current directory:

import java.io.*;
import java.util.Scanner;
import java.io.FileNotFoundException;

public class readByLine
{
    public readByLine() throws FileNotFoundException
    {
        Scanner linReader = new Scanner(new File("dataFile.txt"));

        while (linReader.hasNextLine())
        {
            String line = linReader.nextLine();
            System.out.println(line);
        }
        linReader.close();

    }

    public static void main(String args[])  throws FileNotFoundException
    {
        new readByLine();
    }
}

The output looks like this: [image of the program's output]

2 Comments

Why is it clearer? And don't post pictures of text here. Post the text.
You posted a picture. It is a picture of text. You could have cut and pasted the text directly into this page. Nobody said anything about posting programs. Posting pictures of text is a waste of your time, which I don't care about, and our bandwidth, which I do.
3
BufferedReader br;
FileInputStream fin;
try {
    fin = new FileInputStream(fileName);
    br = new BufferedReader(new InputStreamReader(fin));

    /*Path pathToFile = Paths.get(fileName);
    br = Files.newBufferedReader(pathToFile,StandardCharsets.US_ASCII);*/

    String line = br.readLine();
    while (line != null) {
        String[] attributes = line.split(",");
        Movie movie = createMovie(attributes);
        movies.add(movie);
        line = br.readLine();
    }
    fin.close();
    br.close();
} catch (FileNotFoundException e) {
    System.out.println("Your Message");
} catch (IOException e) {
    System.out.println("Your Message");
}

It works for me. Hope It will help you too.

Comments

3

You can use streams to do it more concisely:

Files.lines(Paths.get("input.txt")).forEach(s -> stringBuffer.append(s));

1 Comment

I agree that it is actually fine. I guess people dislike it because of the strange StringBuffer choice (StringBuilder is generally preferred, even though it might just be a bad name for the variable), and also because it is already mentioned above.
2

I usually write the reading routine in a straightforward way:

void readResource(InputStream source) throws IOException {
    BufferedReader stream = null;
    try {
        stream = new BufferedReader(new InputStreamReader(source));
        while (true) {
            String line = stream.readLine();
            if(line == null) {
                break;
            }
            //process line
            System.out.println(line);
        }
    } finally {
        closeQuiet(stream);
    }
}

static void closeQuiet(Closeable closeable) {
    if (closeable != null) {
        try {
            closeable.close();
        } catch (IOException ignore) {
        }
    }
}

Comments

1

Using the org.apache.commons.io package gave better performance, especially in legacy code that uses Java 6 and below.

Java 7 has a better API with less exception handling and more useful methods:

LineIterator lineIterator = null;
try {
    lineIterator = FileUtils.lineIterator(new File("/home/username/m.log"), "windows-1256"); // The second parameter is optional
    while (lineIterator.hasNext()) {
        String currentLine = lineIterator.next();
        // Some operation
    }
}
finally {
    LineIterator.closeQuietly(lineIterator);
}

Maven

<!-- https://mvnrepository.com/artifact/commons-io/commons-io -->
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.6</version>
</dependency>

Comments

-1

You can use this code:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class ReadTextFile {

    public static void main(String[] args) throws IOException {

        try {

            File f = new File("src/com/data.txt");

            BufferedReader b = new BufferedReader(new FileReader(f));

            String readLine = "";

            System.out.println("Reading file using Buffered Reader");

            while ((readLine = b.readLine()) != null) {
                System.out.println(readLine);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}

2 Comments

An explanation would be in order.
Specifically an explanation of why this constitutes 'quickly', as opposed to all the other ways of doing it. You don't need to initialize the readLine variable.
-1

You can also use Apache Commons IO:

File file = new File("/home/user/file.txt");
try {
    List<String> lines = FileUtils.readLines(file);
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

1 Comment

FileUtils.readLines(file) is a deprecated method. Additionally, the method invokes IOUtils.readLines, which uses a BufferedReader and ArrayList. This is not a line-by-line method, and certainly not one that would be practical for reading several GB.
-2

For Android developers ending up here (who use Kotlin):

val myFileUrl = object{}.javaClass.getResource("/vegetables.txt")
val file = File(myFileUrl.toURI())
file
    .bufferedReader()
    .lineSequence()
    .forEach(::println)

Or:

val myFileUrl = object{}.javaClass.getResource("/vegetables.txt")
val file = File(myFileUrl.toURI())
file.useLines { lines ->
    lines.forEach(::println)
}

Notes:

  • The vegetables.txt file should be in your classpath (for example, in src/main/resources directory)

  • The above solutions all treat the file encodings as UTF-8 by default. You can specify your desired encoding as the argument for the functions.

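  • The second solution (useLines) does not need any further action like closing the file or reader; the Kotlin standard library takes care of it. In the first solution, lineSequence() does not close the reader, so wrap the reader in use { } if it must be closed promptly.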

Comments
