
I have a file of 400+ GB like:

ID        Data ... (4000+ columns)
001 dsa
002 Data
… …
17201297 asdfghjkl

I want to split the file into per-ID chunks for faster data retrieval, like this (a rough sketch of the mapping follows the list):

mylocation/0/0/1/data.json
mylocation/0/0/2/data.json
.....
mylocation/1/7/2/0/1/2/9/7/data.json
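
In other words, every digit of the ID becomes one directory level. Roughly like this (pathForId is just an illustrative name, not part of my real code):

static Path pathForId(String baseDir, String id) {
    StringBuilder sb = new StringBuilder(baseDir);
    for (char digit : id.toCharArray()) { // each digit becomes one directory level
        sb.append('/').append(digit);
    }
    return Paths.get(sb.toString(), "data.json");
}
// pathForId("mylocation", "001")      -> mylocation/0/0/1/data.json
// pathForId("mylocation", "17201297") -> mylocation/1/7/2/0/1/2/9/7/data.json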

My code works, but whichever writer I use (closing them all after the loop ends), it takes at least 159,206 milliseconds for every 0.001% of the file-creation work.

In that case, could multithreading be an option to reduce the running time (for example, writing 100 or 1,000 files at a time)?

My current code is:

// Needs: java.io.*, java.nio.file.*, java.util.LinkedHashMap,
// java.util.concurrent.TimeUnit. fileLocation, fileName, generatedFile,
// chrNo, directory, pos, stopwatch (e.g. a started Guava Stopwatch) and
// DBFileMaker(...) are defined elsewhere in my class.
int percent = 0;
File file = new File(fileLocation + fileName);
FileReader fileReader = new FileReader(file); // to read the input file

BufferedReader bufReader = new BufferedReader(fileReader);
BufferedWriter fw = null;
LinkedHashMap<String, BufferedWriter> fileMap = new LinkedHashMap<>();
int dataCounter = 0;
String theline;

while ((theline = bufReader.readLine()) != null) {
    String generatedFilename = generatedFile + chrNo + "/" + directory + "gnomeV3.json";
    Path generatedJsonFilePath = Paths.get(generatedFilename);
    if (!Files.exists(generatedJsonFilePath)) { // create the directory and file once
        Files.createDirectories(generatedJsonFilePath.getParent());
        Files.createFile(generatedJsonFilePath);
    }
    String jsonData = DBFileMaker(chrNo, theline, pos);
    if (fileMap.containsKey(generatedFilename)) { // reuse an already open writer
        fw = fileMap.get(generatedFilename);
        fw.write(jsonData);
    } else { // first write to this file: open a writer and cache it
        fw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(generatedFilename)));
        fw.write(jsonData);
        fileMap.put(generatedFilename, fw);
    }
    if (dataCounter == 172 * percent) { // 172 rows ~ 0.001% of my known row count
        long millisec = stopwatch.elapsed(TimeUnit.MILLISECONDS);
        System.out.println("Upto: " + pos + " as " + (Double) (0.001 * percent)
                + "% completion successful." + " took: " + millisec + " milliseconds");
        percent++;
    }
    dataCounter++;
}
for (BufferedWriter generatedFiles : fileMap.values()) {
    generatedFiles.close();
}

1 Answer


That really depends on your storage. Magnetic disks really like sequential writes, so multithreading would probably hurt their performance. SSDs, however, may benefit from multithreaded writing.

What you should do is one of two things. Either split the work across two threads, where one thread creates the buffers of data to be written and the second thread only writes them to disk. That way the disk always stays busy instead of waiting for more data to be generated (sketched below).
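
A minimal sketch of that producer/consumer split, using a bounded BlockingQueue (the WriteTask class and all names here are illustrative, not taken from your code):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class WriteTask {
    final Path path;
    final String json;
    WriteTask(Path path, String json) { this.path = path; this.json = json; }
}

public class TwoThreadWriter {
    private static final WriteTask DONE = new WriteTask(null, null); // poison pill

    public static void main(String[] args) throws Exception {
        BlockingQueue<WriteTask> queue = new ArrayBlockingQueue<>(1024);

        // Consumer: the only thread that touches the disk.
        Thread writer = new Thread(() -> {
            try {
                WriteTask task;
                while ((task = queue.take()) != DONE) {
                    Files.createDirectories(task.path.getParent());
                    Files.write(task.path, task.json.getBytes(StandardCharsets.UTF_8),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                }
            } catch (InterruptedException | IOException e) {
                e.printStackTrace();
            }
        });
        writer.start();

        // Producer: stands in for the read loop that builds JSON per line.
        for (int id = 1; id <= 3; id++) {
            queue.put(new WriteTask(Paths.get("mylocation", id + ".json"),
                    "{\"id\":" + id + "}\n"));
        }
        queue.put(DONE); // signal the writer to finish
        writer.join();
    }
}

The bounded queue is the important part: if parsing outruns the disk, queue.put blocks instead of letting buffers pile up in memory.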

Or keep a single thread that generates the buffers, but use java.nio to write each buffer asynchronously while it goes on to generate the next one (also sketched below).
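
A sketch of that variant with AsynchronousFileChannel from java.nio (the file name and contents are made up for the demo):

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class AsyncWriter {
    public static void main(String[] args) throws Exception {
        Path path = Paths.get("mylocation", "demo.json");
        Files.createDirectories(path.getParent());

        try (AsynchronousFileChannel channel = AsynchronousFileChannel.open(
                path, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {

            ByteBuffer buf = ByteBuffer.wrap("{\"id\":1}".getBytes(StandardCharsets.UTF_8));

            // The write starts in the background; the Future completes when it is done.
            Future<Integer> pending = channel.write(buf, 0L);

            // ...generate the next buffer here while the previous write is in flight...

            System.out.println("Wrote " + pending.get() + " bytes");
        }
    }
}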
