I have a file of 400+ GB like:
ID Data ... (4000+ columns)
001 dsa
002 Data
… …
17201297 asdfghjkl
I want to split the file by ID for faster data retrieval, like:
mylocation/0/0/1/data.json
mylocation/0/0/2/data.json
.....
mylocation/1/7/2/0/1/2/9/7/data.json
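To make the layout concrete: the ID-to-path mapping is just one directory level per digit of the ID. A minimal sketch (the method name `idToPath` is illustrative, not from my actual code):

```java
public class IdPath {
    // Sketch: map an ID such as "17201297" to
    // "mylocation/1/7/2/0/1/2/9/7/data.json" -- one directory per digit.
    static String idToPath(String baseDir, String id) {
        StringBuilder sb = new StringBuilder(baseDir);
        for (char digit : id.toCharArray()) {
            sb.append('/').append(digit);
        }
        return sb.append("/data.json").toString();
    }
}
```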
My code works, but no matter which writer I use (closing them all after the loop ends), it takes at least 159,206 milliseconds to create the files for just 0.001% of the input.
Given that, could multithreading reduce the total time (e.g., by writing 100 or 1,000 files at a time)?
My current code is:
int percent = 0;
File file = new File(fileLocation + fileName);
FileReader fileReader = new FileReader(file); // to read input file
BufferedReader bufReader = new BufferedReader(fileReader);
BufferedWriter fw = null;
LinkedHashMap<String, BufferedWriter> fileMap = new LinkedHashMap<>();
int dataCounter = 0;
String theline;
while ((theline = bufReader.readLine()) != null) {
String generatedFilename = generatedFile + chrNo + "/" + directory + "gnomeV3.json";
Path generatedJsonFilePath = Paths.get(generatedFilename);
if (!Files.exists(generatedJsonFilePath)) {// create directory
Files.createDirectories(generatedJsonFilePath.getParent());
Files.createFile(generatedJsonFilePath);
}
String jsonData = DBFileMaker(chrNo, theline, pos);
if (fileMap.containsKey(generatedFilename)) {
fw = fileMap.get(generatedFilename);
fw.write(jsonData);
} else {
fw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(generatedFilename)));
fw.write(jsonData);
fileMap.put(generatedFilename, fw);
}
if (dataCounter == 172 * percent) {// 172 rows is ~0.001% of my known row count
long millisec = stopwatch.elapsed(TimeUnit.MILLISECONDS);
System.out.println("Upto: " + pos + " as " + (0.001 * percent)
+ "% completion successful." + " took: " + millisec + " milliseconds");
percent++;
}
dataCounter++;
}
bufReader.close();
for (BufferedWriter generatedFiles : fileMap.values()) {
generatedFiles.close();
}
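To make the multithreading question concrete, here is the kind of batched writing I have in mind: group lines by target file first, then hand each file to a thread pool. This is a sketch under my own assumptions (the `writeAll` method and the grouping step are mine, not existing code), not a claim that it is the right design:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelWriter {
    // Sketch: write each (file -> lines) group on a pool thread.
    // Because each file is owned by exactly one task, no two threads
    // ever write the same file concurrently.
    static void writeAll(Map<Path, List<String>> linesByFile, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        for (Map.Entry<Path, List<String>> entry : linesByFile.entrySet()) {
            pending.add(pool.submit(() -> {
                Path target = entry.getKey();
                Files.createDirectories(target.getParent());
                Files.write(target, entry.getValue(), StandardCharsets.UTF_8);
                return null;
            }));
        }
        try {
            for (Future<?> f : pending) f.get(); // surfaces any IOException
        } finally {
            pool.shutdown();
        }
    }
}
```

Is something along these lines a reasonable way to parallelize the file creation, or is the bottleneck likely disk I/O rather than threading?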