0

I have a .CSV file containing 100 000 records. I need to parse a set of records and then delete them, then parse the next set of records, and so on until the end of the file. How can I do this? A code snippet would be very helpful.

I tried, but I am not able to delete the records and reuse the same CSV file with only the remaining records left in it.

10
  • what does lac mean? Commented May 4, 2016 at 15:53
  • Do you mean you want to delete the records in memory, or in the csv file? Please post your code; we may be able to spot any errors. Commented May 4, 2016 at 15:57
  • Let's just say it's 1 million; I was saying 1 lakh! Commented May 4, 2016 at 16:05
  • Once I parse through the first set of records (let us assume 5,000), I don't need them anymore. I want to delete them in the CSV file too. Commented May 4, 2016 at 16:06
  • 2
    Maybe you could read the source CSV file item by item and write the items you want to keep to a tmp file, so that the tmp file contains only the values which you want to keep. After that you can delete the source file and rename the tmp file to the source file's name. Commented May 4, 2016 at 16:48

3 Answers 3

1

This cannot be done efficiently, since CSV is a sequential file format. Say you have

"some text", "adsf"
"more text", "adfgagqwe"
"even more text", "adsfasdf"
...

and you want to remove the second line:

"some text", "adsf"
"even more text", "adsfasdf"
...

you need to move up all subsequent lines (which in your case can be 100 000 ...), which involves reading them at their old location and writing them to the new one. That is, deleting the first of 100 000 lines involves reading and writing 99 999 lines of text, which will take a while ...

It is therefore worthwhile to consider alternatives. For instance, if you are trying to process a file and want to keep track of how far you got, it is far more efficient to store the line number (or offset in bytes) you were at, and leave the input file intact. This will also prevent corrupting the file if your program crashes while deleting the lines. Another approach is to first split the file into many small files (perhaps 1000 lines each), process each file in its entirety, and then delete the file.
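The byte-offset idea could be sketched roughly as follows. This is only an illustration under some assumptions: the class name `CheckpointedCsvReader`, its methods, and the separate checkpoint file are hypothetical, the file is assumed to be UTF-8, and every record is assumed to fit on one line (no quoted fields containing line breaks). Note that `RandomAccessFile.readLine()` is unbuffered, so a faster variant would probably use a buffered reader and count bytes itself.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a CSV file in batches and remembers the position in a checkpoint file.
class CheckpointedCsvReader {

    /** A batch of lines plus the byte offset at which the next batch starts. */
    static final class Batch {
        final List<String> lines;
        final long nextOffset;

        Batch(List<String> lines, long nextOffset) {
            this.lines = lines;
            this.nextOffset = nextOffset;
        }
    }

    /** Returns the saved byte offset, or 0 if no checkpoint file exists yet. */
    static long loadOffset(Path checkpoint) throws IOException {
        if (!Files.exists(checkpoint)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(checkpoint), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }

    /** Persists the offset; call this only after the batch has been fully processed. */
    static void saveOffset(Path checkpoint, long offset) throws IOException {
        Files.write(checkpoint, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
    }

    /** Reads up to batchSize lines starting at offset; an empty batch means end of file. */
    static Batch readBatch(Path csv, long offset, int batchSize) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(csv.toFile(), "r")) {
            raf.seek(offset);
            String raw;
            while (lines.size() < batchSize && (raw = raf.readLine()) != null) {
                // readLine() maps each byte to one char, so re-decode the bytes as UTF-8.
                lines.add(new String(raw.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8));
            }
            return new Batch(lines, raf.getFilePointer());
        }
    }
}
```

A caller would load the offset once, then repeatedly call `readBatch`, process the returned lines, and call `saveOffset` with `nextOffset` until a batch comes back empty. Because the offset is saved only after a batch has been processed, a crash means at most one batch is processed again after a restart, and the input file itself is never modified.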

However, if you truly must delete lines from a CSV file, the most robust way is to read the entire file, write all records you want to keep to a new file, delete the original file, and finally rename the new file to the original file.
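A minimal sketch of that rewrite-and-rename approach, for the case where the records to delete are the first ones in the file. The class name `CsvHeadRemover` and the method `dropFirstLines` are made up for this illustration, and it again assumes UTF-8 text with one record per line. Line endings are normalized to `\n` in the rewritten file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: removes the first N lines of a CSV file by rewriting it.
class CsvHeadRemover {

    /** Rewrites csv without its first linesToDrop lines (one record per line assumed). */
    static void dropFirstLines(Path csv, long linesToDrop) throws IOException {
        // Create the temp file next to the original so the final move stays on one file system.
        Path tmp = Files.createTempFile(csv.toAbsolutePath().getParent(), "csv-", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(csv, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            long lineNo = 0;
            while ((line = in.readLine()) != null) {
                // Skip the lines that were already processed; copy everything after them.
                if (lineNo++ >= linesToDrop) {
                    out.write(line);
                    out.write('\n');
                }
            }
        }
        // Replace the original only after the new file has been written completely.
        Files.move(tmp, csv, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Each call still reads and rewrites the whole remaining file, so calling it after every batch of 5,000 records has exactly the cost described above.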


1 Comment

Thank you, I think I have got the solution from your answer; I will try working on something and see if it works. Actually, the main goal was to read records from a large file (containing about 1 million records) in an efficient way. Also, if something crashes while reading the file, I should be able to resume from where I left off.
0

You cannot delete data from the beginning or middle of a file in place. Ideally, you should generate a new file for your output. In your case, once you reach the point where the existing data should be deleted, you can create a new file, copy the remaining lines to it, and use this new file as input. The code below shows the basic copy loop:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

File infile = new File("C:\\MyInputFile.txt");
File outfile = new File("C:\\MyOutputFile.txt");
FileInputStream instream = new FileInputStream(infile);
FileOutputStream outstream = new FileOutputStream(outfile);

byte[] buffer = new byte[1024];
int length;
/* Copy the contents from the input stream to the
 * output stream using the read and write methods.
 */
while ((length = instream.read(buffer)) > 0) {
  outstream.write(buffer, 0, length);
}
// Close the input/output file streams
instream.close();
outstream.close();

1 Comment

This code fails to close the files if an exception is thrown while writing them. Using the try-with-resources statement introduced in Java 7 would fix that.
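As a rough illustration of that suggestion, the same copy loop could look like this with try-with-resources. The class name `SafeCopy` and the method `copy` are hypothetical; both streams are closed automatically, even if `read` or `write` throws.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: byte-for-byte file copy that always closes its streams.
class SafeCopy {

    static void copy(String from, String to) throws IOException {
        // Resources declared here are closed in reverse order when the block exits.
        try (InputStream in = new FileInputStream(from);
             OutputStream out = new FileOutputStream(to)) {
            byte[] buffer = new byte[1024];
            int length;
            // read() returns -1 at end of stream.
            while ((length = in.read(buffer)) != -1) {
                out.write(buffer, 0, length);
            }
        }
    }
}
```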
0

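The code below is tested and working. It erases a line in an existing CSV file by overwriting it in place with spaces, keeping the commas as column separators, so the file keeps its size and the row becomes blank. As written it blanks the first line; to erase other rows, you will have to put their row numbers in an array and seek to each of them. Please check it and let me know.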

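    import java.io.File;
    import java.io.RandomAccessFile;
    import java.nio.charset.StandardCharsets;

    File f = new File(System.getProperty("user.home") + "/Desktop/c.csv");

    RandomAccessFile ra = new RandomAccessFile(f, "rw");
    ra.seek(0);

    long p = ra.getFilePointer(); // remember where the line starts
    // readLine() maps each byte to one char, so ISO-8859-1 restores the exact bytes
    byte[] b = ra.readLine().getBytes(StandardCharsets.ISO_8859_1);

    for (int i = 0; i < b.length; i++) {
        if (b[i] != 44) { // replace everything except commas (44) ...
            b[i] = 32;    // ... with spaces (32)
        }
    }

    ra.seek(p);  // go back to the start of the line
    ra.write(b); // write a blank line with commas as column separators

    ra.close();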
