I'm dealing with large files that don't fit in memory, so I'm using the iterator functionality of pandas' read_csv and processing a single chunk at a time.

pd.read_csv(csv_file_name, encoding='utf-8', chunksize=chunk_size, iterator=True,
            engine='c', error_bad_lines=False, low_memory=False)

While processing, I'd like to print the number of rows processed so far and what percentage of the total number of rows that represents.

To get the total number of rows in a pandas DataFrame, I use

len(df.index)

But when I try this on the iterator returned by read_csv, I get

AttributeError: 'TextFileReader' object has no attribute 'index'

Is there a way to get the total row count without iterating over every chunk?

You won't know about bad lines until you process the chunk, so at best you're only going to get an estimate of the final total. If an estimate is good enough, you might as well just print the number of lines in the CSV: see stackoverflow.com/q/41553467/2750819 if you need help with that. Commented Oct 28, 2019 at 10:56

1 Answer

Two possible workarounds I would use:

  1. Use the usecols option and read the file in with just one column. That may be small enough to read in one go; if not, iterate over it in chunks to count the number of rows.

  2. Use the Linux command wc -l to count the number of lines. If the file has a header, subtract one, e.g.

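import subprocess

# capture_output=True is required; without it wc_output.stdout is None
wc_output = subprocess.run(['wc', '-l', csv_file_name], capture_output=True, text=True, check=True)
# wc_output.stdout will be of format ` N_lines filename`
# subtract 1 to remove header
n_rows = int(wc_output.stdout.split()[0]) - 1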
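For the first workaround, a minimal sketch might look like the following. It assumes a well-formed CSV with a header row; the function names count_rows and process_with_progress and the default chunk size are hypothetical, chosen only for illustration. If the file contains malformed lines that get skipped during the real pass, the total is only an estimate.

```python
import pandas as pd


def count_rows(csv_path, chunk_size=100_000):
    """Count data rows by parsing only the first column, chunk by chunk."""
    total = 0
    # usecols=[0] keeps only the first column, so each chunk stays small.
    for chunk in pd.read_csv(csv_path, usecols=[0], chunksize=chunk_size):
        total += len(chunk)
    return total


def process_with_progress(csv_path, chunk_size=100_000):
    """Process the file in chunks, printing progress; returns rows processed."""
    total = count_rows(csv_path, chunk_size)
    done = 0
    for chunk in pd.read_csv(csv_path, chunksize=chunk_size):
        # ... real per-chunk processing would go here (placeholder) ...
        done += len(chunk)
        pct = 100.0 * done / total if total else 100.0
        print(f"{done}/{total} rows ({pct:.1f}%)")
    return done
```

This reads the file twice, once to count and once to process, which is the price of knowing the total up front.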

1 Comment

Kent Shikama's comment has a link to a question with some better suggestions than mine :-) I have upvoted his comment.
