56

I am iterating over a pandas dataframe using itertuples. I also want to capture the row number while iterating:

for row in df.itertuples():
    print(row.name)

Expected output :

1 larry
2 barry
3 michael

1, 2, 3 are row numbers. I want to get the row number without maintaining a counter myself. Is there an easy way to achieve this using pandas?

5
  • Refusing to use enumerate, a common pattern in Python for these cases, seems weird. I would use it. Otherwise, df.reset_index() will give you a 0-based index, so the row number is the index value of the row plus 1. Commented Apr 5, 2017 at 3:28
  • 1
    You should use iterrows like in this SO post Commented Apr 5, 2017 at 5:00
  • @Boud where does it say they refuse to use enumerate? Commented Apr 16, 2018 at 0:07
  • Does this answer your question? What is the most efficient way to loop through dataframes with pandas? Commented Jan 4, 2021 at 18:16
  • 2
    @Cheng the issue with iterrows is that dtypes may not be consistently maintained across rows. This can be very problematic. Commented Jun 7, 2021 at 23:50

4 Answers

78

When using itertuples you get a named tuple for every row. By default, you can access the index value for that row with row.Index.

If the index value isn't what you are looking for, you can use enumerate:

for i, row in enumerate(df.itertuples(), 1):
    print(i, row.name)

enumerate takes the place of an ugly counter construct.
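
To make the difference between the two options concrete, here is a minimal sketch. The sample frame and its non-default index (10, 20, 30) are made up for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical sample data; the non-default index makes the difference visible.
df = pd.DataFrame({"name": ["larry", "barry", "michael"]}, index=[10, 20, 30])

# row.Index is the index label, which is not necessarily the row position.
labels = [row.Index for row in df.itertuples()]

# enumerate(..., 1) numbers the rows from 1 regardless of what the index holds.
numbered = [(i, row.name) for i, row in enumerate(df.itertuples(), 1)]

for i, name in numbered:
    print(i, name)
```

With a default RangeIndex, row.Index + 1 would give the same numbers as enumerate; once the frame has been filtered, sorted, or given a custom index, only enumerate still counts rows.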


2 Comments

Why are counters ugly?
Very ugly: for (int i = 0; i < arr.length; i++) instead of enumerate.
40
for row in df.itertuples():
    print(getattr(row, 'Index'), getattr(row, 'name'))
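
One case where getattr may be preferable to plain attribute access is when the column name is only known at runtime as a string. The sketch below assumes a made-up frame and a hypothetical helper, column_values; it only works for column names that are valid Python identifiers.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"name": ["larry", "barry"], "age": [30, 40]})


def column_values(frame, column):
    """Collect (index label, value) pairs for a column chosen at runtime."""
    # getattr lets the field name be a string variable instead of a literal.
    return [(row.Index, getattr(row, column)) for row in frame.itertuples()]


print(column_values(df, "age"))
```

When the column name is fixed in the source code, row.Index and row.age are equivalent and easier to read.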

5 Comments

Your answer may be correct, but an explanation would help other readers. For more info read stackoverflow.com/help/how-to-answer
why getattr? just use row.Index, row.name
if the column name is being dynamically synthesized the getattr works nicely as it allows for a string.
Useful if you want to work with column labels as strings.
No longer works with Python 3.12.
13

For column names that aren't valid Python identifiers, use:

for i, row in enumerate(df.itertuples(index=False)):
    print(i, row[df.columns.get_loc('My nasty - column / name')])

If you don't specify index=False, the index occupies position 0 of each tuple, so the column before the one named will be read.
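
The off-by-one behaviour can be checked with a small sketch. The frame below is invented for illustration; the column name is the one used above. It also shows that itertuples replaces invalid field names with positional ones, which is why attribute access does not work for such columns.

```python
import pandas as pd

# Hypothetical sample data with one column name that is not a valid identifier.
df = pd.DataFrame({
    "name": ["larry", "barry"],
    "My nasty - column / name": ["x", "y"],
})

pos = df.columns.get_loc("My nasty - column / name")

# index=False: tuple positions line up with column positions.
without_index = [row[pos] for row in df.itertuples(index=False)]

# Default (index=True): the index sits at position 0, so the same
# position now points at the previous column.
off_by_one = [row[pos] for row in df.itertuples()]

# Shifting by one compensates for the index field.
with_index = [row[pos + 1] for row in df.itertuples()]

# Invalid names are renamed positionally in the named tuple.
fields = next(df.itertuples(index=False))._fields

print(without_index, off_by_one, with_index, fields)
```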

4 Comments

Any reason for the downvotes? Not bothered, just curious. Have added enumerate, in case that's it.
I don't think named tuples permit string accessors.
Not sure if I'm missing your point John, but this is working code that successfully solved my issue. get_loc returns the index of the column, not a string.
You’re right, I must have copied it wrong. Thanks for the clarification!
0

If you have a large data frame (for example, a million rows), working with itertuples is much faster than working with iterrows.

In my experience, both are easy to work with, and both give you straightforward access to the values of a data frame.
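
A rough way to compare the two on your own machine is sketched below. The frame size, the column names a and b, and the two helper functions are assumptions chosen for illustration; actual timings depend on the pandas version, the dtypes, and the hardware.

```python
import timeit

import pandas as pd

# Hypothetical data: kept small so the comparison runs quickly.
df = pd.DataFrame({"a": range(2000), "b": range(2000)})


def sum_with_itertuples(frame):
    # Each row is a lightweight named tuple.
    return sum(row.a + row.b for row in frame.itertuples(index=False))


def sum_with_iterrows(frame):
    # Each row is built as a full Series.
    return sum(row["a"] + row["b"] for _, row in frame.iterrows())


t_tuples = timeit.timeit(lambda: sum_with_itertuples(df), number=3)
t_rows = timeit.timeit(lambda: sum_with_iterrows(df), number=3)
print(f"itertuples: {t_tuples:.4f}s  iterrows: {t_rows:.4f}s")
```

Both functions compute the same total, so the only thing being compared is the cost of the iteration itself.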
