How to Iterate Over Rows in Pandas DataFrame

Pandas is a powerful Python library that lets you store and analyze data in tabular form, as rows and columns. These tables are called dataframes, and they make it easy to access, modify, manipulate and filter data. Sometimes, Python developers need to loop through the rows of a Pandas dataframe, and there are several ways to do this. In this article, we will learn how to iterate over rows in a Pandas dataframe.


How to Iterate Over Rows in Pandas DataFrame

Here are the common ways to iterate over dataframe rows. Let us say you have the following pandas dataframe.

import pandas as pd
data = {'ID':[1, 2, 3],'Name':['John','Jim', 'Joe'],'Marks':[92,95,94]}
df=pd.DataFrame(data)
print(df)

Here is the output you will see.

   ID  Name  Marks
0   1  John     92
1   2   Jim     95
2   3   Joe     94

In the above dataframe, we have stored 3 rows, one for each student. Each row has 3 columns of data for ID, Name and Marks.

1. Using iterrows()

Let us say you want to calculate the total marks obtained by the 3 students. For this purpose, you need to iterate over all rows in the table. Every dataframe object has an iterrows() method that lets you easily iterate over its rows. It returns an iterator that produces one row at a time, so it does not need to build all the rows in memory up front. In each iteration, it returns a tuple made up of the row's index label and the row's contents as a pandas Series. Here is an example that uses iterrows() to calculate the total marks.

total_marks=0

for index, row in df.iterrows():
    total_marks+=row['Marks']
    
print(total_marks)

In the above code, we first define a total_marks variable to store the running sum. Then we create a for loop that retrieves both the index and the row content in each iteration, and we add the value of the row's Marks column to total_marks. Lastly, we display the total marks, which is 281 for our dataframe. Please note that each item returned by iterrows() is an (index, row) tuple, so you need to unpack it into two loop variables, even if you do not use the index anywhere later. If you use a single loop variable, it holds the whole tuple, and indexing it with a column name gives an error. Here is an example to show it.

total_marks=0

for row in df.iterrows():
    total_marks+=row['Marks']
    
print(total_marks)

Here is the error message.

Traceback (most recent call last):
  File "<main.py>", line 8, in <module>
TypeError: tuple indices must be integers or slices, not str
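The error occurs because, with a single loop variable, row holds the whole (index, Series) tuple, so row['Marks'] tries to index a tuple with a string. Here is a minimal sketch of two possible fixes, assuming the same df as above: unpack the tuple and use _ as a throwaway name for the unused index, or keep one loop variable and take the row Series from position 1 of the tuple.

```python
import pandas as pd

data = {'ID': [1, 2, 3], 'Name': ['John', 'Jim', 'Joe'], 'Marks': [92, 95, 94]}
df = pd.DataFrame(data)

# Option 1: unpack the (index, Series) tuple, discarding the index with _
total_marks = 0
for _, row in df.iterrows():
    total_marks += row['Marks']
print(total_marks)  # 281

# Option 2: keep one loop variable; item[0] is the index, item[1] is the row Series
total_marks_2 = 0
for item in df.iterrows():
    total_marks_2 += item[1]['Marks']
print(total_marks_2)  # 281
```

Option 1 is the more common idiom, since the underscore makes it clear that the index is intentionally ignored.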

Although iterrows() is easy to use and returns an iterator, it is slow for large dataframes because it creates a new pandas Series object for every row. It is best suited to small dataframes.
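Another thing to watch for with iterrows() is that each row comes back as a single Series, and a Series has only one dtype, so values from columns with different dtypes are converted to a common type. The sketch below uses a small hypothetical dataframe (the qty and price columns are made up for illustration) to show an integer value coming back as a float.

```python
import pandas as pd

# Hypothetical dataframe: 'qty' holds integers, 'price' holds floats
df = pd.DataFrame({'qty': [1, 2], 'price': [1.5, 2.5]})
print(df['qty'].dtype)  # int64

# iterrows() packs each row into one Series, which can hold only one dtype,
# so the integer in 'qty' is upcast to a float
_, first_row = next(df.iterrows())
print(first_row.dtype)     # float64
print(first_row['qty'])    # 1.0

# itertuples() takes each value from its own column instead
first_tuple = next(df.itertuples(index=False))
print(first_tuple.qty)     # 1
```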

2. Using itertuples()

For large dataframes, you can use the itertuples() method instead. It is usually faster than iterrows() because it returns each row as a named tuple instead of a pandas Series, and named tuples are faster and more lightweight than Series. Also, it returns a single object per row, instead of a separate index and row as in the case of iterrows(). By default, the row's index is included as the first field of that named tuple, called Index. Like iterrows(), it is available on every dataframe. Here is how to use it.

total_marks=0

for row in df.itertuples():
    total_marks+=row.Marks
    
print(total_marks)

In the above code, we loop through the result of itertuples(). In each iteration, we add the Marks attribute of the row to total_marks. Please note that in this case we retrieve only the row, instead of the index and row as with iterrows(). Also, we need to access the column values as attributes, using dot (.) notation. For example, using row['Marks'] will give an error, because a named tuple can be indexed by integer position but not by column name.

total_marks=0

for row in df.itertuples():
    total_marks+=row['Marks']
    
print(total_marks)

Here is the error.

Traceback (most recent call last):
  File "<main.py>", line 8, in <module>
TypeError: tuple indices must be integers or slices, not str
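If you need to look up a column by name, for example when the name is stored in a variable, a named tuple still gives you a few options. Here is a minimal sketch, assuming the same df as above (the variable col is just for illustration): getattr() with the column name, the named tuple's _asdict() method, or an integer position, remembering that position 0 holds the index by default. Passing index=False to itertuples() drops that Index field.

```python
import pandas as pd

data = {'ID': [1, 2, 3], 'Name': ['John', 'Jim', 'Joe'], 'Marks': [92, 95, 94]}
df = pd.DataFrame(data)

# With the default index=True, the first field of each tuple is the index
print(next(df.itertuples())._fields)  # ('Index', 'ID', 'Name', 'Marks')

col = 'Marks'  # column name held in a variable

# 1. getattr() looks up an attribute by its string name
by_getattr = [getattr(row, col) for row in df.itertuples()]

# 2. _asdict() converts the named tuple into a dict keyed by field name
by_dict = [row._asdict()[col] for row in df.itertuples()]

# 3. Integer positions work too: Marks is at position 3 (after Index, ID, Name)
by_position = [row[3] for row in df.itertuples()]

# With index=False the Index field is dropped, so Marks moves to position 2
by_position_no_index = [row[2] for row in df.itertuples(index=False)]

print(by_getattr)
```

Attribute access (row.Marks) remains the most readable choice when the column name is known in advance; the options above are mainly useful when it is not.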

3. Using apply()

The apply() method applies a function to every row or column of a dataframe. It is very useful for performing row-wise or column-wise operations without explicitly writing a loop over the rows. Note, however, that with axis=1 it is not a truly vectorized operation: pandas still calls your Python function once for each row, passing the row as a Series, so on large datasets it is not necessarily faster than iterrows() or itertuples(), and it is much slower than operations on whole columns. Here is a simple example to add 2 marks to each student's marks using apply().

def add_marks(row):
    return row['Marks']+2

# Apply function row-wise
result = df.apply(add_marks, axis=1)
for res in result:
    print(res)

In the above code, we first define an add_marks() function that accepts a row and returns the value of its Marks column plus 2.

Then we call the apply() method on our dataframe, passing two arguments: the add_marks function object and the axis argument. We set axis to 1 to apply the function row-wise, that is, once for each row. apply() returns a new Series with the updated marks, which we store in a variable and loop through to display them. Here is the output.

94
97
96
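For a simple calculation like this one, you can skip apply() altogether and operate on the whole Marks column at once. This is what pandas means by a vectorized operation: the addition is carried out over the entire column in optimized code, instead of calling a Python function for each row. Here is a minimal sketch comparing the two, assuming the same df as above.

```python
import pandas as pd

data = {'ID': [1, 2, 3], 'Name': ['John', 'Jim', 'Joe'], 'Marks': [92, 95, 94]}
df = pd.DataFrame(data)

# Row-wise apply(): calls add_marks once per row
def add_marks(row):
    return row['Marks'] + 2

with_apply = df.apply(add_marks, axis=1)

# Vectorized: a single operation over the entire column
vectorized = df['Marks'] + 2

print(vectorized.tolist())  # [94, 97, 96]
print(with_apply.tolist() == vectorized.tolist())  # True
```

apply() remains handy when the per-row logic is too complex to express as column operations; when a column operation exists, it is usually the better choice.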

4. Using indexes

You can also loop over row positions or labels and use the iloc[] or loc[] indexers. iloc[] lets you access rows by integer position, while loc[] lets you access rows by index label. This approach is slower than the other methods, but it lets you precisely access and modify the row you want. Here is a simple example to loop through the rows of a dataframe and display each row's values.

for i in range(len(df)):
    print(f"Row {i}: {df.iloc[i]}")

Here is the output.

Row 0: ID          1
Name     John
Marks      92
Name: 0, dtype: object
Row 1: ID         2
Name     Jim
Marks     95
Name: 1, dtype: object
Row 2: ID         3
Name     Joe
Marks     94
Name: 2, dtype: object
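Because loc[] accepts both a row label and a column name, it can also be used to update values while looping. The sketch below, assuming the same df as above, gives one bonus mark to every student who scored below 95; the threshold and the bonus are made-up values for illustration. Since our dataframe has the default index 0, 1, 2, looping over df.index gives labels that loc[] accepts.

```python
import pandas as pd

data = {'ID': [1, 2, 3], 'Name': ['John', 'Jim', 'Joe'], 'Marks': [92, 95, 94]}
df = pd.DataFrame(data)

# Hypothetical rule: give 1 bonus mark to anyone below 95
for label in df.index:
    if df.loc[label, 'Marks'] < 95:
        # loc[row_label, column_name] reads and writes a single cell
        df.loc[label, 'Marks'] = df.loc[label, 'Marks'] + 1

print(df['Marks'].tolist())  # [93, 95, 95]
```

Here the dataframe is changed through df.loc[...] rather than through the row objects returned by iterrows(). Those rows may be copies, so writing to them is not guaranteed to update the dataframe.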

5. Using List Comprehension

List comprehension is one of the fastest ways to loop over the values in a dataframe. It is easy to understand, versatile and faster than most of the other looping solutions. Note that instead of working with whole rows, it iterates directly over the values of one or more columns. Here is a simple example that loops over every value in the Marks column. In each iteration, we store the value in the result list. Lastly, we call the sum() function on this list.

result = [x for x in df['Marks']]
print(sum(result))

Here is the output.

281

You can easily modify the above code to iterate over multiple columns and in each iteration, store the column values in a tuple and add it to result list.

result = [(x,y) for x,y in zip(df['Name'],df['Marks'])]
print(result)

Here is the output.

[('John', 92), ('Jim', 95), ('Joe', 94)]

In the above code, we use a separate loop variable for each column. You can also use a single variable to fetch the entire row. Here is an example to illustrate it. Here, each row is a tuple produced by zip(), and we use positions 0, 1, ... to access the value of each column in that tuple.

result = [(row[0],row[1]) for row in zip(df['Name'],df['Marks'])]
print(result)

Here is its result.

[('John', 92), ('Jim', 95), ('Joe', 94)]
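A common use of this pattern is to build a new column from one or more existing columns. As a minimal sketch, assuming the same df as above, the following adds a hypothetical Result column that labels each student using a made-up pass mark of 93.

```python
import pandas as pd

data = {'ID': [1, 2, 3], 'Name': ['John', 'Jim', 'Joe'], 'Marks': [92, 95, 94]}
df = pd.DataFrame(data)

# Hypothetical pass mark used only for this example
PASS_MARK = 93

# One value per row, built by looping over two columns in parallel
df['Result'] = [f"{name}: {'pass' if marks >= PASS_MARK else 'fail'}"
                for name, marks in zip(df['Name'], df['Marks'])]

print(df['Result'].tolist())
# ['John: fail', 'Jim: pass', 'Joe: pass']
```

The list has exactly one entry per row, in the same order as the dataframe, which is why it can be assigned directly as a new column.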

Conclusion

In this article, we have learnt several simple ways to iterate over rows in a pandas dataframe. If you are a beginner developer working with small datasets, use iterrows(). If you are working with large dataframes, use itertuples(). If you are reasonably experienced, you can also use the apply() method or list comprehensions. List comprehensions over column values are usually faster than iterrows() or itertuples(), while apply() gives you concise row-wise or column-wise code. When your task can be expressed as an operation on whole columns, a vectorized operation will usually be the fastest option of all.

FAQ

1. Which is the fastest method to iterate over Pandas dataframe?

Using itertuples() is one of the fastest ways to loop over the rows of a Pandas dataframe, and it works well even for large dataframes. If your task can be done with a vectorized operation on whole columns, such as df['Marks'].sum(), that will usually be faster still.
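Relative speed depends on your data, pandas version and machine, so it is worth measuring on your own data. Below is a rough benchmarking sketch using Python's timeit module. The 3,000-row dataframe, built by repeating the three sample rows, is an assumption for illustration, and the printed timings will vary from run to run. An assertion first checks that every approach computes the same total.

```python
import timeit

import pandas as pd

# Hypothetical test data: the three sample rows repeated 1,000 times
REPEATS = 1_000
df = pd.DataFrame({
    'ID': range(3 * REPEATS),
    'Name': ['John', 'Jim', 'Joe'] * REPEATS,
    'Marks': [92, 95, 94] * REPEATS,
})

def with_iterrows():
    return sum(row['Marks'] for _, row in df.iterrows())

def with_itertuples():
    return sum(row.Marks for row in df.itertuples())

def with_apply():
    return df.apply(lambda row: row['Marks'], axis=1).sum()

def with_list_comprehension():
    return sum([x for x in df['Marks']])

def vectorized():
    return df['Marks'].sum()

approaches = [with_iterrows, with_itertuples, with_apply,
              with_list_comprehension, vectorized]

# Every approach must produce the same total before we compare speed
totals = {f.__name__: int(f()) for f in approaches}
assert len(set(totals.values())) == 1, totals

for f in approaches:
    seconds = timeit.timeit(f, number=3)
    print(f"{f.__name__:<25} {seconds:.4f} s")
```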


Sreeram Sreenivasan

Sreeram Sreenivasan is the Founder of Ubiq. He has helped many Fortune 500 companies in the areas of BI & software development.
