df iterrows pandas with code examples

Introduction to df.iterrows() in Pandas

df.iterrows() is a built-in method of the Pandas DataFrame that iterates over the rows of a data frame and yields each row as an (index, Series) pair. In other words, it lets you loop through a data frame one row at a time and work with the values in each row. It is most useful for small data frames or for row-wise logic that is awkward to express with vectorized Pandas operations; on large data frames it is considerably slower than the vectorized alternatives.

Syntax:

df.iterrows()

Here, df is the data frame that you want to iterate over. The iterrows() function returns an iterator that yields index and row data for each row. The index and row data are returned as a tuple, with the first element being the index and the second element being the row data as a Series object.
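
As a minimal sketch of the pair structure described above, the snippet below pulls the first item from the iterator with next() and inspects it. The two-row data frame is made up purely for illustration.

```python
import pandas as pd

# Hypothetical two-row data frame, used only for illustration
df = pd.DataFrame({'Name': ['John', 'Jane'], 'Age': [30, 28]})

# iterrows() returns a lazy iterator; next() pulls the first (index, Series) pair
first_pair = next(df.iterrows())
index, row = first_pair

print(type(first_pair).__name__)  # tuple
print(index)                      # 0
print(type(row).__name__)         # Series
print(row['Name'], row['Age'])    # John 30
```

The Series is indexed by the column names, so individual values are read with row['column_name'].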

Example 1: Loop through each row of a Pandas data frame

Consider the following example where we have a data frame containing employee data and we want to loop through each row of the data frame and print the values in each row.

import pandas as pd

# Create a data frame
df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})

# Loop through each row of the data frame
for index, row in df.iterrows():
    print(f"Index: {index}")
    print(f"Row data: {row}")
    print()  # blank line between rows

Output:

Index: 0
Row data: Name      John
Age         30
Salary    50000
Name: 0, dtype: object

Index: 1
Row data: Name      Jane
Age         28
Salary    55000
Name: 1, dtype: object

Index: 2
Row data: Name      Jim
Age         25
Salary    45000
Name: 2, dtype: object

Index: 3
Row data: Name     Jerry
Age         27
Salary    40000
Name: 3, dtype: object

In the above example, we have used the iterrows() function to loop through each row of the data frame df. The loop variable row contains the values in the current row, while the loop variable index contains the index of the current row. We have printed the index and the values in each row to demonstrate how to use the iterrows() function.
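
One detail visible in the output above is the `dtype: object` line on every row. Because each row is packed into a single Series, columns of different types are converted to one common type, so iterrows() does not preserve column dtypes across a row. The sketch below illustrates this with made-up data; the second data frame (nums) is a hypothetical all-numeric example.

```python
import pandas as pd

# Mixed string/integer columns: the row Series falls back to object dtype
df = pd.DataFrame({'Name': ['John', 'Jane'], 'Age': [30, 28]})
_, row = next(df.iterrows())
print(row.dtype)         # object

# All-numeric columns: integers are upcast to float to share one dtype
nums = pd.DataFrame({'Count': [1, 2], 'Ratio': [0.5, 1.5]})
_, num_row = next(nums.iterrows())
print(num_row.dtype)     # float64
print(num_row['Count'])  # 1.0
```

If the original dtypes matter, df.itertuples() is usually the safer choice because it reads each value from its own column.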

Example 2: Modifying values in a Pandas data frame using iterrows()

Consider the following example where we have a data frame containing employee data and we want to increase the salary of employees who are over 25 years old.

import pandas as pd

# Create a data frame
df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})

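# Increase the salary of employees who are over 25 years old
# (the raise amount of 5000 is an assumed value for illustration)
for index, row in df.iterrows():
    if row['Age'] > 25:
        # Write back through df.loc; assigning to row would only change a copy
        df.loc[index, 'Salary'] = row['Salary'] + 5000
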
Adjacent Topics to `df.iterrows()` in Pandas

1. `df.apply()`: This function applies a function along an axis of a data frame, that is, to each column (the default) or to each row (`axis=1`). It can be used as an alternative to `iterrows()` when the operation you want to perform can be expressed as a function of a row or column. For simple aggregates, a built-in vectorized method is simpler still: the example below calculates the average salary with `mean()`, without any explicit iteration.

import pandas as pd

# Create a data frame
df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})

# Calculate the average salary
average_salary = df['Salary'].mean()
print(f"Average salary: {average_salary}")
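
For completeness, here is a hedged sketch of `df.apply()` itself, applied row by row with `axis=1`. The helper salary_band and its 50000 threshold are hypothetical and exist only to show the call pattern.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})

def salary_band(row):
    """Hypothetical helper: label a row by salary (threshold is an assumed value)."""
    return 'high' if row['Salary'] >= 50000 else 'standard'

# axis=1 passes each row to the function as a Series
df['Band'] = df.apply(salary_band, axis=1)
print(df[['Name', 'Band']])
```

Note that `apply(..., axis=1)` still calls the function once per row, so it is tidier than an explicit loop but not a vectorized operation.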

2. `df.itertuples()`: This function is similar to `iterrows()` but is faster and more memory-efficient as it returns an iterator yielding namedtuples of the rows instead of Series objects. The namedtuples have fields corresponding to the names of the columns of the data frame. For example, you can use `df.itertuples()` to count the number of employees who are over 25 years old.

import pandas as pd

# Create a data frame
df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})

# Count the number of employees who are over 25 years old
count = 0
for row in df.itertuples():
    if row.Age > 25:
        count += 1

print(f"Number of employees over 25 years old: {count}")
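
The namedtuple fields mentioned above can be inspected directly. The following sketch shows the default `Index` field and the `index` and `name` parameters of `itertuples()`; the class name 'Employee' is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})

# By default the first field is the row index, exposed as .Index
first = next(df.itertuples())
print(first.Index, first.Name, first.Age)  # 0 John 30

# index=False drops the Index field; name sets the namedtuple class name
emp = next(df.itertuples(index=False, name='Employee'))
print(emp._fields)  # ('Name', 'Age', 'Salary')

# The same count as above, written as a generator expression
over_25 = sum(1 for r in df.itertuples(index=False) if r.Age > 25)
print(over_25)  # 3
```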

Conclusion

In this article, we have discussed the `df.iterrows()` function in Pandas and how to use it to iterate over the rows of a data frame. We have also discussed two alternative functions, `df.apply()` and `df.itertuples()`, that can be used to perform operations on a data frame. Understanding these functions is essential for working with Pandas and is a foundation for more advanced operations on data frames.
## Popular questions 
1. What is the `df.iterrows()` function in Pandas used for?

The `df.iterrows()` function in Pandas is used to iterate over the rows of a data frame. It returns an iterator that yields index and row data as pairs. The row data is returned as a Series object that can be used to access the values of the row.

2. How do you use the `df.iterrows()` function to iterate over the rows of a data frame?

To use the `df.iterrows()` function to iterate over the rows of a data frame, you can loop through the iterator returned by the function. For each iteration, you can access the index and row data as a pair. Here is an example:

import pandas as pd

# Create a data frame
df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})

# Iterate over the rows of the data frame
for index, row in df.iterrows():
    print(f"Index: {index}")
    print(f"Row: {row}")

3. What is the difference between `df.iterrows()` and `df.itertuples()` in Pandas?

The main difference between `df.iterrows()` and `df.itertuples()` in Pandas is the way they return the data. `df.iterrows()` returns an iterator that yields index and row data as pairs, where the row data is a Series object. On the other hand, `df.itertuples()` returns an iterator that yields namedtuples of the rows, where the namedtuples have fields corresponding to the names of the columns of the data frame. `df.itertuples()` is generally faster and more memory-efficient than `df.iterrows()`.

4. Can you use `df.iterrows()` to modify the data in a data frame?

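Yes, but only indirectly. Each row yielded by `df.iterrows()` can be a copy rather than a view, so assigning to `row` may have no effect on the data frame; to change the data you have to write back to the frame itself, for example with `df.loc[index, column] = value`. Even then, it is generally recommended to avoid `iterrows()` for this purpose: the Pandas documentation advises against modifying something you are iterating over, and row-by-row updates are slow. Instead, use the vectorized operations in Pandas or, failing that, the `df.apply()` function.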
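
A minimal sketch of both points, using the employee data from the earlier examples. The raise amount (RAISE = 5000) is an assumed value chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})

RAISE = 5000  # assumed raise amount, for illustration only

# Assigning to the yielded row changes a copy, not the data frame
_, row = next(df.iterrows())
row['Salary'] = 0
print(df.loc[0, 'Salary'])  # 50000

# Vectorized update: select the matching rows with a boolean mask
mask = df['Age'] > 25
df.loc[mask, 'Salary'] += RAISE
print(df['Salary'].tolist())  # [55000, 60000, 45000, 45000]
```

The masked assignment updates all matching rows in one step, with no Python-level loop.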

5. What is the equivalent of `df.iterrows()` in NumPy?

NumPy has no direct equivalent of `df.iterrows()`. The closest built-in is `np.ndenumerate()`, which iterates over the individual elements of an array rather than over its rows. Like `df.iterrows()`, it returns an iterator that yields pairs, here (index, value), where the index is a tuple of coordinates. Here is an example with a one-dimensional array:

import numpy as np

# Create a NumPy array
array = np.array([1, 2, 3, 4])

# Iterate over the elements of the NumPy array
for index, value in np.ndenumerate(array):
    print(f"Index: {index}")
    print(f"Value: {value}")
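
For a two-dimensional array, row-wise iteration in the spirit of `df.iterrows()` can be sketched with Python's built-in enumerate(), since iterating over a 2-D array yields one row at a time. The array contents below are made up for illustration.

```python
import numpy as np

# Hypothetical 2-D array: two rows, three columns
array = np.array([[1, 2, 3],
                  [4, 5, 6]])

# enumerate() pairs each row with its position, similar to (index, row)
rows = [(i, row.tolist()) for i, row in enumerate(array)]
print(rows)  # [(0, [1, 2, 3]), (1, [4, 5, 6])]

# np.ndenumerate() instead visits every element with a coordinate tuple
pairs = list(np.ndenumerate(array))
print(len(pairs))   # 6
print(pairs[0][0])  # (0, 0)
```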
