Introduction to `df.iterrows()` in Pandas
`df.iterrows()` is a built-in method of the Pandas `DataFrame` that iterates over the rows of a data frame, yielding each row as an (index, Series) pair. In other words, it lets you loop through each row and work with that row's values. It is most useful when the per-row logic cannot easily be expressed with vectorized Pandas operations. Because it builds a new Series for every row, it is slow on large data frames, so prefer built-in vectorized methods whenever they can do the job.
Syntax:
df.iterrows()
Here, `df` is the data frame that you want to iterate over. The `iterrows()` method takes no arguments and returns an iterator that yields one tuple per row: the first element is the row's index label and the second is the row data as a Series object.
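To make the (index, Series) pair concrete, here is a minimal sketch that pulls a single item from the iterator with `next()` instead of a `for` loop. The two-row data frame is made up for illustration only.

```python
import pandas as pd

# Small illustrative data frame (values are made up for this sketch)
df = pd.DataFrame({"Name": ["John", "Jane"], "Age": [30, 28]})

# iterrows() returns a lazy iterator; next() pulls the first (index, Series) pair
rows = df.iterrows()
index, row = next(rows)

print(index)                # index label of the first row
print(type(row).__name__)   # the row data is a pandas Series
print(row["Name"])          # values are accessed by column name
```

Note that the Series also carries the index label as its `name` attribute, which is why the printed rows end with a `Name: 0` line.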
Example 1: Loop through each row of a Pandas data frame
Consider the following example where we have a data frame containing employee data and we want to loop through each row of the data frame and print the values in each row.
import pandas as pd
# Create a data frame
df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})
# Loop through each row of the data frame
for index, row in df.iterrows():
    print(f"Index: {index}")
    print(f"Row data: {row}")
    print("\n")
Output:
Index: 0
Row data: Name       John
Age          30
Salary    50000
Name: 0, dtype: object


Index: 1
Row data: Name       Jane
Age          28
Salary    55000
Name: 1, dtype: object


Index: 2
Row data: Name        Jim
Age          25
Salary    45000
Name: 2, dtype: object


Index: 3
Row data: Name      Jerry
Age          27
Salary    40000
Name: 3, dtype: object
In the above example, we used the `iterrows()` method to loop through each row of the data frame `df`. The loop variable `row` contains the values in the current row, while the loop variable `index` contains the index of the current row. We printed both for each row to demonstrate how `iterrows()` is used.
Example 2: Modifying values in a Pandas data frame using iterrows()
Consider the following example where we have a data frame containing employee data and we want to increase the salary of employees who are over 25 years old.
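import pandas as pd
# Create a data frame
df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})
# Increase the salary of employees who are over 25 years old
# NOTE: the original loop body is missing; the flat raise of 5000 below is an
# assumed amount used only to complete the example.
for index, row in df.iterrows():
    if row['Age'] > 25:
        # Write through df.at: assigning to `row` would not change df
        df.at[index, 'Salary'] = row['Salary'] + 5000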
## Adjacent Topics to `df.iterrows()` in Pandas
1. `df.apply()`: This method applies a function along an axis of a data frame, that is, to each column (`axis=0`, the default) or to each row (`axis=1`). It can be used as an alternative to `iterrows()` when the operation you want to perform can be expressed as a function. For a simple aggregate such as the average salary of employees, a built-in method like `mean()` is simpler and faster still, as the following example shows.
import pandas as pd
# Create a data frame
df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})
# Calculate the average salary
average_salary = df['Salary'].mean()
print(f"Average salary: {average_salary}")
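For per-row logic that has no built-in equivalent, a row-wise `df.apply()` call might look like the sketch below. The helper `describe` and the label format are hypothetical, chosen only to show `axis=1` in action.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})


def describe(row):
    """Hypothetical helper: build a short label from one row (a Series)."""
    return f"{row['Name']} ({row['Age']})"


# axis=1 passes each row to the function; the results come back as a Series
labels = df.apply(describe, axis=1)
print(labels.tolist())
```

Row-wise `apply()` still calls a Python function once per row, so it is a convenience over `iterrows()` rather than a true vectorized operation.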
2. `df.itertuples()`: This function is similar to `iterrows()` but is faster and more memory-efficient as it returns an iterator yielding namedtuples of the rows instead of Series objects. The namedtuples have fields corresponding to the names of the columns of the data frame. For example, you can use `df.itertuples()` to count the number of employees who are over 25 years old.
import pandas as pd
# Create a data frame
df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})
# Count the number of employees who are over 25 years old
count = 0
for row in df.itertuples():
    if row.Age > 25:
        count += 1
print(f"Number of employees over 25 years old: {count}")
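To see what the namedtuples actually contain, the following sketch inspects the first item from `itertuples()`. The two-row data frame is illustrative; `index` and `name` are the method's own keyword arguments.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Jane"], "Age": [30, 28]})

# By default the index is included as the first field, called "Index"
first = next(df.itertuples())
print(first._fields)
print(first.Index, first.Name, first.Age)

# index=False drops the index; name=None yields plain tuples instead of namedtuples
plain = next(df.itertuples(index=False, name=None))
print(plain)
```

Plain tuples are a reasonable choice when the column names are not valid Python identifiers and attribute access would not work anyway.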
## Conclusion
In this article, we have discussed the `df.iterrows()` function in Pandas and how to use it to iterate over the rows of a data frame. We have also discussed two alternative functions, `df.apply()` and `df.itertuples()`, that can be used to perform operations on a data frame. Understanding these functions is essential for working with Pandas and is a foundation for more advanced operations on data frames.
## Popular questions
1. What is the `df.iterrows()` function in Pandas used for?
The `df.iterrows()` function in Pandas is used to iterate over the rows of a data frame. It returns an iterator that yields index and row data as pairs. The row data is returned as a Series object that can be used to access the values of the row.
2. How do you use the `df.iterrows()` function to iterate over the rows of a data frame?
To use the `df.iterrows()` function to iterate over the rows of a data frame, you can loop through the iterator returned by the function. For each iteration, you can access the index and row data as a pair. Here is an example:
import pandas as pd
# Create a data frame
df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})
# Iterate over the rows of the data frame
for index, row in df.iterrows():
    print(f"Index: {index}")
    print(f"Row: {row}")
3. What is the difference between `df.iterrows()` and `df.itertuples()` in Pandas?
The main difference between `df.iterrows()` and `df.itertuples()` in Pandas is the way they return the data. `df.iterrows()` returns an iterator that yields index and row data as pairs, where the row data is a Series object. On the other hand, `df.itertuples()` returns an iterator that yields namedtuples of the rows, where the namedtuples have fields corresponding to the names of the columns of the data frame. `df.itertuples()` is generally faster and more memory-efficient than `df.iterrows()`.
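One practical consequence of this difference is how values are typed. The sketch below uses a made-up numeric data frame (the column names `qty` and `price` are assumptions for illustration) to show that `iterrows()` packs each row into a single Series with one common dtype, while `itertuples()` keeps each column's own type.

```python
import pandas as pd

# Illustrative frame with one integer column and one float column
df = pd.DataFrame({"qty": [1, 2], "price": [1.5, 2.5]})

# iterrows(): the row becomes one Series, so the integer is upcast to float
_, first_series = next(df.iterrows())
print(first_series.dtype)
print(first_series["qty"])

# itertuples(): each field keeps the type of its column
first_tuple = next(df.itertuples(index=False))
print(first_tuple.qty, first_tuple.price)
```

If the original column types matter inside the loop, `itertuples()` is usually the safer of the two.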
4. Can you use `df.iterrows()` to modify the data in a data frame?
Yes, but not by assigning to the `row` that `iterrows()` yields: that Series is generally a copy, so changes to it do not reach the data frame. Instead, write back through the data frame itself, for example with `df.at[index, column]`. Even so, it is generally recommended to avoid `iterrows()` for this purpose because it is slow. Prefer the vectorized operations in Pandas, or `df.apply()` when the logic cannot be vectorized.
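As a rough sketch of the vectorized route, the salary update for employees over 25 can be written with a boolean mask and `df.loc`. The flat raise of 5000 (`RAISE`) is an assumed figure for illustration, not a value taken from the article.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jerry'],
                   'Age': [30, 28, 25, 27],
                   'Salary': [50000, 55000, 45000, 40000]})

RAISE = 5000  # assumed raise amount, for illustration only

# Boolean mask selecting the rows to change
over_25 = df['Age'] > 25

# Update only the matching rows of the Salary column, with no Python-level loop
df.loc[over_25, 'Salary'] += RAISE

print(df)
```

The mask is computed once for the whole column, which is what makes this approach scale better than a row-by-row loop.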
5. What is the equivalent of `df.iterrows()` in NumPy?
NumPy has no exact equivalent of `df.iterrows()`. The closest analogue is `np.ndenumerate()`, which iterates over the individual elements of a NumPy array rather than over its rows. Like `df.iterrows()`, it returns an iterator that yields index and value pairs; the index is a tuple with one entry per array dimension. Here is an example:
import numpy as np
# Create a NumPy array
array = np.array([1, 2, 3, 4])
# Iterate over the elements of the NumPy array
for index, value in np.ndenumerate(array):
    print(f"Index: {index}")
    print(f"Value: {value}")
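If the goal is to walk a two-dimensional array row by row, which is closer in spirit to `df.iterrows()`, plain iteration with Python's built-in `enumerate()` is one option. This is a minimal sketch with a made-up array.

```python
import numpy as np

# Illustrative 2-D array: two rows, three columns
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

row_sums = []
# Iterating over a 2-D array yields one row (a 1-D array) at a time
for i, row in enumerate(matrix):
    row_sums.append(int(row.sum()))
    print(f"Row {i}: {row}")

print(row_sums)
```

As with Pandas, an explicit loop is rarely needed for this: `matrix.sum(axis=1)` computes the same row sums in one vectorized call.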