Pandas is a powerful library for data manipulation and analysis in Python. One of its most important features is the ability to handle and manipulate data in a tabular format, using the DataFrame object. A DataFrame is a two-dimensional, size-mutable, heterogeneous tabular data structure with labeled axes (rows and columns).
A common task when working with DataFrames is to convert one or more columns to a NumPy array. NumPy is a Python library that adds support for large, multidimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays.
Here is an example of how to convert a single column of a DataFrame to a NumPy array:
import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {'name': ['John', 'Mike', 'Amy', 'Jane'],
        'age': [25, 30, 22, 18],
        'city': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
# Convert the 'age' column to a NumPy array
age_column = df['age'].values
print(age_column)
# Output: [25 30 22 18]
In this example, we first import the pandas and numpy libraries, then create a sample DataFrame from a dictionary of data. We then use the .values attribute to convert the 'age' column to a NumPy array.
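As a complement to .values, the pandas documentation recommends the to_numpy() method for new code. The sketch below rebuilds a smaller version of the sample DataFrame so that it runs on its own; the variable names age_array and age_float are illustrative only.

```python
import numpy as np
import pandas as pd

# Smaller version of the sample DataFrame from the example above
df = pd.DataFrame({'name': ['John', 'Mike', 'Amy', 'Jane'],
                   'age': [25, 30, 22, 18]})

# to_numpy() returns the column's data as a NumPy array
age_array = df['age'].to_numpy()
print(type(age_array))  # <class 'numpy.ndarray'>
print(age_array.shape)  # (4,)

# A target dtype can be requested explicitly
age_float = df['age'].to_numpy(dtype=float)
print(age_float.dtype)  # float64
```

For a simple numeric column like this one, .values and to_numpy() return the same data.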
To convert multiple columns of a DataFrame to a NumPy array, we can use the .loc indexer along with the .values attribute:
# Select multiple columns and convert to numpy array
selected_columns = df.loc[:, ['name', 'age']].values
print(selected_columns)
# Output: [['John' 25]
# ['Mike' 30]
# ['Amy' 22]
# ['Jane' 18]]
In this example, we use the .loc indexer to select the 'name' and 'age' columns of the DataFrame, and then use the .values attribute to convert them to a NumPy array.
Another way to convert multiple columns of a DataFrame to a NumPy array is to use the .iloc indexer:
# Select multiple columns and convert to numpy array
selected_columns = df.iloc[:, [0, 1]].values
print(selected_columns)
# Output: [['John' 25]
# ['Mike' 30]
# ['Amy' 22]
# ['Jane' 18]]
In this example, we use the .iloc indexer to select the first and second columns of the DataFrame (by integer position), and then use the .values attribute to convert them to a NumPy array.
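Note that converting a single DataFrame column produces a one-dimensional array of shape (n,), where n is the number of rows, while selecting several columns at once, as in the examples above, produces a two-dimensional array of shape (n, k). If you have converted columns separately and want to combine the resulting 1D arrays into one 2D array, use np.column_stack to place them side by side as columns, or np.vstack to stack them as rows; np.hstack would instead join 1D arrays end to end into one longer 1D array.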
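The following sketch illustrates combining separately converted columns. The 'score' column and its values are made up for this illustration and are not part of the sample DataFrame used earlier.

```python
import numpy as np
import pandas as pd

# 'score' is a hypothetical second numeric column, added for illustration
df = pd.DataFrame({'age': [25, 30, 22, 18],
                   'score': [88, 92, 79, 85]})

ages = df['age'].to_numpy()      # 1D array, shape (4,)
scores = df['score'].to_numpy()  # 1D array, shape (4,)

# Place the two 1D arrays side by side as columns
stacked = np.column_stack((ages, scores))
print(stacked.shape)  # (4, 2)

# Selecting both columns at once gives the same 2D array directly
direct = df[['age', 'score']].to_numpy()
print(np.array_equal(stacked, direct))  # True
```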
In summary, converting columns of a pandas DataFrame to a NumPy array is a straightforward task that can be done using the .values attribute.
Here are a few additional topics related to working with pandas and NumPy that you may find useful:

Selecting rows and columns by index or label: In the examples above, we used the .loc and .iloc indexers to select columns by label or integer position, respectively. These indexers can also select rows, by specifying the row label or position in the first argument. For example, df.loc[0] selects the row whose index label is 0 (the first row here, because the sample DataFrame uses the default integer index), and df.iloc[:, [0, 1]] selects the first and second columns.
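To make the label versus position distinction concrete, here is a small sketch. The string index labels 'a' through 'd' are an assumption added for illustration; the sample DataFrame above uses the default integer index.

```python
import pandas as pd

# Same data as the sample DataFrame, but with hypothetical string row labels
df = pd.DataFrame({'name': ['John', 'Mike', 'Amy', 'Jane'],
                   'age': [25, 30, 22, 18]},
                  index=['a', 'b', 'c', 'd'])

# .loc selects by label: the row labeled 'b', column 'age'
age_b = df.loc['b', 'age']
print(age_b)  # 30

# .iloc selects by integer position: the second row
second_row = df.iloc[1]
print(second_row['name'])  # Mike

# Both row and column selectors can be combined
first_two = df.iloc[0:2, [0, 1]]
print(first_two.shape)  # (2, 2)
```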
Filtering rows based on a condition: Another common task when working with DataFrames is to filter the rows based on a certain condition. This can be done using boolean indexing, where a boolean array is used to select the rows that meet the condition. For example, to select all rows where the 'age' column is greater than 25, we could use the following code:
filtered_df = df[df['age'] > 25]
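As a slightly fuller sketch, conditions can be combined with & (and) or | (or), with each condition wrapped in parentheses, and the filtered result can then be converted to a NumPy array. The variable names mask and selected_ages are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Mike', 'Amy', 'Jane'],
                   'age': [25, 30, 22, 18],
                   'city': ['New York', 'Los Angeles', 'Chicago', 'Houston']})

# Single condition: only Mike is older than 25
older = df[df['age'] > 25]
print(older['name'].to_numpy())  # ['Mike']

# Combined condition: older than 20 and not living in Chicago
mask = (df['age'] > 20) & (df['city'] != 'Chicago')
selected_ages = df.loc[mask, 'age'].to_numpy()
print(selected_ages)  # [25 30]
```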
Applying mathematical operations: NumPy is well known for its support for fast mathematical operations on arrays. When working with DataFrame columns that have been converted to NumPy arrays, you can use these operations to perform calculations on the entire array at once. For example, to calculate the square of each element in the 'age' column, we could use the following code:
squared_ages = np.square(age_column)
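A self-contained version of this idea, with a couple of other element-wise and aggregate operations, might look like the sketch below. It recreates age_column from the sample data so it can run on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [25, 30, 22, 18]})
age_column = df['age'].to_numpy()

# Element-wise operations apply to the whole array at once
squared_ages = np.square(age_column)
print(squared_ages)  # [625 900 484 324]

ages_next_year = age_column + 1
print(ages_next_year)  # [26 31 23 19]

# Aggregations reduce the array to a single value
mean_age = age_column.mean()
print(mean_age)  # 23.75
```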

Working with multidimensional arrays: NumPy arrays can have any number of dimensions, unlike a pandas DataFrame, which is always two-dimensional. When working with multidimensional arrays, it's important to understand the shape and dimensions of the array. The .shape attribute returns the shape of the array as a tuple of integers, and the .ndim attribute returns the number of dimensions. There are also a number of functions for reshaping and manipulating multidimensional arrays, such as np.reshape, np.transpose, and np.squeeze.
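The attributes and functions mentioned here can be tried on a small array. The array contents below are arbitrary and chosen only to make the shapes easy to follow.

```python
import numpy as np

arr = np.arange(6)  # [0 1 2 3 4 5]
print(arr.shape, arr.ndim)  # (6,) 1

# Reshape the 1D array into 2 rows and 3 columns
grid = np.reshape(arr, (2, 3))
print(grid.shape, grid.ndim)  # (2, 3) 2

# Transpose swaps the axes
flipped = np.transpose(grid)
print(flipped.shape)  # (3, 2)

# Squeeze removes axes of length 1
row = arr[np.newaxis, :]  # shape (1, 6)
flat = np.squeeze(row)
print(row.shape, flat.shape)  # (1, 6) (6,)
```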
Joining and merging DataFrames: As you work with more and more data, it's likely that you'll need to combine data from multiple DataFrames into a single one. Pandas provides several ways to do this, including the pd.concat function for concatenating DataFrames along a particular axis, and the pd.merge function for merging DataFrames based on one or more common columns.
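A minimal sketch of both functions follows. The three small DataFrames (people, more_people, cities) are hypothetical and split the sample data in a way chosen only for illustration.

```python
import pandas as pd

# Hypothetical DataFrames, made up for this illustration
people = pd.DataFrame({'name': ['John', 'Mike', 'Amy'],
                       'age': [25, 30, 22]})
more_people = pd.DataFrame({'name': ['Jane'], 'age': [18]})
cities = pd.DataFrame({'name': ['John', 'Mike', 'Jane'],
                       'city': ['New York', 'Los Angeles', 'Houston']})

# Concatenate along the rows (axis=0 is the default) and renumber the index
all_people = pd.concat([people, more_people], ignore_index=True)
print(all_people.shape)  # (4, 2)

# Merge on the shared 'name' column; the default is an inner join,
# so only names present in both DataFrames are kept
merged = pd.merge(people, cities, on='name')
print(merged['name'].tolist())  # ['John', 'Mike']
```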
These are some of the topics related to working with pandas and NumPy, but there are many more to explore. The official pandas and NumPy documentation covers more advanced topics.
Popular questions
How do I convert a specific column in a pandas DataFrame to a NumPy array?
To convert a specific column in a pandas DataFrame to a NumPy array, you can use the .values attribute of the column. For example, if the DataFrame is called df and the column you want to convert is called 'age', you can use the following code:
age_column = df['age'].values
Can I convert multiple columns in a DataFrame to NumPy arrays at once?
Yes. Select the columns by passing a list of column names to the DataFrame, call .values on the result to get a 2D array, and transpose it with .T so that each row of the transposed array holds one original column. For example, assuming the DataFrame has 'age' and 'height' columns (the sample DataFrame above has no 'height' column), you can unpack them as follows:
age_column, height_column = df[['age', 'height']].values.T
How do I convert a DataFrame to a NumPy array?
To convert an entire DataFrame to a NumPy array, you can use the .values attribute of the DataFrame. For example, if the DataFrame is called df, you can use the following code:
df_array = df.values
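One detail worth keeping in mind is the dtype of the result. The sketch below, which uses the sample data and the to_numpy() method, shows that a DataFrame mixing strings and numbers converts to an object array, while restricting the conversion to numeric columns with select_dtypes keeps a numeric dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Mike', 'Amy', 'Jane'],
                   'age': [25, 30, 22, 18],
                   'city': ['New York', 'Los Angeles', 'Chicago', 'Houston']})

# Mixed string and integer columns fall back to the generic object dtype
mixed = df.to_numpy()
print(mixed.shape, mixed.dtype)  # (4, 3) object

# Keeping only numeric columns preserves an integer dtype
numeric = df.select_dtypes(include='number').to_numpy()
print(numeric.shape)  # (4, 1)
print(np.issubdtype(numeric.dtype, np.integer))  # True
```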
Is it possible to convert a specific column of a DataFrame to a 2D NumPy array?
Yes, it is possible to convert a specific column of a DataFrame to a 2D NumPy array. For a single column, the .values attribute returns a 1D array, but you can index it with np.newaxis (which is an alias for None) to add an additional dimension. For example, if the DataFrame is called df and the column you want to convert is called 'age', you can use the following code:
age_column_2d = df['age'].values[:, np.newaxis]
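Two other common ways to get the same (n, 1) shape are sketched below: reshaping with reshape(-1, 1), and selecting the column with a list of one name so that pandas returns a one-column DataFrame rather than a Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [25, 30, 22, 18]})

# Option 1: add an axis with np.newaxis
with_newaxis = df['age'].to_numpy()[:, np.newaxis]

# Option 2: reshape, letting NumPy infer the number of rows from -1
with_reshape = df['age'].to_numpy().reshape(-1, 1)

# Option 3: double brackets select a one-column DataFrame, which is already 2D
with_brackets = df[['age']].to_numpy()

print(with_newaxis.shape, with_reshape.shape, with_brackets.shape)
# (4, 1) (4, 1) (4, 1)
```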
I have a DataFrame with multiple columns. Is it possible to convert all columns to a list of NumPy arrays?
Yes, it is possible to convert all columns of a DataFrame to a list of NumPy arrays. You can use the .columns attribute to get the column names, then iterate over them and convert each column to a NumPy array using the .values attribute. For example, if the DataFrame is called df, you can use the following code:
columns = df.columns
numpy_arrays = [df[col].values for col in columns]
This gives you a list of NumPy arrays, one per column, each containing that column's values.
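If you would rather look up each array by column name than by list position, a dictionary comprehension is a small variation on the same idea. This is a sketch using the sample data; the name arrays_by_column is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Mike', 'Amy', 'Jane'],
                   'age': [25, 30, 22, 18],
                   'city': ['New York', 'Los Angeles', 'Chicago', 'Houston']})

# Map each column name to that column's values as a NumPy array
arrays_by_column = {col: df[col].to_numpy() for col in df.columns}

print(list(arrays_by_column))   # ['name', 'age', 'city']
print(arrays_by_column['age'])  # [25 30 22 18]
```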