How to remove columns in Pandas DataFrames
Following our previous introductory post about the Pandas library in Python, we will now look at how to manipulate data. One of the first things we usually do after creating a new DataFrame from an Excel file is to clean the data by dropping unnecessary columns.
Create a new DataFrame
To keep the example simple, we will not import the data from an Excel file. Instead, we will create a small sample DataFrame from scratch, so you can follow along by copying the code below into your Python environment.
# First, import the pandas library
import pandas as pd
# Create some sample data (hypothetical products with their cost and stock)
data = {
    'Product Name': ['Product A', 'Product B', 'Product C'],
    'Cost / Unit': [10, 12, 17],
    'Warranty Period': ['2 Years', '2 Years', '2 Years'],
    'Current Stock': [100, 128, 85],
    'Supplier': ['Factory A', 'Factory A', 'Factory A']
}
# Create the DataFrame
df = pd.DataFrame(data)
# let's print the results so we can see the outcome
print(df)
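Python prints the DataFrame as a table with three rows (index 0 to 2) and five columns: Product Name, Cost / Unit, Warranty Period, Current Stock and Supplier.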
Remove columns in Pandas DataFrames
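drop() matches column labels exactly, including spaces and the slash in 'Cost / Unit', so it can help to print the labels before removing anything. The sketch below rebuilds the same sample data so it runs on its own, then reads the standard df.columns and df.shape attributes.

```python
import pandas as pd

# Same sample data as above
data = {
    'Product Name': ['Product A', 'Product B', 'Product C'],
    'Cost / Unit': [10, 12, 17],
    'Warranty Period': ['2 Years', '2 Years', '2 Years'],
    'Current Stock': [100, 128, 85],
    'Supplier': ['Factory A', 'Factory A', 'Factory A'],
}
df = pd.DataFrame(data)

# The exact labels that drop(columns=...) must match, character for character
print(df.columns.tolist())
# ['Product Name', 'Cost / Unit', 'Warranty Period', 'Current Stock', 'Supplier']

# (number of rows, number of columns)
print(df.shape)
# (3, 5)
```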
Now it’s time to remove some columns. In the sample data above, the Warranty Period is 2 Years for every product. Let’s assume that a 2-year warranty is standard for all products, so we decide to remove that column.
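Spotting a constant column by eye works for three rows, but on a real Excel import you may prefer to let pandas find such columns for you. This is a minimal sketch that assumes "the same value in every row" is your criterion for an unnecessary column; it is an illustration, not a step from the workflow above. DataFrame.nunique() counts the distinct values in each column, and a count of 1 means the column is constant.

```python
import pandas as pd

data = {
    'Product Name': ['Product A', 'Product B', 'Product C'],
    'Cost / Unit': [10, 12, 17],
    'Warranty Period': ['2 Years', '2 Years', '2 Years'],
    'Current Stock': [100, 128, 85],
    'Supplier': ['Factory A', 'Factory A', 'Factory A'],
}
df = pd.DataFrame(data)

# Distinct values per column (NaN is not counted by default)
distinct_counts = df.nunique()

# Keep the labels of columns that hold a single distinct value
constant_columns = distinct_counts[distinct_counts == 1].index.tolist()
print(constant_columns)
# ['Warranty Period', 'Supplier']
```

Supplier is also constant in this sample, so it shows up as well. Whether a constant column is really unnecessary is still a judgment call.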
# We have already created the DataFrame 'df' above.
# drop() returns a new DataFrame without 'Warranty Period'; we assign it back to df.
df = df.drop(columns='Warranty Period')
# Let's see the end result
print(df)
The resulting DataFrame keeps the other four columns: Product Name, Cost / Unit, Current Stock and Supplier.
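The same single-column removal can be written in a few equivalent ways. The sketch below compares the columns= keyword used in this post with the positional label plus axis=1 form, and with inplace=True, which modifies the DataFrame directly and returns None. The variable names are only for illustration.

```python
import pandas as pd

data = {
    'Product Name': ['Product A', 'Product B', 'Product C'],
    'Cost / Unit': [10, 12, 17],
    'Warranty Period': ['2 Years', '2 Years', '2 Years'],
    'Current Stock': [100, 128, 85],
    'Supplier': ['Factory A', 'Factory A', 'Factory A'],
}
df = pd.DataFrame(data)

# 1. Keyword form used in this post
by_columns = df.drop(columns='Warranty Period')

# 2. Label plus axis=1 (axis=1 refers to columns, axis=0 to rows)
by_axis = df.drop('Warranty Period', axis=1)

# 3. In-place form: work on a copy so df itself stays unchanged here
in_place = df.copy()
in_place.drop(columns='Warranty Period', inplace=True)  # returns None

print(by_columns.equals(by_axis))   # True
print(by_columns.equals(in_place))  # True
```

Reassigning the result, as in df = df.drop(columns='Warranty Period'), leaves the original DataFrame untouched until you choose to overwrite the variable.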
But what if we want to remove more than one column? In that case, we pass a list of column names instead of a single string to the columns parameter of the .drop() method. Suppose that in our example we want to remove both the “Warranty Period” and “Supplier” columns.
Here is the code:
# Recreate df from the original data first, because we
# already removed the "Warranty Period" column in the example above.
df = pd.DataFrame(data)
df = df.drop(columns=['Warranty Period', 'Supplier'])
# Let's see the end result
print(df)
This time only three columns remain: Product Name, Cost / Unit and Current Stock.
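Two related situations come up often when cleaning real data. By default, drop() raises a KeyError if any listed column does not exist, while passing errors='ignore' skips missing labels instead. Alternatively, you can select the columns you want to keep rather than listing the ones to remove. In the sketch below, 'Discount' is a hypothetical column name that is deliberately absent from the data.

```python
import pandas as pd

data = {
    'Product Name': ['Product A', 'Product B', 'Product C'],
    'Cost / Unit': [10, 12, 17],
    'Warranty Period': ['2 Years', '2 Years', '2 Years'],
    'Current Stock': [100, 128, 85],
    'Supplier': ['Factory A', 'Factory A', 'Factory A'],
}
df = pd.DataFrame(data)

# 'Discount' is a hypothetical column that does not exist in df
try:
    df.drop(columns=['Warranty Period', 'Discount'])
except KeyError as exc:
    print('KeyError:', exc)

# errors='ignore' drops the labels that exist and skips the rest
trimmed = df.drop(columns=['Warranty Period', 'Supplier', 'Discount'], errors='ignore')
print(trimmed.columns.tolist())
# ['Product Name', 'Cost / Unit', 'Current Stock']

# The opposite approach: list only the columns to keep
kept = df[['Product Name', 'Cost / Unit', 'Current Stock']]
print(kept.equals(trimmed))  # True
```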