How to efficiently filter data using Pandas DataFrames

How to efficiently filter data using Pandas DataFrames

The Problem

It is a common task for anyone working with data to have to remove or filter some data based on some criteria (property or value). In this post we will see how we can efficiently filter data by using Pandas DataFrames with Python. We will utilize the Pandas .isin() method.

With this process, anyone working with a large set of data can filter and keep only parts of data that are useful for a specific purpose. For example, a sales manager would need to segment his customers based on some criteria (e.g. regions, customer type, etc). Or a product manager needs to have his inventory filtered based on some criteria (e.g. storage location, supplier, product category, etc). The applications for such a process are countless.

This blog post belongs to a series of posts about Pandas Library in Python.

Code

For the purpose of this tutorial, we will create a new DataFrame with a just few data

# We start by importing the Pandas Library
import pandas as pd

# Then we will create a new DataFrame with some sample data
# An easy way to create the DataFrame is with a Python Dictionary as below:

data = {
    'Product Name':['Product A','Product B','Product C','Product D'],
    'Cost / Unit':[10, 12, 17, 15],
    'Warranty Period':['2 Years', '2 Years', '2 Years', '1 Year'],
    'Current Stock':[100, 128, 85, 10],
    'Supplier':['Factory A', 'Factory B', 'Factory A', 'Factory B']
}

# Now we will convert the "data" Dictionary to a DataFrame

df = pd.DataFrame(data)

So far we have created a new sample DataFrame as below:

Now it’s time to see how we can create a new DataFrame which is a subset of our data based on some criteria we want. Suppose that in our example above, we would like to see only the products that are supplied by “Factory A”. Here’s the code we need to type:

# We will create a new DataFrame with the name: df2
# This DataFrame will have only the values that in our criteria are supplied by 'Factory A'
# We will use the .isin() method
df2 = df[df['Supplier'].isin(['Factory A'])]

Our result (df2) will be:

Now we have effectively filtered out everything that does not have the value “Factory A” in the Supplier column. We can segment further our data by using the same method in the new DataFrame.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *