How to sort data with Pandas DataFrames
Intro
This post is part of my Pandas DataFrames series. In these posts I explain (with examples) how someone can manipulate lots of data with Python and the Pandas Library. The examples have practical implications in a business / work environment.
Today I will show you, how we can easily sort our data with Pandas and the use of some easy to understand examples.
For our example, we suppose that we already have a DataFrame loaded in memory (you can either import data or create a DataFrame directly within your Python environment).
Examples
Our hypothesis is that we have a DataFrame called “df” which has a collection of products with data such as: prices, sizes (let’s assume for example they are clothes), product features and those products are classified into categories and subcategories (for example the type of clothes or brands).
We can sort the data with either one, two or more criteria. In our first example we would want to sort the data with two criteria & that criteria have numerical values. We choose to sort them for example with price & size in an ascending way.
Code
We use the method “.sort_values”. When there are more than one criteria, the order of the criteria also determines the order the sorting takes place. Below is the example of the code in which we firstly sort the data based on prices and then based on sizes:
df = df.sort_values(by=['price','size'], ascending=[True, True])
By “playing” with the parameters we can sort them in various ways. For example, the sort can be descending:
#Descending order
df = df.sort_values(by=['price','size'], ascending=[False, False])
# or mixed (ascending in price & descending in size)
df = df.sort_values(by=['price','size'], ascending=[True, False])
Now let’s assume that we also want to sort our data based on a specific order that we would like to impose. This is very common when our data are not numerical (or the numbers does not mean a thing).
In our example let’s suppose we want to sort the data firstly by category, then size and then price. However, the category does not contain numbers but text instead. For example, we have the following (product) categories: trousers, coats, hats, shirts, socks, shoes & ties.
We want to sort the products in the following order based on their category:
- hats
- coats
- shirts
- ties
- trousers
- socks
- shoes
Here is our code:
# We create a categorical variable that contains our custom sort order
category_order['category'] = pd.Categorical(df['category'], ['hats','coats','shirts','ties','trousers','socks','shoes'])
# We then apply it in the ".sort_values" method
# together with our rest criteria as below:
df= df.sort_values(by=['category_order','size','price'], ascending=[True, True, True])