Python developers commonly use Pandas dataframe to store data in tabular manner as columns and rows. It allows you to easily organize data for analysis and manipulation. It also allows you to easily extract specific data as per your requirement. Often software developers want to be able to select specific rows from dataframe just as we do it in SQL databases. In this article, we will learn how to select rows from dataframe based on column values.
How to Select Rows from Dataframe Based on Column Values
Generally, if your data is stored in a relational database platform such as MySQL or PostgreSQL, then you can easily extract specific rows from a table using SQL query as shown below.
SELECT *
FROM table
WHERE column_name = value
Often Python developers want to replicate this functionality within their Python application or website. Let us say you have a Pandas dataframe.
import pandas as pd
data = {
"id": [1, 2, 3],
"name":['John','Jim','Jane'],
"marks": [50, 40, 45]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
Here is the output you will see.
# Output
id name marks
0 1 John 50
1 2 Jim 40
2 3 Jane 45
Let us look at some of the common ways to select rows from the above dataframe. For our purpose, we will use loc() built-in function available to each dataframe. It allows you to access rows and columns of a dataframe using one or more conditions. Here is its syntax.
dataframe.loc()
1. Based on Single Value
One of the most common requirements for Python programmers is to select one or more rows in a dataframe, based on the single value of a column. You can do this using the ‘==’ operator in the following command.
df.loc[df['column_name'] == value]
Here is the code to select rows where marks column has value 50.
df.loc[df['marks'] == 50]
When you print the result of above statement, here is the output.
print(df.loc[df['marks'] == 50])
## Output
id name marks
0 1 John 50
Similarly, if you want to select rows where the value IS NOT 50, then use ‘!=’ operator for this purpose.
print(df.loc[df['marks']!=50])
## Output
id name marks
1 2 Jim 40
2 3 Jane 45
2. Based on Multiple Values
Similarly, you may need to select rows that match one or more values from a set of values. You can do this using isin() function. Here is the syntax to select rows based on multiple values of a column.
df.loc[df['column_name'].isin(values)]
Here is an example to select rows where marks is in (40, 50). Please note, you need to pass an iterable like a tuple or list containing the values, in isin() function.
print(df.loc[df['marks'].isin((40,50))])
## Output
id name marks
0 1 John 50
1 2 Jim 40
Here is an example using list of values [40, 50].
print(df.loc[df['marks'].isin([40,50])])
On the other hand, if you need to select rows where a column does not have a set of values, then you need to add ‘~’ operator before the dataframe loc() function as shown below.
print(df.loc[~df['marks'].isin([40,50])])
Here is the output.
## Output
id name marks
2 3 Jane 45
Please note, if you only supply the values in a comma separated manner, without enclosing them in an iterable, you will get an error. Here is an example.
print(df.loc[df['marks'].isin(40,50)])
Here is the error message.
Traceback (most recent call last):
File "<main.py>", line 12, in <module>
TypeError: Series.isin() takes 2 positional arguments but 3 were given
3. Using Single Condition
Instead of using = or != operator, you can also use other conditional operators like <=, >=, <, > to select rows according to various conditions. Here is an example to select rows where marks>=45.
print(df.loc[df['marks']>=45])
## Output
id name marks
0 1 John 50
2 3 Jane 45
4. Using Multiple Conditions
Sometimes you may need to filter rows based on more than one condition. In such cases, you can combine the above mentioned single conditions using ‘&’ operator. Here is an example to select rows where marks>=40 and marks<=45.
print(df.loc[(df['marks'] >= 40) & (df['marks'] <= 45)])
Here is the output you will see.
## Output
id name marks
1 2 Jim 40
2 3 Jane 45
In the above example, please note the placement of round brackets ‘(…)’ around “df[‘marks’] >= 40” and “df[‘marks’] <= 45”.
Conclusion
In this article, we have learnt several different ways to select one or more rows from pandas dataframe in Python. You can use any of the solutions described above, as per your requirement.
Also read:
How to Concatenate Two Lists in Python
How to List All Files in Directory
How to Clone List in Python
data:image/s3,"s3://crabby-images/46588/465880802432e1711c1c01e38f37ca48e2309449" alt=""
Sreeram Sreenivasan is the Founder of Ubiq. He has helped many Fortune 500 companies in the areas of BI & software development.