How to Connect Pandas to a Database

Python Pandas is a powerful library that allows you to store, analyze and manipulate data as tables containing rows and columns. It lets you easily aggregate, filter and transform data as per your requirement. Generally, pandas dataframes import data from CSV and TXT files, but sometimes you may need to connect Pandas to a relational database such as MySQL, PostgreSQL, Oracle or SQL Server, read data from it, or write data back to it. In this article, we will learn how to connect Pandas to a database.

Why Connect Pandas to a Database

You may be wondering why we need to connect Pandas to a database when both store data as tables and support indexes, filtering, aggregation and other functions. There are a couple of reasons for this. First, depending on the task at hand, it may sometimes be simpler to do it by running an SQL query in the database, while other tasks are easier to do in Pandas.

Secondly, in most cases, the web server and database server are located on different machines. When you analyze data using Python Pandas, the work is done on your web server, whereas an SQL query runs on your database server. Depending on the load on each, you may want to offload some operations from the web server to the database server, or vice versa, so that neither of them is overloaded.

How to Connect Pandas to a Database

Here are the steps to connect Python Pandas to a database, import data from it into a dataframe, and write data from the dataframe back to the database.

1. Import Libraries

We will use the SQLAlchemy module to connect to the database and read from/write to it. It supports all major relational databases such as MySQL, PostgreSQL, Oracle and SQL Server. Make sure pandas and SQLAlchemy are installed, along with the driver for your database (for example, psycopg2 for PostgreSQL or PyMySQL for MySQL). First, we will import the necessary libraries.

import pandas as pd
from sqlalchemy import create_engine

2. Create Database Engine

Next, we will create a database engine using the create_engine() function. It returns an engine object that can be used to query your database, and it takes a connection string (database URL) as input. Here is an example of connecting to a PostgreSQL or MySQL database.

engine = create_engine('postgresql://username:password@host:port/database') # for postgresql
OR
engine = create_engine('mysql://username:password@host:port/database') # for mysql

In the above command, replace username, password, host, port and database with appropriate values as per your requirement.

You can use connectionstrings.com to find the connection string for your database. It contains connection string formats for all kinds of databases.
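If you have a specific database driver installed, you can also name it explicitly in the connection URL. The lines below are a minimal sketch assuming the psycopg2 and PyMySQL drivers are installed; the username, password, host, port and database values are placeholders as before.

# PostgreSQL with the psycopg2 driver (assumes psycopg2 is installed)
engine = create_engine('postgresql+psycopg2://username:password@host:port/database')
# MySQL with the PyMySQL driver (assumes PyMySQL is installed)
engine = create_engine('mysql+pymysql://username:password@host:port/database')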

3. Read Data from Database Into Dataframe

Once you have established the connection engine, you can read data from the database using the read_sql() function. It accepts two arguments: the SQL query and the connection engine. It lets you run SELECT queries against your database. Here is an example that reads data from one of the database tables. Replace table_name with the name of your table.

sql_query = "SELECT * FROM table_name"
df = pd.read_sql(sql_query, engine)

The result of the above SQL query is returned as a Pandas dataframe. Thereafter, you can use regular Pandas functions and operators to work with the result. Here is an example that displays the first 15 rows of the result.

df1 = df[0:15]
print(df1)
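You can also run the usual Pandas aggregations on the result. Here is a minimal sketch of a simple group-wise average; col1 and col2 are placeholder column names, not columns from any particular table.

# Average of col2 for each value of col1 (column names are placeholders)
summary = df.groupby('col1')['col2'].mean()
print(summary)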

Here is an example that selects specific columns from the dataframe. Replace col1 and col2 with your column names.

df2 = df.loc[:, ['col1', 'col2']].head()
print(df2)
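If you only need a subset of rows or columns, it is often more efficient to do the filtering in the SQL query itself instead of loading the whole table into Pandas. Here is a minimal sketch that uses SQLAlchemy's text() helper to bind a query parameter safely; table_name, col1, col2 and the value 10 are placeholders.

from sqlalchemy import text

# Push the column selection and row filter into the database
query = text("SELECT col1, col2 FROM table_name WHERE col1 > :min_value")
df3 = pd.read_sql(query, engine, params={"min_value": 10})
print(df3.head())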

4. Export Dataframe to Database

You can also write dataframe data to the database using the to_sql() function available on a Pandas dataframe. Here is an example to demonstrate it.

df.to_sql('table_name', engine, if_exists='replace', index=False)

The to_sql() function accepts the name of the table to which you want to export data and the connection engine. You can also pass optional arguments such as if_exists, which specifies what should be done if the table already exists. In this case, the table will be replaced if it exists, otherwise it will be created. We have also passed index=False so that the dataframe's index is not written as a separate column in the database table.
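If you want to add rows to an existing table instead of replacing it, or write a large dataframe in batches, to_sql() also supports if_exists='append' and a chunksize argument. Here is a minimal sketch, assuming table_name already exists with matching columns.

# Append rows to an existing table, writing 1000 rows at a time
df.to_sql('table_name', engine, if_exists='append', index=False, chunksize=1000)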

5. Close Connection

After you have executed all your queries, it is important to release the database connections to avoid leaking resources. A SQLAlchemy engine manages a pool of connections, so you can call its dispose() function once you are done.

engine.dispose()
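Alternatively, for individual queries you can let SQLAlchemy manage the connection via a context manager, so the connection is returned to the pool automatically when the block exits. Here is a minimal sketch, assuming the same engine and placeholder table_name as above.

# The connection is returned to the pool when the with-block exits
with engine.connect() as conn:
    df = pd.read_sql("SELECT * FROM table_name", conn)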

Conclusion

In this article, we have learnt how to connect to a database from Python Pandas, read data from a database into a dataframe using the read_sql() function, and write data from a dataframe to the database using the to_sql() function. You can modify these examples as per your requirement.

Also read:

Top Python Libraries for Machine Learning
Machine Learning Life Cycle
How Machine Learning Works
