How to Connect Pandas to Database


Python Pandas is a powerful library that allows you to store, analyze and manipulate data as tables of rows and columns. It lets you easily aggregate, filter and transform data as per your requirement. Pandas dataframes usually import data from CSV and TXT files, but sometimes you may need to connect Pandas to relational databases such as MySQL, PostgreSQL, Oracle and SQL Server to read data from them or write data back. In this article, we will learn how to connect Pandas to a database.

Why Connect Pandas to Database

You may be wondering why we need to connect Pandas to a database when both store data as tables and support indexes, filtering, aggregation and other operations. There are a couple of reasons for this. First, depending on the task, it may be simpler to run an SQL query in the database. For example, it is often easier to run an SQL JOIN than to merge and join dataframes. On the other hand, some tasks are easier in Pandas than in a database. For example, it is a lot easier to create pivot tables in Pandas than with SQL queries, as the sketch below shows.
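As an illustration, here is a minimal, self-contained sketch with made-up data; the region, product and sales columns are invented for this example.

import pandas as pd

# Made-up sales data; a single pivot_table() call summarizes it
df = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South'],
    'product': ['A', 'B', 'A', 'B'],
    'sales': [100, 150, 200, 250],
})
pivot = pd.pivot_table(df, values='sales', index='region',
                       columns='product', aggfunc='sum')
print(pivot)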

Secondly, in most setups the web server and database server are separate machines. When you analyze data with Pandas, the work runs on your application server; when you run an SQL query, it runs on your database server. Depending on the load on each, you may want to offload some operations from one to the other. This helps ensure that neither the application server nor the database is overloaded.

Thirdly, once you have imported data from the database into Pandas, you can easily visualize it by building charts and reports, as in the sketch below.
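For instance, a quick bar chart takes only a couple of lines. Here is a minimal sketch, assuming the matplotlib library is installed and reusing the made-up dataframe from the pivot sketch above.

import matplotlib.pyplot as plt

# Plot total sales per region from the made-up dataframe above
df.groupby('region')['sales'].sum().plot(kind='bar')
plt.title('Total Sales by Region')
plt.show()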

Steps to Connect Pandas to Database

Here are the steps to connect Python Pandas to database, import data from it into the dataframe, and write data from the dataframe back to the database.

1. Import Libraries

We will use the SQLAlchemy module to connect to the database and read from or write to it. It supports all major relational databases, such as MySQL, PostgreSQL, Oracle and SQL Server. It is not included with Python by default, so you will need to install it. Open a terminal and run the following command to install it.

$ pip install SQLAlchemy

Create an empty Python script.

$ vi test.py

Add the following lines to it. First, we will import the necessary libraries.

import pandas as pd
from sqlalchemy import create_engine

2. Create Database Engine

Next, we will create a database engine using the create_engine() function. It returns an engine that can be used to query your database. It takes a connection string (URL) as input. Here are examples for PostgreSQL and MySQL databases.

engine = create_engine('postgresql://username:password@host:port/database')  # for PostgreSQL
# or
engine = create_engine('mysql://username:password@host:port/database')  # for MySQL

In the above command, replace username, password, host, port and database with the appropriate values for your setup.

You can use connectionstrings.com to find the connection string for your database. It contains connection string formats for all kinds of databases.
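Note that SQLAlchemy also needs a DBAPI driver package for your database, which is installed separately with pip (for example, psycopg2 for PostgreSQL or PyMySQL for MySQL). You can also name the driver explicitly in the connection URL. Here is a sketch with placeholder credentials.

# Driver named explicitly in the URL; install psycopg2-binary or
# PyMySQL separately with pip. All credentials are placeholders.
engine = create_engine('postgresql+psycopg2://username:password@host:5432/database')
# engine = create_engine('mysql+pymysql://username:password@host:3306/database')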

3. Read Data from Database Into Dataframe

Once you have created the engine, you can read data from the database using the read_sql() function. It accepts two arguments: an SQL query and the engine (or connection). It allows you to run SELECT queries against your database. Here is an example that reads data from one of the database tables. Replace table_name with the name of your table.

sql_query = "SELECT * FROM table_name"
df = pd.read_sql(sql_query, engine)

The result of the above SQL query is returned as a Pandas dataframe. Thereafter, you can use Pandas functions and operators to work with it. Here is an example that displays the first 15 rows of the result.

df1 = df[0:15]
print(df1)

Here is an example that selects specific columns from the result. Replace col1 and col2 with the names of actual columns in your table.

df2 = df.loc[:, ['col1', 'col2']].head()
print(df2)
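If you need to filter rows, it is safer to pass bound parameters than to build the SQL string by hand. Here is a minimal sketch, assuming a hypothetical status column in table_name.

from sqlalchemy import text

# Bound parameters keep user input out of the SQL string itself;
# the 'status' column here is hypothetical
query = text("SELECT * FROM table_name WHERE status = :status")
df3 = pd.read_sql(query, engine, params={'status': 'active'})
print(df3.head())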

After this step, if your resulting dataframe contains duplicate rows, you can find them using the duplicated() function, as shown below. You can also export the Pandas dataframe to an Excel spreadsheet if you want.
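Here is a minimal sketch of both operations; writing .xlsx files additionally requires the openpyxl package.

# Show rows flagged as duplicates of an earlier row
print(df[df.duplicated()])

# Drop duplicates and export to an Excel spreadsheet
df.drop_duplicates().to_excel('output.xlsx', index=False)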

4. Export Dataframe to Database

You can also write dataframe data to the database using the to_sql() function available on Pandas dataframes. Here is an example to demonstrate it.

df.to_sql('table_name', engine, if_exists='replace', index=False)

The to_sql() function accepts the name of the table to which you want to export data and the connection engine. You can also pass optional arguments such as if_exists, which specifies what to do if the table already exists; here the table is replaced if it exists, else it is newly created. index=False tells Pandas not to write the dataframe's index as a separate column in the table.
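If you want to add rows to an existing table instead of replacing it, you can pass if_exists='append', optionally with chunksize to write rows in batches. A minimal sketch:

# Append rows to the existing table, writing 500 rows per batch
df.to_sql('table_name', engine, if_exists='append',
          index=False, chunksize=500)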

5. Close Connection

After you have executed all queries as per your requirement, it is important to release the database connections to avoid connection leaks. Note that a SQLAlchemy engine has no close() method; instead, call dispose() on it, which closes all connections in its pool.

engine.dispose()
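Alternatively, if you only need a connection for a single query, you can use the engine's connect() method as a context manager so the connection is returned to the pool automatically. A minimal sketch, reusing the sql_query defined earlier.

# The 'with' block returns the connection to the pool on exit
with engine.connect() as conn:
    df = pd.read_sql(sql_query, conn)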

Save and close the file. You can run the script with the following command.

$ python test.py

Conclusion

In this article, we have learnt how to connect to a database from Python Pandas, read data from the database into a dataframe using the read_sql() function, and write data from a dataframe back to the database using the to_sql() function. You can modify these steps as per your requirement.

