How to Merge and Join Pandas Dataframes

Last updated on May 19th, 2025 at 06:06 am

Pandas dataframe makes it easy to store and analyze data as tables using rows and columns. They support tons of functions and operations simplifying data analysis and reporting. Sometimes, Python developers need to merge and join Pandas dataframes. In this article, we will learn the different ways to do this.

Table of Contents

How to Merge and Join Pandas Dataframes

Here are the different ways to merge and join pandas dataframes, using concat(), merge( ) and join() functions. Let us say you have the following two dataframes to work with.

import pandas as pd

d1 = {'Name': ['John','Jim','Tim'],
         'Age': [27, 24, 22],
         'City': ['NYC', 'San Francisco','London'],
     }

d2 = {'Name': ['Jane','Jessica','Tess'],
         'Age': [25, 34, 32],
         'City': ['Chicago', 'Los Angeles','Philadelphia'],
     }
     
df1 = pd.DataFrame(d1, index=[0, 1, 2])

df2 = pd.DataFrame(d2, index=[2, 3, 4])

print(df1)
print(df2)

## output

   Name  Age           City
0  John   27            NYC
1   Jim   24  San Francisco
2   Tim   22         London

      Name  Age          City
2     Jane   25       Chicago
3  Jessica   34   Los Angeles
4     Tess   32  Philadelphia

1. Concatenate Dataframe

Using concat

Pandas library supports concat() function that allows you to concatenate 2 or more dataframes by passing a list of dataframes as argument. Here is its syntax.

pd.concat([df1, df2, ...])

It returns a new concatenated dataframe that can be stored as a separate dataframe or be used to overwrite one of the original dataframes.

res = pd.concat([df1,df2])
print(res)

## output

      Name  Age           City
0     John   27            NYC
1      Jim   24  San Francisco
2      Tim   22         London
2     Jane   25        Chicago
3  Jessica   34    Los Angeles
4     Tess   32   Philadelphia

In this approach, if you end up with duplicate rows or column values, then you can easily find duplicates using duplicated() function.

By specifying type of concatenation

Concat() function allows you to perform different types of joins such as inner joins and outer joins. You can do this by using axis and join arguments. This is similar to SQL joins where the columns of two tables are combined side by side, using common column values, to create a larger table. In this case, the dataframe rows are not added one below the other, but one beside the other.

res = pd.concat([df1,df2], axis=1, join='inner')
print(res)

## output

  Name  Age    City  Name  Age     City
2  Tim   22  London  Jane   25  Chicago

By ignoring indexes

As seen above, by default, the result dataframe’s index values are borrowed from the original dataframes. But if you want the result dataframe to have a new set of indexes, you can use ignore_index option.

res = pd.concat([df1,df2], ignore_index=True)
print(res)

## output

      Name  Age           City
0     John   27            NYC
1      Jim   24  San Francisco
2      Tim   22         London
3     Jane   25        Chicago
4  Jessica   34    Los Angeles
5     Tess   32   Philadelphia

Using group of keys

While concatenating dataframe, you can also assign group keys to groups of rows belonging to each dataframe. You can do this using keys argument, as shown.

res = pd.concat([df1,df2], keys = ['x','y'])
print(res)

## output

        Name  Age           City
x 0     John   27            NYC
  1      Jim   24  San Francisco
  2      Tim   22         London
y 2     Jane   25        Chicago
  3  Jessica   34    Los Angeles
  4     Tess   32   Philadelphia

2. Merge Dataframe

Merging dataframe is similar to SQL joins where you combine rows from multiple tables using common column values. You can do this using merge() function available in pandas library.

Based on one or more keys

Here is the command to merge two dataframes using common column names, or keys. You need to mention the dataframes to be merged, one after the other. Then you need to mention the list of keys to be used for merging, for ‘on’ argument.

pd.merge(df1, df2, on=[key1, key2,...]

Let us say you have the following dataframes, with the ID columns having common values

import pandas as pd

d1 = {'Name': ['John','Jim','Tim'],
         'Age': [27, 24, 22],
         'City': ['NYC', 'San Francisco','London'],
         'ID':[1, 2, 3]
     }

d2 = {'Name': ['Jane','Jessica','Tess'],
         'Age': [25, 34, 32],
         'City': ['Chicago', 'Los Angeles','Philadelphia'],
         'ID':[2,3,4]
     }
     
df1 = pd.DataFrame(d1, index=[0, 1, 2])
df2 = pd.DataFrame(d2, index=[2, 3, 4])

Here is the code to merge the two dataframes, using ID column.

res = pd.merge(df1,df2, on=['ID'])
print(res)

## output

  Name_x  Age_x         City_x  ID   Name_y  Age_y       City_y
0    Jim     24  San Francisco   2     Jane     25      Chicago
1    Tim     22         London   3  Jessica     34  Los Angeles

By default, pandas will try to do an INNER JOIN where it will pick rows with common values for ID column. If there are no common values in the key column, then the result will be an empty dataframe.

By specifying type of merge

You can also specify the type of join, using ‘how’ argument. As you can see in above dataframes, the name column does not have any common values. In such cases, you can do a LEFT or RIGHT join, preserving the column values of one of the two tables. Here is an example of LEFT JOIN. The values from left dataframe are preserved whereas the corresponding values for RIGHT table are set as NaN.

res = pd.merge(df1,df2, how='left',on=['Name'])
print(res)

## output

   Name  Age_x         City_x  ID_x  Age_y City_y  ID_y
0  John     27            NYC     1    NaN    NaN   NaN
1   Jim     24  San Francisco     2    NaN    NaN   NaN
2   Tim     22         London     3    NaN    NaN   NaN

If you want to do an outer join, then specify how=’outer’.

res = pd.merge(df1,df2, how='outer',on=['Name'])
print(res)

## output

      Name  Age_x         City_x  ID_x  Age_y        City_y  ID_y
0     Jane    NaN            NaN   NaN   25.0       Chicago   2.0
1  Jessica    NaN            NaN   NaN   34.0   Los Angeles   3.0
2      Jim   24.0  San Francisco   2.0    NaN           NaN   NaN
3     John   27.0            NYC   1.0    NaN           NaN   NaN
4     Tess    NaN            NaN   NaN   32.0  Philadelphia   4.0
5      Tim   22.0         London   3.0    NaN           NaN   NaN

In all the examples above, you can see that since both tables have same column names, merge() function will automatically assign suffixes x, y, … to overlapping column names.

3. Join Dataframe

Using join()

Like merge, you can also use join() function to combine two dataframes. It is available in all dataframes. Here is its syntax.

DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, validate=None)

Here is an example to illustrate it.

import pandas as pd

d1 = {'Name': ['John','Jim','Tim'],
         'Age': [27, 24, 22],
         'ID':[1, 2, 3]
     }

d2 = {
         'City': ['NYC', 'San Francisco','London'],
     }
     
df1 = pd.DataFrame(d1, index=[0, 1, 2])
df2 = pd.DataFrame(d2, index=[2, 3, 4])

res = df1.join(df2, on='ID')
print(res)

## output

   Name  Age  ID           City
0  John   27   1            NaN
1   Jim   24   2            NYC
2   Tim   22   3  San Francisco

In the above example, the dataframe df2 is joined to df1 using index values. It works well if your dataframes do not have overlapping columns.

By default, the join() function does a left join where it retains all rows and columns of left dataframe, and populates the other dataframe’s values for matching indexes.

If your dataframes have overlapping columns, then you must mention the suffix to differentiate them, otherwise you will get an error.

Here is an example to join the above two dataframes using ID column as merge key.

import pandas as pd

d1 = {'Name': ['John','Jim','Tim'],
         'Age': [27, 24, 22],
         'City': ['NYC', 'San Francisco','London'],
         'ID':[1, 2, 3]
     }

d2 = {'Name': ['Jane','Jessica','Tess'],
         'Age': [25, 34, 32],
         'City': ['Chicago', 'Los Angeles','Philadelphia'],
         'ID':[2,3,4]
     }
     
df1 = pd.DataFrame(d1, index=[0, 1, 2])
df2 = pd.DataFrame(d2, index=[2, 3, 4])

res = df1.join(df2, on='ID')
print(res)

## output
ValueError: columns overlap but no suffix specified: Index(['Name', 'Age', 'City', 'ID'], dtype='object')

Since both dataframes have overlapping columns, we get an error since we did not specify a suffix to differentiate common columns.

To fix this problem, we use lsuffix argument and specify the suffix for overlapping column names.

res = df1.join(df2, on='ID',lsuffix='_a')
print(res)

## output

  Name_a  Age_a         City_a  ID_a     Name   Age         City   ID
0   John     27            NYC     1      NaN   NaN          NaN  NaN
1    Jim     24  San Francisco     2     Jane  25.0      Chicago  2.0
2    Tim     22         London     3  Jessica  34.0  Los Angeles  3.0

Using how argument

Like merge function, you can also specify the type of join using ‘how’ argument. Here is an example to do inner join.

res = df1.join(df2, on='ID',lsuffix='_a', how='inner')
print(res)

## output

  Name_a  Age_a         City_a  ID_a     Name  Age         City  ID
1    Jim     24  San Francisco     2     Jane   25      Chicago   2
2    Tim     22         London     3  Jessica   34  Los Angeles   3

Here is an example to do left join. In this case, the values of left dataframe are retained, along with matching values from right dataframe.

res = df1.join(df2, on='ID',lsuffix='_a', how='left')
print(res)

## output

  Name_a  Age_a         City_a  ID_a     Name   Age         City   ID
0   John     27            NYC     1      NaN   NaN          NaN  NaN
1    Jim     24  San Francisco     2     Jane  25.0      Chicago  2.0
2    Tim     22         London     3  Jessica  34.0  Los Angeles  3.0

Conclusion

In this article, we have learnt several simple ways to combine data from two dataframes. Depending on your requirement, you can choose to merge, join or concatenate dataframes. concat() function allows you to combine dataframes vertically one below the other, or horizontally one beside the other. On the other hand, join() and merge() functions combine dataframes horizontally only.

Also read:
How to Create Pivot Tables in Python
How to Connect Pandas to Database
Top Machine Learning Libraries in Python

How to Create Directory with Missing Parent Directory

How to Select Rows from Dataframe Based on Column Values

How to Iterate Over Dictionaries Using For Loops

What Are f-strings in Python? How to Use f-strings in Python

Sreeram Sreenivasan

Sreeram Sreenivasan is the Founder of Ubiq. He has helped many Fortune 500 companies in the areas of BI & software development.

How to Merge and Join Pandas Dataframes

How to Merge and Join Pandas Dataframes

1. Concatenate Dataframe

Using concat

By specifying type of concatenation

By ignoring indexes

Using group of keys

2. Merge Dataframe

Based on one or more keys

By specifying type of merge

3. Join Dataframe

Using join()

Using how argument

Conclusion

Related posts:

How to Create Directory with Missing Parent Directory

How to Select Rows from Dataframe Based on Column Values

How to Iterate Over Dictionaries Using For Loops

What Are f-strings in Python? How to Use f-strings in Python

Leave a Reply Cancel reply

How to Merge and Join Pandas Dataframes

1. Concatenate Dataframe

Using concat

By specifying type of concatenation

By ignoring indexes

Using group of keys

2. Merge Dataframe

Based on one or more keys

By specifying type of merge

3. Join Dataframe

Using join()

Using how argument

Conclusion

Related posts:

How to Create Directory with Missing Parent Directory

How to Select Rows from Dataframe Based on Column Values

How to Iterate Over Dictionaries Using For Loops

What Are f-strings in Python? How to Use f-strings in Python

Share this:

Leave a Reply Cancel reply