Python developers often use the Pandas library to store and manipulate tabular data. Pandas dataframes make it easy to query data by rows and columns. Sometimes you need to change the column names of a dataframe, and there are several ways to do this. In this article, we will learn how to rename columns in Pandas.
How to Rename Columns in Pandas
Here are the different ways to change column names in a Pandas dataframe. Let us say you have the following dataframe with 3 columns: $id, $name and $marks.
import pandas as pd
data = {
"$id": [1, 2, 3],
"$name": ['John','Jim','Jane'],
"$marks": [40, 45, 50]
}
df = pd.DataFrame(data)
print(df)
Here is the dataframe output you will see.
$id $name $marks
0 1 John 40
1 2 Jim 45
2 3 Jane 50
Let us say you want to rename columns $id, $name and $marks to id, name and marks respectively. Here are the different ways to do this.
1. Using columns property
Every Pandas dataframe has a built-in attribute called columns. It holds an Index object containing all the column names.
print(df.columns)
Here is the output.
Index(['$id', '$name', '$marks'], dtype='object')
You can rename columns by assigning a new list of names to this attribute.
df.columns=['id','name','marks']
print(df.columns)
Here is the output after renaming columns.
Index(['id', 'name', 'marks'], dtype='object')
Here is the dataframe after renaming columns.
print(df)
# Output
id name marks
0 1 John 40
1 2 Jim 45
2 3 Jane 50
Please note that when you assign to the columns attribute, you have to provide a complete new list of column names. The number of items in this list must equal the number of columns in your dataframe. Otherwise, you will get an error.
df.columns=['id','name']
print(df.columns)
## Output
ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements
Along the same lines, you cannot rename just a single column with this method. You need to provide names for all the columns in the dataframe.
Another thing to remember is that the new names are assigned by position, so their order must match that of the old column names. Otherwise, the wrong names will be assigned to the columns.
## incorrect order
df.columns=['id','marks','name']
## correct order
df.columns=['id','name','marks']
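One way to sidestep ordering mistakes is to derive the new names from the existing ones instead of typing them by hand. The following is a minimal sketch, assuming the dataframe from the start of this article and assuming that every new name is simply the old name without its leading $.

```python
import pandas as pd

df = pd.DataFrame({
    "$id": [1, 2, 3],
    "$name": ["John", "Jim", "Jane"],
    "$marks": [40, 45, 50],
})

# Build the new names from the current ones, so each new name
# stays attached to the column it was derived from.
df.columns = [col.lstrip("$") for col in df.columns]

print(list(df.columns))
```

Because the list is built by iterating over df.columns, it always has the right length and order.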
2. Using rename function
Every dataframe supports a rename() function that allows you to easily rename dataframe columns. It takes in a dictionary of key-value pairs where keys are old column names and values are their new names. Here is its syntax.
df.rename(columns={'old_column_name':'new_column_name',...}, inplace=True)
Here is an example to rename our columns.
df.rename(columns={'$id':'id','$name':'name','$marks':'marks'}, inplace=True)
print(df)
Here is the dataframe with modified column names.
id name marks
0 1 John 40
1 2 Jim 45
2 3 Jane 50
Please note that in the above example we used the inplace=True option. Without it, rename() leaves the original dataframe unchanged and returns a new dataframe with the new column names and the same data. The example below assumes df still has its original column names.
df1=df.rename(columns={'$id':'id','$name':'name','$marks':'marks'})
print(df1)
print(df)
## Output
id name marks
0 1 John 40
1 2 Jim 45
2 3 Jane 50
$id $name $marks
0 1 John 40
1 2 Jim 45
2 3 Jane 50
This solution allows you to rename one or more columns as per your requirement. While providing column names, you do not need to follow any order. Here is an example to rename just the first column and leave the rest unchanged.
df.rename(columns={'$id':'id'}, inplace=True)
print(df)
## Output
id $name $marks
0 1 John 40
1 2 Jim 45
2 3 Jane 50
This is a very useful solution because it allows you to rename columns in place, or create a new dataframe with new column names and same data. It also allows you to rename one or more columns, as per your requirement, instead of renaming all columns, every single time.
In the rename() function, if an old column name you mention does not exist in the dataframe, Pandas silently skips it by default instead of raising an error. Pass errors='raise' if you want a KeyError for unknown names.
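The difference between the default behaviour and the stricter errors='raise' option of rename() can be seen in a short sketch. The two-column dataframe here is made up for illustration, and '$marks' is deliberately a name that does not exist in it.

```python
import pandas as pd

df = pd.DataFrame({"$id": [1, 2], "$name": ["John", "Jim"]})

# Default behaviour: the unknown key '$marks' is ignored.
unchanged = df.rename(columns={"$marks": "marks"})
print(list(unchanged.columns))

# With errors='raise', the same call fails loudly.
try:
    df.rename(columns={"$marks": "marks"}, errors="raise")
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

The strict mode can help catch typos in old column names that would otherwise go unnoticed.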
3. Using Lambda function
In each of the above solutions, you need to mention both the old and new column names. What if your dataframe has many columns, like 100 columns? Then it becomes very tedious to rename each of them.
In such cases, you can use a lambda function to rewrite every column name according to a pattern. For example, if you want to remove the $ sign from each of your column names, you can use the following lambda function inside the rename() function.
df.rename(lambda x: x[1:], axis='columns', inplace=True)
OR, equivalently (run only one of these two lines, since running both would strip two characters):
df.rename(lambda x: x[1:], axis=1, inplace=True)
print(df)
## output
id name marks
0 1 John 40
1 2 Jim 45
2 3 Jane 50
In the above code, we define a lambda function that slices each column name. It drops the first character of the name, whatever that character is, and uses the rest as the new column name.
This approach is very useful if you have many columns whose names all follow a certain pattern. This is a common requirement if your dataframes are programmatically generated from a feed or file.
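Since slicing with x[1:] removes the first character unconditionally, it would also damage names that do not start with $. The sketch below is one possible safeguard, not the only one. It uses a hypothetical helper named strip_dollar and a made-up dataframe in which one column has no $ prefix.

```python
import pandas as pd

df = pd.DataFrame({"$id": [1, 2], "name": ["John", "Jim"], "$marks": [40, 45]})

def strip_dollar(col):
    # Hypothetical helper: drop one leading '$' only if it is present.
    return col[1:] if col.startswith("$") else col

# rename() accepts any callable for columns, not just a lambda.
renamed = df.rename(columns=strip_dollar)
print(list(renamed.columns))
```

With the plain slice, the column 'name' would have become 'ame'. The conditional keeps it intact.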
4. Using zip function
If your old and new column names are available as two separate lists, you can use the zip() function to combine the two lists into a single dict and pass it to the rename() function. Here is an example to illustrate it.
old_names = ['$id', '$name', '$marks']
new_names = ['id', 'name', 'marks']
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)
print(df)
## Output
id name marks
0 1 John 40
1 2 Jim 45
2 3 Jane 50
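One thing to watch with zip() is that it stops at the end of the shorter list, so a missing new name does not cause an error. The sketch below, using a made-up dataframe and a new_names list that is intentionally one item short, shows the effect and a simple length check.

```python
import pandas as pd

df = pd.DataFrame({"$id": [1, 2], "$name": ["John", "Jim"], "$marks": [40, 45]})

old_names = ["$id", "$name", "$marks"]
new_names = ["id", "name"]  # one name short, on purpose

# zip() pairs items until the shorter list runs out.
mapping = dict(zip(old_names, new_names))
print(mapping)

if len(old_names) != len(new_names):
    print("Lists differ in length; some columns will keep their old names.")

renamed = df.rename(columns=mapping)
print(list(renamed.columns))
```

Comparing the two lengths before calling rename() is a cheap way to notice this kind of mismatch.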
5. Add Prefix or Suffix to Column Names
Sometimes, all you want to do is add a short prefix or suffix to each of your column names. Manually specifying a new name for each column would be tedious. Luckily, Pandas provides the add_prefix() and add_suffix() functions to add a prefix and a suffix respectively. They are built-in functions available on every dataframe.
Here is an example to add prefix ‘x_’ to each column name.
df1=df.add_prefix('x_')
print(df1)
## Output
x_$id x_$name x_$marks
0 1 John 40
1 2 Jim 45
2 3 Jane 50
Similarly, here is an example to show how to add suffix ‘_y’.
df1=df.add_suffix('_y')
print(df1)
## output
$id_y $name_y $marks_y
0 1 John 40
1 2 Jim 45
2 3 Jane 50
Please note, both add_prefix() and add_suffix() functions will create a copy of your dataframe, and modify each column name. They will leave your original dataframe unchanged.
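Because each of these functions returns a new dataframe, the calls can be chained. Here is a small sketch on a made-up two-column dataframe that applies both a prefix and a suffix and then checks that the original is untouched.

```python
import pandas as pd

df = pd.DataFrame({"$id": [1, 2], "$name": ["John", "Jim"]})

# Each call returns a new dataframe, so the calls can be chained.
tagged = df.add_prefix("x_").add_suffix("_y")

print(list(tagged.columns))
print(list(df.columns))  # the original names are unchanged
```

To keep the new names, assign the result to a variable, as done with tagged here.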
6. Replace one or more characters in Column names
Sometimes, you want to replace a character or substring with another character or substring. For this purpose, you can use str.replace(). It is available on the columns attribute used in solution #1 above. Here is an example to replace the $ character in each column name with the _ character.
# regex=False treats '$' as a literal character, not as a regex anchor
df.columns = df.columns.str.replace('$', '_', regex=False)
print(df)
## Output
_id _name _marks
0 1 John 40
1 2 Jim 45
2 3 Jane 50
Please note, in this case, the original dataframe is modified and no new copy is created.
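str.replace() can also take a regular expression when you pass regex=True. The sketch below, with a made-up column name that contains $ in two places, removes the $ only when it is the first character of the name.

```python
import pandas as pd

df = pd.DataFrame({"$id": [1, 2], "$na$me": ["John", "Jim"]})

# With regex=True, the pattern '^\$' matches a literal '$'
# only at the start of a column name.
df.columns = df.columns.str.replace(r"^\$", "", regex=True)
print(list(df.columns))
```

The $ in the middle of the second name is left alone because the pattern is anchored to the start of the string.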
7. Using set_axis()
Every dataframe supports set_axis() that accepts a list of new column names and an axis (0 for rows and 1 for columns). Here is an example to illustrate it.
df1=df.set_axis(['id', 'name', 'marks'], axis=1)
print(df1)
## Output
id name marks
0 1 John 40
1 2 Jim 45
2 3 Jane 50
Please note that set_axis() creates a copy of the dataframe with the new column names. It leaves the original dataframe unchanged.
Conclusion
In this article, we have learnt several simple ways to rename columns in a Pandas dataframe. We have also covered many different use cases commonly required by Python developers. If you want to rename all columns in your dataframe, then you can use its columns attribute. If you want to selectively rename one or more columns, then you can use the rename() function.
If your dataframe has many columns, then you can use a lambda function to transform the names and rename columns in bulk. If all you want to do is add a prefix or suffix to each of your column names, use the add_prefix() or add_suffix() function. If you want to replace a character or substring in each of your column names, then use the str.replace() function.
FAQs
1. How to rename only certain columns in dataframe?
You can use rename() function with a dict of key-value pairs, where key is the old column name and value is the new column name. It is mentioned above in solution #2.
2. How to add prefix or suffix to each column name?
Every dataframe supports add_prefix and add_suffix functions to add prefix and suffix respectively. Use add_prefix() to add prefix to each column name. Use add_suffix() function to add suffix to each column name.
3. How to replace specific characters in each column name?
Use the df.columns.str.replace() function, which is available on every dataframe's columns attribute.
Sreeram Sreenivasan is the Founder of Ubiq. He has helped many Fortune 500 companies in the areas of BI & software development.