How to Remove Duplicates from List in Python

Python developers commonly use lists to store data. Lists are versatile data structures that allow you to store different data types in a compact manner. Lists often contain duplicate data, and sometimes a program needs each item to appear only once. If the duplicates are left in place, they may lead to undesirable results. There are several ways to delete duplicates from a Python list. In this article, we will learn how to remove duplicates from a list in Python.

Why Remove Duplicates from List in Python

Sometimes a list may contain duplicate values. If your program requires a list of unique items, then a list with duplicate values may cause it to raise an error or give a wrong result. Therefore, depending on the use case, it is important to remove duplicates from a list in Python.

How to Remove Duplicates from List in Python

Let us say you have the following list with duplicate values in it.

data = [1, 2, 3, 3, 4, 4, 5]

Here are the different ways to remove duplicates from list in Python.

1. Using Set

By definition, a set can contain only unique values. Python allows you to create a set using the set() constructor. You can use this function to remove duplicates from a list. The set() function accepts one input, which can be any iterable, such as a list of values. If we provide a list with duplicate values to the set() function, then it will return a set with unique values.

data1 = set(data) # output is {1, 2, 3, 4, 5}

We can pass this set into a list constructor, which returns a list of items, present in the input set.

list(data1) # output is [1, 2, 3, 4, 5]

Combining the two, we get the following.

new_data = list(set(data)) # output is [1, 2, 3, 4, 5]

However, please note that sets are unordered, so the items in the output may not appear in the same order as in the original list.
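As a small sketch of this caveat: the order in which a set yields its items is an implementation detail, so the example below relies only on the contents of the result and uses sorted() when a predictable order is wanted. The sample list is chosen purely for illustration.

```python
data = [5, 4, 4, 1, 2, 3, 3]

# Duplicates are removed, but the order of the items is not guaranteed.
unique_items = list(set(data))

# sorted() accepts the set directly and returns a list in ascending order.
sorted_unique = sorted(set(data))

print(len(unique_items))  # 5
print(sorted_unique)      # [1, 2, 3, 4, 5]
```

If the original order matters rather than a sorted order, one of the order-preserving approaches is a better fit.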

2. Using fromkeys function

If you want to preserve the order of items in the final list, then you can use the fromkeys() method of the OrderedDict class, which is part of the collections module.

from collections import OrderedDict
data = [1, 2, 3, 3, 4, 4, 5]
print(list(OrderedDict.fromkeys(data))) # output is [1, 2, 3, 4, 5]

The fromkeys() method builds a dictionary from an iterable of keys and an optional single value that is shared by every key. If no value is specified, then all the values of the resulting dict will be None. Since a dictionary cannot have duplicate keys, repeated list items are ignored after their first occurrence. When we call the list() constructor on this dict, it constructs the list using only the keys of the dict.

OrderedDict.fromkeys(data) # an OrderedDict with keys 1, 2, 3, 4, 5, each mapped to None
list(OrderedDict.fromkeys(data)) # output is [1, 2, 3, 4, 5]

If you are using Python >=3.7, then the built-in dict type preserves the insertion order of items. In this case, you can use the plain old dict directly for this purpose.

print(list(dict.fromkeys(data))) # output is [1, 2, 3, 4, 5]

In the above code, we directly call dict.fromkeys() on our data list. We do not need to import any module. However, please note, if you use Python < 3.7, then the dictionary created using our data list may not preserve the order of elements.
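The same idea can be wrapped in a small helper, sketched below. The function name dedupe_keep_order and the sample data are hypothetical, and the sketch assumes Python 3.7 or later and hashable items, since dictionary keys must be hashable.

```python
def dedupe_keep_order(items):
    """Return a new list without duplicates, keeping first occurrences.

    Assumes Python 3.7+ (dicts keep insertion order) and hashable items.
    """
    return list(dict.fromkeys(items))


cities = ["Paris", "Rome", "Paris", "Oslo", "Rome"]
print(dedupe_keep_order(cities))  # ['Paris', 'Rome', 'Oslo']

# Unhashable items such as lists cannot be used as dictionary keys.
try:
    dedupe_keep_order([[1, 2], [1, 2]])
except TypeError as exc:
    print("TypeError:", exc)
```

For lists of unhashable items, a loop that compares against the items collected so far is one alternative.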

3. Using List Comprehension

List comprehensions provide a compact way to iterate through a list and perform a given task. Here is an example to illustrate it.

data = [1, 2, 3, 3, 4, 4, 5]
result = []
[result.append(val) for val in data if val not in result]
print(result) # output is [1, 2, 3, 4, 5]

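In the above code, we define an empty list result to store the new list without duplicates. We use a list comprehension to iterate through all items of the list. In each iteration, we check if the item is present in the new list or not. If not, then we append it to the new list. Note that the comprehension is used here only for the side effect of calling append(); the list of None values that it builds is discarded.

The benefit of this solution is that it preserves the order of items from the original list. Also, you can customize the list comprehension to exclude certain items if you want. Its drawback is that the check val not in result scans the new list every time, so it becomes slow for large lists.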
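A common variation, shown here as a sketch rather than as part of the approach above, tracks the items already seen in a separate set so that each membership check does not rescan the result list. The name seen is illustrative, and the items are assumed to be hashable.

```python
data = [1, 2, 3, 3, 4, 4, 5]

seen = set()
# set.add() returns None, so "val in seen or seen.add(val)" is truthy only
# when val was already seen; otherwise it records val and evaluates to None.
result = [val for val in data if not (val in seen or seen.add(val))]

print(result)  # [1, 2, 3, 4, 5]
```

This still relies on a side effect inside the comprehension, so an explicit for loop may be easier to read when clarity matters more than brevity.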

4. Using Unpacking Operator

You can also use the unpacking operator (*) to first unpack the list items into a set, and then use it again to unpack the set items into a list. When the original list is converted into a set, its duplicates are automatically dropped since a set cannot contain duplicates.

data = [1, 2, 3, 3, 4, 4, 5]
result = [*{*data}]
print(result) # output is [1, 2, 3, 4, 5]

This is a super compact way of removing duplicates from a list. Please note, since we convert list into set, the order of its items may not be preserved. If you are not particular about the order of items in the final list, then you can use this solution.
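As a rough illustration, the unpacking form produces the same items as list(set(data)), and the same operator can also unpack the keys of a dictionary, which keeps the first-seen order on Python 3.7 and later. The variable names below are only illustrative.

```python
data = [5, 4, 4, 1, 2, 3, 3]

# Unpack into a set, then into a list: order is not guaranteed.
via_set = [*{*data}]

# Unpack the keys of a dict built with fromkeys(): first-seen order is kept
# (assuming Python 3.7+).
via_dict = [*dict.fromkeys(data)]

print(sorted(via_set))  # [1, 2, 3, 4, 5]
print(via_dict)         # [5, 4, 1, 2, 3]
```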

5. In Numpy

Numpy is a powerful Python module that allows you to easily process data and numbers. It provides unique() function to easily remove duplicates from a list. Here is an example.

import numpy as np
data = [5, 4, 4, 1, 2, 3, 3]
result = np.unique(data).tolist()
print(result)

In the above code, we first import the Numpy module. Then we call the unique() function on our original list data. This removes all the duplicates and returns a numpy.ndarray. Next, we call the tolist() method to convert the result of unique() to a list. Please note that unique() also sorts the values, so the output contains the unique items in ascending order rather than in their original order. Here is the output.

[1, 2, 3, 4, 5]

This is very useful if you are using Numpy in your program.
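If the original order is needed with Numpy, one possible approach is sketched below. It uses the return_index argument of unique(), which also returns the index of the first occurrence of each unique value. The variable names are illustrative.

```python
import numpy as np

data = [5, 4, 4, 1, 2, 3, 3]
arr = np.array(data)

# return_index=True also returns the index of each value's first occurrence.
_, first_idx = np.unique(arr, return_index=True)

# Sorting those indices restores the order in which the values first appeared.
result = arr[np.sort(first_idx)].tolist()

print(result)  # [5, 4, 1, 2, 3]
```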

6. In Pandas

Pandas is a popular Python library that allows you to store and analyze data in a tabular manner, using rows and columns. Like the Numpy library, it provides a unique() function to remove duplicates from a list, and the result can be converted back into a list with tolist().

import pandas as pd
data = [5, 4, 4, 1, 2, 3, 3]
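result = pd.Series(data).unique().tolist() # wrap the list in a Series; recent Pandas versions warn when pd.unique() is given a plain list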
print(result) # output is [5, 4, 1, 2, 3]

As you can see, unlike Numpy, the unique() function in Pandas does not sort the result but preserves the order of items in the original list. If you are using Pandas library in your code, then this is a convenient method.
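When the data is held in a Pandas Series, the drop_duplicates() method is another option. The sketch below uses its keep argument to choose whether the first or the last occurrence of each value survives. The sample data is chosen so that the two results differ.

```python
import pandas as pd

data = [1, 2, 1, 3]
series = pd.Series(data)

# keep="first" (the default) retains the first occurrence of each value.
keep_first = series.drop_duplicates().tolist()

# keep="last" retains the last occurrence instead.
keep_last = series.drop_duplicates(keep="last").tolist()

print(keep_first)  # [1, 2, 3]
print(keep_last)   # [2, 1, 3]
```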

7. Using for loop

This is perhaps the most basic way to remove duplicates in a list. First, we create an empty list to store the items without duplicates. We simply loop through the items of list. In each iteration, we check if the item is present in the new list. If not, then we append the item into the list.

data = [5, 4, 4, 1, 2, 3, 3]
result = []

for val in data:
    if val not in result:
        result.append(val)
print(result) # output is [5, 4, 1, 2, 3]

Although this is the most verbose solution of all, it provides tremendous flexibility and customization. For example, the following code keeps every occurrence of the value 4 while removing all other duplicates.

data = [5, 4, 4, 1, 2, 3, 3]
result = []

for val in data:
    if val not in result or val==4:
        result.append(val)
print(result) # output is [5, 4, 4, 1, 2, 3]

Secondly, it also preserves the order of items in the original list.
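To illustrate that flexibility, here is a sketch of a loop that decides what counts as a duplicate through a key function. The helper name unique_by and its key parameter are hypothetical, and the computed keys are assumed to be hashable.

```python
def unique_by(items, key=lambda item: item):
    """Keep the first item for each distinct key, preserving order."""
    seen = set()
    result = []
    for item in items:
        marker = key(item)
        if marker not in seen:
            seen.add(marker)
            result.append(item)
    return result


names = ["Alice", "bob", "ALICE", "Bob", "carol"]

# Treat names that differ only by letter case as duplicates.
print(unique_by(names, key=str.lower))  # ['Alice', 'bob', 'carol']
```

Without a key, the helper behaves like the basic loop above, except that it uses a set for the membership check.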

For Strings

So far, we have seen how to remove duplicates from a Python list. These solutions can also be applied to strings. In each case, the result will be a list of unique characters from the string. Here is an example to demonstrate it.

data = "good morning"

print(list(set(data))) # one possible output is [' ', 'n', 'i', 'g', 'o', 'r', 'd', 'm']; the order can vary between runs
print(list(dict.fromkeys(data))) # output is ['g', 'o', 'd', ' ', 'm', 'r', 'n', 'i']
result=[]
[result.append(val) for val in data if val not in result]
print(result) # output is ['g', 'o', 'd', ' ', 'm', 'r', 'n', 'i']

print([*{*data}]) # one possible output is [' ', 'n', 'i', 'g', 'o', 'r', 'd', 'm']; the order can vary between runs

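The unique() functions in Numpy and Pandas do not split a string into characters, so they cannot be applied to a string directly. To use them, convert the string to a list of characters first, for example with list(data).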
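If a string is wanted back instead of a list of characters, the unique characters can be joined again. The sketch below does this with dict.fromkeys(), assuming Python 3.7 or later so that the order is preserved, and applies the same idea to whole words. The sample sentence is made up for illustration.

```python
data = "good morning"

# Join the unique characters back into a single string.
unique_chars = "".join(dict.fromkeys(data))
print(unique_chars)  # god mrni

# The same idea works on words after splitting the string on whitespace.
sentence = "the cat and the hat and the bat"
unique_words = " ".join(dict.fromkeys(sentence.split()))
print(unique_words)  # the cat and hat bat
```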

Conclusion

In this article, we have learnt several ways to remove duplicates from a Python list. If you are not particular about the order of items, then you can use the set() constructor, since it is the shortest solution. If you need to preserve the order of items, then you can use OrderedDict.fromkeys() or, on Python 3.7 and later, dict.fromkeys(). If you do not want a direct list of unique items but need to customize the de-duplication, then you can use a list comprehension or a plain for loop. If you are already using the Numpy or Pandas library in your program, then you can use their unique() and tolist() functions, keeping in mind that Numpy sorts the result while Pandas preserves the original order.
