
I'm trying to define a function which will remove duplicates from a list. So far I have a working implementation:

rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs)   | x `elem` xs   = rmdups xs
                | otherwise     = x : rmdups xs

However I'd like to rework this without using elem. What would be the best method for this?

I'd like to do this using my own function and not nub or nubBy.

1 Comment

Link to Data.List (nub) for when I'm googling it again...

13 Answers


Both your code and nub have O(N^2) complexity.

You can improve the complexity to O(N log N) and avoid using elem by sorting, grouping, and taking only the first element of each group.

Conceptually,

import Data.List (group, sort)

rmdups :: Ord a => [a] -> [a]
rmdups = map head . group . sort

Suppose you start with the list [1, 2, 1, 3, 2, 4]. By sorting it, you get [1, 1, 2, 2, 3, 4]; by grouping that, you get [[1, 1], [2, 2], [3], [4]]; finally, by taking the head of each group, you get [1, 2, 3, 4].

The full implementation of the above just involves expanding each function.

Note that this requires the stronger Ord constraint on the elements of the list, and also changes their order in the returned list.
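As a sketch of that expansion, the grouping step could be written by hand like this (groupEq is a hypothetical, hand-rolled analogue of Data.List's group, shown for illustration; sort is still taken from the library):

```haskell
import Data.List (sort)

-- Hand-rolled analogue of Data.List.group:
-- collect each run of equal adjacent elements into one sublist.
groupEq :: Eq a => [a] -> [[a]]
groupEq [] = []
groupEq (x:xs) = (x : same) : groupEq rest
  where (same, rest) = span (== x) xs

rmdups :: Ord a => [a] -> [a]
rmdups = map head . groupEq . sort

-- rmdups [1, 2, 1, 3, 2, 4] == [1, 2, 3, 4]
```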


3 Comments

Very nice, but note that this places an Ord restriction on the list elements, rather than just Eq, and the order is not preserved.
Good point. Made a note of that and the other change in semantics.
ordNub provides a copy-paste-ready, stable (order-preserving) variant of @scvalex's suggestion to use Ord. It also contains analogues for `union` and `intersect`.

Even easier.

import Data.Set (toList, fromList)

mkUniq :: Ord a => [a] -> [a]
mkUniq = toList . fromList

Convert the set to a list of elements in O(n) time:

toList :: Set a -> [a]

Create a set from a list of elements in O(n log n) time:

fromList :: Ord a => [a] -> Set a

In Python it would be no different:

def mkUniq(x):
    return list(set(x))
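A quick check of the Haskell version (the definition is repeated so the snippet stands alone). Note that the result comes back in ascending order, since Data.Set stores its elements sorted, so the original order is not preserved:

```haskell
import Data.Set (toList, fromList)

mkUniq :: Ord a => [a] -> [a]
mkUniq = toList . fromList

-- mkUniq [3, 1, 2, 3, 1] == [1, 2, 3]
-- Duplicates are gone, but 3 no longer comes first.
```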

2 Comments

elegant, but I don't think Set preserves the order.
yes, he didn't mention order preservation in his OP.

Like @scvalex's solution, the following has O(n log n) complexity and an Ord constraint. Unlike it, it preserves order, keeping the first occurrence of each item.

import qualified Data.Set as Set

rmdups :: Ord a => [a] -> [a]
rmdups = rmdups' Set.empty where
  rmdups' _ [] = []
  rmdups' a (b : c) = if Set.member b a
    then rmdups' a c
    else b : rmdups' (Set.insert b a) c

Benchmark results

[benchmark results chart]

As you can see, the benchmark results show this solution to be the fastest of those compared. You can find the source of this benchmark here.
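For reference, a self-contained copy of the definition with a quick check of the order-preserving behaviour:

```haskell
import qualified Data.Set as Set

-- Thread the set of already-seen elements through the traversal;
-- keep an element only the first time it appears.
rmdups :: Ord a => [a] -> [a]
rmdups = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | Set.member x seen = go seen xs
      | otherwise         = x : go (Set.insert x seen) xs

-- rmdups "abacd" == "abcd"  (first occurrences, original order kept)
```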

2 Comments

+1. Set is definitely a more efficient data structure for large inputs. I'd like to see where the standard library's nub fits into this graph - does laziness have an effect on the performance?
Heh. I was going to write up and submit a version using Set insertion, but you beat me to it. Good show.

I don't think you'll be able to do it without elem (or your own re-implementation of it).

However, there is a semantic issue with your implementation. When elements are duplicated you're keeping the last one. Personally, I'd expect it to keep the first duplicate item and drop the rest.

*Main> rmdups "abacd"
"bacd"

The solution is to thread the 'seen' elements through as a state variable.

removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = rdHelper []
    where rdHelper seen [] = seen
          rdHelper seen (x:xs)
              | x `elem` seen = rdHelper seen xs
              | otherwise = rdHelper (seen ++ [x]) xs

This is more-or-less how nub is implemented in the standard library (read the source here). The small difference in nub's implementation ensures that it is non-strict, while removeDuplicates above is strict (it consumes the entire list before returning).

Primitive recursion is actually overkill here, if you're not worried about strictness. removeDuplicates can be implemented in one line with foldl:

removeDuplicates2 = foldl (\seen x -> if x `elem` seen
                                      then seen
                                      else seen ++ [x]) []
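A quick check of the recursive version (repeated here so the snippet is self-contained); it keeps the first occurrence of each element:

```haskell
-- Accumulate the elements already seen; append a new element
-- only if it has not been seen before, preserving order.
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = rdHelper []
  where
    rdHelper seen [] = seen
    rdHelper seen (x:xs)
      | x `elem` seen = rdHelper seen xs
      | otherwise     = rdHelper (seen ++ [x]) xs

-- removeDuplicates "abacd" == "abcd"
```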

2 Comments

@BradStevenson These solutions are built on inefficient operations: both elem and (++) are O(n) on lists. Although Haskell's laziness keeps (++) from running eagerly on every step, these implementations still fall noticeably short of the alternatives presented in other answers. See benchmark results.
I think removeDuplicates3 = foldr (\x seen -> if x `elem` seen then seen else x : seen) [] runs faster than removeDuplicates2, since (:) is a constant-time operation.

Graham Hutton has a rmdups function on p. 86 of Programming in Haskell. It preserves order. It is as follows.

rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs) = x : filter (/= x) (rmdups xs)

rmdups "maximum-minimum"
-- "maxiu-n"

This was bothering me until I saw Hutton's function. Then I tried again. There are two versions: the one below keeps the first occurrence of each element (like Hutton's); a variant using drop keeps the last.

rmdups ls = [d | (z, d) <- zip [0..] ls, notElem d (take z ls)]

rmdups "maximum-minimum"
-- "maxiu-n"

If instead you want to keep the last occurrence of each element, as your original implementation does, change take to drop and change the enumeration zip [0..] to zip [1..].
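Spelling out that drop-based variant (the name rmdupsLast is mine, chosen to distinguish it from the version above):

```haskell
-- Keeps the LAST occurrence of each element: an element is kept
-- only if it does not appear again later in the list.
rmdupsLast :: Eq a => [a] -> [a]
rmdupsLast ls = [d | (z, d) <- zip [1..] ls, notElem d (drop z ls)]

-- rmdupsLast "maximum-minimum" == "ax-nium"
```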

Comments


It is late to answer this question, but I want to share my solution, which is original: it uses neither elem nor an Ord constraint.

rmdups' :: Eq a => [a] -> [a]
rmdups' [] = []
rmdups' [x] = [x]
rmdups' (x:xs) = x : [ k | k <- rmdups' xs, k /= x ]

This solution removes the later duplicate occurrences, while the question's implementation removes the earlier ones. For example,

rmdups "maximum-minimum"
-- "ax-nium"

rmdups' "maximum-minimum"
-- "maxiu-n"

Also, the complexity of this code is O(N*K), where N is the length of the string and K is the number of unique characters. Since N >= K, the worst case is O(N^2), but that would mean the string has no repetitions at all, which is unlikely given that you are trying to remove duplicates.

Comments


I would like to add to @fp_mora's answer that on page 136 of Programming in Haskell there is another, slightly different implementation:

rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x : xs) = x : rmdups (filter (/= x) xs)

It was easier for me to wrap my head around this one.

Comments


Using recursion-schemes:

import Data.Functor.Foldable

dedup :: (Eq a) => [a] -> [a]
dedup = para pseudoalgebra
    where pseudoalgebra Nil                 = []
          pseudoalgebra (Cons x (past, xs)) = if x `elem` past then xs else x:xs

While this is certainly more advanced, I think it is quite elegant and shows off some worthwhile functional programming paradigms.

Comments


You can use this compress function too. Note that it only removes adjacent duplicates, so sort the list first if the duplicates may be scattered:

cmprs :: Eq a => [a] -> [a]
cmprs [] = []
cmprs [a] = [a]
cmprs (a:as)
    | a == head as = cmprs as
    | otherwise    = a : cmprs as

Comments


...or by using the function union from Data.List applied to itself:

import Data.List

unique x = union x x

3 Comments

"Duplicates, and elements of the first list, are removed from the second list, but if the first list contains duplicates, so will the result." Cf. the documentation.
unique x = union [] x would probably be a better idea.
It is slow as hell

Using dropWhile also works, but remember to sort the list before applying it:

rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs) = x : rmdups (dropWhile (== x) xs)
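For example, sorting first and then applying the function (note this discards the original order; uniqSorted is a hypothetical wrapper name, not from the answer):

```haskell
import Data.List (sort)

-- Removes adjacent duplicates, so it only deduplicates fully
-- when the input is sorted.
rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs) = x : rmdups (dropWhile (== x) xs)

uniqSorted :: Ord a => [a] -> [a]
uniqSorted = rmdups . sort

-- uniqSorted [3, 1, 2, 3, 1] == [1, 2, 3]
```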

Comments

remdups xs = foldr (\y ys -> y:filter (/= y) ys) [] xs

This applies the function to the first element and to the list constructed recursively in the same way. At each step the current element is prepended to the result built from the rest of the list, and that result is then filtered to remove any later occurrences of the element just added.

So every iteration adds an element x to the list and filters the remaining result, removing all elements equal to x.
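A quick check (the definition is repeated so the snippet stands alone); it keeps the first occurrence of each element:

```haskell
-- Prepend each element, then filter its later occurrences
-- out of the recursively built result.
remdups :: Eq a => [a] -> [a]
remdups = foldr (\y ys -> y : filter (/= y) ys) []

-- remdups "abacd" == "abcd"
```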

Comments

remove_duplicates :: Eq a => [a] -> [a]
remove_duplicates [] = []
remove_duplicates (x:xs)
  | xs == []       = [x]
  | x == head xs   = remove_duplicates xs
  | otherwise      = x : remove_duplicates xs

You could try doing this. I've merely replaced 'elem' with my own implementation. It works for me.

1 Comment

This only works for consecutive duplicates (they have to be next to each other).
