39

Let's say I have a list of student cities and the size of it could be 100 or 1000, and I want to filter out all duplicates cities.

I want a generic solution that I can use to remove all duplicate strings from any slice.

I am new to Go Language, So I tried to do it by looping and checking if the element exists using another loop function.

Students' Cities List (Data):

studentsCities := []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}

Functions that I created, and it's doing the job:

func contains(s []string, e string) bool {
    for _, a := range s {
        if a == e {
            return true
        }
    }
    return false
}

func removeDuplicates(strList []string) []string {
    list := []string{}
    for _, item := range strList {
        fmt.Println(item)
        if contains(list, item) == false {
            list = append(list, item)
        }
    }
    return list
}

My solution test

func main() {
    studentsCities := []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}

    uniqueStudentsCities := removeDuplicates(studentsCities)
    
    fmt.Println(uniqueStudentsCities) // Expected output [Mumbai Delhi Ahmedabad Bangalore Kolkata Pune]
}

I believe that the above solution that I tried is not an optimum solution. Therefore, I need help from you guys to suggest the fastest way to remove duplicates from the slice?

I checked StackOverflow, this question is not being asked yet, so I didn't get any solution.

3

18 Answers 18

56

I found Burak's and Fazlan's solution helpful. Based on that, I implemented the simple functions that help to remove or filter duplicate data from slices of strings, integers, or any other types with generic approach.

Here are my three functions, first is generic, second one for strings and last one for integers of slices. You have to pass your data and return all the unique values as a result.

Generic solution: => Go v1.18

func removeDuplicate[T comparable](sliceList []T) []T {
    allKeys := make(map[T]bool)
    list := []T{}
    for _, item := range sliceList {
        if _, value := allKeys[item]; !value {
            allKeys[item] = true
            list = append(list, item)
        }
    }
    return list
}

To remove duplicate strings from slice:

func removeDuplicateStr(strSlice []string) []string {
    allKeys := make(map[string]bool)
    list := []string{}
    for _, item := range strSlice {
        if _, value := allKeys[item]; !value {
            allKeys[item] = true
            list = append(list, item)
        }
    }
    return list
}

To remove duplicate integers from slice:

func removeDuplicateInt(intSlice []int) []int {
    allKeys := make(map[int]bool)
    list := []int{}
    for _, item := range intSlice {
        if _, value := allKeys[item]; !value {
            allKeys[item] = true
            list = append(list, item)
        }
    }
    return list
}

You can update the slice type, and it will filter out all duplicates data for all types of slices.

Here is the GoPlayground link: https://go.dev/play/p/iyb97KcftMa

Sign up to request clarification or add additional context in comments.

5 Comments

Go 1.20 adds the comparable type. See my answer for an example. (based on Riyaz' code above - thank you!)
Slices and maps should be initialized in all cases where the capacity, or the upper bound thereof, can be computed. This is a textbook case where such values are known and should be employed.
How would you do that here, @RayfenWindspear? Wouldn't the list without duplicates possibly contain trailing zero-values?
I said initialize with the upper bounded capacity, not length. Generally the length is a product of just how the logic and the variables happen to go, but capacity will be a known value in most cases for what those variables are in any given case.
See go.dev/blog/slices-intro if you are unfamiliar with the internals and/or the difference between the two terms.
54

Sort and then compact, e.g. like this:

package main

import (
    "fmt"

    "slices"
)

func main() {
    list := []string{"c", "a", "b", "c", "b", "b"}
    slices.Sort(list)
    fmt.Println(slices.Compact(list)) // [a b c]
}

4 Comments

Great answer! The slices package was also added to the standard library in Go 1.21 pkg.go.dev/slices#Compact
This is now the better answer with the standard slices package. Updated playground link: go.dev/play/p/n6ozDU8RFgN
this only works if the duplicate items appear in a consecutive order, if they are scrambled across the slice it will not work.
@shubhang-b hence the sorting.
10

Adding this answer which worked for me, does require/include sorting, however.

func removeDuplicateStrings(s []string) []string {
    if len(s) < 1 {
        return s
    }

    sort.Strings(s)
    prev := 1
    for curr := 1; curr < len(s); curr++ {
        if s[curr-1] != s[curr] {
            s[prev] = s[curr]
            prev++
        }
    }

    return s[:prev]
}

For fun, I tried using generics! (Go 1.18+ only)

type SliceType interface {
    ~string | ~int | ~float64 // add more *comparable* types as needed
}

func removeDuplicates[T SliceType](s []T) []T {
    if len(s) < 1 {
        return s
    }

    // sort
    sort.SliceStable(s, func(i, j int) bool {
        return s[i] < s[j]
    })

    prev := 1
    for curr := 1; curr < len(s); curr++ {
        if s[curr-1] != s[curr] {
            s[prev] = s[curr]
            prev++
        }
    }

    return s[:prev]
}

Go Playground Link with tests: https://go.dev/play/p/bw1PP1osJJQ

Comments

7

You can do in-place replacement guided with a map:

processed := map[string]struct{}{}
w := 0
for _, s := range cities {
    if _, exists := processed[s]; !exists {
        // If this city has not been seen yet, add it to the list
        processed[s] = struct{}{}
        cities[w] = s
        w++
    }
}
cities = cities[:w]

3 Comments

Thanks man, Just I want to know that, is it the best solution that I can rely on to loop more than 1000+ cities slice?
It is a good way, until someone else comes up with a better way.
If you have a sorted list of cities you can do this without the map. In a sorted array, duplicates will be right next to each other. If you really want to not have a map you might sort the cities, if you do not mind sorting them. Possible optimizations depend on your problem.
7

reduce memory usage:

package main

import (
    "fmt"
    "reflect"
)

type void struct{}

func main() {

    digits := [6]string{"one", "two", "three", "four", "five", "five"}

    set := make(map[string]void, 6)

    for _, element := range digits {
        set[element] = void{}
    }

    fmt.Println(reflect.ValueOf(set).MapKeys())
}

p.s. playground

Comments

4

try: https://github.com/samber/lo#uniq

names := lo.Uniq[string]([]string{"Samuel", "John", "Samuel"})
// []string{"Samuel", "John"}

Comments

2

Go 1.20 adds the comparable generic type, which makes it possible to expand on Riyaz' answer like so:

func dedupeSlice[T comparable](sliceList []T) []T {
    dedupeMap := make(map[T]struct{})
    list := []T{}

    for _, slice := range sliceList {
        if _, exists := dedupeMap[slice]; !exists {
            dedupeMap[slice] = struct{}{}
            list = append(list, slice)
        }
    }

    return list
}

This allows you to feed in slices of any comparable type in the language, including strings, integers, floats, booleans, pointers, and more.

Comments

1

It can also be done with a set-like map:

ddpStrings := []string{}
m := map[string]struct{}{}

for _, s := range strings {
    if _, ok := m[scopeStr]; ok {
        continue
    }
    ddpStrings = append(ddpStrings, s)
    m[s] = struct{}{}
}

Comments

1

Simple to understand.

func RemoveDuplicate(array []string) []string {
    m := make(map[string]string)
    for _, x := range array {
        m[x] = x
    }
    var ClearedArr []string
    for x, _ := range m {
        ClearedArr = append(ClearedArr, x)
    }
    return ClearedArr
}

1 Comment

You don't need map[string]string. map[string]struct{} is sufficient
1

Updated answer based on Jasper's original answer now that slices is part of the standard library:

package main

import (
    "fmt"
    "slices"
)

func main() {
    list := []string{"a", "b", "c", "a", "a", "c", "d", "d"} //[a b c a a c d d]
    slices.Sort(list) //[a a a b c c d d]
    list = slices.Compact(list) //[a b c d]

    fmt.Printf("Compacted List: %v\n", list)
}

Playground: https://go.dev/play/p/n6ozDU8RFgN

Comments

0

If you want to don't waste memory allocating another array for copy the values, you can remove in place the value, as following:

package main

import "fmt"

var studentsCities = []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}

func contains(s []string, e string) bool {
    for _, a := range s {
        if a == e {
            return true
        }
    }
    return false
}

func main() {
    fmt.Printf("Cities before remove: %+v\n", studentsCities)
    for i := 0; i < len(studentsCities); i++ {
        if contains(studentsCities[i+1:], studentsCities[i]) {
            studentsCities = remove(studentsCities, i)
            i--
        }
    }
    fmt.Printf("Cities after remove: %+v\n", studentsCities)
}
func remove(slice []string, s int) []string {
    return append(slice[:s], slice[s+1:]...)
}

Result:

Cities before remove: [Mumbai Delhi Ahmedabad Mumbai Bangalore Delhi Kolkata Pune]
Cities after remove: [Ahmedabad Mumbai Bangalore Delhi Kolkata Pune]

1 Comment

depending on the data, allocating a new slice may be more efficient than continually copying the same data over and over again. If order doesn't matter (which the question doesn't specify), remove the items with individual swaps.
0
func UniqueNonEmptyElementsOf(s []string) []string {
    unique := make(map[string]bool, len(s))
    var us []string
    for _, elem := range s {
        if len(elem) != 0 {
            if !unique[elem] {
                us = append(us, elem)
                unique[elem] = true
            }
        }
    }

    return us
}

send the duplicated splice to the above function, this will return the splice with unique elements.

func main() {
    studentsCities := []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}

    uniqueStudentsCities := UniqueNonEmptyElementsOf(studentsCities)
    
    fmt.Println(uniqueStudentsCities)
}

Comments

0

Based on Riyaz's solution, you can use generics since Go 1.18

func removeDuplicate[T string | int](tSlice []T) []T {
    allKeys := make(map[T]bool)
    list := []T{}
    for _, item := range tSlice {
        if _, value := allKeys[item]; !value {
            allKeys[item] = true
            list = append(list, item)
        }
    }
    return list
}

Generics minimizes code duplication.

Go Playground link : https://go.dev/play/p/Y3fEtHJpP7Q

Comments

0

So far @snassr has given the best answer as it is the most optimized way in terms of memory (no extra memory) and runtime (nlogn). But one thing I want to emphasis here is if we want to delete any index/element of an array we should loop from end to start as it reduces complexity. If we loop from start to end then if we delete nth index then we will accidentally miss the nth element (which was n+1th before deleting nth element) as in the next iteration we will get the n+1th element.

Example Code

func Dedup(strs []string) {
    sort.Strings(strs)
    for i := len(strs) - 1; i > 0; i-- {
        if strs[i] == strs[i-1] {
            strs = append(strs[:i], strs[i+1:]...)
        }
    }
}

Comments

0

Here is a little bit simpler solution than the accepted answer

func RemoveDuplicates[T comparable](slice []T) []T {
    for i := 0; i < len(slice); i++ {
        for j := len(slice) - 1; j > i; j-- {
            if slice[i] == slice[j] {
                slice = append(slice[:j], slice[j+1:]...)
            }
        }
    }
    return slice
}

Comments

0

Modern, generic solution to remove duplicates without reordering (stable):

func RemoveDuplicates[T comparable](s []T) []T {
    alreadySeen := make(map[T]struct{}, len(s))
    return slices.DeleteFunc(s, func(val T) bool {
        _, duplicate := alreadySeen[val]
        alreadySeen[val] = struct{}{}
        return duplicate
    })
}

It works on all comparable types and uses slices.DeleteFunc for memory efficiency.

Comments

-1

Here's a mapless index based slice's duplicate "remover"/trimmer. It use a sort method.

The n value is always 1 value lower than the total of non duplicate elements that's because this methods compare the current (consecutive/single) elements with the next (consecutive/single) elements and there is no matches after the lasts so you have to pad it to include the last.

Note that this snippet doesn't empty the duplicate elements into a nil value. However since the n+1 integer start at the duplicated item's indexes, you can loop from said integer and nil the rest of the elements.

sort.Strings(strs)
for n, i := 0, 0; ; {
    if strs[n] != strs[i] {
        if i-n > 1 {
            strs[n+1] = strs[i]
        }
        n++
    }
    i++
    if i == len(strs) {
        if n != i {
            strs = strs[:n+1]
        }
        break
    }
}
fmt.Println(strs)

Comments

-4

To remove duplicate elements from array in go

        func main() {
        arr := []int{1,3,2,3,3,5}
        var i,j int
        var size = len(arr)
        for i=0;i<size;i++ {
                for j=i+1;j<size;j++ {
                        if arr[i]==arr[j] {
                                for k:=j;k<size-1;k++ {
                                        arr[k]=arr[k+1]
                                }
                                size--
                                j--
                        }
                }
        }
        for z:=0;z<size;z++ {
                fmt.Printf("%d\n",arr[z])
        }
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.