How to find the statistical mode?

Question

In R, mean() and median() are standard functions which do what you'd expect. mode() tells you the internal storage mode of the object, not the value that occurs the most in its argument. But is there a standard library function that implements the statistical mode for a vector (or list)?

You need to clarify whether your data is integer, numeric, factor...? Mode estimation for numerics will be different, and uses intervals. See modeest.
– smci, Commented May 10, 2012 at 23:56
Why does R not have a built-in function for mode? Why does R consider mode to be the same as the function class?
– Corey Levinson, Commented Nov 13, 2018 at 17:58

Ken Williams · Accepted Answer · 2018-09-14 15:19:37Z

536

Answer recommended by R Language Collective

One more solution, which works for both numeric & character/factor data:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

On my dinky little machine, that can generate & find the mode of a 10M-integer vector in about half a second.

If your data set might have multiple modes, the above solution takes the same approach as which.max, and returns the first-appearing value of the set of modes. To return all modes, use this variant (from @digEmAll in the comments):

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
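As a quick check of the two functions above, here is a minimal sketch that repeats their definitions so it runs on its own. The sample vectors (nums, words, flags) are made up for illustration. It shows that the input type is preserved and how ties are resolved.

```r
# Definitions repeated from the answer so the snippet is self-contained
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

nums  <- c(3, 1, 1, 2, 2)        # illustrative data: 1 and 2 tie
words <- c("b", "a", "b", "c")
flags <- c(TRUE, FALSE, TRUE)

Mode(nums)    # first-appearing mode: 1
Modes(nums)   # all tied modes, in order of first appearance: 1 2
Mode(words)   # stays character: "b"
Mode(flags)   # stays logical: TRUE
```

Because both functions index into unique(x), the result always has the same type as the input.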

edited Sep 14, 2018 at 15:19

answered Nov 18, 2011 at 21:33

Ken Williams


10 Comments

DavidC Over a year ago

Also works for logicals! Preserves data type for all types of vectors (unlike some implementations in other answers).

digEmAll Over a year ago

This does not return all the modes in case of multi-modal dataset (e.g. c(1,1,2,2)). You should change your last line with : tab <- tabulate(match(x, ux)); ux[tab == max(tab)]

Ken Williams Over a year ago

@verybadatthis For that, you would replace ux[which.max(tabulate(match(x, ux)))] with just max(tabulate(match(x, ux))).

Enrique Pérez Herrero Over a year ago

You note that Mode(1:3) gives 1 and Mode(3:1) gives 3, so Mode returns the most frequent element or the first one if all of them are unique.

not2qubit Over a year ago

As Enrique said: This fails when there is no mode, and instead gives you the impression that the first value is the mode. It would have been far better if it returned 0 or NA in those cases.


Dan · Accepted Answer · 2010-03-30 18:19:29Z

83

Found this on the R mailing list; hope it's helpful. It is also what I was thinking anyway. You'll want to table() the data, sort, and then pick the first name. It's hackish but should work.

names(sort(-table(x)))[1]
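One thing to keep in mind with this one-liner is that the result comes from names(), so it is a character string even for numeric input. A small sketch, with x as made-up illustrative data:

```r
x <- c(10, 20, 20, 30, 20, 10)   # illustrative data

m <- names(sort(-table(x)))[1]
m               # "20": a character string, because it comes from names()
as.numeric(m)   # 20: convert back when the input was numeric
```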

answered Mar 30, 2010 at 18:19

Dan


4 Comments

mjv Over a year ago

That's a clever workaround as well. It has a few drawbacks: the sort algorithm can be more space- and time-consuming than max()-based approaches (and so is best avoided for bigger sample lists). Also, the output is of mode (pardon the pun/ambiguity) "character", not "numeric". And, of course, testing for a multi-modal distribution would typically require storing the sorted table to avoid crunching it anew.

vonjd Over a year ago

I measured running time with a factor of 1e6 elements and this solution was faster than the accepted answer by almost factor 3!

Abhishek Singh Over a year ago

I just converted it into number using as.numeric(). Works perfectly fine. Thank you!

vonjd Over a year ago

The problem with this solution is that it is not correct in cases where there is more than one mode.

user1967010 · Accepted Answer · 2022-11-14 20:56:48Z

77

There is a package, modeest, which provides estimators of the mode of univariate unimodal (and sometimes multimodal) data and values of the modes of usual probability distributions.

mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)

library(modeest)
mlv(mySamples, method = "mfv")

Mode (most likely value): 19 
Bickel's modal skewness: -0.1 
Call: mlv.default(x = mySamples, method = "mfv")

For more information see this page

You may also look for "mode estimation" in CRAN Task View: Probability Distributions. Two new packages have been proposed.

edited Nov 14, 2022 at 20:56

user1967010


answered Mar 30, 2010 at 19:05

Yorgos


5 Comments

atomicules Over a year ago

So to just get the mode value, mfv(mySamples)[1]. The 1 being important as it actually returns the most frequent values.

Agus camacho Over a year ago

it does not seem to work in this example: library(modeest) a <- rnorm( 50, 30, 2 ) b <- rnorm( 100, 35, 2 ) c <- rnorm( 20, 37, 2 ) temperatureºC <- c( a, b, c ) hist(temperatureºC) #mean abline(v=mean(temperatureºC),col="red",lwd=2) #median abline(v=median(temperatureºC),col="black",lwd=2) #mode abline(v=mlv(temperatureºC, method = "mfv")[1],col="orange",lwd=2)

petzi Over a year ago

@atomicules: with [1] you get only the first mode. For bimodal or general n-modal distribution you would need just mfv(mySamples)

Dr Nisha Arora Over a year ago

For R version 3.6.0, it says 'could not find function "mlv"', and I get the same error when I try mfv(mysamples). Is it deprecated?

petzi Over a year ago

@DrNishaArora: Did you download the 'modeest' package?

Gregor Thomas · Accepted Answer · 2015-03-08 20:24:38Z

66

I found Ken Williams's post above to be great; I added a few lines to account for NA values and made it a function for ease.

Mode <- function(x, na.rm = FALSE) {
  if(na.rm){
    x = x[!is.na(x)]
  }

  ux <- unique(x)
  return(ux[which.max(tabulate(match(x, ux)))])
}
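To illustrate what the na.rm switch changes, here is a short sketch that repeats the function and applies it to a made-up vector v in which NA happens to be the most frequent entry:

```r
Mode <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

v <- c(NA, NA, NA, 4, 4, 7)   # illustrative data: NA occurs most often

Mode(v)                 # NA, because NA is counted like any other value
Mode(v, na.rm = TRUE)   # 4, because the NAs are dropped first
```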

edited Mar 8, 2015 at 20:24

Gregor Thomas


answered Sep 3, 2014 at 3:21

jprockbelly


1 Comment

Dan Houghton Over a year ago

I've found a couple of speed ups to this, see answer below.

Rasmus Bååth · Accepted Answer · 2012-12-14 08:00:22Z

45

A quick and dirty way of estimating the mode of a vector of numbers you believe come from a continuous univariate distribution (e.g. a normal distribution) is defining and using the following function:

estimate_mode <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}

Then to get the mode estimate:

x <- c(5.8, 5.6, 6.2, 4.1, 4.9, 2.4, 3.9, 1.8, 5.7, 3.2)
estimate_mode(x)
## 5.439788
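For a skewed distribution the density peak and the mean differ noticeably, which makes the estimate easier to sanity-check. The sketch below assumes simulated gamma data (the seed, sample size and parameters are arbitrary choices, not part of the answer); for shape 3 and rate 1 the true mode is 2 and the mean is 3.

```r
estimate_mode <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}

set.seed(42)                                   # arbitrary seed, for reproducibility only
skewed <- rgamma(5000, shape = 3, rate = 1)    # true mode is (shape - 1) / rate = 2

estimate_mode(skewed)   # should land near 2
mean(skewed)            # sits near 3 (= shape / rate)
```

The exact value depends on the bandwidth that density() selects, so treat the result as an approximation.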

answered Dec 14, 2012 at 8:00

Rasmus Bååth


5 Comments

Jota Over a year ago

Just a note on this one: you can get a "mode" of any group of continuous numbers this way. The data don't need to come from a normal distribution to work. Here is an example taking numbers from a uniform distribution. set.seed(1); a<-runif(100); mode<-density(a)$x[which.max(density(a)$y)]; abline(v=mode)

Sergio Over a year ago

error in density.default(x, from = from, to = to) : need at least 2 points to select a bandwidth automatically

Rasmus Bååth Over a year ago

@xhie That error message tells you everything you need to know. If you just have one point you need to set the bandwidth manually when calling density. However, if you just have one datapoint then the value of that datapoint will probably be your best guess for the mode anyway...

Sergio Over a year ago

You are right, but I added just one tweak: estimate_mode <- function(x) { if (length(x)>1){ d <- density(x); d$x[which.max(d$y)] }else{ x } } I'm testing the method to estimate the predominant wind direction, instead of the mean direction using a vectorial average with the circular package. I'm working with points over a polygon grade, so sometimes there is only one point with a direction. Thanks!

Rasmus Bååth Over a year ago

@xhie Sounds reasonable :)

Sebastian · Accepted Answer · 2024-10-02 08:44:45Z

The generic function fmode() in the collapse package implements an optimized C-level hashing algorithm to compute the (weighted) mode. It is significantly faster than the above approaches, and also supports grouping and multithreading. It comes with methods for vectors, matrices, and (grouped) data.frame-like objects. Syntax:

library(collapse)
fmode(x, g = NULL, w = NULL, ..., ties = "first")

where x can be one of the above objects, g supplies an optional grouping vector or list of grouping vectors (for grouped mode calculations), and w (optionally) supplies a numeric weight vector. ties can be "first", "last", "min" or "max", e.g.

x <- c(1, 3, 2, 2, 4, 4, 1, 7, NA, NA, NA)
fmode(x)                # Default is ties = "first"
#> [1] 2
fmode(x, ties = "last")
#> [1] 1
fmode(x, ties = "min")
#> [1] 1
fmode(x, ties = "max")
#> [1] 4
fmode(x, na.rm = FALSE) # Here NA is the mode
#> [1] NA

As an example of the grouped data frame method, this computes a population weighted first mode of a heterogeneous development indicators dataset by income group (using 4 threads).

wlddev |> fgroup_by(income) |> fmode(POP, nthreads = 4)
#>                income      sum.POP       country iso3c       date year decade
#> 1         High income  58840837058 United States   USA 2020-01-01 2019   2010
#> 2          Low income  20949161394      Ethiopia   ETH 2020-01-01 2019   2010
#> 3 Lower middle income 113837684528         India   IND 2020-01-01 2019   2010
#> 4 Upper middle income 119606023798         China   CHN 2020-01-01 2019   2010
#>                  region  OECD      PCGDP   LIFEEX GINI        ODA
#> 1 Europe & Central Asia  TRUE 55753.1444 78.53902 40.0  -76339996
#> 2    Sub-Saharan Africa FALSE   602.6341 66.59700 35.0 4893290039
#> 3            South Asia FALSE  2151.7260 69.65600 35.7 2608629883
#> 4   East Asia & Pacific FALSE  8242.0546 76.91200 39.7 -559890015

A benchmark would be insightful. I'm seeing 2x to 5x speedups. +1

Chris · Accepted Answer · 2015-03-11 14:31:31Z

14

The following function comes in three forms:

method = "mode" [default]: calculates the mode for a unimodal vector, else returns an NA
method = "nmodes": calculates the number of modes in the vector
method = "modes": lists all the modes for a unimodal or polymodal vector

modeav <- function (x, method = "mode", na.rm = FALSE)
{
  x <- unlist(x)
  if (na.rm)
    x <- x[!is.na(x)]
  u <- unique(x)
  n <- length(u)
  #get frequencies of each of the unique values in the vector
  frequencies <- rep(0, n)
  for (i in seq_len(n)) {
    if (is.na(u[i])) {
      frequencies[i] <- sum(is.na(x))
    }
    else {
      frequencies[i] <- sum(x == u[i], na.rm = TRUE)
    }
  }
  #mode if a unimodal vector, else NA
  if (method == "mode" | is.na(method) | method == "")
  {return(ifelse(length(frequencies[frequencies==max(frequencies)])>1,NA,u[which.max(frequencies)]))}
  #number of modes
  if(method == "nmode" | method == "nmodes")
  {return(length(frequencies[frequencies==max(frequencies)]))}
  #list of all modes
  if (method == "modes" | method == "modevalues")
  {return(u[which(frequencies==max(frequencies), arr.ind = FALSE, useNames = FALSE)])}  
  #error trap the method
  warning("Warning: method not recognised.  Valid methods are 'mode' [default], 'nmodes' and 'modes'")
  return()
}

edited Mar 11, 2015 at 14:31

answered Mar 25, 2013 at 17:21

Chris


5 Comments

Grzegorz Adam Kowalski Over a year ago

In your description of this function you swapped "modes" and "nmodes". See the code. Actually, "nmodes" returns a vector of values and "modes" returns the number of modes. Nevertheless, your function is the very best solution for finding modes I've seen so far.

Chris Over a year ago

Many thanks for the comment. "nmode" and "modes" should now behave as expected.

hugovdberg Over a year ago

Your function works almost, except when each value occurs equally often using method = 'modes'. Then the function returns all unique values, however actually there is no mode so it should return NA instead. I'll add another answer containing a slightly optimised version of your function, thanks for the inspiration!

Chris Over a year ago

The only time a non-empty numeric vector should normally generate an NA with this function is when using the default method on a polymodal vector. The mode of a simple sequence of numbers such as 1,2,3,4 is actually all of those numbers in the sequence, so for similar sequences "modes" is behaving as expected, e.g. modeav(c(1,2,3,4), method = "modes") returns [1] 1 2 3 4. Regardless of this, I'd be very interested to see the function optimised, as it's fairly resource-intensive in its current state.

Chris Over a year ago

For a more efficient version of this function, see @hugovdberg's post above :)

teucer · Accepted Answer · 2010-03-31 06:45:06Z

12

Here, another solution:

freq <- tapply(mySamples,mySamples,length)
#or freq <- table(mySamples)
as.numeric(names(freq)[which.max(freq)])

edited Mar 31, 2010 at 6:45

answered Mar 30, 2010 at 20:21

teucer


2 Comments

Jonathan Chang Over a year ago

You can replace the first line with table.

teucer Over a year ago

I was thinking that 'tapply' is more efficient than 'table', but they both use a for loop. I think the solution with table is equivalent. I update the answer.

AleRuete · Accepted Answer · 2013-09-12 11:50:04Z

11

I can't vote yet, but Rasmus Bååth's answer is what I was looking for. However, I would modify it a bit to allow constraining the distribution, for example to values only between 0 and 1.

estimate_mode <- function(x,from=min(x), to=max(x)) {
  d <- density(x, from=from, to=to)
  d$x[which.max(d$y)]
}

Be aware that you may not want to constrain your distribution at all; in that case, set from = -"BIG NUMBER" and to = "BIG NUMBER".
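As a usage sketch for the constrained version, consider proportions that can only lie between 0 and 1. The simulated beta data, seed and sample size below are assumptions for illustration; for shape1 = 2 and shape2 = 5 the true mode is 0.2.

```r
estimate_mode <- function(x, from = min(x), to = max(x)) {
  d <- density(x, from = from, to = to)
  d$x[which.max(d$y)]
}

set.seed(1)                                  # arbitrary seed
p <- rbeta(2000, shape1 = 2, shape2 = 5)     # proportions in (0, 1); true mode is 0.2

estimate_mode(p, from = 0, to = 1)   # look for the peak only on [0, 1]
```

Note that from and to only limit the grid on which the density is evaluated. They do not apply any boundary correction to the kernel estimate itself.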

answered Sep 12, 2013 at 11:50

AleRuete


2 Comments

Sergio Over a year ago

error in density.default(x, from = from, to = to) : need at least 2 points to select a bandwidth automatically

AleRuete Over a year ago

x should be a vector

hugovdberg · Accepted Answer · 2016-06-29 11:32:02Z

11

Based on @Chris's function to calculate the mode or related metrics, however using Ken Williams's method to calculate frequencies. This one provides a fix for the case of no modes at all (all elements equally frequent), and some more readable method names.

Mode <- function(x, method = "one", na.rm = FALSE) {
  x <- unlist(x)
  if (na.rm) {
    x <- x[!is.na(x)]
  }

  # Get unique values
  ux <- unique(x)
  n <- length(ux)

  # Get frequencies of all unique values
  frequencies <- tabulate(match(x, ux))
  modes <- frequencies == max(frequencies)

  # Determine number of modes
  nmodes <- sum(modes)
  nmodes <- ifelse(nmodes==n, 0L, nmodes)

  if (method %in% c("one", "mode", "") | is.na(method)) {
    # Return NA if not exactly one mode, else return the mode
    if (nmodes != 1) {
      return(NA)
    } else {
      return(ux[which(modes)])
    }
  } else if (method %in% c("n", "nmodes")) {
    # Return the number of modes
    return(nmodes)
  } else if (method %in% c("all", "modes")) {
    # Return NA if no modes exist, else return all modes
    if (nmodes > 0) {
      return(ux[which(modes)])
    } else {
      return(NA)
    }
  }
  warning("Warning: method not recognised.  Valid methods are 'one'/'mode' [default], 'n'/'nmodes' and 'all'/'modes'")
}

Since it uses Ken's method to calculate frequencies, the performance is also optimised. Using AkselA's post, I benchmarked some of the previous answers to show how my function is close to Ken's in performance, with the conditionals for the various output options causing only minor overhead.

edited Jun 29, 2016 at 11:32

answered Jun 29, 2016 at 11:05

hugovdberg


8 Comments

AkselA Over a year ago

The code you present appears to be a more or less straight copy of the Mode function found in the pracma package. Care to explain?

hugovdberg Over a year ago

Really? Apparently I'm not the only one to think this is a good way to calculate the Mode, but I honestly didn't know that (never knew that package before just now). I cleaned up Chris's function and improved on it by leveraging Ken's version, and if it resembles someone else's code that is purely coincidental.

hugovdberg Over a year ago

I looked into it just now, but which version of the pracma package do you refer to? Version 1.9.3 has a completely different implementation as far as I can see.

Chris Over a year ago

Nice amendment to the function. After some further reading, I'm led to the conclusion that there is no consensus on whether uniform or monofrequency distributions have modes, some sources saying that the list of modes is the distribution itself, others that there is no mode. The only agreement is that producing a list of modes for such distributions is neither very informative nor particularly meaningful. If you wish the above function to produce modes in such cases, then remove the line: nmodes <- ifelse(nmodes==n, 0L, nmodes)

hugovdberg Over a year ago

@greendiod sorry, I missed your comment. It is available through this gist: gist.github.com/Hugovdberg/0f00444d46efd99ed27bbe227bdc4d37


C8H10N4O2 · Accepted Answer · 2017-07-20 17:23:26Z

A small modification to Ken Williams' answer, adding optional params na.rm and return_multiple.

Unlike the answers relying on names(), this answer maintains the data type of x in the returned value(s).

stat_mode <- function(x, return_multiple = TRUE, na.rm = FALSE) {
  if(na.rm){
    x <- na.omit(x)
  }
  ux <- unique(x)
  freq <- tabulate(match(x, ux))
  mode_loc <- if(return_multiple) which(freq==max(freq)) else which.max(freq)
  return(ux[mode_loc])
}

To show it works with the optional params and maintains data type:

foo <- c(2L, 2L, 3L, 4L, 4L, 5L, NA, NA)
bar <- c('mouse','mouse','dog','cat','cat','bird',NA,NA)

str(stat_mode(foo)) # int [1:3] 2 4 NA
str(stat_mode(bar)) # chr [1:3] "mouse" "cat" NA
str(stat_mode(bar, na.rm=T)) # chr [1:2] "mouse" "cat"
str(stat_mode(bar, return_mult=F, na.rm=T)) # chr "mouse"

Thanks to @Frank for simplification.

d-cubed · Accepted Answer · 2013-06-23 21:22:15Z

I've written the following code in order to generate the mode.

MODE <- function(dataframe){
    DF <- as.data.frame(dataframe)

    MODE2 <- function(x){      
        if (is.numeric(x) == FALSE){
            df <- as.data.frame(table(x))  
            df <- df[order(df$Freq), ]         
            m <- max(df$Freq)        
            MODE1 <- as.vector(as.character(subset(df, Freq == m)[, 1]))

            if (sum(df$Freq)/length(df$Freq)==1){
                warning("No Mode: Frequency of all values is 1", call. = FALSE)
            }else{
                return(MODE1)
            }

        }else{ 
            df <- as.data.frame(table(x))  
            df <- df[order(df$Freq), ]         
            m <- max(df$Freq)        
            MODE1 <- as.vector(as.numeric(as.character(subset(df, Freq == m)[, 1])))

            if (sum(df$Freq)/length(df$Freq)==1){
                warning("No Mode: Frequency of all values is 1", call. = FALSE)
            }else{
                return(MODE1)
            }
        }
    }

    return(as.vector(lapply(DF, MODE2)))
}

Let's try it:

MODE(mtcars)
MODE(CO2)
MODE(ToothGrowth)
MODE(InsectSprays)

Nsquare · Accepted Answer · 2016-09-13 07:01:34Z

6

This hack should work fine. Gives you the value as well as the count of mode:

Mode <- function(x){
a = table(x) # x is a vector
return(a[which.max(a)])
}

answered Sep 13, 2016 at 7:01

Nsquare


Dan Houghton · Accepted Answer · 2018-11-13 23:33:58Z

5

This builds on jprockbelly's answer, by adding a speed up for very short vectors. This is useful when applying mode to a data.frame or datatable with lots of small groups:

Mode <- function(x) {
   if ( length(x) <= 2 ) return(x[1])
   if ( anyNA(x) ) x = x[!is.na(x)]
   ux <- unique(x)
   ux[which.max(tabulate(match(x, ux)))]
}

edited Nov 13, 2018 at 23:33

answered Nov 13, 2018 at 22:56

Dan Houghton


statistic1979 · Accepted Answer · 2014-02-07 04:31:35Z

4

This works pretty well:

> a<-c(1,1,2,2,3,3,4,4,5)
> names(table(a))[table(a)==max(table(a))]

edited Feb 7, 2014 at 4:31

answered Feb 7, 2014 at 4:16

statistic1979


mjv · Accepted Answer · 2010-03-30 19:04:30Z

R has so many add-on packages that some of them may well provide the [statistical] mode of a numeric list/series/vector.

However, the standard library of R itself doesn't seem to have such a built-in method! One way to work around this is to use some construct like the following (and to turn it into a function if you use it often...):

mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
tabSmpl<-tabulate(mySamples)
SmplMode<-which(tabSmpl== max(tabSmpl))
if(sum(tabSmpl == max(tabSmpl))>1) SmplMode<-NA
> SmplMode
[1] 19

For bigger sample lists, one should consider using a temporary variable for the max(tabSmpl) value (I don't know whether R would automatically optimize this).
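A caveat worth illustrating: calling tabulate() directly on the data treats each value as a bin index, so it only gives sensible counts for positive integers. The sketch below uses a made-up vector (other) to show how this can go wrong, followed by the unique()/match() variant used elsewhere in this thread, which does not depend on the values themselves.

```r
ints <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
which.max(tabulate(ints))          # 19: the bin index equals the value

other <- c(-1, -1, 0, 2.7, 2.2)    # illustrative data: the real mode is -1
tabulate(other)                    # 0 2: -1 and 0 are dropped, 2.7 and 2.2 both fall in bin 2

# Counting positions in unique(other) instead of the values themselves
ux <- unique(other)
robust <- ux[which.max(tabulate(match(other, ux)))]
robust                             # -1
```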

Reference: see "How about median and mode?" in this KickStarting R lesson
This seems to confirm that (at least as of the writing of this lesson) there isn't a mode function in R (well... mode() as you found out is used for asserting the type of variables).

Ernest S Kirubakaran · Accepted Answer · 2015-09-06 09:09:18Z

3

Here is a function to find the mode:

mode <- function(x) {
  unique_val <- unique(x)
  counts <- vector()
  for (i in seq_along(unique_val)) {  # seq_along() is safe when x is empty
    counts[i] <- length(which(x==unique_val[i]))
  }
  position <- c(which(counts==max(counts)))
  if (mean(counts)==max(counts)) 
    mode_x <- 'Mode does not exist'
  else 
    mode_x <- unique_val[position]
  return(mode_x)
}

answered Sep 6, 2015 at 9:09

Ernest S Kirubakaran


David Arenburg · Accepted Answer · 2017-02-21 17:43:14Z

3

Below is code which can be used to find the mode of a vector variable in R.

a <- table(x)  # x is the vector of interest

names(a[a==max(a)])

edited Feb 21, 2017 at 17:43

David Arenburg


answered Feb 21, 2017 at 10:58

Gaurav


Abhiroop Sarkar · Accepted Answer · 2018-04-25 00:46:15Z

3

There are multiple solutions provided for this one. I checked the first one and after that wrote my own. Posting it here if it helps anyone:

Mode <- function(x){
  y <- data.frame(table(x))
  y[y$Freq == max(y$Freq),1]
}

Let's test it with a few examples. I am taking the iris data set. Let's test with numeric data

> Mode(iris$Sepal.Length)
[1] 5

which you can verify is correct.

Now, the only non-numeric field in the iris dataset (Species) does not have a mode. Let's test with our own example

> test <- c("red","red","green","blue","red")
> Mode(test)
[1] red

EDIT

As mentioned in the comments, user might want to preserve the input type. In which case the mode function can be modified to:

Mode <- function(x){
  y <- data.frame(table(x))
  z <- y[y$Freq == max(y$Freq),1]
  as(as.character(z),class(x))
}

The last line of the function simply coerces the final mode value to the type of the original input.

edited Apr 25, 2018 at 0:46

answered Apr 24, 2018 at 12:43

Abhiroop Sarkar


1 Comment

Frank Over a year ago

This returns a factor, while the user probably wants to preserve the type of the input. Maybe add a middle step y[,1] <- sort(unique(x))

alicederyn · Accepted Answer · 2012-12-04 14:29:14Z

2

Another simple option that gives all values ordered by frequency is to use rle:

df = as.data.frame(unclass(rle(sort(mySamples))))
df = df[order(-df$lengths),]
head(df)
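To pull the mode itself out of this table, a small sketch along the same lines follows. It builds the data frame explicitly from the rle() result and reuses the mySamples vector from the earlier answers; the name runs is introduced here for illustration.

```r
mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)

runs <- rle(sort(mySamples))      # run lengths of the sorted values
df <- data.frame(lengths = runs$lengths, values = runs$values)
df <- df[order(-df$lengths), ]    # most frequent values first

df$values[1]                                # 19, the most frequent value
df$values[df$lengths == max(df$lengths)]    # all values tied for the top count
```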

answered Dec 4, 2012 at 14:29

alicederyn


Yollanda Beetroot · Accepted Answer · 2014-05-02 14:57:26Z

2

I would use the density() function to identify a smoothed maximum of a (possibly continuous) distribution :

function(x) { d <- density(x, adjust = 2); d$x[d$y == max(d$y)] }  # pass adjust by name: the second positional argument of density() is bw

where x is the data collection. Pay attention to the adjust parameter of the density function, which regulates the smoothing.

edited May 2, 2014 at 14:57

answered May 2, 2014 at 10:03

Yollanda Beetroot


RandallShanePhD · Accepted Answer · 2014-12-24 16:08:02Z

2

While I like Ken Williams's simple function, I would like to retrieve multiple modes if they exist. With that in mind, I use the following function, which returns a list of the modes if there are multiple, or the single mode otherwise.

rmode <- function(x) {
  x <- sort(x)  
  u <- unique(x)
  y <- lapply(u, function(y) length(x[x==y]))
  u[which( unlist(y) == max(unlist(y)) )]
}

answered Dec 24, 2014 at 16:08

RandallShanePhD


3 Comments

asachet Over a year ago

It would be more consistent for programmatic use if it always returned a list -- of length 1 if there is only one mode

RandallShanePhD Over a year ago

That's a valid point @antoine-sac. What I like about this solution is the vector that is returned leaves the answers easily addressable. Simply address the output of the function: r <- mode( c(2, 2, 3, 3)) with the modes available at r[1] and r[2]. Still, you do make a good point!!

asachet Over a year ago

Precisely, this is where your solution falls short. If mode returns a list with several values, then r[1] is not the first value ; it is instead a list of length 1 containing the first value and you have to do r[[1]] to get the first mode as a numeric and not a list. Now when there is a single mode, your r is not a list so r[1] works, which is why I thought it was inconsistent. But since r[[1]] also works when r is a simple vector, there is actually a consistency i hadn't realised in that you can always use [[ to access elements.

AkselA · Accepted Answer · 2016-05-27 03:08:23Z

I was looking through all these options and started to wonder about their relative features and performance, so I did some tests. In case anyone else is curious about the same, I'm sharing my results here.

Not wanting to bother about all the functions posted here, I chose to focus on a sample based on a few criteria: the function should work on both character, factor, logical and numeric vectors, it should deal with NAs and other problematic values appropriately, and output should be 'sensible', i.e. no numerics as character or other such silliness.

I also added a function of my own, which is based on the same rle idea as chrispy's, except adapted for more general use:

library(magrittr)

Aksel <- function(x, freq=FALSE) {
    z <- 2
    if (freq) z <- 1:2
    run <- x %>% as.vector %>% sort %>% rle %>% unclass %>% data.frame
    colnames(run) <- c("freq", "value")
    run[which(run$freq==max(run$freq)), z] %>% as.vector   
}

set.seed(2)

F <- sample(c("yes", "no", "maybe", NA), 10, replace=TRUE) %>% factor
Aksel(F)

# [1] maybe yes  

C <- sample(c("Steve", "Jane", "Jonas", "Petra"), 20, replace=TRUE)
Aksel(C, freq=TRUE)

# freq value
#    7 Steve

I ended up running five functions, on two sets of test data, through microbenchmark. The function names refer to their respective authors:

Chris' function was set to method="modes" and na.rm=TRUE by default to make it more comparable, but other than that the functions were used as presented here by their authors.

In terms of speed alone, Ken's version wins handily, but it is also the only one of these that will only report one mode, no matter how many there really are. As is often the case, there's a trade-off between speed and versatility. In method="mode", Chris' version will return a value iff there is one mode, else NA. I think that's a nice touch. I also think it's interesting how some of the functions are affected by an increased number of unique values, while others aren't nearly as much. I haven't studied the code in detail to figure out why that is, apart from eliminating logical/numeric as the cause.

I like that you included code for the benchmarking, but benchmarking on 20 values is pretty pointless. I'd suggest running on at least a few hundred thousand records.
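Following that suggestion, a rough base-R sketch of a comparison on a larger input might look like the following. The function names ken_mode and table_mode, the seed and the data size are all hypothetical choices for illustration. No timings are shown because they depend on the machine.

```r
# Ken Williams's approach: count positions in unique(x)
ken_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# table()-based approach for numeric input
table_mode <- function(x) {
  tab <- table(x)
  as.numeric(names(tab)[which.max(tab)])
}

set.seed(123)                                        # arbitrary seed
big <- sample(1:1000, size = 5e5, replace = TRUE)    # hypothetical test data

system.time(m1 <- ken_mode(big))
system.time(m2 <- table_mode(big))

# The two can disagree on ties (first-appearing vs. smallest value),
# so compare the counts of the reported modes rather than the values
sum(big == m1) == sum(big == m2)
```

For more careful measurements, a dedicated benchmarking package such as microbenchmark (used in the answer above) repeats each call many times.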

Jibin · Accepted Answer · 2018-09-20 05:47:11Z

2

The mode can't be useful in every situation, so the function should address this. Try the following function.

Mode <- function(v) {
  # checking unique numbers in the input
  uniqv <- unique(v)
  # frequency of the most frequent value in the input data
  m1 <- max(tabulate(match(v, uniqv)))
  n <- length(tabulate(match(v, uniqv)))
  # if all elements are same
  same_val_check <- all(diff(v) == 0)
  if(same_val_check == F){
    # frequency of the second most frequent value in the input data
    m2 <- sort(tabulate(match(v, uniqv)),partial=n-1)[n-1]
    if (m1 != m2) {
      # Returning the most repeated value
      mode <- uniqv[which.max(tabulate(match(v, uniqv)))]
    } else{
      mode <- "Two or more values have same frequency. So mode can't be calculated."
    }
  } else {
    # if all elements are same
    mode <- unique(v)
  }
  return(mode)
}

Output,

x1 <- c(1,2,3,3,3,4,5)
Mode(x1)
# [1] 3

x2 <- c(1,2,3,4,5)
Mode(x2)
# [1] "Two or more values have same frequency. So mode can't be calculated."

x3 <- c(1,1,2,3,3,4,5)
Mode(x3)
# [1] "Two or more values have same frequency. So mode can't be calculated."

edited Sep 20, 2018 at 5:47

answered Sep 5, 2018 at 10:09

Jibin


2 Comments

not2qubit Over a year ago

Sorry, I just don't see how this adds anything new to what has already been posted. In addition your output seem inconsistent with your function above.

Gregor Thomas Over a year ago

Returning strings with messages is not useful programmatically. Use stop() for an error with no result or use warning()/message() with an NA result if the inputs are not appropriate.
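A minimal sketch of that suggestion is shown below: return NA together with a warning when the mode is not unique, instead of a message string. The helper name strict_mode is hypothetical and this is only one possible design.

```r
# Hypothetical helper: the unique mode, or NA (with a warning) when there is a tie
strict_mode <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  top <- ux[tab == max(tab)]
  if (length(top) != 1) {
    warning("mode is not unique; returning NA")
    return(ux[NA_integer_])   # NA of the same type as x
  }
  top
}

strict_mode(c(1, 2, 3, 3, 3, 4, 5))                     # 3
suppressWarnings(strict_mode(c(1, 1, 2, 3, 3, 4, 5)))   # NA (a warning is raised when not suppressed)
```

Callers can then test the result with is.na() rather than comparing against a message string.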

Fredy Yoseph Marianto · Accepted Answer · 2020-07-29 20:26:28Z

2

If you are asking about a ready-made function in R, you can find one in the pracma package, which has a function called Mode.

answered Jul 29, 2020 at 20:26

Fredy Yoseph Marianto


Gaspar · Accepted Answer · 2024-04-11 11:21:51Z

2

One quick way would be to use DescTools::Mode.

answered Apr 11, 2024 at 11:21

Gaspar


Naimish Agarwal · Accepted Answer · 2015-12-16 02:45:39Z

1

Another possible solution:

Mode <- function(x) {
    if (is.numeric(x)) {
        x_table <- table(x)
        return(as.numeric(names(x_table)[which.max(x_table)]))
    }
}

Usage:

set.seed(100)
v <- sample(x = 1:100, size = 1000000, replace = TRUE)
system.time(Mode(v))

Output:

   user  system elapsed 
   0.32    0.00    0.31

answered Dec 16, 2015 at 2:45

Naimish Agarwal


GKi · Accepted Answer · 2019-03-27 08:31:39Z

In case your observations are classes from the real numbers and you expect the mode to be 2.5 when your observations are 2, 2, 3, and 3, then you could estimate the mode with mode = l1 + i * (f1 - f0) / (2 * f1 - f0 - f2), where l1 is the lower limit of the most frequent class, f1 the frequency of the most frequent class, f0 the frequency of the class before the most frequent class, f2 the frequency of the class after the most frequent class, and i the class interval, as given e.g. in 1, 2, 3:

#Small Example
x <- c(2,2,3,3) #Observations
i <- 1          #Class interval

z <- hist(x, breaks = seq(min(x)-1.5*i, max(x)+1.5*i, i), plot=F) #Calculate frequency of classes
mf <- which.max(z$counts)   #index of most frequent class
zc <- z$counts
z$breaks[mf] + i * (zc[mf] - zc[mf-1]) / (2*zc[mf] - zc[mf-1] - zc[mf+1])  #gives you the mode of 2.5


#Larger Example
set.seed(0)
i <- 5          #Class interval
x <- round(rnorm(100,mean=100,sd=10)/i)*i #Observations

z <- hist(x, breaks = seq(min(x)-1.5*i, max(x)+1.5*i, i), plot=F)
mf <- which.max(z$counts)
zc <- z$counts
z$breaks[mf] + i * (zc[mf] - zc[mf-1]) / (2*zc[mf] - zc[mf-1] - zc[mf+1])  #gives you the mode of 99.5
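The two examples above repeat the same steps, so they can be wrapped in a small function. This is only a sketch: the name grouped_mode is hypothetical, and it assumes, as in the examples, that the observations are multiples of the class interval i.

```r
# Hypothetical wrapper around mode = l1 + i * (f1 - f0) / (2 * f1 - f0 - f2)
grouped_mode <- function(x, i) {
  z  <- hist(x, breaks = seq(min(x) - 1.5 * i, max(x) + 1.5 * i, i), plot = FALSE)
  zc <- z$counts
  mf <- which.max(zc)     # index of the most frequent class
  f1 <- zc[mf]            # frequency of the most frequent class
  f0 <- zc[mf - 1]        # frequency of the class before it
  f2 <- zc[mf + 1]        # frequency of the class after it
  z$breaks[mf] + i * (f1 - f0) / (2 * f1 - f0 - f2)
}

grouped_mode(c(2, 2, 3, 3), i = 1)      # 2.5, as in the small example
grouped_mode(c(1, 2, 2, 2, 3), i = 1)   # 2, a symmetric case
```

The padding of 1.5 * i on both sides guarantees an empty class before and after the data, so the neighbouring frequencies f0 and f2 always exist.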

In case you want the most frequent level and you have more than one most frequent level you can get all of them e.g. with:

x <- c(2,2,3,5,5)
names(which(max(table(x))==table(x)))
#"2" "5"

hrbrmstr · Accepted Answer · 2014-10-12 12:33:46Z

0

Could try the following function:

1. Transform the numeric values into a factor.
2. Use summary() to get the frequency table.
3. Take as the mode the level whose frequency is the largest.
4. Transform the factor back to numeric.

Even when there is more than one mode, this function works well!

mode <- function(x){
  y <- as.factor(x)
  freq <- summary(y)
  mode <- names(freq)[freq[names(freq)] == max(freq)]
  as.numeric(mode)
}

edited Oct 12, 2014 at 12:33

hrbrmstr


answered Apr 5, 2014 at 7:36

Wei


David Arenburg · Accepted Answer · 2017-02-21 17:41:41Z

0

The mode is mostly calculated for factor variables, in which case we can use

names(which.max(table(HouseVotes84$V1)))  # label of the most frequent level

HouseVotes84 is dataset available in 'mlbench' package.

It gives the label with the highest count. It is easy to do with built-in functions alone, without writing a function.

edited Feb 21, 2017 at 17:41

David Arenburg


answered Sep 21, 2016 at 19:15

Ashutosh Agrahari
