Similar aggregate function to unique/distinct in R Postgres Backend

Question

How can I aggregate a variable in a table on a Postgres database backend to its unique value?

For example, I have the following table:

library(tidyverse)
library(dbplyr)

dbplyr::memdb_frame(a=c(2,2,2), b=c(2,3,4)) %>%
summarise(aggregatedSum = sum(b), 
          aggregatedUnique = unique(a))

But neither unique() nor distinct() does the job. Any ideas how to get the same result I get when I collect() the table before summarise()?

dbplyr::memdb_frame(a=c(2,2,2), b=c(2,3,4)) %>%
collect() %>%
summarise(aggregatedSum = sum(b), 
          aggregatedUnique = unique(a))

# A tibble: 1 x 2
  aggregatedSum aggregatedUnique
          <dbl>            <dbl>
1             9                2

Does column a in your database always contain only one distinct value? Or do you have several distinct values in column a and want a row for each one in your output? – henryn, commented Oct 14, 2021 at 14:55

henryn · Accepted Answer · 2021-10-14 15:16:14Z

1

You can just add a group_by() to your dplyr pipe:

dbplyr::memdb_frame(a=c(2,2,2), b=c(2,3,4)) %>%
  group_by(a) %>%
  summarise(aggregatedSum = sum(b)) %>%
  rename(aggregatedUnique = a) %>%
  select(aggregatedSum, aggregatedUnique)

# Source:   lazy query [?? x 2]
# Database: sqlite 3.34.1 [:memory:]
  aggregatedSum aggregatedUnique
          <dbl>            <dbl>
1             9                2

If there are multiple distinct values in column a, this will return one row per value (with the sum of the b values that occur alongside each one).
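To make the multi-value case concrete, here is a minimal sketch with hypothetical data in which column a holds two distinct values (1 and 2). It assumes dplyr, dbplyr and RSQLite are installed; memdb_frame() uses an in-memory SQLite database rather than Postgres. The arrange() call is only there because SQL does not guarantee row order without an ORDER BY.

```r
library(dplyr)
library(dbplyr)

# Hypothetical data: column a has two distinct values, 1 and 2
res <- memdb_frame(a = c(1, 2, 2), b = c(2, 3, 4)) %>%
  group_by(a) %>%
  summarise(aggregatedSum = sum(b, na.rm = TRUE)) %>%  # na.rm = TRUE silences dbplyr's NA warning
  arrange(a) %>%                                       # SQL row order is unspecified without ORDER BY
  collect()

print(res)  # one row per distinct value of a
```

With these inputs the a = 1 row sums to 2 and the a = 2 row sums to 3 + 4 = 7.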

answered Oct 14, 2021 at 15:16


werN Over a year ago

group_by() would be a possibility, but I always try to avoid it because it is more computationally intensive. Unfortunately, it seems there are no unique()/distinct()-like aggregate functions. I'll just use another aggregate function like max()/min()/mean() in such cases, since every value in a is equal anyway.
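A minimal sketch of that workaround, assuming (as the comment does) that every value of a in the table is identical. dbplyr translates max() to the SQL aggregate MAX(), so it can sit next to SUM() in a single query without a GROUP BY. The trade-off: if a ever holds more than one value, this silently returns the largest one instead of failing. It runs against dbplyr's in-memory SQLite backend (requires RSQLite).

```r
library(dplyr)
library(dbplyr)

tbl_db <- memdb_frame(a = c(2, 2, 2), b = c(2, 3, 4))

res <- tbl_db %>%
  summarise(
    aggregatedSum    = sum(b, na.rm = TRUE),  # SUM(b)
    aggregatedUnique = max(a, na.rm = TRUE)   # MAX(a): the "unique" value only if a is constant
  ) %>%
  collect()

print(res)
```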

r2evans · Accepted Answer · 2021-10-14 15:22:50Z

1

I might be misinterpreting, but this seems like a grouping operation, where you might want the sum of b for each unique value of a. If so, use group_by(a):

dbplyr::memdb_frame(a=c(2,2,2), b=c(2,3,4)) %>%
  group_by(a) %>%
  summarise(aggregatedSum = sum(b))
# # Source:   lazy query [?? x 2]
# # Database: sqlite 3.33.0 [:memory:]
#       a aggregatedSum
#   <dbl>         <dbl>
# 1     2             9

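This is related to the question "How to combine SELECT DISTINCT and SUM()": I believe SQL does not let you combine sum(.) and distinct(.) in the same query, because a column that is neither aggregated nor listed in GROUP BY cannot sit next to an aggregate. The original query translates to: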

dbplyr::memdb_frame(a=c(2,2,2), b=c(2,3,4)) %>%
  summarise(aggregatedSum = sum(b), 
            aggregatedUnique = distinct(a)) %>%
  show_query()
# <SQL>
# SELECT SUM(`b`) AS `aggregatedSum`, distinct(`a`) AS `aggregatedUnique`
# FROM `dbplyr_014`

whereas the grouped version translates to:

dbplyr::memdb_frame(a=c(2,2,2), b=c(2,3,4)) %>%
  group_by(a) %>%
  summarise(aggregatedSum = sum(b)) %>%
  show_query()
# <SQL>
# SELECT `a`, SUM(`b`) AS `aggregatedSum`
# FROM `dbplyr_016`
# GROUP BY `a`

which matches the approach in the linked question's answer.
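One nuance, offered as a sketch rather than a correction: SQL does accept DISTINCT inside an aggregate, as in COUNT(DISTINCT a), which dbplyr generates for n_distinct(). That allows keeping an ungrouped max() aggregate (the workaround mentioned in the comments) while also returning how many distinct values a had, so the caller can check the constant-value assumption. The column name nDistinctA is hypothetical, and the example uses the in-memory SQLite backend (requires RSQLite).

```r
library(dplyr)
library(dbplyr)

tbl_db <- memdb_frame(a = c(2, 2, 2), b = c(2, 3, 4))

q <- tbl_db %>%
  summarise(
    aggregatedSum    = sum(b, na.rm = TRUE),  # SUM(b)
    aggregatedUnique = max(a, na.rm = TRUE),  # MAX(a)
    nDistinctA       = n_distinct(a)          # COUNT(DISTINCT a)
  )

show_query(q)  # a single SELECT with no GROUP BY

res <- collect(q)
# Guard the assumption that a holds exactly one value
if (res$nDistinctA != 1) stop("column a is not constant; use group_by(a) instead")
print(res)
```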

edited Oct 14, 2021 at 15:22

answered Oct 14, 2021 at 15:17


werN Over a year ago

group_by() would be a possibility, but I always try to avoid it because it is more computationally intensive. Unfortunately, it seems there are no unique()/distinct()-like aggregate functions. I'll just use another aggregate function like max()/min()/mean() in such cases, since every value in a is equal anyway.

r2evans Over a year ago

The data perspective here is confusing to me. I suspect that it is overly simplified for the sake of a question on SO, and the variability of the larger real data is unknown to us. Good luck.
