3

I need to combine interconnected list elements to form distinct elements in base R with no additional packages required (while removing NA and zero-length elements).

Edit: I look for flexibility of data types (character, numeric etc), lists below augmented.

The lists:

list(c(1),c(1,2),c(1,2),c(2,3,4),c(8,9,10),c(10,11),c(NA))
list(c("a"),c("a","b"),c("a","b"),c("b","c","d"),c("h","i","j"),c("j","k"),c(NA))

...should become:

list(c(1,2,3,4),c(8,9,10,11))
list(c("a","b","c","d"),c("h","i","j","k"))

This is my solution so far which seem to work well:

ml <- list(c(1),c(1,2),c(1,2),c(2,3,4),c(8,9,10),c(10,11),c(NA))
# Remove duplicates
ml <- ml[!duplicated(ml)]
# Combine connected list elements
for (l1 in seq(1,length(ml))){
  for (l2 in seq(1,length(ml))){
    if (l1 != l2){
      if(any(ml[[l2]] %in% ml[[l1]])){
        ml[[l1]] <- sort(unique(c(ml[[l1]],ml[[l2]])))
        ml[[l2]] <- NA
      }
    }
  }
}
# Clean up list (remove NA, and zero-length elements)
ml <- Filter(Negate(anyNA), ml)
ml <- ml[lapply(ml, length)>0]

Is there any way to achieve the required result with fewer lines of code?

1
  • 2
    "I look for flexibility of data types (character, numeric etc)"--please provide those. Commented Feb 27 at 11:07

5 Answers 5

5

It might be easier to work with data.frames on that problem. Here's a solution using igraph and working on all data types:

library(igraph)

dat <- 
  setNames(l, paste0('node', 1:length(l))) |> stack() |> na.omit()

dat |>
  graph_from_data_frame(dir = FALSE) |>
  components() |>
  getElement('membership') |>
  stack() |>
  subset(!grepl("node", ind)) |>
  aggregate(ind ~ values, FUN = toString)

#  values          ind
#1      1   1, 2, 3, 4
#2      2 8, 9, 10, 11
Sign up to request clarification or add additional context in comments.

Comments

5

There are 6 basic vector data types in R: logical, numeric, integer, complex, character and raw (see typeof) and this should work with all except possibly raw. It creates a 2 column data frame edges each of whose rows represents an edge and from that an igraph object g. At the end we show the graph.

library(igraph)

L <- list(c(1),c(1,2),c(1,2),c(2,3,4),c(8,9,10),c(10,11),c(NA))

L2 <- Map(\(x) unique(na.omit(x)), L)
edges <- L2 |>
  Filter(f = \(x) length(x) > 1) |>
  Map(f = \(x) data.frame(from = head(x, -1), to = tail(x, -1))) |>
  do.call(what = "rbind") |>
  unique()

g <- graph_from_data_frame(edges, vertices = unique(unlist(L2)), directed = FALSE)

cl <- components(g)$membership
s <- split( as(names(cl), typeof(unlist(L))), cl)

str(s)
## List of 2
##  $ 1: num [1:4] 1 2 3 4
##  $ 2: num [1:4] 8 9 10 11

Image of graph follows. Note that it has an edge for each element to the next within each component of L with duplicates removed.

set.seed(123)
plot(g)

screenshot

Comments

4

You could try to unlist and sort (which removes NA), then split along cumulated differences.

> u <- sort(unique(unlist(lst)))
> split(u, cumsum(c(1, diff(u)) > 1)) |> unname()
[[1]]
[1] 1 2 3 4

[[2]]
[1]  8  9 10 11

[[3]]
[1] 20 21 22 23

Data:

Slightly expanded.

> dput(lst)
list(1, c(1, 2), c(1, 2), c(2, 3, 4), c(8, 9, 10), c(10, 11), 
    20:23, NA)

3 Comments

Thank you. However, this will only work with numerical values, I suppose. In the case of character values in the list the second line will not work. I should have mentioned in the original question, that I look for flexibility with regard to data type.
This is interesting and efficient solution, but won't work as expected if the interconnected groups are separated by other groups in the list. You can try a list like set.seed(0); lst <- sample(c(rep(NA, 3), replicate(7, list(sample(20, sample(5, 1))))))
@ThomasIsCoding It still separates into consecutive snippets, but probably I misunderstood "interconnected".
2

There are already a bunch of excellent solutions. Below is just another option, using graph_from_biadjacency_matrix and bipartite_projection

library(igraph)

lst %>%
    Filter(f = Negate(anyNA)) %>%
    setNames(paste0("x", seq_along(.))) %>%
    stack() %>%
    table() %>%
    graph_from_biadjacency_matrix() %>%
    bipartite_projection(which = "false") %>%
    decompose() %>%
    map(~ as.integer(names(V(.x))))

and you will obtain

[[1]]
[1] 1 2 3 4

[[2]]
[1]  8  9 10 11

data

lst <- list(c(1), c(1, 2), c(1, 2), c(2, 3, 4), c(8, 9, 10), c(10, 11), NA)

Let's take a look how bipartite graph makes things work. Given a graph g obtained from below

g <- lst %>%
    Filter(f = Negate(anyNA)) %>%
    setNames(paste0("x", seq_along(.))) %>%
    stack() %>%
    table() %>%
    graph_from_biadjacency_matrix()


g %>%
    plot(layout = layout_as_bipartite, vertex.color = factor(V(.)$type))

we see

enter image description here

Then, using bipartite_projection

g %>%
    bipartite_projection(which = "false") %>%
    plot()

we see enter image description here

where which = "false" indicates the vertices as shown in lst, otherwise which = "true" denotes the vertices of names starting with x.

Comments

2

If you are interested in working with base R only but with shorter code, you can try

  • Option 1: Use tcrossprod + %*%
l <- Filter(Negate(anyNA), lst)
M <- tcrossprod(table(stack(setNames(l, paste0("x", seq_along(l))))))
p <- M > 0
repeat {
    q <- (p %*% M) > 0
    if (identical(p, q)) break
    p <- q
}
k <- order(p[, 1])
out <- apply(unique(p[k, k]), 1, \(x) as.integer(names(which(x))))

and you will see

> out
$`2`
[1] 2

$`4`
[1] 4

$`1`
 [1]  1  5  7  9 10 11 12 14 17 18 20
  • Option 2: Use repeat + Reduce
out <- list()
l <- Filter(negate(anyNA), lst)
repeat {
    if (!length(l)) break
    v <- unique(unlist(l))
    sz <- 0
    repeat{
        p <- Reduce(\(x, y) c(x, if (any(y %in% x)) setdiff(y, x)), l)
        l <- lapply(l, \(x) if (all(x %in% p)) p else x)
        if (length(p) == sz) break
        sz <- length(p)
    }
    out <- c(out, list(p))
    l <- Filter(\(x) !any(x %in% p), l)
}

gives

> out
[[1]]
 [1] 11 14 18  1  5  9 20 17 10  7 12

[[2]]
[1] 4

[[3]]
[1] 2

data


Given a "tough" list for example

set.seed(0)
lst <- sample(c(rep(NA, 3), replicate(7, list(sample(20, sample(5, 1))))))

> lst
[[1]]
[1] NA

[[2]]
[1] 11 14 18  1  5

[[3]]
[1] NA

[[4]]
[1]  5  9 14 20 17

[[5]]
[1] 10 12

[[6]]
[1] 4

[[7]]
[1] 2

[[8]]
[1] 10 14 20  7  9

[[9]]
[1] 20

[[10]]
[1] NA

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.