I have a list (IPCs) containing multiple data frames.

Here is a sample from my list:

$ http://www.sumobrain.com/patents/us/Measured-object-support-mechanism-for-unbalance-measuring-apparatus/4981043.html:List of 1
..$ :'data.frame':  3 obs. of  5 variables:
.. ..$ X1: chr [1:3] "2001826A" "2857764A" "3452604A"
.. ..$ X2: chr [1:3] "1935-05-21" "1958-10-28" "1969-07-01"
.. ..$ X3: chr [1:3] "Russell et al." "Frank" "Schaub"
.. ..$ X4: chr [1:3] "73/478" "73/477" "73/475"
.. ..$ X5: chr [1:3] "Machine for balancing heavy bodies" "Rotor balance testing machine" "BALANCE TESTING APPARATUS HEAD"
$ http://www.sumobrain.com/patents/us/Encoder-with-wide-index/4982189.html:List of 1
..$ :'data.frame':  8 obs. of  5 variables:
.. ..$ X1: chr [1:8] "3500449A" "4212000A" "4233592A" "4524347A" ...
.. ..$ X2: chr [1:8] "1970-03-10" "1980-07-08" "1980-11-11" "1985-06-18" ...
.. ..$ X3: chr [1:8] "Lenz" "Yamada" "Leichle" "Rogers" ...
.. ..$ X4: chr [1:8] "341/6" "341/16" "341/6" "341/3" ...
.. ..$ X5: chr [1:8] "ELECTRONIC ENCODER INDEX" "Position-to-digital encoder" "Method for detection of the angular position of a part driven in rotation and instrumentation using it" "Position encoder" ...
$ http://www.sumobrain.com/patents/us/Device-for-detecting-at-least-one-variable-relating-to-the-movement-of-a-movable-body/4982106.html:List of 1
..$ :'data.frame':  2 obs. of  5 variables:
.. ..$ X1: chr [1:2] "3956973A" "4797564A"
.. ..$ X2: chr [1:2] "1976-05-18" "1989-01-10"
.. ..$ X3: chr [1:2] "Pomplas" "Ramunas"
.. ..$ X4: chr [1:2] "92/5R" "307/119"
.. ..$ X5: chr [1:2] "Die casting machine with piston positioning control" "Robot overload detection mechanism"

I would like to select only the first and fifth columns (X1 and X5) from every data frame, so that I can later build a new dataset from just these two columns.

I have tried to grab X1 with this:

citations_IPC <- sapply(IPCs, function(x){
y<-x[,1]
return(y)
})

and X5 with:

citations_titles <- sapply(IPCs[[1]], function(z){
e<-z[,5]
return(e)
})

Then I combine citations_IPC and citations_titles into a single data frame with:

citation_list <-  data.frame(IPC = unlist(lapply(citations_IPC, paste)), title = unlist(lapply(citations_titles, paste)) ) 

Problem

If I run sapply() on a single element of the list (e.g. IPCs[[1]]), I get the result I want:

citations_IPC <- sapply(IPCs[[1]], function(x){
y<-x[,1]
return(y)
})

result:

> citations_IPC
      [,1]      
 [1,] "3415985A"
 [2,] "3916190A"
 [3,] "4088895A"
 [4,] "4633084A"
 [5,] "4670651A"
 [6,] "4860224A"

However, the same function does not work on the whole list (IPCs). The error I get is: "Error in x[, 1] : incorrect number of dimensions"

My guess is that the problem comes from a few elements of the list that contain no data frame at all (no observations and no variables). If so, I need a way to apply sapply() to the whole list while skipping the elements without a data frame.
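One way to check this guess is sketched below on made-up data. The object IPCs_check and its values are hypothetical stand-ins, not the real dataset. Note also that, according to the str() output, every element of IPCs is a "List of 1" wrapping the data frame, so x[, 1] is applied to a plain list rather than to a data frame, which may be enough on its own to trigger the error.

```r
# Made-up stand-in for IPCs: two elements wrap a data frame, one is empty.
# Names and values are hypothetical, not taken from the real data.
IPCs_check <- list(
  first  = list(data.frame(X1 = c("A1", "A2"), X5 = c("t1", "t2"),
                           stringsAsFactors = FALSE)),
  second = list(),
  third  = list(data.frame(X1 = "B1", X5 = "t3", stringsAsFactors = FALSE))
)

# TRUE where an element holds a data frame as its first entry
has_df <- vapply(
  IPCs_check,
  function(x) length(x) > 0 && is.data.frame(x[[1]]),
  logical(1)
)
has_df                        # which elements are usable
names(IPCs_check)[!has_df]    # which elements hold no data frame

# Two-index subsetting on the wrapping list fails; capture the message
msg <- tryCatch(IPCs_check$first[, 1], error = function(e) conditionMessage(e))

# Unwrapping the inner list first gives the column
x1_first <- IPCs_check$first[[1]][, 1]
```

If has_df is FALSE for some elements of the real list, those are the entries that need to be skipped or filled with NA before any column is extracted.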

Any suggestions would be much appreciated.

Many thanks

Output of str(IPCs):

> str(IPCs)
 List of 19
 $ http://www.sumobrain.com/patents/us/Method-and-apparatus-for-the-quantitative,-depth-differential-analysis-of-solid-samples-with-the-use-of-two-ion-beams/4982090.html       :List of 1
  ..$ :'data.frame':    6 obs. of  5 variables:
  .. ..$ X1: chr [1:6] "3415985A" "3916190A" "4088895A" "4633084A" ...
  .. ..$ X2: chr [1:6] "1968-12-10" "1975-10-28" "1978-05-09" "1986-12-30" ...
  .. ..$ X3: chr [1:6] "Castaing et al." "Valentine et al." "Martin" "Gruen et al." ...
  .. ..$ X4: chr [1:6] "250/309" "250/309" "250/309" "250/309" ...
  .. ..$ X5: chr [1:6] "Ionic microanalyzer wherein secondary ions are emitted from a sample surface upon bombardment by neutral atoms" "Depth profile analysis apparatus" "Memory device utilizing ion beam readout" "High efficiency direct detection of ions from resonance ionization of sputtered atoms" ...
 $ http://www.sumobrain.com/patents/us/Set-on-oscillator/4982165.html:List of 1
  ..$ :'data.frame':    2 obs. of  5 variables:
  .. ..$ X1: chr [1:2] "4437066A" "4558282A"
  .. ..$ X2: chr [1:2] "1984-03-13" "1985-12-10"
  .. ..$ X3: chr [1:2] "Gordon" "Lowenschuss"
  .. ..$ X4: chr [1:2] "328/14" "307/523"
  .. ..$ X5: chr [1:2] "Apparatus for synthesizing a signal by producing samples of such signal at a rate less than the Nyquist sampling rate" "Digital frequency synthesizer"
 $ http://www.sumobrain.com/patents/us/Voltage-measuring-apparatus/4982151.html:List of 1
  ..$ :'data.frame':    7 obs. of  5 variables:
  .. ..$ X1: chr [1:7] "3419802A" "3419803A" "4446425A" "4603293A" ...
  .. ..$ X2: chr [1:7] "1968-12-31" "1968-12-31" "1984-05-01" "1986-07-29" ...
  .. ..$ X3: chr [1:7] "Pelenc et al." "Pelenc et al." "Valdmanis et al." "Mourou et al." ...
  .. ..$ X4: chr [1:7] "324/96" "324/96" "" "" ...
  .. ..$ X5: chr [1:7] "Apparatus for current measurement by means of the faraday effect" "Apparatus for current measurement by means of the faraday effect" "Measurement of electrical signals with picosecond resolution" "Measurement of electrical signals with subpicosecond resolution" ...

2 Answers

3

Here is an example:

First, let's make a list of data frames from some iris columns:

data(iris)
lis = list(iris[1:3], iris[2:4])

Use lapply with a custom function to extract columns 1 and 2 from each data frame. If the columns are not named the same across data frames, rename them so that the next step (rbind) works:

b = lapply(lis, function(x){
  z = x[, c(1, 2)]               # keep the first two columns
  colnames(z) = c("z1", "z2")    # common names so rbind can stack them
  return(z)
})

Now b is a list of data frames containing only the columns you want.

rbind the data frames in b:

do.call(rbind, b)

done
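The same idea can be adapted to the nested structure shown in the question, where each element is a list wrapping one data frame. What follows is only a sketch on made-up data: IPCs_demo, pick_cols and the values are hypothetical, and the empty element is an assumption about what the real list might contain.

```r
# Hypothetical stand-in for the nested list in the question:
# each element is a list of length 1 holding a data frame; one is empty.
IPCs_demo <- list(
  a = list(data.frame(X1 = c("p1", "p2"), X2 = "d", X3 = "n", X4 = "c",
                      X5 = c("t1", "t2"), stringsAsFactors = FALSE)),
  b = list(),
  c = list(data.frame(X1 = "p3", X2 = "d", X3 = "n", X4 = "c",
                      X5 = "t3", stringsAsFactors = FALSE))
)

# Unwrap the inner list and keep X1 and X5; return NULL when the element
# has no data frame or lacks the two columns
pick_cols <- function(x) {
  if (length(x) == 0 || !is.data.frame(x[[1]])) return(NULL)
  df <- x[[1]]
  if (!all(c("X1", "X5") %in% names(df))) return(NULL)
  df[, c("X1", "X5"), drop = FALSE]
}

b_demo <- lapply(IPCs_demo, pick_cols)
b_demo <- Filter(Negate(is.null), b_demo)   # drop the unusable elements
citation_demo <- do.call(rbind, b_demo)     # one row per cited patent
citation_demo
```

Selecting by name rather than by position is a design choice here: it fails safely (returns NULL) if a data frame does not have the expected columns.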

1

Here is a way to do what I understand of your question.
First some fake data.

op <- options(stringsAsFactors = FALSE)  # to make sure we have characters not factors
set.seed(9506)

nr <- c(6, 2, 7)
IPCs <- lapply(1:3, function(n){
        res <- as.data.frame(replicate(5, sample(LETTERS, nr[n], TRUE)))
        names(res) <- paste0("X", 1:5)
        res
})
names(IPCs) <- paste0("df", seq_along(IPCs))
str(IPCs)
options(op)   # put it back as it was

Now the code to extract the 1st and 5th columns of each data.frame and paste them together in order to form a df.

result <- list(
    sapply(IPCs, `[[`, 1),
    sapply(IPCs, function(x) x[[ncol(x)]])
)
result <- as.data.frame(lapply(result, function(x) sapply(x, paste, collapse = "")))
names(result) <- c("citations_IPC", "citations_titles")
result
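If the real list has an extra level of nesting (each element is a list wrapping the data frame) and possibly some empty elements, a variant along these lines might work. It is a sketch under those assumptions: IPCs_nested, collapse_col and the "; " separator are hypothetical choices, not part of the code above.

```r
# Hypothetical nested input: each element wraps one data frame, one is empty
IPCs_nested <- list(
  df1 = list(data.frame(X1 = c("A", "B"), X5 = c("t1", "t2"),
                        stringsAsFactors = FALSE)),
  df2 = list(),
  df3 = list(data.frame(X1 = "C", X5 = "t3", stringsAsFactors = FALSE))
)

# Collapse one column of the wrapped data frame into a single string;
# return NA when the element holds no data frame
collapse_col <- function(x, col, sep = "; ") {
  if (length(x) == 0 || !is.data.frame(x[[1]])) return(NA_character_)
  paste(x[[1]][[col]], collapse = sep)
}

# vapply guarantees exactly one character value per list element
result_nested <- data.frame(
  citations_IPC    = vapply(IPCs_nested, collapse_col, character(1), col = "X1"),
  citations_titles = vapply(IPCs_nested, collapse_col, character(1), col = "X5"),
  stringsAsFactors = FALSE
)
result_nested
```

Using vapply instead of sapply means the result is always a character vector of the same length as the list, so empty elements show up as NA rows rather than raising an error.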

11 Comments

This looks like a really good solution, but unfortunately it does not work for my dataset. I get the error: "Error in FUN(X[[i]], ...) : subscript out of bounds". Might it be that within my dataset there are lists with no variables, and therefore I get this error?
@Amleto So in your dataset there are lists with no variables? Can you update the question with the output of str(IPCs)? The dataset I've made up has the same structure as your post.
I think the problem with my dataset is that I have a number of lists inside a list. I add the str(IPCs) above.
@Amleto No, that was not the problem. The problem was that in your first post of str your df's all had the same number of rows and now they don't. Give me just a few minutes to think about this
The df's variables are always 5 while the observations change, but that is not the problem. Sorry