
I have an array to be rendered in some charts, but suppose the dataset is far too large. How can I take an array that is, say, 20,000 items long and either drop every other item until it is down to 1,000 items, or interpolate it down to that size?

Example, say I have the following array (of hashes):

[ 
  {"timestamp"=>2011-09-05 14:30:00 UTC, "count"=>4488.0},
  {"timestamp"=>2011-09-05 14:45:00 UTC, "count"=>4622.0},
  {"timestamp"=>2011-09-05 15:00:00 UTC, "count"=>4655.0},
  {"timestamp"=>2011-09-05 15:15:00 UTC, "count"=>4533.0},
  {"timestamp"=>2011-09-05 15:30:00 UTC, "count"=>4439.0},
  {"timestamp"=>2011-09-05 15:45:00 UTC, "count"=>4468.0},
  {"timestamp"=>2011-09-05 16:00:00 UTC, "count"=>4419.0},
  {"timestamp"=>2011-09-05 16:15:00 UTC, "count"=>4430.0},
  {"timestamp"=>2011-09-05 16:30:00 UTC, "count"=>4429.0},
  {"timestamp"=>2011-09-05 16:45:00 UTC, "count"=>4502.0},
  {"timestamp"=>2011-09-05 17:00:00 UTC, "count"=>4497.0},
  {"timestamp"=>2011-09-05 17:15:00 UTC, "count"=>4468.0},
  {"timestamp"=>2011-09-05 17:30:00 UTC, "count"=>4510.0},
  {"timestamp"=>2011-09-05 17:45:00 UTC, "count"=>4547.0},
  {"timestamp"=>2011-09-05 18:00:00 UTC, "count"=>4471.0},
  {"timestamp"=>2011-09-05 18:15:00 UTC, "count"=>4501.0},
  {"timestamp"=>2011-09-05 18:30:00 UTC, "count"=>4451.0},
  {"timestamp"=>2011-09-05 18:45:00 UTC, "count"=>4453.0},
  {"timestamp"=>2011-09-05 19:00:00 UTC, "count"=>4593.0},
  {"timestamp"=>2011-09-05 19:15:00 UTC, "count"=>4540.0},
  {"timestamp"=>2011-09-05 19:30:00 UTC, "count"=>4516.0},
  {"timestamp"=>2011-09-05 19:45:00 UTC, "count"=>4494.0}
]

And I want an array of the intermediary values, either dropped from the original or somehow interpolated, like this:

[ 
  {"timestamp"=>2011-09-05 14:45:00 UTC, "count"=>4622.0},
  {"timestamp"=>2011-09-05 15:00:00 UTC, "count"=>4655.0},
  {"timestamp"=>2011-09-05 15:30:00 UTC, "count"=>4439.0},
  {"timestamp"=>2011-09-05 16:00:00 UTC, "count"=>4419.0},
  {"timestamp"=>2011-09-05 16:30:00 UTC, "count"=>4429.0},
  {"timestamp"=>2011-09-05 17:00:00 UTC, "count"=>4497.0},
  {"timestamp"=>2011-09-05 17:30:00 UTC, "count"=>4510.0},
  {"timestamp"=>2011-09-05 18:00:00 UTC, "count"=>4471.0},
  {"timestamp"=>2011-09-05 18:30:00 UTC, "count"=>4451.0},
  {"timestamp"=>2011-09-05 19:00:00 UTC, "count"=>4593.0},
  {"timestamp"=>2011-09-05 19:15:00 UTC, "count"=>4540.0},
  {"timestamp"=>2011-09-05 19:45:00 UTC, "count"=>4494.0}
]

Any thoughts or help on this would be greatly appreciated; I may just be missing the point here.

Comments

  • Off topic, but for every "timestamp" and for every "count" a new string will be created. A symbol (:timestamp) is a lot better, or create a Struct (see the sketch after these comments).
  • What exactly do you need to do? Grab a random sample of 1000 entries in "timestamp" order?
  • I do agree with you; however, the array is the result of MongoDB's map-reduce :( I'm not sure how to make it a pure Ruby object. Any ideas?
  • Mu, what I want to do is take a very large dataset and make it smaller so the browser can render it easily, and since it's a time range it needs to be linearly condensed... I'm dumb, I can't figure out how to do this properly :(
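
A minimal sketch of the Struct idea from the first comment, assuming the rows arrive from MongoDB's map-reduce as string-keyed hashes (the Sample name is made up here for illustration):

Sample = Struct.new(:timestamp, :count)

# Pretend these came back from the map-reduce step
rows = [
  {"timestamp" => Time.utc(2011, 9, 5, 14, 30), "count" => 4488.0},
  {"timestamp" => Time.utc(2011, 9, 5, 14, 45), "count" => 4622.0}
]

# One plain Ruby object per row instead of a hash with string keys
samples = rows.map { |row| Sample.new(row["timestamp"], row["count"]) }
samples.first.count # => 4488.0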

3 Answers

require 'pp'

# Interval in seconds (30 min)
INTERVAL = 1800

# generate the data
start = Time.utc(2011, 9, 5, 14, 30)

data = 1000.times.map do |i|
  {:timestamp => start + i*INTERVAL, :count => rand(4000)}
end

# Plain data
pp data

puts # blank

# Simply get the data from sample number 300 to 400
pp data[300..400]

puts # blank

# For example, data starting at the second hour, spanning 3 hours
pp data[2*60*60/INTERVAL..(2+3)*60*60/INTERVAL]

puts # blank

# Make it smaller (50%)
# We need data.size * 0.5 elements.
# Calculate the step we need when iterating so that
# 50% of the elements are kept; here, every second one.
step = (data.size / (data.size * 0.5)).to_i

# We use Range#step to get the indexes, and then
# transform them with Enumerable#collect into the array
# of hashes. The exclusive range keeps every index in
# bounds, so there are no nils to filter out.
pp (0...data.size).step(step).collect { |index| data[index] }

Also, you may want to look at Enumerable#each_slice(n):

(1..10).each_slice(3) { |a| p a }
# outputs:
# [1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
# [10]

You can reduce the set by making slices of n elements and then creating a new element from each slice: the element in the middle, an average, and so on. For example:

data.each_slice(3).collect { |slice| make_one_out_of_a_slice(slice) }
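
make_one_out_of_a_slice is a placeholder; as a sketch, here is one possible version that keeps the middle timestamp and averages the counts of the symbol-keyed hashes generated above:

def make_one_out_of_a_slice(slice)
  {
    # the middle element's timestamp stands in for the whole slice
    :timestamp => slice[slice.size / 2][:timestamp],
    # and the count is averaged across the slice
    :count => slice.sum { |h| h[:count] } / slice.size.to_f
  }
end

condensed = data.each_slice(3).collect { |slice| make_one_out_of_a_slice(slice) }
condensed.size # => 334 for the 1000 generated elements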

Use Array#sample:

a = [ 1, 2, 3, 4, 5, 6 ]
smaller = a.sample(3)
# [4, 2, 1]

In your case you'd do something like this:

a = [
    # 10 000 little hashes
]
smaller = a.sample(1000)

and then send smaller off to be displayed.

And if you want them in order you could just sort them again:

smaller.sort! { |a,b| a['timestamp'] <=> b['timestamp'] }
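
Putting the two steps together (sort_by on the timestamp reads a little cleaner than a comparison block):

smaller = a.sample(1000).sort_by { |h| h['timestamp'] }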

Comments

  • I don't want a random sampling though, as they are dates and the data needs to remain linear and in the same order. Any ideas?
  • Hmm, I still end up with an array that is out of order.
  • @Joseph: You could sort them to put them back in order.

To condense your array, you have to define a rule for which samples to drop. To make it easier to understand, I use a simple integer as the timestamp instead. If you want to use it with your data, you have to modify the reject condition a little bit.

samples = 100.times.map do |i|
  {"timestamp" => i, "count" => rand(100)}
end

samples.reject! { |item| item["timestamp"] % 2 == 0 }

The item["timestamp"]%2 == 0 is the rule on which the sample gets droped of the sample set. You can define some time ranges or something else on it for your data.

samples.size # => 50
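
For the question's timestamped data, the same idea works by index rather than by value, keeping every nth element so the result stays in timestamp order no matter what the values look like. A sketch, where data stands in for the question's 20,000-element array:

target = 1000
step = (data.size / target.to_f).ceil # e.g. 20 for 20,000 elements

# reject everything that does not fall on a step boundary
condensed = data.reject.with_index { |_item, i| i % step != 0 }
condensed.size # => 1000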

