
I have an array to be rendered in some charts, but suppose the dataset is far too large. How can I take an array that is, say, 20,000 items long and either drop every other item until it is down to 1,000 items, or interpolate it down to that size?

Example, say I have the following array (of hashes):

[ 
  {"timestamp"=>2011-09-05 14:30:00 UTC, "count"=>4488.0},
  {"timestamp"=>2011-09-05 14:45:00 UTC, "count"=>4622.0},
  {"timestamp"=>2011-09-05 15:00:00 UTC, "count"=>4655.0},
  {"timestamp"=>2011-09-05 15:15:00 UTC, "count"=>4533.0},
  {"timestamp"=>2011-09-05 15:30:00 UTC, "count"=>4439.0},
  {"timestamp"=>2011-09-05 15:45:00 UTC, "count"=>4468.0},
  {"timestamp"=>2011-09-05 16:00:00 UTC, "count"=>4419.0},
  {"timestamp"=>2011-09-05 16:15:00 UTC, "count"=>4430.0},
  {"timestamp"=>2011-09-05 16:30:00 UTC, "count"=>4429.0},
  {"timestamp"=>2011-09-05 16:45:00 UTC, "count"=>4502.0},
  {"timestamp"=>2011-09-05 17:00:00 UTC, "count"=>4497.0},
  {"timestamp"=>2011-09-05 17:15:00 UTC, "count"=>4468.0},
  {"timestamp"=>2011-09-05 17:30:00 UTC, "count"=>4510.0},
  {"timestamp"=>2011-09-05 17:45:00 UTC, "count"=>4547.0},
  {"timestamp"=>2011-09-05 18:00:00 UTC, "count"=>4471.0},
  {"timestamp"=>2011-09-05 18:15:00 UTC, "count"=>4501.0},
  {"timestamp"=>2011-09-05 18:30:00 UTC, "count"=>4451.0},
  {"timestamp"=>2011-09-05 18:45:00 UTC, "count"=>4453.0},
  {"timestamp"=>2011-09-05 19:00:00 UTC, "count"=>4593.0},
  {"timestamp"=>2011-09-05 19:15:00 UTC, "count"=>4540.0},
  {"timestamp"=>2011-09-05 19:30:00 UTC, "count"=>4516.0},
  {"timestamp"=>2011-09-05 19:45:00 UTC, "count"=>4494.0}
]

And I want an array of the intermediary values, either dropped from the original or somehow interpolated, like this:

[ 
  {"timestamp"=>2011-09-05 14:45:00 UTC, "count"=>4622.0},
  {"timestamp"=>2011-09-05 15:00:00 UTC, "count"=>4655.0},
  {"timestamp"=>2011-09-05 15:30:00 UTC, "count"=>4439.0},
  {"timestamp"=>2011-09-05 16:00:00 UTC, "count"=>4419.0},
  {"timestamp"=>2011-09-05 16:30:00 UTC, "count"=>4429.0},
  {"timestamp"=>2011-09-05 17:00:00 UTC, "count"=>4497.0},
  {"timestamp"=>2011-09-05 17:30:00 UTC, "count"=>4510.0},
  {"timestamp"=>2011-09-05 18:00:00 UTC, "count"=>4471.0},
  {"timestamp"=>2011-09-05 18:30:00 UTC, "count"=>4451.0},
  {"timestamp"=>2011-09-05 19:00:00 UTC, "count"=>4593.0},
  {"timestamp"=>2011-09-05 19:15:00 UTC, "count"=>4540.0},
  {"timestamp"=>2011-09-05 19:45:00 UTC, "count"=>4494.0}
]

Any thoughts or help on this would be greatly appreciated; I may just be missing the point here.

Comments

  • Off topic, but for every "timestamp" and for every "count" a new string will be created. A symbol (:timestamp) is a lot better, or create a Struct (see the sketch after these comments).
  • What exactly do you need to do? Grab a random sample of 1000 entries in "timestamp" order?
  • I do agree with you; however, the array is the result of MongoDB's map-reduce :( I'm not sure how to make it a pure Ruby object. Any ideas?
  • Mu, what I want to do is take a very large dataset and make it smaller so the browser can render it easily, and since it's a time range it needs to be linearly condensed... I'm dumb, I can't figure out how to do this properly :(
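
A minimal sketch of the Struct idea from the first comment, assuming the rows arrive from MongoDB's map-reduce as string-keyed hashes (the Sample name is made up here for illustration):

Sample = Struct.new(:timestamp, :count)

# Pretend these came back from the map-reduce step
rows = [
  {"timestamp" => Time.utc(2011, 9, 5, 14, 30), "count" => 4488.0},
  {"timestamp" => Time.utc(2011, 9, 5, 14, 45), "count" => 4622.0}
]

# One plain Ruby object per row instead of a hash with string keys
samples = rows.map { |row| Sample.new(row["timestamp"], row["count"]) }
samples.first.count # => 4488.0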

3 Answers

require 'pp'

# Interval in seconds (30 min)
INTERVAL = 1800

# generate the data
start = Time.utc(2011, 9, 5, 14, 30)

data = 1000.times.map do |i|
  {:timestamp => start + i*INTERVAL, :count => rand(4000)}
end

# Plain data
pp data

puts # blank

# Simply get the data from sample number 300 to 400
pp data[300..400]

puts # blank

# For example, data starting at the second hour, spanning 3 hours
pp data[2*60*60/INTERVAL..(2+3)*60*60/INTERVAL]

puts # blank

# Make it smaller (50%)
# We need data.size * 0.5 elements.
# Calculate the step we need when iterating so that
# 50% of the elements are kept; here, every second one.
step = (data.size / (data.size * 0.5)).to_i

# We use Range#step to get the indexes, and then
# transform them with Enumerable#collect into the array
# of hashes. The exclusive range keeps every index in
# bounds, so there are no nils to filter out.
pp (0...data.size).step(step).collect { |index| data[index] }

Also, you may want to look at Enumerable#each_slice(n):

(1..10).each_slice(3) { |a| p a }
# outputs:
# [1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
# [10]

You can reduce the set by making slices of n elements and then creating a new element from each slice: the element in the middle, an average, and so on. For example:

data.each_slice(3).collect { |slice| make_one_out_of_a_slice(slice) }
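
make_one_out_of_a_slice is a placeholder; as a sketch, here is one possible version that keeps the middle timestamp and averages the counts of the symbol-keyed hashes generated above:

def make_one_out_of_a_slice(slice)
  {
    # the middle element's timestamp stands in for the whole slice
    :timestamp => slice[slice.size / 2][:timestamp],
    # and the count is averaged across the slice
    :count => slice.sum { |h| h[:count] } / slice.size.to_f
  }
end

condensed = data.each_slice(3).collect { |slice| make_one_out_of_a_slice(slice) }
condensed.size # => 334 for the 1000 generated elements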

Use Array#sample:

a = [ 1, 2, 3, 4, 5, 6 ]
smaller = a.sample(3)
# [4, 2, 1]

In your case you'd do something like this:

a = [
    # 10 000 little hashes
]
smaller = a.sample(1000)

and then send smaller off to be displayed.

And if you want them in order you could just sort them again:

smaller.sort! { |a,b| a['timestamp'] <=> b['timestamp'] }
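
Putting the two steps together (sort_by on the timestamp reads a little cleaner than a comparison block):

smaller = a.sample(1000).sort_by { |h| h['timestamp'] }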

Comments

  • I don't want a random sampling though, as they are dates and the data needs to remain linear and in the same order. Any ideas?
  • Hmm, I still end up with an array that is out of order.
  • @Joseph: You could sort them to put them back in order.

To condense your array, you have to define a rule for which samples to drop. To make it easier to understand, I use a simple integer as the timestamp instead. If you want to use it with your data, you have to modify the reject condition a little bit.

samples = 100.times.map do |i|
  {"timestamp" => i, "count" => rand(100)}
end

samples.reject! { |item| item["timestamp"] % 2 == 0 }

The item["timestamp"]%2 == 0 is the rule on which the sample gets droped of the sample set. You can define some time ranges or something else on it for your data.

samples.size # => 50
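
For the question's timestamped data, the same idea works by index rather than by value, keeping every nth element so the result stays in timestamp order no matter what the values look like. A sketch, where data stands in for the question's 20,000-element array:

target = 1000
step = (data.size / target.to_f).ceil # e.g. 20 for 20,000 elements

# reject everything that does not fall on a step boundary
condensed = data.reject.with_index { |_item, i| i % step != 0 }
condensed.size # => 1000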

