How to generate random data off of existing sample data?

Question

I have a set of existing data, let's say:

sample_data = [2,2,2,2,2,2,3,3,3,3,4,4,4,4,4]

Based on this sample data, I would like to generate a random set of data of a certain length. The new values should not be drawn from the sample data itself, but from a distribution that was fitted to the sample data.

Expected output if I wanted 5 random points:

output_data = [3.4,2.3,1.5,5.2,1.3]

Possible duplicate of: stackoverflow.com/questions/22741319/… (Dani Mesejo, commented Feb 1, 2019 at 17:38)

Sociopath · Accepted Answer · 2019-02-01 17:30:00Z

2

Use random.sample:

import random

sample_data = [2,2,2,2,2,2,3,3,3,3,4,4,4,4,4]
# if you want to select 5 samples from above data
print(random.sample(sample_data, 5))

Output:

[3, 2, 2, 4, 2]

4 Comments

Brian Chen Over a year ago

Hey, I don't want to select x samples from the data, but rather generate new data based on the existing data.

Sociopath Over a year ago

What's the difference between your first and second sentence? Maybe you need to edit the question and elaborate further.

Brian Chen Over a year ago

To clarify: I would like to fit a distribution to a data set, and then create a random set of data based on that distribution.

Rocky Li Over a year ago

@BrianChen This is not what was asked in the question, please edit.

Onyambu · Accepted Answer · 2019-02-01 17:30:00Z

1

import numpy as np

length = 3  # number of values to draw
sample_data = [2,2,2,2,2,2,3,3,3,3,4,4,4,4,4]

# Sampling without replacement: each element of sample_data is used at most once
np.random.choice(sample_data, length, replace=False)
# example output: array([4, 4, 2])
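For comparison, here is a minimal sketch of the same call with replacement, which allows more draws than the data set has values. The seed and the length of 30 are arbitrary choices for illustration. Every value returned still comes from the original data, so this resamples the data rather than sampling from a fitted distribution.

```python
import numpy as np

sample_data = [2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4]
length = 30  # more draws than the 15 values available

# Seed chosen arbitrarily so the example is repeatable
rng = np.random.default_rng(seed=0)

# replace=True lets a value be drawn more than once,
# so length may exceed len(sample_data)
resampled = rng.choice(sample_data, size=length, replace=True)
print(resampled)
```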

6 Comments

Brian Chen Over a year ago

Hey, I don't want to select x samples from the data, but rather generate new data based on the existing data.

Onyambu Over a year ago

@BrianChen Just remove False from the above code and run it with length set to 30, for example.

Brian Chen Over a year ago

It is still just outputting values from the data set, not generating new data points based on a distribution.

Onyambu Over a year ago

What do you mean by generating new data points based on a distribution? Can you elaborate more?

Brian Chen Over a year ago

Hey, thanks for replying. I would like Python to determine which kind of distribution the data best fits (as well as its parameters) and then use that distribution to create x random data points. For example, if my data set best fits a normal distribution with parameters (10, 1), then use that normal distribution to generate 15 new data points.

Rocky Li · Accepted Answer · 2019-02-01 19:07:55Z

1

One premise of the question needs to be settled first: what kind of distribution do you want? As humans, we can often classify a distribution by its shape when we have enough data. A machine cannot do that on its own, so assigning a distribution type (say, uniform or binomial) to a new input is somewhat arbitrary. Here I'll give a brief answer using the most common default in statistics, the normal distribution. The Central Limit Theorem is the usual justification: it says that the means of sufficiently large samples are approximately normally distributed, although it does not guarantee that the raw data themselves are normal.

import numpy as np

sample_data = [2,2,2,2,2,2,3,3,3,3,4,4,4,4,4]
size = 5  # number of new points to generate
# np.std defaults to the population standard deviation (ddof=0)
new_samples = np.random.normal(np.mean(sample_data), np.std(sample_data), size)

>>> new_samples
array([ 2.01221231,  2.62772975,  1.79965428,  3.83601719,  2.44967777])

The new samples are generated from a normal distribution that takes the mean and standard deviation of the original samples.
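The same idea can be sketched with SciPy, which is not used in the answer above and is an added assumption here. `scipy.stats.norm.fit` returns maximum-likelihood estimates of the mean and standard deviation, which for the normal distribution should match `np.mean` and `np.std` with the default `ddof=0`. The `random_state` value is arbitrary and only makes the draw repeatable.

```python
import numpy as np
from scipy import stats

sample_data = [2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4]

# Maximum-likelihood estimates of the normal's mean (loc) and
# standard deviation (scale)
mu, sigma = stats.norm.fit(sample_data)

# Draw 5 new points from the fitted normal distribution
new_samples = stats.norm.rvs(loc=mu, scale=sigma, size=5, random_state=0)

print(mu, sigma)
print(new_samples)
```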

1 Comment

Brian Chen Over a year ago

Hey, thanks for replying. I would like Python to determine which kind of distribution the data best fits (as well as its parameters) and then use that distribution to create x random data points. For example, if my data set best fits a normal distribution with parameters (10, 1), then use that normal distribution to generate 15 new data points.
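One possible way to approach what this comment describes is sketched below. It is an illustration under stated assumptions, not something proposed in the thread: it uses SciPy, a hypothetical shortlist of three candidate families, and the total log-likelihood as the comparison score. That score is only a fair comparison here because every candidate has the same number of fitted parameters (loc and scale). With a small sample of repeated integers like this one, any continuous fit is rough, so treat the winner as a suggestion.

```python
import numpy as np
from scipy import stats

sample_data = np.array(
    [2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4], dtype=float
)

# Hypothetical shortlist of candidate families; extend or change as needed
candidates = {
    "norm": stats.norm,
    "uniform": stats.uniform,
    "expon": stats.expon,
}

results = {}
for name, dist in candidates.items():
    # Maximum-likelihood parameters for this family
    params = dist.fit(sample_data)
    # Total log-likelihood of the data under the fitted distribution;
    # a higher value means a better fit
    loglik = np.sum(dist.logpdf(sample_data, *params))
    results[name] = (loglik, params)

# Pick the family with the highest log-likelihood
best_name = max(results, key=lambda n: results[n][0])
best_params = results[best_name][1]

# Generate 15 new points from the selected distribution
new_points = candidates[best_name].rvs(*best_params, size=15, random_state=0)

print(best_name, best_params)
print(new_points)
```

If the candidates had different numbers of parameters, a penalized score such as AIC would be a more appropriate comparison than the raw log-likelihood.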
