I am trying to create a function that creates fake data to use in a separate analysis. Here are the requirements for the function.

Problem 1

In this problem you will create fake data using numpy. In the cell below, the function create_data takes two parameters, "n" and "rand_gen".

  • The "rand_gen" parameter is a pseudo-random number generator. We use a seeded pseudo-random number generator so that the results are reproducible.
  • Use the numpy.random.randn function of the pseudo-random generator to create a numpy array of length n and return the array.

Here is the function I have created.

def create_data(n, rand_gen):
    '''
    Creates a numpy array with n samples from the standard normal distribution

    Parameters
    -----------
    n : integer for the number of samples to create
    rand_gen : pseudo-random number generator from numpy

    Returns
    -------
    numpy array from the standard normal distribution of size n
    '''

    numpy_array = np.random.randn(n)
    return numpy_array

Here is the first test I run on my function.

create_data(10, np.random.RandomState(seed=23))

I need the output to be this exact array.

[ 0.66698806,  0.02581308, -0.77761941,  0.94863382,  0.70167179,
 -1.05108156, -0.36754812, -1.13745969, -1.32214752,  1.77225828]

My output is different on every run, and I do not fully understand how the RandomState call uses the seed to produce the array above instead of new random values each time. I know I need to use the rand_gen variable in my function, but I do not know where, probably because I don't understand what it is doing.

Comments

  • Related? stackoverflow.com/questions/22994423/… (Oct 18, 2018 at 18:23)
  • You're not using rand_gen at all in your function? It looks like you create a seeded generator and then just default back to the standard module RNG. (Oct 18, 2018 at 18:23)
  • I don't know numpy, but I'm able to reproduce the required results. Take a look at numpy.random.seed. You need to set the seed before you go and get your array. (Oct 18, 2018 at 18:25)
  • Correct, I am not sure how to use it. I tried using it where I am currently using n, but that gave me an error and then I wasn't using n, so I took it out because I could still get an array, just not the one I need. (Oct 18, 2018 at 18:26)
  • Just call rand_gen.randn(n). See the docs. (Oct 18, 2018 at 18:29)

2 Answers

Call randn on the generator that is passed in, not on the np.random module. Define numpy_array = rand_gen.randn(n)
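
As a minimal sketch of that one-line change, assuming numpy is imported as np as in the question, the corrected function could look like this. The names first and second are only illustrative and are not part of the assignment.

```python
import numpy as np


def create_data(n, rand_gen):
    """Return n samples from the standard normal distribution, drawn from rand_gen."""
    # Draw from the generator that was passed in, not from the global np.random module.
    numpy_array = rand_gen.randn(n)
    return numpy_array


# Two generators built with the same seed should produce identical arrays.
first = create_data(10, np.random.RandomState(seed=23))
second = create_data(10, np.random.RandomState(seed=23))
print(first)
print(np.array_equal(first, second))
```

Because the seed is fixed at 23, the printed array should be the same on every run, which is what the test in the question relies on.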

I think the question you are asking is about pseudo-random numbers and reproducible randoms.

Real random numbers are made from real-world unpredictable data, like watching lava lamps, while pseudo-random number generators produce a long, deterministic sequence of numbers that only appears random.

The basic algorithm is:

  1. get a seed, or a big number, maybe from the current clock time.
  2. take part of the seed as the random number
  3. do unspeakable mathematical mutilations to the seed involving bit-shifts, exponents, and multiplications.
  4. use the output of these calculations as the new seed, go to step 2.
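
The steps above can be sketched with a toy linear congruential generator. This is an illustration only and not what NumPy does: RandomState uses the Mersenne Twister algorithm, and the function name toy_lcg and its constants are assumptions chosen for the example. The sketch also updates the seed before taking part of it, which swaps steps 2 and 3 but keeps the same loop.

```python
def toy_lcg(seed, count):
    """Toy pseudo-random generator; hypothetical helper, for illustration only."""
    modulus = 2**32
    multiplier = 1664525      # commonly quoted LCG constants, assumed here
    increment = 1013904223
    state = seed % modulus    # step 1: start from the seed
    values = []
    for _ in range(count):
        # steps 3 and 4: scramble the state and keep the result as the new seed
        state = (multiplier * state + increment) % modulus
        # step 2: take part of the state (its top 16 bits) as the output in [0, 1)
        values.append((state >> 16) / 2**16)
    return values


print(toy_lcg(42, 5))
print(toy_lcg(42, 5) == toy_lcg(42, 5))  # same seed, same sequence
print(toy_lcg(42, 5) == toy_lcg(43, 5))  # different seed, different sequence
```

Nothing in the loop depends on anything except the current state, which is why starting from the same seed replays the same sequence.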

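The trick is that specifying the same seed means you get the same sequence every time. You can seed the global generator with numpy.random.seed(), or, as in your test, create a separate generator with np.random.RandomState(seed=23) and draw from that object. Either way, the same seed gives the same sequence each time.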
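
A short sketch of both ways of seeding, assuming numpy is imported as np. The variable names are illustrative. It compares the global generator with a separate RandomState object built from the same seed, and shows that a second draw from the same object continues the sequence instead of restarting it.

```python
import numpy as np

# Seed the global generator, then draw from the module-level function.
np.random.seed(23)
global_draw = np.random.randn(3)

# Build a separate generator object with the same seed and draw from it.
gen = np.random.RandomState(seed=23)
gen_draw = gen.randn(3)

# A second draw from the same object moves further along the sequence.
next_draw = gen.randn(3)

print(np.array_equal(global_draw, gen_draw))  # compares the two seeded draws
print(np.array_equal(gen_draw, next_draw))    # compares first and second draw
```

Passing a generator object into a function, as the assignment does with rand_gen, keeps the reproducibility local to that object and avoids changing global state.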

I hope this is the question you were asking.
