9

How can I generate random dates within a date range on a bimonthly (twice-a-month) basis in numpy? One way I can think of is to generate two random integer arrays:

bimonthly1 = np.random.randint(1,15,12)
bimonthly2 = np.random.randint(16,30,12)

I can then generate the dates, taking the 'day' values for each month from the two arrays above. However, this requires me to pass the month and year explicitly. An alternative would be to generate the desired date_range first and substitute the days in the range with the array values above, but that means operating on every element of the range, which may not be the best solution for a large array.

I would appreciate any pointers on how to do this in numpy more efficiently.

    If you want each day to have the same probability, using timedelta is a much better idea. Commented Sep 6, 2017 at 15:59

5 Answers

11

There is a much easier way to achieve this, without needing to explicitly call any libraries beyond numpy.

NumPy's datetime64 datatype is quite powerful. Specifically for this case, you can add and subtract integers, and each integer is interpreted in the datetime's own unit (the smallest time unit it stores). For example, for a %Y-%m-%d format:

exampledatetime1 = np.datetime64('2017-01-01')
exampledatetime1 + 1
>>
2017-01-02

However, for a %Y-%m-%d %H:%M:%S format:

exampledatetime2 = np.datetime64('2017-01-01 00:00:00')
exampledatetime2 + 1
>>
2017-01-01 00:00:01

In this case, as you only have information down to day resolution, you can simply do the following:

import numpy as np

bimonthly_days = np.arange(0, 60)
base_date = np.datetime64('2017-01-01')
random_date = base_date + np.random.choice(bimonthly_days)

Or, if you wanted to be even cleaner about it:

import numpy as np

def random_date_generator(start_date, range_in_days):
    days_to_add = np.arange(0, range_in_days)
    random_date = np.datetime64(start_date) + np.random.choice(days_to_add)
    return random_date

And then just use:

yourdate = random_date_generator('2012-01-15', 60)

1 Comment

Using np.random.choice on an array of times is not scalable. This becomes more apparent as you change the frequency and broaden the time range.
2

You could create the date range a priori, e.g. using pandas's date_range, and convert it to a numpy array. Then, make random choices from this array of dates using numpy.random.choice.
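A minimal sketch of this idea, assuming a daily frequency and the 2017-01-01 to 2017-03-01 range used elsewhere on this page (the range, the seed, and the sample size of 12 are illustrative choices, not part of the answer):

```python
import numpy as np
import pandas as pd

# Build every candidate date up front; both endpoints are included.
all_dates = pd.date_range("2017-01-01", "2017-03-01", freq="D").to_numpy()

# Draw 12 dates uniformly at random (with replacement).
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
sample = rng.choice(all_dates, size=12)

print(sample)
```

Note that this materializes the whole range in memory, so it is best suited to ranges of modest size.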

Comments

1

What if you define a start date as the first of the month and then add a random timedelta?

e.g.

import datetime
d0 = datetime.datetime.strptime('01/01/2016', '%d/%m/%Y')

from calendar import monthrange
max_day = monthrange(d0.year, d0.month)[1]

import numpy as np
random_dates_1 = []
random_dates_2 = []
for i in range(10):
    random_dates_1.append(d0 + datetime.timedelta(days=np.random.randint(0, max_day // 2)))
    # randint's upper bound is exclusive, so max_day keeps the offset inside the month
    random_dates_2.append(d0 + datetime.timedelta(days=np.random.randint(max_day // 2, max_day)))

1 Comment

Honestly, I think this is the only scalable solution, much more so than generating an entire array and then choosing from it.
1

Here is an implementation that builds two numpy arrays of datetimes, each with one entry per month of the year. The first array has a random date from the first half of each month and the second array a random date from the second half of each month.

import datetime
from calendar import monthrange
import numpy as np

arr_first = np.array([])
arr_second = np.array([])

for i in range(1, 13):
    base = datetime.datetime(2016, i, 1)
    max_days = monthrange(2016, i)[1]
    first = np.random.randint(0, max_days // 2)
    second = np.random.randint(max_days // 2, max_days)
    arr_first = np.append(arr_first, base + datetime.timedelta(days=first))
    arr_second = np.append(arr_second, base + datetime.timedelta(days=second))

1 Comment

Not a great idea to iteratively expand the arrays - they should be statically allocated to 12 elements beforehand - but otherwise this is the preferred approach.
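The preallocation suggested in this comment could look roughly like the sketch below. It assumes the same year (2016) as the answer and swaps the datetime objects for numpy's datetime64[D] so the arrays can be typed and sized up front; the variable names mirror the answer, the rest is illustrative.

```python
from calendar import monthrange
import numpy as np

year = 2016  # same year as in the answer above

# Allocate both arrays once, one slot per month, instead of appending.
arr_first = np.empty(12, dtype="datetime64[D]")
arr_second = np.empty(12, dtype="datetime64[D]")

for i in range(12):
    month = i + 1
    base = np.datetime64(f"{year}-{month:02d}-01")
    max_days = monthrange(year, month)[1]
    # randint's upper bound is exclusive, so offsets stay inside the month.
    first = int(np.random.randint(0, max_days // 2))
    second = int(np.random.randint(max_days // 2, max_days))
    arr_first[i] = base + np.timedelta64(first, "D")
    arr_second[i] = base + np.timedelta64(second, "D")
```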
1

All of the answers already given involve some kind of loop when generating multiple dates at once. Here is a fully vectorized function that uses the same basic approach as @Alex but is done entirely without iteration or appending.

Instead of building an array one-by-one by adding to a known start value, this code works by making an array of the start value and an array of random offsets, and then adding them together.

import numpy as np

def random_dates(start, range_in_days, count):
    """
    Generate a number of random dates as numpy datetime64 values.
    :param start: Start date. Must be a string or datetime object.
    :param range_in_days: Number of days past the start (exclusive). Must be an int.
    :param count: Number of values to generate
    :return: An ndarray of length count and dtype datetime64, full of random dates.
    """
    start = np.datetime64(start)
    base = np.full(count, start)
    offset = np.random.randint(0, range_in_days, count)
    offset = offset.astype('timedelta64[D]')
    return base + offset

# prints 10 random dates within the month of January 2023
print(random_dates("2023-01-01", 31, 10))
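The same offset trick can be extended to the bimonthly case from the question, one date per half-month, still without a Python loop. The following is only a sketch: `random_bimonthly_dates` is a hypothetical helper name, and it assumes "first half" means the first `days_in_month // 2` days of each month, as in the loop-based answers.

```python
import numpy as np

def random_bimonthly_dates(year, rng=None):
    """Return two datetime64[D] arrays with one random date per month:
    one from the first half and one from the second half of each month."""
    rng = np.random.default_rng() if rng is None else rng
    # The twelve months of the year, then the first day of each month.
    months = np.arange(f"{year}-01", f"{year + 1}-01", dtype="datetime64[M]")
    starts = months.astype("datetime64[D]")
    # Month lengths: distance from each month start to the next month start.
    days_in_month = ((months + 1).astype("datetime64[D]") - starts).astype(int)
    half = days_in_month // 2
    # integers() broadcasts array bounds; the upper bound is exclusive.
    first_offsets = rng.integers(0, half)
    second_offsets = rng.integers(half, days_in_month)
    first = starts + first_offsets.astype("timedelta64[D]")
    second = starts + second_offsets.astype("timedelta64[D]")
    return first, second

first_half, second_half = random_bimonthly_dates(2016)
print(first_half)
print(second_half)
```

Because the month lengths are computed from the datetime64 values themselves, leap years are handled without any calendar lookup.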

Comments
