I searched Google and this site for an answer, but couldn't word the query well enough to find help with this exact issue.

I want to create a DataFrame with a column called 'Department' whose values come from a list, and for each value in that column I want the same datetime range.

The list is:

departments = ['Sales', 'Specialist', 'Purchase', 'HR']

and the date range is (df being a different DataFrame that holds the original dates):

pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D')

So I tried the following, but it raised an error because of the shape. I understand why it fails, but I'm not sure how to solve it.

df2 = pd.DataFrame(department,(pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D')), columns=['Department',"InvoiceDate"])

The desired outcome is something like this:

          Department    InvoiceDate
    0        Sales      2019-03-25
    1        Sales      2019-03-26
    2        Sales      2019-03-27
    ...
    5     Specialist    2019-03-25
    6     Specialist    2019-03-26
    7     Specialist    2019-03-27
    ...
    8      Purchase     2019-03-25
    9      Purchase     2019-03-26
   10      Purchase     2019-03-27
    ...
   11         HR        2019-03-25
   12         HR        2019-03-26
   13         HR        2019-03-27

Thank you

EDIT: Error Code

>>> df2 = pd.DataFrame(workstream,(pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D')), columns=['WorkStream',"InvoiceDate"])
Traceback (most recent call last):
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\managers.py", line 1678, in create_block_manager_from_blocks
    make_block(values=blocks[0], placement=slice(0, len(axes[0])))
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\blocks.py", line 3284, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\blocks.py", line 2792, in __init__
    super().__init__(values, ndim=ndim, placement=placement)
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\blocks.py", line 126, in __init__
    raise ValueError(
ValueError: Wrong number of items passed 1, placement implies 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python38-32\lib\site-packages\pandas\core\frame.py", line 464, in __init__
    mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\construction.py", line 213, in init_ndarray
    return create_block_manager_from_blocks(block_values, [columns, index])
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\managers.py", line 1688, in create_block_manager_from_blocks
    construction_error(tot_items, blocks[0].shape[1:], axes, e)
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\managers.py", line 1718, in construction_error
    raise ValueError(
ValueError: Shape of passed values is (8, 1), indices imply (533, 2)
  • What is the error pandas is logging? Commented Sep 11, 2020 at 10:43

2 Answers

You can do this with the code below.

First, declare the list of departments and build the list of dates from the range (min to max):

departments = ['Sales', 'Specialist', 'Purchase', 'HR']

dates = pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D').tolist()

You want a Cartesian product, so use the function below:

def cartesian_product(data):
    index = pd.MultiIndex.from_product(data.values(), names=data.keys())
    return pd.DataFrame(index=index).reset_index()

cartesian_product({'departments': departments,
                   'date': dates})
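For a quick check, here is a minimal self-contained sketch of the same approach. The three-row `df` is made up for illustration and only stands in for the asker's real data; wrapping the dict views in `list()` is a defensive choice, not something the approach requires.

```python
import pandas as pd

# Hypothetical stand-in for the asker's df; only InvoiceDate matters here.
df = pd.DataFrame(
    {'InvoiceDate': pd.to_datetime(['2019-03-25', '2019-03-26', '2019-03-27'])}
)

departments = ['Sales', 'Specialist', 'Purchase', 'HR']
dates = pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D')


def cartesian_product(data):
    # One index level per dict entry, covering every combination of values.
    index = pd.MultiIndex.from_product(list(data.values()), names=list(data.keys()))
    # An empty frame over that index; reset_index turns the levels into columns.
    return pd.DataFrame(index=index).reset_index()


df2 = cartesian_product({'Department': departments, 'InvoiceDate': dates})
print(df2.shape)  # (12, 2): 4 departments x 3 dates
```

The first dict entry varies slowest, so all dates for 'Sales' come first, then all dates for 'Specialist', and so on, which matches the layout asked for in the question.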

You can read more about MultiIndex in the pandas documentation.


2 Comments

Worked perfectly, thank you! I can't say I fully understand what is happening in the function, though. Do you mind explaining it so I can understand? @K.Oleksy
Sure! MultiIndex.from_product() creates the Cartesian product of the given iterables: the first argument is the collection of iterables (here the dict values) and names sets the level names (here the dict keys). The result contains every possible combination of elements from the two or more iterables. pd.DataFrame(index=index).reset_index() then turns those index levels into ordinary columns. There are other MultiIndex constructors as well, such as MultiIndex.from_arrays, MultiIndex.from_tuples and MultiIndex.from_frame. You can find more information in the pandas documentation.
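To make the "all possible combinations" point concrete, here is a tiny sketch with made-up values (the level names 'letter' and 'symbol' are just for illustration). It compares the index entries with what itertools.product gives for the same inputs.

```python
import itertools

import pandas as pd

letters = ['a', 'b']
symbols = ['x', 'y']

index = pd.MultiIndex.from_product([letters, symbols], names=['letter', 'symbol'])

# Iterating a MultiIndex yields one tuple per combination.
pairs = list(index)
print(pairs)  # [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y')]

# Same combinations, in the same order, as the standard-library product.
print(pairs == list(itertools.product(letters, symbols)))  # True
```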

You're calling pd.DataFrame() incorrectly: the second positional argument is the index, not a second column. Also, the two arrays provided as data have different lengths. To fix it, build two columns of equal length as follows:

 departments = ['Sales', 'Specialist', 'Purchase', 'HR']
 dates = pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D').tolist()
 # Repeat each department once per date: Sales, Sales, ..., Specialist, ...
 dep_col = [dep for dep in departments for _ in dates]
 # Repeat the whole list of dates once per department
 date_col = dates * len(departments)
 data = {'departments': dep_col, 'date': date_col}

 df2 = pd.DataFrame(data)
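The same equal-length idea can also be written with NumPy. This is only a sketch: the three-day range is a hypothetical stand-in for df.InvoiceDate.min() and df.InvoiceDate.max(), and the column names follow the question rather than this answer.

```python
import numpy as np
import pandas as pd

departments = ['Sales', 'Specialist', 'Purchase', 'HR']
# Hypothetical three-day range standing in for the real min/max invoice dates.
dates = pd.date_range(start='2019-03-25', end='2019-03-27', freq='1D')

df2 = pd.DataFrame({
    # Each department repeated once per date: Sales, Sales, Sales, Specialist, ...
    'Department': np.repeat(departments, len(dates)),
    # The whole date range tiled once per department.
    'InvoiceDate': np.tile(dates.to_numpy(), len(departments)),
})
print(len(df2))  # 12 rows: 4 departments x 3 dates
```

The important detail is that one column uses repeat (each element in a row) and the other uses tile (the whole sequence over and over). Tiling both columns would pair the values up in lockstep and could miss combinations.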
