1

I have data in a .csv file called 'Max.csv':

Valid Date  MAX
1/1/1995    51
1/2/1995    45
1/3/1995    48
1/4/1995    45

Another csv called 'Min.csv' looks like:

Valid Date  MIN
1/2/1995    33
1/4/1995    31
1/5/1995    30
1/6/1995    39

I want to generate two dictionaries (or any other suggested data structure) so that I have two separate variables, Max and Min, in Python, containing respectively:

Valid Date  MAX
1/2/1995    45
1/4/1995    45

Valid Date  MIN
1/2/1995    33
1/4/1995    31

i.e. select the elements from Max and Min so that only the common elements are output.

I am thinking about using numpy.intersect1d, but that means I have to compare Max and Min on the date column first, find the indices of the common dates, and then grab the second columns of Max and Min. This seems too complicated, and I feel there must be a smarter way to intersect the two series Max and Min.

2 Answers

2

You mention that:

I have to separately compare the Max and Min first on date column, find the index of common dates and then grab the second columns for Max and Min. This appears too complicated...

Indeed this is fundamentally what you need to do, one way or the other; but using the numpy_indexed package (disclaimer: I am its author), this isn't complicated in the slightest:

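import numpy_indexed as npi
# min_dates, max_dates, min_values and max_values are assumed to be
# NumPy arrays holding the two columns of each csv file
common_dates = npi.intersection(min_dates, max_dates)
print(max_values[npi.indices(max_dates, common_dates)])
print(min_values[npi.indices(min_dates, common_dates)])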

Note that this solution is fully vectorized (it contains no loops at the Python level), so on large inputs it should be considerably faster than the currently accepted answer.

Note 2: this assumes the dates within each column are unique; if they are not, replace 'npi.indices' with 'npi.in_'.
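
For comparison, the compare-then-index steps quoted above can also be sketched with plain NumPy. This is only a minimal sketch, not part of the numpy_indexed approach: it assumes the two columns of each file are already loaded into arrays (the sample values are copied from the question), that dates are kept as plain strings, and that your NumPy version supports the return_indices argument of np.intersect1d.

```python
import numpy as np

# Sample data copied from the question; dates are kept as plain strings.
max_dates = np.array(["1/1/1995", "1/2/1995", "1/3/1995", "1/4/1995"])
max_values = np.array([51, 45, 48, 45])
min_dates = np.array(["1/2/1995", "1/4/1995", "1/5/1995", "1/6/1995"])
min_values = np.array([33, 31, 30, 39])

# return_indices=True also returns the position of each common date
# in both inputs (the first occurrence if a date is repeated).
common_dates, max_idx, min_idx = np.intersect1d(
    max_dates, min_dates, return_indices=True
)

# Use the index arrays to pick the matching values from each column.
common_max = max_values[max_idx]
common_min = min_values[min_idx]

print(common_dates, common_max, common_min)
```

Keep in mind that np.intersect1d returns the common dates sorted as strings, which is not necessarily chronological order; parsing the dates first (for example into datetime64) would avoid that.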

1

The built-in set() should be enough, as follows:

>>> # named max_data / min_data to avoid shadowing the builtins max() and min()
>>> max_data = {"1/1/1995":"51", "1/2/1995":"45", "1/3/1995":"48", "1/4/1995":"45"}
>>> min_data = {"1/2/1995":"33", "1/4/1995":"31", "1/5/1995":"30", "1/6/1995":"39"}

>>> a = set(max_data)
>>> b = set(min_data)
>>> {x:max_data[x] for x in a.intersection(b)}
{'1/4/1995': '45', '1/2/1995': '45'}
>>> {x:min_data[x] for x in a.intersection(b)}
{'1/2/1995': '33', '1/4/1995': '31'}
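
If the data starts out in csv files, the dictionaries could be built along the following lines. This is only a sketch under assumptions: the files are taken to be comma-delimited with a header row ('Valid Date' plus 'MAX' or 'MIN'), in-memory strings stand in for the real files, and read_column is a hypothetical helper name introduced here for illustration.

```python
import csv
import io

# Stand-ins for Max.csv and Min.csv; assumes comma-delimited files with a header row.
MAX_CSV = "Valid Date,MAX\n1/1/1995,51\n1/2/1995,45\n1/3/1995,48\n1/4/1995,45\n"
MIN_CSV = "Valid Date,MIN\n1/2/1995,33\n1/4/1995,31\n1/5/1995,30\n1/6/1995,39\n"


def read_column(handle, value_column, date_column="Valid Date"):
    """Return {date: value} for one csv file (hypothetical helper)."""
    reader = csv.DictReader(handle)
    return {row[date_column]: int(row[value_column]) for row in reader}


# With real files, pass open("Max.csv", newline="") instead of io.StringIO(...).
max_by_date = read_column(io.StringIO(MAX_CSV), "MAX")
min_by_date = read_column(io.StringIO(MIN_CSV), "MIN")

# dict.keys() views support set operations, so no explicit set() call is needed.
common = max_by_date.keys() & min_by_date.keys()
common_max = {date: max_by_date[date] for date in common}
common_min = {date: min_by_date[date] for date in common}

print(common_max)
print(common_min)
```

The order of the printed keys follows set iteration order, so it may differ from the order in the files.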

6 Comments

Can you please give a hint on how to create a set from the csv file? I use pandas to read the csv file into a dataframe.
Please upvote and accept my answer if it helped.
@Zanam, did you succeed?
Yes, I did, but I prefer @Eelco's answer because it does not run a Python-level loop.
