How to process a dictionary of strings in Python

Question

I have a dictionary whose keys follow the string patterns informationGain_$index$ and threshold_$index$. My goal is to retrieve the entry with the maximum informationGain_$index$ together with its matching threshold_$index$.

An example dictionary looks like so:

{'informationGain_0': 0.9949486404805016, 'threshold_0': 5.0, 'informationGain_1': 0.9757921620455572, 'threshold_1': 12.5, 'informationGain_2': 0.7272727272727273, 'threshold_2': 11.5, 'informationGain_3': 0.5509775004326937, 'threshold_3': 8.6, 'informationGain_4': 0.9838614413637048, 'threshold_4': 7.0, 'informationGain_5': 0.9512050593046015, 'threshold_5': 6.0, 'informationGain_6': 0.8013772106338303, 'threshold_6': 5.9, 'informationGain_7': 0.9182958340544896, 'threshold_7': 1.5, 'informationGain_8': 0.0, 'threshold_8': 9.0, 'informationGain_9': 0.6887218755408672, 'threshold_9': 7.8, 'informationGain_10': 0.9182958340544896, 'threshold_10': 2.1, 'informationGain_11': 0.0, 'threshold_11': 13.5}

I wrote the following code to generate the dictionary.

def entropy_discretization(s):

    I = {}
    i = 0
    while(uniqueValue(s)):
        # Step 1: pick a threshold
        threshold = s['A'].iloc[0]

        # Step 2: Partition the data set into two partitions
        s1 = s[s['A'] < threshold]
        print("s1 after splitting")
        print(s1)
        print("******************")
        s2 = s[s['A'] >= threshold]
        print("s2 after splitting")
        print(s2)
        print("******************")
            
        # Step 3: calculate the information gain.
        informationGain = information_gain(s1,s2,s)
        I.update({f'informationGain_{i}':informationGain,f'threshold_{i}': threshold})
        print(f'added informationGain_{i}: {informationGain}, threshold_{i}: {threshold}')
        s = s[s['A'] != threshold]
        i += 1

    print(I)

Given the example dataset, the maximum information gain is associated with threshold_0 and informationGain_0. I would like to find a general way of identifying these key-value pairs in the dictionary. Is there a way to search the dictionary so that I can return the informationGain_* and threshold_* entries for which informationGain_* is the maximum?

Is there any particular reason you are structuring your data like this instead of using, say, a set of namedtuples? Or just two parallel dicts where the key is just the index, or a list of namedtuples if all indices will exist, or even a list of dicts that only have the keys informationGain and threshold if you don't like named tuples. All of those representations make this task a lot easier.
– Tadhg McDonald-Jensen, Commented Oct 17, 2021 at 21:44
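To illustrate the restructuring suggested in the comment above, here is a minimal sketch that stores each candidate as one record instead of two string keys. The `Split` type and its field names are hypothetical, chosen only for this example, and the values are the first three pairs from the question's dictionary.

```python
from collections import namedtuple

# Hypothetical record type (not part of the original code).
Split = namedtuple("Split", ["information_gain", "threshold"])

# One record per candidate threshold; the list position replaces the index suffix.
splits = [
    Split(0.9949486404805016, 5.0),
    Split(0.9757921620455572, 12.5),
    Split(0.7272727272727273, 11.5),
]

# The best split is simply the record with the largest information gain.
best = max(splits, key=lambda s: s.information_gain)
print(best)
```

With this layout the gain and its threshold can never get separated, so no key parsing is needed.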

mozway · Accepted Answer · 2021-10-17 21:59:43Z

2

Here is a solution using a custom key with max. It works even if the dictionary is not sorted. This is assuming the input dictionary is named d.

M = max((k for k in d if k.startswith('i')),
        key=lambda x: d[x])
T = f'threshold_{M.rsplit("_")[-1]}'
out = {M: d[M], T: d[T]}

Output:

{'informationGain_0': 0.9949486404805016, 'threshold_0': 5.0}

NB. I used a simple test on the dictionary keys, checking which ones start with i, to identify the informationGain_X keys. If you have a more complex real-life dictionary, you might want to update this to use a full match or some other way of making the key identification unambiguous.
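As a sketch of the stricter matching mentioned in the note, the keys can be filtered with a full regular-expression match. The helper name `best_split` and the extra `iterations` key are assumptions made up for this example; the extra key stands in for any unrelated entry that happens to start with i.

```python
import re

# Matches only keys of the exact form informationGain_<digits>.
GAIN_KEY = re.compile(r"informationGain_(\d+)")

def best_split(d):
    """Return the max informationGain_<i> entry and its threshold_<i> entry."""
    gain_keys = [k for k in d if GAIN_KEY.fullmatch(k)]
    best = max(gain_keys, key=d.get)  # raises ValueError if no key matches
    index = GAIN_KEY.fullmatch(best).group(1)
    threshold_key = f"threshold_{index}"
    return {best: d[best], threshold_key: d[threshold_key]}

# Unsorted sample with an unrelated key that also starts with "i".
d = {
    "threshold_1": 12.5,
    "informationGain_1": 0.9757921620455572,
    "iterations": 100,  # hypothetical extra key, ignored by the full match
    "informationGain_0": 0.9949486404805016,
    "threshold_0": 5.0,
}
print(best_split(d))
```

A plain prefix test on i would pick `iterations` here, because its value is the largest; the full match avoids that.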

edited Oct 17, 2021 at 21:59

answered Oct 17, 2021 at 21:37

mozway


2 Comments

ddejohn Over a year ago

+1 this is a great answer. I have one small gripe which is that k.startswith('i') is a relatively weak conditional. I think regex would be a more appropriate choice: r"informationGain_(\d*)".

mozway Over a year ago

@ddejohn I hesitated to put a longer string and decided on the simplest solution given OP's data. Also, a simpler conditional means faster code. But I'll add a comment on that.

Evan Gertis · Accepted Answer · 2021-10-17 21:38:37Z

1

I've also found a way of doing this. It just took a few tries.

    n = len(I) // 2  # number of (informationGain, threshold) pairs
    print("Calculating maximum threshold")
    print("*****************************")
    maxInformationGain = 0
    maxThreshold       = 0 
    for i in range(0, n):
        if(I[f'informationGain_{i}'] > maxInformationGain):
            maxInformationGain = I[f'informationGain_{i}']
            maxThreshold       = I[f'threshold_{i}']

    print(f'maxThreshold: {maxThreshold}, maxInformationGain: {maxInformationGain}')

answered Oct 17, 2021 at 21:38

Evan Gertis


antaz · Accepted Answer · 2021-10-17 21:35:25Z

0

One way to do this is as follows, assuming your dictionary is named d:

informationGain_max = max(list(d.values())[::2])
threshold_max = max(list(d.values())[1::2])

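This only works because a standard dict preserves insertion order (guaranteed by the language since Python 3.7, and an implementation detail of CPython 3.6), and because the keys were inserted in alternating informationGain/threshold order. Note that threshold_max is the largest threshold overall, which is not necessarily the threshold paired with the largest information gain.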
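If the goal is the threshold that belongs to the largest information gain, rather than the largest threshold overall, the two slices can be paired before taking the maximum. This is a minimal sketch under the same assumption of alternating insertion order; the name `threshold_at_max` is introduced here for illustration, and the sample uses the first three pairs from the question.

```python
d = {
    "informationGain_0": 0.9949486404805016, "threshold_0": 5.0,
    "informationGain_1": 0.9757921620455572, "threshold_1": 12.5,
    "informationGain_2": 0.7272727272727273, "threshold_2": 11.5,
}

values = list(d.values())
# Even positions hold gains, odd positions hold thresholds (insertion order).
pairs = list(zip(values[::2], values[1::2]))

# Compare on the gain only, and keep the threshold that travels with it.
informationGain_max, threshold_at_max = max(pairs, key=lambda p: p[0])
print(informationGain_max, threshold_at_max)
```

In this sample the largest threshold overall is 12.5, while the threshold paired with the best gain is 5.0.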

answered Oct 17, 2021 at 21:35

antaz


2 Comments

Tadhg McDonald-Jensen Over a year ago

I'd significantly prefer a solution that uses filter, sort, or a list comprehension with a condition. Making a list and slicing seems like such a bad idea with something that is logically an unordered set.

antaz Over a year ago

@TadhgMcDonald-Jensen I agree. I think that even using ordered items like informationGain_i and threshold_i in a dict is already a bad idea, because elements in a dict are not supposed to be ordered.

Doma · Accepted Answer · 2021-10-17 21:58:30Z

-1

Let's make a list in which each member is a tuple or list containing two elements: first the information gain, and then the threshold. We can sort this list with either the list's .sort() method or the sorted() function. The last tuple of the sorted list will contain the values you seek. If you are also interested in the index of these values, add the index as a third element of each tuple.
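The description above can be sketched as follows. This assumes the dictionary is named `I`, as in the question, and that indices run from 0 without gaps; the variable names `rows`, `best_gain`, `best_threshold` and `best_index` are made up for this example.

```python
I = {
    "informationGain_0": 0.9949486404805016, "threshold_0": 5.0,
    "informationGain_1": 0.9757921620455572, "threshold_1": 12.5,
    "informationGain_2": 0.7272727272727273, "threshold_2": 11.5,
}

n = len(I) // 2  # two entries per index

# Build (gain, threshold, index) tuples by key lookup, not by position.
rows = [(I[f"informationGain_{i}"], I[f"threshold_{i}"], i) for i in range(n)]

# Tuples sort by their first element first, so the last row has the top gain.
rows.sort()
best_gain, best_threshold, best_index = rows[-1]
print(best_gain, best_threshold, best_index)
```

Because tuples compare element by element, equal gains are ordered by threshold and then by index. If only the maximum is needed, `max(rows)` gives the same row without sorting the whole list.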

edited Oct 17, 2021 at 21:58

answered Oct 17, 2021 at 21:48

Doma

