
While doing a coding exercise, I came across this problem:

"Write a function that takes in a list of dictionaries, each with a key and a list of integers, and returns a dictionary with the standard deviation of each list."

For example:

input = [
    {
        'key': 'list1',
        'values': [4, 5, 2, 3, 4, 5, 2, 3],
    },
    {
        'key': 'list2',
        'values': [1, 1, 34, 12, 40, 3, 9, 7],
    },
]

Answer: {'list1': 1.12, 'list2': 14.19}

Note that 'values' is actually the key that maps to the list of integers, which was a little deceptive at first!
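The expected numbers are population standard deviations: the squared deviations are divided by n rather than n - 1. A worked check for list1, using the usual definitions of the mean and population standard deviation:

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}

\text{list1: } \mu = \frac{28}{8} = 3.5, \quad
\sum_{i=1}^{8}(x_i - \mu)^2 = 4(0.25) + 4(2.25) = 10, \quad
\sigma = \sqrt{\frac{10}{8}} = \sqrt{1.25} \approx 1.118 \approx 1.12
```

Dividing by n - 1 instead would give about 15.17 for list2 rather than 14.19.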

My attempt:

def stdv(x):
    
    for i in range(len(x)):
        
        for k,v in x[i].items():
            result = {}
            print(result)
        
            if k == 'values':
                mean = sum(v)/len(v)                
                variance = sum([(j - mean)**2 for j in v]) / len(v)        
                
                stdv = variance**0.5
                
                return stdv   # not sure here!!
            
            result = {k, v} # this is where i get stuck

I was able to calculate the standard deviation, but I have no idea how to put the results into a dictionary as shown in the answer. Can anyone shed some light on it? Much appreciated!

4 Answers


I would try another implementation of the std calculation, since that one makes two passes over the data: one loop to get the mean and another to get the std. It is still O(n), but it can be done in a single loop, as noted here.

I'm not sure about it, but I think NumPy's implementation does that. So, you can write a function like this one:

from numpy import std

def stdv(list_of_dicts):
    # numpy.std uses ddof=0 by default, i.e. the population standard deviation
    return {d['key']: std(d['values']) for d in list_of_dicts}

Update: if you really need to implement the std calculation yourself, you can write a separate function for it:

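def std(arr):
    # Population standard deviation in one pass: sqrt(mean(x^2) - mean(x)^2)
    n = len(arr)
    if n == 0:
        return 0

    _sum, _sq_sum = 0, 0
    for v in arr:
        _sum += v
        _sq_sum += v ** 2
    _sq_mean = (_sum / n) ** 2  # square of the mean
    # Clamp at 0: rounding can make the difference slightly negative,
    # and a negative float ** 0.5 would produce a complex number.
    return max(_sq_sum / n - _sq_mean, 0) ** 0.5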
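One caveat with the single-pass formula (mean of the squares minus the square of the mean): when the values are large relative to their spread, subtracting two nearly equal numbers can lose precision. Welford's online algorithm is a different, commonly used single-pass method that avoids that subtraction. A minimal sketch, swapped in here rather than taken from the answer, assuming the same population definition (divide by n); the name std_welford is hypothetical:

```python
import math


def std_welford(arr):
    """Population standard deviation in one pass (Welford's algorithm)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in arr:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # second factor uses the updated mean
    if n == 0:
        return 0.0  # same convention as the answer for an empty list
    return math.sqrt(m2 / n)


print(round(std_welford([4, 5, 2, 3, 4, 5, 2, 3]), 2))        # 1.12
print(round(std_welford([1, 1, 34, 12, 40, 3, 9, 7]), 2))     # 14.19
```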

Since you said that this is a coding exercise, I will try to point out where I think your mistake is.

You loop over x[i].items() to get each key and value, and then check whether the key is 'values' before calculating the std. Since you want to store the result in a dictionary, you also need the value of the 'key' field at the same time, but with that loop you only get one of those at a time. In addition, result = {} is reset on every iteration, return stdv exits the function after the first list, and result = {k, v} creates a set, not a dictionary.
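Applying that to the original function: read both fields from each dictionary at once, create the result dictionary once before the loop, and return only after the loop finishes. A minimal sketch that keeps the original two-pass formula; the round(..., 2) call is an assumption, added only so the output matches the rounded 'Answer' from the exercise:

```python
def stdv(x):
    result = {}  # create once, outside the loop
    for d in x:
        values = d['values']
        mean = sum(values) / len(values)
        variance = sum((j - mean) ** 2 for j in values) / len(values)
        # Store under the dictionary's 'key' field, not under the literal 'values'
        result[d['key']] = round(variance ** 0.5, 2)
    return result  # return only after every dictionary has been processed


data = [
    {'key': 'list1', 'values': [4, 5, 2, 3, 4, 5, 2, 3]},
    {'key': 'list2', 'values': [1, 1, 34, 12, 40, 3, 9, 7]},
]
print(stdv(data))  # {'list1': 1.12, 'list2': 14.19}
```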

Also, not directly related, but if you want to loop over a list to get the values inside and you don't care about the index, it is better to do:

for x_i in x:
    for k,v in x_i.items():

Instead of:

for i in range(len(x)):
    for k,v in x[i].items():
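If the index is still needed inside the loop, the built-in enumerate yields the index and the item together, so neither range(len(x)) nor x[i] is needed. A small illustration; the variable name labels and the string format are hypothetical:

```python
x = [
    {'key': 'list1', 'values': [4, 5, 2, 3, 4, 5, 2, 3]},
    {'key': 'list2', 'values': [1, 1, 34, 12, 40, 3, 9, 7]},
]

labels = []
for i, x_i in enumerate(x):
    # i is the position in the list, x_i is the dictionary itself
    labels.append(f"{i}: {x_i['key']}")

print(labels)  # ['0: list1', '1: list2']
```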

I would recommend this video.


1 Comment

Yes, I agree with this. I could have accessed the items in the list directly instead of by index.

Try the following. Note that it does not add the values to the list of dictionaries. Instead, it returns a new dictionary (as shown in 'Answer:') where each key is the 'key' field of one input dictionary and each value is the standard deviation of its 'values' list:

def stdv(x):
    ret = {}
    for d in x:
        v = d['values']
        mean = sum(v) / len(v)
        variance = sum([(j - mean) ** 2 for j in v]) / len(v)
        ret[d['key']] = variance ** 0.5  # standard deviation for this key
    return ret

2 Comments

I came up with exactly the same solution later with some help. I think I had the wrong idea about how to add a (key, value) pair. This method is neat and eye-opening to me. Thank you!
Cool, thanks! Let me know if you need anything else.

Using statistics.pstdev and a dictionary comprehension:

from statistics import stdev, pstdev

# Don't shadow the built-in input()!
input_ = [
    {
        'key': 'list1',
        'values': [4, 5, 2, 3, 4, 5, 2, 3],
    },
    {
        'key': 'list2',
        'values': [1, 1, 34, 12, 40, 3, 9, 7],
    },
]

result = {di["key"]: pstdev(di["values"]) for di in input_}
print(result)

Output:

{'list1': 1.118033988749895, 'list2': 14.185710239533304}
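The choice of pstdev matters here: statistics.pstdev divides by n (population standard deviation), while statistics.stdev divides by n - 1 (sample standard deviation). Only the population version reproduces the expected answer. A quick comparison on the same input, rounded to two decimals for readability:

```python
from statistics import pstdev, stdev

input_ = [
    {'key': 'list1', 'values': [4, 5, 2, 3, 4, 5, 2, 3]},
    {'key': 'list2', 'values': [1, 1, 34, 12, 40, 3, 9, 7]},
]

# Population standard deviation: divides by n
population = {d['key']: round(pstdev(d['values']), 2) for d in input_}
# Sample standard deviation: divides by n - 1
sample = {d['key']: round(stdev(d['values']), 2) for d in input_}

print(population)  # {'list1': 1.12, 'list2': 14.19}
print(sample)      # {'list1': 1.2, 'list2': 15.17}
```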


You can add the result to each dictionary with the update method, like this:

x = [
    {
        'key': 'list1',
        'values': [4, 5, 2, 3, 4, 5, 2, 3],
    },
    {
        'key': 'list2',
        'values': [1, 1, 34, 12, 40, 3, 9, 7],
    },
]

arr = []
for i in range(len(x)):
    for k, v in x[i].items():
        if k == 'values':
            mean = sum(v) / len(v)
            variance = sum([(j - mean) ** 2 for j in v]) / len(v)
            stdv = variance ** 0.5
            arr.append(stdv)  # collect one standard deviation per dictionary

# Add each standard deviation back into its original dictionary
for i in range(len(arr)):
    x[i].update({"stdev": arr[i]})
print(x)
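The same in-place idea also fits in a single loop, which removes the separate arr list. All of these versions do work proportional to the total number of values; the single loop mainly reads more clearly. A minimal sketch using statistics.pstdev (population standard deviation, matching the expected answer); the 'stdev' key name is an assumption:

```python
from statistics import pstdev

x = [
    {'key': 'list1', 'values': [4, 5, 2, 3, 4, 5, 2, 3]},
    {'key': 'list2', 'values': [1, 1, 34, 12, 40, 3, 9, 7]},
]

for d in x:
    # Store the population standard deviation directly in each dictionary
    d['stdev'] = pstdev(d['values'])

print(x)
```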

1 Comment

Thank you. But wouldn't the four for loops make it a little inefficient?
