From basic math I know that the median salary across all jobs listed is 40000, but how would I obtain that using NumPy?

E.g. find the median salary of all jobs listed, where:

  • 1st column = salary

  • 2nd column = no. of jobs advertised

```
x = np.array([
    [10000, 329],
    [20000, 329],
    [30000, 323],
    [40000, 310],
    [50000, 284],
    [60000, 232],
    [70000, 189],
    [80000, 130],
    [90000, 87],
    [100000, 71],
])
```
  • You're looking for a weighted median, where the second column holds the weights. This is not built into NumPy, but you can write a function as demonstrated here and here. The second link is a more general solution for quantiles (the median is the 0.5 quantile). Commented Feb 23, 2022 at 14:30
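
For readers without the linked answers at hand, here is a minimal sketch of a weighted quantile. It is not the code from those links: `weighted_quantile` is a hypothetical helper, and it assumes the "inverted CDF" convention, i.e. it returns the smallest value whose cumulative share of the total weight reaches `q`.

```python
import numpy as np


def weighted_quantile(values, weights, q):
    """Hypothetical helper: smallest value whose cumulative weight share is >= q."""
    values = np.asarray(values)
    weights = np.asarray(weights, dtype=float)

    # Sort by value so the cumulative weights follow the value order.
    order = np.argsort(values)
    values, weights = values[order], weights[order]

    # Fraction of the total weight at or below each value.
    cum_share = np.cumsum(weights) / weights.sum()

    # Index of the first value whose cumulative share is >= q
    # (clamped so that rounding near q = 1 cannot run past the end).
    idx = np.searchsorted(cum_share, q, side="left")
    return values[min(idx, len(values) - 1)]


x = np.array([
    [10000, 329], [20000, 329], [30000, 323], [40000, 310], [50000, 284],
    [60000, 232], [70000, 189], [80000, 130], [90000, 87], [100000, 71],
])
print(weighted_quantile(x[:, 0], x[:, 1], 0.5))  # 40000
```

Under this convention the function never averages two neighbouring values, so it can differ from the textbook median when the midpoint falls exactly between two buckets.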

1 Answer

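You have a frequency table. Assuming the rows are sorted by value, the median is the first value in x[:, 0] whose cumulative frequency covers the midpoint of the total count. When the total is even and the two middle positions fall in different rows, it is the mean of those two values.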
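
To make the cumulative-frequency idea concrete, here is a small sketch on the data from the question. The names `lo_pos`, `hi_pos`, `lo_bucket` and `hi_bucket` are illustrative only, and `np.searchsorted` is just one way to locate the rows.

```python
import numpy as np

x = np.array([
    [10000, 329], [20000, 329], [30000, 323], [40000, 310], [50000, 284],
    [60000, 232], [70000, 189], [80000, 130], [90000, 87], [100000, 71],
])
values, freqs = x[:, 0], x[:, 1]

cf = np.cumsum(freqs)  # running total of jobs up to and including each salary
n = cf[-1]             # total number of jobs

# 0-based positions of the middle element(s) in the virtual sorted list of
# all jobs; the two positions coincide when n is odd.
lo_pos, hi_pos = (n - 1) // 2, n // 2

# side="right" returns the first row whose cumulative count exceeds the position.
lo_bucket = np.searchsorted(cf, lo_pos, side="right")
hi_bucket = np.searchsorted(cf, hi_pos, side="right")

print(n, lo_pos, hi_pos)                      # 2284 1141 1142
print(values[lo_bucket], values[hi_bucket])   # 40000 40000
```

The cumulative counts are 329, 658, 981, 1291, and so on. Positions 1141 and 1142 both lie beyond 981 and before 1291, so both middle jobs sit in the 40000 row.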

You can use:

import numpy as np


def median_freq_table(freq_table: np.ndarray) -> float:
    """
    Find median of an array represented as a frequency table [[ val, freq ]].

    Rows are assumed to be sorted by value in ascending order.
    """
    values = freq_table[:, 0]
    freqs = freq_table[:, 1]

    # cumulative frequencies
    cf = np.cumsum(freqs)
    # total number of elements
    n = cf[-1]

    # masks of the buckets at or after the two middle positions
    # (0-based n // 2 - 1 and n // 2); the first True in each mask
    # marks the bucket that contains that position
    l = (n // 2 - 1) < cf
    r = (n // 2) < cf

    # odd length, or both middle positions fall in the same bucket:
    # the median is that bucket's value
    if n % 2 == 1 or (l == r).all():
        return values[r][0]
    # otherwise the median is the mean of the two buckets holding the
    # middle positions (indexing each mask separately skips empty buckets)
    else:
        return np.mean([values[l][0], values[r][0]])

Your input:

>>> xs = np.array(
...     [
...         [10000, 329],
...         [20000, 329],
...         [30000, 323],
...         [40000, 310],
...         [50000, 284],
...         [60000, 232],
...         [70000, 189],
...         [80000, 130],
...         [90000, 87],
...         [100000, 71],
...     ]
... )
>>> median_freq_table(xs)
40000

Simple, even-length array:

>>> xs = np.array([[1, 3], [10, 3]])
>>> median_freq_table(xs)
5.5
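
As a sanity check, you could also expand the frequency table into the raw samples and let `np.median` do the work. This is a rough sketch: `median_by_expansion` is a hypothetical name, and it assumes non-negative integer frequencies. It allocates one element per job, so it only suits tables whose total count fits comfortably in memory.

```python
import numpy as np


def median_by_expansion(freq_table):
    """Hypothetical cross-check: repeat each value `freq` times, then take the median."""
    values, freqs = freq_table[:, 0], freq_table[:, 1]
    return np.median(np.repeat(values, freqs))


xs = np.array([
    [10000, 329], [20000, 329], [30000, 323], [40000, 310], [50000, 284],
    [60000, 232], [70000, 189], [80000, 130], [90000, 87], [100000, 71],
])
print(median_by_expansion(xs))                          # 40000.0
print(median_by_expansion(np.array([[1, 3], [10, 3]])))  # 5.5
```

Both results agree with the cumulative-frequency approach above, which avoids building the expanded array.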