
I'm working on a Python program that analyzes social media data from a CSV file. The program is supposed to calculate the correlation coefficient between two variables in the data, but the result is always 0 regardless of the input data. We aren't allowed to import any modules or call print.

I've included the relevant code snippet below:

def platform_with_highest_users(data):
    # If data is empty
    if not data:
        return "Error: Empty data"
    
    # Initialize an empty dictionary to store platform counts
    platform_counts = {}
    
    # Count users for each platform
    for row in data:
        platform = row[4]
        platform_counts[platform] = platform_counts.get(platform, 0) + 1
    
    # Find the platform with the highest number of users
    highest_users = max(platform_counts.values())
    highest_platforms = [platform for platform, count in platform_counts.items() if count == highest_users]
    
    # If there's only one platform with the highest number of users, directly return its filtered data
    if len(highest_platforms) == 1:
        chosen_platform = highest_platforms[0]
    else:
        # Sort the highest platforms alphabetically and pick the first one
        chosen_platform = sorted(highest_platforms)[0]
    
    # Filter the data for the chosen platform
    filtered_data = [row for row in data if row[4] == chosen_platform]
    
    # Return filtered data for the chosen platform
    return filtered_data

    
def correlation_coefficient(filtered_data):
    if not filtered_data:
        return 0  

    x = [int(row[1]) for row in filtered_data]
    y = [int(row[9]) for row in filtered_data]

    # Calculate the correlation value following the given formula
    avg_x = (sum(x) / len(x))
    avg_y = (sum(y) / len(y))

    numerator = sum((x[i] - avg_x) * (y[i] - avg_y) for i in range(len(filtered_data)))
    denominator_x = (sum((x[i] - avg_x)) ** 2 for i in range(len(filtered_data)))
    denominator_y = (sum((y[i] - avg_y)) ** 2 for i in range(len(filtered_data)))
    
    # Calculate the denominator correctly
    denominator = (denominator_x * denominator_y) ** 0.5
    
    # Avoid division by zero
    correlation = numerator / denominator if denominator != 0 else 0
    
    return round(correlation, 4)

I know the correlation value shouldn't be zero, because we were given this sample output for reference:

OP1, OP2, OP3, OP4 = main('SocialMedia.csv', [18,25], 'australia')
The returned output variables are:
OP1 [['11', 4708.0], ['126', 5785.0], ['184', 9266.0]]
OP2 ['australia', 'bangladesh', 'ireland', 'new zealand', 'pakistan', 'yemen']
OP3 [3.5556, 112446.1548, 'rural']
OP4 0.4756

Correlation value: A numeric value for the correlation between age and income for the user base of the social media platform that has the highest number of users. If multiple platforms share the same highest number of users, sort them alphabetically and compute the correlation for the first one.

  • Please provide an example of data or of filtered_data. Commented Apr 17, 2024 at 6:04
  • Your function will not return 0; it will raise a TypeError exception. Commented Apr 17, 2024 at 8:05

1 Answer


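In your code, denominator_x and denominator_y are generator expressions, not numbers, because the outer parentheses enclose the entire for clause instead of passing it to sum(). You can't multiply one generator by another, so the line that computes denominator raises a TypeError exception rather than returning 0.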
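To make the parenthesis problem concrete, here is a minimal sketch using made-up values (the list x below is hypothetical and not taken from your CSV). It shows that the expression as written yields a generator that cannot be multiplied, and what the sum of squared deviations presumably should look like:

```python
# Hypothetical sample values, not taken from the asker's CSV
x = [1, 2, 3, 4]
avg_x = sum(x) / len(x)

# As written in the question: the outer parentheses make the whole
# expression a generator, so nothing is summed or squared yet
as_written = (sum((x[i] - avg_x)) ** 2 for i in range(len(x)))

try:
    as_written * as_written
    multiply_failed = False
except TypeError:
    # generator * generator is not a supported operation
    multiply_failed = True

# Presumably intended: square each deviation, then sum the squares
sum_sq_x = sum((x[i] - avg_x) ** 2 for i in range(len(x)))
```

With that change (and the same change for y), the denominator becomes a product of two floats and the square root can be taken.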

A more robust implementation is:

def correlation_coefficient(x: list[float], y: list[float]) -> float:
    if (_n := len(x)) and _n == len(y):
        mx = sum(x) / _n
        my = sum(y) / _n
        n = da = db = 0.0
        for _x, _y in zip(x, y):
            dx = _x - mx
            dy = _y - my
            n += dx * dy
            da += dx * dx
            db += dy * dy
        try:
            return n / (da ** 0.5 * db ** 0.5)
        except ZeroDivisionError:
            pass
    return 0.0
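For completeness, here is a rough sketch of how this could be wired to your filtered rows. The column indices 1 and 9 come from your question; the helper names make_row and correlation_from_rows, and the sample values, are made up for illustration. It converts with float() rather than int() on the assumption that the CSV may contain values with a decimal point, and rounds to 4 places as your original function does. The function above is repeated only so the block runs on its own:

```python
def correlation_coefficient(x: list[float], y: list[float]) -> float:
    # Same function as above, repeated so this block is self-contained
    if (_n := len(x)) and _n == len(y):
        mx = sum(x) / _n
        my = sum(y) / _n
        n = da = db = 0.0
        for _x, _y in zip(x, y):
            dx = _x - mx
            dy = _y - my
            n += dx * dy
            da += dx * dx
            db += dy * dy
        try:
            return n / (da ** 0.5 * db ** 0.5)
        except ZeroDivisionError:
            pass
    return 0.0


def correlation_from_rows(rows, x_col=1, y_col=9):
    # Hypothetical wrapper: pull the two columns out of the CSV rows,
    # convert the strings to numbers, and round like the original code
    x = [float(row[x_col]) for row in rows]
    y = [float(row[y_col]) for row in rows]
    return round(correlation_coefficient(x, y), 4)


def make_row(age, income):
    # Hypothetical 10-column row; only columns 1 and 9 are filled in
    row = [""] * 10
    row[1] = age
    row[9] = income
    return row


# Made-up data with a perfectly linear relationship
linear_rows = [make_row("20", "1000"), make_row("30", "2000"), make_row("40", "3000")]
linear_result = correlation_from_rows(linear_rows)

# Made-up data where one variable is constant, so the denominator is zero
constant_rows = [make_row("20", "1000"), make_row("20", "2000")]
constant_result = correlation_from_rows(constant_rows)
```

The constant case falls through to the 0.0 default, which matches the "avoid division by zero" behaviour in your original function.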

Note:

scipy.stats.pearsonr would be a better choice if you were allowed to import additional modules.
