I'm working on a Python program that analyzes social media data from a CSV file. The program is supposed to calculate the correlation coefficient between two variables in the data, but the function always returns 0 regardless of the input. We aren't allowed to import any modules or call print.
I've included the relevant code snippet below:
def platform_with_highest_users(data):
    # If data is empty
    if not data:
        return "Error: Empty data"
    # Initialize an empty dictionary to store platform counts
    platform_counts = {}
    # Count users for each platform
    for row in data:
        platform = row[4]
        platform_counts[platform] = platform_counts.get(platform, 0) + 1
    # Find the platform with the highest number of users
    highest_users = max(platform_counts.values())
    highest_platforms = [platform for platform, count in platform_counts.items() if count == highest_users]
    # If there's only one platform with the highest number of users, directly return its filtered data
    if len(highest_platforms) == 1:
        chosen_platform = highest_platforms[0]
    else:
        # Sort the highest platforms alphabetically and pick the first one
        chosen_platform = sorted(highest_platforms)[0]
    # Filter the data for the chosen platform
    filtered_data = [row for row in data if row[4] == chosen_platform]
    # Return filtered data for the chosen platform
    return filtered_data

def correlation_coefficient(filtered_data):
    if not filtered_data:
        return 0
    x = [int(row[1]) for row in filtered_data]
    y = [int(row[9]) for row in filtered_data]
    # Calculate the correlation value following the given formula
    avg_x = (sum(x) / len(x))
    avg_y = (sum(y) / len(y))
    numerator = sum((x[i] - avg_x) * (y[i] - avg_y) for i in range(len(filtered_data)))
    denominator_x = (sum((x[i] - avg_x)) ** 2 for i in range(len(filtered_data)))
    denominator_y = (sum((y[i] - avg_y)) ** 2 for i in range(len(filtered_data)))
    # Calculate the denominator correctly
    denominator = (denominator_x * denominator_y) ** 0.5
    # Avoid division by zero
    correlation = numerator / denominator if denominator != 0 else 0
    return round(correlation, 4)
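
One likely culprit, offered as a hedged sketch rather than a confirmed fix: in the `denominator_x` and `denominator_y` lines, the `** 2` and the `for` clause sit outside `sum(...)`, so each line builds a generator expression instead of a sum of squared deviations. Multiplying two generator objects raises a `TypeError`, so if 0 really is what comes back, it may originate elsewhere (for example the empty-data branch, or error handling in the caller; that part is a guess). The version below moves the squaring inside `sum()`. Columns 1 and 9 are taken from the code above; switching from `int()` to `float()` is an assumption, made in case the CSV stores decimal values.

```python
def correlation_coefficient(filtered_data):
    # Nothing to correlate in an empty selection
    if not filtered_data:
        return 0
    # Column indices 1 and 9 come from the question; float() is an assumption
    x = [float(row[1]) for row in filtered_data]
    y = [float(row[9]) for row in filtered_data]
    avg_x = sum(x) / len(x)
    avg_y = sum(y) / len(y)
    # Sum of the products of paired deviations from the means
    numerator = sum((xi - avg_x) * (yi - avg_y) for xi, yi in zip(x, y))
    # Square each deviation first, then sum (the squaring belongs inside sum())
    sum_sq_x = sum((xi - avg_x) ** 2 for xi in x)
    sum_sq_y = sum((yi - avg_y) ** 2 for yi in y)
    denominator = (sum_sq_x * sum_sq_y) ** 0.5
    # A constant column has zero variance; return 0 instead of dividing by zero
    if denominator == 0:
        return 0
    return round(numerator / denominator, 4)
```

Since print is off limits, one way to check this is to call the function on a few hand-made rows where the answer is known (for instance, rows where column 9 is exactly twice column 1 should give 1.0) and compare the returned value.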
I know the correlation value shouldn't be zero, because we were given this sample output for reference:
OP1, OP2, OP3, OP4 = main('SocialMedia.csv', [18,25], 'australia')
The returned output variables are:
OP1 [['11', 4708.0], ['126', 5785.0], ['184', 9266.0]]
OP2 ['australia', 'bangladesh', 'ireland', 'new zealand', 'pakistan', 'yemen']
OP3 [3.5556, 112446.1548, 'rural']
OP4 0.4756
Correlation value: A numeric value for the correlation between age and income for the user base of the social media platform that has the highest number of users. If multiple social media platforms share the same highest number of users, sort them in alphabetical order and compute the correlation for the first one.
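
The tie-breaking rule in that description can be sketched on its own, without imports or print. The helper name `choose_platform`, its `platform_col` parameter, and the sample rows are hypothetical and only for illustration; column 4 as the platform column is taken from the code above.

```python
def choose_platform(data, platform_col=4):
    # No rows means no platform to choose
    if not data:
        return None
    # Count rows (users) per platform
    counts = {}
    for row in data:
        name = row[platform_col]
        counts[name] = counts.get(name, 0) + 1
    # Highest count wins; among equal counts the alphabetically first name wins
    return min(counts, key=lambda name: (-counts[name], name))


# Hypothetical rows: only column 4 (platform) matters for this sketch
sample_rows = [
    ["u1", "21", "", "", "youtube"],
    ["u2", "34", "", "", "facebook"],
    ["u3", "19", "", "", "youtube"],
    ["u4", "40", "", "", "facebook"],
    ["u5", "28", "", "", "tiktok"],
]
chosen = choose_platform(sample_rows)
```

Here `youtube` and `facebook` both have two users, so the key `(-count, name)` makes `min` pick `facebook`, which matches the alphabetical rule.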
data or an example of filtered_data.