
I am working on a final assignment for a Python data science course, and we are supposed to process sunspot data from 1700 to 2019. This involves processing basic data and developing visualizations for it using matplotlib. I asked the instructor about using Pandas, but we are only allowed to use the NumPy library for this project. We also have not learned about classes, so I assume those are also off limits.

Someone has asked and solved the entire problem at the following link. I looked at their solution for guidance (I don't need the whole thing solved, I just need to be pointed in the right direction), but it used Pandas. External Link to Assignment Posted on Chegg

The data comes in as a csv in the following form:

record,year,sunspots
1,1700,5
2,1701,11
3,1702,16
4,1703,23
...
316,2015,130
317,2016,133
318,2017,127.9
319,2018,144
320,2019,141

Based on the prompt, I believe the idea is to have the data read out as a complete table which looks something like:

        Min   Max   Total   Average  Stdev
18th C. 0     154   4544    45.44    35.79
19th C. 0     139   4218    42.18    33.35
20th C. 1     190   6256    61.56    46.28
21st C. 31    229   2451    122.55   53.05

Currently I have the data reading in correctly (I think) as follows:

# importing libraries
import numpy as np
import matplotlib.pyplot as mpl

# importing file and assigning relevant header information
sunspot_data = np.genfromtxt('project2_dataset.csv', delimiter=',', skip_header=False, dtype=str)
header = sunspot_data[0]
spot_data = sunspot_data[1:]

# indicating the data types and where they begin within the csv
record = spot_data[:, 0].astype(int)
year = spot_data[:, 1].astype(int)
num_spots = spot_data[:, 2].astype(float)

# creating the empty array and creating the arrays for the row and column headers
data_array = np.zeros((5, 6))
row_header = np.array(['','18th C.', '19th C.', '20th C.', '21st C.']).astype(str)
column_header = np.array(['','Minimum', 'Maximum', 'Total', 'Average', 'Standard Dev.']).astype(str)

The problem I am having is that I am running a 'for' loop to get the various values, but I cannot get them to store as an array to be able to populate a np.array. The code which I am currently using is as follows:

# defining the centuries within the data
cen18 = num_spots[0:100].astype(int)
cen19 = num_spots[100:200].astype(int)
cen20 = num_spots[200:300].astype(int)
cen21 = num_spots[300:].astype(int)

# creates a list of the centuries for processing
century_list = [cen18,cen19, cen20, cen21]

# for loop to get the descriptive statistics 
for lists in century_list:
    min_list = np.array(np.min(lists))
    max_list = np.array(np.max(lists))
    sum_list = np.array(np.sum(lists))
    mean_list = np.array(np.mean(lists))
    stdev_list = np.array(np.std(lists))

I am trying to get this to print correctly, but the following is the code I have written and what its output currently is.

in:

# attempt to insert the data within the array created above
data_array[1:,1] = min_list
data_array[1:,2] = max_list
data_array[1:,3] = sum_list
data_array[1:,4] = mean_list
data_array[1:,5] = stdev_list
print(data_array)

out:

[[   0.           0.           0.           0.           0.       0.        ]
 [   0.          33.         229.        2451.         122.55    53.05136662]
 [   0.          33.         229.        2451.         122.55    53.05136662]
 [   0.          33.         229.        2451.         122.55    53.05136662]
 [   0.          33.         229.        2451.         122.55    53.05136662]]

row 0 and col 0 should be headers as seen above, which is a whole different issue to solve...

So I guess my question is - how can I get that output to correctly go into a np.array, and when I move on to process the data on the decade-level, how can I do that efficiently without going through and creating a new variable for each decade?

  • The link you provide doesn't work (it points to StackOverflow). If I copy the text of the link itself, I am redirected to a site that blocks me and asks to prove that I am human. Generally speaking, please try to ask self-contained questions. Also, please provide a minimal reproducible example, including a small example input data and the corresponding expected result. Commented Dec 14, 2022 at 22:37
  • @PierreD, I am unsure why the link is not functioning. I will just remove it so as to not confuse anyone visiting this in the future. Additionally, I thought I had added enough of my code to make it reproducible, but I will try to add to it for clarification. From your solution below, I was able to get the code to function within my parameters. Thank you! Commented Dec 15, 2022 at 14:20

1 Answer


You could try this:

import io
import numpy as np

# your example data
a = np.genfromtxt(io.StringIO("""
1,1700,5
2,1701,11
3,1702,16
4,1703,23
316,2015,130
317,2016,133
318,2017,127.9
319,2018,144
320,2019,141"""), delimiter=',')[:, 1:].copy()

# a kind of groupby -- requires the centuries in the data to be contiguous
century, ix = np.unique(a[:, 0].astype(int) // 100, return_index=True)
out = np.c_[century + 1, [
    (v.min(), v.max(), v.sum(), v.mean(), v.std())
    for v in np.split(a[:,1], ix[1:])
]]

>>> out.round(3)
array([[ 18.   ,   5.   ,  23.   ,  55.   ,  13.75 ,   6.61 ],
       [ 21.   , 127.9  , 144.   , 675.9  , 135.18 ,   6.265]])

(That is read as: in the 18th century, min was 5, max was 23, total was 55, average was 13.75, stddev was 6.61).
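To see what the two key calls are doing, here is a minimal step-by-step sketch on the same sample values. The variable names `years`, `spots`, `keys` and `groups` are made up for illustration and are not part of the code above.

```python
import numpy as np

# Illustrative data: sorted years and one sunspot value per year.
years = np.array([1700, 1701, 1702, 1703, 2015, 2016, 2017, 2018, 2019])
spots = np.array([5, 11, 16, 23, 130, 133, 127.9, 144, 141])

# Integer division maps each year to a century key (1700 -> 17, 2015 -> 20).
keys = years // 100

# np.unique returns the distinct keys and, with return_index=True,
# the position at which each key first appears.
century, ix = np.unique(keys, return_index=True)
print(century)  # [17 20]
print(ix)       # [0 4]

# np.split cuts the values at those positions. For sorted data ix[0] is 0,
# so it is dropped to avoid an empty leading piece.
groups = np.split(spots, ix[1:])
for c, g in zip(century, groups):
    print(c + 1, g)
```

Each element of `groups` is an ordinary array, so `.min()`, `.max()`, `.sum()`, `.mean()` and `.std()` can be called on it directly, which is what the list comprehension above does.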

Important: the data needs to be ordered by year (to make sure each century group is contiguous). If it isn't, you need to sort it first.
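If the rows might not be in year order, one possible way to sort them first is with `np.argsort` on the year column. This is only a sketch on a small made-up array whose layout (year in column 0, value in column 1) mirrors `a` above.

```python
import numpy as np

# Illustrative unsorted input: column 0 is the year, column 1 the sunspot count.
a = np.array([
    [2016, 133.0],
    [1701, 11.0],
    [2015, 130.0],
    [1700, 5.0],
])

# argsort gives the row order that sorts the year column;
# kind="stable" keeps rows with equal years in their original order.
order = np.argsort(a[:, 0], kind="stable")
a_sorted = a[order]
print(a_sorted)
```

After this, `a_sorted` can be used in place of `a` for the grouping step.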

Source of inspiration & credits due to: This answer about groupby in numpy by Vincent J.
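For the decade-level part of the question, the same idea should carry over by dividing by 10 instead of 100. Below is a rough sketch that wraps the grouping in a helper. The function `group_stats` and its `width` parameter are hypothetical names introduced here, and the first column holds the starting year of each group rather than the century number.

```python
import numpy as np

def group_stats(years, values, width):
    """Return one row per group: (start year, min, max, sum, mean, std).

    `width` is the group size in years (100 for centuries, 10 for decades).
    Assumes `years` is sorted so that each group is contiguous.
    """
    keys = years.astype(int) // width
    uniq, ix = np.unique(keys, return_index=True)
    rows = [
        (k * width, v.min(), v.max(), v.sum(), v.mean(), v.std())
        for k, v in zip(uniq, np.split(values, ix[1:]))
    ]
    return np.array(rows)

# Illustrative data taken from the sample rows in the question.
years = np.array([1700, 1701, 1702, 1703, 2015, 2016, 2017, 2018, 2019])
spots = np.array([5, 11, 16, 23, 130, 133, 127.9, 144, 141])

by_century = group_stats(years, spots, 100)
by_decade = group_stats(years, spots, 10)
print(by_century.round(3))
print(by_decade.round(3))
```

This avoids creating one variable per decade, since the groups are derived from the year column itself.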


2 Comments

I was able to make this code work to output an array pretty much how I needed, so thank you! As a side note, since I am very new to programming the syntax is still very confusing to me. I doubt I would have been able to get this to run without your response. So thank you, again!
I should have put an explanation, sorry -- I'll try when I have a bit more time. Meanwhile, try to take individual bits and pieces to make sure you see how they work. Key tools used here: np.unique and np.split.
