I want to insert arrays that contain missing values from a pandas DataFrame into PostgreSQL. In pandas the missing values default to np.nan, but I don't want them stored as 'NaN' in my PostgreSQL database. I want the PostgreSQL arrays to hold empty elements, like this: '{123,24,,23}', so that they are not counted by aggregate calculations such as the mean or standard deviation across indices. I am not sure whether sparse arrays are possible in PostgreSQL at all. Only a few arrays in my dataset will have missing values; I am mainly testing this as an edge case.
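For reference, PostgreSQL array literals do not allow an empty slot such as '{123,24,,23}'; a missing element is written with the unquoted keyword NULL, as in '{123,24,NULL,23}'. A minimal sketch of building such a literal in Python, assuming the missing values arrive as np.nan, None, or ''. The helper names is_missing and to_pg_array_literal are hypothetical, not part of pandas or psycopg2:

```python
import math


def is_missing(value):
    """Return True for None, empty strings, and float NaN (np.nan is a float NaN)."""
    if value is None:
        return True
    if isinstance(value, str):
        return value == ''
    return isinstance(value, float) and math.isnan(value)


def to_pg_array_literal(values):
    """Build a PostgreSQL array literal, writing NULL for missing elements."""
    parts = ['NULL' if is_missing(v) else str(v) for v in values]
    return '{' + ','.join(parts) + '}'


print(to_pg_array_literal([123, 24, float('nan'), 23]))  # {123,24,NULL,23}
```

PostgreSQL aggregates such as avg and stddev skip NULL inputs, so NULL elements should drop out of the calculation once the array is expanded with unnest, whereas a stored NaN would propagate into the result.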
My table schema:
create_table = '''
CREATE TABLE {t} (
    patient_id VARCHAR(20) PRIMARY KEY,
    gene_expression double precision[]
);
'''
The relevant Python code is below (I don't know how to write the proper SQL for this). I convert each array to a string, because Python lists cannot be sparse:
df = df.fillna('')  # replace np.nan with empty strings
# Diagnosis codes that belong to each group
NCI = [1]
MCI = [2, 3]
AD = [4, 5]
other = [6]
insert_sql = '''
INSERT INTO {t} (patient_id, gene_expression)
VALUES (%s,%s);
'''
cur = psql_conn.cursor()
for index, row in df.iterrows():
    # Every column from the third onward is an expression value
    arr = row[2:].tolist()
    # Build the array literal; empty strings leave empty elements behind
    postgres_arr = '{' + ','.join(map(str, arr)) + '}'
    if row['DIAGNOSIS'].isdigit():
        if int(row['DIAGNOSIS']) in NCI:
            cur.execute(insert_sql.format(t='nci'), (row['PATIENT_ID'], postgres_arr))
        elif int(row['DIAGNOSIS']) in MCI:
            cur.execute(insert_sql.format(t='mci'), (row['PATIENT_ID'], postgres_arr))
        elif int(row['DIAGNOSIS']) in AD:
            cur.execute(insert_sql.format(t='ad'), (row['PATIENT_ID'], postgres_arr))
        elif int(row['DIAGNOSIS']) in other:
            cur.execute(insert_sql.format(t='other'), (row['PATIENT_ID'], postgres_arr))
    elif row['DIAGNOSIS'] == '':
        cur.execute(insert_sql.format(t='na'), (row['PATIENT_ID'], postgres_arr))
    else:
        print('ERROR: unknown diagnosis {d}.'.format(d=row['DIAGNOSIS']))
psql_conn.commit()
cur.close()
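The if/elif chain above only selects a table name from the diagnosis code, so it could be condensed into a lookup. A rough sketch, assuming the same code-to-table grouping as the NCI, MCI, AD and other lists; TABLE_BY_DIAGNOSIS and table_for are hypothetical names introduced for illustration:

```python
# Hypothetical lookup: diagnosis code -> table name (mirrors NCI/MCI/AD/other)
TABLE_BY_DIAGNOSIS = {1: 'nci', 2: 'mci', 3: 'mci', 4: 'ad', 5: 'ad', 6: 'other'}


def table_for(diagnosis):
    """Return the target table for a diagnosis string, or None if it is unknown."""
    text = str(diagnosis).strip()
    if text == '':
        return 'na'  # a missing diagnosis goes to the 'na' table
    if text.isdigit():
        # Unlisted numeric codes return None instead of being skipped silently
        return TABLE_BY_DIAGNOSIS.get(int(text))
    return None


for d in ['1', '3', '', '9', 'abc']:
    print(repr(d), '->', table_for(d))
```

Inside the loop, a single cur.execute(insert_sql.format(t=table), ...) call would then cover every group, with the error message printed whenever table_for returns None. Note that this differs slightly from the original chain, which prints nothing for a numeric code that is in none of the lists.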
My Error:
psycopg2.DataError: malformed array literal: "{2.0,2.4,}"
LINE 3: VALUES ('X100_120417','{2.0,2.4,}');
                              ^
DETAIL: Unexpected "}" character.
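The literal in the error can be reproduced without a database: after fillna(''), a missing value becomes an empty string, so joining the elements leaves nothing between the last comma and the closing brace, and PostgreSQL rejects that. The sketch below shows the reproduction and one possible alternative, which relies on psycopg2 adapting a Python list to an SQL array and None to NULL. The helper names joined_literal and to_driver_list are hypothetical, and the final execute call is left as a comment because it needs a live connection:

```python
import math


def joined_literal(values):
    """Mimic the approach above: str() each element and join with commas."""
    return '{' + ','.join(map(str, values)) + '}'


def to_driver_list(values):
    """Replace '' and NaN with None so the driver can send them as NULL."""
    cleaned = []
    for v in values:
        if v is None or (isinstance(v, str) and v == ''):
            cleaned.append(None)
        elif isinstance(v, float) and math.isnan(v):
            cleaned.append(None)
        else:
            cleaned.append(float(v))
    return cleaned


row_values = [2.0, 2.4, '']
print(joined_literal(row_values))  # {2.0,2.4,}
print(to_driver_list(row_values))  # [2.0, 2.4, None]

# With a live connection (not run here), the list could be passed directly:
# cur.execute(insert_sql.format(t='nci'), (patient_id, to_driver_list(row_values)))
```

Passing a list rather than a hand-built string also leaves quoting and escaping to the driver. NaN has to be converted explicitly, since psycopg2 would otherwise send a float NaN as 'NaN'.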