0

I have to insert few records into an OracleDB using cx_Oracle, however, I'm confronted with the below error for some records due to the presence of non-ASCII characters in my data (some symbols, etc.).

I have tried the following methods while reading input excel file,

data=pd.read_excel(os.path.join(DATA_PATH,fname))
data=pd.read_excel(os.path.join(DATA_PATH,fname),encoding='utf-8')
data=pd.read_excel(os.path.join(DATA_PATH,fname),encoding=sys.getfilesystemencoding())

Later, I have tried to clean the data to make it compatible with my database schema.

#Cleaning for numeric, string and Unicode
data._get_numeric_data().fillna(0,inplace=True)
for col in data.select_dtypes(include='object').columns:
   #data[col]=data[col].str.strip()
   data[col]=data[col].values.astype('unicode')
data.fillna('',inplace=True)
data.replace({r'\u':''},inplace=True)

#Generating query statement
insert_columns='"'.join(data.columns.tolist())
insert_columns='("'+'","'.join(data.columns.tolist())+'")'
bind_variables='(:'+ ",:".join([str(x) for x in list(range(1,len(data.columns)+1))])+')'
insert_query="INSERT INTO " + table_map[source]+" "+ insert_columns + ' VALUES '+ bind_variables 

db_data=list(data.iloc[:,:].itertuples(index=False, name=None))

#Converting to native python dtypes
for row in db_data:
    for item in row:
        if type(item)==np.int64:
            item=int(np.int64(item).item())
        if type(item)==np.float64:
            item=float(np.float64(item).item())     

#Inserting data #1
for i in range(len(db_data)+1):
    print(i)
    cursor.executemany(insert_query, db_data[i:i+1])

#Inserting data #2
cursor.executemany(insert_query, db_data[:])

However, in the end, I'm faced with the same Unicode error for some records. In the python object, I have these as non-ASCII, but I'm unable to filter them out.

In [152]: db_data[10:11][-1][-1]
Out[152]: 'PO3005544, PO3005729\xa0'

In [153]: print(db_data[10:11][-1][-1])
PO3005544, PO3005729 

In [154]: data.iloc[10:11,-1]
Out[154]: 
10    PO3005544, PO3005729 
Name: AMENDMENT_DESCRIPTION, dtype: object

In [155]: data.iloc[10:11,-1].values
Out[155]: array(['PO3005544, PO3005729\xa0'], dtype=object)

In [157]: data.iloc[10:11,-1].values[-1][-1]
Out[157]: '\xa0'

In [158]: len(data.iloc[10:11,-1].values[-1][-1])
Out[158]: 1

Traceback (most recent call last):

File "", line 3, in cursor.executemany(insert_query, db_data[i:i+1])

UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 20: ordinal not in range(128)

2
  • stackoverflow.com/questions/33263669/… ? Commented Jan 29, 2020 at 6:17
  • 1
    "I have tried to clean the data to make it compatible with my database schema." - Do you have a good reason for choosing/configuring a database that doesn't support Unicode strings? Commented Mar 28, 2020 at 9:15

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.