dask read_sql error when querying from MYSQL

Question

I am using Python 2.7 with dask and trying to read a database table from a remote machine into a dask dataframe.

The table has a multi-column index, and I try to read it with the following script:

ddf = dd.read_sql_table("table name", "mysql://user:pass@ip:port/Dbname", "specific column name").head()

I get the following error:

start = asanyarray(start) * 1.0
TypeError: ufunc 'multiply' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')

I got the SQLAlchemy URI as explained here.

I'm not sure what the problem is. When I use a different column as the index and only call head() on the ddf, I don't get an error, but when I try to compute the whole ddf I get the same error. I assume the issue is that the column's values are not unique. I don't have a single-column index, only a multi-column one. How can I read the entire table here?

Thanks.

Full traceback:

> Traceback (most recent call last):
>   File "path", line 28, in <module>
>     ddf = dd.read_sql_table("tablename", "mysql://user:pass@ip:port/dbname","indexcolumn")
>   File "file", line 123, in read_sql_table
>     divisions = np.linspace(mini, maxi, npartitions + 1).tolist()
>   File "/home/user/.local/lib/python2.7/site-packages/numpy/core/function_base.py", line 108, in linspace
>     start = asanyarray(start) * 1.0
> TypeError: ufunc 'multiply' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')

Please show a more detailed traceback and perhaps run debug to find the value of start when the error happens.
– mdurant, Commented Nov 30, 2017 at 13:47

mdurant · Accepted Answer · 2017-11-30 23:28:56Z


For the case where you provide no further information or only specify number of partitions, the partitioning logic in read_sql_table only works for numbers, because we need a way to make ordered divisions between the minimum and maximum values.
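To illustrate why the automatic partitioning needs numbers, here is a minimal plain-Python sketch of evenly spaced divisions between a minimum and a maximum. The function name even_divisions is hypothetical; it stands in for the np.linspace(mini, maxi, npartitions + 1) call shown in the traceback and is not dask's actual code.

```python
def even_divisions(mini, maxi, npartitions):
    """Return npartitions + 1 evenly spaced boundaries between mini and maxi.

    Hypothetical stand-in for the np.linspace step in the traceback.
    """
    # Arithmetic on the bounds is what makes this numeric-only.
    step = (maxi - mini) / float(npartitions)
    return [mini + i * step for i in range(npartitions + 1)]


# Numeric min/max: the boundaries can be computed.
print(even_divisions(0, 100, 4))

# String min/max: subtraction is undefined, so this fails with a TypeError,
# much like the ufunc 'multiply' error in the question.
try:
    even_divisions("aardvark", "zebra", 4)
except TypeError as exc:
    print("cannot compute divisions for strings:", exc)
```

With string bounds there is no arithmetic midpoint to compute, so the boundaries have to come from somewhere else.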

Apparently, though, the query (to get the max/min) is returning a string in this case. read_sql_table can still work, but you will need to define the divisions to split on yourself, and supply them with the divisions keyword, e.g.,

ddf = dd.read_sql_table("table name", "mysql://user:pass@ip:port/Dbname", 
    'index_col', divisions=['aardvark', 'llama', 'tapir', 'zebra']).head()

Alternatively, the string in question certainly looks like a number, so you might need to update the schema of the table to make sure it is interpreted as a number.
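If you supply the divisions yourself, one way to choose them is to sample the index values and pick boundaries at evenly spaced positions in sorted order. The sketch below is only an illustration: choose_divisions is a hypothetical helper, not part of dask, and it assumes you have already fetched the distinct index values (or a sample of them) into a Python list.

```python
def choose_divisions(values, npartitions):
    """Pick up to npartitions + 1 ordered boundaries from index values.

    Hypothetical helper, not part of dask.
    """
    ordered = sorted(set(values))
    if npartitions < 1 or len(ordered) < 2:
        raise ValueError("need at least one partition and two distinct values")
    last = len(ordered) - 1
    # Evenly spaced positions in the sorted list, always including min and max.
    picks = [ordered[(i * last) // npartitions] for i in range(npartitions + 1)]
    # Drop repeats that appear when there are few distinct values.
    return sorted(set(picks))


# Assumed sample of string index values, for illustration only.
keys = ["zebra", "cat", "aardvark", "tapir", "llama", "snake", "llama"]
divisions = choose_divisions(keys, 3)
print(divisions)

# The result could then be passed along the lines of:
# dd.read_sql_table("table name", uri, "index_col", divisions=divisions)
```

Note that Python's sort order is an assumption here: it may not match your database's collation (case sensitivity, for example), so check that the boundaries are ordered the way the database compares them.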

edited Nov 30, 2017 at 23:28

answered Nov 30, 2017 at 22:39

mdurant


4 Comments

thebeancounter Over a year ago

Could you provide a full example of the first solution?

thebeancounter Over a year ago

Could you provide a working example of the first solution?

thebeancounter Over a year ago

Thanks! So, just so I understand it fully: the divisions are in terms of the index column, and the index column only? And how will it use them, given that the strings need to be sorted to fit within those divisions? Does it use lexicographic order?

mdurant Over a year ago

Yes, the divisions are the boundaries of each partition for the index column; for example, the second partition would be WHERE index_col > "llama" AND index_col <= "tapir". If you provide divisions, it is up to you to order them and to understand how your DB will interpret them.
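The boundary idea in this comment can be sketched with the standard-library sqlite3 module, so it runs without a MySQL server. The table, the rows, and the partition_queries helper are all made up for illustration. Which side of each boundary is inclusive is an assumption that follows the comment above, with the first partition also including its lower bound; the queries dask actually generates may differ.

```python
import sqlite3


def partition_queries(table, index_col, divisions):
    """Build one (sql, params) pair per partition from ordered divisions.

    Hypothetical helper; it does not reproduce dask's generated SQL.
    """
    queries = []
    pairs = zip(divisions[:-1], divisions[1:])
    for i, (lower, upper) in enumerate(pairs):
        # Assumption: the first partition includes its lower bound,
        # so the minimum value is not dropped.
        lower_op = ">=" if i == 0 else ">"
        sql = "SELECT * FROM {t} WHERE {c} {op} ? AND {c} <= ?".format(
            t=table, c=index_col, op=lower_op)
        queries.append((sql, (lower, upper)))
    return queries


# In-memory table with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (index_col TEXT, legs INTEGER)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?)",
    [("aardvark", 4), ("cat", 4), ("llama", 4),
     ("snake", 0), ("tapir", 4), ("zebra", 4)])

divisions = ["aardvark", "llama", "tapir", "zebra"]
partitions = []
for sql, params in partition_queries("animals", "index_col", divisions):
    rows = conn.execute(sql + " ORDER BY index_col", params).fetchall()
    partitions.append([row[0] for row in rows])

print(partitions)
```

Each row lands in exactly one partition only if the divisions are ordered the same way the database compares the column, which is why the ordering is left to you.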
