0

I have a "big" csv file (around 1GB of data, 3M lines) to import into a MariaDB table.

The thing is, almost every field of every line is considered as a string. So, I have to convert "1 337" (string) into 1337 (integer).

Here is the script used to import into the table :

LOAD DATA LOW_PRIORITY LOCAL
    INFILE 'data.txt'
    INTO TABLE `test`.`test_import`
    CHARACTER SET utf8
    FIELDS TERMINATED BY ';'
    OPTIONALLY ENCLOSED BY '"'
    ESCAPED BY '"'
    LINES TERMINATED BY '\r\n'
    (`id`,
        `data`,
        @NumberOne,
        @NumberTwo,
        @NumberThree,
        @NumberFour)
        SET `Number One` = REPLACE(@NumberOne, ' ', ''),
            `Number Two` = REPLACE(@NumberOne, ' ', ''),
            `Number Three` = REPLACE(@NumberOne, ' ', ''),
            `Number Four` = REPLACE(@NumberOne, ' ', '');

Using this script, there is no problem importing numbers below 999. But beginning with 1000 (written "1 000" in my csv), all I have is a warning (Truncated incorrect INTEGER value: '1 000') and the value 1 in my database.

The "funny" thing is, when I try with this :

SET `Number One` = REPLACE(@NumberOne, '1', 'k'),
                `Number Two` = REPLACE(@NumberOne, '1', 'k'),
                `Number Three` = REPLACE(@NumberOne, '1', 'k'),
                `Number Four` = REPLACE(@NumberOne, '1', 'k')

-> REPLACE() works, "1 000" become "k 000".

So, how can I use REPLACE() to remove spaces in numbers ? Or, how to make CAST()/CONVERT() working correctly on strings like "1 337" ?


Some more informations.

Here is a fresh test table :

CREATE OR REPLACE TABLE test_spaces_extr (
    `Identifier`   tinytext,
    `First name`   tinytext,
    `Last name`    tinytext,
    `Number One`   int unsigned,
    `Number Two`   int unsigned,
    `Number Three` int unsigned,
    `Number Four`  int unsigned,
    `Number Five`  int unsigned,
    `Number Six`   int unsigned,
    `Number Seven` int unsigned
);

Here is the script to import CSV :

LOAD DATA LOW_PRIORITY LOCAL
    INFILE 'some_data.txt'
    INTO TABLE `test`.`test_spaces_extr`
    CHARACTER SET utf8
    FIELDS TERMINATED BY ';'
    OPTIONALLY ENCLOSED BY '"'
    ESCAPED BY '"'
    LINES TERMINATED BY '\r\n'
    (`Identifier`,
        `First name`,
        `Last name`,
        @NumberOne,
        @NumberTwo,
        @NumberThree,
        @NumberFour,
        @NumberFive,
        @NumberSix,
        @NumberSeven)
        SET `Number One` = REPLACE(@NumberOne, ' ', ''),
            `Number Two` = REPLACE(@NumberTwo, ' ', ''),
            `Number Three` = REPLACE(@NumberThree, ' ', ''),
            `Number Four` = REPLACE(@NumberFour, ' ', ''),
            `Number Five` = REPLACE(@NumberFive, ' ', ''),
            `Number Six` = REPLACE(@NumberSix, ' ', ''),
            `Number Seven` = REPLACE(@NumberSeven, ' ', '');

Here is the full content of some_data.txt :

"3efa639b3a";"Censored";"Censored";"7 896";"3 468";"3 854";"5 000";"1 234";"9 654";"1 337"

(One line, yes.)

Here is the result :

"Identifier"    "First name"    "Last name" "Number One"    "Number Two"    "Number Three"  "Number Four"   "Number Five"   "Number Six"    "Number Seven"
"3efa639b3a"    "Censored"  "Censored"  "7896"  "3468"  "3854"  "5000"  "1234"  "9654"  "0"

In fact, the "Number" fields became integers here. All of them, but not the last one ("Number Seven" -> "0").

It's becoming weirder ...

7
  • did you try to set data type as unsigned int ? Commented Jun 13, 2017 at 12:46
  • @PrabhatG Not really. At the moment, these fields are "integer". But even using CAST(REPLACE(@var, ' ', '') as unsigned), it won't work.It's like if REPLACE() doesn't want to remove spaces. Commented Jun 13, 2017 at 12:52
  • @lksen : Just a thought...can you open it in excel and use =SUBSTITUTE(A1," ","") to remove spaces from numeric and then load in MySQL ? Commented Jun 13, 2017 at 12:54
  • @PrabhatG Unfortunately, that's not possible, the file is just too big for that Commented Jun 13, 2017 at 13:00
  • Let me get this straight, do you see 1 000 or 1 in your database? Commented Jun 13, 2017 at 13:24

1 Answer 1

1

I can't reproduce the problem:

$ mysql -u user -p --column-type-info
MariaDB [(none)]> SELECT VERSION();
Field   1:  `VERSION()`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  utf8_general_ci (33)
Length:     72
Max_length: 24
Decimals:   31
Flags:      NOT_NULL 


+-----------------+
| VERSION()       |
+-----------------+
| 10.0.31-MariaDB |
+-----------------+
1 row in set (0.00 sec)

MariaDB [(none)]> SELECT CAST(REPLACE('1 337', ' ', '') AS UNSIGNED);
Field   1:  `CAST(REPLACE('1 337', ' ', '') AS UNSIGNED)`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       LONGLONG
Collation:  binary (63)
Length:     5
Max_length: 4
Decimals:   0
Flags:      NOT_NULL UNSIGNED BINARY NUM 


+---------------------------------------------+
| CAST(REPLACE('1 337', ' ', '') AS UNSIGNED) |
+---------------------------------------------+
|                                        1337 |
+---------------------------------------------+
1 row in set (0.00 sec)

UPDATE

File: /path/to/data.csv

"3efa639b3a";"Censored";"Censored";"7 896";"3 468";"3 854";"5 000";"1 234";"9 654";"1 337"
MariaDB [_]> SELECT VERSION();
+-----------------+
| VERSION()       |
+-----------------+
| 10.0.31-MariaDB |
+-----------------+
1 row in set (0.00 sec)

MariaDB [_]> DROP TABLE IF EXISTS `test_spaces_extr`;
Query OK, 0 rows affected (0.07 sec)

MariaDB [_]> CREATE OR REPLACE TABLE `test_spaces_extr` (
    ->     `Identifier`   tinytext,
    ->     `First name`   tinytext,
    ->     `Last name`    tinytext,
    ->     `Number One`   int unsigned,
    ->     `Number Two`   int unsigned,
    ->     `Number Three` int unsigned,
    ->     `Number Four`  int unsigned,
    ->     `Number Five`  int unsigned,
    ->     `Number Six`   int unsigned,
    ->     `Number Seven` int unsigned
    -> );
Query OK, 0 rows affected (0.00 sec)

MariaDB [_]> LOAD DATA LOW_PRIORITY LOCAL INFILE '/path/to/data.csv'
    ->   INTO TABLE `test_spaces_extr`
    ->   CHARACTER SET utf8
    ->   FIELDS TERMINATED BY ';'
    ->   OPTIONALLY ENCLOSED BY '"'
    ->   ESCAPED BY '"'
    ->   LINES TERMINATED BY '\r\n'
    ->   (
    ->     `Identifier`,
    ->     `First name`,
    ->     `Last name`,
    ->     @`NumberOne`,
    ->     @`NumberTwo`,
    ->     @`NumberThree`,
    ->     @`NumberFour`,
    ->     @`NumberFive`,
    ->     @`NumberSix`,
    ->     @`NumberSeven`
    ->   )
    ->   SET
    ->   `Number One` = REPLACE(@`NumberOne`, ' ', ''),
    ->   `Number Two` = REPLACE(@`NumberTwo`, ' ', ''),
    ->   `Number Three` = REPLACE(@`NumberThree`, ' ', ''),
    ->   `Number Four` = REPLACE(@`NumberFour`, ' ', ''),
    ->   `Number Five` = REPLACE(@`NumberFive`, ' ', ''),
    ->   `Number Six` = REPLACE(@`NumberSix`, ' ', ''),
    ->   `Number Seven` = REPLACE(@`NumberSeven`, ' ', '');
Query OK, 1 row affected (0.00 sec)                  
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

MariaDB [_]> SELECT
    ->   `Identifier`,
    ->   `First name`,
    ->   `Last name`,
    ->   `Number One`,
    ->   `Number Two`,
    ->   `Number Three`,
    ->   `Number Four`,
    ->   `Number Five`,
    ->   `Number Six`,
    ->   `Number Seven`
    -> FROM
    ->   `test_spaces_extr`;
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
| Identifier | First name | Last name | Number One | Number Two | Number Three | Number Four | Number Five | Number Six | Number Seven |
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
| 3efa639b3a | Censored   | Censored  |       7896 |       3468 |         3854 |        5000 |        1234 |       9654 |         1337 |
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
1 row in set (0.00 sec)
Sign up to request clarification or add additional context in comments.

1 Comment

Yup. REPLACE(str, ' ', '') (to remove spaces) works in this context, I tested it too. But not when used in the context of importing a CSV.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.