First I created a database with the utf8mb4_general_ci collation and created a table with the same collation. Then I imported a CSV file with:

load data local infile '/mnt/c/Users/justi/Desktop/enml/enml.csv' 
into table dict 
CHARACTER SET utf8mb4
fields terminated by '\t' 
IGNORE 1 ROWS;

Sample data


+--------+----------------+----------------+---------------------------------+
| # id   | english_word   | part_of_speech | malayalam_definition            |
+--------+----------------+----------------+---------------------------------+
| 174569 | .net           | n              | പുത്തന്‍ കമ്പ്യൂട്ടര്‍ സാങ്കേതികത ഭാഷ      |
+--------+----------------+----------------+---------------------------------+
| 116102 | A bad patch    | n              | കുഴപ്പം പിടിച്ച സമയം               |
+--------+----------------+----------------+---------------------------------+
| 219752 | a bag of bones | phr            | വളരെയതികം മെലിഞ്ഞ വ്യക്തി അഥവാ മൃഗം |
+--------+----------------+----------------+---------------------------------+

I checked with
SELECT malayalam_definition from dict;
and then var_dump($row); gives:

array(1) { ["malayalam_definition"]=> string(19) "ശരശയ്യ " }
array(1) { ["malayalam_definition"]=> string(22) "പൂമെത്ത " }
array(1) { ["malayalam_definition"]=> string(41) "സുഖകരമായ അവസ്ഥ " }
array(1) { ["malayalam_definition"]=> string(44) "അസുഖകരമായ അവസ്ഥ " }
array(1) { ["malayalam_definition"]=> string(22) "പൂമെത്ത " }
array(1) { ["malayalam_definition"]=> string(123) "സുഖകരമെങ്കിലും സ്വാതന്ത്യ്രമില്ലാത്ത അവസ്ഥ " }
...

There is an unknown character after each definition, as in "ശരശയ്യ ". I tried SELECT TRIM(malayalam_definition) FROM dict, but it gives the same result. How can I find out what that trailing character is?

  • Good question, I'd like to know the answer as well. Commented Feb 11, 2019 at 8:25
  • That's probably some trash from the csv file. Commented Feb 11, 2019 at 8:30
  • Can you try the solution from stackoverflow.com/questions/1504962/… and see if it helps. Commented Feb 11, 2019 at 8:43
  • @NigelRen Tried, gives same result. Commented Feb 11, 2019 at 8:55

1 Answer

Converting the string to hex is one way:

SELECT HEX(malayalam_definition), CONCAT("{", malayalam_definition, "}")
FROM dict
WHERE id = 116102;
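
A narrower variant of the same idea, as a sketch only: instead of reading the whole hex dump, look at just the last character of the column. HEX() and RIGHT() are standard MySQL string functions; the table and column names are the ones from the question, while the aliases last_char_hex and n_rows are made up here for illustration.

```sql
-- Show only the final character of one value as hex.
-- 20 would be an ordinary space; any other value is some other character.
SELECT id,
       HEX(RIGHT(malayalam_definition, 1)) AS last_char_hex
FROM dict
WHERE id = 116102;

-- Summarise the final character across the whole table, to see whether
-- every row ends the same way or only some of them do.
SELECT HEX(RIGHT(malayalam_definition, 1)) AS last_char_hex,
       COUNT(*) AS n_rows
FROM dict
GROUP BY last_char_hex
ORDER BY n_rows DESC;
```

RIGHT() counts characters rather than bytes, so a row that ends in a normal Malayalam character shows a three-byte value such as E0B482 instead of a single byte.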

4 Comments

In PHP I get: syntax error, unexpected '") FROM dict WHERE id=116102"' (T_CONSTANT_ENCAPSED_STRING). Running the query directly gives E0B495E0B581E0B4B4E0B4AAE0B58DE0B4AAE0B48220E0B4AAE0B4BFE0B49FE0B4BFE0B49AE0B58DE0B49A20E0B4B8E0B4AEE0B4AFE0B4820D and {കുഴപ്പം പിടിച്ച സമയം }
Yes, the quotes need to be adjusted for your PHP quoting context.
Do you see any problem in the hex value? When I convert it back, I get കുഴപ്പം പിടിച്ച സമയം . If you look closely, there seems to be a space at the end?
Found it: 0D. This value is the unknown character. What is this value? Is it a space, or \r?
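
On the last comment: 0D is hexadecimal for ASCII 13, the carriage return, which is \r. Files saved with Windows line endings terminate each line with \r\n, while LOAD DATA splits lines on \n by default, so the \r stays attached to the last field of every row. TRIM() without arguments strips only spaces, which would explain why it changed nothing. Below is a minimal sketch of two possible fixes. It assumes the CSV really uses \r\n line endings (not verified against the file) and that backslash escapes are enabled in the server's SQL mode.

```sql
-- Option 1: clean the rows that are already loaded.
-- TRIM(TRAILING ... FROM ...) strips the given string from the end of the value.
UPDATE dict
SET malayalam_definition = TRIM(TRAILING '\r' FROM malayalam_definition);

-- Option 2: reload the file and declare that lines end in \r\n, so the
-- carriage return is consumed as part of the line terminator.
-- Run this against an empty table, otherwise every row is loaded twice.
LOAD DATA LOCAL INFILE '/mnt/c/Users/justi/Desktop/enml/enml.csv'
INTO TABLE dict
CHARACTER SET utf8mb4
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS;
```

After either option, HEX(RIGHT(malayalam_definition, 1)) should no longer return 0D for the affected rows.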
