2

I have written a crawler that scrapes data from a website served with charset utf8. But when I store the contents in MySQL, some special characters (such as Spanish letters) do not show correctly.

Here is what I have done:

  1. Put header("Content-Type: text/html; charset=utf-8") in PHP
  2. Set all charsets in MySQL to utf8 with the utf8_unicode_ci collation
  3. Run $conn->query("SET NAMES 'utf8'") upon connection
  4. Double-checked that the HTML I parsed was encoded in UTF-8

So what are some potential problems here?

3 Answers

1

Maybe your crawler uses string functions that are not multi-byte aware.
For example, strlen instead of mb_strlen.
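To illustrate the difference, here is a minimal sketch (the variable names are only for illustration, and it assumes the mbstring extension is enabled). The "ñ" is written as explicit UTF-8 bytes so the result does not depend on how the file itself is saved:

```php
<?php
// "mañana" with "ñ" spelled out as its two UTF-8 bytes (0xC3 0xB1).
$word = "ma\xC3\xB1ana";

$byteLength = strlen($word);             // counts bytes
$charLength = mb_strlen($word, 'UTF-8'); // counts characters

// substr() works on bytes, so it can cut a multi-byte character in half.
$byteCut = substr($word, 0, 3);             // "ma" plus only the first byte of "ñ"
$charCut = mb_substr($word, 0, 3, 'UTF-8'); // "mañ"

var_dump($byteLength);                          // int(7)
var_dump($charLength);                          // int(6)
var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
```

A string that was cut mid-character like $byteCut is no longer valid UTF-8, which is one way broken characters can end up in the database.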

Try putting:

mb_internal_encoding("UTF-8");

as the first line of your PHP code, and then check whether you need to replace some functions with their respective mb_ versions. Have a look at the multibyte string reference.

As a last resort, you can try converting the string with iconv just before inserting it into MySQL.
Something like:

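// iconv_get_encoding() only reports iconv's configured encodings, not the encoding of a string,
// so the source encoding is detected with mb_detect_encoding() (the candidate list is an assumption).
$utf8_string = iconv(mb_detect_encoding($string, ['UTF-8', 'ISO-8859-1'], true), "UTF-8", $string);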

should do the trick.


1 Comment

@DanielZuo happy to give something back to the net :)
1

Start by checking if the data is stored wrong in the database, in which case the problem is with your crawler. Otherwise the problem is in your presentation.

To test this, I would suggest that you use a dedicated MySQL client (such as the command-line client) to inspect the data.
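When inspecting the data, it can help to look at raw bytes rather than rendered characters, since the terminal or browser may itself misrender them. In MySQL, SELECT HEX(column) shows the stored bytes; the sketch below does the equivalent in PHP. The describeBytes helper is hypothetical, and the three sample values are assumptions about what a correct, a Latin-1, and a double-encoded "ñ" look like:

```php
<?php
// Hypothetical helper: shows a value's bytes as hex and whether they form valid UTF-8.
function describeBytes(string $value): string
{
    $validity = mb_check_encoding($value, 'UTF-8') ? 'valid UTF-8' : 'not valid UTF-8';
    return bin2hex($value) . ' (' . $validity . ')';
}

$correct = "\xC3\xB1"; // "ñ" encoded as UTF-8
$latin1  = "\xF1";     // "ñ" encoded as ISO-8859-1
// UTF-8 bytes misread as ISO-8859-1 and converted to UTF-8 again ("double encoding").
$doubled = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');

echo describeBytes($correct), "\n"; // c3b1 (valid UTF-8)
echo describeBytes($latin1), "\n";  // f1 (not valid UTF-8)
echo describeBytes($doubled), "\n"; // c383c2b1 (valid UTF-8)
```

If the database holds c3b1 for "ñ", the data is stored correctly and the problem is in the presentation. Bytes like f1 or c383c2b1 suggest the conversion went wrong somewhere before or during the insert.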

2 Comments

Hi troelskn, my crawler used cURL to extract data and parse it with PHP DOM. The data storage is also very straightforward.
@DanielZuo The advice troelskn has given you is very good. It doesn't really matter how simple your solution is; what matters is checking whether the data is stored correctly, so you can determine where the issue is.
0

I remember pulling my hair out in dealing with UTF8 issues until I started adding this to my header:

setlocale(LC_ALL, 'en_US.UTF-8');
