Revisions to iconv fails to detect valid utf-8 character as utf-8

deleted 4 characters in body

edited Jan 6 at 17:21

Your comment:

maybe the problem is with converting to ISO-8859-15 rather than converting from UTF-8

is on the right track. The problem is there is no ’ in ISO-8859-15. The most similar character is '. See what man 1 iconv states in Debian 12 I'm using:

If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output.

Use -t ISO-8859-15//TRANSLIT then.

As a proof of concept, this works for me (in pl_PL.UTF-8 locale):

printf '%s\n' 'ian’s eyes abr' | iconv -f UTF-8 -t ISO-8859-15//TRANSLIT

The output is ian's eyes abr (with a newline at the end). It so happens the representation of this exact string is identical in ISO-8859-15 and in UTF-8, so I chose not to obfuscate the command by additionally piping to iconv -f ISO-8859-15 -t UTF-8.
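The claim about identical representations can be checked at the byte level. What follows is only a minimal sketch, assuming a POSIX shell with od and tr, and an iconv that supports //TRANSLIT (glibc's does; other implementations may not, or may pick a different substitute). The variable names are made up for illustration.

```shell
# U+2019 (the curly apostrophe) written as UTF-8 octal escapes, so the
# script does not depend on the encoding of the file it is saved in.
curly=$(printf '\342\200\231')

# Hex dump of the UTF-8 input: the apostrophe takes three bytes (e2 80 99).
input_hex=$(printf 'ian%ss eyes abr\n' "$curly" | od -An -tx1 | tr -d ' \n')
echo "input: $input_hex"

# Hex dump of the plain ASCII text: the apostrophe is one byte (27).
ascii_hex=$(printf '%s\n' "ian's eyes abr" | od -An -tx1 | tr -d ' \n')
echo "ascii: $ascii_hex"

# Hex dump of what iconv produces with transliteration enabled.
# The result depends on the iconv implementation and possibly the locale.
iconv_hex=$(printf 'ian%ss eyes abr\n' "$curly" \
  | iconv -f UTF-8 -t ISO-8859-15//TRANSLIT | od -An -tx1 | tr -d ' \n')
echo "iconv: $iconv_hex"
```

If the last two lines match, the ISO-8859-15 output consists of ASCII bytes only, which is why it also reads correctly as UTF-8 without converting back.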

added 26 characters in body

edited Jan 6 at 16:22

Kamil Maciorowski

Your comment:

maybe the problem is with converting to ISO-8859-15 rather than converting from UTF-8

is on the right track. The problem is there is no ’ in ISO-8859-15. The most similar character is '. This is what man 1 iconv states in Debian 12 I'm using:

If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output.

Use -t ISO-8859-15//TRANSLIT then.

As a proof of concept, this works for me (in pl_PL.UTF-8 locale):

printf '%s\n' 'ian’s eyes abr' | iconv -f UTF-8 -t ISO-8859-15//TRANSLIT

The output is ian's eyes abr (with a newline at the end). It so happens the representation of this exact string is identical in ISO-8859-15 and in UTF-8, so I chose not to obfuscate the command by additionally piping to iconv -f ISO-8859-15 -t UTF-8.

added 71 characters in body

edited Jan 6 at 16:12

Kamil Maciorowski

Your comment:

maybe the problem is with converting to ISO-8859-15 rather than converting from UTF-8

is on the right track. The problem is there is no ’ in ISO-8859-15. The most similar character is '. This is what man 1 iconv states in Debian 12 I'm using:

If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output.

Use //TRANSLIT then.

As a proof of concept, this works for me (in pl_PL.UTF-8 locale):

printf '%s\n' 'ian’s eyes abr' | iconv -f UTF-8 -t ISO-8859-15//TRANSLIT

The output is ian's eyes abr. (It so happens the representation of this exact string is identical in ISO-8859-15 and in UTF-8, so I chose not to obfuscate the command by additionally piping to iconv -f ISO-8859-15 -t UTF-8.)

Your comment:

maybe the problem is with converting to ISO-8859-15 rather than converting from UTF-8

is on the right track. The problem is there is no ’ in ISO-8859-15. The most similar character is '. This is what man 1 iconv states in Debian 12 I'm using:

If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output.

And this works for me:

printf '%s\n' 'ian’s eyes abr' | iconv -f UTF-8 -t ISO-8859-15//TRANSLIT

The output is ian's eyes abr.

answered Jan 6 at 16:06

Kamil Maciorowski
