2

I have read this question which I thought would give me what I was after:

How Can I Best Guess the Encoding when the BOM (Byte Order Mark) is Missing?

I would like to know if there is another way to get the file encoding in Delphi 2006 (D2006) without using Mozilla's i18n component. I cannot use other 3rd-party components.

I have read all the answers to the original question, and I cannot use the interface provided there because the client doesn't accept the deployment of that DLL.

Some of the links provided in the original question are dead, and none of them address my problem, which is:
How can I get the file encoding without using 3rd-party components?

17
  • 3
    @RobKennedy - it is not a duplicate. I mentioned in the question that I had already read that one. The answers there only suggest trying this and trying that, without solving the problem, and the person who asked resolved the problem by using the interface provided here: sourceforge.net/projects/chsdet/files Commented Feb 2, 2012 at 17:54
  • 2
    Look for a BOM; if no BOM is found, ask the user to set an encoding. Commented Feb 2, 2012 at 18:13
  • 4
    If you're unsatisfied with the answers to that question, that doesn't mean it isn't still the same question you're asking. That question asked how to guess the file encoding. You're asking the same thing. The accepted answer there is to use Chardet, but there are other answers, including one telling you to use Notepad's algorithm, followed by a couple of other algorithm descriptions. If you're not going to use a library, and you don't like the built-in API, then the only answers you're going to get are algorithm descriptions. How is your question different from the original? Commented Feb 2, 2012 at 18:55
  • 4
    Attacking the quality of the answers is not what's going to convince me that this is a different question. The answers don't matter. Your question asks the same thing as the other question. As for drawing attention to an old question, I think your initial action was fine: Re-ask the question. If there are new answers, people can add them to the old question. You can also start a bounty on the old question. There are lots of posts on Meta about how to draw attention to an old post. Commented Feb 2, 2012 at 23:06
  • 3
    He's asking for "another way to get the file encoding, without using Mozilla's i18n component in D2006 [because he] can not use other 3d party components." Seems valid enough to me - he's done his research, unfortunately can't use the answer to the other question, and is asking if there's an alternative. An alternative (different answer) probably warrants a new question, since you can't have two accepted answers on one question. Commented Feb 3, 2012 at 9:38

2 Answers

4

I would look for a BOM first and, if one is not found, call the Windows API function IsTextUnicode. But beware that no method is foolproof.
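The two steps above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: `TGuessedEncoding` and `GuessFileEncoding` are hypothetical names, the 4 KB sample size is an arbitrary choice, and the code assumes the `Windows` unit declares the third parameter of `IsTextUnicode` as a pointer, so that `nil` (meaning "run all tests") can be passed. UTF-32 BOMs are not handled, and a BOM-less UTF-8 file simply comes back as unknown.

```delphi
uses
  Windows, SysUtils, Classes;

type
  // Hypothetical result type for this sketch.
  TGuessedEncoding = (geUnknown, geUtf8, geUtf16LE, geUtf16BE);

// Hypothetical helper: BOM check first, IsTextUnicode as a fallback.
function GuessFileEncoding(const FileName: string): TGuessedEncoding;
var
  Stream: TFileStream;
  Buf: array[0..4095] of Byte;
  Len: Integer;
begin
  Result := geUnknown;

  // Read only the start of the file; the BOM is in the first bytes and
  // the heuristic only needs a sample.
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Len := Stream.Read(Buf, SizeOf(Buf));
  finally
    Stream.Free;
  end;

  // Step 1: look for a BOM.
  if (Len >= 3) and (Buf[0] = $EF) and (Buf[1] = $BB) and (Buf[2] = $BF) then
    Result := geUtf8
  else if (Len >= 2) and (Buf[0] = $FF) and (Buf[1] = $FE) then
    Result := geUtf16LE
  else if (Len >= 2) and (Buf[0] = $FE) and (Buf[1] = $FF) then
    Result := geUtf16BE
  // Step 2: no BOM, so ask Windows for a statistical guess.
  // Assumption: a True result is treated as "probably UTF-16 LE".
  else if (Len >= 2) and IsTextUnicode(@Buf[0], Len, nil) then
    Result := geUtf16LE;
end;
```

A caller would treat `geUnknown` as "ANSI, BOM-less UTF-8, or something else" and either apply a default or ask the user. Because `IsTextUnicode` is a heuristic, short or unusual inputs can be misclassified.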


2 Comments

IsTextUnicode should be handled with care. See this
@kobik hence my final sentence. This is an inexact science.
1

Determining the encoding of a file seems to be problematic. It appears that some UTF-8 files do not have a BOM. This appears to work:

// Try UTF-8 first; an empty list is taken to mean the decode failed.
InputData.LoadFromFile(f, TEncoding.UTF8);
if InputData.Count = 0 then
  InputData.LoadFromFile(f); // fall back to the default detection

Is there a better approach? I know this solution isn't very elegant.

1 Comment

Use TEncoding.GetBufferEncoding() before calling LoadFromFile(), or simply omit the Encoding parameter and let LoadFromFile() call GetBufferEncoding() internally for you.
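A rough sketch of what that comment describes, under some assumptions: `DetectByPreamble` is a hypothetical name, and `TEncoding` is only available in Delphi 2009 and later, so this does not apply to D2006. As far as I know, `GetBufferEncoding` only inspects the BOM (preamble), so a UTF-8 file without a BOM would still be reported as the default ANSI encoding.

```delphi
uses
  SysUtils, Classes;

// Hypothetical helper: returns the encoding indicated by the file's BOM,
// or the default encoding when no BOM is present.
function DetectByPreamble(const FileName: string): TEncoding;
var
  Stream: TFileStream;
  Buffer: TBytes;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // A BOM is at most a few bytes long, so a tiny sample is enough.
    SetLength(Buffer, 4);
    SetLength(Buffer, Stream.Read(Buffer[0], Length(Buffer)));
  finally
    Stream.Free;
  end;

  // Passing nil asks GetBufferEncoding to pick the encoding itself.
  Result := nil;
  TEncoding.GetBufferEncoding(Buffer, Result);
end;
```

The returned object is one of the shared standard encodings (for example `TEncoding.UTF8`), so the caller should not free it.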
