4

I know it's simple, but I still don't know it. Some people say there are 7 bits that represent a character, while others say 8. Can anyone tell me which one is right? If it is 8 bits per character, how many bits represent a byte? And if it's 7, how many bits represent a character and how many bits represent one byte?

2 Answers

5

US-ASCII is indeed 7 bits per character. The highest code has value 127, which represents the DEL control character. Any character set that has codes with higher values is not US-ASCII (but may be an extension of it, such as Unicode).
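This is easy to check in code. A minimal sketch in Python (used here just for illustration): every US-ASCII code point fits in 7 bits, so the high bit of each encoded byte is always 0, and anything outside that range cannot be encoded as US-ASCII at all.

```python
# US-ASCII covers code points 0..127, so the high bit of each byte is 0.
data = "Hello".encode("ascii")
assert all(b < 128 for b in data)  # high bit is 0 in every byte

try:
    "café".encode("ascii")          # é (U+00E9) is outside US-ASCII
except UnicodeEncodeError:
    print("not representable in US-ASCII")
```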

Most microprocessors work with bytes (=smallest addressable unit of storage) of eight bits. If you want to use US-ASCII with these microprocessors, you have two options:

  • Use 7 bytes (of 8 bits each) to store 8 characters (of 7 bits each), even though that makes programs very complicated.
  • Use 1 byte (of 8 bits) to store 1 character (of 7 bits), even though you'll waste space.

The need for simple programs outweighs the need for efficient memory use in this case. That's why you usually use one 8-bit unit (an octet, for short) to store a character, even though each character is encoded in only 7-bit units. You just set the extra bit to zero (or, as was done in some cases, use the extra bit for error detection).
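The trade-off above can be sketched in a few lines of Python (used here just for illustration). The `pack_ascii`/`unpack_ascii` helpers are hypothetical names, not a standard API: packing eight 7-bit characters into seven bytes saves one byte per eight characters, but needs exactly the kind of bit-twiddling that makes programs complicated, which is why the one-byte-per-character layout won.

```python
def pack_ascii(s: str) -> bytes:
    """Pack 7-bit ASCII characters tightly, 7 bits per character."""
    acc, nbits, out = 0, 0, bytearray()
    for ch in s:
        code = ord(ch)
        assert code < 128, "not US-ASCII"
        acc = (acc << 7) | code       # append 7 bits
        nbits += 7
        while nbits >= 8:             # emit full bytes as they form
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
    if nbits:                         # flush leftover bits, zero-padded
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)

def unpack_ascii(data: bytes, nchars: int) -> str:
    """Recover nchars 7-bit characters from a tightly packed stream."""
    acc, nbits, out = 0, 0, []
    for b in data:
        acc = (acc << 8) | b
        nbits += 8
        while nbits >= 7 and len(out) < nchars:
            nbits -= 7
            out.append(chr((acc >> nbits) & 0x7F))
    return "".join(out)

text = "ABCDEFGH"                     # 8 characters
print(len(pack_ascii(text)))          # 7 bytes packed
print(len(text.encode("ascii")))      # 8 bytes, one octet per character
```

Eight characters times 7 bits is 56 bits, which is exactly 7 octets; the simple one-octet-per-character encoding uses 8.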



1

I know this is an old question, but for the sake of future readers; you can determine how many bytes are in a given string (or string value) via the following (C# .NET):

Encoding.ASCII.GetByteCount("SomeString");

Remember to use the proper encoding when counting bytes, since the count differs between encodings:

  • An ASCII character in 8-bit ASCII encoding is 8 bits (1 byte), though it fits in 7 bits.
  • An ISO-8859-1 character in ISO-8859-1 encoding is 8 bits (1 byte).
  • A Unicode character in UTF-8 encoding is between 8 bits (1 byte) and 32 bits (4 bytes).
  • A Unicode character in UTF-16 encoding is between 16 bits (2 bytes) and 32 bits (4 bytes), though most of the common characters take 16 bits. This is the encoding used by Windows internally.
  • A Unicode character in UTF-32 encoding is always 32 bits (4 bytes).
  • An ASCII character in UTF-8 is 8 bits (1 byte), and 16 bits in UTF-16.
  • The additional (non-ASCII) characters in ISO-8859-1 (0xA0-0xFF) take 16 bits in UTF-8 and UTF-16.
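The byte counts in the list above are easy to verify. A quick sketch using Python's built-in codecs (Python is used here purely for illustration; `Encoding.GetByteCount` in .NET reports the same counts):

```python
# Byte counts per character in the common Unicode encodings.
samples = ["A",    # plain ASCII (U+0041)
           "é",    # ISO-8859-1 range (U+00E9)
           "€",    # BMP character (U+20AC)
           "😀"]   # outside the BMP (U+1F600)

for ch in samples:
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        print(f"{ch!r} in {enc}: {len(ch.encode(enc))} byte(s)")

# ASCII characters are 1 byte in UTF-8, 2 in UTF-16, 4 in UTF-32;
# characters outside the BMP take 4 bytes in all three.
```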

