I know it's simple, but I still don't know it. Some people say there are 7 bits per character, while others say 8. So can anyone tell me which one is right? If it's 8 bits per character, then how many bits make up a byte? And if it's 7, then how many bits represent a character and how many bits make up one byte?
2 Answers
US-ASCII is indeed 7 bits per character. The highest code has value 127, which represents the DEL control character. Any character set that has codes with higher values is not US-ASCII (but may be an extension of it, such as Unicode).
Most microprocessors work with bytes (= the smallest addressable unit of storage) of eight bits. If you want to use US-ASCII with these microprocessors, you have two options:
- Use 7 bytes (of 8 bits each) to store 8 characters (of 7 bits each), even though that makes programs very complicated (see the packing sketch after this list).
- Use 1 byte (of 8 bits) to store 1 character (of 7 bits), even though you'll waste space.
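For illustration, here is a minimal sketch of the first (packing) option in C#. Pack7Bit is a made-up helper name, and the code assumes every character in the input really is 7-bit ASCII:

using System.Collections.Generic;

// Hypothetical helper: pack 7-bit ASCII codes into a continuous bit
// stream, so that every 8 characters occupy only 7 bytes.
static byte[] Pack7Bit(string s)
{
    var output = new List<byte>();
    int acc = 0;   // bit accumulator
    int nbits = 0; // number of valid bits currently in acc
    foreach (char c in s)
    {
        acc = (acc << 7) | (c & 0x7F); // append the 7 ASCII bits
        nbits += 7;
        if (nbits >= 8)                // a full byte is now available
        {
            nbits -= 8;
            output.Add((byte)(acc >> nbits));
            acc &= (1 << nbits) - 1;   // drop the bits just emitted
        }
    }
    if (nbits > 0)                     // zero-pad the final partial byte
        output.Add((byte)(acc << (8 - nbits)));
    return output.ToArray();
}

Applied to an 8-character string, Pack7Bit returns 7 bytes instead of 8, but every read and write now needs this kind of bit-shifting bookkeeping.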
The need for simple programs outweighs the need for efficient memory use in this case. That's why you usually use one 8-bit unit (an octet) to store a character, even though each character needs only 7 bits. You just set the extra bit to zero (or, as was done in some systems, use the extra bit for error detection).
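As a small sketch of that last idea (WithEvenParity is a made-up name; even parity is just one classic choice for the error-detection variant):

// Store one 7-bit character per 8-bit byte; optionally use the spare
// eighth bit for even parity, as some older systems did.
static byte WithEvenParity(char c)
{
    byte b = (byte)(c & 0x7F);  // the 7 ASCII bits, high bit zero
    int ones = 0;
    for (int i = 0; i < 7; i++) // count the 1-bits
        ones += (b >> i) & 1;
    // set bit 7 if needed so the total number of 1-bits is even
    return ones % 2 == 1 ? (byte)(b | 0x80) : b;
}

For 'A' (0x41, two 1-bits) the parity bit stays zero and the stored byte is just 0x41.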
Comments
I know this is an old question, but for the sake of future readers: you can determine how many bytes a given string (or string value) takes via the following (C# .NET):
using System.Text;
int byteCount = Encoding.ASCII.GetByteCount("SomeString"); // 10: one byte per ASCII character
Remember to use the proper encoding when counting bytes, since the count differs between encodings (a short demonstration follows the list below):
- An ASCII character in 8-bit ASCII encoding is 8 bits (1 byte), though it can fit in 7 bits.
- An ISO-8859-1 character in ISO-8859-1 encoding is 8 bits (1 byte).
- A Unicode character in UTF-8 encoding is between 8 bits (1 byte) and 32 bits (4 bytes).
- A Unicode character in UTF-16 encoding is between 16 bits (2 bytes) and 32 bits (4 bytes), though most of the common characters take 16 bits. This is the encoding Windows uses internally.
- A Unicode character in UTF-32 encoding is always 32 bits (4 bytes).
- An ASCII character in UTF-8 is 8 bits (1 byte), and in UTF-16 it is 16 bits (2 bytes).
- The additional (non-ASCII) characters in ISO-8859-1 (0xA0-0xFF) take 16 bits in both UTF-8 and UTF-16.
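To see these differences in practice, here is a small demonstration using the same GetByteCount method; "é" (U+00E9) is an arbitrary example of a non-ASCII ISO-8859-1 character:

using System;
using System.Text;

string s = "é"; // U+00E9: outside ASCII, inside ISO-8859-1
Console.WriteLine(Encoding.UTF8.GetByteCount(s));    // 2: two bytes in UTF-8
Console.WriteLine(Encoding.Unicode.GetByteCount(s)); // 2: one 16-bit UTF-16 code unit
Console.WriteLine(Encoding.UTF32.GetByteCount(s));   // 4: always four bytes in UTF-32
Console.WriteLine(Encoding.ASCII.GetByteCount("A")); // 1: one byte per ASCII character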