Computers only understand binary, and that's why we have ASCII & Unicode
ASCII
Computers understand binary. Humans understand letters. Pretty simple, right?
Well, when it comes to displaying letters on your screen, you wouldn't want to read raw bytes, because that would be tedious and annoying.
ASCII (American Standard Code for Information Interchange) solves this problem. In short, it's a character encoding standard for computers.
ASCII's 7-bit design was a practical choice in the 1960s computing era, giving 128 possible combinations (2^7). The 8th bit is unused (more on that below).
The 7-bit structure maintained compatibility with existing 6-bit systems and telegraph standards.
It could represent all essential English characters including letters, numbers, punctuation marks, and control characters.
For example, the letter "A" is represented as 01000001 (in decimal: 65).
Note: I wrote it down as 8 bits because it's easier to remember than 1000001 and keeps things uniform. (Remember: a byte is an octet of bits.)
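If you want to check this yourself, here's a quick sketch in Python (just the built-in ord() and f-string formatting, nothing special):

```python
# Look up the ASCII code of "A" and print it in decimal and as 8 bits.
char = "A"
code = ord(char)                       # 65
print(f"{char} -> decimal {code}, binary {code:08b}")
# Output: A -> decimal 65, binary 01000001
```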
The design was efficient for the limited memory and processing power of early computers.
When stored in 8-bit bytes, the unused 8th bit served a dual purpose: error detection in data transmission (as a parity bit) and later expansion through
Extended ASCII, which added support for additional characters and languages.
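To make the error-detection part concrete, here's a minimal sketch of even parity in Python. It's a simplified illustration of the general idea, not any particular transmission protocol: the spare 8th bit is set so the whole byte always contains an even number of 1-bits, and a flipped bit shows up as an odd count on the receiving end.

```python
# Even parity: use the spare 8th bit so every byte has an even number of 1-bits.
def add_even_parity(seven_bit_value: int) -> int:
    ones = bin(seven_bit_value).count("1")
    parity_bit = ones % 2              # 1 only if the count of 1-bits is odd
    return (parity_bit << 7) | seven_bit_value

print(f"{add_even_parity(ord('A')):08b}")  # 01000001 (two 1-bits, parity bit stays 0)
print(f"{add_even_parity(ord('C')):08b}")  # 11000011 (three 1-bits, parity bit set to 1)
```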
If you want to see all the different combinations & Extended ASCII, go here: https://www.ascii-code.com/
Unicode
When computers started being used worldwide, ASCII's 128 characters weren't enough. Chinese, Japanese, Arabic, and many other languages couldn't be represented.
Each country created its own encoding standards, which led to chaos: documents couldn't be shared between countries without getting garbled.
Unicode solved this mess. It's a universal encoding system that can represent every character from every language.
While ASCII used 7 bits, Unicode encodings can use up to 32 bits per character, giving room for 4,294,967,296 (2³²) possible values. In practice the Unicode standard caps its code points at U+10FFFF (around 1.1 million), which is still more than enough for every character from every language.
Even emojis like 😁 (U+1F601).
Unicode code points like U+1F601 are written with a "U+" prefix, followed by the code point in hexadecimal (base-16).
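In Python, for example, ord() gives you a character's code point and chr() goes the other way, so you can print the U+ notation yourself:

```python
# Convert between a character and its Unicode code point.
emoji = "😁"
code_point = ord(emoji)                # 128513
print(f"U+{code_point:04X}")           # U+1F601
print(chr(0x1F601))                    # 😁
```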
The most popular Unicode format is UTF-8.
- It's backward compatible with ASCII (the first 128 characters are identical)
- It uses variable-length encoding: common characters like ASCII take just 1 byte, while rare characters use more (see the sketch after this list)
- English text takes the same space as ASCII
- It can represent every character from every writing system in the world
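Here's a small sketch of that variable-length behaviour using Python's built-in str.encode(). The characters are just examples I picked; the point is that plain ASCII stays at 1 byte while other characters take 2 to 4 bytes:

```python
# UTF-8 is variable-length: ASCII stays at 1 byte, other characters take 2-4 bytes.
for char in ["A", "é", "好", "😁"]:
    encoded = char.encode("utf-8")
    print(char, encoded.hex(" "), f"({len(encoded)} byte(s))")
# A 41 (1 byte(s))
# é c3 a9 (2 byte(s))
# 好 e5 a5 bd (3 byte(s))
# 😁 f0 9f 98 81 (4 byte(s))
```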
For example, the Chinese character "好" is represented as E5 A5 BD in UTF-8, using 3 bytes.
The E5 A5 BD is written in hexadecimal, by the way. Converted to binary, it looks like this: 11100101 10100101 10111101.
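You can reproduce that conversion yourself; this is just a sketch using Python's standard string formatting:

```python
# Show each UTF-8 byte of "好" in hex and in binary.
encoded = "好".encode("utf-8")
print(" ".join(f"{byte:02X}" for byte in encoded))  # E5 A5 BD
print(" ".join(f"{byte:08b}" for byte in encoded))  # 11100101 10100101 10111101
```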
You can convert binary to UTF-8 here: https://onlinetools.com/utf8/convert-binary-to-utf8
This way, computers can now handle text in any language while remaining efficient with storage.
Today, UTF-8 is the dominant encoding on the web, making it possible for people worldwide to communicate in their native languages through computers.