Beginners Reference Guide to CompSci

1.1 Data Representation - Contents

In this chapter, you will learn:

Table of Contents

Units
Number Representation
- Denary (base 10) Numbers
Characters and Character Sets
- ASCII
- Unicode
Images
Sounds
Video
Compression

Units

A bit is the smallest unit of data in a computer system. A bit is a single binary value, represented by either a 0 or a 1. You can put bits together to make larger units of data, such as nibbles and bytes.

Binary is a base 2 number system; this means that there are only 2 possible digits, 0 and 1. Our regular counting system, which uses the digits 0 to 9, is base 10, because there are 10 possible digits; it is also commonly referred to as denary.
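As a rough illustration of the difference between base 2 and base 10, the Python sketch below breaks a number into the value contributed by each digit. The function name `place_values` is made up for this example and is not part of any library.

```python
def place_values(digits: str, base: int) -> list[int]:
    """Return the value contributed by each digit, left to right, in the given base."""
    n = len(digits)
    return [int(d) * base ** (n - 1 - i) for i, d in enumerate(digits)]

# Binary 1101: the columns are worth 8, 4, 2 and 1
binary_parts = place_values("1101", 2)   # [8, 4, 0, 1]

# Denary 13: the columns are worth 10 and 1
denary_parts = place_values("13", 10)    # [10, 3]

print(sum(binary_parts), sum(denary_parts))  # both add up to 13
print(int("1101", 2))                        # Python's built-in conversion, also 13
```

Both digit strings describe the same quantity; only the worth of each column changes with the base.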

Bits can be grouped into larger units of capacity, as shown below:

4 bits in a nibble
8 bits (2 nibbles) in a byte
1000 bytes in a kilobyte
1000 kilobytes in a megabyte
1000 megabytes in a gigabyte
1000 gigabytes in a terabyte

In the past, 1024 was often used as the conversion factor instead of 1000. Please check with your exam board as to whether you will be using 1000 or 1024.

Note: In modern computing, kibibyte, mebibyte, gibibyte and tebibyte are the terms for units based on the 1024 conversion factor. So, a kilobyte is 1000 bytes, whilst a kibibyte is 1024 bytes, and so on.
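To make the 1000 versus 1024 distinction concrete, here is a small Python sketch of the unit sizes listed above. The dictionary names and the `to_bits` helper are invented for this example.

```python
BITS_PER_NIBBLE = 4
BITS_PER_BYTE = 8

# Decimal units: each step is 1000 times the previous one (sizes in bytes)
DECIMAL_UNITS = {
    "kilobyte": 1000 ** 1,
    "megabyte": 1000 ** 2,
    "gigabyte": 1000 ** 3,
    "terabyte": 1000 ** 4,
}

# Binary units: each step is 1024 times the previous one (sizes in bytes)
BINARY_UNITS = {
    "kibibyte": 1024 ** 1,
    "mebibyte": 1024 ** 2,
    "gibibyte": 1024 ** 3,
    "tebibyte": 1024 ** 4,
}

def to_bits(amount: int, unit: str) -> int:
    """Convert an amount of the named unit into bits."""
    bytes_per_unit = {**DECIMAL_UNITS, **BINARY_UNITS}[unit]
    return amount * bytes_per_unit * BITS_PER_BYTE

print(to_bits(1, "kilobyte"))   # 8000
print(to_bits(1, "kibibyte"))   # 8192
# The gap between the two systems grows with each step up:
print(BINARY_UNITS["gibibyte"] - DECIMAL_UNITS["gigabyte"])  # 73741824
```

A gibibyte is roughly 7% larger than a gigabyte, which is why it matters which convention your exam board uses.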

Number Representation

Computers store and process all data in binary (base 2), so numbers written in any other format have to be converted to binary before a computer can work with them.

Denary (base 10) Numbers

To convert positive denary numbers to binary, you must first
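One common way to convert a positive denary number to binary is repeated division by 2, reading the remainders from last to first. The Python sketch below assumes that method purely for illustration; a place-value table is another approach often taught, and the function name `denary_to_binary` is made up here.

```python
def denary_to_binary(n: int) -> str:
    """Convert a non-negative denary integer to a binary string
    using repeated division by 2 (an assumed method, for illustration)."""
    if n < 0:
        raise ValueError("only non-negative numbers are handled here")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))  # the remainder is the next bit, right to left
        n //= 2                        # carry on with the whole-number quotient
    return "".join(reversed(remainders))

print(denary_to_binary(13))   # 1101
print(denary_to_binary(200))  # 11001000
```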

Characters and Character Sets

Since computers handle all information in binary, we must have a way to represent different types of content.

Each letter on a keyboard is assigned a certain number. When the computer sees this number, it will show the correct letter. This number is called a character code. A complete collection of many character codes is known as a character set.
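As a small illustration of character codes, Python's built-in `ord` and `chr` functions convert between a character and its code. The helper name `char_codes` is invented for this sketch.

```python
def char_codes(text: str) -> list[tuple[str, int, str]]:
    """Return (character, character code, 8-bit binary string) for each character."""
    return [(ch, ord(ch), format(ord(ch), "08b")) for ch in text]

for ch, code, bits in char_codes("Hi!"):
    print(ch, code, bits)
# H 72 01001000
# i 105 01101001
# ! 33 00100001

print(chr(65))  # the character whose code is 65, which is A
```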

To ensure compatibility between different computer systems and software, industry-standard character sets are used. The two most common are ASCII and Unicode.

ASCII

ASCII stands for the American Standard Code for Information Interchange. In ASCII, each character code is represented by 7 bits, giving 128 unique codes. That is enough for every upper-case letter, lower-case letter, digit and punctuation mark on most English keyboards. However, a major limitation of ASCII is that it only covers the characters used in English, so it cannot represent accented letters or other alphabets.
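The 128 figure comes directly from the 7 bits, since 2 to the power of 7 is 128. The Python sketch below checks this and shows the limitation in practice; the helper name `is_ascii_text` is made up for this example.

```python
ASCII_BITS = 7
print(2 ** ASCII_BITS)  # 128 possible character codes, numbered 0 to 127

def is_ascii_text(text: str) -> bool:
    """Return True if every character in text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(is_ascii_text("Hello, World!"))  # True
print(is_ascii_text("café"))           # False, because é is not in ASCII
```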

Extended ASCII was introduced when 8-bit computers became common. Using 8 bits per character gives 256 codes, allowing a broader range of languages to be represented. However, this still could not account for the plethora of different languages that exist, and thus a better solution was required.

Unicode

Unicode was developed as computers became more powerful and could store more information, so that a much larger range of characters could be represented. It is commonly stored in 16-bit or 32-bit form, or in a variable-length form called UTF-8 that uses between 8 and 32 bits per character. Unicode allows over a million characters, at the cost of taking up more space per character. It is the usual character set for web pages.
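To get a feel for the space cost, the Python sketch below counts how many bytes a few characters take in three standard Unicode encodings that ship with Python. The helper name `encoded_sizes` and the sample characters are chosen for illustration only.

```python
def encoded_sizes(ch: str) -> dict[str, int]:
    """Return the number of bytes needed to store ch in each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}

for ch in ("A", "é", "€", "😀"):
    print(ch, hex(ord(ch)), encoded_sizes(ch))

# The highest Unicode character code is 0x10FFFF, so the number of possible codes is:
print(0x10FFFF + 1)  # 1114112
```

The 32-bit form always uses 4 bytes per character, while the other two use fewer bytes for common characters and more for rarer ones.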

The name Unicode is intended to suggest a unique, universal and uniform character encoding. It allows many more characters to be represented, including those of many other languages.

Images

Sounds

Video

Compression
