503 Example

This example walks through QOI encoding step by step. The data from the 8-by-8 sensor is shown below.

Example image

The header is strictly defined:

  • The first four bytes are the letters ‘q’, ‘o’, ‘i’, and ‘f’. Encoded as UTF-8, these letters give 0x716F6966.
  • The width of the image is stored as a 32-bit value: 0x00000008.
  • The height of the image is stored as a 32-bit value: 0x00000008.
  • Depending on whether or not an alpha channel is present, the number of channels is either 3 (for RGB) or 4 (for RGBA). This is stored in a single byte: 0x03.
  • The colorspace is also represented in a single byte: 0x00 for sRGB with linear alpha and 0x01 for an image where all channels are linear. This example uses 0x00.

This makes the encoding of the header equal to 0x716F696600000008000000080300 , which is 14 bytes in total.

Both the channels and the colorspace fields are purely informative.
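The header layout maps directly onto a fixed-size binary record. As a minimal sketch (the function name `qoi_header` is made up for this illustration, and Python is used only for convenience), the 14 header bytes of the example could be produced like this:

```python
import struct

def qoi_header(width: int, height: int, channels: int = 3, colorspace: int = 0) -> bytes:
    """Pack the QOI header fields described above into 14 bytes."""
    # ">" selects big-endian without padding:
    # 4s = the four magic bytes, I = 32-bit unsigned, B = single byte.
    return struct.pack(">4sIIBB", b"qoif", width, height, channels, colorspace)

header = qoi_header(8, 8)
print(len(header), header.hex().upper())
```

For the 8-by-8 example this prints 14 followed by 716F696600000008000000080300.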

Chunks

At the start, we assume that the previous values of R, G, and B (called R_d, G_d, and B_d) are all 0x00. The previous value for alpha (A_d) is 0xFF. The running array (RA), which has 64 entries, is initialised to all zeros. The run-length counter (RLE), which tracks the current run of repeating pixels, is initialised to -1.

The pixel at (0,0) has value 0xFF0000FF

1)  It is not the same as the previous values R_d, G_d, B_d and A_d
2)  It is not present in the RA.
     Nonetheless, we add it. H(R,G,B,A) = 50, so RA[50] = 0xFF0000FF.
3)  Difference with previous pixels
     dr = 0xFF - 0x00 = 0xFF (= -1)   =>   b01
     dg = 0x00 - 0x00 = 0x00 (= 0)   =>   b10
     db = 0x00 - 0x00 = 0x00 (= 0)   =>   b10

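Each difference is stored in two bits with a bias of 2, so -1 becomes b01 and 0 becomes b10; the chunk starts with the 2-bit tag b01. This chunk is hence encoded as b01 b01 b10 b10 = b01011010 = 0x5A .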
R_d becomes 0xFF, G_d becomes 0x00, B_d becomes 0x00, A_d remains 0xFF.
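The calculations in steps 2 and 3 can be checked with a few lines of code. The sketch below assumes the hash function H from the QOI specification, (3R + 5G + 7B + 11A) mod 64, which this example does not write out. Pixels are stored as (R, G, B, A) tuples, and the helper names `qoi_hash` and `op_diff` are illustrative only.

```python
def qoi_hash(r, g, b, a):
    """Position of a pixel in the 64-entry running array (assumed QOI hash)."""
    return (r * 3 + g * 5 + b * 7 + a * 11) % 64

def op_diff(prev, cur):
    """Return the one-byte difference chunk, or None if it does not apply."""
    if cur[3] != prev[3]:                  # alpha must be unchanged
        return None
    fields = []
    for p, c in zip(prev[:3], cur[:3]):
        d = (c - p + 128) % 256 - 128      # wrap-around difference, -128..127
        if not -2 <= d <= 1:
            return None
        fields.append(d + 2)               # bias of 2: -2..1 becomes 0..3
    dr, dg, db = fields
    return bytes([(0b01 << 6) | (dr << 4) | (dg << 2) | db])

prev = (0x00, 0x00, 0x00, 0xFF)            # R_d, G_d, B_d, A_d at the start
cur = (0xFF, 0x00, 0x00, 0xFF)             # pixel at (0,0)
print(qoi_hash(*cur), op_diff(prev, cur).hex().upper())
```

With the values of pixel (0,0) this prints 50 5A.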

The pixel at (1,0) has value 0xFF0000FF

1)  It is the same as the previous values R_d, G_d, B_d and A_d, so this is run-length encoding. RLE=0

The pixel at (2,0) has value 0xFF0000FF

1)  It is the same as the previous values R_d, G_d, B_d and A_d, so this is run-length encoding. RLE=1

The pixel at (3,0) has value 0xFF0000FF

1)  It is the same as the previous values R_d, G_d, B_d and A_d, so this is run-length encoding. RLE=2

The pixel at (4,0) has value 0x00FF00FF

1)  It is not the same as the previous values R_d, G_d, B_d and A_d. As there was a run length ongoing, this has to be recorded.
The chunk becomes: b11 b000010 = b11000010 = 0xC2 . RLE is reset to -1.
2)  It is not present in the RA.
     Nonetheless, we add it. H(R,G,B,A) = 48, so RA[48] = 0x00FF00FF.
3)  Difference with previous pixels
     dr = 0x00 - 0xFF = 0x01 (= 1)   =>   b11
     dg = 0xFF - 0x00 = 0xFF (= -1)   =>   b01
     db = 0x00 - 0x00 = 0x00 (= 0)   =>   b10

This chunk is hence encoded as b01 b11 b01 b10 = b01110110 = 0x76 .
R_d becomes 0x00, G_d becomes 0xFF, B_d becomes 0x00, A_d remains 0xFF.
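The run-length chunk recorded in step 1 is simply the tag b11 followed by the 6-bit counter. A small sketch, assuming the counter convention used in this example (RLE starts at -1, so RLE=2 stands for three repeated pixels); `op_run` is a hypothetical helper name:

```python
def op_run(rle: int) -> bytes:
    """Build the one-byte run-length chunk for the counter value rle."""
    # In the QOI specification the 6-bit values 62 and 63 are not used for
    # runs, because b11111110 and b11111111 are the RGB and RGBA tags.
    if not 0 <= rle <= 61:
        raise ValueError("run counter must be between 0 and 61")
    return bytes([(0b11 << 6) | rle])

print(op_run(2).hex().upper())   # three repeated pixels
```

For RLE=2 this prints C2.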

The pixels at (5,0), (6,0), and (7,0) are handled similarly to (1,0), (2,0), and (3,0).

1)  Each is the same as the previous values R_d, G_d, B_d and A_d, so this is run-length encoding. RLE=0, RLE=1, RLE=2

This makes the encoding of the first row, currently, equal to 0x5AC276 ; the run of the last three pixels (RLE=2) is still pending. Now we continue with the second row.


The pixel at (0,1) has value 0xFF0000FF

1)  It is not the same as the previous values R_d, G_d, B_d and A_d. As there was a run length ongoing, this has to be recorded.
The chunk becomes: b11 b000010 = b11000010 = 0xC2 . RLE is reset to -1.
2)  It is present in the RA. H(R,G,B,A) = 50, and RA[50] == 0xFF0000FF.
This chunk is hence encoded as b00 b110010 = b00110010 = 0x32 .
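The lookup in the running array can be sketched as follows. The code assumes the hash function from the QOI specification, (3R + 5G + 7B + 11A) mod 64, and stores pixels as (R, G, B, A) tuples; `op_index` is an illustrative helper, not part of any library.

```python
def qoi_hash(r, g, b, a):
    """Position of a pixel in the 64-entry running array (assumed QOI hash)."""
    return (r * 3 + g * 5 + b * 7 + a * 11) % 64

def op_index(ra, pixel):
    """Return the one-byte index chunk if pixel is in the running array."""
    idx = qoi_hash(*pixel)
    if ra[idx] == pixel:
        return bytes([idx])      # tag b00 in the top two bits, index below
    return None

ra = [(0, 0, 0, 0)] * 64         # running array, initialised to zeros
red = (0xFF, 0x00, 0x00, 0xFF)
ra[qoi_hash(*red)] = red         # stored when the pixel at (0,0) was handled

print(op_index(ra, red).hex().upper())
```

For the red pixel at (0,1) this prints 32. A pixel that has not been stored yet, such as 0x0000FFFF at this point, yields None and falls through to the difference steps.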

The pixels at (1,1), (2,1), and (3,1) are handled in the same way as before.

The pixel at (4,1) has value 0x00FF00FF

1)  It is not the same as the previous values R_d, G_d, B_d and A_d. As there was a run length ongoing, this has to be recorded.
The chunk becomes: b11 b000010 = b11000010 = 0xC2 . RLE is reset to -1.
2)  It is present in the RA. H(R,G,B,A) = 48, and RA[48] == 0x00FF00FF.
This chunk is hence encoded as b00 b110000 = b00110000 = 0x30 .

The pixels at (5,1), (6,1), and (7,1) are handled in the same way as before.

This makes the encoding of the second row, currently, equal to 0xC232C230 .


The third and fourth rows in the image are handled identically to row 2. This makes the encoding of these rows, currently, equal to 0xC232C230C232C230 .


The pixel at (0,4) has value 0x0000FFFF

1)  It is not the same as the previous values R_d, G_d, B_d and A_d. As there was a run length ongoing, this has to be recorded.
The chunk becomes: b11 b000010 = b11000010 = 0xC2 . RLE is reset to -1.
2)  It is not present in the RA.
     Nonetheless, we add it. H(R,G,B,A) = 46, so RA[46] = 0x0000FFFF.
3)  Difference with previous pixels
     dr = 0x00 - 0x00 = 0x00 (= 0)   =>   b10
     dg = 0x00 - 0xFF = 0x01 (= 1)   =>   b11
     db = 0xFF - 0x00 = 0xFF (= -1)   =>   b01

This chunk is hence encoded as b01 b10 b11 b01 = b01101101 = 0x6D .

The pixels at (1,4), (2,4), and (3,4) are handled in the same way as before.

The pixel at (4,4) has value 0x7F7F7FFF

1)  It is not the same as the previous values R_d, G_d, B_d and A_d. As there was a run length ongoing, this has to be recorded.
The chunk becomes: b11 b000010 = b11000010 = 0xC2 . RLE is reset to -1.
2)  It is not present in the RA.
     Nonetheless, we add it. H(R,G,B,A) = 38, so RA[38] = 0x7F7F7FFF.
3)  Difference with previous pixels
     dr = 0x7F - 0x00 = 0x7F (= 127)   =>   not defined (outside -2..1)
     dg = 0x7F - 0x00 = 0x7F (= 127)   =>   not defined (outside -2..1)
     db = 0x7F - 0xFF = 0x80 (= -128)   =>   not defined (outside -2..1)
4)  Difference with previous pixel’s green
     dg = 0x7F - 0x00 = 0x7F (= 127)   =>   not defined (outside -32..31)
     …
5)  RGB
     A_d and A are equal, so no alpha byte is needed   =>   b11111110 0x7F 0x7F 0x7F
This chunk is hence encoded as 0xFE7F7F7F .
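Steps 3 to 5 form a cascade: the encoder tries the smallest chunk first and falls back to a larger one when a difference does not fit. The sketch below follows the ranges given in the QOI specification (per-channel differences of -2..1 for the one-byte chunk; -32..31 for green and -8..7 for red and blue relative to green for the two-byte chunk). These ranges are assumptions as far as this example is concerned, since step 4 is only partly written out. The function names are hypothetical.

```python
def signed_diff(cur, prev):
    """Wrap-around difference of two bytes as a value in -128..127."""
    return (cur - prev + 128) % 256 - 128

def encode_by_difference(prev, cur):
    """Pick the smallest difference-based chunk for cur; pixels are (R, G, B, A)."""
    if cur[3] != prev[3]:                               # alpha changed: RGBA chunk
        return bytes([0xFF, *cur])
    dr, dg, db = (signed_diff(c, p) for c, p in zip(cur[:3], prev[:3]))
    if all(-2 <= d <= 1 for d in (dr, dg, db)):         # step 3: one byte
        return bytes([0x40 | ((dr + 2) << 4) | ((dg + 2) << 2) | (db + 2)])
    dr_dg, db_dg = dr - dg, db - dg
    if -32 <= dg <= 31 and -8 <= dr_dg <= 7 and -8 <= db_dg <= 7:
        # step 4: two bytes, red and blue stored relative to green
        return bytes([0x80 | (dg + 32), ((dr_dg + 8) << 4) | (db_dg + 8)])
    return bytes([0xFE, *cur[:3]])                      # step 5: full RGB

blue = (0x00, 0x00, 0xFF, 0xFF)
grey = (0x7F, 0x7F, 0x7F, 0xFF)
print(encode_by_difference(blue, grey).hex().upper())
```

For the pixel at (4,4) this prints FE7F7F7F, because neither of the two smaller chunks can hold differences of 127 and -128.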

The pixels at (5,4), (6,4), and (7,4) are handled in the same way as before.

This makes the encoding of the fifth row, currently, equal to 0xC26DC2FE7F7F7F .


The pixel at (0,5) has value 0x0000FFFF

1)  It is not the same as the previous values R_d, G_d, B_d and A_d. As there was a run length ongoing, this has to be recorded.
The chunk becomes: b11 b000010 = b11000010 = 0xC2 . RLE is reset to -1.
2)  It is present in the RA. H(R,G,B,A) = 46, and RA[46] == 0x0000FFFF.
This chunk is hence encoded as b00 b101110 = b00101110 = 0x2E .

The pixels at (1,5), (2,5), and (3,5) are handled in the same way as before.

The pixel at (4,5) has value 0x7F7F7FFF

1)  It is not the same as the previous values R_d, G_d, B_d and A_d. As there was a run length ongoing, this has to be recorded.
The chunk becomes: b11 b000010 = b11000010 = 0xC2 . RLE is reset to -1.
2)  It is present in the RA. H(R,G,B,A) = 38, and RA[38] == 0x7F7F7FFF.
This chunk is hence encoded as b00 b100110 = b00100110 = 0x26 .

The pixels at (5,5), (6,5), and (7,5) are handled in the same way as before.

This makes the encoding of the sixth row, currently, equal to 0xC22EC226 .


The final two rows are processed in the same way as the sixth row, making their encoding equal to 0xC22EC226C22EC226 .


BEWARE!!
There is still a run pending in the RLE (RLE=2)!! It is handled in the same way as above, which means that another, final, 0xC2 needs to be added.

End marking

The end marker is a fixed 8-byte value: 0x0000000000000001 .

Results of the example

The result of the QOI encoding of the image above hence is:

 0x716F696600000008000000080300 0x5AC276 0xC232C230 0xC232C230 0xC232C230 0xC26DC2FE7F7F7F 0xC22EC226 0xC22EC226C22EC226 0xC2 0x0000000000000001 

 716F6966000000080000000803005AC276C232C230C232C230C232C230C26DC2FE7F7F7FC22EC226C22EC226C22EC226C20000000000000001 
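To cross-check the byte string, the whole walk-through can be condensed into a small reference encoder. This is a sketch under assumptions, not the implementation used in the simulation: the pixel layout is reconstructed from the values quoted in the walk-through (red and green in the top four rows, blue and grey in the bottom four), the hash and the ranges of the two-byte chunk are taken from the QOI specification, and all function names are illustrative.

```python
import struct

def signed_diff(cur, prev):
    """Wrap-around difference of two bytes as a value in -128..127."""
    return (cur - prev + 128) % 256 - 128

def difference_chunk(prev, px):
    """Smallest difference-based chunk for px; pixels are (R, G, B, A)."""
    if px[3] != prev[3]:
        return bytes([0xFF, *px])
    dr, dg, db = (signed_diff(c, p) for c, p in zip(px[:3], prev[:3]))
    if all(-2 <= d <= 1 for d in (dr, dg, db)):
        return bytes([0x40 | ((dr + 2) << 4) | ((dg + 2) << 2) | (db + 2)])
    if -32 <= dg <= 31 and -8 <= dr - dg <= 7 and -8 <= db - dg <= 7:
        return bytes([0x80 | (dg + 32), ((dr - dg + 8) << 4) | (db - dg + 8)])
    return bytes([0xFE, *px[:3]])

def qoi_encode(pixels, width, height, channels=3, colorspace=0):
    out = bytearray(struct.pack(">4sIIBB", b"qoif", width, height,
                                channels, colorspace))
    ra = [(0, 0, 0, 0)] * 64          # running array (RA)
    prev = (0, 0, 0, 0xFF)            # R_d, G_d, B_d, A_d
    rle = -1                          # -1 means no run in progress
    for px in pixels:
        if px == prev:
            rle += 1
            if rle == 61:             # longest run a single chunk can hold
                out.append(0xC0 | rle)
                rle = -1
            continue
        if rle >= 0:                  # a run just ended: record it first
            out.append(0xC0 | rle)
            rle = -1
        idx = (px[0] * 3 + px[1] * 5 + px[2] * 7 + px[3] * 11) % 64
        if ra[idx] == px:
            out.append(idx)           # index chunk, tag b00
        else:
            ra[idx] = px
            out += difference_chunk(prev, px)
        prev = px
    if rle >= 0:                      # the final pending run
        out.append(0xC0 | rle)
    out += bytes(7) + b"\x01"         # end marker
    return bytes(out)

RED, GREEN = (0xFF, 0, 0, 0xFF), (0, 0xFF, 0, 0xFF)
BLUE, GREY = (0, 0, 0xFF, 0xFF), (0x7F, 0x7F, 0x7F, 0xFF)
image = ([RED] * 4 + [GREEN] * 4) * 4 + ([BLUE] * 4 + [GREY] * 4) * 4
encoded = qoi_encode(image, 8, 8)
print(len(encoded), encoded.hex().upper())
```

If the assumed pixel layout matches the example image, this prints 57 followed by the same hexadecimal string as the result of the walk-through.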

This encoded result has a total size of 57 bytes. If all 64 pixels (8 by 8) had to be stored as raw data with an alpha channel, this would take 256 bytes. Without an alpha channel, it would take 192 bytes, so the encoded result is a reduction of about 70%!!
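The size comparison is simple arithmetic; the following lines only restate it, with variable names chosen for this illustration:

```python
encoded_size = 57                 # bytes, from the result of the example
raw_rgb = 8 * 8 * 3               # 192 bytes without an alpha channel
raw_rgba = 8 * 8 * 4              # 256 bytes with an alpha channel

print(f"reduction versus raw RGB:  {1 - encoded_size / raw_rgb:.1%}")
print(f"reduction versus raw RGBA: {1 - encoded_size / raw_rgba:.1%}")
```

This gives 70.3% against the raw RGB data and 77.7% against the raw RGBA data.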

If we assume that one image is to be recorded every 40 ms, this results in 25 images per second, which is approximately the frame rate of video. If the sensor produces one image every 40 ms, each image also has to be encoded within that time. In the simulation, as shown below, the encoding of a single image takes roughly 190000 ns. If we extrapolate this result, we get: 190000 ns / 64 pixels = 2.97 µs per pixel, and thus a 480p image would take 640*480*2.97 µs = 307200*2.97 µs = 912384 µs = 912.4 ms, which is far more than the 40 ms available.
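The extrapolation can be reproduced as below. The variable names are illustrative; the only inputs are the 190000 ns measured in the simulation and the 40 ms frame period.

```python
ns_per_image = 190_000                       # measured for the 8-by-8 image
us_per_pixel = ns_per_image / 64 / 1000      # about 2.97 us
pixels_480p = 640 * 480                      # 307200 pixels

# The text rounds the per-pixel time to 2.97 us before multiplying.
ms_rounded = pixels_480p * round(us_per_pixel, 2) / 1000
ms_exact = pixels_480p * us_per_pixel / 1000

print(f"{us_per_pixel:.2f} us per pixel")
print(f"{ms_rounded:.1f} ms with the rounded value, {ms_exact:.1f} ms without rounding")
print(f"{ms_rounded / 40:.1f} times the 40 ms budget")
```

With the rounded per-pixel time the result is 912.4 ms, and without rounding it is 912.0 ms. Either way the extrapolated encoding time is roughly 23 times the 40 ms available per image.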

Image of simulation

… and this is an extrapolation!!

It is hard to give a fixed duration for the encoding of a single image. An image of 307'200 consecutive white pixels will be encoded more quickly than an image of 307'200 random pixels.