## Sunday, October 26, 2014

### The Painless Guide to CRC isn't quite painless

So there's this “A Painless Guide to CRC Error Detection Algorithms,” which apparently tells you everything you wanted to know about CRCs, but were afraid to ask (or didn't really want to ask). The concept itself seems to be easy—a CRC is just the remainder of a particular type of division. The numerator is the data (treated as one long number) and the demoninator the “polynomial” of the CRC (even though it's a value, the specification for a given CRC is a polynomial equation—go figure). The steps of the algorithm are very simple:

- Load the remainder with zero bits.
- Augment the message by appending zero bits (equal to the size of the remainder) to the end of it.
- While (more message bits)
- Shift the remainder left by one bit, reading the next bit of the augmented message into bit position 0 of the remainder.
- If a 1 bit popped out of the remainder during the previous step, XOR the result with the polynomial

- We now have the remainder.

The size of the remainder is the size of our CRC—16 for a 16-bit CRC, 32 for a 32-bit CRC, etc. And from that description, the code pretty much follows:

/********************************************************** * Straightforward CRC implementation * Section 8 of the Guide ***********************************************************/ uint32_t crcsim(uint32_t crc,const uint8_t *p,size_t size) { size_t i; int c; int xor; while(size--) { c = *p++; for (i = 0 ; i < 8 ; i++) { xor = crc & 0x80000000uL; /* flag if we need to xor the polynomial */ crc = crc << 1; /* shift the crc register */ crc |= (c & 0x80) ? 1 : 0; /* and shift in the data bit */ if (xor) crc ^= 0x04C11DB7uL; c = c << 1; /* shift data bit */ } } return crc; }

Although, this *is* the rare case where it's easier to write it in
assembly that it is in C, since we have access to the carry bit when
shifting, which makes it easier to check:

crc32a push ebp ; boiler plate code for any mov ebp,esp ; C callable code push ebx push esi push edi mov edx,[ebp + 8] ; load passed in CRC mov esi,[ebp + 12] ; ptr to data mov ecx,[ebp + 16] ; count of data mov edi,0x04C11DB7 ; CRC polynomial .main lodsb ; read next data byte mov bl,8 ; # of bits to go through .10 shl al,1 ; shift data bit into poly register rcl edx,1 ; shift high bit out of poly register jnc .15 ; if it wasn't set, skip xor edx,edi ; xoring the CRC polynomial .15 dec bl ; more bits? jnz .10 ; if so, keep going loop .main ; do next byte mov eax,edx ; return CRC pop edi ; save pushed registers pop esi pop ebx pop ebp ret

(Note: the core of the algorithm in assembly is four instructions---the C compiler didn't do quite as good a job from the six lines of C comprising the core of the algorithm---it's about six times the object code.)

What is not shown in the code above (either version) is the agumentation step of adding additional 0-bits to the message—that's left up to the caller of these routines.

Both of these routines give the same result. Other implementations I did based upon the Guide also give the same results. And they're consistent with the results of the reference code given in the Guide.

/******************************************************************* * Table implementation * Section 9 of the Guide * Requires trailing zero bits. ********************************************************************/ const uint32_t crctable[256] = { ... }; uint32_t crc32z(uint32_t crc,const uint8_t *p,size_t size) { while(size--) crc = ((crc << 8) | *p++) ^ crctable[crc >> 24]; crc = (crc << 8) ^ crctable[crc >> 24]; crc = (crc << 8) ^ crctable[crc >> 24]; crc = (crc << 8) ^ crctable[crc >> 24]; crc = (crc << 8) ^ crctable[crc >> 24]; return crc; } /***************************************************************** * Table implementation part deux---improved * Section 10 of the Guide * Does not need trailing zero bits. ******************************************************************/ uint32_t crc32c(uint32_t crc,const uint8_t *p,size_t size) { while(size--) crc = (crc << 8) ^ crctable[ (crc >> 24) ^ *p++]; return crc; } /***************************************************************** * Parameterized Model, from code given in the Guide * Section 15 of the Guide * * This is a reference implementation provided by the Guide to be * used for testing various CRC implementations. ******************************************************************/ uint32_t crcmod(uint32_t crc,const uint8_t *p,size_t size) { cm_t ctx; ctx.cm_width = 32; ctx.cm_poly = 0x04C11DB7uL; ctx.cm_init = crc; ctx.cm_refin = false; ctx.cm_refot = false; ctx.cm_xorot = 0; cm_ini(&ctx); cm_blk(&ctx,(p_ubyte_)p,(ulong)size); return (uint32_t)cm_crc(&ctx); }

Implementation | CRC result |
---|---|

`crcsim()` | `89A1897F` |

`crc32z()` | `89A1897F` |

`crc32c()` | `89A1897F` |

`crcmod()` | `89A1897F` |

So far so good. But that isn't the result from the standard CRC-32 implementation, which is used by Ethernet, ZIP, gzip, PNG and a few other standards. No, CRC-32 uses what the Guide calls a “reflected” table mode, which came about because of hardware CRC-implementations start with the least significant bit of the byte; these algorithms start with the most significant bit of the byte.

Okay, so the bits are fed in backwards. That can be compensated for. Also, the standard CRC-32 algorithm mandates that the initial value of the remainder is all one bits, not zero bits. Easy to fix. And that the final remainder is to be exclusived-or'ed with all ones. Again, easy to do.

It all seems pretty straightforward. And while the Guide only goes over a table inplementation of the “reflected” mode, it seems like it would be straightforward (excuse the pun) to do reflected versions of all the implementations done so far.

And since the `zlib`

library uses the CRC-32, we can
link that in as a baseline to compare results.

So, with that out of the way, the code:

/********************************************************** * Straightforward CRC implementation, using reflected bytes * based on Section 9 of the Guide ***********************************************************/ uint32_t crcsimr(uint32_t crc,const uint8_t *p,size_t size) { size_t i; int c; int xor; crc = ~crc; while(size--) { c = *p++; for (i = 0 ; i < 8 ; i++) { xor = crc & 0x80000000uL; crc = crc << 1; crc |= (c & 0x01) ? 1 : 0; if (xor) crc ^= 0x04C11DB7uL; c = c >> 1; } } return crc; } /******************************************************************* * Table implementation, using a reflected table * based on Section 9 of the Guide ********************************************************************/ const uint32_t crctabler[256] = { ... }; uint32_t crc32rz(uint32_t crc,const uint8_t *p,size_t size) { crc = ~crc; while(size--) crc = ((crc >> 8) | *p++) ^ crctabler[ crc & 0xFF ]; crc = (crc >> 8) ^ crctabler[ crc & 0xFF ]; crc = (crc >> 8) ^ crctabler[ crc & 0xFF ]; crc = (crc >> 8) ^ crctabler[ crc & 0xFF ]; crc = (crc >> 8) ^ crctabler[ crc & 0xFF ]; return ~crc; } /***************************************************************** * Table implementation part deux---improved using reflected table * based on Section 10 of the Guide ******************************************************************/ uint32_t crc32r(uint32_t crc,const uint8_t *p,size_t size) { crc = ~crc; while(size--) crc = (crc >> 8) ^ crctabler[ (crc & 0xFF) ^ *p++]; return ~crc; } /***************************************************************** * Parameterized Model, from code given in the Guide * Section 15 of the Guide * * This is a reference implementation provided by the Guide to be * used for testing various CRC implementations. ******************************************************************/ uint32_t crcmodr(uint32_t crc,const uint8_t *p,size_t size) { cm_t ctx; ctx.cm_width = 32; ctx.cm_poly = 0x04C11DB7uL; ctx.cm_init = ~crc; ctx.cm_refin = true; ctx.cm_refot = true; ctx.cm_xorot = ~0; cm_ini(&ctx); cm_blk(&ctx,(p_ubyte_)p,(ulong)size); return (uint32_t)cm_crc(&ctx); }

And the results:

Implementation | CRC result |
---|---|

`crcsimr()` | `AF296EBB` |

`crc32rz()` | `717C74D2` |

`crc32r()` | `CBF43926` |

`crcmodr()` | `CBF43926` |

`zlib.crc32()` | `CBF43926` |

… um … that was rather unexpected.

I didn't think the code I wrote for reflected CRCs was that unreasonable based upon the information in the Guide, but I guess I was wrong for some of them.

Oh, and getting back to the non-reflected code—I didn't initialize the
results properly, nor did I exclusive-or the results. Hopefully, I'll get
`CBF43926`

when I do that.

Implementation | CRC result |
---|---|

`crcsim()` | `C8C3A78F` |

`crc32z()` | `C8C3A78F` |

`crc32c()` | `FC891918` |

`crcmod()` | `FC891918` |

Okay, now I'm horribly confused. There appears to be some missing
information in “A Painless Guide to CRC Error Detection Algorithms.” The
GNU
Radio implementation of CRC-32 uses the non-reflective table
implementation, and when I called that, I got back `FC891918`

, so
it's consistent with at least two of the CRC-32 non-reflected
implementations. But I'm concerned that the routines that require
additional zero bits aren't the same in this case. There has to be some
subtle difference between the two in this case that I don't see, and isn't
mentioned in the Guide at all.

I did find yet another comprehensive implemenation of CRCs—Danjel
McGougan's `universal_crc`

, and
every version of the non-reflected CRC-32 it generated (it generates either
bit-oriented code, or several table-driven implementations based on
tradeoffs betweeen speed and memory usage) returned `FC891918`

(even it's own bit-oriented version, which isn't the same as the one
described in the Guide).

Another thing I noticed by looking deeply into the abyss that is CRC, is
that my
first implementation of CRC-32 is flawed—I don't exclusive-or the
results with all ones at the end. I suspect that the code I based mine on
didn't bother with the exclusive-or when returning the CRC, but instead did
that elsewhere in the codebase. It's not a bug *per se*, but according to Numerical
Recipes in C:

Second, one can add (XOR) any

M-bit constantKto the CRC before it is transmitted … This has the advantage of detecting another kind of erorr that the CRC would otherwise not find: deletion of an initial 1 bit in the message with spurious insertion of a 1 bit at the end of the block.

The result is that there's a type of corruption that I won't catch. This code was the basis for the CRC implementation in a few programs at work (oops) but again, I don't think it's an outright show-stopping bug.

At some point, I may go through some of this on paper, one bit at a time, to see what's going on math-wise with the reflected and non-reflected table implementations with non-0 initial values.