The Painless Guide to CRC isn't quite painless

Sunday, October 26, 2014

So there's this “A Painless Guide to CRC Error Detection Algorithms,” which apparently tells you everything you wanted to know about CRCs, but were afraid to ask (or didn't really want to ask). The concept itself seems easy: a CRC is just the remainder of a particular type of division. The numerator is the data (treated as one long number) and the denominator is the “polynomial” of the CRC (even though it's used as a value, a given CRC is specified as a polynomial; go figure). The steps of the algorithm are very simple:

Load the remainder with zero bits.
Augment the message by appending zero bits (equal to the size of the remainder) to the end of it.
While (more message bits)
1. Shift the remainder left by one bit, reading the next bit of the augmented message into bit position 0 of the remainder.
2. If a 1 bit popped out of the remainder during the previous step, XOR the remainder with the polynomial.
We now have the remainder.

The size of the remainder (in bits) is the size of the CRC: 16 for a 16-bit CRC, 32 for a 32-bit CRC, and so on. From that description, the code pretty much follows:

#include <stddef.h>
#include <stdint.h>

/**********************************************************
* Straightforward CRC implementation
* Section 8 of the Guide
***********************************************************/

uint32_t crcsim(uint32_t crc,const uint8_t *p,size_t size)
{
  size_t   i;
  int      c;
  uint32_t xor; /* unsigned, so 0x80000000 fits without an implementation-defined conversion */

  while(size--)
  {
    c = *p++;
    for (i = 0 ; i < 8 ; i++)
    {
      xor = crc & 0x80000000uL;  /* flag if we need to xor the polynomial */
      crc = crc << 1;            /* shift the crc register */   
      crc |= (c & 0x80) ? 1 : 0; /* and shift in the data bit */
 
      if (xor)
        crc ^= 0x04C11DB7uL;
 
      c = c << 1; /* shift data bit */
    }
  }
  
  return crc;
}

Actually, this is the rare case where it's easier to write it in assembly than in C, since we have access to the carry flag when shifting, which makes checking the bit that popped out trivial:

crc32a          push    ebp		; boilerplate code for any
                mov     ebp,esp		; C callable code
                push    ebx
                push    esi
                push    edi

                mov     edx,[ebp + 8]   ; load passed in CRC
                mov     esi,[ebp + 12]  ; ptr to data
                mov     ecx,[ebp + 16]  ; count of data
                mov     edi,0x04C11DB7  ; CRC polynomial

.main           lodsb                   ; read next data byte
                mov     bl,8            ; # of bits to go through

.10             shl     al,1            ; shift next data bit into carry
                rcl     edx,1           ; carry into CRC, high bit out to carry
                jnc     .15             ; if it wasn't set, skip
                xor     edx,edi         ; otherwise XOR in the CRC polynomial

.15             dec     bl              ; more bits?
                jnz     .10             ; if so, keep going
                loop    .main           ; do next byte

                mov     eax,edx         ; return CRC
                pop     edi             ; restore pushed registers
                pop     esi
                pop     ebx
                pop     ebp
                ret

(Note: the core of the algorithm in assembly is four instructions. The C compiler didn't do nearly as good a job with the six lines of C that make up the core of the algorithm: its version is about six times as much object code.)

What isn't shown in the code above (either version) is the augmentation step of appending zero bits to the message, 32 of them (four zero bytes) for a 32-bit CRC; that's left up to the caller of these routines.

Both of these routines give the same result. Other implementations I wrote based on the Guide also give the same result, and they're consistent with the results of the reference code given in the Guide.

/*******************************************************************
* Table implementation
* Section 9 of the Guide
* Requires trailing zero bits (supplied by the four extra steps at the end).
********************************************************************/

const uint32_t crctable[256] = { ... };

uint32_t crc32z(uint32_t crc,const uint8_t *p,size_t size)
{
  while(size--)
    crc = ((crc << 8) | *p++) ^ crctable[crc >> 24];

  crc = (crc << 8) ^ crctable[crc >> 24];
  crc = (crc << 8) ^ crctable[crc >> 24];
  crc = (crc << 8) ^ crctable[crc >> 24];
  crc = (crc << 8) ^ crctable[crc >> 24];
  return crc;  
} 

/*****************************************************************
* Table implementation part deux---improved
* Section 10 of the Guide
* Does not need trailing zero bits.
******************************************************************/

uint32_t crc32c(uint32_t crc,const uint8_t *p,size_t size)
{
  while(size--)
    crc = (crc << 8) ^ crctable[ (crc >> 24) ^ *p++]; 
      
  return crc; 
}

/*****************************************************************
* Parameterized Model, from code given in the Guide
* Section 15 of the Guide
*
* This is a reference implementation provided by the Guide to be
* used for testing various CRC implementations.
******************************************************************/

uint32_t crcmod(uint32_t crc,const uint8_t *p,size_t size)
{ 
  cm_t ctx;  
 
  ctx.cm_width = 32;
  ctx.cm_poly  = 0x04C11DB7uL;
  ctx.cm_init  = crc;
  ctx.cm_refin = false;
  ctx.cm_refot = false;
  ctx.cm_xorot = 0;
  
  cm_ini(&ctx);
  cm_blk(&ctx,(p_ubyte_)p,(ulong)size);
  return (uint32_t)cm_crc(&ctx);
}

CRC of “123456789” using different implementations
Implementation	CRC result
`crcsim()`	`89A1897F`
`crc32z()`	`89A1897F`
`crc32c()`	`89A1897F`
`crcmod()`	`89A1897F`

So far so good. But that isn't the result from the standard CRC-32 implementation, which is used by Ethernet, ZIP, gzip, PNG and a few other standards. No, CRC-32 uses what the Guide calls a “reflected” table mode, which came about because hardware CRC implementations start with the least significant bit of each byte, while these algorithms start with the most significant bit.

Okay, so the bits are fed in backwards. That can be compensated for. Also, the standard CRC-32 algorithm mandates that the initial value of the remainder is all one bits, not zero bits. Easy to fix. And that the final remainder is to be exclusive-or'ed with all ones. Again, easy to do.

It all seems pretty straightforward. And while the Guide only goes over a table implementation of the “reflected” mode, it seems like it would be straightforward (excuse the pun) to do reflected versions of all the implementations done so far.

And since the zlib library implements CRC-32, we can link that in as a baseline for comparing results.

So, with that out of the way, the code:

/**********************************************************
* Straightforward CRC implementation, using reflected bytes
* based on Section 9 of the Guide
***********************************************************/

uint32_t crcsimr(uint32_t crc,const uint8_t *p,size_t size)
{
  size_t   i;
  int      c;
  uint32_t xor;
  
  crc = ~crc;
  while(size--)
  {
    c = *p++;
    for (i = 0 ; i < 8 ; i++)
    {
      xor = crc & 0x80000000uL;
      crc = crc << 1;
      crc |= (c & 0x01) ? 1 : 0;
      
      if (xor)
        crc ^= 0x04C11DB7uL;
 
      c = c >> 1;
    }
  }
  
  return crc;
}

/*******************************************************************
* Table implementation, using a reflected table
* based on Section 9 of the Guide
********************************************************************/

const uint32_t crctabler[256] = { ... };

uint32_t crc32rz(uint32_t crc,const uint8_t *p,size_t size)
{
  crc = ~crc;
  
  while(size--)
    crc = ((crc >> 8) | *p++) ^ crctabler[ crc & 0xFF ];

  crc = (crc >> 8) ^ crctabler[ crc & 0xFF ];
  crc = (crc >> 8) ^ crctabler[ crc & 0xFF ];
  crc = (crc >> 8) ^ crctabler[ crc & 0xFF ];
  crc = (crc >> 8) ^ crctabler[ crc & 0xFF ];

  return ~crc;
}

/*****************************************************************
* Table implementation part deux---improved using reflected table
* based on Section 10 of the Guide
******************************************************************/

uint32_t crc32r(uint32_t crc,const uint8_t *p,size_t size)
{
  crc = ~crc;
  
  while(size--)
    crc = (crc >> 8) ^ crctabler[ (crc & 0xFF) ^ *p++];
 
  return ~crc;
}

/*****************************************************************
* Parameterized Model, from code given in the Guide
* Section 15 of the Guide
*
* This is a reference implementation provided by the Guide to be
* used for testing various CRC implementations.
******************************************************************/

uint32_t crcmodr(uint32_t crc,const uint8_t *p,size_t size)
{
  cm_t ctx;

  ctx.cm_width = 32;
  ctx.cm_poly  = 0x04C11DB7uL;
  ctx.cm_init  = ~crc;
  ctx.cm_refin = true;
  ctx.cm_refot = true;
  ctx.cm_xorot = ~0;
  
  cm_ini(&ctx);
  cm_blk(&ctx,(p_ubyte_)p,(ulong)size);
  return (uint32_t)cm_crc(&ctx);
}

And the results:

CRC of “123456789” using different implementations, reflected
Implementation	CRC result
`crcsimr()`	`AF296EBB`
`crc32rz()`	`717C74D2`
`crc32r()`	`CBF43926`
`crcmodr()`	`CBF43926`
`zlib.crc32()`	`CBF43926`

… um … that was rather unexpected.

I didn't think the code I wrote for reflected CRCs was that unreasonable based upon the information in the Guide, but I guess I was wrong about some of them.

Oh, and getting back to the non-reflected code: I didn't initialize the CRC register properly, nor did I exclusive-or the final result. Hopefully, I'll get CBF43926 when I do that.

CRC of “123456789” using different implementations, non-reflected with proper initialization
Implementation	CRC result
`crcsim()`	`C8C3A78F`
`crc32z()`	`C8C3A78F`
`crc32c()`	`FC891918`
`crcmod()`	`FC891918`

Okay, now I'm horribly confused. There appears to be some missing information in “A Painless Guide to CRC Error Detection Algorithms.” The GNU Radio implementation of CRC-32 uses the non-reflected table implementation, and when I called it, I got back FC891918, so it's consistent with at least two of the non-reflected CRC-32 implementations. But I'm concerned that the routines that require the additional zero bits don't agree with the other two in this case. There has to be some subtle difference between the two approaches here that I don't see, and that isn't mentioned in the Guide at all.

I did find yet another comprehensive implementation of CRCs, Danjel McGougan's universal_crc, and every non-reflected CRC-32 version it generated returned FC891918. It can generate either bit-oriented code or several table-driven implementations that trade off speed against memory usage, and even its own bit-oriented version (which isn't the same as the one described in the Guide) returned FC891918.

Another thing I noticed by looking deeply into the abyss that is CRC is that my first implementation of CRC-32 is flawed: I don't exclusive-or the result with all ones at the end. I suspect that the code I based mine on didn't bother with the exclusive-or when returning the CRC, but instead did that elsewhere in the codebase. It's not a bug per se, but according to Numerical Recipes in C:

Second, one can add (XOR) any M-bit constant K to the CRC before it is transmitted … This has the advantage of detecting another kind of error that the CRC would otherwise not find: deletion of an initial 1 bit in the message with spurious insertion of a 1 bit at the end of the block.

The result is that there's a type of corruption that I won't catch. This code was the basis for the CRC implementation in a few programs at work (oops), but again, I don't think it's an outright show-stopping bug.

At some point, I may go through some of this on paper, one bit at a time, to see what's going on math-wise with the reflected and non-reflected table implementations with non-0 initial values.

The Boston Diaries
