Thursday, Debtember 15, 2022
Re: Conformance Should Mean Something - fputc, and Freestanding
Well, that’s okay, because I’m not one to just sit on my hands no matter how much silence I’m met with or how much crippling depression is running through my system: I reached out to a few folks who I knew worked on MISRA, met with them, and thankfully they brought it up in their group meeting on my behalf. Even if the Committee doesn’t want to / feel like commenting (and to be perfectly clear, they do not have to comment; it’s not like I wrote a paper and nobody owes me nothin’, Jack, including a response to my e-mail anyhow), at least MISRA could bring some clarity, right? They work with a ton of implementations, especially embedded/freestanding implementations, and so they should be able to give me good feedback. I contacted an implementer I have the utmost of faith in who attends MISRA functions, so they could bring the issue up at a meeting. They sort of hashed it out. People for/against the code snippet above, whether 2 could be returned validly, and whether what TI’s Run-Time Support Library was doing was standards-blessed behavior (ignoring any “Freestanding” weasel- ing)…
there was divergence on whether or not the snippet was illegal.
It is a little concerning that the body responsible for figuring out the dusty corners of the C standard and guaranteeing portable behavior are not sure if (a) they like what the code snippet implies or (b) which direction of implication they’d like it to go in. But, on the other hand, they are at least united in that some clarity around the subject would be helpful and that we should make it clear what we mean in these functions and in the specification. They’re sort of on top of moving the needle to make sure we are writing high-quality code that can stand the test of time, and “fwrite may not portably do what you want and you need to write a wrapper function before using it every time” needs to be something they should be keen on agreeing on before we can move forward with using basic file abstractions for C. Of course, this is the human-based, common, and shared understanding I was being told about before that would lead us to Nirvana, and what I’m unfortunately finding is that it’s not actually all that bound together in harmony.
Via Hacker News, Conformance Should Mean Something - fputc, and Freestanding | The PastureConformance Should Mean Something - fputc, and Freestanding | The Pasture
It is a mess. The code from the blog post works on most systems, but most systems these days use 8-bit characters; the article is about systems where a character is defined as 16-bits (allowed by the C Standard) and where an integer is also 16-bits (again, allowed by the C Standard and is the minimum size an integer can be per the C specification). It's rare to have non-8-bit characters on desktop computers these days (or even tablet and smart phones) but it seems it's not quite that rare in the embedded space, where you have DSPs that have weird architectures and a charater is most likely the same size as an integer. And that's where the trouble starts.
The main issue is with fputc(). The C Standard states:
The fputc function
Synopsis
#include <stdio.h>
int fputc(int c,FILE *stream);
Description
The fputc function writes the character specified by c (converted to an
unsigned char
) to the output stream pointed to by stream, at the position indicated by the associated file position indicator for the stream (if defined), and advances the indicator appropriately. If the file cannot support positioning requests, or if the stream was opened with append mode, the character is appended to the output stream.Returns
The fputc function returns the character written. If a write error occurs, the error indicator for the stream is set and fputc returns
EOF
.
If both char
and int
are the same size, then
this function can't work as is. The function assumes that the size
of int
is larger than the size of a char
, thus any
value of a signed or unsigned char
can be converted into an
int
or an EOF
, (a value unrepresentable as a
char
).
If char
and int
are the same size … yikes!
And from reading the blog post, it seems that most embedded systems will
clamp down on the values written by fputc() to be between 0 and
255, regardless of what you pass in, even when characters can be 16
bits in size. This is probably to remain interoperable with the rest of the
world where char
is 8-bits in size (Unicode
notwithstanding).
I'm also not sure about this bit from the blog post about fwrite(): “Okay, so it will loop and call through fputc. This is covered under the as-if wording, so it’s not like your standard library has to write exactly a loop of fputc.” I checked the standard, and it always mentions “as if” explicitly, like “this International Standard treats such an end-of-line indicator as if it were a single new-line character” (emphasis added) or “The implementation shall behave as if no library function calls the setlocale function.” (again, emphasis added). But no where is it mentioned in releation to fwrite().
Here's the C89 Standard on fwrite():
The fwrite function
Synopsis
#include <stdio.h>
int fwrite(const void * ptr,size_t size, size_t nmemb, FILE * stream);
Description
The fwrite function writes, from the array pointed to by ptr, up to nmemb elements whose size is specified by size, to the stream pointed to by stream. The file position indicator for the stream (if defined) is advanced by the number of characters successfully written. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate.
Returns
The fwrite function returns the number of elements successfully written, which will be less than nmemb only if a write error is encountered.
It's the C99 standard that added the sentence about calling fputc() (which I highlighted below):
The fwrite function
Synopsis
#include <stdio.h>
int fwrite(const void * restrict ptr,size_t size, size_t nmemb, FILE * restrict stream);
Description
The fwrite function writes, into the array pointed to by ptr, up to nmemb elements whose size is specified by size, from the stream pointed to by stream. For each object, size calls are made to the fputc function, taking the values (in order) from an array of
unsigned char
exactly overlaying the object. The file position indicator for the stream (if defined) is advanced by the number of characters successfully written. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate.Returns
The fwrite function returns the number of elements successfully written, which will be less than nmemb only if a write error is encountered. If size or nmemb is zero, fwrite returns zero and the state of the stream remains unchanged.
And nary an “as-if” in sight.
I have to wonder why that sentence was added to C99, if not to force calls to fputc(). I supposed the C Standards Comittee had a reason for it, and I don't think they would have omitted the “as if.” If they did, they failed to add it to the C11 and the proposed C2x standards. So I'm not sure if an implementation of fwrite() can avoid calling fgetc().
And unrelated to this post, I did come across this lovely footnote in the C99 standard:
Setting the file position indicator to end-of-file, as with
fseek(file, 0, SEEK_END)
, has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.
Seriously?
It's not even “implementation defined?” Because that sounds like an implementation detail (for example, on CP/M). But undefined? Come on!
Worse, it's not even listed in “Appendix J.2 Undefined behavior.”