The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Thursday, Debtember 15, 2022

Re: Conformance Should Mean Something - fputc, and Freestanding

Well, that’s okay, because I’m not one to just sit on my hands no matter how much silence I’m met with or how much crippling depression is running through my system: I reached out to a few folks who I knew worked on MISRA, met with them, and thankfully they brought it up in their group meeting on my behalf. Even if the Committee doesn’t want to / feel like commenting (and to be perfectly clear, they do not have to comment; it’s not like I wrote a paper and nobody owes me nothin’, Jack, including a response to my e-mail anyhow), at least MISRA could bring some clarity, right? They work with a ton of implementations, especially embedded/freestanding implementations, and so they should be able to give me good feedback. I contacted an implementer I have the utmost of faith in who attends MISRA functions, so they could bring the issue up at a meeting. They sort of hashed it out. People for/against the code snippet above, whether 2 could be returned validly, and whether what TI’s Run-Time Support Library was doing was standards-blessed behavior (ignoring any “Freestanding” weasel- ing)…

there was divergence on whether or not the snippet was illegal.

It is a little concerning that the body responsible for figuring out the dusty corners of the C standard and guaranteeing portable behavior are not sure if (a) they like what the code snippet implies or (b) which direction of implication they’d like it to go in. But, on the other hand, they are at least united in that some clarity around the subject would be helpful and that we should make it clear what we mean in these functions and in the specification. They’re sort of on top of moving the needle to make sure we are writing high-quality code that can stand the test of time, and “fwrite may not portably do what you want and you need to write a wrapper function before using it every time” needs to be something they should be keen on agreeing on before we can move forward with using basic file abstractions for C. Of course, this is the human-based, common, and shared understanding I was being told about before that would lead us to Nirvana, and what I’m unfortunately finding is that it’s not actually all that bound together in harmony.

Via Hacker News, Conformance Should Mean Something - fputc, and Freestanding | The PastureConformance Should Mean Something - fputc, and Freestanding | The Pasture

It is a mess. The code from the blog post works on most systems, but most systems these days use 8-bit characters; the article is about systems where a character is defined as 16-bits (allowed by the C Standard) and where an integer is also 16-bits (again, allowed by the C Standard and is the minimum size an integer can be per the C specification). It's rare to have non-8-bit characters on desktop computers these days (or even tablet and smart phones) but it seems it's not quite that rare in the embedded space, where you have DSPs that have weird architectures and a charater is most likely the same size as an integer. And that's where the trouble starts.

The main issue is with fputc(). The C Standard states:

The fputc function

Synopsis

#include <stdio.h>
int fputc(int c,FILE *stream);

Description

The fputc function writes the character specified by c (converted to an unsigned char) to the output stream pointed to by stream, at the position indicated by the associated file position indicator for the stream (if defined), and advances the indicator appropriately. If the file cannot support positioning requests, or if the stream was opened with append mode, the character is appended to the output stream.

Returns

The fputc function returns the character written. If a write error occurs, the error indicator for the stream is set and fputc returns EOF.

If both char and int are the same size, then this function can't work as is. The function assumes that the size of int is larger than the size of a char, thus any value of a signed or unsigned char can be converted into an int or an EOF, (a value unrepresentable as a char).

If char and int are the same size … yikes!

And from reading the blog post, it seems that most embedded systems will clamp down on the values written by fputc() to be between 0 and 255, regardless of what you pass in, even when characters can be 16 bits in size. This is probably to remain interoperable with the rest of the world where char is 8-bits in size (Unicode notwithstanding).

I'm also not sure about this bit from the blog post about fwrite(): “Okay, so it will loop and call through fputc. This is covered under the as-if wording, so it’s not like your standard library has to write exactly a loop of fputc.” I checked the standard, and it always mentions “as if” explicitly, like “this International Standard treats such an end-of-line indicator as if it were a single new-line character” (emphasis added) or “The implementation shall behave as if no library function calls the setlocale function.” (again, emphasis added). But no where is it mentioned in releation to fwrite().

Here's the C89 Standard on fwrite():

The fwrite function

Synopsis

#include <stdio.h>
int fwrite(const void * ptr,size_t size, size_t nmemb, FILE * stream);

Description

The fwrite function writes, from the array pointed to by ptr, up to nmemb elements whose size is specified by size, to the stream pointed to by stream. The file position indicator for the stream (if defined) is advanced by the number of characters successfully written. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate.

Returns

The fwrite function returns the number of elements successfully written, which will be less than nmemb only if a write error is encountered.

It's the C99 standard that added the sentence about calling fputc() (which I highlighted below):

The fwrite function

Synopsis

#include <stdio.h>
int fwrite(const void * restrict ptr,size_t size, size_t nmemb, FILE * restrict stream);

Description

The fwrite function writes, into the array pointed to by ptr, up to nmemb elements whose size is specified by size, from the stream pointed to by stream. For each object, size calls are made to the fputc function, taking the values (in order) from an array of unsigned char exactly overlaying the object. The file position indicator for the stream (if defined) is advanced by the number of characters successfully written. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate.

Returns

The fwrite function returns the number of elements successfully written, which will be less than nmemb only if a write error is encountered. If size or nmemb is zero, fwrite returns zero and the state of the stream remains unchanged.

And nary an “as-if” in sight.

I have to wonder why that sentence was added to C99, if not to force calls to fputc(). I supposed the C Standards Comittee had a reason for it, and I don't think they would have omitted the “as if.” If they did, they failed to add it to the C11 and the proposed C2x standards. So I'm not sure if an implementation of fwrite() can avoid calling fgetc().


And unrelated to this post, I did come across this lovely footnote in the C99 standard:

Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.

Seriously?

It's not even “implementation defined?” Because that sounds like an implementation detail (for example, on CP/M). But undefined? Come on!

Worse, it's not even listed in “Appendix J.2 Undefined behavior.”

Obligatory Picture

[The future's so bright, I gotta wear shades]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.