Short Read/Write and Buffered File

Recently, I was bitten by a bug in my code where the buffer was too small and reads returned prematurely without error. This led me on a deep dive into the madness of what I later learned is called a “short read” on Linux.

Yes, this article is mostly me rambling. If you are only interested in short read/write, please read this seriously written article.

First, you probably already know that blocking syscalls like p?(read|write)v? may short read or short write: they can return after transferring fewer bytes than you asked for, without reporting any error. This is nothing fancy.
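The usual defence is to wrap read in a loop. Here is a minimal sketch, assuming a blocking file descriptor; read_full is a name I made up for illustration, not a libc function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of any libc): keep calling read(2) until
 * `len` bytes have arrived, EOF is reached, or a real error occurs.
 * Returns the number of bytes read (less than `len` only at EOF),
 * or -1 with errno set. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n > 0) {
            done += (size_t)n;   /* possibly a short read: go around again */
        } else if (n == 0) {
            break;               /* EOF: nothing more will ever arrive */
        } else if (errno != EINTR) {
            return -1;           /* real error, errno is set by read */
        }
        /* EINTR: interrupted before any data was transferred, retry */
    }
    return (ssize_t)done;
}
```

A plain read on a pipe holding 5 bytes returns 5 even if you ask for 16, while the loop keeps going until the buffer is full or the writer closes its end.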

Because of this correspondence, the same operations in io_uring may also short r/w. (As it turns out, in earlier kernels, they won’t!)

This does not make much sense to me, since io_uring allows linking requests together, and you can’t link read/write in any request chain whatsoever because the kernel might short r/w! Pain.

In the article about short r/w, I read that f(read|write) won’t do short IO! To find out how it works, we return once again to musl libc. Get a copy of musl libc and follow along.
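To convince myself, here is a small experiment, assuming a POSIX system with fork and pipes. A child writes 10 bytes in two bursts and the parent asks fread for all 10 in one call. The function name fread_across_bursts is made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo: returns the item count reported by a single fread
 * call on a pipe whose writer delivers the data in two separate bursts.
 * Returns 0 if the setup fails. */
size_t fread_across_bursts(char out[10])
{
    assert(out != NULL);
    int fds[2];
    if (pipe(fds) != 0)
        return 0;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }
    if (pid == 0) {                       /* child: the slow writer */
        struct timespec pause = { 0, 100 * 1000 * 1000 };  /* 100 ms */
        close(fds[0]);
        if (write(fds[1], "hello", 5) != 5)
            _exit(1);
        nanosleep(&pause, NULL);          /* a raw read() would return 5 here */
        if (write(fds[1], "world", 5) != 5)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);                        /* parent only reads */
    FILE *f = fdopen(fds[0], "r");
    if (!f) {
        close(fds[0]);
        waitpid(pid, NULL, 0);
        return 0;
    }
    size_t n = fread(out, 1, 10, f);      /* loops internally until 10 or EOF */
    fclose(f);
    waitpid(pid, NULL, 0);
    return n;
}
```

fread only comes back with fewer items than requested on EOF or error, which is exactly the loop I would otherwise have to write by hand.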

We first examine fread in musl libc.

  • In fread, the FILE* is locked to a thread (FLOCK) using the Linux futex syscall. That’s why the functions are thread-safe!
  • _IO_FILE has a virtual table in it! See file->(read|write|seek). Because of this, fopencookie is possible.
  • Every file has a fixed-size buffer. See #define BUFSIZ 1024. Because of this, fmemopen is possible.
  • You can provide your own buffer when opening a file using __fopen_rb_ca. You can set the buffer size of a FILE* with setvbuf.
  • The actual reading is done with readv, which is given two buffers: the one from the user and the internal fixed-size buffer. See __stdio_read.
  • speculative execution… SPECULATIVE EXECUTION?!
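The two-buffer readv trick is worth a sketch. This is a simplified illustration of the idea, not musl's actual __stdio_read; read_with_spill and its parameters are names I invented.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical sketch (not musl's code): a single readv(2) fills the
 * caller's buffer first and lets any surplus land in a stream-owned
 * buffer, where later reads can pick it up without another syscall.
 * Returns the bytes placed in `dst`, 0 at EOF, or -1 on error.
 * *spilled receives the number of bytes that went into `spill`. */
ssize_t read_with_spill(int fd, void *dst, size_t dst_len,
                        void *spill, size_t spill_len, size_t *spilled)
{
    assert(spilled != NULL);
    struct iovec iov[2] = {
        { .iov_base = dst,   .iov_len = dst_len   },  /* user buffer */
        { .iov_base = spill, .iov_len = spill_len },  /* internal buffer */
    };

    *spilled = 0;
    ssize_t n = readv(fd, iov, 2);
    if (n <= 0)
        return n;                          /* EOF or error */
    if ((size_t)n > dst_len) {
        *spilled = (size_t)n - dst_len;    /* surplus is buffered */
        return (ssize_t)dst_len;
    }
    return n;                              /* may still be a short read */
}
```

The point of the design, as far as I can tell, is that one syscall both satisfies the caller and refills the buffer.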

As it turns out, buffering and using the same file handle for read and write need special consideration. __toread handles clearing out the write buffer before trying to read in anything.
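From the caller's side, the C standard asks for a flush or a repositioning call between a write and a following read on the same stream. A minimal sketch, assuming tmpfile() can create a file; write_then_read is a made-up name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical demo: write `len` bytes through a FILE*, then read them
 * back through the same FILE*. Returns the number of bytes read back,
 * or 0 if anything fails. */
size_t write_then_read(const char *msg, size_t len, char *out)
{
    assert(msg != NULL && out != NULL);
    FILE *f = tmpfile();                 /* opened in update mode ("wb+") */
    if (!f)
        return 0;

    size_t got = 0;
    if (fwrite(msg, 1, len, f) == len
        /* The seek pushes out the pending write buffer and makes it
         * legal to switch the stream from writing to reading. */
        && fseek(f, 0, SEEK_SET) == 0)
        got = fread(out, 1, len, f);

    fclose(f);
    return got;
}
```

Skipping the fseek (or an fflush) between the two calls is undefined behaviour as far as the standard is concerned, whatever a particular libc happens to do.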

Next, we turn our eyes to fwrite. Besides the oddly specific logic where '\n' determines when the syscall is actually issued, we also find __towrite, in which the author documents their experience with summoning nasal demons.
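The '\n' logic is line buffering, and it can be observed from outside. A minimal sketch, assuming a POSIX system; line_buffer_demo and drain are names I made up. It puts a line-buffered FILE* on the write end of a pipe and checks how many bytes have reached the kernel after each fwrite.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read whatever is in the (non-blocking) pipe right now; 0 if empty. */
static size_t drain(int fd, char *out, size_t cap)
{
    ssize_t n = read(fd, out, cap);
    return n > 0 ? (size_t)n : 0;
}

/* Hypothetical demo: reports how many bytes reached the pipe after a
 * write without a newline (*before_nl) and after a write ending in a
 * newline (*after_nl). Returns 0 on success, -1 on setup failure. */
int line_buffer_demo(size_t *before_nl, size_t *after_nl)
{
    assert(before_nl != NULL && after_nl != NULL);
    int fds[2];
    char buf[64];

    if (pipe(fds) != 0)
        return -1;
    FILE *f = fdopen(fds[1], "w");
    if (!f || fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0
           || setvbuf(f, NULL, _IOLBF, BUFSIZ) != 0) {
        if (f) fclose(f); else close(fds[1]);
        close(fds[0]);
        return -1;
    }

    fwrite("abc", 1, 3, f);                 /* no newline: stays in the buffer */
    *before_nl = drain(fds[0], buf, sizeof buf);

    fwrite("def\n", 1, 4, f);               /* newline: buffer is written out */
    *after_nl = drain(fds[0], buf, sizeof buf);

    fclose(f);
    close(fds[0]);
    return 0;
}
```

On the libcs I know of, nothing shows up in the pipe until the newline arrives, at which point all 7 bytes are written together.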

Scary, the whole experience is.

Update on 2024-03-30:
I now know that atomic write exists on Linux, where the kernel ensures that short writes do not happen often. So I guess I can chain write requests together in io_uring and expect it to work? If one fails (writes less than expected), revert the whole operation with seek.
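I have not tried this with io_uring yet, so here is the "revert with seek" idea sketched with plain write(2) instead. Everything here is an assumption of mine: the names write_or_rewind and demo_forced_short_write are made up, the fd must be seekable and not opened with O_APPEND, and the demo provokes a short write by lowering RLIMIT_FSIZE.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: issue one write(2); if it fails or comes up short,
 * seek back to the starting offset so the caller can retry or give up.
 * Returns 0 if all `len` bytes were written, -1 otherwise. Bytes from a
 * short write stay in the file until something overwrites them. */
int write_or_rewind(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    off_t start = lseek(fd, 0, SEEK_CUR);
    if (start == (off_t)-1)
        return -1;

    ssize_t n = write(fd, buf, len);
    if (n >= 0 && (size_t)n == len)
        return 0;

    int saved = errno;              /* only meaningful when n < 0 */
    lseek(fd, start, SEEK_SET);     /* undo the offset change */
    errno = saved;
    return -1;
}

/* Hypothetical demo: cap the file size at 8 bytes so a 10-byte write is
 * cut short. Returns 1 if the helper reported failure and the offset is
 * back at 0, 0 if not, -1 on setup failure. */
int demo_forced_short_write(void)
{
    struct rlimit old, tight;
    FILE *f = tmpfile();
    if (!f || getrlimit(RLIMIT_FSIZE, &old) != 0) {
        if (f) fclose(f);
        return -1;
    }
    tight = old;
    tight.rlim_cur = 8;
    if (setrlimit(RLIMIT_FSIZE, &tight) != 0) {
        fclose(f);
        return -1;
    }

    int fd = fileno(f);
    int rc = write_or_rewind(fd, "0123456789", 10);
    off_t pos = lseek(fd, 0, SEEK_CUR);

    setrlimit(RLIMIT_FSIZE, &old);  /* restore the original limit */
    fclose(f);
    return rc == -1 && pos == 0;
}
```

With io_uring the offset is usually passed explicitly in each request, so the "seek" there would amount to remembering the starting offset and reusing it.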