Short Read/Write and Buffered File
Recently, I was bitten by a bug in my code: a buffer was too small, and reads returned prematurely without an error. This led me on a deep dive into the madness of what I later learned is called a “short read” on Linux.
Yes, this article is mostly me rambling. If you are only interested in short read/write, please read this seriously written article.
First, you probably already know that blocking syscalls like `p?(read|write)v?` may short read or short write. This is nothing fancy.
Because io_uring mirrors these syscalls, the same operations in io_uring may also short r/w. (As it turns out, in earlier kernels, they won’t!)
This does not make much sense to me, since io_uring allows linking requests together, and you can’t link read/write in any request chain whatsoever because the kernel might short r/w! Pain.
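For the blocking syscalls, the usual workaround is to loop until the whole buffer has been transferred. Here is a minimal sketch of that idea. The names `read_full` and `write_full` are mine, not from any library, and the sketch assumes a blocking file descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read up to len bytes, retrying after short reads.
 * Returns the number of bytes read (less than len only at EOF),
 * or -1 on error with errno set. Hypothetical helper. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n == 0)
            break;                /* EOF: nothing more will arrive */
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted before any data: retry */
            return -1;
        }
        done += (size_t)n;        /* short read: loop for the rest */
    }
    return (ssize_t)done;
}

/* Write exactly len bytes, retrying after short writes.
 * Returns len on success, or -1 on error with errno set.
 * Hypothetical helper. */
ssize_t write_full(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted before any data: retry */
            return -1;
        }
        done += (size_t)n;        /* short write: loop for the rest */
    }
    return (ssize_t)done;
}
```

Note that this loop is exactly what a linked chain of io_uring requests cannot express on its own, since the retry depends on the result of the previous request.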
In the article about short r/w, I read that `f(read|write)` won’t do short I/O! To find out how it works, we return once again to musl libc. Get a copy of musl libc and follow along.
We first examine `fread` in musl libc.
- In `fread`, the `FILE*` is locked to a thread (`FLOCK`) using the Linux `futex` syscall. That’s why the functions are thread-safe!
- `_IO_FILE` has a virtual table in it! See `file->(read|write|seek)`. Because of this, `fopencookie` is possible.
- Every file has a fixed-size buffer. See `#define BUFSIZ 1024`. Because of this, `fmemopen` is possible. You can provide your own buffer when opening a file using `__fopen_rb_ca`. You can set the buffer size of a `FILE*` with `setvbuf`.
- The actual reading is done with `readv`, where two buffers are given: one from the user, and one that is the internal fixed-size buffer. See `__stdio_read`.
- speculative execution… SPECULATIVE EXECUTION?!
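The `readv` trick from the list above can be sketched as follows. This is a simplified illustration, not musl's actual `__stdio_read`: `struct mini_file` and `mini_read` are hypothetical names, the internal buffer is assumed to be empty before the call, and error handling is minimal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for the buffer-related fields of a FILE. */
struct mini_file {
    int fd;
    unsigned char buf[1024];  /* internal fixed-size buffer */
    size_t rpos, rend;        /* unread bytes are buf[rpos..rend) */
};

/* Issue one readv: fill the caller's buffer first and let any extra
 * data spill into the internal buffer for later reads.
 * Returns bytes delivered to dst, 0 at EOF, or -1 on error. */
ssize_t mini_read(struct mini_file *f, void *dst, size_t len)
{
    struct iovec iov[2] = {
        { .iov_base = dst,    .iov_len = len },
        { .iov_base = f->buf, .iov_len = sizeof f->buf },
    };
    ssize_t n;

    assert(f->rpos == f->rend);   /* assumption: internal buffer is empty */
    n = readv(f->fd, iov, 2);
    if (n <= 0)
        return n;                 /* EOF or error */
    if ((size_t)n <= len) {
        f->rpos = f->rend = 0;    /* nothing spilled over */
        return n;
    }
    f->rpos = 0;                  /* the surplus now sits in f->buf */
    f->rend = (size_t)n - len;
    return (ssize_t)len;
}
```

With one syscall, a small request is satisfied and the internal buffer is refilled at the same time, so the next small read may not need a syscall at all.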
As it turns out, buffering and using the same file handle for both read and write need special consideration. `__toread` handles clearing out the write buffer before trying to read in anything.
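From the caller's side, the portable rule in ISO C is that on a stream opened for update, output must not be directly followed by input without an intervening `fflush` or file positioning call. A small sketch of following that rule (`write_then_read` is a hypothetical helper of mine, not part of musl):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write text to an update stream, then read it back from the start.
 * out receives at most outlen - 1 bytes plus a terminating NUL.
 * Returns 0 on success, -1 on failure. Hypothetical helper. */
int write_then_read(FILE *f, const char *text, char *out, size_t outlen)
{
    size_t n;

    assert(f != NULL && text != NULL && out != NULL && outlen > 0);
    if (fputs(text, f) == EOF)
        return -1;
    /* Switching from writing to reading: a flush or a positioning
     * call has to come in between. fseek also flushes the pending
     * write buffer. */
    if (fseek(f, 0L, SEEK_SET) != 0)
        return -1;
    n = fread(out, 1, outlen - 1, f);
    if (ferror(f))
        return -1;
    out[n] = '\0';
    return 0;
}
```

Calling it with a stream from `tmpfile()` (which opens in update mode) and the text `"hello"` should give back the same string.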
Next, we turn our eyes to `fwrite`. Besides the oddly specific logic where `'\n'` determines when to actually issue the syscall, we also find `__towrite`, wherein the author documents their experience with summoning nasal demons.
Scary, the whole experience is.
Update on 2024-03-30:
I now know that atomic writes exist on Linux, where the kernel ensures that short writes do not happen often. So I guess I can chain `write` requests together in io_uring and expect it to work? If one fails (writes less than expected), revert the whole operation with `seek`.