
XFS, stable kernels, and -rc releases

By Jonathan Corbet
December 3, 2020
Ever since the stable-update process was created, there have been questions about which patches are suitable for inclusion in those updates; usually, these discussions are driven by people who think that the criteria should be more restrictive. A regression in the XFS filesystem that found its way into the 5.9.9 stable update briefly rekindled this discussion. In one sense, there was little new ground covered in this iteration, but there was an interesting point raised about the relationship between stable updates and the mainline kernel -rc releases.

In the beginning, stable updates were restricted to critical fixes only, but the rules were relaxed over time. The patches merged for stable updates now are often automatically selected using a machine-learning system; others are picked because they look like they fix something somewhere. The result has been a massive increase in the number of patches going into the stable updates; the 5.9.x series has had over 1,900 patches applied through 5.9.11, while the delta between 4.9 and 4.9.246 is well over 18,000 patches.
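Those patch counts are easy to reproduce from a git checkout of the stable tree. A minimal sketch, assuming a local linux-stable clone at a hypothetical path (the tags themselves are real stable releases):

    # Rough sketch: count the patches applied between two tags in a
    # linux-stable checkout. The repository path is a placeholder.
    import subprocess

    def patch_count(repo, old_tag, new_tag):
        """Number of commits reachable from new_tag but not from old_tag."""
        out = subprocess.run(
            ["git", "-C", repo, "rev-list", "--count", f"{old_tag}..{new_tag}"],
            capture_output=True, text=True, check=True,
        )
        return int(out.stdout.strip())

    if __name__ == "__main__":
        repo = "/path/to/linux-stable"   # hypothetical checkout location
        print("5.9 -> 5.9.11:", patch_count(repo, "v5.9", "v5.9.11"))
        print("4.9 -> 4.9.246:", patch_count(repo, "v4.9", "v4.9.246"))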

Incorporating all those patches undoubtedly has the effect of increasing the number of useful fixes in the stable releases, which is a good thing. But it also increases the chances of merging bad patches that provide users with something other than the problem-free experience they were looking for.

For example, this XFS "fix" from Darrick Wong was posted to the linux-xfs list on November 9; it was reviewed, applied, and pushed to the mainline four days later, where it appeared in the 5.10-rc4 release. On the 17th, Greg Kroah-Hartman included this patch in the 5.9.9 review cycle, along with 254 other fixes. No objections were raised, and the patch was part of the 5.9.9 release on the 19th, ten days after it was originally posted.

It took three days for Nick Alcock to report that he was getting XFS corruption errors when running a current stable kernel. Wong's patch was the source of this problem — indeed, the XFS developers had already reverted it in the mainline for 5.10-rc5 (with Eric Sandeen credited for the report). That revert found its way into the 5.9.11 update on November 24. So there was a period of about five days when the current stable 5.9 kernel had a bug that made XFS filesystems unusable in some configurations. It's worth noting that, even though XFS reported corruption, no actual filesystems were corrupted; they simply refused to mount.

That was the end of this incident — at least, until the stable-kernel machine-learning system picked another XFS patch for a future 5.9.x release. At that point, Dave Chinner objected:

We've already had one XFS upstream kernel regression in this -rc cycle propagated to the stable kernels in 5.9.9 because the stable process picked up a bunch of random XFS fixes within hours of them being merged by Linus. One of those commits was a result of a thinko, and despite the fact we found it and reverted it within a few days, users of stable kernels have been exposed to it for a couple of weeks. That *should never have happened*.

Chinner said that the stable process is cutting out some of the time that is needed to find problems in patches merged into the mainline, and requested that XFS patches not be selected for stable updates until after they have appeared in a final mainline release.

Sasha Levin, who maintains the automated patch-selection system, replied that the real failing was in the XFS quality-control process which, he said, should never have let the buggy patch through in the first place. By the time something is in the mainline, he said, it is assumed to be correct:

The stable process assumes that commits that ended up upstream were reviewed and tested; the stable process doesn't offer much in the way of in-depth review of specific patches but mostly focuses on testing the product of backporting hundreds of patches into each stable branch.

Chinner took strong exception to Levin's attempt to pin the blame on the XFS side. The bug, it seems, only manifests in settings where a newer user space is interacting with an older kernel, which is not a scenario that is tested prior to merging patches. Follow-on testing done by the XFS developers did eventually turn up the problem, at which point it was fixed. It is not reasonable, he said, to expect that every patch go through that degree of testing before being applied.

The important point, he said, was that not all problems can realistically be found before patches land in the mainline:

I'll repeat the lesson to be learned here: merging a commit into Linus's tree does not mean it is fully tested and production ready. It just means it's passed a wide range of unit tests without regressions and so is considered ready for wider integration testing by a larger population of developers and testers.

He said, again, that the stable process needs to slow down some to allow for that wider testing to take place, and later repeated this argument in a separate discussion, drawing an objection from Kroah-Hartman:

That means that the world would go for 3 months without some known fixes being applied to the tree? That's not acceptable to me, as I started doing this because it was needed to be done, not just because I wanted to do more work...

Chinner then clarified that most subsystems, as he saw it, did not need this sort of delay. He also suggested that perhaps waiting for two -rc cycles would be sufficient for subsystems where the risk of catastrophic regressions (filesystems, for example) is high.

The need for more time to test patches before applying them to the stable updates has been recognized in the past. Initially, any patch that had made it into the mainline was considered fair game for stable-update consideration. Later on, though, the rule was changed to require patches to appear in at least one -rc release first. But that still does not give a whole lot of time for problems in the mainline to come to light.
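The "at least one -rc release" rule is straightforward to check mechanically; git can report which tag first contains a given commit. A minimal sketch (the repository path and example SHA are placeholders, not taken from the article):

    # Ask git which mainline tag first contains a commit; that is enough to
    # tell whether a fix has been in at least one -rc (or final) release.
    # The repository path and the example SHA below are placeholders.
    import subprocess

    def first_containing_tag(repo, sha):
        """Return the earliest tag containing sha, e.g. 'v5.10-rc4', or None."""
        out = subprocess.run(
            ["git", "-C", repo, "describe", "--contains", sha],
            capture_output=True, text=True,
        )
        if out.returncode != 0:
            return None                        # not yet in any tagged release
        # "v5.10-rc4~14^2~3" -> "v5.10-rc4"
        return out.stdout.strip().split("~")[0].split("^")[0]

    print(first_containing_tag("/path/to/linux", "0123abcd"))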

The fact that -rc releases may contain bugs is implicit in the fact that every development cycle involves seven or eight such releases. Getting into an -rc release is not deemed enough for a patch to appear in a final mainline release; patches are expected to survive testing for a few -rc releases first. But the stable updates can ship those patches immediately after they appear in an -rc release. Thus, in a sense, the mainline, which is not considered stable enough for most installations, has a more stringent requirement than the stable releases, which are widely used.

The conversations ended at this point, with no indications that any minds were changed. There can be no doubt that this topic will return, though. Bugs introduced into stable releases go against the reasons for using those releases in the first place, so they can be expected to draw attention. The number of such bugs will never be zero, though, regardless of how slow or careful the process is. A balance has to be found between the need to get as many fixes as possible to users as quickly as possible and the need to avoid regressions. If the "after one -rc release" rule allows too many regressions, then perhaps it is time for the community to more clearly articulate what it means when a patch lands in the mainline and adjust the stable-update selection process accordingly.
Index entries for this article
Kernel: Development model/Stable tree
Kernel: Filesystems/XFS



XFS, stable kernels, and -rc releases

Posted Dec 3, 2020 19:05 UTC (Thu) by roc (subscriber, #30627) [Link]

> merging a commit into Linus's tree does not mean it is fully tested and production ready. It just means it's passed a wide range of unit tests without regressions and so is considered ready for wider integration testing by a larger population of developers and testers.

Maybe some people don't want this to be true, but it clearly is.

For example see
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...

> proc: try to remove use of FOLL_FORCE entirely
> We fixed the bugs in it, but it's still an ugly interface, so let's see if anybody actually depends on it.

XFS, stable kernels, and -rc releases

Posted Dec 3, 2020 20:43 UTC (Thu) by dxin (guest, #136611) [Link] (2 responses)

"Sasha Levin, who maintains the automated patch-selection system, replied that the real failing was in the XFS quality-control process which, he said, should never have let the buggy patch through in the first place. By the time something is in the mainline, he said, it is assumed to be correct"

Truthful or not, this assumption is a little far from reality.

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 0:16 UTC (Fri) by sashal (✭ supporter ✭, #81842) [Link] (1 responses)

This is because, by the point a patch lands upstream, the relevant maintainers have signed off on it, so the additional review in the stable process is not about the correctness of the patch itself but rather about the relevance of the patch to the stable branches and the correctness of the backport.

My argument isn't that patches that land upstream are 100% correct; if that was the case we wouldn't need any of those pesky stable trees, right?

The assumption is stable's way of saying that we've subjected the patch to everything in our arsenal and we deem this patch to be at a good tradeoff point between assuring the stability of the tree vs shipping patches in a timely fashion.

If you rely on the stable tree, please participate in the process by testing the -rc releases and reporting back any issues - we'll never ship a release with known reported issues.

I'll also point to this LWN article, which did an analysis of regressions in the stable trees: https://lwn.net/Articles/812231/ ; it seems to indicate that the current tradeoff is a good one.

XFS, stable kernels, and -rc releases

Posted Dec 26, 2020 2:52 UTC (Sat) by ILMostro (guest, #105083) [Link]

Great points overall:

* Increase the testing a patch gets before it reaches stable.
* That increase can come from some combination of more time and more participation.

I assume that there is a process in place that takes into account how serious or urgent a "fix" is, which can be used to expedite the process at times--rather than assuming everything should be merged with the same level of urgency all the time...

XFS, stable kernels, and -rc releases

Posted Dec 3, 2020 23:05 UTC (Thu) by dcg (subscriber, #9198) [Link]

> The stable process assumes that commits that ended up upstream were reviewed and tested;

There goes your problem. The very reason -stable releases exist is that this does *not* reliably happen. Buggy code was included in 5.9, so why expect that all of the commits going into 5.10 are bug-free?

I understand that the -stable guys want to automate things and bring fixes to users ASAP, but I don't think anyone is surprised that this automated patch-picking does not always work well.

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 5:02 UTC (Fri) by neilbrown (subscriber, #359) [Link] (2 responses)

> That means that the world would go for 3 months without some known fixes being applied to the tree?

So what? Every kernel has bugs. Delaying some bug fixes isn't ipso facto a bad thing.

Of course, fixes don't *have* to wait 3 months to land. Or even until the next stable release.

If a known authority explicitly asks for a patch to go to stable, it should go immediately.

If a patch lands in mainline with a "Fixes" or "cc: stable" tag, then it should go to stable soonish. I'd leave it a week or two myself, but I don't think it needs to wait for a release (though I do remember once when this would have saved me some heartache).

If a patch is chosen by a non-authority (such as a machine-learning system), then delaying until the major release seems particularly sensible.
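As a rough illustration of the three tiers described above, a selection policy could key off the usual commit-message trailers; the delays in this sketch are illustrative, not anyone's actual policy:

    # Sketch of the three-tier policy suggested above, keyed off commit
    # trailers. The trailer spellings are the usual kernel conventions;
    # the specific delays are illustrative only.
    import re

    def stable_delay(commit_message, picked_by_robot):
        """Return a rough waiting period before a patch is backported."""
        explicit_stable = re.search(r"^Cc:\s*stable@vger\.kernel\.org",
                                    commit_message, re.MULTILINE | re.IGNORECASE)
        has_fixes_tag = re.search(r"^Fixes:\s*[0-9a-f]{8,}",
                                  commit_message, re.MULTILINE)
        if explicit_stable:
            return "immediately"                  # an authority asked for it
        if has_fixes_tag and not picked_by_robot:
            return "a week or two"
        return "after the next mainline release"  # robot-picked patches wait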

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 8:25 UTC (Fri) by pbonzini (subscriber, #60935) [Link]

There was another buggy patch[1] in the same batch of autoselected patches that Dave Chinner replied to. The stable-kernel process is basically working by chance, based on the assumption that maintainers need to be second-guessed.

[1] https://lwn.net/ml/linux-kernel/9ec7dff6-d679-ce19-5e77-f...

XFS, stable kernels, and -rc releases

Posted Dec 17, 2020 8:21 UTC (Thu) by daenzer (subscriber, #7050) [Link]

I had very similar thoughts, thanks for articulating them so well.

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 7:24 UTC (Fri) by bangert (subscriber, #28342) [Link] (8 responses)

Clearly what is missing here is a self-correcting heuristic:
We have data not only on what gets applied to mainline, but also on what gets reverted. Based on the number of reverts in a given time period, determine the wait time for each subsystem, and allow a configurable per-subsystem add-on, so XFS could wait for an extra two -rc releases.
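Something along those lines could be computed straight from git history; here is a back-of-the-envelope sketch in which the thresholds and the per-subsystem add-on are invented for illustration:

    # Back-of-the-envelope sketch of the self-correcting heuristic suggested
    # above: derive a per-subsystem soak time from its recent revert rate.
    # The thresholds and the extra delay for filesystems are invented.

    def wait_rcs(reverts, commits, extra_rcs=0):
        """-rc releases a subsystem's fixes should soak before stable picks them."""
        rate = reverts / commits if commits else 0.0
        base = 1            # the current rule: at least one -rc release
        if rate > 0.01:
            base = 2
        if rate > 0.03:
            base = 4
        return base + extra_rcs

    # Example: 3 reverts out of 120 commits, plus the "+2 rc" add-on
    # proposed for filesystems like XFS.
    print(wait_rcs(reverts=3, commits=120, extra_rcs=2))   # prints 4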

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 8:31 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (7 responses)

There's no need for heuristics at all. Maintainers mark patches for stable; just trust them.

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 10:05 UTC (Fri) by Wol (subscriber, #4433) [Link]

And if the maintainer did NOT mark them as "for stable", then the stable kernel guys need to accept responsibility for picking those patches.

(Yep I know maintainers forget to mark patches, but what if they deliberately DIDN'T mark them ... ?)

Cheers,
Wol

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 17:00 UTC (Fri) by sashal (✭ supporter ✭, #81842) [Link] (5 responses)

So I'm curious, I got pinged last month asking me to add a patch that seemed to be able to cause a host hang (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...) which also had a CVE id attached to it.

Why didn't it have a stable tag? Or why wasn't it sent to us by the KVM maintainers?

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 17:17 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (4 responses)

- Because it cannot happen with SR-IOV devices; passthrough of non-SR-IOV devices is simply not used in circumstances where a host hang can have security consequences, while hobbyists that do use it (e.g. for gaming) tend to stay on bleeding edge kernels.

- It has an easy workaround in userspace.

So I considered it a minor bug even though it has a CVE id attached.

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 17:55 UTC (Fri) by sashal (✭ supporter ✭, #81842) [Link] (3 responses)

I think that your assumptions around who does GPU passthrough are a bit off :)

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 18:54 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (2 responses)

Mediated pass-through aka vGPU is very different from PCI pass-through that you do for gaming.

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 20:39 UTC (Fri) by zdzichu (subscriber, #17118) [Link] (1 responses)

And which one do you do for cryptocurrencies mining and machine learning?

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 22:06 UTC (Fri) by pbonzini (subscriber, #60935) [Link]

For Tesla you do mediated pass-through. Consumer hardware only does PCI pass-through (as far as I know that's just driver-enforced market segmentation, but still).

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 17:41 UTC (Fri) by zblaxell (subscriber, #26385) [Link]

Maybe it's too obvious, but...isn't this what integration trees are for?

Filesystem code shouldn't normally be landing in stable kernels after just a few weeks. Patches should be soaking on a branch in a dev repo, periodically rebased on -rc kernels, until they've had a few _months_ of testing. Not just automated unit tests, but users' regression test workloads that will catch bugs unit tests miss (especially corporate users that are already running full-stack regression tests in their devops sandboxes).

I say "normally" because there should be exceptions for critical problems that are having verifiable user impact. By all means, bypass the month-long soak test for the really bad bugs--but those fixes can only be chosen by human maintainers, unless the robots start reading bug reports or running their own regression tests. Urgent patches can take an expedited path to the kernel, and it appears they already do when human maintainers push untagged fixes to -stable. They don't need to be mixed in with random untested changes.

All that should be finished by the time Linus pulls the code into a -rc, so the -rc is only testing interactions between changes in the current merge window (e.g. subsystem X and subsystem Y change at the same time and introduce a bug that doesn't exist with either change separately)--and ideally even then, subsystems with dependent changes would merge with each others' integration trees as part of their own testing _before_ either code lands in -rc.

But...I thought all this was already happening? What are all those *-next trees out there for, if not to enable and support exactly this process? Or is it just xfs not doing this?

A quick look at the git log timestamps indicates that xfs _does_ do this most of the time, so what happened with this one patch? Why didn't it wait for the next merge window? All the bugs were 2-4 years old, and found by a fuzzer, not a user report, so there was no urgency (or at least none declared in the commit message). There was urgency in the pull request though ("Fix a fairly serious problem..." in commit d9315f5634c94500b91039895f40051a7ac79e28) and so this patch landed in a -rc4 kernel, covered with labels to make it attractive to humans and robots, the robots picked it up (that's what the robots are _for_), and the humans didn't stop them because who would expect the xfs maintainer, of all people, to push code that wasn't properly tested?

Oh well. Even human maintainers make mistakes. This is an excellent example of _why_ it's such a good idea to run code changes through months of testing before sending a pull request to Linus...

To be fair, users by now should be expecting some bugs in stable kernels, because even the best maintainers still let some through. Every kernel in the last 5 years has shipped with literally thousands of bugs, and the count is trending upwards--and those are just the known bugs with fixes found in the first 2 months after kernel release, so the true counts are probably higher. We run our own 3-month soak after kernels are released from -stable before we start rolling them out, because it's the only way we can be sure the soak happens and that use cases we care about are tested. This works out to about 7 months on average from git author date to when we start running the code outside of a test environment. We don't always find bugs that impact us, but when we do, we're usually glad that code didn't get anywhere near our production fleet. We also run regression tests on the public -next trees of kernel subsystems that have given us trouble in the past. The early bug detection we get from all this is worth its weight in VM machine-days.

XFS, stable kernels, and -rc releases

Posted Dec 4, 2020 19:08 UTC (Fri) by nix (subscriber, #2304) [Link] (5 responses)

> The bug, it seems, only manifests in settings where a newer user space is interacting with an older kernel

As I understand it, that's when it was spotted: they were testing the same code in the libxfs component of userspace, and it fell over when exposed to a kernel containing XFS filesystem code that did not have the same change in it. The same fault is evident when using a newer kernel with (at least some) filesystems created by older kernels -- that is to say, some subset of "essentially all of them". But tests that do a mkfs.xfs and then mess about in it using only the new kernel (as test runs usually do) are not going to observe the bug.

As the guy hit by this, I'm damn glad the verifiers stopped me from being burned more badly than I was (though I do have up-to-date backups so the ultimate consequences would just have been very unpleasant and not disastrous even if my biggest server's rootfs had vanished into the void).

If you ask me, though autosel is magical and wonderful and has had a huge positive effect, things that can affect persistent state need a lot more verification than autosel can possibly provide: filesystems should just not be exposed to its ministrations. Production filesystem code is usually very heavily tested over time, works very well (so most fixes affect only edge cases), but even small erroneous changes can do massive persistent damage, often triggered by changes in the on-disk state of already-existing filesystems -- and autosel can't test for that sort of thing, and of necessity no human will have tested the combination of fs changes autosel has pulled into the stable tree very much either (though I appreciate that some testing does happen in the stable prerelease phase).

So... small upside of the vast majority of fs stable fixes (probably very small given that significant fixes should have a stable tag anyway), huge downside if things go wrong, and nigh-impossible to spot when things are wrongly picked: autosel should just not run over fs/*/, at least not for writable persistent filesystems. It can't do much damage to squashfs, iso9660, or procfs, so there's no harm running autosel over that.
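A filter along those lines would only be a few lines on top of whatever the selection bot already does; the sketch below is purely illustrative (the allow-list and function name are assumptions, not how AUTOSEL actually works):

    # Purely illustrative: skip autoselection for patches touching writable
    # on-disk filesystems, while still allowing read-only or virtual ones.
    # The allow-list and function name are assumptions, not AUTOSEL's code.
    READ_ONLY_OK = ("fs/squashfs/", "fs/isofs/", "fs/proc/")

    def autosel_allowed(touched_paths):
        """Return False if the patch touches a writable persistent filesystem."""
        for path in touched_paths:
            if not path.startswith("fs/"):
                continue
            if path.startswith(READ_ONLY_OK):
                continue
            return False    # writable on-disk filesystem: leave it to humans
        return True

    print(autosel_allowed(["fs/xfs/xfs_inode.c"]))                  # False
    print(autosel_allowed(["fs/proc/base.c", "mm/page_alloc.c"]))   # True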

XFS, stable kernels, and -rc releases

Posted Dec 7, 2020 15:52 UTC (Mon) by wazoox (subscriber, #69624) [Link] (4 responses)

> autosel should just not run over fs/*/

I get your point, but doesn't it apply equally to other sensitive things like drivers that may be lying in the path of your precious data, too? There were some painful, persistent bugs in some storage controller drivers a few years ago in stable kernels, under some configurations.

All in all, we can very much doubt the sanity of importing "fixes" from -rc kernels into stable releases in an automated way...

XFS, stable kernels, and -rc releases

Posted Dec 8, 2020 21:23 UTC (Tue) by dgc (subscriber, #6611) [Link] (3 responses)

The difference is in the complexity of the code. Drivers tend to be pretty straightforward translations of kernel to hardware operations through a fairly narrow API. Functionality is rarely dependent on a huge matrix of user-variable configuration or access mechanisms - they are tightly bound to do just what the hardware has to perform. Filesystems, OTOH, are some of the most complex structures and code in the kernel, and they have a huge API of user-facing functions and configuration options to support.

It's the complexity of subsystems that drives the risk profile for the subsystem. The test matrix for a filesystem is -huge- once you consider the number of combinations of mount options, mkfs options and per-file options that can cause different code paths to run inside the filesystem. Last time I counted, XFS had about 20 mount options and 40 mkfs options that could be combined in several hundred different ways. Then we have forwards and backwards kernel/userspace version testing. Then we have different IO paths to test (e.g. DAX). Then we have different kernel configs to test. Then we have different architectures (x86-64, arm, ppc64le, etc) to test. Then we have different machine configs to test (high/low RAM, cpu count, slow storage vs fast storage, etc).

While considering this huge test matrix, also keep in mind the amount of code needed to implement a complex filesystem. e.g. fs/btrfs has more code in it than the entire mm/ subsystem. The same is true for fs/xfs. Then there is the userspace code (mkfs, fsck, etc.) - xfsprogs has twice as much code as fs/xfs in the kernel, and xfsdump alone is the same size as fs/xfs. IOWs, we're talking hundreds of thousands of lines of code *per filesystem* to make them do what they do, and the kernel component is really only a small piece of the overall code base. There are very few individual kernel subsystems that are of this size and complexity, or that have a larger userspace code base than kernel code just to support the kernel code...

For further consideration: a single fstests unit test run on a single configuration takes 3-4 hours on a fast machine, and that doesn't cover the entire syscall or ioctl APIs. Add in an ltp run per configuration, then some performance testing, and you're starting to talk about a single set of unit tests on a single configuration taking half a day to run. Now multiply that by *thousands* of unique configurations that _should_ be tested *per filesystem*.
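The arithmetic behind those numbers is easy to make concrete; the figures below simply restate the estimates in this comment, with the number of configurations picked as an illustrative round number:

    # Restating the estimates above: even a modest slice of the filesystem
    # configuration space turns into thousands of machine-hours of testing.
    # The per-run hours come from the comment; the config count is a round
    # illustrative number.
    configs = 1000            # "thousands of unique configurations"
    fstests_hours = 4         # one fstests run on one config: 3-4 hours
    full_run_hours = 12       # "half a day" once ltp and perf tests are added

    print(configs * fstests_hours, "machine-hours for fstests alone")
    print(configs * full_run_hours, "machine-hours for a full sweep")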

The complexity of the filesystem validation problem should be obvious by now. Not all subsystems have the same level of complexity, nor do they have the same disaster rating if something goes wrong. The amount of validation a filesystem requires is not something we can expect a single developer or maintainer to do by themselves. We don't start to really exercise the wider test matrix until the code is merged into for-next and, for -rcX fixes, until it gets into the mainline tree. We need that time in the -rcX releases to get solid test-matrix coverage before the code is released to users.

Filesystems score right at the top of the complexity, disaster-risk, and test-matrix-size scales. The risk of catastrophic regression, and the time it takes to adequately test new code, are why we need to be much more conservative in backporting filesystem changes than probably any other subsystem. You can't just reboot to fix corruption or data loss bugs...

IOWs, a process that assumes that all subsystems have the same risk profile is making a fundamental mistake. This is something that is fairly simple to correct, but the stable maintainers are refusing to even acknowledge that risk is something they need to consider when running automated backports. And that's a lose-lose-lose situation for users, developers and maintainers.

XFS, stable kernels, and -rc releases

Posted Dec 10, 2020 20:51 UTC (Thu) by BenHutchings (subscriber, #37955) [Link] (2 responses)

> The difference is in the complexity of the code. Drivers tend to be pretty straightforward translations of kernel to hardware operations through a fairly narrow API.

I think this is the "everyone else's problems are pretty simple" fallacy.

XFS, stable kernels, and -rc releases

Posted Dec 17, 2020 8:19 UTC (Thu) by daenzer (subscriber, #7050) [Link] (1 responses)

Yeah, similar things keep happening with DRM driver changes.

XFS, stable kernels, and -rc releases

Posted Dec 17, 2020 9:37 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

Hey, DRM drivers just need to get the stream of commands from the userspace and give it to the GPU.

What could POSSIBLY be complicated there?

/sarcasm

XFS, stable kernels, and -rc releases

Posted Dec 21, 2020 22:24 UTC (Mon) by Mook (subscriber, #71173) [Link]

Apologies for the ignorance, but is the purpose of the stable kernel documented anywhere? I see https://www.kernel.org/doc/html/latest/process/stable-ker... which documents what patches go in, but that's very different from its raison d'être. It sounds like different people want different things from it, hence the perpetual conflict.

Note that I'm not actually asking for the purpose of the stable kernel; I'm asking for canonical documentation for it. After all, a random comment on an article (even from somebody obviously in the know) would not be helpful for the next time the mismatch occurs.


Copyright © 2020, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds