Case-insensitive ext4 [LWN.net]

Case-insensitive ext4

Posted Mar 27, 2019 18:12 UTC (Wed) by clugstj (subscriber, #4020)
Parent article: Case-insensitive ext4

I've yet to see a legitimate use case for putting this brain damage in the kernel. Does anyone actually have one?



Case-insensitive ext4

Posted Mar 27, 2019 19:08 UTC (Wed) by marcH (subscriber, #57642) [Link] (1 responses)

> I've yet to see a legitimate use case for putting this brain damage in the kernel

Without overly complicated code, security researchers wouldn't have any work to do!

> Case-insensitivity would be set on a per-directory basis

Insanity has no limit. I was using the (otherwise pretty cool) Windows Subsystem for Linux. This is what happened:
https://github.com/vector-of-bool/vscode-cmake-tools/issu...

Because I was using the same project sometimes from WSL and sometimes from Windows, some directories *in the same project* were created case-sensitive and others not. Hilarity ensued.

Case-insensitive ext4

Posted Mar 27, 2019 19:47 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

Well, CMake has "is case sensitive" logic baked in at compile time. Apple and Windows are "always case insensitive" and everything else is always case sensitive. I don't know what kinds of changes would be required in build tools to do these case-insensitive comparisons. For example, this just doesn't work with ninja on Windows (I assume make has similar issues with the analogous ruleset):

rule copy
  command = cp $in $out
build foo: copy in
build bar: copy FOO

saying that no rule makes FOO even though technically it will exist if you build foo. Basically, build tools that exist today need cases to match everywhere. And yes, ninja could figure this out right now, but if `dir/foo` and `dir/FOO` are used and `dir` is made by some rule during the build, its case-sensitivity flag can't be known at the start.
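To make the failure mode concrete, here is a toy sketch in Python. It is not ninja's code; `find_producer` and the `rules` table are made up for illustration. It only shows that exact string matching never connects `FOO` to the rule that produces `foo`, and that a folded comparison would.

```python
# Toy model of a build graph. Outputs are matched against inputs by exact
# string comparison, which is how case-sensitive build tools behave.
# All names here are hypothetical; this is not ninja's implementation.

def find_producer(rules, wanted, case_insensitive=False):
    """Return the output name of the rule that produces `wanted`, or None."""
    for output in rules:
        if output == wanted:
            return output
        if case_insensitive and output.casefold() == wanted.casefold():
            return output
    return None

# output -> list of inputs, mirroring the ninja snippet above
rules = {"foo": ["in"], "bar": ["FOO"]}

for dep in rules["bar"]:
    print(dep, "exact match:", find_producer(rules, dep))
    print(dep, "folded match:", find_producer(rules, dep, case_insensitive=True))
```

The catch is that the tool would have to know up front which comparison applies to each directory, which is exactly what a per-directory flag makes impossible for directories that do not exist yet.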

Case insensitivity in filesystems is broken. Conditional case sensitivity at a per-filesystem level means even ninja needs to add ioctl queries to figure that out, but `--one-file-system` is something that is at least enforceable. Per-directory flags which require magical "what will the flag on this directory be in the future" is even more broken.

I'd be surprised if "doesn't work in case insensitive ext4 directories" (never mind an environment with a mix of case sensitive and insensitive directories) issues don't get closed as WONTFIX in many tools.

Case-insensitive ext4

Posted Mar 27, 2019 19:10 UTC (Wed) by Karellen (subscriber, #67644) [Link] (20 responses)

Yeah - I wonder what is the specific use case that would not be solved better by having case-insensitive globbing and autocompletion in the shell? In what other situation does a program know that it needs to open a file, and knows the name of that file, but doesn't know the precise capitalisation/normalisation of the name?

Case-insensitive ext4

Posted Mar 27, 2019 19:25 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (19 responses)

If you don't add a case-insensitive version of open() and friends then every open() call will have to scan the whole directory first. This adds up quickly for Samba and other file-server use-cases.

Case-insensitive ext4

Posted Mar 27, 2019 20:02 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link] (15 responses)

It could as well maintain a userspace dictionary mapping normalized/lowercased names to their actual names (which could be maintained incrementally based on filesystem change notifications).

Case-insensitive ext4

Posted Mar 27, 2019 20:05 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (11 responses)

Linux has no filesystem notification mechanisms that have the consistency and performance required for a fileserver use-case.

Case-insensitive ext4

Posted Mar 27, 2019 21:17 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link] (8 responses)

There is no such thing as "a fileserver use case". Samba exists and has existed (and been used) for a while, hence, there are obviously "file server use cases" where the existing mechanisms perform well enough.

Case-insensitive ext4

Posted Mar 27, 2019 21:26 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Samba exists, but it does case insensitivity in a very expensive way. There are other use-cases as well, like mounting FAT filesystems.

I've seen this firsthand - I'm using a Linux server for TimeMachine backups for Mac OS X. TimeMachine is braindead - it creates hundreds of thousands of files in the same directory. With the default settings Samba slowed down to a crawl.

Fortunately, TimeMachine doesn't care about file name cases. So by following steps from here: https://wiki.samba.org/index.php/Performance_Tuning I was able to speed up backups by something like 10x. This is not insignificant and it would be nice for Linux to handle similar use-cases natively.

Case-insensitive ext4

Posted Mar 28, 2019 0:50 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link] (6 responses)

> there are obviously "file server use cases" where the existing mechanisms perform well enough.

Have you talked to Samba developers and asked them if they are happy with the current performance or would like to see better support from the kernel? If you haven't, I would encourage you to do that, or to talk to enterprises supporting Samba or even large customers. I think you will find those perspectives useful to add to your opinions.

Case-insensitive ext4

Posted Mar 28, 2019 16:10 UTC (Thu) by rweikusat2 (subscriber, #117920) [Link] (5 responses)

Someone claimed that Samba opening a file residing on a case-sensitive filesystem would require a pre-open, linear directory traversal. As I pointed out, this isn't true, at least not on Linux: It would be possible to use an incrementally maintained, userspace translation cache instead, however, unless I again have to use Samba for something in a resource-constrained environment, I'm not going to implement that and in the unlikely case that this would happen, I'd certainly not go through the rather pointless hassle of trying to contribute a non-trivial change to an open sausage project, as I have neither the time to do this nor the social skills and pedigree to do so successfully.

Case-insensitive ext4

Posted Mar 28, 2019 18:36 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

> As I pointed out, this isn't true, at least not on Linux: It would be possible to use an incrementally maintained

Nope. There is no way to maintain this cache with any sort of consistency guarantees. Linux filesystem change notifications are not up to it.

Case-insensitive ext4

Posted Mar 29, 2019 3:03 UTC (Fri) by pabs (subscriber, #43278) [Link] (2 responses)

What are they missing now that recent Linux versions offer rename notifications and other directory change notifications?

Case-insensitive ext4

Posted Mar 29, 2019 4:21 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

inotify is not recursive. It's also best-effort and its notifications are asynchronous.

fanotify is better, but it also can drop events from time to time under high load.

Case-insensitive ext4

Posted Oct 4, 2023 18:51 UTC (Wed) by calumapplepie (guest, #143655) [Link]

You get an event when the queue overflows. If you clear the cache on receiving such an event, you can provide consistency guarantees, at the cost of bad performance while the cache is rebuilt. Since the queue is pretty big, overflows shouldn't happen too much.
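A minimal sketch of that idea in Python, with the notification source simulated. The event kinds and the `NameCache` class are hypothetical; a real implementation would translate inotify or fanotify events (including the queue-overflow event) into calls like these.

```python
# Hypothetical event kinds standing in for real notification events.
CREATE, DELETE, OVERFLOW = "create", "delete", "overflow"

class NameCache:
    """Casefolded-name cache that is dropped whenever events were lost."""

    def __init__(self, rescan):
        self._rescan = rescan   # callable returning the current on-disk names
        self._names = None      # None means "cache invalid, rescan needed"

    def handle(self, kind, name=None):
        if kind == OVERFLOW:
            self._names = None  # events were lost: trust nothing
        elif self._names is not None:
            key = name.casefold()
            if kind == CREATE:
                self._names[key] = name
            elif kind == DELETE:
                self._names.pop(key, None)

    def lookup(self, requested):
        if self._names is None:  # the expensive rebuild after an overflow
            self._names = {n.casefold(): n for n in self._rescan()}
        return self._names.get(requested.casefold())
```

This sketch ignores the window between the rescan and the next event being processed, so it illustrates the invalidate-on-overflow step only, not a full consistency argument.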

Case-insensitive ext4

Posted Mar 29, 2019 22:55 UTC (Fri) by jra (subscriber, #55261) [Link]

We already have an incrementally maintained, userspace translation cache in Samba. It catches the simple cases where we've seen a filename before - we cache it.

Unfortunately it isn't enough. Cache misses are the problem. If the SMB client sends a filename "foo" and it isn't in the directory, we don't know if it doesn't exist, or exists under another case (e.g. as "Foo"). In that case we need to scan the directory. This gets really expensive, really quickly.

We don't negatively cache as we're often used to export filesystems that local processes are also modifying.

I've been wanting a case-insensitive filesystem lookup option in Linux for a long time (I think ZFS and XFS already have it, however flawed).
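The cache-miss cost described above can be sketched roughly as follows. This is an illustration in Python, not Samba's code; `PositiveNameCache` and its `scans` counter are invented for the example. Names that were seen are cached, but "not found" is never cached, so every lookup of a missing name pays for a full directory scan.

```python
import os

class PositiveNameCache:
    """Cache only names that were seen; never cache 'does not exist'."""

    def __init__(self, directory):
        self.directory = directory
        self.seen = {}   # casefolded name -> on-disk name
        self.scans = 0   # number of full directory scans performed

    def resolve(self, requested):
        key = requested.casefold()
        cached = self.seen.get(key)
        if cached is not None and os.path.lexists(
            os.path.join(self.directory, cached)
        ):
            return cached
        # Miss (or stale entry): the name may exist under another case,
        # so the whole directory has to be read.
        self.scans += 1
        with os.scandir(self.directory) as entries:
            for entry in entries:
                if entry.name.casefold() == key:
                    self.seen[key] = entry.name
                    return entry.name
        return None  # deliberately not cached: a local process may create it
```

With hundreds of thousands of entries in one directory, the `scans` counter is the number that hurts.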

Case-insensitive ext4

Posted Mar 28, 2019 7:28 UTC (Thu) by patrakov (subscriber, #97174) [Link] (1 responses)

Wouldn't it simplify things if SAMBA stopped any attempts to export an existing directory tree? I.e. mandate that the only way to make a new file exported is to copy it in via the SMB protocol, quite possibly from localhost. Keep filenames opaque, keep files in a clearly-private area, teach users not to mess with them (like they don't directly mess with MySQL files). Keep whatever attributes Windows needs in some sort of a database.

Case-insensitive ext4

Posted Mar 28, 2019 7:35 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

This is doable and is fairly easy, given that Samba has a well-defined pluggable VFS layer.

But this will break a ton of other software that wants to directly modify the disk files. It will also mean that Linux's VFS is inadequate for a fairly common use-case.

Case-insensitive ext4

Posted Mar 27, 2019 20:17 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (2 responses)

Ideally this would be some kind of LRU cache which would have some flag to say "this is a one-off open, don't cache" to avoid the inotify (or whatever) mechanism. Plus, I'm sure folks would love having the C library run a thread in the background to listen for its notifications taking locks on this cache whenever something happens. Yeah, I don't see any race conditions, unpredictable latency issues, or TOCTOU/cache coherency issues here at all.

Sorry for the snark, it's not in response to your comment in particular, but my mind coming up with all the Pandora's boxes this is threatening to open.

Case-insensitive ext4

Posted Mar 27, 2019 21:11 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link] (1 responses)

There's no point in special-casing "one-off opens" unless this demonstrably solves a problem. As the kernel open has to scan the directory anyway, you'll end up with the exact same kind of TOCTOU races. This is a problem which can't really be solved. As to your other objections: this is a generic list of programming errors, some of them attributable to the idea of "a background thread".

It's possible to implement case-insensitive open in user space without doing a second linear search through a directory for every open.

Case-insensitive ext4

Posted Mar 27, 2019 21:19 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

As I understand, the kernel will keep track of canonicalized names in file cache, so it won't have to do a search.

There's also the problem of making sure that no duplicate files exist.
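For the duplicate-file problem, a small Python sketch of the check a userspace layer would need. The helper name `find_case_collisions` is made up, and `str.casefold()` stands in for whatever folding rule the filesystem or server actually uses.

```python
from collections import defaultdict

def find_case_collisions(names):
    """Group names that become identical once case is folded."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Only groups with more than one member are ambiguous.
    return {key: found for key, found in groups.items() if len(found) > 1}

print(find_case_collisions(["Makefile", "makefile", "README"]))
```

A case-sensitive directory can already contain such pairs, so any scheme that adds case-insensitive lookup later has to decide what to do with them.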

Case-insensitive ext4

Posted Mar 29, 2019 9:16 UTC (Fri) by Karellen (subscriber, #67644) [Link] (2 responses)

Why?

How is a call to open() getting the filename to open? Either it's going to come from an existing directory scan, in which case the capitalisation/normal form should already be correct, or it's going to be because a user has selected a file - in which case the shell/picker/whatever should be able to do that work already?

Where would calls to open() be getting these correctly named but incorrectly capitalised/normalised filenames from?

Case-insensitive ext4

Posted Mar 29, 2019 9:20 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> How is a call to open() getting the filename to open? Either it's going to from an existing directory scan, in which case the capitalisation/normal form should already be correct, or it's going to be because a user has selected a file - in which case the shell/picker/whatever should be able to do that work already?

You have an SMB request to open a file, with a file name. There's nothing else.

You can try a happy case and just attempt an open() with the provided name. If it fails, you need to scan the directory to find a matching file with a different case.

And you can't really cache the negative result, patterns like "if !exists(fname) {creat(fname);}" are exceedingly common.
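The happy-path-then-scan approach looks roughly like this in Python. It is a sketch under the assumption that `str.casefold()` is an acceptable matching rule; `open_ignoring_case` is a made-up helper, not an existing API.

```python
import os

def open_ignoring_case(directory, requested, flags=os.O_RDONLY):
    """Open `requested` in `directory`, falling back to a case-folded scan."""
    try:
        # Happy case: the client sent the exact on-disk name.
        return os.open(os.path.join(directory, requested), flags)
    except FileNotFoundError:
        pass
    # Slow case: read the whole directory looking for another spelling.
    key = requested.casefold()
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.casefold() == key:
                return os.open(entry.path, flags)
    raise FileNotFoundError(requested)
```

Nothing stops another process from creating or renaming a matching file between the scan and the second `os.open()`, which is the race a kernel-side lookup avoids.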

Case-insensitive ext4

Posted Apr 4, 2019 17:09 UTC (Thu) by Wol (subscriber, #4433) [Link]

> Where would calls to open() be getting these correctly named but incorrectly capitalised/normalised filenames from?

The user, maybe?

What about the use case where I type in a name in a picker, and it displays a bunch of matches?

Or what about the case where I typed in the name on the command line? Some of us still use a command line, you know ...

Cheers,
Wol

Case-insensitive ext4

Posted Mar 28, 2019 2:05 UTC (Thu) by dw (subscriber, #12017) [Link] (1 responses)

as a recent MacOS refugee, frankly I find dicking around with case intensely annoying on a desktop after discovering things don't have to be that way. Sure it's supposed to be in userspace -- good luck with that. Darwin has an approach that works for users, and I'd be very happy to enable this flag the moment it becomes available.

Was disgusted just last night to discover a Gtk chooser dialog's autocomplete was case sensitive. In a GUI. Total disconnect between Linux and what the real world has been doing successfully for decades now..

Case-insensitive ext4

Posted Mar 28, 2019 10:44 UTC (Thu) by mpr22 (subscriber, #60784) [Link]

Case-insensitivity on ext4 volumes has to be done in userspace, because doing it in the kernel breaks userspace.

I agree that the Gtk file chooser having case-sensitive autocomplete is daft, but... I don't actually care, because I hate the Gtk file chooser anyway for other, more fundamental design decisions.

Case-insensitive ext4

Posted Mar 28, 2019 2:10 UTC (Thu) by dw (subscriber, #12017) [Link] (5 responses)

As a recent MacOS refugee (after a long previous history on desktop Linux), frankly I find dicking around with case on a desktop extremely annoying, after discovering things don't have to be that way. Sure it's supposed to be in userspace -- good luck retrofitting all that, and really it's just passing the blame. Darwin has an in-kernel approach that works for users, and I'd be very happy to enable this flag the instant it becomes available.

I was disgusted just last night to discover a Gtk chooser dialog's autocomplete was case sensitive. In a GUI. In 2019. Total disconnect between Linux and what the real world has been doing successfully for decades, and what actual users expect. No doubt someone will pop up to say 'but I prefer it that way', well, you're free to patch whatever brainwrong you like into your desktop, but most people cannot and do not want that -- it's why contemporary developers are walking around with MacBooks rather than Linux boxes.

Case-insensitive ext4

Posted Mar 28, 2019 3:56 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (4 responses)

What do you expect of tools that have to deal with non existent files in a case insensitive world? Mainly, I'm thinking of build tools here, but there are other categories too where this crops up. Should make expect that there is a dependency here?

foo:
	touch foo
bar: Foo
	cp Foo bar

Because if so, this means that tools now need to make a syscall just to do path manipulation to be accurate (something like canonpath() that would give a path which is the same for all equivalent input paths maybe by doing tolower() and normalization). And it has to work for paths that don't exist yet. And I don't think that can even be correct because that path might end up having a bind mount in there at some point which changes behavior (yeah, low chance, but kernels don't always have that luxury).

Yeah, case insensitivity might be useful at the UI level, but even there you still have to deal with paths using binary data or invalid utf8 because a file that the GUI can't delete is a wonderful thing to diagnose and resolve. Personally, I don't find it that useful (but I encourage you to file an issue against GTK for the completion thing).
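For illustration, a purely lexical version of the hypothetical canonpath() mentioned above could look like this in Python. The name and the choice of NFD plus `casefold()` are assumptions; the folding follows the caseless-comparison recipe from Python's Unicode HOWTO.

```python
import unicodedata

def canonpath(path):
    """Return a key that is equal for paths differing only in case or
    Unicode normalization. Purely lexical: it never touches the disk."""
    decomposed = unicodedata.normalize("NFD", path)
    return unicodedata.normalize("NFD", decomposed.casefold())

print(canonpath("dir/Foo") == canonpath("dir/FOO"))
```

Being lexical is also its weakness: it cannot know whether `dir` is (or will be) case-insensitive, or whether a mount point in the middle of the path changes the rules, which is why an accurate version would need to ask the kernel.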

Case-insensitive ext4

Posted Mar 28, 2019 19:11 UTC (Thu) by jccleaver (subscriber, #127418) [Link] (3 responses)

Ultimately, this really brings to mind how important initial design is for things.

Classic Mac OS was designed with case-insensitivity in mind, had no manual tools that needed to be imported with minimal effort rather than a complete rewrite, and had no shell mechanics to emulate.

Case Insensitivity #JustWorks when people expect it and are going through translation layers (and aren't in the business of writing drivers), and doesn't when people assume low level access.

Case-insensitive ext4

Posted Mar 28, 2019 20:14 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (2 responses)

I still wonder how this would have worked even on Classic Mac OS. Do you just assume that *all* paths can be normalized regardless of location or host filesystem? If so, do you just not support filesystems with alternate paths? Though I suppose the Windows solution of mangling unsupported names on the render side works too[1]. However, this means that any path manipulation has to do a syscall to get some canonical representation of a path or each program has to have a "pathcmp" function to determine that "foo" and "FOO" are really the same thing.

Would you have expected the shown Makefile snippet to work on Classic Mac OS or would an error that "no rule to make FOO" be acceptable?

[1]Making a path appear in Explorer via a network share with the name "CON1" renders as some mangled name. Creating a file with that mangled name then shows two files with the same name appear. Deleting either one via the UI deletes the one with the real mangled name first (I assume given a HANDLE, they can be differentiated).

Case-insensitive ext4

Posted Mar 28, 2019 20:35 UTC (Thu) by k8to (guest, #15413) [Link]

Yeah, mounting case sensitive filesystem on classic MacOS would have been messy. I'm sure I did this at some point with e.g. Basilisk II mounting the Linux filesystem underneath it, but that had a hefty translation layer to support the other oddities of classic MacOS like forks etc.

I think that was the approach taken by other people too, probably one of Apple Single or Apple Double representations which probably had some solution for NFS which was still in vogue in the 90s.

It wasn't that nice an experience for the Mac users or the non-mac users. I never programmed against it to experience the extra sharp edges, though.

Case-insensitive ext4

Posted Mar 28, 2019 21:01 UTC (Thu) by jccleaver (subscriber, #127418) [Link]

> Do you just assume that *all* paths can be normalized regardless of location or host filesystem?

I think by System 7.5 (or 7.1 Pro) you did, because if I recall correctly that's how File Exchange/PC Exchange did its work.

Remember, in classic Mac OS the colon ':' was the directory separator in paths, and you could use '/'s to your heart's content. Actually, you could use pretty much anything to your heart's content, including spaces, punctuation (since no one in the Mac side cared about extensions) and even weird graphs like the f-hook or florin https://en.wikipedia.org/wiki/%C6%91#Appearance_in_comput... , which I still find myself occasionally doing on OS X 20 years later.

Anyway, with /, \, and : being used in different locations, there was definitely path-mangling going on below the interface. But general users didn't have to care, and most Mac programs didn't deal with constructed path names, and *never* had to worry about shell-quoting for spaces and whatnot.

Between this freeform text attitude, the resource and data fork dichotomy, and the use of Type and Creator codes, I definitely feel like we've lost some good capabilities on the Mac side in the quest for broader interoperability.

Case-insensitive ext4

Posted Mar 28, 2019 8:11 UTC (Thu) by daniels (subscriber, #16193) [Link]

> I've yet to see a legitimate use case for putting this brain damage in the kernel. Does anyone actually have one?

No, everyone involved is just doing this for absolutely no reason at all. Weird.

Case-insensitive ext4

Posted Mar 28, 2019 8:39 UTC (Thu) by nim-nim (subscriber, #34454) [Link] (2 responses)

Any shared filesystem (network filesystem, removable media) will soon become completely unusable if different systems write on it with different default encodings. Treating filenames as opaque bunches of bytes does not work because you need to convey filenames to humans at some point. Humans do not understand raw bytes they understand decoded bytes, and that requires knowledge of the encoding used in filenames.

So any shared filesystem will need to export to userspace the encoding used for each part of its tree (either a single encoding for everything, or separate encodings per subtree).

Casing is something else, but once you get past the encoding point, casing becomes less hard to tackle.

Case-insensitive ext4

Posted Mar 28, 2019 15:58 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (1 responses)

> Casing is something else, but once you get past the encoding point, casing becomes less hard to tackle.

Not much less. Casing rules depend not just on encoding but also locale, and while it may be practical to enforce a single universal encoding and normalization scheme you're definitely not going to get away with enforcing a single universal locale.
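A few locale-independent results from Python's built-in string methods show where a single universal rule runs into trouble. The `fold_report` helper is made up for this example, and the characters are written as escapes so the code does not depend on the file's encoding.

```python
def fold_report(text):
    """Show what the locale-independent Unicode mappings do to `text`."""
    return {
        "lower": text.lower(),
        "upper": text.upper(),
        "casefold": text.casefold(),
    }

# "I": lowercases to "i", but Turkish expects dotless "\u0131" here.
print(fold_report("I"))
# "\u0131" (dotless i): uppercases to plain "I", colliding with "i".upper().
print(fold_report("\u0131"))
# "\u00df" (sharp s): uppercases to "SS" and casefolds to "ss".
print(fold_report("\u00df"))
# "\u0130" (dotted capital I): lowercases to two code points, "i" + U+0307.
print(fold_report("\u0130"))
```

Whichever rule a filesystem picks, some users will see two names they consider different treated as the same file, or the reverse.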

The logical way to handle normalization is to simply disallow non-normalized filenames. The kernel doesn't change the encoding or compare different normal forms, it just verifies that the names of new files are in a particular normal form and returns an error if they aren't. Since all names are already in the same normal form comparisons reduce to exact binary matches. The equivalent for case would be to disallow either lowercase or uppercase characters in filenames (assuming you could even clearly define what is "uppercase" or "lowercase"—it depends on the locale). People put up with that in the DOS era but I don't think it would be considered acceptable today.
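The "reject, don't convert" policy is small enough to sketch in Python. The `check_filename` helper, the choice of NFC, and raising `ValueError` are all assumptions for the example; a kernel would return an error code from the create call instead.

```python
import unicodedata

def check_filename(name, form="NFC"):
    """Accept `name` only if it is already in the required normal form."""
    if unicodedata.normalize(form, name) != name:
        raise ValueError(f"filename is not in {form}: {name!r}")
    # Accepted names can then be compared as plain bytes.
    return name.encode("utf-8")

check_filename("h\u00ebllo.txt")        # precomposed: accepted
try:
    check_filename("he\u0308llo.txt")   # e + combining diaeresis: rejected
except ValueError as err:
    print(err)
```

Note that nothing comparable exists for case: there is no "already folded" form that users would accept for every filename.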

The odds that encoding or normalization would be permitted to vary per-filesystem or per-subtree are negligible. Applications aren't prepared to deal with that, nor should they be expected to do so. Any conversions needed for shared filesystems should be handled at the lowest layers of the filesystem, between the storage or network and the kernel.

Case-insensitive ext4

Posted Mar 29, 2019 10:52 UTC (Fri) by nim-nim (subscriber, #34454) [Link]

Assuming the lowest layers of the stack could handle conversions transparently (which I'm doubtful of, that would require low-level knowledge about every possible encoding variation on earth), you still need to know the encoding(s) you start with. Meaning, you have to put at least one pivot encoding definition inside your filesystem.

That's the part people object to, because they are used to the simplicity of pushing encoding problems somewhere else, with "filenames are streams of bytes". Which was not true even for original UNIX. Actual original Unix filename bytes were 7bit ASCII bytes and nothing else.

But 7bit ASCII is useless in a modern i18n world. So you need to record other pivot encoding(s) in filesystems¹.

¹ Record, not reproduce the mistake of original UNIX, that assumed there was a single encoding that would never evolve so there was no need to make it explicit; easy mistake to make in the simpler computer age they lived in; inexcusable mistake to make today.

Case-insensitive ext4

Posted Mar 28, 2019 14:28 UTC (Thu) by smurf (subscriber, #17840) [Link]

Yes. Besides case insensitivity there's also the issue of differently-normalized file names. I would like to have one well-defined on-disk normal form. Otherwise I save "hëllo.txt" (e, combining diaeresis) and then fail to open "hëllo.txt" (e-with-diaeresis). This problem affects my desktop interface as well as web servers with "interesting" URLs.
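The mismatch is easy to reproduce without touching a filesystem. In this Python sketch the dictionary stands in for a directory that compares names byte for byte, and `same_name` is a made-up helper.

```python
import unicodedata

composed = "h\u00ebllo.txt"     # e-with-diaeresis as one code point
decomposed = "he\u0308llo.txt"  # e followed by a combining diaeresis

# A directory modelled as a byte-keyed table, as on a typical Linux filesystem.
directory = {decomposed.encode("utf-8"): "file contents"}

print(composed.encode("utf-8") in directory)    # the lookup misses

def same_name(a, b, form="NFC"):
    """Compare two names after bringing both to the same normal form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(same_name(composed, decomposed))
```

Both strings render identically on screen, which is why the failure is so confusing for whoever typed the name.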


Copyright © 2024, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds