How to Get More Value From Your File System Directory Cache

Posted on March 4, 2016

For my CS854 class I have to read a trio of research papers each week, and post summaries, which are then updated after the class that a student gives a presentation on it. Don’t rely on my summaries for anything, but it might interest some of you, so I’m posting it here.

Just read the paper.

How to Get More Value From Your File System Directory Cache


This paper explores, well, how to get more value from your file system directory cache?

First, lets explore what needs caching, then what the cache is, then how to get more value from it.

What needs caching?

In a word: Paths. Lots of system calls care about file paths. In POSIX, in order to open a file, you need to have search (execute) permissions on all of the parent directories. This is seemingly unavoidably linear time in the number of parents, as you would need to check each one in turn, and do a lot of pointer chasing.

Lots of things, like Linux Security Modules rely deeply on this model, so it seems hard to just replace it. You need to make sure the new method preserves the effects we care about.

How is this cached?

The directory cache is a LRU cache of the most recently accessed directories, so we don’t need to hit the disk repeatedly if we access the same directory repeatedly.

Each directory entry (dentry from now on) maps a path to a inode (kept in ram) with all the metadata about a file.

In addition to the LRU, we also have a tree structure matching that of the filesystem, a hash table from the pair of parent dentry vaddr and file name to dentry, and an alias list, to track hard links to an inode.

But even a cache hit is slow, compare 1.1 us for stat compared to 0.04us for getppid, or 0.3us for a 4KB pread.

It can also cache negative dentries, to prove that a file does not exist. This can speed up checks that fail too, which is pretty important.

This structure still is linear time in the number of path components.

How to get more value?

Create a system wide hash table from full, canonicalized paths to dentries, called the direct lookup hash table (DLHT). This is a cache, so it is populated lazily, and entries can be invalidated by operations such as renaming a directory.

In addition, we cache the results of previous prefix checks, in the prefix check cache (PCC). This depends on the permissions of the process, so we only share this between processes with identical permissions. Use a version number with each entry to detect stale entries.

We now have a faster fastpath, where you directly lookup the dentry in the DLHT, then check it’s permissions in the PCC, and then win at speed. No more linear time costs.

And if it fails, then use the old path.

What if two processes change something at once?

Version number based locks. Everywhere.

Before a mutation, like renaming a directory, the operation walks all children (that were cached) and bumps their version counter. That’ll prevent old PCC entries from applying to them anymore.

Also, remove all those dentries from the DLHT, as they could be invalid now.

Then, we have to consider of there is a slow path request currently in flight while the mutation is happening. If we don’t do anything, it might add to the cache an entry for a directory that has been moved. We can keep a global invalidation counter, and only cache things if it hasn’t been bumped during the lookup.

If you get exactly 2^32 mutations happening during one very slow lookup, you have other problems.

Also, use existing locking structures to make sure that the slow path handles concurrent requests well.

This is fundamentally a change to make reading cheaper, but writing more expensive. It does seem like a nice trade off, in that respect, since there are a lot more reads of a file’s permissions than there are writes.

Directory Completeness

This paper also introduced a separate concept, of Directory Completeness. This is a single bit stored on a directory’s dentry to mark if all of the directory’s children are definitely in the cache. If it’s marked, calls to ls can skip checking on disk.

This is cool, but not really related to the rest of the paper.

Concerns

If there are any bugs, this could have rather large implications for security. If file permissions don’t work properly, you’re going to have a bad time.

I feel like the signature stuff to avoid string comparisons is good, but could be separated out. Nothing else really depends on it, and it could be implemented on it’s own.