.. SPDX-License-Identifier: GPL-2.0

==============================
Network Filesystem Caching API
==============================

Fscache provides an API by which a network filesystem can make use of local
caching facilities. The API is arranged around a number of principles:

 (1) A cache is logically organised into volumes and data storage objects
     within those volumes.

 (2) Volumes and data storage objects are represented by various types of
     cookie.

 (3) Cookies have keys that distinguish them from their peers.

 (4) Cookies have coherency data that allows a cache to determine if the
     cached data is still valid.

 (5) I/O is done asynchronously where possible.

This API is used by::

        #include <linux/fscache.h>

.. This document contains the following sections:

         (1) Overview
         (2) Volume registration
         (3) Data file registration
         (4) Declaring a cookie to be in use
         (5) Resizing a data file (truncation)
         (6) Data I/O API
         (7) Data file coherency
         (8) Data file invalidation
         (9) Write back resource management
         (10) Caching of local modifications
         (11) Page release and invalidation


Overview
========

The fscache hierarchy is organised on two levels from a network filesystem's
point of view. The upper level represents "volumes" and the lower level
represents "data storage objects". These are represented by two types of
cookie, hereafter referred to as "volume cookies" and "cookies".

A network filesystem acquires a volume cookie for a volume using a volume key,
which represents all the information that defines that volume (e.g. cell name
or server address, volume ID or share name). This must be rendered as a
printable string that can be used as a directory name (i.e. it must contain no
'/' characters and shouldn't begin with a '.'). The maximum name length is one
less than the maximum size of a filename component (allowing the cache backend
one char for its own purposes).

A filesystem would typically have a volume cookie for each superblock.

The filesystem then acquires a cookie for each file within that volume using an
object key. Object keys are binary blobs and only need to be unique within
their parent volume. The cache backend is responsible for rendering the binary
blob into something it can use and may employ hash tables, trees or whatever to
improve its ability to find an object. This is transparent to the network
filesystem.

A filesystem would typically have a cookie for each inode, and would acquire it
in iget and relinquish it when evicting the inode.

Once it has a cookie, the filesystem needs to mark the cookie as being in use.
This causes fscache to send the cache backend off to look up/create resources
for the cookie in the background, to check its coherency and, if necessary, to
mark the object as being under modification.

A filesystem would typically "use" the cookie in its file open routine and
unuse it in file release. It must also use the cookie around calls that
truncate the cached object locally, and it must use the cookie when the
pagecache becomes dirty and unuse it when writeback is complete. This last case
is slightly tricky, and provision is made for it.

When performing a read, write or resize on a cookie, the filesystem must first
begin an operation. This copies the resources into a holding struct and puts
extra pins into the cache to stop cache withdrawal from tearing down the
structures being used. The actual operation can then be issued and conflicting
invalidations can be detected upon completion.

The filesystem is expected to use netfslib to access the cache, but that's not
actually required and it can use the fscache I/O API directly.


Volume Registration
===================

The first step for a network filesystem is to acquire a volume cookie for the
volume it wants to access::

        struct fscache_volume *
        fscache_acquire_volume(const char *volume_key,
                               const char *cache_name,
                               const void *coherency_data,
                               size_t coherency_len);

This function creates a volume cookie with the specified volume key as its name
and notes the coherency data.

The volume key must be a printable string with no '/' characters in it. It
should begin with the name of the filesystem and should be no longer than 254
characters. It should uniquely represent the volume and will be matched with
what's stored in the cache.
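The key rules above can be checked before the key is handed to fscache. The following is a minimal userspace sketch, not kernel code: the helper names, the comma separator and the ``SKETCH_VOLUME_KEY_MAX`` constant are all assumptions made for illustration, and only the rules themselves (printable, no '/', no leading '.', at most 254 characters) come from the text.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed limit: the text gives 254 characters as the maximum key length. */
#define SKETCH_VOLUME_KEY_MAX 254

/* Hypothetical helper: check a key against the rules stated in the text. */
static bool sketch_volume_key_ok(const char *key)
{
    size_t len;

    assert(key != NULL);
    len = strlen(key);
    if (len == 0 || len > SKETCH_VOLUME_KEY_MAX)
        return false;
    if (key[0] == '.')                  /* shouldn't begin with a '.' */
        return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)key[i];

        if (c == '/' || !isprint(c))    /* must work as a directory name */
            return false;
    }
    return true;
}

/*
 * Hypothetical helper: build "<fsname>,<server>,<share>".  The comma
 * separator is an illustrative choice, not something the API mandates.
 * Returns 0 on success and -1 if the result is truncated or invalid.
 */
static int sketch_make_volume_key(char *buf, size_t size, const char *fsname,
                                  const char *server, const char *share)
{
    int n = snprintf(buf, size, "%s,%s,%s", fsname, server, share);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return sketch_volume_key_ok(buf) ? 0 : -1;
}
```

Putting the filesystem name first follows the recommendation above and keeps keys from different filesystems apart within one cache.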

The caller may also specify the name of the cache to use. If specified,
fscache will look up or create a cache cookie of that name and will use a cache
of that name if it is online or comes online. If no cache name is specified,
it will use the first cache that comes to hand and set the name to that.

The specified coherency data is stored in the cookie and will be matched
against coherency data stored on disk. The data pointer may be NULL if no data
is provided. If the coherency data doesn't match, the entire cache volume will
be invalidated.

This function can return errors such as EBUSY if the volume key is already in
use by an acquired volume or ENOMEM if an allocation failure occurred. It may
also return a NULL volume cookie if fscache is not enabled. It is safe to
pass a NULL cookie to any function that takes a volume cookie. This will
cause that function to do nothing.


When the network filesystem has finished with a volume, it should relinquish it
by calling::

        void fscache_relinquish_volume(struct fscache_volume *volume,
                                       const void *coherency_data,
                                       bool invalidate);

This will cause the volume to be committed or removed and, if sealed, the
coherency data will be set to the value supplied. The amount of coherency data
must match the length specified when the volume was acquired. Note that all
data cookies obtained in this volume must be relinquished before the volume is
relinquished.


Data File Registration
======================

Once it has a volume cookie, a network filesystem can use it to acquire a
cookie for data storage::

        struct fscache_cookie *
        fscache_acquire_cookie(struct fscache_volume *volume,
                               u8 advice,
                               const void *index_key,
                               size_t index_key_len,
                               const void *aux_data,
                               size_t aux_data_len,
                               loff_t object_size);

This creates the cookie in the volume using the specified index key. The index
key is a binary blob of the given length and must be unique for the volume.
This is saved into the cookie. There are no restrictions on the content, but
its length shouldn't exceed about three quarters of the maximum filename length
to allow for encoding.
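As an illustration of building such a key, the sketch below serialises a per-file identity into a fixed byte layout and checks it against the length guideline. It is a userspace sketch under stated assumptions: ``struct sketch_file_id``, the 255-byte filename limit and the big-endian layout are hypothetical choices for an imaginary filesystem, not something fscache prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions: a 255-byte filename limit and the "three quarters" guideline. */
#define SKETCH_NAME_MAX         255
#define SKETCH_INDEX_KEY_MAX    (SKETCH_NAME_MAX * 3 / 4)   /* 191 bytes */

/* Hypothetical per-file identity for an imaginary filesystem. */
struct sketch_file_id {
    uint64_t fileid;
    uint32_t generation;
};

/*
 * Serialise the identity into a fixed big-endian layout so that the same
 * file always yields the same key bytes, independent of host endianness
 * and struct padding.  Returns the key length, or 0 if it does not fit.
 */
static size_t sketch_make_index_key(uint8_t *key, size_t size,
                                    const struct sketch_file_id *id)
{
    size_t len = sizeof(id->fileid) + sizeof(id->generation);   /* 12 */

    assert(key != NULL && id != NULL);
    if (len > size || len > SKETCH_INDEX_KEY_MAX)
        return 0;
    for (int i = 0; i < 8; i++)
        key[i] = (uint8_t)(id->fileid >> (56 - 8 * i));
    for (int i = 0; i < 4; i++)
        key[8 + i] = (uint8_t)(id->generation >> (24 - 8 * i));
    return len;
}
```

Serialising field by field, rather than passing the struct's raw bytes, avoids padding bytes leaking into the key and making otherwise identical keys compare unequal.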

The caller should also pass in a piece of coherency data in aux_data. A buffer
of size aux_data_len will be allocated and the coherency data copied in. It is
assumed that the size is invariant over time. The coherency data is used to
check the validity of data in the cache. Functions are provided by which the
coherency data can be updated.

The file size of the object being cached should also be provided. This may be
used to trim the data and will be stored with the coherency data.

This function never returns an error, though it may return a NULL cookie on
allocation failure or if fscache is not enabled. It is safe to pass in a NULL
volume cookie and pass the NULL cookie returned to any function that takes it.
This will cause that function to do nothing.


When the network filesystem has finished with a cookie, it should relinquish it
by calling::

        void fscache_relinquish_cookie(struct fscache_cookie *cookie,
                                       bool retire);

This will cause fscache to either commit the storage backing the cookie or, if
retire is true, delete it.


Marking A Cookie In-Use
=======================

Once a cookie has been acquired by a network filesystem, the filesystem should
tell fscache when it intends to use the cookie (typically done on file open)
and should say when it has finished with it (typically on file close)::

        void fscache_use_cookie(struct fscache_cookie *cookie,
                                bool will_modify);
        void fscache_unuse_cookie(struct fscache_cookie *cookie,
                                  const void *aux_data,
                                  const loff_t *object_size);

The *use* function tells fscache that the filesystem will use the cookie and,
additionally, indicates whether the user intends to modify the contents
locally. If not yet done, this will trigger the cache backend to go and gather
the resources it needs to access/store data in the cache. This is done in the
background, and so may not be complete by the time the function returns.

The *unuse* function indicates that a filesystem has finished using a cookie.
It optionally updates the stored coherency data and object size and then
decreases the in-use counter. When the last user unuses the cookie, it is
scheduled for garbage collection. If not reused within a short time, the
resources will be released to reduce system resource consumption.
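The counting behaviour described above can be modelled in a few lines. This is a userspace model for illustration only: ``struct use_model_cookie`` and the ``use_model_*`` functions are hypothetical, the coherency data and object size arguments of the real *unuse* call are left out, and the background lookup and timed garbage collection are reduced to simple flag changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; this is not the kernel's struct fscache_cookie. */
struct use_model_cookie {
    unsigned int n_active;      /* outstanding "use" marks */
    bool resources_held;        /* backend resources have been gathered */
    bool will_modify;           /* some user intends to modify the data */
    bool gc_pending;            /* queued for deferred resource release */
};

static void use_model_use(struct use_model_cookie *c, bool will_modify)
{
    if (c == NULL)
        return;                     /* a NULL cookie is a no-op */
    if (c->n_active++ == 0) {
        c->gc_pending = false;      /* reused before being collected */
        c->resources_held = true;   /* in the kernel: done in the background */
    }
    if (will_modify)
        c->will_modify = true;
}

static void use_model_unuse(struct use_model_cookie *c)
{
    if (c == NULL)
        return;
    assert(c->n_active > 0);        /* every unuse must pair with a use */
    if (--c->n_active == 0)
        c->gc_pending = true;       /* last user gone: schedule collection */
}

/* Stands in for the deferred collector running after the grace period. */
static void use_model_gc_run(struct use_model_cookie *c)
{
    if (c->gc_pending && c->n_active == 0) {
        c->resources_held = false;
        c->gc_pending = false;
    }
}
```

In this model a cookie that is used again before the collector runs keeps its resources, which mirrors the "if not reused within a short time" condition above.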

A cookie must be marked in-use before it can be accessed for read, write or
resize, and an in-use mark must be kept whilst there is dirty data in the
pagecache in order to avoid an oops due to trying to open a file during process
exit.

Note that in-use marks are cumulative. For each time a cookie is marked
in-use, it must be unused.


Resizing A Data File (Truncation)
=================================

If a network filesystem file is resized locally by truncation, the following
should be called to notify the cache::

        void fscache_resize_cookie(struct fscache_cookie *cookie,
                                   loff_t new_size);

The caller must have first marked the cookie in-use. The cookie and the new
size are passed in and the cache is synchronously resized. This is expected to
be called from the ``->setattr()`` inode operation under the inode lock.


Data I/O API
============

To do data I/O operations directly through a cookie, the following functions
are available::

        int fscache_begin_read_operation(struct netfs_cache_resources *cres,
                                         struct fscache_cookie *cookie);
        int fscache_read(struct netfs_cache_resources *cres,
                         loff_t start_pos,
                         struct iov_iter *iter,
                         enum netfs_read_from_hole read_hole,
                         netfs_io_terminated_t term_func,
                         void *term_func_priv);
        int fscache_write(struct netfs_cache_resources *cres,
                          loff_t start_pos,
                          struct iov_iter *iter,
                          netfs_io_terminated_t term_func,
                          void *term_func_priv);

The *begin* function sets up an operation, attaching the resources required to
the cache resources block from the cookie. Assuming it doesn't return an error
(for instance, it will return -ENOBUFS if given a NULL cookie, but otherwise do
nothing), then one of the other two functions can be issued.

The *read* and *write* functions initiate a direct-IO operation. Both take the
previously set up cache resources block, an indication of the start file
position, and an I/O iterator that describes the buffer and indicates the
amount of data.

The read function also takes a parameter to indicate how it should handle a
partially populated region (a hole) in the disk content. This may be to ignore
it, skip over an initial hole and place zeros in the buffer or give an error.

The read and write functions can be given an optional termination function that
will be run on completion::

        typedef
        void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error,
                                      bool was_async);

If a termination function is given, the operation will be run asynchronously
and the termination function will be called upon completion. If not given, the
operation will be run synchronously. Note that in the asynchronous case, it is
possible for the operation to complete before the function returns.
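A termination function receives either a byte count or a negative error in a single argument, so its first job is usually to tell the two apart. The sketch below shows one way to write such a handler in userspace. The typedef has the same shape as the one quoted above; ``struct sketch_request``, ``sketch_io_done()`` and ``sketch_fake_io()`` are hypothetical, and ``sketch_fake_io()`` is only a stand-in used to drive the handler, not a model of what ``fscache_read()`` returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>          /* ssize_t */

/* Same shape as the typedef quoted in the text. */
typedef void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error,
                                      bool was_async);

/* Hypothetical per-request state owned by the filesystem. */
struct sketch_request {
    size_t bytes_done;
    int error;                  /* 0 or a negative errno value */
    bool completed;
};

/* Termination function: split the combined value into a count or an error. */
static void sketch_io_done(void *priv, ssize_t transferred_or_error,
                           bool was_async)
{
    struct sketch_request *req = priv;

    assert(req != NULL);
    (void)was_async;            /* unused in this sketch */
    if (transferred_or_error < 0)
        req->error = (int)transferred_or_error;
    else
        req->bytes_done = (size_t)transferred_or_error;
    req->completed = true;
}

/*
 * Stand-in for an I/O call, NOT fscache_read().  'result' is what the
 * pretend backend produced.  If a termination function is supplied, it is
 * called before this function returns, which the text says can happen.
 * The return convention (0 or negative errno) is an assumption.
 */
static int sketch_fake_io(ssize_t result, netfs_io_terminated_t term_func,
                          void *term_func_priv)
{
    if (term_func)
        term_func(term_func_priv, result, false);
    return result < 0 ? (int)result : 0;
}
```

Because completion may happen before the issuing call returns, a handler like this should not assume that the caller has finished setting up any state it depends on.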

Both the read and write functions end the operation when they complete,
detaching any pinned resources.

The read operation will fail with ESTALE if invalidation occurred whilst the
operation was ongoing.


Data File Coherency
===================

To request an update of the coherency data and file size on a cookie, the
following should be called::

        void fscache_update_cookie(struct fscache_cookie *cookie,
                                   const void *aux_data,
                                   const loff_t *object_size);

This will update the cookie's coherency data and/or file size.


Data File Invalidation
======================

Sometimes it will be necessary to invalidate an object that contains data.
Typically this will be necessary when the server informs the network filesystem
of a remote third-party change, at which point the filesystem has to throw
away the state and cached data that it had for a file and reload from the
server.

To indicate that a cache object should be invalidated, the following should be
called::

        void fscache_invalidate(struct fscache_cookie *cookie,
                                const void *aux_data,
                                loff_t size,
                                unsigned int flags);

This increases the invalidation counter in the cookie to cause outstanding
reads to fail with -ESTALE, sets the coherency data and file size from the
information supplied, blocks new I/O on the cookie and dispatches the cache to
go and get rid of the old data.
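The way a counter lets a read notice a concurrent invalidation can be shown with a small model. This is a userspace sketch of the general technique (snapshot a counter at the start, compare at the end); the ``inval_model_*`` names and fields are hypothetical and the sketch makes no claim about how the kernel lays this out internally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>          /* ssize_t */

/* Userspace model; field and function names are illustrative only. */
struct inval_model_cookie {
    unsigned int inval_counter;     /* bumped by every invalidation */
};

struct inval_model_op {
    const struct inval_model_cookie *cookie;
    unsigned int inval_snapshot;    /* counter value when the op began */
};

/* Beginning an operation records the counter as it stands now. */
static void inval_model_begin_read(struct inval_model_op *op,
                                   const struct inval_model_cookie *c)
{
    assert(op != NULL && c != NULL);
    op->cookie = c;
    op->inval_snapshot = c->inval_counter;
}

static void inval_model_invalidate(struct inval_model_cookie *c)
{
    assert(c != NULL);
    c->inval_counter++;
}

/* Completion: a read that raced with an invalidation reports -ESTALE. */
static ssize_t inval_model_end_read(const struct inval_model_op *op,
                                    ssize_t transferred)
{
    if (op->cookie->inval_counter != op->inval_snapshot)
        return -ESTALE;
    return transferred;
}
```

An operation begun after the invalidation takes a fresh snapshot and so completes normally, which is why only outstanding reads are affected.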

Invalidation runs asynchronously in a worker thread so that it doesn't block
too much.


Write-Back Resource Management
==============================

To write data to the cache from network filesystem writeback, the cache
resources required need to be pinned at the point the modification is made (for
instance when the page is marked dirty) as it's not possible to open a file in
a thread that's exiting.

The following facilities are provided to manage this:

 * An inode flag, ``I_PINNING_FSCACHE_WB``, is provided to indicate that an
   in-use mark is held on the cookie for this inode. It can only be changed if
   the inode lock is held.

 * A flag, ``unpinned_fscache_wb``, is placed in the ``writeback_control``
   struct that gets set if ``__writeback_single_inode()`` clears
   ``I_PINNING_FSCACHE_WB`` because all the dirty pages were cleared.

To support this, the following functions are provided::

        bool fscache_dirty_folio(struct address_space *mapping,
                                 struct folio *folio,
                                 struct fscache_cookie *cookie);
        void fscache_unpin_writeback(struct writeback_control *wbc,
                                     struct fscache_cookie *cookie);
        void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
                                           struct inode *inode,
                                           const void *aux);

The *dirty_folio* function is intended to be called from the filesystem's
``dirty_folio`` address space operation. If ``I_PINNING_FSCACHE_WB`` is not
set, it sets that flag and increments the use count on the cookie (the caller
must already have called ``fscache_use_cookie()``).

The *unpin* function is intended to be called from the filesystem's
``write_inode`` superblock operation. It cleans up after writing by unusing
the cookie if ``unpinned_fscache_wb`` is set in the ``writeback_control``
struct.
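The interplay between the inode flag, the ``writeback_control`` flag and the cookie's use count can be traced with a small model. This is a userspace sketch only: the ``wb_model_*`` structs mimic just the fields named above, locking is omitted, and writeback is simplified to cleaning every dirty page at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model; these structs only mimic the fields named in the text. */
struct wb_model_cookie {
    unsigned int n_active;          /* in-use marks on the cookie */
};

struct wb_model_inode {
    bool pinning_fscache_wb;        /* models I_PINNING_FSCACHE_WB */
    unsigned int dirty_pages;
};

struct wb_model_wbc {
    bool unpinned_fscache_wb;       /* models the writeback_control flag */
};

/* Models the dirty_folio step: the first dirty page takes one extra use. */
static void wb_model_dirty(struct wb_model_inode *inode,
                           struct wb_model_cookie *cookie)
{
    assert(inode != NULL && cookie != NULL);
    assert(cookie->n_active > 0);   /* caller must already hold a use */
    inode->dirty_pages++;
    if (!inode->pinning_fscache_wb) {
        inode->pinning_fscache_wb = true;
        cookie->n_active++;
    }
}

/* Models writeback of the inode: clean all pages, then drop the flag. */
static void wb_model_writeback(struct wb_model_inode *inode,
                               struct wb_model_wbc *wbc)
{
    inode->dirty_pages = 0;
    if (inode->pinning_fscache_wb) {
        inode->pinning_fscache_wb = false;
        wbc->unpinned_fscache_wb = true;
    }
}

/* Models the unpin step: drop the extra use if writeback released the pin. */
static void wb_model_unpin(struct wb_model_wbc *wbc,
                           struct wb_model_cookie *cookie)
{
    if (wbc->unpinned_fscache_wb) {
        assert(cookie->n_active > 0);
        cookie->n_active--;
    }
}
```

The point of the extra use is visible when the file is closed while pages are still dirty: the cookie stays in use until writeback has finished, so no file needs to be opened from an exiting thread.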

The *clear* function is intended to be called from the netfs's ``evict_inode``
superblock operation. It must be called *after*
``truncate_inode_pages_final()``, but *before* ``clear_inode()``. This cleans
up any hanging ``I_PINNING_FSCACHE_WB``. It also allows the coherency data to
be updated.


Caching of Local Modifications
==============================

If a network filesystem has locally modified data that it wants to write to the
cache, it needs to mark the pages to indicate that a write is in progress, and
if the mark is already present, it needs to wait for it to be removed first
(presumably due to an already in-progress operation). This prevents multiple
competing DIO writes to the same storage in the cache.

Firstly, the netfs should determine if caching is available by doing something
like::

        bool caching = fscache_cookie_enabled(cookie);

If caching is to be attempted, pages should be waited for and then marked using
the following functions provided by the netfs helper library::

        void set_page_fscache(struct page *page);
        void wait_on_page_fscache(struct page *page);
        int wait_on_page_fscache_killable(struct page *page);

Once all the pages in the span are marked, the netfs can ask fscache to
schedule a write of that region::

        void fscache_write_to_cache(struct fscache_cookie *cookie,
                                    struct address_space *mapping,
                                    loff_t start, size_t len, loff_t i_size,
                                    netfs_io_terminated_t term_func,
                                    void *term_func_priv,
                                    bool caching);

And if an error occurs before that point is reached, the marks can be removed
by calling::

        void fscache_clear_page_bits(struct address_space *mapping,
                                     loff_t start, size_t len,
                                     bool caching);

In these functions, a pointer to the mapping to which the source pages are
attached is passed in and start and len indicate the size of the region that's
going to be written (it doesn't have to align to page boundaries necessarily,
but it does have to align to DIO boundaries on the backing filesystem). The
caching parameter indicates whether caching is to be attempted; if it is false,
the functions do nothing.
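One way to satisfy the alignment requirement is to widen the region outwards to the nearest DIO block boundaries before marking pages. The helper below is a userspace sketch of that arithmetic. The names are hypothetical, the block size is assumed to be a power of two, and the actual DIO granularity of the backing filesystem is not specified in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor; offsets are byte positions in the file. */
struct sketch_region {
    uint64_t start;
    uint64_t len;
};

/*
 * Expand [start, start + len) outwards to multiples of dio_block, which is
 * assumed to be a power of two (for example 512 or 4096).
 */
static struct sketch_region sketch_align_to_dio(uint64_t start, uint64_t len,
                                                uint64_t dio_block)
{
    struct sketch_region r;
    uint64_t mask = dio_block - 1;
    uint64_t end = start + len;

    assert(dio_block != 0 && (dio_block & mask) == 0);
    r.start = start & ~mask;                    /* round the start down */
    r.len = ((end + mask) & ~mask) - r.start;   /* round the end up */
    return r;
}
```

Widening the region only makes sense if the pagecache holds valid data for the extra bytes. Whether to widen, shrink or skip an unaligned region is a decision for the filesystem, and this sketch shows only the first option.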

The write function takes some additional parameters: the cookie representing
the cache object to be written to, i_size indicates the size of the netfs file
and term_func indicates an optional completion function, to which
term_func_priv will be passed, along with the error or amount written.

Note that the write function will always run asynchronously and will unmark all
the pages upon completion before calling term_func.


Page Release and Invalidation
=============================

Fscache keeps track of whether we have any data in the cache yet for a cache
object we've just created. It knows it doesn't have to do any reading until it
has done a write and then the page it wrote from has been released by the VM,
after which it *has* to look in the cache.

To inform fscache that a page might now be in the cache, the following function
should be called from the ``release_folio`` address space op::

        void fscache_note_page_release(struct fscache_cookie *cookie);

if the page has been released (i.e. ``release_folio`` returned true).

Page release and page invalidation should also wait for any mark left on the
page to say that a DIO write is underway from that page::

        void wait_on_page_fscache(struct page *page);
        int wait_on_page_fscache_killable(struct page *page);


API Function Reference
======================

.. kernel-doc:: include/linux/fscache.h