
virtiofsd cannot run in a container because CAP_SYS_ADMIN is required to
create namespaces.

Introduce a weaker sandbox mode that is sufficient in container
environments because the container runtime already sets up namespaces.
Use chroot to restrict path traversal to the shared directory.

virtiofsd loses the following:

1. Mount namespace. The process chroots to the shared directory but
   leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
   syscalls.

2. Pid namespace. This should be fine because virtiofsd is the only
   process running in the container.

3. Network namespace. This should be fine because seccomp already
   rejects the connect(2) syscall, but an additional layer of security
   is lost. Container runtime-specific network security policies can be
   used to drop network traffic (except for the vhost-user UNIX domain
   socket).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201008085534.16070-1-stefanha@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

QEMU virtio-fs shared file system daemon
========================================

Synopsis
--------

**virtiofsd** [*OPTIONS*]

Description
-----------

Share a host directory tree with a guest through a virtio-fs device. This
program is a vhost-user backend that implements the virtio-fs device. Each
virtio-fs device instance requires its own virtiofsd process.

This program is designed to work with QEMU's ``--device vhost-user-fs-pci``
but should work with any virtual machine monitor (VMM) that supports
vhost-user. See the Examples section below.

This program must be run as the root user. The program drops privileges where
possible during startup although it must be able to create and access files
with any uid/gid:

* The ability to invoke syscalls is limited using seccomp(2).
* Linux capabilities(7) are dropped.

In "namespace" sandbox mode the program switches into a new file system
namespace and invokes pivot_root(2) to make the shared directory tree its root.
New pid and net namespaces are also created to isolate the process.

In "chroot" sandbox mode the program invokes chroot(2) to make the shared
directory tree its root. This mode is intended for container environments where
the container runtime has already set up the namespaces and the program does
not have permission to create namespaces itself.

Both sandbox modes prevent "file system escapes" due to symlinks and other file
system objects that might lead to files outside the shared directory.

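The kind of "file system escape" that both sandbox modes guard against can be
illustrated with a short Python sketch. It is not how virtiofsd enforces
confinement (virtiofsd relies on pivot_root(2) or chroot(2) as described
above); it only shows how a symlink inside a shared directory can resolve to a
path outside it. The names ``escapes_shared_dir`` and ``demo`` are hypothetical
and exist only for this illustration.

.. code-block:: python

   import os
   import tempfile


   def escapes_shared_dir(shared_dir, relative_path):
       """Return True if relative_path resolves outside shared_dir."""
       root = os.path.realpath(shared_dir)
       # realpath() follows symlinks, which is what a naive server would do.
       target = os.path.realpath(os.path.join(root, relative_path))
       return os.path.commonpath([root, target]) != root


   def demo():
       """Build a throwaway tree with one symlink that points outside."""
       with tempfile.TemporaryDirectory() as tmp:
           shared = os.path.join(tmp, "shared")
           outside = os.path.join(tmp, "outside")
           os.mkdir(shared)
           os.mkdir(outside)
           with open(os.path.join(shared, "ok.txt"), "w") as f:
               f.write("inside\n")
           # A symlink inside the shared tree that leads out of it.
           os.symlink(outside, os.path.join(shared, "link"))
           return (escapes_shared_dir(shared, "ok.txt"),
                   escapes_shared_dir(shared, "link/secret"))

Once the shared directory is the process root, a path such as ``link/secret``
can no longer be resolved to a location outside the shared tree, so a check
like the one above is not needed in the daemon itself.
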
Options
-------

.. program:: virtiofsd

.. option:: -h, --help

  Print help.

.. option:: -V, --version

  Print version.

.. option:: -d

  Enable debug output.

.. option:: --syslog

  Print log messages to syslog instead of stderr.

.. option:: -o OPTION

  * debug -
    Enable debug output.

  * flock|no_flock -
    Enable/disable flock. The default is ``no_flock``.

  * modcaps=CAPLIST
    Modify the list of capabilities allowed; CAPLIST is a colon separated
    list of capabilities, each preceded by either + or -, e.g.
    ``+sys_admin:-chown``.

  * log_level=LEVEL -
    Print only log messages matching LEVEL or more severe. LEVEL is one of
    ``err``, ``warn``, ``info``, or ``debug``. The default is ``info``.

  * posix_lock|no_posix_lock -
    Enable/disable remote POSIX locks. The default is ``no_posix_lock``.

  * readdirplus|no_readdirplus -
    Enable/disable readdirplus. The default is ``readdirplus``.

  * sandbox=namespace|chroot -
    Sandbox mode:

    - namespace: Create mount, pid, and net namespaces and pivot_root(2) into
      the shared directory.
    - chroot: chroot(2) into shared directory (use in containers).

    The default is "namespace".

  * source=PATH -
    Share host directory tree located at PATH. This option is required.

  * timeout=TIMEOUT -
    I/O timeout in seconds. The default depends on the ``cache=`` option.

  * writeback|no_writeback -
    Enable/disable writeback cache. The cache allows the FUSE client to buffer
    and merge write requests. The default is ``no_writeback``.

  * xattr|no_xattr -
    Enable/disable extended attributes (xattr) on files and directories. The
    default is ``no_xattr``.

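The CAPLIST syntax accepted by ``modcaps`` can be sketched in a few lines of
Python. This is an illustration of the documented format only (a colon
separated list, each entry preceded by ``+`` or ``-``), not virtiofsd's own
parser; the function name ``parse_modcaps`` is hypothetical.

.. code-block:: python

   def parse_modcaps(caplist):
       """Split a CAPLIST string into (added, dropped) capability names."""
       added, dropped = [], []
       for item in caplist.split(":"):
           # Every entry needs a sign followed by a capability name.
           if len(item) < 2 or item[0] not in "+-":
               raise ValueError("bad CAPLIST entry: %r" % item)
           if item[0] == "+":
               added.append(item[1:])
           else:
               dropped.append(item[1:])
       return added, dropped


   added, dropped = parse_modcaps("+sys_admin:-chown")

For the example string from the option description, ``added`` holds
``sys_admin`` and ``dropped`` holds ``chown``.
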
.. option:: --socket-path=PATH

  Listen on vhost-user UNIX domain socket at PATH.

.. option:: --socket-group=GROUP

  Set the vhost-user UNIX domain socket gid to GROUP.

.. option:: --fd=FDNUM

  Accept connections from vhost-user UNIX domain socket file descriptor FDNUM.
  The file descriptor must already be listening for connections.

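The requirement that the ``--fd`` descriptor is already listening means that a
parent process has to create, bind, and listen on the socket before starting
virtiofsd. The following Python sketch shows one way a launcher might prepare
such a descriptor. The helper names ``listening_socket`` and ``virtiofsd_argv``
are hypothetical, and the sketch only builds the argument list; it does not
start virtiofsd.

.. code-block:: python

   import socket


   def listening_socket(path, backlog=1):
       """Create a UNIX domain socket that is already listening at path."""
       sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
       sock.bind(path)
       sock.listen(backlog)
       # Python marks descriptors close-on-exec by default; clear that so a
       # child process can inherit the socket.
       sock.set_inheritable(True)
       return sock


   def virtiofsd_argv(sock, source):
       """Build an argument list that hands the listening socket over."""
       return ["virtiofsd", "--fd=%d" % sock.fileno(), "-o", "source=" + source]

A launcher could then pass the result to ``subprocess.Popen`` together with
``pass_fds=[sock.fileno()]`` so that the descriptor keeps the same number in
the child process.
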
.. option:: --thread-pool-size=NUM

  Restrict the number of worker threads per request queue to NUM. The default
  is 64.

.. option:: --cache=none|auto|always

  Select the desired trade-off between coherency and performance. ``none``
  forbids the FUSE client from caching to achieve best coherency at the cost of
  performance. ``auto`` behaves similarly to NFS with a 1 second metadata cache
  timeout. ``always`` sets a long cache lifetime at the expense of coherency.
  The default is ``auto``.

Examples
--------

Export ``/var/lib/fs/vm001/`` on vhost-user UNIX domain socket
``/var/run/vm001-vhost-fs.sock``:

::

  host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
  host# qemu-system-x86_64 \
        -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
        -device vhost-user-fs-pci,chardev=char0,tag=myfs \
        -object memory-backend-memfd,id=mem,size=4G,share=on \
        -numa node,memdev=mem \
        ...
  guest# mount -t virtiofs myfs /mnt

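For container deployments, the same export can be combined with the ``chroot``
sandbox mode. The Python sketch below assembles such a command line from the
options documented above. The function name ``virtiofsd_command`` is
hypothetical, and passing ``-o`` more than once is an assumption of this
sketch rather than something this page states.

.. code-block:: python

   import shlex

   SANDBOX_MODES = ("namespace", "chroot")


   def virtiofsd_command(socket_path, source, sandbox="namespace"):
       """Return a virtiofsd argument list for the given export."""
       if sandbox not in SANDBOX_MODES:
           raise ValueError("unknown sandbox mode: %r" % sandbox)
       return [
           "virtiofsd",
           "--socket-path=" + socket_path,
           "-o", "source=" + source,
           # Assumption: -o may be repeated, one option per occurrence.
           "-o", "sandbox=" + sandbox,
       ]


   argv = virtiofsd_command("/var/run/vm001-vhost-fs.sock",
                            "/var/lib/fs/vm001",
                            sandbox="chroot")
   print(shlex.join(argv))

Validating the sandbox mode before launching keeps a typo from reaching the
daemon's own option parsing.
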