..
   security.texi is included from qemu-doc.texi but is not used in the
   qemu.1 manpage. So we can do a straightforward conversion of the
   contents, which go into the system manual.

   Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
   Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
   Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
   Tested-by: Alex Bennée <alex.bennee@linaro.org>
   Message-id: 20200228153619.9906-17-peter.maydell@linaro.org
   Message-id: 20200226113034.6741-16-pbonzini@redhat.com

Security
========

Overview
--------

This chapter explains the security requirements that QEMU is designed to meet
and principles for securely deploying QEMU.

Security Requirements
---------------------

QEMU supports many different use cases, some of which have stricter security
requirements than others.  The community has agreed on the overall security
requirements that users may depend on.  These requirements define what is
considered supported from a security perspective.

Virtualization Use Case
'''''''''''''''''''''''

The virtualization use case covers cloud and virtual private server (VPS)
hosting, as well as traditional data center and desktop virtualization.  These
use cases rely on hardware virtualization extensions to execute guest code
safely on the physical CPU at close-to-native speed.

The following entities are untrusted, meaning that they may be buggy or
malicious:

- Guest
- User-facing interfaces (e.g. VNC, SPICE, WebSocket)
- Network protocols (e.g. NBD, live migration)
- User-supplied files (e.g. disk images, kernels, device trees)
- Passthrough devices (e.g. PCI, USB)

Bugs affecting these entities are evaluated on whether they can cause damage in
real-world use cases, and are treated as security bugs if they can.

Non-virtualization Use Case
'''''''''''''''''''''''''''

The non-virtualization use case covers emulation using the Tiny Code Generator
(TCG).  In principle the TCG and device emulation code used in conjunction with
the non-virtualization use case should meet the same security requirements as
the virtualization use case.  However, for historical reasons much of the
non-virtualization use case code was not written with these security
requirements in mind.

Bugs affecting the non-virtualization use case are not considered security
bugs at this time.  Users with non-virtualization use cases must not rely on
QEMU to provide guest isolation or any security guarantees.
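
Whether a guest falls under the virtualization use case depends on the
accelerator actually in use.  As a minimal sketch, a management tool could
inspect the reply of the QMP ``query-kvm`` command, which reports whether KVM
support is ``present`` and whether it is ``enabled`` for this VM, and refuse to
treat a TCG guest as isolated.  The function names are hypothetical, and the
sketch only knows about KVM; other hardware accelerators are not handled.

.. code-block:: python

   def uses_hardware_virtualization(query_kvm_reply):
       """Return True if a QMP ``query-kvm`` reply shows KVM in use.

       ``query_kvm_reply`` is the ``return`` value of the command, for
       example {"enabled": True, "present": True}.
       """
       present = bool(query_kvm_reply.get("present"))
       enabled = bool(query_kvm_reply.get("enabled"))
       return present and enabled


   def require_virtualization(query_kvm_reply):
       """Raise if the guest would run in the non-virtualization use case."""
       if not uses_hardware_virtualization(query_kvm_reply):
           # Under TCG, QEMU provides no guest isolation guarantees.
           raise RuntimeError("guest is not using KVM; no isolation guarantees")

A guest started without KVM (for example because ``/dev/kvm`` is missing) makes
``require_virtualization`` raise, so the tool can stop before handing that
guest untrusted workloads.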

Architecture
------------

This section describes the design principles that ensure the security
requirements are met.

Guest Isolation
'''''''''''''''

Guest isolation is the confinement of guest code to the virtual machine.  When
guest code gains control of execution on the host, this is called escaping the
virtual machine.  Isolation also includes resource limits such as throttling of
CPU, memory, disk, or network.  Guests must be unable to exceed their resource
limits.

QEMU presents an attack surface to the guest in the form of emulated devices.
The guest must not be able to gain control of QEMU.  Bugs in emulated devices
could allow malicious guests to gain code execution in QEMU.  At that point the
guest has escaped the virtual machine and is able to act in the context of the
QEMU process on the host.

Guests often interact with other guests and share resources with them.  A
malicious guest must not gain control of other guests or access their data.
Disk image files and network traffic must be protected from other guests unless
explicitly shared between them by the user.

Principle of Least Privilege
''''''''''''''''''''''''''''

The principle of least privilege states that each component only has access to
the privileges necessary for its function.  In the case of QEMU this means that
each QEMU process only has access to resources belonging to its own guest.

The QEMU process should not have access to any resources that are inaccessible
to the guest.  This way the guest does not gain anything by escaping into the
QEMU process, since it already has access to those same resources from within
the guest.

Following the principle of least privilege immediately fulfills guest isolation
requirements.  For example, guest A only has access to its own disk image file
``a.img`` and not guest B's disk image file ``b.img``.
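
One way to realize the ``a.img``/``b.img`` example is to run each guest's QEMU
process under its own UID and make each disk image accessible only to that
UID.  The sketch below audits such an assignment using plain UNIX permissions.
The one-UID-per-guest layout and the helper names are assumptions for
illustration, not something QEMU enforces itself; deployments may rely on
SELinux or AppArmor labelling instead.

.. code-block:: python

   import os
   import stat


   def image_is_private(path, owner_uid):
       """Return True if only ``owner_uid`` has access to the image at ``path``."""
       st = os.stat(path)
       if st.st_uid != owner_uid:
           return False
       # Any group or other permission bit would let a QEMU process running
       # under a different UID (that is, another guest) open this image.
       return (st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


   def audit_images(assignments):
       """Return the image paths that violate the one-guest-one-image rule.

       ``assignments`` maps a guest name to an ``(image_path, qemu_uid)`` pair.
       """
       return sorted(
           path
           for path, uid in assignments.values()
           if not image_is_private(path, uid)
       )

An empty result means every image is reachable only by the UID of the guest it
belongs to; images that the user deliberately shares between guests would need
to be excluded from the audit.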

In reality, certain resources are inaccessible to the guest but must be
available to QEMU to perform its function.  For example, host system calls are
necessary for QEMU but are not exposed to guests.  A guest that escapes into
the QEMU process can then begin invoking host system calls.

New features must be designed to follow the principle of least privilege.
Should this not be possible for technical reasons, the security risk must be
clearly documented so users are aware of the trade-off of enabling the feature.

Isolation mechanisms
''''''''''''''''''''

Several isolation mechanisms are available to realize this architecture of
guest isolation and the principle of least privilege.  With the exception of
Linux seccomp, these mechanisms are all deployed by management tools that
launch QEMU, such as libvirt.  They are also platform-specific, so they are only
described briefly for Linux here.

The fundamental isolation mechanism is that QEMU processes must run as
unprivileged users.  Sometimes it seems more convenient to launch QEMU as
root to give it access to host devices (e.g. ``/dev/net/tun``), but this poses a
huge security risk.  File descriptor passing can be used to give an otherwise
unprivileged QEMU process access to host devices without running QEMU as root.
It is also possible to launch QEMU as a non-root user and configure UNIX groups
for access to ``/dev/kvm``, ``/dev/net/tun``, and other device nodes.
Some Linux distributions already ship with UNIX groups for these devices by
default.
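
File descriptor passing relies on ``SCM_RIGHTS`` ancillary data on UNIX domain
sockets.  As a rough sketch, a privileged helper opens the device node or file
and hands the open descriptor to QEMU over the QMP socket together with a
``getfd`` command, which stores it under a name that later commands can refer
to.  The helper name and the ``fdname`` value are hypothetical; the sketch
assumes Python 3.9 or later (for ``socket.send_fds``) and an already-negotiated
QMP session, and it does not read QEMU's reply.

.. code-block:: python

   import json
   import socket


   def send_fd_to_qemu(qmp_sock, fd, fdname):
       """Pass an already-open ``fd`` to QEMU with the QMP ``getfd`` command.

       The descriptor travels as SCM_RIGHTS ancillary data attached to the
       command bytes, so the unprivileged QEMU process never needs permission
       to open the underlying device node or file itself.
       """
       command = {"execute": "getfd", "arguments": {"fdname": fdname}}
       payload = json.dumps(command).encode() + b"\n"
       socket.send_fds(qmp_sock, [payload], [fd])

The helper can close its own copy of the descriptor afterwards, because the
receiving process gets a duplicate that stays valid independently.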

- SELinux and AppArmor make it possible to confine processes beyond the
  traditional UNIX process and file permissions model.  They restrict the QEMU
  process from accessing processes and files on the host system that are not
  needed by QEMU.

- Resource limits and cgroup controllers provide throughput and utilization
  limits on key resources such as CPU time, memory, and I/O bandwidth.

- Linux namespaces can be used to make process, file system, and other system
  resources unavailable to QEMU.  A namespaced QEMU process is restricted to only
  those resources that were granted to it.

- Linux seccomp is available via the QEMU ``--sandbox`` option.  It disables
  system calls that are not needed by QEMU, thereby reducing the host kernel
  attack surface.
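
As a minimal sketch of combining two of these mechanisms in a launcher, the
code below lowers per-process resource limits with ``setrlimit`` in the child
before it executes QEMU, and appends a ``-sandbox`` argument that enables the
seccomp filter (QEMU accepts option names with one or two leading dashes).
The specific limits, helper names, and the ``qemu-system-x86_64`` binary are
assumptions for illustration; the ``-sandbox`` sub-options shown should be
checked against the installed QEMU version.  CPU, memory, and I/O throttling is
normally done with cgroup controllers, which this sketch does not configure.

.. code-block:: python

   import resource
   import subprocess

   # Illustrative per-guest limits; real values depend on the guest.
   GUEST_LIMITS = {
       resource.RLIMIT_CORE: 0,       # no core dumps that could contain guest RAM
       resource.RLIMIT_NOFILE: 1024,  # cap the number of open file descriptors
   }

   # Deny obsolete syscalls, set*uid/set*gid, fork/execve, and
   # affinity/scheduler-priority changes.
   SANDBOX_ARGS = [
       "-sandbox",
       "on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny",
   ]


   def build_qemu_argv(extra_args, binary="qemu-system-x86_64"):
       """Return a QEMU command line with the seccomp sandbox enabled."""
       return [binary, *extra_args, *SANDBOX_ARGS]


   def _apply_limits(limits):
       """Set soft and hard limits; runs in the child between fork and exec."""
       for res, value in limits.items():
           _soft, hard = resource.getrlimit(res)
           # An unprivileged process can only lower its hard limit.
           if hard != resource.RLIM_INFINITY:
               value = min(value, hard)
           resource.setrlimit(res, (value, value))


   def launch_limited(argv, limits=GUEST_LIMITS):
       """Run ``argv`` as a child process with ``limits`` applied."""
       return subprocess.run(
           argv,
           preexec_fn=lambda: _apply_limits(limits),
           capture_output=True,
           text=True,
           check=True,
       )

``preexec_fn`` keeps the sketch short, but Python's documentation warns that it
is not safe in multi-threaded programs, so a threaded launcher would need to
apply the limits some other way.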

Sensitive configurations
------------------------

Some aspects of QEMU can have security implications that users and
management applications must be aware of.

Monitor console (QMP and HMP)
'''''''''''''''''''''''''''''

The monitor console (whether used with QMP or HMP) provides an interface
to dynamically control many aspects of QEMU's runtime operation. Many of the
exposed commands instruct QEMU to access content on the host file system
and/or to spawn external processes.

For example, the ``migrate`` command can spawn arbitrary processes to tunnel
the migration data stream. The ``blockdev-add`` command instructs QEMU to open
arbitrary files, exposing their content to the guest as a virtual disk.

Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor,
or Linux namespaces, the monitor console should be considered to have privileges
equivalent to those of the user account QEMU is running under.

It is also important to consider the security of the character device backend
over which the monitor console is exposed. It needs protection against
malicious third parties that might try to make unauthorized connections or
perform man-in-the-middle attacks. Many of the character device backends do not
satisfy this requirement and so must not be used for the monitor console.
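
For a monitor on a UNIX domain socket, connecting on Linux requires write
permission on the socket file, so its ownership and mode bits are the first
line of defense against unauthorized local connections.  The hypothetical
helper below checks that a socket path is owned by the trusted management user
and is not accessible to group or others; a real deployment should also check
the directories leading to the socket.

.. code-block:: python

   import os
   import stat


   def monitor_socket_is_private(path, trusted_uid=None):
       """Return True if only ``trusted_uid`` can connect to the socket at ``path``.

       ``trusted_uid`` defaults to the calling user's UID.
       """
       if trusted_uid is None:
           trusted_uid = os.getuid()
       st = os.lstat(path)  # lstat: do not follow a planted symlink
       if not stat.S_ISSOCK(st.st_mode):
           return False
       if st.st_uid != trusted_uid:
           return False
       # Any group/other permission bit widens who can reach the monitor.
       return (st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0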

The general recommendation is that the monitor console should be exposed over
a UNIX domain socket backend to the local host only. Use of the TCP-based
character device backend is inappropriate unless configured to use both TLS
encryption and authorization control policy on client connections.
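
As an illustration, QEMU could be started with a monitor such as
``-qmp unix:/run/vm1/qmp.sock,server=on,wait=off`` (older releases spell the
flags ``server,nowait``; the path is hypothetical).  A management application
then connects to that socket, reads the greeting, and must send
``qmp_capabilities`` before any other command.  The minimal client below
sketches that handshake.  Its function names are hypothetical, it assumes
QEMU's default one-JSON-message-per-line output, it skips asynchronous events
instead of handling them, and it performs no authentication of its own, since
access to a UNIX socket is governed by file permissions.

.. code-block:: python

   import json
   import socket


   def _read_message(rfile):
       """Read one QMP message, skipping asynchronous events."""
       while True:
           line = rfile.readline()
           if not line:
               raise ConnectionError("QMP socket closed")
           message = json.loads(line)
           if "event" not in message:
               return message


   def qmp_execute(sock, rfile, command, arguments=None):
       """Send one QMP command and return its ``return`` value."""
       request = {"execute": command}
       if arguments is not None:
           request["arguments"] = arguments
       sock.sendall(json.dumps(request).encode() + b"\n")
       reply = _read_message(rfile)
       if "error" in reply:
           raise RuntimeError(reply["error"].get("desc", "QMP error"))
       return reply["return"]


   def qmp_connect(path):
       """Connect to a QMP UNIX socket and complete the capabilities handshake."""
       sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
       sock.connect(path)
       rfile = sock.makefile("rb")
       greeting = _read_message(rfile)
       if "QMP" not in greeting:
           sock.close()
           raise ConnectionError("unexpected QMP greeting")
       qmp_execute(sock, rfile, "qmp_capabilities")
       return sock, rfile

After ``qmp_connect`` returns, a call such as
``qmp_execute(sock, rfile, "query-status")`` returns the VM's run state.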

In summary, the monitor console is considered a privileged control interface to
QEMU and as such should only be made accessible to a trusted management
application or user.