 e84125761f
			
		
	
	
This new chapter in the QEMU documentation covers the security requirements
that QEMU is designed to meet and principles for securely deploying QEMU. It
is just a starting point that can be extended in the future with more
information.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-id: 20190509121820.16294-3-stefanha@redhat.com
Message-Id: <20190509121820.16294-3-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
		
			
				
	
	
		
			132 lines
		
	
	
		
			5.9 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
@node Security
@chapter Security

@section Overview

This chapter explains the security requirements that QEMU is designed to meet
and principles for securely deploying QEMU.

@section Security Requirements

QEMU supports many different use cases, some of which have stricter security
requirements than others.  The community has agreed on the overall security
requirements that users may depend on.  These requirements define what is
considered supported from a security perspective.

@subsection Virtualization Use Case

The virtualization use case covers cloud and virtual private server (VPS)
hosting, as well as traditional data center and desktop virtualization.  These
use cases rely on hardware virtualization extensions to execute guest code
safely on the physical CPU at close-to-native speed.

The following entities are untrusted, meaning that they may be buggy or
malicious:

@itemize
@item Guest
@item User-facing interfaces (e.g. VNC, SPICE, WebSocket)
@item Network protocols (e.g. NBD, live migration)
@item User-supplied files (e.g. disk images, kernels, device trees)
@item Passthrough devices (e.g. PCI, USB)
@end itemize

Bugs affecting these entities are evaluated on whether they can cause damage in
real-world use cases and treated as security bugs if this is the case.

@subsection Non-virtualization Use Case

The non-virtualization use case covers emulation using the Tiny Code Generator
(TCG).  In principle the TCG and device emulation code used in conjunction with
the non-virtualization use case should meet the same security requirements as
the virtualization use case.  However, for historical reasons much of the
non-virtualization use case code was not written with these security
requirements in mind.

Bugs affecting the non-virtualization use case are not considered security
bugs at this time.  Users with non-virtualization use cases must not rely on
QEMU to provide guest isolation or any security guarantees.

@section Architecture

This section describes the design principles that ensure the security
requirements are met.

@subsection Guest Isolation

Guest isolation is the confinement of guest code to the virtual machine.  When
guest code gains control of execution on the host this is called escaping the
virtual machine.  Isolation also includes resource limits such as throttling of
CPU, memory, disk, or network.  Guests must be unable to exceed their resource
limits.

QEMU presents an attack surface to the guest in the form of emulated devices.
The guest must not be able to gain control of QEMU.  Bugs in emulated devices
could allow malicious guests to gain code execution in QEMU.  At this point the
guest has escaped the virtual machine and is able to act in the context of the
QEMU process on the host.

Guests often interact with other guests and share resources with them.  A
malicious guest must not gain control of other guests or access their data.
Disk image files and network traffic must be protected from other guests unless
explicitly shared between them by the user.

@subsection Principle of Least Privilege

The principle of least privilege states that each component only has access to
the privileges necessary for its function.  In the case of QEMU this means that
each process only has access to resources belonging to the guest.

The QEMU process should not have access to any resources that are inaccessible
to the guest.  This way the guest does not gain anything by escaping into the
QEMU process since it already has access to those same resources from within
the guest.

Following the principle of least privilege immediately fulfills guest isolation
requirements.  For example, guest A only has access to its own disk image file
@code{a.img} and not guest B's disk image file @code{b.img}.
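
As an illustration of the @code{a.img} example, a management tool could check
that a disk image is reachable only by the host user that runs the owning
guest.  The following is a minimal sketch, assuming each guest runs under its
own host UID (a deployment choice, not something this chapter mandates).  The
names @code{is_private_to} and @code{demo} are hypothetical, and a temporary
file stands in for the image.

```python
import os
import stat
import tempfile


def is_private_to(path, uid):
    """Return True if path is owned by uid and grants nothing to group/other."""
    info = os.stat(path)
    shared_bits = stat.S_IRWXG | stat.S_IRWXO
    return info.st_uid == uid and (info.st_mode & shared_bits) == 0


def demo():
    # Stand-in for a.img; a real check would be given the guest's image path.
    with tempfile.NamedTemporaryFile(suffix=".img") as image:
        owner = os.geteuid()  # new files are owned by the effective UID
        os.chmod(image.name, 0o600)
        private = is_private_to(image.name, owner)
        os.chmod(image.name, 0o644)  # now readable by every local user
        shared = is_private_to(image.name, owner)
        return private, shared
```

This check only looks at classic UNIX ownership and mode bits.  It does not
account for ACLs or mandatory access control policies, which can further
restrict or grant access.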

In reality certain resources are inaccessible to the guest but must be
available to QEMU to perform its function.  For example, host system calls are
necessary for QEMU but are not exposed to guests.  A guest that escapes into
the QEMU process can then begin invoking host system calls.

New features must be designed to follow the principle of least privilege.
Should this not be possible for technical reasons, the security risk must be
clearly documented so users are aware of the trade-off of enabling the feature.

@subsection Isolation Mechanisms

Several isolation mechanisms are available to realize this architecture of
guest isolation and the principle of least privilege.  With the exception of
Linux seccomp, these mechanisms are all deployed by management tools that
launch QEMU, such as libvirt.  They are also platform-specific so they are only
described briefly for Linux here.

The fundamental isolation mechanism is that QEMU processes must run as
unprivileged users.  Sometimes it seems more convenient to launch QEMU as
root to give it access to host devices (e.g. @code{/dev/net/tun}) but this poses a
huge security risk.  File descriptor passing can be used to give an otherwise
unprivileged QEMU process access to host devices without running QEMU as root.
It is also possible to launch QEMU as a non-root user and configure UNIX groups
for access to @code{/dev/kvm}, @code{/dev/net/tun}, and other device nodes.
Some Linux distros already ship with UNIX groups for these devices by default.
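
File descriptor passing can be sketched with a UNIX domain socket: a helper
that is allowed to open a resource sends the open descriptor to a process that
is not.  The example below is a minimal illustration only, assuming Python 3.9
or later for @code{socket.send_fds} and @code{socket.recv_fds}.  Both ends run
in one process, a temporary file stands in for a device node, and the names
@code{send_descriptor}, @code{receive_descriptor}, and @code{demo} are
hypothetical.  How the descriptor is then handed to QEMU is outside this
sketch.

```python
import os
import socket
import tempfile


def send_descriptor(sock, fd):
    # SCM_RIGHTS needs at least one byte of ordinary payload to carry the fd.
    socket.send_fds(sock, [b"F"], [fd])


def receive_descriptor(sock):
    # Returns a new fd in this process that refers to the same open file.
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    return fds[0]


def demo():
    # Stand-in for a device node; a real helper would open the host device.
    with tempfile.NamedTemporaryFile() as resource_file:
        resource_file.write(b"host resource")
        resource_file.flush()
        helper_end, qemu_end = socket.socketpair(socket.AF_UNIX,
                                                 socket.SOCK_STREAM)
        with helper_end, qemu_end:
            opened = os.open(resource_file.name, os.O_RDONLY)
            try:
                send_descriptor(helper_end, opened)
            finally:
                os.close(opened)  # the helper's own copy is no longer needed
            received = receive_descriptor(qemu_end)
            try:
                return os.read(received, 64)
            finally:
                os.close(received)
```

The receiving side never needs permission to open the path itself.  It only
gains access to the one open file that the helper chose to share.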

@itemize
@item SELinux and AppArmor make it possible to confine processes beyond the
traditional UNIX process and file permissions model.  They restrict the QEMU
process from accessing processes and files on the host system that are not
needed by QEMU.

@item Resource limits and cgroup controllers provide throughput and utilization
limits on key resources such as CPU time, memory, and I/O bandwidth.

@item Linux namespaces can be used to make process, file system, and other system
resources unavailable to QEMU.  A namespaced QEMU process is restricted to only
those resources that were granted to it.

@item Linux seccomp is available via the QEMU @option{--sandbox} option.  It disables
system calls that are not needed by QEMU, thereby reducing the host kernel
attack surface.
@end itemize
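
Two of these mechanisms can be combined in a small launcher, sketched below
under stated assumptions.  Process resource limits (@code{setrlimit}) are used
as a simple stand-in for the cgroup controllers a management tool would
normally configure.  The binary name passed to @code{sandboxed_argv} and any
limit values are placeholders chosen by the caller.  All function names are
hypothetical.

```python
import resource
import subprocess
import sys


def sandboxed_argv(binary, extra_args):
    # "--sandbox on" turns on QEMU's seccomp filter with its default settings.
    return [binary, "--sandbox", "on"] + list(extra_args)


def lower_limit(which, wanted):
    # A process may lower its limits but cannot raise the hard limit,
    # so never ask for more than the current hard limit.
    _soft, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY and wanted > hard:
        wanted = hard
    resource.setrlimit(which, (wanted, wanted))


def make_limiter(limits):
    # limits is a list of (resource constant, value) pairs.
    def apply_limits():
        # Runs in the child between fork and exec; the parent is unaffected.
        for which, wanted in limits:
            lower_limit(which, wanted)
    return apply_limits


def launch(argv, limits):
    return subprocess.run(argv, preexec_fn=make_limiter(limits),
                          capture_output=True, text=True, check=True)
```

A caller would pass the result of @code{sandboxed_argv} to @code{launch}
together with the limits it wants, for example
@code{[(resource.RLIMIT_CORE, 0)]} to disable core dumps.  The limits apply to
the launched process only, so the launcher itself keeps its original limits.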