This document describes the current state of PersistentVolumes in Kubernetes. Familiarity with volumes is suggested.
Managing storage is a distinct problem from managing compute. The PersistentVolume subsystem provides an API for users and administrators that abstracts details of how storage is provided from how it is consumed. To do this we introduce two new API resources: PersistentVolume and PersistentVolumeClaim.
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., can be mounted once read/write or many times read-only).
While PersistentVolumeClaims allow a user to consume abstract storage resources, it is common that users need PersistentVolumes with varying properties, such as performance, for different problems. Cluster administrators need to be able to offer a variety of PersistentVolumes that differ in more ways than just size and access modes, without exposing users to the details of how those volumes are implemented. For these needs there is the StorageClass resource.
Please see the detailed walkthrough with working examples.
PVs are resources in the cluster. PVCs are requests for those resources and also act as claim checks to the resource. The interaction between PVs and PVCs follows this lifecycle:
There are two ways PVs may be provisioned: statically or dynamically.
A cluster administrator creates a number of PVs. They carry the details of the real storage which is available for use by cluster users. They exist in the Kubernetes API and are available for consumption.
When none of the static PVs the administrator created matches a user’s PersistentVolumeClaim, the cluster may try to dynamically provision a volume specially for the PVC. This provisioning is based on StorageClasses: the PVC must request a storage class and the administrator must have created and configured that class in order for dynamic provisioning to occur. Claims that request the class "" effectively disable dynamic provisioning for themselves.
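As a minimal sketch of a claim that opts out of dynamic provisioning, the manifest below sets the class to the empty string. The claim name and the 5Gi size are hypothetical values chosen only for illustration.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-only-claim      # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi             # illustrative size
  # The empty string requests "no class", so no volume is
  # dynamically provisioned for this claim.
  storageClassName: ""
```

A claim like this can only be satisfied by a pre-created PV that has no class.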
To enable dynamic storage provisioning based on storage class, the cluster administrator needs to enable the DefaultStorageClass admission controller on the API server. This can be done, for example, by ensuring that DefaultStorageClass is among the comma-delimited, ordered list of values for the --admission-control flag of the API server component. For more information on API server command line flags, please check the kube-apiserver documentation.
A user creates, or has already created in the case of dynamic provisioning, a PersistentVolumeClaim with a specific amount of storage requested and with certain access modes. A control loop in the master watches for new PVCs, finds a matching PV (if possible), and binds them together. If a PV was dynamically provisioned for a new PVC, the loop will always bind that PV to the PVC. Otherwise, the user will always get at least what they asked for, but the volume may be in excess of what was requested. Once bound, PersistentVolumeClaim binds are exclusive, regardless of how they were bound. A PVC to PV binding is a one-to-one mapping.
Claims will remain unbound indefinitely if a matching volume does not exist. Claims will be bound as matching volumes become available. For example, a cluster provisioned with many 50Gi PVs would not match a PVC requesting 100Gi. The PVC can be bound when a 100Gi PV is added to the cluster.
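To make the 50Gi versus 100Gi example concrete, here is a hedged sketch of such a claim. The claim name is hypothetical, and the cluster is assumed to contain only 50Gi PVs at the time the claim is created.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: large-claim            # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      # No 50Gi PV can satisfy this request, so the claim stays
      # unbound until a PV of at least 100Gi becomes available.
      storage: 100Gi
```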
Pods use claims as volumes. The cluster inspects the claim to find the bound volume and mounts that volume for a pod. For volumes which support multiple access modes, the user specifies which mode is desired when using their claim as a volume in a pod.
Once a user has a claim and that claim is bound, the bound PV belongs to the user for as long as they need it. Users schedule Pods and access their claimed PVs by including a persistentVolumeClaim in their Pod’s volumes block. See below for syntax details.
FEATURE STATE: Kubernetes v1.9 alpha
This feature is currently in an alpha state.
The purpose of the PVC protection is to ensure that PVCs in active use by a pod are not removed from the system as this may result in data loss.
Note: A PVC is in active use by a pod when the pod status is Pending and the pod is assigned to a node, or when the pod status is Running.
When the PVC protection alpha feature is enabled, if a user deletes a PVC in active use by a pod, the PVC is not removed immediately. PVC removal is postponed until the PVC is no longer actively used by any pods.
You can see that a PVC is protected when the PVC’s status is Terminating and the Finalizers list includes kubernetes.io/pvc-protection:
kubectl describe pvc hostpath
Name: hostpath
Namespace: default
StorageClass: example-hostpath
Status: Terminating
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-class=example-hostpath
volume.beta.kubernetes.io/storage-provisioner=example.com/hostpath
Finalizers: [kubernetes.io/pvc-protection]
...
When a user is done with their volume, they can delete the PVC objects from the API which allows reclamation of the resource. The reclaim policy for a PersistentVolume tells the cluster what to do with the volume after it has been released of its claim. Currently, volumes can either be Retained, Recycled or Deleted.
The Retain reclaim policy allows for manual reclamation of the resource. When the PersistentVolumeClaim is deleted, the PersistentVolume still exists and the volume is considered “released”. But it is not yet available for another claim because the previous claimant’s data remains on the volume. An administrator can manually reclaim the volume with the following steps.
1. Delete the PersistentVolume. The associated storage asset in external infrastructure (such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume) still exists after the PV is deleted.
2. Manually clean up the data on the associated storage asset as needed.
3. Manually delete the associated storage asset or, to reuse the same storage asset, create a new PersistentVolume with the storage asset definition.
If supported by the appropriate volume plugin, recycling performs a basic scrub (rm -rf /thevolume/*) on the volume and makes it available again for a new claim.
However, an administrator can configure a custom recycler pod template using the Kubernetes controller manager command line arguments as described here. The custom recycler pod template must contain a volumes specification, as shown in the example below:
apiVersion: v1
kind: Pod
metadata:
  name: pv-recycler
  namespace: default
spec:
  restartPolicy: Never
  volumes:
  - name: vol
    hostPath:
      path: /any/path/it/will/be/replaced
  containers:
  - name: pv-recycler
    image: "k8s.gcr.io/busybox"
    command: ["/bin/sh", "-c", "test -e /scrub && rm -rf /scrub/..?* /scrub/.[!.]* /scrub/* && test -z \"$(ls -A /scrub)\" || exit 1"]
    volumeMounts:
    - name: vol
      mountPath: /scrub
However, the particular path specified in the custom recycler pod template in the volumes part is replaced with the particular path of the volume that is being recycled.
For volume plugins that support the Delete reclaim policy, deletion removes both the PersistentVolume object from Kubernetes and the associated storage asset in the external infrastructure, such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume. Volumes that were dynamically provisioned inherit the reclaim policy of their StorageClass, which defaults to Delete. The administrator should configure the StorageClass according to users’ expectations; otherwise the PV must be edited or patched after it is created. See Change the Reclaim Policy of a PersistentVolume.
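The following is a minimal sketch of a StorageClass whose dynamically provisioned volumes would be retained rather than deleted. The class name and the choice of the GCE PD provisioner are assumptions made for illustration, and the sketch assumes a cluster version in which StorageClass accepts the reclaimPolicy field.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: retained-standard      # hypothetical name
provisioner: kubernetes.io/gce-pd
# PVs provisioned through this class inherit this policy
# instead of the default, Delete.
reclaimPolicy: Retain
```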
Kubernetes 1.8 added Alpha support for expanding persistent volumes. In v1.9, the following volume types support expanding Persistent volume claims:
An administrator can allow expanding persistent volume claims by setting the ExpandPersistentVolumes feature gate to true. The administrator should also enable the PersistentVolumeClaimResize admission plugin to perform additional validations of volumes that can be resized.
Once the PersistentVolumeClaimResize admission plugin has been turned on, resizing will only be allowed for storage classes whose allowVolumeExpansion field is set to true.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gluster-vol-default
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://192.168.10.100:8080"
  restuser: ""
  secretNamespace: ""
  secretName: ""
allowVolumeExpansion: true
Once both the feature gate and the aforementioned admission plugin are turned on, a user can request a larger volume for their PersistentVolumeClaim by editing the claim and requesting a bigger size. This in turn triggers expansion of the volume that backs the underlying PersistentVolume.
A new PersistentVolume is never created to satisfy the claim. Instead, Kubernetes attempts to resize the existing volume.
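As a hedged illustration of such an edit, the claim below shows an already-bound PVC after its storage request has been raised. The claim name, the original 8Gi request, and the new 16Gi figure are all hypothetical; the class is assumed to be one that sets allowVolumeExpansion to true, such as gluster-vol-default.

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: expandable-claim       # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gluster-vol-default
  resources:
    requests:
      # Previously 8Gi in this sketch; raising the request is
      # what triggers expansion of the backing volume.
      storage: 16Gi
```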
For expanding volumes containing a file system, file system resizing is only performed when a new Pod is started using the PersistentVolumeClaim in ReadWrite mode. In other words, if a volume being expanded is used in a pod or deployment, you will need to delete and recreate the pod for file system resizing to take place. Also, file system resizing is only supported for the following file system types:
Note: Expanding EBS volumes is a time-consuming operation. Also, there is a per-volume quota of one modification every 6 hours.
PersistentVolume types are implemented as plugins. Kubernetes currently supports the following plugins:
Raw Block Support exists for these plugins only.
Each PV contains a spec and status, which is the specification and status of the volume.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0003
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /tmp
    server: 172.17.0.2
Generally, a PV will have a specific storage capacity. This is set using the PV’s capacity attribute. See the Kubernetes Resource Model to understand the units expected by capacity.
Currently, storage size is the only resource that can be set or requested. Future attributes may include IOPS, throughput, etc.
Prior to v1.9, the default behavior for all volume plugins was to create a filesystem on the persistent volume. With v1.9, the user can specify a volumeMode, which supports raw block devices in addition to file systems. Valid values for volumeMode are “Filesystem” or “Block”. If left unspecified, volumeMode defaults to “Filesystem” internally. This is an optional API parameter.
Note: This feature is alpha in v1.9 and may change in the future.
A PersistentVolume can be mounted on a host in any way supported by the resource provider. As shown in the table below, providers will have different capabilities and each PV’s access modes are set to the specific modes supported by that particular volume. For example, NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV’s capabilities.
The access modes are:
In the CLI, the access modes are abbreviated to:
Important! A volume can only be mounted using one access mode at a time, even if it supports many. For example, a GCEPersistentDisk can be mounted as ReadWriteOnce by a single node or ReadOnlyMany by many nodes, but not at the same time.
Volume Plugin | ReadWriteOnce | ReadOnlyMany | ReadWriteMany |
---|---|---|---|
AWSElasticBlockStore | ✓ | - | - |
AzureFile | ✓ | ✓ | ✓ |
AzureDisk | ✓ | - | - |
CephFS | ✓ | ✓ | ✓ |
Cinder | ✓ | - | - |
FC | ✓ | ✓ | - |
FlexVolume | ✓ | ✓ | - |
Flocker | ✓ | - | - |
GCEPersistentDisk | ✓ | ✓ | - |
Glusterfs | ✓ | ✓ | ✓ |
HostPath | ✓ | - | - |
iSCSI | ✓ | ✓ | - |
PhotonPersistentDisk | ✓ | - | - |
Quobyte | ✓ | ✓ | ✓ |
NFS | ✓ | ✓ | ✓ |
RBD | ✓ | ✓ | - |
VsphereVolume | ✓ | - | - (works when pods are collocated) |
PortworxVolume | ✓ | - | ✓ |
ScaleIO | ✓ | ✓ | - |
StorageOS | ✓ | - | - |
A PV can have a class, which is specified by setting the storageClassName attribute to the name of a StorageClass. A PV of a particular class can only be bound to PVCs requesting that class. A PV with no storageClassName has no class and can only be bound to PVCs that request no particular class.
In the past, the annotation volume.beta.kubernetes.io/storage-class was used instead of the storageClassName attribute. This annotation still works, but it will become fully deprecated in a future Kubernetes release.
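Current reclaim policies are:
Retain: manual reclamation
Recycle: basic scrub (rm -rf /thevolume/*)
Delete: the associated storage asset, such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume, is deleted
Currently, only NFS and HostPath support recycling. AWS EBS, GCE PD, Azure Disk, and Cinder volumes support deletion.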
A Kubernetes administrator can specify additional mount options for when a Persistent Volume is mounted on a node.
Note: Not all Persistent volume types support mount options.
The following volume types support mount options:
Mount options are not validated, so mount will simply fail if one is invalid.
In the past, the annotation volume.beta.kubernetes.io/mount-options was used instead of the mountOptions attribute. This annotation still works, but it will become fully deprecated in a future Kubernetes release.
A volume will be in one of the following phases:
The CLI will show the name of the PVC bound to the PV.
Each PVC contains a spec and status, which is the specification and status of the claim.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 8Gi
  storageClassName: slow
  selector:
    matchLabels:
      release: "stable"
    matchExpressions:
      - {key: environment, operator: In, values: [dev]}
Claims use the same conventions as volumes when requesting storage with specific access modes.
Claims use the same convention as volumes to indicate the consumption of the volume as either a filesystem or block device.
Claims, like pods, can request specific quantities of a resource. In this case, the request is for storage. The same resource model applies to both volumes and claims.
Claims can specify a label selector to further filter the set of volumes. Only the volumes whose labels match the selector can be bound to the claim. The selector can consist of two fields:
matchLabels - the volume must have a label with this value
matchExpressions - a list of requirements made by specifying key, list of values, and operator that relates the key and values. Valid operators include In, NotIn, Exists, and DoesNotExist.
All of the requirements, from both matchLabels and matchExpressions, are ANDed together; they must all be satisfied in order to match.
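As a sketch of a volume that would satisfy the selector in the myclaim example, the PV below carries both labels the claim asks for. The PV name is hypothetical, and the NFS details are simply borrowed from the pv0003 example on this page.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-stable-dev          # hypothetical name
  labels:
    release: "stable"          # satisfies matchLabels
    environment: dev           # satisfies the "In [dev]" expression
spec:
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: slow
  nfs:
    path: /tmp
    server: 172.17.0.2
```

A PV that lacked either label would be filtered out, even if its size, access modes, and class matched.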
A claim can request a particular class by specifying the name of a StorageClass using the attribute storageClassName. Only PVs of the requested class, ones with the same storageClassName as the PVC, can be bound to the PVC.
PVCs don’t necessarily have to request a class. A PVC with its storageClassName set equal to "" is always interpreted to be requesting a PV with no class, so it can only be bound to PVs with no class (no annotation or one set equal to ""). A PVC with no storageClassName is not quite the same and is treated differently by the cluster depending on whether the DefaultStorageClass admission plugin is turned on.
If the admission plugin is turned on, the administrator may specify a default StorageClass. All PVCs that have no storageClassName can be bound only to PVs of that default. Specifying a default StorageClass is done by setting the annotation storageclass.kubernetes.io/is-default-class equal to “true” in a StorageClass object. If the administrator does not specify a default, the cluster responds to PVC creation as if the admission plugin were turned off. If more than one default is specified, the admission plugin forbids the creation of all PVCs.
If the admission plugin is turned off, there is no notion of a default StorageClass. All PVCs that have no storageClassName can be bound only to PVs that have no class. In this case, the PVCs that have no storageClassName are treated the same way as PVCs that have their storageClassName set to "".
Depending on installation method, a default StorageClass may be deployed to the Kubernetes cluster by the addon manager during installation.
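A minimal sketch of marking a class as the cluster default with the is-default-class annotation follows. The class name and provisioner are assumptions chosen for illustration.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard               # hypothetical name
  annotations:
    # Marks this class as the default for PVCs that
    # do not set storageClassName.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/gce-pd
```

Only one class should carry this annotation, since more than one default causes the admission plugin to reject PVC creation.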
When a PVC specifies a selector in addition to requesting a StorageClass, the requirements are ANDed together: only a PV of the requested class and with the requested labels may be bound to the PVC.
Note: Currently, a PVC with a non-empty selector can’t have a PV dynamically provisioned for it.
In the past, the annotation volume.beta.kubernetes.io/storage-class was used instead of the storageClassName attribute. This annotation still works, but it won’t be supported in a future Kubernetes release.
Pods access storage by using the claim as a volume. Claims must exist in the same namespace as the pod using the claim. The cluster finds the claim in the pod’s namespace and uses it to get the PersistentVolume backing the claim. The volume is then mounted to the host and into the pod.
kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: dockerfile/nginx
      volumeMounts:
      - mountPath: "/var/www/html"
        name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim
PersistentVolume binds are exclusive, and since PersistentVolumeClaims are namespaced objects, mounting claims with “Many” modes (ROX, RWX) is only possible within one namespace.
Static provisioning support for Raw Block Volumes is included as an alpha feature for v1.9. This change introduces new API fields that must be used to enable the functionality. Currently, Fibre Channel is the only supported plugin for this feature.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: block-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  persistentVolumeReclaimPolicy: Retain
  fc:
    targetWWNs: ["50060e801049cfd1"]
    lun: 0
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-block-volume
spec:
  containers:
    - name: fc-container
      image: fedora:26
      command: ["/bin/sh", "-c"]
      args: [ "tail -f /dev/null" ]
      volumeDevices:
        - name: data
          devicePath: /dev/xvda
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: block-pvc
Note: When adding a raw block device for a Pod, we specify the device path in the container instead of a mount path.
If a user requests a raw block volume by indicating this using the volumeMode field in the PersistentVolumeClaim spec, the binding rules differ slightly from previous releases that didn’t consider this mode as part of the spec.
The following table lists the combinations of volumeMode that the user and admin might specify when requesting a raw block device, and indicates whether the volume will be bound for each combination:
Volume binding matrix for statically provisioned volumes:
PV volumeMode | PVC volumeMode | Result |
---|---|---|
unspecified | unspecified | BIND |
unspecified | Block | NO BIND |
unspecified | Filesystem | BIND |
Block | unspecified | NO BIND |
Block | Block | BIND |
Block | Filesystem | NO BIND |
Filesystem | Filesystem | BIND |
Filesystem | Block | NO BIND |
Filesystem | unspecified | BIND |
Note: Only statically provisioned volumes are supported in the alpha release. Administrators should take care to consider these values when working with raw block devices.
If you’re writing configuration templates or examples that run on a wide range of clusters and need persistent storage, we recommend that you use the following pattern:
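If the user provides a storage class name, put that value into the persistentVolumeClaim.storageClassName field. This will cause the PVC to match the right storage class if the cluster has StorageClasses enabled by the admin.
If the user does not provide a storage class name, leave the persistentVolumeClaim.storageClassName field as nil.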
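A hedged sketch of what a portable claim might look like when no storage class name is supplied: the storageClassName field is omitted entirely rather than set to an empty string. The claim name and size are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: portable-claim         # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi             # illustrative size
  # storageClassName is intentionally left out (nil), which is
  # different from setting it to "". A template would add the
  # field only when the user supplies a class name.
```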