As currently defined, initial_bytes is monotonically decreasing and precedes dirty_bytes when reading from the saving file descriptor. The transition from initial_bytes to dirty_bytes is unidirectional and irreversible. The initial_bytes are considered as critical data that is highly recommended to be transferred to the target as part of PRE_COPY, without this data, the PRE_COPY phase would be ineffective. We come to solve the case when a new chunk of critical data is introduced during the PRE_COPY phase and the driver would like to report an entirely new value for the initial_bytes. For that, we extend the VFIO_MIG_GET_PRECOPY_INFO ioctl with an output flag named VFIO_PRECOPY_INFO_REINIT to allow drivers reporting a new initial_bytes value during the PRE_COPY phase. Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't assign info.flags before copy_to_user(), this effectively echoes userspace-provided flags back as output, preventing the field from being used to report new reliable data from the drivers. Reliable use of the new VFIO_PRECOPY_INFO_REINIT flag requires userspace to explicitly opt in by enabling the VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 device feature. When the caller opts in, the driver may report an entirely new value for initial_bytes. It may be larger, it may be smaller, it may include the previous unread initial_bytes, it may discard the previous unread initial_bytes, up to the driver logic and state. The presence of the VFIO_PRECOPY_INFO_REINIT output flag set by the driver indicates that new initial data is present on the stream. Once the caller sees this flag, the initial_bytes value should be re-evaluated relative to the readiness state for transition to STOP_COPY. Signed-off-by: Yishai Hadas --- include/uapi/linux/vfio.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index bb7b89330d35..b6efda07000f 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -1266,6 +1266,17 @@ enum vfio_device_mig_state { * The initial_bytes field indicates the amount of initial precopy * data available from the device. This field should have a non-zero initial * value and decrease as migration data is read from the device. + * The presence of the VFIO_PRECOPY_INFO_REINIT output flag indicates + * that new initial data is present on the stream. + * In that case initial_bytes may report a non-zero value irrespective of + * any previously reported values, which progresses towards zero as precopy + * data is read from the data stream. dirty_bytes is also reset + * to zero and represents the state change of the device relative to the new + * initial_bytes. + * VFIO_PRECOPY_INFO_REINIT can be reported only after userspace opts in to + * VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2. Without this opt-in, the flags field + * of struct vfio_precopy_info is reserved for bug-compatibility reasons. + * * It is recommended to leave PRE_COPY for STOP_COPY only after this field * reaches zero. Leaving PRE_COPY earlier might make things slower. * @@ -1301,6 +1312,7 @@ enum vfio_device_mig_state { struct vfio_precopy_info { __u32 argsz; __u32 flags; +#define VFIO_PRECOPY_INFO_REINIT (1 << 0) /* output - new initial data is present */ __aligned_u64 initial_bytes; __aligned_u64 dirty_bytes; }; @@ -1510,6 +1522,16 @@ struct vfio_device_feature_dma_buf { struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges); }; +/* + * Enables the migration prepcopy_info_v2 behaviour. + * + * VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2. + * + * On SET, enables the v2 pre_copy_info behaviour, where the + * vfio_precopy_info.flags is a valid output field. + */ +#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12 + /* -------- API for Type1 VFIO IOMMU -------- */ /** -- 2.18.1