Add KAPI-annotated kerneldoc for the sys_open system call in fs/open.c.
The specification documents parameter constraints (pathname, flags
bitmask, permission mode), 25 error conditions, locking requirements,
side effects, required capabilities, and usage examples.

Signed-off-by: Sasha Levin
---
 fs/open.c | 327 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 327 insertions(+)

diff --git a/fs/open.c b/fs/open.c
index 91f1139591abe..8e805233a277b 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1373,6 +1373,328 @@ int do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
 }
+/**
+ * sys_open - Open or create a file
+ * @filename: Pathname of the file to open or create
+ * @flags: File access mode and behavior flags (O_RDONLY, O_WRONLY, O_RDWR, etc.)
+ * @mode: File permission bits for newly created files (only with O_CREAT/O_TMPFILE)
+ *
+ * long-desc: Opens the file specified by pathname. If O_CREAT or O_TMPFILE is
+ * specified in flags, the file is created if it does not exist; its mode is
+ * set according to the mode parameter modified by the process's umask.
+ *
+ * The flags argument must include one of the following access modes: O_RDONLY
+ * (read-only), O_WRONLY (write-only), or O_RDWR (read/write). These are the
+ * low-order two bits of flags. In addition, zero or more file creation and
+ * file status flags can be bitwise-ORed in flags.
+ *
+ * File creation flags: O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC, O_DIRECTORY,
+ * O_NOFOLLOW, O_CLOEXEC, O_TMPFILE. These flags affect open behavior.
+ *
+ * File status flags: O_APPEND, FASYNC, O_DIRECT, O_DSYNC, O_LARGEFILE,
+ * O_NOATIME, O_NONBLOCK (O_NDELAY), O_PATH, O_SYNC. These become part of the
+ * file's open file description and can be retrieved/modified with fcntl().
+ *
+ * The return value is a file descriptor, a small nonnegative integer used in
+ * subsequent system calls (read, write, lseek, fcntl, etc.) to refer to the
+ * open file.
+ * The file descriptor returned by a successful open is the lowest-numbered
+ * file descriptor not currently open for the process.
+ *
+ * On 64-bit systems, O_LARGEFILE is automatically added to the flags. On 32-bit
+ * systems, files larger than 2GB require O_LARGEFILE to be explicitly set.
+ *
+ * This syscall is a legacy interface. Modern code should prefer openat() for
+ * relative path operations and openat2() for additional control via resolve
+ * flags. The open() call is equivalent to openat(AT_FDCWD, pathname, flags).
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: filename
+ * type: KAPI_TYPE_PATH
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_USER_PATH
+ * cdesc: Must be a valid null-terminated path string in user memory.
+ * Maximum path length is PATH_MAX (4096 bytes) including null terminator.
+ * For relative paths, resolution starts from current working directory.
+ * The path is followed (symlinks resolved) unless O_NOFOLLOW is specified.
+ *
+ * param: flags
+ * type: KAPI_TYPE_INT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_MASK
+ * valid-mask: O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY |
+ * O_TRUNC | O_APPEND | O_NONBLOCK | O_DSYNC | O_SYNC | FASYNC |
+ * O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | O_NOATIME |
+ * O_CLOEXEC | O_PATH | O_TMPFILE
+ * cdesc: Must include exactly one of O_RDONLY (0), O_WRONLY (1), or
+ * O_RDWR (2) as the access mode. Additional flags may be ORed. Invalid flag
+ * combinations (e.g., O_PATH with incompatible flags, O_TMPFILE without
+ * O_DIRECTORY, O_TMPFILE with read-only mode) return EINVAL. Since Linux
+ * 6.4, O_CREAT combined with O_DIRECTORY also returns EINVAL. Unknown
+ * flags are silently ignored for backward compatibility (unlike openat2,
+ * which rejects them).
+ *
+ * param: mode
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_MASK
+ * valid-mask: S_ISUID | S_ISGID | S_ISVTX | S_IRWXU | S_IRWXG | S_IRWXO
+ * cdesc: Only meaningful when O_CREAT or O_TMPFILE is specified in
+ * flags. Specifies the file mode bits (permissions and setuid/setgid/sticky
+ * bits) for a newly created file. The effective mode is (mode & ~umask).
+ * When O_CREAT/O_TMPFILE is not set, mode is ignored. Mode bits outside
+ * S_IALLUGO (07777) are masked off.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_FD
+ * success: >= 0
+ * desc: On success, returns a new file descriptor (non-negative integer).
+ * The returned file descriptor is the lowest-numbered descriptor not
+ * currently open for the process. On error, returns a negative error code.
+ *
+ * error: EACCES, Permission denied
+ * desc: The requested access to the file is not allowed, or search permission
+ * is denied for one of the directories in the path prefix of pathname, or
+ * the file did not exist yet and write access to the parent directory is
+ * not allowed, or O_TRUNC is specified but write permission is denied, or
+ * the file is on a filesystem mounted with noexec and MAY_EXEC was implied.
+ *
+ * error: EAGAIN, Resource temporarily unavailable
+ * desc: The file is a FIFO or regular file, O_NONBLOCK is specified, and the
+ * operation would block. Also returned when RESOLVE_CACHED is used with
+ * openat2() and the lookup cannot be satisfied from the dentry cache.
+ *
+ * error: EBUSY, Device or resource busy
+ * desc: O_EXCL was specified in flags and pathname refers to a block device
+ * that is in use by the system (e.g., it is mounted).
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: O_CREAT is specified and the file does not exist, and the user's quota
+ * of disk blocks or inodes on the filesystem has been exhausted.
+ *
+ * error: EEXIST, File exists
+ * desc: O_CREAT and O_EXCL were specified in flags, but pathname already exists.
+ * This error is atomic with respect to file creation - it prevents race
+ * conditions (TOCTOU) when creating files.
+ *
+ * error: EFAULT, Bad address
+ * desc: pathname points outside the process's accessible address space.
+ *
+ * error: EINTR, Interrupted system call
+ * desc: The call was interrupted by a signal handler before completing file
+ * open. This can occur during lock acquisition or when breaking leases.
+ *
+ * error: EINVAL, Invalid argument
+ * desc: Returned for several conditions: (1) Invalid O_* flag combinations
+ * (O_TMPFILE without O_DIRECTORY, O_TMPFILE with read-only access, O_PATH
+ * with flags other than O_DIRECTORY|O_NOFOLLOW|O_CLOEXEC).
+ * (2) mode contains bits outside S_IALLUGO when O_CREAT/O_TMPFILE
+ * is set (openat2 only). (3) O_DIRECT requested but filesystem doesn't
+ * support it. (4) The filesystem does not support O_SYNC or O_DSYNC.
+ *
+ * error: EISDIR, Is a directory
+ * desc: pathname refers to a directory and the access requested involved
+ * writing (O_WRONLY, O_RDWR, or O_TRUNC). Also returned when O_TMPFILE is
+ * used on a directory that doesn't support tmpfile operations.
+ *
+ * error: ELOOP, Too many symbolic links
+ * desc: Too many symbolic links were encountered in resolving pathname, or
+ * O_NOFOLLOW was specified but pathname refers to a symbolic link.
+ *
+ * error: EMFILE, Too many open files
+ * desc: The per-process limit on the number of open file descriptors has been
+ * reached. This limit is RLIMIT_NOFILE (default typically 1024, max set by
+ * /proc/sys/fs/nr_open).
+ *
+ * error: ENAMETOOLONG, File name too long
+ * desc: pathname was too long, exceeding PATH_MAX (4096) bytes, or a single
+ * path component exceeded NAME_MAX (usually 255) bytes.
+ *
+ * error: ENFILE, Too many open files in system
+ * desc: The system-wide limit on the total number of open files has been
+ * reached (/proc/sys/fs/file-max). Processes with CAP_SYS_ADMIN can exceed
+ * this limit.
+ *
+ * error: ENODEV, No such device
+ * desc: pathname refers to a special file that has no corresponding device, or
+ * the file's inode has no file operations assigned.
+ *
+ * error: ENOENT, No such file or directory
+ * desc: A directory component in pathname does not exist or is a dangling
+ * symbolic link, or O_CREAT is not set and the named file does not exist,
+ * or pathname is an empty string (unless AT_EMPTY_PATH is used with openat2).
+ *
+ * error: ENOMEM, Out of memory
+ * desc: The kernel could not allocate sufficient memory for the file structure,
+ * path lookup structures, or the filename buffer.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: O_CREAT was specified and the file does not exist, and the directory
+ * or filesystem containing the file has no room for a new file entry.
+ *
+ * error: ENOTDIR, Not a directory
+ * desc: A component used as a directory in pathname is not actually a directory,
+ * or O_DIRECTORY was specified and pathname was not a directory.
+ *
+ * error: ENXIO, No such device or address
+ * desc: O_NONBLOCK | O_WRONLY is set and the named file is a FIFO and no
+ * process has the FIFO open for reading. Also returned when opening a device
+ * special file that does not exist.
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ * desc: The filesystem containing pathname does not support O_TMPFILE.
+ *
+ * error: EOVERFLOW, Value too large for defined data type
+ * desc: pathname refers to a regular file that is too large to be opened.
+ * This occurs on 32-bit systems without O_LARGEFILE when the file size
+ * exceeds 2GB (2^31 - 1 bytes).
+ *
+ * error: EPERM, Operation not permitted
+ * desc: O_NOATIME flag was specified but the effective UID of the caller did
+ * not match the owner of the file and the caller is not privileged, or the
+ * file is append-only and was opened with O_TRUNC or for writing without
+ * O_APPEND, or the file is immutable, or a seal prevents the operation.
+ *
+ * error: EROFS, Read-only file system
+ * desc: pathname refers to a file on a read-only filesystem and write access
+ * was requested.
+ *
+ * error: ETXTBSY, Text file busy
+ * desc: pathname refers to an executable image which is currently being
+ * executed, or to a swap file, and write access or truncation was requested.
+ *
+ * error: EWOULDBLOCK, Resource temporarily unavailable
+ * desc: O_NONBLOCK was specified and an incompatible lease is held on the file.
+ *
+ * lock: files->file_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * acquired: true
+ * released: true
+ * desc: Acquired when allocating a file descriptor slot. Held briefly during
+ * fd allocation via alloc_fd() and released before the syscall returns.
+ *
+ * lock: inode->i_rwsem (parent directory)
+ * type: KAPI_LOCK_RWLOCK
+ * acquired: conditional
+ * released: true
+ * desc: Write lock acquired on parent directory inode when creating a new file
+ * (O_CREAT). Acquired via inode_lock_nested() in lookup path. May use
+ * killable variant which can return EINTR on fatal signal.
+ *
+ * lock: RCU read-side
+ * type: KAPI_LOCK_RCU
+ * acquired: true
+ * released: true
+ * desc: Path lookup uses RCU mode initially for performance. If RCU lookup
+ * fails (returns -ECHILD), falls back to reference-based lookup.
+ *
+ * signal: Any signal
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RETURN
+ * condition: When blocked on interruptible or killable operations
+ * desc: The syscall may be interrupted during path lookup, lock acquisition,
+ * or lease breaking. Fatal signals (SIGKILL, etc.) will interrupt killable
+ * operations.
+ * Non-fatal signals may interrupt interruptible operations.
+ * error: -EINTR
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_ALLOC_MEMORY
+ * target: file descriptor, file structure, dentry cache
+ * desc: Allocates a new file descriptor in the process's fd table. Allocates
+ * a struct file from the filp slab cache. May allocate dentries and inodes
+ * during path lookup. System-wide file count (nr_files) is incremented.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: filesystem, inode
+ * condition: When O_CREAT is specified and file doesn't exist
+ * desc: Creates a new file on the filesystem. Creates new inode, allocates
+ * data blocks as needed, and creates directory entry. Updates parent
+ * directory mtime and ctime.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: file content
+ * condition: When O_TRUNC is specified for existing file
+ * desc: Truncates the file to zero length, releasing data blocks. Updates
+ * file mtime and ctime. May trigger notifications to lease holders.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: inode timestamps
+ * condition: Unless O_NOATIME is specified
+ * desc: Opens for reading may update inode access time (atime) unless mounted
+ * with noatime/relatime or O_NOATIME is specified. Opens for writing that
+ * truncate or create update mtime and ctime.
+ *
+ * capability: CAP_DAC_OVERRIDE
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypass file read, write, and execute permission checks
+ * without: Standard DAC (discretionary access control) checks are applied
+ * condition: Checked when file permission would otherwise deny access
+ *
+ * capability: CAP_DAC_READ_SEARCH
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypass read permission on files and search permission on directories
+ * without: Must have read permission on file or search permission on directory
+ * condition: Checked during path traversal and file open
+ *
+ * capability: CAP_FOWNER
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Use O_NOATIME on files not owned by caller
+ * without: O_NOATIME returns EPERM if caller is not file owner
+ * condition: Checked when O_NOATIME is specified and caller is not owner
+ *
+ * capability: CAP_SYS_ADMIN
+ * type: KAPI_CAP_INCREASE_LIMIT
+ * allows: Exceed the system-wide file limit (file-max)
+ * without: Returns ENFILE when system limit is reached
+ * condition: Checked in alloc_empty_file() when nr_files >= max_files
+ *
+ * constraint: RLIMIT_NOFILE (per-process fd limit)
+ * desc: The returned file descriptor must be less than the process's
+ * RLIMIT_NOFILE limit. Default is typically 1024, maximum is controlled
+ * by /proc/sys/fs/nr_open (default 1048576). Exceeding returns EMFILE.
+ * expr: fd < rlimit(RLIMIT_NOFILE)
+ *
+ * constraint: file-max (system-wide limit)
+ * desc: System-wide limit on open files in /proc/sys/fs/file-max. Processes
+ * without CAP_SYS_ADMIN receive ENFILE when this limit is reached. The
+ * limit is computed based on system memory at boot time.
+ * expr: nr_files < files_stat.max_files || capable(CAP_SYS_ADMIN)
+ *
+ * constraint: PATH_MAX
+ * desc: Maximum length of pathname including null terminator is PATH_MAX
+ * (4096 bytes). Individual path components must not exceed NAME_MAX (255).
+ *
+ * examples: fd = open("/etc/passwd", O_RDONLY); // Read existing file
+ * fd = open("/tmp/newfile", O_WRONLY | O_CREAT | O_TRUNC, 0644); // Create/truncate
+ * fd = open("/tmp/lockfile", O_WRONLY | O_CREAT | O_EXCL, 0600); // Exclusive create
+ * fd = open("/dev/null", O_RDWR); // Open device
+ * fd = open("/tmp", O_RDONLY | O_DIRECTORY); // Open directory
+ * fd = open("/tmp", O_TMPFILE | O_RDWR, 0600); // Anonymous temp file
+ *
+ * notes: The distinction between O_RDONLY, O_WRONLY, and O_RDWR is critical.
+ * O_RDONLY is defined as 0, so (flags & O_RDONLY) is always 0 for any flags.
+ * Test access mode using (flags & O_ACCMODE) == O_RDONLY.
+ *
+ * When O_CREAT is specified without O_EXCL, there is a race condition between
+ * testing for file existence and creating it. Use O_CREAT | O_EXCL for atomic
+ * exclusive file creation.
+ *
+ * O_CLOEXEC should be used in multithreaded programs to prevent file descriptor
+ * leaks to child processes between fork() and execve().
+ *
+ * O_DIRECT has alignment requirements that vary by filesystem. Use statx()
+ * with STATX_DIOALIGN (Linux 6.1+) to query requirements. Unaligned I/O may
+ * fail with EINVAL or fall back to buffered I/O.
+ *
+ * O_PATH opens a file descriptor that can be used only for certain operations
+ * (fstat, dup, fcntl, close, fchdir on directories, as dirfd for *at() calls).
+ * I/O operations will fail with EBADF.
+ */
 SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
 {
 	if (force_o_largefile())
@@ -1581,3 +1903,8 @@ int stream_open(struct inode *inode, struct file *filp)
 }
 EXPORT_SYMBOL(stream_open);
+
+/* Include auto-generated API specifications from kerneldoc annotations */
+#if IS_ENABLED(CONFIG_KAPI_SPEC)
+#include "open.apispec.h"
+#endif
--
2.51.0