Add doxygen comment blocks for remaining helpers (btf/iter etc.) in tools/lib/bpf/bpf.h. These doc comments are for: -libbpf_set_memlock_rlim() -bpf_btf_load() -bpf_iter_create() -bpf_btf_get_next_id() -bpf_btf_get_fd_by_id() -bpf_btf_get_fd_by_id_opts() -bpf_raw_tracepoint_open_opts() -bpf_raw_tracepoint_open() -bpf_task_fd_query() Signed-off-by: Jianyun Gao --- v1->v2: - Fixed compilation error caused by embedded literal "/*" inside a comment (rephrased/escaped). - Fixed the non-ASCII characters in this patch. The v1 is here: https://lore.kernel.org/lkml/20251031032627.1414462-6-jianyungao89@gmail.com/ tools/lib/bpf/bpf.h | 745 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 740 insertions(+), 5 deletions(-) diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index a0cebda09e16..6ef1ea7921c4 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -34,7 +34,61 @@ #ifdef __cplusplus extern "C" { #endif - +/** + * @brief Adjust process RLIMIT_MEMLOCK to facilitate loading BPF objects. + * + * libbpf_set_memlock_rlim() raises (or lowers) the calling process's + * RLIMIT_MEMLOCK soft and hard limits to at least the number of bytes + * specified by memlock_bytes. BPF map and program creation can require + * locking kernel/user pages; if RLIMIT_MEMLOCK is too low the kernel + * will fail operations with EPERM/ENOMEM. This helper provides a + * convenient way to pre-allocate sufficient memlock quota. + * + * Semantics: + * - If current (soft or hard) RLIMIT_MEMLOCK is already >= memlock_bytes, + * the limit is left unchanged and the function succeeds. + * - Otherwise, the function attempts to set both soft and hard limits + * to memlock_bytes using setrlimit(RLIMIT_MEMLOCK, ...). + * - On systems enforcing privilege constraints, increasing the hard + * limit may require CAP_SYS_RESOURCE; lack of privilege yields failure. 
+ * + * Typical usage (before loading large maps/programs): + * size_t needed = 128ul * 1024 * 1024; // 128 MB + * if (libbpf_set_memlock_rlim(needed) < 0) { + * // handle error (e.g., fall back to smaller maps or abort) + * } + * + * Choosing a value: + * - Sum anticipated sizes of maps (key_size + value_size) * max_entries + * plus overhead. Add headroom for verifier, BTF, and future growth. + * - Large per-CPU maps multiply value storage by number of CPUs. + * - Overestimating is usually harmless (within administrative policy). + * + * Concurrency & scope: + * - Affects only the calling process's RLIMIT_MEMLOCK. + * - Child processes inherit the adjusted limits after fork/exec. + * + * Security / privileges: + * - Increasing the hard limit above the current maximum may require + * CAP_SYS_RESOURCE or appropriate PAM/ulimit configuration. + * - Without sufficient privilege, the call fails with -errno (often -EPERM). + * + * @param memlock_bytes Desired minimum RLIMIT_MEMLOCK (in bytes). If zero, + * the function is a no-op (always succeeds). + * + * @return 0 on success; + * < 0 negative error code (libbpf style == -errno) on failure: + * - -EINVAL: Invalid argument (e.g., internal conversion issues). + * - -EPERM / -EACCES: Insufficient privilege to raise hard limit. + * - -ENOMEM: Rare failure allocating internal structures. + * - Other -errno codes propagated from setrlimit(). + * + * Failure handling: + * - A failure means RLIMIT_MEMLOCK is unchanged; subsequent BPF map/program + * loads may still succeed if existing limit is adequate. + * - Check current limits manually (getrlimit) if precise sizing is critical. + * + */ LIBBPF_API int libbpf_set_memlock_rlim(size_t memlock_bytes); struct bpf_map_create_opts { @@ -295,7 +349,104 @@ struct bpf_btf_load_opts { size_t :0; }; #define bpf_btf_load_opts__last_field token_fd - +/** + * @brief Load a BTF (BPF Type Format) blob into the kernel and obtain a BTF object FD. 
+ * + * bpf_btf_load() wraps the BPF_BTF_LOAD command of the bpf(2) syscall. It validates + * and registers the BTF metadata described by @p btf_data so that subsequently loaded + * BPF programs and maps can reference rich type information (for CO-RE relocations, + * pretty printing, introspection, etc.). + * + * Typical usage: + * // Prepare optional verifier/logging buffer (only if you want kernel diagnostics) + * char log_buf[1 << 20] = {}; + * struct bpf_btf_load_opts opts = { + * .sz = sizeof(opts), + * .log_buf = log_buf, + * .log_size = sizeof(log_buf), + * .log_level = 1, // >0 to request kernel parsing/validation log + * }; + * int btf_fd = bpf_btf_load(btf_blob_ptr, btf_blob_size, &opts); + * if (btf_fd < 0) { + * // Inspect errno; if opts.log_buf was provided, it may contain details. + * } else { + * // Use btf_fd (e.g. pass to bpf_prog_load() via prog_btf_fd, or query info). + * } + * + * Input expectations: + * - @p btf_data must point to a complete, well-formed BTF buffer starting with + * struct btf_header followed by the type section and string section. + * - @p btf_size is the total size in bytes of that buffer. + * - Endianness must match the running kernel; cross-endian BTF is rejected. + * - Types must obey kernel constraints (e.g., no unsupported kinds, valid string + * offsets, canonical integer encodings, no dangling references). + * + * Logging (opts->log_*): + * - If @p opts is non-NULL and opts->log_level > 0, the kernel may emit a textual + * parse/validation log into opts->log_buf (up to opts->log_size - 1 bytes, with + * trailing '\0'). + * - On supported kernels, opts->log_true_size is updated to reflect the full (untruncated) + * length of the internal log; if larger than log_size, the log was truncated. + * - If the kernel does not support returning true size, log_true_size remains equal + * to the original log_size value or zero. 
+ * + * Privileges & security: + * - CAP_BPF and/or CAP_SYS_ADMIN may be required depending on kernel configuration, + * LSM policy, and lockdown mode. Lack of privilege yields -EPERM / -EACCES. + * - In delegated environments, opts->token_fd (if available and supported) can grant + * scoped permission to load BTF without full global capabilities. + * + * Memory and lifetime: + * - On success a file descriptor (>= 0) referencing the in-kernel BTF object is returned. + * Close it with close() when no longer needed. + * - The kernel makes its own copy of the supplied BTF blob; the caller can free or reuse + * @p btf_data immediately after the call returns. + * - BTF objects can be queried via bpf_btf_get_info_by_fd() and referenced by programs + * (prog_btf_fd) or maps for type information. + * + * Concurrency & races: + * - Loading is independent; multiple BTF objects may coexist. + * - There is no automatic deduplication across separate loads (except any internal + * kernel optimizations); user space manages uniqueness/pinning if desired. + * + * Validation tips: + * - Use bpftool btf dump to sanity-check a blob before loading. + * - Keep string table minimal; excessive strings inflate memory and may hit limits. + * - Ensure all referenced type IDs exist and form a closed, acyclic graph (except + * for permitted self-references in struct/union definitions). + * + * After loading: + * - Pass the returned FD as prog_btf_fd when loading programs that rely on CO-RE + * relocations or need BTF type validation. + * - Optionally pin the BTF object with bpf_obj_pin() for persistence across process + * lifetimes. + * - Query metadata (e.g., number of types, string section size) with bpf_btf_get_info_by_fd(). + * + * @param btf_data Pointer to the raw in-memory BTF blob. + * @param btf_size Size (in bytes) of the BTF blob pointed to by @p btf_data. + * @param opts Optional pointer to a bpf_btf_load_opts struct. May be NULL. 
+ * Must set opts->sz = sizeof(*opts) when non-NULL. Fields: + * - log_buf / log_size / log_level: Request and store kernel + * validation log (see Logging). + * - log_true_size: Updated by kernel on success (if supported). + * - btf_flags: Reserved for future extensions (must be 0 unless documented). + * - token_fd: Delegated permission token (0 or -1 if unused). + * + * @return + * >= 0 : File descriptor referencing the loaded BTF object. + * < 0 : Negative error code (see Error handling). + * + * Error handling (negative return codes == -errno style): + * - -EINVAL: Malformed BTF (bad header, section sizes, invalid type graph, bad string + * offsets, unsupported features), opts->sz mismatch, bad flags. + * - -EFAULT: @p btf_data points to unreadable memory or opts->log_buf to unwritable memory. + * - -ENOMEM: Kernel failed to allocate memory for internal BTF representation. + * - -EPERM / -EACCES: Insufficient privileges or blocked by security policy. + * - -E2BIG: Exceeds kernel size/complexity limits (e.g., too many types or strings). + * - -ENOTSUP / -EOPNOTSUPP: Kernel lacks support for a feature used in the blob (rare). + * - Other negative codes may be propagated from the underlying syscall. + * + */ LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts *opts); @@ -1840,7 +1991,84 @@ struct bpf_link_update_opts { */ LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd, const struct bpf_link_update_opts *opts); - +/** + * @brief Create a user space iterator stream FD from an existing BPF iterator link. + * + * bpf_iter_create() wraps the kernel's BPF_ITER_CREATE command. Given a BPF + * link FD (@p link_fd) that represents an attached BPF iterator program + * (i.e., a program of type BPF_PROG_TYPE_TRACING with an iterator attach + * type such as BPF_TRACE_ITER), this function returns a new file descriptor + * from which user space can sequentially read the iterator's textual or + * binary output.
+ * + * Reading the returned FD: + * - Use read(), pread(), or a buffered I/O layer to consume iterator data. + * - read() returns zero (EOF) once the iterator has finished producing + * all elements; close the FD afterward. + * - Short reads are normal; loop until EOF or error. + * + * Lifetime & ownership: + * - Success returns a new FD; caller owns it and must close() when finished. + * - Closing the iterator FD does NOT destroy the underlying link or program. + * - You can create multiple iterator FDs from the same link concurrently; + * each is an independent traversal. + * + * Typical usage: + * int link_fd = bpf_link_create(prog_fd, -1, BPF_TRACE_ITER, &opts); + * if (link_fd < 0) return link_fd; // handle error + * int iter_fd = bpf_iter_create(link_fd); + * if (iter_fd < 0) return iter_fd; // handle error + * char buf[4096]; + * for (;;) { + * ssize_t n = read(iter_fd, buf, sizeof(buf)); + * if (n < 0) { + * if (errno == EINTR) continue; + * perror("read iter"); + * break; + * } + * if (n == 0) // end of iteration + * break; + * fwrite(buf, 1, n, stdout); + * } + * close(iter_fd); + * + * Concurrency & races: + * - Safe to call concurrently from multiple threads; each iterator FD + * represents its own walk. + * - Underlying kernel objects (maps, tasks, etc.) may change while iterating; + * output is a best-effort snapshot, not a stable, atomic view. + * + * Performance considerations: + * - Large buffers (e.g., 16-64 KiB) reduce syscall overhead for high-volume + * iterators. + * - For blocking behavior, select()/poll()/epoll() can be used; EOF is + * indicated by read() returning 0. + * + * Security & privileges: + * - May require CAP_BPF and/or CAP_SYS_ADMIN depending on kernel configuration, + * lockdown mode, and LSM policy governing the iterator target. + * + * @param link_fd File descriptor of a BPF link representing an attached iterator program. + * + * @return >= 0: Iterator stream file descriptor to read from.
+ * < 0 : Negative error code (libbpf style, == -errno) on failure. + * + * + * Error handling (negative libbpf-style return value == -errno): + * - -EBADF: @p link_fd is not a valid open FD. + * - -EINVAL: @p link_fd does not refer to an iterator-capable BPF link, or + * unsupported combination for the running kernel. + * - -EPERM / -EACCES: Insufficient privileges / blocked by security policy. + * - -EOPNOTSUPP / -ENOTSUP: Kernel lacks iterator creation support for this link. + * - -ENOMEM: Kernel could not allocate internal data structures. + * - Other -errno codes may be propagated from the underlying bpf() syscall. + * + * Robustness tips: + * - Verify the program was attached with the correct iterator attach type. + * - Treat a 0-length read as normal completion, not an error. + * - Always handle transient read() failures (EINTR, EAGAIN if non-blocking). + * + */ LIBBPF_API int bpf_iter_create(int link_fd); struct bpf_prog_test_run_attr { @@ -1953,6 +2181,68 @@ LIBBPF_API int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id); */ LIBBPF_API int bpf_map_get_next_id(__u32 start_id, __u32 *next_id); +/** + * @brief Retrieve the next existing BTF object ID after a given starting ID. + * + * This helper wraps the kernel's BPF_BTF_GET_NEXT_ID command and enumerates + * in-kernel BTF (BPF Type Format) objects in strictly ascending order of + * their kernel-assigned IDs. It is typically used to iterate all currently + * loaded BTF objects (e.g., vmlinux BTF, module BTFs, user-loaded BTF blobs). + * + * Enumeration pattern: + * 1. Initialize start_id to 0 to obtain the first (lowest) existing BTF ID. + * 2. On success, *next_id is set to the first BTF ID strictly greater than start_id. + * 3. Use the returned *next_id as the new start_id in a subsequent call. + * 4. Repeat until the function returns -ENOENT, which signals there is no + * BTF object with ID greater than start_id (end of iteration). 
+ * + * Concurrency & races: + * - BTF objects can be loaded or unloaded concurrently with enumeration. + * An ID retrieved in one call may become invalid (object unloaded) before + * you convert it to a file descriptor with bpf_btf_get_fd_by_id(). + * - Enumeration does not provide a stable snapshot. Newly loaded BTFs may + * appear after you've passed their predecessor ID. + * + * Lifetime & validity: + * - IDs are monotonically increasing and effectively never wrap in normal + * operation. + * - Successfully retrieving an ID does NOT pin the corresponding BTF object. + * Obtain a file descriptor immediately if you need to interact with it. + * + * Typical usage: + * __u32 id = 0, next; + * while (bpf_btf_get_next_id(id, &next) == 0) { + * int btf_fd = bpf_btf_get_fd_by_id(next); + * if (btf_fd >= 0) { + * // Inspect/query BTF (e.g. bpf_btf_get_info_by_fd()). + * close(btf_fd); + * } + * id = next; + * } + * // Loop ends when bpf_btf_get_next_id() returns -ENOENT. + * + * @param start_id + * Starting point for the search. The helper finds the first BTF ID + * strictly greater than start_id. Use 0 to begin enumeration. + * @param next_id + * Pointer to a __u32 that receives the next BTF ID on success. + * Must not be NULL. + * + * @return + * 0 on success (next_id populated); + * -ENOENT if there is no BTF ID greater than start_id (normal end of iteration); + * -EINVAL if next_id is NULL or arguments are otherwise invalid; + * -EPERM / -EACCES if denied by security policy or lacking required privileges; + * Other negative libbpf-style codes (-errno) on transient or system failures. + * + * Error handling notes: + * - Treat -ENOENT as normal termination, not an exceptional error. + * - For other failures, errno is set to the underlying cause. + * + * Follow-up: + * - Convert retrieved IDs to FDs with bpf_btf_get_fd_by_id() to inspect + * metadata or pin the BTF object. 
+ */ LIBBPF_API int bpf_btf_get_next_id(__u32 start_id, __u32 *next_id); /** * @brief Retrieve the next existing BPF link ID after a given starting ID. @@ -2227,9 +2517,171 @@ LIBBPF_API int bpf_map_get_fd_by_id(__u32 id); */ LIBBPF_API int bpf_map_get_fd_by_id_opts(__u32 id, const struct bpf_get_fd_by_id_opts *opts); - +/** + * @brief Obtain a file descriptor for an existing in-kernel BTF (BPF Type Format) + * object given its kernel-assigned ID. + * + * bpf_btf_get_fd_by_id() wraps the BPF_BTF_GET_FD_BY_ID command of the bpf(2) + * syscall. Each loaded BTF object (vmlinux BTF, kernel module BTF, or user-supplied + * BTF blob loaded via BPF_BTF_LOAD) has a monotonically increasing, unique ID. + * This helper converts that stable ID into a process-local file descriptor + * suitable for introspection (e.g., via bpf_btf_get_info_by_fd()), pinning + * (bpf_obj_pin()), or reuse when loading BPF programs/maps that reference types + * from this BTF. + * + * Typical enumeration + open pattern: + * __u32 id = 0, next; + * while (bpf_btf_get_next_id(id, &next) == 0) { + * int btf_fd = bpf_btf_get_fd_by_id(next); + * if (btf_fd >= 0) { + * // inspect with bpf_btf_get_info_by_fd(btf_fd, ...) + * close(btf_fd); + * } + * id = next; + * } + * // Loop ends when bpf_btf_get_next_id() returns -ENOENT. + * + * Concurrency & races: + * - A BTF object may be unloaded (e.g., module removal) between discovering + * its ID and calling this function; in that case the call fails with -ENOENT. + * - Successfully obtaining a file descriptor does not prevent later unloading + * by other processes; subsequent operations on the FD can still fail. + * + * Lifetime & ownership: + * - On success the caller owns the returned descriptor and must close() it + * when no longer needed. + * - Closing the FD does not destroy the underlying BTF object if other + * references (FDs or pinned bpffs paths) remain. 
+ * + * Privileges / security: + * - May require CAP_BPF and/or CAP_SYS_ADMIN depending on kernel configuration, + * LSM policies, or lockdown mode. Lack of privilege yields -EPERM / -EACCES. + * - Access can also be restricted by namespace or cgroup-based security policies. + * + * Use cases: + * - Retrieve BTF metadata (type counts, string section size, specific type + * definitions) via bpf_btf_get_info_by_fd(). + * - Pass the FD as prog_btf_fd when loading eBPF programs needing CO-RE or + * type validation. + * - Pin the BTF object for persistence across process lifetimes. + * + * @param id + * Kernel-assigned unique (non-zero) BTF object ID. Typically obtained via + * bpf_btf_get_next_id() or from a prior info query. Must be > 0. + * + * @return + * >= 0 : File descriptor referencing the BTF object (caller must close()). + * < 0 : Negative libbpf-style error code (== -errno): + * - -ENOENT : No BTF object with this ID (unloaded or never existed). + * - -EPERM / -EACCES : Insufficient privileges / blocked by policy. + * - -EINVAL : Invalid ID (e.g., 0) or kernel rejected the request. + * - -ENOMEM : Kernel memory/resource exhaustion. + * - Other negative values: Propagated syscall failures. + * + * Error handling notes: + * - Treat -ENOENT as a normal race outcome if objects can disappear. + * - Always close the returned FD to avoid resource leaks. + * + * Thread safety: + * - Safe to call concurrently; each successful invocation yields an independent FD. + * + * Forward compatibility: + * - ID space is monotonic; practical wraparound is not expected. + * - Future kernels may add additional validation or permission gating; handle + * new -errno codes conservatively. + */ LIBBPF_API int bpf_btf_get_fd_by_id(__u32 id); +/** + * @brief Obtain a file descriptor for an existing in-kernel BTF (BPF Type Format) + * object by its kernel-assigned ID, with extended open options. + * + * bpf_btf_get_fd_by_id_opts() is an extended variant of bpf_btf_get_fd_by_id(). 
+ * It wraps the BPF_BTF_GET_FD_BY_ID command of the bpf(2) syscall and converts + * a stable, monotonically increasing BTF object ID (@p id) into a process-local + * file descriptor, honoring optional attributes supplied via @p opts. + * + * A BTF object represents a loaded collection of type metadata (vmlinux BTF, + * kernel module BTF, or user-supplied BTF blob). Programs and maps can refer + * to these types for CO-RE relocations, verification, and introspection. + * + * Typical enumeration + open pattern: + * __u32 cur = 0, next; + * while (bpf_btf_get_next_id(cur, &next) == 0) { + * struct bpf_get_fd_by_id_opts o = { + * .sz = sizeof(o), + * .open_flags = 0, + * .token_fd = -1, + * }; + * int btf_fd = bpf_btf_get_fd_by_id_opts(next, &o); + * if (btf_fd >= 0) { + * // use btf_fd (e.g. bpf_btf_get_info_by_fd()) + * close(btf_fd); + * } + * cur = next; + * } + * // Loop ends when bpf_btf_get_next_id() returns -ENOENT. + * + * Initialization & @p opts usage: + * - @p opts may be NULL for default behavior (equivalent to zeroed fields). + * - If @p opts is non-NULL, opts->sz MUST be set to sizeof(*opts); mismatch + * yields -EINVAL. + * - opts->open_flags: + * Reserved for future kernel extensions; pass 0 unless a documented flag + * is supported. Unsupported bits => -EINVAL. + * - opts->token_fd: + * Optional BPF token FD enabling delegated (restricted) permissions. Set + * to -1 or 0 if unused. Provides a way to open BTF objects without full + * CAP_BPF/CAP_SYS_ADMIN in controlled environments. + * + * Concurrency & races: + * - A BTF object can be unloaded (e.g., module removal) after ID discovery + * but before this call; expect -ENOENT in such races. + * - Successfully obtaining a file descriptor does not guarantee the object + * will remain available for its entire lifetime (it could still be removed + * depending on kernel policies), so subsequent operations may fail. 
+ * + * Lifetime & ownership: + * - On success you own the returned FD and must close() it when done. + * - Closing the FD does not destroy the BTF object if other references (FDs, + * pinned bpffs entries) remain. + * - You may pin the BTF object via bpf_obj_pin() for persistence. + * + * Security / privileges: + * - May require CAP_BPF and/or CAP_SYS_ADMIN depending on kernel configuration, + * LSM policy, and lockdown mode. + * - Access via a token_fd is subject to token scope; insufficient rights yield + * -EPERM / -EACCES. + * + * Use cases: + * - Retrieve type information with bpf_btf_get_info_by_fd(). + * - Supply prog_btf_fd when loading eBPF programs needing CO-RE relocations. + * - Enumerate and manage user-loaded or kernel-provided BTF datasets. + * + * Robustness tips: + * - Treat -ENOENT as a normal race when enumerating dynamic BTF objects. + * - Always zero-initialize opts before setting recognized fields: + * struct bpf_get_fd_by_id_opts o = {}; + * o.sz = sizeof(o); + * - Avoid non-zero open_flags until documented; future kernels may add semantic + * modifiers (e.g., restricted viewing modes). + * + * @param id Kernel-assigned unique BTF object ID (> 0). + * @param opts Optional pointer to struct bpf_get_fd_by_id_opts controlling open + * behavior; may be NULL for defaults. + * + * @return >= 0: File descriptor referencing the BTF object (caller must close()). + * < 0 : Negative error code (libbpf style == -errno) on failure. + * + * Error handling (negative return values are libbpf-style == -errno): + * - -ENOENT: No BTF object with @p id (unloaded or never existed). + * - -EINVAL: Invalid @p id (e.g., 0), malformed @p opts (bad sz), or unsupported + * open_flags bits. + * - -EPERM / -EACCES: Insufficient privileges or blocked by security policy. + * - -ENOMEM: Kernel resource allocation failure. + * - Other -errno codes may be propagated from underlying syscall failures. 
+ * + */ LIBBPF_API int bpf_btf_get_fd_by_id_opts(__u32 id, const struct bpf_get_fd_by_id_opts *opts); /** @@ -2650,11 +3102,294 @@ struct bpf_raw_tp_opts { size_t :0; }; #define bpf_raw_tp_opts__last_field cookie - +/** + * @brief Attach a loaded BPF program to a raw tracepoint using extended options. + * + * bpf_raw_tracepoint_open_opts() wraps the BPF_RAW_TRACEPOINT_OPEN command and + * creates a persistent attachment of @p prog_fd to the raw tracepoint named in + * @p opts->tp_name. On success it returns a file descriptor representing the + * attachment. Closing that FD detaches the program from the tracepoint. + * + * Compared to bpf_raw_tracepoint_open(), this variant allows passing a user + * cookie (opts->cookie) and provides forward/backward compatibility via the + * @p opts->sz field. + * + * Typical usage: + * struct bpf_raw_tp_opts ropts = { + * .sz = sizeof(ropts), + * .tp_name = "sched_switch", // raw tracepoint name (no "tracepoint/" prefix) + * .cookie = 0xdeadbeef, // optional user cookie (visible to program) + * }; + * int tp_fd = bpf_raw_tracepoint_open_opts(prog_fd, &ropts); + * if (tp_fd < 0) { + * // handle error (inspect errno or negative return value) + * } + * // ... use attachment; close(tp_fd) to detach when done. + * + * Tracepoint name: + * - Use the raw tracepoint identifier as exposed under + * /sys/kernel/debug/tracing/events/ without category prefixes. For raw + * tracepoints this is typically the internal kernel name (e.g., "sched_switch"). + * - Passing NULL or an empty string fails with -EINVAL. + * + * Cookie: + * - opts->cookie (if non-zero) becomes available to the attached program via + * bpf_get_attach_cookie() helper (where supported). + * - Set to 0 if you don't need a cookie; kernel treats it as absent. + * + * Structure initialization: + * - opts MUST NOT be NULL. 
+ * - Zero-initialize the struct, then set: + * opts->sz = sizeof(struct bpf_raw_tp_opts); + * opts->tp_name = "sched_switch"; // raw tracepoint name + * opts->cookie = 0; // or a caller-chosen value + * - Unrecognized future fields must remain zero for compatibility. + * + * Lifetime & detachment: + * - The returned FD solely controls the attachment lifetime. Closing it + * detaches the program. + * - The program FD @p prog_fd may be closed independently after successful + * attachment; the link remains active until the tracepoint FD is closed. + * + * Concurrency: + * - Multiple programs can attach to the same raw tracepoint (each gets its + * own FD). + * - Attaching/detaching is atomic from the program's perspective; events + * arriving after success will invoke the program. + * + * Privileges: + * - Typically requires CAP_BPF and/or CAP_SYS_ADMIN depending on kernel + * configuration, LSM policy, and lockdown mode. + * + * Performance considerations: + * - Raw tracepoints invoke programs on every event occurrence; ensure program + * logic is efficient to avoid noticeable system overhead. + * + * @param prog_fd + * File descriptor of a previously loaded BPF program (bpf_prog_load()) that + * is compatible with raw tracepoint attachment (e.g., program type + * BPF_PROG_TYPE_RAW_TRACEPOINT or suitable tracing type). + * + * @param opts + * Pointer to an initialized bpf_raw_tp_opts structure describing the target + * tracepoint and optional cookie. Must not be NULL. opts->sz must equal + * sizeof(struct bpf_raw_tp_opts). + * + * @return + * >= 0 : File descriptor representing the attachment (close to detach). + * < 0 : Negative libbpf-style error code (== -errno) on failure: + * - -EINVAL : prog_fd is not a BPF program FD, malformed opts (sz mismatch, + * NULL tp_name), unsupported program type, or kernel lacks raw TP support. + * - -EPERM/-EACCES : Insufficient privileges or blocked by security policy. + * - -ENOENT : Tracepoint name not found / not supported by current kernel. + * - -EBADF : prog_fd is not an open file descriptor.
+ * - -ENOMEM : Kernel memory/resource exhaustion. + * - -EOPNOTSUPP/-ENOTSUP : Raw tracepoint attachment not supported. + * - Other -errno codes may be propagated from the underlying syscall. + * + * Error handling: + * - Inspect the negative return value or errno for diagnostics. + * - Treat -ENOENT as "tracepoint unavailable" (kernel config or version gap). + * + * After attachment: + * - Optionally pin the FD (bpf_obj_pin()) if you need persistence. + * - Use bpf_obj_get_info_by_fd() to query attachment metadata if supported. + */ LIBBPF_API int bpf_raw_tracepoint_open_opts(int prog_fd, struct bpf_raw_tp_opts *opts); +/** + * @brief Attach a loaded BPF program to a raw tracepoint (legacy/simple API). + * + * bpf_raw_tracepoint_open() is a convenience wrapper that issues the + * BPF_RAW_TRACEPOINT_OPEN command to attach the BPF program referenced + * by @p prog_fd to the raw tracepoint named @p name. On success it returns + * a file descriptor representing the attachment; closing that FD detaches + * the program from the tracepoint. + * + * Compared to bpf_raw_tracepoint_open_opts(), this legacy interface + * provides no ability to specify an attach cookie or future extension + * fields. For new code prefer bpf_raw_tracepoint_open_opts() to enable + * forward/backward compatible option passing. + * + * Tracepoint name: + * - @p name must be a non-NULL, null-terminated string identifying a + * raw tracepoint (e.g. "sched_switch"). + * - Pass the raw kernel tracepoint identifier without any category + * prefix (do not include "tracepoint/" or directory components). + * - If the tracepoint is not available (kernel config/version) the + * call fails with -ENOENT. + * + * Program requirements: + * - @p prog_fd must refer to a loaded BPF program of a type compatible + * with raw tracepoint attachment (e.g., BPF_PROG_TYPE_RAW_TRACEPOINT + * or an allowed tracing program type accepted by the kernel). 
+ * - The program may be safely closed after a successful attachment; + * the returned FD controls the lifetime of the link. + * + * Lifetime & detachment: + * - Each successful call creates a distinct attachment with its own FD. + * - Closing the returned FD immediately detaches the program from the + * tracepoint. + * - The returned FD can be pinned (bpf_obj_pin()) for persistence. + * + * Concurrency: + * - Multiple programs can be attached to the same raw tracepoint. + * - Attach/detach operations are atomic; events after success invoke + * the program until its FD is closed. + * + * Privileges & security: + * - Typically requires CAP_BPF and/or CAP_SYS_ADMIN depending on + * kernel configuration, LSM, and lockdown mode. + * - Insufficient privilege yields -EPERM / -EACCES. + * + * Performance considerations: + * - Raw tracepoints can be very frequent; ensure attached program + * logic is efficient to avoid noticeable overhead. + * + * @param name Null-terminated raw tracepoint name (e.g. "sched_switch"). + * @param prog_fd File descriptor of a loaded, compatible BPF program. + * + * @return >= 0 : Attachment file descriptor (close to detach). + * < 0 : Negative error code (libbpf style == -errno) on failure. + * + * Error handling (negative libbpf-style return value == -errno): + * - -EINVAL : Invalid @p prog_fd, NULL/empty @p name, incompatible program type. + * - -ENOENT : Tracepoint not found / unsupported by current kernel. + * - -EPERM/-EACCES : Insufficient privileges or blocked by security policy. + * - -EBADF : @p prog_fd is not a valid file descriptor. + * - -ENOMEM : Kernel memory/resource exhaustion. + * - -EOPNOTSUPP/-ENOTSUP : Raw tracepoints unsupported by the kernel. + * - Other negative codes may be propagated from the underlying syscall. + * + * Best practices: + * - Prefer bpf_raw_tracepoint_open_opts() for new development to + * gain cookie support and extensibility. + * - Immediately check the return value; do not rely solely on errno. 
+ * - Pin the attachment if you need persistence across process lifetimes. + * + */ LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd); +/** + * @brief Query metadata about a file descriptor in another task (process) that + * is associated with a BPF tracing/perf event and (optionally) an + * attached BPF program. + * + * This helper wraps the kernel's BPF_TASK_FD_QUERY command. It inspects the + * file descriptor number @p fd that belongs to the task identified by @p pid + * and, if that FD represents a perf event or similar tracing attachment, it + * returns descriptive information about: + * - The attached BPF program (its kernel program ID). + * - The nature/type of the FD (tracepoint, raw_tracepoint, kprobe, uprobe, etc.). + * - Target symbol/address/offset data for kprobe/uprobes. + * - A human-readable identifier (tracepoint name, kprobe function name, + * uprobe file path), copied into @p buf when provided. + * + * Typical use cases: + * - Introspecting perf event FDs opened by another process to discover + * which BPF program is attached. + * - Enumerating and characterizing dynamically created kprobes or uprobes + * (e.g., by observability agents). + * - Building higher-level tooling that correlates program IDs with their + * originating probe specifications. 
+ *
+ * Usage pattern:
+ *     char info[256];
+ *     __u32 info_len = sizeof(info);
+ *     __u32 prog_id = 0, fd_type = 0;
+ *     __u64 probe_off = 0, probe_addr = 0;
+ *     int err = bpf_task_fd_query(target_pid, target_fd, 0,
+ *                                 info, &info_len,
+ *                                 &prog_id, &fd_type,
+ *                                 &probe_off, &probe_addr);
+ *     if (err == 0) {
+ *         // info[] now holds a NUL-terminated identifier (if available)
+ *         // info_len == identifier length (excluding the terminating '\0')
+ *         // fd_type enumerates one of BPF_FD_TYPE_* values
+ *         // prog_id is the kernel-assigned BPF program ID (0 if none)
+ *         // probe_off / probe_addr describe offsets/addresses for kprobe/uprobe
+ *     } else if (err == -ENOSPC) {
+ *         // info_len holds the full identifier length; retry with a buffer
+ *         // of at least info_len + 1 bytes
+ *     }
+ *
+ * Buffer semantics (@p buf / @p buf_len):
+ * - On input @p *buf_len must hold the capacity (in bytes) of @p buf.
+ * - If @p buf is large enough, the kernel copies a NUL-terminated string
+ *   (tracepoint name, kprobe symbol, uprobe path, etc.) and updates
+ *   @p *buf_len with the string length (excluding the NUL).
+ * - If @p buf is too small, the call fails with -ENOSPC, leaves a truncated,
+ *   NUL-terminated copy in @p buf, and sets @p *buf_len to the full string
+ *   length; retry with a buffer of at least @p *buf_len + 1 bytes.
+ * - If a textual identifier is not applicable (or unavailable), the kernel
+ *   sets @p *buf_len to 0 (@p buf then holds at most an empty string).
+ * - @p buf_len must always be a valid pointer, because the wrapper reads and
+ *   writes it unconditionally. With @p *buf_len == 0 nothing is copied and
+ *   only the string length is reported.
+ *
+ * Output parameters:
+ * - @p prog_id: Set to the kernel BPF program ID attached to the perf event
+ *   FD (0 if no BPF program is attached).
+ * - @p fd_type: Set to one of the BPF_FD_TYPE_* enum values describing the
+ *   FD (e.g., BPF_FD_TYPE_TRACEPOINT, BPF_FD_TYPE_KPROBE, BPF_FD_TYPE_UPROBE,
+ *   BPF_FD_TYPE_RAW_TRACEPOINT). Use this to disambiguate interpretation of
+ *   other outputs.
+ * - @p probe_offset: For kprobe/uprobes, the offset within the symbol or
+ *   mapped file that was requested when the probe was created.
+ * - @p probe_addr: For kprobes, the resolved kernel address of the probed
+ *   symbol/instruction; for uprobes may be 0 or implementation-dependent.
+ * - All output pointers (and @p buf_len) must be valid. The libbpf wrapper
+ *   stores to each of them unconditionally after the syscall, so passing
+ *   NULL for a field you do not need results in a NULL dereference.
+ *
+ * Privileges & access control:
+ * - Querying another task's file descriptor typically requires sufficient
+ *   permissions (ptrace-like restrictions, CAP_BPF / CAP_SYS_ADMIN, and/or
+ *   LSM allowances). Lack of privilege yields -EPERM / -EACCES.
+ * - The target task must exist and the FD must be valid at query time.
+ *
+ * Concurrency / races:
+ * - The target process may close or replace its FD concurrently; the query
+ *   can fail with -EBADF or -ENOENT in such races.
+ * - Retrieved metadata is a point-in-time snapshot and can become stale
+ *   immediately after return.
+ *
+ * @param pid PID of the target task whose file descriptor table should be queried.
+ *            Use the numeric PID (thread group leader or specific thread PID);
+ *            passing 0 is typically invalid (the task lookup fails).
+ * @param fd File descriptor number as seen from inside the task identified by @p pid.
+ * @param flags Query modifier flags. Must be 0 on current kernels; non-zero
+ *              (unsupported) bits return -EINVAL.
+ * @param buf Optional user buffer to receive a NUL-terminated identifier string
+ *            (tracepoint name, kprobe symbol, uprobe path). Can be NULL if
+ *            @p buf_len points to 0.
+ * @param buf_len In/out pointer to buffer length (must not be NULL). On input:
+ *                capacity of @p buf. On success: string length (excluding the
+ *                terminating NUL). On -ENOSPC: full string length (caller should
+ *                retry with at least that many bytes plus one).
+ * @param prog_id Output pointer (must not be NULL) receiving the attached BPF program ID (0 if none).
+ * @param fd_type Output pointer (must not be NULL) receiving one of BPF_FD_TYPE_* constants identifying FD type.
+ * @param probe_offset Output pointer (must not be NULL) receiving the probe offset (for kprobe/uprobe types).
+ * @param probe_addr Output pointer (must not be NULL) receiving resolved kernel address (kprobe) or relevant mapping address.
+ *
+ * @return 0 on success;
+ *         Negative libbpf-style error code (< 0) on failure:
+ * - -EINVAL : Invalid arguments (bad pid/fd, unsupported flags, inconsistent buf/buf_len).
+ * - -ENOENT : Task, file descriptor, or associated probe/program not found.
+ * - -EBADF : Bad file descriptor in target task at time of query.
+ * - -ENOSPC : @p buf too small; @p *buf_len updated with the full string length.
+ * - -EPERM / -EACCES : Insufficient privileges or access denied by security policy.
+ * - -EFAULT : User memory (buf or buf_len or an output pointer) not accessible.
+ * - -ENOMEM : Temporary kernel memory/resource exhaustion.
+ * - Other -errno codes may be propagated from the underlying syscall.
+ *
+ * Best practices:
+ * - Initialize *buf_len with the size of your buffer; handle -ENOSPC by allocating
+ *   a buffer of at least the returned *buf_len + 1 bytes and retrying.
+ * - Check @p fd_type first to interpret @p probe_offset / @p probe_addr meaningfully.
+ * - Treat -ENOENT and -EBADF as normal race outcomes in dynamic environments.
+ * - Avoid querying extremely frequently in production paths; this is introspective
+ *   debug/management tooling, not a fast data path primitive.
+ *
+ * Thread safety:
+ * - This helper is thread-safe; multiple threads can query different (or the same)
+ *   tasks concurrently. Returned data structures are per-call (no shared state).
+ */
 LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
 				 __u32 *buf_len, __u32 *prog_id, __u32 *fd_type,
 				 __u64 *probe_offset, __u64 *probe_addr);
-- 
2.34.1
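
The -ENOSPC handling described for bpf_task_fd_query() can be sketched as a small retry loop. This is a minimal sketch and not part of the patch: query_fd_with_retry(), struct fd_query_result, task_fd_query_fn and fake_query() are hypothetical names introduced here for illustration. The query function is passed as a pointer so the sketch builds without libbpf, and it assumes __u32 and __u64 are unsigned int and unsigned long long, as on common architectures. The retry adds one byte to the reported length so that it works whether or not that length counts the terminating NUL.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Same parameter list as bpf_task_fd_query(); __u32/__u64 are spelled as
 * unsigned int / unsigned long long (an assumption, see lead-in). */
typedef int (*task_fd_query_fn)(int pid, int fd, unsigned int flags,
				char *buf, unsigned int *buf_len,
				unsigned int *prog_id, unsigned int *fd_type,
				unsigned long long *probe_offset,
				unsigned long long *probe_addr);

/* Hypothetical container for the query results. */
struct fd_query_result {
	char *name;			/* malloc()ed identifier; caller frees */
	unsigned int prog_id;
	unsigned int fd_type;
	unsigned long long probe_offset;
	unsigned long long probe_addr;
};

/* Query with a growing buffer. Returns 0 or a negative errno value. */
static int query_fd_with_retry(task_fd_query_fn query, int pid, int fd,
			       unsigned int cap, struct fd_query_result *res)
{
	int tries;

	assert(query && res);
	memset(res, 0, sizeof(*res));
	if (cap == 0)
		cap = 1;
	for (tries = 0; tries < 4; tries++) {
		unsigned int len = cap;
		char *buf = calloc(1, cap);
		int err;

		if (!buf)
			return -ENOMEM;
		/* Every output pointer is non-NULL on purpose. */
		err = query(pid, fd, 0, buf, &len, &res->prog_id,
			    &res->fd_type, &res->probe_offset,
			    &res->probe_addr);
		if (err == 0) {
			res->name = buf;
			return 0;
		}
		free(buf);
		if (err != -ENOSPC)
			return err;
		/* Grow: reported length plus one byte for the NUL, or
		 * double the capacity if the report did not help. */
		cap = len >= cap ? len + 1 : cap * 2;
	}
	return -ENOSPC;
}

/* Stand-in for bpf_task_fd_query(), for illustration only: reports the
 * identifier "sched_switch" and fails with -ENOSPC if buf is too small. */
static int fake_query(int pid, int fd, unsigned int flags, char *buf,
		      unsigned int *buf_len, unsigned int *prog_id,
		      unsigned int *fd_type, unsigned long long *probe_offset,
		      unsigned long long *probe_addr)
{
	static const char name[] = "sched_switch";
	unsigned int cap = *buf_len;

	(void)pid; (void)fd; (void)flags;
	*buf_len = sizeof(name) - 1;	/* string length without the NUL */
	*prog_id = 42;
	*fd_type = 0;
	*probe_offset = 0;
	*probe_addr = 0;
	if (cap < sizeof(name))
		return -ENOSPC;
	memcpy(buf, name, sizeof(name));
	return 0;
}
```

With libbpf available, the real helper would be passed in place of the stand-in, e.g. query_fd_with_retry(bpf_task_fd_query, pid, fd, 64, &res), followed by free(res.name) once the identifier is no longer needed.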