Over the years there have been a number of issues with the eBPF
verifier/JIT/codegen (incl. both code bugs and Spectre-related issues).
It's an amazing but very complex piece of logic, and I don't think it's
realistic to expect it to ever be (or become) 100% secure. For example,
we are currently seeing KASAN report buffer length violations on 6.18
(which may or may not be due to the eBPF subsystem, but are worrying
nonetheless).

Blocking bpf(BPF_PROG_LOAD, ...) is the only surefire way to guarantee
that the eBPF subsystem cannot be exploited. In comparison, other eBPF
operations are pretty benign. Even map creation is usually at most a
memory DoS; furthermore, it remains useful (even with program loading
disabled) due to inner maps.

This new sysctl is designed primarily for verified boot systems, where
BPF_PROG_LOAD can be left enabled while the system is booting from
trusted/signed media, and then disabled outright before untrusted user
media is mounted or networking is enabled. This provides a very simple
way to limit eBPF programs to only those signed programs that are part
of the verified boot chain, which has always been a requirement of eBPF
use in Android.

I can think of two other ways to accomplish this:
  (a) via sepolicy with booleans, but this ends up being pretty complex
      (especially wrt. verifying the correctness of the resulting
      policies), or
  (b) via the BPF_LSM bpf_prog_load hook, which requires enabling
      additional kernel options that aren't necessarily worth the
      bother, and requires dynamically patching the kernel (frowned
      upon by security folks).

This approach simply appears to be the most straightforward.

I've chosen to return EUNATCH ("Protocol driver not attached") to
distinguish this case from EPERM and to make it clear that the eBPF
program loading subsystem has been disabled outright (detached). There
are no permissions you could gain that would make things work again
(short of a reboot/kexec).
The new sysctl is intentionally kernel-global and does not affect cBPF,
which has various runtime use cases (incl. tcpdump-style dynamic socket
filters and seccomp sandboxing) and thus cannot be disabled, but which
(as experience shows) is also much less dangerous, mainly because it is
much simpler.

Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Cc: John Fastabend
Signed-off-by: Maciej Żenczykowski
---
 Documentation/admin-guide/sysctl/kernel.rst |  9 +++++++++
 kernel/bpf/syscall.c                        | 14 ++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index f3ee807b5d8b..4906ef08c741 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1655,6 +1655,15 @@ entry will default to 2 instead of 0.
 = =============================================================
 
 
+disable_bpf_prog_load
+=====================
+
+Writing 1 to this entry will cause all future invocations of
+``bpf(BPF_PROG_LOAD, ...)`` to fail with -EUNATCH, thus effectively
+permanently disabling the instantiation of new eBPF programs.
+Once set to 1, this cannot be reset back to 0.
+
+
 warn_limit
 ==========
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6589acc89ef8..ef655ff501e7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -67,6 +67,8 @@ static DEFINE_SPINLOCK(link_idr_lock);
 int sysctl_unprivileged_bpf_disabled __read_mostly =
 	IS_BUILTIN(CONFIG_BPF_UNPRIV_DEFAULT_OFF) ? 2 : 0;
 
+static int sysctl_disable_bpf_prog_load __read_mostly;
+
 static const struct bpf_map_ops * const bpf_map_types[] = {
 #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type)
 #define BPF_MAP_TYPE(_id, _ops) \
@@ -2891,6 +2893,9 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 				 BPF_F_TOKEN_FD))
 		return -EINVAL;
 
+	if (sysctl_disable_bpf_prog_load)
+		return -EUNATCH;
+
 	bpf_prog_load_fixup_attach_type(attr);
 
 	if (attr->prog_flags & BPF_F_TOKEN_FD) {
@@ -6511,6 +6516,15 @@ static const struct ctl_table bpf_syscall_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_TWO,
 	},
+	{
+		.procname	= "disable_bpf_prog_load",
+		.data		= &sysctl_disable_bpf_prog_load,
+		.maxlen		= sizeof(sysctl_disable_bpf_prog_load),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ONE,
+		.extra2		= SYSCTL_ONE,
+	},
 	{
 		.procname	= "bpf_stats_enabled",
 		.data		= &bpf_stats_enabled_key.key,
-- 
2.52.0.394.g0814c687bb-goog
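The one-way behaviour comes entirely from the ctl_table entry: proc_dointvec_minmax() rejects any written value outside [*extra1, *extra2] with -EINVAL and leaves the stored value untouched, so with both bounds set to SYSCTL_ONE the only value that can ever be stored is 1. The following is a small userspace model of that interaction, not kernel code. The struct model_sysctl type and the model_sysctl_write() and model_prog_load() functions are hypothetical stand-ins that mimic the range check and the early return added to bpf_prog_load().

```c
#include <errno.h>

/* Hypothetical userspace model of the sysctl entry added by this patch. */
struct model_sysctl {
	int value;	/* models sysctl_disable_bpf_prog_load, starts at 0 */
	int min;	/* models *extra1 (SYSCTL_ONE) */
	int max;	/* models *extra2 (SYSCTL_ONE) */
};

/* Mimics the range check proc_dointvec_minmax() applies to a written value. */
static int model_sysctl_write(struct model_sysctl *s, int val)
{
	if (val < s->min || val > s->max)
		return -EINVAL;	/* out of range: stored value is left unchanged */
	s->value = val;
	return 0;
}

/* Mimics the early return in bpf_prog_load(); 0 means "carry on loading". */
static int model_prog_load(const struct model_sysctl *s)
{
	if (s->value)
		return -EUNATCH;
	return 0;
}
```

Starting from { .value = 0, .min = 1, .max = 1 }, writing 1 flips model_prog_load() to -EUNATCH, and every later attempt to write 0 (or 2) returns -EINVAL. Writing 0 is also rejected before the latch is set, which is harmless because 0 is already the default.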