Over the years there have been a number of issues with the eBPF
verifier/JIT/codegen (incl. both code bugs & Spectre-related stuff).

It's an amazing but very complex piece of logic, and I don't think
it's realistic to expect it to ever be (or become) 100% secure.

For example, we currently have KASAN reporting buffer length violation
issues on 6.18 (which may or may not be due to the eBPF subsystem, but
are worrying nonetheless).

Blocking bpf(BPF_PROG_LOAD, ...) is the only surefire way to guarantee
the inability to exploit the eBPF subsystem.
In comparison, other eBPF operations are pretty benign.
Even map creation is usually at most a memory DoS; furthermore, it
remains useful (even with prog load disabled) due to inner maps.

This new sysctl is designed primarily for verified boot systems,
where (while the system is booting from trusted/signed media)
BPF_PROG_LOAD can be enabled, but before untrusted user
media is mounted or networking is enabled, BPF_PROG_LOAD
can be outright disabled.
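
As a rough illustration of the boot flow described above, an init
process could flip the switch just before mounting untrusted media.
This is only a sketch: the helper name write_one() is hypothetical, and
the path assumes the entry appears under /proc/sys/kernel/, as the
kernel.rst documentation placement in this patch suggests.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Assumed path: this patch documents the entry in sysctl/kernel.rst. */
#define DISABLE_BPF_PROG_LOAD_PATH "/proc/sys/kernel/disable_bpf_prog_load"

/*
 * Hypothetical helper: write "1\n" to the given sysctl file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_one(const char *path)
{
	static const char val[] = "1\n";
	const size_t len = strlen(val);
	ssize_t n;
	int fd, ret = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, val, len);
	if (n < 0)
		ret = -errno;		/* write failed outright */
	else if ((size_t)n != len)
		ret = -EIO;		/* short write, treat as I/O error */

	if (close(fd) < 0 && !ret)
		ret = -errno;
	return ret;
}
```

An init process would call write_one(DISABLE_BPF_PROG_LOAD_PATH) once
all trusted programs are loaded, and would presumably treat any nonzero
return as fatal rather than continue booting with program loading still
enabled.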

This provides a very simple way to limit eBPF programs to only
those signed programs that are part of the verified boot chain,
which has always been a requirement of eBPF use in Android.

I can think of two other ways to accomplish this:
(a) via sepolicy with booleans, but it ends up being pretty complex
    (especially wrt verifying the correctness of the resulting policies)
(b) via BPF_LSM bpf_prog_load hook, which requires enabling additional
    kernel options which aren't necessarily worth the bother,
    and requires dynamically patching the kernel (frowned upon by
    security folks).

This approach appears to be the simplest.

I've chosen to return EUNATCH ('Protocol driver not attached')
to separate it from EPERM and make it clear the eBPF program loading
subsystem has been outright disabled (detached).  There aren't
any permissions you could gain to make things work again (short
of a reboot/kexec).
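
To show why a distinct errno is useful, here is a minimal userspace
sketch of how a loader might tell "loading has been switched off" apart
from an ordinary permission failure. The enum and function names are
hypothetical and not part of this patch or of any library.

```c
#include <errno.h>

/* Hypothetical classification of a failed bpf(BPF_PROG_LOAD, ...) call. */
enum prog_load_verdict {
	PROG_LOAD_OK,		/* no error */
	PROG_LOAD_DISABLED,	/* EUNATCH: loading disabled via the sysctl */
	PROG_LOAD_DENIED,	/* EPERM: caller lacks the needed privileges */
	PROG_LOAD_OTHER,	/* anything else (verifier rejection, etc.) */
};

/* Takes a positive errno value as left behind by the failed syscall. */
static enum prog_load_verdict classify_prog_load_errno(int err)
{
	switch (err) {
	case 0:
		return PROG_LOAD_OK;
	case EUNATCH:
		return PROG_LOAD_DISABLED;
	case EPERM:
		return PROG_LOAD_DENIED;
	default:
		return PROG_LOAD_OTHER;
	}
}
```

A loader that gets PROG_LOAD_DISABLED can skip any retry with elevated
privileges, since no capability would change the outcome.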

The sysctl is intentionally kernel-global and doesn't affect cBPF,
which has various runtime use cases (incl. tcpdump style dynamic
socket filters and seccomp sandboxing) and thus cannot be disabled,
but (as experience shows) is also much less dangerous (mainly due
to being much simpler).

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 Documentation/admin-guide/sysctl/kernel.rst |  9 +++++++++
 kernel/bpf/syscall.c                        | 14 ++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index f3ee807b5d8b..4906ef08c741 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1655,6 +1655,15 @@ entry will default to 2 instead of 0.
 = =============================================================
 
 
+disable_bpf_prog_load
+=====================
+
+Writing 1 to this entry will cause all future invocations of
+``bpf(BPF_PROG_LOAD, ...)`` to fail with -EUNATCH, thus effectively
+permanently disabling the instantiation of new eBPF programs.
+Once set to 1, this cannot be reset back to 0.
+
+
 warn_limit
 ==========
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6589acc89ef8..ef655ff501e7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -67,6 +67,8 @@ static DEFINE_SPINLOCK(link_idr_lock);
 int sysctl_unprivileged_bpf_disabled __read_mostly =
 	IS_BUILTIN(CONFIG_BPF_UNPRIV_DEFAULT_OFF) ? 2 : 0;
 
+static int sysctl_disable_bpf_prog_load __read_mostly;
+
 static const struct bpf_map_ops * const bpf_map_types[] = {
 #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type)
 #define BPF_MAP_TYPE(_id, _ops) \
@@ -2891,6 +2893,9 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 				 BPF_F_TOKEN_FD))
 		return -EINVAL;
 
+	if (sysctl_disable_bpf_prog_load)
+		return -EUNATCH;
+
 	bpf_prog_load_fixup_attach_type(attr);
 
 	if (attr->prog_flags & BPF_F_TOKEN_FD) {
@@ -6511,6 +6516,15 @@ static const struct ctl_table bpf_syscall_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_TWO,
 	},
+	{
+		.procname	= "disable_bpf_prog_load",
+		.data		= &sysctl_disable_bpf_prog_load,
+		.maxlen		= sizeof(sysctl_disable_bpf_prog_load),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ONE,
+		.extra2		= SYSCTL_ONE,
+	},
 	{
 		.procname	= "bpf_stats_enabled",
 		.data		= &bpf_stats_enabled_key.key,
-- 
2.52.0.394.g0814c687bb-goog