The inspection mechanism allows registration of a specific memory area
(or object) for later inspection. Ranges are added to an inspection
table, which can be requested and analyzed by specific drivers.
Drivers can interface with any hardware mechanism that allows inspection
of the data, including but not limited to: dumping for debugging,
creating a coredump, analysis, or statistical information.
Drivers can register a notifier to be told when objects are added to or
removed from the table, or they can traverse the existing inspection
table.
The inspection table is created ahead of time so that it can be used
later regardless of the state of the kernel (running, frozen, crashed,
or any other state).

Signed-off-by: Eugen Hristev
---
 Documentation/dev-tools/index.rst      |   1 +
 Documentation/dev-tools/meminspect.rst | 139 ++++++++
 MAINTAINERS                            |   7 +
 include/asm-generic/vmlinux.lds.h      |  13 +
 include/linux/meminspect.h             | 261 ++++++++++++++
 init/Kconfig                           |   2 +
 kernel/Makefile                        |   1 +
 kernel/meminspect/Kconfig              |  20 ++
 kernel/meminspect/Makefile             |   3 +
 kernel/meminspect/meminspect.c         | 470 +++++++++++++++++++++++++
 10 files changed, 917 insertions(+)
 create mode 100644 Documentation/dev-tools/meminspect.rst
 create mode 100644 include/linux/meminspect.h
 create mode 100644 kernel/meminspect/Kconfig
 create mode 100644 kernel/meminspect/Makefile
 create mode 100644 kernel/meminspect/meminspect.c

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 4b8425e348ab..8ce605de8ee6 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -38,6 +38,7 @@ Documentation/process/debugging/index.rst
    gpio-sloppy-logic-analyzer
    autofdo
    propeller
+   meminspect
 
 .. only:: subproject and html
 
diff --git a/Documentation/dev-tools/meminspect.rst b/Documentation/dev-tools/meminspect.rst
new file mode 100644
index 000000000000..2a0bd4d6e448
--- /dev/null
+++ b/Documentation/dev-tools/meminspect.rst
@@ -0,0 +1,139 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========
+meminspect
+==========
+
+This document provides information about the meminspect feature.
+
+Overview
+========
+
+meminspect is a mechanism that allows the kernel to register a chunk of
+memory into a table, to be used at a later time for a specific
+inspection purpose like debugging, memory dumping or statistics.
+
+meminspect allows drivers to traverse the inspection table on demand,
+or to register a notifier to be called whenever an entry is added
+or removed.
+
+meminspect also aims to minimize the information that has to be collected
+when the kernel runs into a problem. For example, a traditional debug method
+involves dumping the whole kernel memory and then inspecting it. meminspect
+lets users select which memory is of interest, which helps in production
+environments where memory and connectivity are limited.
+
+Although the kernel has multiple internal mechanisms, meminspect fits
+a particular model that is not covered by the others.
+
+meminspect Internals
+====================
+
+API
+---
+
+Static memory can be registered at compile time by instructing the compiler
+to place annotation info in a separate section (.inspect_table).
+For each annotated memory area (usually a variable), a dedicated
+struct inspect_entry is created with the required information.
+The following basic APIs are available:
+
+  MEMINSPECT_ENTRY(idx, sym, sz)
+is the basic macro that takes an ID, the symbol, and a size.
+
+To make it easier, some wrappers are also defined:
+  MEMINSPECT_SIMPLE_ENTRY(sym)
+uses the dedicated MEMINSPECT_ID_##sym with a size equal to sizeof(sym).
+
+  MEMINSPECT_NAMED_ENTRY(name, sym)
+creates a simple entry for a sym whose ID cannot be derived from it (for
+example, a struct member), so a name has to be provided.
+
+  MEMINSPECT_AREA_ENTRY(sym, sz)
+registers sym, but with the size given as sz. This is useful, for example,
+for arrays whose size is unknown at compile time.
+
+For dynamically allocated memory, or for other cases, the following API
+is defined:
+  meminspect_register_id_pa(enum meminspect_uid id, phys_addr_t zone,
+                            size_t size, unsigned int type);
+which takes the ID and the physical address.
+The following variations are also available:
+  meminspect_register_pa() omits the ID
+  meminspect_register_id_va() requires the ID but takes a virtual address
+  meminspect_register_va() omits the ID and takes a virtual address
+
+If the ID is not given, the next available dynamic ID is allocated.
+
+To unregister a dynamic entry, the following APIs are defined:
+  meminspect_unregister_pa(phys_addr_t zone, size_t size);
+  meminspect_unregister_id(enum meminspect_uid id);
+  meminspect_unregister_va(va, size);
+
+These functions expect the table lock to be held. All of them except
+meminspect_register_id_pa() have a meminspect_lock_*() variant that takes it.
+
+meminspect drivers
+------------------
+
+Drivers can traverse the table, with the table lock held, by calling
+meminspect_traverse(void *priv, MEMINSPECT_ITERATOR_CB cb).
+The callback is called with priv for each valid entry in the table.
+
+Drivers can also register a notifier with
+  meminspect_notifier_register()
+and unregister it with
+  meminspect_notifier_unregister()
+to be called whenever an entry is added or removed.
+
+Data structures
+---------------
+
+The regions are stored in a simple fixed-size array indexed by ID, which
+avoids memory allocation overhead. The table is not performance critical,
+and a few hundred entries do not create a memory consumption problem.
+
+Static variables registered into meminspect are annotated into a dedicated
+.inspect_table memory section. At late init, meminspect walks this section
+and copies each entry into the inspection table.
+
+meminspect Initialization
+-------------------------
+
+meminspect is ready to accept region registrations from any part of the
+kernel at any time. The table itself does not require any initialization.
+
+If CONFIG_CRASH_DUMP is enabled, meminspect creates an ELF header
+corresponding to a core dump image, in which each region is added as a
+program header. In this scenario, the first region is the ELF header
+itself, and the second region is the vmcoreinfo ELF note.
+With this mechanism, the regions in the meminspect table, if dumped, can be
+concatenated to obtain a core image that is loadable with the `crash` tool.
+
+meminspect example
+==================
+
+A simple scenario for meminspect is the following:
+The kernel registers the linux_banner variable into meminspect with
+a simple annotation like:
+
+  MEMINSPECT_SIMPLE_ENTRY(linux_banner);
+
+The meminspect late initcall parses the table created at compile time
+and copies the entry information into the inspection table.
+At a later point, any interested driver can call the traverse function to
+find all entries in the table.
+A specific driver then records the address and the size of the banner in
+its own table.
+That table is then written to a shared memory area that can be
+read by upper-level firmware.
+If the kernel freezes, it no longer feeds the watchdog.
+The watchdog then triggers an interrupt at a higher exception level,
+which is handled by the upper-level firmware. The firmware reads the
+shared memory table and finds the entry with the start and size of
+the banner. It then copies the banner for debugging purposes. The
+upper-level firmware is then able to provide useful debugging information,
+in this example, the banner.
+
+As seen here, meminspect facilitates the interaction between the kernel
+and a specific firmware.
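The registration rules documented above (a fixed-size table indexed by ID, dynamic IDs handed out from the first free slot, removal by physical address and size, traversal with a callback) can be modeled in ordinary user-space C. This is a hypothetical, simplified sketch, not kernel code: it leaves out locking, notifiers, virtual-to-physical translation and the ELF header. The names `table`, `model_register()`, `model_unregister_pa()`, `model_traverse()`, `ID_DYNAMIC` and `ID_MAX` are illustrative stand-ins for `inspect_entries[]`, `meminspect_register_id_pa()`, `meminspect_unregister_pa()`, `meminspect_traverse()`, `MEMINSPECT_ID_DYNAMIC` and `MEMINSPECT_ID_MAX`, and their numeric values are made up. As in `meminspect_register_id_pa()`, ID 0 marks an unused slot, and a dynamic request takes the first free slot at or after the dynamic base ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for MEMINSPECT_ID_* values; the numbers are illustrative. */
enum { ID_STATIC = 0, ID_BANNER = 1, ID_DYNAMIC = 4, ID_MAX = 8 };

struct entry {
	unsigned int id;	/* 0 marks an unused slot, as in inspect_entries[] */
	uint64_t pa;		/* stands in for phys_addr_t */
	size_t size;
};

/* Fixed-size array indexed by ID: no allocation on registration. */
static struct entry table[ID_MAX];

/*
 * Models the ID selection in meminspect_register_id_pa().
 * Returns the ID actually used, or -1 if the request cannot be served.
 */
int model_register(unsigned int req_id, uint64_t pa, size_t size)
{
	unsigned int uid = req_id;

	if (uid >= ID_MAX)
		return -1;
	if (uid == ID_DYNAMIC)
		while (uid < ID_MAX && table[uid].id)
			uid++;		/* first free slot at or after ID_DYNAMIC */
	if (uid == ID_MAX)
		return -1;		/* no free dynamic slot left */

	/* A fixed ID simply replaces whatever was registered there before. */
	table[uid] = (struct entry){ .id = uid, .pa = pa, .size = size };
	return (int)uid;
}

/* Models meminspect_unregister_pa(): drop the first used slot matching pa and size. */
void model_unregister_pa(uint64_t pa, size_t size)
{
	for (unsigned int i = 0; i < ID_MAX; i++) {
		if (table[i].id && table[i].pa == pa && table[i].size == size) {
			memset(&table[i], 0, sizeof(table[i]));
			return;
		}
	}
}

typedef void (*iter_cb)(void *priv, const struct entry *e);

/* Models meminspect_traverse(): call cb for every used slot, in ID order. */
void model_traverse(void *priv, iter_cb cb)
{
	assert(cb != NULL);
	for (unsigned int i = 0; i < ID_MAX; i++)
		if (table[i].id)
			cb(priv, &table[i]);
}

/* Example consumer: add up region sizes, e.g. to size a firmware dump buffer. */
void sum_sizes(void *priv, const struct entry *e)
{
	*(size_t *)priv += e->size;
}
```

Registering a fixed ID followed by two dynamic entries yields IDs 1, 4 and 5. Unregistering the entry at ID 4 by its (pa, size) pair frees that slot, so the next dynamic registration reuses ID 4. Once slots 4 to 7 are all taken, further dynamic requests fail.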
diff --git a/MAINTAINERS b/MAINTAINERS
index 545a4776795e..2cb2cc427c90 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16157,6 +16157,13 @@ F:	arch/*/include/asm/sync_core.h
 F:	include/uapi/linux/membarrier.h
 F:	kernel/sched/membarrier.c
 
+MEMINSPECT
+M:	Eugen Hristev
+S:	Maintained
+F:	Documentation/dev-tools/meminspect.rst
+F:	include/linux/meminspect.h
+F:	kernel/meminspect/*
+
 MEMBLOCK AND MEMORY MANAGEMENT INITIALIZATION
 M:	Mike Rapoport
 L:	linux-mm@kvack.org
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 8a9a2e732a65..713135d72c34 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -489,6 +489,8 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 	FW_LOADER_BUILT_IN_DATA						\
 	TRACEDATA							\
 									\
+	MEMINSPECT_TABLE						\
+									\
 	PRINTK_INDEX							\
 									\
 	/* Kernel symbol table: Normal symbols */			\
@@ -893,6 +895,17 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 #define TRACEDATA
 #endif
 
+#ifdef CONFIG_MEMINSPECT
+#define MEMINSPECT_TABLE						\
+	. = ALIGN(8);							\
+	.inspect_table : AT(ADDR(.inspect_table) - LOAD_OFFSET) {	\
+		BOUNDED_SECTION_POST_LABEL(.inspect_table,		\
+					   __inspect_table,, _end)	\
+	}
+#else
+#define MEMINSPECT_TABLE
+#endif
+
 #ifdef CONFIG_PRINTK_INDEX
 #define PRINTK_INDEX							\
 	.printk_index : AT(ADDR(.printk_index) - LOAD_OFFSET) {		\
diff --git a/include/linux/meminspect.h b/include/linux/meminspect.h
new file mode 100644
index 000000000000..e58b00079156
--- /dev/null
+++ b/include/linux/meminspect.h
@@ -0,0 +1,261 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _MEMINSPECT_H
+#define _MEMINSPECT_H
+
+#include
+
+enum meminspect_uid {
+	MEMINSPECT_ID_STATIC = 0,
+	MEMINSPECT_ID_ELF,
+	MEMINSPECT_ID_VMCOREINFO,
+	MEMINSPECT_ID_CONFIG,
+	MEMINSPECT_ID__totalram_pages,
+	MEMINSPECT_ID___cpu_possible_mask,
+	MEMINSPECT_ID___cpu_present_mask,
+	MEMINSPECT_ID___cpu_online_mask,
+	MEMINSPECT_ID___cpu_active_mask,
+	MEMINSPECT_ID_mem_section,
+	MEMINSPECT_ID_jiffies_64,
+	MEMINSPECT_ID_linux_banner,
+	MEMINSPECT_ID_nr_threads,
+	MEMINSPECT_ID_nr_irqs,
+	MEMINSPECT_ID_tainted_mask,
+	MEMINSPECT_ID_taint_flags,
+	MEMINSPECT_ID_node_states,
+	MEMINSPECT_ID___per_cpu_offset,
+	MEMINSPECT_ID_nr_swapfiles,
+	MEMINSPECT_ID_init_uts_ns,
+	MEMINSPECT_ID_printk_rb_static,
+	MEMINSPECT_ID_printk_rb_dynamic,
+	MEMINSPECT_ID_prb,
+	MEMINSPECT_ID_prb_descs,
+	MEMINSPECT_ID_prb_infos,
+	MEMINSPECT_ID_prb_data,
+	MEMINSPECT_ID_high_memory,
+	MEMINSPECT_ID_init_mm,
+	MEMINSPECT_ID_init_mm_pgd,
+	MEMINSPECT_ID__sinittext,
+	MEMINSPECT_ID__einittext,
+	MEMINSPECT_ID__end,
+	MEMINSPECT_ID__text,
+	MEMINSPECT_ID__stext,
+	MEMINSPECT_ID__etext,
+	MEMINSPECT_ID_kallsyms_num_syms,
+	MEMINSPECT_ID_kallsyms_relative_base,
+	MEMINSPECT_ID_kallsyms_offsets,
+	MEMINSPECT_ID_kallsyms_names,
+	MEMINSPECT_ID_kallsyms_token_table,
+	MEMINSPECT_ID_kallsyms_token_index,
+	MEMINSPECT_ID_kallsyms_markers,
+	MEMINSPECT_ID_kallsyms_seqs_of_names,
+	MEMINSPECT_ID_swapper_pg_dir,
+	MEMINSPECT_ID_DYNAMIC,
+	MEMINSPECT_ID_MAX = 201,
+};
+
+#define MEMINSPECT_TYPE_REGULAR 0
+
+#define MEMINSPECT_NOTIFIER_ADD 0
+#define MEMINSPECT_NOTIFIER_REMOVE 1
+
+/**
+ * struct inspect_entry - memory inspect entry information
+ * @id: unique id for this entry
+ * @va: virtual address for the memory (pointer)
+ * @pa: physical address for the memory
+ * @size: size of the memory area of this entry
+ * @type: type of the entry (class)
+ */
+struct inspect_entry {
+	enum meminspect_uid id;
+	void *va;
+	phys_addr_t pa;
+	size_t size;
+	unsigned int type;
+};
+
+typedef void (*MEMINSPECT_ITERATOR_CB)(void *priv, const struct inspect_entry *ie);
+
+#ifdef CONFIG_MEMINSPECT
+/* .inspect_table section table markers */
+extern const struct inspect_entry __inspect_table[];
+extern const struct inspect_entry __inspect_table_end[];
+
+/*
+ * Annotate a static variable into the inspection table.
+ * Can be used multiple times for the same ID; each use creates a separate
+ * section entry, but only the last one is kept in the inspection table.
+ */
+#define MEMINSPECT_ENTRY(idx, sym, sz)						\
+	static const struct inspect_entry __UNIQUE_ID(__inspect_entry_##idx)	\
+		__used __section(".inspect_table") = { .id = idx,		\
+						       .va = (void *)&(sym),	\
+						       .size = (sz),		\
+						     }
+/*
+ * A simple entry is just a variable; the size of the entry is the variable size.
+ * The variable can also be a pointer, in which case the pointer itself (not the
+ * memory it points to) is added.
+ */
+#define MEMINSPECT_SIMPLE_ENTRY(sym)					\
+	MEMINSPECT_ENTRY(MEMINSPECT_ID_##sym, sym, sizeof(sym))
+/*
+ * When `sym` is not a plain variable but, e.g., a member of a struct,
+ * and no name can be derived from it, a name must be provided.
+ */
+#define MEMINSPECT_NAMED_ENTRY(name, sym)				\
+	MEMINSPECT_ENTRY(MEMINSPECT_ID_##name, sym, sizeof(sym))
+/*
+ * Create a more complex entry by registering an arbitrary memory area starting
+ * at sym. The size is provided as a parameter.
+ * This is used e.g. when the symbol is the start of an array of unknown size.
+ */
+#define MEMINSPECT_AREA_ENTRY(sym, sz)					\
+	MEMINSPECT_ENTRY(MEMINSPECT_ID_##sym, sym, sz)
+
+/* Iterate through .inspect_table section entries */
+#define for_each_meminspect_entry(__entry)				\
+	for (__entry = __inspect_table;					\
+	     __entry < __inspect_table_end;				\
+	     __entry++)
+
+#else
+#define MEMINSPECT_ENTRY(...)
+#define MEMINSPECT_SIMPLE_ENTRY(...)
+#define MEMINSPECT_NAMED_ENTRY(...)
+#define MEMINSPECT_AREA_ENTRY(...)
+#endif
+
+#ifdef CONFIG_MEMINSPECT
+
+/*
+ * Dynamic helpers to register entries.
+ * These do not take the table lock; the caller must hold it.
+ */
+void meminspect_register_id_pa(enum meminspect_uid id, phys_addr_t zone,
+			       size_t size, unsigned int type);
+void meminspect_table_lock(void);
+void meminspect_table_unlock(void);
+
+#define meminspect_register_pa(...)					\
+	meminspect_register_id_pa(MEMINSPECT_ID_DYNAMIC, __VA_ARGS__, MEMINSPECT_TYPE_REGULAR)
+
+#define meminspect_register_id_va(id, va, size)				\
+	meminspect_register_id_pa(id, virt_to_phys(va), size, MEMINSPECT_TYPE_REGULAR)
+
+#define meminspect_register_va(...)					\
+	meminspect_register_id_va(MEMINSPECT_ID_DYNAMIC, __VA_ARGS__)
+
+void meminspect_unregister_pa(phys_addr_t zone, size_t size);
+void meminspect_unregister_id(enum meminspect_uid id);
+
+#define meminspect_unregister_va(va, size)				\
+	meminspect_unregister_pa(virt_to_phys(va), size)
+
+void meminspect_traverse(void *priv, MEMINSPECT_ITERATOR_CB cb);
+
+/*
+ * Producers (registrants) are advised to use the locked API below.
+ */
+#define meminspect_lock_register_pa(...)				\
+	do {								\
+		meminspect_table_lock();				\
+		meminspect_register_pa(__VA_ARGS__);			\
+		meminspect_table_unlock();				\
+	} while (0)
+
+#define meminspect_lock_register_id_va(...)				\
+	do {								\
+		meminspect_table_lock();				\
+		meminspect_register_id_va(__VA_ARGS__);			\
+		meminspect_table_unlock();				\
+	} while (0)
+
+#define meminspect_lock_register_va(...)				\
+	do {								\
+		meminspect_table_lock();				\
+		meminspect_register_va(__VA_ARGS__);			\
+		meminspect_table_unlock();				\
+	} while (0)
+
+#define meminspect_lock_unregister_pa(...)				\
+	do {								\
+		meminspect_table_lock();				\
+		meminspect_unregister_pa(__VA_ARGS__);			\
+		meminspect_table_unlock();				\
+	} while (0)
+
+#define meminspect_lock_unregister_va(...)				\
+	do {								\
+		meminspect_table_lock();				\
+		meminspect_unregister_va(__VA_ARGS__);			\
+		meminspect_table_unlock();				\
+	} while (0)
+
+#define meminspect_lock_unregister_id(...)				\
+	do {								\
+		meminspect_table_lock();				\
+		meminspect_unregister_id(__VA_ARGS__);			\
+		meminspect_table_unlock();				\
+	} while (0)
+
+#define meminspect_lock_traverse(...)					\
+	do {								\
+		meminspect_table_lock();				\
+		meminspect_traverse(__VA_ARGS__);			\
+		meminspect_table_unlock();				\
+	} while (0)
+
+int meminspect_notifier_register(struct notifier_block *n);
+int meminspect_notifier_unregister(struct notifier_block *n);
+
+#else
+static inline void meminspect_register_id_pa(enum meminspect_uid id,
+					     phys_addr_t zone,
+					     size_t size, unsigned int type)
+{
+}
+
+static inline void meminspect_table_lock(void)
+{
+}
+
+static inline void meminspect_table_unlock(void)
+{
+}
+
+static inline void meminspect_unregister_pa(phys_addr_t zone, size_t size)
+{
+}
+
+static inline void meminspect_unregister_id(enum meminspect_uid id)
+{
+}
+
+static inline void meminspect_traverse(void *priv, MEMINSPECT_ITERATOR_CB cb)
+{
+}
+
+static inline int meminspect_notifier_register(struct notifier_block *n)
+{
+	return 0;
+}
+
+static inline int meminspect_notifier_unregister(struct notifier_block *n)
+{
+	return 0;
+}
+
+#define meminspect_register_pa(...)
+#define meminspect_register_id_va(...)
+#define meminspect_register_va(...)
+#define meminspect_lock_register_pa(...)
+#define meminspect_lock_register_va(...)
+#define meminspect_lock_register_id_va(...)
+#define meminspect_lock_traverse(...)
+#define meminspect_lock_unregister_va(...)
+#define meminspect_lock_unregister_pa(...)
+#define meminspect_lock_unregister_id(...)
+#endif
+
+#endif
diff --git a/init/Kconfig b/init/Kconfig
index cab3ad28ca49..d48647419944 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2138,6 +2138,8 @@ config TRACEPOINTS
 
 source "kernel/Kconfig.kexec"
 
+source "kernel/meminspect/Kconfig"
+
 endmenu		# General setup
 
 source "arch/Kconfig"
diff --git a/kernel/Makefile b/kernel/Makefile
index df3dd8291bb6..83ec5310dfd1 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -50,6 +50,7 @@ obj-y += locking/
 obj-y += power/
 obj-y += printk/
 obj-y += irq/
+obj-y += meminspect/
 obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
diff --git a/kernel/meminspect/Kconfig b/kernel/meminspect/Kconfig
new file mode 100644
index 000000000000..8680fbf0e285
--- /dev/null
+++ b/kernel/meminspect/Kconfig
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config MEMINSPECT
+	bool "Allow the kernel to register memory regions for inspection purposes"
+	help
+	  The inspection mechanism allows registration of a specific memory
+	  area (or object) for later inspection.
+	  Ranges are added to an inspection table, which can be
+	  requested and analyzed by specific drivers.
+	  Drivers can interface with any hardware mechanism that allows
+	  inspection of the data, including but not limited to: dumping
+	  for debugging, creating a coredump, analysis, or statistical
+	  information.
+	  The inspection table is created ahead of time so that it can be
+	  used later regardless of the state of the kernel (running, frozen,
+	  crashed, or any other state).
+
+	  Note that modules using this feature must be rebuilt if this option
+	  changes.
+
diff --git a/kernel/meminspect/Makefile b/kernel/meminspect/Makefile
new file mode 100644
index 000000000000..09fd55e6d9cf
--- /dev/null
+++ b/kernel/meminspect/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_MEMINSPECT) += meminspect.o
diff --git a/kernel/meminspect/meminspect.c b/kernel/meminspect/meminspect.c
new file mode 100644
index 000000000000..0d9ad65ba92e
--- /dev/null
+++ b/kernel/meminspect/meminspect.c
@@ -0,0 +1,470 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static DEFINE_MUTEX(meminspect_lock);
+static struct inspect_entry inspect_entries[MEMINSPECT_ID_MAX];
+
+ATOMIC_NOTIFIER_HEAD(meminspect_notifier_list);
+
+#ifdef CONFIG_CRASH_DUMP
+
+#define CORE_STR "CORE"
+
+static struct elfhdr *ehdr;
+static size_t elf_offset;
+static bool elf_hdr_ready;
+
+static void append_kcore_note(char *notes, size_t *i, const char *name,
+			      unsigned int type, const void *desc,
+			      size_t descsz)
+{
+	struct elf_note *note = (struct elf_note *)&notes[*i];
+
+	note->n_namesz = strlen(name) + 1;
+	note->n_descsz = descsz;
+	note->n_type = type;
+	*i += sizeof(*note);
+	memcpy(&notes[*i], name, note->n_namesz);
+	*i = ALIGN(*i + note->n_namesz, 4);
+	memcpy(&notes[*i], desc, descsz);
+	*i = ALIGN(*i + descsz, 4);
+}
+
+static void append_kcore_note_nodesc(char *notes, size_t *i, const char *name,
+				     unsigned int type, size_t descsz)
+{
+	struct elf_note *note = (struct elf_note *)&notes[*i];
+
+	note->n_namesz = strlen(name) + 1;
+	note->n_descsz = descsz;
+	note->n_type = type;
+	*i += sizeof(*note);
+	memcpy(&notes[*i], name, note->n_namesz);
+	*i = ALIGN(*i + note->n_namesz, 4);
+}
+
+static struct elf_phdr *elf_phdr_entry_addr(struct elfhdr *ehdr, int idx)
+{
+	struct elf_phdr *ephdr = (struct elf_phdr *)((size_t)ehdr + ehdr->e_phoff);
+
+	return &ephdr[idx];
+}
+
+static int clear_elfheader(const struct inspect_entry *e)
+{
+	struct elf_phdr *phdr;
+	struct elf_phdr *tmp_phdr;
+	unsigned int phidx;
+	unsigned int i;
+
+	for (i = 0; i < ehdr->e_phnum; i++) {
+		phdr = elf_phdr_entry_addr(ehdr, i);
+		if (phdr->p_paddr == e->pa &&
+		    phdr->p_memsz == ALIGN(e->size, 4))
+			break;
+	}
+
+	if (i == ehdr->e_phnum) {
+		pr_debug("Cannot find program header entry in elf\n");
+		return -EINVAL;
+	}
+
+	phidx = i;
+
+	/* Clear program header */
+	tmp_phdr = elf_phdr_entry_addr(ehdr, phidx);
+	for (i = phidx; i < ehdr->e_phnum - 1; i++) {
+		tmp_phdr = elf_phdr_entry_addr(ehdr, i + 1);
+		phdr = elf_phdr_entry_addr(ehdr, i);
+		memcpy(phdr, tmp_phdr, sizeof(*phdr));
+		phdr->p_offset = phdr->p_offset - ALIGN(e->size, 4);
+	}
+	memset(tmp_phdr, 0, sizeof(*tmp_phdr));
+	ehdr->e_phnum--;
+
+	elf_offset -= ALIGN(e->size, 4);
+
+	return 0;
+}
+
+static void update_elfheader(const struct inspect_entry *e)
+{
+	struct elf_phdr *phdr;
+
+	phdr = elf_phdr_entry_addr(ehdr, ehdr->e_phnum++);
+
+	phdr->p_type = PT_LOAD;
+	phdr->p_offset = elf_offset;
+	phdr->p_vaddr = (elf_addr_t)e->va;
+	if (e->pa)
+		phdr->p_paddr = (elf_addr_t)e->pa;
+	else
+		phdr->p_paddr = (elf_addr_t)virt_to_phys(e->va);
+	phdr->p_filesz = phdr->p_memsz = ALIGN(e->size, 4);
+	phdr->p_flags = PF_R | PF_W;
+
+	elf_offset += ALIGN(e->size, 4);
+}
+
+/*
+ * This function prepares the ELF header for the coredump image.
+ * Initially there is a single program header, for the ELF note.
+ * The note contains the usual core dump information and the vmcoreinfo.
+ */
+static int init_elfheader(void)
+{
+	struct elf_phdr *phdr;
+	void *notes;
+	unsigned int elfh_size, buf_sz;
+	unsigned int phdr_off;
+	size_t note_len, i = 0;
+	struct page *p;
+
+	struct elf_prstatus prstatus = {};
+	struct elf_prpsinfo prpsinfo = {
+		.pr_sname = 'R',
+		.pr_fname = "vmlinux",
+	};
+
+	/*
+	 * Header buffer contains:
+	 * ELF header, Note entry with PR status, PR ps info, and vmcoreinfo.
+	 * Also, MEMINSPECT_ID_MAX program headers.
+	 */
+	elfh_size = sizeof(*ehdr);
+	elfh_size += sizeof(struct elf_prstatus);
+	elfh_size += sizeof(struct elf_prpsinfo);
+	elfh_size += sizeof(VMCOREINFO_NOTE_NAME);
+	elfh_size += ALIGN(vmcoreinfo_size, 4);
+	elfh_size += (sizeof(*phdr)) * (MEMINSPECT_ID_MAX);
+
+	elfh_size = ALIGN(elfh_size, 4);
+
+	/* The note length is made of:
+	 * 3 elf notes structs (prstatus, prpsinfo, vmcoreinfo)
+	 * 3 notes names (2 core strings, 1 vmcoreinfo name)
+	 * sizeof each note
+	 */
+	note_len = (3 * sizeof(struct elf_note) +
+		    2 * ALIGN(sizeof(CORE_STR), 4) +
+		    VMCOREINFO_NOTE_NAME_BYTES +
+		    ALIGN(sizeof(struct elf_prstatus), 4) +
+		    ALIGN(sizeof(struct elf_prpsinfo), 4) +
+		    ALIGN(vmcoreinfo_size, 4));
+
+	buf_sz = elfh_size + note_len - ALIGN(vmcoreinfo_size, 4);
+
+	/* Never freed */
+	p = dma_alloc_from_contiguous(NULL, PAGE_ALIGN(buf_sz) >> PAGE_SHIFT,
+				      get_order(buf_sz), true);
+	if (!p)
+		return -ENOMEM;
+
+	ehdr = dma_common_contiguous_remap(p, buf_sz,
+			pgprot_decrypted(pgprot_dmacoherent(PAGE_KERNEL)),
+			__builtin_return_address(0));
+	if (!ehdr) {
+		dma_release_from_contiguous(NULL, p, PAGE_ALIGN(buf_sz) >> PAGE_SHIFT);
+		return -ENOMEM;
+	}
+
+	memset(ehdr, 0, elfh_size);
+
+	/* Assign Program headers offset, it's right after the elf header. */
+	phdr = (struct elf_phdr *)(ehdr + 1);
+	phdr_off = sizeof(*ehdr);
+
+	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+	ehdr->e_ident[EI_CLASS] = ELF_CLASS;
+	ehdr->e_ident[EI_DATA] = ELF_DATA;
+	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+	ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+	ehdr->e_type = ET_CORE;
+	ehdr->e_machine = ELF_ARCH;
+	ehdr->e_version = EV_CURRENT;
+	ehdr->e_ehsize = sizeof(*ehdr);
+	ehdr->e_phentsize = sizeof(*phdr);
+
+	elf_offset = elfh_size;
+
+	notes = (void *)(((char *)ehdr) + elf_offset);
+
+	/* we have a single program header now */
+	ehdr->e_phnum = 1;
+
+	phdr->p_type = PT_NOTE;
+	phdr->p_offset = elf_offset;
+	phdr->p_filesz = note_len;
+
+	/* advance elf offset */
+	elf_offset += note_len;
+
+	strscpy(prpsinfo.pr_psargs, saved_command_line,
+		sizeof(prpsinfo.pr_psargs));
+
+	append_kcore_note(notes, &i, CORE_STR, NT_PRSTATUS, &prstatus,
+			  sizeof(prstatus));
+	append_kcore_note(notes, &i, CORE_STR, NT_PRPSINFO, &prpsinfo,
+			  sizeof(prpsinfo));
+	append_kcore_note_nodesc(notes, &i, VMCOREINFO_NOTE_NAME, 0,
+				 ALIGN(vmcoreinfo_size, 4));
+
+	ehdr->e_phoff = phdr_off;
+
+	/* This is the first coredump region, the ELF header */
+	meminspect_register_id_pa(MEMINSPECT_ID_ELF, page_to_phys(p),
+				  buf_sz, MEMINSPECT_TYPE_REGULAR);
+
+	/*
+	 * The second region is the vmcoreinfo, which goes right after.
+	 * It is registered by the vmcoreinfo code.
+	 */
+
+	return 0;
+}
+#endif
+
+/**
+ * meminspect_unregister_id() - Unregister region from inspection table.
+ * @id: region's id in the table
+ *
+ * Return: None
+ */
+void meminspect_unregister_id(enum meminspect_uid id)
+{
+	struct inspect_entry *e;
+
+	WARN_ON(!mutex_is_locked(&meminspect_lock));
+
+	e = &inspect_entries[id];
+	if (!e->id)
+		return;
+
+	atomic_notifier_call_chain(&meminspect_notifier_list,
+				   MEMINSPECT_NOTIFIER_REMOVE, e);
+#ifdef CONFIG_CRASH_DUMP
+	if (elf_hdr_ready)
+		clear_elfheader(e);
+#endif
+	memset(e, 0, sizeof(*e));
+}
+EXPORT_SYMBOL_GPL(meminspect_unregister_id);
+
+/**
+ * meminspect_unregister_pa() - Unregister region from inspection table.
+ * @pa: Physical address of the memory region to remove
+ * @size: Size of the memory region to remove
+ *
+ * Return: None
+ */
+void meminspect_unregister_pa(phys_addr_t pa, size_t size)
+{
+	struct inspect_entry *e;
+	enum meminspect_uid i;
+
+	WARN_ON(!mutex_is_locked(&meminspect_lock));
+
+	for (i = MEMINSPECT_ID_STATIC; i < MEMINSPECT_ID_MAX; i++) {
+		e = &inspect_entries[i];
+		if (e->pa != pa)
+			continue;
+		if (e->size != size)
+			continue;
+		meminspect_unregister_id(e->id);
+		return;
+	}
+}
+EXPORT_SYMBOL_GPL(meminspect_unregister_pa);
+
+/**
+ * meminspect_register_id_pa() - Register region into inspection table
+ *				 with given ID and physical address.
+ * @req_id: Requested unique meminspect_uid that identifies the region.
+ *	    This can be MEMINSPECT_ID_DYNAMIC, in which case the function will
+ *	    find an unused ID and register with it.
+ * @pa: physical address of the memory region
+ * @size: region size
+ * @type: region type
+ *
+ * Return: None
+ */
+void meminspect_register_id_pa(enum meminspect_uid req_id, phys_addr_t pa,
+			       size_t size, unsigned int type)
+{
+	struct inspect_entry *e;
+	enum meminspect_uid uid = req_id;
+
+	WARN_ON(!mutex_is_locked(&meminspect_lock));
+
+	if (uid < MEMINSPECT_ID_STATIC || uid >= MEMINSPECT_ID_MAX)
+		return;
+
+	if (uid == MEMINSPECT_ID_DYNAMIC)
+		while (uid < MEMINSPECT_ID_MAX) {
+			if (!inspect_entries[uid].id)
+				break;
+			uid++;
+		}
+
+	if (uid == MEMINSPECT_ID_MAX)
+		return;
+
+	e = &inspect_entries[uid];
+
+	if (e->id)
+		meminspect_unregister_id(e->id);
+
+	e->pa = pa;
+	e->va = phys_to_virt(pa);
+	e->size = size;
+	e->id = uid;
+	e->type = type;
+#ifdef CONFIG_CRASH_DUMP
+	if (elf_hdr_ready)
+		update_elfheader(e);
+#endif
+	atomic_notifier_call_chain(&meminspect_notifier_list,
+				   MEMINSPECT_NOTIFIER_ADD, e);
+}
+EXPORT_SYMBOL_GPL(meminspect_register_id_pa);
+
+/**
+ * meminspect_table_lock() - Lock the mutex on the inspection table
+ *
+ * Return: None
+ */
+void meminspect_table_lock(void)
+{
+	mutex_lock(&meminspect_lock);
+}
+EXPORT_SYMBOL_GPL(meminspect_table_lock);
+
+/**
+ * meminspect_table_unlock() - Unlock the mutex on the inspection table
+ *
+ * Return: None
+ */
+void meminspect_table_unlock(void)
+{
+	mutex_unlock(&meminspect_lock);
+}
+EXPORT_SYMBOL_GPL(meminspect_table_unlock);
+
+/**
+ * meminspect_traverse() - Traverse the meminspect table and call the
+ *			   callback function for each valid entry.
+ * @priv: private data to be passed to the callback
+ * @cb: meminspect iterator callback that should be called for each entry
+ *
+ * Return: None
+ */
+void meminspect_traverse(void *priv, MEMINSPECT_ITERATOR_CB cb)
+{
+	const struct inspect_entry *e;
+	int i;
+
+	WARN_ON(!mutex_is_locked(&meminspect_lock));
+
+	for (i = MEMINSPECT_ID_STATIC; i < MEMINSPECT_ID_MAX; i++) {
+		e = &inspect_entries[i];
+		if (e->id)
+			cb(priv, e);
+	}
+}
+EXPORT_SYMBOL_GPL(meminspect_traverse);
+
+/**
+ * meminspect_notifier_register() - Register a notifier with the meminspect table
+ * @n: notifier block to register. It is called whenever an entry
+ *     is added or removed.
+ *
+ * Return: 0 on success, or a negative errno.
+ */
+int meminspect_notifier_register(struct notifier_block *n)
+{
+	return atomic_notifier_chain_register(&meminspect_notifier_list, n);
+}
+EXPORT_SYMBOL_GPL(meminspect_notifier_register);
+
+/**
+ * meminspect_notifier_unregister() - Unregister a previously registered
+ *				      notifier from the meminspect table.
+ * @n: notifier block to unregister.
+ *
+ * Return: 0 on success, or a negative errno.
+ */
+int meminspect_notifier_unregister(struct notifier_block *n)
+{
+	return atomic_notifier_chain_unregister(&meminspect_notifier_list, n);
+}
+EXPORT_SYMBOL_GPL(meminspect_notifier_unregister);
+
+#ifdef CONFIG_CRASH_DUMP
+static int __init meminspect_prepare_crashdump(void)
+{
+	const struct inspect_entry *e;
+	int ret;
+	enum meminspect_uid i;
+
+	ret = init_elfheader();
+
+	if (ret < 0)
+		return ret;
+
+	/*
+	 * Some regions may have been registered very early.
+	 * Update the ELF header for all existing regions,
+	 * except for MEMINSPECT_ID_ELF and MEMINSPECT_ID_VMCOREINFO;
+	 * those are included in the ELF header upon its creation.
+	 */
+	for (i = MEMINSPECT_ID_VMCOREINFO + 1; i < MEMINSPECT_ID_MAX; i++) {
+		e = &inspect_entries[i];
+		if (e->id)
+			update_elfheader(e);
+	}
+
+	elf_hdr_ready = true;
+
+	return 0;
+}
+#endif
+
+static int __init meminspect_prepare_table(void)
+{
+	const struct inspect_entry *e;
+	enum meminspect_uid i;
+
+	meminspect_table_lock();
+	/*
+	 * First, copy all entries from the compiler-built table.
+	 * If the same ID is annotated multiple times, the entry
+	 * that comes last in the section is stored and the
+	 * earlier ones are dropped.
+	 */
+	for_each_meminspect_entry(e) {
+		inspect_entries[e->id] = *e;
+	}
+#ifdef CONFIG_CRASH_DUMP
+	meminspect_prepare_crashdump();
+#endif
+	/* if we have early notifiers registered, call them now */
+	for (i = MEMINSPECT_ID_STATIC; i < MEMINSPECT_ID_MAX; i++)
+		if (inspect_entries[i].id)
+			atomic_notifier_call_chain(&meminspect_notifier_list,
+						   MEMINSPECT_NOTIFIER_ADD,
+						   &inspect_entries[i]);
+	meminspect_table_unlock();
+
+	pr_debug("Memory inspection table initialized\n");
+
+	return 0;
+}
+late_initcall(meminspect_prepare_table);
-- 
2.43.0

Annotate vital static information into the inspection table:
- init_uts_ns
- linux_banner

Information on these variables is stored in the dedicated meminspect
section.

Signed-off-by: Eugen Hristev
---
 init/version-timestamp.c | 3 +++
 init/version.c           | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/init/version-timestamp.c b/init/version-timestamp.c
index d071835121c2..6f920d0e1169 100644
--- a/init/version-timestamp.c
+++ b/init/version-timestamp.c
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 
 struct uts_namespace init_uts_ns = {
 	.ns.ns_type = ns_common_type(&init_uts_ns),
@@ -29,3 +30,5 @@ struct uts_namespace init_uts_ns = {
 const char linux_banner[] =
 	"Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 	LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n";
+
+MEMINSPECT_SIMPLE_ENTRY(linux_banner);
diff --git a/init/version.c b/init/version.c
index 94c96f6fbfe6..eeb139236562 100644
--- a/init/version.c
+++ b/init/version.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 
 static int __init early_hostname(char *arg)
 {
@@ -51,4 +52,6 @@ const char linux_banner[] __weak;
 
 #include "version-timestamp.c"
 
+MEMINSPECT_SIMPLE_ENTRY(init_uts_ns);
+
 EXPORT_SYMBOL_GPL(init_uts_ns);
-- 
2.43.0

Annotate vital static information into meminspect:
- __per_cpu_offset

Information on these variables is stored in the dedicated inspection
section.

Signed-off-by: Eugen Hristev
---
 mm/percpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/percpu.c b/mm/percpu.c
index 81462ce5866e..cdc5b30f6a99 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -87,6 +87,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -3346,6 +3347,7 @@ void __init setup_per_cpu_areas(void)
 
 #endif	/* CONFIG_SMP */
 
+MEMINSPECT_SIMPLE_ENTRY(__per_cpu_offset);
 /*
  * pcpu_nr_pages - calculate total number of populated backing pages
  *
-- 
2.43.0

Annotate vital static information into the inspection table:
- __cpu_present_mask
- __cpu_online_mask
- __cpu_possible_mask
- __cpu_active_mask

Information on these variables is stored in the dedicated inspection
section.
Signed-off-by: Eugen Hristev --- kernel/cpu.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/cpu.c b/kernel/cpu.c index db9f6c539b28..1f2df5a5b9ab 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #define CREATE_TRACE_POINTS @@ -3089,18 +3090,22 @@ struct cpumask __cpu_possible_mask __ro_after_init struct cpumask __cpu_possible_mask __ro_after_init; #endif EXPORT_SYMBOL(__cpu_possible_mask); +MEMINSPECT_SIMPLE_ENTRY(__cpu_possible_mask); struct cpumask __cpu_online_mask __read_mostly; EXPORT_SYMBOL(__cpu_online_mask); +MEMINSPECT_SIMPLE_ENTRY(__cpu_online_mask); struct cpumask __cpu_enabled_mask __read_mostly; EXPORT_SYMBOL(__cpu_enabled_mask); struct cpumask __cpu_present_mask __read_mostly; EXPORT_SYMBOL(__cpu_present_mask); +MEMINSPECT_SIMPLE_ENTRY(__cpu_present_mask); struct cpumask __cpu_active_mask __read_mostly; EXPORT_SYMBOL(__cpu_active_mask); +MEMINSPECT_SIMPLE_ENTRY(__cpu_active_mask); struct cpumask __cpu_dying_mask __read_mostly; EXPORT_SYMBOL(__cpu_dying_mask); -- 2.43.0 Annotate vital static information into inspection table: - nr_irqs Information on these variables is stored into dedicated inspection section. Signed-off-by: Eugen Hristev --- kernel/irq/irqdesc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c index db714d3014b5..89538324a95a 100644 --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -16,6 +16,7 @@ #include #include #include +#include #include "internals.h" @@ -140,6 +141,7 @@ static void desc_set_defaults(unsigned int irq, struct irq_desc *desc, int node, } static unsigned int nr_irqs = NR_IRQS; +MEMINSPECT_SIMPLE_ENTRY(nr_irqs); /** * irq_get_nr_irqs() - Number of interrupts supported by the system. -- 2.43.0 Annotate vital static information into inspection table: - jiffies_64 Information on these variables is stored into dedicated inspection section. 
Signed-off-by: Eugen Hristev --- kernel/time/timer.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 553fa469d7cc..c6adea734b93 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -44,6 +44,7 @@ #include #include #include +#include #include #include @@ -60,6 +61,7 @@ __visible u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES; EXPORT_SYMBOL(jiffies_64); +MEMINSPECT_SIMPLE_ENTRY(jiffies_64); /* * The timer wheel has LVL_DEPTH array levels. Each level provides an array of -- 2.43.0 Annotate vital static information into inspection table: - nr_threads Information on these variables is stored into dedicated inspection section. Signed-off-by: Eugen Hristev --- kernel/fork.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/fork.c b/kernel/fork.c index 3da0f08615a9..c85948804aa7 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -106,6 +106,7 @@ #include #include #include +#include #include #include @@ -138,6 +139,7 @@ */ unsigned long total_forks; /* Handle normal Linux uptimes. */ int nr_threads; /* The idle threads do not count.. */ +MEMINSPECT_SIMPLE_ENTRY(nr_threads); static int max_threads; /* tunable limit on nr_threads */ -- 2.43.0 Annotate vital static information into inspection table: - node_states Information on these variables is stored into dedicated inspection section. 
Signed-off-by: Eugen Hristev --- mm/page_alloc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 600d9e981c23..323521489907 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -55,6 +55,7 @@ #include #include #include +#include #include #include "internal.h" #include "shuffle.h" @@ -207,6 +208,7 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = { #endif /* NUMA */ }; EXPORT_SYMBOL(node_states); +MEMINSPECT_SIMPLE_ENTRY(node_states); gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK; -- 2.43.0 Annotate vital static information into inspection table: - _totalram_pages Information on these variables is stored into dedicated inspection section. Signed-off-by: Eugen Hristev --- mm/show_mem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/show_mem.c b/mm/show_mem.c index 3a4b5207635d..be9cded4acdb 100644 --- a/mm/show_mem.c +++ b/mm/show_mem.c @@ -14,12 +14,14 @@ #include #include #include +#include #include "internal.h" #include "swap.h" atomic_long_t _totalram_pages __read_mostly; EXPORT_SYMBOL(_totalram_pages); +MEMINSPECT_SIMPLE_ENTRY(_totalram_pages); unsigned long totalreserve_pages __read_mostly; unsigned long totalcma_pages __read_mostly; -- 2.43.0 Annotate vital static information into inspection table: - nr_swapfiles Information on these variables is stored into dedicated inspection section. 
Signed-off-by: Eugen Hristev --- mm/swapfile.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/swapfile.c b/mm/swapfile.c index 10760240a3a2..ee677be19041 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -42,6 +42,7 @@ #include #include #include +#include #include #include @@ -65,6 +66,7 @@ static void move_cluster(struct swap_info_struct *si, static DEFINE_SPINLOCK(swap_lock); static unsigned int nr_swapfiles; +MEMINSPECT_SIMPLE_ENTRY(nr_swapfiles); atomic_long_t nr_swap_pages; /* * Some modules use swappable objects and may try to swap them out under -- 2.43.0 Register vmcoreinfo information into the inspection table. Because the size of the info is known only after all entries have been added, there is no point in registering the whole page; instead, call the inspection registration once everything is in place, with the right size. A second reason is that vmcoreinfo is added as a region inside the ELF core image note, so there is no point in having blank space at the end. Signed-off-by: Eugen Hristev --- kernel/vmcore_info.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c index e066d31d08f8..6a9658d6ec9a 100644 --- a/kernel/vmcore_info.c +++ b/kernel/vmcore_info.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -227,6 +228,9 @@ static int __init crash_save_vmcoreinfo_init(void) arch_crash_save_vmcoreinfo(); update_vmcoreinfo_note(); + meminspect_register_id_va(MEMINSPECT_ID_VMCOREINFO, + (void *)vmcoreinfo_data, vmcoreinfo_size); + return 0; } -- 2.43.0 Register kernel_config_data information into the inspection table. Debugging tools look for the start and end markers, so capture those in the region as well.
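The configs patch registers 8 bytes before kernel_config_data and 8 bytes after kernel_config_data_end, so that the "IKCFG_ST" and "IKCFG_ED" markers end up inside the region. The userspace sketch below models that window. It assumes, as the +8/+16 arithmetic in the patch implies, that each 8-character marker sits immediately next to the data. The names sketch_region, widen_for_markers and region_has_markers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MARKER_LEN 8	/* strlen("IKCFG_ST") == strlen("IKCFG_ED") == 8 */

struct sketch_region {
	const char *start;
	size_t len;
};

/*
 * Widen [data, data_end) by MARKER_LEN bytes on each side so that the
 * leading and trailing markers are captured together with the payload.
 */
static struct sketch_region widen_for_markers(const char *data,
					      const char *data_end)
{
	struct sketch_region r;

	assert(data_end >= data);
	r.start = data - MARKER_LEN;
	r.len = (size_t)(data_end - data) + 2 * MARKER_LEN;
	return r;
}

/* Return 1 if the region starts with "IKCFG_ST" and ends with "IKCFG_ED". */
static int region_has_markers(struct sketch_region r)
{
	return r.len >= 2 * MARKER_LEN &&
	       memcmp(r.start, "IKCFG_ST", MARKER_LEN) == 0 &&
	       memcmp(r.start + r.len - MARKER_LEN, "IKCFG_ED", MARKER_LEN) == 0;
}
```

With a blob laid out as "IKCFG_ST", a 7-byte payload and "IKCFG_ED", the widened region starts at the blob and spans 7 + 16 = 23 bytes, which is exactly the range a tool scanning for both markers needs to find in the dump.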
Signed-off-by: Eugen Hristev --- kernel/configs.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/configs.c b/kernel/configs.c index a28c79c5f713..139ecc74bcee 100644 --- a/kernel/configs.c +++ b/kernel/configs.c @@ -15,6 +15,7 @@ #include #include #include +#include /* * "IKCFG_ST" and "IKCFG_ED" are used to extract the config data from @@ -64,6 +65,11 @@ static int __init ikconfig_init(void) proc_set_size(entry, &kernel_config_data_end - &kernel_config_data); + /* Register 8 bytes before and after, to catch the marker too */ + meminspect_lock_register_id_va(MEMINSPECT_ID_CONFIG, + (void *)&kernel_config_data - 8, + &kernel_config_data_end - &kernel_config_data + 16); + return 0; } -- 2.43.0 Annotate vital static information into inspection table: - init_mm - swapper_pg_dir - _sinittext - _einittext - _end - _text - _stext - _etext Information on these variables is stored into dedicated inspection section. Signed-off-by: Eugen Hristev --- mm/init-mm.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/mm/init-mm.c b/mm/init-mm.c index 4600e7605cab..6931549bb7a2 100644 --- a/mm/init-mm.c +++ b/mm/init-mm.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include @@ -19,6 +20,13 @@ const struct vm_operations_struct vma_dummy_vm_ops; +MEMINSPECT_AREA_ENTRY(_sinittext, sizeof(void *)); +MEMINSPECT_AREA_ENTRY(_einittext, sizeof(void *)); +MEMINSPECT_AREA_ENTRY(_end, sizeof(void *)); +MEMINSPECT_AREA_ENTRY(_text, sizeof(void *)); +MEMINSPECT_AREA_ENTRY(_stext, sizeof(void *)); +MEMINSPECT_AREA_ENTRY(_etext, sizeof(void *)); + /* * For dynamically allocated mm_structs, there is a dynamically sized cpumask * at the end of the structure, the size of which depends on the maximum CPU @@ -48,6 +56,9 @@ struct mm_struct init_mm = { INIT_MM_CONTEXT(init_mm) }; +MEMINSPECT_SIMPLE_ENTRY(init_mm); +MEMINSPECT_AREA_ENTRY(swapper_pg_dir, sizeof(void *)); + void setup_initial_init_mm(void *start_code, void *end_code, void *end_data, void *brk) 
{ -- 2.43.0 Annotate vital static information into inspection table: - tainted_mask - taint_flags Information on these variables is stored into dedicated inspection section. Signed-off-by: Eugen Hristev --- kernel/panic.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/panic.c b/kernel/panic.c index 24cc3eec1805..e99539e18054 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include @@ -56,6 +57,7 @@ static unsigned int __read_mostly sysctl_oops_all_cpu_backtrace; int panic_on_oops = IS_ENABLED(CONFIG_PANIC_ON_OOPS); static unsigned long tainted_mask = IS_ENABLED(CONFIG_RANDSTRUCT) ? (1 << TAINT_RANDSTRUCT) : 0; +MEMINSPECT_SIMPLE_ENTRY(tainted_mask); static int pause_on_oops; static int pause_on_oops_flag; static DEFINE_SPINLOCK(pause_on_oops_lock); @@ -662,6 +664,8 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = { TAINT_FLAG(FWCTL, 'J', ' ', true), }; +MEMINSPECT_SIMPLE_ENTRY(taint_flags); + #undef TAINT_FLAG static void print_tainted_seq(struct seq_buf *s, bool verbose) -- 2.43.0 Annotate vital static information into meminspect: - kallsyms_num_syms - kallsyms_relative_base - kallsyms_offsets - kallsyms_names - kallsyms_token_table - kallsyms_token_index - kallsyms_markers - kallsyms_seqs_of_names Information on these variables is stored into the inspection table.
Signed-off-by: Eugen Hristev --- kernel/kallsyms.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index 1e7635864124..06a77a09088a 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -31,9 +31,19 @@ #include #include #include +#include #include "kallsyms_internal.h" +MEMINSPECT_SIMPLE_ENTRY(kallsyms_num_syms); +MEMINSPECT_SIMPLE_ENTRY(kallsyms_relative_base); +MEMINSPECT_AREA_ENTRY(kallsyms_offsets, sizeof(void *)); +MEMINSPECT_AREA_ENTRY(kallsyms_names, sizeof(void *)); +MEMINSPECT_AREA_ENTRY(kallsyms_token_table, sizeof(void *)); +MEMINSPECT_AREA_ENTRY(kallsyms_token_index, sizeof(void *)); +MEMINSPECT_AREA_ENTRY(kallsyms_markers, sizeof(void *)); +MEMINSPECT_AREA_ENTRY(kallsyms_seqs_of_names, sizeof(void *)); + /* * Expand a compressed symbol data into the resulting uncompressed string, * if uncompressed string is too long (>= maxlen), it will be truncated, -- 2.43.0 Annotate vital static information into inspection table: - high_memory Information on these variables is stored into dedicated inspection section. Signed-off-by: Eugen Hristev --- mm/mm_init.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/mm_init.c b/mm/mm_init.c index 3db2dea7db4c..c31062a3ff47 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -32,6 +32,7 @@ #include #include #include +#include #include "internal.h" #include "slab.h" #include "shuffle.h" @@ -52,6 +53,7 @@ EXPORT_SYMBOL(mem_map); */ void *high_memory; EXPORT_SYMBOL(high_memory); +MEMINSPECT_SIMPLE_ENTRY(high_memory); #ifdef CONFIG_DEBUG_MEMORY_INIT int __meminitdata mminit_loglevel; -- 2.43.0 Annotate runqueues into meminspect. Even if these are static, they are defined percpu, and a later init call will instantiate them for each cpu. Hence, we cannot annotate them in the usual way, but rather have to call meminspect API at init time. 
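A compile-time entry records exactly one address, but a per-CPU variable has one instance per CPU, and each instance's address is only meaningful after the per-CPU areas have been set up. That is why the scheduler patch calls meminspect_lock_register_va(rq, sizeof(*rq)) inside sched_init()'s per-CPU loop instead of using a static annotation. The userspace sketch below models that runtime registration. A calloc'ed array stands in for the per-CPU copies, and sketch_rq and sketch_register_va are hypothetical names, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rq; only its size and address matter. */
struct sketch_rq {
	unsigned long nr_running;
	unsigned long clock;
};

struct sketch_region {
	const void *va;
	size_t size;
};

#define SKETCH_MAX_REGIONS 64

static struct sketch_region sketch_regions[SKETCH_MAX_REGIONS];
static size_t sketch_nr_regions;

/* Hypothetical runtime registration, standing in for the meminspect call. */
static int sketch_register_va(const void *va, size_t size)
{
	assert(va != NULL);
	if (sketch_nr_regions >= SKETCH_MAX_REGIONS)
		return -1;
	sketch_regions[sketch_nr_regions].va = va;
	sketch_regions[sketch_nr_regions].size = size;
	sketch_nr_regions++;
	return 0;
}

/*
 * Create one runqueue per CPU, then register each instance separately,
 * because every CPU's copy lives at its own address.
 */
static struct sketch_rq *sketch_init_runqueues(size_t nr_cpus)
{
	struct sketch_rq *rqs = calloc(nr_cpus, sizeof(*rqs));

	if (!rqs)
		return NULL;
	for (size_t cpu = 0; cpu < nr_cpus; cpu++)
		sketch_register_va(&rqs[cpu], sizeof(rqs[cpu]));
	return rqs;
}
```

The result is one table entry per CPU, each with its own address, which a single compile-time entry could not express.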
Signed-off-by: Eugen Hristev --- kernel/sched/core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f1ebf67b48e2..a68367daddb4 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -69,6 +69,7 @@ #include #include #include +#include #ifdef CONFIG_PREEMPT_DYNAMIC # ifdef CONFIG_GENERIC_IRQ_ENTRY @@ -8792,6 +8793,7 @@ void __init sched_init(void) rq->core_cookie = 0UL; #endif zalloc_cpumask_var_node(&rq->scratch_mask, GFP_KERNEL, cpu_to_node(i)); + meminspect_lock_register_va(rq, sizeof(*rq)); } set_load_weight(&init_task, false); -- 2.43.0 This memblock flag indicates that a specific block is registered into an inspection table. The block can be marked for inspection using memblock_mark_inspect() and cleared with memblock_clear_inspect() Signed-off-by: Eugen Hristev --- include/linux/memblock.h | 7 +++++++ mm/memblock.c | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 43 insertions(+) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 221118b5a16e..c3e55a4475cf 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -51,6 +51,10 @@ extern unsigned long long max_possible_pfn; * memory reservations yet, so we get scratch memory from the previous * kernel that we know is good to use. It is the only memory that * allocations may happen from in this phase. + * @MEMBLOCK_INSPECT: memory region is annotated in kernel memory inspection + * table. This means a dedicated entry will be created for this region which + * will contain the memory's address and size. This allows kernel inspectors + * to retrieve the memory. 
*/ enum memblock_flags { MEMBLOCK_NONE = 0x0, /* No special request */ @@ -61,6 +65,7 @@ enum memblock_flags { MEMBLOCK_RSRV_NOINIT = 0x10, /* don't initialize struct pages */ MEMBLOCK_RSRV_KERN = 0x20, /* memory reserved for kernel use */ MEMBLOCK_KHO_SCRATCH = 0x40, /* scratch memory for kexec handover */ + MEMBLOCK_INSPECT = 0x80, /* memory selected for kernel inspection */ }; /** @@ -149,6 +154,8 @@ unsigned long memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1, bool memblock_overlaps_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size); bool memblock_validate_numa_coverage(unsigned long threshold_bytes); +int memblock_mark_inspect(phys_addr_t base, phys_addr_t size); +int memblock_clear_inspect(phys_addr_t base, phys_addr_t size); int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size); int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size); int memblock_mark_mirror(phys_addr_t base, phys_addr_t size); diff --git a/mm/memblock.c b/mm/memblock.c index e23e16618e9b..a5df5ab286e5 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -17,6 +17,7 @@ #include #include #include +#include #ifdef CONFIG_KEXEC_HANDOVER #include @@ -1016,6 +1017,40 @@ static int __init_memblock memblock_setclr_flag(struct memblock_type *type, return 0; } +/** + * memblock_mark_inspect - Mark inspectable memory with flag MEMBLOCK_INSPECT. + * @base: the base phys addr of the region + * @size: the size of the region + * + * Return: 0 on success, -errno on failure. + */ +int __init_memblock memblock_mark_inspect(phys_addr_t base, phys_addr_t size) +{ + int ret; + + ret = memblock_setclr_flag(&memblock.memory, base, size, 1, MEMBLOCK_INSPECT); + if (ret) + return ret; + + meminspect_lock_register_pa(base, size); + + return 0; +} + +/** + * memblock_clear_inspect - Clear flag MEMBLOCK_INSPECT for a specified region. + * @base: the base phys addr of the region + * @size: the size of the region + * + * Return: 0 on success, -errno on failure. 
+ */ +int __init_memblock memblock_clear_inspect(phys_addr_t base, phys_addr_t size) +{ + meminspect_lock_unregister_pa(base, size); + + return memblock_setclr_flag(&memblock.memory, base, size, 0, MEMBLOCK_INSPECT); +} + /** * memblock_mark_hotplug - Mark hotpluggable memory with flag MEMBLOCK_HOTPLUG. * @base: the base phys addr of the region @@ -2704,6 +2739,7 @@ static const char * const flagname[] = { [ilog2(MEMBLOCK_RSRV_NOINIT)] = "RSV_NIT", [ilog2(MEMBLOCK_RSRV_KERN)] = "RSV_KERN", [ilog2(MEMBLOCK_KHO_SCRATCH)] = "KHO_SCRATCH", + [ilog2(MEMBLOCK_INSPECT)] = "INSPECT", }; static int memblock_debug_show(struct seq_file *m, void *private) -- 2.43.0 Register dynamic information into meminspect: - dynamic node data for each node This information is being allocated for each node, as physical address, so call memblock_mark_inspect that will mark the block accordingly. Signed-off-by: Eugen Hristev --- mm/numa.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/numa.c b/mm/numa.c index 7d5e06fe5bd4..379065dd633e 100644 --- a/mm/numa.c +++ b/mm/numa.c @@ -4,6 +4,7 @@ #include #include #include +#include struct pglist_data *node_data[MAX_NUMNODES]; EXPORT_SYMBOL(node_data); @@ -20,6 +21,7 @@ void __init alloc_node_data(int nid) if (!nd_pa) panic("Cannot allocate %zu bytes for node %d data\n", nd_size, nid); + memblock_mark_inspect(nd_pa, nd_size); /* report and initialize */ pr_info("NODE_DATA(%d) allocated [mem %#010Lx-%#010Lx]\n", nid, -- 2.43.0 Annotate vital static information into meminspect: - mem_section Information on these variables is stored into inspection table. Register dynamic information into meminspect: - section - mem_section_usage This information is being allocated for each node, so call memblock_mark_inspect to mark the block accordingly. 
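MEMBLOCK_INSPECT occupies bit 7 (0x80), and the memblock patch names it in flagname[] at index ilog2(MEMBLOCK_INSPECT). The userspace sketch below shows how such a bit-indexed name table turns a flags word into readable names. It covers only the flags visible in this series and writes the ilog2() results as literal indices. The decoding loop is an illustration, not a copy of memblock_debug_show().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_RSRV_NOINIT	0x10
#define SK_RSRV_KERN	0x20
#define SK_KHO_SCRATCH	0x40
#define SK_INSPECT	0x80	/* the new MEMBLOCK_INSPECT value */

/* Name table indexed by bit number, i.e. by ilog2() of each flag. */
static const char *const sketch_flagname[] = {
	[4] = "RSV_NIT",	/* ilog2(0x10) */
	[5] = "RSV_KERN",	/* ilog2(0x20) */
	[6] = "KHO_SCRATCH",	/* ilog2(0x40) */
	[7] = "INSPECT",	/* ilog2(0x80) */
};

/*
 * Write the names of all set flags into buf, lowest bit first, separated
 * by single spaces. Bits without a name in the table are skipped.
 */
static void sketch_decode_flags(unsigned int flags, char *buf, size_t len)
{
	size_t nnames = sizeof(sketch_flagname) / sizeof(sketch_flagname[0]);

	assert(len > 0);
	buf[0] = '\0';
	for (size_t bit = 0; bit < nnames; bit++) {
		if (!(flags & (1u << bit)) || !sketch_flagname[bit])
			continue;
		if (buf[0])
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, sketch_flagname[bit], len - strlen(buf) - 1);
	}
}
```

Indexing by ilog2() keeps the table dense and lets a new flag such as MEMBLOCK_INSPECT be added with a single initializer line, which is all the memblock patch does on the debugfs side.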
Signed-off-by: Eugen Hristev --- mm/sparse.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/mm/sparse.c b/mm/sparse.c index 17c50a6415c2..80530e39c8b2 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "internal.h" #include @@ -30,6 +31,7 @@ struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT] ____cacheline_internodealigned_in_smp; #endif EXPORT_SYMBOL(mem_section); +MEMINSPECT_SIMPLE_ENTRY(mem_section); #ifdef NODE_NOT_IN_PAGE_FLAGS /* @@ -253,6 +255,7 @@ static void __init memblocks_present(void) size = sizeof(struct mem_section *) * NR_SECTION_ROOTS; align = 1 << (INTERNODE_CACHE_SHIFT); mem_section = memblock_alloc_or_panic(size, align); + memblock_mark_inspect(virt_to_phys(mem_section), size); } #endif @@ -343,6 +346,7 @@ sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat, limit = MEMBLOCK_ALLOC_ACCESSIBLE; goto again; } + memblock_mark_inspect(virt_to_phys(usage), size); return usage; } -- 2.43.0 Annotate vital static information into meminspect: - prb_descs - prb_infos - prb - prb_data - printk_rb_static - printk_rb_dynamic Information on these variables is stored into inspection table. Register dynamic information into meminspect: - new_descs - new_infos - new_log_buf This information is being allocated as a memblock, so call memblock_mark_inspect to mark the block accordingly. 
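In setup_log_buf(), each of the three memblock allocations is marked for inspection as soon as it succeeds, and the error labels undo that work for whatever had already been allocated. The userspace sketch below models that goto-based unwinding. sketch_alloc, sketch_mark and sketch_clear are hypothetical stand-ins for memblock_alloc(), memblock_mark_inspect() and memblock_clear_inspect(), and a failure can be injected at a chosen allocation. The sketch clears each mark before freeing the block; the patch itself calls memblock_clear_inspect() right after memblock_free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of currently marked blocks; hypothetical bookkeeping. */
static int sketch_marked;

static void sketch_mark(void *p)  { assert(p != NULL); sketch_marked++; }
static void sketch_clear(void *p) { assert(p != NULL); sketch_marked--; }

/* Fail the Nth allocation (1-based) to exercise the error path; 0 = never. */
static int sketch_fail_at;
static int sketch_nr_allocs;

static void *sketch_alloc(size_t size)
{
	if (++sketch_nr_allocs == sketch_fail_at)
		return NULL;
	return malloc(size);
}

/*
 * Allocate the log buffer, descriptors and infos, marking each block once
 * it exists. On failure, unwind in reverse order so every marked block is
 * cleared and freed exactly once.
 */
static int sketch_setup_log_buf(void **log, void **descs, void **infos)
{
	*log = sketch_alloc(4096);
	if (!*log)
		goto out;
	sketch_mark(*log);

	*descs = sketch_alloc(256);
	if (!*descs)
		goto err_free_log_buf;
	sketch_mark(*descs);

	*infos = sketch_alloc(512);
	if (!*infos)
		goto err_free_descs;
	sketch_mark(*infos);
	return 0;

err_free_descs:
	sketch_clear(*descs);
	free(*descs);
err_free_log_buf:
	sketch_clear(*log);
	free(*log);
out:
	return -1;
}
```

Failing the third allocation leaves no block marked, while a fully successful run leaves all three marked, matching the invariant the real error path is meant to keep.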
Signed-off-by: Eugen Hristev --- kernel/printk/printk.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 5aee9ffb16b9..8b5aba2527ac 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -49,6 +49,7 @@ #include #include #include +#include #include #include @@ -513,10 +514,16 @@ static u32 log_buf_len = __LOG_BUF_LEN; #endif _DEFINE_PRINTKRB(printk_rb_static, CONFIG_LOG_BUF_SHIFT - PRB_AVGBITS, PRB_AVGBITS, &__log_buf[0]); +MEMINSPECT_NAMED_ENTRY(prb_descs, _printk_rb_static_descs); +MEMINSPECT_NAMED_ENTRY(prb_infos, _printk_rb_static_infos); +MEMINSPECT_NAMED_ENTRY(prb_data, __log_buf); +MEMINSPECT_SIMPLE_ENTRY(printk_rb_static); static struct printk_ringbuffer printk_rb_dynamic; +MEMINSPECT_SIMPLE_ENTRY(printk_rb_dynamic); struct printk_ringbuffer *prb = &printk_rb_static; +MEMINSPECT_SIMPLE_ENTRY(prb); /* * We cannot access per-CPU data (e.g. per-CPU flush irq_work) before @@ -1190,6 +1197,7 @@ void __init setup_log_buf(int early) new_log_buf_len); goto out; } + memblock_mark_inspect(virt_to_phys(new_log_buf), new_log_buf_len); new_descs_size = new_descs_count * sizeof(struct prb_desc); new_descs = memblock_alloc(new_descs_size, LOG_ALIGN); @@ -1198,6 +1206,7 @@ void __init setup_log_buf(int early) new_descs_size); goto err_free_log_buf; } + memblock_mark_inspect(virt_to_phys(new_descs), new_descs_size); new_infos_size = new_descs_count * sizeof(struct printk_info); new_infos = memblock_alloc(new_infos_size, LOG_ALIGN); @@ -1206,6 +1215,7 @@ void __init setup_log_buf(int early) new_infos_size); goto err_free_descs; } + memblock_mark_inspect(virt_to_phys(new_infos), new_infos_size); prb_rec_init_rd(&r, &info, &setup_text_buf[0], sizeof(setup_text_buf)); @@ -1258,8 +1268,10 @@ void __init setup_log_buf(int early) err_free_descs: memblock_free(new_descs, new_descs_size); + memblock_clear_inspect(virt_to_phys(new_descs), new_descs_size); err_free_log_buf: memblock_free(new_log_buf, 
new_log_buf_len); + memblock_clear_inspect(virt_to_phys(new_log_buf), new_log_buf_len); out: print_log_buf_usage_stats(); } -- 2.43.0 Extract the minidump definitions into a header such that the definitions can be reused by other drivers. No other change, purely moving the definitions. Signed-off-by: Eugen Hristev --- drivers/remoteproc/qcom_common.c | 56 +------------------------ include/linux/soc/qcom/minidump.h | 68 +++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+), 55 deletions(-) create mode 100644 include/linux/soc/qcom/minidump.h diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c index 8c8688f99f0a..4f1c8d005c97 100644 --- a/drivers/remoteproc/qcom_common.c +++ b/drivers/remoteproc/qcom_common.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include "remoteproc_internal.h" @@ -28,61 +29,6 @@ #define to_ssr_subdev(d) container_of(d, struct qcom_rproc_ssr, subdev) #define to_pdm_subdev(d) container_of(d, struct qcom_rproc_pdm, subdev) -#define MAX_NUM_OF_SS 10 -#define MAX_REGION_NAME_LENGTH 16 -#define SBL_MINIDUMP_SMEM_ID 602 -#define MINIDUMP_REGION_VALID ('V' << 24 | 'A' << 16 | 'L' << 8 | 'I' << 0) -#define MINIDUMP_SS_ENCR_DONE ('D' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0) -#define MINIDUMP_SS_ENABLED ('E' << 24 | 'N' << 16 | 'B' << 8 | 'L' << 0) - -/** - * struct minidump_region - Minidump region - * @name : Name of the region to be dumped - * @seq_num: : Use to differentiate regions with same name. 
- * @valid : This entry to be dumped (if set to 1) - * @address : Physical address of region to be dumped - * @size : Size of the region - */ -struct minidump_region { - char name[MAX_REGION_NAME_LENGTH]; - __le32 seq_num; - __le32 valid; - __le64 address; - __le64 size; -}; - -/** - * struct minidump_subsystem - Subsystem's SMEM Table of content - * @status : Subsystem toc init status - * @enabled : if set to 1, this region would be copied during coredump - * @encryption_status: Encryption status for this subsystem - * @encryption_required : Decides to encrypt the subsystem regions or not - * @region_count : Number of regions added in this subsystem toc - * @regions_baseptr : regions base pointer of the subsystem - */ -struct minidump_subsystem { - __le32 status; - __le32 enabled; - __le32 encryption_status; - __le32 encryption_required; - __le32 region_count; - __le64 regions_baseptr; -}; - -/** - * struct minidump_global_toc - Global Table of Content - * @status : Global Minidump init status - * @md_revision : Minidump revision - * @enabled : Minidump enable status - * @subsystems : Array of subsystems toc - */ -struct minidump_global_toc { - __le32 status; - __le32 md_revision; - __le32 enabled; - struct minidump_subsystem subsystems[MAX_NUM_OF_SS]; -}; - struct qcom_ssr_subsystem { const char *name; struct srcu_notifier_head notifier_list; diff --git a/include/linux/soc/qcom/minidump.h b/include/linux/soc/qcom/minidump.h new file mode 100644 index 000000000000..25247a6216e2 --- /dev/null +++ b/include/linux/soc/qcom/minidump.h @@ -0,0 +1,68 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Qualcomm Minidump definitions + * + * Copyright (C) 2016 Linaro Ltd + * Copyright (C) 2015 Sony Mobile Communications Inc + * Copyright (c) 2012-2013, The Linux Foundation. All rights reserved. 
+ */ + +#ifndef __QCOM_MINIDUMP_H__ +#define __QCOM_MINIDUMP_H__ + +#define MAX_NUM_OF_SS 10 +#define MAX_REGION_NAME_LENGTH 16 +#define SBL_MINIDUMP_SMEM_ID 602 +#define MINIDUMP_REGION_VALID ('V' << 24 | 'A' << 16 | 'L' << 8 | 'I' << 0) +#define MINIDUMP_SS_ENCR_DONE ('D' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0) +#define MINIDUMP_SS_ENABLED ('E' << 24 | 'N' << 16 | 'B' << 8 | 'L' << 0) + +/** + * struct minidump_region - Minidump region + * @name : Name of the region to be dumped + * @seq_num: : Use to differentiate regions with same name. + * @valid : This entry to be dumped (if set to 1) + * @address : Physical address of region to be dumped + * @size : Size of the region + */ +struct minidump_region { + char name[MAX_REGION_NAME_LENGTH]; + __le32 seq_num; + __le32 valid; + __le64 address; + __le64 size; +}; + +/** + * struct minidump_subsystem - Subsystem's SMEM Table of content + * @status : Subsystem toc init status + * @enabled : if set to 1, this region would be copied during coredump + * @encryption_status: Encryption status for this subsystem + * @encryption_required : Decides to encrypt the subsystem regions or not + * @region_count : Number of regions added in this subsystem toc + * @regions_baseptr : regions base pointer of the subsystem + */ +struct minidump_subsystem { + __le32 status; + __le32 enabled; + __le32 encryption_status; + __le32 encryption_required; + __le32 region_count; + __le64 regions_baseptr; +}; + +/** + * struct minidump_global_toc - Global Table of Content + * @status : Global Minidump init status + * @md_revision : Minidump revision + * @enabled : Minidump enable status + * @subsystems : Array of subsystems toc + */ +struct minidump_global_toc { + __le32 status; + __le32 md_revision; + __le32 enabled; + struct minidump_subsystem subsystems[MAX_NUM_OF_SS]; +}; + +#endif -- 2.43.0 Qualcomm Minidump is a driver that manages the minidump shared memory table on Qualcomm platforms. 
It parses the meminspect table to obtain the inspection entries from the kernel and converts them into regions. The regions are then registered into the shared memory and into the table of contents. The firmware can later read the table of contents and dump the memory accordingly, as per its own requirements. Signed-off-by: Eugen Hristev --- drivers/soc/qcom/Kconfig | 13 ++ drivers/soc/qcom/Makefile | 1 + drivers/soc/qcom/minidump.c | 272 ++++++++++++++++++++++++++++++ include/linux/soc/qcom/minidump.h | 4 + 4 files changed, 290 insertions(+) create mode 100644 drivers/soc/qcom/minidump.c diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig index 2caadbbcf830..be768537528e 100644 --- a/drivers/soc/qcom/Kconfig +++ b/drivers/soc/qcom/Kconfig @@ -180,6 +180,19 @@ config QCOM_SMEM The driver provides an interface to items in a heap shared among all processors in a Qualcomm platform. +config QCOM_MINIDUMP + tristate "Qualcomm Minidump memory inspection driver" + depends on ARCH_QCOM || COMPILE_TEST + depends on QCOM_SMEM + help + Say y here to enable the Qualcomm Minidump memory inspection driver. + This driver uses the memory inspection mechanism to register minidump + regions with the Qualcomm firmware in shared memory. + The registered regions are linked into the minidump table + of contents. + The firmware can then read the table of contents + and extract the memory regions on a case-by-case basis.
+ config QCOM_SMD_RPM tristate "Qualcomm Resource Power Manager (RPM) over SMD" depends on ARCH_QCOM || COMPILE_TEST diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile index b7f1d2a57367..3e5a2cacccd4 100644 --- a/drivers/soc/qcom/Makefile +++ b/drivers/soc/qcom/Makefile @@ -25,6 +25,7 @@ qcom_rpmh-y += rpmh.o obj-$(CONFIG_QCOM_SMD_RPM) += rpm-proc.o smd-rpm.o obj-$(CONFIG_QCOM_SMEM) += smem.o obj-$(CONFIG_QCOM_SMEM_STATE) += smem_state.o +obj-$(CONFIG_QCOM_MINIDUMP) += minidump.o CFLAGS_smp2p.o := -I$(src) obj-$(CONFIG_QCOM_SMP2P) += smp2p.o obj-$(CONFIG_QCOM_SMSM) += smsm.o diff --git a/drivers/soc/qcom/minidump.c b/drivers/soc/qcom/minidump.c new file mode 100644 index 000000000000..67ebbf09c171 --- /dev/null +++ b/drivers/soc/qcom/minidump.c @@ -0,0 +1,272 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Qualcomm Minidump kernel inspect driver + * Copyright (C) 2016,2024-2025 Linaro Ltd + * Copyright (C) 2015 Sony Mobile Communications Inc + * Copyright (c) 2012-2013, The Linux Foundation. All rights reserved. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/** + * struct minidump - Minidump driver data information + * + * @dev: Minidump device struct. + * @toc: Minidump table of contents subsystem. + * @regions: Minidump regions array. + * @nb: Notifier block to register to meminspect. 
+ */ +struct minidump { + struct device *dev; + struct minidump_subsystem *toc; + struct minidump_region *regions; + struct notifier_block nb; +}; + +static const char * const meminspect_id_to_md_string[] = { + "", + "ELF", + "vmcoreinfo", + "config", + "totalram", + "cpu_possible", + "cpu_present", + "cpu_online", + "cpu_active", + "mem_section", + "jiffies", + "linux_banner", + "nr_threads", + "nr_irqs", + "tainted_mask", + "taint_flags", + "node_states", + "__per_cpu_offset", + "nr_swapfiles", + "init_uts_ns", + "printk_rb_static", + "printk_rb_dynamic", + "prb", + "prb_descs", + "prb_infos", + "prb_data", + "high_memory", + "init_mm", + "init_mm_pgd", +}; + +/** + * qcom_md_table_init() - Initialize the minidump table + * @md: minidump data + * @mdss_toc: minidump subsystem table of contents + * + * Return: 0 on success, negative error value on failure. + */ +static int qcom_md_table_init(struct minidump *md, + struct minidump_subsystem *mdss_toc) +{ + md->toc = mdss_toc; + md->regions = devm_kcalloc(md->dev, MAX_NUM_REGIONS, + sizeof(*md->regions), GFP_KERNEL); + if (!md->regions) + return -ENOMEM; + + md->toc->regions_baseptr = cpu_to_le64(virt_to_phys(md->regions)); + md->toc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED); + md->toc->status = cpu_to_le32(1); + md->toc->region_count = cpu_to_le32(0); + + /* Tell bootloader not to encrypt the regions of this subsystem */ + md->toc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE); + md->toc->encryption_required = cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ); + + return 0; +} + +/** + * qcom_md_get_region_index() - Look up minidump region by id + * @md: minidump data + * @id: minidump region id + * + * Return: the internal region index on success, or a negative error + * value on failure. + */ +static int qcom_md_get_region_index(struct minidump *md, int id) +{ + unsigned int count = le32_to_cpu(md->toc->region_count); + unsigned int i; + + for (i = 0; i < count; i++) + if
(le32_to_cpu(md->regions[i].seq_num) == id) + return i; + + return -ENOENT; +} + +/** + * register_md_region() - Register a new minidump region + * @priv: private data + * @e: pointer to inspect entry + * + * Return: None + */ +static void __maybe_unused register_md_region(void *priv, + const struct inspect_entry *e) +{ + unsigned int num_region, region_cnt; + const char *name = "unknown"; + struct minidump_region *mdr; + struct minidump *md = priv; + + if (!(e->va || e->pa) || !e->size) { + dev_dbg(md->dev, "invalid region requested\n"); + return; + } + + if (e->id < ARRAY_SIZE(meminspect_id_to_md_string)) + name = meminspect_id_to_md_string[e->id]; + + if (qcom_md_get_region_index(md, e->id) >= 0) { + dev_dbg(md->dev, "%s:%d region is already registered\n", + name, e->id); + return; + } + + /* Check if there is room for a new entry */ + num_region = le32_to_cpu(md->toc->region_count); + if (num_region >= MAX_NUM_REGIONS) { + dev_dbg(md->dev, "maximum region limit %u reached\n", + num_region); + return; + } + + region_cnt = le32_to_cpu(md->toc->region_count); + mdr = &md->regions[region_cnt]; + scnprintf(mdr->name, MAX_REGION_NAME_LENGTH, "K%.8s", name); + mdr->seq_num = cpu_to_le32(e->id); + if (e->pa) + mdr->address = cpu_to_le64(e->pa); + else if (e->va) + mdr->address = cpu_to_le64(__pa(e->va)); + mdr->size = cpu_to_le64(ALIGN(e->size, 4)); + mdr->valid = cpu_to_le32(MINIDUMP_REGION_VALID); + region_cnt++; + md->toc->region_count = cpu_to_le32(region_cnt); + + dev_dbg(md->dev, "%s:%d region registered %llx:%llx\n", + mdr->name, e->id, le64_to_cpu(mdr->address), le64_to_cpu(mdr->size)); +} + +/** + * unregister_md_region() - Unregister a previously registered minidump region + * @priv: private data + * @e: pointer to inspect entry + * + * Return: None + */ +static void __maybe_unused unregister_md_region(void *priv, + const struct inspect_entry *e) +{ + struct minidump_region *mdr; + struct minidump *md = priv; + unsigned int region_cnt; + int idx; + + idx = qcom_md_get_region_index(md, e->id);
+	if (idx < 0) {
+		dev_dbg(md->dev, "%d region is not present\n", e->id);
+		return;
+	}
+
+	mdr = &md->regions[0];
+	region_cnt = le32_to_cpu(md->toc->region_count);
+
+	/*
+	 * Left shift one position all the regions located after the
+	 * region being removed, in order to fill the gap.
+	 * Then, zero out the last region at the end.
+	 */
+	memmove(&mdr[idx], &mdr[idx + 1], (region_cnt - idx - 1) * sizeof(*mdr));
+	memset(&mdr[region_cnt - 1], 0, sizeof(*mdr));
+	region_cnt--;
+	md->toc->region_count = cpu_to_le32(region_cnt);
+}
+
+static int qcom_md_notifier_cb(struct notifier_block *nb,
+			       unsigned long code, void *entry)
+{
+	struct minidump *md = container_of(nb, struct minidump, nb);
+
+	if (code == MEMINSPECT_NOTIFIER_ADD)
+		register_md_region(md, entry);
+	else if (code == MEMINSPECT_NOTIFIER_REMOVE)
+		unregister_md_region(md, entry);
+
+	return 0;
+}
+
+static int qcom_md_probe(struct platform_device *pdev)
+{
+	struct minidump_global_toc *mdgtoc;
+	struct device *dev = &pdev->dev;
+	struct minidump *md;
+	size_t size;
+	int ret;
+
+	md = devm_kzalloc(dev, sizeof(*md), GFP_KERNEL);
+	if (!md)
+		return -ENOMEM;
+	platform_set_drvdata(pdev, md);
+
+	md->dev = dev;
+	md->nb.notifier_call = qcom_md_notifier_cb;
+
+	mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size);
+	if (IS_ERR(mdgtoc)) {
+		ret = PTR_ERR(mdgtoc);
+		return dev_err_probe(dev, ret, "Couldn't find minidump smem item\n");
+	}
+
+	if (size < sizeof(*mdgtoc) || !mdgtoc->status)
+		return dev_err_probe(dev, -EINVAL, "minidump table not ready\n");
+
+	ret = qcom_md_table_init(md, &mdgtoc->subsystems[MINIDUMP_SUBSYSTEM_APSS]);
+	if (ret)
+		return dev_err_probe(dev, ret, "Could not initialize table\n");
+
+	meminspect_notifier_register(&md->nb);
+
+	meminspect_lock_traverse(md, register_md_region);
+	return 0;
+}
+
+static void qcom_md_remove(struct platform_device *pdev)
+{
+	struct minidump *md = platform_get_drvdata(pdev);
+
+	meminspect_notifier_unregister(&md->nb);
+	meminspect_lock_traverse(md, unregister_md_region);
+}
+
+static struct platform_driver qcom_md_driver = {
+	.probe = qcom_md_probe,
+	.remove = qcom_md_remove,
+	.driver = {
+		.name = "qcom-minidump",
+	},
+};
+
+module_platform_driver(qcom_md_driver);
+
+MODULE_AUTHOR("Eugen Hristev ");
+MODULE_AUTHOR("Mukesh Ojha ");
+MODULE_DESCRIPTION("Qualcomm minidump inspect driver");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/soc/qcom/minidump.h b/include/linux/soc/qcom/minidump.h
index 25247a6216e2..f90b61feb550 100644
--- a/include/linux/soc/qcom/minidump.h
+++ b/include/linux/soc/qcom/minidump.h
@@ -10,12 +10,16 @@
 #ifndef __QCOM_MINIDUMP_H__
 #define __QCOM_MINIDUMP_H__
 
+#define MINIDUMP_SUBSYSTEM_APSS 0
 #define MAX_NUM_OF_SS 10
 #define MAX_REGION_NAME_LENGTH 16
 #define SBL_MINIDUMP_SMEM_ID 602
 #define MINIDUMP_REGION_VALID ('V' << 24 | 'A' << 16 | 'L' << 8 | 'I' << 0)
 #define MINIDUMP_SS_ENCR_DONE ('D' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0)
+#define MINIDUMP_SS_ENCR_NOTREQ (0 << 24 | 0 << 16 | 'N' << 8 | 'R' << 0)
 #define MINIDUMP_SS_ENABLED ('E' << 24 | 'N' << 16 | 'B' << 8 | 'L' << 0)
+#define MAX_NUM_REGIONS 201
+
 
 /**
  * struct minidump_region - Minidump region
--
2.43.0

Add a minidump platform device.

Minidump can collect various memory snippets using dedicated firmware.
For the firmware to know which snippets to collect, the kernel must
register each snippet into a specific shared memory table, which is
controlled by the qcom smem driver.
To instantiate the minidump platform driver, register the corresponding
platform device using platform_device_register_data().
Later on, the minidump driver probes and obtains the required memory
snippets from the memory inspection table (meminspect).

Signed-off-by: Eugen Hristev
---
 drivers/soc/qcom/smem.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/soc/qcom/smem.c b/drivers/soc/qcom/smem.c
index c4c45f15dca4..03315722d71a 100644
--- a/drivers/soc/qcom/smem.c
+++ b/drivers/soc/qcom/smem.c
@@ -270,6 +270,7 @@ struct smem_region {
  * @partitions: list of partitions of current processor/host
  * @item_count: max accepted item number
  * @socinfo: platform device pointer
+ * @mdinfo: minidump device pointer
  * @num_regions: number of @regions
  * @regions: list of the memory regions defining the shared memory
  */
@@ -280,6 +281,7 @@ struct qcom_smem {
 	u32 item_count;
 
 	struct platform_device *socinfo;
+	struct platform_device *mdinfo;
 	struct smem_ptable *ptable;
 	struct smem_partition global_partition;
 	struct smem_partition partitions[SMEM_HOST_COUNT];
@@ -1236,12 +1238,20 @@ static int qcom_smem_probe(struct platform_device *pdev)
 	if (IS_ERR(smem->socinfo))
 		dev_dbg(&pdev->dev, "failed to register socinfo device\n");
 
+	smem->mdinfo = platform_device_register_data(&pdev->dev, "qcom-minidump",
+						      PLATFORM_DEVID_AUTO, NULL,
+						      0);
+	if (IS_ERR(smem->mdinfo))
+		dev_err(&pdev->dev, "failed to register platform md device\n");
+
 	return 0;
 }
 
 static void qcom_smem_remove(struct platform_device *pdev)
 {
 	platform_device_unregister(__smem->socinfo);
+	if (!IS_ERR(__smem->mdinfo))
+		platform_device_unregister(__smem->mdinfo);
 	hwspin_lock_free(__smem->hwlock);
 
 	__smem = NULL;
--
2.43.0

Add devicetree binding documentation for the Google Pixel Kinfo reserved
memory area.
Signed-off-by: Eugen Hristev
---
 .../reserved-memory/google,kinfo.yaml | 49 +++++++++++++++++++
 MAINTAINERS                           |  5 ++
 2 files changed, 54 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/reserved-memory/google,kinfo.yaml

diff --git a/Documentation/devicetree/bindings/reserved-memory/google,kinfo.yaml b/Documentation/devicetree/bindings/reserved-memory/google,kinfo.yaml
new file mode 100644
index 000000000000..12d0b2815c02
--- /dev/null
+++ b/Documentation/devicetree/bindings/reserved-memory/google,kinfo.yaml
@@ -0,0 +1,49 @@
+# SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/reserved-memory/google,kinfo.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Google Pixel Kinfo reserved memory
+
+maintainers:
+  - Eugen Hristev
+
+description:
+  The Google Pixel Kinfo reserved memory is a reserved-memory region used
+  to pass data to the firmware/bootloader on the Pixel platform. The stored
+  data is debugging information about the running kernel.
+
+properties:
+  compatible:
+    items:
+      - const: google,debug-kinfo
+
+  memory-region:
+    maxItems: 1
+    description: Reference to the reserved-memory for the data
+
+required:
+  - compatible
+  - memory-region
+
+additionalProperties: true
+
+examples:
+  - |
+    reserved-memory {
+      #address-cells = <1>;
+      #size-cells = <1>;
+      ranges;
+
+      kinfo_region: smem@fa00000 {
+          reg = <0xfa00000 0x1000>;
+          no-map;
+      };
+    };
+
+    debug-kinfo {
+      compatible = "google,debug-kinfo";
+
+      memory-region = <&kinfo_region>;
+    };
diff --git a/MAINTAINERS b/MAINTAINERS
index 2cb2cc427c90..8034940d0b1e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16164,6 +16164,11 @@ F:	Documentation/dev-tools/meminspect.rst
 F:	include/linux/meminspect.h
 F:	kernel/meminspect/*
 
+MEMINSPECT KINFO DRIVER
+M:	Eugen Hristev
+S:	Maintained
+F:	Documentation/devicetree/bindings/reserved-memory/google,kinfo.yaml
+
 MEMBLOCK AND MEMORY MANAGEMENT INITIALIZATION
 M:	Mike Rapoport
 L:	linux-mm@kvack.org
--
2.43.0

With this driver, the registered regions are copied to a shared memory
zone at registration time. The shared memory zone is supplied via OF.
The driver selects only the regions of interest and keeps only their
addresses, since the firmware is only interested in the addresses of
some symbols. The format of the list is Kinfo compatible, as used on
devices such as Google Pixel phones.

Signed-off-by: Eugen Hristev
---
 MAINTAINERS                |   1 +
 kernel/meminspect/Kconfig  |  10 ++
 kernel/meminspect/Makefile |   1 +
 kernel/meminspect/kinfo.c  | 290 +++++++++++++++++++++++++++++++++++++
 4 files changed, 302 insertions(+)
 create mode 100644 kernel/meminspect/kinfo.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 8034940d0b1e..9cba0e472e01 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16168,6 +16168,7 @@ MEMINSPECT KINFO DRIVER
 M:	Eugen Hristev
 S:	Maintained
 F:	Documentation/devicetree/bindings/reserved-memory/google,kinfo.yaml
+F:	kernel/meminspect/kinfo.c
 
 MEMBLOCK AND MEMORY MANAGEMENT INITIALIZATION
 M:	Mike Rapoport
diff --git a/kernel/meminspect/Kconfig b/kernel/meminspect/Kconfig
index 8680fbf0e285..396510908e47 100644
--- a/kernel/meminspect/Kconfig
+++ b/kernel/meminspect/Kconfig
@@ -18,3 +18,13 @@ config MEMINSPECT
 	  Note that modules using this feature must be rebuilt if
 	  option changes.
 
+config MEMINSPECT_KINFO
+	tristate "Shared memory Kinfo compatible driver"
+	depends on MEMINSPECT
+	help
+	  Say Y here to enable the shared memory Kinfo compatible driver.
+	  With this driver, the registered regions are copied to a shared
+	  memory zone at registration time.
+	  The shared memory zone is supplied via OF.
+	  This driver selects only the regions of interest and keeps only
+	  their addresses. The format of the list is Kinfo compatible.
diff --git a/kernel/meminspect/Makefile b/kernel/meminspect/Makefile
index 09fd55e6d9cf..283604d892e5 100644
--- a/kernel/meminspect/Makefile
+++ b/kernel/meminspect/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_MEMINSPECT) += meminspect.o
+obj-$(CONFIG_MEMINSPECT_KINFO) += kinfo.o
diff --git a/kernel/meminspect/kinfo.c b/kernel/meminspect/kinfo.c
new file mode 100644
index 000000000000..62f8ee7a66a9
--- /dev/null
+++ b/kernel/meminspect/kinfo.c
@@ -0,0 +1,290 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *
+ * Copyright 2002 Rusty Russell IBM Corporation
+ * Copyright 2021 Google LLC
+ * Copyright 2025 Linaro Ltd. Eugen Hristev
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define BUILD_INFO_LEN 256
+#define DEBUG_KINFO_MAGIC 0xcceeddff
+
+/*
+ * Header structure must be byte-packed, since the table is provided to
+ * the bootloader.
+ */
+struct kernel_info {
+	/* For kallsyms */
+	u8 enabled_all;
+	u8 enabled_base_relative;
+	u8 enabled_absolute_percpu;
+	u8 enabled_cfi_clang;
+	u32 num_syms;
+	u16 name_len;
+	u16 bit_per_long;
+	u16 module_name_len;
+	u16 symbol_len;
+	u64 _relative_pa;
+	u64 _text_pa;
+	u64 _stext_pa;
+	u64 _etext_pa;
+	u64 _sinittext_pa;
+	u64 _einittext_pa;
+	u64 _end_pa;
+	u64 _offsets_pa;
+	u64 _names_pa;
+	u64 _token_table_pa;
+	u64 _token_index_pa;
+	u64 _markers_pa;
+	u64 _seqs_of_names_pa;
+
+	/* For frame pointer */
+	u32 thread_size;
+
+	/* For virt_to_phys */
+	u64 swapper_pg_dir_pa;
+
+	/* For linux banner */
+	u8 last_uts_release[__NEW_UTS_LEN];
+
+	/* Info of running build */
+	u8 build_info[BUILD_INFO_LEN];
+
+	/* For module kallsyms */
+	u32 enabled_modules_tree_lookup;
+	u32 mod_mem_offset;
+	u32 mod_kallsyms_offset;
+} __packed;
+
+struct kernel_all_info {
+	u32 magic_number;
+	u32 combined_checksum;
+	struct kernel_info info;
+} __packed;
+
+struct debug_kinfo {
+	struct device *dev;
+	void *all_info_addr;
+	size_t all_info_size;
+	struct notifier_block nb;
+};
+
+static void update_kernel_all_info(struct kernel_all_info *all_info)
+{
+	struct kernel_info *info;
+	u32 *checksum_info;
+	int index;
+
+	all_info->magic_number = DEBUG_KINFO_MAGIC;
+	all_info->combined_checksum = 0;
+
+	info = &all_info->info;
+	checksum_info = (u32 *)info;
+	for (index = 0; index < sizeof(*info) / sizeof(u32); index++)
+		all_info->combined_checksum ^= checksum_info[index];
+}
+
+static u8 global_build_info[BUILD_INFO_LEN];
+
+static int build_info_set(const char *str, const struct kernel_param *kp)
+{
+	size_t build_info_size = sizeof(global_build_info);
+
+	if (strlen(str) >= build_info_size)
+		return -EINVAL;
+	strscpy_pad(global_build_info, str, build_info_size);
+	return 0;
+}
+
+static const struct kernel_param_ops build_info_op = {
+	.set = build_info_set,
+};
+
+module_param_cb(build_info, &build_info_op, NULL, 0200);
+MODULE_PARM_DESC(build_info, "Write build info to field 'build_info' of debug kinfo.");
+
+static void __maybe_unused register_kinfo_region(void *priv,
+						 const struct inspect_entry *e)
+{
+	struct debug_kinfo *kinfo = priv;
+	struct kernel_all_info *all_info = kinfo->all_info_addr;
+	struct kernel_info *info = &all_info->info;
+	struct uts_namespace *uts;
+	u64 paddr;
+
+	if (e->pa)
+		paddr = e->pa;
+	else
+		paddr = __pa(e->va);
+
+	switch (e->id) {
+	case MEMINSPECT_ID__sinittext:
+		info->_sinittext_pa = paddr;
+		break;
+	case MEMINSPECT_ID__einittext:
+		info->_einittext_pa = paddr;
+		break;
+	case MEMINSPECT_ID__end:
+		info->_end_pa = paddr;
+		break;
+	case MEMINSPECT_ID__text:
+		info->_text_pa = paddr;
+		break;
+	case MEMINSPECT_ID__stext:
+		info->_stext_pa = paddr;
+		break;
+	case MEMINSPECT_ID__etext:
+		info->_etext_pa = paddr;
+		break;
+	case MEMINSPECT_ID_kallsyms_num_syms:
+		info->num_syms = *(__u32 *)e->va;
+		break;
+	case MEMINSPECT_ID_kallsyms_relative_base:
+		info->_relative_pa = (u64)__pa(*(u64 *)e->va);
+		break;
+	case MEMINSPECT_ID_kallsyms_offsets:
+		info->_offsets_pa = paddr;
+		break;
+	case MEMINSPECT_ID_kallsyms_names:
+		info->_names_pa = paddr;
+		break;
+	case MEMINSPECT_ID_kallsyms_token_table:
+		info->_token_table_pa = paddr;
+		break;
+	case MEMINSPECT_ID_kallsyms_token_index:
+		info->_token_index_pa = paddr;
+		break;
+	case MEMINSPECT_ID_kallsyms_markers:
+		info->_markers_pa = paddr;
+		break;
+	case MEMINSPECT_ID_kallsyms_seqs_of_names:
+		info->_seqs_of_names_pa = paddr;
+		break;
+	case MEMINSPECT_ID_swapper_pg_dir:
+		info->swapper_pg_dir_pa = paddr;
+		break;
+	case MEMINSPECT_ID_init_uts_ns:
+		if (!e->va)
+			return;
+		uts = e->va;
+		strscpy(info->last_uts_release, uts->name.release, __NEW_UTS_LEN);
+		break;
+	default:
+		break;
+	}
+
+	update_kernel_all_info(all_info);
+}
+
+static int kinfo_notifier_cb(struct notifier_block *nb,
+			     unsigned long code, void *entry)
+{
+	struct debug_kinfo *kinfo = container_of(nb, struct debug_kinfo, nb);
+
+	if (code == MEMINSPECT_NOTIFIER_ADD)
+		register_kinfo_region(kinfo, entry);
+
+	return NOTIFY_DONE;
+}
+
+static int debug_kinfo_probe(struct platform_device *pdev)
+{
+	struct kernel_all_info *all_info;
+	struct device *dev = &pdev->dev;
+	struct device_node *mem_region;
+	struct reserved_mem *rmem;
+	struct debug_kinfo *kinfo;
+	struct kernel_info *info;
+
+	mem_region = of_parse_phandle(dev->of_node, "memory-region", 0);
+	if (!mem_region)
+		return dev_err_probe(dev, -ENODEV, "no such memory-region\n");
+
+	rmem = of_reserved_mem_lookup(mem_region);
+	of_node_put(mem_region);
+	if (!rmem)
+		return dev_err_probe(dev, -ENODEV, "no such reserved mem of node name %s\n",
+				     dev->of_node->name);
+
+	/* Need to wait for reserved memory to be mapped */
+	if (!rmem->priv)
+		return -EPROBE_DEFER;
+
+	if (!rmem->base || !rmem->size)
+		return dev_err_probe(dev, -EINVAL, "unexpected reserved memory\n");
+
+	if (rmem->size < sizeof(struct kernel_all_info))
+		return dev_err_probe(dev, -EINVAL, "reserved memory size too small\n");
+
+	kinfo = devm_kzalloc(dev, sizeof(*kinfo), GFP_KERNEL);
+	if (!kinfo)
+		return -ENOMEM;
+	platform_set_drvdata(pdev, kinfo);
+
+	kinfo->dev = dev;
+
+	kinfo->all_info_addr = rmem->priv;
+	kinfo->all_info_size = rmem->size;
+
+	all_info = kinfo->all_info_addr;
+
+	memset(all_info, 0, sizeof(struct kernel_all_info));
+	info = &all_info->info;
+	info->enabled_all = IS_ENABLED(CONFIG_KALLSYMS_ALL);
+	info->enabled_absolute_percpu = IS_ENABLED(CONFIG_KALLSYMS_ABSOLUTE_PERCPU);
+	info->enabled_base_relative = IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE);
+	info->enabled_cfi_clang = IS_ENABLED(CONFIG_CFI_CLANG);
+	info->name_len = KSYM_NAME_LEN;
+	info->bit_per_long = BITS_PER_LONG;
+	info->module_name_len = MODULE_NAME_LEN;
+	info->symbol_len = KSYM_SYMBOL_LEN;
+	info->thread_size = THREAD_SIZE;
+	info->enabled_modules_tree_lookup = IS_ENABLED(CONFIG_MODULES_TREE_LOOKUP);
+	info->mod_mem_offset = offsetof(struct module, mem);
+	info->mod_kallsyms_offset = offsetof(struct module, kallsyms);
+
+	memcpy(info->build_info, global_build_info, strlen(global_build_info));
+
+	kinfo->nb.notifier_call = kinfo_notifier_cb;
+
+	meminspect_notifier_register(&kinfo->nb);
+	meminspect_lock_traverse(kinfo, register_kinfo_region);
+
+	return 0;
+}
+
+static void debug_kinfo_remove(struct platform_device *pdev)
+{
+	struct debug_kinfo *kinfo = platform_get_drvdata(pdev);
+
+	meminspect_notifier_unregister(&kinfo->nb);
+}
+
+static const struct of_device_id debug_kinfo_of_match[] = {
+	{ .compatible = "google,debug-kinfo" },
+	{},
+};
+MODULE_DEVICE_TABLE(of, debug_kinfo_of_match);
+
+static struct platform_driver debug_kinfo_driver = {
+	.probe = debug_kinfo_probe,
+	.remove = debug_kinfo_remove,
+	.driver = {
+		.name = "debug-kinfo",
+		.of_match_table = of_match_ptr(debug_kinfo_of_match),
+	},
+};
+module_platform_driver(debug_kinfo_driver);
+
+MODULE_AUTHOR("Eugen Hristev ");
+MODULE_AUTHOR("Jone Chou ");
+MODULE_DESCRIPTION("meminspect Kinfo Driver");
+MODULE_LICENSE("GPL");
--
2.43.0
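As a rough illustration of the integrity check encoded in `update_kernel_all_info()`, the userspace sketch below seals a Kinfo-style header with `DEBUG_KINFO_MAGIC` and the XOR of every complete 32-bit word of the payload, then verifies it the way a consumer such as the bootloader might. It assumes a hypothetical, reduced `struct demo_info` in place of the real `struct kernel_info`; the helper names `kinfo_checksum()`, `kinfo_seal()` and `kinfo_valid()` are illustrative only, and the actual firmware-side check is not described in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEBUG_KINFO_MAGIC 0xcceeddffu

/* Hypothetical, reduced stand-in for struct kernel_info (not the real layout). */
struct demo_info {
	uint8_t enabled_all;
	uint8_t enabled_base_relative;
	uint8_t enabled_absolute_percpu;
	uint8_t enabled_cfi_clang;
	uint32_t num_syms;
	uint64_t text_pa;
	uint64_t swapper_pg_dir_pa;
} __attribute__((packed));

/* Header + payload, byte-packed like struct kernel_all_info. */
struct demo_all_info {
	uint32_t magic_number;
	uint32_t combined_checksum;
	struct demo_info info;
} __attribute__((packed));

/* The demo payload is exactly six 32-bit words. */
static_assert(sizeof(struct demo_info) == 24, "unexpected demo_info size");

/*
 * XOR of every complete 32-bit word of the payload. Trailing bytes that do
 * not fill a whole word are ignored, like the sizeof(*info) / sizeof(u32)
 * loop bound in update_kernel_all_info().
 */
static uint32_t kinfo_checksum(const struct demo_info *info)
{
	const unsigned char *p = (const unsigned char *)info;
	uint32_t sum = 0;
	uint32_t word;
	size_t i;

	for (i = 0; i < sizeof(*info) / sizeof(uint32_t); i++) {
		/* memcpy() keeps the load safe for any buffer alignment */
		memcpy(&word, p + i * sizeof(word), sizeof(word));
		sum ^= word;
	}
	return sum;
}

/* Producer side: stamp the magic and the checksum, as the driver does. */
static void kinfo_seal(struct demo_all_info *all)
{
	all->magic_number = DEBUG_KINFO_MAGIC;
	all->combined_checksum = kinfo_checksum(&all->info);
}

/* Consumer side: return 1 only if both the magic and the checksum match. */
static int kinfo_valid(const struct demo_all_info *all)
{
	return all->magic_number == DEBUG_KINFO_MAGIC &&
	       all->combined_checksum == kinfo_checksum(&all->info);
}
```

In use, a producer fills in the payload and calls `kinfo_seal()`, after which `kinfo_valid()` returns 1; changing any field without resealing makes it return 0. The driver reads the payload through a `(u32 *)` cast instead; the `memcpy()` load here only keeps the sketch portable regardless of how the buffer happens to be aligned.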