Add fw_smepmp_ids bitmap to track PMP entries that protect firmware
regions. Allow us to preserve these critical entries across domain
transitions and check inconsistent firmware entry allocation.
Also add sbi_hart_smepmp_is_fw_region() helper function to query
whether a given SmePMP entry protects firmware regions.
Signed-off-by: Yu-Chien Peter Lin <peter.lin@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251008084444.3525615-8-peter.lin@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>
During domain context switches, all PMP entries are reconfigured
which can clear firmware access permissions, causing M-mode access
faults under SmePMP.
Sort domain regions to place firmware regions first, ensuring
consistent firmware PMP entries so they won't be revoked during
domain context switches.
Signed-off-by: Yu-Chien Peter Lin <peter.lin@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251008084444.3525615-7-peter.lin@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Previously, when memory regions exceed available PMP entries,
some regions were silently ignored. If the last entry that covers
the full 64-bit address space is not added to a domain, the next
stage S-mode software won't have permission to access and fetch
instructions from its memory. So return early with error message
to catch such situation.
Signed-off-by: Yu-Chien Peter Lin <peter.lin@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251008084444.3525615-5-peter.lin@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Previously, in sbi_pmu_ctr_cfg_match() function, ctr_idx was used immediately
after pmu_ctr_find_fw() or pmu_ctr_find_hw() calls. In first case, array index
was (ctr_idx - num_hw_ctrs), in second - ctr_idx. But pmu_ctr_find_fw() and
pmu_ctr_find_hw() functions can return negative value, in which case writing
in arrays with such indexes would corrupt sbi_pmu_hart_state structure.
To avoid this situation, direct ctr_idx value check added.
Signed-off-by: Alexander Chuprunov <alexander.chuprunov@syntacore.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250918090706.2217603-4-alexander.chuprunov@syntacore.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Generally, hardware performance counters can only be started, stopped,
or configured from machine-mode using mcountinhibit and mhpmeventX CSRs.
Also, in opensbi only sbi_pmu_ctr_cfg_match() managed mhpmeventX. But
in generic Linux driver, when perf starts, Linux calls both
sbi_pmu_ctr_cfg_match() and sbi_pmu_ctr_start(), while after hart suspend
only sbi_pmu_ctr_start() command called through SBI interface. This doesn't
work properly in case when suspend state resets HPM registers. In order
to keep counter integrity, sbi_pmu_ctr_start() modified. First, we're saving
hw_counters_data, and after hart suspend this value is restored if
event is currently active.
Signed-off-by: Alexander Chuprunov <alexander.chuprunov@syntacore.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250918090706.2217603-2-alexander.chuprunov@syntacore.com
Signed-off-by: Anup Patel <anup@brainfault.org>
A platform can have multiple IPI devices (such as ACLINT MSWI,
AIA IMSIC, etc). Currently, OpenSBI rely on platform calling
the sbi_ipi_set_device() function in correct order and prefer
the first avaiable IPI device which is fragile.
Instead of the above, introduce IPI device rating and prefer
the highest rated IPI device. This further allows extending
the sbi_ipi_raw_clear() to clear all available IPI devices.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Tested-by: Nick Hu <nick.hu@sifive.com>
Link: https://lore.kernel.org/r/20250904052410.546818-2-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Currently the heap has a fixed housekeeping factor of 16, which means
1/16 of the heap is reserved for list nodes. But this is not enough when
there are many small allocations; in the worst case, 1/3 of the heap is
needed for list nodes (32 byte heap_node for each 64 byte allocation).
This has caused allocation failures on some platforms.
Let's avoid trying to guess the best ratio. Instead, allocate more nodes
as needed. To avoid recursion, the nodes are permanent allocations. So
to minimize fragmentation, allocate them in small batches from the end
of the last free space node. Bootstrap the free space list by embedding
one node in the heap control struct.
Some error paths are avoided because the nodes are allocated up front.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250617032306.1494528-3-samuel.holland@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Using hsm stop in hsm wait loop causes secondary harts to be stuck
forever in OpenSBI on RISC-V platforms where HSM hart hotplug is
available and all harts come-up at the same time during system
power-on.
For example, lets say we have two harts A and B on a RISC-V platform
with HSM hart hotplug which come-up at the same time during system
power-on. The hart A enters OpenSBI before hart B hence it becomes
the primary (or cold-boot) hart whereas hart B becomes the secondary
(or warm-boot) hart. The hart A follows the OpenSBI cold-boot path
and registers hsm device before hart B enters OpenSBI. The hart B
eventually enters OpenSBI and follows the OpenSBI warm-boot path
so it will increment it's own entry_count before entering hsm wait
loop where it sees hsm device and stops itself. Later as part of
the Linux boot-up sequence, hart A issues SBI HSM start call to
bring-up hart B but OpenSBI sees entry_count != init_count for
hart B in sbi_hsm_hart_start() hence hsm_device_hart_start() is
not called for hart B resulting in hart B stuck forever in OpenSBI.
To fix the above issue, revert entry_count before doing hsm stop
in hsm wait loop.
Fixes: d844deadec ("lib: sbi: Use hsm stop for hsm wait")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Nick Hu <nick.hu@sifive.com>
Link: https://lore.kernel.org/r/20250527124821.2113467-1-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Currently, when configuring a matching programmable HPM counter with
Sscofpmf being present, cidx_base > 2, and cidx_mask == 0 to monitor
either the CPU_CYCLES or INSTRUCTIONS hardware event,
sbi_pmu_ctr_cfg_match will succeed but it will configure the
corresponding fixed counter instead of the counter specified by the
cidx_base parameter.
During counter configuration, the following issues may arise:
- If the SKIP_MATCH flag is set, an out-of-bounds memory read of the
phs->active_events array would occur, which could lead to undefined
behavior.
- If the CLEAR_VALUE flag is set, the corresponding fixed counter will
be reset, which could be considered unexpected behavior.
- If the AUTO_START flag is set, pmu_ctr_start_hw will silently start
the fixed counter, even though it has already started. From the
supervisor's perspective, nothing has changed, which could be confusing.
The supervisor will not see the SBI_ERR_ALREADY_STARTED error code since
sbi_pmu_ctr_cfg_match does not return the error code of
pmu_ctr_start_hw.
The only way to detect these issues is to check the ctr_idx return value
of sbi_pmu_ctr_cfg_match and compare it with cidx_base.
Fix these issues by returning the SBI_ERR_INVALID_PARAM error code if
the cidx_mask parameter value being passed in is 0 since an invalid
parameter should not lead to a successful sbi_pmu_ctr_cfg_match but with
unexpected side effects.
Following a similar rationale, add the validation check to
sbi_pmu_ctr_start and sbi_pmu_ctr_stop as well since sbi_fls is
undefined when the mask is 0.
This also aligns OpenSBI's behavior with KVM's.
Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250520132533.30974-1-jamestiotio@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Zicntr extension specifies three read-only CSRs, time, cycle and
instret. It isn't sufficient to report Zicntr is fully supported with
only time CSR detected.
This patch introduces a bitmap to sbi_hart_features to record
availability of these CSRs, which are detected using traps. Zicntr is
reported as present if and only if three CSRs are all available on the
HARTs.
Sites originally depending on SBI_HART_EXT_ZICNTR for detecting
existence of time CSR are switched to detect SBI_HART_CSR_TIME instead.
Suggested-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Yao Zi <ziyao@disroot.org>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250516133352.36617-3-ziyao@disroot.org
Signed-off-by: Anup Patel <anup@brainfault.org>
If we hotplug a core and then perform a suspend-to-RAM operation on a
multi-core platform, the hotplugged CPU may be woken up along with the rest
of the system, particularly on platforms that wake all cores from the
deepest sleep state. When this happens, the hotplugged CPU enters the
sbi_hsm_wait WFI wait loop instead of transitioning into a
platform-specific low-power state. To address this, we add a HSM stop call
within the wait loop. This allows platforms that support HSM stop to enter
a low-power state when the CPU is unexpectedly woken up.
Signed-off-by: Nick Hu <nick.hu@sifive.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250418064506.15771-1-nick.hu@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>