Using hsm stop in hsm wait loop causes secondary harts to be stuck
forever in OpenSBI on RISC-V platforms where HSM hart hotplug is
available and all harts come-up at the same time during system
power-on.
For example, lets say we have two harts A and B on a RISC-V platform
with HSM hart hotplug which come-up at the same time during system
power-on. The hart A enters OpenSBI before hart B hence it becomes
the primary (or cold-boot) hart whereas hart B becomes the secondary
(or warm-boot) hart. The hart A follows the OpenSBI cold-boot path
and registers hsm device before hart B enters OpenSBI. The hart B
eventually enters OpenSBI and follows the OpenSBI warm-boot path
so it will increment it's own entry_count before entering hsm wait
loop where it sees hsm device and stops itself. Later as part of
the Linux boot-up sequence, hart A issues SBI HSM start call to
bring-up hart B but OpenSBI sees entry_count != init_count for
hart B in sbi_hsm_hart_start() hence hsm_device_hart_start() is
not called for hart B resulting in hart B stuck forever in OpenSBI.
To fix the above issue, revert entry_count before doing hsm stop
in hsm wait loop.
Fixes: d844deadec ("lib: sbi: Use hsm stop for hsm wait")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Nick Hu <nick.hu@sifive.com>
Link: https://lore.kernel.org/r/20250527124821.2113467-1-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
If we hotplug a core and then perform a suspend-to-RAM operation on a
multi-core platform, the hotplugged CPU may be woken up along with the rest
of the system, particularly on platforms that wake all cores from the
deepest sleep state. When this happens, the hotplugged CPU enters the
sbi_hsm_wait WFI wait loop instead of transitioning into a
platform-specific low-power state. To address this, we add a HSM stop call
within the wait loop. This allows platforms that support HSM stop to enter
a low-power state when the CPU is unexpectedly woken up.
Signed-off-by: Nick Hu <nick.hu@sifive.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250418064506.15771-1-nick.hu@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Simplify the code and improve consistency by using the new macros where
possible. sbi_hart_count() obsoletes sbi_scratch_last_hartindex().
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
sbi_ipi_init() expects the platform warm init function to clear IPIs
on the local hart, but there is already a generic function to do this.
After this change, none of the existing drivers need a warm init
callback.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
All callers already have the hartindex available, so this removes a
hartid to hartindex conversion.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
This removes redundant hartid to hartindex conversions from four call
sites and provides a net reduction in code size.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
This removes several hartid/hartindex conversions, as well as two loops
through the mask for broadcast IPIs.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
This removes some hartindex conversions in sbi_system_suspend(), but is
mostly intended to support refactoring sbi_hsm_hart_interruptible_mask()
to work exclusively with struct sbi_hartmask.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
This avoids calls to the expensive sbi_hartid_to_hartindex() function
and also makes the firmware smaller.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Attempting to access the menvcfg CSR raises an illegal instruction
exception on hardware which implements Sm1p11 or older.
Fixes: e9ee9678ba ("lib: sbi: fwft: add support for SBI_FWFT_PTE_AD_HW_UPDATING")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Add support for SBI_FWFT_PTE_AD_HW_UPDATING based on SVADU presence.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Add support for SBI_FWFT_MISALIGNED_EXC_DELEG withing FWFT support. This
support allows to delegate misaligned accesses traps.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Hart state should change back to hart stop when hsm_device_hart_start()
or sbi_ipi_raw_send() fails to perform hart start.
Signed-off-by: Joshua Yeong <joshua.yeong@starfivetech.com>
Reviewed-by: Xiang W <wxjstz@126.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Current interruptible hart mask doesn't include the hart which HSM state
is SBI_HSM_STATE_RESUME_PENDING. So when there is a request to send an
IPI to the hart which is in the resume process, this hart would miss the
IPI forever. Put the SBI_HSM_STATE_RESUME_PENDING hart in the
interruptible hart mask to fix the issue.
Signed-off-by: Nick Hu <nick.hu@sifive.com>
Reviewed-by: Xiang W <wxjstz@126.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
The sbi_scratch_last_hartid() macro is not of much use on platforms
with really sparse hartids so let us replace use of this macro with
other approaches.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Let us prefer hartindex over hartid in IPI framework which in-turn
forces IPI users to also prefer hartindex.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
We have redundant semicolon at quite a few places so let's remove it.
Signed-off-by: Xiang W <wxjstz@126.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
It possible that a platform supports hart hotplug (i.e. both hart_start
and hart_stop callbacks available) and all harts are start simultaneously
at platform boot-time. In this situation, the sbi_hsm_hart_start() will
call hsm_device_hart_start() for secondary harts at platform boot-time
which will fail because secondary harts were already started.
To fix above, we call hsm_device_hart_start() from sbi_hsm_hart_start()
only when entry_count is same as init_count for the secondary hart.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
When a boot hart executes sbi_hsm_hart_start() to start a secondary hart,
next_arg1, next_addr and next_mode for the latter are stored in the scratch
area after the state has been set to SBI_HSM_STATE_START_PENDING.
The secondary hart waits in the loop with wfi() in sbi_hsm_hart_wait() at
that time. However, "wfi" instruction is not guaranteed to wait for an
interrupt to be received by the hart, it is just a hint for the CPU.
According to RISC-V Privileged Architectures spec. v20211203, even an
implementation of "wfi" as "nop" is legal.
So, the secondary might leave the loop in sbi_hsm_hart_wait() as soon as
its state has been set to SBI_HSM_STATE_START_PENDING, even if it got no
IPI or it got an IPI unrelated to sbi_hsm_hart_start(). This could lead to
the following race condition when booting Linux, for example:
Boot hart (#0) Secondary hart (#1)
runs Linux startup code waits in sbi_hsm_hart_wait()
sbi_ecall(SBI_EXT_HSM,
SBI_EXT_HSM_HART_START,
...)
enters sbi_hsm_hart_start()
sets state of hart #1 to START_PENDING
leaves sbi_hsm_hart_wait()
runs to the end of init_warmboot()
returns to scratch->next_addr
(next_addr can be garbage here)
sets next_addr, etc. for hart #1
(no good: hart #1 has already left)
sends IPI to hart #1
(no good either)
If this happens, the secondary hart jumps to a wrong next_addr at the end
of init_warmboot(), which leads to a system hang or crash.
To reproduce the issue more reliably, one could add a delay in
sbi_hsm_hart_start() after setting the hart's state but before sending
IPI to that hart:
hstate = atomic_cmpxchg(&hdata->state, SBI_HSM_STATE_STOPPED,
SBI_HSM_STATE_START_PENDING);
...
+ sbi_timer_mdelay(10);
init_count = sbi_init_count(hartid);
rscratch->next_arg1 = arg1;
rscratch->next_addr = saddr;
The issue can be reproduced, for example, in a QEMU VM with '-machine virt'
and 2 or more CPUs, with Linux as the guest OS.
This patch moves writing of next_arg1, next_addr and next_mode for the
secondary hart before setting its state to SBI_HSM_STATE_START_PENDING.
In theory, it is possible that two or more harts enter sbi_hsm_hart_start()
for the same target hart simultaneously. To make sure the current hart has
exclusive access to the scratch area of the target hart at that point, a
per-hart 'start_ticket' is used. It is initially 0. The current hart tries
to acquire the ticket first (set it to 1) at the beginning of
sbi_hsm_hart_start() and only proceeds if it has successfully acquired it.
The target hart reads next_addr, etc., and then the releases the ticket
(sets it to 0) before calling sbi_hart_switch_mode(). This way, even if
some other hart manages to enter sbi_hsm_hart_start() after the ticket has
been released but before the target hart jumps to next_addr, it will not
cause problems.
atomic_cmpxchg() already has "acquire" semantics, among other things, so
no additional barriers are needed in hsm_start_ticket_acquire(). No hart
can perform or observe the update of *rscratch before setting of
'start_ticket' to 1.
atomic_write() only imposes ordering of writes, so an explicit barrier is
needed in hsm_start_ticket_release() to ensure its "release" semantics.
This guarantees that reads of scratch->next_addr, etc., in
sbi_hsm_hart_start_finish() cannot happen after 'start_ticket' has been
released.
Signed-off-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Move them into sbi_hsm_hart_start_finish() and sbi_hsm_hart_resume_finish()
to make them easier to manage.
This will be used by subsequent patches.
Suggested-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
A coming patch can make use of a few internal hsm functions if
we export them.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
While non-retentive suspend is not allowed for M-mode, the comment
at the top of sbi_hsm_hart_suspend() implied suspend wasn't allowed
for M-mode at all. Move the comment above the mode check which is
inside a suspend type is non-retentive check.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
HSM functions define when SBI_ERR_INVALID_PARAM should be returned.
Ensure it's not used for reasons that don't meet the definitions by
using the catch-all code, SBI_ERR_FAILED, for those reasons instead.
Also, in one case sbi_hart_suspend() may have returned SBI_ERR_DENIED,
which isn't defined for that function at all. Use SBI_ERR_FAILED for
that case too.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
When a state change fails there's no need to restore the original
state as it remains the same.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Remove some redundant code by creating an invalid state detection
macro.
No functional change intended.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
If we use the csr_write to restore the MIP, we may clear the SEIP.
In generic behavior of QEMU, if the pending bits of PLIC are set and we
clear the SEIP, the QEMU may not set it back immediately. It may cause
the interrupts won't be handled anymore until the new interrupts arrived
and QEMU set the bits back.
Signed-off-by: Nick Hu <nick.hu@sifive.com>
Signed-off-by: Jim Shu <jim.shu@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Make use of generic warm-boot path when platform hart_stop callback
returns SBI_ENOTSUPP, in case certain hart can not turn off its
power domain, or it detects some error occured in power management
unit, it can fall through warm-boot flow and wait for interrupt in
sbi_hsm_hart_wait().
Also improves comment in sbi_hsm_hart_wait().
Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
'priv' argument of sbi_hsm_hart_start() and sbi_hsm_hart_suspend()
may mislead people to think it stands for 'privilege mode', but it
is not. Change it to 'arg1' to clearly indicate the a1 register.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Tested-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
If the ecall SBI_EXT_HSM_HART_START is called it might try to wake the
secondary hart using sbi_ipi_raw_send() to send an IPI to the hart.
This can fail if there is no IPI device but no error is returned from
sbi_ipi_raw_send() so the ecall returns as if the action completed and
the caller continues without noticing (in the case of Linux it just hangs
waiting for the secondary hart to become active)
Fix this by changing sbi_ipi_raw_send() to return and error, and if an
error is returned, then return it via SBI_EXT_HSM_HART_START call.
Signed-off-by: Ben Dooks <ben.dooks@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
The suspend code needs to know the resume address for two reasons:
1) Programming some hardware register or management firmware. Here we
assume the hardware/firmware maintains its state between suspends,
so it only needs to be programmed once at startup.
2) When a non-retentive suspend request ends up being retentive, due
to lack of hardware support, pending interrupt, or for some other
reason. However, the behavior here is not platform-dependent, and
this can be handled in the generic hart suspend function.
Since neither situation requires the platform-level suspend function to
know the resume address, stop passing it to that function. Instead,
handle the non-retentive to retentive situation generically.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Non-retentive suspend states may require platform-specific actions
during resume. For example, firmware may need to save and restore the
values of custom CSRs. Add a hook to support this.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Samuel Holland <samuel@sholland.org>
We can have IPIs based on external interrupts provided by devices
such as AIA IMSIC so we should enable mie.MEIE bit at appropriate
places in generic library.
Signed-off-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Instead of saving context only for default non-retentive suspend,
we should save context for all non-retentive suspend types.
Fixes: 74756891cc ("lib: sbi: Implement SBI HSM suspend function")
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Dong Du <Dd_nirvana@sjtu.edu.cn>
Reviewed-by: Xiang W <wxjstz@126.com>
The parameter owner of function sbi_scratch_alloc_offset() is never used.
The scratch memory is small. We should not use it for debug information in
future. Hence eliminate the parameter.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Xiang W <wxjstz@126.com>
Reviewed-by: Anup Patel <anup.patel@wdc.com>
Instead of having hsm_start(), hsm_stop() and hsm_suspend()
callbacks in platform operations, it will be much simpler for
HSM driver to directly register these operations as a device
to the sbi_hsm implementation.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Instead of having ipi_send() and ipi_clear() callbacks in
platform operations, it will be much simpler for ipi driver
to directly register these operations as a device to sbi_ipi
implementation.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
The sbi_platform_ipi_clear() called from wait_for_coldboot() and
sbi_hsm_hart_wait() is redundant because IPI will be automatically
cleared by sbi_platform_ipi_init() called from sbi_ipi_init().
Further, wait_for_coldboot() is common for warm startup and warm
resume path so the sbi_platform_ipi_clear() called in warm resume
path cause resuming HART to miss an IPI injected other HART to
wakeup the HART.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
This patch implements the SBI HSM suspend function. Using this
new SBI call, the S-mode software can put calling HART in platform
specific suspend (i.e. low-power) state. For a successful retentive
suspend, the SBI call will return without errors upon resuming
whereas for a successful non-retentive suspend, the SBI call will
resume from a user provided resume address.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
The sbi_hsm_hart_start() and sbi_hsm_hart_stop() functions should
only return error codes as defined by the SBI specification.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
The sbi_hsm_hart_started() function is only used by sbi_hsm_hart_stop()
for checking state of calling HART and current domain assignment.
The atomic_cmpxchg() called by sbi_hsm_hart_stop() will check state of
calling hart anyway and domain assignment can be checked by other domain
function such as sbi_domain_is_assigned_hart().
This means sbi_hsm_hart_started() is redundant and can be removed.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
A hart can take interrupt in the new HSM states introduced by the
SBI HSM suspend function (such as SUSPENDED state) so we rename
sbi_hsm_hart_started_mask() to something more generic such as
sbi_hsm_hart_interruptible_mask().
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
We replace the use of SBI_STATE_xyz defines with SBI_HSM_STATE_xyz
defines because the HSM state defines are complete enough to implement
HSM state machine in OpenSBI. As a result of this, we can now remove
sbi_hsm_hart_state_to_status() function because it is now redundant
and sbi_hsm_hart_get_state() can directly return HSM state or error.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
We simplify HSM state define names so that these defines can directly
replace SBI_HART_xyz defines used by SBI HSM implementation.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Instead of calling sbi_hsm_hart_get_state() in a loop, we can simply
call a new inline __sbi_hsm_hart_get_state() which only takes "hartid"
and enforce domain checks using sbi_domain_assigned_hartmask().
This patch optimizes sbi_hsm_hart_started_mask() as-per above.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
The sbi_hsm_hart_start() should consider the domain under which we
are trying to start the HART. This will help ensure that HART A can
start HART B only if both HARTs A and B belong to the same domain.
We also have a special case when we bring-up boot HART of non-root
domains in sbi_domain_finalize() where we should skip domain checks
in sbi_hsm_hart_start(). To achieve this, sbi_hsm_hart_start() should
do domain checks only when domain parameter is non-NULL.
This patch extends sbi_hsm_hart_start() as-per above.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
The sbi_hsm_hart_started_mask() API should take one more parameter
to allow caller specify domain under which started_mask is being
generated. Further, the sbi_hsm_hart_started_mask() depends on
sbi_hsm_hart_get_state() which also should return HART state under
specified domain.
This patch updates both sbi_hsm_hart_started_mask() and
sbi_hsm_hart_get_state() as-per above.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
We extend sbi_hart_pmp_check_addr() API so that users can specify
privilege mode of the address for checking PMP access permissions.
To achieve this, we end-up converting "unsigned long *size" parameter
to "unsigned long *log2len" for pmp_get() implementation so that we
can deal with regions of "1UL << __riscv_xlen" size in a special case
in sbi_hart_pmp_check_addr() implementation.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
The sbi_scratch already has provision to specify the next stage mode
so we can leverage this to specify start mode to sbi_hsm_hart_start().
In future, this will be useful in providing SBI calls to U-mode on
embedded cores where we M-mode and U-mode but no S-mode.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>