During warm reset, my EIC770X/Hifive Premier P550 can sometimes
encounter memory corruption issue crashing Linux boot. Currently the
issue is mitigated by having a sbi_printf before writing to the reset
register. I analyzed the issue further since then. From the SoC
datasheet[1], it's recommended to implement power-down flow as:
a. Designate a primary core, and let it broadcast requests to other
cores to execute a CEASE insn. Primary core also notifies an
"Externel Agent" to start monitoring.
b. Primary core waits for other cores to CEASE before it CEASEs.
c. "External Agent" waits for primary core to CEASE before resets
the Core Complex.
It's possible that EIC770X can trigger undefined behavior if the core
complex is reset while the harts are actively running. The sbi_printf
in the reset handler effectively hides the problem by delaying the
reset -- by the time sbi_printf finishes, all other harts will have
already landed in the loop in sbi_hsm_hart_wait(), which parks the hart.
Without the sbi_printf, I confirmed that other harts haven't reached
sbi_hsm_hart_wait yet before current hart resets the SoC. (by debugging)
To safely reset, and inspired by the datasheet, the warm reset logic
in eic770x.c now use the current hart as both primary core and the
"External Agent", and other harts as secondary cores. It leverages
the HSM framework and a new eic770x_hsm device to CEASE other harts,
and wait for them to CEASE before resets the SoC. with the sbi_printf
before reset removed, and this logic in place, stress test shows that
the memory corruption issue no longer occurs.
The new eic770x_hsm device is only used for the reset-CEASE logic at
the moment, and may be extended to a fully functional HSM device in
the future.
[1] https://github.com/eswincomputing/EIC7700X-SoC-Technical-Reference-Manual
Fixes: e5797e0688 ("platform: generic: eswin: add EIC7700")
Signed-off-by: Bo Gan <ganboing@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260605075708.96-3-ganboing@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Hifive Premier P550[1] is a Mini-DTX form factor board with EIC7700X.
It has a STM32F407VET6 onboard MCU acting as the BMC, controlling
ATX power on/off while providing remote management features. The
EIC7700X SoC/SoM communicates with the BMC via UART2, using ESWIN's
protocol. The messages transmitted are fixed sizes (267 bytes), and
depending on the type, can be directional or bi-directional. The
shutdown and cold reboot requests are directional messages from SoC
to BMC (NOTIFY type) with CMD_POWER_OFF or CMD_RESTART. The payload
of shutdown/cold reboot requests should be empty and are ignored by
the BMC at the moment. A HFP (Hifive Premier) specific reset device
is registered in addition to the SoC reset device. For shutdown and
cold reboot, the board-level reset takes precedence.
The definitions of the SoC <-> BMC message protocol is taken from
ESWIN's repo [2]. The only file used from that repo is `hf_common.h`
It's disjunctively dual licensed as (GPL-2.0-only OR BSD-2-Clause),
hence, compatible with the license of OpenSBI. It's heavily modified
and renamed as platform/generic/include/eswin/hfp.h. The author and
copyright in the original file are retained.
Validated shutdown/cold reboot working on Hifive Premier P550.
[1] https://www.sifive.com/boards/hifive-premier-p550#documentation
[2] https://github.com/eswincomputing/hifive-premier-p550-mcu-patches.git
Signed-off-by: Bo Gan <ganboing@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251218104243.562667-8-ganboing@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>