NVMe random power failures on CWWK x86-P6 – D3cold and controller reset issues
Hi,
Summary:
I'm having a critical issue with NVMe stability on my CWWK x86-P6 (Intel N150, two rear USB ports, internal 4x NVMe expansion card, running Proxmox 8.4).
I have four brand-new WD Red SN700 2TB NVMe drives in a RAID 5 array (mdadm), and they randomly drop from the system, even when idle or under light use.
Questions:
- Is this a known hardware limitation? The internal power circuitry seems unable to handle 4 NVMe drives under normal load or idle.
- Is there an updated BIOS with better NVMe power/ASPM/D3cold handling, bifurcation settings, or options to disable power saving?
- Would using a power supply with more capacity help? I currently use the default 12V 5A DC adapter provided.
- Is anyone else running 4x NVMe drives in RAID on this board stably?
Surprisingly, I ran a full fio write test (20GB, 2 threads, 1MiB blocks) and all disks worked fine during that period. However, after some time (even at idle), the system logs show:
nvme nvme3: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
nvme nvme3: Does your device have a faulty power saving mode enabled?
nvme nvme3: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
nvme 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible
nvme nvme3: Disabling device after reset failure: -19
This affects random drives at random times (nvme0n1, nvme1n1, etc.). mdadm disables them and marks the RAID as degraded.
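Since the drops hit different controllers at different times, it can help to tally them per controller. The following is a minimal sketch, assuming the kernel log has been saved as text (for example from `dmesg` or `journalctl -k`); the helper name `count_nvme_drops` is hypothetical and the patterns are copied from the messages quoted above.

```python
import re
from collections import Counter

# Signatures taken from the log excerpt in this post.
DROP_PATTERNS = (
    re.compile(r"nvme (nvme\d+): controller is down; will reset"),
    re.compile(r"nvme (nvme\d+): Disabling device after reset failure"),
)


def count_nvme_drops(log_text):
    """Return a Counter mapping controller name -> number of drop-related lines."""
    counts = Counter()
    for line in log_text.splitlines():
        for pattern in DROP_PATTERNS:
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
                break  # count each log line at most once
    return counts
```

If one controller dominates the tally, that points at a specific slot or drive; an even spread is more consistent with a shared cause such as power delivery or platform power management.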
I've added kernel parameters with no luck:
nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off
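Before concluding that the parameters have no effect, it is worth confirming that the running kernel was actually booted with them. A minimal sketch, assuming the standard `/proc/cmdline` file; the helper names `missing_params` and `read_cmdline` are hypothetical.

```python
# The three parameters suggested by the kernel message quoted in this post.
WANTED = {
    "nvme_core.default_ps_max_latency_us": "0",
    "pcie_aspm": "off",
    "pcie_port_pm": "off",
}


def missing_params(cmdline, wanted=WANTED):
    """Return the wanted key/value pairs that are absent from a kernel command line."""
    present = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:  # only key=value tokens are relevant here
            present[key] = value
    return {k: v for k, v in wanted.items() if present.get(k) != v}


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as fh:
        return fh.read()
```

Calling `missing_params(read_cmdline())` returns an empty dict when all three parameters are active. A non-empty result suggests the bootloader configuration was edited but not regenerated, or that a different boot entry is in use.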
In lspci -vv, all failing NVMe drives show:
!!! Unknown header type 7f
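For context, lspci reports the header type as the byte at offset 0x0e with the multifunction bit masked off, so "7f" means that byte read back as 0xff. That is what an unreachable device looks like, and it matches the all-ones CSTS and PCI_STATUS values in the log. A minimal sketch of the same check against sysfs follows; the helper names are hypothetical, and the `config` file under `/sys/bus/pci/devices/<address>/` is assumed to be readable.

```python
from pathlib import Path


def looks_unreachable(config_bytes):
    """True if the 64-byte PCI config header reads back as all ones (0xff)."""
    header = config_bytes[:64]
    return len(header) > 0 and all(b == 0xFF for b in header)


def header_type(config_bytes):
    """Header type as lspci shows it: offset 0x0e with the multifunction bit masked."""
    return config_bytes[0x0E] & 0x7F


def read_config(address, root="/sys/bus/pci/devices"):
    """Read raw config space for a device address such as '0000:02:00.0'."""
    return (Path(root) / address / "config").read_bytes()
```

For example, `looks_unreachable(read_config("0000:02:00.0"))` would be expected to return True for a drive in the failed state shown above, and False for a healthy one.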
This is a promising board, but stability is critical for RAID setups.
Thanks in advance!
Hello,
Thank you for providing the detailed logs and configuration information. Based on your description and system behavior, it appears that the device may have certain limitations in terms of power delivery and hardware compatibility. This unit is primarily designed for use with low-power, energy-efficient storage devices.
Considering the current situation, we recommend the following testing and optimization suggestions to help identify and potentially resolve the issue:
1. Disable ACPI and Power-Saving Features
Please try disabling ACPI (Advanced Configuration and Power Interface) and all power-saving related settings in the BIOS (such as PCIe ASPM, D3cold, auto-sleep, etc.), then retest the NVMe stability. It's possible that under near-maximum power conditions, aggressive power management is causing instability in the drives, leading to disconnections.
2. Check Memory Stability
We suggest either replacing the RAM modules or running the memtest86+ tool to test the stability of your current memory. Unstable or faulty memory can sometimes lead to unpredictable hardware errors or driver crashes.
3. Avoid Using RAID for Testing
To rule out potential compatibility issues caused by the RAID configuration, please temporarily disable the RAID array and test each NVMe drive individually. The system should remain stable when the drives are not performing simultaneous read/write operations.
4. Consider Upgrading the Power Adapter
Switching to a higher-capacity power adapter (e.g., 12V 6A or above) may help alleviate power delivery issues to some extent. However, due to the limitations of the motherboard's internal power design, the improvement may be limited, although it's still worth trying.
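As a rough illustration of the adapter arithmetic in point 4: the supplied 12V 5A adapter is rated for 60 W and a 12V 6A one for 72 W. The sketch below only restates that calculation. The `headroom_watts` helper and its base-system and per-drive inputs are hypothetical, those figures would have to come from measurements or datasheets, and the result says nothing about the board's internal power design, which is the limit mentioned above.

```python
def adapter_watts(volts, amps):
    """Rated adapter output in watts."""
    return volts * amps


def headroom_watts(volts, amps, base_system_watts, per_drive_peak_watts, drive_count):
    """Adapter rating minus an assumed base load and assumed per-drive peak draw."""
    return adapter_watts(volts, amps) - base_system_watts - per_drive_peak_watts * drive_count


print(adapter_watts(12, 5))  # 60
print(adapter_watts(12, 6))  # 72
```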
Key Message: This device can support 4 NVMe drives simultaneously, but only when using low-power consumption models. It is more suitable for light-duty applications, such as a lightweight database server. You can use this machine at home or in the office, and perform backups to an HDD-based NAS when needed; RAID is not necessary.
Additionally, reducing the number of installed NVMe drives (e.g., using three instead of four) can also significantly improve system stability.
Ultimately, how you choose to configure the system depends on your specific needs and priorities.
If you continue to experience issues after applying these steps, please let us know the results so we can further assist you.
Best regards,
//cwwk
The good news is that it seems to be stable (so far) after disabling PCI Express clock gating and PCI Express power gating. I'll keep monitoring, but if I don't come back with news, it means it worked.