Debug Suspend Issues in Certain Macbooks
Since updating to kernel version 5.15 on Slackware, my Macbook Pro (model A1398) either takes a long time to wake up from suspend or doesn’t wake up at all. To figure out where the problem lies, I ran Intel’s pm-graph, but the logs didn’t yield anything interesting. At the time, I switched the lid close action to hibernate and ignored the problem.
Update 2024-04-21: Currently kernel 6.6.28 with elogind 252.23-x86_64-3 works, but I’m not counting on it lasting long before an update breaks it again.
See also the relevant Slackware forum post that I posted in January 2023.
My first shot at the problem (around June 2022)
Wondering if other distros would have the same problem, I tried booting Linux Mint. To my surprise, suspend and resume worked normally there, while they didn’t with the kernel Slackware provided, even at the same kernel version (5.15.0). So I tried booting Slackware with the Linux Mint kernel, and found that suspend/resume then works correctly even on Slackware.
However, solving the problem by booting a kernel from another system feels… dirty, and doesn’t fit in well with other parts of the system (as seen with some driver issues). I had to find a better solution.
Second shot: Adding elogind hooks (failed)
In December 2022 I picked up the problem again and looked at various logs. dmesg didn’t contain anything other than a jump in time between log entries, but the syslog had something like this:
Logs
```
Jan 4 05:04:51 dankstar kernel: thunderbolt 0000:07:00.0: device link creation from 0000:06:00.0 failed
Jan 4 05:04:52 dankstar kernel: ACPI Warning: SystemIO range 0x0000000000000840-0x000000000000084F conflicts with OpRegion 0x0000000000000800-0x0000000000000863 (\x5cGPIO) (20220331/utaddress-204)
Jan 4 05:04:52 dankstar kernel: ACPI Warning: SystemIO range 0x0000000000000830-0x000000000000083F conflicts with OpRegion 0x0000000000000800-0x0000000000000863 (\x5cGPIO) (20220331/utaddress-204)
Jan 4 05:04:52 dankstar kernel: ACPI Warning: SystemIO range 0x0000000000000800-0x000000000000082F conflicts with OpRegion 0x0000000000000800-0x0000000000000863 (\x5cGPIO) (20220331/utaddress-204)
Jan 4 05:04:52 dankstar kernel: ACPI Warning: SystemIO range 0x0000000000000800-0x000000000000082F conflicts with OpRegion 0x0000000000000810-0x0000000000000813 (\x5cIO_D) (20220331/utaddress-204)
Jan 4 05:04:52 dankstar kernel: ACPI Warning: SystemIO range 0x0000000000000800-0x000000000000082F conflicts with OpRegion 0x0000000000000800-0x000000000000080F (\x5cIO_T) (20220331/utaddress-204)
Jan 4 05:04:52 dankstar kernel: lpc_ich: Resource conflict(s) found affecting gpio_ich
Jan 4 05:04:52 dankstar kernel: wl: loading out-of-tree module taints kernel.
Jan 4 05:04:52 dankstar kernel: wl: module license 'MIXED/Proprietary' taints kernel.
Jan 4 05:04:52 dankstar kernel: Disabling lock debugging due to kernel taint
Jan 4 05:04:52 dankstar kernel: eth0: Broadcom BCM43a0 802.11 Hybrid Wireless Controller 6.30.223.271 (r587334)
Jan 4 05:04:52 dankstar kernel:
Jan 4 05:04:52 dankstar kernel: applesmc applesmc.768: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
Jan 4 05:04:52 dankstar udevd[765]: Unable to EVIOCGABS device "/dev/input/event10"
Jan 4 05:04:52 dankstar last message buffered 3 times
Jan 4 05:04:55 dankstar kernel: FAT-fs (sda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Jan 4 05:04:58 dankstar udevd[760]: Unable to EVIOCGABS device "/dev/input/event10"
Jan 4 05:04:58 dankstar last message buffered 3 times
Jan 4 05:05:04 dankstar udevd[751]: specified group 'adbusers' unknown
Jan 4 05:05:05 dankstar bluetoothd[1685]: profiles/audio/vcp.c:vcp_init() D-Bus experimental not enabled
Jan 4 05:05:05 dankstar bluetoothd[1685]: src/plugin.c:plugin_init() Failed to init vcp plugin
Jan 4 05:05:05 dankstar bluetoothd[1685]: profiles/audio/mcp.c:mcp_init() D-Bus experimental not enabled
Jan 4 05:05:05 dankstar bluetoothd[1685]: src/plugin.c:plugin_init() Failed to init mcp plugin
Jan 4 05:05:05 dankstar bluetoothd[1685]: profiles/audio/bap.c:bap_init() D-Bus experimental not enabled
Jan 4 05:05:05 dankstar bluetoothd[1685]: src/plugin.c:plugin_init() Failed to init bap plugin
Jan 4 05:05:05 dankstar bluetoothd[1685]: Failed to set privacy: Rejected (0x0b)
Jan 4 05:05:05 dankstar dnsmasq[1975]: no servers found in /etc/resolv.conf, will retry
Jan 4 05:05:06 dankstar kernel: L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
Jan 4 05:05:06 dankstar kernel: ahci 0000:04:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update
```

There were no further messages until I force-restarted the system, so I reckoned that it must be a Thunderbolt problem. While searching for more info I found a blog entry where someone in the comments says that the thunderbolt module needs to be unloaded before suspend and reloaded after resume. (Bluetooth seemed to cause issues too, so I added it to the script.)
Therefore I tried to:

- Add to `/lib/elogind/system-sleep` (or `/lib64/elogind/system-sleep`, depending on your architecture) a file named `50-suspend-fix` (could be any name):

      #!/bin/sh
      case "${1-}" in
          'pre')
              /etc/rc.d/rc.bluetooth stop
              modprobe -r thunderbolt
              ;;
          'post')
              /etc/rc.d/rc.bluetooth start
              modprobe thunderbolt
              ;;
          *)
              exit 64
              ;;
      esac

- Make sure the file is executable (`chmod +x 50-suspend-fix`).
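The two steps above can be sketched as a small helper. This is only an illustration under stated assumptions: the function names `find_hook_dir` and `install_hook` are made up here, and the optional root prefix exists purely so the helper can be tried against a scratch directory instead of the real system.

```shell
#!/bin/sh
# Hypothetical helper, not part of elogind or Slackware.

# Print whichever elogind hook directory exists (lib64 first),
# looked up under an optional root prefix given as $1.
find_hook_dir() {
    hook_root="${1-}"
    for d in /lib64/elogind/system-sleep /lib/elogind/system-sleep; do
        if [ -d "$hook_root$d" ]; then
            echo "$hook_root$d"
            return 0
        fi
    done
    return 1
}

# Copy the hook script ($1) into that directory with mode 755,
# which covers the "make it executable" step as well.
install_hook() {
    hook_src="$1"
    hook_dir=$(find_hook_dir "${2-}") || {
        echo "no elogind system-sleep directory found" >&2
        return 1
    }
    install -m 755 "$hook_src" "$hook_dir/$(basename "$hook_src")"
}
```

Run as root, `install_hook ./50-suspend-fix` would place the script in the real hook directory; passing a second argument such as `/tmp/scratch` keeps everything under that prefix.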
It seemed to work for some time but failed again when I updated the kernel from 5.19.17 to 6.1.4. Nevertheless, suspend was still working with the Linux Mint kernel, so it must be something in the kernel.
3rd shot: compiling a custom kernel (2023-01-23)
The next thing I tried was to compare the configs of the Mint kernel and the Slackware kernel, make some changes, and compile a custom kernel.
The changes, diffed by `diffconfig`:

    BLK_DEV_IO_TRACE n -> y
    FTRACE_SYSCALLS n -> y
    FUNCTION_PROFILER n -> y
    HIST_TRIGGERS n -> y
    HWLAT_TRACER n -> y
    MMIOTRACE n -> y
    SAMPLES n -> y
    SCHED_TRACER n -> y
    STACK_TRACER n -> y
    TRACER_SNAPSHOT n -> y
    TRACE_EVENT_INJECT n -> y
    +HIST_TRIGGERS_DEBUG n
    +MMIOTRACE_TEST n
    +RING_BUFFER_ALLOW_SWAP y
    +SAMPLE_AUXDISPLAY n
    +SAMPLE_CONFIGFS n
    +SAMPLE_FPROBE n
    +SAMPLE_FTRACE_DIRECT n
    +SAMPLE_FTRACE_DIRECT_MULTI n
    +SAMPLE_HW_BREAKPOINT n
    +SAMPLE_KFIFO n
    +SAMPLE_KOBJECT n
    +SAMPLE_LIVEPATCH n
    +SAMPLE_RPMSG_CLIENT n
    +SAMPLE_TRACE_ARRAY n
    +SAMPLE_TRACE_CUSTOM_EVENTS n
    +SAMPLE_TRACE_EVENTS n
    +SAMPLE_TRACE_PRINTK n
    +SAMPLE_VFIO_MDEV_MBOCHS n
    +SAMPLE_VFIO_MDEV_MDPY n
    +SAMPLE_VFIO_MDEV_MDPY_FB n
    +SAMPLE_VFIO_MDEV_MTTY n
    +SAMPLE_WATCHDOG n
    +TRACER_MAX_TRACE y
    +TRACER_SNAPSHOT_PER_CPU_SWAP y
    +TRACING_MAP y

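For reference, `diffconfig` lives in the kernel source tree as `scripts/diffconfig` and takes the two config files as arguments. When no source tree is at hand, a rough stand-in can be put together from standard tools. The sketch below is an approximation, not a replacement: the function name `config_diff` is made up, its output format differs from the real tool, and it treats `# CONFIG_X is not set` as `CONFIG_X=n`.

```shell
#!/bin/sh
# In a kernel source tree the real tool would be run as:
#   scripts/diffconfig <old config> <new config>
# What follows is a hypothetical, simplified stand-in.

# Print the CONFIG_ lines of a kernel config in sorted order,
# rewriting "# CONFIG_X is not set" as "CONFIG_X=n".
normalize_config() {
    sed -n \
        -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1=n/' \
        -e '/^CONFIG_/p' "$1" | LC_ALL=C sort
}

# List options that differ between two configs:
# "<" lines come from the first file, ">" lines from the second.
config_diff() {
    cfg_old=$(mktemp)
    cfg_new=$(mktemp)
    normalize_config "$1" > "$cfg_old"
    normalize_config "$2" > "$cfg_new"
    diff "$cfg_old" "$cfg_new" | grep '^[<>]'
    rm -f "$cfg_old" "$cfg_new"
}
```

Called as `config_diff slackware.config mint.config` (both file names are placeholders), it prints one line per differing option on each side.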
I’m not going into details of compiling and swapping the kernel, as it’s not that important (and also a huge pain).
This time I had to switch to “hybrid suspend” mode for the lid close action in KDE settings. Aside from the occasional time when it went back into suspend right after waking up, sleep worked OK.
However, this posed some other problems, since I changed more than one variable in the process:
- Would hybrid suspend work with the normal kernel?
- Which kernel config (or combination of kernel configs) is fixing the issue?
Clearly the problem’s asking for another round of debugging. Until next time.
4th shot: time passed (2024-01-05)
Time passed, and I’ve left my lid close action on “hibernate” the whole time. The update this week, however, made resuming from hibernate slower, so I looked into the issue again.
Switching the lid close action back to “Suspend” worked, but the resume time was still slow, around 20 seconds. I checked the elogind hooks left over from the second attempt, and added `wl` (the Broadcom Wi-Fi driver) to the `modprobe -r` / `modprobe` list, and guess what, after a reboot, the computer now resumes instantly! I’m happy that this happened, though I’m not sure what the right solution is. I’ll leave it this way though, as this is my actual work machine.
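A minimal sketch of what the hook from the second attempt might look like with `wl` added. The `MODULES` list, the `run` wrapper and the `DRY_RUN` switch are assumptions introduced for illustration (the original hook has none of them), and the order in which the modules are unloaded is a guess.

```shell
#!/bin/sh
# Hypothetical rework of 50-suspend-fix with wl added to the module list.
MODULES="thunderbolt wl"

# Run a command, or only print it when DRY_RUN=1 (for testing by hand).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 is "pre" before sleeping and "post" after resuming.
sleep_hook() {
    case "${1-}" in
        pre)
            run /etc/rc.d/rc.bluetooth stop
            for m in $MODULES; do run modprobe -r "$m"; done
            ;;
        post)
            run /etc/rc.d/rc.bluetooth start
            for m in $MODULES; do run modprobe "$m"; done
            ;;
        *)
            return 64
            ;;
    esac
}

# Only act when called with arguments, as the sleep hook mechanism does.
if [ "$#" -gt 0 ]; then
    sleep_hook "$@"
fi
```

Running `DRY_RUN=1 sh 50-suspend-fix pre` prints the commands without stopping Bluetooth or unloading any driver, which is a safer way to check the hook on a work machine.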
5th shot: elogind woes (2024-04-21)
A short while after I posted the “4th shot” update, suspend stopped working again. I switched the lid close action back to Hibernate in response.
Around 2024-04-15 Pat upgraded elogind to version 255, which made Wi-Fi stop working after resume. I opened a thread on LinuxQuestions, and it seems other people are experiencing the problem too.
After Pat reverted elogind to 252, everything went back to normal.
But this left me thinking: were the suspend issues related to elogind?
Today (2024-04-21) I tested suspend with pm-graph again, and to my surprise, it worked, though far from instantaneously (around 15 seconds). I’m not counting on it to keep working though; guess we’ll see. See you in a few months.
– ltlnx 2024-04-21