Suppose you have just resumed your trusty ThinkPad T14 laptop from a suspended state, and suddenly one of the displays connected to its Thunderbolt port stops working. Instead of simply rebooting and resolving the issue the easy way, we can take the opportunity to gain a bit of knowledge.
First stop would be to check the dmesg(1) output, which says:
(...)
[1914071.755327] ACPI: PM: Low-level resume complete
[1914071.755374] ACPI: EC: EC started
[1914071.755375] ACPI: PM: Restoring platform NVS memory
[1914071.756339] Enabling non-boot CPUs ...
[1914071.756489] smpboot: Booting Node 0 Processor 1 APIC 0x2
[1914071.759986] CPU1 is up
[1914071.760128] smpboot: Booting Node 0 Processor 2 APIC 0x4
[1914071.763640] CPU2 is up
[1914071.763756] smpboot: Booting Node 0 Processor 3 APIC 0x6
[1914071.767324] CPU3 is up
[1914071.767470] smpboot: Booting Node 0 Processor 4 APIC 0x1
[1914071.768321] CPU4 is up
[1914071.768508] smpboot: Booting Node 0 Processor 5 APIC 0x3
[1914071.769301] CPU5 is up
[1914071.769406] smpboot: Booting Node 0 Processor 6 APIC 0x5
[1914071.770235] CPU6 is up
[1914071.770339] smpboot: Booting Node 0 Processor 7 APIC 0x7
[1914071.771202] CPU7 is up
[1914071.774051] ACPI: PM: Waking up from system sleep state S3
[1914071.800138] ACPI: EC: interrupt unblocked
[1914072.854482] pcieport 0000:00:1c.4: Data Link Layer Link Active not set in 1000 msec
[1914072.854525] pcieport 0000:00:1c.4: pciehp: Slot(4): Card not present
[1914072.854547] xhci_hcd 0000:2b:00.0: remove, state 4
[1914072.854554] usb usb4: USB disconnect, device number 1
[1914072.854571] pcieport 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[1914072.854612] pcieport 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
[1914072.854617] pcieport 0000:04:01.0: Unable to change power state from D3cold to D0, device inaccessible
[1914072.854663] pcieport 0000:04:02.0: Unable to change power state from D3cold to D0, device inaccessible
[1914072.854690] thunderbolt 0000:05:00.0: Unable to change power state from D3cold to D0, device inaccessible
[1914072.854743] xhci_hcd 0000:2b:00.0: Unable to change power state from D3cold to D0, device inaccessible
[1914072.854909] xhci_hcd 0000:2b:00.0: USB bus 4 deregistered
[1914072.854917] xhci_hcd 0000:2b:00.0: remove, state 4
[1914072.854921] usb usb3: USB disconnect, device number 1
[1914072.855164] xhci_hcd 0000:2b:00.0: Host halt failed, -19
[1914072.855169] xhci_hcd 0000:2b:00.0: Host not accessible, reset failed.
[1914072.855368] xhci_hcd 0000:2b:00.0: USB bus 3 deregistered
[1914072.855374] ------------[ cut here ]------------
[1914072.855375] xhci_hcd 0000:2b:00.0: disabling already-disabled device
[1914072.855390] WARNING: CPU: 0 PID: 93 at drivers/pci/pci.c:2250 pci_disable_device+0x88/0x90
[1914072.855402] Modules linked in: veth xt_conntrack bridge stp llc nf_conntrack_netlink xt_addrtype overlay nfs lockd grace sunrpc netfs cdc_ether usbnet mii ccm uinput uhid rfcomm cmac algif_hash algif_skcipher af_alg xt_MASQUERADE xt_tcpudp xt_mark snd_seq_dummy snd_hrtimer snd_seq tun nf_tables ip6table_nat ip6table_filter ip6_tables iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter snd_soc_skl_hda_dsp snd_soc_hdac_hdmi snd_soc_intel_hda_dsp_common snd_sof_probes snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_soc_dmic bnep snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils soundwire_generic_allocation soundwire_bus snd_soc_avs snd_soc_hda_codec intel_uncore_frequency snd_soc_skl intel_uncore_frequency_common intel_pmc_core_pltdrv snd_soc_hdac_hda intel_pmc_core snd_hda_ext_core intel_vsec
[1914072.855485] snd_soc_sst_ipc pmt_telemetry pmt_class snd_soc_sst_dsp intel_tcc_cooling uvcvideo snd_usb_audio snd_soc_acpi_intel_match x86_pkg_temp_thermal videobuf2_vmalloc snd_soc_acpi uvc snd_usbmidi_lib intel_powerclamp snd_soc_core snd_ump btusb rmi_smbus snd_rawmidi rmi_core videobuf2_memops btrtl videobuf2_v4l2 snd_seq_device snd_compress btintel nls_iso8859_1 coretemp videodev ac97_bus mousedev btbcm iTCO_wdt vfat snd_pcm_dmaengine intel_pmc_bxt btmtk videobuf2_common joydev kvm_intel snd_hda_intel fat bluetooth mc mei_hdcp mei_pxp intel_rapl_msr iTCO_vendor_support ee1004 iwlmvm think_lmi snd_intel_dspcfg i915 snd_ctl_led firmware_attributes_class wmi_bmof intel_wmi_thunderbolt snd_intel_sdw_acpi processor_thermal_device_pci_legacy drm_buddy snd_hda_codec processor_thermal_device kvm mac80211 ttm processor_thermal_wt_hint snd_hda_core rapl i2c_algo_bit processor_thermal_rfim snd_hwdep processor_thermal_rapl drm_display_helper thinkpad_acpi snd_pcm intel_rapl_common intel_cstate libarc4 snd_timer cec
[1914072.855568] platform_profile i2c_i801 processor_thermal_wt_req ucsi_acpi spi_nor intel_gtt mei_me processor_thermal_power_floor snd intel_uncore i2c_smbus e1000e mtd iwlwifi psmouse pcspkr soundcore i2c_mux typec_ucsi processor_thermal_mbox video thunderbolt mei typec intel_soc_dts_iosf intel_pch_thermal int3403_thermal roles int340x_thermal_zone cfg80211 int3400_thermal intel_hid acpi_thermal_rel sparse_keymap wmi acpi_pad rfkill mac_hid sg crypto_user fuse loop nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt usbhid cbc encrypted_keys trusted asn1_encoder tee crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel dm_mod sha512_ssse3 serio_raw sha256_ssse3 atkbd sha1_ssse3 rtsx_pci_sdmmc libps2 aesni_intel mmc_core vivaldi_fmap gf128mul nvme crypto_simd cryptd nvme_core spi_intel_pci xhci_pci spi_intel rtsx_pci nvme_auth i8042 xhci_pci_renesas serio
[1914072.855658] CPU: 0 UID: 0 PID: 93 Comm: irq/121-pciehp Not tainted 6.11.0-rc7-1-drm-tip-git-g5908c6f634a1 #1 22f752fa1c57f5cbe7d750770be78af10fa2ca7a
[1914072.855665] Hardware name: LENOVO 20S1SCKC00/20S1SCKC00, BIOS N2XET23W (1.13 ) 08/26/2020
[1914072.855668] RIP: 0010:pci_disable_device+0x88/0x90
[1914072.855675] Code: 48 85 ed 75 07 48 8b af c8 00 00 00 48 8d bb c8 00 00 00 e8 fa 44 20 00 48 89 ea 48 c7 c7 f8 5a ef a8 48 89 c6 e8 e8 d4 95 ff <0f> 0b eb 8c 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[1914072.855679] RSP: 0018:ffffa91000797c88 EFLAGS: 00010282
[1914072.855683] RAX: 0000000000000000 RBX: ffff942042151000 RCX: 0000000000000027
[1914072.855686] RDX: ffff942576221a48 RSI: 0000000000000001 RDI: ffff942576221a40
[1914072.855689] RBP: ffff94204216ff20 R08: 0000000000000000 R09: 0000000000000000
[1914072.855692] R10: 617369642d796461 R11: 0000000000000000 R12: ffffffffc067b080
[1914072.855694] R13: ffffffffc067b0e8 R14: ffffffffc067b0e8 R15: ffffa91000797d66
[1914072.855696] FS: 0000000000000000(0000) GS:ffff942576200000(0000) knlGS:0000000000000000
[1914072.855700] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1914072.855703] CR2: 000055c5d53bf0c6 CR3: 0000000505822003 CR4: 00000000003706f0
[1914072.855706] Call Trace:
[1914072.855709] <TASK>
[1914072.855711] ? pci_disable_device+0x88/0x90
[1914072.855717] ? __warn.cold+0x8e/0xe8
[1914072.855724] ? pci_disable_device+0x88/0x90
[1914072.855734] ? report_bug+0xff/0x140
[1914072.855739] ? console_unlock+0xd7/0x130
[1914072.855745] ? handle_bug+0x3c/0x80
[1914072.855752] ? exc_invalid_op+0x17/0x70
[1914072.855756] ? asm_exc_invalid_op+0x1a/0x20
[1914072.855765] ? pci_disable_device+0x88/0x90
[1914072.855771] ? pci_disable_device+0x88/0x90
[1914072.855776] pci_device_remove+0x3f/0xb0
[1914072.855782] device_release_driver_internal+0x19c/0x200
[1914072.855790] pci_stop_bus_device+0x81/0xb0
[1914072.855796] pci_stop_bus_device+0x2c/0xb0
[1914072.855802] pci_stop_bus_device+0x2c/0xb0
[1914072.855807] pci_stop_and_remove_bus_device+0x12/0x20
[1914072.855812] pciehp_unconfigure_device+0x9f/0x170
[1914072.855818] pciehp_disable_slot+0x67/0x100
[1914072.855822] pciehp_handle_presence_or_link_change+0x77/0x350
[1914072.855827] pciehp_ist+0x140/0x180
[1914072.855832] irq_thread_fn+0x20/0x60
[1914072.855838] irq_thread+0x18a/0x270
[1914072.855843] ? __pfx_irq_thread_fn+0x10/0x10
[1914072.855848] ? __pfx_irq_thread_dtor+0x10/0x10
[1914072.855854] ? __pfx_irq_thread+0x10/0x10
[1914072.855858] kthread+0xcf/0x100
[1914072.855864] ? __pfx_kthread+0x10/0x10
[1914072.855869] ret_from_fork+0x31/0x50
[1914072.855872] ? __pfx_kthread+0x10/0x10
[1914072.855877] ret_from_fork_asm+0x1a/0x30
[1914072.855885] </TASK>
[1914072.855887] ---[ end trace 0000000000000000 ]---
[1914073.356390] ------------[ cut here ]------------
[1914073.356391] thunderbolt 0000:05:00.0: TX ring 0 is still active
[1914073.356413] WARNING: CPU: 0 PID: 93 at drivers/thunderbolt/nhi.c:1138 nhi_shutdown+0x65/0x150 [thunderbolt]
[1914073.356431] Modules linked in: veth xt_conntrack bridge stp llc nf_conntrack_netlink xt_addrtype overlay nfs lockd grace sunrpc netfs cdc_ether usbnet mii ccm uinput uhid rfcomm cmac algif_hash algif_skcipher af_alg xt_MASQUERADE xt_tcpudp xt_mark snd_seq_dummy snd_hrtimer snd_seq tun nf_tables ip6table_nat ip6table_filter ip6_tables iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter snd_soc_skl_hda_dsp snd_soc_hdac_hdmi snd_soc_intel_hda_dsp_common snd_sof_probes snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_soc_dmic bnep snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils soundwire_generic_allocation soundwire_bus snd_soc_avs snd_soc_hda_codec intel_uncore_frequency snd_soc_skl intel_uncore_frequency_common intel_pmc_core_pltdrv snd_soc_hdac_hda intel_pmc_core snd_hda_ext_core intel_vsec
[1914073.356452] snd_soc_sst_ipc pmt_telemetry pmt_class snd_soc_sst_dsp intel_tcc_cooling uvcvideo snd_usb_audio snd_soc_acpi_intel_match x86_pkg_temp_thermal videobuf2_vmalloc snd_soc_acpi uvc snd_usbmidi_lib intel_powerclamp snd_soc_core snd_ump btusb rmi_smbus snd_rawmidi rmi_core videobuf2_memops btrtl videobuf2_v4l2 snd_seq_device snd_compress btintel nls_iso8859_1 coretemp videodev ac97_bus mousedev btbcm iTCO_wdt vfat snd_pcm_dmaengine intel_pmc_bxt btmtk videobuf2_common joydev kvm_intel snd_hda_intel fat bluetooth mc mei_hdcp mei_pxp intel_rapl_msr iTCO_vendor_support ee1004 iwlmvm think_lmi snd_intel_dspcfg i915 snd_ctl_led firmware_attributes_class wmi_bmof intel_wmi_thunderbolt snd_intel_sdw_acpi processor_thermal_device_pci_legacy drm_buddy snd_hda_codec processor_thermal_device kvm mac80211 ttm processor_thermal_wt_hint snd_hda_core rapl i2c_algo_bit processor_thermal_rfim snd_hwdep processor_thermal_rapl drm_display_helper thinkpad_acpi snd_pcm intel_rapl_common intel_cstate libarc4 snd_timer cec
[1914073.356473] platform_profile i2c_i801 processor_thermal_wt_req ucsi_acpi spi_nor intel_gtt mei_me processor_thermal_power_floor snd intel_uncore i2c_smbus e1000e mtd iwlwifi psmouse pcspkr soundcore i2c_mux typec_ucsi processor_thermal_mbox video thunderbolt mei typec intel_soc_dts_iosf intel_pch_thermal int3403_thermal roles int340x_thermal_zone cfg80211 int3400_thermal intel_hid acpi_thermal_rel sparse_keymap wmi acpi_pad rfkill mac_hid sg crypto_user fuse loop nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt usbhid cbc encrypted_keys trusted asn1_encoder tee crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel dm_mod sha512_ssse3 serio_raw sha256_ssse3 atkbd sha1_ssse3 rtsx_pci_sdmmc libps2 aesni_intel mmc_core vivaldi_fmap gf128mul nvme crypto_simd cryptd nvme_core spi_intel_pci xhci_pci spi_intel rtsx_pci nvme_auth i8042 xhci_pci_renesas serio
[1914073.356497] CPU: 0 UID: 0 PID: 93 Comm: irq/121-pciehp Tainted: G W 6.11.0-rc7-1-drm-tip-git-g5908c6f634a1 #1 22f752fa1c57f5cbe7d750770be78af10fa2ca7a
[1914073.356499] Tainted: [W]=WARN
[1914073.356499] Hardware name: LENOVO 20S1SCKC00/20S1SCKC00, BIOS N2XET23W (1.13 ) 08/26/2020
[1914073.356500] RIP: 0010:nhi_shutdown+0x65/0x150 [thunderbolt]
[1914073.356514] Code: ed 75 07 4c 8b af c8 00 00 00 48 81 c7 c8 00 00 00 e8 2f ef 25 e7 89 e9 4c 89 ea 48 c7 c7 c0 23 d7 c0 48 89 c6 e8 1b 7f 9b e6 <0f> 0b 48 8b 43 28 4a 83 3c e0 00 74 39 48 8b 7b 08 4c 8b a7 18 01
[1914073.356515] RSP: 0018:ffffa91000797c78 EFLAGS: 00010282
[1914073.356516] RAX: 0000000000000000 RBX: ffff942081c78028 RCX: 0000000000000027
[1914073.356517] RDX: ffff942576221a48 RSI: 0000000000000001 RDI: ffff942576221a40
[1914073.356518] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffa91000797a40
[1914073.356519] R10: ffffffffa9673658 R11: 0000000000000003 R12: 0000000000000000
[1914073.356519] R13: ffff94204216fd60 R14: ffffffffc0d41da8 R15: ffffa91000797d66
[1914073.356576] FS: 0000000000000000(0000) GS:ffff942576200000(0000) knlGS:0000000000000000
[1914073.356577] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1914073.356578] CR2: 000055c5d53bf0c6 CR3: 0000000505822003 CR4: 00000000003706f0
[1914073.356579] Call Trace:
[1914073.356579] <TASK>
[1914073.356580] ? nhi_shutdown+0x65/0x150 [thunderbolt 58f0a0e9b43e695aa4dbbdfc37d6c930981cb43f]
[1914073.356593] ? __warn.cold+0x8e/0xe8
[1914073.356595] ? nhi_shutdown+0x65/0x150 [thunderbolt 58f0a0e9b43e695aa4dbbdfc37d6c930981cb43f]
[1914073.356609] ? report_bug+0xff/0x140
[1914073.356610] ? console_unlock+0xd7/0x130
[1914073.356612] ? handle_bug+0x3c/0x80
[1914073.356614] ? exc_invalid_op+0x17/0x70
[1914073.356615] ? asm_exc_invalid_op+0x1a/0x20
[1914073.356617] ? nhi_shutdown+0x65/0x150 [thunderbolt 58f0a0e9b43e695aa4dbbdfc37d6c930981cb43f]
[1914073.356631] ? nhi_shutdown+0x65/0x150 [thunderbolt 58f0a0e9b43e695aa4dbbdfc37d6c930981cb43f]
[1914073.356644] pci_device_remove+0x3f/0xb0
[1914073.356646] device_release_driver_internal+0x19c/0x200
[1914073.356648] pci_stop_bus_device+0x81/0xb0
[1914073.356650] pci_stop_bus_device+0x2c/0xb0
[1914073.356651] pci_stop_bus_device+0x3d/0xb0
[1914073.356653] pci_stop_and_remove_bus_device+0x12/0x20
[1914073.356654] pciehp_unconfigure_device+0x9f/0x170
[1914073.356656] pciehp_disable_slot+0x67/0x100
[1914073.356657] pciehp_handle_presence_or_link_change+0x77/0x350
[1914073.356659] pciehp_ist+0x140/0x180
[1914073.356660] irq_thread_fn+0x20/0x60
[1914073.356662] irq_thread+0x18a/0x270
[1914073.356663] ? __pfx_irq_thread_fn+0x10/0x10
[1914073.356664] ? __pfx_irq_thread_dtor+0x10/0x10
[1914073.356666] ? __pfx_irq_thread+0x10/0x10
[1914073.356668] kthread+0xcf/0x100
[1914073.356669] ? __pfx_kthread+0x10/0x10
[1914073.356671] ret_from_fork+0x31/0x50
[1914073.356672] ? __pfx_kthread+0x10/0x10
[1914073.356673] ret_from_fork_asm+0x1a/0x30
[1914073.356675] </TASK>
[1914073.356676] ---[ end trace 0000000000000000 ]---
[1914073.357022] pci_bus 0000:05: busn_res: [bus 05] is released
[1914073.357052] pci_bus 0000:06: busn_res: [bus 06-2a] is released
[1914073.357355] pci_bus 0000:2b: busn_res: [bus 2b] is released
[1914073.357467] pci_bus 0000:04: busn_res: [bus 04-2b] is released
[1914073.357982] ACPI: EC: event unblocked
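That is indeed a lot of dense information, but it does suggest an issue with PCI devices during resume: the root port 0000:00:1c.4 reports that its link did not come up ("Data Link Layer Link Active not set"), and the devices at 03:00.0, 04:00.0, 04:01.0, 04:02.0, 05:00.0 and 2b:00.0 could not be brought from D3cold back to D0. Making sense of it requires some knowledge of several interconnected domains, including ACPI, PCI, USB and Thunderbolt.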
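When the log is this long, the devices that failed to power up can be pulled out with a small filter. This is a minimal sketch, assuming the message format shown in the log above (timestamp, driver name, PCI address, then the D3cold to D0 error); the function name d3cold_failures is hypothetical.

```shell
# Hypothetical helper: read kernel log lines (e.g. from dmesg) on stdin and
# print the unique PCI addresses of devices the kernel failed to bring from
# D3cold back to D0. Assumes lines shaped like:
#   [timestamp] <driver> <domain:bus:dev.fn>: Unable to change power state from D3cold to D0, ...
d3cold_failures() {
    sed -n 's/^\[[^]]*] [^ ]* \([0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]\): Unable to change power state from D3cold to D0.*/\1/p' |
        LC_ALL=C sort -u
}
```

Run as root with dmesg | d3cold_failures; for the log above it should list 0000:03:00.0, 0000:04:00.0, 0000:04:01.0, 0000:04:02.0, 0000:05:00.0 and 0000:2b:00.0.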
Thunderbolt 3 carries PCI Express, so a Thunderbolt controller shows up as a set of PCI devices. We can use lspci(1) to get a list of all detected PCI devices:
00:00.0 Host bridge: Intel Corporation Comet Lake-U v1 4c Host Bridge/DRAM Controller (rev 0c)
00:02.0 VGA compatible controller: Intel Corporation CometLake-U GT2 [UHD Graphics] (rev 02)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 0c)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Comet Lake Thermal Subsytem
00:14.0 USB controller: Intel Corporation Comet Lake PCH-LP USB 3.1 xHCI Host Controller
00:14.2 RAM memory: Intel Corporation Comet Lake PCH-LP Shared SRAM
00:14.3 Network controller: Intel Corporation Comet Lake PCH-LP CNVi WiFi
00:16.0 Communication controller: Intel Corporation Comet Lake Management Engine Interface
00:1c.0 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #1 (rev f0)
00:1c.4 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #5 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #9 (rev f0)
00:1d.4 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #13 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Comet Lake PCH-LP LPC Premium Controller/eSPI Controller
00:1f.3 Audio device: Intel Corporation Comet Lake PCH-LP cAVS
00:1f.4 SMBus: Intel Corporation Comet Lake PCH-LP SMBus Host Controller
00:1f.5 Serial bus controller: Intel Corporation Comet Lake SPI (flash) Controller
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (10) I219-LM
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader (rev 01)
-03:00.0 PCI bridge: Intel Corporation JHL6240 Thunderbolt 3 Bridge (Low Power) [Alpine Ridge LP 2016] (rev 01)
-04:00.0 PCI bridge: Intel Corporation JHL6240 Thunderbolt 3 Bridge (Low Power) [Alpine Ridge LP 2016] (rev 01)
-04:01.0 PCI bridge: Intel Corporation JHL6240 Thunderbolt 3 Bridge (Low Power) [Alpine Ridge LP 2016] (rev 01)
-04:02.0 PCI bridge: Intel Corporation JHL6240 Thunderbolt 3 Bridge (Low Power) [Alpine Ridge LP 2016] (rev 01)
-05:00.0 System peripheral: Intel Corporation JHL6240 Thunderbolt 3 NHI (Low Power) [Alpine Ridge LP 2016] (rev 01)
-2b:00.0 USB controller: Intel Corporation JHL6240 Thunderbolt 3 USB 3.1 Controller (Low Power) [Alpine Ridge LP 2016] (rev 01)
2e:00.0 Non-Volatile memory controller: Sandisk Corp SanDisk Extreme Pro / WD Black SN750 / PC SN730 / Red SN700 NVMe SSD
PCI devices form a hierarchy with its own topology, in which bridges connect further buses. The output above is a flat list that does not show how the devices are connected, so we need to look at the underlying topology, e.g. with lspci -t. The difference between the broken state (lines marked with -) and the working state (lines marked with +) is shown here:
+-14.3
+-16.0
+-1c.0-[02]----00.0
- +-1c.4-[03-2b]--
+ +-1c.4-[03-2b]----00.0-[04-2b]--+-00.0-[05]----00.0
+ | +-01.0-[06-2a]--
+ | \-02.0-[2b]----00.0
+-1d.0-[2d]--
+-1d.4-[2e]----00.0
+-1f.0
From the above output we see that 1c.4 is a PCI bridge (a PCIe root port) whose secondary bus 03 holds another (Thunderbolt) PCI bridge at address 03:00.0; further down that hierarchy sit the Thunderbolt NHI at 05:00.0 and the Thunderbolt USB controller at 2b:00.0. We can confirm the bus range by increasing the verbosity of lspci on the root port:
~ $ sudo lspci -s 00:1c.4 -v
00:1c.4 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #5 (rev f0) (prog-if 00 [Normal decode])
Subsystem: Lenovo Device 22b1
Flags: bus master, fast devsel, latency 0, IRQ 121
Bus: primary=00, secondary=03, subordinate=2b, sec-latency=0
                 ^^^^^^^^^^^^  ^^^^^^^^^^^^^^ here!
I/O behind bridge: 5000-6fff [size=8K] [16-bit]
Memory behind bridge: c0000000-cc1fffff [size=194M] [32-bit]
Prefetchable memory behind bridge: cc200000-e81fffff [size=448M] [32-bit]
Capabilities: [40] Express Root Port (Slot+), IntMsgNum 0
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [90] Subsystem: Lenovo Device 22b1
Capabilities: [a0] Power Management version 3
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Access Control Services
Capabilities: [150] Precision Time Measurement
Capabilities: [220] Secondary PCI Express
Capabilities: [250] Downstream Port Containment
Kernel driver in use: pcieport
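The bus numbers can also be pulled out of that output in a script. A minimal sketch, assuming the lspci -v format shown above; bus_range and bus_span are hypothetical helper names. For 00:1c.4 the window runs from bus 03 to 2b, i.e. 41 bus numbers, most of which (06-2a) sit behind the empty bridge 04:01.0, presumably reserved for devices hot-plugged later.

```shell
# Hypothetical helper: read `lspci -v` output for a bridge on stdin and print
# its secondary and subordinate bus numbers (hex), e.g. "03 2b".
bus_range() {
    sed -n 's/.*Bus: primary=[0-9a-f]*, secondary=\([0-9a-f]*\), subordinate=\([0-9a-f]*\).*/\1 \2/p'
}

# Hypothetical helper: count the bus numbers in the window
# [secondary, subordinate]; both arguments are hex without a 0x prefix.
bus_span() {
    echo $(( 0x$2 - 0x$1 + 1 ))
}
```

For example, set -- $(lspci -s 00:1c.4 -v | bus_range) followed by bus_span "$1" "$2" should print 41 on this machine.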
The secondary bus seems to have gone into some kind of bad state. We could try to remove the parent device 00:1c.4 and then rescan all PCI buses:
~ # echo 1 | tee /sys/class/pci_bus/0000:03/device/remove
1
[2000933.962438] pci_bus 0000:03: busn_res: [bus 03-2b] is released
~ # echo 1 | tee /sys/bus/pci/rescan
1
[2000997.112484] pci_bus 0000:03: busn_res: [bus 03-2b] is released
[2000999.696873] pci 0000:00:1c.4: [8086:02bc] type 01 class 0x060400 PCIe Root Port
[2000999.696941] pci 0000:00:1c.4: PCI bridge to [bus 03-2b]
[2000999.696949] pci 0000:00:1c.4: bridge window [io 0x5000-0x6fff]
[2000999.696954] pci 0000:00:1c.4: bridge window [mem 0xc0000000-0xcc1fffff]
[2000999.696965] pci 0000:00:1c.4: bridge window [mem 0xcc200000-0xe81fffff 64bit pref]
[2000999.697075] pci 0000:00:1c.4: PME# supported from D0 D3hot D3cold
[2000999.697158] pci 0000:00:1c.4: PTM enabled (root), 4ns granularity
[2000999.710763] pci 0000:00:1c.4: PCI bridge to [bus 03-2b]
[2000999.710835] pci 0000:00:1c.4: bridge window [mem 0xc0000000-0xcc1fffff]: assigned
[2000999.710839] pci 0000:00:1c.4: bridge window [mem 0xcc200000-0xe81fffff 64bit pref]: assigned
[2000999.710843] pci 0000:00:1c.4: bridge window [io 0x5000-0x6fff]: assigned
[2000999.710848] pci 0000:00:1c.4: PCI bridge to [bus 03-2b]
[2000999.710864] pci 0000:00:1c.4: bridge window [io 0x5000-0x6fff]
[2000999.710871] pci 0000:00:1c.4: bridge window [mem 0xc0000000-0xcc1fffff]
[2000999.710877] pci 0000:00:1c.4: bridge window [mem 0xcc200000-0xe81fffff 64bit pref]
[2000999.711248] pcieport 0000:00:1c.4: PME: Signaling with IRQ 121
[2000999.711319] pcieport 0000:00:1c.4: pciehp: Slot #4 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
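The two writes above can be wrapped in a small function. This is a sketch assuming root privileges; the function name and the optional sysfs root parameter (there only so it can be tried against a fake directory tree) are hypothetical. It writes to /sys/bus/pci/devices/<address>/remove, which refers to the same device as /sys/class/pci_bus/0000:03/device/remove.

```shell
# Hypothetical helper: hot-remove a PCI device (and everything behind it),
# then ask the kernel to rescan all PCI buses.
# $1: device address, e.g. 0000:00:1c.4
# $2: PCI sysfs root (default /sys/bus/pci); override only for testing
remove_and_rescan() {
    local root="${2:-/sys/bus/pci}"
    local dev="$root/devices/$1"
    if [ ! -e "$dev" ]; then
        echo "no such PCI device: $1" >&2
        return 1
    fi
    echo 1 > "$dev/remove" || return 1
    echo 1 > "$root/rescan"
}
```

As root, remove_and_rescan 0000:00:1c.4 should reproduce the steps above.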
Unfortunately, removing and rescanning did not help, as no devices are detected behind the bridge:
-[0000:00]-+-00.0
...
+-1c.4-[03-2b]--
...
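The same empty subtree can be checked directly in sysfs, where each PCI device's directory is nested inside that of its upstream bridge. A minimal sketch; devices_below and its optional directory parameter are hypothetical. In the broken state above it would be expected to print nothing for 0000:00:1c.4, while on a working system it should list the Thunderbolt devices from the tree.

```shell
# Hypothetical helper: list every PCI device below a bridge by walking sysfs,
# where a device's directory sits inside its upstream bridge's directory.
# $1: bridge address, e.g. 0000:00:1c.4
# $2: directory of device links (default /sys/bus/pci/devices); for testing
devices_below() {
    local dir="${2:-/sys/bus/pci/devices}" top
    [ -e "$dir/$1" ] || { echo "no such PCI device: $1" >&2; return 1; }
    # /sys/bus/pci/devices/<address> is a symlink into the /sys/devices tree
    top=$(realpath "$dir/$1") || return 1
    # Keep only directory names shaped like domain:bus:device.function
    find "$top" -mindepth 1 -type d \
        -name '[0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f].[0-7]' |
        sed 's#.*/##' | LC_ALL=C sort
}
```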
Resetting the root port does not bear fruit either:
~ # echo 1 | tee /sys/class/pci_bus/0000:03/device/reset
1
We can, however, confirm that the root port is in a low-power state:
~ # realpath /sys/class/pci_bus/0000:03/device/power_state
/sys/devices/pci0000:00/0000:00:1c.4/power_state
~ # cat /sys/class/pci_bus/0000:03/device/power_state
D3cold
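Since devices behind a bridge generally cannot be powered while the bridge itself is in D3cold, it can help to print the power state of a device together with all of its upstream bridges. A sketch, assuming the sysfs layout seen in the realpath output above; power_chain and its optional directory parameter are hypothetical.

```shell
# Hypothetical helper: print "<address> <power_state>" for a PCI device and
# each upstream bridge, walking up its sysfs path until the host bridge.
# $1: device address, e.g. 0000:03:00.0
# $2: directory of device links (default /sys/bus/pci/devices); for testing
power_chain() {
    local dir="${2:-/sys/bus/pci/devices}" path
    [ -e "$dir/$1" ] || { echo "no such PCI device: $1" >&2; return 1; }
    path=$(realpath "$dir/$1") || return 1
    while :; do
        case ${path##*/} in
            [0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f].[0-7])
                # power_state holds e.g. D0, D3hot or D3cold
                echo "${path##*/} $(cat "$path/power_state" 2>/dev/null || echo unknown)"
                ;;
            *)
                break  # reached e.g. pci0000:00, the host bridge
                ;;
        esac
        path=${path%/*}
    done
}
```

On a working system, power_chain 0000:2b:00.0 would walk from the Thunderbolt USB controller through 04:02.0 and 03:00.0 up to 00:1c.4.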
Let’s try to put the root port into the D3hot power state and then into D0 using the setpci(1) command:
~ # setpci -s 0000:00:1c.4 CAP_PM+4.b=03
~ # cat /sys/class/pci_bus/0000:03/device/power_state
D3hot
~ # setpci -s 0000:00:1c.4 CAP_PM+4.b=00
~ # cat /sys/class/pci_bus/0000:03/device/power_state
D3hot
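As far as I can tell, power_state reports the kernel's own record of the device state rather than a fresh reading from the hardware, so it can be useful to read back the PowerState field of the Power Management Control/Status Register (PMCSR), the register at CAP_PM+4 that the setpci commands above write to. Its two lowest bits encode D0 (00), D1 (01), D2 (10) and D3hot (11); D3cold cannot be read this way, because a device without power does not answer config reads, which then typically return all ones. A minimal decoding sketch; pmcsr_state is a hypothetical name.

```shell
# Hypothetical helper: decode the PowerState field (bits 1:0) of a PMCSR value
# as printed by `setpci -s <address> CAP_PM+4.w`, e.g. "0008" or "0003".
pmcsr_state() {
    local v=$((0x$1))
    # All ones usually means the config read was not answered at all
    if [ "$v" -eq 65535 ]; then
        echo inaccessible
        return
    fi
    case $((v & 3)) in
        0) echo D0 ;;
        1) echo D1 ;;
        2) echo D2 ;;
        3) echo D3hot ;;
    esac
}
```

Usage, as root: pmcsr_state "$(setpci -s 0000:00:1c.4 CAP_PM+4.w)".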
As we can see, we could only get the root port into D3hot; the attempt to change it to D0 did not work. There seems to be a patch for an unrelated device which describes how, in some cases, a “PCI bridge cannot be recovered from D3cold mode”. The patch also shows how to make such changes properly: not only by unbinding the pcieport driver, but also by executing acpidbg commands. Trying the unbinding part of that approach:
~ # echo "0000:00:1c.4" > /sys/bus/pci/drivers/pcieport/unbind
~ # setpci -s 0000:00:1c.4 CAP_PM+4.b=00
~ # cat /sys/class/pci_bus/0000:03/device/power_state
unknown
~ # echo "0000:00:1c.4" > /sys/bus/pci/drivers/pcieport/bind
~ # cat /sys/class/pci_bus/0000:03/device/power_state
D3cold
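Before writing an address into a driver's unbind file, it helps to check which driver actually owns the device; sysfs exposes this as the driver symlink in the device's directory, which disappears while the device is unbound. A sketch with hypothetical helper names (bound_driver, rebind) and an optional sysfs root parameter for testing; rebind simply repeats the unbind and bind steps from above for whichever driver is bound.

```shell
# Hypothetical helper: print the name of the driver bound to a PCI device,
# or "none" if no driver is bound.
# $1: device address, e.g. 0000:00:1c.4
# $2: PCI sysfs root (default /sys/bus/pci); override only for testing
bound_driver() {
    local link="${2:-/sys/bus/pci}/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Hypothetical helper: unbind a device from its current driver and bind it
# again, via <root>/drivers/<driver>/unbind and .../bind. Requires root.
rebind() {
    local root="${2:-/sys/bus/pci}" drv
    drv=$(bound_driver "$1" "$root")
    if [ "$drv" = none ]; then
        echo "$1: no driver bound" >&2
        return 1
    fi
    echo "$1" > "$root/drivers/$drv/unbind" || return 1
    echo "$1" > "$root/drivers/$drv/bind"
}
```

On this machine bound_driver 0000:00:1c.4 should print pcieport, matching the lspci -v output earlier.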
Additional information can be found in the bug report: https://bugzilla.kernel.org/show_bug.cgi?id=215742
Unfortunately, none of this resolved the issue. At this point I simply gave up and rebooted the system, after which the port worked normally again.
Update (2024-10-20):
It would appear that the D3cold state of the device behind /sys/class/pci_bus/0000:03/device (the 0000:00:1c.4 root port, as resolved earlier) is actually the default on my running system with working Thunderbolt:
~ $ cat /sys/class/pci_bus/0000:03/device/power_state
D3cold
-[0000:00]-+-00.0
...
+-1c.4-[03-2b]----00.0-[04-2b]--+-00.0-[05]----00.0
| +-01.0-[06-2a]--
| \-02.0-[2b]----00.0
...
~ $ realpath /sys/class/pci_bus/0000:2b/device/power_state
/sys/devices/pci0000:00/0000:00:1c.4/0000:03:00.0/0000:04:02.0/power_state
Perhaps next time I could try to bind the 0000:04:02.0 device, the bridge directly in front of bus 2b and thus of the Thunderbolt USB controller 2b:00.0.
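The realpath lookups above work because each /sys/class/pci_bus/<bus>/device link points at the bridge in front of that bus: for bus 0000:03 that is the root port 0000:00:1c.4, and for bus 0000:2b it is 0000:04:02.0. A small sketch of that lookup; bridge_for_bus and its optional directory parameter are hypothetical.

```shell
# Hypothetical helper: print the address of the bridge whose secondary bus is
# the given bus, by resolving /sys/class/pci_bus/<bus>/device.
# $1: bus, e.g. 0000:2b
# $2: pci_bus class directory (default /sys/class/pci_bus); for testing
bridge_for_bus() {
    local target
    target=$(realpath "${2:-/sys/class/pci_bus}/$1/device") || return 1
    basename "$target"
}
```

On this system, bridge_for_bus 0000:2b should print 0000:04:02.0, in line with the realpath output above.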
Other resources: