Bug #42707

closed

Kernel 5.0 CephFS client hang

Added by Марк Коренберг over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite: fs, kcephfs
Component(FS): kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

$ uname -a
Linux Dell-Latitude-ideco 5.0.0-32-generic #34~18.04.2-Ubuntu SMP Thu Oct 10 10:36:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Sometimes the following happens during simple filesystem operations:

[ 1129.553410] cache_from_obj: Wrong slab cache. inode_cache but object is from ceph_inode_info
[ 1129.553428] WARNING: CPU: 3 PID: 0 at /build/linux-hwe-iAAoxd/linux-hwe-5.0.0/mm/slab.h:380 kmem_cache_free+0x189/0x1d0
[ 1129.553429] Modules linked in: ceph libceph fscache rfcomm devlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 l2tp_ppp af_key l2tp_netlink xfrm_algo l2tp_core pppox ccm vxlan ip6_udp_tunnel udp_tunnel aufs overlay cmac bnep dm_crypt binfmt_misc arc4 nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel dell_laptop snd_hda_codec aesni_intel ledtrig_audio snd_hda_core snd_hwdep dell_smm_hwmon snd_pcm aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_rapl_perf ath10k_pci ath10k_core snd_seq_midi ath snd_seq_midi_event dell_wmi snd_rawmidi uvcvideo mac80211 dell_smbios btusb input_leds btrtl btbcm dcdbas btintel videobuf2_vmalloc videobuf2_memops dell_wmi_descriptor joydev sparse_keymap wmi_bmof bluetooth serio_raw videobuf2_v4l2 snd_seq
[ 1129.553482]  videobuf2_common videodev cfg80211 ecdh_generic media snd_seq_device snd_timer lpc_ich snd mei_me processor_thermal_device mei soundcore intel_soc_dts_iosf int3403_thermal mac_hid int3400_thermal int3402_thermal acpi_thermal_rel int340x_thermal_zone dell_rbtn acpi_pad sch_fq_codel parport_pc ppdev nf_tables nfnetlink lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c hid_generic usbhid hid i915 kvmgt vfio_mdev mdev vfio_iommu_type1 vfio kvm irqbypass i2c_algo_bit drm_kms_helper syscopyarea sysfillrect ahci psmouse sysimgblt fb_sys_fops sdhci_pci drm libahci cqhci e1000e sdhci wmi video
[ 1129.553524] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-32-generic #34~18.04.2-Ubuntu
[ 1129.553525] Hardware name: Dell Inc. Latitude E7250/0V8RX3, BIOS A19 01/23/2018
[ 1129.553529] RIP: 0010:kmem_cache_free+0x189/0x1d0
[ 1129.553532] Code: 84 bf fe ff ff 4c 3b a0 d0 00 00 00 74 41 48 8b 48 58 49 8b 54 24 58 48 c7 c6 c0 bd a4 8d 48 c7 c7 78 bb d0 8d e8 5a 15 e8 ff <0f> 0b e9 93 fe ff ff 48 89 fe 41 b8 01 00 00 00 48 89 d9 48 89 da
[ 1129.553534] RSP: 0018:ffff985d56383ec0 EFLAGS: 00010282
[ 1129.553536] RAX: 0000000000000050 RBX: ffff985c7b864600 RCX: 0000000000000000
[ 1129.553537] RDX: 0000000000000000 RSI: ffff985d56396448 RDI: ffff985d56396448
[ 1129.553539] RBP: ffff985d56383ed8 R08: 000000000000043d R09: ffffffff8e579960
[ 1129.553540] R10: ffff985d56383ec0 R11: ffff985d56383d30 R12: ffff985d55d98180
[ 1129.553542] R13: ffff985d56383f00 R14: 000000000000000a R15: 0000000000000202
[ 1129.553544] FS:  0000000000000000(0000) GS:ffff985d56380000(0000) knlGS:0000000000000000
[ 1129.553546] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1129.553547] CR2: 000020401a711888 CR3: 00000001ba60e002 CR4: 00000000003606e0
[ 1129.553549] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1129.553551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1129.553551] Call Trace:
[ 1129.553554]  <IRQ>
[ 1129.553559]  i_callback+0x1c/0x20
[ 1129.553562]  rcu_process_callbacks+0x252/0x440
[ 1129.553567]  __do_softirq+0xe4/0x2f3
[ 1129.553571]  irq_exit+0xc5/0xd0
[ 1129.553574]  smp_apic_timer_interrupt+0x79/0x140
[ 1129.553577]  apic_timer_interrupt+0xf/0x20
[ 1129.553578]  </IRQ>
[ 1129.553583] RIP: 0010:cpuidle_enter_state+0xa9/0x440
[ 1129.553585] Code: 3d 9c 41 da 72 e8 c7 39 86 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 f8 44 86 ff 80 7d d3 00 0f 85 e6 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 ed 0f 89 ff 01 00 00 41 c7 44 24 08 00 00 00 00 48 83 c4 18
[ 1129.553586] RSP: 0018:ffffbd7840d07e60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 1129.553589] RAX: ffff985d563a3200 RBX: ffffffff8e153e20 RCX: 000000000000001f
[ 1129.553590] RDX: 00000106fe9638d4 RSI: 0000000037c7f175 RDI: 0000000000000000
[ 1129.553591] RBP: ffffbd7840d07ea0 R08: 0000000000000004 R09: 0000000000022ac0
[ 1129.553593] R10: ffffbd7840d07e30 R11: 00000000000000ab R12: ffff985d563add00
[ 1129.553594] R13: 0000000000000003 R14: ffffffff8e153f58 R15: 00000106fe9638d4
[ 1129.553600]  cpuidle_enter+0x17/0x20
[ 1129.553603]  call_cpuidle+0x23/0x40
[ 1129.553606]  do_idle+0x204/0x280
[ 1129.553610]  cpu_startup_entry+0x1d/0x20
[ 1129.553613]  start_secondary+0x1ab/0x200
[ 1129.553617]  secondary_startup_64+0xa4/0xb0
[ 1129.553619] ---[ end trace 33e501f46ae14015 ]---
[ 1129.553622] cache_from_obj: Wrong slab cache. inode_cache but object is from ceph_inode_info
[ 1129.616811] WARNING: CPU: 3 PID: 7630 at /build/linux-hwe-iAAoxd/linux-hwe-5.0.0/kernel/rcu/tree.c:2499 rcu_process_callbacks+0x421/0x440
[ 1129.616815] Modules linked in: ceph libceph fscache rfcomm devlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 l2tp_ppp af_key l2tp_netlink xfrm_algo l2tp_core pppox ccm vxlan ip6_udp_tunnel udp_tunnel aufs overlay cmac bnep dm_crypt binfmt_misc arc4 nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel dell_laptop snd_hda_codec aesni_intel ledtrig_audio snd_hda_core snd_hwdep dell_smm_hwmon snd_pcm aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_rapl_perf ath10k_pci ath10k_core snd_seq_midi ath snd_seq_midi_event dell_wmi snd_rawmidi uvcvideo mac80211 dell_smbios btusb input_leds btrtl btbcm dcdbas btintel videobuf2_vmalloc videobuf2_memops dell_wmi_descriptor joydev sparse_keymap wmi_bmof bluetooth serio_raw videobuf2_v4l2 snd_seq
[ 1129.616883]  videobuf2_common videodev cfg80211 ecdh_generic media snd_seq_device snd_timer lpc_ich snd mei_me processor_thermal_device mei soundcore intel_soc_dts_iosf int3403_thermal mac_hid int3400_thermal int3402_thermal acpi_thermal_rel int340x_thermal_zone dell_rbtn acpi_pad sch_fq_codel parport_pc ppdev nf_tables nfnetlink lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c hid_generic usbhid hid i915 kvmgt vfio_mdev mdev vfio_iommu_type1 vfio kvm irqbypass i2c_algo_bit drm_kms_helper syscopyarea sysfillrect ahci psmouse sysimgblt fb_sys_fops sdhci_pci drm libahci cqhci e1000e sdhci wmi video
[ 1129.616933] CPU: 3 PID: 7630 Comm: SCTP timer Tainted: G        W         5.0.0-32-generic #34~18.04.2-Ubuntu
[ 1129.616935] Hardware name: Dell Inc. Latitude E7250/0V8RX3, BIOS A19 01/23/2018
[ 1129.616942] RIP: 0010:rcu_process_callbacks+0x421/0x440
[ 1129.616946] Code: 48 8b 05 ca f7 55 01 48 89 83 98 00 00 00 e9 b3 fe ff ff 0f 0b e9 c7 fc ff ff 4c 89 f6 4c 89 e7 e8 a4 c8 92 00 e9 5c fc ff ff <0f> 0b e9 d5 fe ff ff 0f 0b e9 da fd ff ff e8 2c 1b f8 ff 66 90 66
[ 1129.616948] RSP: 0000:ffff985d56383ef8 EFLAGS: 00010002
[ 1129.616952] RAX: ffffffffffffd800 RBX: ffff985d563a3ec0 RCX: 0000000000005f01
[ 1129.616954] RDX: 0000000000000002 RSI: ffff985d56383f00 RDI: ffff985d563a3ef0
[ 1129.616956] RBP: ffff985d56383f50 R08: 000000000002a550 R09: ffffffff8d2d4bee
[ 1129.616958] R10: 0000000000000001 R11: 0000000000000000 R12: ffff985d563a3ef0
[ 1129.616960] R13: ffff985d56383f00 R14: 0000000000000246 R15: 0000000000000202
[ 1129.616963] FS:  00007f69e9c1d700(0000) GS:ffff985d56380000(0000) knlGS:0000000000000000
[ 1129.616966] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1129.616968] CR2: 00007f9c6424f000 CR3: 000000014e53c006 CR4: 00000000003606e0
[ 1129.616970] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1129.616972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1129.616974] Call Trace:
[ 1129.616977]  <IRQ>
[ 1129.616986]  __do_softirq+0xe4/0x2f3
[ 1129.616993]  irq_exit+0xc5/0xd0
[ 1129.616997]  smp_apic_timer_interrupt+0x79/0x140
[ 1129.617001]  apic_timer_interrupt+0xf/0x20
[ 1129.617003]  </IRQ>
[ 1129.617007] RIP: 0033:0x7f6a106fd047
[ 1129.617010] Code: fb 48 83 ec 10 e8 b9 96 01 00 4d 89 f0 41 89 c1 4d 89 ea 4c 89 e2 48 89 ee 89 df b8 17 00 00 00 0f 05 48 3d 00 f0 ff ff 77 33 <44> 89 cf 89 44 24 0c e8 ed 96 01 00 8b 44 24 0c 48 83 c4 10 5b 5d
[ 1129.617012] RSP: 002b:00007f69e9c1ce20 EFLAGS: 00000207 ORIG_RAX: ffffffffffffff13
[ 1129.617016] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f6a106fd03f
[ 1129.617017] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1129.617019] RBP: 0000000000000000 R08: 00007f69e9c1ce70 R09: 0000000000000000
[ 1129.617021] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[ 1129.617023] R13: 0000000000000000 R14: 00007f69e9c1ce70 R15: 00007f6a052668d0
[ 1129.617027] ---[ end trace 33e501f46ae14016 ]---
[ 1136.130317] general protection fault: 0000 [#1] SMP PTI
[ 1136.130328] CPU: 3 PID: 7107 Comm: kworker/3:0 Tainted: G        W         5.0.0-32-generic #34~18.04.2-Ubuntu
[ 1136.130331] Hardware name: Dell Inc. Latitude E7250/0V8RX3, BIOS A19 01/23/2018
[ 1136.130366] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[ 1136.130377] RIP: 0010:kmem_cache_alloc+0x88/0x1c0
[ 1136.130382] Code: 65 49 8b 50 08 65 4c 03 05 6d c3 38 73 4d 8b 30 4d 85 f6 0f 84 f7 00 00 00 41 8b 5f 20 49 8b 3f 48 8d 4a 01 4c 89 f0 4c 01 f3 <48> 33 1b 49 33 9f 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74 bd
[ 1136.130386] RSP: 0018:ffffbd78481cfaf8 EFLAGS: 00010296
[ 1136.130391] RAX: c0064d11e8bd815c RBX: c0064d11e8bd86e4 RCX: 0000000000000119
[ 1136.130395] RDX: 0000000000000118 RSI: 0000000000600040 RDI: 0000451ae9a63d40
[ 1136.130398] RBP: ffffbd78481cfb28 R08: ffffdd783fde3d40 R09: ffff985c42ab2e60
[ 1136.130401] R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000600040
[ 1136.130404] R13: ffff985d4dd24000 R14: c0064d11e8bd815c R15: ffff985d4dd24000
[ 1136.130408] FS:  0000000000000000(0000) GS:ffff985d56380000(0000) knlGS:0000000000000000
[ 1136.130412] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1136.130415] CR2: 00007efffd53b180 CR3: 00000001ba60e004 CR4: 00000000003606e0
[ 1136.130419] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1136.130422] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1136.130424] Call Trace:
[ 1136.130450]  ? ceph_alloc_inode+0x1d/0x3f0 [ceph]
[ 1136.130469]  ? ceph_show_options+0x410/0x410 [ceph]
[ 1136.130486]  ceph_alloc_inode+0x1d/0x3f0 [ceph]
[ 1136.130492]  alloc_inode+0x20/0x90
[ 1136.130498]  iget5_locked+0x50/0x90
[ 1136.130514]  ? ceph_ino_compare+0x30/0x30 [ceph]
[ 1136.130531]  ceph_get_inode+0x36/0xd0 [ceph]
[ 1136.130549]  ceph_readdir_prepopulate+0x4b9/0xc60 [ceph]
[ 1136.130578]  handle_reply+0x989/0xcf0 [ceph]
[ 1136.130604]  dispatch+0xcf/0xaf0 [ceph]
[ 1136.130612]  ? __switch_to_asm+0x41/0x70
[ 1136.130618]  ? __switch_to_asm+0x35/0x70
[ 1136.130624]  ? __switch_to_asm+0x41/0x70
[ 1136.130629]  ? __switch_to_asm+0x35/0x70
[ 1136.130635]  ? __switch_to_asm+0x41/0x70
[ 1136.130657]  try_read+0x604/0x1240 [libceph]
[ 1136.130664]  ? __switch_to_asm+0x35/0x70
[ 1136.130669]  ? __switch_to_asm+0x41/0x70
[ 1136.130674]  ? __switch_to_asm+0x35/0x70
[ 1136.130679]  ? __switch_to_asm+0x41/0x70
[ 1136.130685]  ? __switch_to_asm+0x41/0x70
[ 1136.130707]  ceph_con_workfn+0xdc/0x610 [libceph]
[ 1136.130712]  ? __schedule+0x2c8/0x870
[ 1136.130719]  process_one_work+0x1fd/0x400
[ 1136.130724]  worker_thread+0x34/0x410
[ 1136.130732]  kthread+0x121/0x140
[ 1136.130737]  ? process_one_work+0x400/0x400
[ 1136.130743]  ? kthread_park+0xb0/0xb0
[ 1136.130749]  ret_from_fork+0x35/0x40
[ 1136.130754] Modules linked in: ceph libceph fscache rfcomm devlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 l2tp_ppp af_key l2tp_netlink xfrm_algo l2tp_core pppox ccm vxlan ip6_udp_tunnel udp_tunnel aufs overlay cmac bnep dm_crypt binfmt_misc arc4 nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel dell_laptop snd_hda_codec aesni_intel ledtrig_audio snd_hda_core snd_hwdep dell_smm_hwmon snd_pcm aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_rapl_perf ath10k_pci ath10k_core snd_seq_midi ath snd_seq_midi_event dell_wmi snd_rawmidi uvcvideo mac80211 dell_smbios btusb input_leds btrtl btbcm dcdbas btintel videobuf2_vmalloc videobuf2_memops dell_wmi_descriptor joydev sparse_keymap wmi_bmof bluetooth serio_raw videobuf2_v4l2 snd_seq
[ 1136.130816]  videobuf2_common videodev cfg80211 ecdh_generic media snd_seq_device snd_timer lpc_ich snd mei_me processor_thermal_device mei soundcore intel_soc_dts_iosf int3403_thermal mac_hid int3400_thermal int3402_thermal acpi_thermal_rel int340x_thermal_zone dell_rbtn acpi_pad sch_fq_codel parport_pc ppdev nf_tables nfnetlink lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c hid_generic usbhid hid i915 kvmgt vfio_mdev mdev vfio_iommu_type1 vfio kvm irqbypass i2c_algo_bit drm_kms_helper syscopyarea sysfillrect ahci psmouse sysimgblt fb_sys_fops sdhci_pci drm libahci cqhci e1000e sdhci wmi video
[ 1136.130870] ---[ end trace 33e501f46ae14017 ]---
[ 1136.130878] RIP: 0010:kmem_cache_alloc+0x88/0x1c0
[ 1136.130882] Code: 65 49 8b 50 08 65 4c 03 05 6d c3 38 73 4d 8b 30 4d 85 f6 0f 84 f7 00 00 00 41 8b 5f 20 49 8b 3f 48 8d 4a 01 4c 89 f0 4c 01 f3 <48> 33 1b 49 33 9f 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74 bd
[ 1136.130885] RSP: 0018:ffffbd78481cfaf8 EFLAGS: 00010296
[ 1136.130889] RAX: c0064d11e8bd815c RBX: c0064d11e8bd86e4 RCX: 0000000000000119
[ 1136.130892] RDX: 0000000000000118 RSI: 0000000000600040 RDI: 0000451ae9a63d40
[ 1136.130895] RBP: ffffbd78481cfb28 R08: ffffdd783fde3d40 R09: ffff985c42ab2e60
[ 1136.130898] R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000600040
[ 1136.130901] R13: ffff985d4dd24000 R14: c0064d11e8bd815c R15: ffff985d4dd24000
[ 1136.130904] FS:  0000000000000000(0000) GS:ffff985d56380000(0000) knlGS:0000000000000000
[ 1136.130908] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1136.130910] CR2: 00007efffd53b180 CR3: 00000001ba60e004 CR4: 00000000003606e0
[ 1136.130913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1136.130916] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
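For triage on other machines, here is a minimal sketch (Python, assuming the dmesg line format shown above) that scans kernel log text for the same slab-cache mismatch. The function names are hypothetical. Reading the first cache name as the cache the object was being freed into and the second as the cache it was actually allocated from is an interpretation of the message wording, not something stated in this report.

```python
import re
from typing import List, Tuple

# Matches lines like:
# "[ 1129.553410] cache_from_obj: Wrong slab cache. inode_cache but object is from ceph_inode_info"
# Assumption: first name = cache passed to the free, second = cache the object came from.
SLAB_MISMATCH = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+cache_from_obj: Wrong slab cache\. "
    r"(?P<freed_to>\S+) but object is from (?P<allocated_from>\S+)"
)


def find_slab_mismatches(log_text: str) -> List[Tuple[float, str, str]]:
    """Return (timestamp, cache freed to, cache allocated from) for each mismatch line."""
    hits = []
    for line in log_text.splitlines():
        m = SLAB_MISMATCH.match(line)
        if m:
            hits.append((float(m.group("ts")), m.group("freed_to"), m.group("allocated_from")))
    return hits


def looks_like_this_bug(log_text: str) -> bool:
    """Heuristic only: a ceph_inode_info object was handed to the generic inode_cache."""
    return any(
        freed == "inode_cache" and alloc == "ceph_inode_info"
        for _, freed, alloc in find_slab_mismatches(log_text)
    )
```

Feeding it the output of `dmesg` (or a saved log like the one above) flags hosts that hit the same signature; a match says nothing about which kernel build introduced it.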

#1

Updated by Greg Farnum over 4 years ago

  • Project changed from Ceph to CephFS

#2

Updated by Patrick Donnelly over 4 years ago

  • Assignee set to Jeff Layton
  • Target version deleted (v14.2.5)
  • Start date deleted (11/08/2019)
  • Component(FS) kceph added

#3

Updated by Jeff Layton over 4 years ago

A bad backport crept into a stable release, and it looks like this Ubuntu kernel pulled it in:

https://kernel.ubuntu.com/git/ubuntu/ubuntu-disco.git/commit/?h=master-next&id=100a8eb40c492f2525cdae434c50d53ec7f5cc23

...it looks like their next release has a revert, followed by a corrected backport of that patch:

https://kernel.ubuntu.com/git/ubuntu/ubuntu-disco.git/commit/?h=master-next&id=2a43c12dcccbd4a94ec0163a83a0775579a732d0

...but I'm not clear on when that will be officially released.

#4

Updated by Jeff Layton over 4 years ago

  • Status changed from New to In Progress

#5

Updated by Марк Коренберг over 4 years ago

5.0.0-33.35~18.04.1 seems to fix this issue. I'm installing and testing it now.
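To check a host against that observation, a minimal sketch (Python) that parses the running kernel release string (as printed by `uname -r`, e.g. "5.0.0-32-generic") could look like this. The threshold is an assumption drawn only from the comment above: it treats ABI 33 and later of the Ubuntu 5.0.0 series as carrying the corrected backport, and it deliberately returns None for any other series. The names `has_ceph_inode_fix` and `FIRST_FIXED_ABI` are hypothetical.

```python
import platform
import re
from typing import Optional

# Assumption: per the report that 5.0.0-33.35~18.04.1 appears to fix the hang,
# treat ABI 33+ of the Ubuntu 5.0.0 series as fixed. Hypothetical threshold.
FIXED_SERIES = (5, 0, 0)
FIRST_FIXED_ABI = 33

# "5.0.0-32-generic" -> major, minor, patch, ABI number
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def has_ceph_inode_fix(release: str) -> Optional[bool]:
    """True/False for 5.0.0-N releases; None when the string or series is not covered."""
    m = RELEASE_RE.match(release)
    if not m:
        return None
    series = tuple(int(x) for x in m.group(1, 2, 3))
    if series != FIXED_SERIES:
        return None  # this sketch makes no claim about other kernel series
    return int(m.group(4)) >= FIRST_FIXED_ABI


if __name__ == "__main__":
    rel = platform.release()  # same string `uname -r` prints
    print(rel, has_ceph_inode_fix(rel))
```

Returning None instead of guessing for other series keeps the check from claiming more than the single data point in this thread supports.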

#6

Updated by Jeff Layton over 4 years ago

  • Status changed from In Progress to Resolved

It looks like the updates have trickled out to the Ubuntu repos. Let's call this resolved. Please reopen if you see it again on more recent kernels.
