Bug #16013

Failing file operations on kernel-based cephfs mount point leaves inaccessible file behind on hammer 0.94.7

Added by Burkhard Linke almost 8 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
jewel,hammer
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After a number of file operations (which we could not trace well enough to reproduce), we ended up with a broken directory entry:

  # ls -al
  ls: cannot access DensityMap: Invalid argument
  total 0
  drwxr-sr-x 1 XXXX XXXX 0 May 24 17:50 .
  drwxr-sr-x 1 XXXX XXXX 260874151 May 24 17:55 ..
  l????????? ? ? ? ? ? DensityMap
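The `?` fields in the listing mean that `readdir()` returned the entry name but the subsequent `lstat()` on it failed (here with EINVAL, "Invalid argument"); that is how `ls` renders an entry it cannot stat. A minimal sketch of that readdir-then-stat pattern, to check which entries in a directory are in this state (the function name is illustrative, not a Ceph tool):

```python
import os

def probe_entries(dirpath):
    """List a directory the way ls does: readdir first, then lstat
    each name. Returns a mapping of entry name to either its mode
    (stat succeeded) or the error string (stat failed, the case where
    ls prints '?' placeholder fields)."""
    results = {}
    for name in os.listdir(dirpath):
        try:
            st = os.lstat(os.path.join(dirpath, name))
            results[name] = oct(st.st_mode)
        except OSError as e:
            # On the broken CephFS dentry described above, this is
            # where EINVAL ("Invalid argument") surfaces.
            results[name] = os.strerror(e.errno)
    return results
```

On the affected directory this would report `Invalid argument` for `DensityMap` while the surrounding entries stat normally.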

Accessing the directory results in a kernel error message:

  # uname -a
  Linux waas 4.4.0-22-generic #39~14.04.1-Ubuntu SMP Thu May 5 19:19:06 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

[Wed May 25 09:08:23 2016] ------------[ cut here ]------------
[Wed May 25 09:08:23 2016] WARNING: CPU: 8 PID: 9700 at /build/linux-lts-xenial-7RlTta/linux-lts-xenial-4.4.0/fs/ceph/inode.c:811 fill_inode.isra.16+0xb2a/0xc00 [ceph]()
[Wed May 25 09:08:23 2016] Modules linked in: tcp_diag(E) inet_diag(E) ceph(E) libceph(E) usblp(E) parport_pc(E) ppdev(E) ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) iptable_filter ip_tables x_tables cts 8021q garp mrp stp llc openvswitch nf_defrag_ipv6 nf_conntrack libcrc32c rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache intel_rapl x86_pkg_temp_thermal ipmi_ssif intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw input_leds hpilo gf128mul glue_helper joydev ablk_helper shpchp ioatdma cryptd lpc_ich sb_edac serio_raw edac_core acpi_power_meter 8250_fintek wmi mac_hid ipmi_si ipmi_msghandler lp parport btrfs xor raid6_pq hid_generic usbhid hid ixgbe dca psmouse vxlan tg3 ip6_udp_tunnel hpsa udp_tunnel ptp pps_core scsi_transport_sas mdio fjes [last unloaded: libceph]
[Wed May 25 09:08:23 2016] CPU: 8 PID: 9700 Comm: kworker/8:0 Tainted: G W E 4.4.0-22-generic #39~14.04.1-Ubuntu
[Wed May 25 09:08:23 2016] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 07/20/2015
[Wed May 25 09:08:23 2016] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[Wed May 25 09:08:23 2016] 0000000000000000 ffff881fde657a28 ffffffff813cde6c 0000000000000000
[Wed May 25 09:08:23 2016] ffffffffc08003b0 ffff881fde657a60 ffffffff8107d856 00000000ffffffea
[Wed May 25 09:08:23 2016] 0000000000000000 0000000000000000 ffff880422556308 ffff881f1e5e053a
[Wed May 25 09:08:23 2016] Call Trace:
[Wed May 25 09:08:23 2016] [<ffffffff813cde6c>] dump_stack+0x63/0x87
[Wed May 25 09:08:23 2016] [<ffffffff8107d856>] warn_slowpath_common+0x86/0xc0
[Wed May 25 09:08:23 2016] [<ffffffff8107d94a>] warn_slowpath_null+0x1a/0x20
[Wed May 25 09:08:23 2016] [<ffffffffc07d899a>] fill_inode.isra.16+0xb2a/0xc00 [ceph]
[Wed May 25 09:08:23 2016] [<ffffffffc07d65e0>] ? ceph_mount+0x810/0x810 [ceph]
[Wed May 25 09:08:23 2016] [<ffffffffc07d9604>] ceph_readdir_prepopulate+0x224/0x8c0 [ceph]
[Wed May 25 09:08:23 2016] [<ffffffffc07f7342>] handle_reply+0xa32/0xca0 [ceph]
[Wed May 25 09:08:23 2016] [<ffffffffc07f918e>] dispatch+0xae/0xaf0 [ceph]
[Wed May 25 09:08:23 2016] [<ffffffffc0618ac3>] try_read+0x443/0x1120 [libceph]
[Wed May 25 09:08:23 2016] [<ffffffff81036a09>] ? sched_clock+0x9/0x10
[Wed May 25 09:08:23 2016] [<ffffffff810b31c5>] ? put_prev_entity+0x35/0x670
[Wed May 25 09:08:23 2016] [<ffffffff8102c696>] ? __switch_to+0x1d6/0x570
[Wed May 25 09:08:23 2016] [<ffffffffc0619852>] ceph_con_workfn+0xb2/0x5d0 [libceph]
[Wed May 25 09:08:23 2016] [<ffffffff810959cd>] process_one_work+0x14d/0x3f0
[Wed May 25 09:08:23 2016] [<ffffffff8109614a>] worker_thread+0x11a/0x470
[Wed May 25 09:08:23 2016] [<ffffffff817ebe19>] ? __schedule+0x359/0x970
[Wed May 25 09:08:23 2016] [<ffffffff81096030>] ? rescuer_thread+0x310/0x310
[Wed May 25 09:08:23 2016] [<ffffffff8109b882>] kthread+0xd2/0xf0
[Wed May 25 09:08:23 2016] [<ffffffff8109b7b0>] ? kthread_park+0x50/0x50
[Wed May 25 09:08:23 2016] [<ffffffff817f004f>] ret_from_fork+0x3f/0x70
[Wed May 25 09:08:23 2016] [<ffffffff8109b7b0>] ? kthread_park+0x50/0x50
[Wed May 25 09:08:23 2016] ---[ end trace 99ae552d517bb8d0 ]---
[Wed May 25 09:08:23 2016] ceph: fill_inode badness on ffff880422556308
[Wed May 25 09:08:23 2016] ------------[ cut here ]------------
[Wed May 25 09:08:23 2016] WARNING: CPU: 8 PID: 9700 at /build/linux-lts-xenial-7RlTta/linux-lts-xenial-4.4.0/fs/ceph/inode.c:811 fill_inode.isra.16+0xb2a/0xc00 [ceph]()
[Wed May 25 09:08:23 2016] Modules linked in: tcp_diag(E) inet_diag(E) ceph(E) libceph(E) usblp(E) parport_pc(E) ppdev(E) ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) iptable_filter ip_tables x_tables cts 8021q garp mrp stp llc openvswitch nf_defrag_ipv6 nf_conntrack libcrc32c rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache intel_rapl x86_pkg_temp_thermal ipmi_ssif intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw input_leds hpilo gf128mul glue_helper joydev ablk_helper shpchp ioatdma cryptd lpc_ich sb_edac serio_raw edac_core acpi_power_meter 8250_fintek wmi mac_hid ipmi_si ipmi_msghandler lp parport btrfs xor raid6_pq hid_generic usbhid hid ixgbe dca psmouse vxlan tg3 ip6_udp_tunnel hpsa udp_tunnel ptp pps_core scsi_transport_sas mdio fjes [last unloaded: libceph]
[Wed May 25 09:08:23 2016] CPU: 8 PID: 9700 Comm: kworker/8:0 Tainted: G W E 4.4.0-22-generic #39~14.04.1-Ubuntu
[Wed May 25 09:08:23 2016] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 07/20/2015
[Wed May 25 09:08:23 2016] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[Wed May 25 09:08:23 2016] 0000000000000000 ffff881fde657a50 ffffffff813cde6c 0000000000000000
[Wed May 25 09:08:23 2016] ffffffffc08003b0 ffff881fde657a88 ffffffff8107d856 00000000ffffffea
[Wed May 25 09:08:23 2016] 0000000000000000 0000000000000000 ffff880422556e48 ffff881f1e5e0130
[Wed May 25 09:08:23 2016] Call Trace:
[Wed May 25 09:08:23 2016] [<ffffffff813cde6c>] dump_stack+0x63/0x87
[Wed May 25 09:08:23 2016] [<ffffffff8107d856>] warn_slowpath_common+0x86/0xc0
[Wed May 25 09:08:23 2016] [<ffffffff8107d94a>] warn_slowpath_null+0x1a/0x20
[Wed May 25 09:08:23 2016] [<ffffffffc07d899a>] fill_inode.isra.16+0xb2a/0xc00 [ceph]
[Wed May 25 09:08:23 2016] [<ffffffff812167c5>] ? inode_init_always+0x105/0x1b0
[Wed May 25 09:08:23 2016] [<ffffffffc07d8b87>] ceph_fill_trace+0x117/0x970 [ceph]
[Wed May 25 09:08:23 2016] [<ffffffffc07f6d39>] handle_reply+0x429/0xca0 [ceph]
[Wed May 25 09:08:23 2016] [<ffffffffc07f918e>] dispatch+0xae/0xaf0 [ceph]
[Wed May 25 09:08:23 2016] [<ffffffffc0618ac3>] try_read+0x443/0x1120 [libceph]
[Wed May 25 09:08:23 2016] [<ffffffff810b31c5>] ? put_prev_entity+0x35/0x670
[Wed May 25 09:08:23 2016] [<ffffffff8102c696>] ? __switch_to+0x1d6/0x570
[Wed May 25 09:08:23 2016] [<ffffffffc0619852>] ceph_con_workfn+0xb2/0x5d0 [libceph]
[Wed May 25 09:08:23 2016] [<ffffffff810959cd>] process_one_work+0x14d/0x3f0
[Wed May 25 09:08:23 2016] [<ffffffff8109614a>] worker_thread+0x11a/0x470
[Wed May 25 09:08:23 2016] [<ffffffff817ebe19>] ? __schedule+0x359/0x970
[Wed May 25 09:08:23 2016] [<ffffffff81096030>] ? rescuer_thread+0x310/0x310
[Wed May 25 09:08:23 2016] [<ffffffff8109b882>] kthread+0xd2/0xf0
[Wed May 25 09:08:23 2016] [<ffffffff8109b7b0>] ? kthread_park+0x50/0x50
[Wed May 25 09:08:23 2016] [<ffffffff817f004f>] ret_from_fork+0x3f/0x70
[Wed May 25 09:08:23 2016] [<ffffffff8109b7b0>] ? kthread_park+0x50/0x50
[Wed May 25 09:08:23 2016] ---[ end trace 99ae552d517bb8d1 ]---
[Wed May 25 09:08:23 2016] ceph: fill_inode badness ffff880422556e48 100041dee77.fffffffffffffffe

Attempts to remove the affected file, or to move the directory itself out of the way, also fail with similar error messages.

How do I resolve this problem?


Related issues 3 (0 open, 3 closed)

Related to CephFS - Bug #16983: mds: handle_client_open failing on open (Resolved; assignee Patrick Donnelly; 08/10/2016)

Copied to CephFS - Backport #16625: jewel: Failing file operations on kernel-based cephfs mount point leaves inaccessible file behind on hammer 0.94.7 (Resolved; Nathan Cutler)
Copied to CephFS - Backport #16626: hammer: Failing file operations on kernel-based cephfs mount point leaves inaccessible file behind on hammer 0.94.7 (Resolved; Nathan Cutler)