Bug #13256
I/O error with cephfs accessing root .snap directory on v9.0.3
Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
I am running a test Ceph cluster using Ceph v9.0.3, with all kernels at 4.2.0, on Ubuntu Trusty. I have enabled snapshots, and a cron job creates a snapshot every hour from a system using a fuse mount. On two other systems using kernel mounts, I am now seeing:
# ls -l /cephfs/.snap
ls: reading directory /cephfs/.snap: Input/output error
total 0
When this happens, I get the following error message in the /var/log/kern.log file:
Sep 26 11:46:42 dfgw02 kernel: [642211.597617] ceph: dir contents are larger than expected
Sep 26 11:46:42 dfgw02 kernel: [642211.597651] ------------[ cut here ]------------
Sep 26 11:46:42 dfgw02 kernel: [642211.597670] WARNING: CPU: 5 PID: 122627 at /home/kernel/COD/linux/fs/ceph/mds_client.c:188 handle_reply+0xafe/0xbd0 [ceph]()
Sep 26 11:46:42 dfgw02 kernel: [642211.597672] Modules linked in: ipmi_devintf ipmi_ssif ttm drm_kms_helper ceph drm i2c_algo_bit gpio_ich libceph coretemp input_leds kvm acpi_power_meter 8250_fintek i7core_edac hpilo libcrc32c fscache serio_raw edac_core ipmi_si ipmi_msghandler lpc_ich shpchp mac_hid bonding lp parport mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core hid_generic usbhid hid psmouse bnx2 mlx4_core hpsa
Sep 26 11:46:42 dfgw02 kernel: [642211.597697] CPU: 5 PID: 122627 Comm: kworker/5:0 Tainted: G W I 4.2.0-040200-generic #201508301530
Sep 26 11:46:42 dfgw02 kernel: [642211.597699] Hardware name: HP ProLiant DL360 G6, BIOS P64 01/22/2015
Sep 26 11:46:42 dfgw02 kernel: [642211.597710] Workqueue: ceph-msgr con_work [libceph]
Sep 26 11:46:42 dfgw02 kernel: [642211.597712] ffffffffc03c0838 ffff880c0115fb68 ffffffff817a1b43 0000000000000000
Sep 26 11:46:42 dfgw02 kernel: [642211.597714] 0000000000000000 ffff880c0115fba8 ffffffff8107719a ffff880c0115fbc8
Sep 26 11:46:42 dfgw02 kernel: [642211.597716] ffff880c03b66c00 ffff880c00c67e00 ffffc9000dd85830 ffff880601545008
Sep 26 11:46:42 dfgw02 kernel: [642211.597718] Call Trace:
Sep 26 11:46:42 dfgw02 kernel: [642211.597726] [<ffffffff817a1b43>] dump_stack+0x45/0x57
Sep 26 11:46:42 dfgw02 kernel: [642211.597732] [<ffffffff8107719a>] warn_slowpath_common+0x8a/0xc0
Sep 26 11:46:42 dfgw02 kernel: [642211.597734] [<ffffffff8107728a>] warn_slowpath_null+0x1a/0x20
Sep 26 11:46:42 dfgw02 kernel: [642211.597742] [<ffffffffc03af07e>] handle_reply+0xafe/0xbd0 [ceph]
Sep 26 11:46:42 dfgw02 kernel: [642211.597750] [<ffffffffc03b0c7e>] dispatch+0xae/0xc10 [ceph]
Sep 26 11:46:42 dfgw02 kernel: [642211.597755] [<ffffffffc02b9af8>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
Sep 26 11:46:42 dfgw02 kernel: [642211.597761] [<ffffffffc02bdac1>] try_read+0x3d1/0x1060 [libceph]
Sep 26 11:46:42 dfgw02 kernel: [642211.597766] [<ffffffff8101dc6b>] ? native_sched_clock+0x2b/0x80
Sep 26 11:46:42 dfgw02 kernel: [642211.597768] [<ffffffff8101dcc9>] ? sched_clock+0x9/0x10
Sep 26 11:46:42 dfgw02 kernel: [642211.597772] [<ffffffff810abbaf>] ? put_prev_entity+0x2f/0x4a0
Sep 26 11:46:42 dfgw02 kernel: [642211.597777] [<ffffffffc02be802>] con_work+0xb2/0x5f0 [libceph]
Sep 26 11:46:42 dfgw02 kernel: [642211.597783] [<ffffffff8108f21e>] process_one_work+0x14e/0x3d0
Sep 26 11:46:42 dfgw02 kernel: [642211.597785] [<ffffffff8108f8ca>] worker_thread+0x11a/0x470
Sep 26 11:46:42 dfgw02 kernel: [642211.597787] [<ffffffff8108f7b0>] ? rescuer_thread+0x310/0x310
Sep 26 11:46:42 dfgw02 kernel: [642211.597790] [<ffffffff81094e29>] kthread+0xc9/0xe0
Sep 26 11:46:42 dfgw02 kernel: [642211.597792] [<ffffffff81094d60>] ? kthread_create_on_node+0x180/0x180
Sep 26 11:46:42 dfgw02 kernel: [642211.597796] [<ffffffff817a925f>] ret_from_fork+0x3f/0x70
Sep 26 11:46:42 dfgw02 kernel: [642211.597798] [<ffffffff81094d60>] ? kthread_create_on_node+0x180/0x180
Sep 26 11:46:42 dfgw02 kernel: [642211.597799] ---[ end trace cdeabe6cb4c8bb2d ]---
Sep 26 11:46:42 dfgw02 kernel: [642211.597800] ceph: problem parsing dir contents -5
Sep 26 11:46:42 dfgw02 kernel: [642211.597825] ceph: mds parse_reply err -5
Sep 26 11:46:42 dfgw02 kernel: [642211.597849] ceph: mdsc_handle_reply got corrupt reply mds0(tid:2010)
Sep 26 11:46:42 dfgw02 kernel: [642211.597878] header: 00000000: bc 05 00 00 00 00 00 00 da 07 00 00 00 00 00 00 ................
Sep 26 11:46:42 dfgw02 kernel: [642211.597879] header: 00000010: 1a 00 7f 00 01 00 30 b8 00 00 00 00 00 00 00 00 ......0.........
There are many more error message lines in the kern.log file, so I have attached it to this report.
I have also attached a file with the output of ls -l on the /cephfs/.snap directory, taken from the system that creates the snapshots using the fuse mount.
System info:
keeper@dfgw02:~$ uname -a
Linux dfgw02 4.2.0-040200-generic #201508301530 SMP Sun Aug 30 19:31:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
keeper@dfgw02:~$ ceph -v
ceph version 9.0.3 (7295612d29f953f46e6e88812ef372b89a43b9da)

Kernel mount:
root@dfgw02:~# mount | grep cephfs
10.16.51.21,10.16.51.22,10.16.51.23:/ on /cephfs type ceph (name=cephfs,key=client.cephfs)

Fuse mount on a separate system, which performs the snapshots:
root@dfadm01:~# mount | grep ceph
ceph-fuse on /cephfs type fuse.ceph-fuse (rw,noatime,_netdev)

root@dfgw02:~# ceph -s
    cluster c261c2dc-5e29-11e5-98ba-68b599c50db0
     health HEALTH_WARN
            21 requests are blocked > 32 sec
     monmap e1: 3 mons at {dfmon01=10.16.51.21:6789/0,dfmon02=10.16.51.22:6789/0,dfmon03=10.16.51.23:6789/0}
            election epoch 6, quorum 0,1,2 dfmon01,dfmon02,dfmon03
     mdsmap e3222: 1/1/1 up {0=dfmds02=up:active}, 1 up:standby
     osdmap e5901: 176 osds: 169 up, 169 in
      pgmap v351926: 18496 pgs, 4 pools, 46873 GB data, 11909 kobjects
            137 TB used, 108 TB / 246 TB avail
            18496 active+clean
  client io 60516 kB/s rd, 49 op/s
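For context, creating a CephFS snapshot amounts to making a directory under .snap. A minimal sketch of the hourly cron job described above (the schedule, script path, and snapshot naming scheme are my assumptions, not taken from this report; only the /cephfs mount point matches the setup):

```shell
# Hypothetical hourly cron entry (assumed schedule and script path):
#   0 * * * *  /usr/local/bin/cephfs-snap.sh
# In CephFS, mkdir inside a directory's .snap creates a snapshot of
# that directory; rmdir removes it.
SNAPDIR=/cephfs/.snap                 # fuse mount point from the report
NAME="hourly-$(date +%Y%m%d-%H%M)"    # assumed naming scheme
echo mkdir "$SNAPDIR/$NAME"           # dry run; drop 'echo' to create it
```

This is only a sketch of the reproduction setup: the snapshots themselves are created through the fuse mount, while the I/O error appears when listing .snap through the kernel mounts.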
Files
Updated by Zheng Yan over 8 years ago
- Status changed from New to Fix Under Review
Updated by Greg Farnum over 8 years ago
- Status changed from Fix Under Review to 7