Bug #21420

Updated by Ilya Dryomov over 2 years ago

kernel BUG at /home/kernel/COD/linux/net/ceph/osd_client.c:1554!
invalid opcode: 0000 [#1] SMP
[Wed Aug 30 14:17:04 2017] Modules linked in: binfmt_misc ipmi_devintf ceph libceph libcrc32c fscache ipmi_ssif intel_powerclamp coretemp kvm_intel kvm gpio_ich input_leds ipmi_si serio_raw irqbypass intel_cstate shpchp i7core_edac hpilo lpc_ich edac_core acpi_power_meter ipmi_msghandler mac_hid 8021q garp mrp stp llc bonding nfsd auth_rpcgss nfs_acl lp lockd grace parport sunrpc autofs4 btrfs xor raid6_pq mlx4_en ptp pps_core hid_generic i2c_algo_bit ttm drm_kms_helper usbhid syscopyarea sysfillrect sysimgblt fb_sys_fops hid mlx4_core hpsa psmouse drm pata_acpi bnx2 devlink scsi_transport_sas fjes
CPU: 18 PID: 471071 Comm: vsftpd Tainted: G I 4.9.44-040944-generic #201708161731
Hardware name: HP ProLiant DL360 G6, BIOS P64 08/16/2015
task: ffff9268331bc080 task.stack: ffffab4a96988000
RIP: 0010:[<ffffffffc09c44f7>] [<ffffffffc09c44f7>] send_request+0xa27/0xab0 [libceph]
RSP: 0018:ffffab4a9698b8e8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000002201 RCX: ffff925e48490000
RDX: ffff926242fcf553 RSI: 0000000000001295 RDI: 0000000000002201
RBP: ffffab4a9698b958 R08: ffff92685f95c9e0 R09: 0000000000000000
R10: 0000000000000000 R11: ffff926842265680 R12: ffff92684078c610
R13: 0000000000000001 R14: ffff926242fc608b R15: ffff92684078c610
FS: 00007f5b59fd5700(0000) GS:ffff92685f940000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fec2bb2f003 CR3: 000000068ef59000 CR4: 00000000000006e0
Stack:
01ffffffc09b9cd7 ffff92685f959300 0000000000000067 ffff9268331bc080
ffff926242fc7000 ffff926241a3e000 ffff926840215200 00000e8100002201
0000000000000000 ffff92684078c610 ffff926841ddf7c0 0000000000000000
Call Trace:
[<ffffffffc09c815a>] __submit_request+0x20a/0x2f0 [libceph]
[<ffffffffc09c826b>] submit_request+0x2b/0x30 [libceph]
[<ffffffffc09c8c14>] ceph_osdc_writepages+0x104/0x1a0 [libceph]
[<ffffffffc0a0f4b1>] writepage_nounlock+0x2c1/0x470 [ceph]
[<ffffffffa65f120a>] ? page_mkclean+0x6a/0xb0
[<ffffffffa65ef3b0>] ? __page_check_address+0x1c0/0x1c0
[<ffffffffc0a11f9c>] ceph_update_writeable_page+0xdc/0x4a0 [ceph]
[<ffffffffa65a974d>] ? pagecache_get_page+0x17d/0x2a0
[<ffffffffc0a123ca>] ceph_write_begin+0x6a/0x120 [ceph]
[<ffffffffa65a89b8>] generic_perform_write+0xc8/0x1c0
[<ffffffffa66592ee>] ? file_update_time+0x5e/0x110
[<ffffffffc0a0c402>] ceph_write_iter+0xba2/0xbe0 [ceph]
[<ffffffffa6b6238c>] ? release_sock+0x8c/0xa0
[<ffffffffa6bce0b9>] ? tcp_recvmsg+0x4c9/0xb50
[<ffffffffa6b5d65d>] ? sock_recvmsg+0x3d/0x50
[<ffffffffa663ad45>] __vfs_write+0xe5/0x160
[<ffffffffa663bfe5>] vfs_write+0xb5/0x1a0
[<ffffffffa663d465>] SyS_write+0x55/0xc0
[<ffffffffa6c9b9bb>] entry_SYSCALL_64_fastpath+0x1e/0xad
Code: fb ab e5 e9 de f6 ff ff ba 14 00 00 00 e9 42 f7 ff ff 49 c7 46 08 00 00 00 00 41 c7 46 10 00 00 00 00 49 8d 56 14 e9 6d fb ff ff <0f> 0b 0f 0b be 8f 05 00 00 48 c7 c7 d8 0c 9e c0 e8 b4 fb ab e5
RIP [<ffffffffc09c44f7>] send_request+0xa27/0xab0 [libceph]
RSP <ffffab4a9698b8e8>
---[ end trace 5c55854998e663dc ]---

You have quite a lot of snapshots -- 4758 of them? send_request()
attempted to encode an 8 + 4 + 4758*8 = ~38k snap context into a 4k
buffer. Normally this is fine because the snap context is taken into
account when the message buffer is allocated. However, this particular
code path (... ceph_osdc_writepages()) uses pre-allocated messages,
which are always 4k in size.

I think it's a known bug^Wlimitation. As a short-term fix, we can
probably increase that pre-allocated size from 4k to something bigger.
A proper fix would take a considerable amount of time. Until then,
I'd recommend a much more aggressive snapshot rotation schedule --
which is a good idea anyway, since your writes will transmit faster!

Both the ceph_osdc_writepages() and (to a lesser extent) ceph_writepages_start() code paths are affected.