Project

General

Profile

Bug #63586

Updated by Xiubo Li 6 months ago

https://pulpito.ceph.com/yuriw-2023-11-15_23:07:34-fs-wip-yuri-testing-2023-11-14-0743-reef-distro-default-smithi/7459949/ 

 From remote/smithi204/syslog/kern.log.gz: 

 <pre> 
 2023-11-16T02:47:20.751447+00:00 smithi204 kernel: [ 1895.456434] libceph: client5322 fsid 1325a234-ab9e-4632-9a6e-98afbef8ae76 
 2023-11-16T02:47:20.751448+00:00 smithi204 kernel: [ 1895.456483] ceph: test_dummy_encryption mode enabled 
 2023-11-16T02:50:12.194680+00:00 smithi204 kernel: [ 2066.894411] ------------[ cut here ]------------ 
 2023-11-16T02:50:12.194695+00:00 smithi204 kernel: [ 2066.894484] WARNING: CPU: 7 PID: 112988 at mm/page_alloc.c:4402 __alloc_pages+0x1e7/0x270 
 2023-11-16T02:50:12.194696+00:00 smithi204 kernel: [ 2066.894518] Modules linked in: ceph libceph fscache netfs veth xfs xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc rdma_ucm ib_uverbs rdma_cm iw_cm ib_cm ib_core overlay ipmi_ssif intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp mei_me coretemp crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 aesni_intel mei crypto_simd cryptd joydev wmi ioatdma acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad kvm_intel kvm irqbypass sch_fq_codel scsi_transport_iscsi lp parport ramoops reed_solomon nfsd efi_pstore configfs auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 hid_generic usbhid hid igb ixgbe i2c_algo_bit dca crc32_pclmul nvme i2c_i801 ahci ptp i2c_smbus lpc_ich libahci pps_core nvme_core mdio 
 2023-11-16T02:50:12.194699+00:00 smithi204 kernel: [ 2066.894671] CPU: 7 PID: 112988 Comm: kworker/7:0 Not tainted 6.6.0-rc5-g3b796cfe2969 #1 
 2023-11-16T02:50:12.194699+00:00 smithi204 kernel: [ 2066.894674] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0a 08/02/2016 
 2023-11-16T02:50:12.194730+00:00 smithi204 kernel: [ 2066.894676] Workqueue: ceph-msgr ceph_con_workfn [libceph] 
 2023-11-16T02:50:12.194732+00:00 smithi204 kernel: [ 2066.894701] RIP: 0010:__alloc_pages+0x1e7/0x270 
 2023-11-16T02:50:12.194732+00:00 smithi204 kernel: [ 2066.894704] Code: ff ff 00 0f 84 19 ff ff ff 80 ce 01 e9 11 ff ff ff 83 fe 0a 0f 86 9d fe ff ff 80 3d 80 f4 53 01 00 75 09 c6 05 77 f4 53 01 01 <0f> 0b 45 31 ed e9 57 ff ff ff e8 9a f3 e0 ff 84 c0 0f 85 4a ff ff 
 2023-11-16T02:50:12.194733+00:00 smithi204 kernel: [ 2066.894706] RSP: 0018:ffffbcae88d5bc58 EFLAGS: 00010246 
 2023-11-16T02:50:12.194734+00:00 smithi204 kernel: [ 2066.894709] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 
 2023-11-16T02:50:12.194734+00:00 smithi204 kernel: [ 2066.894711] RDX: 0000000000000000 RSI: 0000000000000018 RDI: 0000000000000000 
 2023-11-16T02:50:12.194742+00:00 smithi204 kernel: [ 2066.894713] RBP: 0000000000040c00 R08: ffff9ab023c70f00 R09: 0000000000000000 
 2023-11-16T02:50:12.194742+00:00 smithi204 kernel: [ 2066.894715] R10: ffff9aaf00ebf030 R11: ffff9ab023c6ee00 R12: 0000000000000018 
 2023-11-16T02:50:12.194743+00:00 smithi204 kernel: [ 2066.894717] R13: 0000000000000018 R14: ffffffffc1235c85 R15: ffff9aaf00ebf030 
 2023-11-16T02:50:12.194744+00:00 smithi204 kernel: [ 2066.894718] FS:    0000000000000000(0000) GS:ffff9ab65fdc0000(0000) knlGS:0000000000000000 
 2023-11-16T02:50:12.194744+00:00 smithi204 kernel: [ 2066.894720] CS:    0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
 2023-11-16T02:50:12.194745+00:00 smithi204 kernel: [ 2066.894722] CR2: 00007f8a8a03c180 CR3: 000000005a63e002 CR4: 00000000003706e0 
 2023-11-16T02:50:12.194750+00:00 smithi204 kernel: [ 2066.894724] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
 2023-11-16T02:50:12.194750+00:00 smithi204 kernel: [ 2066.894726] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
 2023-11-16T02:50:12.194751+00:00 smithi204 kernel: [ 2066.894728] Call Trace: 
 2023-11-16T02:50:12.194751+00:00 smithi204 kernel: [ 2066.894730]    <TASK> 
 2023-11-16T02:50:12.194751+00:00 smithi204 kernel: [ 2066.894732]    ? __warn+0x84/0x170 
 2023-11-16T02:50:12.194752+00:00 smithi204 kernel: [ 2066.894736]    ? __alloc_pages+0x1e7/0x270 
 2023-11-16T02:50:12.194752+00:00 smithi204 kernel: [ 2066.894740]    ? report_bug+0x191/0x1a0 
 2023-11-16T02:50:12.194756+00:00 smithi204 kernel: [ 2066.894747]    ? handle_bug+0x42/0x70 
 2023-11-16T02:50:12.194756+00:00 smithi204 kernel: [ 2066.894751]    ? exc_invalid_op+0x18/0x70 
 2023-11-16T02:50:12.194757+00:00 smithi204 kernel: [ 2066.894754]    ? asm_exc_invalid_op+0x1a/0x20 
 2023-11-16T02:50:12.194757+00:00 smithi204 kernel: [ 2066.894760]    ? osd_sparse_read+0x115/0x670 [libceph] 
 2023-11-16T02:50:12.194757+00:00 smithi204 kernel: [ 2066.894791]    ? __alloc_pages+0x1e7/0x270 
 2023-11-16T02:50:12.194758+00:00 smithi204 kernel: [ 2066.894798]    __kmalloc_large_node+0x88/0x140 
 2023-11-16T02:50:12.194762+00:00 smithi204 kernel: [ 2066.894804]    __kmalloc+0xdf/0x150 
 2023-11-16T02:50:12.194762+00:00 smithi204 kernel: [ 2066.894819]    osd_sparse_read+0x115/0x670 [libceph] 
 2023-11-16T02:50:12.194762+00:00 smithi204 kernel: [ 2066.894834]    ? read_partial_message_chunk+0x8b/0x90 [libceph] 
 2023-11-16T02:50:12.194763+00:00 smithi204 kernel: [ 2066.894850]    ceph_con_v1_try_read+0x17cc/0x1e00 [libceph] 
 2023-11-16T02:50:12.194763+00:00 smithi204 kernel: [ 2066.894881]    ? ceph_con_workfn+0x2e/0x730 [libceph] 
 2023-11-16T02:50:12.194763+00:00 smithi204 kernel: [ 2066.894894]    ? lock_acquire+0xcc/0x2a0 
 2023-11-16T02:50:12.194764+00:00 smithi204 kernel: [ 2066.894911]    ? process_one_work+0x176/0x4c0 
 2023-11-16T02:50:12.194768+00:00 smithi204 kernel: [ 2066.894915]    ceph_con_workfn+0x383/0x730 [libceph] 
 2023-11-16T02:50:12.194768+00:00 smithi204 kernel: [ 2066.894929]    ? process_one_work+0x176/0x4c0 
 2023-11-16T02:50:12.194768+00:00 smithi204 kernel: [ 2066.894932]    process_one_work+0x1dd/0x4c0 
 2023-11-16T02:50:12.194769+00:00 smithi204 kernel: [ 2066.894939]    worker_thread+0x1df/0x3e0 
 2023-11-16T02:50:12.194769+00:00 smithi204 kernel: [ 2066.894943]    ? drain_workqueue+0x120/0x120 
 2023-11-16T02:50:12.194769+00:00 smithi204 kernel: [ 2066.894947]    kthread+0xf4/0x130 
 2023-11-16T02:50:12.194774+00:00 smithi204 kernel: [ 2066.894959]    ? kthread_complete_and_exit+0x20/0x20 
 2023-11-16T02:50:12.194774+00:00 smithi204 kernel: [ 2066.894964]    ret_from_fork+0x31/0x50 
 2023-11-16T02:50:12.194774+00:00 smithi204 kernel: [ 2066.894967]    ? kthread_complete_and_exit+0x20/0x20 
 2023-11-16T02:50:12.194775+00:00 smithi204 kernel: [ 2066.894970]    ret_from_fork_asm+0x11/0x20 
 2023-11-16T02:50:12.194775+00:00 smithi204 kernel: [ 2066.894980]    </TASK> 
 2023-11-16T02:50:12.194775+00:00 smithi204 kernel: [ 2066.894982] irq event stamp: 4219761 
 2023-11-16T02:50:12.194779+00:00 smithi204 kernel: [ 2066.894984] hardirqs last    enabled at (4219767): [<ffffffffa213778d>] console_unlock+0x11d/0x150 
 2023-11-16T02:50:12.194780+00:00 smithi204 kernel: [ 2066.894988] hardirqs last disabled at (4219772): [<ffffffffa2137772>] console_unlock+0x102/0x150 
 2023-11-16T02:50:12.194780+00:00 smithi204 kernel: [ 2066.894991] softirqs last    enabled at (4219726): [<ffffffffa20a5777>] __irq_exit_rcu+0x77/0xa0 
 2023-11-16T02:50:12.194780+00:00 smithi204 kernel: [ 2066.894996] softirqs last disabled at (4219709): [<ffffffffa20a5777>] __irq_exit_rcu+0x77/0xa0 
 2023-11-16T02:50:12.194781+00:00 smithi204 kernel: [ 2066.894998] ---[ end trace 0000000000000000 ]--- 
 2023-11-16T02:50:12.194781+00:00 smithi204 kernel: [ 2066.895001] libceph: osd_sparse_read: failed to allocate 2208169984 sparse read extents 
 2023-11-16T02:50:12.194788+00:00 smithi204 kernel: [ 2066.903155] libceph: osd6 (1)172.21.15.70:6812 socket error on read 
 2023-11-16T02:50:12.218371+00:00 smithi204 kernel: [ 2066.908743] libceph: osd_sparse_read: failed to allocate 2208169984 sparse read extents 
 2023-11-16T02:50:12.218406+00:00 smithi204 kernel: [ 2066.917112] libceph: osd6 (1)172.21.15.70:6812 socket error on read 
 2023-11-16T02:50:12.218407+00:00 smithi204 kernel: [ 2066.918810] libceph: osd_sparse_read: failed to allocate 2208169984 sparse read extents 
 2023-11-16T02:50:12.219434+00:00 smithi204 kernel: [ 2066.926987] libceph: osd6 (1)172.21.15.70:6812 socket error on read 
 ... 
 2023-11-16T14:13:46.983459+00:00 smithi204 kernel: [43082.250808] libceph: osd6 (1)172.21.15.70:6812 socket error on read 
 2023-11-16T14:13:47.002106+00:00 smithi204 kernel: [43082.252340] libceph: osd_sparse_read: failed to allocate 2208169984 sparse read extents 
 2023-11-16T14:13:47.002112+00:00 smithi204 kernel: [43082.260503] libceph: osd6 (1)172.21.15.70:6812 socket error on read 
 </pre> 

 The Rados just returned a very large extents *2208169984* and *kernel libceph* couldn't allocate enough memory for it and reset the connection and keep retrying the osd requests. 

 While in the Rados side, the *osd.6*, the read was succeeful: 

 <pre> 
 2023-11-16T02:50:12.200+0000 7fec2cc41640 15 osd.6 pg_epoch: 33 pg[5.11( v 33'9174 (0'0,33'9174] local-lis/les=31/32 n=4 ec=31/31 lis/c=31/31 les/c/f=32/32/0 sis=31) [6,4] r=0 lpr=31 crt=33'9174 lcod 33'9173 mlcod 33'9173 active+clean] log_op_stats osd_op(client.5322.0:160938 5.11 5:8b7d215f:::1000000985f.00000000:head [sparse-read 3756032~65536 out=65560b] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected e33) v8 inb 0 outb 65560 lat 0.000235 
 2023-11-16T02:50:12.200+0000 7fec28438640 10 osd.6 pg_epoch: 33 pg[5.2( v 33'5683 (0'0,33'5683] local-lis/les=31/32 n=1 ec=31/31 lis/c=31/31 les/c/f=32/32/0 sis=31) [6,0] r=0 lpr=31 crt=33'5683 lcod 33'5682 mlcod 33'5682 active+clean] do_op osd_op(client.5322.0:160962 5.2 5:411858ae:::10000009858.00000001:head [sparse-read 1630208~65536 [1@-1]] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected e33) v8 may_read -> read-ordered flags ondisk+retry+read+known_if_redirected 
 2023-11-16T02:50:12.200+0000 7fec2cc41640 20 osd.6 pg_epoch: 33 pg[5.11( v 33'9174 (0'0,33'9174] local-lis/les=31/32 n=4 ec=31/31 lis/c=31/31 les/c/f=32/32/0 sis=31) [6,4] r=0 lpr=31 crt=33'9174 lcod 33'9173 mlcod 33'9173 active+clean] prepare_stats_for_publish reporting purged_snaps [] 
 2023-11-16T02:50:12.200+0000 7fec2cc41640 15 osd.6 pg_epoch: 33 pg[5.11( v 33'9174 (0'0,33'9174] local-lis/les=31/32 n=4 ec=31/31 lis/c=31/31 les/c/f=32/32/0 sis=31) [6,4] r=0 lpr=31 crt=33'9174 lcod 33'9173 mlcod 33'9173 active+clean] publish_stats_to_osd 33:9995 
 2023-11-16T02:50:12.200+0000 7fec28438640 10 osd.6 pg_epoch: 33 pg[5.2( v 33'5683 (0'0,33'5683] local-lis/les=31/32 n=1 ec=31/31 lis/c=31/31 les/c/f=32/32/0 sis=31) [6,0] r=0 lpr=31 crt=33'5683 lcod 33'5682 mlcod 33'5682 active+clean] get_object_context: found obc in cache: obc(5:411858ae:::10000009858.00000001:head rwstate(none n=0 w=0)) 
 2023-11-16T02:50:12.200+0000 7fec2cc41640    1 -- [v2:172.21.15.70:6809/38829,v1:172.21.15.70:6812/38829] --> v1:172.21.15.204:0/484060394 -- osd_op_reply(160938 1000000985f.00000000 [sparse-read 3756032~65536 out=65560b] v0'0 uv9173 ondisk = 0) v8 -- 0x55b9f7dd4b40 con 0x55ba006e0400 
 </pre>

Back