Project

General

Profile

Actions

Bug #63586

open

libceph: osd_sparse_read: failed to allocate 2208169984 sparse read extents

Added by Xiubo Li 6 months ago. Updated 5 months ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

https://pulpito.ceph.com/yuriw-2023-11-15_23:07:34-fs-wip-yuri-testing-2023-11-14-0743-reef-distro-default-smithi/7459949/

From remote/smithi204/syslog/kern.log.gz:

2023-11-16T02:47:20.751447+00:00 smithi204 kernel: [ 1895.456434] libceph: client5322 fsid 1325a234-ab9e-4632-9a6e-98afbef8ae76
2023-11-16T02:47:20.751448+00:00 smithi204 kernel: [ 1895.456483] ceph: test_dummy_encryption mode enabled
2023-11-16T02:50:12.194680+00:00 smithi204 kernel: [ 2066.894411] ------------[ cut here ]------------
2023-11-16T02:50:12.194695+00:00 smithi204 kernel: [ 2066.894484] WARNING: CPU: 7 PID: 112988 at mm/page_alloc.c:4402 __alloc_pages+0x1e7/0x270
2023-11-16T02:50:12.194696+00:00 smithi204 kernel: [ 2066.894518] Modules linked in: ceph libceph fscache netfs veth xfs xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc rdma_ucm ib_uverbs rdma_cm iw_cm ib_cm ib_core overlay ipmi_ssif intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp mei_me coretemp crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 aesni_intel mei crypto_simd cryptd joydev wmi ioatdma acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad kvm_intel kvm irqbypass sch_fq_codel scsi_transport_iscsi lp parport ramoops reed_solomon nfsd efi_pstore configfs auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 hid_generic usbhid hid igb ixgbe i2c_algo_bit dca crc32_pclmul nvme i2c_i801 ahci ptp i2c_smbus lpc_ich libahci pps_core nvme_core mdio
2023-11-16T02:50:12.194699+00:00 smithi204 kernel: [ 2066.894671] CPU: 7 PID: 112988 Comm: kworker/7:0 Not tainted 6.6.0-rc5-g3b796cfe2969 #1
2023-11-16T02:50:12.194699+00:00 smithi204 kernel: [ 2066.894674] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0a 08/02/2016
2023-11-16T02:50:12.194730+00:00 smithi204 kernel: [ 2066.894676] Workqueue: ceph-msgr ceph_con_workfn [libceph]
2023-11-16T02:50:12.194732+00:00 smithi204 kernel: [ 2066.894701] RIP: 0010:__alloc_pages+0x1e7/0x270
2023-11-16T02:50:12.194732+00:00 smithi204 kernel: [ 2066.894704] Code: ff ff 00 0f 84 19 ff ff ff 80 ce 01 e9 11 ff ff ff 83 fe 0a 0f 86 9d fe ff ff 80 3d 80 f4 53 01 00 75 09 c6 05 77 f4 53 01 01 <0f> 0b 45 31 ed e9 57 ff ff ff e8 9a f3 e0 ff 84 c0 0f 85 4a ff ff
2023-11-16T02:50:12.194733+00:00 smithi204 kernel: [ 2066.894706] RSP: 0018:ffffbcae88d5bc58 EFLAGS: 00010246
2023-11-16T02:50:12.194734+00:00 smithi204 kernel: [ 2066.894709] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
2023-11-16T02:50:12.194734+00:00 smithi204 kernel: [ 2066.894711] RDX: 0000000000000000 RSI: 0000000000000018 RDI: 0000000000000000
2023-11-16T02:50:12.194742+00:00 smithi204 kernel: [ 2066.894713] RBP: 0000000000040c00 R08: ffff9ab023c70f00 R09: 0000000000000000
2023-11-16T02:50:12.194742+00:00 smithi204 kernel: [ 2066.894715] R10: ffff9aaf00ebf030 R11: ffff9ab023c6ee00 R12: 0000000000000018
2023-11-16T02:50:12.194743+00:00 smithi204 kernel: [ 2066.894717] R13: 0000000000000018 R14: ffffffffc1235c85 R15: ffff9aaf00ebf030
2023-11-16T02:50:12.194744+00:00 smithi204 kernel: [ 2066.894718] FS:  0000000000000000(0000) GS:ffff9ab65fdc0000(0000) knlGS:0000000000000000
2023-11-16T02:50:12.194744+00:00 smithi204 kernel: [ 2066.894720] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-11-16T02:50:12.194745+00:00 smithi204 kernel: [ 2066.894722] CR2: 00007f8a8a03c180 CR3: 000000005a63e002 CR4: 00000000003706e0
2023-11-16T02:50:12.194750+00:00 smithi204 kernel: [ 2066.894724] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2023-11-16T02:50:12.194750+00:00 smithi204 kernel: [ 2066.894726] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2023-11-16T02:50:12.194751+00:00 smithi204 kernel: [ 2066.894728] Call Trace:
2023-11-16T02:50:12.194751+00:00 smithi204 kernel: [ 2066.894730]  <TASK>
2023-11-16T02:50:12.194751+00:00 smithi204 kernel: [ 2066.894732]  ? __warn+0x84/0x170
2023-11-16T02:50:12.194752+00:00 smithi204 kernel: [ 2066.894736]  ? __alloc_pages+0x1e7/0x270
2023-11-16T02:50:12.194752+00:00 smithi204 kernel: [ 2066.894740]  ? report_bug+0x191/0x1a0
2023-11-16T02:50:12.194756+00:00 smithi204 kernel: [ 2066.894747]  ? handle_bug+0x42/0x70
2023-11-16T02:50:12.194756+00:00 smithi204 kernel: [ 2066.894751]  ? exc_invalid_op+0x18/0x70
2023-11-16T02:50:12.194757+00:00 smithi204 kernel: [ 2066.894754]  ? asm_exc_invalid_op+0x1a/0x20
2023-11-16T02:50:12.194757+00:00 smithi204 kernel: [ 2066.894760]  ? osd_sparse_read+0x115/0x670 [libceph]
2023-11-16T02:50:12.194757+00:00 smithi204 kernel: [ 2066.894791]  ? __alloc_pages+0x1e7/0x270
2023-11-16T02:50:12.194758+00:00 smithi204 kernel: [ 2066.894798]  __kmalloc_large_node+0x88/0x140
2023-11-16T02:50:12.194762+00:00 smithi204 kernel: [ 2066.894804]  __kmalloc+0xdf/0x150
2023-11-16T02:50:12.194762+00:00 smithi204 kernel: [ 2066.894819]  osd_sparse_read+0x115/0x670 [libceph]
2023-11-16T02:50:12.194762+00:00 smithi204 kernel: [ 2066.894834]  ? read_partial_message_chunk+0x8b/0x90 [libceph]
2023-11-16T02:50:12.194763+00:00 smithi204 kernel: [ 2066.894850]  ceph_con_v1_try_read+0x17cc/0x1e00 [libceph]
2023-11-16T02:50:12.194763+00:00 smithi204 kernel: [ 2066.894881]  ? ceph_con_workfn+0x2e/0x730 [libceph]
2023-11-16T02:50:12.194763+00:00 smithi204 kernel: [ 2066.894894]  ? lock_acquire+0xcc/0x2a0
2023-11-16T02:50:12.194764+00:00 smithi204 kernel: [ 2066.894911]  ? process_one_work+0x176/0x4c0
2023-11-16T02:50:12.194768+00:00 smithi204 kernel: [ 2066.894915]  ceph_con_workfn+0x383/0x730 [libceph]
2023-11-16T02:50:12.194768+00:00 smithi204 kernel: [ 2066.894929]  ? process_one_work+0x176/0x4c0
2023-11-16T02:50:12.194768+00:00 smithi204 kernel: [ 2066.894932]  process_one_work+0x1dd/0x4c0
2023-11-16T02:50:12.194769+00:00 smithi204 kernel: [ 2066.894939]  worker_thread+0x1df/0x3e0
2023-11-16T02:50:12.194769+00:00 smithi204 kernel: [ 2066.894943]  ? drain_workqueue+0x120/0x120
2023-11-16T02:50:12.194769+00:00 smithi204 kernel: [ 2066.894947]  kthread+0xf4/0x130
2023-11-16T02:50:12.194774+00:00 smithi204 kernel: [ 2066.894959]  ? kthread_complete_and_exit+0x20/0x20
2023-11-16T02:50:12.194774+00:00 smithi204 kernel: [ 2066.894964]  ret_from_fork+0x31/0x50
2023-11-16T02:50:12.194774+00:00 smithi204 kernel: [ 2066.894967]  ? kthread_complete_and_exit+0x20/0x20
2023-11-16T02:50:12.194775+00:00 smithi204 kernel: [ 2066.894970]  ret_from_fork_asm+0x11/0x20
2023-11-16T02:50:12.194775+00:00 smithi204 kernel: [ 2066.894980]  </TASK>
2023-11-16T02:50:12.194775+00:00 smithi204 kernel: [ 2066.894982] irq event stamp: 4219761
2023-11-16T02:50:12.194779+00:00 smithi204 kernel: [ 2066.894984] hardirqs last  enabled at (4219767): [<ffffffffa213778d>] console_unlock+0x11d/0x150
2023-11-16T02:50:12.194780+00:00 smithi204 kernel: [ 2066.894988] hardirqs last disabled at (4219772): [<ffffffffa2137772>] console_unlock+0x102/0x150
2023-11-16T02:50:12.194780+00:00 smithi204 kernel: [ 2066.894991] softirqs last  enabled at (4219726): [<ffffffffa20a5777>] __irq_exit_rcu+0x77/0xa0
2023-11-16T02:50:12.194780+00:00 smithi204 kernel: [ 2066.894996] softirqs last disabled at (4219709): [<ffffffffa20a5777>] __irq_exit_rcu+0x77/0xa0
2023-11-16T02:50:12.194781+00:00 smithi204 kernel: [ 2066.894998] ---[ end trace 0000000000000000 ]---
2023-11-16T02:50:12.194781+00:00 smithi204 kernel: [ 2066.895001] libceph: osd_sparse_read: failed to allocate 2208169984 sparse read extents
2023-11-16T02:50:12.194788+00:00 smithi204 kernel: [ 2066.903155] libceph: osd6 (1)172.21.15.70:6812 socket error on read
2023-11-16T02:50:12.218371+00:00 smithi204 kernel: [ 2066.908743] libceph: osd_sparse_read: failed to allocate 2208169984 sparse read extents
2023-11-16T02:50:12.218406+00:00 smithi204 kernel: [ 2066.917112] libceph: osd6 (1)172.21.15.70:6812 socket error on read
2023-11-16T02:50:12.218407+00:00 smithi204 kernel: [ 2066.918810] libceph: osd_sparse_read: failed to allocate 2208169984 sparse read extents
2023-11-16T02:50:12.219434+00:00 smithi204 kernel: [ 2066.926987] libceph: osd6 (1)172.21.15.70:6812 socket error on read
...
2023-11-16T14:13:46.983459+00:00 smithi204 kernel: [43082.250808] libceph: osd6 (1)172.21.15.70:6812 socket error on read
2023-11-16T14:13:47.002106+00:00 smithi204 kernel: [43082.252340] libceph: osd_sparse_read: failed to allocate 2208169984 sparse read extents
2023-11-16T14:13:47.002112+00:00 smithi204 kernel: [43082.260503] libceph: osd6 (1)172.21.15.70:6812 socket error on read

The Rados just returned a very large extents 2208169984 and kernel libceph couldn't allocate enough memory for it and reset the connection and keep retrying the osd requests.

While in the Rados side, the osd.6, the read was succeeful:

2023-11-16T02:50:12.200+0000 7fec2cc41640 15 osd.6 pg_epoch: 33 pg[5.11( v 33'9174 (0'0,33'9174] local-lis/les=31/32 n=4 ec=31/31 lis/c=31/31 les/c/f=32/32/0 sis=31) [6,4] r=0 lpr=31 crt=33'9174 lcod 33'9173 mlcod 33'9173 active+clean] log_op_stats osd_op(client.5322.0:160938 5.11 5:8b7d215f:::1000000985f.00000000:head [sparse-read 3756032~65536 out=65560b] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected e33) v8 inb 0 outb 65560 lat 0.000235
2023-11-16T02:50:12.200+0000 7fec28438640 10 osd.6 pg_epoch: 33 pg[5.2( v 33'5683 (0'0,33'5683] local-lis/les=31/32 n=1 ec=31/31 lis/c=31/31 les/c/f=32/32/0 sis=31) [6,0] r=0 lpr=31 crt=33'5683 lcod 33'5682 mlcod 33'5682 active+clean] do_op osd_op(client.5322.0:160962 5.2 5:411858ae:::10000009858.00000001:head [sparse-read 1630208~65536 [1@-1]] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected e33) v8 may_read -> read-ordered flags ondisk+retry+read+known_if_redirected
2023-11-16T02:50:12.200+0000 7fec2cc41640 20 osd.6 pg_epoch: 33 pg[5.11( v 33'9174 (0'0,33'9174] local-lis/les=31/32 n=4 ec=31/31 lis/c=31/31 les/c/f=32/32/0 sis=31) [6,4] r=0 lpr=31 crt=33'9174 lcod 33'9173 mlcod 33'9173 active+clean] prepare_stats_for_publish reporting purged_snaps []
2023-11-16T02:50:12.200+0000 7fec2cc41640 15 osd.6 pg_epoch: 33 pg[5.11( v 33'9174 (0'0,33'9174] local-lis/les=31/32 n=4 ec=31/31 lis/c=31/31 les/c/f=32/32/0 sis=31) [6,4] r=0 lpr=31 crt=33'9174 lcod 33'9173 mlcod 33'9173 active+clean] publish_stats_to_osd 33:9995
2023-11-16T02:50:12.200+0000 7fec28438640 10 osd.6 pg_epoch: 33 pg[5.2( v 33'5683 (0'0,33'5683] local-lis/les=31/32 n=1 ec=31/31 lis/c=31/31 les/c/f=32/32/0 sis=31) [6,0] r=0 lpr=31 crt=33'5683 lcod 33'5682 mlcod 33'5682 active+clean] get_object_context: found obc in cache: obc(5:411858ae:::10000009858.00000001:head rwstate(none n=0 w=0))
2023-11-16T02:50:12.200+0000 7fec2cc41640  1 -- [v2:172.21.15.70:6809/38829,v1:172.21.15.70:6812/38829] --> v1:172.21.15.204:0/484060394 -- osd_op_reply(160938 1000000985f.00000000 [sparse-read 3756032~65536 out=65560b] v0'0 uv9173 ondisk = 0) v8 -- 0x55b9f7dd4b40 con 0x55ba006e0400
Actions

Also available in: Atom PDF