Actions
Bug #65563
openWARNING: CPU: 7 PID: 40807 at mm/page_alloc.c:4545 __alloc_pages+0x1e7/0x270
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
[ 2668.140622] WARNING: CPU: 7 PID: 40807 at mm/page_alloc.c:4545 __alloc_pages+0x1e7/0x270 [ 2668.148732] Modules linked in: ceph libceph dns_resolver netfs veth nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink bridge stp llc xfs libcrc32c in tel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal coretemp kvm_intel sunrpc kvm iTCO_wdt irqbypass ipmi_ssif iTCO_vendor_support mei_me i2c_i801 mei acpi_ipmi i2c_smbus joydev acpi_power_meter lpc_ich ipmi_si mfd_core ioatdma wmi ipmi_devintf ipmi_msghandler acpi_pad ixgbe igb crct10dif_pclmul crc32_pclmul crc32c_intel mdio ghash_clmulni_intel ptp nvme sha512_ssse3 sha256_ssse3 sha1_ssse3 pps_core nvme_core i 2c_algo_bit dca fuse [ 2668.205607] CPU: 7 PID: 40807 Comm: kworker/7:18 Kdump: loaded Not tainted 6.8.0-g1df0c7e8a235 #1 [ 2668.214474] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0a 08/02/2016 [ 2668.221959] Workqueue: ceph-msgr ceph_con_workfn [libceph] [ 2668.227468] RIP: 0010:__alloc_pages+0x1e7/0x270 [ 2668.232000] Code: ff ff 00 0f 84 19 ff ff ff 80 ce 01 e9 11 ff ff ff 83 fe 0a 0f 86 9d fe ff ff 80 3d 3b c5 4c 01 00 75 09 c6 05 32 c5 4c 01 01 <0f> 0b 45 31 ed e9 57 ff ff ff e8 ea 99 df ff 84 c0 0f 85 4a ff ff [ 2668.250746] RSP: 0018:ffffc90006223b08 EFLAGS: 00010246 [ 2668.255971] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 2668.263104] RDX: 0000000000000000 RSI: 000000000000000d RDI: 0000000000000000 [ 2668.270247] RBP: 0000000000040dc0 R08: 0000000000000008 R09: 0000000000000000 [ 2668.277380] R10: ffffc90006223c08 R11: 0000000000000000 R12: 000000000000000d [ 2668.284512] R13: 000000000000000d R14: 0000000000000dc0 R15: ffffffffa0b9df60 [ 2668.291652] FS: 0000000000000000(0000) GS:ffff88885fdc0000(0000) knlGS:0000000000000000 [ 2668.299741] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2668.305485] CR2: 000056049f303768 CR3: 000000000263e006 CR4: 00000000003706f0 [ 2668.312618] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2668.319752] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2668.326884] Call Trace: [ 2668.329337] <TASK> [ 2668.331445] ? __warn+0x84/0x170 [ 2668.334679] ? __alloc_pages+0x1e7/0x270 [ 2668.338606] ? report_bug+0x191/0x1a0 [ 2668.342282] ? handle_bug+0x42/0x70 [ 2668.345778] ? exc_invalid_op+0x18/0x70 [ 2668.349617] ? asm_exc_invalid_op+0x1a/0x20 [ 2668.353804] ? mds_dispatch+0x1f70/0x26d0 [ceph] [ 2668.358444] ? __alloc_pages+0x1e7/0x270 [ 2668.362371] __kmalloc_large_node+0x85/0x120 [ 2668.366641] __kmalloc+0x328/0x440 [ 2668.370050] ? mds_dispatch+0x1f70/0x26d0 [ceph] [ 2668.374675] mds_dispatch+0x1f70/0x26d0 [ceph] [ 2668.379129] ? sock_rps_record_flow+0x110/0x190 [ 2668.383663] ? __mutex_unlock_slowpath+0x45/0x280 [ 2668.388374] ceph_con_process_message+0x78/0x160 [libceph] [ 2668.393874] process_message+0xe/0x110 [libceph] [ 2668.398501] ceph_con_v2_try_read+0xa46/0x1c60 [libceph] [ 2668.403820] ? __mutex_lock+0x97/0xcf0 [ 2668.407573] ? __mutex_lock+0xaab/0xcf0 [ 2668.411414] ? ceph_con_workfn+0x2e/0x730 [libceph]
Updated by Xiubo Li 14 days ago ยท Edited
- Status changed from New to In Progress
The mds sent out the open session reply with cap_auths [MDSCapAuth( uid=1000 gids=1301readable=1, writeable=1),MDSCapAuth( fsname=cephfsreadable=1, writeable=1)]:
2024-04-06T00:04:22.467+0000 7fd3ac000640 1 --2- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] >> conn(0x559182a9dc00 0x559181aeab00 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).accept 2024-04-06T00:04:22.467+0000 7fd3ab600640 1 --2- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] >> conn(0x559182a9dc00 0x559181aeab00 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner_payload supported=1 required=1 2024-04-06T00:04:22.468+0000 7fd3ab600640 10 mds.d parse_caps: parsing auth_cap_str='allow rw uid=1000 gids=1301, allow rw fsname=cephfs' 2024-04-06T00:04:22.468+0000 7fd3ab600640 1 --2- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] >> 192.168.0.1:0/171529858 conn(0x559182a9dc00 0x559181aeab00 crc :-1 s=READY pgs=2 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).ready entity=client.23076 client_cookie=6c92e8679d0dafe9 server_cookie=7f9c71c5f1730060 in_seq=0 out_seq=0 2024-04-06T00:04:22.468+0000 7fd3a7a00640 10 mds.d new session 0x559182a7e800 for client.23076 192.168.0.1:0/171529858 con 0x559182a9dc00 2024-04-06T00:04:22.468+0000 7fd3a7a00640 10 mds.d parse_caps: parsing auth_cap_str='allow rw uid=1000 gids=1301, allow rw fsname=cephfs' 2024-04-06T00:04:22.468+0000 7fd3a7a00640 10 mds.d ms_handle_accept 192.168.0.1:0/171529858 con 0x559182a9dc00 session 0x559182a7e800 2024-04-06T00:04:22.468+0000 7fd3a7a00640 1 -- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] <== client.23076 192.168.0.1:0/171529858 1 ==== client_session(request_open) ==== 182+0+0 (crc 0 0 0) 0x559182aa4240 con 0x559182a9dc00 2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.477 get_session have 0x559182a7e800 client.23076 192.168.0.1:0/171529858 state closed 2024-04-06T00:04:22.468+0000 7fd3a7a00640 3 mds.0.server handle_client_session client_session(request_open) from client.23076 2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server handle_client_session CEPH_SESSION_REQUEST_OPEN metadata entries: 2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server features: '0x000000000037fbff' 2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server metric specification: [{metric_flags: '0x000000000080ffff'}] 2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server entity_id: testuser 2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server hostname: smithi163 2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server kernel_version: 6.8.0-g1df0c7e8a235 2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server root: / 2024-04-06T00:04:22.468+0000 7fd3a7a00640 10 mds.0.sessionmap add_session s=0x559182a7e800 name=client.23076 2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.sessionmap mark_projected s=0x559182a7e800 name=client.23076 pv=5 -> 6 2024-04-06T00:04:22.468+0000 7fd3a7a00640 10 mds.0.sessionmap touch_session s=0x559182a7e800 name=client.23076 2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.log _submit_entry ESession client.23076 192.168.0.1:0/171529858 open cmapv 6 2024-04-06T00:04:22.468+0000 7fd39ee00640 5 mds.0.log _submit_thread 4206457~241 : ESession client.23076 192.168.0.1:0/171529858 open cmapv 6 2024-04-06T00:04:22.468+0000 7fd39ee00640 1 -- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] --> [v2:172.21.15.169:6808/1865752847,v1:172.21.15.169:6809/1865752847] -- osd_op(unknown.0.477:551 101.14 101:2e2fa760:::200.00000001:head [write 12153~261 [fadvise_dontneed] in=261b] snapc 0=[] ondisk+write+known_if_redirected+full_force+supports_pool_eio e563) -- 0x559182a9d000 con 0x559182891000 2024-04-06T00:04:22.472+0000 7fd3aac00640 1 -- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] <== osd.7 v2:172.21.15.169:6808/1865752847 86 ==== osd_op_reply(551 200.00000001 [write 12153~261 [fadvise_dontneed]] v563'16 uv16 ondisk = 0) ==== 156+0+0 (crc 0 0 0) 0x559182a3cf00 con 0x559182891000 2024-04-06T00:04:22.472+0000 7fd39f800640 10 MDSIOContextBase::complete: 20C_MDS_session_finish 2024-04-06T00:04:22.472+0000 7fd39f800640 10 MDSContext::complete: 20C_MDS_session_finish 2024-04-06T00:04:22.472+0000 7fd39f800640 10 mds.0.server _session_logged client.23076 192.168.0.1:0/171529858 state_seq 1 open 6 inos_to_free [] inotablev 0 inos_to_purge [] 2024-04-06T00:04:22.472+0000 7fd39f800640 20 mds.0.sessionmap mark_dirty s=0x559182a7e800 name=client.23076 v=5 2024-04-06T00:04:22.472+0000 7fd39f800640 10 mds.0.sessionmap touch_session s=0x559182a7e800 name=client.23076 2024-04-06T00:04:22.472+0000 7fd39f800640 10 add_session: mds.metrics: session=0x559182a7e800, client=client.23076 192.168.0.1:0/171529858 2024-04-06T00:04:22.472+0000 7fd39f800640 20 add_session: mds.metrics: metrics=[update_type=0, metrics={cap_hit_metric={hits=0, misses=0}, read_latency={latency=0.000000, avg_latency=0.000000, sq_sum=0, count=0}, write_latency={latency=0.000000, avg_latency=0.000000, sq_sum=0, count=0}, metadata_latency={latency=0.000000, avg_latency=0.000000, sq_sum=0, count=0}, dentry_lease={hits=0, misses=0}, opened_files_metric={opened_files=0, total_inodes=0}, pinned_icaps_metric={pinned_icaps=0, total_inodes=0}, opened_inodes_metric={opened_inodes=0, total_inodes=0}, read_io_sizes_metric={total_ops=0, total_size=0}, write_io_sizes_metric={total_ops=0, total_size=0}}] 2024-04-06T00:04:22.472+0000 7fd39f800640 10 mds.0.477 send_message_client client.23076 192.168.0.1:0/171529858 client_session(open cap_auths [MDSCapAuth( uid=1000 gids=1301readable=1, writeable=1),MDSCapAuth( fsname=cephfsreadable=1, writeable=1)]) 2024-04-06T00:04:22.472+0000 7fd39f800640 1 -- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] --> 192.168.0.1:0/171529858 -- client_session(open cap_auths [MDSCapAuth( uid=1000 gids=1301readable=1, writeable=1),MDSCapAuth( fsname=cephfsreadable=1, writeable=1)]) -- 0x559182aa58c0 con 0x559182a9dc00 2024-04-06T00:04:22.472+0000 7fd39f800640 2 mds.0.server New client session: addr="192.168.0.1:0/171529858",elapsed=0.004016,throttled=0.000012,status="ACCEPTED",root="/"
While the kclient just parsed the very long fs_name and then failed to allocate memories for it:
[ 2668.515018] ceph: [21592a5b-a47c-40ae-9e87-d12e371aa285 23076]: No memory for fs_name^M [ 2684.672068] ceph: [21592a5b-a47c-40ae-9e87-d12e371aa285 23076]: mds0 caps renewed^M
Updated by Xiubo Li 13 days ago
I have fix the kernel call trace in kernel space, the patch like is https://patchwork.kernel.org/project/ceph-devel/list/?series=845831&archive=both.
Actions