Project

General

Profile

Actions

Bug #65563

open

WARNING: CPU: 7 PID: 40807 at mm/page_alloc.c:4545 __alloc_pages+0x1e7/0x270

Added by Venky Shankar 13 days ago. Updated 12 days ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://pulpito.ceph.com/yuriw-2024-04-05_22:36:11-fs-wip-yuri7-testing-2024-04-04-0800-distro-default-smithi/7642062/

[ 2668.140622] WARNING: CPU: 7 PID: 40807 at mm/page_alloc.c:4545 __alloc_pages+0x1e7/0x270
[ 2668.148732] Modules linked in: ceph libceph dns_resolver netfs veth nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink bridge stp llc xfs libcrc32c in
tel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal coretemp kvm_intel sunrpc kvm iTCO_wdt irqbypass ipmi_ssif iTCO_vendor_support mei_me i2c_i801 mei acpi_ipmi i2c_smbus joydev acpi_power_meter lpc_ich
ipmi_si mfd_core ioatdma wmi ipmi_devintf ipmi_msghandler acpi_pad ixgbe igb crct10dif_pclmul crc32_pclmul crc32c_intel mdio ghash_clmulni_intel ptp nvme sha512_ssse3 sha256_ssse3 sha1_ssse3 pps_core nvme_core i
2c_algo_bit dca fuse
[ 2668.205607] CPU: 7 PID: 40807 Comm: kworker/7:18 Kdump: loaded Not tainted 6.8.0-g1df0c7e8a235 #1
[ 2668.214474] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0a 08/02/2016
[ 2668.221959] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[ 2668.227468] RIP: 0010:__alloc_pages+0x1e7/0x270
[ 2668.232000] Code: ff ff 00 0f 84 19 ff ff ff 80 ce 01 e9 11 ff ff ff 83 fe 0a 0f 86 9d fe ff ff 80 3d 3b c5 4c 01 00 75 09 c6 05 32 c5 4c 01 01 <0f> 0b 45 31 ed e9 57 ff ff ff e8 ea 99 df ff 84 c0 0f 85 4a ff
 ff
[ 2668.250746] RSP: 0018:ffffc90006223b08 EFLAGS: 00010246
[ 2668.255971] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2668.263104] RDX: 0000000000000000 RSI: 000000000000000d RDI: 0000000000000000
[ 2668.270247] RBP: 0000000000040dc0 R08: 0000000000000008 R09: 0000000000000000
[ 2668.277380] R10: ffffc90006223c08 R11: 0000000000000000 R12: 000000000000000d
[ 2668.284512] R13: 000000000000000d R14: 0000000000000dc0 R15: ffffffffa0b9df60
[ 2668.291652] FS:  0000000000000000(0000) GS:ffff88885fdc0000(0000) knlGS:0000000000000000
[ 2668.299741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2668.305485] CR2: 000056049f303768 CR3: 000000000263e006 CR4: 00000000003706f0
[ 2668.312618] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2668.319752] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2668.326884] Call Trace:
[ 2668.329337]  <TASK>
[ 2668.331445]  ? __warn+0x84/0x170
[ 2668.334679]  ? __alloc_pages+0x1e7/0x270
[ 2668.338606]  ? report_bug+0x191/0x1a0
[ 2668.342282]  ? handle_bug+0x42/0x70
[ 2668.345778]  ? exc_invalid_op+0x18/0x70
[ 2668.349617]  ? asm_exc_invalid_op+0x1a/0x20
[ 2668.353804]  ? mds_dispatch+0x1f70/0x26d0 [ceph]
[ 2668.358444]  ? __alloc_pages+0x1e7/0x270
[ 2668.362371]  __kmalloc_large_node+0x85/0x120
[ 2668.366641]  __kmalloc+0x328/0x440
[ 2668.370050]  ? mds_dispatch+0x1f70/0x26d0 [ceph]
[ 2668.374675]  mds_dispatch+0x1f70/0x26d0 [ceph]
[ 2668.379129]  ? sock_rps_record_flow+0x110/0x190
[ 2668.383663]  ? __mutex_unlock_slowpath+0x45/0x280
[ 2668.388374]  ceph_con_process_message+0x78/0x160 [libceph]
[ 2668.393874]  process_message+0xe/0x110 [libceph]
[ 2668.398501]  ceph_con_v2_try_read+0xa46/0x1c60 [libceph]
[ 2668.403820]  ? __mutex_lock+0x97/0xcf0
[ 2668.407573]  ? __mutex_lock+0xaab/0xcf0
[ 2668.411414]  ? ceph_con_workfn+0x2e/0x730 [libceph]
Actions #1

Updated by Xiubo Li 13 days ago ยท Edited

  • Status changed from New to In Progress

The mds sent out the open session reply with cap_auths [MDSCapAuth( uid=1000 gids=1301readable=1, writeable=1),MDSCapAuth( fsname=cephfsreadable=1, writeable=1)]:

2024-04-06T00:04:22.467+0000 7fd3ac000640  1 --2- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] >>  conn(0x559182a9dc00 0x559181aeab00 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).accept
2024-04-06T00:04:22.467+0000 7fd3ab600640  1 --2- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] >>  conn(0x559182a9dc00 0x559181aeab00 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner_payload supported=1 required=1
2024-04-06T00:04:22.468+0000 7fd3ab600640 10 mds.d parse_caps: parsing auth_cap_str='allow rw uid=1000 gids=1301, allow rw fsname=cephfs'
2024-04-06T00:04:22.468+0000 7fd3ab600640  1 --2- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] >> 192.168.0.1:0/171529858 conn(0x559182a9dc00 0x559181aeab00 crc :-1 s=READY pgs=2 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).ready entity=client.23076 client_cookie=6c92e8679d0dafe9 server_cookie=7f9c71c5f1730060 in_seq=0 out_seq=0
2024-04-06T00:04:22.468+0000 7fd3a7a00640 10 mds.d  new session 0x559182a7e800 for client.23076 192.168.0.1:0/171529858 con 0x559182a9dc00
2024-04-06T00:04:22.468+0000 7fd3a7a00640 10 mds.d parse_caps: parsing auth_cap_str='allow rw uid=1000 gids=1301, allow rw fsname=cephfs'
2024-04-06T00:04:22.468+0000 7fd3a7a00640 10 mds.d ms_handle_accept 192.168.0.1:0/171529858 con 0x559182a9dc00 session 0x559182a7e800
2024-04-06T00:04:22.468+0000 7fd3a7a00640  1 -- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] <== client.23076 192.168.0.1:0/171529858 1 ==== client_session(request_open) ==== 182+0+0 (crc 0 0 0) 0x559182aa4240 con 0x559182a9dc00
2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.477 get_session have 0x559182a7e800 client.23076 192.168.0.1:0/171529858 state closed
2024-04-06T00:04:22.468+0000 7fd3a7a00640  3 mds.0.server handle_client_session client_session(request_open) from client.23076
2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server handle_client_session CEPH_SESSION_REQUEST_OPEN metadata entries:
2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server  features: '0x000000000037fbff'
2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server  metric specification: [{metric_flags: '0x000000000080ffff'}]
2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server   entity_id: testuser
2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server   hostname: smithi163
2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server   kernel_version: 6.8.0-g1df0c7e8a235
2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.server   root: /
2024-04-06T00:04:22.468+0000 7fd3a7a00640 10 mds.0.sessionmap add_session s=0x559182a7e800 name=client.23076
2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.sessionmap mark_projected s=0x559182a7e800 name=client.23076 pv=5 -> 6
2024-04-06T00:04:22.468+0000 7fd3a7a00640 10 mds.0.sessionmap touch_session s=0x559182a7e800 name=client.23076
2024-04-06T00:04:22.468+0000 7fd3a7a00640 20 mds.0.log _submit_entry ESession client.23076 192.168.0.1:0/171529858 open cmapv 6
2024-04-06T00:04:22.468+0000 7fd39ee00640  5 mds.0.log _submit_thread 4206457~241 : ESession client.23076 192.168.0.1:0/171529858 open cmapv 6
2024-04-06T00:04:22.468+0000 7fd39ee00640  1 -- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] --> [v2:172.21.15.169:6808/1865752847,v1:172.21.15.169:6809/1865752847] -- osd_op(unknown.0.477:551 101.14 101:2e2fa760:::200.00000001:head [write 12153~261 [fadvise_dontneed] in=261b] snapc 0=[] ondisk+write+known_if_redirected+full_force+supports_pool_eio e563) -- 0x559182a9d000 con 0x559182891000
2024-04-06T00:04:22.472+0000 7fd3aac00640  1 -- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] <== osd.7 v2:172.21.15.169:6808/1865752847 86 ==== osd_op_reply(551 200.00000001 [write 12153~261 [fadvise_dontneed]] v563'16 uv16 ondisk = 0) ==== 156+0+0 (crc 0 0 0) 0x559182a3cf00 con 0x559182891000
2024-04-06T00:04:22.472+0000 7fd39f800640 10 MDSIOContextBase::complete: 20C_MDS_session_finish
2024-04-06T00:04:22.472+0000 7fd39f800640 10 MDSContext::complete: 20C_MDS_session_finish
2024-04-06T00:04:22.472+0000 7fd39f800640 10 mds.0.server _session_logged client.23076 192.168.0.1:0/171529858 state_seq 1 open 6 inos_to_free [] inotablev 0 inos_to_purge []
2024-04-06T00:04:22.472+0000 7fd39f800640 20 mds.0.sessionmap mark_dirty s=0x559182a7e800 name=client.23076 v=5
2024-04-06T00:04:22.472+0000 7fd39f800640 10 mds.0.sessionmap touch_session s=0x559182a7e800 name=client.23076
2024-04-06T00:04:22.472+0000 7fd39f800640 10 add_session: mds.metrics: session=0x559182a7e800, client=client.23076 192.168.0.1:0/171529858
2024-04-06T00:04:22.472+0000 7fd39f800640 20 add_session: mds.metrics: metrics=[update_type=0, metrics={cap_hit_metric={hits=0, misses=0}, read_latency={latency=0.000000, avg_latency=0.000000, sq_sum=0, count=0}, write_latency={latency=0.000000, avg_latency=0.000000, sq_sum=0, count=0}, metadata_latency={latency=0.000000, avg_latency=0.000000, sq_sum=0, count=0}, dentry_lease={hits=0, misses=0}, opened_files_metric={opened_files=0, total_inodes=0}, pinned_icaps_metric={pinned_icaps=0, total_inodes=0}, opened_inodes_metric={opened_inodes=0, total_inodes=0}, read_io_sizes_metric={total_ops=0, total_size=0}, write_io_sizes_metric={total_ops=0, total_size=0}}]
2024-04-06T00:04:22.472+0000 7fd39f800640 10 mds.0.477 send_message_client client.23076 192.168.0.1:0/171529858 client_session(open cap_auths [MDSCapAuth( uid=1000 gids=1301readable=1, writeable=1),MDSCapAuth( fsname=cephfsreadable=1, writeable=1)])
2024-04-06T00:04:22.472+0000 7fd39f800640  1 -- [v2:172.21.15.169:6832/2790854057,v1:172.21.15.169:6834/2790854057] --> 192.168.0.1:0/171529858 -- client_session(open cap_auths [MDSCapAuth( uid=1000 gids=1301readable=1, writeable=1),MDSCapAuth( fsname=cephfsreadable=1, writeable=1)]) -- 0x559182aa58c0 con 0x559182a9dc00
2024-04-06T00:04:22.472+0000 7fd39f800640  2 mds.0.server New client session: addr="192.168.0.1:0/171529858",elapsed=0.004016,throttled=0.000012,status="ACCEPTED",root="/" 

While the kclient just parsed the very long fs_name and then failed to allocate memories for it:

[ 2668.515018] ceph: [21592a5b-a47c-40ae-9e87-d12e371aa285 23076]: No memory for fs_name^M
[ 2684.672068] ceph: [21592a5b-a47c-40ae-9e87-d12e371aa285 23076]: mds0 caps renewed^M
Actions #2

Updated by Xiubo Li 13 days ago

  • Project changed from Linux kernel client to CephFS
  • Category deleted (fs/ceph)
  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 56981
Actions #4

Updated by Xiubo Li 12 days ago

I have fix the kernel call trace in kernel space, the patch like is https://patchwork.kernel.org/project/ceph-devel/list/?series=845831&archive=both.

Actions

Also available in: Atom PDF