Bug #55332

Failure in snaptest-git-ceph.sh (it's an async unlink/create bug)

Added by Venky Shankar almost 2 years ago. Updated 8 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Community (dev)
Tags:
backport_processed
Backport:
quincy, pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://pulpito.ceph.com/vshankar-2022-04-11_12:24:06-fs-wip-vshankar-testing1-20220411-144044-testing-default-smithi/6786388/

Checking a snapshot (with git diff --exit-code) fails.

2022-04-11T15:29:53.327 INFO:tasks.workunit.client.0.smithi183.stderr:+ for v in $versions
2022-04-11T15:29:53.327 INFO:tasks.workunit.client.0.smithi183.stderr:+ ver=v0.1
2022-04-11T15:29:53.328 INFO:tasks.workunit.client.0.smithi183.stderr:+ echo checking v0.1
2022-04-11T15:29:53.328 INFO:tasks.workunit.client.0.smithi183.stderr:+ cd .snap/v0.1
2022-04-11T15:29:53.328 INFO:tasks.workunit.client.0.smithi183.stderr:+ git diff --exit-code
...
...
...
2022-04-11T15:30:16.495 DEBUG:teuthology.orchestra.run:got remote process result: 1

Similar to https://tracker.ceph.com/issues/17172, however this one is with the kernel client.

mds.e.log.unlink_crash.tar.gz (209 KB) Xiubo Li, 05/11/2022 09:13 AM

mds.a.log.unlink_crash.tar.gz (207 KB) Xiubo Li, 05/11/2022 09:13 AM


Related issues

Related to CephFS - Bug #17172: Failure in snaptest-git-ceph.sh Resolved 08/30/2016
Related to CephFS - Bug #55804: qa failure: pjd link tests failed Duplicate
Related to CephFS - Bug #58041: mds: src/mds/Server.cc: 3231: FAILED ceph_assert(straydn->get_name() == straydname) Duplicate
Duplicated by CephFS - Bug #54462: Command failed (workunit test fs/snaps/snaptest-git-ceph.sh) on smithi055 with status 128 Duplicate
Copied to CephFS - Backport #57836: pacific: Failure in snaptest-git-ceph.sh (it's an async unlink/create bug) Resolved
Copied to CephFS - Backport #57837: quincy: Failure in snaptest-git-ceph.sh (it's an async unlink/create bug) Resolved
Copied to CephFS - Bug #62810: Failure in snaptest-git-ceph.sh (it's an async unlink/create bug) -- Need to fix again New

History

#1 Updated by Venky Shankar almost 2 years ago

  • Related to Bug #17172: Failure in snaptest-git-ceph.sh added

#2 Updated by Venky Shankar almost 2 years ago

  • Status changed from New to Triaged
  • Assignee set to Xiubo Li

Xiubo, please take a look.

#3 Updated by Xiubo Li almost 2 years ago

Venky Shankar wrote:

Xiubo, please take a look.

Sure.

#5 Updated by Xiubo Li almost 2 years ago

  • Status changed from Triaged to In Progress

#6 Updated by Xiubo Li almost 2 years ago

From remote/smithi183/syslog/kern.log.gz, the kernel crashed due to a NULL pointer dereference:

2022-04-11T15:24:00.701508+00:00 smithi183 kernel: Key type dns_resolver registered
2022-04-11T15:24:00.719985+00:00 smithi183 kernel: Key type ceph registered
2022-04-11T15:24:00.725509+00:00 smithi183 kernel: libceph: loaded (mon/osd proto 15/24)
2022-04-11T15:24:00.753541+00:00 smithi183 kernel: ceph: loaded (mds proto 32)
2022-04-11T15:24:00.770543+00:00 smithi183 kernel: libceph: mon2 (2)172.21.15.190:3301 session established
2022-04-11T15:24:00.778531+00:00 smithi183 kernel: libceph: client4646 fsid 1c46d31f-e003-4951-9e08-b26bf9092c68
2022-04-11T15:24:08.421682+00:00 smithi183 kernel: libceph: mon2 (2)172.21.15.190:3301 socket closed (con state OPEN)
2022-04-11T15:24:08.421793+00:00 smithi183 kernel: libceph: mon2 (2)172.21.15.190:3301 session lost, hunting for new mon
2022-04-11T15:24:08.429506+00:00 smithi183 kernel: libceph: mon0 (2)172.21.15.183:3300 session established
2022-04-11T15:25:56.318797+00:00 smithi183 kernel: libceph: mon0 (2)172.21.15.183:3300 socket closed (con state OPEN)
2022-04-11T15:25:56.318984+00:00 smithi183 kernel: libceph: mon0 (2)172.21.15.183:3300 session lost, hunting for new mon
2022-04-11T15:25:56.326532+00:00 smithi183 kernel: libceph: mon2 (2)172.21.15.190:3301 session established
2022-04-11T15:27:19.491580+00:00 smithi183 kernel: libceph: mon2 (2)172.21.15.190:3301 socket closed (con state OPEN)
2022-04-11T15:27:19.491723+00:00 smithi183 kernel: libceph: mon2 (2)172.21.15.190:3301 session lost, hunting for new mon
2022-04-11T15:27:19.499524+00:00 smithi183 kernel: libceph: mon0 (2)172.21.15.183:3300 session established
2022-04-11T15:31:25.607191+00:00 smithi183 kernel: ceph: dump_cap_flushes: still waiting for cap flushes through 45572:
2022-04-11T15:31:25.607350+00:00 smithi183 kernel: ceph: 5000000008b:fffffffffffffffe Fw 23183 0 0
2022-04-11T15:31:25.607406+00:00 smithi183 kernel: ceph: 5000000008a:fffffffffffffffe Fw 23184 0 0
2022-04-11T15:31:25.612849+00:00 smithi183 kernel: ceph: 50000000089:fffffffffffffffe Fw 23185 0 0
2022-04-11T15:31:25.618548+00:00 smithi183 kernel: ceph: 50000000084:fffffffffffffffe Fw 23189 0 0
2022-04-11T15:31:25.624207+00:00 smithi183 kernel: ceph: 5000000007a:fffffffffffffffe Fw 23199 0 0
2022-04-11T15:31:25.629880+00:00 smithi183 kernel: ceph: 50000000094:fffffffffffffffe Fw 23374 0 0
2022-04-11T15:31:25.635911+00:00 smithi183 kernel: ceph: 50000000092:fffffffffffffffe Fw 23377 0 0
2022-04-11T15:31:25.647285+00:00 smithi183 kernel: ceph: 50000000091:fffffffffffffffe Fw 23378 0 0
2022-04-11T15:31:25.647363+00:00 smithi183 kernel: ceph: 5000000008e:fffffffffffffffe Fw 23380 0 0
2022-04-11T15:31:25.652953+00:00 smithi183 kernel: ceph: 50000000087:fffffffffffffffe Fw 23382 0 0
2022-04-11T15:31:25.664305+00:00 smithi183 kernel: ceph: 50000000086:fffffffffffffffe Fw 23383 0 0
2022-04-11T15:31:25.664380+00:00 smithi183 kernel: ceph: 50000000083:fffffffffffffffe Fw 23384 0 0
2022-04-11T15:31:25.669973+00:00 smithi183 kernel: ceph: 50000000082:fffffffffffffffe Fw 23385 0 0
2022-04-11T15:31:25.675641+00:00 smithi183 kernel: ceph: 50000000081:fffffffffffffffe Fw 23386 0 0
2022-04-11T15:31:25.681325+00:00 smithi183 kernel: ceph: 50000000080:fffffffffffffffe Fw 23387 0 0
2022-04-11T15:31:25.686968+00:00 smithi183 kernel: ceph: 5000000007e:fffffffffffffffe Fw 23389 0 0
2022-04-11T15:31:25.692622+00:00 smithi183 kernel: ceph: 5000000007b:fffffffffffffffe Fw 23392 0 0
2022-04-11T15:31:25.698327+00:00 smithi183 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000780
2022-04-11T15:31:26.214294+00:00 smithi183 kernel: #PF: supervisor read access in kernel mode
2022-04-11T15:31:26.214374+00:00 smithi183 kernel: #PF: error_code(0x0000) - not-present page
2022-04-11T15:31:26.214401+00:00 smithi183 kernel: PGD 0 P4D 0
2022-04-11T15:31:26.214426+00:00 smithi183 kernel: Oops: 0000 [#1] PREEMPT SMP PTI
2022-04-11T15:31:26.214449+00:00 smithi183 kernel: CPU: 3 PID: 46268 Comm: umount Tainted: G S                5.18.0-rc2-ceph-g1771083b2f18 #1
2022-04-11T15:31:26.214472+00:00 smithi183 kernel: Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
2022-04-11T15:31:26.214504+00:00 smithi183 kernel: RIP: 0010:ceph_mdsc_sync.cold.64+0x77/0xc3 [ceph]
2022-04-11T15:31:26.214528+00:00 smithi183 kernel: Code: 08 4d 8b 07 4d 39 c5 73 0f 48 8b 7c 24 08 e8 d0 7e 05 e1 e9 7f 90 ff ff 49 8b 47 30 41 0f b6 4f 0c 4c 89 44 24 28 41 8b 7f 08 <4c> 8b 88 80 07 00 00 89 4c 24 24 4c 89 4c 24 18 e8 84 00 fe ff 49
2022-04-11T15:31:26.214553+00:00 smithi183 kernel: RSP: 0018:ffffc90009c4fda8 EFLAGS: 00010212
2022-04-11T15:31:26.214574+00:00 smithi183 kernel: RAX: 0000000000000000 RBX: ffff8881abf63000 RCX: 0000000000000000
2022-04-11T15:31:26.214596+00:00 smithi183 kernel: RDX: 0000000000000000 RSI: ffffffff823932ad RDI: 0000000000000000
2022-04-11T15:31:26.214618+00:00 smithi183 kernel: RBP: ffff8881abf634f0 R08: 0000000000005dc7 R09: c0000000ffffdfff
2022-04-11T15:31:26.214639+00:00 smithi183 kernel: R10: 0000000000000001 R11: ffffc90009c4fbc8 R12: 0000000000000001
2022-04-11T15:31:26.214661+00:00 smithi183 kernel: R13: 000000000000b204 R14: ffffffffa0ab3598 R15: ffff88815d36a110
2022-04-11T15:31:26.214685+00:00 smithi183 kernel: FS:  00007f50eb25e080(0000) GS:ffff88885fcc0000(0000) knlGS:0000000000000000
2022-04-11T15:31:26.214708+00:00 smithi183 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-04-11T15:31:26.214729+00:00 smithi183 kernel: CR2: 0000000000000780 CR3: 0000000116ea2003 CR4: 00000000003706e0
2022-04-11T15:31:26.214751+00:00 smithi183 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2022-04-11T15:31:26.214772+00:00 smithi183 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2022-04-11T15:31:26.214794+00:00 smithi183 kernel: Call Trace:
2022-04-11T15:31:26.214815+00:00 smithi183 kernel: <TASK>
2022-04-11T15:31:26.214837+00:00 smithi183 kernel: ? schedstat_stop+0x10/0x10
2022-04-11T15:31:26.214859+00:00 smithi183 kernel: ceph_sync_fs+0x2c/0x100 [ceph]
2022-04-11T15:31:26.214882+00:00 smithi183 kernel: sync_filesystem+0x6d/0x90
2022-04-11T15:31:26.214906+00:00 smithi183 kernel: generic_shutdown_super+0x22/0x120
2022-04-11T15:31:26.214928+00:00 smithi183 kernel: kill_anon_super+0x14/0x30
2022-04-11T15:31:26.214950+00:00 smithi183 kernel: ceph_kill_sb+0x36/0x90 [ceph]
2022-04-11T15:31:26.214973+00:00 smithi183 kernel: deactivate_locked_super+0x29/0x60
2022-04-11T15:31:26.214995+00:00 smithi183 kernel: cleanup_mnt+0xb8/0x140
2022-04-11T15:31:26.215017+00:00 smithi183 kernel: task_work_run+0x6d/0xb0
2022-04-11T15:31:26.215039+00:00 smithi183 kernel: exit_to_user_mode_prepare+0x226/0x230
2022-04-11T15:31:26.215061+00:00 smithi183 kernel: syscall_exit_to_user_mode+0x25/0x60
2022-04-11T15:31:26.215084+00:00 smithi183 kernel: do_syscall_64+0x40/0x80
2022-04-11T15:31:26.215106+00:00 smithi183 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
2022-04-11T15:31:26.215131+00:00 smithi183 kernel: RIP: 0033:0x7f50e98fcdfb
2022-04-11T15:31:26.215153+00:00 smithi183 kernel: Code: 20 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d 20 2c 00 f7 d8 64 89 01 48
2022-04-11T15:31:26.215176+00:00 smithi183 kernel: RSP: 002b:00007ffd3c78f468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
2022-04-11T15:31:26.215198+00:00 smithi183 kernel: RAX: 0000000000000000 RBX: 000055af2de6b5d0 RCX: 00007f50e98fcdfb
2022-04-11T15:31:26.215220+00:00 smithi183 kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055af2de6b7b0
2022-04-11T15:31:26.215241+00:00 smithi183 kernel: RBP: 0000000000000000 R08: 000055af2de6b7e0 R09: 00007f50e9980620
2022-04-11T15:31:26.215263+00:00 smithi183 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000055af2de6b7b0
2022-04-11T15:31:26.215286+00:00 smithi183 kernel: R13: 00007f50eae58184 R14: 0000000000000000 R15: 00000000ffffffff
2022-04-11T15:31:26.215308+00:00 smithi183 kernel: </TASK>
2022-04-11T15:31:26.215332+00:00 smithi183 kernel: Modules linked in: ceph libceph dns_resolver fscache netfs veth nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink bridge stp llc xfs libcrc32c intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal coretemp kvm_intel kvm i2c_i801 iTCO_wdt irqbypass crct10dif_pclmul crc32_pclmul iTCO_vendor_support mei_me ghash_clmulni_intel ipmi_ssif i2c_smbus mei lpc_ich joydev mfd_core ioatdma wmi acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler sunrpc acpi_pad acpi_power_meter igb ixgbe crc32c_intel mdio nvme ptp i2c_algo_bit pps_core dca nvme_core fuse
2022-04-11T15:31:26.215366+00:00 smithi183 kernel: CR2: 0000000000000780
2022-04-11T15:31:26.215389+00:00 smithi183 kernel: ---[ end trace 0000000000000000 ]---
2022-04-11T15:31:26.215412+00:00 smithi183 kernel: RIP: 0010:ceph_mdsc_sync.cold.64+0x77/0xc3 [ceph]
2022-04-11T15:31:26.215434+00:00 smithi183 kernel: Code: 08 4d 8b 07 4d 39 c5 73 0f 48 8b 7c 24 08 e8 d0 7e 05 e1 e9 7f 90 ff ff 49 8b 47 30 41 0f b6 4f 0c 4c 89 44 24 28 41 8b 7f 08 <4c> 8b 88 80 07 00 00 89 4c 24 24 4c 89 4c 24 18 e8 84 00 fe ff 49
2022-04-11T15:31:26.215456+00:00 smithi183 kernel: RSP: 0018:ffffc90009c4fda8 EFLAGS: 00010212
2022-04-11T15:31:26.215477+00:00 smithi183 kernel: RAX: 0000000000000000 RBX: ffff8881abf63000 RCX: 0000000000000000
2022-04-11T15:31:26.215509+00:00 smithi183 kernel: RDX: 0000000000000000 RSI: ffffffff823932ad RDI: 0000000000000000
2022-04-11T15:31:26.215533+00:00 smithi183 kernel: RBP: ffff8881abf634f0 R08: 0000000000005dc7 R09: c0000000ffffdfff
2022-04-11T15:31:26.215557+00:00 smithi183 kernel: R10: 0000000000000001 R11: ffffc90009c4fbc8 R12: 0000000000000001
2022-04-11T15:31:26.215579+00:00 smithi183 kernel: R13: 000000000000b204 R14: ffffffffa0ab3598 R15: ffff88815d36a110
2022-04-11T15:31:26.215601+00:00 smithi183 kernel: FS:  00007f50eb25e080(0000) GS:ffff88885fcc0000(0000) knlGS:0000000000000000
2022-04-11T15:31:26.215623+00:00 smithi183 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-04-11T15:31:26.215644+00:00 smithi183 kernel: CR2: 0000000000000780 CR3: 0000000116ea2003 CR4: 00000000003706e0
2022-04-11T15:31:26.215666+00:00 smithi183 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2022-04-11T15:31:26.215689+00:00 smithi183 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2022-04-11T15:31:26.215711+00:00 smithi183 kernel: note: umount[46268] exited with preempt_count 1

It seems some dirty caps haven't been flushed.

#7 Updated by Xiubo Li almost 2 years ago

There is no debug kernel for kernel-5.18.0_rc2_ceph_g1771083b2f18-1.x86_64.rpm, which can be fetched from [1], so I just ran objdump on ceph.ko and extracted the assembly related to ceph_mdsc_sync.cold.64:

...
60965 00000000000012b0 <ceph_mdsc_sync.cold.64>:
60966     12b0:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi
60967     12b7:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
60968     12be:       e8 00 00 00 00          callq  12c3 <ceph_mdsc_sync.cold.64+0x13>
60969     12c3:       e9 00 00 00 00          jmpq   12c8 <ceph_mdsc_sync.cold.64+0x18>
60970     12c8:       4c 89 ea                mov    %r13,%rdx
60971     12cb:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi
60972     12d2:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
60973     12d9:       e8 00 00 00 00          callq  12de <ceph_mdsc_sync.cold.64+0x2e>
60974     12de:       48 8b 7c 24 08          mov    0x8(%rsp),%rdi
60975     12e3:       e8 00 00 00 00          callq  12e8 <ceph_mdsc_sync.cold.64+0x38>
60976     12e8:       48 8b 83 88 04 00 00    mov    0x488(%rbx),%rax
60977     12ef:       4c 8d 78 f0             lea    -0x10(%rax),%r15
60978     12f3:       49 8d 47 10             lea    0x10(%r15),%rax
60979     12f7:       48 39 44 24 10          cmp    %rax,0x10(%rsp)
60980     12fc:       74 08                   je     1306 <ceph_mdsc_sync.cold.64+0x56>
60981     12fe:       4d 8b 07                mov    (%r15),%r8
60982     1301:       4d 39 c5                cmp    %r8,%r13
60983     1304:       73 0f                   jae    1315 <ceph_mdsc_sync.cold.64+0x65>
60984     1306:       48 8b 7c 24 08          mov    0x8(%rsp),%rdi
60985     130b:       e8 00 00 00 00          callq  1310 <ceph_mdsc_sync.cold.64+0x60>
60986     1310:       e9 00 00 00 00          jmpq   1315 <ceph_mdsc_sync.cold.64+0x65>
60987     1315:       49 8b 47 30             mov    0x30(%r15),%rax
60988     1319:       41 0f b6 4f 0c          movzbl 0xc(%r15),%ecx
60989     131e:       4c 89 44 24 28          mov    %r8,0x28(%rsp)
60990     1323:       41 8b 7f 08             mov    0x8(%r15),%edi
60991     1327:       4c 8b 88 80 07 00 00    mov    0x780(%rax),%r9
60992     132e:       89 4c 24 24             mov    %ecx,0x24(%rsp)
60993     1332:       4c 89 4c 24 18          mov    %r9,0x18(%rsp)                                                                                    
60994     1337:       e8 00 00 00 00          callq  133c <ceph_mdsc_sync.cold.64+0x8c>
60995     133c:       49 8b 77 30             mov    0x30(%r15),%rsi
60996     1340:       8b 4c 24 24             mov    0x24(%rsp),%ecx
60997     1344:       4c 89 f7                mov    %r14,%rdi
60998     1347:       48 8b 96 98 04 00 00    mov    0x498(%rsi),%rdx
60999     134e:       48 8b b6 90 04 00 00    mov    0x490(%rsi),%rsi
61000     1355:       51                      push   %rcx
61001     1356:       48 89 c1                mov    %rax,%rcx
61002     1359:       4c 8b 4c 24 20          mov    0x20(%rsp),%r9
61003     135e:       4c 8b 44 24 30          mov    0x30(%rsp),%r8
61004     1363:       e8 00 00 00 00          callq  1368 <ceph_mdsc_sync.cold.64+0xb8>
61005     1368:       4d 8b 7f 10             mov    0x10(%r15),%r15
61006     136c:       5a                      pop    %rdx
61007     136d:       49 83 ef 10             sub    $0x10,%r15
61008     1371:       eb 80                   jmp    12f3 <ceph_mdsc_sync.cold.64+0x43>
...

From the core trace:

2022-04-11T15:31:26.214504+00:00 smithi183 kernel: RIP: 0010:ceph_mdsc_sync.cold.64+0x77/0xc3 [ceph]

It crashed at line 60991:

60991     1327:       4c 8b 88 80 07 00 00    mov    0x780(%rax),%r9

Comparing this with the kernel C code, it crashed in:

1994 static void dump_cap_flushes(struct ceph_mds_client *mdsc, u64 want_tid)
1995 {
1996         struct ceph_cap_flush *cf;
1997         
1998         pr_info("%s: still waiting for cap flushes through %llu:\n",
1999                 __func__, want_tid);
2000         spin_lock(&mdsc->cap_dirty_lock);
2001         list_for_each_entry(cf, &mdsc->cap_flush_list, g_list) {
2002                 if (cf->tid > want_tid)
2003                         break;
2004                 pr_info("%llx:%llx %s %llu %llu %d\n",
2005                         ceph_vinop(&cf->ci->vfs_inode),
2006                         ceph_cap_string(cf->caps), cf->tid,
2007                         cf->ci->i_last_cap_flush_ack, cf->wake);
2008         }
2009         spin_unlock(&mdsc->cap_dirty_lock);
2010 }

at line 2004 above.

[1] https://2.chacra.ceph.com/r/kernel/testing/1771083b2f18110dcc11eeada696858d64b041b9/centos/8/flavors/default/x86_64/
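
To make the offset reasoning concrete, here is a small userspace sketch (a toy model with made-up struct layouts, not the real kernel definitions; it only assumes that 0x780 is the offset of the field the pr_info() at line 2004 reads through cf->ci, e.g. i_last_cap_flush_ack). A load through a NULL struct pointer faults at an address equal to the offset of the field being read, which matches the reported fault address 0x0000000000000780 together with RAX == 0:

#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structs; the real layouts live in the
 * kernel headers, and 0x780 is only assumed to be the offset of the field
 * read through cf->ci. */
struct fake_inode_info {
    char pad[0x780];                      /* place the field at offset 0x780 */
    unsigned long long i_last_cap_flush_ack;
};

struct fake_cap_flush {
    unsigned long long tid;
    struct fake_inode_info *ci;           /* NULL in the crash scenario */
};

int main(void)
{
    /* Reading cf->ci->i_last_cap_flush_ack with cf->ci == NULL would fault
     * at address 0x780, exactly what the oops reports. */
    printf("field offset: %#zx\n",
           offsetof(struct fake_inode_info, i_last_cap_flush_ack));
    return 0;
}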

#8 Updated by Xiubo Li almost 2 years ago

#9 Updated by Xiubo Li almost 2 years ago

Venky Shankar wrote:

Another instance: https://pulpito.ceph.com/vshankar-2022-05-02_16:58:59-fs-wip-vshankar-testing1-20220502-201957-testing-default-smithi/6818698/

I checked this failure. From remote/smithi040/syslog/kern.log.gz:

2022-05-02T17:28:32.749995+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket error on write
2022-05-02T17:28:33.019006+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket error on write
2022-05-02T17:28:33.287010+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket error on write
2022-05-02T17:28:33.573010+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket closed (con state OPEN)
2022-05-02T17:28:34.366017+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket closed (con state OPEN)
2022-05-02T17:28:34.834007+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket closed (con state OPEN)
2022-05-02T17:28:37.116330+00:00 smithi040 kernel: libceph: osd4 (2)172.21.15.53:6816 socket closed (con state OPEN)
2022-05-02T17:28:37.491007+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket closed (con state OPEN)
2022-05-02T17:28:40.110003+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket closed (con state OPEN)
2022-05-02T17:28:41.281006+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket closed (con state OPEN)
2022-05-02T17:28:41.682001+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket closed (con state OPEN)
2022-05-02T17:28:42.501016+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket closed (con state OPEN)
2022-05-02T17:28:46.639008+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket closed (con state OPEN)
2022-05-02T17:28:47.699008+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket error on write
2022-05-02T17:28:48.411189+00:00 smithi040 kernel: libceph: osd2 (2)172.21.15.40:6814 socket closed (con state OPEN)
2022-05-02T17:28:50.452009+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket error on write
2022-05-02T17:28:51.006593+00:00 smithi040 kernel: ceph: ceph: async create failure path=(1)client.0/tmp/ceph/src/os/ObjectStore.h result=-17!
2022-05-02T17:28:51.006773+00:00 smithi040 kernel: ceph:  dropping dirty Fw state for 00000000c5772917 1099511650591
2022-05-02T17:28:51.006872+00:00 smithi040 kernel: ceph: ceph_do_invalidate_pages: inode 1000000591f.fffffffffffffffe is shut down
2022-05-02T17:28:51.054824+00:00 smithi040 kernel: ceph: ceph: async create failure path=(1)client.0/tmp/ceph/src/osd/OSDMap.h result=-17!
2022-05-02T17:28:51.054936+00:00 smithi040 kernel: ceph:  dropping dirty Fw state for 00000000945389c3 1099511650640
2022-05-02T17:28:51.055005+00:00 smithi040 kernel: ceph: ceph_do_invalidate_pages: inode 10000005950.fffffffffffffffe is shut down
2022-05-02T17:28:51.064357+00:00 smithi040 kernel: ceph: ceph: async create failure path=(1)client.0/tmp/ceph/src/osd/PG.h result=-17!
2022-05-02T17:28:51.079525+00:00 smithi040 kernel: ceph:  dropping dirty Fw state for 000000006dd6ad56 1099511650642
2022-05-02T17:28:51.079622+00:00 smithi040 kernel: ceph: ceph_do_invalidate_pages: inode 10000005952.fffffffffffffffe is shut down
2022-05-02T17:28:51.088005+00:00 smithi040 kernel: ceph: ceph: async create failure path=(1)client.0/tmp/ceph/src/osd/osd_types.h result=-17!
2022-05-02T17:28:51.104698+00:00 smithi040 kernel: ceph:  dropping dirty Fw state for 000000002db759f8 1099511650645
2022-05-02T17:28:51.104819+00:00 smithi040 kernel: ceph: ceph_do_invalidate_pages: inode 10000005955.fffffffffffffffe is shut down
...
2022-05-02T17:29:03.486011+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket error on write
2022-05-02T17:29:03.766004+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket error on write
2022-05-02T17:29:04.040992+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket error on write
2022-05-02T17:29:04.756003+00:00 smithi040 kernel: libceph: mds0 (2)172.21.15.40:6834 socket closed (con state OPEN)
2022-05-02T17:29:05.035493+00:00 smithi040 kernel: ceph: ceph: async create failure path=(1)client.0/tmp/ceph/src/mon/Paxos.h result=-17!
2022-05-02T17:29:05.035659+00:00 smithi040 kernel: ceph:  dropping dirty Fw state for 00000000dff7f5ed 1099511651327
2022-05-02T17:29:05.035736+00:00 smithi040 kernel: ceph: ceph: async create failure path=(1)client.0/tmp/ceph/src/mon/PaxosService.h result=-17!
2022-05-02T17:29:05.045283+00:00 smithi040 kernel: ceph: ceph_do_invalidate_pages: inode 10000005bff.fffffffffffffffe is shut down
2022-05-02T17:29:05.045423+00:00 smithi040 kernel: ceph:  dropping dirty Fw state for 000000008ccc5bbf 1099511651328
2022-05-02T17:29:05.069568+00:00 smithi040 kernel: ceph: ceph_do_invalidate_pages: inode 10000005c00.fffffffffffffffe is shut down
2022-05-02T17:29:05.069738+00:00 smithi040 kernel: ceph: ceph: async create failure path=(1)client.0/tmp/ceph/src/mon/Session.h result=-17!
2022-05-02T17:29:05.086243+00:00 smithi040 kernel: ceph:  dropping dirty Fw state for 000000003a42c6e3 1099511651329
2022-05-02T17:29:05.086419+00:00 smithi040 kernel: ceph: ceph: async create failure path=(1)client.0/tmp/ceph/src/mon/mon_types.h result=-17!
2022-05-02T17:29:05.095761+00:00 smithi040 kernel: ceph: ceph_do_invalidate_pages: inode 10000005c01.fffffffffffffffe is shut down
2022-05-02T17:29:05.104305+00:00 smithi040 kernel: ceph:  dropping dirty Fw state for 00000000f681d78e 1099511651330
2022-05-02T17:29:05.120052+00:00 smithi040 kernel: ceph: ceph_do_invalidate_pages: inode 10000005c02.fffffffffffffffe is shut down
2022-05-02T17:29:05.120151+00:00 smithi040 kernel: ceph: ceph: async create failure path=(1)client.0/tmp/ceph/src/monmaptool.cc result=-17!
2022-05-02T17:29:05.136561+00:00 smithi040 kernel: ceph:  dropping dirty Fw state for 000000002987cb52 1099511651331
2022-05-02T17:29:05.136652+00:00 smithi040 kernel: ceph: ceph_do_invalidate_pages: inode 10000005c03.fffffffffffffffe is shut down

There are both socket connection issues and async create failures.

For the async create failure, take this one for example:

2022-05-02T17:29:05.120151+00:00 smithi040 kernel: ceph: ceph: async create failure path=(1)client.0/tmp/ceph/src/monmaptool.cc result=-17!
2022-05-02T17:29:05.136561+00:00 smithi040 kernel: ceph:  dropping dirty Fw state for 000000002987cb52 1099511651331
2022-05-02T17:29:05.136652+00:00 smithi040 kernel: ceph: ceph_do_invalidate_pages: inode 10000005c03.fffffffffffffffe is shut down

From the remote/smithi040/log/ceph-mds.c.log.gz, the file monmaptool.cc was created at 22214281 2022-05-02T17:29:04.926+0000:

22214281 2022-05-02T17:29:04.926+0000 7f45e13f6700  1 -- [v2:172.21.15.40:6834/2740274198,v1:172.21.15.40:6835/2740274198] <== client.4678 172.21.15.40:0/1181180694 100974 ==== client_request(client.4678:52091 create #0x10000002d2c/monmaptool.cc 2022-05-02T17:29:03.260624+0000 ASYNC caller_uid=1000, caller_gid=1262{6,36,1000,1262,}) v4 ==== 181+0+62 (unknown 0 0 0) 0x559592fe7b80 con 0x55958fc7ec00
...
22214322 2022-05-02T17:29:04.926+0000 7f45e13f6700 12 mds.0.cache traverse: path seg depth 0 'monmaptool.cc' snapid head
22214323 2022-05-02T17:29:04.926+0000 7f45e13f6700 20 mds.0.cache.dir(0x10000002d2c.0*) lookup (monmaptool.cc, 'head')
22214324 2022-05-02T17:29:04.926+0000 7f45e13f6700 20 mds.0.cache.dir(0x10000002d2c.0*)   hit -> (monmaptool.cc,head)
22214325 2022-05-02T17:29:04.926+0000 7f45e13f6700 10 mds.0.locker acquire_locks request(client.4678:52091 nref=3 cr=0x559592fe7b80)
...
22214341 2022-05-02T17:29:04.926+0000 7f45e13f6700  7 mds.0.server reply_client_request -17 ((17) File exists) client_request(client.4678:52091 create #0x10000002d2c/monmaptool.cc 2022-05-02T17:29:03.260624+0000 ASYNC caller_uid=1000, caller_gid=1262{6,36,1000,1262,}) v4
...

Meanwhile, the unlink for monmaptool.cc had arrived just before, at 22214186 2022-05-02T17:29:04.926+0000, but it was delayed many times, and only at 22254030 2022-05-02T17:29:05.127+0000 could it acquire the locks successfully and continue the unlink work:

22214186 2022-05-02T17:29:04.926+0000 7f45e13f6700  1 -- [v2:172.21.15.40:6834/2740274198,v1:172.21.15.40:6835/2740274198] <== client.4678 172.21.15.40:0/1181180694 100973 ==== client_request(client.4678:52090 unlink #0x10000002d2c/monmaptool.cc 2022-05-02T17:29:03.260624+0000 ASYNC caller_uid=1000, caller_gid=1262{6,36,1000,1262,}) v4 ==== 225+0+0 (unknown 0 0 0) 0x559592fe6dc0 con 0x55958fc7ec00
...
22254030 2022-05-02T17:29:05.127+0000 7f45dcbed700 12 mds.0.cache traverse: path seg depth 0 'monmaptool.cc' snapid head
22254031 2022-05-02T17:29:05.127+0000 7f45dcbed700 20 mds.0.cache.dir(0x10000002d2c.0*) lookup (monmaptool.cc, 'head')
22254032 2022-05-02T17:29:05.127+0000 7f45dcbed700 20 mds.0.cache.dir(0x10000002d2c.0*)   hit -> (monmaptool.cc,head)
22254033 2022-05-02T17:29:05.127+0000 7f45dcbed700 10 mds.0.locker acquire_locks request(client.4678:52090 nref=3 cr=0x559592fe6dc0)
...
22254107 2022-05-02T17:29:05.127+0000 7f45dcbed700 10 mds.0.server _unlink_local [dentry #0x1/client.0/tmp/ceph/src/monmaptool.cc [20e,head] auth (dn xlock x=1 by 0x5595910c8000) (dversion lock w=1 last_client=4678) v=13777 ap=2 ino=0x1000000571c state=1610612736 | request=1 lock=2 inodepin=1         
dirty=1 authpin=1 0x55958fe2e780]
...
22254308 2022-05-02T17:29:05.127+0000 7f45dcbed700 10 mds.0.server early_reply 0 ((0) Success) client_request(client.4678:52090 unlink #0x10000002d2c/monmaptool.cc 2022-05-02T17:29:03.260624+0000 ASYNC caller_uid=1000, caller_gid=1262{6,36,1000,1262,}) v4
...
22329149 2022-05-02T17:29:05.911+0000 7f45db3ea700 10 mds.0.server _unlink_local_finish [dentry #0x1/client.0/tmp/ceph/src/monmaptool.cc [20e,head] auth (dn xlockdone x=1) (dversion lock w=1 last_client=4678) pv=14735 v=13777 ap=2 ino=0x1000000571c state=1610612736 | request=1 lock=2 inodepin=1 dirty=1 authpin=1 0x55958fe2e780]

That means that while the file monmaptool.cc was still being unlinked (and the unlink was delayed), the async create request came in and failed directly.

From teuthology.log we can see the snaptest failed:

2022-05-02T17:29:13.916 INFO:tasks.workunit.client.0.smithi040.stderr:+ for v in $versions
2022-05-02T17:29:13.917 INFO:tasks.workunit.client.0.smithi040.stderr:+ ver=v0.1
2022-05-02T17:29:13.917 INFO:tasks.workunit.client.0.smithi040.stderr:+ echo checking v0.1
2022-05-02T17:29:13.918 INFO:tasks.workunit.client.0.smithi040.stdout:checking v0.1
2022-05-02T17:29:13.919 INFO:tasks.workunit.client.0.smithi040.stderr:+ cd .snap/v0.1
2022-05-02T17:29:13.919 INFO:tasks.workunit.client.0.smithi040.stderr:+ git diff --exit-code
...
2022-05-02T17:31:14.320 INFO:tasks.workunit.client.0.smithi040.stderr:+ ver=v0.19
2022-05-02T17:31:14.320 INFO:tasks.workunit.client.0.smithi040.stderr:+ echo checking v0.19
2022-05-02T17:31:14.320 INFO:tasks.workunit.client.0.smithi040.stdout:checking v0.19
2022-05-02T17:31:14.321 INFO:tasks.workunit.client.0.smithi040.stderr:+ cd .snap/v0.19
2022-05-02T17:31:14.321 INFO:tasks.workunit.client.0.smithi040.stderr:+ git diff --exit-code
...
2022-05-02T17:31:18.908 INFO:tasks.workunit.client.0.smithi040.stdout:diff --git a/src/os/ObjectStore.h b/src/os/ObjectStore.h
2022-05-02T17:31:18.909 INFO:tasks.workunit.client.0.smithi040.stdout:deleted file mode 100644
2022-05-02T17:31:18.909 INFO:tasks.workunit.client.0.smithi040.stdout:index 527313d631d..00000000000
2022-05-02T17:31:18.909 INFO:tasks.workunit.client.0.smithi040.stdout:--- a/src/os/ObjectStore.h
2022-05-02T17:31:18.909 INFO:tasks.workunit.client.0.smithi040.stdout:+++ /dev/null
...
2022-05-02T17:31:18.965 INFO:tasks.workunit.client.0.smithi040.stdout:diff --git a/src/osd/OSDMap.h b/src/osd/OSDMap.h
2022-05-02T17:31:18.965 INFO:tasks.workunit.client.0.smithi040.stdout:deleted file mode 100644
2022-05-02T17:31:18.965 INFO:tasks.workunit.client.0.smithi040.stdout:index ea388a75d60..00000000000
2022-05-02T17:31:18.965 INFO:tasks.workunit.client.0.smithi040.stdout:--- a/src/osd/OSDMap.h
2022-05-02T17:31:18.965 INFO:tasks.workunit.client.0.smithi040.stdout:+++ /dev/null
...
2022-05-02T17:31:19.062 INFO:tasks.workunit.client.0.smithi040.stdout:diff --git a/src/osd/PG.h b/src/osd/PG.h
2022-05-02T17:31:19.062 INFO:tasks.workunit.client.0.smithi040.stdout:deleted file mode 100644
2022-05-02T17:31:19.062 INFO:tasks.workunit.client.0.smithi040.stdout:index d085e449b64..00000000000
2022-05-02T17:31:19.062 INFO:tasks.workunit.client.0.smithi040.stdout:--- a/src/osd/PG.h
2022-05-02T17:31:19.062 INFO:tasks.workunit.client.0.smithi040.stdout:+++ /dev/null
...
2022-05-02T17:31:19.224 INFO:tasks.workunit.client.0.smithi040.stdout:diff --git a/src/osd/osd_types.h b/src/osd/osd_types.h
2022-05-02T17:31:19.224 INFO:tasks.workunit.client.0.smithi040.stdout:deleted file mode 100644
2022-05-02T17:31:19.224 INFO:tasks.workunit.client.0.smithi040.stdout:index 40024f2ac0f..00000000000
2022-05-02T17:31:19.224 INFO:tasks.workunit.client.0.smithi040.stdout:--- a/src/osd/osd_types.h
2022-05-02T17:31:19.224 INFO:tasks.workunit.client.0.smithi040.stdout:+++ /dev/null
...

#10 Updated by Venky Shankar almost 2 years ago

This happens at least once when running fs suite. Latest - https://pulpito.ceph.com/vshankar-2022-05-09_10:08:21-fs-wip-vshankar-testing-20220509-125847-testing-default-smithi/6828291/ (seems to be the same issue).

Xiubo, could early_reply be the issue here (just guessing)? Server::_unlink_local_finish() unlinks the main dentry and responds back to the client. However, this is called after the unlink event is journaled, but the operation has been early replied. Could this be confusing the client?

#11 Updated by Xiubo Li almost 2 years ago

Venky Shankar wrote:

This happens at least once when running fs suite. Latest - https://pulpito.ceph.com/vshankar-2022-05-09_10:08:21-fs-wip-vshankar-testing-20220509-125847-testing-default-smithi/6828291/ (seems to be the same issue).

Xiubo, could early_reply be the issue here (just guessing)? Server::_unlink_local_finish() unlinks the main dentry and responds back to the client. However, this is called after the unlink event is journaled, but the operation has been early replied. Could this be confusing the client?

Today I couldn't connect to the VPN, so I couldn't check the logs further.

I have re-read my comments above; the unlink and create were both async requests. The kclient won't wait for the async unlink to finish on the MDS side, it just assumes the unlink succeeds once the request is successfully sent out from the kclient.

I checked the MDS code too: it unlinks the inode from the dentry only after the journal log is successfully submitted and applied to RADOS. That means that before this happens, if the kclient sends an async create for the same file again, it will fail with EEXIST directly.

For [1] I can see the async create failure with the -EEXIST errno, while for [2] I couldn't see any async create failures. For [2] I think it was another corner case that used a sync create, which on the MDS side will open the existing file instead of failing with -EEXIST, so the create request succeeded, but on the MDS side the unlink just deleted that file later.

[1] https://pulpito.ceph.com/vshankar-2022-05-02_16:58:59-fs-wip-vshankar-testing1-20220502-201957-testing-default-smithi/6818698/
[2] https://pulpito.ceph.com/vshankar-2022-04-11_12:24:06-fs-wip-vshankar-testing1-20220411-144044-testing-default-smithi/6786388/
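
To make the ordering problem concrete, here is a minimal userspace sketch (a toy model, not the real kclient/MDS code): the "MDS" keeps the old name visible until the unlink has been journaled and applied, so a create for the same name that arrives before that point fails with EEXIST even though the client already considers the file gone:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy MDS-side state for a single dentry (hypothetical model). */
struct toy_dentry {
    bool linked;            /* name is still visible in the directory */
    bool unlink_journaled;  /* the unlink has been journaled and applied */
};

/* The "MDS" only drops the name once the unlink has been journaled. */
static int toy_mds_create(struct toy_dentry *dn)
{
    if (dn->linked && !dn->unlink_journaled)
        return -EEXIST;     /* what the async create runs into */
    dn->linked = true;
    dn->unlink_journaled = false;
    return 0;
}

static void toy_mds_journal_unlink(struct toy_dentry *dn)
{
    dn->unlink_journaled = true;
    dn->linked = false;
}

int main(void)
{
    struct toy_dentry dn = { .linked = true, .unlink_journaled = false };

    /* Client: sends the async unlink, assumes it succeeded, drops its dentry,
     * and immediately sends an async create for the same name ... */
    printf("create before unlink is journaled: %d (-EEXIST is %d)\n",
           toy_mds_create(&dn), -EEXIST);

    /* ... only after the MDS journals the unlink does the create succeed. */
    toy_mds_journal_unlink(&dn);
    printf("create after unlink is journaled:  %d\n", toy_mds_create(&dn));
    return 0;
}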

#12 Updated by Xiubo Li almost 2 years ago

Locally, when I was running the snaptest-git-ceph.sh tests, two MDS daemons crashed with:

   -29> 2022-05-11T09:59:28.647+0800 14d80e1ee700  2 mds.4.cache Memory usage:  total 586512, rss 166568, heap 51468, baseline 49420, 337 / 3560 inodes have caps, 337 caps, 0.0946629 caps per inode
   -28> 2022-05-11T09:59:28.744+0800 14d80e9f2700  5 mds.beacon.a Sending beacon up:active seq 197
   -27> 2022-05-11T09:59:28.744+0800 14d80e9f2700 10 monclient: _send_mon_message to mon.c at v1:10.72.47.117:40414/0
   -26> 2022-05-11T09:59:28.748+0800 14d80f9fa700  5 mds.beacon.a received beacon reply up:active seq 197 rtt 0.00400014
   -25> 2022-05-11T09:59:28.898+0800 14d80edf4700 10 monclient: tick
   -24> 2022-05-11T09:59:28.898+0800 14d80edf4700 10 monclient: _check_auth_rotating renewing rotating keys (they expired before 2022-05-11T09:58:58.898949+0800)
   -23> 2022-05-11T09:59:29.529+0800 14d80f1f6700  4 mds.4.16 apply_blocklist: killed 0, blocklisted sessions (0 blocklist entries, 1)
   -22> 2022-05-11T09:59:29.529+0800 14d80f1f6700 10 monclient: _renew_subs
   -21> 2022-05-11T09:59:29.529+0800 14d80f1f6700 10 monclient: _send_mon_message to mon.c at v1:10.72.47.117:40414/0
   -20> 2022-05-11T09:59:29.648+0800 14d80e1ee700  2 mds.4.cache Memory usage:  total 586512, rss 166568, heap 51468, baseline 49420, 337 / 3560 inodes have caps, 337 caps, 0.0946629 caps per inode
   -19> 2022-05-11T09:59:29.898+0800 14d80edf4700 10 monclient: tick
   -18> 2022-05-11T09:59:29.898+0800 14d80edf4700 10 monclient: _check_auth_rotating renewing rotating keys (they expired before 2022-05-11T09:58:59.899101+0800)
   -17> 2022-05-11T09:59:30.044+0800 14d80f1f6700  3 mds.4.server handle_client_session client_session(request_renewcaps seq 27) from client.4240
   -16> 2022-05-11T09:59:30.595+0800 14d80f1f6700  4 mds.4.16 apply_blocklist: killed 0, blocklisted sessions (0 blocklist entries, 1)
   -15> 2022-05-11T09:59:30.595+0800 14d80f1f6700 10 monclient: _renew_subs
   -14> 2022-05-11T09:59:30.595+0800 14d80f1f6700 10 monclient: _send_mon_message to mon.c at v1:10.72.47.117:40414/0
   -13> 2022-05-11T09:59:30.650+0800 14d80e1ee700  2 mds.4.cache Memory usage:  total 586512, rss 166568, heap 51468, baseline 49420, 337 / 3560 inodes have caps, 337 caps, 0.0946629 caps per inode
   -12> 2022-05-11T09:59:30.905+0800 14d80edf4700 10 monclient: tick
   -11> 2022-05-11T09:59:30.905+0800 14d80edf4700 10 monclient: _check_auth_rotating renewing rotating keys (they expired before 2022-05-11T09:59:00.905732+0800)
   -10> 2022-05-11T09:59:31.604+0800 14d80f1f6700  4 mds.4.16 apply_blocklist: killed 0, blocklisted sessions (0 blocklist entries, 1)
    -9> 2022-05-11T09:59:31.604+0800 14d80f1f6700 10 monclient: _renew_subs
    -8> 2022-05-11T09:59:31.604+0800 14d80f1f6700 10 monclient: _send_mon_message to mon.c at v1:10.72.47.117:40414/0
    -7> 2022-05-11T09:59:31.651+0800 14d80e1ee700  2 mds.4.cache Memory usage:  total 586512, rss 166568, heap 51468, baseline 49420, 337 / 3560 inodes have caps, 337 caps, 0.0946629 caps per inode
    -6> 2022-05-11T09:59:31.794+0800 14d80f1f6700  4 mds.4.16 apply_blocklist: killed 0, blocklisted sessions (0 blocklist entries, 1)
    -5> 2022-05-11T09:59:31.794+0800 14d80f1f6700 10 monclient: _renew_subs
    -4> 2022-05-11T09:59:31.794+0800 14d80f1f6700 10 monclient: _send_mon_message to mon.c at v1:10.72.47.117:40414/0
    -3> 2022-05-11T09:59:31.905+0800 14d80edf4700 10 monclient: tick
    -2> 2022-05-11T09:59:31.905+0800 14d80edf4700 10 monclient: _check_auth_rotating renewing rotating keys (they expired before 2022-05-11T09:59:01.905938+0800)
    -1> 2022-05-11T09:59:32.099+0800 14d80f1f6700 -1 /data/ceph/src/mds/MDCache.cc: In function 'void MDCache::handle_dentry_unlink(ceph::cref_t<MDentryUnlink>&)' thread 14d80f1f6700 time 2022-05-11T09:59:32.078074+0800
/data/ceph/src/mds/MDCache.cc: 11233: FAILED ceph_assert(straydn->first >= in->first)

 ceph version 17.0.0-11972-ga328d3264a0 (a328d3264a0b64c03cf90c2d39f37c6962fa45cb) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12e) [0x14d81720cddf]
 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x14d81720d011]
 3: (MDCache::handle_dentry_unlink(boost::intrusive_ptr<MDentryUnlink const> const&)+0x6ad) [0x559db355c4c7]
 4: (MDCache::dispatch(boost::intrusive_ptr<Message const> const&)+0x1ee) [0x559db3591a48]
 5: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&)+0x114) [0x559db3405074]
 6: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x307) [0x559db340919b]
 7: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x5c) [0x559db3409dd4]
 8: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x26c) [0x559db33f683e]
 9: (DispatchQueue::entry()+0x1c2f) [0x14d817381265]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x14d817439db7]
 11: (Thread::entry_wrapper()+0x3f) [0x14d8171df801]
 12: (Thread::_entry_func(void*)+0x9) [0x14d8171df819]
 13: /lib64/libpthread.so.0(+0x817a) [0x14d81540817a]
 14: clone()

     0> 2022-05-11T09:59:32.107+0800 14d80f1f6700 -1 *** Caught signal (Aborted) **
 in thread 14d80f1f6700 thread_name:ms_dispatch

 ceph version 17.0.0-11972-ga328d3264a0 (a328d3264a0b64c03cf90c2d39f37c6962fa45cb) quincy (dev)
 1: /lib64/libpthread.so.0(+0x12c20) [0x14d815412c20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x2b2) [0x14d81720cf63]
 5: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x14d81720d011]
 6: (MDCache::handle_dentry_unlink(boost::intrusive_ptr<MDentryUnlink const> const&)+0x6ad) [0x559db355c4c7]
 7: (MDCache::dispatch(boost::intrusive_ptr<Message const> const&)+0x1ee) [0x559db3591a48]
 8: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&)+0x114) [0x559db3405074]
 9: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x307) [0x559db340919b]
 10: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x5c) [0x559db3409dd4]
 11: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x26c) [0x559db33f683e]
 12: (DispatchQueue::entry()+0x1c2f) [0x14d817381265]
 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x14d817439db7]
 14: (Thread::entry_wrapper()+0x3f) [0x14d8171df801]
 15: (Thread::_entry_func(void*)+0x9) [0x14d8171df819]
 16: /lib64/libpthread.so.0(+0x817a) [0x14d81540817a]
 17: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The fs dump output is:

[root@lxbceph1 build]# ./bin/ceph fs dump
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2022-05-11T16:33:29.354+0800 1500b81ff700 -1 WARNING: all dangerous and experimental features are enabled.
2022-05-11T16:33:29.388+0800 1500b759d700 -1 WARNING: all dangerous and experimental features are enabled.
e62
enable_multiple, ever_enabled_multiple: 1,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'a' (1)
fs_name    a
epoch    62
flags    12 joinable allow_snaps allow_multimds_snaps
created    2022-05-11T09:46:43.239638+0800
modified    2022-05-11T10:02:10.773892+0800
tableserver    0
root    0
session_timeout    60
session_autoclose    300
max_file_size    1099511627776
required_client_features    {}
last_failure    0
last_failure_osd_epoch    0
compat    compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds    5
in    0,1,2,3,4
up    {0=4233,1=4200,2=4198,3=4189,4=4169}
failed    
damaged    
stopped    
data_pools    [3]
metadata_pool    2
inline_data    disabled
balancer    
standby_count_wanted    1
[mds.e{0:4233} state up:active seq 4 laggy since 2022-05-11T09:59:46.938548+0800 export targets 1,2,3,4 addr v1:10.72.47.117:6817/3511908233 compat {c=[1],r=[1],i=[7ff]}]
[mds.b{1:4200} state up:active seq 206 addr v1:10.72.47.117:6814/4045335419 compat {c=[1],r=[1],i=[7ff]}]
[mds.d{2:4198} state up:active seq 204 addr v1:10.72.47.117:6816/1568717300 compat {c=[1],r=[1],i=[7ff]}]
[mds.c{3:4189} state up:active seq 55 addr v1:10.72.47.117:6815/3715964172 compat {c=[1],r=[1],i=[7ff]}]
[mds.a{4:4169} state up:active seq 56 laggy since 2022-05-11T09:59:46.938537+0800 export targets 0,1,2,3 addr v1:10.72.47.117:6813/3882317668 compat {c=[1],r=[1],i=[7ff]}]

dumped fsmap epoch 62
[root@lxbceph1 build]#

#13 Updated by Venky Shankar almost 2 years ago

Xiubo Li wrote:

Venky Shankar wrote:

This happens at least once when running fs suite. Latest - https://pulpito.ceph.com/vshankar-2022-05-09_10:08:21-fs-wip-vshankar-testing-20220509-125847-testing-default-smithi/6828291/ (seems to be the same issue).

Xiubo, could early_reply be the issue here (just guessing)? Server::_unlink_local_finish() unlinks the main dentry and responds back to the client. However, this is called after the unlink event is journaled, but the operation has been early replied. Could this be confusing the client?

Today I couldn't connect to the VPN, so I couldn't check the logs further.

I have re-read my comments above; the unlink and create were both async requests. The kclient won't wait for the async unlink to finish on the MDS side, it just assumes the unlink succeeds once the request is successfully sent out from the kclient.

Shouldn't the client wait till it gets a response (and trace information) from the MDS for the operation? What happens if the (unlink) operation fails on the MDS? Does the client trim the dentry as soon as it sends out the unlink request?

I checked the MDS code too: it unlinks the inode from the dentry only after the journal log is successfully submitted and applied to RADOS. That means that before this happens, if the kclient sends an async create for the same file again, it will fail with EEXIST directly.

FWIW, early reply too includes the trace information. Should the client at least wait till it receives early reply?

For [1] I can see the async create failure with the -EEXIST errno, while for [2] I couldn't see any async create failures. For [2] I think it was another corner case that used a sync create, which on the MDS side will open the existing file instead of failing with -EEXIST, so the create request succeeded, but on the MDS side the unlink just deleted that file later.

Do you think early reply could be the culprit in both cases?

[1] https://pulpito.ceph.com/vshankar-2022-05-02_16:58:59-fs-wip-vshankar-testing1-20220502-201957-testing-default-smithi/6818698/
[2] https://pulpito.ceph.com/vshankar-2022-04-11_12:24:06-fs-wip-vshankar-testing1-20220411-144044-testing-default-smithi/6786388/

#14 Updated by Xiubo Li almost 2 years ago

Xiubo Li wrote:

Locally, when I was running the snaptest-git-ceph.sh tests, two MDS daemons crashed with:

[...]

The fs dump output is:

[...]

Uploaded the log files.

#15 Updated by Xiubo Li almost 2 years ago

Venky Shankar wrote:

Xiubo Li wrote:

Venky Shankar wrote:

This happens at least once when running fs suite. Latest - https://pulpito.ceph.com/vshankar-2022-05-09_10:08:21-fs-wip-vshankar-testing-20220509-125847-testing-default-smithi/6828291/ (seems to be the same issue).

Xiubo, could early_reply be the issue here (just guessing)? Server::_unlink_local_finish() unlinks the main dentry and responds back to the client. However, this is called after the unlink event is journaled, but the operation has been early replied. Could this be confusing the client?

Today I couldn't connect to the VPN, so I couldn't check the logs further.

I have re-read my comments above; the unlink and create were both async requests. The kclient won't wait for the async unlink to finish on the MDS side, it just assumes the unlink succeeds once the request is successfully sent out from the kclient.

Shouldn't the client wait till it gets a response (and trace information) from the MDS for the operation?

Currently it won't. It will try to check the Fxr caps, and only send out an async unlink when these caps are issued.

1198                 err = ceph_mdsc_submit_request(mdsc, dir, req);
1199                 if (!err) {   
1200                         /*    
1201                          * We have enough caps, so we assume that the unlink                                                                      
1202                          * will succeed. Fix up the target inode and dcache.
1203                          */   
1204                         drop_nlink(inode);                  
1205                         d_delete(dentry);                   
1206                 } else if (err == -EJUKEBOX) {      
1207                         try_async = false;                  
1208                         ceph_mdsc_put_request(req);         
1209                         goto retry;
1210                 }

Then it just assumes the unlink will succeed.

What happens if the (unlink) operation fails on the MDS? Does the client trim the dentry as soon as it sends out the unlink request?

Yeah, as the code above shows, the dentry will be trimmed immediately after the async unlink request is sent out successfully.

But later, if the async unlink fails in the MDS, the kclient will just set an error on the inode and notify userspace:

1081         /* If op failed, mark everyone involved for errors */
1082         if (result) {
1083                 int pathlen = 0;
1084                 u64 base = 0;
1085                 char *path = ceph_mdsc_build_path(req->r_dentry, &pathlen,
1086                                                   &base, 0);
1087 
1088                 /* mark error on parent + clear complete */
1089                 mapping_set_error(req->r_parent->i_mapping, result);
1090                 ceph_dir_clear_complete(req->r_parent);
1091 
1092                 /* drop the dentry -- we don't know its status */
1093                 if (!d_unhashed(req->r_dentry))
1094                         d_drop(req->r_dentry);
1095 
1096                 /* mark inode itself for an error (since metadata is bogus) */                                                                    
1097                 mapping_set_error(req->r_old_inode->i_mapping, result);
1098 
1099                 pr_warn("ceph: async unlink failure path=(%llx)%s result=%d!\n",
1100                         base, IS_ERR(path) ? "<<bad>>" : path, result);
1101                 ceph_mdsc_free_path(path, pathlen);
1102         }

I checked the MDS code too: it unlinks the inode from the dentry only after the journal log is successfully submitted and applied to RADOS. That means that before this happens, if the kclient sends an async create for the same file again, it will fail with EEXIST directly.

FWIW, early reply too includes the trace information. Should the client at least wait till it receives early reply?

The async create/unlink won't wait for the early replies once the kclient has enough caps issued.

For [1] I can see the async create failure with the -EEXIST errno, while for [2] I couldn't see any async create failures. For [2] I think it was another corner case that used a sync create, which on the MDS side will open the existing file instead of failing with -EEXIST, so the create request succeeded, but on the MDS side the unlink just deleted that file later.

Do you think early reply could be the culprit in both cases?

From https://tracker.ceph.com/issues/55332?issue_count=63&issue_position=1&next_issue_id=55421#note-9 we can see that when the async create came in, the early reply for the async unlink request hadn't happened yet. NOTE: please check the line numbers of the logs in https://tracker.ceph.com/issues/55332?issue_count=63&issue_position=1&next_issue_id=55421#note-9.

From the logs, the async unlink was delayed several times, seemingly because it was trying to get the locks but failed; currently I couldn't open that log file due to the teuthology server's connection issue today, so I couldn't confirm it. And while the async unlink request was waiting, the async create came in and failed. But if a sync create had come in, it would have succeeded, which is buggy too.

[1] https://pulpito.ceph.com/vshankar-2022-05-02_16:58:59-fs-wip-vshankar-testing1-20220502-201957-testing-default-smithi/6818698/
[2] https://pulpito.ceph.com/vshankar-2022-04-11_12:24:06-fs-wip-vshankar-testing1-20220411-144044-testing-default-smithi/6786388/

Anyway, we need to enforce the unlink/create ordering on either the client side or the MDS side.

#16 Updated by Jeff Layton almost 2 years ago

Anyway, we need to enforce the unlink/create ordering on either the client side or the MDS side.

We do d_drop the dentry on a failed unlink, but that may not happen before the client attempts to send a create for the same dentry.

One idea: declare a new ceph_dentry_info flag for dentries that are being unlinked. If that's set, then the create could block and wait on the reply before proceeding.
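
A rough userspace sketch of that idea (just an illustration using pthreads, not actual kclient code; the real thing would hang the flag off ceph_dentry_info and wake waiters when the unlink reply arrives). A create targeting a dentry with an unlink in flight blocks until the unlink completes:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical per-dentry state mirroring the proposed ceph_dentry_info flag. */
struct toy_dentry_info {
    pthread_mutex_t lock;
    pthread_cond_t unlink_done;
    bool unlinking;             /* "async unlink in flight" flag */
};

static struct toy_dentry_info di = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .unlink_done = PTHREAD_COND_INITIALIZER,
    .unlinking = false,
};

/* Create path: if an unlink is in flight for this dentry, wait for its reply. */
static void toy_create(void)
{
    pthread_mutex_lock(&di.lock);
    while (di.unlinking)
        pthread_cond_wait(&di.unlink_done, &di.lock);
    pthread_mutex_unlock(&di.lock);
    printf("create proceeds after the unlink completed\n");
}

/* Unlink path: set the flag, do the (async) unlink, then wake any waiters. */
static void *toy_unlink(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&di.lock);
    di.unlinking = true;
    pthread_mutex_unlock(&di.lock);

    sleep(1);                   /* stands in for waiting on the MDS reply */

    pthread_mutex_lock(&di.lock);
    di.unlinking = false;
    pthread_cond_broadcast(&di.unlink_done);
    pthread_mutex_unlock(&di.lock);
    return NULL;
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, toy_unlink, NULL);  /* compile with -pthread */
    usleep(100 * 1000);         /* let the unlink get in flight first */
    toy_create();
    pthread_join(&t, NULL);
    return 0;
}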

#17 Updated by Xiubo Li almost 2 years ago

Jeff Layton wrote:

Anyway, we need to enforce the unlink/create ordering on either the client side or the MDS side.

We do d_drop the dentry on a failed unlink, but that may not happen before the client attempts to send a create for the same dentry.

One idea: declare a new ceph_dentry_info flag for dentries that are being unlinked. If that's set, then the create could block and wait on the reply before proceeding.

Yeah, the waiting here will only wait for the early reply, not for the safe reply. But if the unlink hasn't been journaled and applied to RADOS on the MDS side, it seems the async openc in the MDS could still fail, and the sync openc case still has the same issue, which is that the file will be deleted after the openc succeeds. I need to check the MDS code more and confirm it.

#18 Updated by Xiubo Li almost 2 years ago

Xiubo Li wrote:

Venky Shankar wrote:

Another instance: https://pulpito.ceph.com/vshankar-2022-05-02_16:58:59-fs-wip-vshankar-testing1-20220502-201957-testing-default-smithi/6818698/

I checked this failure. From remote/smithi040/syslog/kern.log.gz:

[...]

There are both socket connection issues and async create failures.

For the async create failure, take this one for example:

[...]

From the remote/smithi040/log/ceph-mds.c.log.gz, the file monmaptool.cc was created at 22214281 2022-05-02T17:29:04.926+0000:

[...]

Meanwhile, the unlink for monmaptool.cc had arrived just before, at 22214186 2022-05-02T17:29:04.926+0000, but it was delayed many times, and only at 22254030 2022-05-02T17:29:05.127+0000 could it acquire the locks successfully and continue the unlink work:

I checked the log again: it was not waiting to acquire the locks; it was waiting for the straydir to be unfrozen, so it just released the mds_lock, and then the async create request was triggered:

22214268 2022-05-02T17:29:04.926+0000 7f45e13f6700 20 mds.0.cache.dir(0x605.011*) lookup (1000000571c, 'head')
22214269 2022-05-02T17:29:04.926+0000 7f45e13f6700 20 mds.0.cache.dir(0x605.011*)   miss -> (10000005967,head)
22214270 2022-05-02T17:29:04.926+0000 7f45e13f6700 10 mds.0.server prepare_stray_dentry: [dir 0x605.011* ~mds0/stray5/ [2,head] auth v=14670 cv=0/0 state=1610883081|complete|frozendir|sticky f(v0 12=12+0) n(v0 rc2022-05-02T17:28:57.345695+0000 b150137 12=12+0) hs=12+2,ss=0+0 dirty=12 | child=1 frozen=1 sticky=1 dirty=1 waiter=10x559592519200] is frozen, waiting                                                                       
22214271 2022-05-02T17:29:04.926+0000 7f45e13f6700 10 mds.0.locker xlock_finish on (dn xlock x=1 by 0x5595910c8000) [dentry #0x1/client.0/tmp/ceph/src/monmaptool.cc [20e,head] auth (dn xlock x=1 by 0x5595910c8000) (dversion lock w=1 last_client=4678) v=13777 ap=2 ino=0x1000000571c state=1610612736 | request=1 lock=2 inodepin=1 dirty=1 authpin=1 0x55958fe2e780]

[...]

That means that while the file monmaptool.cc was still being unlinked (and the unlink was delayed), the async create request came in and failed directly.

From teuthology.log we can see the snaptest failed:

[...]

#19 Updated by Xiubo Li almost 2 years ago

  • Subject changed from Failure in snaptest-git-ceph.sh to Failure in snaptest-git-ceph.sh (async unlink/create bugs)
  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 46331

#20 Updated by Xiubo Li almost 2 years ago

  • Subject changed from Failure in snaptest-git-ceph.sh (async unlink/create bugs) to Failure in snaptest-git-ceph.sh (it's an async unlink/create bug)

#21 Updated by Xiubo Li almost 2 years ago

It's an async unlink/create bug, and I fixed it on both the kclient and MDS sides:

[1] https://github.com/ceph/ceph-client/commit/b3c8646b2c1d93def4dc6f0737459d23582b4a07
[2] https://github.com/ceph/ceph/pull/46331

#22 Updated by Venky Shankar almost 2 years ago

Xiubo, there is another class of failures for this test. See - https://pulpito.ceph.com/vshankar-2022-06-15_04:03:39-fs-wip-vshankar-testing1-20220615-072516-testing-default-smithi/6879444/

The failure is:

2022-06-15T05:22:58.260 INFO:tasks.ceph.mds.d:Started
2022-06-15T05:22:58.261 DEBUG:teuthology.orchestra.run.smithi017:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs dump --format=json
2022-06-15T05:22:58.277 INFO:tasks.workunit.client.0.smithi017.stderr:Updating files:  99% (306/309)^MUpdating files:  99% (307/309)^MUpdating files: 100% (309/309)^MUpdating files: 100% (309/309), done.
2022-06-15T05:22:58.278 INFO:tasks.workunit.client.0.smithi017.stderr:fatal: Could not reset index file to revision 'v0.81'.
2022-06-15T05:22:58.332 DEBUG:teuthology.orchestra.run:got remote process result: 128
2022-06-15T05:22:58.333 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...
2022-06-15T05:22:58.334 DEBUG:teuthology.orchestra.run.smithi017:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0

This looks like a different issue, doesn't it?

#23 Updated by Xiubo Li almost 2 years ago

Venky Shankar wrote:

Xiubo, there is another class of failures for this test. See - https://pulpito.ceph.com/vshankar-2022-06-15_04:03:39-fs-wip-vshankar-testing1-20220615-072516-testing-default-smithi/6879444/

The failure is:

[...]

This looks like a different issue, doesn't it?

`96% (297/309)^MUpdating files: 97% (300/309)^MUpdating files: 98% (303/309)^Merror: unable to create file src/tools/radosacl.cc: File exists`

It should be the same issue. The unlinked file still existed when creating a new one.

#24 Updated by Venky Shankar almost 2 years ago

Xiubo Li wrote:

Venky Shankar wrote:

Xiubo, there is another class of failures for this test. See - https://pulpito.ceph.com/vshankar-2022-06-15_04:03:39-fs-wip-vshankar-testing1-20220615-072516-testing-default-smithi/6879444/

The failure is:

[...]

This looks like a different issue, doesn't it?

`96% (297/309)^MUpdating files: 97% (300/309)^MUpdating files: 98% (303/309)^Merror: unable to create file src/tools/radosacl.cc: File exists`

It should be the same issue. The unlinked file still existed when creating a new one.

ACK. Makes sense.

#25 Updated by Patrick Donnelly over 1 year ago

  • Duplicated by Bug #54462: Command failed (workunit test fs/snaps/snaptest-git-ceph.sh) on smithi055 with status 128 added

#26 Updated by Rishabh Dave over 1 year ago

  • Status changed from Fix Under Review to Pending Backport

#27 Updated by Backport Bot over 1 year ago

  • Copied to Backport #57836: pacific: Failure in snaptest-git-ceph.sh (it's an async unlink/create bug) added

#28 Updated by Backport Bot over 1 year ago

  • Copied to Backport #57837: quincy: Failure in snaptest-git-ceph.sh (it's an async unlink/create bug) added

#29 Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed

#30 Updated by Venky Shankar over 1 year ago

  • Related to Bug #55804: qa failure: pjd link tests failed added

#31 Updated by Venky Shankar over 1 year ago

  • Related to Bug #58041: mds: src/mds/Server.cc: 3231: FAILED ceph_assert(straydn->get_name() == straydname) added

#32 Updated by Xiubo Li about 1 year ago

  • Status changed from Pending Backport to Resolved

#34 Updated by Xiubo Li 11 months ago

Kotresh Hiremath Ravishankar wrote:

This is seen in reef QA run - https://pulpito.ceph.com/yuriw-2023-05-15_15:22:39-fs-wip-yuri6-testing-2023-04-26-1247-reef-distro-default-smithi/7274325

Venky/Xiubo,

I think this needs a backport for reef?

Yeah, I will backport this together with another unlink-related tracker, https://tracker.ceph.com/issues/58340, later after it gets merged. Maybe soon.

#40 Updated by Venky Shankar 8 months ago

Xiubo, this needs to be backported to reef, yes?

#41 Updated by Venky Shankar 8 months ago

A bit unrelated, but mentioning here for completeness:

/a/yuriw-2023-07-14_23:37:57-fs-wip-yuri8-testing-2023-07-14-0803-reef-distro-default-smithi/7338278

`git` dumped core in one instance:

file ./remote/smithi155/coredump/1689390465.141618.core
./remote/smithi155/coredump/1689390465.141618.core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'git reset --hard v0.63', real uid: 1000, effective uid: 1000, real gid: 1265, effective gid: 1265, execfn: '/usr/bin/git', platform: 'x86_64'

#42 Updated by Xiubo Li 8 months ago

Venky Shankar wrote:

Xiubo, this needs to be backported to reef, yes?

It's already in reef.

#43 Updated by Xiubo Li 7 months ago

  • Copied to Bug #62810: Failure in snaptest-git-ceph.sh (it's an async unlink/create bug) -- Need to fix again added
