Bug #47854

some clients may return failure in the scenario where multiple clients create directories at the same time

Added by wei qiaomiao 3 months ago. Updated 3 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS): multimds
Pull request ID:
Crash signature:

Description

The issue can be reproduced with the following steps:
(1) Ceph version 14.2.10, multimds; multiple clients mount the same directory through ceph-fuse.
(2) Run a performance test with the vdbench tool, using this test script:

> hd=default,vdbench=/root/vdbench50406,user=root,shell=ssh

> hd=hd1,system=192.168.8.101,clients=5
> hd=hd2,system=192.168.8.104,clients=5
> hd=hd3,system=192.168.8.103,clients=5

> fsd=fsd1,anchor=/mnt/dir2,depth=2,width=100,files=10,sizes=4k,shared=yes

> fwd=format,threads=32,xfersize=4k

> fwd=default,rdpct=60,xfersize=4k,fileio=sequential,fileselect=sequential,threads=32

> fwd=fwd1,fsd=fsd*, host=hd1
> fwd=fwd2,fsd=fsd*, host=hd2
> fwd=fwd3,fsd=fsd*, host=hd3

> rd=rd1,fwd=fwd*,fwdrate=max,format=restart,elapsed=300,interval=1

(3) vdbench aborts because a mkdir fails:

 
>08:53:27.586 All slaves are now connected
>08:53:29.001 Starting RD=format_for_rd1

>Oct 14, 2020 .Interval. .ReqstdOps... ...cpu%...  read ....read..... ....write.... ..mb/sec... mb/sec .xfer.. ...mkdir.... ...rmdir.... ...create... ....open.... ...close.... ...delete...
>                         rate   resp total  sys   pct   rate   resp   rate   resp  read write  total    size  rate   resp  rate   resp  rate   resp  rate   resp  rate   resp  rate   resp
> 08:53:30.123          1    0.0  0.000  17.3 4.58   0.0    0.0  0.000    0.0  0.000  0.00  0.00   0.00       0 473.0 33.676   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000
> 08:53:31.045          2    0.0  0.000  42.1 25.5   0.0    0.0  0.000    0.0  0.000  0.00  0.00   0.00       0 700.0 17.441   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000
> 08:53:32.062          3    0.0  0.000  38.8 25.7   0.0    0.0  0.000    0.0  0.000  0.00  0.00   0.00       0 750.0 16.095   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000
> 08:53:33.057          4    0.0  0.000  37.6 24.2   0.0    0.0  0.000    0.0  0.000  0.00  0.00   0.00       0 691.0 17.338   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000
> 08:53:34.056          5    0.0  0.000  30.3 20.8   0.0    0.0  0.000    0.0  0.000  0.00  0.00   0.00       0 616.0 17.556   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000
> 08:53:35.056          6    0.0  0.000  29.3 19.0   0.0    0.0  0.000    0.0  0.000  0.00  0.00   0.00       0 681.0 20.819   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000
> 08:53:36.054          7    0.0  0.000  34.7 24.9   0.0    0.0  0.000    0.0  0.000  0.00  0.00   0.00       0  1232 11.082   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000
> 08:53:37.057          8    0.0  0.000  36.2 25.9   0.0    0.0  0.000    0.0  0.000  0.00  0.00   0.00       0  1326  8.678   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000   0.0  0.000
> 08:53:37.956 *
> 08:53:37.956 ********************************************************************************************************
> 08:53:37.956 * Slave hd1_cl0-0 aborting:  Unable to create directory: /mnt/dir2/vdb.1_1.dir/vdb.2_17.dir/ *
> 08:53:37.956 ********************************************************************************************************
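
The race in the steps above can be approximated without vdbench. The following is only a minimal sketch in Python, not vdbench's actual logic: the helper names `mkdir_tolerant` and `race_mkdir` are hypothetical, and the sketch assumes a workload where several workers create the same directory at once and treat "already exists" as success only if the directory is then visible. On a local filesystem every worker should report `created` or `exists`; pointing `anchor` at a shared CephFS mount from several hosts is how one might look for the `inconsistent` outcome.

```python
import os
import tempfile
import threading


def mkdir_tolerant(path):
    """Create path; accept EEXIST only if the directory is actually visible."""
    try:
        os.mkdir(path)
        return "created"
    except FileExistsError:
        # Another worker won the race. Confirm the directory can be seen.
        if os.path.isdir(path):
            return "exists"
        # mkdir said EEXIST but the entry is not visible to this client.
        return "inconsistent"


def race_mkdir(anchor, name, workers=8):
    """Have several threads create anchor/name at the same moment."""
    target = os.path.join(anchor, name)
    barrier = threading.Barrier(workers)
    outcomes = []
    lock = threading.Lock()

    def worker():
        barrier.wait()  # release all workers together to widen the race window
        result = mkdir_tolerant(target)
        with lock:
            outcomes.append(result)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes


if __name__ == "__main__":
    # A temporary local directory stands in for the shared mount point.
    with tempfile.TemporaryDirectory() as anchor:
        print(race_mkdir(anchor, "vdb.2_17.dir"))
```

Threads within one process share a single client, so this sketch only exercises the race locally; the scenario in this report involves separate ceph-fuse clients on different hosts.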

The following is a partial log of the client hd1. See the attachment for the complete log.

2020-10-13 14:34:52.429 7fc8d9ffb700  3 client.621319 ll_lookup 0x1000002130f.head vdb.2_17.dir -> -2 (0)
2020-10-13 14:34:52.429 7fc97b7fe700  3 client.621319 ll_mkdir 0x1000002130f.head vdb.2_17.dir
2020-10-13 14:34:52.429 7fc97b7fe700  8 client.621319 _mkdir(0x1000002130f vdb.2_17.dir, 0777, uid 0, gid 0)
2020-10-13 14:34:52.429 7fc97b7fe700 20 client.621319 get_or_create 0x1000002130f.head(faked_ino=0 ref=4 ll_ref=247 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=0.000000 mtime=2020-10-13 14:37:31.430429 ctime=2020-10-13 14:37:31.430429 caps=pAsLsXs(0=pAsLsXs) parents=0x1000000d8b3.head["vdb.1_1.dir"] 0x7fc9b801b1a0) name vdb.2_17.dir
2020-10-13 14:34:52.429 7fc97b7fe700 10 client.621319 send_request client_request(unknown.0:2663 mkdir #0x1000002130f/vdb.2_17.dir 2020-10-13 14:34:52.430764 caller_uid=0, caller_gid=0{0,}) v4 to mds.0
2020-10-13 14:34:52.599 7fc97b7fe700 10 client.621319 send_request client_request(unknown.0:2663 mkdir #0x1000002130f/vdb.2_17.dir 2020-10-13 14:34:52.430764 caller_uid=0, caller_gid=0{0,}) v4 to mds.1
2020-10-13 14:34:52.599 7fc97b7fe700  8 client.621319 _mkdir(#0x1000002130f/vdb.2_17.dir, 040755) = -17
2020-10-13 14:34:52.599 7fc97b7fe700  3 client.621319 ll_mkdir 0x1000002130f.head vdb.2_17.dir = -17 (0)
2020-10-13 14:34:52.600 7fc95a7fc700  3 client.621319 ll_lookup 0x1000002130f.head vdb.2_17.dir
2020-10-13 14:34:52.600 7fc95a7fc700 10 client.621319 _lookup 0x1000002130f.head(faked_ino=0 ref=3 ll_ref=248 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=0.000000 mtime=2020-10-13 14:37:36.563897 ctime=2020-10-13 14:37:36.563897 caps=pAsLsXs(0=pAsLsXs) parents=0x1000000d8b3.head["vdb.1_1.dir"] 0x7fc9b801b1a0) vdb.2_17.dir = -2
2020-10-13 14:34:52.600 7fc95a7fc700  3 client.621319 ll_lookup 0x1000002130f.head vdb.2_17.dir -> -2 (0)
2020-10-13 14:34:52.600 7fc89a7fc700  3 client.621319 ll_mkdir 0x1000002130f.head vdb.2_17.dir
2020-10-13 14:34:52.600 7fc89a7fc700  8 client.621319 _mkdir(0x1000002130f vdb.2_17.dir, 0777, uid 0, gid 0)
2020-10-13 14:34:52.600 7fc89a7fc700 20 client.621319 get_or_create 0x1000002130f.head(faked_ino=0 ref=4 ll_ref=248 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=0.000000 mtime=2020-10-13 14:37:36.563897 ctime=2020-10-13 14:37:36.563897 caps=pAsLsXs(0=pAsLsXs) parents=0x1000000d8b3.head["vdb.1_1.dir"] 0x7fc9b801b1a0) name vdb.2_17.dir
2020-10-13 14:34:52.600 7fc89a7fc700 10 client.621319 send_request client_request(unknown.0:2718 mkdir #0x1000002130f/vdb.2_17.dir 2020-10-13 14:34:52.601221 caller_uid=0, caller_gid=0{0,}) v4 to mds.0
2020-10-13 14:34:52.784 7fc89a7fc700 10 client.621319 send_request client_request(unknown.0:2718 mkdir #0x1000002130f/vdb.2_17.dir 2020-10-13 14:34:52.601221 caller_uid=0, caller_gid=0{0,}) v4 to mds.1
2020-10-13 14:34:52.784 7fc89a7fc700  8 client.621319 _mkdir(#0x1000002130f/vdb.2_17.dir, 040755) = -17
2020-10-13 14:34:52.784 7fc89a7fc700  3 client.621319 ll_mkdir 0x1000002130f.head vdb.2_17.dir = -17 (0)
2020-10-13 14:34:52.784 7fc8da7fc700  3 client.621319 ll_lookup 0x1000002130f.head vdb.2_17.dir
2020-10-13 14:34:52.784 7fc8da7fc700 10 client.621319 _lookup 0x1000002130f.head(faked_ino=0 ref=3 ll_ref=249 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=0.000000 mtime=2020-10-13 14:37:36.563897 ctime=2020-10-13 14:37:36.563897 caps=pAsLsXs(0=pAsLsXs) parents=0x1000000d8b3.head["vdb.1_1.dir"] 0x7fc9b801b1a0) vdb.2_17.dir = -2
2020-10-13 14:34:52.784 7fc8da7fc700  3 client.621319 ll_lookup 0x1000002130f.head vdb.2_17.dir -> -2 (0)
2020-10-13 14:34:52.784 7fc8b97fa700  3 client.621319 ll_mkdir 0x1000002130f.head vdb.2_17.dir
2020-10-13 14:34:52.784 7fc8b97fa700  8 client.621319 _mkdir(0x1000002130f vdb.2_17.dir, 0777, uid 0, gid 0)
2020-10-13 14:34:52.784 7fc8b97fa700 20 client.621319 get_or_create 0x1000002130f.head(faked_ino=0 ref=4 ll_ref=249 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=0.000000 mtime=2020-10-13 14:37:36.563897 ctime=2020-10-13 14:37:36.563897 caps=pAsLsXs(0=pAsLsXs) parents=0x1000000d8b3.head["vdb.1_1.dir"] 0x7fc9b801b1a0) name vdb.2_17.dir
2020-10-13 14:34:52.784 7fc8b97fa700 10 client.621319 send_request client_request(unknown.0:2780 mkdir #0x1000002130f/vdb.2_17.dir 2020-10-13 14:34:52.785694 caller_uid=0, caller_gid=0{0,}) v4 to mds.0
2020-10-13 14:34:52.966 7fc8b97fa700 10 client.621319 send_request client_request(unknown.0:2780 mkdir #0x1000002130f/vdb.2_17.dir 2020-10-13 14:34:52.785694 caller_uid=0, caller_gid=0{0,}) v4 to mds.1
2020-10-13 14:34:52.966 7fc8b97fa700  8 client.621319 _mkdir(#0x1000002130f/vdb.2_17.dir, 040755) = -17
2020-10-13 14:34:52.966 7fc8b97fa700  3 client.621319 ll_mkdir 0x1000002130f.head vdb.2_17.dir = -17 (0)
2020-10-13 14:34:52.967 7fc8ba7fc700  3 client.621319 ll_lookup 0x1000002130f.head vdb.2_17.dir
2020-10-13 14:34:52.967 7fc8ba7fc700 10 client.621319 _lookup 0x1000002130f.head(faked_ino=0 ref=3 ll_ref=250 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=0.000000 mtime=2020-10-13 14:37:37.323976 ctime=2020-10-13 14:37:37.323976 caps=pAsLsXs(0=pAsLsXs) parents=0x1000000d8b3.head["vdb.1_1.dir"] 0x7fc9b801b1a0) vdb.2_17.dir = -2
2020-10-13 14:34:52.967 7fc8ba7fc700  3 client.621319 ll_lookup 0x1000002130f.head vdb.2_17.dir -> -2 (0)
2020-10-13 14:34:52.967 7fc8b9ffb700  3 client.621319 ll_lookup 0x1000002130f.head vdb.2_17.dir
2020-10-13 14:34:52.967 7fc8b9ffb700 10 client.621319 _lookup 0x1000002130f.head(faked_ino=0 ref=3 ll_ref=250 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=0.000000 mtime=2020-10-13 14:37:37.323976 ctime=2020-10-13 14:37:37.323976 caps=pAsLsXs(0=pAsLsXs) parents=0x1000000d8b3.head["vdb.1_1.dir"] 0x7fc9b801b1a0) vdb.2_17.dir = -2
2020-10-13 14:34:52.967 7fc8b9ffb700  3 client.621319 ll_lookup 0x1000002130f.head vdb.2_17.dir -> -2 (0)
2020-10-13 14:34:52.967 7fc95bfff700  3 client.621319 ll_lookup 0x1000002130f.head vdb.2_17.dir

hd1_client.log (19 KB) wei qiaomiao, 10/14/2020 03:11 AM
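
To make the pattern in the log easier to see, the return codes can be pulled out and mapped to errno names. This is a rough sketch, assuming the values after `->` and `=` are negated Linux errno numbers (so -2 is ENOENT and -17 is EEXIST); the regular expression and the `summarize` helper are illustrative and only cover the `ll_lookup` and `ll_mkdir` result lines shown above.

```python
import errno
import re

# A few lines copied from the client log above.
SAMPLE_LINES = [
    "2020-10-13 14:34:52.429 7fc8d9ffb700  3 client.621319 ll_lookup 0x1000002130f.head vdb.2_17.dir -> -2 (0)",
    "2020-10-13 14:34:52.429 7fc97b7fe700  3 client.621319 ll_mkdir 0x1000002130f.head vdb.2_17.dir",
    "2020-10-13 14:34:52.599 7fc97b7fe700  3 client.621319 ll_mkdir 0x1000002130f.head vdb.2_17.dir = -17 (0)",
    "2020-10-13 14:34:52.600 7fc95a7fc700  3 client.621319 ll_lookup 0x1000002130f.head vdb.2_17.dir -> -2 (0)",
]

# Matches only lines that carry a result: "<op> <parent> <name> -> <rc> (" or "... = <rc> (".
RESULT_RE = re.compile(
    r"client\.\d+ (ll_lookup|ll_mkdir) \S+ (\S+) (?:->|=) (-?\d+) \("
)


def summarize(lines):
    """Return (operation, name, errno name) for each result line."""
    events = []
    for line in lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue  # request lines without a return code are skipped
        op, name, rc = match.group(1), match.group(2), int(match.group(3))
        code = "OK" if rc == 0 else errno.errorcode.get(-rc, str(rc))
        events.append((op, name, code))
    return events


if __name__ == "__main__":
    for event in summarize(SAMPLE_LINES):
        print(event)
```

Read this way, the excerpt shows the same client repeatedly getting ENOENT from lookup and EEXIST from mkdir for `vdb.2_17.dir`.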


Related issues

Copied to CephFS - Backport #48129: octopus: some clients may return failure in the scenario where multiple clients create directories at the same time In Progress
Copied to CephFS - Backport #48130: nautilus: some clients may return failure in the scenario where multiple clients create directories at the same time New

History

#1 Updated by Patrick Donnelly 3 months ago

  • Description updated (diff)
  • ceph-qa-suite deleted (fs)

#2 Updated by Patrick Donnelly 3 months ago

  • Status changed from New to Fix Under Review
  • Assignee set to wei qiaomiao
  • Target version set to v16.0.0
  • Source set to Community (dev)
  • Backport set to octopus,nautilus
  • Pull request ID set to 37664

#3 Updated by Patrick Donnelly 3 months ago

  • Status changed from Fix Under Review to Pending Backport

#4 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #48129: octopus: some clients may return failure in the scenario where multiple clients create directories at the same time added

#5 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #48130: nautilus: some clients may return failure in the scenario where multiple clients create directories at the same time added
