Bug #47294

client: thread hang in Client::_setxattr_maybe_wait_for_osdmap

Added by Patrick Donnelly 5 months ago. Updated 9 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client
Labels (FS):
Pull request ID:
Crash signature:

Description

2020-09-03T04:15:44.868+0000 7feae3457700  3 client.19878 _do_setxattr(0x10000000002, "ceph.dir.subvolume") = 0
2020-09-03T04:15:44.868+0000 7feae3457700  1 -- 172.21.15.71:0/3974804240 --> [v2:172.21.15.71:3300/0,v1:172.21.15.71:6789/0] -- mon_get_version(what=osdmap handle=2) v1 -- 0x558a28064420 con 0x558a2d0a9800
2020-09-03T04:15:44.868+0000 7fea98a01700  1 -- 172.21.15.71:0/3974804240 <== mon.0 v2:172.21.15.71:3300/0 8 ==== mon_get_version_reply(handle=2 version=618) v2 ==== 24+0+0 (secure 0 0 0) 0x558a28064420 con 0x558a2d0a9800

From: /ceph/teuthology-archive/pdonnell-2020-09-03_02:04:14-fs-wip-pdonnell-testing-20200903.000442-distro-basic-smithi/5401496/remote/smithi071/log/ceph-mgr.x.log.gz

At that point, thread 7feae3457700 is about to do another setxattr, with an invalid pool name. That path waits for the latest osdmap, but the CompletionToken never gets called:

https://github.com/ceph/ceph/blob/e4a37f6338cf39e76228492897c1f2165e294dba/src/client/Client.cc#L11995


Related issues

Related to CephFS - Bug #47293: client: osdmap wait not protected by mounted mutex Resolved

History

#1 Updated by Patrick Donnelly 5 months ago

  • Status changed from New to Triaged
  • Assignee changed from Adam Emerson to Xiubo Li

Actually, I think it's more likely the hang is in

https://github.com/ceph/ceph/blob/e4a37f6338cf39e76228492897c1f2165e294dba/src/client/Client.cc#L11621

which immediately follows the "_setxattr_maybe_wait_for_osdmap". Xiubo, can you take a look?

#2 Updated by Patrick Donnelly 5 months ago

Xiubo, I'd also suggest adding debugging entry/exit points for these methods. (If you're feeling motivated, debugging output for entry/exit points done through helper macros/methods would be really nice across all Client public methods.)

#3 Updated by Patrick Donnelly 5 months ago

  • Related to Bug #47293: client: osdmap wait not protected by mounted mutex added

#4 Updated by Xiubo Li 5 months ago

Patrick Donnelly wrote:

Actually, I think it's more likely the hang is in

https://github.com/ceph/ceph/blob/e4a37f6338cf39e76228492897c1f2165e294dba/src/client/Client.cc#L11621

which immediately follows the "_setxattr_maybe_wait_for_osdmap". Xiubo, can you take a look?

Sure. I will check it.

But it does not seem to hang in _setxattr_maybe_wait_for_osdmap(); in this case it won't wait for the osdmap, it fetches it asynchronously.

#5 Updated by Xiubo Li 5 months ago

Setting the "ceph.dir.subvolume" xattr won't fetch the osdmap; that happens only for the pool-related xattrs.

11980 void Client::_setxattr_maybe_wait_for_osdmap(const char *name, const void *value, size_t size)
11981 {
11982   // For setting pool of layout, MetaRequest need osdmap epoch.
11983   // There is a race which create a new data pool but client and mds both don't have.
11984   // Make client got the latest osdmap which make mds quickly judge whether get newer osdmap.
11985   if (strcmp(name, "ceph.file.layout.pool") == 0 || strcmp(name, "ceph.dir.layout.pool") == 0 ||
11986       strcmp(name, "ceph.file.layout") == 0 || strcmp(name, "ceph.dir.layout") == 0) {
11987     string rest(strstr(name, "layout"));
11988     string v((const char*)value, size); 
11989     int r = objecter->with_osdmap([&](const OSDMap& o) {
11990       return _setxattr_check_data_pool(rest, v, &o);                                                                                
11991     });
11992  
11993     if (r == -ENOENT) {      
11994       bs::error_code ec;     
11995       objecter->wait_for_latest_osdmap(ca::use_blocked[ec]);
11996     }
11997   }
11998 }

#6 Updated by Xiubo Li 5 months ago

From /ceph/teuthology-archive/pdonnell-2020-09-03_02:04:14-fs-wip-pdonnell-testing-20200903.000442-distro-basic-smithi/5401496/teuthology.log

2020-09-03T04:15:44.825 INFO:teuthology.orchestra.run.smithi143.stderr:2020-09-03T04:15:44.824+0000 7fc5f57fa700  1 -- 172.21.15.143:0/767655641 <== mon.1 v2:172.21.15.143:3300/0 7 ==== mgrmap(e 64) v1 ==== 65713+0+0 (secure 0 0 0) 0x7fc60000f070 con 0x7fc608143780
2020-09-03T04:15:44.858 INFO:teuthology.orchestra.run.smithi143.stderr:2020-09-03T04:15:44.857+0000 7fc60fa4f700  1 -- 172.21.15.143:0/767655641 --> [v2:172.21.15.71:6834/24165,v1:172.21.15.71:6835/24165] -- mgr_command(tid 0: {"prefix": "fs subvolume create", "vol_name": "cephfs", "sub_name": "subvolume_0000000000499794", "pool_layout": "invalid_pool", "target": ["mon-mgr", ""]}) v1 -- 0x7fc608044b60 con 0x7fc5e0053030
2020-09-03T04:16:07.871 INFO:teuthology.orchestra.run.smithi071:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-09-03T04:16:07.875 INFO:teuthology.orchestra.run.smithi143:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-09-03T04:16:38.023 INFO:teuthology.orchestra.run.smithi071:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-09-03T04:16:38.028 INFO:teuthology.orchestra.run.smithi143:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-09-03T04:17:08.144 INFO:teuthology.orchestra.run.smithi071:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-09-03T04:17:08.149 INFO:teuthology.orchestra.run.smithi143:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-09-03T04:17:38.308 INFO:teuthology.orchestra.run.smithi071:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-09-03T04:17:38.314 INFO:teuthology.orchestra.run.smithi143:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-09-03T04:17:44.565 DEBUG:teuthology.orchestra.run:got remote process result: 124

It shows the command timed out after 120 seconds.

But from /ceph/teuthology-archive/pdonnell-2020-09-03_02:04:14-fs-wip-pdonnell-testing-20200903.000442-distro-basic-smithi/5401496/remote/smithi071/log/ceph-mgr.x.log.gz

2020-09-03T04:15:44.864+0000 7feae3457700 10 client.19878 send_request rebuilding request 37 for mds.0
2020-09-03T04:15:44.864+0000 7feae3457700 10 client.19878 send_request client_request(unknown.0:37 setxattr #0x10000000002 ceph.dir.subvolume 2020-09-03T04:15:44.865712+0000 caller_uid=0, caller_gid=0{}) v4 to mds.0
2020-09-03T04:15:44.864+0000 7feae3457700  1 -- 172.21.15.71:0/3974804240 --> [v2:172.21.15.143:6800/526069828,v1:172.21.15.143:6801/526069828] -- client_request(unknown.0:37 setxattr #0x10000000002 ceph.dir.subvolume 2020-09-03T04:15:44.865712+0000 caller_uid=0, caller_gid=0{}) v4 -- 0x558a2d146a00 con 0x558a26114800

...

2020-09-03T04:15:44.867+0000 7fea98a01700  1 -- 172.21.15.71:0/3974804240 <== mds.0 v2:172.21.15.143:6800/526069828 77 ==== client_reply(???:33 = 0 (0) Success safe) v1 ==== 27+0+0 (secure 0 0 0) 0x558a250f8700 con 0x558a26114800
2020-09-03T04:15:44.867+0000 7fea98a01700 10 client.19878 insert_trace from 2020-09-03T04:15:44.534551+0000 mds.0 is_target=0 is_dentry=0
2020-09-03T04:15:44.867+0000 7fea98a01700 10 client.19878 insert_trace -- already got unsafe; ignoring
2020-09-03T04:15:44.868+0000 7fea98a01700  1 -- 172.21.15.71:0/3974804240 <== mds.0 v2:172.21.15.143:6800/526069828 78 ==== client_reply(???:36 = 0 (0) Success safe) v1 ==== 27+0+0 (secure 0 0 0) 0x558a250f81c0 con 0x558a26114800
2020-09-03T04:15:44.868+0000 7fea98a01700 10 client.19878 insert_trace from 2020-09-03T04:15:44.864715+0000 mds.0 is_target=0 is_dentry=0
2020-09-03T04:15:44.868+0000 7fea98a01700 10 client.19878 insert_trace -- already got unsafe; ignoring
2020-09-03T04:15:44.868+0000 7fea98a01700  1 -- 172.21.15.71:0/3974804240 <== mds.0 v2:172.21.15.143:6800/526069828 79 ==== client_reply(???:37 = 0 (0) Success safe) v1 ==== 27+0+0 (secure 0 0 0) 0x558a250f8c40 con 0x558a26114800
2020-09-03T04:15:44.868+0000 7fea98a01700 10 client.19878 insert_trace from 2020-09-03T04:15:44.865721+0000 mds.0 is_target=0 is_dentry=0
2020-09-03T04:15:44.868+0000 7fea98a01700 10 client.19878 insert_trace -- no trace
2020-09-03T04:15:44.868+0000 7feae3457700  3 client.19878 _do_setxattr(0x10000000002, "ceph.dir.subvolume") = 0

It shows the command finished successfully and returned at `2020-09-03T04:15:44.868+0000`, long before the subprocess command's expiry time of `2020-09-03T04:17:44.565`.

2020-09-03T04:15:44.868+0000 7feae3457700  1 -- 172.21.15.71:0/3974804240 --> [v2:172.21.15.71:3300/0,v1:172.21.15.71:6789/0] -- mon_get_version(what=osdmap handle=2) v1 -- 0x558a28064420 con 0x558a2d0a9800
2020-09-03T04:15:44.868+0000 7fea98a01700  1 -- 172.21.15.71:0/3974804240 <== mon.0 v2:172.21.15.71:3300/0 8 ==== mon_get_version_reply(handle=2 version=618) v2 ==== 24+0+0 (secure 0 0 0) 0x558a28064420 con 0x558a2d0a9800
2020-09-03T04:15:44.972+0000 7feac9a63700 10 cephx: validate_tickets want 38 have 38 need 0
2020-09-03T04:15:44.972+0000 7feac9a63700 20 cephx client: need_tickets: want=38 have=38 need=0

This could be the setting of the 'ceph.dir.layout.pool' xattr; it seems stuck somewhere after this.

#7 Updated by Patrick Donnelly 5 months ago

Xiubo Li wrote:

From /ceph/teuthology-archive/pdonnell-2020-09-03_02:04:14-fs-wip-pdonnell-testing-20200903.000442-distro-basic-smithi/5401496/teuthology.log

[...]

It shows the command timed out after 120 seconds.

But from the /ceph/teuthology-archive/pdonnell-2020-09-03_02:04:14-fs-wip-pdonnell-testing-20200903.000442-distro-basic-smithi/5401496/remote/smithi071/log/ceph-mgr.x.log.gz

[...]

It shows the command finished successfully and returned at `2020-09-03T04:15:44.868+0000`, long before the subprocess command's expiry time of `2020-09-03T04:17:44.565`.

[...]

This could be the setting of the 'ceph.dir.layout.pool' xattr; it seems stuck somewhere after this.

Right, so the setxattr on "ceph.dir.subvolume" succeeds:

https://github.com/ceph/ceph/blob/e9b8a8c4933fe4a73beda66f1491304425efd6c0/src/pybind/mgr/volumes/fs/operations/versions/subvolume_v2.py#L171

Then the mgr/volumes plugin will try to set the data pool:

https://github.com/ceph/ceph/blob/e9b8a8c4933fe4a73beda66f1491304425efd6c0/src/pybind/mgr/volumes/fs/operations/versions/subvolume_base.py#L169-L172

Note: there is no debug output between finishing the setxattr of ceph.dir.subvolume and the entry into setxattr on ceph.dir.layout.pool. We see the client shoot off the request for the osdmap version indicating that it's in the setxattr path for ceph.dir.layout.pool:

https://github.com/ceph/ceph/blob/e4a37f6338cf39e76228492897c1f2165e294dba/src/client/Client.cc#L11619

I believe the call to _setxattr_maybe_wait_for_osdmap completes okay. We have no reason to believe that method would start hanging, as there have been no recent changes there. It makes more sense that mref_reader is hanging.

#8 Updated by Xiubo Li 5 months ago

Patrick Donnelly wrote:

Xiubo Li wrote:

From /ceph/teuthology-archive/pdonnell-2020-09-03_02:04:14-fs-wip-pdonnell-testing-20200903.000442-distro-basic-smithi/5401496/teuthology.log

[...]

It shows the command timed out after 120 seconds.

But from the /ceph/teuthology-archive/pdonnell-2020-09-03_02:04:14-fs-wip-pdonnell-testing-20200903.000442-distro-basic-smithi/5401496/remote/smithi071/log/ceph-mgr.x.log.gz

[...]

It shows the command finished successfully and returned at `2020-09-03T04:15:44.868+0000`, long before the subprocess command's expiry time of `2020-09-03T04:17:44.565`.

[...]

This could be the setting of the 'ceph.dir.layout.pool' xattr; it seems stuck somewhere after this.

Right, so the setxattr on "ceph.dir.subvolume" succeeds:

https://github.com/ceph/ceph/blob/e9b8a8c4933fe4a73beda66f1491304425efd6c0/src/pybind/mgr/volumes/fs/operations/versions/subvolume_v2.py#L171

Then the mgr/volumes plugin will try to set the data pool:

https://github.com/ceph/ceph/blob/e9b8a8c4933fe4a73beda66f1491304425efd6c0/src/pybind/mgr/volumes/fs/operations/versions/subvolume_base.py#L169-L172

Note: there is no debug output between finishing the setxattr of ceph.dir.subvolume and the entry into setxattr on ceph.dir.layout.pool. We see the client shoot off the request for the osdmap version indicating that it's in the setxattr path for ceph.dir.layout.pool:

https://github.com/ceph/ceph/blob/e4a37f6338cf39e76228492897c1f2165e294dba/src/client/Client.cc#L11619

I believe the call to _setxattr_maybe_wait_for_osdmap completes okay. We have no reason to believe that method would start hanging, as there have been no recent changes there. It makes more sense that mref_reader is hanging.

I checked the RWRef code, and there is no reason it would block there. The mref_readers just check the state and return under 'S.lock', and the writer's waiter helper holds 'S.lock' while waiting for all readers to finish, releasing it inside the cond wait().

The reader and writer initialization just does:

RWRef(RWRefState<T> &s, T require, bool ir=true)
  :S(s), is_reader(ir) {
  ceph_assert(S.is_valid_state(require));

  std::scoped_lock l{S.lock};
  if (likely(is_reader)) { // Readers will update the reader_cnt
    if (S.check_reader_state(require)) {
      S.reader_cnt++;
      satisfied = true;
    }
  } else { // Writers will update the state
    is_reader = false;

    /*
     * If the current state is not the same as "require"
     * then update the state and we are the first writer.
     *
     * Or if there already has one writer running or
     * finished, it will let user to choose to continue
     * or just break.
     */
    if (S.check_writer_state(require)) {
      first_writer = true;
      S.state = require;
    }
    satisfied = true;
  }
}

The writer's wait helper does:

void wait_readers_done() {
  // Only writers can wait
  ceph_assert(!is_reader);

  std::unique_lock l{S.lock};

  S.cond.wait(l, [this] {
    return !S.reader_cnt;
  });
}

I can't see where it could block in the RWRef code. IMO, it was probably blocked on the 'client_lock' just after the mref_reader for some reason.

#9 Updated by Xiubo Li 5 months ago

It was blocked neither by the "client_lock" nor by the RWRef's lock, because both kept working well:

For the tick():

2020-09-03T04:15:45.770+0000 7feb0152b700  1 -- 172.21.15.71:0/24165 --> [v2:172.21.15.71:3301/0,v1:172.21.15.71:6790/0] -- log(1 entries from seq 1098 at 2020-09-03T04:15:44.858933+0000) v1 -- 0x558a2d0fae00 con 0x558a24562800
2020-09-03T04:15:45.777+0000 7fea9a204700  1 -- 172.21.15.71:0/3974804240 --> [v2:172.21.15.143:6800/526069828,v1:172.21.15.143:6801/526069828] -- client_cap_release(1) v2 -- 0x558a2479bce0 con 0x558a26114800
2020-09-03T04:15:45.777+0000 7fea9a204700  1 -- 172.21.15.71:0/3974804240 --> [v2:172.21.15.143:6800/526069828,v1:172.21.15.143:6801/526069828] -- client_metrics v1 -- 0x558a2479b600 con 0x558a26114800
2020-09-03T04:15:45.777+0000 7fea9a204700 10 client.19878 _put_inode on 0x10000000001.head(faked_ino=0 ref=11 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-09-03T04:15:43.786513+0000 mtime=2020-09-03T04:15:43.787544+0000 ctime=2020-09-03T04:15:43.787544+0000 caps=pAsLsXsFsx(0=pAsLsXsFsx) COMPLETE parents=0x10000000000.head["_nogroup"] 0x558a1fa18580)
2020-09-03T04:15:45.777+0000 7fea9a204700 10 client.19878 _put_inode on 0x100000001f9.head(faked_ino=0 ref=6 ll_ref=0 cap_refs={} open={} mode=40700 size=0/0 nlink=1 btime=0.000000 mtime=2020-09-03T04:15:44.523119+0000 ctime=2020-09-03T04:15:44.523119+0000 caps=pAsLsXsFsx(0=pAsLsXsFsx) COMPLETE parents=0x10000000002.head[".trash"] 0x558a26a6ab00)
...
2020-09-03T04:15:47.777+0000 7fea9a204700  1 -- 172.21.15.71:0/3974804240 --> [v2:172.21.15.143:6800/526069828,v1:172.21.15.143:6801/526069828] -- client_metrics v1 -- 0x558a25246c60 con 0x558a26114800
...

For the RWRef, the "statx()" will access the RWRef's lock:

2020-09-03T04:15:45.536+0000 7fead1272700  3 client.19878 statx enter (relpath /volumes/_deleting want 2047)
2020-09-03T04:15:45.536+0000 7fead1272700 10 client.19878 path_walk #0x1/volumes/_deleting
2020-09-03T04:15:45.536+0000 7fead1272700 10 client.19878  0 0x1.head(faked_ino=0 ref=13 ll_ref=1 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-09-03T04:15:34.814369+0000 mtime=2020-09-03T04:15:43.782915+0000 ctime=2020-09-03T04:15:43.782915+0000 caps=pAsLsXs(0=pAsLsXs) COMPLETE has_dir_layout 0x558a2d0f9700) volumes
2020-09-03T04:15:45.536+0000 7fead1272700 10 client.19878 _getattr mask As issued=1
2020-09-03T04:15:45.536+0000 7fead1272700  3 client.19878 may_lookup 0x558a2d0f9700 = 0
...

Since "statx()" could run and finish, the unmount had not started yet; otherwise the mref_reader would just return -ENOTCONN.

And we didn't see the path_walk related logs, so I suspect it is stuck on the "CompletionToken".

Thanks

#10 Updated by Xiubo Li 4 months ago

Hi Patrick,

For this, shall we add more debug logs to check where it is stuck?

I went through the client_lock/RWRef/CompletionToken related code but couldn't figure out what caused the hang.

#11 Updated by Patrick Donnelly 4 months ago

Xiubo Li wrote:

Hi Patrick,

For this, shall we add more debug logs to check where it is stuck?

I went through the client_lock/RWRef/CompletionToken related code but couldn't figure out what caused the hang.

Yes, I think that's reasonable. Please create a DNM PR for me to include in testing.

#12 Updated by Xiubo Li 4 months ago

Patrick Donnelly wrote:

Xiubo Li wrote:

Hi Patrick,

For this, shall we add more debug logs to check where it is stuck?

I went through the client_lock/RWRef/CompletionToken related code but couldn't figure out what caused the hang.

Yes, I think that's reasonable. Please create a DNM PR for me to include in testing.

Hi Patrick,

Done and please see: https://github.com/ceph/ceph/pull/37174

Thanks

#13 Updated by Patrick Donnelly 4 months ago

  • Assignee changed from Xiubo Li to Adam Emerson

Xiubo Li wrote:

Patrick Donnelly wrote:

Xiubo Li wrote:

Hi Patrick,

For this, shall we add more debug logs to check where it is stuck?

I went through the client_lock/RWRef/CompletionToken related code but couldn't figure out what caused the hang.

Yes, I think that's reasonable. Please create a DNM PR for me to include in testing.

Hi Patrick,

Done and please see: https://github.com/ceph/ceph/pull/37174

Thanks

Thanks Xiubo. We now have clear evidence that the hang is indeed in Objecter::wait_for_latest_osdmap. Here:

2020-09-20T12:52:00.254+0000 7f40d955d700  3 client.21147 _do_setxattr(0x10000000002, "ceph.dir.subvolume") = 0
2020-09-20T12:52:00.254+0000 7f40d955d700  1 client.21147 setxattr:11646 path /volumes/_nogroup/subvolume_0000000000689927/8b0ec551-d18c-4483-9fdb-c8b9cb802d68 name ceph.dir.layout.pool
2020-09-20T12:52:00.254+0000 7f40d955d700  1 client.21147 setxattr:11651 path /volumes/_nogroup/subvolume_0000000000689927/8b0ec551-d18c-4483-9fdb-c8b9cb802d68 name ceph.dir.layout.pool
2020-09-20T12:52:00.254+0000 7f40d955d700  1 client.21147 _setxattr_maybe_wait_for_osdmap:12030 name ceph.dir.layout.pool value 0x7f40def22f80
2020-09-20T12:52:00.254+0000 7f40d955d700  1 client.21147 _setxattr_maybe_wait_for_osdmap:12041 name ceph.dir.layout.pool value 0x7f40def22f80
2020-09-20T12:52:00.254+0000 7f40d955d700  1 -- 172.21.15.45:0/4242639671 --> [v2:172.21.15.45:3300/0,v1:172.21.15.45:6789/0] -- mon_get_version(what=osdmap handle=2) v1 -- 0x55b227d451e0 con 0x55b2302c6800
2020-09-20T12:52:00.254+0000 7f407daa5700  1 -- 172.21.15.45:0/4242639671 <== mon.0 v2:172.21.15.45:3300/0 8 ==== mon_get_version_reply(handle=2 version=614) v2 ==== 24+0+0 (secure 0 0 0) 0x55b227d451e0 con 0x55b2302c6800

From: /ceph/teuthology-archive/pdonnell-2020-09-20_07:10:11-fs-wip-pdonnell-testing-20200920.040823-distro-basic-smithi/5451397/remote/smithi045/log/ceph-mgr.x.log.gz

The client is stuck here: https://github.com/ceph/ceph/pull/37174/files#diff-fbcd8ab0dd666648c8e021177ce875d2R12042

Adam, can you take a look? Any idea?

#14 Updated by Adam Emerson 4 months ago

All right, I'm going to shove some more debug information into Objecter and Monitor.

#15 Updated by Patrick Donnelly 4 months ago

Another: /ceph/teuthology-archive/pdonnell-2020-09-26_05:47:56-fs-wip-pdonnell-testing-20200926.000836-distro-basic-smithi/5471370/remote/smithi072/log/ceph-mgr.x.log.gz

#16 Updated by Adam Emerson 2 months ago

Patrick Donnelly wrote:

Another: /ceph/teuthology-archive/pdonnell-2020-09-26_05:47:56-fs-wip-pdonnell-testing-20200926.000836-distro-basic-smithi/5471370/remote/smithi072/log/ceph-mgr.x.log.gz

How regularly does this reproduce? I've got an instrumented branch and I've been having trouble seeing the hang. I've got a bunch of runs of volume functionality tests queued up to see if I can provoke it.

Do you know if there's anything that's more likely to trigger it?

#17 Updated by Patrick Donnelly 2 months ago

Adam Emerson wrote:

Patrick Donnelly wrote:

Another: /ceph/teuthology-archive/pdonnell-2020-09-26_05:47:56-fs-wip-pdonnell-testing-20200926.000836-distro-basic-smithi/5471370/remote/smithi072/log/ceph-mgr.x.log.gz

How regularly does this reproduce? I've got an instrumented branch and I've been having trouble seeing the hang. I've got a bunch of runs of volume functionality tests queued up to see if I can provoke it.

It was reproducing every 2 or so runs. I see you scheduled some runs, but the priority seems to be too low: https://pulpito.ceph.com/?suite=fs

Do you know if there's anything that's more likely to trigger it?

No. I would just repeat the `-s fs:volumes` tests. It did seem to happen most frequently for test_subvolume_retain_snapshot_invalid_recreate. You can filter everything except the snapshot task by adding `--filter snapshot`.

#18 Updated by Adam Emerson 2 months ago

Just as an update: I have a candidate fix in teuthology at the moment.

#19 Updated by Patrick Donnelly about 2 months ago

/ceph/teuthology-archive/pdonnell-2020-11-24_19:01:27-fs-wip-pdonnell-testing-20201123.213848-distro-basic-smithi/5654263/remote/smithi168/log/ceph-mgr.y.log

#20 Updated by Adam Emerson about 1 month ago

  • Status changed from Triaged to Fix Under Review

#21 Updated by Patrick Donnelly about 1 month ago

  • Category set to Correctness/Safety
  • Pull request ID set to 38668

#22 Updated by Patrick Donnelly about 1 month ago

  • Status changed from Fix Under Review to Resolved

#23 Updated by Patrick Donnelly 15 days ago

  • Status changed from Resolved to Need More Info

Looks like this is still here, maybe more racy than before:

2021-01-07T23:43:13.674+0000 7f5881496700  1 -- 172.21.15.133:0/969341472 --> [v2:172.21.15.133:3300/0,v1:172.21.15.133:6789/0] -- mon_get_version(what=osdmap handle=2) v1 -- 0x560c934f2000 con 0x560c935a7000
2021-01-07T23:43:13.674+0000 7f5862297700  1 -- 172.21.15.133:0/969341472 <== mon.1 v2:172.21.15.133:3300/0 8 ==== mon_get_version_reply(handle=2 version=56) v2 ==== 24+0+0 (secure 0 0 0) 0x560c934f2000 con 0x560c935a7000

From: /ceph/teuthology-archive/pdonnell-2021-01-07_22:28:17-fs-wip-pdonnell-testing-20210107.205543-distro-basic-smithi/5763097/remote/smithi133/log/ceph-mgr.y.log.gz

:(

#24 Updated by Adam Emerson 13 days ago

This doesn't look quite the same? Before, we had the operation fail with a definite timeout, and I can't find that in this log.

Though, what's the best way to reproduce this? I've been trying to on a branch with debug logging and haven't been able to hit it.

#25 Updated by Patrick Donnelly 13 days ago

Adam Emerson wrote:

This doesn't look quite the same? Before, we had the operation fail with a definite timeout, and I can't find that in this log.

Though, what's the best way to reproduce this? I've been trying to on a branch with debug logging and haven't been able to hit it.

I will get https://github.com/ceph/ceph/pull/38836 into master so that from now on we have logging information that will indicate whether the issue is still present. The only way I know to reproduce this is to rerun fs:volumes multiple times. I will give that a shot and get back to you...

#26 Updated by Patrick Donnelly 11 days ago

Patrick Donnelly wrote:

Adam Emerson wrote:

This doesn't look quite the same? Before, we had the operation fail with a definite timeout, and I can't find that in this log.

Though, what's the best way to reproduce this? I've been trying to on a branch with debug logging and haven't been able to hit it.

I will get https://github.com/ceph/ceph/pull/38836 into master so that from now on we have logging information that will indicate whether the issue is still present. The only way I know to reproduce this is to rerun fs:volumes multiple times. I will give that a shot and get back to you...

Here it is:

2021-01-11T22:43:23.520+0000 7fb4a80d9700 20 client.6087 _setxattr_maybe_wait_for_osdmap: waiting for latest osdmap
2021-01-11T22:43:23.520+0000 7fb4a80d9700  1 -- 172.21.15.156:0/1463758597 --> [v2:172.21.15.156:3300/0,v1:172.21.15.156:6789/0] -- mon_get_version(what=osdmap handle=2) v1 -- 0x56094461d180 con 0x5609447a2c00
2021-01-11T22:43:23.520+0000 7fb486896700  1 -- 172.21.15.156:0/1463758597 <== mon.0 v2:172.21.15.156:3300/0 8 ==== mon_get_version_reply(handle=2 version=58) v2 ==== 24+0+0 (secure 0 0 0) 0x56094461d180 con 0x5609447a2c00
2021-01-11T22:43:24.160+0000 7fb49f8c8700 10 cephx: validate_tickets want 53 have 53 need 0
2021-01-11T22:43:24.160+0000 7fb49f8c8700 20 cephx client: need_tickets: want=53 have=53 need=0

From: /ceph/teuthology-archive/pdonnell-2021-01-11_22:27:13-fs:volumes-master-distro-basic-smithi/5778226/remote/smithi156/log/ceph-mgr.y.log.gz

Code (in master branch): https://github.com/ceph/ceph/blob/14b4787de14bd5ca1f85850f490ea7e752a15ff8/src/client/Client.cc#L12176-L12179

To reproduce I had to do:

teuthology-suite --force-priority --machine-type smithi --email pdonnell@redhat.com -p 60 --suite fs:volumes --ceph master --ceph-repo https://github.com/ceph/ceph.git --suite-repo https://github.com/ceph/ceph.git -S 318d3f4d80d90175439b50835f34e5896893c45b --filter test/snapshot -N 10

Only 1/40 failed, so it's pretty hard to hit this.

#27 Updated by Patrick Donnelly 11 days ago

By the way, something odd I noticed today was:

https://github.com/ceph/ceph/blob/d20916964984242e513a645bd275fad89777466a/src/libcephfs.cc#L207

There may be multiple libcephfs mounts, so stopping the io_context_pool looks weird. I don't know whether that is relevant to this bug.

#28 Updated by Adam Emerson 11 days ago

That is probably the underlying cause. I'll push a patch fixing that tomorrow, then run a bunch of tests and see if that makes it go away.

#29 Updated by Adam Emerson 11 days ago

  • Status changed from Need More Info to Fix Under Review

Fix at: https://github.com/ceph/ceph/pull/38858

Thanks for spotting that.

#30 Updated by Patrick Donnelly 9 days ago

  • Status changed from Fix Under Review to Resolved
