Bug #59185
closed
MDSMonitor: should batch propose osdmap/mdsmap changes via some fs commands
% Done: 0%
Source: Q/A
Tags:
Backport: reef,quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
This applies especially to `fs fail`. Otherwise, the MDS may complain about being blocklisted before it has a reasonable chance to see that it has been removed from the MDSMap. There's no way to completely remove this race. Example:
2023-03-27T23:11:27.641 DEBUG:teuthology.orchestra.run.smithi119:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs fail cephfs
...
2023-03-27T23:11:29.107 INFO:tasks.ceph.mds.c.smithi119.stderr:2023-03-27T23:11:29.108+0000 7f012f70c700 -1 mds.0.sessionmap _load_finish got (2) No such file or directory
2023-03-27T23:11:29.107 INFO:tasks.ceph.mds.c.smithi119.stderr:2023-03-27T23:11:29.108+0000 7f012f70c700 -1 log_channel(cluster) log [ERR] : error reading sessionmap 'mds0_sessionmap' -2 ((2) No such file or directory)
2023-03-27T23:11:29.115 INFO:tasks.ceph.mds.c.smithi119.stderr:2023-03-27T23:11:29.116+0000 7f012ef0b700 -1 mds.0.journalpointer Error writing pointer object '400.00000000': (108) Cannot send after transport endpoint shutdown
2023-03-27T23:11:29.115 INFO:tasks.ceph.mds.c.smithi119.stderr:/home/jenkins-build/b
From: /teuthology/pdonnell-2023-03-27_22:29:12-fs-wip-pdonnell-testing-20230327.200655-distro-default-smithi/7221875/teuthology.log
The mon log shows:
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).mds e274 preprocess_query mon_command({"prefix": "fs fail", "fs_name": "cephfs"} v 0) v1 from client.? 172.21.15.119:0/2909907815
2023-03-27T23:11:28.005+0000 7f30d5c4e700 7 mon.a@0(leader).mds e274 prepare_update mon_command({"prefix": "fs fail", "fs_name": "cephfs"} v 0) v1
2023-03-27T23:11:28.005+0000 7f30d5c4e700 1 mon.a@0(leader).mds e274 fail_mds_gid 16242 mds.c role 0
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).osd e305 blocklist [v2:172.21.15.119:6835/1865237494,v1:172.21.15.119:6837/1865237494] until 2023-03-28T23:11:28.006828+0000
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).paxosservice(osdmap 1..305) propose_pending
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).osd e305 encode_pending e 306
2023-03-27T23:11:28.005+0000 7f30d5c4e700 1 mon.a@0(leader).osd e305 do_prune osdmap full prune enabled
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).osd e305 should_prune currently holding only 304 epochs (min osdmap epochs: 500); do not prune.
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).osd e305 update_pending_pgs
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).osd e305 scan_for_creating_pgs already created 1
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).osd e305 scan_for_creating_pgs already created 2
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).osd e305 scan_for_creating_pgs already created 38
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).osd e305 scan_for_creating_pgs already created 39
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).osd e305 update_pending_pgs 0 pools queued
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).osd e305 update_pending_pgs 0 pgs removed because they're created
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).osd e305 update_pending_pgs queue remaining: 0 pools
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).osd e305 update_pending_pgs 0/0 pgs added from queued pools
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).osd e305 encode_pending encoding full map with reef features 1080873256688364036
2023-03-27T23:11:28.005+0000 7f30d5c4e700 20 mon.a@0(leader).osd e305 full_crc 3543290452 inc_crc 3723423259
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader) e1 log_health updated 0 previous 0
2023-03-27T23:11:28.005+0000 7f30d5c4e700 5 mon.a@0(leader).paxos(paxos active c 2009..2668) queue_pending_finisher 0x5637d773f260
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).paxos(paxos active c 2009..2668) trigger_propose active, proposing now
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).paxos(paxos active c 2009..2668) propose_pending 2669 7045 bytes
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).paxos(paxos updating c 2009..2668) begin for 2669 7045 bytes
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).paxos(paxos updating c 2009..2668) sending begin to mon.1
2023-03-27T23:11:28.005+0000 7f30d5c4e700 1 -- [v2:172.21.15.119:3300/0,v1:172.21.15.119:6789/0] send_to--> mon [v2:172.21.15.154:3300/0,v1:172.21.15.154:6789/0] -- paxos(begin lc 2668 fc 0 pn 200 opn 0) v4 -- ?+0 0x5637d55c0c00
2023-03-27T23:11:28.005+0000 7f30d5c4e700 1 -- [v2:172.21.15.119:3300/0,v1:172.21.15.119:6789/0] --> [v2:172.21.15.154:3300/0,v1:172.21.15.154:6789/0] -- paxos(begin lc 2668 fc 0 pn 200 opn 0) v4 -- 0x5637d55c0c00 con 0x5637d415f400
2023-03-27T23:11:28.005+0000 7f30d5c4e700 10 mon.a@0(leader).paxos(paxos updating c 2009..2668) sending begin to mon.2
2023-03-27T23:11:28.005+0000 7f30d5c4e700 1 -- [v2:172.21.15.119:3300/0,v1:172.21.15.119:6789/0] send_to--> mon [v2:172.21.15.154:3301/0,v1:172.21.15.154:6790/0] -- paxos(begin lc 2668 fc 0 pn 200 opn 0) v4 -- ?+0 0x5637d608b800
2023-03-27T23:11:28.005+0000 7f30d5c4e700 1 -- [v2:172.21.15.119:3300/0,v1:172.21.15.119:6789/0] --> [v2:172.21.15.154:3301/0,v1:172.21.15.154:6790/0] -- paxos(begin lc 2668 fc 0 pn 200 opn 0) v4 -- 0x5637d608b800 con 0x5637d415f000
From: /teuthology/pdonnell-2023-03-27_22:29:12-fs-wip-pdonnell-testing-20230327.200655-distro-default-smithi/7221875/remote/smithi119/log/ceph-mon.a.log.gz
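Paxos began a proposal as soon as the osdmon was triggered to propose, before the mdsmon could also propose its pending changes. The OSDMap carrying the blocklist entry was therefore proposed on its own, ahead of the MDSMap change that removes the MDS.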
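To make the ordering problem concrete, here is a toy model in Python. It is only a sketch and not Ceph's monitor code: `ToyPaxos`, `ToyService`, and `fs_fail` are hypothetical names invented for illustration, and a "commit" is modeled simply as a list of changes that become visible together. It contrasts proposing the osdmap change immediately with holding it until the mdsmap change is staged as well.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPaxos:
    """Hypothetical stand-in for Paxos: each commit is one atomic batch."""
    commits: list = field(default_factory=list)

    def commit(self, changes):
        # Everything in `changes` becomes visible in the same commit.
        self.commits.append(list(changes))


@dataclass
class ToyService:
    """Hypothetical stand-in for a monitor service (osdmon or mdsmon)."""
    name: str
    paxos: ToyPaxos
    pending: list = field(default_factory=list)

    def stage(self, change):
        # Record a pending change without proposing it yet.
        self.pending.append(f"{self.name}: {change}")

    def take_pending(self):
        # Hand over the staged changes and reset the pending list.
        staged, self.pending = self.pending, []
        return staged

    def propose_now(self):
        # Propose only this service's pending changes, right away.
        self.paxos.commit(self.take_pending())


def fs_fail(batched):
    """Model `fs fail`: blocklist the MDS, then remove it from the MDSMap."""
    paxos = ToyPaxos()
    osdmon = ToyService("osdmap", paxos)
    mdsmon = ToyService("mdsmap", paxos)

    osdmon.stage("blocklist mds.c")
    if not batched:
        # Behavior seen in the mon log: the osdmap goes out on its own.
        osdmon.propose_now()

    mdsmon.stage("remove mds.c from rank 0")
    if batched:
        # Assumed fix: both pending changes travel in a single commit.
        paxos.commit(osdmon.take_pending() + mdsmon.take_pending())
    else:
        mdsmon.propose_now()
    return paxos.commits


if __name__ == "__main__":
    print("unbatched:", fs_fail(batched=False))
    print("batched:  ", fs_fail(batched=True))
```

In the unbatched run the blocklist lands in a commit of its own, which is the window in which the MDS can hit blocklist errors while still believing it holds rank 0. The batched run narrows that window but, as the description notes, cannot close it completely.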
Updated by Patrick Donnelly about 1 year ago
- Status changed from New to Fix Under Review
- Pull request ID set to 50700
Updated by Patrick Donnelly 11 months ago
- Status changed from Fix Under Review to Rejected
Obsoleted by #59314.