Bug #7736
mon: can expose stale state
Description
teuthology-2014-03-14_19:00:49-rados-dumpling-testing-basic-plana/130941
here is a deletion on the primary:
2014-03-14 19:41:37.801055 7f179a1df700 10 mon.a@0(leader).paxos(paxos updating c 1574..1590) handle_accept paxos(accept lc 1590 fc 0 pn 2500 opn 0) v3
2014-03-14 19:41:37.801060 7f179a1df700 10 mon.a@0(leader).paxos(paxos updating c 1574..1590) now 0,3,5,7,8 have accepted
2014-03-14 19:41:37.801064 7f179a1df700 10 mon.a@0(leader).paxos(paxos updating c 1574..1590) got majority, committing
...
2014-03-14 19:41:37.874952 7f179a1df700 20 mon.a@0(leader).osd e396 _pool_op_reply 0
2014-03-14 19:41:37.874955 7f179a1df700 1 -- 10.214.131.2:6789/0 --> 10.214.131.3:0/1022737 -- pool_op_reply(tid 1 (0) Success v396) v1 -- ?+0 0x2c1b6c0 con 0x2605c60
but, on one of the peons, we get a racing create for the same pool that succeeds (as a no-op)
2014-03-14 19:41:37.888531 7fe41f4b9700 1 -- 10.214.131.2:6791/0 <== client.5287 10.214.131.3:0/1022772 7 ==== pool_op(create pool 0 auid 0 tid 1 name foo v0) v4 ==== 68+0+0 (3773679816 0 0) 0x2046000 con 0x2296000
...
2014-03-14 19:41:37.888617 7fe41f4b9700 20 mon.c@4(peon).osd e395 _pool_op_reply 0
2014-03-14 19:41:37.888620 7fe41f4b9700 1 -- 10.214.131.2:6791/0 --> 10.214.131.3:0/1022772 -- pool_op_reply(tid 1 (0) Success v395) v1 -- ?+0 0x216a000 con 0x2296000
we committed before all quorum members had accepted the proposal, or had otherwise agreed not to expose the prior state.
Updated by Greg Farnum about 10 years ago
This is committed state. Looks like we need to wait for everybody to ack or for leases to expire before responding to users, though.
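For illustration only, here is a rough sketch of the difference between committing on a majority and waiting for every quorum member. The accepted set {0,3,5,7,8} comes from the leader log above; the quorum of ranks 0..8 and both function names are assumptions made up for this example, not the actual Paxos code.

```python
# Hypothetical sketch: compare "commit on majority" with "commit only once
# the whole quorum has accepted". Not the real mon/Paxos implementation.

def can_commit_majority(accepted, quorum):
    """Old-style rule: a strict majority of the quorum has accepted."""
    return len(accepted & quorum) > len(quorum) // 2


def can_commit_full_quorum(accepted, quorum):
    """Stricter rule: every quorum member has accepted."""
    return quorum <= accepted


# Assumption: the quorum consists of ranks 0..8 (mon.c is rank 4).
quorum = set(range(9))
# From the leader log: "now 0,3,5,7,8 have accepted".
accepted = {0, 3, 5, 7, 8}

majority_ok = can_commit_majority(accepted, quorum)
full_ok = can_commit_full_quorum(accepted, quorum)
# Ranks that could still answer clients from the previous epoch.
lagging = sorted(quorum - accepted)
```

Under these assumptions the majority rule already allows the commit while rank 4 has not accepted, which matches mon.c replying with v395 after the leader replied with v396. The lease-expiry alternative mentioned above is not modelled here.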
Updated by Sage Weil about 10 years ago
- Subject changed from mon: can expose uncommitted state to mon: can expose stale state
Updated by Sage Weil about 10 years ago
- Status changed from 12 to Fix Under Review
Updated by Joao Eduardo Luis about 10 years ago
This works for me.
Also noticed that a custom-crafted message could trigger a commit with the old code. Apparently we were not checking whether the monitor sending us the accept is in the quorum, and as long as it managed to send the correct pn and an appropriate 'last_committed', we would count the accept. Small window of opportunity and all, but this patch closes it completely.
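A minimal sketch of the kind of check described in this comment, assuming a simplified accept message. The `Accept` class, `should_count_accept`, and the exact-match rule for 'last_committed' are hypothetical simplifications for illustration; the pn and lc values are taken from the log in the description, and the quorum membership is assumed.

```python
from dataclasses import dataclass


@dataclass
class Accept:
    """Hypothetical, simplified accept message."""
    sender: int          # rank of the monitor that sent the accept
    pn: int              # proposal number the sender is accepting
    last_committed: int  # sender's last committed version


def should_count_accept(msg, quorum, accepted_pn, last_committed):
    """Return True only if the accept may count toward the commit."""
    # The check that was missing: ignore senders outside the quorum.
    if msg.sender not in quorum:
        return False
    # The proposal number must match the one being proposed.
    if msg.pn != accepted_pn:
        return False
    # Assumption: "appropriate" is simplified here to an exact match.
    return msg.last_committed == last_committed


quorum = {0, 3, 4, 5, 7, 8}  # assumed membership for the example
ok = should_count_accept(Accept(3, 2500, 1590), quorum, 2500, 1590)
crafted = should_count_accept(Accept(9, 2500, 1590), quorum, 2500, 1590)
```

Here `crafted` carries the right pn and 'last_committed' but comes from rank 9, which is outside the assumed quorum, so it is not counted.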
Updated by Joao Eduardo Luis about 10 years ago
- Status changed from Fix Under Review to 15
Updated by Sage Weil about 10 years ago
- Status changed from 7 to Pending Backport
Updated by Sage Weil about 10 years ago
- Status changed from Pending Backport to Resolved