Bug #7736

closed

mon: can expose stale state

Added by Sage Weil about 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: Monitor
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: emperor, dumpling
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

teuthology-2014-03-14_19:00:49-rados-dumpling-testing-basic-plana/130941

Here is a pool deletion being committed on the leader (mon.a):

2014-03-14 19:41:37.801055 7f179a1df700 10 mon.a@0(leader).paxos(paxos updating c 1574..1590) handle_accept paxos(accept lc 1590 fc 0 pn 2500 opn 0) v3
2014-03-14 19:41:37.801060 7f179a1df700 10 mon.a@0(leader).paxos(paxos updating c 1574..1590)  now 0,3,5,7,8 have accepted
2014-03-14 19:41:37.801064 7f179a1df700 10 mon.a@0(leader).paxos(paxos updating c 1574..1590)  got majority, committing
...
2014-03-14 19:41:37.874952 7f179a1df700 20 mon.a@0(leader).osd e396 _pool_op_reply 0
2014-03-14 19:41:37.874955 7f179a1df700  1 -- 10.214.131.2:6789/0 --> 10.214.131.3:0/1022737 -- pool_op_reply(tid 1 (0) Success v396) v1 -- ?+0 0x2c1b6c0 con 0x2605c60

But on one of the peons (mon.c), we get a racing create for the same pool that succeeds as a no-op, answered from the older epoch e395:

2014-03-14 19:41:37.888531 7fe41f4b9700  1 -- 10.214.131.2:6791/0 <== client.5287 10.214.131.3:0/1022772 7 ==== pool_op(create pool 0 auid 0 tid 1 name foo v0) v4 ==== 68+0+0 (3773679816 0 0) 0x2046000 con 0x2296000
...
2014-03-14 19:41:37.888617 7fe41f4b9700 20 mon.c@4(peon).osd e395 _pool_op_reply 0
2014-03-14 19:41:37.888620 7fe41f4b9700  1 -- 10.214.131.2:6791/0 --> 10.214.131.3:0/1022772 -- pool_op_reply(tid 1 (0) Success v395) v1 -- ?+0 0x216a000 con 0x2296000

We committed as soon as a majority had accepted, before every quorum member had acknowledged the update or otherwise agreed not to expose the prior state.
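
As a rough illustration of the race, the Python sketch below contrasts committing on a majority with committing only once the whole quorum has accepted. The accepted ranks are taken from the log above; the nine-member quorum (ranks 0 through 8) and the function names are assumptions made for the example, not Ceph code.

```python
# Simplified model of the two commit rules. Not Ceph's implementation.

def majority_reached(accepted, quorum):
    """Commit rule in the spirit of 'got majority, committing'."""
    return len(accepted & quorum) > len(quorum) // 2


def whole_quorum_accepted(accepted, quorum):
    """Stricter rule: every quorum member has accepted."""
    return quorum <= accepted


# Assumption: the quorum is ranks 0..8 (the log only shows ranks up to 8).
quorum = frozenset(range(9))
# From the log: "now 0,3,5,7,8 have accepted".
accepted = frozenset({0, 3, 5, 7, 8})

commit_on_majority = majority_reached(accepted, quorum)
commit_on_full_quorum = whole_quorum_accepted(accepted, quorum)
# Ranks that have not accepted and may still serve the previous state.
possibly_stale = sorted(quorum - accepted)
```

Under these assumptions the majority rule allows the commit while rank 4 (mon.c in the log) has not yet accepted, which is the window in which the stale reply was sent.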


Related issues: 1 (0 open, 1 closed)

Has duplicate: Ceph - Bug #7689: librados: ENOENT on ioctx create | Duplicate | Sage Weil | 03/11/2014

Actions #1

Updated by Greg Farnum about 10 years ago

This is committed state. Looks like we need to wait for everybody to ack or for leases to expire before responding to users, though.
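
A minimal sketch of that condition, in Python: the leader replies to the client only once every quorum member has acked or the outstanding read leases have expired. The single shared lease expiry time and all names here are hypothetical simplifications, not the monitor's actual data structures.

```python
# Hypothetical model of "wait for everybody to ack or for leases to expire".

def safe_to_reply(quorum, acked, now, lease_expire):
    """Return True once no quorum member can still serve the old state."""
    everyone_acked = quorum <= acked      # all members have acked the update
    leases_expired = now >= lease_expire  # nobody may read the old state anymore
    return everyone_acked or leases_expired


# Assumption: quorum is ranks 0..8 and times are plain seconds.
quorum = frozenset(range(9))
partial_acks = frozenset({0, 3, 5, 7, 8})

reply_too_early = safe_to_reply(quorum, partial_acks, now=10.0, lease_expire=15.0)
reply_after_lease = safe_to_reply(quorum, partial_acks, now=15.0, lease_expire=15.0)
reply_after_all_acks = safe_to_reply(quorum, quorum, now=10.0, lease_expire=15.0)
```

With only a majority acked and the lease still valid, the reply is held back; either condition on its own is enough to release it.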

Actions #2

Updated by Ian Colle about 10 years ago

  • Assignee set to Joao Eduardo Luis
Actions #3

Updated by Sage Weil about 10 years ago

  • Subject changed from mon: can expose uncommitted state to mon: can expose stale state
Actions #4

Updated by Sage Weil about 10 years ago

  • Status changed from 12 to Fix Under Review
Actions #5

Updated by Joao Eduardo Luis about 10 years ago

This works for me.

Also noticed that a custom-crafted message could trigger a commit with the old code. Apparently we were not checking whether the monitor sending us the accept is in the quorum; as long as it managed to send the correct pn and an appropriate 'last_committed', we would count the accept. Small window of opportunity and all, but this patch closes it completely.
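
The missing check can be sketched as follows, in Python. The `Accept` type, the function name, and the quorum membership are hypothetical; "appropriate last_committed" is modeled here as simple equality with the leader's value, which is an assumption and may not match the real rule. The pn and last_committed values are the ones shown in the log above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accept:
    sender: int          # rank of the monitor that sent the accept
    pn: int              # proposal number carried by the message
    last_committed: int  # sender's last committed version


def should_count_accept(msg, quorum, accepted_pn, last_committed):
    """Count an accept only if it comes from a quorum member and matches."""
    if msg.sender not in quorum:
        return False  # the check that was missing in the old code
    if msg.pn != accepted_pn:
        return False
    # Assumption: "appropriate" is treated as equal to the leader's value.
    return msg.last_committed == last_committed


quorum = frozenset(range(9))  # assumption: ranks 0..8
genuine = Accept(sender=3, pn=2500, last_committed=1590)
crafted = Accept(sender=42, pn=2500, last_committed=1590)

genuine_counted = should_count_accept(genuine, quorum, 2500, 1590)
crafted_counted = should_count_accept(crafted, quorum, 2500, 1590)
```

In this model the crafted message carries a valid pn and last_committed but is rejected because its sender is outside the quorum.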

Actions #6

Updated by Joao Eduardo Luis about 10 years ago

  • Status changed from Fix Under Review to 15
Actions #7

Updated by Sage Weil about 10 years ago

  • Status changed from 15 to 7
Actions #8

Updated by Sage Weil about 10 years ago

  • Status changed from 7 to Pending Backport
Actions #9

Updated by Sage Weil about 10 years ago

  • Priority changed from Urgent to High
Actions #10

Updated by Sage Weil about 10 years ago

  • Assignee deleted (Joao Eduardo Luis)
Actions #11

Updated by Sage Weil about 10 years ago

  • Status changed from Pending Backport to Resolved