Project

General

Profile

Actions

Bug #8165

closed

mon: subscribe doesn't wait for PaxosService readable

Added by Sage Weil about 10 years ago. Updated about 10 years ago.

Status:
Duplicate
Priority:
Urgent
Category:
Monitor
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ubuntu@teuthology:/a/teuthology-2014-04-20_19:33:18-upgrade:dumpling-x:parallel-firefly---basic-plana/205300

2014-04-20T21:15:48.006 INFO:teuthology.task.mon_thrash.mon_thrasher:killing mon.b
2014-04-20T21:15:48.006 INFO:teuthology.task.mon_thrash.mon_thrasher:reviving mon.b
2014-04-20T21:15:48.006 INFO:teuthology.task.ceph.mon.b:Restarting
2014-04-20T21:15:48.006 DEBUG:teuthology.orchestra.run:Running [10.214.133.39]: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f -i b'
2014-04-20T21:15:48.009 INFO:teuthology.task.ceph.mon.b:Started
2014-04-20T21:15:48.009 INFO:teuthology.task.mon_thrash.ceph_manager:waiting for quorum size 3
2014-04-20T21:15:48.009 DEBUG:teuthology.orchestra.run:Running [10.214.133.37]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph quorum_status'
2014-04-20T21:15:48.119 INFO:teuthology.orchestra.run.err:[10.214.133.37]: 2014-04-20 21:15:48.118949 7f8e8c7e4700  0 -- :/1019991 >> 10.214.133.39:6789/0 pipe(0x7f8e8801a610 sd=7 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8e8801a880).fault
2014-04-20T21:15:48.696 INFO:teuthology.task.workunit.client.1.out:[10.214.133.32]: [ RUN      ] LibRadosListEC.ListObjects
2014-04-20T21:15:48.697 INFO:teuthology.task.workunit.client.1.out:[10.214.133.32]: test/librados/TestCase.cc:195: Failure
2014-04-20T21:15:48.697 INFO:teuthology.task.workunit.client.1.out:[10.214.133.32]: Value of: rados_ioctx_create(cluster, pool_name.c_str(), &ioctx)
2014-04-20T21:15:48.697 INFO:teuthology.task.workunit.client.1.out:[10.214.133.32]:   Actual: -2
2014-04-20T21:15:48.697 INFO:teuthology.task.workunit.client.1.out:[10.214.133.32]: Expected: 0
2014-04-20T21:15:48.721 INFO:teuthology.task.workunit.client.1.err:[10.214.133.32]: Segmentation fault (core dumped)


we need to do the same thing we just fixed up in handle_get_version() for handle_subscribe().

also, there may be a problem with paxos:

2014-04-20 21:15:51.085280 7f503c1cb700  1 -- 10.214.133.39:6789/0 --> mon.0 10.214.133.37:6789/0 -- paxos(lease_ack lc 2587 fc 2009 pn 0 opn 0) v3 -- ?+0 0x221fc80
2014-04-20 21:15:51.085293 7f503c1cb700 20 mon.b@1(peon).paxos(paxos active c 2009..2587) reset_lease_timeout - setting timeout event
2014-04-20 21:15:51.085301 7f503c1cb700  1 mon.b@1(peon).paxos(paxos active c 2009..2587) is_readable now=2014-04-20 21:15:51.085302 lease_expire=2014-04-20 21:15:56.056561 has v0 lc 2587
...(that is valid)...
...
2014-04-20 21:15:51.596630 7f503c1cb700  1 -- 10.214.133.39:6789/0 <== client.6400 10.214.133.37:0/1020059 5 ==== mon_subscribe({monmap=2+,osdmap=0}) v2 ==== 42+0+0 (2069260600 0 0) 0x2090380 con 0x218d2c0
2014-04-20 21:15:51.596640 7f503c1cb700 20 mon.b@1(peon) e1 have connection
2014-04-20 21:15:51.596641 7f503c1cb700 20 mon.b@1(peon) e1 ms_dispatch existing session MonSession: client.? 10.214.133.37:0/1020059 is openallow * for client.? 10.214.133.37:0/1020059
2014-04-20 21:15:51.596649 7f503c1cb700 20 mon.b@1(peon) e1  caps allow *
2014-04-20 21:15:51.596651 7f503c1cb700 10 mon.b@1(peon) e1 handle_subscribe mon_subscribe({monmap=2+,osdmap=0}) v2
2014-04-20 21:15:51.596656 7f503c1cb700 10 mon.b@1(peon) e1 check_sub monmap next 2 have 1
2014-04-20 21:15:51.596659 7f503c1cb700 20 is_capable service=osd command= read on cap allow *
2014-04-20 21:15:51.596661 7f503c1cb700 20  allow so far , doing grant allow *
2014-04-20 21:15:51.596662 7f503c1cb700 20  allow all
2014-04-20 21:15:51.596663 7f503c1cb700 10 mon.b@1(peon).osd e338 check_sub 0x2260dc0 next 0 (onetime)
2014-04-20 21:15:51.596735 7f503c1cb700  1 -- 10.214.133.39:6789/0 --> client.? 10.214.133.37:0/1020059 -- osd_map(338..338 src has 1..338) v3 -- ?+0 0x2229480
2014-04-20 21:15:51.596750 7f503c1cb700  1 -- 10.214.133.39:6789/0 --> client.6400 10.214.133.37:0/1020059 -- mon_subscribe_ack(300s) v1 -- ?+0 0x2090000
...(we don't check for readable.. but readable is true anyway)...
...
 2014-04-20 21:15:52.323737 7f503c1cb700  1 -- 10.214.133.39:6789/0 <== mon.0 10.214.133.37:6789/0 1573490747 ==== paxos(begin lc 2587 fc 0 pn 4000 opn 0) v3 ==== 12691+0+0 (942881314 0 0) 0x221fc80 con 0x1eb7b00
2014-04-20 21:15:52.323775 7f503c1cb700 20 mon.b@1(peon) e1 have connection
2014-04-20 21:15:52.323780 7f503c1cb700 20 mon.b@1(peon) e1 ms_dispatch existing session MonSession: mon.0 10.214.133.37:6789/0 is openallow * for mon.0 10.214.133.37:6789/0
2014-04-20 21:15:52.323790 7f503c1cb700 20 mon.b@1(peon) e1  caps allow *
2014-04-20 21:15:52.323792 7f503c1cb700 20 is_capable service=mon command= exec on cap allow *
2014-04-20 21:15:52.323794 7f503c1cb700 20  allow so far , doing grant allow *
2014-04-20 21:15:52.323795 7f503c1cb700 20  allow all
2014-04-20 21:15:52.323797 7f503c1cb700 10 mon.b@1(peon).paxos(paxos active c 2009..2587) handle_begin paxos(begin lc 2587 fc 0 pn 4000 opn 0) v3
2014-04-20 21:15:52.323801 7f503c1cb700 10 mon.b@1(peon).paxos(paxos updating c 2009..2587) accepting value for 2588 pn 4000
2014-04-20 21:15:52.351714 7f503c1cb700  1 -- 10.214.133.39:6789/0 --> mon.0 10.214.133.37:6789/0 -- paxos(accept lc 2587 fc 0 pn 4000 opn 0) v3 -- ?+0 0x221f780
2014-04-20 21:15:52.382238 7f503c1cb700  1 -- 10.214.133.39:6789/0 <== mon.0 10.214.133.37:6789/0 1573490748 ==== paxos(commit lc 2588 fc 0 pn 4000 opn 0) v3 ==== 12691+0+0 (200593551 0 0) 0x221f780 con 0x1eb7b00
2014-04-20 21:15:52.382273 7f503c1cb700 20 mon.b@1(peon) e1 have connection
2014-04-20 21:15:52.382277 7f503c1cb700 20 mon.b@1(peon) e1 ms_dispatch existing session MonSession: mon.0 10.214.133.37:6789/0 is openallow * for mon.0 10.214.133.37:6789/0
2014-04-20 21:15:52.382287 7f503c1cb700 20 mon.b@1(peon) e1  caps allow *
2014-04-20 21:15:52.382290 7f503c1cb700 20 is_capable service=mon command= exec on cap allow *
2014-04-20 21:15:52.382291 7f503c1cb700 20  allow so far , doing grant allow *
2014-04-20 21:15:52.382292 7f503c1cb700 20  allow all
2014-04-20 21:15:52.382294 7f503c1cb700 10 mon.b@1(peon).paxos(paxos updating c 2009..2587) handle_commit on 2588
2014-04-20 21:15:52.382297 7f503c1cb700 10 mon.b@1(peon).paxos(paxos updating c 2009..2588) store_state [2588..2588]
2014-04-20 21:15:52.410346 7f503c1cb700 10 mon.b@1(peon) e1 refresh_from_paxos
2014-04-20 21:15:52.410407 7f503c1cb700 10 mon.b@1(peon).paxosservice(pgmap 504..1145) refresh
2014-04-20 21:15:52.410450 7f503c1cb700 10 mon.b@1(peon).paxosservice(mdsmap 1..15) refresh
2014-04-20 21:15:52.410495 7f503c1cb700 10 mon.b@1(peon).paxosservice(osdmap 1..338) refresh
2014-04-20 21:15:52.410550 7f503c1cb700 10 mon.b@1(peon).paxosservice(logm 504..1044) refresh
...

Actions #1

Updated by Ian Colle about 10 years ago

  • Assignee set to Joao Eduardo Luis
Actions #2

Updated by Joao Eduardo Luis about 10 years ago

  • Status changed from 12 to In Progress
Actions #3

Updated by Sage Weil about 10 years ago

  • Status changed from In Progress to Duplicate

dup of #7997

Actions

Also available in: Atom PDF