Bug #3675 (closed)

osd: hang during initial peering

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: High
Category: -
Target version: -
% Done: 0%
Source: Q/A
Backport: bobtail

Description

the initial wait-for-healthy blocked on 2 pgs. ms inject socket failures = 500. everything was up.

no logs, so it's unclear why.

there were lots of messages like

2012-12-21 14:15:45.285022 7fc27eca0700  0 -- 10.214.132.10:6807/5960 >> 10.214.132.10:6802/5955 pipe(0x302cb40 sd=33 :6807 pgs=8 cs=3 l=0).reader got old message 1 <= 3 0x2fb4c40 pg_query(2.13 epoch 4) v2, discarding
2012-12-21 14:15:45.285233 7fc27eca0700  0 -- 10.214.132.10:6807/5960 >> 10.214.132.10:6802/5955 pipe(0x302cb40 sd=33 :6807 pgs=8 cs=3 l=0).reader got old message 2 <= 3 0x2fb4c40 pg_notify(2.14 epoch 4) v4, discarding
2012-12-21 14:15:45.285342 7fc27eca0700  0 -- 10.214.132.10:6807/5960 >> 10.214.132.10:6802/5955 pipe(0x302cb40 sd=33 :6807 pgs=8 cs=3 l=0).reader got old message 3 <= 3 0x2fb4c40 pg_notify(0.16 epoch 4) v4, discarding

but no messages related to the hung pgs.

job was

ubuntu@teuthology:/a/sage-2012-12-21_13:39:03-regression-next-testing-basic/19821$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: ec18aeecd4de479601363849d489668d8f12410c
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 500
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: accce830514c6b099eb0e00a8ae34396d14565a3
  s3tests:
    branch: next
  workunit:
    sha1: accce830514c6b099eb0e00a8ae34396d14565a3
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rados/test_python.sh

but this was before the workunit ran, so just doing the cluster setup/teardown should reproduce eventually.


Related issues: 1 (0 open, 1 closed)

Has duplicate: Ceph - Bug #4271: osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid) (Resolved, Sage Weil, 02/26/2013)

Actions #1

Updated by Sage Weil over 11 years ago

this is a messenger bug. if there is a socket error at the end of accept(), after the register_pipe(), we then fail and discard the queue. but it's possible that during that window a racing thread queued an outgoing message on the connection... and we don't want that to get thrown out!
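The race above can be replayed as a deterministic, sequential simulation. This is a hypothetical, greatly simplified model (the Pipe/register_pipe/submit_message names mirror the real SimpleMessenger code, but none of this is Ceph's actual implementation): the pipe is registered at the end of accept(), a racing sender enqueues a message on it, and the subsequent fault discards the queue, losing that message.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>

// Hypothetical, simplified model of the race described above.
struct Pipe {
  std::deque<std::string> out_q;           // queued outgoing messages
  bool closed = false;
};

struct Messenger {
  std::map<std::string, Pipe*> rank_pipe;  // peer addr -> registered pipe

  void register_pipe(const std::string& peer, Pipe* p) { rank_pipe[peer] = p; }

  // A racing thread queues a message on whatever pipe is registered.
  bool submit_message(const std::string& peer, const std::string& msg) {
    auto it = rank_pipe.find(peer);
    if (it == rank_pipe.end()) return false;
    it->second->out_q.push_back(msg);      // lands on the doomed pipe
    return true;
  }

  // accept() hit a socket error *after* register_pipe: fault and
  // discard the queue -- including the racing thread's message.
  std::size_t fault_and_discard(const std::string& peer) {
    Pipe* p = rank_pipe[peer];
    p->closed = true;
    std::size_t lost = p->out_q.size();
    p->out_q.clear();                      // "discard_queue": message is gone
    rank_pipe.erase(peer);
    return lost;
  }
};

// Replay the interleaving: register, racing enqueue, then the fault.
std::size_t replay_race() {
  Messenger msgr;
  Pipe p;
  msgr.register_pipe("osd.4", &p);                        // end of accept()
  msgr.submit_message("osd.4", "pg_query(2.5 epoch 3)");  // racing thread
  return msgr.fault_and_discard("osd.4");                 // injected failure
}
```

The point of the sketch is the ordering: once register_pipe has run, other threads can legitimately find the pipe and queue on it, so the error path can no longer assume the queue only holds its own messages.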

Actions #2

Updated by Sage Weil over 11 years ago

ubuntu@teuthology:/a/sage-peer1/21827

Actions #3

Updated by Sage Weil over 11 years ago

it took about 1500 iterations of this job to reproduce the hang:

ubuntu@teuthology:/a/sage-peer1/21827$ cat orig.config.yaml 
nuke-on-error: true
overrides:
  ceph:
    branch: next
    conf:
      global:
        ms inject socket failures: 500
      osd:
        debug ms: 20
        debug osd: 20
    fs: btrfs
    log-whitelist:
    - slow request
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
tasks:
- clock: null
- ceph: null

Actions #4

Updated by Sage Weil over 11 years ago

2012-12-22 19:37:06.020016 7f7705d02700 10 -- 10.214.132.25:6804/15154 >> 10.214.132.24:6801/13245 pipe(0x3ac6900 sd=32 :6804 pgs=0 cs=0 l=0).accept:  setting up session_security.
2012-12-22 19:37:06.020021 7f770de1a700 10 osd.1 pg_epoch: 3 pg[2.5( empty local-les=0 n=0 ec=1 les/c 0/1 3/3/3) [1,4] r=0 lpr=3 pi=1-2/2 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>:  querying info from osd.4
2012-12-22 19:37:06.020034 7f7705d02700 10 -- 10.214.132.25:6804/15154 >> 10.214.132.24:6801/13245 pipe(0x3ac6900 sd=32 :6804 pgs=0 cs=0 l=0).accept new session
2012-12-22 19:37:06.020047 7f7705d02700 10 -- 10.214.132.25:6804/15154 >> 10.214.132.24:6801/13245 pipe(0x3ac6900 sd=32 :6804 pgs=5 cs=1 l=0).accept success, connect_seq = 1, sending READY
2012-12-22 19:37:06.020063 7f7705d02700 10 -- 10.214.132.25:6804/15154 >> 10.214.132.24:6801/13245 pipe(0x3ac6900 sd=32 :6804 pgs=5 cs=1 l=0).accept features 67108863
2012-12-22 19:37:06.020082 7f7705d02700 10 -- 10.214.132.25:6804/15154 >> 10.214.132.24:6801/13245 pipe(0x3ac6900 sd=32 :6804 pgs=5 cs=1 l=0).register_pipe
2012-12-22 19:37:06.020104 7f7705d02700  0 -- 10.214.132.25:6804/15154 >> 10.214.132.24:6801/13245 pipe(0x3ac6900 sd=32 :6804 pgs=5 cs=1 l=0).injecting socket failure
2012-12-22 19:37:06.020109 7f770de1a700 10 -- 10.214.132.25:6804/15154 get_connection osd.4 10.214.132.24:6801/13245 existing 0x3ac6900
2012-12-22 19:37:06.020128 7f770de1a700 20 osd.1 3 _share_map_outgoing 0x3aaa6e0 don't know epoch, doing nothing
2012-12-22 19:37:06.020132 7f770de1a700  7 osd.1 3 do_queries querying osd.4 on 1 PGs
2012-12-22 19:37:06.020143 7f770de1a700  1 -- 10.214.132.25:6804/15154 --> 10.214.132.24:6801/13245 -- pg_query(2.5 epoch 3) v2 -- ?+0 0x3b448c0 con 0x3aaa6e0
2012-12-22 19:37:06.020137 7f7705d02700  2 -- 10.214.132.25:6804/15154 >> 10.214.132.24:6801/13245 pipe(0x3ac6900 sd=32 :6804 pgs=5 cs=1 l=0).fault 32: Broken pipe
2012-12-22 19:37:06.020155 7f770de1a700 20 -- 10.214.132.25:6804/15154 submit_message pg_query(2.5 epoch 3) v2 remote, 10.214.132.24:6801/13245, have pipe.
2012-12-22 19:37:06.020156 7f7705d02700 10 -- 10.214.132.25:6804/15154 >> 10.214.132.24:6801/13245 pipe(0x3ac6900 sd=32 :6804 pgs=5 cs=1 l=0).fault already closed|closing
2012-12-22 19:37:06.020191 7f7705d02700 10 -- 10.214.132.25:6804/15154 queue_reap 0x3ac6900
2012-12-22 19:37:06.020210 7f771662b700 10 -- 10.214.132.25:6804/15154 reaper
2012-12-22 19:37:06.020224 7f771662b700 10 -- 10.214.132.25:6804/15154 reaper reaping pipe 0x3ac6900 10.214.132.24:6801/13245
2012-12-22 19:37:06.020235 7f771662b700 10 -- 10.214.132.25:6804/15154 >> 10.214.132.24:6801/13245 pipe(0x3ac6900 sd=32 :6804 pgs=5 cs=1 l=0).discard_queue
2012-12-22 19:37:06.020232 7f770de1a700 10 osd.1 3 pg[2.5( empty local-les=0 n=0 ec=1 les/c 0/1 3/3/3) [1,4] r=0 lpr=3 pi=1-2/2 mlcod 0'0 peering] is new
2012-12-22 19:37:06.020248 7f771662b700 20 -- 10.214.132.25:6804/15154 >> 10.214.132.24:6801/13245 pipe(0x3ac6900 sd=32 :6804 pgs=5 cs=1 l=0).  discard 0x3b448c0
2012-12-22 19:37:06.020266 7f771662b700 10 -- 10.214.132.25:6804/15154 >> 10.214.132.24:6801/13245 pipe(0x3ac6900 sd=32 :6804 pgs=5 cs=1 l=0).unregister_pipe
2012-12-22 19:37:06.020258 7f770de1a700 10 osd.1 pg_epoch: 3 pg[2.5( empty local-les=0 n=0 ec=1 les/c 0/1 3/3/3) [1,4] r=0 lpr=3 pi=1-2/2 mlcod 0'0 peering] notify (query_epoch:3, epoch_sent:3, info:2.5( empty local-les=0 n=0 ec=1 les/c 0/1 3/3/3)) from osd.5
2012-12-22 19:37:06.020205 7f7705d02700 10 -- 10.214.132.25:6804/15154 >> 10.214.132.24:6801/13245 pipe(0x3ac6900 sd=32 :6804 pgs=5 cs=1 l=0).reader done

the message is queued after the fault... we don't check the pipe state (or anything else) after we take the pipe_lock in Pipe::send()
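A minimal sketch of the kind of check implied here (assumed shape, not Ceph's actual patch): after taking pipe_lock, send() verifies the pipe hasn't already been closed by fault(); if it has, it refuses the message so the caller can fall back to a fresh pipe instead of silently losing it in a queue that will be discarded.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>

// Hedged sketch (not Ceph's actual code): check pipe state under
// pipe_lock before enqueueing, so a message can't land on a pipe
// whose queue has already been (or is about to be) discarded.
enum class PipeState { OPEN, CLOSED };

struct Pipe {
  std::mutex pipe_lock;
  PipeState state = PipeState::OPEN;
  std::deque<std::string> out_q;

  // Returns false if the pipe is already closed; the caller must then
  // open a new pipe and resubmit rather than lose the message.
  bool send(const std::string& msg) {
    std::lock_guard<std::mutex> l(pipe_lock);
    if (state == PipeState::CLOSED)
      return false;                       // the check the bug was missing
    out_q.push_back(msg);
    return true;
  }
};

bool closed_pipe_refuses_message() {
  Pipe p;
  p.state = PipeState::CLOSED;             // fault() already ran
  return !p.send("pg_query(2.5 epoch 3)")  // send refuses...
         && p.out_q.empty();               // ...so nothing sits in a dead queue
}
```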
Actions #5

Updated by Sage Weil over 11 years ago

ok, this is actually also a race that can cause the register_pipe assert. the locking needs to be reworked here. putting a sleep(2) in fault(), after the pipe_lock unlock and the msgr->lock lock, got me

2012-12-22 21:37:30.360401 7fe85bfe7700  1 -- 10.3.64.22:0/21242 --> 10.3.64.22:6802/20741 -- osd_ping(ping e131 stamp 2012-12-22 21:37:30.360322) v2 -- ?+0 0x29dc510 con 0x7fe85403b900
2012-12-22 21:37:30.360553 7fe85bfe7700 20 -- 10.3.64.22:0/21242 submit_message osd_ping(ping e131 stamp 2012-12-22 21:37:30.360322) v2 remote, 10.3.64.22:6802/20741, had pipe, but it closed.
2012-12-22 21:37:30.360619 7fe85bfe7700 20 -- 10.3.64.22:0/21242 submit_message osd_ping(ping e131 stamp 2012-12-22 21:37:30.360322) v2 remote, 10.3.64.22:6802/20741, new pipe.
2012-12-22 21:37:30.360638 7fe85bfe7700 10 -- 10.3.64.22:0/21242 connect_rank to 10.3.64.22:6802/20741, creating pipe and registering
2012-12-22 21:37:30.361036 7fe85bfe7700 10 -- 10.3.64.22:0/21242 >> :/0 pipe(0x29dcc10 sd=-1 :0 s=1 pgs=0 cs=0 l=0).randomize_out_seq 732632941
2012-12-22 21:37:30.361324 7fe85bfe7700 10 -- 10.3.64.22:0/21242 >> 10.3.64.22:6802/20741 pipe(0x29dcc10 sd=-1 :0 s=1 pgs=0 cs=0 l=1).register_pipe
2012-12-22 21:37:30.361350 7fe86c6f7700 10 -- 10.3.64.22:0/21242 >> 10.3.64.22:6802/20741 pipe(0x29dcc10 sd=-1 :0 s=1 pgs=0 cs=0 l=1).writer: state = connecting policy.server=0
2012-12-22 21:37:30.361451 7fe86c6f7700 10 -- 10.3.64.22:0/21242 >> 10.3.64.22:6802/20741 pipe(0x29dcc10 sd=-1 :0 s=1 pgs=0 cs=0 l=1).connect 0
2012-12-22 21:37:30.361547 7fe86c6f7700 10 -- 10.3.64.22:0/21242 >> 10.3.64.22:6802/20741 pipe(0x29dcc10 sd=34 :0 s=1 pgs=0 cs=0 l=1).connecting to 10.3.64.22:6802/20741
2012-12-22 21:37:30.361773 7fe86c6f7700  2 -- 10.3.64.22:0/21242 >> 10.3.64.22:6802/20741 pipe(0x29dcc10 sd=34 :0 s=1 pgs=0 cs=0 l=1).connect error 10.3.64.22:6802/20741, 111: Connection refused
2012-12-22 21:37:30.361805 7fe86c6f7700  2 -- 10.3.64.22:0/21242 >> 10.3.64.22:6802/20741 pipe(0x29dcc10 sd=34 :0 s=1 pgs=0 cs=0 l=1).fault 111: Connection refused
2012-12-22 21:37:30.361840 7fe86c6f7700  0 -- 10.3.64.22:0/21242 >> 10.3.64.22:6802/20741 pipe(0x29dcc10 sd=34 :0 s=1 pgs=0 cs=0 l=1).fault
2012-12-22 21:37:30.361854 7fe86c6f7700 10 -- 10.3.64.22:0/21242 >> 10.3.64.22:6802/20741 pipe(0x29dcc10 sd=34 :0 s=1 pgs=0 cs=0 l=1).writer: state = connecting policy.server=0
2012-12-22 21:37:30.361865 7fe86c6f7700 10 -- 10.3.64.22:0/21242 >> 10.3.64.22:6802/20741 pipe(0x29dcc10 sd=34 :0 s=1 pgs=0 cs=0 l=1).connect 0
2012-12-22 21:37:30.361936 7fe86c6f7700 10 -- 10.3.64.22:0/21242 >> 10.3.64.22:6802/20741 pipe(0x29dcc10 sd=34 :0 s=1 pgs=0 cs=0 l=1).connecting to 10.3.64.22:6802/20741
2012-12-22 21:37:30.362004 7fe86c6f7700  2 -- 10.3.64.22:0/21242 >> 10.3.64.22:6802/20741 pipe(0x29dcc10 sd=34 :0 s=1 pgs=0 cs=0 l=1).connect error 10.3.64.22:6802/20741, 111: Connection refused
2012-12-22 21:37:30.362031 7fe86c6f7700  2 -- 10.3.64.22:0/21242 >> 10.3.64.22:6802/20741 pipe(0x29dcc10 sd=34 :0 s=1 pgs=0 cs=0 l=1).fault 111: Connection refused
2012-12-22 21:37:30.362052 7fe86c6f7700 10 -- 10.3.64.22:0/21242 >> 10.3.64.22:6802/20741 pipe(0x29dcc10 sd=34 :0 s=1 pgs=0 cs=0 l=1).fault waiting 0.200000
2012-12-22 21:37:30.380333 7fe85bfe7700 -1 msg/Pipe.cc: In function 'void Pipe::register_pipe()' thread 7fe85bfe7700 time 2012-12-22 21:37:30.361441
msg/Pipe.cc: 1039: FAILED assert(msgr->rank_pipe.count(peer_addr) == 0)

 ceph version 0.55.1-352-gbf98c06 (bf98c062aa3a688f7caa7dea25b15ef1f9c14ad3)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x11a29ec]
 2: (Pipe::register_pipe()+0x194) [0x127fb42]
 3: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*, Message*)+0x2de) [0x11844c2]
 4: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0xae7) [0x1185533]
 5: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x2e1) [0x1182a87]
 6: (SimpleMessenger::send_message(Message*, Connection*)+0x2d) [0x1189041]
 7: (OSD::heartbeat()+0x7f5) [0xea16c5]
 8: (OSD::heartbeat_entry()+0x41) [0xea03c1]
 9: (OSD::T_Heartbeat::entry()+0x1c) [0xed282c]
 10: (Thread::_entry_func(void*)+0x23) [0x119113d]
 11: (()+0x6b50) [0x7fe8772cdb50]
 12: (clone()+0x6d) [0x7fe8759506dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
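The assert fires because fault() has dropped pipe_lock but not yet unregistered the old pipe from rank_pipe when the heartbeat thread, seeing "had pipe, but it closed", builds a new pipe and calls register_pipe. One possible direction for a fix (an assumption for illustration, not the actual wip-3675 patch) is to make register_pipe tolerate and displace a stale closed entry under the messenger lock instead of asserting:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hedged sketch: instead of asserting rank_pipe.count(peer_addr) == 0,
// register_pipe reaps a stale closed pipe left behind by a slow fault().
// Names mirror the trace; this is not Ceph's actual fix.
struct Pipe { bool closed = false; };

struct Messenger {
  std::map<std::string, Pipe*> rank_pipe;

  // Tolerant registration: a live existing pipe is an error, but a
  // closed one is simply displaced (and would be queued for reaping).
  bool register_pipe(const std::string& peer, Pipe* p) {
    auto it = rank_pipe.find(peer);
    if (it != rank_pipe.end()) {
      if (!it->second->closed)
        return false;                  // a live pipe already exists
      rank_pipe.erase(it);             // reap the stale entry
    }
    rank_pipe[peer] = p;
    return true;
  }
};

bool survives_stale_entry() {
  Messenger msgr;
  Pipe old_pipe;
  msgr.rank_pipe["osd.2"] = &old_pipe;  // registered before fault()
  old_pipe.closed = true;               // fault() ran, hasn't unregistered yet
  Pipe new_pipe;                        // heartbeat thread reconnects
  return msgr.register_pipe("osd.2", &new_pipe)
         && msgr.rank_pipe["osd.2"] == &new_pipe;
}
```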

Actions #6

Updated by Sage Weil over 11 years ago

  • Status changed from 12 to Fix Under Review

wip-3675

Actions #7

Updated by Sage Weil over 11 years ago

  • Priority changed from Urgent to High
Actions #8

Updated by Sage Weil over 11 years ago

  • Status changed from Fix Under Review to Resolved
  • Backport set to bobtail