Bug #16729

run_mon() fails in rados-striper.sh

Added by Kefu Chai almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

The ceph client cannot connect to ceph-mon at the address the monitor is bound to (127.0.0.1:7116 in the logs below). As a result, "ceph osd pool delete rbd rbd --yes-i-really-really-mean-it" fails with a connection timeout.

ceph client log

2016-07-18 19:53:47.599627 7fa516ffd700  0 -- :/2375284273 >> 127.0.0.1:7116/0 pipe(0x7fa50c00a3e0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7fa50c005450).fault
2016-07-18 19:53:50.558628 7fa5200ee700  0 monclient(hunting): authenticate timed out after 300
2016-07-18 19:53:50.558717 7fa5200ee700  0 librados: client.admin authentication error (110) Connection timed out
Error connecting to cluster: TimedOut

mon.a.log

2016-07-18 19:48:50.232548 7f0c083ea4c0 10 mon.a@0(leader).auth v0 create_pending v 1
2016-07-18 19:48:50.232551 7f0c083ea4c0 10 mon.a@0(leader).auth v0 create_initial -- creating initial map
2016-07-18 19:48:50.232759 7f0c083ea4c0 10 mon.a@0(leader).auth v0 check_rotate updated rotating
2016-07-18 19:48:50.232771 7f0c083ea4c0 10 mon.a@0(leader).paxosservice(auth 0..0) propose_pending
2016-07-18 19:48:50.232782 7f0c083ea4c0 10 mon.a@0(leader).auth v0 encode_pending v 1
2016-07-18 19:48:50.232800 7f0c083ea4c0  5 mon.a@0(leader).paxos(paxos writing c 0..0) queue_pending_finisher 0xb18c250
2016-07-18 19:48:50.232804 7f0c083ea4c0 10 mon.a@0(leader).paxos(paxos writing c 0..0) trigger_propose not active, will propose later
2016-07-18 19:48:50.232805 7f0c083ea4c0 10 mon.a@0(leader).data_health(2) start_epoch epoch 2
2016-07-18 19:48:50.232811 7f0c083ea4c0  1 mon.a@0(leader) e0 apply_quorum_to_compatset_features enabling new quorum features: compat={},rocompat={},incompat={4=support erasure code pools,5=new-style osdmap encoding,6=support isa/lrc erasure code,7=support shec erasure code}
2016-07-18 19:48:50.233244 7f0c083ea4c0 10 mon.a@0(leader) e0 apply_compatset_features_to_quorum_requirements required_features 9025616074506240
2016-07-18 19:48:50.233253 7f0c083ea4c0 10 mon.a@0(leader) e0 timecheck_finish
2016-07-18 19:48:50.233256 7f0c083ea4c0 10 mon.a@0(leader) e0 resend_routed_requests
2016-07-18 19:48:50.233257 7f0c083ea4c0 10 mon.a@0(leader) e0 register_cluster_logger
2016-07-18 19:48:50.251223 7f0bff949700 10 accepter.accepter starting
2016-07-18 19:48:50.251235 7f0bff949700 20 accepter.accepter calling poll
2016-07-18 19:48:50.251233 7f0c0194d700 10 -- 127.0.0.1:7116/0 reaper_entry start
2016-07-18 19:48:50.251246 7f0c0194d700 10 -- 127.0.0.1:7116/0 reaper
2016-07-18 19:48:50.251248 7f0c0194d700 10 -- 127.0.0.1:7116/0 reaper done
2016-07-18 19:48:50.251258 7f0c0014a700 20 -- 127.0.0.1:7116/0 queue 0xb28cb40 prio 196
2016-07-18 19:48:50.252589 7f0c0094b700  1 -- 127.0.0.1:7116/0 <== mon.0 127.0.0.1:7116/0 0 ==== log(1 entries from seq 1 at 2016-07-18 19:48:50.231541) v1 ==== 0+0+0 (0 0 0) 0xb28cb40 con 0xb1c8800
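
To relate the two logs, the timestamps can be compared directly. The following is a small sketch (Python, not part of the original report; the helper name `elapsed_seconds` is made up for illustration) that computes the time between the "accepter.accepter starting" line in mon.a.log and the "authenticate timed out after 300" line in the client log.

```python
from datetime import datetime

# Timestamp layout used by the log lines quoted in this report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()

# Copied from mon.a.log ("accepter.accepter starting").
accepter_start = "2016-07-18 19:48:50.251223"
# Copied from the client log ("authenticate timed out after 300").
auth_timeout = "2016-07-18 19:53:50.558628"

print(elapsed_seconds(accepter_start, auth_timeout))
```

The difference is about 300.3 seconds, so the client's 300-second authentication window appears to line up with the period right after the monitor's accepter started. The client's own start time is not in the excerpt, so this is only an indication, not proof.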

This is not reproducible and is very likely a race condition.

See https://jenkins.ceph.com/job/ceph-pull-requests/9270/console
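
When chasing a suspected startup race like this, one generic way to check whether a daemon is actually accepting TCP connections is to poll its port before running the client. The sketch below is an assumption for illustration only: `wait_for_port` is a hypothetical helper, it is not the fix applied for this bug, and it is not part of the Ceph test scripts.

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connect to host:port succeeds, False on timeout.

    Hypothetical helper: it only proves that something accepts connections
    on the port, not that the monitor is ready to authenticate clients.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a listener exists on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: wait briefly, then retry.
            time.sleep(interval)
    return False

# Example (address taken from the logs in this report):
#   wait_for_port("127.0.0.1", 7116, timeout=60.0)
```

A successful probe followed by a client timeout would point away from "monitor not listening yet" and toward something later in the connection or authentication path.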

test.log (43.6 KB) Kefu Chai, 07/19/2016 01:16 AM

History

#1 Updated by Kefu Chai almost 4 years ago

  • Status changed from New to Fix Under Review

#2 Updated by Kefu Chai almost 4 years ago

  • Status changed from Fix Under Review to Resolved
  • Assignee set to Kefu Chai
