Bug #5440 (closed)
osd: marked down due to no pgstats reports
% Done: 0%
Source: Q/A
Severity: 3 - minor
Description
2013-06-24T02:04:34.124 INFO:teuthology.task.ceph.mon.b.err:2013-06-24 02:04:37.762017 7fe7e462b700 -1 mon.b@0(leader).osd e454 no osd or pg stats from osd.4 since 2013-06-24 01:49:37.718715, 900.043243 seconds ago. marking down
but the osd didn't crash?
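For triage, the monitor message quoted above can be picked apart mechanically. The following is a minimal sketch, not part of teuthology or Ceph: the regular expression and the helper name `parse_stale_osd` are assumptions made for illustration, based only on the single log line in this report. The reported age of roughly 900 seconds is consistent with what I believe is the default of the `mon osd report timeout` option, though that is worth confirming against the running configuration.

```python
import re
from datetime import datetime

# Pattern modelled on the one monitor line quoted in this report; other Ceph
# versions may word the message differently (assumption).
STALE_RE = re.compile(
    r"no osd or pg stats from osd\.(?P<osd>\d+) since "
    r"(?P<since>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+), "
    r"(?P<age>\d+\.\d+) seconds ago"
)


def parse_stale_osd(line):
    """Return (osd_id, last_report_time, age_seconds), or None if no match."""
    m = STALE_RE.search(line)
    if m is None:
        return None
    since = datetime.strptime(m.group("since"), "%Y-%m-%d %H:%M:%S.%f")
    return int(m.group("osd")), since, float(m.group("age"))


# The message from the description, without the teuthology prefix.
SAMPLE = (
    "2013-06-24 02:04:37.762017 7fe7e462b700 -1 mon.b@0(leader).osd e454 "
    "no osd or pg stats from osd.4 since 2013-06-24 01:49:37.718715, "
    "900.043243 seconds ago. marking down"
)

result = parse_stale_osd(SAMPLE)
```

Here `result` holds the OSD id, the time of the last stats report, and the age in seconds, which makes it easy to line the event up against the OSD's own log.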
job was
ubuntu@teuthology:/a/teuthology-2013-06-24_01:00:12-rados-master-testing-basic/43954$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 3d740946b3b79d51f07d9a735a5fb77a849f57dd
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: 134d08a9654f66634b893d493e4a92f38acc63cf
  install:
    ceph:
      sha1: 134d08a9654f66634b893d493e4a92f38acc63cf
  s3tests:
    branch: master
  workunit:
    sha1: 134d08a9654f66634b893d493e4a92f38acc63cf
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
Updated by Sage Weil almost 11 years ago
ubuntu@teuthology:/a/teuthology-2013-06-25_01:00:06-rados-next-testing-basic/45417
in mon log, osd msgs suddenly stop
in osd log, i see
2013-06-25 01:11:41.187817 7fca012ea700 0 monclient: hunting for new mon
2013-06-25 01:11:42.476461 7fca012ea700 0 osd.0 3 crush map has features 1073741824, adjusting msgr requires for clients
2013-06-25 01:11:42.476468 7fca012ea700 0 osd.0 3 crush map has features 1073741824, adjusting msgr requires for osds
2013-06-25 01:11:42.476961 7fc9f75d2700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115000 sd=33 :6806 s=0 pgs=0 cs=0 l=0).accept connect_seq 0 vs existing 0 state connecting
2013-06-25 01:11:42.478145 7fc9f7ad7700 0 -- 10.214.133.23:6805/22388 >> 10.214.133.23:6789/0 pipe(0x30ce780 sd=32 :47712 s=2 pgs=5 cs=1 l=1).injecting socket failure
2013-06-25 01:11:42.478271 7fca012ea700 0 monclient: hunting for new mon
2013-06-25 01:11:42.482016 7fc9f75d2700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115000 sd=33 :6806 s=2 pgs=1 cs=1 l=0).fault, initiating reconnect
2013-06-25 01:11:42.482536 7fc9f71ce700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115a00 sd=45 :6806 s=0 pgs=0 cs=0 l=0).accept connect_seq 2 vs existing 2 state connecting
2013-06-25 01:11:42.488608 7fc9f71ce700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115a00 sd=45 :6806 s=2 pgs=2 cs=3 l=0).fault, initiating reconnect
2013-06-25 01:11:42.488682 7fc9f75d2700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115a00 sd=45 :6806 s=1 pgs=2 cs=4 l=0).fault
2013-06-25 01:11:42.488891 7fc9f79d6700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115780 sd=47 :6806 s=0 pgs=0 cs=0 l=0).accept connect_seq 4 vs existing 4 state connecting
2013-06-25 01:11:42.489078 7fca012ea700 0 monclient: hunting for new mon
2013-06-25 01:11:42.489288 7fc9f79d6700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115780 sd=47 :6806 s=2 pgs=3 cs=5 l=0).reader got old message 12 <= 13 0x31698c0 pg_notify(1.2(2) epoch 3) v4, discarding
2013-06-25 01:11:42.489383 7fc9f79d6700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115780 sd=47 :6806 s=2 pgs=3 cs=5 l=0).reader got old message 13 <= 13 0x31698c0 pg_notify(0.3(2) epoch 3) v4, discarding
2013-06-25 01:11:42.492991 7fca012ea700 0 monclient: hunting for new mon
2013-06-25 01:11:43.711030 7fc9f79d6700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115780 sd=47 :6806 s=2 pgs=3 cs=5 l=0).injecting socket failure
2013-06-25 01:11:43.711114 7fc9f79d6700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115780 sd=47 :6806 s=2 pgs=3 cs=5 l=0).fault, initiating reconnect
2013-06-25 01:11:43.711967 7fc9f73d0700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115280 sd=55 :6806 s=0 pgs=0 cs=0 l=0).accept connect_seq 6 vs existing 6 state connecting
2013-06-25 01:11:43.713962 7fc9f73d0700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115280 sd=55 :6806 s=2 pgs=4 cs=7 l=0).injecting socket failure
2013-06-25 01:11:43.714025 7fc9f73d0700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115280 sd=55 :6806 s=2 pgs=4 cs=7 l=0).fault, initiating reconnect
2013-06-25 01:11:43.715197 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115c80 sd=57 :6806 s=0 pgs=0 cs=0 l=0).accept connect_seq 8 vs existing 8 state wait
2013-06-25 01:11:43.715867 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115c80 sd=57 :6806 s=2 pgs=5 cs=9 l=0).reader got old message 19 <= 19 0x317ab80 pg_log(1.6 epoch 4 query_epoch 4) v3, discarding
2013-06-25 01:11:43.767197 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115c80 sd=57 :6806 s=2 pgs=5 cs=9 l=0).fault, initiating reconnect
2013-06-25 01:11:43.771691 7fc9f73d0700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x3115c80 sd=57 :37059 s=2 pgs=6 cs=11 l=0).fault, initiating reconnect
2013-06-25 01:11:43.772414 7fc9f79d6700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329bc80 sd=77 :6806 s=0 pgs=0 cs=0 l=0).accept connect_seq 12 vs existing 12 state connecting
2013-06-25 01:11:43.773002 7fc9f79d6700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329bc80 sd=77 :6806 s=2 pgs=7 cs=13 l=0).reader got old message 73 <= 74 0x3199000 pg_info(1 pgs e4:1.7) v3, discarding
2013-06-25 01:11:43.773095 7fc9f79d6700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329bc80 sd=77 :6806 s=2 pgs=7 cs=13 l=0).reader got old message 74 <= 74 0x3199000 pg_info(1 pgs e4:2.7) v3, discarding
2013-06-25 01:11:43.776719 7fc9f79d6700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329bc80 sd=77 :6806 s=2 pgs=7 cs=13 l=0).injecting socket failure
2013-06-25 01:11:43.776915 7fc9f79d6700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329bc80 sd=77 :6806 s=2 pgs=7 cs=13 l=0).fault, initiating reconnect
2013-06-25 01:11:43.777069 7fc9f73d0700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329bc80 sd=77 :6806 s=1 pgs=7 cs=14 l=0).fault
2013-06-25 01:11:43.777345 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329ba00 sd=82 :6806 s=0 pgs=0 cs=0 l=0).accept connect_seq 14 vs existing 14 state connecting
2013-06-25 01:11:43.777660 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329ba00 sd=82 :6806 s=2 pgs=8 cs=15 l=0).reader got old message 82 <= 91 0x30d0380 pg_info(1 pgs e4:1.4) v3, discarding
2013-06-25 01:11:43.777758 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329ba00 sd=82 :6806 s=2 pgs=8 cs=15 l=0).reader got old message 83 <= 91 0x30d0380 pg_info(1 pgs e4:2.6) v3, discarding
2013-06-25 01:11:43.777860 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329ba00 sd=82 :6806 s=2 pgs=8 cs=15 l=0).reader got old message 84 <= 91 0x30d0380 pg_info(1 pgs e4:0.1) v3, discarding
2013-06-25 01:11:43.777958 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329ba00 sd=82 :6806 s=2 pgs=8 cs=15 l=0).reader got old message 85 <= 91 0x30d0380 pg_info(1 pgs e4:0.1) v3, discarding
2013-06-25 01:11:43.778049 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329ba00 sd=82 :6806 s=2 pgs=8 cs=15 l=0).reader got old message 86 <= 91 0x30d0380 pg_info(1 pgs e4:2.1) v3, discarding
2013-06-25 01:11:43.778133 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329ba00 sd=82 :6806 s=2 pgs=8 cs=15 l=0).reader got old message 87 <= 91 0x30d0380 pg_info(1 pgs e4:1.3) v3, discarding
2013-06-25 01:11:43.778212 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329ba00 sd=82 :6806 s=2 pgs=8 cs=15 l=0).reader got old message 88 <= 91 0x30d0380 pg_info(1 pgs e4:1.0) v3, discarding
2013-06-25 01:11:43.778296 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329ba00 sd=82 :6806 s=2 pgs=8 cs=15 l=0).reader got old message 89 <= 91 0x30d0380 pg_info(1 pgs e4:2.3) v3, discarding
2013-06-25 01:11:43.778362 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329ba00 sd=82 :6806 s=2 pgs=8 cs=15 l=0).reader got old message 90 <= 91 0x30d0380 pg_info(1 pgs e4:1.0) v3, discarding
2013-06-25 01:11:43.778418 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329ba00 sd=82 :6806 s=2 pgs=8 cs=15 l=0).reader got old message 91 <= 91 0x30d0380 pg_info(1 pgs e4:1.5) v3, discarding
2013-06-25 01:11:48.037891 7fc9f75d2700 0 -- 10.214.133.23:6805/22388 >> 10.214.133.23:6789/0 pipe(0x3115500 sd=45 :47726 s=2 pgs=9 cs=1 l=1).injecting socket failure
2013-06-25 01:11:48.038067 7fca012ea700 0 monclient: hunting for new mon
2013-06-25 01:11:48.498529 7fc9f9adb700 0 osd.0 4 do_command r=0
2013-06-25 01:12:51.500519 7fc9f72cf700 0 -- 10.214.133.23:0/22388 >> 10.214.133.23:6802/22385 pipe(0x30cec80 sd=35 :39528 s=2 pgs=1 cs=1 l=1).injecting socket failure
2013-06-25 01:13:03.396588 7fc9f6fcc700 0 -- 10.214.133.23:6807/22388 >> 10.214.133.23:0/22385 pipe(0x32dba00 sd=35 :6807 s=2 pgs=7 cs=1 l=1).injecting socket failure
2013-06-25 01:13:06.896910 7fc9f70cd700 0 -- 10.214.133.23:6807/22388 >> 10.214.133.23:0/22385 pipe(0x32dbc80 sd=54 :6807 s=2 pgs=9 cs=1 l=1).injecting socket failure
2013-06-25 01:13:06.897101 7fc9fdae3700 0 -- 10.214.133.23:6807/22388 submit_message osd_ping(ping_reply e4 stamp 2013-06-25 01:13:06.896662) v2 remote, 10.214.133.23:0/22385, failed lossy con, dropping message 0x31d68c0
2013-06-25 01:13:20.819786 7fc9f6dca700 0 -- 10.214.133.23:6806/22388 >> 10.214.133.23:6801/22385 pipe(0x329ba00 sd=82 :6806 s=2 pgs=8 cs=15 l=0).fault with nothing to send, going to standby
2013-06-25 01:13:20.820129 7fc9f6fcc700 0 -- 10.214.133.23:0/22388 >> 10.214.133.23:6802/22385 pipe(0x32db280 sd=31 :0 s=1 pgs=0 cs=0 l=1).fault
2013-06-25 01:13:20.820157 7fc9f6ecb700 0 -- 10.214.133.23:0/22388 >> 10.214.133.23:6803/22385 pipe(0x311a000 sd=34 :0 s=1 pgs=0 cs=0 l=1).fault
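An excerpt like the one above is easier to read once the recurring events are tallied. This is a rough sketch under stated assumptions: the substrings are copied from the log lines in this comment, while the labels and the helper name `tally_events` are invented here and are not part of any Ceph tooling.

```python
from collections import Counter

# Substrings copied from the OSD log excerpt; the labels on the right are
# hypothetical names chosen for this sketch.
MARKERS = {
    "injecting socket failure": "injected_failure",
    "hunting for new mon": "mon_hunt",
    "fault, initiating reconnect": "reconnect",
    "got old message": "old_message_discarded",
}


def tally_events(lines):
    """Count how many lines contain each known marker substring."""
    counts = Counter()
    for line in lines:
        for needle, label in MARKERS.items():
            if needle in line:
                counts[label] += 1
    return counts


# Four entries taken verbatim from the excerpt.
SAMPLE_LINES = [
    "2013-06-25 01:11:42.478145 7fc9f7ad7700 0 -- 10.214.133.23:6805/22388 >> "
    "10.214.133.23:6789/0 pipe(0x30ce780 sd=32 :47712 s=2 pgs=5 cs=1 l=1)."
    "injecting socket failure",
    "2013-06-25 01:11:42.478271 7fca012ea700 0 monclient: hunting for new mon",
    "2013-06-25 01:11:42.482016 7fc9f75d2700 0 -- 10.214.133.23:6806/22388 >> "
    "10.214.133.23:6801/22385 pipe(0x3115000 sd=33 :6806 s=2 pgs=1 cs=1 l=0)."
    "fault, initiating reconnect",
    "2013-06-25 01:11:42.489078 7fca012ea700 0 monclient: hunting for new mon",
]

counts = tally_events(SAMPLE_LINES)
```

Run over the full OSD log rather than the four sample lines, the same tally would show how often injected failures on the monitor connection are followed by the monclient hunting for a new mon.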
Updated by Sage Weil almost 11 years ago
- Status changed from 12 to Resolved
broken test + test yaml, fixed in teuthology.git and ceph-qa-suite.git