Bug #16308
mon: osd.1 marked down for no apparent reason
Status: Rejected
Priority: Urgent
Assignee: -
Category: Monitor
Target version: -
% Done: 0%
Source: other
Tags: -
Backport: -
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -
Description
2016-06-14 11:48:33.039760 7fe7e481e700 10 mon.a@0(leader).osd e14 preprocess_query osd_alive(want up_thru 14 have 14) v1 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:33.039779 7fe7e481e700 10 mon.a@0(leader).osd e14 preprocess_alive want up_thru 14 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:33.039787 7fe7e481e700  7 mon.a@0(leader).osd e14 prepare_update osd_alive(want up_thru 14 have 14) v1 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:33.039796 7fe7e481e700  7 mon.a@0(leader).osd e14 prepare_alive want up_thru 14 have 14 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:34.091714 7fe7e481e700 10 mon.a@0(leader).osd e15 committed, telling random osd.1 172.21.15.36:6805/1851 all about it
2016-06-14 11:48:34.092194 7fe7e481e700  7 mon.a@0(leader).osd e15 _reply_map 14 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:34.092201 7fe7e481e700  5 mon.a@0(leader).osd e15 send_latest to osd.1 172.21.15.36:6805/1851 start 14
2016-06-14 11:48:34.092204 7fe7e481e700  5 mon.a@0(leader).osd e15 send_incremental [14..15] to osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:34.092207 7fe7e481e700 10 mon.a@0(leader).osd e15 osd.1 should have epoch 14
2016-06-14 11:48:34.134437 7fe7e481e700 10 mon.a@0(leader).osd e15 preprocess_query osd_alive(want up_thru 15 have 15) v1 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:34.134441 7fe7e481e700 10 mon.a@0(leader).osd e15 preprocess_alive want up_thru 15 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:34.134443 7fe7e481e700  7 mon.a@0(leader).osd e15 prepare_update osd_alive(want up_thru 15 have 15) v1 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:34.134445 7fe7e481e700  7 mon.a@0(leader).osd e15 prepare_alive want up_thru 15 have 15 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:35.159764 7fe7e481e700 10 mon.a@0(leader).osd e16 committed, telling random osd.1 172.21.15.36:6805/1851 all about it
2016-06-14 11:48:35.160218 7fe7e481e700  7 mon.a@0(leader).osd e16 _reply_map 15 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:35.160228 7fe7e481e700  5 mon.a@0(leader).osd e16 send_latest to osd.1 172.21.15.36:6805/1851 start 15
2016-06-14 11:48:35.160236 7fe7e481e700  5 mon.a@0(leader).osd e16 send_incremental [15..16] to osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:35.160246 7fe7e481e700 10 mon.a@0(leader).osd e16 osd.1 should have epoch 15
2016-06-14 11:48:48.116530 7fe7e481e700 10 mon.a@0(leader).osd e17 preprocess_query osd_alive(want up_thru 17 have 17) v1 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:48.116551 7fe7e481e700 10 mon.a@0(leader).osd e17 preprocess_alive want up_thru 17 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:48.116559 7fe7e481e700  7 mon.a@0(leader).osd e17 prepare_update osd_alive(want up_thru 17 have 17) v1 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:48.116568 7fe7e481e700  7 mon.a@0(leader).osd e17 prepare_alive want up_thru 17 have 17 from osd.1 172.21.15.36:6805/1851
2016-06-14 11:48:48.737515 7fe7e501f700  2 mon.a@0(leader).osd e17 osd.1 DOWN
2016-06-14 11:48:48.872492 7fe7e481e700 10 mon.a@0(leader).osd e18 adding osd.1 to down_pending_out map
I can't see in the log how/why osd.1 got marked down.
/a/sage-2016-06-14_04:22:11-rados-master---basic-smithi/258317
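When hunting for the trigger of a DOWN event like this one, it can help to grep the monitor log for the messages that normally precede it. A minimal sketch, assuming typical ceph-mon debug output (the osd_failure and MOSDMarkMeDown patterns are assumptions, not taken from this log; the sample line is copied from the log above):

```shell
# Write a one-line sample log (taken from this report) and grep it for the
# markers that usually precede a mark-down: an osd_failure report from peers,
# a MOSDMarkMeDown message from a cleanly stopping OSD, or the DOWN line itself.
cat > /tmp/mon.a.log <<'EOF'
2016-06-14 11:48:48.737515 7fe7e501f700  2 mon.a@0(leader).osd e17 osd.1 DOWN
EOF
grep -nE 'osd_failure|MOSDMarkMeDown|osd\.[0-9]+ DOWN' /tmp/mon.a.log
```

In this ticket's log only the DOWN line itself appears, with no preceding failure report, which is what made the mark-down look unexplained.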
Updated by Sage Weil almost 8 years ago
also /a/sage-2016-06-14_04:22:11-rados-master---basic-smithi/258496
Updated by Sage Weil almost 8 years ago
and /a/sage-2016-06-14_04:22:11-rados-master---basic-smithi/258523
Updated by Samuel Just almost 8 years ago
ceph-qa-suite change 1b7552c9cb331978cb0bfd4d7dc4dcde4186c176 marks the osds down manually to eliminate the restart->wait_for_clean race. The right answer is to whitelist it.
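The suggested whitelist fix would look roughly like the following ceph-qa-suite yaml fragment. This is a sketch, not the actual change: the file placement and surrounding keys are assumptions, and the whitelisted string is taken from the title of the related Bug #16332.

```yaml
# Hypothetical fragment for the affected suite yaml (placement assumed):
overrides:
  ceph:
    log-whitelist:
      - wrongly marked me down
```

With this in place, teuthology ignores the "wrongly marked me down" cluster log message instead of failing the run when the qa suite marks an osd down deliberately.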
Updated by Samuel Just almost 8 years ago
- Related to Bug #16332: whitelist "wrongly marked me down" in lfn-upgrade-infernalis.yaml and lfn-upgrade-hammer.yaml added