Bug #1434
closedosds all failing each other
Description
build of git tree 3a623bb327
cluster boots, then within a minute all the osds start failing each other. logs show heartbeats not being received.
was working fine before, though this is the same cluster that was reporting errant scrub stat mismatches previously (but otherwise apparently functioning fine). See ticket #1376
see attached log from one of my 4 osds.
Files
Updated by John Leach over 12 years ago
- File osd.2.log.gz osd.2.log.gz added
- File osd.3.log.gz osd.3.log.gz added
reproduced, but with "debug ms = 1" this time.
now I notice that all but 1 osd crashes:
osd/PG.cc: 3892: FAILED assert(0 == "we got a bad state machine event")
stack traces in the log files. attaching logs from two osds, one that crashed one that didn't.
Updated by Sage Weil over 12 years ago
- Category set to OSD
- Priority changed from Normal to High
- Target version set to v0.35
Updated by Sage Weil over 12 years ago
That assert is fixed 4 commits later by cf3b7cf6a9d3f873ad27a313cc1635822bdd89a1. Can you still reproduce with the lastest? There's not enough here (before the crash) to see any heartbeat misbehavior.
Thanks!