Bug #12093

closed

"osd/ReplicatedPG.cc: 10405: FAILED assert(obc)" timezone fix does not work

Added by CephM M almost 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When I start my OSDs, they crash. The OSDs that fail are part of an SSD cache.

I've attached a sample crash and my osd map.

I see this is being treated as a duplicate. That is not the case: all machines involved have the same time and timezone, are NTP-synced, and still have the problem.

I was able to repair this earlier by starting just the OSDs without the MDS. I drained the SSD cache, set it to forward, and let it run. About an hour after I enabled writeback again, the OSDs went back to crashing. They crash when connecting to the monitor. All OSDs on one machine can stay up as long as the monitor is off, but as soon as they connect, OSDs from both machines start crashing.
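
For reference, the drain-and-restore sequence described above corresponds roughly to the sketch below. The exact commands run were not recorded in this report, so this is an assumption based on the standard `ceph` and `rados` cache-tier commands; the pool name `ssd-cache` and the helper `cache_tier_drain_commands` are hypothetical, and required flags may differ between Ceph releases. The sketch only builds and prints the command lines, it does not execute them.

```python
from typing import List


def cache_tier_drain_commands(cache_pool: str) -> List[List[str]]:
    """Build the CLI steps for draining a cache tier and re-enabling writeback.

    cache_pool is the name of the cache-tier pool (hypothetical in this sketch).
    """
    return [
        # Switch the cache tier to forward mode so it stops absorbing new writes.
        ["ceph", "osd", "tier", "cache-mode", cache_pool, "forward"],
        # Flush dirty objects to the backing pool and evict everything from the cache.
        ["rados", "-p", cache_pool, "cache-flush-evict-all"],
        # Switch back to writeback (the step after which the crashes returned).
        ["ceph", "osd", "tier", "cache-mode", cache_pool, "writeback"],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for argv in cache_tier_drain_commands("ssd-cache"):
        print(" ".join(argv))
```

Printing rather than executing keeps the sketch safe to run on a machine without a cluster; each line can be reviewed and adapted before use.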

The data is completely inaccessible until I can get these SSD cache OSDs to stay online. My main data OSDs do not have any problems, but I can't access them because a writeback cache sits in front of them.


Files

crash.log (8.54 KB) CephM M, 06/19/2015 03:09 PM
map.log (2.01 KB) CephM M, 06/19/2015 03:09 PM