Bug #12093

closed

"osd/ReplicatedPG.cc: 10405: FAILED assert(obc)" timezone fix does not work

Added by CephM M almost 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When I start my OSDs, they crash. The OSDs that fail are part of an SSD cache tier.

I've attached a sample crash and my osd map.

I see this looks like a duplicate, but it is not: all machines involved have the same time and timezone and are NTP-synced, and they still have problems.

I was able to recover from this earlier by starting only the OSDs, without the MDS. I drained the SSD cache, set it to forward mode, and let it run. About an hour after I re-enabled writeback mode, the OSDs went back to crashing. They crash when connecting to the monitor: all OSDs on one machine can stay up as long as the monitor is off, but as soon as they connect to it, OSDs on both machines start crashing.

The data is completely inaccessible until I can get these SSD cache OSDs to stay online. My main data OSDs have no problems, but I can't access them because a writeback cache sits in front of them.


Files

crash.log (8.54 KB) crash.log CephM M, 06/19/2015 03:09 PM
map.log (2.01 KB) map.log CephM M, 06/19/2015 03:09 PM
#1

Updated by CephM M almost 9 years ago

[Resolved]

During my upgrade, I did not disconnect all clients, and the clients that stayed connected caused the problem. They were using the kernel Ceph client and their connections never died, so I had to reboot them. Once I did that, my OSDs came back online perfectly.

#2

Updated by Samuel Just almost 9 years ago

  • Status changed from New to Resolved