Bug #12093
Closed
"osd/ReplicatedPG.cc: 10405: FAILED assert(obc)" timezone fix does not work
Description
When I start my OSDs, they crash. The OSDs that fail are part of an SSD cache.
I've attached a sample crash and my osd map.
I see this is considered a duplicate of the timezone issue. That is not the case: all machines involved have the same time and timezone and are NTP-synced, and they still have problems.
I was able to repair this earlier by starting just the OSDs without the MDS. I drained the SSD cache, set it to forward, and let it run. An hour after I enabled writeback again, the OSDs went back to crashing. They crash when connecting to the monitor. All OSDs on one machine can stay up as long as the monitor is off, but as soon as they connect, OSDs from both machines start crashing.
The data is completely inaccessible until I can get these SSD cache OSDs to stay online. My main data OSDs are fine, but I can't access them because a writeback cache sits in front of them.
Files
Updated by CephM M almost 9 years ago
[Resolved]
During my upgrade, I did not disconnect all clients. The clients that stayed connected caused the problems. I had to reboot them because they were using the kernel Ceph client and their connections never died. Once I did that, my OSDs came back online perfectly.