Bug #5723
OSD seemingly loses objects during crash
Status: Closed
Description
I had some VMs with the qemu-rbd driver doing a trim operation. One of my OSDs crashed and now has an inconsistent PG following restart. OSD 14 was the primary and crashed, while OSD 6 is the secondary. The missing object entries look like this:
2.37d osd.14 missing f30b0f7d/rb.0.105b.238e1f29.000000000ff4/head//2
However, the object appears to be on disk on both the primary and the secondary, just at different depths in the directory tree depending on the OSD:
$ find /data/osd.14/current/2.37d_head/ -name 'rb.0.105b.238e1f29.000000000ff4*'
/data/osd.14/current/2.37d_head/DIR_D/rb.0.105b.238e1f29.000000000ff4__head_F30B0F7D__2
$ find /data/osd.6/current/2.37d_head/ -name 'rb.0.105b.238e1f29.000000000ff4*'
/data/osd.6/current/2.37d_head/DIR_D/DIR_7/rb.0.105b.238e1f29.000000000ff4__head_F30B0F7D__2
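The two searches above can be wrapped into one helper that also reports how deeply each copy is nested, which makes the depth mismatch between replicas stand out. A minimal sketch (the function name is mine; the paths and object prefix in the example call are the ones from this report):

```shell
# find_object PG OBJ_PREFIX OSD_DATA_DIR...
# Prints each on-disk location of an object together with the number of
# DIR_* components it sits under, so differing depths between replicas
# are easy to spot.
find_object() {
    pg=$1; obj=$2; shift 2
    for osd_dir in "$@"; do
        find "$osd_dir/current/${pg}_head" -name "${obj}*" 2>/dev/null |
        while read -r path; do
            # Depth = number of DIR_ components between the PG dir and the file.
            depth=$(printf '%s\n' "$path" | grep -o 'DIR_' | wc -l)
            printf '%s (depth %s)\n' "$path" "$depth"
        done
    done
}

# Example from this report:
find_object 2.37d rb.0.105b.238e1f29.000000000ff4 /data/osd.14 /data/osd.6
```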
Updated by Samuel Just almost 11 years ago
HashIndex merge needs to verify the collection contents before merging. In the meantime, you can recover by adjusting the cephos.phash.contents xattr for DIR_D/DIR_7 from
(02:34:44 PM) jmlowe1: cephos.phash.contents
(02:34:44 PM) jmlowe1: 0000000: 0109 0000 0000 0000 0000 0000 0002 0000 ................
(02:34:44 PM) jmlowe1: 0000010: 00
to
(03:34:13 PM) sjust: 0000000: 01dc 0000 0000 0000 0002 0000 0002 0000
(03:34:13 PM) sjust: 0000010: 00
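setfattr(1) accepts hex values prefixed with 0x, so the dump above has to be collapsed into one hex string before it can be written back. A small helper for that conversion (the helper name is mine, and the full attribute name user.cephos.phash.contents is my assumption, based on attr(1)/setfattr(1) operating in the user. xattr namespace; verify the exact name on your system with `getfattr -d -m - DIR_D/DIR_7` first):

```python
def xxd_to_setfattr_value(dump: str) -> str:
    """Collapse an xxd-style hexdump (as pasted in the comments above)
    into the 0x-prefixed hex string that `setfattr -v` accepts."""
    hex_digits = set('0123456789abcdefABCDEF')
    groups = []
    for line in dump.strip().splitlines():
        fields = line.split()
        # fields[0] is the offset column, e.g. "0000000:"; skip it.
        for group in fields[1:]:
            # Stop at the trailing ASCII column, if present: its groups are
            # longer than 4 characters or contain non-hex characters.
            if len(group) > 4 or not set(group) <= hex_digits:
                break
            groups.append(group)
    return '0x' + ''.join(groups)

# The corrected value from the comment above:
fixed = """\
0000000: 01dc 0000 0000 0000 0002 0000 0002 0000
0000010: 00"""
print(xxd_to_setfattr_value(fixed))
# -> 0x01dc000000000000000200000002000000
```

The result could then be written with something like `setfattr -n user.cephos.phash.contents -v 0x01dc... DIR_D/DIR_7` (attribute name as assumed above).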
Updated by Samuel Just almost 11 years ago
- Assignee set to Samuel Just
- Priority changed from High to Urgent
Updated by Samuel Just almost 11 years ago
You also need to move all of the objects from DIR_D back into DIR_D/DIR_7.
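That move can be scripted; a sketch (the function name is mine), assuming per the comment above that every regular file sitting directly in DIR_D belongs one level down in DIR_D/DIR_7:

```shell
# demote_objects PG_DIR: move the regular files that the interrupted merge
# left directly under DIR_D back down into DIR_D/DIR_7. Only files at the
# top of DIR_D are touched; the DIR_* subdirectories stay where they are.
demote_objects() {
    pg_dir=$1
    # -maxdepth 1 keeps us out of the subdirectories; mv -n refuses to
    # clobber an object that already exists in DIR_7.
    find "$pg_dir/DIR_D" -maxdepth 1 -type f -exec mv -n {} "$pg_dir/DIR_D/DIR_7/" \;
}

# Path from this report (primary OSD); repeat for each affected replica:
# demote_objects /data/osd.14/current/2.37d_head
```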
Updated by Mike Lowe almost 11 years ago
Removing cephos.phash.in_progress_op, setting cephos.phash.contents, moving the files, and restarting seems to have resolved the missing objects.
Updated by Sage Weil almost 11 years ago
- Status changed from New to Fix Under Review
Updated by Samuel Just almost 11 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Sage Weil almost 11 years ago
- Status changed from Pending Backport to Resolved