Bug #6671

closed

FAILED assert(ret) in OSDMapRef OSDService::get_map(epoch_t)

Added by Tom Lanyon over 10 years ago. Updated over 7 years ago.

Status: Can't reproduce
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We were on 0.56.1; we shut down all of the OSDs and mon on a particular node, rebooted the node for maintenance and when it came back online the OSDs wouldn't start up.

It looked like this was fixed in a later version, so we upgraded to dumpling 0.67.4 (mons upgraded via a step on 0.56.7) but the problematic OSDs still won't start. I've attached the log from one of the OSDs showing the full error.

2013-10-29 16:11:33.656195 7f9e0cbe1780 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f9e0cbe1780 time 2013-10-29 16:11:33.655368
osd/OSD.h: 447: FAILED assert(ret)

ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
1: (OSD::load_pgs()+0x26fb) [0x69aefb]
2: (OSD::init()+0x122f) [0x69ca1f]
3: (main()+0x3210) [0x5af750]
4: (__libc_start_main()+0xfd) [0x3772a1ecdd]
5: /usr/bin/ceph-osd() [0x5ac189]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
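
When the same crash shows up on several OSDs, it can help to pull the failed assertion and the symbolized frames out of each log mechanically. The following is only a rough sketch based on the line shapes visible in the excerpt above; the function name `parse_crash` and both regular expressions are illustrative assumptions, not part of any Ceph tooling, and other Ceph versions may format these lines differently.

```python
import re

# Assumed shape, taken from the excerpt: "osd/OSD.h: 447: FAILED assert(ret)"
ASSERT_RE = re.compile(r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)$")
# Assumed shape, taken from the excerpt: "1: (OSD::load_pgs()+0x26fb) [0x69aefb]"
FRAME_RE = re.compile(
    r"^(?P<index>\d+): \((?P<symbol>.+)\+0x[0-9a-f]+\) \[0x(?P<addr>[0-9a-f]+)\]$"
)


def parse_crash(log_text):
    """Return (assertion, frames) extracted from an OSD crash log excerpt.

    assertion is (source_file, line_number, expression) or None.
    frames is a list of (index, symbol, address) for frames that carry a
    symbol+offset; frames without one (such as the bare executable frame)
    are skipped.
    """
    assertion = None
    frames = []
    for raw in log_text.splitlines():
        line = raw.strip()
        m = ASSERT_RE.match(line)
        if m:
            assertion = (m.group("file"), int(m.group("line")), m.group("expr"))
            continue
        m = FRAME_RE.match(line)
        if m:
            frames.append(
                (int(m.group("index")), m.group("symbol"), int(m.group("addr"), 16))
            )
    return assertion, frames
```

Comparing the returned assertion and the top frame across logs gives a quick indication of whether two OSDs are hitting the same code path, although it says nothing about the root cause.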

Files

ceph-osd.6.log (47.2 KB): log from OSD (Tom Lanyon, 10/28/2013 11:16 PM)
ceph-osd.5.log (530 KB): Example OSD log file (Jim MacArthur, 04/08/2015 09:38 AM)
Actions #1

Updated by Joao Eduardo Luis over 10 years ago

  • Category set to OSD
  • Status changed from New to 4
  • Assignee set to Samuel Just
  • Priority changed from Normal to High
  • Source changed from other to Community (user)

If this is indeed the same bug, and not a different iteration, then maybe we should backport the fix at least for Dumpling.

Assigning the ticket to Sam Just, setting it as waiting for Feedback, so that it pops on his radar.

Actions #2

Updated by Samuel Just over 10 years ago

This is not actually related to 5869.

Actions #3

Updated by Samuel Just over 10 years ago

Is it possible to recover without those osds?

Actions #4

Updated by Joao Eduardo Luis over 10 years ago

removing 'related to' link from this ticket to #5869

Actions #5

Updated by Sage Weil about 10 years ago

  • Status changed from 4 to Can't reproduce
Actions #6

Updated by Tom Lanyon about 10 years ago

Someone (I think it was Sam Just) helped me on #ceph to replace this OSD and rebuild data that was missing, so we worked around the problem but did not solve it. Unfortunately I can't help any further with reproducing the issue.

Actions #7

Updated by Jim MacArthur about 9 years ago

I think I see the same problem in 0.80.9. I was trying to simulate a disk failure by using umount -l on the OSD's block devices; after I rebooted only one of the four OSDs came back. The other three show this error.

This is in a VirtualBox VM running Ubuntu 12.04.

Actions #8

Updated by Wayne ho over 7 years ago

We saw the same problem in 10.2.0. When the OSD is started as the 'ceph' user, it shows this error, but it starts fine as root, so we believe it is a permissions problem. We ran 'chown ceph:ceph /var/lib/ceph/osd/ceph-13/current' to change the owner, and the OSD then started correctly.
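
Before changing ownership, it may be worth listing which paths under the OSD data directory are not owned by the expected user. The sketch below is a minimal, hypothetical helper (the names `find_wrong_owner` and `uid_for_user` are made up for illustration); it only reports mismatches and changes nothing. It assumes a POSIX host where the `pwd` module is available.

```python
import os
import pwd


def uid_for_user(username):
    """Look up the numeric uid for a user name (raises KeyError if unknown)."""
    return pwd.getpwnam(username).pw_uid


def find_wrong_owner(root, expected_uid):
    """Return every path at or below root whose owner uid is not expected_uid.

    Uses lstat so symlinks are checked themselves rather than followed.
    """
    mismatched = []
    if os.lstat(root).st_uid != expected_uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return mismatched
```

On a host like the one described in the comment above, it might be called as `find_wrong_owner("/var/lib/ceph/osd/ceph-13/current", uid_for_user("ceph"))`; an empty list would suggest that ownership is not the cause of the failure.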
