Bug #13594

closed

osd/PG.cc: 2856: FAILED assert(values.size() == 1)

Added by Laurent GUERBY over 8 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After a host reboot, one of our OSDs does not restart. It fails on an assert:

0> 2015-10-26 08:15:59.923059 7f67f0cb2900 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, spg_t, ceph::bufferlist*)' thread 7f67f0cb2900 time 2015-10-26 08:15:59.922041
osd/PG.cc: 2856: FAILED assert(values.size() == 1)
ceph version 0.94.2-1-ga11cca9 (a11cca90395f7da516668bbba20dff6cf1d8a538)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0xc19276]
2: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0xa04) [0x7dfe34]
3: (OSD::load_pgs()+0xa0f) [0x6b2e9f]
4: (OSD::init()+0xc4c) [0x6b5d9c]
5: (main()+0x2821) [0x63dd11]
6: (__libc_start_main()+0xf5) [0x7f67edd7db45]
7: /usr/bin/ceph-osd() [0x6579d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Full log attached.

On the surface, this looks similar to: http://tracker.ceph.com/issues/4855
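
To illustrate the kind of check that fails in the trace above, here is a minimal sketch, not the Ceph source. It assumes a plain std::map as a stand-in for the per-PG key/value data, and the key name "epoch", the type KeyValueStore, and both function names are hypothetical. The point is only that a lookup which returns fewer entries than requested makes an assert on the result size abort the whole process, which matches the symptom of the OSD dying during startup.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a per-PG key/value store; not a Ceph type.
using KeyValueStore = std::map<std::string, std::string>;

// Return only those requested keys that are actually present in the store.
// Missing keys are silently skipped, so the result may be smaller than `keys`.
std::map<std::string, std::string> get_values(const KeyValueStore& store,
                                              const std::set<std::string>& keys) {
  std::map<std::string, std::string> values;
  for (const auto& key : keys) {
    auto it = store.find(key);
    if (it != store.end()) {
      values.insert(*it);
    }
  }
  return values;
}

// Asserting variant: if the single requested key is missing, the assert
// fires and the process aborts instead of reporting an error to the caller.
std::string read_epoch_or_abort(const KeyValueStore& store) {
  const std::set<std::string> keys = {"epoch"};  // hypothetical key name
  auto values = get_values(store, keys);
  assert(values.size() == 1);
  return values.begin()->second;
}
```

With a store that contains the key, read_epoch_or_abort returns its value. With an empty store, get_values returns an empty map and the assert would abort, unless the build defines NDEBUG.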


Files

ceph-osd.0.log (458 KB), Laurent GUERBY, 10/26/2015 07:22 AM
Actions #1

Updated by Laurent GUERBY over 8 years ago

A second OSD in our cluster now has the exact same error. This happens during a recovery; if we keep losing OSDs, we'll lose data.

Is it a journal error? If so, is it safe to blank the journal with ceph-osd --mkjournal -i OSDNUM?

Actions #2

Updated by Yuri Weinstein over 8 years ago

  • Release set to infernalis
  • ceph-qa-suite rados added

Also in:
Run: http://pulpito.ceph.com/teuthology-2015-10-29_21:00:11-rados-infernalis-distro-basic-multi/
Job: ['1131488']

Assertion: osd/PG.cc: 2850: FAILED assert(values.size() == 1)
ceph version 9.1.0-98-g3af21a1 (3af21a1432aabb86f82f9063aef7a6ab9865345e)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f9aa717f94b]
 2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x8a0) [0x7f9aa6d14270]
 3: (OSD::load_pgs()+0x580) [0x7f9aa6bf6040]
 4: (OSD::init()+0xe74) [0x7f9aa6c061e4]
 5: (main()+0x2954) [0x7f9aa6b8b0c4]
 6: (__libc_start_main()+0xf5) [0x7f9aa3a0dec5]
 7: (()+0x2f7f07) [0x7f9aa6bbaf07]

Actions #3

Updated by Sage Weil over 8 years ago

  • Status changed from New to Resolved
Actions #5

Updated by Jens Harbott over 7 years ago

We are seeing a similar backtrace on a Hammer OSD. Is this the same issue? It looks like the fix wasn't backported?

     0> 2016-12-07 07:46:21.105387 7f97149158c0 -1 osd/PG.cc: In function 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*, ceph::bufferlist*)' thread 7f97149158c0 time 2016-12-07 07:46:21.104173
osd/PG.cc: 2889: FAILED assert(values.size() == 2)

 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb1fab]
 2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x885) [0x7c19b5]
 3: (OSD::load_pgs()+0x9b7) [0x6b9cd7]
 4: (OSD::init()+0x17c7) [0x6bd777]
 5: (main()+0x2a31) [0x6480e1]
 6: (__libc_start_main()+0xf5) [0x7f9711c9cf45]
 7: /usr/bin/ceph-osd() [0x661147]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
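
The signature in this trace differs from the one in the original report: there the function returns epoch_t directly, while here it returns int and takes an epoch_t* out-parameter. As a rough sketch of that calling convention only, and not the Ceph implementation or the content of any fix, the following assumes a std::map stand-in for the stored keys. The names peek_epoch, KeyValueStore, and the key "epoch" are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <exception>
#include <map>
#include <string>

// Assumption: a 32-bit unsigned epoch, as suggested by 'unsigned int*' in the trace.
using epoch_t = std::uint32_t;
// Hypothetical stand-in for the stored keys; not a Ceph type.
using KeyValueStore = std::map<std::string, std::string>;

// Returns 0 and fills *epoch on success, -ENOENT if the key is missing,
// and -EINVAL if the stored value is not a number. The caller decides how
// to react instead of the process aborting on an assert.
int peek_epoch(const KeyValueStore& store, epoch_t* epoch) {
  assert(epoch != nullptr);  // precondition on the out-parameter only
  auto it = store.find("epoch");  // hypothetical key name
  if (it == store.end()) {
    return -ENOENT;
  }
  try {
    *epoch = static_cast<epoch_t>(std::stoul(it->second));
  } catch (const std::exception&) {
    return -EINVAL;
  }
  return 0;
}
```

A caller written against this shape can check the return value and log or skip the affected entry. Whether skipping is safe for a given PG is a separate question that this sketch does not answer.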
Actions #6

Updated by Jens Harbott over 7 years ago

I tried an adapted version of https://github.com/ceph/ceph/pull/6444, which gets the OSD started, but it hits a lot of other weird errors after that. It would be great if someone could take a look at the Hammer code and say whether a different fix might be needed there.
