Bug #872

closed

osd: crash due to missing pginfo

Added by Wido den Hollander about 13 years ago. Updated about 13 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: OSD
Target version: -
% Done: 0%


Description

I just upgraded "noisy" and saw osd1 go down after restart with:

2011-03-10 14:20:29.799991 7f717c26d720 filestore(/var/lib/ceph/osd.1) collection_getattr /var/lib/ceph/osd.1/current/3.7c9_head 'info'
2011-03-10 14:20:29.800049 7f717c26d720 filestore(/var/lib/ceph/osd.1) collection_getattr /var/lib/ceph/osd.1/current/3.7c9_head 'info' = 309
2011-03-10 14:20:29.800061 7f717c26d720 filestore(/var/lib/ceph/osd.1) read /var/lib/ceph/osd.1/current/meta/pginfo_3.7c9_0 0~0
2011-03-10 14:20:29.800102 7f717c26d720 filestore(/var/lib/ceph/osd.1) FileStore::read(/var/lib/ceph/osd.1/current/meta/pginfo_3.7c9_0): open error error 2: No such file or directory
*** Caught signal (Aborted) **
 in thread 0x7f717c26d720
 ceph version 0.26~rc (commit:1f120284ed80ee1258b556fbedacab209098a0d1)
 1: /usr/bin/cosd() [0x61b078]
 2: (()+0xf8f0) [0x7f717bc4e8f0]
 3: (gsignal()+0x35) [0x7f717a81ea75]
 4: (abort()+0x180) [0x7f717a8225c0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f717b0d48e5]
 6: (()+0xcad16) [0x7f717b0d2d16]
 7: (()+0xcad43) [0x7f717b0d2d43]
 8: (()+0xcae3e) [0x7f717b0d2e3e]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x12c) [0x468dbc]
 10: (void decode<unsigned int, PG::Interval>(std::map<unsigned int, PG::Interval, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, PG::Interval> > >&, ceph::buffer::list::iterator&)+0x31) [0x577b81]
 11: (PG::read_state(ObjectStore*)+0x32b) [0x55e78b]
 12: (OSD::load_pgs()+0x1b4) [0x4fbfb4]
 13: (OSD::init()+0x517) [0x519f87]
 14: (main()+0x1770) [0x4662f0]
 15: (__libc_start_main()+0xfd) [0x7f717a809c4d]
 16: /usr/bin/cosd() [0x4646e9]

3.7c9 was one of the PGs that kept blocking (#847).

A search for the missing pginfo file turned up only these copies, neither of them under current/:

root@noisy:/var/log/ceph# find /var/lib/ceph/ -name pginfo_3.7c9_0
/var/lib/ceph/osd.1/current.remove.me.846930886/meta/pginfo_3.7c9_0
/var/lib/ceph/osd.1/snap_856539/meta/pginfo_3.7c9_0
root@noisy:/var/log/ceph#
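
To check the remaining PGs on an OSD for the same condition before restarting it, something like the following could be used. It is only a sketch: the layout it assumes (a `<pgid>_head` directory under `current/` and a matching `current/meta/pginfo_<pgid>_0` file) is inferred from the paths in the log above, and the function name `pgs_missing_pginfo` is made up for illustration.

```python
import os


def pgs_missing_pginfo(osd_dir):
    """Return PG ids that have a <pgid>_head collection under current/
    but no current/meta/pginfo_<pgid>_0 file.

    Assumption: the on-disk layout matches the paths seen in the log
    above; other versions may name these files differently.
    """
    current = os.path.join(osd_dir, "current")
    meta = os.path.join(current, "meta")
    missing = []
    for name in sorted(os.listdir(current)):
        # Only PG head collections are of interest here.
        if not name.endswith("_head"):
            continue
        if not os.path.isdir(os.path.join(current, name)):
            continue
        pgid = name[: -len("_head")]
        pginfo = os.path.join(meta, "pginfo_%s_0" % pgid)
        if not os.path.isfile(pginfo):
            missing.append(pgid)
    return missing
```

Going by the log and the find output above, calling `pgs_missing_pginfo("/var/lib/ceph/osd.1")` on noisy should list at least 3.7c9.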