Bug #3770: OSD crashes on boot (closed)

Added by Faidon Liambotis over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)

Description

One of my 0.56.1 OSDs crashed and couldn't boot: it kept hitting tp_op heartbeat timeouts, and even after increasing the timeout I was getting nothing but:
2013-01-08 23:57:25.337731 7fc515c26700 0 -- 0.0.0.0:6805/31953 >> 10.64.0.174:6818/8710 pipe(0x3cceb6c0 sd=56 :0 pgs=0 cs=0 l=0).fault with nothing to send, going to standby
2013-01-08 23:57:29.043846 7fc515b25700 0 -- 0.0.0.0:6805/31953 >> 10.64.0.174:6845/4111 pipe(0x3cceb240 sd=57 :32953 pgs=0 cs=0 l=0).connect claims to be 10.64.0.174:6845/11414 not 10.64.0.174:6845/4111 - wrong node!
2013-01-08 23:57:29.043957 7fc515b25700 0 -- 0.0.0.0:6805/31953 >> 10.64.0.174:6845/4111 pipe(0x3cceb240 sd=57 :32953 pgs=0 cs=0 l=0).fault with nothing to send, going to standby
2013-01-08 23:57:38.310206 7fc515a24700 0 -- 0.0.0.0:6805/31953 >> 10.64.0.173:6842/821 pipe(0x16bf0d80 sd=58 :0 pgs=0 cs=0 l=0).fault with nothing to send, going to standby

I left the cluster alone for a few hours to recover and become healthy again. It's now HEALTH_OK and all pgs are active+clean.

However, when I now try to start the OSD in question, it immediately dies on boot on assert(_get_map_bl(epoch, bl)). Attached are the --debug_ms 20 --debug_osd 20 log and a full backtrace from gdb.
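
The failing assertion amounts to the OSD asking its store for the encoded OSDMap of a given epoch and aborting when that map can no longer be loaded. A simplified C++ sketch of that idea (not the actual Ceph OSD code; the names here are illustrative stand-ins):

#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using epoch_t = uint32_t;
using bufferlist = std::vector<char>;  // stand-in for ceph::bufferlist

// Illustrative stand-in for the OSD's stored map objects; in the real OSD
// these live as objects in the current/meta collection.
std::map<epoch_t, bufferlist> map_store;

// Returns false when the encoded map for this epoch has been trimmed away.
bool _get_map_bl(epoch_t epoch, bufferlist &bl) {
  auto it = map_store.find(epoch);
  if (it == map_store.end())
    return false;
  bl = it->second;
  return true;
}

bufferlist get_map_bl(epoch_t epoch) {
  bufferlist bl;
  // If a PG still references an epoch that was already trimmed,
  // this assertion fires at boot -- which is the crash reported here.
  assert(_get_map_bl(epoch, bl));
  return bl;
}

int main() {
  map_store[30000] = bufferlist{'m'};  // recent maps are still present
  get_map_bl(30000);                   // fine
  get_map_bl(10705);                   // trimmed epoch -> assertion failure
}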

This is with the ceph.com 0.56.1 packages on an Ubuntu 12.04 LTS platform.


Files

ceph-osd.27.log (3.19 MB) ceph-osd.27.log Faidon Liambotis, 01/09/2013 12:23 AM
ceph-osd.27.gdb (17.7 KB) ceph-osd.27.gdb Faidon Liambotis, 01/09/2013 12:23 AM
ceph-osd.27.meta.gz (273 KB) ceph-osd.27.meta.gz Faidon Liambotis, 01/10/2013 04:15 PM
Actions #1

Updated by Sage Weil over 11 years ago

  • Priority changed from Normal to Urgent
Actions #2

Updated by Ian Colle over 11 years ago

  • Assignee set to Samuel Just
Actions #3

Updated by Samuel Just over 11 years ago

  • Status changed from New to Need More Info

From the backtrace:
pgid = {m_pool = 4, m_seed = 249, m_preferred = -1}

Based on the info attr, we try to load map 10705, which is 20k maps behind the other pgs. This suggests that the attr may be invalid.

Can you attach a hex dump of the attributes on the current/4.f9_head collection on the crashed osd?

Actions #4

Updated by Faidon Liambotis over 11 years ago

root@ms-be1003:/var/lib/ceph/osd/ceph-27/current/4.f9_head# attr -lq $PWD | while read attr; do echo $attr; attr -q -g $attr $PWD | hd; echo; done
cephos.collection_version
00000000 03 00 00 00 |....|
00000004

cephos.phash.contents
00000000 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 |.|
00000011

ceph.ondisklog
00000000 05 03 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| *
00000022

ceph.info
00000000 05 d1 29 00 00 |..)..|
00000005

For what it's worth, pool 4 is .rgw.gc.
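
For reference, if the ceph.info value is a one-byte struct version followed by a little-endian 32-bit epoch (an assumption about the layout, but it matches the epoch quoted in the previous comment), the bytes 05 d1 29 00 00 decode to struct_v 5 and epoch 0x29d1 = 10705. A minimal C++ check of that arithmetic:

#include <cstdint>
#include <cstdio>

int main() {
  // Bytes of the ceph.info xattr shown above: 05 d1 29 00 00
  const uint8_t attr[] = {0x05, 0xd1, 0x29, 0x00, 0x00};

  // Assumed layout: 1-byte struct version, then a little-endian 32-bit epoch.
  uint32_t epoch = (uint32_t)attr[1]
                 | ((uint32_t)attr[2] << 8)
                 | ((uint32_t)attr[3] << 16)
                 | ((uint32_t)attr[4] << 24);

  printf("struct_v=%u epoch=%u\n", (unsigned)attr[0], (unsigned)epoch);  // epoch=10705
  return 0;
}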

Actions #5

Updated by Faidon Liambotis over 11 years ago

root@ms-be1003:/var/lib/ceph/osd/ceph-27# find current/meta/ | tee ~/ceph-osd.27.meta | wc -l
42992

Attached.

Actions #6

Updated by Ian Colle over 11 years ago

  • Status changed from Need More Info to 12
Actions #7

Updated by Faidon Liambotis over 11 years ago

sjust said that we're done collecting information and that I could rm the pg directory/log/info, which I did. Unfortunately, it keeps crashing on boot, so there are probably more PGs like that...

Actions #8

Updated by Mike Dawson over 11 years ago

I'm seeing this same assert failure when trying to start up 3 of my OSDs. Happy to provide feedback for the debugging effort if needed.

Actions #9

Updated by Samuel Just over 11 years ago

The fault is in OSD::handle_osd_map where we trim old maps. Prior to 0.50, the pgs would have processed up to the current OSD map by this point. Post 0.50, however, pgs may lag behind the OSD map epoch. In an extreme case, the OSD might trim past a map needed by a PG. This is what happened here. Working on patch now.
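
Conceptually, the fix has to bound how far handle_osd_map may trim, e.g. by the oldest map epoch any local PG has processed. A simplified sketch of that idea (not the actual wip_3770 patch; the names below are hypothetical):

#include <algorithm>
#include <cstdint>
#include <vector>

using epoch_t = uint32_t;

// Hypothetical per-PG state: the newest OSD map epoch this PG has processed.
struct PGState {
  epoch_t last_processed_epoch;
};

// Never trim maps that a lagging PG might still need: cap the trim target
// at the minimum epoch across all local PGs.
epoch_t bounded_trim_to(epoch_t requested_trim_to,
                        const std::vector<PGState> &pgs) {
  epoch_t trim_to = requested_trim_to;
  for (const PGState &pg : pgs)
    trim_to = std::min(trim_to, pg.last_processed_epoch);
  return trim_to;  // handle_osd_map would then delete only maps older than this
}

int main() {
  std::vector<PGState> pgs = {{30000}, {29950}, {10705}};  // one PG lags badly
  return bounded_trim_to(29000, pgs) == 10705 ? 0 : 1;
}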

Actions #10

Updated by Samuel Just over 11 years ago

  • Status changed from 12 to Fix Under Review

wip_3770

Actions #11

Updated by Samuel Just over 11 years ago

  • Status changed from Fix Under Review to Resolved

66eb93b83648b4561b77ee6aab5b484e6dba4771

Actions #12

Updated by Faidon Liambotis over 11 years ago

So, from my (very basic) understanding of this, the fix means the trim wouldn't happen in the first place.

What about the crash I'm experiencing right now, though? Would it be possible for the OSD to recover without manually deleting PGs from the filesystem?

Actions #13

Updated by Samuel Just over 11 years ago

Yeah, I just pushed a work-around branch, wip-bobtail-load_pgs-workaround (which I haven't tested much, so ideally you would try it on a node you can afford to lose). There is a scenario in which this would be a problem, but if you have not been expanding the number of pgs in your pools, you won't hit it.
