Bug #407

closed

Kernel panic on 2.6.35

Added by Nick X over 13 years ago. Updated over 13 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
% Done: 0%

Description

Hi,

kernel: 2.6.35.4 (i686)
ceph built as module

First of all: sorry for not providing much technical info. All I have is a screenshot of the stack from the panicked kernel client.

I had 2 OSDs and was just playing with turning the second one off and on.
dmesg on the kernel client was showing "osd1 up", then "osd1 down", etc.
Finally, I killed cmds and cosd on osd1 (mon0 lives on osd0)
and restarted them manually, using the same commands I saw in "ps" ("/usr/bin/cmds -i 1 -c /tmp/ceph.conf.765" and "/usr/bin/cosd -i 1 -c /tmp/ceph.conf.765"), and the third system, which was running the kernel Ceph client, panicked. Attaching screenshot.

Please feel free to ignore this bug report if it's not informative enough.


Files

ceph-panic.png (18.1 KB) Nick X, 09/11/2010 06:58 PM
ceph.ko (224 KB) Nick X, 09/11/2010 08:51 PM
#1

Updated by Sage Weil over 13 years ago

Can you attach your ceph.ko module?

Also, do you by chance have access to the lines that scrolled off the screen (maybe Shift-PgUp works)? Specifically, the "RIP" line, like:

[ 5046.222540] RIP: 0010:[<ffffffffa00e6eab>]  [<ffffffffa00e6eab>] writepages_finish+0x140/0x3dd [ceph]

That line (and ideally everything around it, if Shift-PgUp works!) says exactly where things went sour and gives some additional context.

Thanks!
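The RIP line quoted above follows the layout `function+offset/size [module]`. As an illustration only, here is a minimal Python sketch that extracts those fields from such a line. The regular expression is an assumption built from that single example line, and the names `parse_rip_line`, `OopsLocation` and `gdb_expr` are hypothetical helpers, not part of Ceph or the kernel. The reporter's kernel is i686, which reports EIP rather than RIP, and that output may be laid out differently, so the pattern would likely need adjusting for it.

```python
import re
from typing import NamedTuple, Optional


class OopsLocation(NamedTuple):
    """Fields recovered from one RIP line (hypothetical helper type)."""
    timestamp: Optional[float]  # dmesg timestamp in seconds, if present
    address: int                # faulting instruction address
    function: str               # symbol the address falls in
    offset: int                 # byte offset into that function
    size: int                   # total size of the function in bytes
    module: Optional[str]       # module name in brackets, e.g. "ceph"


# Assumed layout, taken from the single example line in this thread:
# "[ ts] RIP: seg:[<addr>]  [<addr>] func+0xoff/0xsize [module]"
_RIP_PATTERN = re.compile(
    r"(?:\[\s*(?P<ts>\d+\.\d+)\]\s*)?"
    r"RIP:\s*[0-9a-fA-F]{4}:\[<(?P<addr>[0-9a-fA-F]+)>\]\s+"
    r"\[<[0-9a-fA-F]+>\]\s+"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_rip_line(line: str) -> Optional[OopsLocation]:
    """Return the parsed location, or None if the line doesn't match."""
    m = _RIP_PATTERN.search(line)
    if m is None:
        return None
    ts = m.group("ts")
    return OopsLocation(
        timestamp=float(ts) if ts else None,
        address=int(m.group("addr"), 16),
        function=m.group("func"),
        offset=int(m.group("off"), 16),
        size=int(m.group("size"), 16),
        module=m.group("mod"),
    )


def gdb_expr(loc: OopsLocation) -> str:
    """Build a gdb location expression such as '*(func+0x140)'."""
    return f"*({loc.function}+{loc.offset:#x})"


if __name__ == "__main__":
    sample = ("[ 5046.222540] RIP: 0010:[<ffffffffa00e6eab>]  "
              "[<ffffffffa00e6eab>] writepages_finish+0x140/0x3dd [ceph]")
    loc = parse_rip_line(sample)
    print(loc)
    print(gdb_expr(loc))
```

With a build of ceph.ko that includes debug info, running `gdb ceph.ko` and then `list *(writepages_finish+0x140)` is a common way to map such an offset to a source line. The thread does not say whether the attached module carries debug info.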

#2

Updated by Sage Weil over 13 years ago

  • Target version set to v2.6.36
#3

Updated by Nick X over 13 years ago

ceph.ko attached.

Unfortunately, PgUp was not working (I would have provided the info in the first place if it had been...)
and netconsole was disabled....

Anyway, if the problem recurs (I'm now starting to play seriously with Ceph), I'll be ready to provide the stack (netconsole always helps with this).

#4

Updated by Sage Weil over 13 years ago

Oh, there was a bug we fixed the other day that could be causing this. It was something that we hadn't hit ourselves, but this looks like it could be it.

The ceph-client.git branch 'master' includes the fix (commit:3d4401d9d0aef5c40706350685ddea3df6708496). You could run that kernel and hopefully the problem will be gone. Or, we can try to reproduce the problem and then determine more conclusively whether the new code fixes it. If you hit this again and get the full dump (EIP), that will tell us as well. Sticking with your current kernel (with netconsole) and hoping to crash again :) might be the way to go.

#5

Updated by Nick X over 13 years ago

...and hoping to crash again :)

ok :)

Thank you for the quick replies on this.

#6

Updated by Sage Weil over 13 years ago

  • Status changed from New to Resolved