Bug #4063 (closed)

filer: probe crash on wip-bobtail-osd-msgr branch

Added by Tamilarasi muthamizhan about 11 years ago. Updated about 11 years ago.

Status: Duplicate
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph version 0.56.2-15-g2ebf4d0 [wip-bobtail-osd-msgr]

Test setup: burnupi06, burnupi07

Hit this when running the bonnie and fsstress workloads in parallel.
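
For reference, a minimal sketch of that kind of parallel workload; the mount point and the tool arguments below are assumptions, not the recorded commands:

    # Sketch only: actual mount point and arguments were not recorded in this report.
    cd /mnt/cephfs                       # assumed CephFS mount point
    mkdir -p bonnie fsstress
    bonnie++ -d bonnie -u root &         # throughput workload in its own directory
    fsstress -d fsstress -n 1000 -p 8 &  # random file ops, 8 processes
    wait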

0> 2013-02-08 16:12:52.027715 7ffc6136d700 -1 osdc/Filer.cc: In function 'void Filer::_probed(Filer::Probe*, const object_t&, uint64_t, utime_t)' thread 7ffc6136d700 time 2013-02-08 16:12:52.000573
osdc/Filer.cc: 163: FAILED assert(probe->known_size[p->oid] <= shouldbe)

ceph version 0.56.2-15-g2ebf4d0 (2ebf4d065af3dc2e581a25b921071af3efb57f8a)
1: (Filer::_probed(Filer::Probe*, object_t const&, unsigned long, utime_t)+0x1194) [0x6dd904]
2: (Objecter::C_Stat::finish(int)+0xc0) [0x6ddb40]
3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe38) [0x6c9e08]
4: (MDS::handle_core_message(Message*)+0xae8) [0x4cf5a8]
5: (MDS::_dispatch(Message*)+0x2f) [0x4cf76f]
6: (MDS::ms_dispatch(Message*)+0x1db) [0x4d121b]
7: (DispatchQueue::entry()+0x349) [0x7e7089]
8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7700dd]
9: (()+0x7e9a) [0x7ffc656c9e9a]
10: (clone()+0x6d) [0x7ffc63e7fcbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
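
Per the NOTE, a sketch of resolving the frame addresses; it assumes the crashing daemon is ceph-mds from this exact build, with symbols, at /usr/bin/ceph-mds (the path is an assumption):

    # Full disassembly with source interleaved, as the NOTE suggests:
    objdump -rdS /usr/bin/ceph-mds > ceph-mds.dis
    # Map a single frame address from the trace to a function and source line:
    addr2line -Cfe /usr/bin/ceph-mds 0x6dd904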

ubuntu@burnupi07:/var/log/ceph$ sudo ceph -s
health HEALTH_WARN mds a is laggy
monmap e1: 3 mons at {a=10.214.133.8:6789/0,b=10.214.134.38:6789/0,c=10.214.134.38:6790/0}, election epoch 36, quorum 0,1,2 a,b,c
osdmap e1414: 4 osds: 4 up, 4 in
pgmap v679992: 648 pgs: 648 active+clean; 256 GB data, 517 GB used, 3205 GB / 3722 GB avail
mdsmap e166: 1/1/1 up {0=a=up:active(laggy or crashed)}

ubuntu@burnupi07:/var/log/ceph$ sudo cat /etc/ceph/ceph.conf
[global]
auth client required = cephx
auth service required = cephx
auth cluster required = cephx

[osd]
osd journal size = 1000

[osd.1]
host = burnupi06

[osd.2]
host = burnupi06

[osd.3]
host = burnupi07

[osd.4]
host = burnupi07

[mon.a]
host = burnupi06
mon addr = 10.214.133.8:6789

[mon.b]
host = burnupi07
mon addr = 10.214.134.38:6789

[mon.c]
host = burnupi07
mon addr = 10.214.134.38:6790

[mds.a]
host = burnupi06


Related issues: 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #2803: filer: probe crash (Resolved, Sage Weil, 07/19/2012)

Actions #1

Updated by Tamilarasi muthamizhan about 11 years ago

Repasting the core dump:

     0> 2013-02-08 16:12:52.027715 7ffc6136d700 -1 osdc/Filer.cc: In function 'void Filer::_probed(Filer::Probe*, const object_t&, uint64_t, utime_t)' thread 7ffc6136d700 time 2013-02-08 16:12:52.000573
osdc/Filer.cc: 163: FAILED assert(probe->known_size[p->oid] <= shouldbe)

 ceph version 0.56.2-15-g2ebf4d0 (2ebf4d065af3dc2e581a25b921071af3efb57f8a)
 1: (Filer::_probed(Filer::Probe*, object_t const&, unsigned long, utime_t)+0x1194) [0x6dd904]
 2: (Objecter::C_Stat::finish(int)+0xc0) [0x6ddb40]
 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe38) [0x6c9e08]
 4: (MDS::handle_core_message(Message*)+0xae8) [0x4cf5a8]
 5: (MDS::_dispatch(Message*)+0x2f) [0x4cf76f]
 6: (MDS::ms_dispatch(Message*)+0x1db) [0x4d121b]
 7: (DispatchQueue::entry()+0x349) [0x7e7089]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7700dd]
 9: (()+0x7e9a) [0x7ffc656c9e9a]
 10: (clone()+0x6d) [0x7ffc63e7fcbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #2

Updated by Tamilarasi muthamizhan about 11 years ago

Restarting the mds (and then all daemons in the cluster) does not help; the same assert is hit again.

Leaving the cluster as it is for reference.
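
For the record, a sketch of the kind of restart attempted; bobtail-era clusters used the sysvinit wrapper, so the exact invocation is an assumption:

    sudo service ceph restart mds.a   # restart just the mds on burnupi06
    sudo service ceph -a restart      # or restart every daemon on every host in ceph.conf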

Actions #3

Updated by Tamilarasi muthamizhan about 11 years ago

  • Status changed from New to Duplicate
