Bug #371

closed

OSD crash: PG::Log::Entry::decode

Added by Wido den Hollander over 13 years ago. Updated over 13 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%


Description

While #367 happened, I also saw this crash on an OSD:

Core was generated by `/usr/bin/cosd -i 7 -c /etc/ceph/ceph.conf'.
Program terminated with signal 6, Aborted.
#0  0x00007f0fc2d08a75 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00007f0fc2d08a75 in raise () from /lib/libc.so.6
#1  0x00007f0fc2d0c5c0 in abort () from /lib/libc.so.6
#2  0x00007f0fc35bd8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3  0x00007f0fc35bbd16 in ?? () from /usr/lib/libstdc++.so.6
#4  0x00007f0fc35bbd43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5  0x00007f0fc35bbe3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6  0x0000000000544c69 in decode(std::string&, ceph::buffer::list::iterator&) ()
#7  0x000000000053ee68 in object_t::decode (this=0x1f67700, store=<value optimized out>) at ./include/object.h:46
#8  decode (this=0x1f67700, store=<value optimized out>) at ./include/object.h:49
#9  sobject_t::decode (this=0x1f67700, store=<value optimized out>) at ./include/object.h:148
#10 decode (this=0x1f67700, store=<value optimized out>) at ./include/object.h:152
#11 PG::Log::Entry::decode (this=0x1f67700, store=<value optimized out>) at osd/PG.h:289
#12 decode (this=0x1f67700, store=<value optimized out>) at osd/PG.h:1015
#13 PG::read_log (this=0x1f67700, store=<value optimized out>) at osd/PG.cc:2178
#14 0x0000000000540e46 in PG::read_state (this=<value optimized out>, store=<value optimized out>) at osd/PG.cc:2376
#15 0x00000000004e6cc5 in OSD::load_pgs (this=<value optimized out>) at osd/OSD.cc:971
#16 0x00000000004e76a8 in OSD::init (this=0x1bc0000) at osd/OSD.cc:498
#17 0x0000000000457fd2 in main (argc=<value optimized out>, argv=<value optimized out>) at cosd.cc:285
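
Frames #4 to #7 indicate that the abort comes from an exception thrown while decoding a string inside `object_t::decode`, with no handler catching it during `OSD::init`. As a rough illustration of how corrupt data could trigger this, here is a minimal Python sketch. It assumes the string is stored as a 32-bit little-endian length followed by that many bytes. The helper `decode_string` and the `DecodeError` class are hypothetical names for illustration, not Ceph code.

```python
import struct
from typing import Tuple


class DecodeError(Exception):
    """Raised when the buffer ends before the declared data does."""


def decode_string(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Decode one length-prefixed string; return (value, next_offset).

    Assumed layout (illustrative): 4-byte little-endian length, then the bytes.
    """
    if offset + 4 > len(buf):
        raise DecodeError("buffer too short for the length prefix")
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        # A garbage length field points past the end of the buffer.
        raise DecodeError(
            f"declared length {length} exceeds remaining {len(buf) - start} bytes"
        )
    return buf[start:end].decode("latin-1"), end


# A well-formed entry decodes cleanly.
good = struct.pack("<I", 5) + b"hello"

# A corrupt length prefix makes the decoder run off the end of the buffer.
corrupt = struct.pack("<I", 0x7FFFFFFF) + b"hello"
```

Calling `decode_string(good)` returns the string and the next offset, while `decode_string(corrupt)` raises `DecodeError`. If the caller does not catch the exception, the process terminates, which matches the shape of the backtrace above.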

The last log lines show:

10.08.20_20:45:07.906992 7f0fc4431720 filestore(/srv/ceph/osd.7) read /srv/ceph/osd.7/current/meta/pglog_1.28_0 0~364
10.08.20_20:45:07.907124 7f0fc4431720 filestore(/srv/ceph/osd.7) read /srv/ceph/osd.7/current/meta/pglog_1.28_0 0~364 = 364
10.08.20_20:45:07.907150 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log 0 418'2920 (416'2919) m 1000013cfca.00000000/head by mds0.5:1320 10.08.07_20:03:39.000721
10.08.20_20:45:07.907189 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log  ignoring entry at 0 below log.tail
10.08.20_20:45:07.907219 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log 91 816'2921 (418'2920) m 1000013cfca.00000000/head by mds0.6:62919 10.08.09_13:53:58.000924
10.08.20_20:45:07.907284 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log  ignoring entry at 91 below log.tail
10.08.20_20:45:07.907317 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log 182 0'0 (0'0) ? /0 by unknown0.0:0 0.000000
10.08.20_20:45:07.907352 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log  ignoring entry at 182 below log.tail
10.08.20_20:45:07.907382 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log 253 20'4311744512 (778134374'7148111116603764785) ? /0 by unknown?.0:823032831901958143 -17.08.29_20:37:52.191496
10.08.20_20:45:07.907421 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log  ignoring entry at 253 below log.tail

Logs, core and binary are available at logger.ceph.widodh.nl in /srv/ceph/issues/osd_pg_log_entry_decode

Actions #1

Updated by Sage Weil over 13 years ago

  • Status changed from New to Can't reproduce

If you see one of these again, please also save the actual pglog file in question (in this case it was /srv/ceph/osd.7/current/meta/pglog_1.28_0). I should have looked at this sooner! (Or maybe I did? I don't remember.) In any case, we need to know what the corruption looks like to clue us in on where the problem is.
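
One way to preserve the file before the OSD is restarted or the PG is repaired is sketched below. This is only a minimal example: the function names `preserve_pglog` and `hexdump` and the destination directory are hypothetical, and any byte-for-byte copy would serve the same purpose.

```python
import shutil
from pathlib import Path


def preserve_pglog(pglog: Path, dest_dir: Path) -> Path:
    """Copy a pglog file byte for byte into dest_dir and return the copy's path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copy = dest_dir / pglog.name
    shutil.copy2(pglog, copy)  # copy2 also preserves timestamps
    return copy


def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset-prefixed hex rows for a quick look at the contents."""
    rows = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        rows.append(f"{off:08x}  " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(rows)
```

For this ticket the call would look like `preserve_pglog(Path("/srv/ceph/osd.7/current/meta/pglog_1.28_0"), Path("/tmp/issue-371"))`, where `/tmp/issue-371` is an arbitrary example destination. Running `hexdump` on the copy's bytes gives a view of the raw entries that can be attached to the report.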
