Bug #371
OSD crash: PG::Log::Entry::decode (closed)
Status: Can't reproduce
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Description
While #367 happened, I also saw this crash on an OSD:
Core was generated by `/usr/bin/cosd -i 7 -c /etc/ceph/ceph.conf'.
Program terminated with signal 6, Aborted.
#0 0x00007f0fc2d08a75 in raise () from /lib/libc.so.6
(gdb) bt
#0 0x00007f0fc2d08a75 in raise () from /lib/libc.so.6
#1 0x00007f0fc2d0c5c0 in abort () from /lib/libc.so.6
#2 0x00007f0fc35bd8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3 0x00007f0fc35bbd16 in ?? () from /usr/lib/libstdc++.so.6
#4 0x00007f0fc35bbd43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5 0x00007f0fc35bbe3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6 0x0000000000544c69 in decode(std::string&, ceph::buffer::list::iterator&) ()
#7 0x000000000053ee68 in object_t::decode (this=0x1f67700, store=<value optimized out>) at ./include/object.h:46
#8 decode (this=0x1f67700, store=<value optimized out>) at ./include/object.h:49
#9 sobject_t::decode (this=0x1f67700, store=<value optimized out>) at ./include/object.h:148
#10 decode (this=0x1f67700, store=<value optimized out>) at ./include/object.h:152
#11 PG::Log::Entry::decode (this=0x1f67700, store=<value optimized out>) at osd/PG.h:289
#12 decode (this=0x1f67700, store=<value optimized out>) at osd/PG.h:1015
#13 PG::read_log (this=0x1f67700, store=<value optimized out>) at osd/PG.cc:2178
#14 0x0000000000540e46 in PG::read_state (this=<value optimized out>, store=<value optimized out>) at osd/PG.cc:2376
#15 0x00000000004e6cc5 in OSD::load_pgs (this=<value optimized out>) at osd/OSD.cc:971
#16 0x00000000004e76a8 in OSD::init (this=0x1bc0000) at osd/OSD.cc:498
#17 0x0000000000457fd2 in main (argc=<value optimized out>, argv=<value optimized out>) at cosd.cc:285
The last log lines show:
10.08.20_20:45:07.906992 7f0fc4431720 filestore(/srv/ceph/osd.7) read /srv/ceph/osd.7/current/meta/pglog_1.28_0 0~364
10.08.20_20:45:07.907124 7f0fc4431720 filestore(/srv/ceph/osd.7) read /srv/ceph/osd.7/current/meta/pglog_1.28_0 0~364 = 364
10.08.20_20:45:07.907150 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log 0 418'2920 (416'2919) m 1000013cfca.00000000/head by mds0.5:1320 10.08.07_20:03:39.000721
10.08.20_20:45:07.907189 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log ignoring entry at 0 below log.tail
10.08.20_20:45:07.907219 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log 91 816'2921 (418'2920) m 1000013cfca.00000000/head by mds0.6:62919 10.08.09_13:53:58.000924
10.08.20_20:45:07.907284 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log ignoring entry at 91 below log.tail
10.08.20_20:45:07.907317 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log 182 0'0 (0'0) ? /0 by unknown0.0:0 0.000000
10.08.20_20:45:07.907352 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log ignoring entry at 182 below log.tail
10.08.20_20:45:07.907382 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log 253 20'4311744512 (778134374'7148111116603764785) ? /0 by unknown?.0:823032831901958143 -17.08.29_20:37:52.191496
10.08.20_20:45:07.907421 7f0fc4431720 osd7 6579 pg[1.28( v 6577'2923 (816'2921,6577'2923] n=111 ec=2 les=6152 6151/6151/6139) [] r=0 (info mismatch, log(816'2921,0'0]) (log bound mismatch, empty) lcod 0'0 mlcod 0'0 inactive] read_log ignoring entry at 253 below log.tail
Logs, the core file, and the binary are available on logger.ceph.widodh.nl in /srv/ceph/issues/osd_pg_log_entry_decode.
Updated by Sage Weil over 13 years ago
- Status changed from New to Can't reproduce
If you see one of these again, please save the actual pglog file in question too (in this case, /srv/ceph/osd.7/current/meta/pglog_1.28_0). I should have looked at this sooner! (Or maybe I did? I don't remember.) In any case, we need to know what the corruption looks like to clue us in on where the problem is.