Bug #10734

v0.91 to v0.92 upgrade journal replay crash

Added by Samuel Just over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/ceph-osd -i 10 --pid-file /var/run/ceph/osd.10.pid -c /etc/ceph/ceph.c'.
Program terminated with signal 6, Aborted.
#0 0x00007f53cfa78f6b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f53cfa78f6b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x0000000000bdeaba in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:109
#3 <signal handler called>
#4 0x00007f53ce3e2165 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f53ce3e53e0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007f53cec3989d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007f53cec37996 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007f53cec379c3 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00007f53cec37bee in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x0000000000d3cd19 in ceph::buffer::list::iterator::copy (this=0x7fff5af319d8, len=520092484, dest=...) at common/buffer.cc:926
#11 0x000000000085f480 in decode (p=..., s=...) at ./include/encoding.h:174
#12 decode<std::string, ceph::buffer::list> (m=..., p=...) at ./include/encoding.h:610
#13 0x0000000000a450a4 in decode_attrset (aset=..., this=0x7fff5af319c0) at os/ObjectStore.h:828
#14 FileStore::_do_transaction (this=this@entry=0x3c96000, t=..., op_seq=op_seq@entry=1154691, trans_num=trans_num@entry=0, handle=handle@entry=0x0) at os/FileStore.cc:2593
#15 0x0000000000a49174 in FileStore::_do_transactions (this=0x3c96000, tls=..., op_seq=1154691, handle=0x0) at os/FileStore.cc:1991
#16 0x0000000000a5e3bc in JournalingObjectStore::journal_replay (this=0x3c96000, fs_op_seq=<optimized out>) at os/JournalingObjectStore.cc:91
#17 0x0000000000a3f8c0 in FileStore::mount (this=0x3c96000) at os/FileStore.cc:1558
#18 0x00000000007e7d61 in OSD::init (this=0x3cea000) at osd/OSD.cc:1766
#19 0x000000000078bf7c in main (argc=<optimized out>, argv=<optimized out>) at ceph_osd.cc:517

2015-02-03 18:22:15.756863 7f7f77dfd840 -1 journal FileJournal::_open: aio not supported without directio; disabling aio
[ { "offset": 241516544,
"seq": 1154689,
"transactions": [ { "trans_num": 0,
"ops": [ { "op_num": 0,
"op_name": "write",
"collection": "meta",
"oid": "47890a1e\/inc_osdmap.80410\/0\/\/-1",
"length": 189,
"offset": 0,
"bufferlist length": 189}, { "op_num": 1,
"op_name": "write",
"collection": "meta",
"oid": "4f99bdb9\/osdmap.80410\/0\/\/-1",
"length": 50968,
"offset": 0,
"bufferlist length": 50968}, { "op_num": 2,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83b19\/osdmap.44555\/0\/\/-1"}, { "op_num": 3,
"op_name": "remove",
"collection": "meta",
"oid": "471ff07e\/inc_osdmap.44555\/0\/\/-1"}, { "op_num": 4,
"op_name": "remove",
"collection": "meta",
"oid": "4ee838a9\/osdmap.44556\/0\/\/-1"}, { "op_num": 5,
"op_name": "remove",
"collection": "meta",
"oid": "471ff10e\/inc_osdmap.44556\/0\/\/-1"}, { "op_num": 6,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83879\/osdmap.44557\/0\/\/-1"}, { "op_num": 7,
"op_name": "remove",
"collection": "meta",
"oid": "471ff6de\/inc_osdmap.44557\/0\/\/-1"}, { "op_num": 8,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83909\/osdmap.44558\/0\/\/-1"}, { "op_num": 9,
"op_name": "remove",
"collection": "meta",
"oid": "471ff66e\/inc_osdmap.44558\/0\/\/-1"}, { "op_num": 10,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83ed9\/osdmap.44559\/0\/\/-1"}, { "op_num": 11,
"op_name": "remove",
"collection": "meta",
"oid": "471ff73e\/inc_osdmap.44559\/0\/\/-1"}, { "op_num": 12,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83f39\/osdmap.44560\/0\/\/-1"}, { "op_num": 13,
"op_name": "remove",
"collection": "meta",
"oid": "471ff59e\/inc_osdmap.44560\/0\/\/-1"}, { "op_num": 14,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83cc9\/osdmap.44561\/0\/\/-1"}, { "op_num": 15,
"op_name": "remove",
"collection": "meta",
"oid": "471ff52e\/inc_osdmap.44561\/0\/\/-1"}, { "op_num": 16,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83d99\/osdmap.44562\/0\/\/-1"}, { "op_num": 17,
"op_name": "remove",
"collection": "meta",
"oid": "471f8afe\/inc_osdmap.44562\/0\/\/-1"}, { "op_num": 18,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83d29\/osdmap.44563\/0\/\/-1"}, { "op_num": 19,
"op_name": "remove",
"collection": "meta",
"oid": "471f8b8e\/inc_osdmap.44563\/0\/\/-1"}, { "op_num": 20,
"op_name": "remove",
"collection": "meta",
"oid": "4ee832f9\/osdmap.44564\/0\/\/-1"}, { "op_num": 21,
"op_name": "remove",
"collection": "meta",
"oid": "471f8b5e\/inc_osdmap.44564\/0\/\/-1"}, { "op_num": 22,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83389\/osdmap.44565\/0\/\/-1"}, { "op_num": 23,
"op_name": "remove",
"collection": "meta",
"oid": "471f88ee\/inc_osdmap.44565\/0\/\/-1"}, { "op_num": 24,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83359\/osdmap.44566\/0\/\/-1"}, { "op_num": 25,
"op_name": "remove",
"collection": "meta",
"oid": "471f89be\/inc_osdmap.44566\/0\/\/-1"}, { "op_num": 26,
"op_name": "remove",
"collection": "meta",
"oid": "4ee830e9\/osdmap.44567\/0\/\/-1"}, { "op_num": 27,
"op_name": "remove",
"collection": "meta",
"oid": "471f894e\/inc_osdmap.44567\/0\/\/-1"}, { "op_num": 28,
"op_name": "remove",
"collection": "meta",
"oid": "4ee831b9\/osdmap.44568\/0\/\/-1"}, { "op_num": 29,
"op_name": "remove",
"collection": "meta",
"oid": "471f8e1e\/inc_osdmap.44568\/0\/\/-1"}, { "op_num": 30,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83149\/osdmap.44569\/0\/\/-1"}, { "op_num": 31,
"op_name": "remove",
"collection": "meta",
"oid": "471f8fae\/inc_osdmap.44569\/0\/\/-1"}, { "op_num": 32,
"op_name": "remove",
"collection": "meta",
"oid": "4ee837a9\/osdmap.44570\/0\/\/-1"}, { "op_num": 33,
"op_name": "remove",
"collection": "meta",
"oid": "471f8c0e\/inc_osdmap.44570\/0\/\/-1"}, { "op_num": 34,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83779\/osdmap.44571\/0\/\/-1"}, { "op_num": 35,
"op_name": "remove",
"collection": "meta",
"oid": "471f8dde\/inc_osdmap.44571\/0\/\/-1"}, { "op_num": 36,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83409\/osdmap.44572\/0\/\/-1"}, { "op_num": 37,
"op_name": "remove",
"collection": "meta",
"oid": "471f8d6e\/inc_osdmap.44572\/0\/\/-1"}, { "op_num": 38,
"op_name": "remove",
"collection": "meta",
"oid": "4ee835d9\/osdmap.44573\/0\/\/-1"}, { "op_num": 39,
"op_name": "remove",
"collection": "meta",
"oid": "471f823e\/inc_osdmap.44573\/0\/\/-1"}, { "op_num": 40,
"op_name": "remove",
"collection": "meta",
"oid": "4ee83569\/osdmap.44574\/0\/\/-1"}, { "op_num": 41,
"op_name": "remove",
"collection": "meta",
"oid": "471f83ce\/inc_osdmap.44574\/0\/\/-1"}, { "op_num": 42,
"op_name": "remove",
"collection": "meta",
"oid": "4eefca39\/osdmap.44575\/0\/\/-1"}, { "op_num": 43,
"op_name": "remove",
"collection": "meta",
"oid": "471f809e\/inc_osdmap.44575\/0\/\/-1"}, { "op_num": 44,
"op_name": "remove",
"collection": "meta",
"oid": "4eefcbc9\/osdmap.44576\/0\/\/-1"}, { "op_num": 45,
"op_name": "remove",
"collection": "meta",
"oid": "471f802e\/inc_osdmap.44576\/0\/\/-1"}, { "op_num": 46,
"op_name": "remove",
"collection": "meta",
"oid": "4eefc899\/osdmap.44577\/0\/\/-1"}, { "op_num": 47,
"op_name": "remove",
"collection": "meta",
"oid": "471f81fe\/inc_osdmap.44577\/0\/\/-1"}, { "op_num": 48,
"op_name": "remove",
"collection": "meta",
"oid": "4eefc829\/osdmap.44578\/0\/\/-1"}, { "op_num": 49,
"op_name": "remove",
"collection": "meta",
"oid": "471f868e\/inc_osdmap.44578\/0\/\/-1"}, { "op_num": 50,
"op_name": "remove",
"collection": "meta",
"oid": "4eefc9f9\/osdmap.44579\/0\/\/-1"}, { "op_num": 51,
"op_name": "remove",
"collection": "meta",
"oid": "471f865e\/inc_osdmap.44579\/0\/\/-1"}, { "op_num": 52,
"op_name": "remove",
"collection": "meta",
"oid": "4eefce59\/osdmap.44580\/0\/\/-1"}, { "op_num": 53,
"op_name": "remove",
"collection": "meta",
"oid": "471f84be\/inc_osdmap.44580\/0\/\/-1"}, { "op_num": 54,
"op_name": "remove",
"collection": "meta",
"oid": "4eefcfe9\/osdmap.44581\/0\/\/-1"}, { "op_num": 55,
"op_name": "remove",
"collection": "meta",
"oid": "471f844e\/inc_osdmap.44581\/0\/\/-1"}, { "op_num": 56,
"op_name": "remove",
"collection": "meta",
"oid": "4eefccb9\/osdmap.44582\/0\/\/-1"}, { "op_num": 57,
"op_name": "remove",
"collection": "meta",
"oid": "471f851e\/inc_osdmap.44582\/0\/\/-1"}, { "op_num": 58,
"op_name": "remove",
"collection": "meta",
"oid": "4eefcc49\/osdmap.44583\/0\/\/-1"}, { "op_num": 59,
"op_name": "remove",
"collection": "meta",
"oid": "471f9aae\/inc_osdmap.44583\/0\/\/-1"}, { "op_num": 60,
"op_name": "remove",
"collection": "meta",
"oid": "4eefcd19\/osdmap.44584\/0\/\/-1"}, { "op_num": 61,
"op_name": "remove",
"collection": "meta",
"oid": "471f9a7e\/inc_osdmap.44584\/0\/\/-1"}, { "op_num": 62,
"op_name": "write",
"collection": "meta",
"oid": "23c2fcde\/osd_superblock\/0\/\/-1",
"length": 413,
"offset": 0,
"bufferlist length": 413}]}]}, { "offset": 241577984,
"seq": 1154690,
"transactions": [ { "trans_num": 0,
"ops": [ { "op_num": 0,
"op_name": "write",
"collection": "meta",
"oid": "47890bae\/inc_osdmap.80411\/0\/\/-1",
"length": 606,
"offset": 0,
"bufferlist length": 606}, { "op_num": 1,
"op_name": "write",
"collection": "meta",
"oid": "4f99bd49\/osdmap.80411\/0\/\/-1",
"length": 50968,
"offset": 0,
"bufferlist length": 50968}, { "op_num": 2,
"op_name": "write",
"collection": "meta",
"oid": "23c2fcde\/osd_superblock\/0\/\/-1",
"length": 413,
"offset": 0,
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
what(): buffer::end_of_buffer
"bufferlist length": 413}]}]}*** Caught signal (Aborted)
in thread 7f7f77dfd840
ceph version 0.92 (00a3ac3b67d93860e7f0b6e07319f11b14d0fec0)
1: ceph-osd() [0xbde9fc]
2: (()+0xf0a0) [0x7f7f76d140a0]
3: (gsignal()+0x35) [0x7f7f7567d165]
4: (abort()+0x180) [0x7f7f756803e0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f7f75ed489d]
6: (()+0x63996) [0x7f7f75ed2996]
7: (()+0x639c3) [0x7f7f75ed29c3]
8: (()+0x63bee) [0x7f7f75ed2bee]
9: ceph-osd() [0xd3cd19]
10: (void decode<std::string, ceph::buffer::list>(std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, ceph::buffer::list::iterator&)+0xf0) [0x85f480]
11: (ObjectStore::Transaction::dump(ceph::Formatter*)+0x1328) [0xab4c18]
12: (FileJournal::dump(std::ostream&)+0x67b) [0xb8e69b]
13: (FileStore::dump_journal(std::ostream&)+0x9d) [0xa2e57d]
14: (main()+0x8a2) [0x78a312]
15: (__libc_start_main()+0xfd) [0x7f7f75669ead]
16: ceph-osd() [0x791c09]
2015-02-03 18:22:15.761677 7f7f77dfd840 -1 *** Caught signal (Aborted) **
in thread 7f7f77dfd840

ceph version 0.92 (00a3ac3b67d93860e7f0b6e07319f11b14d0fec0)
1: ceph-osd() [0xbde9fc]
2: (()+0xf0a0) [0x7f7f76d140a0]
3: (gsignal()+0x35) [0x7f7f7567d165]
4: (abort()+0x180) [0x7f7f756803e0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f7f75ed489d]
6: (()+0x63996) [0x7f7f75ed2996]
7: (()+0x639c3) [0x7f7f75ed29c3]
8: (()+0x63bee) [0x7f7f75ed2bee]
9: ceph-osd() [0xd3cd19]
10: (void decode<std::string, ceph::buffer::list>(std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, ceph::buffer::list::iterator&)+0xf0) [0x85f480]
11: (ObjectStore::Transaction::dump(ceph::Formatter*)+0x1328) [0xab4c18]
12: (FileJournal::dump(std::ostream&)+0x67b) [0xb8e69b]
13: (FileStore::dump_journal(std::ostream&)+0x9d) [0xa2e57d]
14: (main()+0x8a2) [0x78a312]
15: (__libc_start_main()+0xfd) [0x7f7f75669ead]
16: ceph-osd() [0x791c09]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
-10> 2015-02-03 18:22:15.756863 7f7f77dfd840 -1 journal FileJournal::_open: aio not supported without directio; disabling aio
0> 2015-02-03 18:22:15.761677 7f7f77dfd840 -1 *** Caught signal (Aborted) **
in thread 7f7f77dfd840
ceph version 0.92 (00a3ac3b67d93860e7f0b6e07319f11b14d0fec0)
1: ceph-osd() [0xbde9fc]
2: (()+0xf0a0) [0x7f7f76d140a0]
3: (gsignal()+0x35) [0x7f7f7567d165]
4: (abort()+0x180) [0x7f7f756803e0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f7f75ed489d]
6: (()+0x63996) [0x7f7f75ed2996]
7: (()+0x639c3) [0x7f7f75ed29c3]
8: (()+0x63bee) [0x7f7f75ed2bee]
9: ceph-osd() [0xd3cd19]
10: (void decode<std::string, ceph::buffer::list>(std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, ceph::buffer::list::iterator&)+0xf0) [0x85f480]
11: (ObjectStore::Transaction::dump(ceph::Formatter*)+0x1328) [0xab4c18]
12: (FileJournal::dump(std::ostream&)+0x67b) [0xb8e69b]
13: (FileStore::dump_journal(std::ostream&)+0x9d) [0xa2e57d]
14: (main()+0x8a2) [0x78a312]
15: (__libc_start_main()+0xfd) [0x7f7f75669ead]
16: ceph-osd() [0x791c09]
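
For triage, the journal dump above can be condensed to a per-entry count of operations. The following is a minimal Python sketch, assuming the dump has been captured as complete, valid JSON in the shape shown (a list of entries, each with "seq" and "transactions", each transaction with "ops"). The sample data is hand-written and shortened for illustration, and `summarize` is a hypothetical helper, not part of Ceph. A dump that is cut short by the crash, as the one above is, would need to be trimmed to its last complete entry first.

```python
import json
from collections import Counter

# Hand-written sample shaped like the dump above (shortened for illustration).
SAMPLE = r'''
[ { "offset": 241516544,
    "seq": 1154689,
    "transactions": [ { "trans_num": 0,
        "ops": [ { "op_num": 0, "op_name": "write", "collection": "meta",
                   "oid": "47890a1e\/inc_osdmap.80410\/0\/\/-1",
                   "length": 189, "offset": 0, "bufferlist length": 189},
                 { "op_num": 1, "op_name": "remove", "collection": "meta",
                   "oid": "4ee83b19\/osdmap.44555\/0\/\/-1"} ]}]}]
'''


def summarize(dump_text):
    """Return {seq: Counter of op_name} for every journal entry in the dump."""
    summary = {}
    for entry in json.loads(dump_text):
        counts = Counter()
        for txn in entry["transactions"]:
            for op in txn["ops"]:
                counts[op["op_name"]] += 1
        summary[entry["seq"]] = counts
    return summary


if __name__ == "__main__":
    for seq, counts in summarize(SAMPLE).items():
        print(seq, dict(counts))
```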

Related issues

Related to Ceph - Bug #10985: Some OSDs don't get up after upgrade from v0.92 to v0.93 (Won't Fix, 03/02/2015)

History

#2 Updated by Samuel Just over 5 years ago

That journal is apparently after a mkjournal and may be a result of the append/recovery class of bugs. We're trying to get another log/journal combo.

#3 Updated by Daniel Swarbrick over 5 years ago

Reproduced on another node that was still running 0.91 OSDs. Coredump, log and journal are on cephdump in directory "bug10734".

backtrace:

Core was generated by `/usr/bin/ceph-osd -i 32 --pid-file /var/run/ceph/osd.32.pid -c /etc/ceph/ceph.c'.
Program terminated with signal 6, Aborted.
#0  0x00007f79db161f6b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f79db161f6b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000000000bdeaba in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:109
#3  <signal handler called>
#4  0x00007f79d9acb165 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f79d9ace3e0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f79da32289d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f79da320996 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f79da3209c3 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f79da320bee in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x0000000000d3cd19 in ceph::buffer::list::iterator::copy (this=0x7fffd1463810, len=1634966223, dest=...) at common/buffer.cc:926
#11 0x0000000000cf42b4 in decode (p=..., s=...) at ./include/encoding.h:174
#12 decode (bl=..., this=0x7fffd14635c0) at ./include/object.h:51
#13 decode (p=..., c=...) at ./include/object.h:54
#14 ghobject_t::decode (this=0x7fffd14635c0, bl=...) at common/hobject.cc:219
#15 0x00000000009e2856 in decode (p=..., c=...) at ./common/hobject.h:333
#16 decode<ghobject_t, unsigned int> (m=..., p=...) at ./include/encoding.h:610
#17 0x0000000000a5e271 in decode (bl=..., this=0x3e21340) at os/ObjectStore.h:1574
#18 Transaction (dp=..., this=0x3e21340) at os/ObjectStore.h:1527
#19 JournalingObjectStore::journal_replay (this=0x3e04000, fs_op_seq=<optimized out>) at os/JournalingObjectStore.cc:86
#20 0x0000000000a3f8c0 in FileStore::mount (this=0x3e04000) at os/FileStore.cc:1558
#21 0x00000000007e7d61 in OSD::init (this=0x3e5c000) at osd/OSD.cc:1766
#22 0x000000000078bf7c in main (argc=<optimized out>, argv=<optimized out>) at ceph_osd.cc:517
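
Both backtraces end in `ceph::buffer::list::iterator::copy` with an implausibly large `len` (520092484 and 1634966223), called from a string `decode`. That is consistent with the decoder reading a length prefix from bytes that are not laid out the way it expects. The following is a simplified Python model of that failure mode, not Ceph's code: it assumes a string is stored as a 32-bit little-endian length followed by that many bytes, and `decode_string` and `EndOfBuffer` are hypothetical names chosen for illustration.

```python
import struct


class EndOfBuffer(Exception):
    """Stand-in for the buffer::end_of_buffer exception seen in the trace."""


def decode_string(buf, pos):
    """Decode a u32 little-endian length followed by that many bytes.

    Returns (decoded bytes, new position). Raises EndOfBuffer when the
    length prefix asks for more bytes than the buffer still holds.
    """
    if pos + 4 > len(buf):
        raise EndOfBuffer("no room for length prefix")
    (length,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    if pos + length > len(buf):
        raise EndOfBuffer(f"want {length} bytes, {len(buf) - pos} left")
    return buf[pos:pos + length], pos + length


if __name__ == "__main__":
    good = struct.pack("<I", 3) + b"abc"
    print(decode_string(good, 0))

    # A bogus prefix such as the len=520092484 from the first backtrace.
    bad = struct.pack("<I", 520092484) + b"abc"
    try:
        decode_string(bad, 0)
    except EndOfBuffer as exc:
        print("decode failed:", exc)
```

In this model the error surfaces only when the copy runs past the end of the data, which matches where the exception is thrown in the traces; the decoder has no other way to tell that the prefix is garbage.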

#5 Updated by Samuel Just over 5 years ago

#6 Updated by Samuel Just over 5 years ago

  • Status changed from New to Fix Under Review

#7 Updated by Sage Weil over 5 years ago

  • Status changed from Fix Under Review to Resolved

#8 Updated by Samuel Just over 5 years ago

The workaround if you are stuck on v0.91/v0.92 is to wait for hammer, shut down all OSDs, upgrade to hammer, and restart all OSDs.
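
To decide whether a given OSD falls under this workaround, its version banner can be checked against the two affected releases. This is a minimal Python sketch, assuming the banner has the form seen in the crash log ("ceph version 0.92 (...)"); `parse_version`, `needs_workaround`, and `AFFECTED` are hypothetical names, and how the banner is collected from each node is left out.

```python
import re

# Releases named in the workaround comment (v0.91 and v0.92).
AFFECTED = {(0, 91), (0, 92)}


def parse_version(banner):
    """Extract (major, minor) from a banner like 'ceph version 0.92 (<sha>)'."""
    match = re.search(r"ceph version (\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"unrecognized banner: {banner!r}")
    return int(match.group(1)), int(match.group(2))


def needs_workaround(banner):
    """True when the banner reports one of the affected releases."""
    return parse_version(banner) in AFFECTED


if __name__ == "__main__":
    banner = "ceph version 0.92 (00a3ac3b67d93860e7f0b6e07319f11b14d0fec0)"
    print(needs_workaround(banner))
```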
