Bug #14471

closed

assert(last_e.version.version < e.version.version) failed

Added by huang jun over 8 years ago. Updated about 7 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi, when I restart my OSD, it fails in read_log.
Here is the gdb log:
@(gdb) bt
#0 0x0000003d26c0f5db in raise () from /lib64/libpthread.so.0
#1 0x0000000000a57266 in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:109
#3 <signal handler called>
#4 0x0000003d26832925 in raise () from /lib64/libc.so.6
#5 0x0000003d26834105 in abort () from /lib64/libc.so.6
#6 0x0000003d28cbea5d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#7 0x0000003d28cbcbe6 in ?? () from /usr/lib64/libstdc++.so.6
#8 0x0000003d28cbcc13 in std::terminate() () from /usr/lib64/libstdc++.so.6
#9 0x0000003d28cbcd0e in __cxa_throw () from /usr/lib64/libstdc++.so.6
#10 0x0000000000adb93a in ceph::__ceph_assert_fail (assertion=0x7fbef869a200 "$\225\241V\200sA\022", file=<value optimized out>, line=-129613568, func=0xc89200 "static void PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<const eversion_t, hobject_t> > >"...) at common/assert.cc:77
#11 0x0000000000863371 in PGLog::read_log (store=0x7fbef8489000, pg_coll=..., log_coll=<value optimized out>, log_oid=<value optimized out>, info=..., divergent_priors=std::map with 0 elements, log=..., missing=..., oss=..., log_keys_debug=0x0) at osd/PGLog.cc:889
#12 0x00000000008ac6cc in read_log (this=0x7fbef86db000, store=0x7fbef8489000, bl=<value optimized out>) at osd/PGLog.h:669
#13 PG::read_state (this=0x7fbef86db000, store=0x7fbef8489000, bl=<value optimized out>) at osd/PG.cc:3079
#14 0x0000000000672a38 in OSD::load_pgs (this=0x7fbef85cd000) at osd/OSD.cc:2850
#15 0x0000000000688cce in OSD::init (this=0x7fbef85cd000) at osd/OSD.cc:1893
#16 0x000000000062f6ff in main (argc=<value optimized out>, argv=<value optimized out>) at ceph_osd.cc:523
(gdb) f 11
#11 0x0000000000863371 in PGLog::read_log (store=0x7fbef8489000, pg_coll=..., log_coll=<value optimized out>, log_oid=<value optimized out>, info=..., divergent_priors=std::map with 0 elements, log=..., missing=..., oss=..., log_keys_debug=0x0) at osd/PGLog.cc:889
889 assert(last_e.version.version < e.version.version);
(gdb) l
884 pg_log_entry_t e;
885 e.decode_with_checksum(bp);
886 dout(20) << "read_log " << e << dendl;
887 if (!log.log.empty()) {
888 pg_log_entry_t last_e(log.log.back());
889 assert(last_e.version.version < e.version.version);
890 assert(last_e.version.epoch <= e.version.epoch);
891 }
892 log.log.push_back(e);
893 log.head = e.version;
(gdb) l
894 if (log_keys_debug)
895 log_keys_debug->insert(e.get_key_name());
896 }
897 }
898 }
899 log.head = info.last_update;
900 log.index();
901
902 // build missing
903 if (info.last_complete < info.last_update) {
(gdb) p e.version.version
$1 = 7
(gdb) p last_e.version.version
$2 = 7
(gdb) p last_e.version.epoch
$3 = 4223
(gdb) p e.version.epoch
$4 = 4275
@
I don't know why the last log entry's version is equal to the one read from omap.
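
For anyone reading along, here is a minimal standalone sketch of the ordering check that the two asserts at osd/PGLog.cc:889-890 perform. `EVersion` and `ordering_ok` are hypothetical names made up for illustration and are not Ceph's `eversion_t` or any Ceph API; only the two comparisons are taken from the gdb listing above.

```cpp
#include <cstdint>

// Hypothetical stand-in for Ceph's eversion_t: only the two fields
// inspected in the gdb session are modeled here.
struct EVersion {
    std::uint64_t version;
    std::uint32_t epoch;
};

// Mirrors the two comparisons at PGLog.cc:889-890, but returns false
// instead of aborting so the result can be inspected.
bool ordering_ok(const EVersion& last_e, const EVersion& e) {
    return last_e.version < e.version && last_e.epoch <= e.epoch;
}
```

With the values printed by gdb, `ordering_ok({7, 4223}, {7, 4275})` is false: the epoch advanced from 4223 to 4275 but the version counter stayed at 7, so the first assert fires.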

Actions #1

Updated by huang jun over 8 years ago

And the OSD log:
@-9> 2016-01-22 10:58:10.056257 7fb4d099b800 20 read_log coll 1.7b_head log_oid 7b//head//1
-8> 2016-01-22 10:58:10.056316 7fb4d099b800 20 read_log 3789'1 (0'0) modify 11d8077b/1000000041f.00000000/head//1 by mds.0.6:95207 2016-01-19 18:19:33.512527
-7> 2016-01-22 10:58:10.056338 7fb4d099b800 20 read_log 3789'2 (0'0) modify 6bb2097b/100000008e7.00000000/head//1 by mds.0.6:8485810 2016-01-21 11:06:55.576031
-6> 2016-01-22 10:58:10.056354 7fb4d099b800 20 read_log 3789'3 (0'0) modify 1dfe1a7b/10000000444.00000000/head//1 by mds.0.6:8485840 2016-01-21 11:06:55.577931
-5> 2016-01-22 10:58:10.056374 7fb4d099b800 20 read_log 4164'4 (3789'1) modify 11d8077b/1000000041f.00000000/head//1 by mds.0.7:601523 2016-01-21 18:41:08.754611
-4> 2016-01-22 10:58:10.056390 7fb4d099b800 20 read_log 4164'5 (3789'2) modify 6bb2097b/100000008e7.00000000/head//1 by mds.0.7:601864 2016-01-21 18:41:08.769450
-3> 2016-01-22 10:58:10.056408 7fb4d099b800 20 read_log 4164'6 (3789'3) modify 1dfe1a7b/10000000444.00000000/head//1 by mds.0.7:601894 2016-01-21 18:41:08.770631
-2> 2016-01-22 10:58:10.056424 7fb4d099b800 20 read_log 4223'7 (4164'4) modify 11d8077b/1000000041f.00000000/head//1 by mds.0.8:180409 2016-01-22 09:47:24.217448
-1> 2016-01-22 10:58:10.056439 7fb4d099b800 20 read_log 4275'7 (4223'8) modify 6bb2097b/100000008e7.00000000/head//1 by mds.0.9:6383 2016-01-22 09:53:24.777944@
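
To locate the offending entry in a dump like the one above without stepping through gdb, a small standalone scanner can help. This is only a sketch: `LogVersion`, `parse_version` and `first_violation` are hypothetical helpers written for this comment, not part of Ceph, and it assumes the entry versions have already been extracted as `epoch'version` tokens (for example `4223'7`).

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper type: one "epoch'version" pair as printed by read_log.
struct LogVersion {
    std::uint32_t epoch;
    std::uint64_t version;
};

// True only if s is non-empty and consists solely of decimal digits.
static bool all_digits(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (!std::isdigit(c)) return false;
    }
    return true;
}

// Parses a token such as "4223'7" into out. Returns false if the quote is
// missing, either side is not all digits, or a number is out of range.
bool parse_version(const std::string& token, LogVersion& out) {
    const std::string::size_type quote = token.find('\'');
    if (quote == std::string::npos) return false;
    const std::string epoch = token.substr(0, quote);
    const std::string version = token.substr(quote + 1);
    if (!all_digits(epoch) || !all_digits(version)) return false;
    try {
        const unsigned long long e = std::stoull(epoch);
        const unsigned long long v = std::stoull(version);
        if (e > UINT32_MAX) return false;
        out.epoch = static_cast<std::uint32_t>(e);
        out.version = v;
    } catch (const std::out_of_range&) {
        return false;
    }
    return true;
}

// Returns the index of the first entry that breaks the ordering checked in
// PGLog::read_log (version strictly increasing, epoch non-decreasing),
// or -1 if every adjacent pair is in order.
long first_violation(const std::vector<LogVersion>& log) {
    for (std::size_t i = 1; i < log.size(); ++i) {
        const LogVersion& prev = log[i - 1];
        const LogVersion& cur = log[i];
        if (!(prev.version < cur.version) || !(prev.epoch <= cur.epoch)) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Fed the eight versions from the log above in order (3789'1 through 4275'7), `first_violation` returns 7, the zero-based index of the 4275'7 entry, which repeats version 7 of the 4223'7 entry before it.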

Actions #2

Updated by Sage Weil about 8 years ago

  • Status changed from New to Need More Info
  • Source changed from other to Community (dev)

What version is this? What caused the OSD crash? Are other OSDs failing, or was it able to recover with the other replicas?

Actions #3

Updated by huang jun about 8 years ago

The Ceph version is Hammer v0.94.5.
The OSD crashed with "assert(last_e.version.version < e.version.version) failed",
and some other OSDs crashed for the same reason.
We cannot reproduce it every time,
so the next time it occurs we will set debug_osd=20, debug_filestore=20, debug_ms=1 and post the log.

Actions #4

Updated by Josh Durgin about 7 years ago

  • Status changed from Need More Info to Can't reproduce

Please reopen if you've reproduced it with logs.
