Bug #13010

closed

Failed on starting osd-daemon when upgrade giant-0.87.1 to hammer-0.94.3

Added by tvm tvm over 8 years ago. Updated about 7 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/hammer
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi all,

I got an error when upgrading my Ceph cluster from giant-0.87.1 to hammer-0.94.3. My local environment is:
CentOS 6.7 x86_64
Kernel 3.10.86-1.el6.elrepo.x86_64
HDD: XFS, 2TB
Install Package: ceph.com official RPMs x86_64

step 1:
Upgraded the MON server from 0.87.1 to 0.94.3; all is fine!

step 2:
Upgraded the OSD servers from 0.87.1 to 0.94.3. I upgraded just two servers and noticed that some OSDs cannot start:
server-1 has 4 OSDs, and none of them can start;
server-2 has 3 OSDs; 2 of them cannot start, but 1 started successfully and works well.

the error log 1:
service ceph start osd.4
/var/log/ceph/ceph-osd.24.log
(attachment file: ceph.24.log)

error log 2:
/usr/bin/ceph-osd -c /etc/ceph/ceph.conf -i 4 -f
(attachment file: cli.24.log)

---------------------
There seems to be some kind of data file version error, so how can I repair my OSDs?

Thank you.


Files

ceph.24.log (35.8 KB) tvm tvm, 09/09/2015 11:59 AM
cli.24.log (4.13 KB) tvm tvm, 09/09/2015 11:59 AM
Actions #1

Updated by huang jun over 8 years ago

The assert log:
2015-09-09 12:28:30.146367 7fc36bee4800 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, spg_t, ceph::bufferlist*)' thread 7fc36bee4800 time 2015-09-09 12:28:31.064297
osd/PG.cc: 2864: FAILED assert(values.size() == 1)
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0x803) [0x826d73]
2: (OSD::load_pgs()+0x1506) [0x6697a6]
3: (OSD::init()+0x174e) [0x68a89e]
4: (main()+0x384f) [0x62e2cf]
5: (__libc_start_main()+0xfd) [0x3dbd81ed5d]
6: /usr/bin/ceph-osd() [0x6299d9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-09-09 12:28:31.065935 7fc36bee4800 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, spg_t, ceph::bufferlist*)' thread 7fc36bee4800 time 2015-09-09 12:28:31.064297
osd/PG.cc: 2864: FAILED assert(values.size() == 1)
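
For readers triaging similar OSD logs, the failed assertion and its numbered frames can be pulled out mechanically. The following is only a minimal sketch, not part of Ceph: the function name `summarize_assert` and both regular expressions are hypothetical, written against the line shapes visible in the log above (a `file: line: FAILED assert(...)` line followed by `N: (symbol+0xoffset) [0xaddr]` frames). In the usage lines, the comparison operator inside the assert is written as `==`, which is an assumption about the original source line.

```python
import re

# Hypothetical pattern for lines such as "osd/PG.cc: 2864: FAILED assert(<expr>)".
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)
# Hypothetical pattern for symbolized frames such as
# "2: (OSD::load_pgs()+0x1506) [0x6697a6]".
FRAME_RE = re.compile(
    r"^\s*(?P<idx>\d+): \((?P<sym>.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]\s*$"
)


def summarize_assert(log_text):
    """Return (file, line, expr, symbols) for the first failed assert, or None.

    `symbols` lists the symbolized frames that follow the assert line; the scan
    stops at the first non-frame line once at least one frame has been seen.
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        m = ASSERT_RE.match(line)
        if not m:
            continue
        symbols = []
        for follow in lines[i + 1:]:
            f = FRAME_RE.match(follow)
            if f:
                symbols.append(f.group("sym"))
            elif symbols:
                break  # the run of symbolized frames has ended
        return m.group("file"), int(m.group("line")), m.group("expr"), symbols
    return None


if __name__ == "__main__":
    sample = "\n".join([
        "osd/PG.cc: 2864: FAILED assert(values.size() == 1)",
        "ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)",
        "1: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0x803) [0x826d73]",
        "2: (OSD::load_pgs()+0x1506) [0x6697a6]",
    ])
    print(summarize_assert(sample))
```

Frames without a `symbol+offset` part (such as the bare `/usr/bin/ceph-osd()` entry) are deliberately not collected by this sketch.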

Actions #2

Updated by huang jun over 8 years ago

You can install the ceph-debuginfo package and use gdb to trace the problem if you have a core file.
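
As an illustration of the suggestion above, gdb can also be run non-interactively so that the backtrace is easy to capture and attach to a ticket. This is a hedged sketch: the helper names `gdb_backtrace_argv` and `as_shell` and the core path are hypothetical, while `-batch`, `-ex` and `thread apply all bt` are standard gdb options and commands.

```python
import shlex


def gdb_backtrace_argv(binary, core):
    """Build a gdb command line that prints every thread's backtrace and exits."""
    return ["gdb", "-batch", "-ex", "thread apply all bt", binary, core]


def as_shell(argv):
    """Render an argv list as a single, safely quoted shell command string."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # "/path/to/core" is a placeholder; substitute the real core file.
    print(as_shell(gdb_backtrace_argv("/usr/bin/ceph-osd", "/path/to/core")))
```

The argv list can be passed directly to `subprocess.run` on a host where gdb and the debuginfo package are installed; the string form is only for copy and paste.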

Actions #3

Updated by tvm tvm over 8 years ago

I installed ceph-debuginfo and have a core file:

gdb /usr/bin/ceph-osd /core.66523
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-83.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/ceph-osd...Reading symbols from /usr/lib/debug/usr/bin/ceph-osd.debug...done.
done.
[New Thread 66523]
[New Thread 66524]
[New Thread 66529]
[New Thread 66528]
[New Thread 66541]
[New Thread 66542]
[New Thread 66539]
[New Thread 66544]
[New Thread 66537]
[New Thread 66534]
[New Thread 66533]
[New Thread 66536]
[New Thread 66532]
[New Thread 66530]
[New Thread 66531]
[New Thread 66547]
[New Thread 66543]
[New Thread 66540]
[New Thread 66538]
[New Thread 66546]
[New Thread 66535]
[New Thread 66545]
Missing separate debuginfo for
Try: yum --enablerepo='*-debug*' install /usr/lib/debug/.build-id/1a/9ae5787f45887a8ee6e9724c56d31a8cff1a40
Reading symbols from /usr/lib64/liblttng-ust.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/liblttng-ust.so.0
Reading symbols from /lib64/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libaio.so.1
Reading symbols from /usr/lib64/libleveldb.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libleveldb.so.1
Reading symbols from /usr/lib64/libtcmalloc.so.4...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libtcmalloc.so.4
Reading symbols from /usr/lib64/libnss3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libnss3.so
Reading symbols from /lib64/libnspr4.so...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnspr4.so
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libuuid.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libuuid.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /usr/lib64/libboost_thread-mt.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libboost_thread-mt.so.5
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /usr/lib64/liblttng-ust-tracepoint.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/liblttng-ust-tracepoint.so.0
Reading symbols from /usr/lib64/liburcu-bp.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/liburcu-bp.so.1
Reading symbols from /usr/lib64/liburcu-cds.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/liburcu-cds.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/libsnappy.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libsnappy.so.1
Reading symbols from /usr/lib64/libunwind.so.8...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libunwind.so.8
Reading symbols from /usr/lib64/libnssutil3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libnssutil3.so
Reading symbols from /lib64/libplc4.so...(no debugging symbols found)...done.
Loaded symbols for /lib64/libplc4.so
Reading symbols from /lib64/libplds4.so...(no debugging symbols found)...done.
Loaded symbols for /lib64/libplds4.so
Reading symbols from /usr/lib64/liburcu-common.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/liburcu-common.so.1
Reading symbols from /usr/lib64/libsoftokn3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libsoftokn3.so
Reading symbols from /usr/lib64/libsqlite3.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libsqlite3.so.0
Reading symbols from /usr/lib64/libfreeblpriv3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libfreeblpriv3.so
Reading symbols from /usr/lib64/ceph/erasure-code/libec_jerasure.so.2.0.0...Reading symbols from /usr/lib/debug/usr/lib64/ceph/erasure-code/libec_jerasure.so.2.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/ceph/erasure-code/libec_jerasure.so.2.0.0
Reading symbols from /usr/lib64/libboost_system-mt.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libboost_system-mt.so.5
Reading symbols from /usr/lib64/ceph/erasure-code/libec_jerasure_sse4.so.2.0.0...Reading symbols from /usr/lib/debug/usr/lib64/ceph/erasure-code/libec_jerasure_sse4.so.2.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/ceph/erasure-code/libec_jerasure_sse4.so.2.0.0
Reading symbols from /usr/lib64/ceph/erasure-code/libec_lrc.so.1.0.0...Reading symbols from /usr/lib/debug/usr/lib64/ceph/erasure-code/libec_lrc.so.1.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/ceph/erasure-code/libec_lrc.so.1.0.0
Reading symbols from /usr/lib64/ceph/erasure-code/libec_isa.so.2.0.10...Reading symbols from /usr/lib/debug/usr/lib64/ceph/erasure-code/libec_isa.so.2.0.10.debug...done.
done.
Loaded symbols for /usr/lib64/ceph/erasure-code/libec_isa.so.2.0.10
Reading symbols from /usr/lib64/rados-classes/libcls_user.so.1.0.0...Reading symbols from /usr/lib/debug/usr/lib64/rados-classes/libcls_user.so.1.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/rados-classes/libcls_user.so.1.0.0
Reading symbols from /usr/lib64/rados-classes/libcls_kvs.so...Reading symbols from /usr/lib/debug/usr/lib64/rados-classes/libcls_kvs.so.debug...done.
done.
Loaded symbols for /usr/lib64/rados-classes/libcls_kvs.so
Reading symbols from /usr/lib64/rados-classes/libcls_refcount.so...Reading symbols from /usr/lib/debug/usr/lib64/rados-classes/libcls_refcount.so.debug...done.
done.
Loaded symbols for /usr/lib64/rados-classes/libcls_refcount.so
Reading symbols from /usr/lib64/rados-classes/libcls_log.so...Reading symbols from /usr/lib/debug/usr/lib64/rados-classes/libcls_log.so.debug...done.
done.
Loaded symbols for /usr/lib64/rados-classes/libcls_log.so
Reading symbols from /usr/lib64/rados-classes/libcls_rbd.so...Reading symbols from /usr/lib/debug/usr/lib64/rados-classes/libcls_rbd.so.debug...done.
done.
Loaded symbols for /usr/lib64/rados-classes/libcls_rbd.so
Reading symbols from /usr/lib64/rados-classes/libcls_rgw.so...Reading symbols from /usr/lib/debug/usr/lib64/rados-classes/libcls_rgw.so.debug...done.
done.
Loaded symbols for /usr/lib64/rados-classes/libcls_rgw.so
Reading symbols from /usr/lib64/rados-classes/libcls_statelog.so...Reading symbols from /usr/lib/debug/usr/lib64/rados-classes/libcls_statelog.so.debug...done.
done.
Loaded symbols for /usr/lib64/rados-classes/libcls_statelog.so
Reading symbols from /usr/lib64/rados-classes/libcls_version.so...Reading symbols from /usr/lib/debug/usr/lib64/rados-classes/libcls_version.so.debug...done.
done.
Loaded symbols for /usr/lib64/rados-classes/libcls_version.so
Reading symbols from /usr/lib64/rados-classes/libcls_replica_log.so...Reading symbols from /usr/lib/debug/usr/lib64/rados-classes/libcls_replica_log.so.debug...done.
done.
Loaded symbols for /usr/lib64/rados-classes/libcls_replica_log.so
Reading symbols from /usr/lib64/rados-classes/libcls_hello.so...Reading symbols from /usr/lib/debug/usr/lib64/rados-classes/libcls_hello.so.debug...done.
done.
Loaded symbols for /usr/lib64/rados-classes/libcls_hello.so
Reading symbols from /usr/lib64/rados-classes/libcls_lock.so...Reading symbols from /usr/lib/debug/usr/lib64/rados-classes/libcls_lock.so.debug...done.
done.
Loaded symbols for /usr/lib64/rados-classes/libcls_lock.so

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffcc51dc000
Core was generated by `/usr/bin/ceph-osd -i 4 --pid-file /var/run/ceph/osd.4.pid -c /etc/ceph/ceph.con'.
Program terminated with signal 6, Aborted.
#0 0x000000306540f5db in raise () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install boost-system-1.41.0-25.el6.centos.x86_64 boost-thread-1.41.0-25.el6.centos.x86_64 glibc-2.12-1.149.el6_6.5.x86_64 gperftools-libs-2.0-11.el6.3.x86_64 leveldb-1.7.0-2.el6.x86_64 libaio-0.3.107-10.el6.x86_64 libgcc-4.4.7-11.el6.x86_64 libstdc++-4.4.7-11.el6.x86_64 libunwind-1.1-2.el6.x86_64 libuuid-2.17.2-12.18.el6.x86_64 lttng-ust-2.4.1-1.el6.x86_64 nspr-4.10.6-1.el6_5.x86_64 nss-3.16.2.3-3.el6_6.x86_64 nss-softokn-3.14.3-22.el6_6.x86_64 nss-softokn-freebl-3.14.3-22.el6_6.x86_64 nss-util-3.16.2.3-2.el6_6.x86_64 snappy-1.1.0-1.el6.x86_64 sqlite-3.6.20-1.el6.x86_64 userspace-rcu-0.7.7-1.el6.x86_64

(gdb) backtrace
#0 0x000000306540f5db in raise () from /lib64/libpthread.so.0
#1 0x0000000000a48926 in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:109
#3 <signal handler called>
#4 0x0000003065032625 in raise () from /lib64/libc.so.6
#5 0x0000003065033e05 in abort () from /lib64/libc.so.6
#6 0x00000030668bea7d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#7 0x00000030668bcbd6 in ?? () from /usr/lib64/libstdc++.so.6
#8 0x00000030668bcc03 in std::terminate() () from /usr/lib64/libstdc++.so.6
#9 0x00000030668bcd22 in __cxa_throw () from /usr/lib64/libstdc++.so.6
#10 0x0000000000b1b19a in ceph::__ceph_assert_fail (assertion=0xc0e3f60 "0@\016\f", file=<value optimized out>, line=66060288,
func=0xc9cbe0 "static epoch_t PG::peek_map_epoch(ObjectStore*, spg_t, ceph::bufferlist*)") at common/assert.cc:77
#11 0x0000000000826d73 in PG::peek_map_epoch (store=0x3f38000, pgid=..., bl=<value optimized out>) at osd/PG.cc:2864
#12 0x00000000006697a6 in OSD::load_pgs (this=0x3ea8000) at osd/OSD.cc:2822
#13 0x000000000068a89e in OSD::init (this=0x3ea8000) at osd/OSD.cc:1893
#14 0x000000000062e2cf in main (argc=<value optimized out>, argv=<value optimized out>) at ceph_osd.cc:523
(gdb)
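
In a backtrace like the one above, most frames belong to the signal handler, libc or libstdc++; the interesting ones are those that gdb could map to a source file. The sketch below is an assumption-laden helper rather than any Ceph or gdb tooling: `source_frames` and `first_frame_under` are hypothetical names, and the regular expression only handles frames that fit on a single line and end in `at <file>:<line>`.

```python
import re

# Hypothetical pattern for single-line gdb frames that carry a source location,
# e.g. "#12 0x... in OSD::load_pgs (this=0x3ea8000) at osd/OSD.cc:2822".
SRC_FRAME_RE = re.compile(
    r"^#(?P<idx>\d+)\s+(?:0x[0-9a-f]+ in )?(?P<func>\S+) \(.*\)"
    r" at (?P<file>[^:\s]+):(?P<line>\d+)$"
)


def source_frames(bt_text):
    """Return (index, function, file, line) for each frame with a source location."""
    frames = []
    for raw in bt_text.splitlines():
        m = SRC_FRAME_RE.match(raw.strip())
        if m:
            frames.append(
                (int(m.group("idx")), m.group("func"),
                 m.group("file"), int(m.group("line")))
            )
    return frames


def first_frame_under(frames, prefix):
    """Return the innermost frame whose source file starts with `prefix`, or None."""
    for frame in frames:
        if frame[2].startswith(prefix):
            return frame
    return None


if __name__ == "__main__":
    bt = "\n".join([
        "#4 0x0000003065032625 in raise () from /lib64/libc.so.6",
        "#12 0x00000000006697a6 in OSD::load_pgs (this=0x3ea8000) at osd/OSD.cc:2822",
        "#13 0x000000000068a89e in OSD::init (this=0x3ea8000) at osd/OSD.cc:1893",
    ])
    print(first_frame_under(source_frames(bt), "osd/"))
```

Filtering on the `osd/` prefix skips the signal-handler and assert plumbing and lands on the first frame in the OSD sources.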

Actions #4

Updated by Sage Weil about 7 years ago

  • Status changed from New to Can't reproduce