Bug #13511
closed
core dump during osd start
0%
Description
root@stmon:~# ceph-osd -i 90 -c /etc/ceph/ceph.conf --mkjournal --mkfs -f --debug-osd 20 --debug-ms 1
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2015-10-16 04:03:29.668459 7fe932a05900 -1 journal Unable to read past sequence 2 but header indicates the journal has committed up through 3512, journal is corrupt
os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7fe932a05900 time 2015-10-16 04:03:29.668479
os/FileJournal.cc: 1780: FAILED assert(0)
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc376b]
2: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xae3) [0xa789b3]
3: (JournalingObjectStore::journal_replay(unsigned long)+0x191) [0x940711]
4: (FileStore::mount()+0x3bb6) [0x911786]
5: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0xf0) [0x68c020]
6: (main()+0xa1f) [0x65025f]
7: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
8: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-10-16 04:03:29.673209 7fe932a05900 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7fe932a05900 time 2015-10-16 04:03:29.668479
os/FileJournal.cc: 1780: FAILED assert(0)
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc376b]
2: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xae3) [0xa789b3]
3: (JournalingObjectStore::journal_replay(unsigned long)+0x191) [0x940711]
4: (FileStore::mount()+0x3bb6) [0x911786]
5: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0xf0) [0x68c020]
6: (main()+0xa1f) [0x65025f]
7: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
8: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
-1> 2015-10-16 04:03:29.668459 7fe932a05900 -1 journal Unable to read past sequence 2 but header indicates the journal has committed up through 3512, journal is corrupt
0> 2015-10-16 04:03:29.673209 7fe932a05900 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7fe932a05900 time 2015-10-16 04:03:29.668479
os/FileJournal.cc: 1780: FAILED assert(0)
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc376b]
2: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xae3) [0xa789b3]
3: (JournalingObjectStore::journal_replay(unsigned long)+0x191) [0x940711]
4: (FileStore::mount()+0x3bb6) [0x911786]
5: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0xf0) [0x68c020]
6: (main()+0xa1f) [0x65025f]
7: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
8: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-10-16 04:03:29.674677 7fe932a05900 -1 OSD::mkfs: caught unknown exception.
common/config.cc: In function 'void md_config_t::remove_observer(md_config_obs_t*)' thread 7fe932a05900 time 2015-10-16 04:03:29.674954
common/config.cc: 196: FAILED assert(found_obs)
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc376b]
2: (md_config_t::remove_observer(md_config_obs_t*)+0xd4) [0xbe0094]
3: (ThreadPool::stop(bool)+0x1ce) [0xbb115e]
4: (FileStore::umount()+0x170) [0x8f11e0]
5: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0x6bd) [0x68c5ed]
6: (main()+0xa1f) [0x65025f]
7: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
8: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-10-16 04:03:29.679166 7fe932a05900 -1 common/config.cc: In function 'void md_config_t::remove_observer(md_config_obs_t*)' thread 7fe932a05900 time 2015-10-16 04:03:29.674954
common/config.cc: 196: FAILED assert(found_obs)
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc376b]
2: (md_config_t::remove_observer(md_config_obs_t*)+0xd4) [0xbe0094]
3: (ThreadPool::stop(bool)+0x1ce) [0xbb115e]
4: (FileStore::umount()+0x170) [0x8f11e0]
5: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0x6bd) [0x68c5ed]
6: (main()+0xa1f) [0x65025f]
7: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
8: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
-1> 2015-10-16 04:03:29.674677 7fe932a05900 -1 OSD::mkfs: caught unknown exception.
0> 2015-10-16 04:03:29.679166 7fe932a05900 -1 common/config.cc: In function 'void md_config_t::remove_observer(md_config_obs_t*)' thread 7fe932a05900 time 2015-10-16 04:03:29.674954
common/config.cc: 196: FAILED assert(found_obs)
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc376b]
2: (md_config_t::remove_observer(md_config_obs_t*)+0xd4) [0xbe0094]
3: (ThreadPool::stop(bool)+0x1ce) [0xbb115e]
4: (FileStore::umount()+0x170) [0x8f11e0]
5: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0x6bd) [0x68c5ed]
6: (main()+0xa1f) [0x65025f]
7: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
8: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
in thread 7fe932a05900
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: ceph-osd() [0xacb49a]
2: (()+0x10340) [0x7fe9316b9340]
3: (gsignal()+0x39) [0x7fe92fb58cc9]
4: (abort()+0x148) [0x7fe92fb5c0d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe930463535]
6: (()+0x5e6d6) [0x7fe9304616d6]
7: (()+0x5e703) [0x7fe930461703]
8: (()+0x5e922) [0x7fe930461922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc3958]
10: (md_config_t::remove_observer(md_config_obs_t*)+0xd4) [0xbe0094]
11: (ThreadPool::stop(bool)+0x1ce) [0xbb115e]
12: (FileStore::umount()+0x170) [0x8f11e0]
13: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0x6bd) [0x68c5ed]
14: (main()+0xa1f) [0x65025f]
15: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
16: ceph-osd() [0x66b0d7]
2015-10-16 04:03:29.684739 7fe932a05900 -1 *** Caught signal (Aborted) **
in thread 7fe932a05900
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: ceph-osd() [0xacb49a]
2: (()+0x10340) [0x7fe9316b9340]
3: (gsignal()+0x39) [0x7fe92fb58cc9]
4: (abort()+0x148) [0x7fe92fb5c0d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe930463535]
6: (()+0x5e6d6) [0x7fe9304616d6]
7: (()+0x5e703) [0x7fe930461703]
8: (()+0x5e922) [0x7fe930461922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc3958]
10: (md_config_t::remove_observer(md_config_obs_t*)+0xd4) [0xbe0094]
11: (ThreadPool::stop(bool)+0x1ce) [0xbb115e]
12: (FileStore::umount()+0x170) [0x8f11e0]
13: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0x6bd) [0x68c5ed]
14: (main()+0xa1f) [0x65025f]
15: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
16: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
0> 2015-10-16 04:03:29.684739 7fe932a05900 -1 *** Caught signal (Aborted) **
in thread 7fe932a05900
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: ceph-osd() [0xacb49a]
2: (()+0x10340) [0x7fe9316b9340]
3: (gsignal()+0x39) [0x7fe92fb58cc9]
4: (abort()+0x148) [0x7fe92fb5c0d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe930463535]
6: (()+0x5e6d6) [0x7fe9304616d6]
7: (()+0x5e703) [0x7fe930461703]
8: (()+0x5e922) [0x7fe930461922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc3958]
10: (md_config_t::remove_observer(md_config_obs_t*)+0xd4) [0xbe0094]
11: (ThreadPool::stop(bool)+0x1ce) [0xbb115e]
12: (FileStore::umount()+0x170) [0x8f11e0]
13: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0x6bd) [0x68c5ed]
14: (main()+0xa1f) [0x65025f]
15: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
16: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aborted (core dumped)
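The two SG_IO lines at the top of this output print a raw SCSI sense buffer. As a reading aid, here is a minimal sketch that decodes it, assuming the buffer is fixed-format sense data (response code 0x70 or 0x71) as laid out in the SCSI Primary Commands standard. The function name `decode_fixed_sense` and the abbreviated sense-key table are illustrative only and are not part of Ceph or of any tool.

```python
# Sketch: decode a fixed-format SCSI sense buffer such as the one in the SG_IO lines.
# Assumes fixed-format sense data; descriptor-format buffers (0x72/0x73) are rejected.

# Abbreviated sense-key table (names as given in the SPC standard).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0xB: "ABORTED COMMAND",
}


def decode_fixed_sense(hex_bytes):
    """Return response code, sense key, ASC and ASCQ from a hex dump string."""
    sb = bytes.fromhex(hex_bytes)  # whitespace between byte pairs is skipped
    if len(sb) < 14:
        raise ValueError("sense buffer too short to contain ASC/ASCQ")
    response_code = sb[0] & 0x7F  # the top bit is the VALID flag
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    sense_key = sb[2] & 0x0F  # low nibble of byte 2
    return {
        "response_code": response_code,
        "sense_key": sense_key,
        "sense_key_name": SENSE_KEYS.get(sense_key, "UNKNOWN"),
        "asc": sb[12],   # additional sense code
        "ascq": sb[13],  # additional sense code qualifier
    }


if __name__ == "__main__":
    dump = ("70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 "
            "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00")
    print(decode_fixed_sense(dump))
```

For the buffer above this gives sense key 0x5 (ILLEGAL REQUEST) with ASC/ASCQ 0x20/0x00, which SPC lists as INVALID COMMAND OPERATION CODE. That reads as the device rejecting a command it does not support, not as a media error, so on its own it neither confirms nor rules out the journal corruption reported in the lines that follow.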
Updated by Yuri Weinstein over 8 years ago
A similar failure appears in this run:
http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2015-12-02_20:55:02-rados-hammer-distro-basic-openstack/
Job: 25952
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2015-12-02_20:55:02-rados-hammer-distro-basic-openstack/25952/teuthology.log
2015-12-02T22:41:11.566 INFO:tasks.ceph.osd.2.target070106.stderr:--00:00:09:20.959 5088-- You may be able to write your own handler.
2015-12-02T22:41:11.566 INFO:tasks.ceph.osd.2.target070106.stderr:--00:00:09:20.959 5088-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
2015-12-02T22:41:11.566 INFO:tasks.ceph.osd.2.target070106.stderr:--00:00:09:20.959 5088-- Nevertheless we consider this a bug. Please report
2015-12-02T22:41:11.567 INFO:tasks.ceph.osd.2.target070106.stderr:--00:00:09:20.959 5088-- it at http://valgrind.org/support/bug_reports.html.
2015-12-02T22:41:12.433 INFO:tasks.ceph.osd.0.target070106.stderr:--00:00:19:40.650 31714-- WARNING: unhandled syscall: 306
2015-12-02T22:41:12.433 INFO:tasks.ceph.osd.0.target070106.stderr:--00:00:19:40.650 31714-- You may be able to write your own handler.
2015-12-02T22:41:12.434 INFO:tasks.ceph.osd.0.target070106.stderr:--00:00:19:40.650 31714-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
2015-12-02T22:41:12.434 INFO:tasks.ceph.osd.0.target070106.stderr:--00:00:19:40.650 31714-- Nevertheless we consider this a bug. Please report
2015-12-02T22:41:12.434 INFO:tasks.ceph.osd.0.target070106.stderr:--00:00:19:40.650 31714-- it at http://valgrind.org/support/bug_reports.html.
2015-12-02T22:41:12.976 INFO:tasks.ceph.osd.0.target070106.stderr:2015-12-02 22:41:12.851128 2d432700 -1 osd.0 1366 pgid 114.0s0 has ref count of 2
2015-12-02T22:41:13.018 INFO:tasks.ceph.osd.0.target070106.stderr:osd/OSD.cc: In function 'int OSD::shutdown()' thread 2d432700 time 2015-12-02 22:41:12.854339
2015-12-02T22:41:13.019 INFO:tasks.ceph.osd.0.target070106.stderr:osd/OSD.cc: 2401: FAILED assert(0)
2015-12-02T22:41:13.069 INFO:tasks.ceph.osd.1.target070106.stderr:--00:00:04:28.139 7450-- WARNING: unhandled syscall: 306
2015-12-02T22:41:13.069 INFO:tasks.ceph.osd.1.target070106.stderr:--00:00:04:28.143 7450-- You may be able to write your own handler.
2015-12-02T22:41:13.069 INFO:tasks.ceph.osd.1.target070106.stderr:--00:00:04:28.143 7450-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
2015-12-02T22:41:13.070 INFO:tasks.ceph.osd.1.target070106.stderr:--00:00:04:28.143 7450-- Nevertheless we consider this a bug. Please report
2015-12-02T22:41:13.070 INFO:tasks.ceph.osd.1.target070106.stderr:--00:00:04:28.143 7450-- it at http://valgrind.org/support/bug_reports.html.
2015-12-02T22:41:13.076 INFO:tasks.ceph.osd.0.target070106.stderr: ceph version 0.94.5-163-g8c4145e (8c4145ecc4a68accdb2120889fd933e8f6630dba)
2015-12-02T22:41:13.076 INFO:tasks.ceph.osd.0.target070106.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc527b]
2015-12-02T22:41:13.077 INFO:tasks.ceph.osd.0.target070106.stderr: 2: (OSD::shutdown()+0x169c) [0x68966c]
2015-12-02T22:41:13.077 INFO:tasks.ceph.osd.0.target070106.stderr: 3: (OSD::handle_signal(int)+0x60) [0x689e90]
2015-12-02T22:41:13.077 INFO:tasks.ceph.osd.0.target070106.stderr: 4: (SignalHandler::entry()+0x117) [0xacd607]
2015-12-02T22:41:13.077 INFO:tasks.ceph.osd.0.target070106.stderr: 5: (()+0x8182) [0x5d72182]
2015-12-02T22:41:13.078 INFO:tasks.ceph.osd.0.target070106.stderr: 6: (clone()+0x6d) [0x784447d]
2015-12-02T22:41:13.078 INFO:tasks.ceph.osd.0.target070106.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-12-02T22:41:13.079 INFO:tasks.ceph.osd.0.target070106.stderr:2015-12-02 22:41:12.954352 2d432700 -1 osd/OSD.cc: In function 'int OSD::shutdown()' thread 2d432700 time 2015-12-02 22:41:12.854339
Updated by Samuel Just over 8 years ago
- Related to Bug #13989: OSD boot fails with os/FileJournal.cc: 1907: FAILED assert(0) added
Updated by Samuel Just over 8 years ago
- Related to deleted (Bug #13989: OSD boot fails with os/FileJournal.cc: 1907: FAILED assert(0))
Updated by Samuel Just over 8 years ago
- Has duplicate Bug #13989: OSD boot fails with os/FileJournal.cc: 1907: FAILED assert(0) added
Updated by Yuri Weinstein about 8 years ago
Seems similar in ceph version 0.94.5
Run: http://pulpito.ceph.com/teuthology-2016-01-16_09:00:08-rados-hammer-distro-basic-mira/
Job: http://qa-proxy.ceph.com/teuthology/teuthology-2016-01-16_09:00:08-rados-hammer-distro-basic-mira/31430/teuthology.log
2016-01-16T16:54:01.554 INFO:tasks.ceph.osd.2.mira117.stderr: 0> 2016-01-16 19:54:01.425482 7fa72d56d700 -1 os/FileJournal.cc: In function 'void FileJournal::do_write(ceph::bufferlist&)' thread 7fa72d56d700 time 2016-01-16 19:54:01.390011
2016-01-16T16:54:01.554 INFO:tasks.ceph.osd.2.mira117.stderr:os/FileJournal.cc: 1075: FAILED assert(0)
2016-01-16T16:54:01.554 INFO:tasks.ceph.osd.2.mira117.stderr:
2016-01-16T16:54:01.555 INFO:tasks.ceph.osd.2.mira117.stderr: ceph version 0.94.5-221-g4e67418 (4e67418958e5caf5e4f81c4ed566e8c7269930fa)
2016-01-16T16:54:01.555 INFO:tasks.ceph.osd.2.mira117.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xba856b]
2016-01-16T16:54:01.555 INFO:tasks.ceph.osd.2.mira117.stderr: 2: (FileJournal::do_write(ceph::buffer::list&)+0x931) [0xa62441]
2016-01-16T16:54:01.555 INFO:tasks.ceph.osd.2.mira117.stderr: 3: (FileJournal::write_thread_entry()+0x69b) [0xa663bb]
2016-01-16T16:54:01.556 INFO:tasks.ceph.osd.2.mira117.stderr: 4: (FileJournal::Writer::entry()+0xd) [0x91182d]
2016-01-16T16:54:01.556 INFO:tasks.ceph.osd.2.mira117.stderr: 5: (()+0x8182) [0x7fa738909182]
2016-01-16T16:54:01.556 INFO:tasks.ceph.osd.2.mira117.stderr: 6: (clone()+0x6d) [0x7fa736e7447d]
2016-01-16T16:54:01.556 INFO:tasks.ceph.osd.2.mira117.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Yuri Weinstein about 8 years ago
Yuri Weinstein wrote:
Seems similar in ceph version 0.94.5
Run: http://pulpito.ceph.com/teuthology-2016-01-16_09:00:08-rados-hammer-distro-basic-mira/
Job: http://qa-proxy.ceph.com/teuthology/teuthology-2016-01-16_09:00:08-rados-hammer-distro-basic-mira/31430/teuthology.log[...]
The errors quoted above are unrelated to this bug; they were caused by bad disks on the nodes.
Updated by Loïc Dachary about 8 years ago
- Status changed from New to Rejected
The root cause of these failures is a bad disk.
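The logs in this thread hit several different assert sites (os/FileJournal.cc lines 1780 and 1075, osd/OSD.cc line 2401, common/config.cc line 196), which makes it hard to tell at a glance which reports share a failure. A minimal triage sketch, assuming only the `FAILED assert(...)` line format visible in the logs above; the helper name `assert_sites` is hypothetical and is not part of Ceph or teuthology:

```python
# Sketch: tally distinct "FAILED assert" sites in pasted ceph-osd or teuthology log text.
import re
from collections import Counter

# Matches e.g. "os/FileJournal.cc: 1780: FAILED assert(0)", with or without a
# teuthology "...stderr:" prefix in front of the file name.
ASSERT_RE = re.compile(
    r"(?P<file>[\w./-]+\.(?:cc|h)):\s*(?P<line>\d+):\s*FAILED assert\((?P<expr>.*?)\)"
)


def assert_sites(log_text):
    """Count occurrences of each (file, line, expression) assert site."""
    sites = Counter()
    for m in ASSERT_RE.finditer(log_text):
        sites[(m["file"], int(m["line"]), m["expr"])] += 1
    return sites


if __name__ == "__main__":
    sample = (
        "os/FileJournal.cc: 1780: FAILED assert(0)\n"
        "common/config.cc: 196: FAILED assert(found_obs)\n"
        "2016-01-16T16:54:01.554 INFO:tasks.ceph.osd.2.mira117.stderr:"
        "os/FileJournal.cc: 1075: FAILED assert(0)\n"
    )
    for (path, line, expr), count in assert_sites(sample).most_common():
        print(f"{count}x {path}:{line} assert({expr})")
```

Two logs that share an assert site are only candidates for the same underlying failure; as the comments above show, a bad disk can trip unrelated asserts, so the tally is a starting point, not a diagnosis.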