Bug #13511

core dump during osd start

Added by Vaidyanath Manogaran over 8 years ago. Updated about 8 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

root@stmon:~# ceph-osd -i 90 -c /etc/ceph/ceph.conf --mkjournal --mkfs -f --debug-osd 20 --debug-ms 1
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2015-10-16 04:03:29.668459 7fe932a05900 -1 journal Unable to read past sequence 2 but header indicates the journal has committed up through 3512, journal is corrupt
os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7fe932a05900 time 2015-10-16 04:03:29.668479
os/FileJournal.cc: 1780: FAILED assert(0)
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc376b]
2: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xae3) [0xa789b3]
3: (JournalingObjectStore::journal_replay(unsigned long)+0x191) [0x940711]
4: (FileStore::mount()+0x3bb6) [0x911786]
5: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0xf0) [0x68c020]
6: (main()+0xa1f) [0x65025f]
7: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
8: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-10-16 04:03:29.673209 7fe932a05900 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7fe932a05900 time 2015-10-16 04:03:29.668479
os/FileJournal.cc: 1780: FAILED assert(0)

ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc376b]
2: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xae3) [0xa789b3]
3: (JournalingObjectStore::journal_replay(unsigned long)+0x191) [0x940711]
4: (FileStore::mount()+0x3bb6) [0x911786]
5: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0xf0) [0x68c020]
6: (main()+0xa1f) [0x65025f]
7: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
8: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
-1> 2015-10-16 04:03:29.668459 7fe932a05900 -1 journal Unable to read past sequence 2 but header indicates the journal has committed up through 3512, journal is corrupt
0> 2015-10-16 04:03:29.673209 7fe932a05900 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7fe932a05900 time 2015-10-16 04:03:29.668479
os/FileJournal.cc: 1780: FAILED assert(0)
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc376b]
2: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xae3) [0xa789b3]
3: (JournalingObjectStore::journal_replay(unsigned long)+0x191) [0x940711]
4: (FileStore::mount()+0x3bb6) [0x911786]
5: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0xf0) [0x68c020]
6: (main()+0xa1f) [0x65025f]
7: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
8: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

2015-10-16 04:03:29.674677 7fe932a05900 -1 OSD::mkfs: caught unknown exception.
common/config.cc: In function 'void md_config_t::remove_observer(md_config_obs_t*)' thread 7fe932a05900 time 2015-10-16 04:03:29.674954
common/config.cc: 196: FAILED assert(found_obs)
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc376b]
2: (md_config_t::remove_observer(md_config_obs_t*)+0xd4) [0xbe0094]
3: (ThreadPool::stop(bool)+0x1ce) [0xbb115e]
4: (FileStore::umount()+0x170) [0x8f11e0]
5: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0x6bd) [0x68c5ed]
6: (main()+0xa1f) [0x65025f]
7: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
8: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-10-16 04:03:29.679166 7fe932a05900 -1 common/config.cc: In function 'void md_config_t::remove_observer(md_config_obs_t*)' thread 7fe932a05900 time 2015-10-16 04:03:29.674954
common/config.cc: 196: FAILED assert(found_obs)

ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc376b]
2: (md_config_t::remove_observer(md_config_obs_t*)+0xd4) [0xbe0094]
3: (ThreadPool::stop(bool)+0x1ce) [0xbb115e]
4: (FileStore::umount()+0x170) [0x8f11e0]
5: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0x6bd) [0x68c5ed]
6: (main()+0xa1f) [0x65025f]
7: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
8: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
-1> 2015-10-16 04:03:29.674677 7fe932a05900 -1 OSD::mkfs: caught unknown exception.
0> 2015-10-16 04:03:29.679166 7fe932a05900 -1 common/config.cc: In function 'void md_config_t::remove_observer(md_config_obs_t*)' thread 7fe932a05900 time 2015-10-16 04:03:29.674954
common/config.cc: 196: FAILED assert(found_obs)
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc376b]
2: (md_config_t::remove_observer(md_config_obs_t*)+0xd4) [0xbe0094]
3: (ThreadPool::stop(bool)+0x1ce) [0xbb115e]
4: (FileStore::umount()+0x170) [0x8f11e0]
5: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0x6bd) [0x68c5ed]
6: (main()+0xa1f) [0x65025f]
7: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
8: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
in thread 7fe932a05900
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: ceph-osd() [0xacb49a]
2: (()+0x10340) [0x7fe9316b9340]
3: (gsignal()+0x39) [0x7fe92fb58cc9]
4: (abort()+0x148) [0x7fe92fb5c0d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe930463535]
6: (()+0x5e6d6) [0x7fe9304616d6]
7: (()+0x5e703) [0x7fe930461703]
8: (()+0x5e922) [0x7fe930461922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc3958]
10: (md_config_t::remove_observer(md_config_obs_t*)+0xd4) [0xbe0094]
11: (ThreadPool::stop(bool)+0x1ce) [0xbb115e]
12: (FileStore::umount()+0x170) [0x8f11e0]
13: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0x6bd) [0x68c5ed]
14: (main()+0xa1f) [0x65025f]
15: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
16: ceph-osd() [0x66b0d7]
2015-10-16 04:03:29.684739 7fe932a05900 -1 *** Caught signal (Aborted) **
in thread 7fe932a05900
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: ceph-osd() [0xacb49a]
2: (()+0x10340) [0x7fe9316b9340]
3: (gsignal()+0x39) [0x7fe92fb58cc9]
4: (abort()+0x148) [0x7fe92fb5c0d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe930463535]
6: (()+0x5e6d6) [0x7fe9304616d6]
7: (()+0x5e703) [0x7fe930461703]
8: (()+0x5e922) [0x7fe930461922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc3958]
10: (md_config_t::remove_observer(md_config_obs_t*)+0xd4) [0xbe0094]
11: (ThreadPool::stop(bool)+0x1ce) [0xbb115e]
12: (FileStore::umount()+0x170) [0x8f11e0]
13: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0x6bd) [0x68c5ed]
14: (main()+0xa1f) [0x65025f]
15: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
16: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
0> 2015-10-16 04:03:29.684739 7fe932a05900 -1 *** Caught signal (Aborted) **
in thread 7fe932a05900
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: ceph-osd() [0xacb49a]
2: (()+0x10340) [0x7fe9316b9340]
3: (gsignal()+0x39) [0x7fe92fb58cc9]
4: (abort()+0x148) [0x7fe92fb5c0d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe930463535]
6: (()+0x5e6d6) [0x7fe9304616d6]
7: (()+0x5e703) [0x7fe930461703]
8: (()+0x5e922) [0x7fe930461922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc3958]
10: (md_config_t::remove_observer(md_config_obs_t*)+0xd4) [0xbe0094]
11: (ThreadPool::stop(bool)+0x1ce) [0xbb115e]
12: (FileStore::umount()+0x170) [0x8f11e0]
13: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0x6bd) [0x68c5ed]
14: (main()+0xa1f) [0x65025f]
15: (__libc_start_main()+0xf5) [0x7fe92fb43ec5]
16: ceph-osd() [0x66b0d7]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Aborted (core dumped)


Related issues

Duplicated by Ceph - Bug #13989: OSD boot fails with os/FileJournal.cc: 1907: FAILED assert(0) Duplicate 12/05/2015

History

#1 Updated by Yuri Weinstein over 8 years ago

See a similar failure in this run:
http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2015-12-02_20:55:02-rados-hammer-distro-basic-openstack/
Job: 25952
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2015-12-02_20:55:02-rados-hammer-distro-basic-openstack/25952/teuthology.log

2015-12-02T22:41:11.566 INFO:tasks.ceph.osd.2.target070106.stderr:--00:00:09:20.959 5088-- You may be able to write your own handler.
2015-12-02T22:41:11.566 INFO:tasks.ceph.osd.2.target070106.stderr:--00:00:09:20.959 5088-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
2015-12-02T22:41:11.566 INFO:tasks.ceph.osd.2.target070106.stderr:--00:00:09:20.959 5088-- Nevertheless we consider this a bug.  Please report
2015-12-02T22:41:11.567 INFO:tasks.ceph.osd.2.target070106.stderr:--00:00:09:20.959 5088-- it at http://valgrind.org/support/bug_reports.html.
2015-12-02T22:41:12.433 INFO:tasks.ceph.osd.0.target070106.stderr:--00:00:19:40.650 31714-- WARNING: unhandled syscall: 306
2015-12-02T22:41:12.433 INFO:tasks.ceph.osd.0.target070106.stderr:--00:00:19:40.650 31714-- You may be able to write your own handler.
2015-12-02T22:41:12.434 INFO:tasks.ceph.osd.0.target070106.stderr:--00:00:19:40.650 31714-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
2015-12-02T22:41:12.434 INFO:tasks.ceph.osd.0.target070106.stderr:--00:00:19:40.650 31714-- Nevertheless we consider this a bug.  Please report
2015-12-02T22:41:12.434 INFO:tasks.ceph.osd.0.target070106.stderr:--00:00:19:40.650 31714-- it at http://valgrind.org/support/bug_reports.html.
2015-12-02T22:41:12.976 INFO:tasks.ceph.osd.0.target070106.stderr:2015-12-02 22:41:12.851128 2d432700 -1 osd.0 1366 pgid 114.0s0 has ref count of 2
2015-12-02T22:41:13.018 INFO:tasks.ceph.osd.0.target070106.stderr:osd/OSD.cc: In function 'int OSD::shutdown()' thread 2d432700 time 2015-12-02 22:41:12.854339
2015-12-02T22:41:13.019 INFO:tasks.ceph.osd.0.target070106.stderr:osd/OSD.cc: 2401: FAILED assert(0)
2015-12-02T22:41:13.069 INFO:tasks.ceph.osd.1.target070106.stderr:--00:00:04:28.139 7450-- WARNING: unhandled syscall: 306
2015-12-02T22:41:13.069 INFO:tasks.ceph.osd.1.target070106.stderr:--00:00:04:28.143 7450-- You may be able to write your own handler.
2015-12-02T22:41:13.069 INFO:tasks.ceph.osd.1.target070106.stderr:--00:00:04:28.143 7450-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
2015-12-02T22:41:13.070 INFO:tasks.ceph.osd.1.target070106.stderr:--00:00:04:28.143 7450-- Nevertheless we consider this a bug.  Please report
2015-12-02T22:41:13.070 INFO:tasks.ceph.osd.1.target070106.stderr:--00:00:04:28.143 7450-- it at http://valgrind.org/support/bug_reports.html.
2015-12-02T22:41:13.076 INFO:tasks.ceph.osd.0.target070106.stderr: ceph version 0.94.5-163-g8c4145e (8c4145ecc4a68accdb2120889fd933e8f6630dba)
2015-12-02T22:41:13.076 INFO:tasks.ceph.osd.0.target070106.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc527b]
2015-12-02T22:41:13.077 INFO:tasks.ceph.osd.0.target070106.stderr: 2: (OSD::shutdown()+0x169c) [0x68966c]
2015-12-02T22:41:13.077 INFO:tasks.ceph.osd.0.target070106.stderr: 3: (OSD::handle_signal(int)+0x60) [0x689e90]
2015-12-02T22:41:13.077 INFO:tasks.ceph.osd.0.target070106.stderr: 4: (SignalHandler::entry()+0x117) [0xacd607]
2015-12-02T22:41:13.077 INFO:tasks.ceph.osd.0.target070106.stderr: 5: (()+0x8182) [0x5d72182]
2015-12-02T22:41:13.078 INFO:tasks.ceph.osd.0.target070106.stderr: 6: (clone()+0x6d) [0x784447d]
2015-12-02T22:41:13.078 INFO:tasks.ceph.osd.0.target070106.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-12-02T22:41:13.079 INFO:tasks.ceph.osd.0.target070106.stderr:2015-12-02 22:41:12.954352 2d432700 -1 osd/OSD.cc: In function 'int OSD::shutdown()' thread 2d432700 time 2015-12-02 22:41:12.854339

#2 Updated by Yuri Weinstein over 8 years ago

  • ceph-qa-suite rados added

#3 Updated by Samuel Just over 8 years ago

  • Related to Bug #13989: OSD boot fails with os/FileJournal.cc: 1907: FAILED assert(0) added

#4 Updated by Samuel Just over 8 years ago

  • Related to deleted (Bug #13989: OSD boot fails with os/FileJournal.cc: 1907: FAILED assert(0))

#5 Updated by Samuel Just over 8 years ago

  • Duplicated by Bug #13989: OSD boot fails with os/FileJournal.cc: 1907: FAILED assert(0) added

#6 Updated by Yuri Weinstein about 8 years ago

Seems similar in ceph version 0.94.5
Run: http://pulpito.ceph.com/teuthology-2016-01-16_09:00:08-rados-hammer-distro-basic-mira/
Job: http://qa-proxy.ceph.com/teuthology/teuthology-2016-01-16_09:00:08-rados-hammer-distro-basic-mira/31430/teuthology.log

2016-01-16T16:54:01.554 INFO:tasks.ceph.osd.2.mira117.stderr:     0> 2016-01-16 19:54:01.425482 7fa72d56d700 -1 os/FileJournal.cc: In function 'void FileJournal::do_write(ceph::bufferlist&)' thread 7fa72d56d700 time 2016-01-16 19:54:01.390011
2016-01-16T16:54:01.554 INFO:tasks.ceph.osd.2.mira117.stderr:os/FileJournal.cc: 1075: FAILED assert(0)
2016-01-16T16:54:01.554 INFO:tasks.ceph.osd.2.mira117.stderr:
2016-01-16T16:54:01.555 INFO:tasks.ceph.osd.2.mira117.stderr: ceph version 0.94.5-221-g4e67418 (4e67418958e5caf5e4f81c4ed566e8c7269930fa)
2016-01-16T16:54:01.555 INFO:tasks.ceph.osd.2.mira117.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xba856b]
2016-01-16T16:54:01.555 INFO:tasks.ceph.osd.2.mira117.stderr: 2: (FileJournal::do_write(ceph::buffer::list&)+0x931) [0xa62441]
2016-01-16T16:54:01.555 INFO:tasks.ceph.osd.2.mira117.stderr: 3: (FileJournal::write_thread_entry()+0x69b) [0xa663bb]
2016-01-16T16:54:01.556 INFO:tasks.ceph.osd.2.mira117.stderr: 4: (FileJournal::Writer::entry()+0xd) [0x91182d]
2016-01-16T16:54:01.556 INFO:tasks.ceph.osd.2.mira117.stderr: 5: (()+0x8182) [0x7fa738909182]
2016-01-16T16:54:01.556 INFO:tasks.ceph.osd.2.mira117.stderr: 6: (clone()+0x6d) [0x7fa736e7447d]
2016-01-16T16:54:01.556 INFO:tasks.ceph.osd.2.mira117.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#7 Updated by Yuri Weinstein about 8 years ago

Yuri Weinstein wrote:

Seems similar in ceph version 0.94.5
Run: http://pulpito.ceph.com/teuthology-2016-01-16_09:00:08-rados-hammer-distro-basic-mira/
Job: http://qa-proxy.ceph.com/teuthology/teuthology-2016-01-16_09:00:08-rados-hammer-distro-basic-mira/31430/teuthology.log

[...]

The errors quoted above are unrelated; they are caused by bad disks on the nodes.

#8 Updated by Loïc Dachary about 8 years ago

  • Status changed from New to Rejected

The root cause of these failures is a bad disk.
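
For anyone triaging similar logs, here is a minimal sketch (not part of Ceph or teuthology) that counts the distinct `FAILED assert` sites in a pasted log. This makes it easier to see which failures share a source location before deciding they have a common cause. The function name `summarize_asserts` and the sample text are illustrative assumptions. The regular expression only targets the `file.cc: LINE: FAILED assert(expr)` shape seen in this report, with or without a teuthology prefix.

```python
import re
from collections import Counter

# Matches lines such as "os/FileJournal.cc: 1780: FAILED assert(0)",
# optionally preceded by a teuthology prefix ending in "stderr:".
ASSERT_RE = re.compile(
    r"(?P<file>[\w/.]+\.cc): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)

def summarize_asserts(log_text):
    """Count each distinct (file, line, expression) assertion site in log_text."""
    counts = Counter()
    for raw in log_text.splitlines():
        m = ASSERT_RE.search(raw)
        if m:
            key = (m.group("file"), int(m.group("line")), m.group("expr"))
            counts[key] += 1
    return counts

# Hypothetical sample built from lines that appear in this report.
sample = """\
os/FileJournal.cc: 1780: FAILED assert(0)
common/config.cc: 196: FAILED assert(found_obs)
os/FileJournal.cc: 1780: FAILED assert(0)
2016-01-16T16:54:01.554 INFO:tasks.ceph.osd.2.mira117.stderr:os/FileJournal.cc: 1075: FAILED assert(0)
"""

if __name__ == "__main__":
    for (path, line, expr), n in summarize_asserts(sample).items():
        print(f"{path}:{line} assert({expr}) x{n}")
```

Grouping by site only shows that the failures cluster in the journal code. It does not by itself say whether the cause is a software bug or, as concluded here, failing hardware.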
