Bug #17278 (closed): Fail to start OSD server with BlueStore

Added by Star Guo over 7 years ago. Updated about 7 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a server with 3 OSDs for testing BlueStore as the object store backend (in place of FileStore). Starting osd.2 fails with the following error log:

-4> 2016-09-15 23:27:19.272148 7f569acff700  5 -- op tracker -- seq: 1924, time: 2016-09-15 23:27:19.272148, event: done, op: pg_info(1 pgs e310:12.3a)
-3> 2016-09-15 23:27:19.287101 7f56a25d9700  5 -- op tracker -- seq: 1925, time: 2016-09-15 23:27:19.287101, event: queued_for_pg, op: MOSDPGPull(12.57 310 [PullOp(12:eb0aab19:::rbd_object_map.3c17238e1f29:head, recovery_info: ObjectRecoveryInfo(12:eb0aab19:::rbd_object_map.3c17238e1f29:head@172'332, size: 18446744073709551615, copy_subset: [0~18446744073709551615], clone_subset: {}), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])
-2> 2016-09-15 23:27:19.287171 7f568cc6c700  5 -- op tracker -- seq: 1925, time: 2016-09-15 23:27:19.287170, event: reached_pg, op: MOSDPGPull(12.57 310 [PullOp(12:eb0aab19:::rbd_object_map.3c17238e1f29:head, recovery_info: ObjectRecoveryInfo(12:eb0aab19:::rbd_object_map.3c17238e1f29:head@172'332, size: 18446744073709551615, copy_subset: [0~18446744073709551615], clone_subset: {}), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])
-1> 2016-09-15 23:27:19.288206 7f568cc6c700 -1 bluestore(/var/lib/ceph/osd/ceph-2) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xadb2ea3d, expected 0xa54bc593, device location [0x100cdd000~1000], object #12:eb0aab19:::rbd_object_map.3c17238e1f29:head#
0> 2016-09-15 23:27:19.292696 7f568cc6c700 -1 /root/rpmbuild/BUILD/ceph-11.0.0-2316-g371eb41/src/os/bluestore/BlueStore.cc: In function 'virtual int BlueStore::read(ObjectStore::CollectionHandle&, const ghobject_t&, uint64_t, size_t, ceph::bufferlist&, uint32_t, bool)' thread 7f568cc6c700 time 2016-09-15 23:27:19.288253
/root/rpmbuild/BUILD/ceph-11.0.0-2316-g371eb41/src/os/bluestore/BlueStore.cc: 4570: FAILED assert(allow_eio || r != -5)
ceph version HEAD-HASH-NOTFOUND (GITDIR-NOTFOUND)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f56abb1f055]
2: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int, bool)+0x4f9) [0x7f56ab898fd9]
3: (ReplicatedBackend::build_push_op(ObjectRecoveryInfo const&, ObjectRecoveryProgress const&, ObjectRecoveryProgress*, PushOp*, object_stat_sum_t*, bool)+0x26b) [0x7f56ab75ae6b]
4: (ReplicatedBackend::handle_pull(pg_shard_t, PullOp&, PushOp*)+0xd2) [0x7f56ab75c3a2]
5: (ReplicatedBackend::do_pull(std::shared_ptr<OpRequest>)+0x1cc) [0x7f56ab75e2dc]
6: (ReplicatedBackend::handle_message(std::shared_ptr<OpRequest>)+0x363) [0x7f56ab764c13]
7: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x10d) [0x7f56ab62ad2d]
8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x41d) [0x7f56ab4d970d]
9: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest> const&)+0x6d) [0x7f56ab4d995d]
10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x86c) [0x7f56ab4fb39c]
11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947) [0x7f56abb24cc7]
12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f56abb26e20]
13: (()+0x7dc5) [0x7f56a7da6dc5]
14: (clone()+0x6d) [0x7f56a6c8dced]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 0 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 kinetic
1/ 5 fuse
2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---
2016-09-15 23:27:19.316059 7f568cc6c700 -1 *** Caught signal (Aborted) **
in thread 7f568cc6c700 thread_name:tp_osd_tp

ceph version HEAD-HASH-NOTFOUND (GITDIR-NOTFOUND)
1: (()+0x89232a) [0x7f56ab99232a]
2: (()+0xf100) [0x7f56a7dae100]
3: (gsignal()+0x37) [0x7f56a6bcc5f7]
4: (abort()+0x148) [0x7f56a6bcdce8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7f56abb1f237]
6: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int, bool)+0x4f9) [0x7f56ab898fd9]
7: (ReplicatedBackend::build_push_op(ObjectRecoveryInfo const&, ObjectRecoveryProgress const&, ObjectRecoveryProgress*, PushOp*, object_stat_sum_t*, bool)+0x26b) [0x7f56ab75ae6b]
8: (ReplicatedBackend::handle_pull(pg_shard_t, PullOp&, PushOp*)+0xd2) [0x7f56ab75c3a2]
9: (ReplicatedBackend::do_pull(std::shared_ptr<OpRequest>)+0x1cc) [0x7f56ab75e2dc]
10: (ReplicatedBackend::handle_message(std::shared_ptr<OpRequest>)+0x363) [0x7f56ab764c13]
11: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x10d) [0x7f56ab62ad2d]
12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x41d) [0x7f56ab4d970d]
13: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest> const&)+0x6d) [0x7f56ab4d995d]
14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x86c) [0x7f56ab4fb39c]
15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947) [0x7f56abb24cc7]
16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f56abb26e20]
17: (()+0x7dc5) [0x7f56a7da6dc5]
18: (clone()+0x6d) [0x7f56a6c8dced]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
0> 2016-09-15 23:27:19.316059 7f568cc6c700 -1 *** Caught signal (Aborted) **
in thread 7f568cc6c700 thread_name:tp_osd_tp

ceph version HEAD-HASH-NOTFOUND (GITDIR-NOTFOUND)
1: (()+0x89232a) [0x7f56ab99232a]
2: (()+0xf100) [0x7f56a7dae100]
3: (gsignal()+0x37) [0x7f56a6bcc5f7]
4: (abort()+0x148) [0x7f56a6bcdce8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7f56abb1f237]
6: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int, bool)+0x4f9) [0x7f56ab898fd9]
7: (ReplicatedBackend::build_push_op(ObjectRecoveryInfo const&, ObjectRecoveryProgress const&, ObjectRecoveryProgress*, PushOp*, object_stat_sum_t*, bool)+0x26b) [0x7f56ab75ae6b]
8: (ReplicatedBackend::handle_pull(pg_shard_t, PullOp&, PushOp*)+0xd2) [0x7f56ab75c3a2]
9: (ReplicatedBackend::do_pull(std::shared_ptr<OpRequest>)+0x1cc) [0x7f56ab75e2dc]
10: (ReplicatedBackend::handle_message(std::shared_ptr<OpRequest>)+0x363) [0x7f56ab764c13]
11: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x10d) [0x7f56ab62ad2d]
12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x41d) [0x7f56ab4d970d]
13: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest> const&)+0x6d) [0x7f56ab4d995d]
14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x86c) [0x7f56ab4fb39c]
15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947) [0x7f56abb24cc7]
16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f56abb26e20]
17: (()+0x7dc5) [0x7f56a7da6dc5]
18: (clone()+0x6d) [0x7f56a6c8dced]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 0 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 kinetic
1/ 5 fuse
2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---
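The crash reduces to BlueStore's read-time checksum verification: `_verify_csum` recomputed the crc32c of a 4 KiB (0x1000) blob, got 0xadb2ea3d instead of the stored 0xa54bc593, returned -EIO (-5), and `BlueStore::read` then hit `FAILED assert(allow_eio || r != -5)`. A minimal Python sketch of that verify-then-assert pattern (hypothetical names and simplified logic, not Ceph's actual code):

```python
# Sketch (assumed simplification, not Ceph source) of the pattern behind the
# crash: a per-blob crc32c is recomputed on read; a mismatch yields -EIO (-5),
# which the caller treats as fatal unless EIO is explicitly allowed.

def crc32c(data: bytes, crc: int = 0) -> int:
    """Software CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1))
    return crc ^ 0xFFFFFFFF

EIO = 5

def verify_and_read(blob: bytes, stored_csum: int, allow_eio: bool = False) -> int:
    """Return 0 on a clean read, -EIO on a checksum mismatch."""
    got = crc32c(blob)
    if got != stored_csum:
        print(f"_verify_csum: bad crc32c, got {got:#010x}, "
              f"expected {stored_csum:#010x}")
        r = -EIO
    else:
        r = 0
    # The FAILED assert in the log corresponds to this condition:
    assert allow_eio or r != -EIO, "FAILED assert(allow_eio || r != -5)"
    return r

blob = bytes(0x1000)                  # one 4 KiB checksummed blob
good = crc32c(blob)
assert verify_and_read(blob, good) == 0   # clean read passes

corrupted = b"\xff" + blob[1:]            # simulate on-disk corruption
try:
    verify_and_read(corrupted, good)
except AssertionError as e:
    print("OSD would abort:", e)
```

The abort is deliberate: rather than serve data that fails its checksum, the read path treats the -EIO as fatal unless the caller explicitly allows it, so a persistent mismatch like this keeps the OSD from starting.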

Actions #1

Updated by Abhishek Lekshmanan over 7 years ago

  • Tracker changed from Tasks to Bug
  • Project changed from Stable releases to Ceph
Actions #2

Updated by Sage Weil about 7 years ago

  • Status changed from New to Can't reproduce
