Bug #20997
closedbluestore_types.h: 739: FAILED assert(p != extents.end())
0%
Description
2017-08-14T06:18:20.933 INFO:tasks.ceph.osd.5.smithi132.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.3-13-g79aed5d/rpm/el7/BUILD/ceph-12.1.3-13-g79aed5d/src/os/bluestore/bluestore_types.h: In function 'int bluestore_blob_t::map(uint64_t, uint64_t, std::function<int(long unsigned int, long unsigned int)>) const' thread 7f18fc050700 time 2017-08-14 06:18:18.653797 2017-08-14T06:18:21.081 INFO:tasks.ceph.osd.5.smithi132.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.3-13-g79aed5d/rpm/el7/BUILD/ceph-12.1.3-13-g79aed5d/src/os/bluestore/bluestore_types.h: 739: FAILED assert(p != extents.end()) 2017-08-14T06:18:21.121 INFO:tasks.ceph.osd.5.smithi132.stderr: ceph version 12.1.3-13-g79aed5d (79aed5d4f8e854a8caddd074ff2cbdc73dc2b88f) luminous (rc) 2017-08-14T06:18:21.159 INFO:tasks.ceph.osd.5.smithi132.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f191c334eb0] 2017-08-14T06:18:21.207 INFO:tasks.ceph.osd.5.smithi132.stderr: 2: (bluestore_blob_t::map(unsigned long, unsigned long, std::function<int (unsigned long, unsigned long)>) const+0xfe) [0x7f191c21a8be] 2017-08-14T06:18:21.211 INFO:tasks.ceph.osd.5.smithi132.stderr: 3: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0x1161) [0x7f191c1f9941] 2017-08-14T06:18:21.292 INFO:tasks.ceph.osd.5.smithi132.stderr: 4: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x94) [0x7f191c1fa214] 2017-08-14T06:18:21.296 INFO:tasks.ceph.osd.5.smithi132.stderr: 5: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x15af) [0x7f191c20cb7f] 2017-08-14T06:18:21.299 INFO:tasks.ceph.osd.5.smithi132.stderr: 6: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x7f191c20daa0] 2017-08-14T06:18:21.302 INFO:tasks.ceph.osd.5.smithi132.stderr: 7: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f191bf7f9b5] 2017-08-14T06:18:21.306 INFO:tasks.ceph.osd.5.smithi132.stderr: 8: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x631) [0x7f191c09f6e1] 2017-08-14T06:18:21.309 INFO:tasks.ceph.osd.5.smithi132.stderr: 9: (ECBackend::try_reads_to_commit()+0x1af3) [0x7f191c0ad983] 2017-08-14T06:18:21.312 INFO:tasks.ceph.osd.5.smithi132.stderr: 10: (ECBackend::check_ops()+0x1c) [0x7f191c0adebc] 2017-08-14T06:18:21.315 INFO:tasks.ceph.osd.5.smithi132.stderr: 11: (ECBackend::handle_sub_write_reply(pg_shard_t, ECSubWriteReply const&, ZTracer::Trace const&)+0x2ae) [0x7f191c0ae17e] 2017-08-14T06:18:21.321 INFO:tasks.ceph.osd.5.smithi132.stderr: 12: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2df) [0x7f191c0b023f] 2017-08-14T06:18:21.324 INFO:tasks.ceph.osd.5.smithi132.stderr: 13: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7f191bfb4bf0] 2017-08-14T06:18:21.333 INFO:tasks.ceph.osd.5.smithi132.stderr: 14: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x58e) [0x7f191bf20d2e] 2017-08-14T06:18:21.432 INFO:tasks.ceph.osd.5.smithi132.stderr: 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x7f191bdb3e29] 2017-08-14T06:18:21.437 INFO:tasks.ceph.osd.5.smithi132.stderr: 16: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7f191c023157] 2017-08-14T06:18:21.440 INFO:tasks.ceph.osd.5.smithi132.stderr: 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x7f191bddf44e] 2017-08-14T06:18:21.444 INFO:tasks.ceph.osd.5.smithi132.stderr: 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x7f191c33a9c9] 2017-08-14T06:18:21.448 INFO:tasks.ceph.osd.5.smithi132.stderr: 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f191c33c960] 2017-08-14T06:18:21.451 INFO:tasks.ceph.osd.5.smithi132.stderr: 20: (()+0x7dc5) [0x7f191911bdc5] 2017-08-14T06:18:21.457 INFO:tasks.ceph.osd.5.smithi132.stderr: 21: (clone()+0x6d) [0x7f191820f73d] 2017-08-14T06:18:21.460 INFO:tasks.ceph.osd.5.smithi132.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Files
Updated by Sage Weil over 6 years ago
- Status changed from New to Need More Info
Jason, is this still happening?
Logs have been archived so I have no more information here besides the backtrace.
Updated by Jason Dillaman over 6 years ago
@Sage Weil: I don't remember seeing this again this past month. Feel free to close and if it does pop back up I can just re-open it.
Updated by Marek Grzybowski over 6 years ago
Hi
The same problem hit my setup: single node, cephfs on ec-pool(zlib) on bluestore on dmcrypt, pool was created on kraken,
then it was upgraded to 12.0 , then to 12.2 .
First i got this error on osd.3 and could not bring this demon up again, so i removed osd.3 and reformated bluestore, ec was degraded, then the same way osd.1 failed , and now i can not bring osd.1 up :
2017-10-03 20:20:14.063415 7f80177ba700 -1 /build/ceph-12.2.1/src/os/bluestore/bluestore_types.h: In function 'int bluestore_blob_t::map(uint64_t, uint64_t, std::function<i nt(long unsigned int, long unsigned int)>) const' thread 7f80177ba700 time 2017-10-03 20:20:13.781372 /build/ceph-12.2.1/src/os/bluestore/bluestore_types.h: 739: FAILED assert(p != extents.end()) ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55d2baabb3f2] 2: (bluestore_blob_t::map(unsigned long, unsigned long, std::function<int (unsigned long, unsigned long)>) const+0x157) [0x55d2ba985077] 3: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xad4) [0x55d2ba95e324] 4: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x7b) [0x55d2ba95f1db] 5: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1bbc) [0x55d2ba97597c] 6: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedO p>, ThreadPool::TPHandle*)+0x52e) [0x55d2ba976a7e] 7: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x66) [0x55d2ba69f e26] 8: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x889) [0x55d2ba7e1609] 9: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x331) [0x55d2ba7fa6e1] 10: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55d2ba6ddaa0] 11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x55d) [0x55d2ba64262d] 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) [0x55d2ba4c51a9] 13: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55d2ba75eae7] 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x130e) [0x55d2ba4ec7de] 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x55d2baac01e4] 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55d2baac3220] 17: (()+0x76ba) [0x7f802fb3e6ba] 18: (clone()+0x6d) [0x7f802ebb53dd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I tried luminous branch from github, osd deamon is failing the same way:
ceph version 12.2.1-73-g66af04c (66af04cd9b61d482362361cb3b12af2d8ffa462c) luminous (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x559a8806f2e2] 2: (bluestore_blob_t::map(unsigned long, unsigned long, std::function<int (unsigned long, unsigned long)>) const+0x157) [0x559a87f378a7] 3: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xad4) [0x559a87f11964] 4: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x7b) [0x559a87f1281b] 5: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1bbc) [0x559a87f28f8c] 6: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedO p>, ThreadPool::TPHandle*)+0x52e) [0x559a87f2a08e] 7: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle *)+0x17c) [0x559a87ad6e7c] 8: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x6c) [0x559a87a5c3bc] 9: (OSD::process_peering_events(std::__cxx11::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x451) [0x559a87a88bb1] 10: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x27) [0x559a87af5537] 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0xdb9) [0x559a880761e9] 12: (ThreadPool::WorkThread::entry()+0x10) [0x559a880772f0] 13: (()+0x76ba) [0x7f093c4966ba] 14: (clone()+0x6d) [0x7f093b50d3dd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
fsck do not found any problems:
# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-1/ --deep no --command fsck action fsck fsck success
Any hints what i could check next ? Thx in advance.
Updated by Marek Grzybowski over 6 years ago
Update: After investigation i narrow problem to osd.1 pg "22.5s3" and object "22.5s3_head,3#22:addebba8:::10000014f8e.00000145:head#32ce3"
( please note 32ce3 at the end ) . It looks like this under fuse mount in /mnt/target :
... /mnt/target/22.5s3_head/all/3#22:addeb487:::1000000a84f.00000e50:head# /mnt/target/22.5s3_head/all/3#22:addeb487:::1000000a84f.00000e50:head#/bitwise_hash /mnt/target/22.5s3_head/all/3#22:addeb487:::1000000a84f.00000e50:head#/data /mnt/target/22.5s3_head/all/3#22:addeb487:::1000000a84f.00000e50:head#/omap /mnt/target/22.5s3_head/all/3#22:addeb487:::1000000a84f.00000e50:head#/attr /mnt/target/22.5s3_head/all/3#22:addeb487:::1000000a84f.00000e50:head#/attr/_ /mnt/target/22.5s3_head/all/3#22:addeb487:::1000000a84f.00000e50:head#/attr/hinfo_key /mnt/target/22.5s3_head/all/3#22:addeb487:::1000000a84f.00000e50:head#/attr/snapset /mnt/target/22.5s3_head/all/3#22:addeb487:::1000000a84f.00000e50:head#/omap_header /mnt/target/22.5s3_head/all/3#22:addebba8:::10000014f8e.00000145:head#32ce3 /mnt/target/22.5s3_head/all/3#22:addebba8:::10000014f8e.00000145:head#32ce3/bitwise_hash /mnt/target/22.5s3_head/all/3#22:addebba8:::10000014f8e.00000145:head#32ce3/data /mnt/target/22.5s3_head/all/3#22:addebba8:::10000014f8e.00000145:head#32ce3/omap /mnt/target/22.5s3_head/all/3#22:addebba8:::10000014f8e.00000145:head#32ce3/attr /mnt/target/22.5s3_head/all/3#22:addebba8:::10000014f8e.00000145:head#32ce3/omap_header /mnt/target/22.5s3_head/all/3#22:addebc3d:::1000000af30.000004c5:head# /mnt/target/22.5s3_head/all/3#22:addebc3d:::1000000af30.000004c5:head#/bitwise_hash /mnt/target/22.5s3_head/all/3#22:addebc3d:::1000000af30.000004c5:head#/data /mnt/target/22.5s3_head/all/3#22:addebc3d:::1000000af30.000004c5:head#/omap /mnt/target/22.5s3_head/all/3#22:addebc3d:::1000000af30.000004c5:head#/attr /mnt/target/22.5s3_head/all/3#22:addebc3d:::1000000af30.000004c5:head#/attr/_ /mnt/target/22.5s3_head/all/3#22:addebc3d:::1000000af30.000004c5:head#/attr/hinfo_key /mnt/target/22.5s3_head/all/3#22:addebc3d:::1000000af30.000004c5:head#/attr/snapset /mnt/target/22.5s3_head/all/3#22:addebc3d:::1000000af30.000004c5:head#/omap_header /mnt/target/22.5s3_head/all/3#22:addec6a1:::10000008ca3.000002d1:head# ...
I tried to remove this "#32ce3" object:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --pgid 22.5s3 '["22.5s3",{"oid":"10000014f8e.00000145","key":"","snapid":-2,"hash":366836661,"max":0,"pool":22,"namespace":"","shard_id":3,"max":0}]' remove <pre> Error getting attr on : 22.5s3_head,3#22:addebba8:::10000014f8e.00000145:head#32ce3, (61) No data available </pre> pg export: <pre> ceph-objectstore-tool --force --data-path /var/lib/ceph/osd/ceph-1/ --pgid 22.5s3 --op export --file /tank1/ceph-recovery/recover.22.5s3v2 2>&1 | tee recover.22.5s3v2.log </pre> Fails: <pre> Read 3#22:addea9c4:::1000000d42b.00000073:head# Read 3#22:addeb3f7:::10000009d3b.00000440:head# Read 3#22:addeb487:::1000000a84f.00000e50:head# Read 3#22:addebba8:::10000014f8e.00000145:head#32ce3 getattr failure object_info -61 export_files error -61 </pre> fsck: <pre> action fsck 2017-10-06 00:25:06.521227 7efd6b504e00 1 bluestore(/var/lib/ceph/osd/ceph-1/) fsck (deep) start ... 2017-10-06 18:10:56.034601 7efd6b504e00 1 bdev(0x564a11d186c0 /var/lib/ceph/osd/ceph-1//block) close 2017-10-06 18:10:56.161994 7efd6b504e00 1 bdev(0x564a11d18240 /var/lib/ceph/osd/ceph-1//block) close 2017-10-06 18:10:56.341885 7efd6b504e00 1 bluestore(/var/lib/ceph/osd/ceph-1/) fsck finish with 0 errors in 63949.820554 seconds </pre> ceph-objectstore-tool fsck also passes witchout a warning. I could not find any method to remove "#32ce3" object from bluestore. Please advise . </prebut ended up removing legit object ("22:addebba8:::10000014f8e.00000145:head", and was left with #32ce3.
Operations like "ceph-objectstore-tool list" shows warning:
Updated by Marek Grzybowski over 6 years ago
Sorry for the above post, i did not sleep much ;).
Operations like "ceph-objectstore-tool list" shows warning:
Error getting attr on : 22.5s3_head,3#22:addebba8:::10000014f8e.00000145:head#/32ce3, (61) No data available
i could not find any way to access/delete this 32ce3 object. Please advise.
Updated by Sage Weil over 6 years ago
Hi Marek,
Sorry for the slow response--just catching up around here. Is the OSD still in this state? Can you capture the log output from the deep fsck and upload that?
Updated by Marek Grzybowski over 6 years ago
Hi Sage
Thank You very much for the reply.
I did fsck deep by ceph-bluestore-tool like this:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --pgid 22.5s3 --op fsck 2>&1 > 22.5s3.fsck.txt
i also tried fsck by "ceph-objectstore-tool" . All of the fscks i tried showed no errors. Please check attached file for one of logs that remain from that time.
Currently osd is modified. I get rid of this "22.5s3_head,3#22:addebba8:::10000014f8e.00000145:head#/32ce3"
object by simple/dirty hack of https://github.com/ceph/ceph/blob/8ad1a5b642f1c5db3b2d103dc9e64e3c8ad70a27/src/tools/ceph_objectstore_tool.cc#L522
to "return 0" and ignore error during export .
In summary i was able to export pgid 22.5s3 , delete it , and import again without this problematic object .
But now my osd is failing on log reply, and next thing on ma task list is get rid of that error.
Situation that i'm stuck in currently:
2017-10-21 13:51:48.514312 7fc120ab1700 0 log_channel(cluster) log [INF] : 22.fs0 continuing backfill to osd.2(1) from (1911'206676,3406'208178] 22:f00023a4:::1000000990 e.000000ba:head to 3406'208178 2017-10-21 13:51:48.704440 7fc120ab1700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _txc_add_transaction error (2) No such file or directory not handled on operation 30 (op 3, counting from 0) 2017-10-21 13:51:48.704579 7fc120ab1700 -1 bluestore(/var/lib/ceph/osd/ceph-1) ENOENT on clone suggests osd bug 2017-10-21 13:51:48.704582 7fc120ab1700 0 bluestore(/var/lib/ceph/osd/ceph-1) transaction dump: { "ops": [ { "op_num": 0, "op_name": "remove", "collection": "22.5s3_head", "oid": "3#22:a2610227:::10000014f95.00000153:head#" }, { "op_num": 1, "op_name": "remove", "collection": "22.5s3_head", "oid": "3#22:a96a1eec:::10000014f8e.000003d0:head#" }, { "op_num": 2, "op_name": "remove", "collection": "22.5s3_head", "oid": "3#22:ab489177:::10000014f8e.000003cd:head#" }, { "op_num": 3, "op_name": "clonerange2", "collection": "22.5s3_head", "src_oid": "3#22:addebba8:::10000014f8e.00000145:head#32ce3", "dst_oid": "3#22:addebba8:::10000014f8e.00000145:head#", "src_offset": 0, "len": 700416, "dst_offset": 0 }, { "op_num": 4, "op_name": "remove", "collection": "22.5s3_head", "oid": "3#22:addebba8:::10000014f8e.00000145:head#32ce3" }, { "op_num": 5, "op_name": "setattrs", "collection": "22.5s3_head", "oid": "3#22:addebba8:::10000014f8e.00000145:head#", "attr_lens": { "_": 275, "hinfo_key": 18, "snapset": 35 } }, "op_num": 6, "op_name": "remove", "collection": "22.5s3_head", "oid": "3#22:af870550:::10000014f95.0000014f:head#" }, { "op_num": 7, "op_name": "nop" }, { "op_num": 8, "op_name": "op_omap_rmkeyrange", "collection": "22.5s3_head", "oid": "3#22:a0000000::::head#", "first": "0000001921.00000000000000208099", "last": "4294967295.18446744073709551615" }, { "op_num": 9, "op_name": "omap_setkeys", "collection": "22.5s3_head", "oid": "3#22:a0000000::::head#", "attr_lens": { "_biginfo": 365, "_epoch": 4, "_info": 895, "can_rollback_to": 12, "rollback_info_trimmed_to": 12 } } ] }
Updated by Sage Weil about 6 years ago
- Status changed from Need More Info to Can't reproduce
Marek, if you still see this (or a related issue) please ping me. I dropped the ball on this bug but we're newly focused on more carefully tracking these issues now! THanks-