Bug #38272


"no available blob id" assertion might occur

Added by Igor Fedotov about 5 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We observed this on-site, but unfortunately the OSDs were removed and are unavailable for inspection.
However, I managed to reproduce the issue using a unit test in store_test.
I'm not sure the write pattern in the UT is identical to the customer's, but it shows that the error is possible.

The customer's cluster runs Ceph v12.2.5.
I reproduced the issue against master.
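As an illustration of the failure mode, the assertion corresponds to exhausting a bounded id space: spanning blobs each need an id, and when every id is taken the allocator has nothing left to hand out. The sketch below is hypothetical, not BlueStore's actual allocator; `SpanningBlobMap` and its methods are invented names used only to make the mechanism concrete.

```python
# Hypothetical sketch of a bounded spanning-blob-id allocator.
# Not BlueStore's real code; SpanningBlobMap and its methods are invented
# names used only to illustrate why "no available blob id" can fire.

class SpanningBlobMap:
    def __init__(self, max_ids):
        self.max_ids = max_ids   # size of the finite blob-id space
        self.in_use = set()      # ids currently held by spanning blobs

    def allocate_id(self):
        # Scan for a free id; when every id is taken, fail the way
        # FAILED assert(0 == "no available blob id") does.
        for bid in range(self.max_ids):
            if bid not in self.in_use:
                self.in_use.add(bid)
                return bid
        raise AssertionError("no available blob id")

# A tiny id space for demonstration; the onode dump further down shows the
# real reproducer accumulating 32768 spanning blobs before asserting.
m = SpanningBlobMap(max_ids=8)
for _ in range(8):
    m.allocate_id()              # exhaust the id space
try:
    m.allocate_id()
except AssertionError as e:
    print(e)                     # -> no available blob id
```

The UT in the attached diff drives BlueStore into the same state through its real write path; this sketch only shows why a bounded id space plus enough spanning blobs must eventually abort.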


Files

no_span_blob.diff (3.7 KB) — diff to repro the issue. Igor Fedotov, 02/12/2019 03:32 PM

Related issues 3 (0 open, 3 closed)

Copied to bluestore - Backport #40447: mimic: "no available blob id" assertion might occur (Rejected)
Copied to bluestore - Backport #40448: luminous: "no available blob id" assertion might occur (Rejected)
Copied to bluestore - Backport #40449: nautilus: "no available blob id" assertion might occur (Resolved, Igor Fedotov)
Actions #1

Updated by Igor Fedotov about 5 years ago

Stack trace from the customer log:
2019-02-06 00:04:25.934977 7ff3e3bca700 -1 /home/abuild/rpmbuild/BUILD/ceph-12.2.5-419-g8cbf63d997/src/os/bluestore/BlueStore.cc: In function 'bid_t BlueStore::ExtentMap::allocate_spanning_blob_id()' thread 7ff3e3bca700 time 2019-02-06 00:04:25.928793
/home/abuild/rpmbuild/BUILD/ceph-12.2.5-419-g8cbf63d997/src/os/bluestore/BlueStore.cc: 2117: FAILED assert(0 == "no available blob id")

ceph version 12.2.5-419-g8cbf63d997 (8cbf63d997fb5cdc783fe7bfcd4f5032ee140c0c) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x5556ee03228e]
2: (()+0x8f0a3e) [0x5556ede9ea3e]
3: (BlueStore::ExtentMap::reshard(KeyValueDB*, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x13ba) [0x5556edef49da]
4: (BlueStore::_txc_write_nodes(BlueStore::TransContext*, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x18f) [0x5556edef609f]
5: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3e5) [0x5556edf0f735]
6: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x55) [0x5556edc90425]
7: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x617) [0x5556eddaa4f7]
8: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2f7) [0x5556eddba807]
9: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x78) [0x5556edcc3df8]
10: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x56c) [0x5556edc31e0c]
11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3e6) [0x5556edabff86]
12: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x47) [0x5556edd31cf7]
13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfbe) [0x5556edaee3de]
14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x829) [0x5556ee037a39]
15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5556ee039980]
16: (()+0x8724) [0x7ff3fd11a724]
17: (clone()+0x6d) [0x7ff3fc1a2e8d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #2

Updated by Nathan Cutler about 5 years ago

  • Backport set to mimic, luminous
Actions #3

Updated by Igor Fedotov about 5 years ago

Backtrace from UT:

-1> 2019-02-12 18:23:48.346 7fca6fab1b40 -1 /home/if/ceph/src/os/bluestore/BlueStore.cc: In function 'bid_t BlueStore::ExtentMap::allocate_spanning_blob_id()' thread 7fca6fab1b40 time 2019-02-12 18:23:48.349154
/home/if/ceph/src/os/bluestore/BlueStore.cc: 2208: abort()

ceph version Development (no_version) nautilus (dev)
1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xda) [0x7fca71aac143]
2: (BlueStore::ExtentMap::allocate_spanning_blob_id()+0xe7) [0x5574151df7c7]
3: (BlueStore::ExtentMap::reshard(KeyValueDB*, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x1325) [0x5574151fbe65]
4: (BlueStore::_record_onode(boost::intrusive_ptr<BlueStore::Onode>&, std::shared_ptr<KeyValueDB::TransactionImpl>&)+0xca) [0x557415246d1a]
5: (BlueStore::_txc_write_nodes(BlueStore::TransContext*, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x7e) [0x5574152483fe]
6: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x471) [0x557415252891]
7: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x78) [0x55741510f5f8]
8: (int queue_transaction<boost::scoped_ptr<ObjectStore> >(boost::scoped_ptr<ObjectStore>&, boost::intrusive_ptr<ObjectStore::CollectionImpl>, ObjectStore::Transaction&&)+0x6a) [0x55741510f70a]
9: (StoreTestSpecificAUSize_ReproNoBlobMultiTest_Test::TestBody()+0x6fa) [0x55741506314a]
10: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x4a) [0x55741538924a]
11: (testing::Test::Run()+0xba) [0x55741538026a]
12: (testing::TestInfo::Run()+0x118) [0x5574153803b8]
13: (testing::TestCase::Run()+0xe5) [0x5574153804c5]
14: (testing::internal::UnitTestImpl::RunAllTests()+0x494) [0x557415380a04]
15: (testing::UnitTest::Run()+0x69) [0x557415380b29]
16: (main()+0x7e3) [0x557414f6b9e3]
17: (__libc_start_main()+0xeb) [0x7fca70aa7feb]
18: (_start()+0x2a) [0x55741505507a]
Actions #5

Updated by Igor Fedotov about 5 years ago

onode dump shortly before the assertion:
2019-02-12 18:23:47.546 7fca6fab1b40 0 bluestore(bluestore.test_temp_dir) _dump_onode 0x5574177fc580 #-1:68309cac:::Object 1:head# nid 1 size 0x7fc0800 (133957632) expected_object_size 0 expected_write_size 0 in 2048 shards, 32768 spanning blobs
2019-02-12 18:23:47.546 7fca6fab1b40 0 bluestore(bluestore.test_temp_dir) _dump_extent_map shard 0x0(0x5e bytes) (loaded)
2019-02-12 18:23:47.546 7fca6fab1b40 0 bluestore(bluestore.test_temp_dir) _dump_extent_map shard 0x8400(0x64 bytes) (loaded)
2019-02-12 18:23:47.546 7fca6fab1b40 0 bluestore(bluestore.test_temp_dir) _dump_extent_map shard 0x10400(0x64 bytes) (loaded)
2019-02-12 18:23:47.546 7fca6fab1b40 0 bluestore(bluestore.test_temp_dir) _dump_extent_map shard 0x18400(0x64 bytes) (loaded)
2019-02-12 18:23:47.546 7fca6fab1b40 0 bluestore(bluestore.test_temp_dir) _dump_extent_map shard 0x20400(0x63 bytes) (loaded)
2019-02-12 18:23:47.546 7fca6fab1b40 0 bluestore(bluestore.test_temp_dir) _dump_extent_map shard 0x28400(0x63 bytes) (loaded)
2019-02-12 18:23:47.546 7fca6fab1b40 0 bluestore(bluestore.test_temp_dir) _dump_extent_map shard 0x30400(0x63 bytes) (loaded)
2019-02-12 18:23:47.546 7fca6fab1b40 0 bluestore(bluestore.test_temp_dir) _dump_extent_map shard 0x38400(0x64 bytes) (loaded) (dirty)
2019-02-12 18:23:47.546 7fca6fab1b40 0 bluestore(bluestore.test_temp_dir) _dump_extent_map shard 0x80400(0x60 bytes) (loaded) (dirty)
2019-02-12 18:23:47.546 7fca6fab1b40 0 bluestore(bluestore.test_temp_dir) _dump_extent_map shard 0x88400(0x62 bytes) (loaded) (dirt
....
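To put the dumped numbers in perspective, a quick arithmetic check (every input value is taken from the log lines above) shows what the reproducer accumulated before the abort:

```python
# Arithmetic on the onode dump above; all inputs come from the log lines.
size = 0x7fc0800
assert size == 133957632      # matches "(133957632)" in the dump
shards = 2048
spanning = 32768

print(spanning == 2 ** 15)    # True: an exact power of two, consistent with
                              # the spanning-blob id space being fully exhausted
print(spanning // shards)     # 16 spanning blobs per shard on average
```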

Actions #6

Updated by Sage Weil about 5 years ago

  • Status changed from New to In Progress
  • Assignee set to Igor Fedotov
Actions #7

Updated by Igor Fedotov about 5 years ago

  • Pull request ID set to 26882
Actions #8

Updated by Igor Fedotov about 5 years ago

  • Status changed from In Progress to Fix Under Review
Actions #9

Updated by Igor Fedotov almost 5 years ago

  • Status changed from Fix Under Review to In Progress
Actions #11

Updated by Josh Durgin almost 5 years ago

  • Status changed from In Progress to Fix Under Review
Actions #12

Updated by Sage Weil almost 5 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from mimic, luminous to nautilus, mimic, luminous

Not sure if this can/should be backported beyond nautilus...?

Actions #13

Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #40447: mimic: "no available blob id" assertion might occur added
Actions #14

Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #40448: luminous: "no available blob id" assertion might occur added
Actions #15

Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #40449: nautilus: "no available blob id" assertion might occur added
Actions #16

Updated by Nathan Cutler almost 5 years ago

  • Pull request ID changed from 26882 to 28229
Actions #17

Updated by Igor Fedotov over 4 years ago

  • Status changed from Pending Backport to Resolved
  • Backport changed from nautilus, mimic, luminous to nautilus
Actions #18

Updated by Jiang Yu over 3 years ago

Hello everyone,
I encountered the same problem in ceph 12.2.2, but found that there is no patch available in ceph 12.2.13.
Is there any way to remedy it?

Actions #19

Updated by Nathan Cutler over 3 years ago

Jiang Yu wrote:

I encountered the same problem in ceph 12.2.2, but found that there is no patch available in ceph 12.2.13.

There are two reasons for that:

(1) this patch was not approved for backport to luminous
(2) luminous is EOL (End Of Life)

To get rid of this problem, upgrade to one of the active stable releases: https://docs.ceph.com/en/latest/releases/general/#active-stable-releases

Actions #20

Updated by Igor Fedotov over 3 years ago

Nathan Cutler wrote:

Jiang Yu wrote:

I encountered the same problem in ceph 12.2.2, but found that there is no patch available in ceph 12.2.13.

There are two reasons for that:

(1) this patch was not approved for backport to luminous
(2) luminous is EOL (End Of Life)

To get rid of this problem, upgrade to one of the active stable releases: https://docs.ceph.com/en/latest/releases/general/#active-stable-releases

Indeed, this has been backported only as far back as Nautilus.
I don't remember the exact reason, but I presume the fix is fairly complicated and the issue wasn't hit that frequently.
Moreover, @Jiang Yu, please note that the patch doesn't repair an already-affected OSD; it (hopefully) prevents new occurrences. Hence a broken OSD needs to be redeployed. Are you hitting this failure on different OSDs from time to time, or was it a single occurrence?
