Bug #22510


osd: BlueStore.cc: BlueStore::_balance_bluefs_freespace: assert(0 == "allocate failed, wtf");

Added by Aleksei Gutikov over 6 years ago. Updated about 6 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Luminous 12.2.2 OSD crash during deep scrubbing after 7 days of workload.

Same as http://tracker.ceph.com/issues/18698,
but in a different source file.

Preconditions:
36 SSDs, 200 GB each, containing in total 400 GB of RBD images and 500 GB of small files (5 KB each).
Almost all SSDs are filled above 50%.

After 7 days of testing, deep scrubbing started.
As I understand it, that triggers BlueFS free-space rebalancing,
and then the crash occurs:

bluestore(/var/lib/ceph/osd/ceph-45) _balance_bluefs_freespace allocate failed on 0x76c00000 min_alloc_size 0x1000
...
... very very long stupidalloc dump 
...
 2: (()+0x11390) [0x7fb052f47390]
 3: (gsignal()+0x38) [0x7fb051ee2428]
 4: (abort()+0x16a) [0x7fb051ee402a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5641f18d9a2e]
 6: (BlueStore::_balance_bluefs_freespace(std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >*)+0x1b21) [0x5641f176b5c1]
 7: (BlueStore::_kv_sync_thread()+0x1ac0) [0x5641f176e040]
 8: (BlueStore::KVSyncThread::entry()+0xd) [0x5641f17b1f8d]
 9: (()+0x76ba) [0x7fb052f3d6ba]
10: (clone()+0x6d) [0x7fb051fb43dd]

As I understand it, _balance_bluefs_freespace tries to allocate space (~2 GB) for compacted metadata, but the allocation fails due to fragmentation,
and _balance_bluefs_freespace can't handle this.
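The failure mode described above, where total free space is more than sufficient but no single contiguous extent is large enough, can be illustrated with a toy first-fit allocator. This is a hypothetical sketch for illustration only, not Ceph's actual StupidAllocator; the request size 0x76c00000 is taken from the log line above.

```python
# Toy model of a contiguous-extent allocation failing under fragmentation.
# Hypothetical sketch; Ceph's StupidAllocator is far more involved.

def allocate_contiguous(free_extents, want):
    """First fit: return the offset of a free extent of at least
    `want` bytes, or None if no single extent is large enough."""
    for offset, length in free_extents:
        if length >= want:
            return offset
    return None

# 2 GiB of total free space, but fragmented into 4 KiB extents
# separated by allocated data.
frag = [(i * 8192, 4096) for i in range(524288)]
total_free = sum(length for _, length in frag)

assert total_free == 2 * 1024**3                      # 2 GiB free in total...
assert allocate_contiguous(frag, 0x76c00000) is None  # ...yet a ~1.9 GiB
                                                      # contiguous request fails
```

This is why the allocator dump in the log can show gigabytes of free space while the 0x76c00000-byte request still fails.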

Will check whether it can be eliminated by setting bluestore_max_alloc_size...


Files

18698.osd.log (394 KB) - Aleksei Gutikov, 12/20/2017 01:23 PM

Related issues: 1 (0 open, 1 closed)

Copied to bluestore - Backport #23063: luminous: osd: BlueStore.cc: BlueStore::_balance_bluefs_freespace: assert(0 == "allocate failed, wtf"); (Resolved, Igor Fedotov)
Actions #1

Updated by Sage Weil over 6 years ago

  • Project changed from RADOS to bluestore
  • Status changed from New to Triaged
  • Priority changed from Normal to Urgent
Actions #2

Updated by Aleksei Gutikov over 6 years ago

Probably bluestore_max_alloc_size will not help, because it is not used anywhere in the BlueStore code.

Actions #3

Updated by Igor Fedotov over 6 years ago

Probably this is the same issue as https://github.com/ceph/ceph/pull/18494

Actions #4

Updated by Aleksei Gutikov over 6 years ago

Seems this is the same issue as described here for the bitmap allocator:
https://www.spinics.net/lists/ceph-devel/msg32462.html

After setting bluestore_min_alloc_size = bluefs_alloc_size = 1M,
the crashes were no longer reproducible.

We used bluestore_min_alloc_size=4k, but even with the default 16k or 64k
it seems the crashes would still be reproduced.
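The workaround reported in this comment would look roughly like the following in ceph.conf. This is a sketch based on the values given above; note that bluestore_min_alloc_size is, as far as I know, only applied when an OSD is created (mkfs), so existing OSDs would need to be redeployed for it to take effect.

```ini
# Workaround from this comment: force 1 MiB allocation units so BlueFS
# can always find large contiguous extents to take over.
# bluestore_min_alloc_size takes effect only at OSD mkfs time.
[osd]
bluestore_min_alloc_size = 1048576
bluefs_alloc_size = 1048576
```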

Actions #5

Updated by Sage Weil about 6 years ago

  • Status changed from Triaged to 12
Actions #6

Updated by Sage Weil about 6 years ago

  • Status changed from 12 to Pending Backport
  • Backport set to luminous

https://github.com/ceph/ceph/pull/18494 is the fix in master; it should be backported to luminous.

Actions #7

Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #23063: luminous: osd: BlueStore.cc: BlueStore::_balance_bluefs_freespace: assert(0 == "allocate failed, wtf"); added
Actions #8

Updated by Nathan Cutler about 6 years ago

  • Assignee set to Igor Fedotov
Actions #9

Updated by Josh Durgin about 6 years ago

  • Status changed from Pending Backport to Resolved
Actions #10

Updated by Nathan Cutler about 6 years ago

  • Status changed from Resolved to Pending Backport

The luminous PR is still open.

Actions #11

Updated by Igor Fedotov about 6 years ago

  • Status changed from Pending Backport to Resolved