Bug #37360

closed

bluefs-bdev-expand aborts

Added by Марк Коренберг over 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport: mimic, luminous
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

root@node1:~# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-16
infering bluefs devices from bluestore path
slot 1 /var/lib/ceph/osd/ceph-16/block
start:
1 : size 0x14ca000000 : own 0x[100000~c00000,e00000~100000,2500000~1500000,3b00000~100000,3d00000~100000,3f00000~5400000,9400000~900000,9e00000~e00000,ad00000~b00000,b900000~100000,bb00000~100000,bd00000~100000,bf00000~100000,c100000~100000,c300000~600000,ca00000~700000,d200000~300000,d600000~100000,db00000~a00000,e600000~100000,ec00000~600000,f300000~600000,fe00000~400000,10300000~100000,10500000~700000,10d00000~100000,10f00000~800000,11800000~100000,11d00000~700000,12500000~100000,12700000~100000,12900000~100000,12b00000~600000,13200000~300000,13600000~d00000,14700000~500000,14d00000~700000,15500000~200000,15800000~100000,15a00000~600000,16400000~400000,16c00000~400000,17100000~200000,17700000~900000,18100000~800000,18a00000~100000,18c00000~100000,18e00000~100000,19000000~100000,19500000~400000,19a00000~100000,19c00000~800000,1a500000~200000,1ab00000~a00000,1b600000~200000,1bc00000~600000,1c300000~700000,1cb00000~800000,1d400000~300000,1db00000~400000,1e000000~700000,1e800000~700000,1f000000~100000,1f600000~600000,1fd00000~700000,20500000~200000,20800000~500000,20e00000~300000,21200000~700000,21a00000~700000,22200000~100000,22700000~7800000,2a000000~700000,2a800000~200000,2ae00000~500000,2b400000~700000,2bc00000~700000,2c400000~100000,2c600000~600000,2cd00000~100000,2cf00000~200000,2d200000~100000,2d400000~600000,2db00000~200000,2e100000~900000,2eb00000~200000,2ee00000~900000,2f800000~700000,30000000~100000,30200000~200000,30900000~a00000,31700000~400000,32000000~500000,32600000~700000,32e00000~200000,33100000~700000,33900000~100000,33b00000~100000,33d00000~700000,34500000~100000,34a00000~500000,35300000~900000,35d00000~100000,35f00000~200000,36200000~100000,36400000~600000,36b00000~100000,36d00000~700000,37500000~300000,37900000~100000,37c00000~100000,38100000~900000,38e00000~500000,39b00000~800000,3a400000~200000,3a700000~100000,3a900000~400000,3ae00000~100000,3b000000~100000,3b200000~100000,3b400000~100000,3b600000~700000,3be00000~800000,3ca00000~400000,3cf0000
0~400000,3d800000~b00000,3e400000~a00000,3ef00000~200000,3f200000~800000,3fb00000~100000,3fd00000~200000,40000000~500000,40900000~600000,41400000~b00000,42000000~100000,42200000~100000,42700000~500000,42d00000~100000,42f00000~600000,43600000~100000,43800000~700000,44000000~300000,44400000~600000,44b00000~100000,44d00000~400000,45500000~b00000,46100000~100000,46a00000~900000,47400000~700000,47c00000~700000,48800000~800000,49100000~600000,49800000~200000,49e00000~16900000,69200000~100000,69400000~100000,69700000~300000,69b00000~100000,120000000~40000000]
/build/ceph-12.2.8/src/include/interval_set.h: In function 'T interval_set<T, Map>::range_end() const [with T = long unsigned int; Map = std::map<long unsigned int, long unsigned int, std::less<long unsigned int>, std::allocator<std::pair<const long unsigned int, long unsigned int> > >]' thread 7fe8fb161f80 time 2018-11-22 02:21:21.298051
/build/ceph-12.2.8/src/include/interval_set.h: 419: FAILED assert(!empty())
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fe8f19a7ac2]
2: (main()+0x24aa) [0x558fc13e251a]
3: (__libc_start_main()+0xf1) [0x7fe8eebf62e1]
4: (_start()+0x2a) [0x558fc14627aa]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

(continuation in attachment)
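
For readers unfamiliar with the `own 0x[...]` line in the output above: each entry appears to be a hexadecimal `offset~length` pair describing an extent that BlueFS owns on the device. The following is a minimal Python sketch for parsing such a list and comparing the end of the last extent with the reported device size. The function names are hypothetical, the `offset~length` reading is an assumption based on the output format, and the sample keeps only the first two and the last extent from the report.

```python
def parse_extents(own: str) -> list[tuple[int, int]]:
    """Parse '0x[off~len,off~len,...]' (hex values) into (offset, length) pairs."""
    body = own.strip()
    if body.startswith("0x"):
        body = body[2:]
    body = body.strip("[]")
    extents = []
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty list such as '0x[]'
        off, length = item.split("~")
        extents.append((int(off, 16), int(length, 16)))
    return extents


def owned_bytes(extents: list[tuple[int, int]]) -> int:
    """Total number of bytes covered by the extents."""
    return sum(length for _, length in extents)


def range_end(extents: list[tuple[int, int]]) -> int:
    """End offset of the highest extent; raises ValueError if there are none."""
    last_off, last_len = max(extents)
    return last_off + last_len


# Abbreviated sample: first two and last extent from the report above.
sample = "0x[100000~c00000,e00000~100000,120000000~40000000]"
device_size = int("0x14ca000000", 16)

extents = parse_extents(sample)
print(f"owned bytes : {owned_bytes(extents):#x}")
print(f"range end   : {range_end(extents):#x}")
print(f"device size : {device_size:#x}")
print(f"space past the last extent: {device_size - range_end(extents):#x}")
```

Note that `range_end` here fails on an empty extent list, which mirrors the kind of precondition that the `assert(!empty())` in the backtrace enforces.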


Files

ceph2.txt (7.01 KB), Марк Коренберг, 11/21/2018 09:35 PM

Related issues 2 (0 open, 2 closed)

Copied to bluestore - Backport #37494: mimic: bluefs-bdev-expand aborts (Resolved, Igor Fedotov)
Copied to bluestore - Backport #37495: luminous: bluefs-bdev-expand aborts (Resolved, Igor Fedotov)
Actions #1

Updated by Igor Fedotov over 5 years ago

  • Project changed from Ceph to bluestore
Actions #2

Updated by Igor Fedotov over 5 years ago

Does the bluefs-bdev-sizes command work correctly? What about fsck?

Actions #3

Updated by Марк Коренберг over 5 years ago

root@node1:~# ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-16
infering bluefs devices from bluestore path
 slot 1 /var/lib/ceph/osd/ceph-16/block
1 : size 0x14ca000000 : own 0x[100000~c00000,e00000~100000,2500000~1500000,3b00000~100000,3d00000~100000,3f00000~5400000,9400000~900000,9e00000~e00000,ad00000~b00000,b900000~100000,bb00000~100000,bd00000~100000,bf00000~100000,c100000~100000,c300000~600000,ca00000~700000,d200000~300000,d600000~100000,db00000~a00000,e600000~100000,ec00000~600000,f300000~600000,fe00000~400000,10300000~100000,10500000~700000,10d00000~100000,10f00000~800000,11800000~100000,11d00000~700000,12500000~100000,12700000~100000,12900000~100000,12b00000~600000,13200000~300000,13600000~d00000,14700000~500000,14d00000~700000,15500000~200000,15800000~100000,15a00000~600000,16400000~400000,16c00000~400000,17100000~200000,17700000~900000,18100000~800000,18a00000~100000,18c00000~100000,18e00000~100000,19000000~100000,19500000~400000,19a00000~100000,19c00000~800000,1a500000~200000,1ab00000~a00000,1b600000~200000,1bc00000~600000,1c300000~700000,1cb00000~800000,1d400000~300000,1db00000~400000,1e000000~700000,1e800000~700000,1f000000~100000,1f600000~600000,1fd00000~700000,20500000~200000,20800000~500000,20e00000~300000,21200000~700000,21a00000~700000,22200000~100000,22700000~7800000,2a000000~700000,2a800000~200000,2ae00000~500000,2b400000~700000,2bc00000~700000,2c400000~100000,2c600000~600000,2cd00000~100000,2cf00000~200000,2d200000~100000,2d400000~600000,2db00000~200000,2e100000~900000,2eb00000~200000,2ee00000~900000,2f800000~700000,30000000~100000,30200000~200000,30900000~a00000,31700000~400000,32000000~500000,32600000~700000,32e00000~200000,33100000~700000,33900000~100000,33b00000~100000,33d00000~700000,34500000~100000,34a00000~500000,35300000~900000,35d00000~100000,35f00000~200000,36200000~100000,36400000~600000,36b00000~100000,36d00000~700000,37500000~300000,37900000~100000,37c00000~100000,38100000~900000,38e00000~500000,39b00000~800000,3a400000~200000,3a700000~100000,3a900000~400000,3ae00000~100000,3b000000~100000,3b200000~100000,3b400000~100000,3b600000~700000,3be00000~800000,3ca00000~400000,3cf0000
0~400000,3d800000~b00000,3e400000~a00000,3ef00000~200000,3f200000~800000,3fb00000~100000,3fd00000~200000,40000000~500000,40900000~600000,41400000~b00000,42000000~100000,42200000~100000,42700000~500000,42d00000~100000,42f00000~600000,43600000~100000,43800000~700000,44000000~300000,44400000~600000,44b00000~100000,44d00000~400000,45500000~b00000,46100000~100000,46a00000~900000,47400000~700000,47c00000~700000,48800000~800000,49100000~600000,49800000~200000,49e00000~16900000,69200000~100000,69400000~100000,69700000~300000,69b00000~100000,120000000~40000000]
root@node1:~# 
Actions #4

Updated by Марк Коренберг over 5 years ago

root@node1:~# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-16
fsck success
root@node1:~# 
Actions #5

Updated by Марк Коренберг over 5 years ago

The problem is still triggered every time.

Actions #6

Updated by Igor Fedotov over 5 years ago

  • Status changed from New to In Progress
Actions #7

Updated by Igor Fedotov over 5 years ago

There are actually two aspects to this ticket:
1) The tool improperly handles OSD deployments that lack DB and/or WAL volumes. This is a bug and should be fixed in all supported releases. I will do that shortly.
2) In this ticket Mark is trying to expand the main device, which isn't supported. For now, only standalone BlueFS volumes are supposed to benefit from the "expand" feature. I'm going to check how feasible a main device expansion feature is and implement it if it is. I'm not sure whether we would backport it to earlier releases, though.

Mark, meanwhile, could you clarify why you want to expand this volume? Do you want a larger main device for both user data and metadata, or do you want to expand the DB part only (not sure why that would be needed, though)?
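
To illustrate the first aspect: the backtrace in the description shows `interval_set::range_end()` asserting that the set is not empty. One plausible way to hit that is a loop that calls `range_end()` for every device slot, including slots with no extents because the OSD has no separate DB or WAL volume. The Python sketch below shows that failure mode and a simple guard. `IntervalSet`, `slots_to_expand`, and the slot numbering are hypothetical stand-ins for illustration, not Ceph's code, and the actual fix may look different.

```python
class IntervalSet:
    """Toy stand-in for an interval set: maps extent offset -> length."""

    def __init__(self, extents=()):
        self._extents = dict(extents)

    def empty(self) -> bool:
        return not self._extents

    def range_end(self) -> int:
        # Same precondition as the assertion seen in the backtrace.
        if self.empty():
            raise AssertionError("FAILED assert(!empty())")
        last = max(self._extents)
        return last + self._extents[last]


def slots_to_expand(owned, sizes):
    """Return {slot: (current_end, device_size)} for slots that can grow.

    Slots that own no extents are skipped instead of calling range_end().
    """
    result = {}
    for slot, extents in owned.items():
        if extents.empty():
            continue  # assumed case: no device behind this slot
        end = extents.range_end()
        if sizes[slot] > end:
            result[slot] = (end, sizes[slot])
    return result


# Assumed layout: slot 0 has no device, slot 1 owns two extents.
owned = {
    0: IntervalSet(),
    1: IntervalSet([(0x100000, 0xC00000), (0x120000000, 0x40000000)]),
}
sizes = {0: 0, 1: 0x14CA000000}

growth = slots_to_expand(owned, sizes)
for slot, (end, size) in growth.items():
    print(f"slot {slot}: extents end at {end:#x}, device size {size:#x}")
```

Without the `empty()` check, the loop would raise on slot 0 before it ever reached slot 1.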

Actions #8

Updated by Марк Коренберг over 5 years ago

I decided to enlarge the OSD's backing store device so that this OSD could store more data without being re-created.

The sequence of my actions:

1. Created an LVM volume of size 10G.
2. Deployed the OSD using ceph-deploy.
3. Used it for some tests and filled it with data to about 10%.
4. Called lvresize to enlarge it while the OSD process was running.
5. Tried to call ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-16 WHILE THE OSD PROCESS WAS RUNNING.
6. Restarted the OSD process.
7. ceph -s showed this OSD's usage as 87% and raised a NEAR_FULL warning.
8. Ran ceph osd out osd.16 and waited.
9. The problem still happens.

Actions #9

Updated by Igor Fedotov over 5 years ago

Got it. Thanks, Mark!

So, as I said before, main device resize isn't supported at the moment.
I will probably start adding support for offline resizing of such volumes in Nautilus+ releases.

Actions #11

Updated by Igor Fedotov over 5 years ago

  • Status changed from In Progress to Fix Under Review
  • Affected Versions v14.0.0 added
  • Affected Versions deleted (v12.2.8)
Actions #12

Updated by Igor Fedotov over 5 years ago

Mimic fix (which is completely different from the Nautilus one, as we don't backport the main device expansion feature): https://github.com/ceph/ceph/pull/25348

Actions #13

Updated by Igor Fedotov over 5 years ago

  • Affected Versions v13.2.3 added
  • Affected Versions deleted (v14.0.0)
Actions #14

Updated by Igor Fedotov over 5 years ago

  • Affected Versions v12.2.8 added
  • Affected Versions deleted (v13.2.3)
Actions #15

Updated by Nathan Cutler over 5 years ago

  • Backport set to mimic, luminous
Actions #16

Updated by Nathan Cutler over 5 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #17

Updated by Nathan Cutler over 5 years ago

Actions #18

Updated by Nathan Cutler over 5 years ago

Actions #19

Updated by Igor Fedotov about 5 years ago

  • Status changed from Pending Backport to Resolved