Bug #39156 (closed)

zap should force writes to disk - dd does not by default

Added by Guillaume Abrioux about 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus, mimic
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Error seen in ceph-ansible when deploying nautilus 14.2.0.

It looks like a race condition because it's not 100% reproducible
(e.g. https://2.jenkins.ceph.com/job/ceph-ansible-prs-nautilus-ubuntu-non_container-purge/12/consoleFull#-1886347834d45e74d1-4e82-45b8-a53b-321a132caf6b)

stderr_lines:
- 'Traceback (most recent call last):'
- ' File "/usr/sbin/ceph-volume", line 11, in <module>'
- ' load_entry_point(''ceph-volume==1.0.0'', ''console_scripts'', ''ceph-volume'')()'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/main.py", line 38, in init'
- ' self.main(self.argv)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/decorators.py", line 59, in newfunc'
- ' return f(*a, **kw)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/main.py", line 148, in main'
- ' terminal.dispatch(self.mapper, subcommand_args)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/terminal.py", line 182, in dispatch'
- ' instance.main()'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/main.py", line 40, in main'
- ' terminal.dispatch(self.mapper, self.argv)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/terminal.py", line 182, in dispatch'
- ' instance.main()'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/create.py", line 69, in main'
- ' self.create(args)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/decorators.py", line 16, in is_root'
- ' return func(*a, **kw)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/create.py", line 26, in create'
- ' prepare_step.safe_prepare(args)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py", line 219, in safe_prepare'
- ' self.prepare()'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/decorators.py", line 16, in is_root'
- ' return func(*a, **kw)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py", line 320, in prepare'
- ' osd_fsid,'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py", line 119, in prepare_bluestore'
- ' db=db'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/util/prepare.py", line 430, in osd_mkfs_bluestore'
- ' raise RuntimeError(''Command failed with exit code %s: %s'' % (returncode, '' ''.join(command)))'
- 'RuntimeError: Command failed with exit code 250: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 2 --monmap /var/lib/ceph/osd/ceph-2/activate.monmap --keyfile - --bluestore-block-db-path /dev/journals/journal1 --osd-data /var/lib/ceph/osd/ceph-2/ --osd-uuid 4069624b-c8ef-4bab-bf60-41b0424fc98f --setuser ceph --setgroup ceph'
stdout: |

Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 4069624b-c8ef-4bab-bf60-41b0424fc98f
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
--> Absolute path not found for executable: restorecon
--> Ensure $PATH environment variable contains common executable locations
Running command: /bin/chown -h ceph:ceph /dev/test_group/data-lv2
Running command: /bin/chown -R ceph:ceph /dev/dm-1
Running command: /bin/ln -s /dev/test_group/data-lv2 /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-2/activate.monmap
stderr: got monmap epoch 1
Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-2/keyring --create-keyring --name osd.2 --add-key AQDDWqxcABJpIxAAUBrEQARzV6DvWSyh9MbJaw==
stdout: creating /var/lib/ceph/osd/ceph-2/keyring
stdout: added entity osd.2 auth(key=AQDDWqxcABJpIxAAUBrEQARzV6DvWSyh9MbJaw==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/
Running command: /bin/chown -h ceph:ceph /dev/journals/journal1
Running command: /bin/chown -R ceph:ceph /dev/dm-2
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 2 --monmap /var/lib/ceph/osd/ceph-2/activate.monmap --keyfile - --bluestore-block-db-path /dev/journals/journal1 --osd-data /var/lib/ceph/osd/ceph-2/ --osd-uuid 4069624b-c8ef-4bab-bf60-41b0424fc98f --setuser ceph --setgroup ceph
stderr: 2019-04-09 08:41:40.753 7f16195b1f00 -1 bluestore(/var/lib/ceph/osd/ceph-2/) _read_fsid unparsable uuid
stderr: /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: In function 'void AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7f16195b1f00 time 2019-04-09 08:41:40.828356
stderr: /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: 731: FAILED ceph_assert(available >= allocated)
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x555c21cde7e0]
stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 3: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 4: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 5: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 6: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 7: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 9: (main()+0x1979) [0x555c21ce3849]
stderr: 10: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 11: (_start()+0x2a) [0x555c21dcb12a]
stderr: *** Caught signal (Aborted) **
stderr: in thread 7f16195b1f00 thread_name:ceph-osd
stderr: 2019-04-09 08:41:40.825 7f16195b1f00 -1 /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: In function 'void AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7f16195b1f00 time 2019-04-09 08:41:40.828356
stderr: /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: 731: FAILED ceph_assert(available >= allocated)
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x555c21cde7e0]
stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 3: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 4: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 5: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 6: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 7: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 9: (main()+0x1979) [0x555c21ce3849]
stderr: 10: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 11: (_start()+0x2a) [0x555c21dcb12a]
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (()+0x12890) [0x7f1616dd9890]
stderr: 2: (gsignal()+0xc7) [0x7f1615a8be97]
stderr: 3: (abort()+0x141) [0x7f1615a8d801]
stderr: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x555c21cde831]
stderr: 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 6: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 7: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 8: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 9: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 10: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 11: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 12: (main()+0x1979) [0x555c21ce3849]
stderr: 13: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 14: (_start()+0x2a) [0x555c21dcb12a]
stderr: 2019-04-09 08:41:40.829 7f16195b1f00 -1 *** Caught signal (Aborted) **
stderr: in thread 7f16195b1f00 thread_name:ceph-osd
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (()+0x12890) [0x7f1616dd9890]
stderr: 2: (gsignal()+0xc7) [0x7f1615a8be97]
stderr: 3: (abort()+0x141) [0x7f1615a8d801]
stderr: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x555c21cde831]
stderr: 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 6: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 7: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 8: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 9: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 10: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 11: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 12: (main()+0x1979) [0x555c21ce3849]
stderr: 13: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 14: (_start()+0x2a) [0x555c21dcb12a]
stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
stderr: -22> 2019-04-09 08:41:40.753 7f16195b1f00 -1 bluestore(/var/lib/ceph/osd/ceph-2/) _read_fsid unparsable uuid
stderr: -1> 2019-04-09 08:41:40.825 7f16195b1f00 -1 /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: In function 'void AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7f16195b1f00 time 2019-04-09 08:41:40.828356
stderr: /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: 731: FAILED ceph_assert(available >= allocated)
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x555c21cde7e0]
stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 3: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 4: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 5: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 6: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 7: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 9: (main()+0x1979) [0x555c21ce3849]
stderr: 10: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 11: (_start()+0x2a) [0x555c21dcb12a]
stderr: 0> 2019-04-09 08:41:40.829 7f16195b1f00 -1 *** Caught signal (Aborted) **
stderr: in thread 7f16195b1f00 thread_name:ceph-osd
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (()+0x12890) [0x7f1616dd9890]
stderr: 2: (gsignal()+0xc7) [0x7f1615a8be97]
stderr: 3: (abort()+0x141) [0x7f1615a8d801]
stderr: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x555c21cde831]
stderr: 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 6: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 7: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 8: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 9: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 10: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 11: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 12: (main()+0x1979) [0x555c21ce3849]
stderr: 13: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 14: (_start()+0x2a) [0x555c21dcb12a]
stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
stderr: -22> 2019-04-09 08:41:40.753 7f16195b1f00 -1 bluestore(/var/lib/ceph/osd/ceph-2/) _read_fsid unparsable uuid
stderr: -1> 2019-04-09 08:41:40.825 7f16195b1f00 -1 /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: In function 'void AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7f16195b1f00 time 2019-04-09 08:41:40.828356
stderr: /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: 731: FAILED ceph_assert(available >= allocated)
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x555c21cde7e0]
stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 3: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 4: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 5: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 6: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 7: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 9: (main()+0x1979) [0x555c21ce3849]
stderr: 10: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 11: (_start()+0x2a) [0x555c21dcb12a]
stderr: 0> 2019-04-09 08:41:40.829 7f16195b1f00 -1 *** Caught signal (Aborted) **
stderr: in thread 7f16195b1f00 thread_name:ceph-osd
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (()+0x12890) [0x7f1616dd9890]
stderr: 2: (gsignal()+0xc7) [0x7f1615a8be97]
stderr: 3: (abort()+0x141) [0x7f1615a8d801]
stderr: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x555c21cde831]
stderr: 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 6: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 7: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 8: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 9: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 10: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 11: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 12: (main()+0x1979) [0x555c21ce3849]
stderr: 13: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 14: (_start()+0x2a) [0x555c21dcb12a]
stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.2 --yes-i-really-mean-it
stderr: purged osd.2
stderr:
stdout_lines: <omitted>

Files

ceph-osd.2.log (37 KB) - 'osd.2' logs from node 'osd1' - Guillaume Abrioux, 04/09/2019 12:26 PM
ceph-osd.3.log (122 KB) - more verbose log from failing OSD - Guillaume Abrioux, 04/15/2019 02:26 PM

Related issues 2 (0 open, 2 closed)

Copied to ceph-volume - Backport #42740: mimic: zap should force writes to disk - dd does not by default (Resolved, Jan Fajerski)
Copied to ceph-volume - Backport #42741: nautilus: zap should force writes to disk - dd does not by default (Resolved, Jan Fajerski)
Actions #1

Updated by Guillaume Abrioux about 5 years ago

This happens in the purge testing scenario of the ceph-ansible CI.

Here is what happens in this scenario:

1/ deploy a nautilus 14.2.0 cluster
2/ purge it using ceph-volume
3/ redeploy a nautilus 14.2.0 cluster <-- the error reported shows up in this step

I suspected something was not wiped/cleaned correctly in the second step, but that same job sometimes passes the CI (hence my suspicion of a race condition).

It seems to happen only on Ubuntu hosts.
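
For reference, the three steps above roughly correspond to the following per-device ceph-volume sequence (a sketch using the paths from the log in this report; the CI actually drives this through ceph-ansible playbooks, so the exact calls are assumptions):

# 1/ deploy: prepare + activate a bluestore OSD with a separate DB LV
ceph-volume lvm create --bluestore --data test_group/data-lv2 --block.db /dev/journals/journal1
# 2/ purge: zap the data LV and the DB LV (zap wipes with wipefs + dd)
ceph-volume lvm zap /dev/test_group/data-lv2
ceph-volume lvm zap /dev/journals/journal1
# 3/ redeploy: the same create call, where ceph-osd --mkfs hits the assert
ceph-volume lvm create --bluestore --data test_group/data-lv2 --block.db /dev/journals/journal1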

Actions #2

Updated by Igor Fedotov about 5 years ago

@Guillaume, could you please attach the failing OSD log as well? It would be super if you manage to collect it with a raised logging level (debug bluefs = 20 and debug bluestore = 20).

Actions #3

Updated by Igor Fedotov about 5 years ago

Also, could you please provide some info on /dev/journals/journal1?
What kind of device is it: LV, partition, or file? What's its size?
Is it modified somehow during the purge/deploy process?

Actions #4

Updated by Igor Fedotov about 5 years ago

  • Status changed from New to Need More Info
Actions #6

Updated by Guillaume Abrioux about 5 years ago

I'll see if I can get a branch of ceph-ansible with the required verbosity added running in the CI. If the error shows up, I'll attach the log.
In the meantime, I've attached the failing OSD log.

The node osd1 where the failure shows up is a VM (libvirt); the device /dev/journals/journal1 on that VM is a 25 GB LV.

Actions #7

Updated by Igor Fedotov about 5 years ago

@Guillaume, thanks for the update. Unfortunately, the existing log provides almost no clues, hence I'm waiting for a more verbose one.

Actions #8

Updated by Guillaume Abrioux about 5 years ago

Would it be enough to add debug bluefs=20 and debug bluestore=20 to ceph.conf?

Actions #9

Updated by Igor Fedotov about 5 years ago

I think so.

You can check the resulting log to make sure the setting has been applied, even if no failure occurs.
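
For what it's worth, a minimal sketch of how that could look on an OSD node; the [osd] section placement and the append-to-ceph.conf approach are just one option (ceph-ansible can inject the same settings through its config overrides):

# raise BlueFS/BlueStore verbosity on the OSD node before redeploying
cat >> /etc/ceph/ceph.conf <<'EOF'
[osd]
debug bluefs = 20
debug bluestore = 20
EOF
# the next ceph-osd --mkfs run should then emit level-10/20 "bluefs"/"bluestore"
# lines (as in the excerpt in comment #12), confirming the setting was picked up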

Actions #11

Updated by Guillaume Abrioux about 5 years ago

Hi Igor,

I've finally got a new job failing with more verbose logs.

Actions #12

Updated by Igor Fedotov about 5 years ago

Guillaume Abrioux wrote:

Hi Igor,

I've finally got a new job failing with more verbose logs.

Hi Guillaume,

thanks for the update.

I don't have an exact root cause so far but let me share some clues:
Here is the most interesting part of your log showing BlueFS log replay:

2019-04-15 12:14:25.743 7f76d9a85f00 10 bluefs _replay
2019-04-15 12:14:25.743 7f76d9a85f00 10 bluefs _replay log_fnode file(ino 1 size 0x1000 mtime 0.000000 bdev 0 allocated 400000 extents [1:0x100000~400000])
2019-04-15 12:14:25.743 7f76d9a85f00 10 bluefs _read h 0x562fd292c990 0x0~1000 from file(ino 1 size 0x1000 mtime 0.000000 bdev 0 allocated 400000 extents [1:0x100000~400000])
2019-04-15 12:14:25.743 7f76d9a85f00 20 bluefs _read fetching 0x0~100000 of 1:0x100000~400000
2019-04-15 12:14:25.747 7f76d9a85f00 20 bluefs _read left 0x100000 len 0x1000
2019-04-15 12:14:25.747 7f76d9a85f00 20 bluefs _read got 4096
2019-04-15 12:14:25.747 7f76d9a85f00 10 bluefs _replay 0x0: stop: uuid 00000000-0000-0000-0000-000000000000 != super.uuid 2ed85005-a3e1-413c-ab5f-0af4a2e126ad
2019-04-15 12:14:25.747 7f76d9a85f00 10 bluefs _replay log file size was 0x1000
2019-04-15 12:14:25.747 7f76d9a85f00 10 bluefs _replay done
2019-04-15 12:14:25.747 7f76d9a85f00 10 fbmap_alloc 0x562fd2958400 init_rm_free 0x100000~400000

This shows that there are no entries in the BlueFS log and that at least some of the data is all zeros (see the retrieved uuid).
At the same time, a few lines earlier during mkfs we can see at least 2 records added to the log, tracking the space given to the BlueFS allocators. A successful log flush is observed as well.

2019-04-15 12:14:25.711 7f76d9a85f00 20 bluefs mkfs op_alloc_add 1 0x2000~63fbfe000
2019-04-15 12:14:25.711 7f76d9a85f00 20 bluefs mkfs op_alloc_add 2 0x170000000~40000000
2019-04-15 12:14:25.711 7f76d9a85f00 10 bluefs _flush_and_sync_log txn(seq 1 len 0x25 crc 0x7f22e346)
2019-04-15 12:14:25.711 7f76d9a85f00 10 bluefs _flush 0x562fd2981680 0x0~1000 to file(ino 1 size 0x0 mtime 0.000000 bdev 0 allocated 400000 extents [1:0x100000~400000])
2019-04-15 12:14:25.711 7f76d9a85f00 10 bluefs _flush_range 0x562fd2981680 pos 0x0 0x0~1000 to file(ino 1 size 0x0 mtime 0.000000 bdev 0 allocated 400000 extents [1:0x100000~400000])
2019-04-15 12:14:25.711 7f76d9a85f00 20 bluefs _flush_range file now file(ino 1 size 0x1000 mtime 0.000000 bdev 0 allocated 400000 extents [1:0x100000~400000])
2019-04-15 12:14:25.711 7f76d9a85f00 20 bluefs _flush_range in 1:0x100000~400000 x_off 0x0
2019-04-15 12:14:25.711 7f76d9a85f00 20 bluefs _flush_range h 0x562fd2981680 pos now 0x1000
2019-04-15 12:14:25.711 7f76d9a85f00 10 bluefs _claim_completed_aios got 1 aios
2019-04-15 12:14:25.711 7f76d9a85f00 10 bluefs wait_for_aio 0x562fd2981680
2019-04-15 12:14:25.719 7f76d9a85f00 10 bluefs wait_for_aio 0x562fd2981680 done in 0.009824
2019-04-15 12:14:25.719 7f76d9a85f00 20 bluefs flush_bdev
2019-04-15 12:14:25.727 7f76d9a85f00 20 bluefs _flush_and_sync_log log_seq_stable 1

So it's absolutely unexpected that we get no records on replay and as a result BlueFS allocators track no space.
Hence the assertion.

So now the question is why the BlueFS log contains no data (and probably contains all zeros). Is this observed on the subsequent startup only, or is this content persistent?

Side note: given the OSD log, the BlueFS log resides on the DB device (/dev/journals/journal1) at offset 0x100000.

I suspect a race between ceph-volume purge and BlueFS mkfs. Or a missing flush of the written data after mkfs, which is less probable IMO. Or some improper HW behavior. Just in case: do you have any HW caching along the path?

So could you please check/learn how ceph-volume purge operates: does it perform a disk flush immediately after zapping? Can one be sure that all the data (zeros) has landed on the disk after it completes? Which disk extents does ceph-volume purge?

Also, it would be nice to inspect the disk content (the first 2 MB) after the failure. What is present at offset 0x100000? Zeros?
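
For example, the inspection could be done along these lines (a sketch: the device path comes from this report, and hexdump is assumed to be available on the node):

# dump the first 2 MB of the DB device after the failure
dd if=/dev/journals/journal1 bs=1M count=2 status=none | hexdump -C > /tmp/journal1-head.hex
# or jump straight to the BlueFS log location at offset 0x100000 (= 1 MiB)
dd if=/dev/journals/journal1 bs=1M skip=1 count=1 status=none | hexdump -C | head -n 5
# an all-zero region shows up as a single line of 00s followed by a lone '*'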

Verifying whether the issue is still present on different HW might deserve some attention too.

Actions #13

Updated by Alfredo Deza about 5 years ago

This output should help clarify how ceph-volume does 'zap':

# ceph-volume lvm zap --destroy /dev/sdc
--> Zapping: /dev/sdc
Running command: /sbin/wipefs --all /dev/sdc
Running command: /bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0116904 s, 897 MB/s
--> Zapping successful for: <Raw Device: /dev/sdc>

It is all done using system commands; there isn't an explicit "flush". I doubt there is any race condition here.

Actions #14

Updated by Igor Fedotov about 5 years ago

@Alfredo - shouldn't we use/try oflag=sync,direct for dd?

Actions #15

Updated by Alfredo Deza about 5 years ago

I don't know what those flags do, but if you think they are part of the problem here, I wouldn't be opposed to trying them out.

Actions #16

Updated by Igor Fedotov about 5 years ago

More background on the dd/sync topic:
https://askubuntu.com/questions/318186/wait-for-dd-command-to-fully-write-to-the-disk

In short: by default, dd does not ensure that the writes have actually reached the disk.

So one should use the conv=fsync or oflag=sync[,direct] option for this to happen.
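
For illustration, the zap write from comment #13 with an explicit flush could look like this (a sketch of the options under discussion, not necessarily the exact change that was merged):

# flush the page cache to disk once at the end of the copy
dd if=/dev/zero of=/dev/sdc bs=1M count=10 conv=fsync
# or bypass the page cache and sync every write (slower, but unambiguous)
dd if=/dev/zero of=/dev/sdc bs=1M count=10 oflag=direct,sync
# running 'sync' (or 'blockdev --flushbufs /dev/sdc') after a plain dd is another option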

Actions #17

Updated by Josh Durgin almost 5 years ago

  • Project changed from bluestore to ceph-volume
  • Subject changed from FAILED ceph_assert(available >= allocated) to zap should force writes to disk - dd does not by default
  • Status changed from Need More Info to New
Actions #18

Updated by Rishabh Dave over 4 years ago

  • Status changed from New to In Progress
  • Assignee set to Rishabh Dave
Actions #19

Updated by Rishabh Dave over 4 years ago

  • Pull request ID set to 31479
Actions #20

Updated by Rishabh Dave over 4 years ago

  • Status changed from In Progress to Fix Under Review
Actions #21

Updated by Jan Fajerski over 4 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to nautilus, mimic
Actions #22

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #42740: mimic: zap should force writes to disk - dd does not by default added
Actions #23

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #42741: nautilus: zap should force writes to disk - dd does not by default added
Actions #24

Updated by Jan Fajerski over 4 years ago

  • Status changed from Pending Backport to Resolved
Actions
