Bug #39156

closed

zap should force writes to disk - dd does not by default

Added by Guillaume Abrioux about 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Target version: -
% Done: 0%
Backport: nautilus, mimic
Regression: No
Severity: 2 - major

Description

Error seen in ceph-ansible when deploying Nautilus 14.2.0.

This looks like a race condition, since it is not 100% reproducible
(e.g.: https://2.jenkins.ceph.com/job/ceph-ansible-prs-nautilus-ubuntu-non_container-purge/12/consoleFull#-1886347834d45e74d1-4e82-45b8-a53b-321a132caf6b)
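The title's proposed fix is to make zap force its writes to disk, because `dd` does not do so by default. The Python below is a minimal sketch of that idea, assuming zap wipes the start of the device by writing zeros; `wipe_and_sync` and its default sizes are hypothetical and are not ceph-volume's actual code. It overwrites the leading bytes and calls `os.fsync` before returning, so the zeros are on stable storage rather than only in the page cache when the next tool (here `ceph-osd --mkfs`) opens the device. GNU `dd` gives the same guarantee with `conv=fsync`, e.g. `dd if=/dev/zero of=<device> bs=1M count=10 conv=fsync` (block size and count are illustrative).

```python
import os


def wipe_and_sync(path, length=10 * 1024 * 1024, chunk=1024 * 1024):
    """Overwrite the first `length` bytes of `path` with zeros, then fsync.

    Hypothetical helper for illustration only; not ceph-volume's zap code.
    Roughly equivalent to: dd if=/dev/zero of=<path> bs=1M count=10 conv=fsync
    """
    zeros = b"\0" * chunk
    # O_WRONLY without O_TRUNC: overwrite in place, never shrink the target.
    fd = os.open(path, os.O_WRONLY)
    try:
        written = 0
        while written < length:
            # os.write may write fewer bytes than asked, so loop until done.
            written += os.write(fd, zeros[: min(chunk, length - written)])
        # Block until the written data reaches stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

Calling it on a real device, e.g. `wipe_and_sync("/dev/journals/journal1")`, destroys whatever is stored at the start of that device, so only point it at a device you intend to zap.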

stderr_lines:
- 'Traceback (most recent call last):'
- ' File "/usr/sbin/ceph-volume", line 11, in <module>'
- ' load_entry_point(''ceph-volume==1.0.0'', ''console_scripts'', ''ceph-volume'')()'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/main.py", line 38, in __init__'
- ' self.main(self.argv)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/decorators.py", line 59, in newfunc'
- ' return f(*a, **kw)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/main.py", line 148, in main'
- ' terminal.dispatch(self.mapper, subcommand_args)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/terminal.py", line 182, in dispatch'
- ' instance.main()'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/main.py", line 40, in main'
- ' terminal.dispatch(self.mapper, self.argv)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/terminal.py", line 182, in dispatch'
- ' instance.main()'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/create.py", line 69, in main'
- ' self.create(args)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/decorators.py", line 16, in is_root'
- ' return func(*a, **kw)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/create.py", line 26, in create'
- ' prepare_step.safe_prepare(args)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py", line 219, in safe_prepare'
- ' self.prepare()'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/decorators.py", line 16, in is_root'
- ' return func(*a, **kw)'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py", line 320, in prepare'
- ' osd_fsid,'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py", line 119, in prepare_bluestore'
- ' db=db'
- ' File "/usr/lib/python2.7/dist-packages/ceph_volume/util/prepare.py", line 430, in osd_mkfs_bluestore'
- ' raise RuntimeError(''Command failed with exit code %s: %s'' % (returncode, '' ''.join(command)))'
- 'RuntimeError: Command failed with exit code 250: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 2 --monmap /var/lib/ceph/osd/ceph-2/activate.monmap --keyfile - --bluestore-block-db-path /dev/journals/journal1 --osd-data /var/lib/ceph/osd/ceph-2/ --osd-uuid 4069624b-c8ef-4bab-bf60-41b0424fc98f --setuser ceph --setgroup ceph'
stdout: |

Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 4069624b-c8ef-4bab-bf60-41b0424fc98f
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
--> Absolute path not found for executable: restorecon
--> Ensure $PATH environment variable contains common executable locations
Running command: /bin/chown -h ceph:ceph /dev/test_group/data-lv2
Running command: /bin/chown -R ceph:ceph /dev/dm-1
Running command: /bin/ln -s /dev/test_group/data-lv2 /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-2/activate.monmap
stderr: got monmap epoch 1
Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-2/keyring --create-keyring --name osd.2 --add-key AQDDWqxcABJpIxAAUBrEQARzV6DvWSyh9MbJaw==
stdout: creating /var/lib/ceph/osd/ceph-2/keyring
stdout: added entity osd.2 auth(key=AQDDWqxcABJpIxAAUBrEQARzV6DvWSyh9MbJaw==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/
Running command: /bin/chown -h ceph:ceph /dev/journals/journal1
Running command: /bin/chown -R ceph:ceph /dev/dm-2
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 2 --monmap /var/lib/ceph/osd/ceph-2/activate.monmap --keyfile - --bluestore-block-db-path /dev/journals/journal1 --osd-data /var/lib/ceph/osd/ceph-2/ --osd-uuid 4069624b-c8ef-4bab-bf60-41b0424fc98f --setuser ceph --setgroup ceph
stderr: 2019-04-09 08:41:40.753 7f16195b1f00 -1 bluestore(/var/lib/ceph/osd/ceph-2/) _read_fsid unparsable uuid
stderr: /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: In function 'void AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7f16195b1f00 time 2019-04-09 08:41:40.828356
stderr: /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: 731: FAILED ceph_assert(available >= allocated)
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x555c21cde7e0]
stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 3: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 4: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 5: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 6: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 7: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 9: (main()+0x1979) [0x555c21ce3849]
stderr: 10: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 11: (_start()+0x2a) [0x555c21dcb12a]
stderr: *** Caught signal (Aborted) **
stderr: in thread 7f16195b1f00 thread_name:ceph-osd
stderr: 2019-04-09 08:41:40.825 7f16195b1f00 -1 /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: In function 'void AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7f16195b1f00 time 2019-04-09 08:41:40.828356
stderr: /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: 731: FAILED ceph_assert(available >= allocated)
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x555c21cde7e0]
stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 3: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 4: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 5: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 6: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 7: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 9: (main()+0x1979) [0x555c21ce3849]
stderr: 10: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 11: (_start()+0x2a) [0x555c21dcb12a]
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (()+0x12890) [0x7f1616dd9890]
stderr: 2: (gsignal()+0xc7) [0x7f1615a8be97]
stderr: 3: (abort()+0x141) [0x7f1615a8d801]
stderr: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x555c21cde831]
stderr: 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 6: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 7: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 8: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 9: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 10: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 11: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 12: (main()+0x1979) [0x555c21ce3849]
stderr: 13: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 14: (_start()+0x2a) [0x555c21dcb12a]
stderr: 2019-04-09 08:41:40.829 7f16195b1f00 -1 *** Caught signal (Aborted) **
stderr: in thread 7f16195b1f00 thread_name:ceph-osd
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (()+0x12890) [0x7f1616dd9890]
stderr: 2: (gsignal()+0xc7) [0x7f1615a8be97]
stderr: 3: (abort()+0x141) [0x7f1615a8d801]
stderr: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x555c21cde831]
stderr: 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 6: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 7: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 8: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 9: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 10: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 11: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 12: (main()+0x1979) [0x555c21ce3849]
stderr: 13: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 14: (_start()+0x2a) [0x555c21dcb12a]
stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
stderr: -22> 2019-04-09 08:41:40.753 7f16195b1f00 -1 bluestore(/var/lib/ceph/osd/ceph-2/) _read_fsid unparsable uuid
stderr: -1> 2019-04-09 08:41:40.825 7f16195b1f00 -1 /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: In function 'void AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7f16195b1f00 time 2019-04-09 08:41:40.828356
stderr: /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: 731: FAILED ceph_assert(available >= allocated)
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x555c21cde7e0]
stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 3: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 4: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 5: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 6: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 7: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 9: (main()+0x1979) [0x555c21ce3849]
stderr: 10: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 11: (_start()+0x2a) [0x555c21dcb12a]
stderr: 0> 2019-04-09 08:41:40.829 7f16195b1f00 -1 *** Caught signal (Aborted) **
stderr: in thread 7f16195b1f00 thread_name:ceph-osd
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (()+0x12890) [0x7f1616dd9890]
stderr: 2: (gsignal()+0xc7) [0x7f1615a8be97]
stderr: 3: (abort()+0x141) [0x7f1615a8d801]
stderr: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x555c21cde831]
stderr: 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 6: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 7: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 8: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 9: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 10: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 11: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 12: (main()+0x1979) [0x555c21ce3849]
stderr: 13: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 14: (_start()+0x2a) [0x555c21dcb12a]
stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
stderr: -22> 2019-04-09 08:41:40.753 7f16195b1f00 -1 bluestore(/var/lib/ceph/osd/ceph-2/) _read_fsid unparsable uuid
stderr: -1> 2019-04-09 08:41:40.825 7f16195b1f00 -1 /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: In function 'void AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7f16195b1f00 time 2019-04-09 08:41:40.828356
stderr: /build/ceph-14.2.0/src/os/bluestore/fastbmap_allocator_impl.h: 731: FAILED ceph_assert(available >= allocated)
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x555c21cde7e0]
stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 3: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 4: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 5: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 6: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 7: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 9: (main()+0x1979) [0x555c21ce3849]
stderr: 10: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 11: (_start()+0x2a) [0x555c21dcb12a]
stderr: 0> 2019-04-09 08:41:40.829 7f16195b1f00 -1 *** Caught signal (Aborted) **
stderr: in thread 7f16195b1f00 thread_name:ceph-osd
stderr: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
stderr: 1: (()+0x12890) [0x7f1616dd9890]
stderr: 2: (gsignal()+0xc7) [0x7f1615a8be97]
stderr: 3: (abort()+0x141) [0x7f1615a8d801]
stderr: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x555c21cde831]
stderr: 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x555c21cde9bb]
stderr: 6: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x6fc) [0x555c223e1b8c]
stderr: 7: (BlueFS::mount()+0x2a0) [0x555c223ad630]
stderr: 8: (BlueStore::_open_bluefs(bool)+0x8c) [0x555c22288a0c]
stderr: 9: (BlueStore::_open_db(bool, bool, bool)+0xa5c) [0x555c22289d8c]
stderr: 10: (BlueStore::mkfs()+0x11d1) [0x555c222ef351]
stderr: 11: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5) [0x555c21decc35]
stderr: 12: (main()+0x1979) [0x555c21ce3849]
stderr: 13: (__libc_start_main()+0xe7) [0x7f1615a6eb97]
stderr: 14: (_start()+0x2a) [0x555c21dcb12a]
stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.2 --yes-i-really-mean-it
stderr: purged osd.2
stderr:
stdout_lines: <omitted>

Files

ceph-osd.2.log (37 KB): 'osd.2' logs from node 'osd1'. Guillaume Abrioux, 04/09/2019 12:26 PM
ceph-osd.3.log (122 KB): more verbose log from failing OSD. Guillaume Abrioux, 04/15/2019 02:26 PM

Related issues: 2 (0 open, 2 closed)

Copied to ceph-volume - Backport #42740: mimic: zap should force writes to disk - dd does not by default (Resolved, Jan Fajerski)
Copied to ceph-volume - Backport #42741: nautilus: zap should force writes to disk - dd does not by default (Resolved, Jan Fajerski)
