Bug #20422: Out of space (btrfs journal write) - Ceph - Ceph

Bug #20422

closed

Out of space (btrfs journal write)

Added by David Zafman almost 7 years ago. Updated almost 7 years ago.

Status: Duplicate
Priority: Normal
Category: OSD
Source: Development
Severity: 3 - minor

Description

A journal write on a btrfs-backed OSD failed with ENOSPC, even though the OSD never reported any significant disk usage.

Test run: dzafman-2017-06-26_14:07:20-rados-wip-13837-distro-basic-smithi/1328203

2017-06-26T21:22:37.026 DEBUG:teuthology.run:Teuthology command: teuthology /tmp/teuthology-worker.rbilWJ.tmp -- --verbose --lock --description rados/thrash/{0-size-min-size-overrides/2-size-1-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml backoff/normal.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-end.yaml msgr-failures/fastclose.yaml msgr/async.yaml objectstore/filestore-btrfs.yaml rados.yaml rocksdb.yaml thrashers/mapgap.yaml workloads/radosbench.yaml} --name dzafman-2017-06-26_14:07:20-rados-wip-13837-distro-basic-smithi --block --owner scheduled_dzafman@TrustyTahr --archive /home/teuthworker/archive/dzafman-2017-06-26_14:07:20-rados-wip-13837-distro-basic-smithi/1328203


2017-06-26 21:46:48.904182 7fd4c3bd5700 20 osd.3 448 check_full_status cur ratio 0.0136102. nearfull_ratio 0.97. backfillfull_ratio 0.97, full_ratio 0.97, failsafe_ratio 0.97, new state none

2017-06-26 21:46:52.954212 7fd4d2c0d700 -1 journal FileJournal::write_bl : write_fd failed: (28) No space left on device
2017-06-26 21:46:52.954219 7fd4d2c0d700 -1 journal FileJournal::do_write: write_bl(pos=20344832) failed

2017-06-26 21:46:52.955552 7fd4d2c0d700 -1 *** Caught signal (Aborted) **
 in thread 7fd4d2c0d700 thread_name:journal_write

 ceph version 12.0.3-1975-g640c004 (640c00456c926f2344fbf0e024568d4c19a6f3a2) luminous (dev)
 1: (()+0x9f93e2) [0x563312a9c3e2]
 2: (()+0x113e0) [0x7fd4e1f3c3e0]
 3: (gsignal()+0x38) [0x7fd4e0ed8428]
 4: (abort()+0x16a) [0x7fd4e0eda02a]
 5: (FileJournal::do_write(ceph::buffer::list&)+0x863) [0x563312a28d53]
 6: (FileJournal::write_thread_entry()+0xc2d) [0x563312a2d92d]
 7: (FileJournal::Writer::entry()+0xd) [0x5633128a520d]
 8: (()+0x770a) [0x7fd4e1f3270a]
 9: (clone()+0x6d) [0x7fd4e0fa982d]
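
To make the "no significant disk usage" observation easy to check across many OSD logs, the check_full_status line can be parsed and compared against its own thresholds. This is a minimal sketch, not part of Ceph or teuthology: the helper names are hypothetical, and the field layout is inferred only from the log lines quoted in this report.

```python
import re

# Matches "cur ratio 0.0136102" and "<name>_ratio 0.97" pairs. The layout is
# inferred from the check_full_status lines quoted in this report; other Ceph
# versions may print the line differently (assumption).
PAIR_RE = re.compile(r"(cur ratio|\w+_ratio) (\d+(?:\.\d+)?)")


def parse_full_status(line):
    """Return {field_name: float} for a check_full_status log line."""
    if "check_full_status" not in line:
        return {}
    return {name.replace(" ", "_"): float(value)
            for name, value in PAIR_RE.findall(line)}


def below_all_thresholds(fields):
    """True if the current usage ratio is under every *_ratio threshold."""
    cur = fields["cur_ratio"]
    thresholds = [v for k, v in fields.items() if k != "cur_ratio"]
    return all(cur < t for t in thresholds)


if __name__ == "__main__":
    line = ("2017-06-26 21:46:48.904182 7fd4c3bd5700 20 osd.3 448 "
            "check_full_status cur ratio 0.0136102. nearfull_ratio 0.97. "
            "backfillfull_ratio 0.97, full_ratio 0.97, failsafe_ratio 0.97, "
            "new state none")
    fields = parse_full_status(line)
    print(fields, below_all_thresholds(fields))
```

For the line from this run, the parsed cur_ratio is about 1.4%, far below every threshold, which is why the ENOSPC from the journal write is unexpected.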

Related issues 1 (1 open — 0 closed)

Updated by David Zafman almost 7 years ago

This also happened in another job with btrfs: dzafman-2017-06-26_14:07:20-rados-wip-13837-distro-basic-smithi/1328316

Again, the disk is nowhere near out of space.


2017-06-26 22:15:03.331684 7fa7f9df1700 20 osd.3 68 check_full_status cur ratio 0.0137066. nearfull_ratio 0.85. backfillfull_ratio 0.9, full_ratio 0.95, failsafe_ratio 0.97, new state none

2017-06-26 22:15:04.498672 7fa80fe39700 -1 filestore(/var/lib/ceph/osd/ceph-3) FileStore::_setattrs: chain_setxattr returned -28
2017-06-26 22:15:04.498678 7fa80fe39700 10 filestore(/var/lib/ceph/osd/ceph-3) setattrs 0.16_head/#0:6e0a36c3:::benchmark_data_smithi090_25394_object10018:head# = -28
2017-06-26 22:15:04.498683 7fa80fe39700  0 filestore(/var/lib/ceph/osd/ceph-3)  ENOSPC on setxattrs on 0.16_head/#0:6e0a36c3:::benchmark_data_smithi090_25394_object10018:head#
2017-06-26 22:15:04.498685 7fa80fe39700 -1 filestore(/var/lib/ceph/osd/ceph-3)  error (28) No space left on device not handled on operation 0x55689e6e2f8c (42917.1.1, or op 1, counting from 0)
2017-06-26 22:15:04.498689 7fa80fe39700  0 filestore(/var/lib/ceph/osd/ceph-3) ENOSPC from disk filesystem, misconfigured cluster
2017-06-26 22:15:04.498690 7fa80fe39700  0 filestore(/var/lib/ceph/osd/ceph-3)  transaction dump:
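
The filestore lines above report the failure as a raw negative return code (-28) before spelling it out as ENOSPC. As a rough illustration, the code at the end of such a line can be mapped back to its symbolic errno name with the Python standard library. The function name and the regular expression are hypothetical and only cover the two line shapes quoted here ("returned -28" and "= -28").

```python
import errno
import os
import re

# Trailing "returned -N" or "= -N", as seen in the two filestore lines quoted
# in this report. Other log formats are not handled (assumption).
RET_RE = re.compile(r"(?:returned|=) -(\d+)\s*$")


def decode_return_code(line):
    """Return (code, errno name, message) for a trailing negative return code.

    Returns None if the line does not end with a negative return code.
    """
    match = RET_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


if __name__ == "__main__":
    line = ("2017-06-26 22:15:04.498672 7fa80fe39700 -1 "
            "filestore(/var/lib/ceph/osd/ceph-3) FileStore::_setattrs: "
            "chain_setxattr returned -28")
    print(decode_return_code(line))
```

On Linux, code 28 maps to ENOSPC, matching the "(28) No space left on device" text that the OSD prints itself.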
