Project

General

Profile

Bug #20169

filestore+btrfs occasionally returns ENOSPC

Added by Sage Weil over 1 year ago. Updated over 1 year ago.

Status:
Verified
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
06/02/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
FileStore
Pull request ID:

Description

2017-06-02T12:14:43.073 INFO:teuthology.orchestra.run.smithi130.stderr:2017-06-02 12:14:43.072152 7fa83089f700  0 -- 172.21.15.130:0/4243660710 >> 172.21.15.130:6802/4426 conn(0x7fa814020110 :-1 s=STATE_OPEN pgs=76 cs=1 l=1)._try_send injecting socket failure
2017-06-02T12:14:43.418 INFO:tasks.ceph.osd.0.smithi130.stderr:2017-06-02 12:14:43.417084 7fdd782ef700 -1 filestore(/var/lib/ceph/osd/ceph-0) error creating #0:57e7a330:::benchmark_data_smithi130_24108_object14400:head# (/var/lib/ceph/osd/ceph-0/current/0.2a_head/benchmark\udata\usmithi130\u24108\uobject14400__head
_0CC5E7EA__0) in index: (28) No space left on device
2017-06-02T12:14:43.418 INFO:tasks.ceph.osd.0.smithi130.stderr:2017-06-02 12:14:43.417099 7fdd782ef700 -1 filestore(/var/lib/ceph/osd/ceph-0)  error (28) No space left on device not handled on operation 0x7fdd93808d84 (60051.1.0, or op 0, counting from 0)
2017-06-02T12:14:43.440 INFO:teuthology.orchestra.run.smithi130.stdout:    5      16     14489     14473   11.3059   10.8945   0.0051336  0.00552149
2017-06-02T12:14:43.596 INFO:tasks.ceph.osd.0.smithi130.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.2-1995-g6bf7363/rpm/el7/BUILD/ceph-12.0.2-1995-g6bf7363/src/os/filestore/FileStore.cc: In functio
n 'void FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7fdd782ef700 time 2017-06-02 12:14:43.594878
2017-06-02T12:14:43.597 INFO:tasks.ceph.osd.0.smithi130.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.2-1995-g6bf7363/rpm/el7/BUILD/ceph-12.0.2-1995-g6bf7363/src/os/filestore/FileStore.cc: 3097: FAIL
ED assert(0 == "unexpected error")
2017-06-02T12:14:43.618 INFO:tasks.ceph.osd.0.smithi130.stderr: ceph version  12.0.2-1995-g6bf7363 (6bf7363b3bddac5202fd115ab323602a20c32ff7) luminous (dev)
2017-06-02T12:14:43.618 INFO:tasks.ceph.osd.0.smithi130.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fdd857421d0]
2017-06-02T12:14:43.618 INFO:tasks.ceph.osd.0.smithi130.stderr: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x801) [0x7fdd85533e41]
2017-06-02T12:14:43.618 INFO:tasks.ceph.osd.0.smithi130.stderr: 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, unsigned long, ThreadPool::TPHandle*)+0x3b) [0x7fdd8553924b]

this is centos.

probably wontfix, but opening the ticket to document that this is a known failure that happens on smithi.


Related issues

Duplicated by Ceph - Bug #20422: Out of space (btrfs journal write) Duplicate 06/26/2017

History

#1 Updated by Greg Farnum over 1 year ago

  • Description updated (diff)
  • Priority changed from High to Normal

It's less likely on Centos, but I think we've seen this before and it's usually been a btrfs kernel bug that got resolved prior to or in the next release.

#2 Updated by Sage Weil over 1 year ago

/a/sage-2017-06-06_21:54:14-rados-wip-sage-testing-distro-basic-smithi/1265435

rados/thrash/{0-size-min-size-overrides/2-size-1-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml backoff/peering_and_degraded.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-end.yaml msgr-failures/fastclose.yaml msgr/random.yaml objectstore/filestore-btrfs.yaml rados.yaml rocksdb.yaml thrashers/morepggrow.yaml workloads/radosbench.yaml}

#3 Updated by Sage Weil over 1 year ago

/a/sage-2017-06-06_21:54:14-rados-wip-sage-testing-distro-basic-smithi/1265467
rados/thrash/{0-size-min-size-overrides/3-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml backoff/peering.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-end.yaml msgr-failures/osd-delay.yaml msgr/async.yaml objectstore/filestore-btrfs.yaml rados.yaml rocksdb.yaml thrashers/morepggrow.yaml workloads/cache-pool-snaps-readproxy.yaml}

#4 Updated by Sage Weil over 1 year ago

  • Subject changed from filestore+btrfs occasionally returns ENOSPC to filestore+btrfs occasionally returns ENOSPC on centos

#5 Updated by Sage Weil over 1 year ago

  • Subject changed from filestore+btrfs occasionally returns ENOSPC on centos to filestore+btrfs occasionally returns ENOSPC

ugh just saw this on xenial too. hrm.

/a/sage-2017-06-08_20:27:41-rados-wip-sage-testing2-distro-basic-smithi/1271606

#6 Updated by Sage Weil over 1 year ago

  • Status changed from New to Verified
  • Priority changed from Normal to Urgent
  -394> 2017-06-12 22:32:43.387004 7fe595ccb700 20 filestore(/var/lib/ceph/osd/ceph-0) _omap_setkeys 5.6_head/#5:60000000::::head# = 0
  -393> 2017-06-12 22:32:43.387012 7fe595ccb700 10 journal op_apply_finish 2584 open_ops 1 -> 0, max_applied_seq 2583 -> 2584
  -392> 2017-06-12 22:32:43.387014 7fe595ccb700 10 filestore(/var/lib/ceph/osd/ceph-0) _do_op 0x7fe5b7fc1960 seq 2584 r = 0, finisher 0x7fe5b3e556e0 0x7fe5b8018300
  -391> 2017-06-12 22:32:43.387026 7fe595ccb700 10 filestore(/var/lib/ceph/osd/ceph-0) _finish_op 0x7fe5b7fc1960 seq 2584 osr(5.6 0x7fe5acfce2e0)/0x7fe5acfce2e0 lat 0.063476
  -390> 2017-06-12 22:32:43.387061 7fe58f4be700 10 journal commit_start max_applied_seq 2584, open_ops 0
  -389> 2017-06-12 22:32:43.387070 7fe58f4be700 10 journal commit_start blocked, all open_ops have completed
  -388> 2017-06-12 22:32:43.387071 7fe58f4be700 10 journal commit_start committing 2584, still blocked
  -387> 2017-06-12 22:32:43.387073 7fe58f4be700 10 journal commit_start
  -386> 2017-06-12 22:32:43.387079 7fe58f4be700 15 filestore(/var/lib/ceph/osd/ceph-0) sync_entry committing 2584
  -385> 2017-06-12 22:32:43.387095 7fe58f4be700 10 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-0) create_checkpoint: 'snap_2584'
  -384> 2017-06-12 22:32:43.390597 7fe57c474700  2 rocksdb: [/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-1508-g9960ae3/rpm/el7/BUILD/ceph-12.0.3-1508-g9960ae3/src/rocksdb/db/version_set.cc:2439] MANIFES
T write: IO error: No space left on device/var/lib/ceph/osd/ceph-0/current/omap/MANIFEST-000005: No space left on device

  -383> 2017-06-12 22:32:43.390631 7fe57c474700  2 rocksdb: [/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-1508-g9960ae3/rpm/el7/BUILD/ceph-12.0.3-1508-g9960ae3/src/rocksdb/db/version_set.cc:2513] [defaul
t] Error in committing version edit to MANIFEST: 
VersionEdit {
  LogNumber: 38
  PrevLogNumber: 0
  NextFileNumber: 40
  LastSeq: 7228
  AddFile: 0 39 12601448 '5F484F424A544F5345515F0062656E63686D61726B2575646174612575736D697468693032322575313132343325756F626A656374302E2E2E686561642E352E3534354234353636' seq:7030, type:0 .. '5F555345525F303030303030303030303030303334365F555345525F0062656E63682D6F6D61702D6B65792D30' seq:7176, type:1
  ColumnFamily: 0
}

  -382> 2017-06-12 22:32:43.390673 7fe57c474700  2 rocksdb: [/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-1508-g9960ae3/rpm/el7/BUILD/ceph-12.0.3-1508-g9960ae3/src/rocksdb/db/db_impl_compaction_flush.cc:
1197] Waiting after background flush error: IO error: No space left on device/var/lib/ceph/osd/ceph-0/current/omap/MANIFEST-000005: No space left on deviceAccumulated background error counts: 1
  -381> 2017-06-12 22:32:43.390679 7fe57c474700  4 rocksdb: (Original Log Time 2017/06/12-22:32:43.390514) [/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-1508-g9960ae3/rpm/el7/BUILD/ceph-12.0.3-1508-g9960
ae3/src/rocksdb/db/memtable_list.cc:360] [default] Level-0 commit table #39 started
  -380> 2017-06-12 22:32:43.390682 7fe57c474700  4 rocksdb: (Original Log Time 2017/06/12-22:32:43.390641) [/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-1508-g9960ae3/rpm/el7/BUILD/ceph-12.0.3-1508-g9960
ae3/src/rocksdb/db/memtable_list.cc:394] Level-0 commit table #39: memtable #1 failed
  -379> 2017-06-12 22:32:43.390684 7fe57c474700  4 rocksdb: (Original Log Time 2017/06/12-22:32:43.390656) EVENT_LOG_v1 {"time_micros": 1497306763390648, "job": 17, "event": "flush_finished", "lsm_state": [3, 3, 1, 0, 0, 0, 0], "immutable_memtables": 1}
  -378> 2017-06-12 22:32:43.392797 7fe59c484700  1 -- 172.21.15.22:6805/7301 <== client.4300 172.21.15.22:0/3137743598 16 ==== osd_op(client.4300.0:19 5.7 5.39894217 (undecoded) ondisk+write+known_if_redirected e32) v8 ==== 324+0+8388636 (1126993834 0 2233775840) 0x7fe5ae85b800 con 0x7fe5ad685000
  -377> 2017-06-12 22:32:43.392848 7fe59c484700 15 osd.0 32 enqueue_op 0x7fe5b9387760 prio 63 cost 8388636 latency 0.009557 epoch 32 osd_op(client.4300.0:19 5.7 5.39894217 (undecoded) ondisk+write+known_if_redirected e32) v8
  -376> 2017-06-12 22:32:43.392857 7fe59c484700 20 osd.0 op_wq(2) _enqueue 5.7 PGQueueable(0x7fe5b9387760 prio 63 cost 8388636 e32)
  -375> 2017-06-12 22:32:43.414002 7fe5984d0700 10 osd.0 32 tick
  -374> 2017-06-12 22:32:43.414019 7fe5984d0700 10 osd.0 32 do_waiters -- start
  -373> 2017-06-12 22:32:43.414020 7fe5984d0700 10 osd.0 32 do_waiters -- finish
  -372> 2017-06-12 22:32:43.530112 7fe58fcbf700 20 journal do_write latency 0.148256
  -371> 2017-06-12 22:32:43.530124 7fe58fcbf700 20 journal do_write queueing finishers through seq 2580
  -370> 2017-06-12 22:32:43.530126 7fe58fcbf700 10 journal queue_completions_thru seq 2580 queueing seq 2579 0x7fe5b80097e0 lat 0.208883
  -369> 2017-06-12 22:32:43.530133 7fe58fcbf700 10 journal queue_completions_thru seq 2580 queueing seq 2580 0x7fe5adb93220 lat 0.208746
  -368> 2017-06-12 22:32:43.530137 7fe58fcbf700  5 journal complete_write finished 2 ops and 16781571 bytes
  -367> 2017-06-12 22:32:43.530599 7fe58fcbf700 10 journal room 37679103 max_size 104857600 pos 99733504 header.start 32559104 top 4096
  -366> 2017-06-12 22:32:43.530607 7fe58fcbf700 10 journal check_for_full at 99733504 : 8396800 < 37679103
  -365> 2017-06-12 22:32:43.530608 7fe58fcbf700 15 journal prepare_single_write 1 will write 99733504 : seq 2581 len 8390788 -> 8396800

i think this is rocksdb vs btrfs snaps?

#7 Updated by Kefu Chai over 1 year ago

/a/kchai-2017-06-15_17:39:27-rados-wip-kefu-testing---basic-smithi/1291475 also with rocksdb + btrfs

#8 Updated by Sage Weil over 1 year ago

/a/sage-2017-06-16_00:46:50-rados-wip-sage-testing-distro-basic-smithi/1292433

rados/thrash-erasure-code/{ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-mkfs.yaml fast/fast.yaml leveldb.yaml msgr-failures/fastclose.yaml objectstore/filestore-btrfs.yaml rados.yaml thrashers/mapgap.yaml workloads/ec-radosbench.yaml}

#9 Updated by Greg Farnum over 1 year ago

Do we have any idea why it hasn't popped up in leveldb? Is the multi-threading stuff less conducive to being snapshotted somehow?

#10 Updated by Greg Farnum over 1 year ago

  • Project changed from Ceph to RADOS
  • Component(RADOS) FileStore added

#12 Updated by Sage Weil over 1 year ago

/a/sage-2017-06-20_16:21:45-rados-wip-sage-testing2-distro-basic-smithi/1305525
rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml backoff/peering.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-end.yaml msgr-failures/few.yaml msgr/random.yaml objectstore/filestore-btrfs.yaml rados.yaml rocksdb.yaml thrashers/morepggrow.yaml workloads/radosbench.yaml}

#13 Updated by Sage Weil over 1 year ago

2017-06-20 18:18:22.913720 7fd59ba16700 10 filestore(/var/lib/ceph/osd/ceph-3) error opening file /var/lib/ceph/osd/ceph-3/current/0.5c_head/benchmark\udata\usmithi107\u8090\uobject14919__head_B9E1125C__0 with flags=66: (28) No space left on device

that's this code:
  r = ::open((*path)->path(), flags, 0644);
  if (r < 0) {
    r = -errno;
    dout(10) << "error opening file " << (*path)->path() << " with flags=" 
      << flags << ": " << cpp_strerror(-r) << dendl;
    goto fail;
  }

and about 1 second before this,
2017-06-20 18:18:21.871308 7fd5861c3700 20 osd.3 505 update_osd_stat osd_stat(1322 MB used, 94065 MB avail, 95388 MB total, peers [0,1,2,4,5] op hist [])

the mkfs is
2017-06-20T18:03:07.093 INFO:tasks.ceph:['mkfs.btrfs', '-m', 'single', '-l', '32768', '-n', '32768', '-f'] on /dev/nvme0n1p1 on ubuntu@smithi085.front.sepia.ceph.com
2017-06-20T18:03:07.093 INFO:teuthology.orchestra.run.smithi085:Running: 'yes | sudo mkfs.btrfs -m single -l 32768 -n 32768 -f /dev/nvme0n1p1'
2017-06-20T18:03:07.164 INFO:teuthology.orchestra.run.smithi085.stderr:WARNING: --leafsize is deprecated, use --nodesize
2017-06-20T18:03:07.595 INFO:teuthology.orchestra.run.smithi085.stdout:btrfs-progs v4.4
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:See http://btrfs.wiki.kernel.org for more information.
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:Performing full device TRIM (93.15GiB) ...
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:Label:              (null)
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:UUID:               914843f1-ebf3-426d-948a-d89b720f6a8f
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:Node size:          32768
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:Sector size:        4096
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout:Filesystem size:    93.15GiB
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout:Block group profiles:
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout:  Data:             single            8.00MiB
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout:  Metadata:         single            8.00MiB
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout:  System:           single            4.00MiB
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout:SSD detected:       yes
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout:Incompat features:  extref, skinny-metadata
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout:Number of devices:  1
2017-06-20T18:03:07.598 INFO:teuthology.orchestra.run.smithi085.stdout:Devices:
2017-06-20T18:03:07.598 INFO:teuthology.orchestra.run.smithi085.stdout:   ID        SIZE  PATH
2017-06-20T18:03:07.598 INFO:teuthology.orchestra.run.smithi085.stdout:    1    93.15GiB  /dev/nvme0n1p1

wth.

#14 Updated by Sage Weil over 1 year ago

I've seen xenial and centos failures now, no trusty yet.

#15 Updated by Sage Weil over 1 year ago

  • Status changed from Verified to Need Review

#16 Updated by David Disseldorp over 1 year ago

Has this been reproduced with the following kernel fix applied?

commit 70e7af244f24c94604ef6eca32ad297632018583
Author: Omar Sandoval <>
Date: Fri Jun 2 01:20:01 2017 -0700

Btrfs: fix delalloc accounting leak caused by u32 overflow
btrfs_calc_trans_metadata_size() does an unsigned 32-bit multiplication,
which can overflow if num_items >= 4 GB / (nodesize * BTRFS_MAX_LEVEL * 2).
For a nodesize of 16kB, this overflow happens at 16k items. Usually,
num_items is a small constant passed to btrfs_start_transaction(), but
we also use btrfs_calc_trans_metadata_size() for metadata reservations
for extent items in btrfs_delalloc_{reserve,release}_metadata().
In drop_outstanding_extents(), num_items is calculated as
inode->reserved_extents - inode->outstanding_extents. The difference
between these two counters is usually small, but if many delalloc
extents are reserved and then the outstanding extents are merged in
btrfs_merge_extent_hook(), the difference can become large enough to
overflow in btrfs_calc_trans_metadata_size().
The overflow manifests itself as a leak of a multiple of 4 GB in
delalloc_block_rsv and the metadata bytes_may_use counter. This in turn
can cause early ENOSPC errors. Additionally, these WARN_ONs in
extent-tree.c will be hit when unmounting:
WARN_ON(fs_info->delalloc_block_rsv.size > 0);
WARN_ON(fs_info->delalloc_block_rsv.reserved > 0);
WARN_ON(space_info->bytes_pinned > 0 ||
space_info->bytes_reserved > 0 ||
space_info->bytes_may_use > 0);
Fix it by casting nodesize to a u64 so that
btrfs_calc_trans_metadata_size() does a full 64-bit multiplication.
While we're here, do the same in btrfs_calc_trunc_metadata_size(); this
can't overflow with any existing uses, but it's better to be safe here
than have another hard-to-debug problem later on.
Cc: 
Signed-off-by: Omar Sandoval &lt;&gt;
Reviewed-by: David Sterba &lt;&gt;
Signed-off-by: Chris Mason &lt;&gt;

#17 Updated by Sage Weil over 1 year ago

  • Status changed from Need Review to Verified
  • Priority changed from Urgent to Normal

I didn't do any digging through what patches were in the centos or xenial kernels. Happy if someone wants to chase this down, but I'm not too worried about the btrfs mode since we recommend against it anyway and are moving to bluestore.

For now the teuthology tests restrict btrfs tests to trusty.

#18 Updated by Sage Weil over 1 year ago

  • Duplicated by Bug #20422: Out of space (btrfs journal write) added

Also available in: Atom PDF