Bug #20169
filestore+btrfs occasionally returns ENOSPC
Description
2017-06-02T12:14:43.073 INFO:teuthology.orchestra.run.smithi130.stderr:2017-06-02 12:14:43.072152 7fa83089f700 0 -- 172.21.15.130:0/4243660710 >> 172.21.15.130:6802/4426 conn(0x7fa814020110 :-1 s=STATE_OPEN pgs=76 cs=1 l=1)._try_send injecting socket failure
2017-06-02T12:14:43.418 INFO:tasks.ceph.osd.0.smithi130.stderr:2017-06-02 12:14:43.417084 7fdd782ef700 -1 filestore(/var/lib/ceph/osd/ceph-0) error creating #0:57e7a330:::benchmark_data_smithi130_24108_object14400:head# (/var/lib/ceph/osd/ceph-0/current/0.2a_head/benchmark\udata\usmithi130\u24108\uobject14400__head_0CC5E7EA__0) in index: (28) No space left on device
2017-06-02T12:14:43.418 INFO:tasks.ceph.osd.0.smithi130.stderr:2017-06-02 12:14:43.417099 7fdd782ef700 -1 filestore(/var/lib/ceph/osd/ceph-0) error (28) No space left on device not handled on operation 0x7fdd93808d84 (60051.1.0, or op 0, counting from 0)
2017-06-02T12:14:43.440 INFO:teuthology.orchestra.run.smithi130.stdout: 5 16 14489 14473 11.3059 10.8945 0.0051336 0.00552149
2017-06-02T12:14:43.596 INFO:tasks.ceph.osd.0.smithi130.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.2-1995-g6bf7363/rpm/el7/BUILD/ceph-12.0.2-1995-g6bf7363/src/os/filestore/FileStore.cc: In function 'void FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7fdd782ef700 time 2017-06-02 12:14:43.594878
2017-06-02T12:14:43.597 INFO:tasks.ceph.osd.0.smithi130.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.2-1995-g6bf7363/rpm/el7/BUILD/ceph-12.0.2-1995-g6bf7363/src/os/filestore/FileStore.cc: 3097: FAILED assert(0 == "unexpected error")
2017-06-02T12:14:43.618 INFO:tasks.ceph.osd.0.smithi130.stderr: ceph version 12.0.2-1995-g6bf7363 (6bf7363b3bddac5202fd115ab323602a20c32ff7) luminous (dev)
2017-06-02T12:14:43.618 INFO:tasks.ceph.osd.0.smithi130.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fdd857421d0]
2017-06-02T12:14:43.618 INFO:tasks.ceph.osd.0.smithi130.stderr: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x801) [0x7fdd85533e41]
2017-06-02T12:14:43.618 INFO:tasks.ceph.osd.0.smithi130.stderr: 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, unsigned long, ThreadPool::TPHandle*)+0x3b) [0x7fdd8553924b]
this is centos.
probably wontfix, but opening the ticket to document that this is a known failure that happens on smithi.
Updated by Greg Farnum almost 7 years ago
- Description updated (diff)
- Priority changed from High to Normal
It's less likely on Centos, but I think we've seen this before and it's usually been a btrfs kernel bug that got resolved prior to or in the next release.
Updated by Sage Weil almost 7 years ago
/a/sage-2017-06-06_21:54:14-rados-wip-sage-testing-distro-basic-smithi/1265435
rados/thrash/{0-size-min-size-overrides/2-size-1-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml backoff/peering_and_degraded.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-end.yaml msgr-failures/fastclose.yaml msgr/random.yaml objectstore/filestore-btrfs.yaml rados.yaml rocksdb.yaml thrashers/morepggrow.yaml workloads/radosbench.yaml}
Updated by Sage Weil almost 7 years ago
/a/sage-2017-06-06_21:54:14-rados-wip-sage-testing-distro-basic-smithi/1265467
rados/thrash/{0-size-min-size-overrides/3-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml backoff/peering.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-end.yaml msgr-failures/osd-delay.yaml msgr/async.yaml objectstore/filestore-btrfs.yaml rados.yaml rocksdb.yaml thrashers/morepggrow.yaml workloads/cache-pool-snaps-readproxy.yaml}
Updated by Sage Weil almost 7 years ago
- Subject changed from filestore+btrfs occasionally returns ENOSPC to filestore+btrfs occasionally returns ENOSPC on centos
Updated by Sage Weil almost 7 years ago
- Subject changed from filestore+btrfs occasionally returns ENOSPC on centos to filestore+btrfs occasionally returns ENOSPC
ugh just saw this on xenial too. hrm.
/a/sage-2017-06-08_20:27:41-rados-wip-sage-testing2-distro-basic-smithi/1271606
Updated by Sage Weil almost 7 years ago
- Status changed from New to 12
- Priority changed from Normal to Urgent
-394> 2017-06-12 22:32:43.387004 7fe595ccb700 20 filestore(/var/lib/ceph/osd/ceph-0) _omap_setkeys 5.6_head/#5:60000000::::head# = 0
-393> 2017-06-12 22:32:43.387012 7fe595ccb700 10 journal op_apply_finish 2584 open_ops 1 -> 0, max_applied_seq 2583 -> 2584
-392> 2017-06-12 22:32:43.387014 7fe595ccb700 10 filestore(/var/lib/ceph/osd/ceph-0) _do_op 0x7fe5b7fc1960 seq 2584 r = 0, finisher 0x7fe5b3e556e0 0x7fe5b8018300
-391> 2017-06-12 22:32:43.387026 7fe595ccb700 10 filestore(/var/lib/ceph/osd/ceph-0) _finish_op 0x7fe5b7fc1960 seq 2584 osr(5.6 0x7fe5acfce2e0)/0x7fe5acfce2e0 lat 0.063476
-390> 2017-06-12 22:32:43.387061 7fe58f4be700 10 journal commit_start max_applied_seq 2584, open_ops 0
-389> 2017-06-12 22:32:43.387070 7fe58f4be700 10 journal commit_start blocked, all open_ops have completed
-388> 2017-06-12 22:32:43.387071 7fe58f4be700 10 journal commit_start committing 2584, still blocked
-387> 2017-06-12 22:32:43.387073 7fe58f4be700 10 journal commit_start
-386> 2017-06-12 22:32:43.387079 7fe58f4be700 15 filestore(/var/lib/ceph/osd/ceph-0) sync_entry committing 2584
-385> 2017-06-12 22:32:43.387095 7fe58f4be700 10 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-0) create_checkpoint: 'snap_2584'
-384> 2017-06-12 22:32:43.390597 7fe57c474700 2 rocksdb: [/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-1508-g9960ae3/rpm/el7/BUILD/ceph-12.0.3-1508-g9960ae3/src/rocksdb/db/version_set.cc:2439] MANIFEST write: IO error: No space left on device/var/lib/ceph/osd/ceph-0/current/omap/MANIFEST-000005: No space left on device
-383> 2017-06-12 22:32:43.390631 7fe57c474700 2 rocksdb: [/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-1508-g9960ae3/rpm/el7/BUILD/ceph-12.0.3-1508-g9960ae3/src/rocksdb/db/version_set.cc:2513] [default] Error in committing version edit to MANIFEST: VersionEdit { LogNumber: 38 PrevLogNumber: 0 NextFileNumber: 40 LastSeq: 7228 AddFile: 0 39 12601448 '5F484F424A544F5345515F0062656E63686D61726B2575646174612575736D697468693032322575313132343325756F626A656374302E2E2E686561642E352E3534354234353636' seq:7030, type:0 .. '5F555345525F303030303030303030303030303334365F555345525F0062656E63682D6F6D61702D6B65792D30' seq:7176, type:1 ColumnFamily: 0 }
-382> 2017-06-12 22:32:43.390673 7fe57c474700 2 rocksdb: [/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-1508-g9960ae3/rpm/el7/BUILD/ceph-12.0.3-1508-g9960ae3/src/rocksdb/db/db_impl_compaction_flush.cc:1197] Waiting after background flush error: IO error: No space left on device/var/lib/ceph/osd/ceph-0/current/omap/MANIFEST-000005: No space left on deviceAccumulated background error counts: 1
-381> 2017-06-12 22:32:43.390679 7fe57c474700 4 rocksdb: (Original Log Time 2017/06/12-22:32:43.390514) [/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-1508-g9960ae3/rpm/el7/BUILD/ceph-12.0.3-1508-g9960ae3/src/rocksdb/db/memtable_list.cc:360] [default] Level-0 commit table #39 started
-380> 2017-06-12 22:32:43.390682 7fe57c474700 4 rocksdb: (Original Log Time 2017/06/12-22:32:43.390641) [/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-1508-g9960ae3/rpm/el7/BUILD/ceph-12.0.3-1508-g9960ae3/src/rocksdb/db/memtable_list.cc:394] Level-0 commit table #39: memtable #1 failed
-379> 2017-06-12 22:32:43.390684 7fe57c474700 4 rocksdb: (Original Log Time 2017/06/12-22:32:43.390656) EVENT_LOG_v1 {"time_micros": 1497306763390648, "job": 17, "event": "flush_finished", "lsm_state": [3, 3, 1, 0, 0, 0, 0], "immutable_memtables": 1}
-378> 2017-06-12 22:32:43.392797 7fe59c484700 1 -- 172.21.15.22:6805/7301 <== client.4300 172.21.15.22:0/3137743598 16 ==== osd_op(client.4300.0:19 5.7 5.39894217 (undecoded) ondisk+write+known_if_redirected e32) v8 ==== 324+0+8388636 (1126993834 0 2233775840) 0x7fe5ae85b800 con 0x7fe5ad685000
-377> 2017-06-12 22:32:43.392848 7fe59c484700 15 osd.0 32 enqueue_op 0x7fe5b9387760 prio 63 cost 8388636 latency 0.009557 epoch 32 osd_op(client.4300.0:19 5.7 5.39894217 (undecoded) ondisk+write+known_if_redirected e32) v8
-376> 2017-06-12 22:32:43.392857 7fe59c484700 20 osd.0 op_wq(2) _enqueue 5.7 PGQueueable(0x7fe5b9387760 prio 63 cost 8388636 e32)
-375> 2017-06-12 22:32:43.414002 7fe5984d0700 10 osd.0 32 tick
-374> 2017-06-12 22:32:43.414019 7fe5984d0700 10 osd.0 32 do_waiters -- start
-373> 2017-06-12 22:32:43.414020 7fe5984d0700 10 osd.0 32 do_waiters -- finish
-372> 2017-06-12 22:32:43.530112 7fe58fcbf700 20 journal do_write latency 0.148256
-371> 2017-06-12 22:32:43.530124 7fe58fcbf700 20 journal do_write queueing finishers through seq 2580
-370> 2017-06-12 22:32:43.530126 7fe58fcbf700 10 journal queue_completions_thru seq 2580 queueing seq 2579 0x7fe5b80097e0 lat 0.208883
-369> 2017-06-12 22:32:43.530133 7fe58fcbf700 10 journal queue_completions_thru seq 2580 queueing seq 2580 0x7fe5adb93220 lat 0.208746
-368> 2017-06-12 22:32:43.530137 7fe58fcbf700 5 journal complete_write finished 2 ops and 16781571 bytes
-367> 2017-06-12 22:32:43.530599 7fe58fcbf700 10 journal room 37679103 max_size 104857600 pos 99733504 header.start 32559104 top 4096
-366> 2017-06-12 22:32:43.530607 7fe58fcbf700 10 journal check_for_full at 99733504 : 8396800 < 37679103
-365> 2017-06-12 22:32:43.530608 7fe58fcbf700 15 journal prepare_single_write 1 will write 99733504 : seq 2581 len 8390788 -> 8396800
i think this is rocksdb vs btrfs snaps?
Updated by Kefu Chai almost 7 years ago
/a/kchai-2017-06-15_17:39:27-rados-wip-kefu-testing---basic-smithi/1291475 also with rocksdb + btrfs
Updated by Sage Weil almost 7 years ago
/a/sage-2017-06-16_00:46:50-rados-wip-sage-testing-distro-basic-smithi/1292433
rados/thrash-erasure-code/{ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-mkfs.yaml fast/fast.yaml leveldb.yaml msgr-failures/fastclose.yaml objectstore/filestore-btrfs.yaml rados.yaml thrashers/mapgap.yaml workloads/ec-radosbench.yaml}
Updated by Greg Farnum almost 7 years ago
Do we have any idea why it hasn't popped up in leveldb? Is the multi-threading stuff less conducive to being snapshotted somehow?
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to RADOS
- Component(RADOS) FileStore added
Updated by Sage Weil almost 7 years ago
/a/sage-2017-06-20_16:21:45-rados-wip-sage-testing2-distro-basic-smithi/1305525
rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml backoff/peering.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-end.yaml msgr-failures/few.yaml msgr/random.yaml objectstore/filestore-btrfs.yaml rados.yaml rocksdb.yaml thrashers/morepggrow.yaml workloads/radosbench.yaml}
Updated by Sage Weil almost 7 years ago
2017-06-20 18:18:22.913720 7fd59ba16700 10 filestore(/var/lib/ceph/osd/ceph-3) error opening file /var/lib/ceph/osd/ceph-3/current/0.5c_head/benchmark\udata\usmithi107\u8090\uobject14919__head_B9E1125C__0 with flags=66: (28) No space left on device
that's this code:
r = ::open((*path)->path(), flags, 0644);
if (r < 0) {
  r = -errno;
  dout(10) << "error opening file " << (*path)->path() << " with flags="
           << flags << ": " << cpp_strerror(-r) << dendl;
  goto fail;
}
and about 1 second before this,
2017-06-20 18:18:21.871308 7fd5861c3700 20 osd.3 505 update_osd_stat osd_stat(1322 MB used, 94065 MB avail, 95388 MB total, peers [0,1,2,4,5] op hist [])
the mkfs is
2017-06-20T18:03:07.093 INFO:tasks.ceph:['mkfs.btrfs', '-m', 'single', '-l', '32768', '-n', '32768', '-f'] on /dev/nvme0n1p1 on ubuntu@smithi085.front.sepia.ceph.com
2017-06-20T18:03:07.093 INFO:teuthology.orchestra.run.smithi085:Running: 'yes | sudo mkfs.btrfs -m single -l 32768 -n 32768 -f /dev/nvme0n1p1'
2017-06-20T18:03:07.164 INFO:teuthology.orchestra.run.smithi085.stderr:WARNING: --leafsize is deprecated, use --nodesize
2017-06-20T18:03:07.595 INFO:teuthology.orchestra.run.smithi085.stdout:btrfs-progs v4.4
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:See http://btrfs.wiki.kernel.org for more information.
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:Performing full device TRIM (93.15GiB) ...
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:Label: (null)
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:UUID: 914843f1-ebf3-426d-948a-d89b720f6a8f
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:Node size: 32768
2017-06-20T18:03:07.596 INFO:teuthology.orchestra.run.smithi085.stdout:Sector size: 4096
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout:Filesystem size: 93.15GiB
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout:Block group profiles:
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout: Data: single 8.00MiB
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout: Metadata: single 8.00MiB
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout: System: single 4.00MiB
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout:SSD detected: yes
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout:Incompat features: extref, skinny-metadata
2017-06-20T18:03:07.597 INFO:teuthology.orchestra.run.smithi085.stdout:Number of devices: 1
2017-06-20T18:03:07.598 INFO:teuthology.orchestra.run.smithi085.stdout:Devices:
2017-06-20T18:03:07.598 INFO:teuthology.orchestra.run.smithi085.stdout: ID SIZE PATH
2017-06-20T18:03:07.598 INFO:teuthology.orchestra.run.smithi085.stdout: 1 93.15GiB /dev/nvme0n1p1
wth.
Updated by Sage Weil almost 7 years ago
I've seen xenial and centos failures now, no trusty yet.
Updated by Sage Weil almost 7 years ago
- Status changed from 12 to Fix Under Review
Updated by David Disseldorp almost 7 years ago
Has this been reproduced with the following kernel fix applied?
commit 70e7af244f24c94604ef6eca32ad297632018583
Author: Omar Sandoval <osandov@fb.com>
Date: Fri Jun 2 01:20:01 2017 -0700
Btrfs: fix delalloc accounting leak caused by u32 overflow
btrfs_calc_trans_metadata_size() does an unsigned 32-bit multiplication,
which can overflow if num_items >= 4 GB / (nodesize * BTRFS_MAX_LEVEL * 2).
For a nodesize of 16kB, this overflow happens at 16k items. Usually,
num_items is a small constant passed to btrfs_start_transaction(), but
we also use btrfs_calc_trans_metadata_size() for metadata reservations
for extent items in btrfs_delalloc_{reserve,release}_metadata().
In drop_outstanding_extents(), num_items is calculated as
inode->reserved_extents - inode->outstanding_extents. The difference
between these two counters is usually small, but if many delalloc
extents are reserved and then the outstanding extents are merged in
btrfs_merge_extent_hook(), the difference can become large enough to
overflow in btrfs_calc_trans_metadata_size().
The overflow manifests itself as a leak of a multiple of 4 GB in
delalloc_block_rsv and the metadata bytes_may_use counter. This in turn
can cause early ENOSPC errors. Additionally, these WARN_ONs in
extent-tree.c will be hit when unmounting:
WARN_ON(fs_info->delalloc_block_rsv.size > 0);
WARN_ON(fs_info->delalloc_block_rsv.reserved > 0);
WARN_ON(space_info->bytes_pinned > 0 ||
space_info->bytes_reserved > 0 ||
space_info->bytes_may_use > 0);
Fix it by casting nodesize to a u64 so that
btrfs_calc_trans_metadata_size() does a full 64-bit multiplication.
While we're here, do the same in btrfs_calc_trunc_metadata_size(); this
can't overflow with any existing uses, but it's better to be safe here
than have another hard-to-debug problem later on.
Cc: stable@vger.kernel.org
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Updated by Sage Weil almost 7 years ago
- Status changed from Fix Under Review to 12
- Priority changed from Urgent to Normal
I didn't do any digging through what patches were in the centos or xenial kernels. Happy if someone wants to chase this down, but I'm not too worried about the btrfs mode since we recommend against it anyway and are moving to bluestore.
For now the teuthology tests restrict btrfs tests to trusty.
Updated by Sage Weil almost 7 years ago
- Has duplicate Bug #20422: Out of space (btrfs journal write) added