Bug #63132


qa: subvolume_snapshot_rm.sh stalls when waiting for OSD_FULL warning

Added by Venky Shankar 7 months ago. Updated 2 months ago.

Status:
Pending Backport
Priority:
Normal
Category:
Testing
Target version:
% Done:

0%

Source:
Tags:
backport_processed
Backport:
reef,quincy,squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2023-10-06_22:26:38-fs-wip-yuri3-testing-2023-10-06-0948-quincy-distro-default-smithi/7415802

Probably not related to cephfs as such, but let's debug and hand it over to the rados folks if nothing stands out.


Related issues 6 (3 open, 3 closed)

Related to CephFS - Bug #62501: pacific(?): qa: mgr-osd-full causes OSD aborts due to ENOSPC (fs/full/subvolume_snapshot_rm.sh) (Duplicate, Dhairya Parmar)

Related to CephFS - Bug #63539: fs/full/subvolume_clone.sh: Health check failed: 1 full osd(s) (OSD_FULL) (Duplicate)

Related to CephFS - Bug #64012: qa: Command failed qa/workunits/fs/full/subvolume_clone.sh (Duplicate, Kotresh Hiremath Ravishankar)

Copied to CephFS - Backport #64581: quincy: qa: subvolume_snapshot_rm.sh stalls when waiting for OSD_FULL warning (In Progress, Kotresh Hiremath Ravishankar)

Copied to CephFS - Backport #64582: reef: qa: subvolume_snapshot_rm.sh stalls when waiting for OSD_FULL warning (In Progress, Kotresh Hiremath Ravishankar)

Copied to CephFS - Backport #64583: squid: qa: subvolume_snapshot_rm.sh stalls when waiting for OSD_FULL warning (In Progress, Kotresh Hiremath Ravishankar)

Actions #1

Updated by Kotresh Hiremath Ravishankar 7 months ago

The test sets up a cluster with a single 1GB OSD. It creates a cephfs subvolume and writes an 800MB file into it. While writing the 800MB file, the OSD crashed.

The df output on the cephfs mount before the write:

2023-10-07T05:10:19.014 INFO:tasks.workunit.client.0.smithi156.stderr:+ echo 'Before write'
2023-10-07T05:10:19.015 INFO:tasks.workunit.client.0.smithi156.stderr:+ df /home/ubuntu/cephtest/mnt.0
2023-10-07T05:10:19.015 INFO:tasks.workunit.client.0.smithi156.stdout:Filesystem     1K-blocks  Used Available Use% Mounted on
2023-10-07T05:10:19.016 INFO:tasks.workunit.client.0.smithi156.stdout:ceph-fuse         675840     0    675840   0% /home/ubuntu/cephtest/mnt.0

The OSD df output before the write:

2023-10-07T05:10:19.416 INFO:tasks.workunit.client.0.smithi156.stdout:ID  CLASS  WEIGHT   REWEIGHT  SIZE   RAW USE  DATA     OMAP  META     AVAIL    %USE   VAR   PGS  STATUS
2023-10-07T05:10:19.417 INFO:tasks.workunit.client.0.smithi156.stdout: 0    ssd  0.00099   1.00000  1 GiB  310 MiB  7.3 MiB   0 B  302 MiB  714 MiB  30.23  1.00   97      up
2023-10-07T05:10:19.417 INFO:tasks.workunit.client.0.smithi156.stdout:                       TOTAL  1 GiB  310 MiB  7.3 MiB   0 B  302 MiB  714 MiB  30.23
2023-10-07T05:10:19.417 INFO:tasks.workunit.client.0.smithi156.stdout:MIN/MAX VAR: 1.00/1.00  STDDEV: 0
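
These numbers already imply the write must exhaust the device: the mount reports roughly 660 MiB free (675840 KiB) and the OSD about 714 MiB available, while the test writes an 800 MiB file. A quick arithmetic sketch using the values from the logs above:

# Sanity check: the 800 MiB write cannot fit in the reported free space.
avail_kib=675840                          # 'Available' from df on the ceph-fuse mount
write_mib=800                             # dd bs=1M count=800
echo "free: $((avail_kib / 1024)) MiB, write: ${write_mib} MiB"
echo "shortfall: $((write_mib - avail_kib / 1024)) MiB"   # ~140 MiB more than fits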

The OSD crashed while writing the 800MB file:


2023-10-07T05:10:19.429 INFO:tasks.workunit.client.0.smithi156.stderr:+ ignore_failure sudo dd if=/dev/urandom of=/home/ubuntu/cephtest/mnt.0/volumes/_nogroup/sub_0/6b940132-4581-44d4-97d9-dfea73d938c2/800MB_file-1 status=progress bs=1M count=800 conv=fdatasync
2023-10-07T05:10:19.429 INFO:tasks.workunit.client.0.smithi156.stderr:+ sudo dd if=/dev/urandom of=/home/ubuntu/cephtest/mnt.0/volumes/_nogroup/sub_0/6b940132-4581-44d4-97d9-dfea73d938c2/800MB_file-1 status=progress bs=1M count=800 conv=fdatasync
2023-10-07T05:10:23.407 INFO:tasks.ceph.osd.0.smithi156.stderr:2023-10-07T05:10:23.406+0000 7f689489d700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _do_alloc_write failed to allocate 0x400000 allocated 0x 251000 min_alloc_size 0x1000 available 0x 0
2023-10-07T05:10:23.407 INFO:tasks.ceph.osd.0.smithi156.stderr:2023-10-07T05:10:23.406+0000 7f689489d700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _do_write _do_alloc_write failed with (28) No space left on device
2023-10-07T05:10:23.407 INFO:tasks.ceph.osd.0.smithi156.stderr:2023-10-07T05:10:23.406+0000 7f689489d700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _txc_add_transaction error (28) No space left on device not handled on operation 10 (op 2, counting from 0)
2023-10-07T05:10:23.407 INFO:tasks.ceph.osd.0.smithi156.stderr:2023-10-07T05:10:23.406+0000 7f689489d700 -1 bluestore(/var/lib/ceph/osd/ceph-0) ENOSPC from bluestore, misconfigured cluster
2023-10-07T05:10:23.411 INFO:tasks.ceph.osd.0.smithi156.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.6-1298-gc656b96b/rpm/el8/BUILD/ceph-17.2.6-1298-gc656b96b/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f689489d700 time 2023-10-07T05:10:23.407401+0000
2023-10-07T05:10:23.412 INFO:tasks.ceph.osd.0.smithi156.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.6-1298-gc656b96b/rpm/el8/BUILD/ceph-17.2.6-1298-gc656b96b/src/os/bluestore/BlueStore.cc: 14614: ceph_abort_msg("unexpected error")
2023-10-07T05:10:23.412 INFO:tasks.ceph.osd.0.smithi156.stderr: ceph version 17.2.6-1298-gc656b96b (c656b96b5bee9f08d23a3059b37bb78b669c2895) quincy (stable)
2023-10-07T05:10:23.412 INFO:tasks.ceph.osd.0.smithi156.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xd7) [0x55eadad815ad]
2023-10-07T05:10:23.412 INFO:tasks.ceph.osd.0.smithi156.stderr: 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x9fb) [0x55eadb480f8b]
2023-10-07T05:10:23.412 INFO:tasks.ceph.osd.0.smithi156.stderr: 3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2df) [0x55eadb49212f]
2023-10-07T05:10:23.412 INFO:tasks.ceph.osd.0.smithi156.stderr: 4: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x53) [0x55eadb04c5a3]
2023-10-07T05:10:23.412 INFO:tasks.ceph.osd.0.smithi156.stderr: 5: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x7c2) [0x55eadb2a3272]
2023-10-07T05:10:23.412 INFO:tasks.ceph.osd.0.smithi156.stderr: 6: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x50d) [0x55eadafc41bd]
2023-10-07T05:10:23.412 INFO:tasks.ceph.osd.0.smithi156.stderr: 7: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xd25) [0x55eadb0213e5]
2023-10-07T05:10:23.412 INFO:tasks.ceph.osd.0.smithi156.stderr: 8: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x288d) [0x55eadb028a4d]
2023-10-07T05:10:23.413 INFO:tasks.ceph.osd.0.smithi156.stderr: 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1c0) [0x55eadae84060]
2023-10-07T05:10:23.413 INFO:tasks.ceph.osd.0.smithi156.stderr: 10: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x6d) [0x55eadb18c9cd]
2023-10-07T05:10:23.413 INFO:tasks.ceph.osd.0.smithi156.stderr: 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x112f) [0x55eadaebed3f]
2023-10-07T05:10:23.413 INFO:tasks.ceph.osd.0.smithi156.stderr: 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x435) [0x55eadb6022f5]
2023-10-07T05:10:23.413 INFO:tasks.ceph.osd.0.smithi156.stderr: 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55eadb604a14]
2023-10-07T05:10:23.413 INFO:tasks.ceph.osd.0.smithi156.stderr: 14: /lib64/libpthread.so.0(+0x81ca) [0x7f68bb0cb1ca]
2023-10-07T05:10:23.413 INFO:tasks.ceph.osd.0.smithi156.stderr: 15: clone()

The OSD is not expected to crash; it is supposed to return ENOSPC. Because it aborts instead, the OSD_FULL health warning that the test waits for is never raised, which is why the workunit stalls.
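
For illustration, the kind of wait loop such a workunit uses might look like the sketch below (hypothetical; the actual subvolume_snapshot_rm.sh logic may differ). When the OSD aborts instead of reporting full, the health check never matches and the loop runs until the teuthology job times out:

#!/usr/bin/env bash
# Hypothetical sketch of waiting for the OSD_FULL health warning.
# If the OSD aborts on ENOSPC instead of going full, this never succeeds.
set -e
timeout=300
elapsed=0
until ceph health detail | grep -q OSD_FULL; do
    sleep 5
    elapsed=$((elapsed + 5))
    if [ "$elapsed" -ge "$timeout" ]; then
        echo "timed out waiting for OSD_FULL" >&2
        exit 1
    fi
done
echo "OSD_FULL reported, proceeding"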

Actions #2

Updated by Venky Shankar 7 months ago

Ah, so it's the way the test config is set up. We should fix that in the fs qa suite.
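
As a side note, if the suite sizes its tiny OSD by capping the BlueStore block device (an assumption about the qa setup, not verified here), the effective sizing and usage can be read back from the running OSD, for example:

# Inspect the BlueStore/BlueFS sizing on osd.0.
ceph config show osd.0 bluestore_block_size   # size used when BlueStore creates a file-backed block device
ceph daemon osd.0 bluefs stats                # per-device BlueFS usage for that OSD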

Actions #3

Updated by Venky Shankar 7 months ago

  • Related to Bug #62501: pacific(?): qa: mgr-osd-full causes OSD aborts due to ENOSPC (fs/full/subvolume_snapshot_rm.sh) added
Actions #4

Updated by Venky Shankar 7 months ago

  • Status changed from New to Triaged
  • Assignee set to Kotresh Hiremath Ravishankar
Actions #5

Updated by Kotresh Hiremath Ravishankar 6 months ago

It is strange that the tests are passing on the main branch.

On the main branch, Available is 997 MiB

2023-11-04T07:49:50.239 INFO:tasks.workunit.client.0.smithi062.stderr:+ ceph osd df
2023-11-04T07:49:50.621 INFO:tasks.workunit.client.0.smithi062.stdout:ID  CLASS  WEIGHT   REWEIGHT  SIZE   RAW USE  DATA     OMAP     META    AVAIL    %USE  VAR   PGS STATUS
2023-11-04T07:49:50.622 INFO:tasks.workunit.client.0.smithi062.stdout: 0    ssd  0.00099   1.00000  1 GiB   27 MiB  736 KiB    1 KiB  26 MiB  997 MiB  2.65  1.00   97     up
2023-11-04T07:49:50.622 INFO:tasks.workunit.client.0.smithi062.stdout:                       TOTAL  1 GiB   27 MiB  736 KiB  1.6 KiB  26 MiB  997 MiB  2.65

On quincy, Available is 729 MiB

2023-10-07T05:07:29.602 INFO:tasks.workunit.client.0.smithi156.stderr:+ ceph osd df
2023-10-07T05:07:29.998 INFO:tasks.workunit.client.0.smithi156.stdout:ID  CLASS  WEIGHT   REWEIGHT  SIZE   RAW USE  DATA     OMAP  META     AVAIL    %USE   VAR   PGS  STATUS
2023-10-07T05:07:29.998 INFO:tasks.workunit.client.0.smithi156.stdout: 0    ssd  0.00099   1.00000  1 GiB  295 MiB  720 KiB   0 B  294 MiB  729 MiB  28.81  1.00   97      up
2023-10-07T05:07:29.999 INFO:tasks.workunit.client.0.smithi156.stdout:                       TOTAL  1 GiB  295 MiB  720 KiB   0 B  294 MiB  729 MiB  28.81

For the same test suite configuration of a 1G OSD, how can this differ between main and the release branches?

Because of this difference, the test fails on the release branches.

Venky/Patrick,

Any idea?

Actions #6

Updated by Venky Shankar 6 months ago

Kotresh Hiremath Ravishankar wrote:

It is strange that the tests are passing on the main branch.

On the main branch, Available is 997 MiB

[...]

On quincy, Available is 729 MiB
[...]

Do you see any config difference? The main branch could be using a different allocator. I would start from that...
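
For reference, a few commands that can surface such differences on a running cluster (a sketch; availability of individual commands can vary slightly between releases):

# Compare effective OSD settings between the main and quincy runs.
ceph config show osd.0 bluestore_allocator          # allocator in use
ceph config show osd.0 bluestore_rocksdb_options    # RocksDB tuning string
ceph daemon osd.0 config diff                       # everything that differs from defaults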

Actions #7

Updated by Rishabh Dave 6 months ago

  • Related to Bug #63539: fs/full/subvolume_clone.sh: Health check failed: 1 full osd(s) (OSD_FULL) added
Actions #8

Updated by Kotresh Hiremath Ravishankar 6 months ago

Venky Shankar wrote:

Kotresh Hiremath Ravishankar wrote:

It is strange that the tests are passing on the main branch.

On the main branch, Available is 997 MiB

[...]

On quincy, Available is 729 MiB
[...]

Do you see any config difference? The main branch could be using a different allocator. I would start from that...

OK, I didn't look into the OSD configuration. I will check that out.

Actions #9

Updated by Radoslaw Zarzynski 6 months ago

A speculation: on main we compress RocksDB (bluestore_rocksdb_options; see https://github.com/ceph/ceph/pull/53343).
The patch isn't in reef yet.

Ruling it out as the cause would make further thinking easier.
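
One quick way to confirm or exclude that (a sketch; whether a compression token appears in the options string on main is an assumption based on the PR above):

# Check whether the OSD's RocksDB options enable compression.
# A compression=... token is expected on main (PR 53343) and absent on quincy/reef.
ceph config show osd.0 bluestore_rocksdb_options | tr ',' '\n' | grep -i compression \
    || echo "no compression option set"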

Actions #10

Updated by Kotresh Hiremath Ravishankar 6 months ago

Copying the slack communication here
-------------------------------------

Kotresh H R
@Radoslaw Zarzynski
Thanks for the input. It could be the reason. Is the PR https://github.com/ceph/ceph/pull/53343 planned to be backported to the release branches? If not, I might look at modifying the test cases instead. (edited)
5:39
I also need to validate that it is the reason. I will do that.
5:41
Well, I see in the commit message that it's said to become the default from Squid, so no backport to reef/quincy is expected?

rzarzynski
5:43 PM
it's going to be in squid if it passes extra testing for regressions

Actions #11

Updated by Venky Shankar 5 months ago

I suggested running a test with RocksDB compression disabled to check whether the space utilization shoots up and trips the test.
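
One way to run such a test without reverting the PR would be to override the RocksDB options for the run (a sketch; the annex option and the kNoCompression token are assumptions to be verified, and the OSDs need a restart for RocksDB options to take effect):

# Hypothetical: disable RocksDB compression for OSDs via the options annex,
# then restart the OSDs so the new RocksDB options are applied.
ceph config set osd bluestore_rocksdb_options_annex "compression=kNoCompression"
# restart the OSD daemons afterwards (method depends on the deployment, e.g. systemctl or ceph orch)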

Actions #12

Updated by Kotresh Hiremath Ravishankar 5 months ago

I reverted the rocksdb compression PR and ran the tests.

http://qa-proxy.ceph.com/teuthology/khiremat-2023-12-01_07:20:13-fs:full-wip-khiremat-test-fs-full-suite-no-rocksdb-compr-distro-default-smithi/7473631/teuthology.log

I observe that, before writing any data, the space utilization on main is unchanged with and without compression.

2023-12-01T07:42:49.720 INFO:tasks.workunit.client.0.smithi069.stdout:Before write
2023-12-01T07:42:49.720 INFO:tasks.workunit.client.0.smithi069.stderr:+ df -h
2023-12-01T07:42:49.740 INFO:tasks.workunit.client.0.smithi069.stdout:Filesystem                Size  Used Avail Use% Mounted on
2023-12-01T07:42:49.740 INFO:tasks.workunit.client.0.smithi069.stdout:tmpfs                     3.2G  1.4M  3.2G   1% /run
2023-12-01T07:42:49.740 INFO:tasks.workunit.client.0.smithi069.stdout:/dev/sda1                 916G   11G  860G   2% /
2023-12-01T07:42:49.740 INFO:tasks.workunit.client.0.smithi069.stdout:tmpfs                      16G   32K   16G   1% /dev/shm
2023-12-01T07:42:49.741 INFO:tasks.workunit.client.0.smithi069.stdout:tmpfs                     5.0M     0  5.0M   0% /run/lock
2023-12-01T07:42:49.741 INFO:tasks.workunit.client.0.smithi069.stdout:tmpfs                     3.2G     0  3.2G   0% /run/user/1000
2023-12-01T07:42:49.741 INFO:tasks.workunit.client.0.smithi069.stdout:/dev/mapper/vg_nvme-lv_5   15G   72M   15G   1% /var/lib/ceph
2023-12-01T07:42:49.741 INFO:tasks.workunit.client.0.smithi069.stdout:ceph-fuse                 944M     0  944M   0% /home/ubuntu/cephtest/mnt.0
2023-12-01T07:42:49.741 INFO:tasks.workunit.client.0.smithi069.stderr:+ ceph osd df
...

2023-12-01T07:42:50.095 INFO:tasks.workunit.client.0.smithi069.stdout:ID  CLASS  WEIGHT   REWEIGHT  SIZE   RAW USE  DATA     OMAP     META    AVAIL    %USE  VAR   PGS  STATUS
2023-12-01T07:42:50.095 INFO:tasks.workunit.client.0.smithi069.stdout: 0    ssd  0.00099   1.00000  1 GiB   27 MiB  736 KiB    1 KiB  26 MiB  997 MiB  2.65  1.00   97      up
2023-12-01T07:42:50.095 INFO:tasks.workunit.client.0.smithi069.stdout:                       TOTAL  1 GiB   27 MiB  736 KiB  1.7 KiB  26 MiB  997 MiB  2.65
Actions #18

Updated by Kotresh Hiremath Ravishankar 3 months ago

Copying Adam's reply into this tracker: (https://tracker.ceph.com/issues/62501#note-10)

----------------
The bug is caused by a difference in the behaviour of BlueFS.
This PR is the final trigger: https://github.com/ceph/ceph/pull/51774 (quincy variant)

ORIGINAL:
Before that, BlueFS DID NOT preallocate space for the RocksDB Write-Ahead Log (WAL).
Settings that we had in configuration:
bluestore_rocksdb_options="max_write_buffer_number=4...write_buffer_size=268435456..."
did nothing to preallocate (ceph-bluestore-tool dump-bluefs-log):
0x1000d7: op_file_update file(ino 36 size 0x4b38e1 ... allocated 500000 ... (note size almost as big as allocated)
0x100311: op_dir_link db.wal/000038.log to 36

WITH PR 51774:
Now BlueFS preallocates log as per RocksDB request:
0x1000d7: op_file_update file(ino 36 size 0x325bd9 ... allocated 11a00000 ... (allocated is ~290MB)
0x100312: op_dir_link db.wal/000038.log to 36

REFERENCE IN ORIGINAL/MAIN
We have different settings that we created in process of optimizing RocksDB performance, starting with PR #47221.
bluestore_rocksdb_options="max_write_buffer_number=64...write_buffer_size=16777216..."
With that, we preallocate just a little:
0x10017d: op_file_update file(ino 31 size 0x31ae8c ... allocated 1200000 ... (allocated is ~18MB)
0x100354: op_dir_link db.wal/000038.log to 31

So the change is between allocating max 4x 256MB WALs
vs allocating max 64x 16MB WALs

==================================================================================
Taking it all into account, I recommend just expanding the device size from 1GB to 2GB.
-----------------
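
For reference, the hex 'allocated' values quoted above work out as follows, and the ~282 MiB WAL preallocation lines up with the ~295 MiB RAW USE seen on quincy before any data was written (a simple arithmetic sketch):

# Convert the BlueFS 'allocated' values from the excerpts above to MiB.
printf 'original (no preallocation): %d MiB\n' $(( 0x500000   / 1048576 ))   # ~5 MiB
printf 'quincy with PR 51774:        %d MiB\n' $(( 0x11a00000 / 1048576 ))   # ~282 MiB
printf 'main (PR 47221 tuning):      %d MiB\n' $(( 0x1200000  / 1048576 ))   # ~18 MiB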

Actions #19

Updated by Kotresh Hiremath Ravishankar 3 months ago

  • Status changed from Triaged to In Progress
  • Pull request ID set to 55435
Actions #20

Updated by Kotresh Hiremath Ravishankar 3 months ago

  • Related to Bug #64012: qa: Command failed qa/workunits/fs/full/subvolume_clone.sh added
Actions #21

Updated by Venky Shankar 2 months ago

  • Status changed from In Progress to Pending Backport
  • Backport changed from reef,quincy to reef,quincy,squid
Actions #22

Updated by Backport Bot 2 months ago

  • Copied to Backport #64581: quincy: qa: subvolume_snapshot_rm.sh stalls when waiting for OSD_FULL warning added
Actions #23

Updated by Backport Bot 2 months ago

  • Copied to Backport #64582: reef: qa: subvolume_snapshot_rm.sh stalls when waiting for OSD_FULL warning added
Actions #24

Updated by Backport Bot 2 months ago

  • Copied to Backport #64583: squid: qa: subvolume_snapshot_rm.sh stalls when waiting for OSD_FULL warning added
Actions #25

Updated by Backport Bot 2 months ago

  • Tags set to backport_processed