Bug #55636

octopus: osd-bluefs-volume-ops.sh: TEST_bluestore2 fails with "FAILED ceph_assert(r == 0)"

Added by Laura Flores 3 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2022-05-09_21:49:19-rados-wip-yuri6-testing-2022-05-09-0734-octopus-distro-default-smithi/6829109

2022-05-10T00:31:01.602 INFO:tasks.workunit.client.0.smithi032.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:382: TEST_bluestore2:  ceph osd down 0
2022-05-10T00:31:02.106 INFO:tasks.workunit.client.0.smithi032.stderr:osd.0 is already down.
2022-05-10T00:31:02.113 INFO:tasks.workunit.client.0.smithi032.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:384: TEST_bluestore2:  ceph-bluestore-tool --path td/osd-bluefs-volume-ops/0 --devs-source td/osd-bluefs-volume-ops/0/block.db --dev-target td/osd-bluefs-volume-ops/0/block --command bluefs-bdev-migrate
2022-05-10T00:31:02.123 INFO:tasks.workunit.client.0.smithi032.stdout:inferring bluefs devices from bluestore path
2022-05-10T00:31:02.184 INFO:tasks.workunit.client.0.smithi032.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.16-101-g1fb36ae9/rpm/el8/BUILD/ceph-15.2.16-101-g1fb36ae9/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_mount_for_bluefs()' thread 7f4fab54c240 time 2022-05-10T00:31:02.184091+0000
2022-05-10T00:31:02.184 INFO:tasks.workunit.client.0.smithi032.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.16-101-g1fb36ae9/rpm/el8/BUILD/ceph-15.2.16-101-g1fb36ae9/src/os/bluestore/BlueStore.cc: 6894: FAILED ceph_assert(r == 0)
2022-05-10T00:31:02.185 INFO:tasks.workunit.client.0.smithi032.stderr:2022-05-10T00:31:02.183+0000 7f4fab54c240 -1 bluefs _check_new_allocations invalid extent 1: 0xe00000~400000: duplicate reference, ino 22
2022-05-10T00:31:02.185 INFO:tasks.workunit.client.0.smithi032.stderr:2022-05-10T00:31:02.183+0000 7f4fab54c240 -1 bluefs mount failed to replay log: (14) Bad address
2022-05-10T00:31:02.185 INFO:tasks.workunit.client.0.smithi032.stderr:2022-05-10T00:31:02.183+0000 7f4fab54c240 -1 bluestore(td/osd-bluefs-volume-ops/0) _open_bluefs failed bluefs mount: (14) Bad address
2022-05-10T00:31:02.188 INFO:tasks.workunit.client.0.smithi032.stderr: ceph version 15.2.16-101-g1fb36ae9 (1fb36ae9234fd4c69ca633c9660cdd632313b2a8) octopus (stable)
2022-05-10T00:31:02.188 INFO:tasks.workunit.client.0.smithi032.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f4fa16565ea]
2022-05-10T00:31:02.189 INFO:tasks.workunit.client.0.smithi032.stderr: 2: (()+0x279804) [0x7f4fa1656804]
2022-05-10T00:31:02.190 INFO:tasks.workunit.client.0.smithi032.stderr: 3: (BlueStore::_mount_for_bluefs()+0x68) [0x5555c7af1ec8]
2022-05-10T00:31:02.190 INFO:tasks.workunit.client.0.smithi032.stderr: 4: (BlueStore::migrate_to_existing_bluefs_device(std::set<int, std::less<int>, std::allocator<int> > const&, int)+0x1a0) [0x5555c7af57d0]
2022-05-10T00:31:02.191 INFO:tasks.workunit.client.0.smithi032.stderr: 5: (main()+0x4144) [0x5555c79e8204]
2022-05-10T00:31:02.191 INFO:tasks.workunit.client.0.smithi032.stderr: 6: (__libc_start_main()+0xf3) [0x7f4f9ee827b3]
2022-05-10T00:31:02.191 INFO:tasks.workunit.client.0.smithi032.stderr: 7: (_start()+0x2e) [0x5555c7a083ae]


Related issues

Related to bluestore - Bug #56533: Bluefs might put an orphan op_update record in the log Pending Backport

History

#1 Updated by Laura Flores about 2 months ago

  • Priority changed from Normal to High

#2 Updated by Laura Flores about 2 months ago

Some recent observations:

On https://trello.com/c/w6qCkODQ/1567-wip-yuri-testing-2022-06-24-0817-octopus, I noticed several repeated occurrences of this Tracker. The first time this bug was observed was during testing for https://github.com/ceph/ceph/pull/45040. We merged that PR anyway, since the test passed on re-run and the failure did not look obviously related.

However, I now think the bug actually is related to https://github.com/ceph/ceph/pull/45040. On the recent test branch from https://trello.com/c/w6qCkODQ/1567-wip-yuri-testing-2022-06-24-0817-octopus, I reverted the commit from https://github.com/ceph/ceph/pull/45040 and reran the exact same test that had previously failed. This raises the question: why did the test pass when https://github.com/ceph/ceph/pull/45040 was originally tested?

Notice that the only difference between the passed and failed tests is the OS (centos_8 vs. ubuntu_latest).

I reran the tests on wip-yuri5-testing-2022-06-22-0914-octopus, once with centos and once with ubuntu:

Notice that ubuntu passes, while centos fails.

#3 Updated by Laura Flores about 2 months ago

  • Assignee set to Igor Fedotov

Hey @Igor, mind taking a look?

#4 Updated by Igor Fedotov about 1 month ago

Laura Flores wrote:

Hey @Igor, mind taking a look?

looking

#5 Updated by Laura Flores about 1 month ago

Hey Igor, any updates?

#6 Updated by Laura Flores about 1 month ago

Caught another instance here:
/a/yuriw-2022-07-08_20:17:39-rados-wip-yuri7-testing-2022-07-08-1007-octopus-distro-default-smithi/6920431

Description: rados/standalone/{supported-random-distro$/{ubuntu_18.04} workloads/osd}

This one is failing on ubuntu_18.04, so the failure seems fairly reproducible.

#7 Updated by Laura Flores about 1 month ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 47053

I've opened a revert PR since we need to finalize the Octopus point release. @Igor, let me know if you have a fix you want to try, though, and I can swap out the PR number.

#8 Updated by Laura Flores about 1 month ago

This one passed again on ubuntu_latest. Strange that it seems to correlate:

/a/lflores-2022-07-11_16:31:11-rados:standalone-wip-yuri7-testing-2022-07-08-1007-octopus-distro-default-smithi/6924068
Description: rados:standalone/{supported-random-distro$/{ubuntu_latest} workloads/osd}

#9 Updated by Adam Kupczyk about 1 month ago

  • Assignee changed from Igor Fedotov to Adam Kupczyk

I strongly suspect this is an internal BlueFS problem. Insufficient locking? Data collision?

#10 Updated by Igor Fedotov about 1 month ago

The issue is caused by a false-positive allocation check: it detects a reference to an "already allocated" chunk. That, in turn, is caused by an improper operation sequence in the BlueFS log, e.g.:
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db.wal/000050.log
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 56
...
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_update file(ino 56 size 0xdba8000 mtime 2022-07-12T03:22:34.566
488+0300 allocated dbb0000 extents [1:0x1c900000~200000,1:0x1d000000~2000000,1:0x1f800000~200000,1:0x20a00000~200000,1:0x21400000~200000,1:0x2
1e00000~800000,1:0x22b00000~400000,1:0x23700000~200000,1:0x24100000~200000,1:0x24b00000~100000,1:0x25400000~200000,1:0x25e00000~200000,1:0x268
00000~200000,1:0x27200000~500000,1:0x28000000~700000,1:0x29d00000~200000,1:0x2a700000~300000,1:0x2b200000~800000,1:0x2bf00000~200000,1:0x2c900
000~100000,1:0x2ec00000~300000,1:0x2f700000~300000,1:0x30200000~200000,1:0x30c00000~200000,1:0x31600000~200000,1:0x32000000~300000,1:0x3370000
0~500000,1:0x34500000~300000,1:0x35000000~300000,1:0x35b00000~300000,1:0x36600000~200000,1:0x37000000~600000,1:0x38800000~100000,1:0x39500000~
300000,1:0x3b500000~500000,1:0x3c600000~800000,1:0x3ea00000~100000,1:0x3fa00000~500000,1:0x2a00000~800000,1:0xef00000~200000,1:0x11900000~2000
00,1:0x12300000~200000,1:0x12d00000~400000,1:0x13a00000~300000,1:0x14500000~300000,1:0x15000000~200000,1:0x15a00000~200000,1:0x16400000~100000
,1:0x16d00000~200000,1:0x17700000~200000,1:0x18100000~300000,1:0x18d00000~200000,1:0x19700000~300000,1:0x1a200000~200000,1:0x1ac00000~200000,1
:0x500000~100000,1:0x800000~100000,1:0xa00000~100000,1:0xc00000~100000,1:0x2ca00000~100000,1:0x1c800000~100000,1:0x3ff00000~100000,1:0x1600000
~500000,2:0x3c600000~b0000,2:0x3ceb0000~280000,2:0x3d930000~100000,2:0x3e230000~200000,2:0x3ec30000~180000,2:0x3f5b0000~200000,2:0x3ffb0000~10
0000,2:0x70000000~100000])
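
The failure mode described above can be sketched as a toy replay check. This is a simplified Python model, not Ceph code; the `replay` function, op names, and extent strings are illustrative only. The point is that an op_file_update replaying after the matching op_file_remove leaves its extents "owned", so a later file claiming the same extents trips the duplicate-reference check:

```python
# Toy model of the BlueFS replay allocation check described above.
# Not the real implementation: ops and detection are heavily simplified.

def replay(log_records):
    """Replay (op, ino, extents) records; return the first error, or None."""
    owner = {}   # extent -> ino currently holding it
    files = {}   # ino -> extents recorded for that file
    for op, ino, extents in log_records:
        if op == "op_file_remove":
            for ext in files.pop(ino, []):
                owner.pop(ext, None)           # extents become free again
        elif op == "op_file_update":
            files[ino] = list(extents)
            for ext in extents:
                if ext in owner and owner[ext] != ino:
                    return (f"_check_new_allocations invalid extent {ext}: "
                            f"duplicate reference, ino {ino}")
                owner[ext] = ino
    return None

# Proper order: update, then remove -> extents are freed, no error.
ok_log = [
    ("op_file_update", 56, ["1:0x27700000~1200000"]),
    ("op_file_remove", 56, []),
    ("op_file_update", 68, ["1:0x27700000~1200000"]),
]
assert replay(ok_log) is None

# Improper order from the log above: an orphan op_file_update for ino 56
# replays *after* op_file_remove 56, so its extents are still "owned"
# when ino 68 claims them -> false-positive duplicate reference.
bad_log = [
    ("op_file_remove", 56, []),
    ("op_file_update", 56, ["1:0x27700000~1200000"]),
    ("op_file_update", 68, ["1:0x27700000~1200000"]),
]
assert "duplicate reference" in replay(bad_log)
```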

#11 Updated by Igor Fedotov about 1 month ago

So tuning that RocksDB setting results in a somewhat different WAL file usage pattern, which in turn apparently reveals a bug in BlueFS.
BTW, it looks to me like we've already seen this improper operation sequence in the BlueFS log before; @Adam, do you remember the details?

#12 Updated by Igor Fedotov about 1 month ago

For the sake of completeness, resending the log snippet relevant to the issue, ending with the final false-positive check:
2022-07-12T03:28:51.276+0300 7f9545400080 10 bluefs _replay 0xa0000: txn(seq 161 len 0x48b crc 0x3c32b079)
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db/000051.sst
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 57
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db.wal/000050.log
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 56
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db/000042.sst
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 48
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db/000041.sst
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 47
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db/000039.sst
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 45
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db/000038.sst
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 44
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db/000030.sst
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 36
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_update file(ino 56 size 0xdba8000 mtime 2022-07-12T03:22:34.566
488+0300 allocated dbb0000 extents [1:0x1c900000~200000,1:0x1d000000~2000000,1:0x1f800000~200000,1:0x20a00000~200000,1:0x21400000~200000,1:0x2
1e00000~800000,1:0x22b00000~400000,1:0x23700000~200000,1:0x24100000~200000,1:0x24b00000~100000,1:0x25400000~200000,1:0x25e00000~200000,1:0x268
00000~200000,1:0x27200000~500000,1:0x28000000~700000,1:0x29d00000~200000,1:0x2a700000~300000,1:0x2b200000~800000,1:0x2bf00000~200000,1:0x2c900
000~100000,1:0x2ec00000~300000,1:0x2f700000~300000,1:0x30200000~200000,1:0x30c00000~200000,1:0x31600000~200000,1:0x32000000~300000,1:0x3370000
0~500000,1:0x34500000~300000,1:0x35000000~300000,1:0x35b00000~300000,1:0x36600000~200000,1:0x37000000~600000,1:0x38800000~100000,1:0x39500000~
300000,1:0x3b500000~500000,1:0x3c600000~800000,1:0x3ea00000~100000,1:0x3fa00000~500000,1:0x2a00000~800000,1:0xef00000~200000,1:0x11900000~2000
00,1:0x12300000~200000,1:0x12d00000~400000,1:0x13a00000~300000,1:0x14500000~300000,1:0x15000000~200000,1:0x15a00000~200000,1:0x16400000~100000
,1:0x16d00000~200000,1:0x17700000~200000,1:0x18100000~300000,1:0x18d00000~200000,1:0x19700000~300000,1:0x1a200000~200000,1:0x1ac00000~200000,1
:0x500000~100000,1:0x800000~100000,1:0xa00000~100000,1:0xc00000~100000,1:0x2ca00000~100000,1:0x1c800000~100000,1:0x3ff00000~100000,1:0x1600000
~500000,2:0x3c600000~b0000,2:0x3ceb0000~280000,2:0x3d930000~100000,2:0x3e230000~200000,2:0x3ec30000~180000,2:0x3f5b0000~200000,2:0x3ffb0000~10
0000,2:0x70000000~100000])
...
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa6000: op_file_update file(ino 68 size 0x2c07272 mtime 2022-07-12T03:22:34.735782+0300 allocated 2d00000 extents [2:0x70100000~80000,2:0x72da0000~400000,2:0x18090000~80000,2:0x20110000~80000,2:0x21c10000~80000,2:0x76fd0000~80000,2:0x731a0000~80000,2:0x16e90000~100000,2:0x753d0000~100000,1:0x27700000~1200000,1:0x28d00000~1200000])
2022-07-12T03:28:51.276+0300 7f9545400080 -1 bluefs _check_new_allocations invalid extent 1: 0x27700000~1200000: duplicate reference, ino 68
2022-07-12T03:28:51.276+0300 7f9545400080 -1 bluefs mount failed to replay log: (14) Bad address
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _stop_alloc

#13 Updated by Igor Fedotov about 1 month ago

I think I have a fix, but unfortunately the bug (and hence the fix) is in BlueFS.
BlueFS improperly handles the following op sequence:

open_for_write
append
...
unlink file
fsync
close

and leaves the file's op_update record in the BlueFS log after the file's unlink record.
This causes either a false positive in the allocation check during log replay, or a failure of the file link-count check when replay completes.
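
The buggy sequence can be modeled in toy terms as well: an fsync issued after unlink flushes dirty metadata and appends an op_file_update for a file the log has already removed. This is a hypothetical sketch (the class, method names, and `skip_unlinked` flag are not the real BlueFS API); the fix direction it illustrates is simply not emitting the update for an already-unlinked file:

```python
# Toy log writer modeling the buggy op sequence from the comment above.
# All names are illustrative; this is not the real BlueFS interface.

class ToyBlueFSLog:
    def __init__(self):
        self.records = []
        self.live = set()   # inos that still have a directory link

    def open_for_write(self, ino):
        self.live.add(ino)
        self.records.append(("op_file_update", ino))

    def unlink(self, ino):
        self.live.discard(ino)
        self.records.append(("op_dir_unlink", ino))
        self.records.append(("op_file_remove", ino))

    def fsync(self, ino, skip_unlinked=False):
        # Buggy behavior: dirty file metadata is flushed even after unlink,
        # leaving an orphan op_file_update after op_file_remove.
        if skip_unlinked and ino not in self.live:
            return
        self.records.append(("op_file_update", ino))

buggy = ToyBlueFSLog()
buggy.open_for_write(56)
buggy.unlink(56)
buggy.fsync(56)   # orphan update lands after the remove record
assert buggy.records[-2:] == [("op_file_remove", 56), ("op_file_update", 56)]

fixed = ToyBlueFSLog()
fixed.open_for_write(56)
fixed.unlink(56)
fixed.fsync(56, skip_unlinked=True)   # sketch of the fix: no orphan record
assert fixed.records[-1] == ("op_file_remove", 56)
```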

The bug itself is present in all active Ceph releases, but apparently Octopus RocksDB triggers the above scenario for a WAL file when recycle_log_file_num is equal to 0, which is what reveals the issue.

I'm going to submit a PR against master shortly, but I'm not sure it makes much sense to postpone the Octopus release for that fix; IMO it deserves a full QA cycle and some baking in the main branch before going to production.
So it's apparently better to revert https://github.com/ceph/ceph/pull/45040 at this point.

#14 Updated by Igor Fedotov about 1 month ago

  • Related to Bug #56533: Bluefs might put an orphan op_update record in the log added

#15 Updated by Laura Flores about 1 month ago

  • Status changed from Fix Under Review to Resolved

Efforts to solve the BlueFS bug are ongoing in bug #56533.
