Bug #55636

octopus: osd-bluefs-volume-ops.sh: TEST_bluestore2 fails with "FAILED ceph_assert(r == 0)"

Added by Laura Flores 3 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2022-05-09_21:49:19-rados-wip-yuri6-testing-2022-05-09-0734-octopus-distro-default-smithi/6829109

2022-05-10T00:31:01.602 INFO:tasks.workunit.client.0.smithi032.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:382: TEST_bluestore2:  ceph osd down 0
2022-05-10T00:31:02.106 INFO:tasks.workunit.client.0.smithi032.stderr:osd.0 is already down.
2022-05-10T00:31:02.113 INFO:tasks.workunit.client.0.smithi032.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:384: TEST_bluestore2:  ceph-bluestore-tool --path td/osd-bluefs-volume-ops/0 --devs-source td/osd-bluefs-volume-ops/0/block.db --dev-target td/osd-bluefs-volume-ops/0/block --command bluefs-bdev-migrate
2022-05-10T00:31:02.123 INFO:tasks.workunit.client.0.smithi032.stdout:inferring bluefs devices from bluestore path
2022-05-10T00:31:02.184 INFO:tasks.workunit.client.0.smithi032.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.16-101-g1fb36ae9/rpm/el8/BUILD/ceph-15.2.16-101-g1fb36ae9/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_mount_for_bluefs()' thread 7f4fab54c240 time 2022-05-10T00:31:02.184091+0000
2022-05-10T00:31:02.184 INFO:tasks.workunit.client.0.smithi032.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.16-101-g1fb36ae9/rpm/el8/BUILD/ceph-15.2.16-101-g1fb36ae9/src/os/bluestore/BlueStore.cc: 6894: FAILED ceph_assert(r == 0)
2022-05-10T00:31:02.185 INFO:tasks.workunit.client.0.smithi032.stderr:2022-05-10T00:31:02.183+0000 7f4fab54c240 -1 bluefs _check_new_allocations invalid extent 1: 0xe00000~400000: duplicate reference, ino 22
2022-05-10T00:31:02.185 INFO:tasks.workunit.client.0.smithi032.stderr:2022-05-10T00:31:02.183+0000 7f4fab54c240 -1 bluefs mount failed to replay log: (14) Bad address
2022-05-10T00:31:02.185 INFO:tasks.workunit.client.0.smithi032.stderr:2022-05-10T00:31:02.183+0000 7f4fab54c240 -1 bluestore(td/osd-bluefs-volume-ops/0) _open_bluefs failed bluefs mount: (14) Bad address
2022-05-10T00:31:02.188 INFO:tasks.workunit.client.0.smithi032.stderr: ceph version 15.2.16-101-g1fb36ae9 (1fb36ae9234fd4c69ca633c9660cdd632313b2a8) octopus (stable)
2022-05-10T00:31:02.188 INFO:tasks.workunit.client.0.smithi032.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f4fa16565ea]
2022-05-10T00:31:02.189 INFO:tasks.workunit.client.0.smithi032.stderr: 2: (()+0x279804) [0x7f4fa1656804]
2022-05-10T00:31:02.190 INFO:tasks.workunit.client.0.smithi032.stderr: 3: (BlueStore::_mount_for_bluefs()+0x68) [0x5555c7af1ec8]
2022-05-10T00:31:02.190 INFO:tasks.workunit.client.0.smithi032.stderr: 4: (BlueStore::migrate_to_existing_bluefs_device(std::set<int, std::less<int>, std::allocator<int> > const&, int)+0x1a0) [0x5555c7af57d0]
2022-05-10T00:31:02.191 INFO:tasks.workunit.client.0.smithi032.stderr: 5: (main()+0x4144) [0x5555c79e8204]
2022-05-10T00:31:02.191 INFO:tasks.workunit.client.0.smithi032.stderr: 6: (__libc_start_main()+0xf3) [0x7f4f9ee827b3]
2022-05-10T00:31:02.191 INFO:tasks.workunit.client.0.smithi032.stderr: 7: (_start()+0x2e) [0x5555c7a083ae]


Related issues

Related to bluestore - Bug #56533: Bluefs might put an orphan op_update record in the log Pending Backport

History

#1 Updated by Laura Flores about 2 months ago

  • Priority changed from Normal to High

#2 Updated by Laura Flores about 2 months ago

Some recent observations:

On https://trello.com/c/w6qCkODQ/1567-wip-yuri-testing-2022-06-24-0817-octopus, I noticed several repeated occurrences of this Tracker. The first time this bug was observed was during testing for https://github.com/ceph/ceph/pull/45040. We merged that PR anyway, since the test passed on re-run and the failure did not look obviously related.

However, I now think the bug actually is related to https://github.com/ceph/ceph/pull/45040. On the recent test branch from https://trello.com/c/w6qCkODQ/1567-wip-yuri-testing-2022-06-24-0817-octopus, I reverted the commit from https://github.com/ceph/ceph/pull/45040 and reran the exact same test that had previously failed. This raises the question: why did the test pass when https://github.com/ceph/ceph/pull/45040 was originally tested?

Notice that the only difference between the passed and failed tests is the OS (centos_8 vs. ubuntu_latest).

I reran the tests on wip-yuri5-testing-2022-06-22-0914-octopus, once with centos and once with ubuntu:

Notice that ubuntu passes, while centos fails.

#3 Updated by Laura Flores about 2 months ago

  • Assignee set to Igor Fedotov

Hey @Igor, mind taking a look?

#4 Updated by Igor Fedotov about 1 month ago

Laura Flores wrote:

Hey @Igor, mind taking a look?

looking

#5 Updated by Laura Flores about 1 month ago

Hey Igor, any updates?

#6 Updated by Laura Flores about 1 month ago

Caught another instance here:
/a/yuriw-2022-07-08_20:17:39-rados-wip-yuri7-testing-2022-07-08-1007-octopus-distro-default-smithi/6920431

Description: rados/standalone/{supported-random-distro$/{ubuntu_18.04} workloads/osd}

This one is failing on ubuntu_18.04, so the failure seems fairly reproducible.

#7 Updated by Laura Flores about 1 month ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 47053

I've opened a revert PR since we need to finalize the Octopus point release. @Igor, let me know if you have a fix you want to try, though, and I can swap out the PR number.

#8 Updated by Laura Flores about 1 month ago

This one passed again on ubuntu_latest. Strange that it seems to correlate:

/a/lflores-2022-07-11_16:31:11-rados:standalone-wip-yuri7-testing-2022-07-08-1007-octopus-distro-default-smithi/6924068
Description: rados:standalone/{supported-random-distro$/{ubuntu_latest} workloads/osd}

#9 Updated by Adam Kupczyk about 1 month ago

  • Assignee changed from Igor Fedotov to Adam Kupczyk

I strongly suspect this is an internal BlueFS problem. Insufficient locking? Data collision?

#10 Updated by Igor Fedotov about 1 month ago

The issue is caused by a false-positive allocation check: it detects a reference to an "already allocated" chunk. That, in turn, is caused by an improper operation sequence in the BlueFS log, e.g.:
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db.wal/000050.log
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 56
...
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_update file(ino 56 size 0xdba8000 mtime 2022-07-12T03:22:34.566
488+0300 allocated dbb0000 extents [1:0x1c900000~200000,1:0x1d000000~2000000,1:0x1f800000~200000,1:0x20a00000~200000,1:0x21400000~200000,1:0x2
1e00000~800000,1:0x22b00000~400000,1:0x23700000~200000,1:0x24100000~200000,1:0x24b00000~100000,1:0x25400000~200000,1:0x25e00000~200000,1:0x268
00000~200000,1:0x27200000~500000,1:0x28000000~700000,1:0x29d00000~200000,1:0x2a700000~300000,1:0x2b200000~800000,1:0x2bf00000~200000,1:0x2c900
000~100000,1:0x2ec00000~300000,1:0x2f700000~300000,1:0x30200000~200000,1:0x30c00000~200000,1:0x31600000~200000,1:0x32000000~300000,1:0x3370000
0~500000,1:0x34500000~300000,1:0x35000000~300000,1:0x35b00000~300000,1:0x36600000~200000,1:0x37000000~600000,1:0x38800000~100000,1:0x39500000~
300000,1:0x3b500000~500000,1:0x3c600000~800000,1:0x3ea00000~100000,1:0x3fa00000~500000,1:0x2a00000~800000,1:0xef00000~200000,1:0x11900000~2000
00,1:0x12300000~200000,1:0x12d00000~400000,1:0x13a00000~300000,1:0x14500000~300000,1:0x15000000~200000,1:0x15a00000~200000,1:0x16400000~100000
,1:0x16d00000~200000,1:0x17700000~200000,1:0x18100000~300000,1:0x18d00000~200000,1:0x19700000~300000,1:0x1a200000~200000,1:0x1ac00000~200000,1
:0x500000~100000,1:0x800000~100000,1:0xa00000~100000,1:0xc00000~100000,1:0x2ca00000~100000,1:0x1c800000~100000,1:0x3ff00000~100000,1:0x1600000
~500000,2:0x3c600000~b0000,2:0x3ceb0000~280000,2:0x3d930000~100000,2:0x3e230000~200000,2:0x3ec30000~180000,2:0x3f5b0000~200000,2:0x3ffb0000~10
0000,2:0x70000000~100000])
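
The failure mode described above can be sketched as a toy replay check. This is a simplified Python model, not Ceph code; the `replay` function, op names, and extent strings are illustrative only. The point is that an op_file_update replaying after the matching op_file_remove leaves its extents "owned", so a later file claiming the same extents trips the duplicate-reference check:

```python
# Toy model of the BlueFS replay allocation check described above.
# Not the real implementation: ops and detection are heavily simplified.

def replay(log_records):
    """Replay (op, ino, extents) records; return the first error, or None."""
    owner = {}   # extent -> ino currently holding it
    files = {}   # ino -> extents recorded for that file
    for op, ino, extents in log_records:
        if op == "op_file_remove":
            for ext in files.pop(ino, []):
                owner.pop(ext, None)           # extents become free again
        elif op == "op_file_update":
            files[ino] = list(extents)
            for ext in extents:
                if ext in owner and owner[ext] != ino:
                    return (f"_check_new_allocations invalid extent {ext}: "
                            f"duplicate reference, ino {ino}")
                owner[ext] = ino
    return None

# Proper order: update, then remove -> extents are freed, no error.
ok_log = [
    ("op_file_update", 56, ["1:0x27700000~1200000"]),
    ("op_file_remove", 56, []),
    ("op_file_update", 68, ["1:0x27700000~1200000"]),
]
assert replay(ok_log) is None

# Improper order from the log above: an orphan op_file_update for ino 56
# replays *after* op_file_remove 56, so its extents are still "owned"
# when ino 68 claims them -> false-positive duplicate reference.
bad_log = [
    ("op_file_remove", 56, []),
    ("op_file_update", 56, ["1:0x27700000~1200000"]),
    ("op_file_update", 68, ["1:0x27700000~1200000"]),
]
assert "duplicate reference" in replay(bad_log)
```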

#11 Updated by Igor Fedotov about 1 month ago

So tuning that RocksDB setting results in a somewhat different WAL file usage pattern, which in turn apparently reveals a bug in BlueFS.
BTW, it looks to me like we've already seen this improper operation sequence in the BlueFS log before; @Adam, do you remember the details?

#12 Updated by Igor Fedotov about 1 month ago

For the sake of completeness, resending the log snippet relevant to the issue, ending with the final false-positive check:
2022-07-12T03:28:51.276+0300 7f9545400080 10 bluefs _replay 0xa0000: txn(seq 161 len 0x48b crc 0x3c32b079)
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db/000051.sst
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 57
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db.wal/000050.log
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 56
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db/000042.sst
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 48
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db/000041.sst
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 47
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db/000039.sst
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 45
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db/000038.sst
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 44
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_dir_unlink db/000030.sst
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_remove 36
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa0000: op_file_update file(ino 56 size 0xdba8000 mtime 2022-07-12T03:22:34.566
488+0300 allocated dbb0000 extents [1:0x1c900000~200000,1:0x1d000000~2000000,1:0x1f800000~200000,1:0x20a00000~200000,1:0x21400000~200000,1:0x2
1e00000~800000,1:0x22b00000~400000,1:0x23700000~200000,1:0x24100000~200000,1:0x24b00000~100000,1:0x25400000~200000,1:0x25e00000~200000,1:0x268
00000~200000,1:0x27200000~500000,1:0x28000000~700000,1:0x29d00000~200000,1:0x2a700000~300000,1:0x2b200000~800000,1:0x2bf00000~200000,1:0x2c900
000~100000,1:0x2ec00000~300000,1:0x2f700000~300000,1:0x30200000~200000,1:0x30c00000~200000,1:0x31600000~200000,1:0x32000000~300000,1:0x3370000
0~500000,1:0x34500000~300000,1:0x35000000~300000,1:0x35b00000~300000,1:0x36600000~200000,1:0x37000000~600000,1:0x38800000~100000,1:0x39500000~
300000,1:0x3b500000~500000,1:0x3c600000~800000,1:0x3ea00000~100000,1:0x3fa00000~500000,1:0x2a00000~800000,1:0xef00000~200000,1:0x11900000~2000
00,1:0x12300000~200000,1:0x12d00000~400000,1:0x13a00000~300000,1:0x14500000~300000,1:0x15000000~200000,1:0x15a00000~200000,1:0x16400000~100000
,1:0x16d00000~200000,1:0x17700000~200000,1:0x18100000~300000,1:0x18d00000~200000,1:0x19700000~300000,1:0x1a200000~200000,1:0x1ac00000~200000,1
:0x500000~100000,1:0x800000~100000,1:0xa00000~100000,1:0xc00000~100000,1:0x2ca00000~100000,1:0x1c800000~100000,1:0x3ff00000~100000,1:0x1600000
~500000,2:0x3c600000~b0000,2:0x3ceb0000~280000,2:0x3d930000~100000,2:0x3e230000~200000,2:0x3ec30000~180000,2:0x3f5b0000~200000,2:0x3ffb0000~10
0000,2:0x70000000~100000])
...
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _replay 0xa6000: op_file_update file(ino 68 size 0x2c07272 mtime 2022-07-12T03:22:34.735782+0300 allocated 2d00000 extents [2:0x70100000~80000,2:0x72da0000~400000,2:0x18090000~80000,2:0x20110000~80000,2:0x21c10000~80000,2:0x76fd0000~80000,2:0x731a0000~80000,2:0x16e90000~100000,2:0x753d0000~100000,1:0x27700000~1200000,1:0x28d00000~1200000])
2022-07-12T03:28:51.276+0300 7f9545400080 -1 bluefs _check_new_allocations invalid extent 1: 0x27700000~1200000: duplicate reference, ino 68
2022-07-12T03:28:51.276+0300 7f9545400080 -1 bluefs mount failed to replay log: (14) Bad address
2022-07-12T03:28:51.276+0300 7f9545400080 20 bluefs _stop_alloc

#13 Updated by Igor Fedotov about 1 month ago

I think I have a fix, but unfortunately the bug (and hence the fix) is in BlueFS.
BlueFS improperly handles the following op sequence:

open_for_write
append
...
unlink file
fsync
close

and leaves the file's op_update record in the BlueFS log after the file's unlink record.
This causes either a false positive in the allocation check during log replay, or a failure of the file link-count check when replay completes.
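
The buggy sequence can be modeled in toy terms as well: an fsync issued after unlink flushes dirty metadata and appends an op_file_update for a file the log has already removed. This is a hypothetical sketch (the class, method names, and `skip_unlinked` flag are not the real BlueFS API); the fix direction it illustrates is simply not emitting the update for an already-unlinked file:

```python
# Toy log writer modeling the buggy op sequence from the comment above.
# All names are illustrative; this is not the real BlueFS interface.

class ToyBlueFSLog:
    def __init__(self):
        self.records = []
        self.live = set()   # inos that still have a directory link

    def open_for_write(self, ino):
        self.live.add(ino)
        self.records.append(("op_file_update", ino))

    def unlink(self, ino):
        self.live.discard(ino)
        self.records.append(("op_dir_unlink", ino))
        self.records.append(("op_file_remove", ino))

    def fsync(self, ino, skip_unlinked=False):
        # Buggy behavior: dirty file metadata is flushed even after unlink,
        # leaving an orphan op_file_update after op_file_remove.
        if skip_unlinked and ino not in self.live:
            return
        self.records.append(("op_file_update", ino))

buggy = ToyBlueFSLog()
buggy.open_for_write(56)
buggy.unlink(56)
buggy.fsync(56)   # orphan update lands after the remove record
assert buggy.records[-2:] == [("op_file_remove", 56), ("op_file_update", 56)]

fixed = ToyBlueFSLog()
fixed.open_for_write(56)
fixed.unlink(56)
fixed.fsync(56, skip_unlinked=True)   # sketch of the fix: no orphan record
assert fixed.records[-1] == ("op_file_remove", 56)
```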

The bug itself is present in all active Ceph releases, but apparently Octopus RocksDB triggers the above scenario for a WAL file when recycle_log_file_num is equal to 0, which is what reveals the issue.

I'm going to submit a PR against master shortly, but I'm not sure it makes much sense to postpone the Octopus release for that fix; IMO it deserves a full QA cycle and some baking in the main branch before going to production.
So it's apparently better to revert https://github.com/ceph/ceph/pull/45040 at this point.

#14 Updated by Igor Fedotov about 1 month ago

  • Related to Bug #56533: Bluefs might put an orphan op_update record in the log added

#15 Updated by Laura Flores about 1 month ago

  • Status changed from Fix Under Review to Resolved

Efforts to solve the BlueFS bug are ongoing in bug #56533.
