Bug #57396


[rbd-mirror] local image replayer stops and fails to update status

Added by Josef Johansson over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: pacific,quincy
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: rbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm trying out two-way mirroring.

Everything seems to be running fine, except that sometimes the replay stops at the primary clusters.

This means that VMs will not start properly until all journal entries are replayed, but also that the journal grows over time.

I am trying to find out why this occurs, and where to look for more information.

I am currently using `rbd --pool <pool> --image <image> journal status` to see
if the clients are in sync or not.

Example output when things went sideways

minimum_set: 0
active_set: 2
registered clients:
[id=, commit_position=[positions=[[object_number=0, tag_tid=1, entry_tid=4592],
[object_number=3, tag_tid=1, entry_tid=4591], [object_number=2, tag_tid=1,
entry_tid=4590], [object_number=1, tag_tid=1, entry_tid=4589]]], state=connected]
[id=bdde9b90-df26-4e3d-84b3-66605dc45608, commit_position=[positions=[[object_number=5,
tag_tid=1, entry_tid=19913], [object_number=4, tag_tid=1, entry_tid=19912],
[object_number=7, tag_tid=1, entry_tid=19911], [object_number=6, tag_tid=1,
entry_tid=19910]]], state=disconnected]


Files

issue.log (57.5 KB) issue.log Josef Johansson, 12/21/2022 04:05 PM
issue.sh (6.04 KB) issue.sh Josef Johansson, 12/21/2022 04:05 PM

Related issues 2 (0 open, 2 closed)

Copied to rbd - Backport #58765: quincy: [rbd-mirror] local image replayer stops and fails to update status (Resolved, Ilya Dryomov)
Copied to rbd - Backport #58766: pacific: [rbd-mirror] local image replayer stops and fails to update status (Resolved, Ilya Dryomov)
Actions #1

Updated by Josef Johansson over 1 year ago

As it turns out this problem is not that easy.
1) Running CloudLinux 3.10.0-962.3.2.lve1.5.73.el7.x86_64 with virtio-scsi-single disks and discard does not work. However, switching to virtio-blk, while keeping discard enabled all the way to the mounts, works.
2) Running CloudLinux 4.18.0-372.26.1.lve.1.el8.x86_64 with virtio-scsi-single disks and discard does not work, and neither does virtio-blk. Turning off discard in the guest solves both cases, though.
3) Running Debian 5.10.0-19-amd64 with virtio-scsi-single disks and discard does not work either, in some cases, as a reference.

It's clear that something goes wrong between a discard and the commit of a journal entry.

I found out how to make the qemu process log to a separate file, but even with all the debug flags I could set, there was no error, just that it stopped logging that the commit_position was updated. The logs were quite huge; any hints on what would be interesting?

Actions #2

Updated by Josef Johansson over 1 year ago

$ rpm -q kernel-3.10.0-962.3.2.lve1.5.73.el7.x86_64 --changelog | grep 'virtio_blk: add discard'

$ rpm -q kernel-4.18.0-372.26.1.lve.1.el8.x86_64 --changelog | grep 'virtio_blk: add discard'
[block] virtio_blk: add discard and write zeroes support (Stefano Garzarella) [1692939]

Which makes a ton of sense. When discards are issued, librbd in the qemu-kvm process stops committing to disk.

When running 3.10 with virtio-scsi-single there is discard support, which then also makes the process stop committing to disk.

Actions #3

Updated by Josef Johansson over 1 year ago

This will trigger the problem.

`echo 1>a;sync;rm a;sync;fstrim --all`

Actions #4

Updated by Josef Johansson over 1 year ago

I managed to reproduce it with rbd-nbd as well.

A simple `echo 1>a;sync;rm a;sync` will not break it, but `fstrim /mnt` will.

Trying to parse the logs:

It prints `committed tid=`, but not `updated commit position:`.
This means that either `!m_pending_commit_tids.empty()` is false, or `!commit_entry.committed` is true for the oldest pending entry, so the loop below breaks before the commit position is updated.

```
void JournalMetadata::committed(uint64_t commit_tid,
                                const CreateContext &create_context) {
  ldout(m_cct, 20) << "committed tid=" << commit_tid << dendl;

  ObjectSetPosition commit_position;
  Context *stale_ctx = nullptr;
  {
    std::scoped_lock locker{*m_timer_lock, m_lock};
    ceph_assert(commit_tid > m_commit_position_tid);

    if (!m_commit_position.object_positions.empty()) {
      // in-flight commit position update
      commit_position = m_commit_position;
    } else {
      // safe commit position
      commit_position = m_client.commit_position;
    }

    CommitTids::iterator it = m_pending_commit_tids.find(commit_tid);
    ceph_assert(it != m_pending_commit_tids.end());

    CommitEntry &commit_entry = it->second;
    commit_entry.committed = true;

    bool update_commit_position = false;
    while (!m_pending_commit_tids.empty()) {
      CommitTids::iterator it = m_pending_commit_tids.begin();
      CommitEntry &commit_entry = it->second;
      if (!commit_entry.committed) {
        break;
      }

      commit_position.object_positions.emplace_front(
        commit_entry.object_num, commit_entry.tag_tid,
        commit_entry.entry_tid);
      m_pending_commit_tids.erase(it);
      update_commit_position = true;
    }

    if (!update_commit_position) {
      return;
    }

    // prune the position to have one position per splay offset
    std::set<uint8_t> in_use_splay_offsets;
    ObjectPositions::iterator ob_it = commit_position.object_positions.begin();
    while (ob_it != commit_position.object_positions.end()) {
      uint8_t splay_offset = ob_it->object_number % m_splay_width;
      if (!in_use_splay_offsets.insert(splay_offset).second) {
        ob_it = commit_position.object_positions.erase(ob_it);
      } else {
        ++ob_it;
      }
    }

    stale_ctx = m_commit_position_ctx;
    m_commit_position_ctx = create_context();
    m_commit_position = commit_position;
    m_commit_position_tid = commit_tid;

    ldout(m_cct, 20) << "updated commit position: " << commit_position << ", "
                     << "on_safe=" << m_commit_position_ctx << dendl;
    schedule_commit_task();
  }

  if (stale_ctx != nullptr) {
    ldout(m_cct, 20) << "canceling stale commit: on_safe=" << stale_ctx
                     << dendl;
    stale_ctx->complete(-ESTALE);
  }
}
```
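
To make the failure mode concrete, here is a minimal standalone sketch (not Ceph code, just an illustration of the ordering rule the loop above enforces): the commit position only advances across a contiguous run of committed tids, so if one tid is never marked committed (standing in for the discard that never completes), every later commit stays queued and the "updated commit position" branch is never reached.

```
#include <cstdint>
#include <iostream>
#include <map>

int main() {
  std::map<uint64_t, bool> pending_commit_tids;  // tid -> committed flag
  for (uint64_t tid = 1; tid <= 5; ++tid) {
    pending_commit_tids[tid] = false;
  }

  auto committed = [&](uint64_t tid) {
    pending_commit_tids[tid] = true;

    bool update_commit_position = false;
    uint64_t commit_position = 0;
    while (!pending_commit_tids.empty()) {
      auto it = pending_commit_tids.begin();
      if (!it->second) {
        break;  // oldest pending tid is not committed yet, everything later is stuck
      }
      commit_position = it->first;
      pending_commit_tids.erase(it);
      update_commit_position = true;
    }

    if (update_commit_position) {
      std::cout << "updated commit position: " << commit_position << "\n";
    } else {
      std::cout << "committed tid=" << tid
                << " but commit position not updated (blocked)\n";
    }
  };

  committed(1);  // advances the position to tid 1
  committed(3);  // blocked: tid 2 (the stuck discard) is still pending
  committed(4);  // still blocked
  committed(5);  // still blocked, mirroring the stalled journal replay
  return 0;
}
```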
Actions #5

Updated by Josef Johansson over 1 year ago

AFAIK this case is not covered in the test suite.

https://github.com/ceph/ceph/blob/6fee777d603aebce492c57b41f3b5760d50ddb07/src/test/librbd/journal/test_Replay.cc

Since this happens when JournalPlayer is in STATE_READY (from handler_flushing_ready), it seems to be in another state loop (since JournalPlayer handles AioDiscard perfectly fine otherwise). I think the test cases need to check that the events run fine in all states that handle replay.

Not sure how to tag people or get any feedback on issues here, but hopefully someone hears the echoes :)

To conclude, whenever any AioDiscard happens while JournalPlayer is in STATE_READY, nothing will be replayed from the journal.

This bug is twofold: first, there is the bug that stops the process; second, there is no heartbeat that picks up that the player is not committing to disk.

Actions #6

Updated by Josef Johansson over 1 year ago

I have attached a test along with log output of it stopping when running fstrim.

I made it so that it creates an rbd-replay binary, which is easier to replay than using nbd.

Actions #7

Updated by Josef Johansson over 1 year ago

For some reason, the following will cause the journal to never be committed to the rbd image.

Given

rbd-nbd --device ${device} map ${image} &

blkdiscard  -o $((1996*4096+1024)) -l $((4096*104)) /dev/nbd0

Now nothing will be committed to the rbd image (the "updated commit position" message in JournalMetadata never appears).

And

blkdiscard  -o $((1996*4096)) -l $((4096*104)) /dev/nbd0

works properly
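
For reference, here is a small standalone sketch of the alignment difference between the two commands; the 4096-byte granularity is an assumption for illustration, not necessarily the value librbd uses. The failing discard starts 1024 bytes into a block, so only a shrunken, aligned sub-range could be dropped without zeroing the partial edges, which is presumably why the `rbd skip partial discard` setting below makes a difference.

```
#include <cstdint>
#include <iostream>

int main() {
  const uint64_t granularity = 4096;  // assumed discard granularity, for illustration

  auto describe = [&](uint64_t offset, uint64_t length) {
    uint64_t aligned_start = (offset + granularity - 1) / granularity * granularity;
    uint64_t aligned_end = (offset + length) / granularity * granularity;
    std::cout << "discard [" << offset << ", " << offset + length << "): "
              << (offset % granularity == 0 ? "aligned" : "unaligned") << " start, "
              << "aligned sub-range [" << aligned_start << ", " << aligned_end << ")\n";
  };

  describe(1996 * 4096 + 1024, 4096 * 104);  // the failing blkdiscard
  describe(1996 * 4096,        4096 * 104);  // the working blkdiscard
  return 0;
}
```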

Setting

rbd skip partial discard = false

makes the problem go away.

How do I make the test in src/test/librbd/journal/test_Replay.cc actually test this?
I would like to test both cases; my problem is that open_image does not seem to enable handle_flushing_replay.
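
What I have in mind is roughly the following, modeled loosely on the existing AioDiscardEvent test in test_Replay.cc. The fixture helpers (open_image, when_acquired_lock, inject_into_journal, get_journal_commit_position, close_image), the discard_granularity_bytes field and the AioDiscardEvent constructor arguments are assumptions on my part and may differ between releases.

```
TEST_F(TestJournalReplay, UnalignedAioDiscardEvent) {
  REQUIRE_FEATURE(RBD_FEATURE_JOURNALING);

  // Inject an unaligned discard into the journal, mirroring the failing
  // blkdiscard (offset 1996*4096+1024), then re-open the image so the
  // event gets replayed and check that the commit position moves.
  librbd::ImageCtx *ictx;
  ASSERT_EQ(0, open_image(m_image_name, &ictx));
  ASSERT_EQ(0, when_acquired_lock(ictx));

  int64_t initial_tag;
  int64_t initial_entry;
  get_journal_commit_position(ictx, &initial_tag, &initial_entry);

  inject_into_journal(
    ictx, librbd::journal::AioDiscardEvent(1996 * 4096 + 1024, 4096 * 104,
                                           ictx->discard_granularity_bytes));
  close_image(ictx);

  // Re-opening the image replays the injected event.
  ASSERT_EQ(0, open_image(m_image_name, &ictx));
  ASSERT_EQ(0, when_acquired_lock(ictx));

  int64_t current_tag;
  int64_t current_entry;
  get_journal_commit_position(ictx, &current_tag, &current_entry);

  // The exact expected position depends on how many events were injected
  // and whether a new tag was allocated; the key point is that it must
  // have advanced rather than staying stuck at the initial value.
  ASSERT_TRUE(current_tag > initial_tag ||
              (current_tag == initial_tag && current_entry > initial_entry));
}
```

The aligned variant (offset 1996*4096) would be a second test with the same structure.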

Actions #8

Updated by Josef Johansson over 1 year ago

Pull request to fix the original problem: https://github.com/ceph/ceph/pull/49614

Actions #9

Updated by Ilya Dryomov over 1 year ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 49614
Actions #10

Updated by Ilya Dryomov over 1 year ago

  • Assignee set to Josef Johansson
  • Backport set to pacific,quincy
Actions #11

Updated by Ilya Dryomov about 1 year ago

  • Priority changed from Normal to High
Actions #12

Updated by Ilya Dryomov about 1 year ago

  • Status changed from Fix Under Review to Pending Backport
Actions #13

Updated by Backport Bot about 1 year ago

  • Copied to Backport #58765: quincy: [rbd-mirror] local image replayer stops and fails to update status added
Actions #14

Updated by Backport Bot about 1 year ago

  • Copied to Backport #58766: pacific: [rbd-mirror] local image replayer stops and fails to update status added
Actions #15

Updated by Backport Bot about 1 year ago

  • Tags set to backport_processed
Actions #16

Updated by Backport Bot about 1 year ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
