Project

General

Profile

Bug #53477

[rbd-mirror] mirror demon crash

Added by Deepika Upadhyay about 2 months ago. Updated about 2 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

looks like sepia lab hiccup more than real issue, but adding just in case someone also hits this
2021-12-02T15:15:22.521 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+ '[' '!' -e /var/run/ceph/rbd-mirror.cluster1-client.mirror.0.pid ']'
2021-12-02T15:15:22.521 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+ echo 'cluster1 rbd-mirror not running or unknown' '(/var/run/ceph/rbd-mirror.cluster1-client.mirror.0.pid not exist)'
2021-12-02T15:15:22.521 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stdout:cluster1 rbd-mirror not running or unknown (/var/run/ceph/rbd-mirror.cluster1-client.mirror.0.pid not exist)
2021-12-02T15:15:22.524 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+ continue
2021-12-02T15:15:22.524 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+ for instance in `seq 0 ${LAST_MIRROR_INSTANCE}`
2021-12-02T15:15:22.524 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:++ daemon_pid_file cluster1:1
2021-12-02T15:15:22.524 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:++ local cluster=cluster1:1
2021-12-02T15:15:22.525 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:++ local instance
2021-12-02T15:15:22.525 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:++ set_cluster_instance cluster1:1 cluster instance
2021-12-02T15:15:22.525 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:++ local val=cluster1:1
2021-12-02T15:15:22.525 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:++ local cluster_var_name=cluster
2021-12-02T15:15:22.525 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:++ local instance_var_name=instance
2021-12-02T15:15:22.526 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:++ cluster=cluster1
2021-12-02T15:15:22.526 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:++ instance=1
2021-12-02T15:15:22.526 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:++ '[' 1 = cluster1:1 ']'
2021-12-02T15:15:22.526 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:++ eval cluster=cluster1
2021-12-02T15:15:22.527 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+++ cluster=cluster1
2021-12-02T15:15:22.527 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:++ eval instance=1
2021-12-02T15:15:22.527 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+++ instance=1
2021-12-02T15:15:22.527 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+++ ceph-conf --cluster cluster1 --name client.mirror.1 'pid file'
2021-12-02T15:15:22.527 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:++ echo /var/run/ceph/rbd-mirror.cluster1-client.mirror.1.pid
2021-12-02T15:15:22.528 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+ local pid_file=/var/run/ceph/rbd-mirror.cluster1-client.mirror.1.pid
2021-12-02T15:15:22.528 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+ '[' '!' -e /var/run/ceph/rbd-mirror.cluster1-client.mirror.1.pid ']'
2021-12-02T15:15:22.528 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+ echo 'cluster1 rbd-mirror not running or unknown' '(/var/run/ceph/rbd-mirror.cluster1-client.mirror.1.pid not exist)'
2021-12-02T15:15:22.528 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stdout:cluster1 rbd-mirror not running or unknown (/var/run/ceph/rbd-mirror.cluster1-client.mirror.1.pid not exist)
2021-12-02T15:15:22.531 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+ continue

History

#2 Updated by Mykola Golub about 2 months ago

The messages reported in the description is not an issue actually. This is generated on cleanup by "status" command, after the test already failed, when it tires to collect all information possible, and errors like this are expected. There was no rbd-mirror crash actually.

You should look for the real failure before "cleanup 1" line (or some other non-zero code), and in this case it is:

2021-12-02T15:12:56.304 INFO:tasks.rbd_mirror.cluster2.client.mirror.1.smithi099.stderr:2021-12-02T15:12:56.296+0000 7fc91d3a5700 -1 rbd::mirror::image_replayer::journal::PrepareReplayRequest: 0x56463a7b63c0 send: split-brain detected: local_image_id=11c951359ef6, registered local_image_id=11fc1fd3afc0
2021-12-02T15:12:56.304 INFO:tasks.rbd_mirror.cluster2.client.mirror.1.smithi099.stderr:2021-12-02T15:12:56.296+0000 7fc91d3a5700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x56463db7f860 handle_prepare_replay: failed to prepare local replay: (17) File exists
2021-12-02T15:12:56.314 INFO:tasks.rbd_mirror.cluster2.client.mirror.1.smithi099.stderr:2021-12-02T15:12:56.306+0000 7fc9233b1700 -1 rbd::mirror::ImageReplayer: 0x56463761cc80 [3/aa8f8f8e-730b-4750-8e87-b78d71ba23ce] operator(): start failed: (17) File exists
2021-12-02T15:12:56.331 INFO:tasks.rbd_mirror.cluster2.client.mirror.1.smithi099.stderr:2021-12-02T15:12:56.323+0000 7fc91d3a5700 -1 rbd::mirror::image_replayer::journal::PrepareReplayRequest: 0x56463a7b63c0 send: split-brain detected: local_image_id=10c8afcf8425, registered local_image_id=10f2fc17b45d
2021-12-02T15:12:56.331 INFO:tasks.rbd_mirror.cluster2.client.mirror.1.smithi099.stderr:2021-12-02T15:12:56.323+0000 7fc91d3a5700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x56463a6d2ea0 handle_prepare_replay: failed to prepare local replay: (17) File exists
2021-12-02T15:12:56.364 INFO:tasks.rbd_mirror.cluster2.client.mirror.1.smithi099.stderr:2021-12-02T15:12:56.356+0000 7fc923bb2700 -1 rbd::mirror::ImageReplayer: 0x56463cb39400 [3/22d9c1dc-d4e2-435c-936c-0b99d987564d] operator(): start failed: (17) File exists
2021-12-02T15:12:56.398 INFO:tasks.rbd_mirror.cluster2.client.mirror.1.smithi099.stderr:2021-12-02T15:12:56.389+0000 7fc8f5b56700 -1 rbd::mirror::image_replayer::CreateImageRequest: 0x56463fa285a0 handle_clone_image: failed to clone image mirror/10cafcd5cf14 to image_cluster1.client.mirror.3-clone8
2021-12-02T15:12:56.421 INFO:tasks.rbd_mirror.cluster2.client.mirror.1.smithi099.stderr:2021-12-02T15:12:56.413+0000 7fc91d3a5700 -1 rbd::mirror::image_replayer::journal::PrepareReplayRequest: 0x56463a7b63c0 send: split-brain detected: local_image_id=10c8b0d9d29e, registered local_image_id=10f291e7b59d
2021-12-02T15:12:56.421 INFO:tasks.rbd_mirror.cluster2.client.mirror.1.smithi099.stderr:2021-12-02T15:12:56.413+0000 7fc91d3a5700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x564639b589c0 handle_prepare_replay: failed to prepare local replay: (17) File exists
2021-12-02T15:12:56.428 INFO:tasks.rbd_mirror.cluster2.client.mirror.1.smithi099.stderr:2021-12-02T15:12:56.420+0000 7fc9233b1700 -1 rbd::mirror::ImageReplayer: 0x56463761db80 [3/b4388d95-84a1-4178-80e3-1950717ad031] operator(): start failed: (17) File exists
2021-12-02T15:12:57.284 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+ rbd --cluster cluster2 info mirror/image_cluster1.client.mirror.0-clone21@78fa04b2-94eb-4362-9340-4bb0af92de96
2021-12-02T15:12:57.349 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:error setting snapshot context: (2) No such file or directory
2021-12-02T15:12:57.354 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+ continue
2021-12-02T15:12:57.354 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+ return 1
2021-12-02T15:12:57.355 INFO:tasks.workunit.cluster1.client.mirror.smithi099.stderr:+ cleanup 1

So actual failure is `rbd --cluster cluster2 info mirror/image_cluster1.client.mirror.0-clone21@78fa04b2-94eb-4362-9340-4bb0af92de96` because the snapshot was not found. And it looks similar to [1]. Though in this case before this rbd error we see the rbd-mirror split-brain error and I am not sure I saw this in [1] case (and the logs seems already expired).

[1] https://tracker.ceph.com/issues/53110#note-3

Also available in: Atom PDF