Bug #50439

closed

librbd: removing a snapshot with multiple peer can go into an infinite loop

Added by Arthur Outhenin-Chalandre about 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Normal
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
octopus,pacific
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When an image has multiple mirror peers, librbd may try to unlink a snapshot from a peer that the snapshot is not linked to. It then enters infinite recursion and eventually segfaults.


[11:52][root@arthur1-mon-d57ce350a1cd (arthur_dev:ceph/arthur1/mon*2:leader) ~]# ceph -v
ceph version 15.2.10-165-g4d989fbd (4d989fbd14590faabab25cc6d9edc39d1d455125) octopus (stable)
[11:52][root@arthur1-mon-d57ce350a1cd (arthur_dev:ceph/arthur1/mon*2:leader) ~]# rbd mirror image status bench-snap/bench-1
bench-1:
  global_id:   79cd4f82-bc25-4e52-8049-94b2a78b5a59
  state:       up+stopped
  description: local image is primary
  service:     p05151113561997 on p05151113561997.cern.ch
  last_update: 2021-04-20 11:52:36
  snapshots:
    4 .mirror.primary.79cd4f82-bc25-4e52-8049-94b2a78b5a59.a7ac13b1-0a60-47c2-ab62-b587ad080383 (peer_uuids:[6705fd20-f99f-42be-9773-fc17881023f4,e1ac9d6a-756f-44c3-96ad-592388bc1cc6])
    5 .mirror.primary.79cd4f82-bc25-4e52-8049-94b2a78b5a59.f0db2cc6-d1f8-4c78-ac96-29e1240a7bc0 (peer_uuids:[e1ac9d6a-756f-44c3-96ad-592388bc1cc6])
    106 .mirror.primary.79cd4f82-bc25-4e52-8049-94b2a78b5a59.70c516b9-7969-4fd1-b9bc-f370f8709b56 (peer_uuids:[6705fd20-f99f-42be-9773-fc17881023f4,e1ac9d6a-756f-44c3-96ad-592388bc1cc6])
[11:52][root@arthur1-mon-d57ce350a1cd (arthur_dev:ceph/arthur1/mon*2:leader) ~]# rbd mirror image snapshot bench-snap/bench-1 --debug-rbd=15 2> ceph_debug.log
Segmentation fault (core dumped)
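
To make the failure mode concrete, here is a minimal Python sketch of the invariant the description implies: a snapshot should only be an unlink candidate for a peer if that peer's UUID is actually in the snapshot's peer_uuids list. This is not the librbd code and not the change in the PR. The function names and the `max_snapshots` limit are hypothetical, and the reading that "unlinking a peer that is not linked makes no progress, so the request repeats forever" is an assumption based only on the description. The snapshot data is copied from the status output above, where snapshot 5 is not linked to peer 6705fd20-f99f-42be-9773-fc17881023f4.

```python
from typing import Dict, List, Optional

PEER_A = "6705fd20-f99f-42be-9773-fc17881023f4"
PEER_B = "e1ac9d6a-756f-44c3-96ad-592388bc1cc6"

# Snapshot id -> peer UUIDs still linked, as listed in the status output.
SNAPSHOTS: Dict[int, List[str]] = {
    4: [PEER_A, PEER_B],
    5: [PEER_B],
    106: [PEER_A, PEER_B],
}


def next_unlink_candidate(
    snapshots: Dict[int, List[str]], peer_uuid: str, max_snapshots: int
) -> Optional[int]:
    """Return the oldest snapshot linked to peer_uuid if that peer holds
    more than max_snapshots (hypothetical limit), otherwise None."""
    # Only snapshots that actually list the peer are considered.
    linked = sorted(sid for sid, peers in snapshots.items() if peer_uuid in peers)
    if len(linked) <= max_snapshots:
        return None
    return linked[0]


def unlink_until_within_limit(
    snapshots: Dict[int, List[str]], peer_uuid: str, max_snapshots: int
) -> List[int]:
    """Unlink peer_uuid from its oldest snapshots until it is within the limit.

    Every iteration removes one real link, so the loop always terminates.
    """
    unlinked: List[int] = []
    while True:
        snap_id = next_unlink_candidate(snapshots, peer_uuid, max_snapshots)
        if snap_id is None:
            break
        snapshots[snap_id].remove(peer_uuid)
        unlinked.append(snap_id)
    return unlinked


if __name__ == "__main__":
    # Work on a copy so the data from the report stays untouched.
    state = {sid: list(peers) for sid, peers in SNAPSHOTS.items()}
    print(unlink_until_within_limit(state, PEER_A, max_snapshots=1))
```

Under these assumptions, snapshot 5 is never selected for peer 6705fd20-f99f-42be-9773-fc17881023f4, so each pass changes the state and the loop ends. A selection step that ignored the peer_uuids membership could keep picking a snapshot the peer is not linked to, which matches the symptom reported here.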

Files

mirrorsnapshot-crash.log (39.2 KB), rbd debug log, Arthur Outhenin-Chalandre, 04/20/2021 10:07 AM

Related issues 2 (0 open, 2 closed)

Copied to rbd - Backport #50712: octopus: librbd: removing a snapshot with multiple peer can go into an infinite loop (Resolved)
Copied to rbd - Backport #50713: pacific: librbd: removing a snapshot with multiple peer can go into an infinite loop (Resolved)
Actions #1

Updated by Arthur Outhenin-Chalandre about 3 years ago

I can't edit the issue, but here is my PR: https://github.com/ceph/ceph/pull/40937.

Actions #2

Updated by Dan van der Ster about 3 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Arthur Outhenin-Chalandre
  • Pull request ID set to 40937
Actions #3

Updated by Mykola Golub almost 3 years ago

  • Project changed from Ceph to rbd
  • Category deleted (librbd)
Actions #4

Updated by Ilya Dryomov almost 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #5

Updated by Backport Bot almost 3 years ago

  • Copied to Backport #50712: octopus: librbd: removing a snapshot with multiple peer can go into an infinite loop added
Actions #6

Updated by Backport Bot almost 3 years ago

  • Copied to Backport #50713: pacific: librbd: removing a snapshot with multiple peer can go into an infinite loop added
Actions #7

Updated by Loïc Dachary almost 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
