Bug #54052

mgr/snap-schedule: scheduled snapshots are not created after ceph-mgr restart

Added by Venky Shankar 10 months ago. Updated 2 months ago.

Status: Resolved
Priority: High
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags:
Backport: quincy, pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS): task(medium)
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Reported on the ceph-users mailing list: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/SLR3JIGNUV3TNMRJKPSEZUXJK7XBA3HC/

$ ceph fs snap-schedule add /shares/users 1h 2021-10-31T18:00
Schedule set for path /shares/users

$ ceph fs snap-schedule retention add /shares/users 14h10d12m
Retention added to path /shares/users
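
The retention spec passed above is a run of count/unit pairs, and the log excerpts later in this report show it being turned into {'h': 14, 'd': 10, 'm': 12}. A minimal sketch of that kind of parsing follows. The function name parse_retention_spec and the regular expression are assumptions made for illustration; this is not the snap_schedule module's own parse_retention code.

```python
import re


def parse_retention_spec(spec: str) -> dict:
    """Split a spec such as '14h10d12m' into {'h': 14, 'd': 10, 'm': 12}.

    Hypothetical helper for illustration only; not the mgr module's code.
    """
    pairs = re.findall(r"(\d+)([a-zA-Z])", spec)
    # Reject specs containing characters that the pattern did not consume.
    if "".join(count + unit for count, unit in pairs) != spec:
        raise ValueError(f"malformed retention spec: {spec!r}")
    return {unit: int(count) for count, unit in pairs}


print(parse_retention_spec("14h10d12m"))  # {'h': 14, 'd': 10, 'm': 12}
```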

Wait until the next complete hour.

$ ceph fs snap-schedule status /shares/users
{"fs": "cephfs", "subvol": null, "path": "/shares/users", "rel_path": "/shares/users", "schedule": "1h", "retention": {"h": 14, "d": 10, "m": 12}, "start": "2021-10-31T18:00:00", "created": "2022-01-26T23:52:03", "first": "2022-01-27T00:00:00", "last": "2022-01-27T00:00:00", "last_pruned": "2022-01-27T00:00:00", "created_count": 1, "pruned_count": 1, "active": true}

At this point everything looks and works as expected. However, after I restart the active MGR, no new snapshots are created, and the status command unexpectedly reports null for "first", "last" and "last_pruned", an empty "retention", and zero for both counters.

$ systemctl restart ceph-mgr@apollon.service

$ ceph fs snap-schedule status /shares/users
{"fs": "cephfs", "subvol": null, "path": "/shares/users", "rel_path": "/shares/users", "schedule": "1h", "retention": {}, "start": "2021-10-31T18:00:00", "created": "2022-01-26T23:52:03", "first": null, "last": null, "last_pruned": null, "created_count": 0, "pruned_count": 0, "active": true}
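
The two status outputs can be compared mechanically to list exactly which fields lost their values across the restart. The following is a minimal sketch using the JSON copied from this report; the helper names is_blank and reset_fields are hypothetical and exist only for this illustration.

```python
import json

# Status output before and after the MGR restart, copied from this report.
BEFORE = json.loads(
    '{"fs": "cephfs", "subvol": null, "path": "/shares/users", '
    '"rel_path": "/shares/users", "schedule": "1h", '
    '"retention": {"h": 14, "d": 10, "m": 12}, '
    '"start": "2021-10-31T18:00:00", "created": "2022-01-26T23:52:03", '
    '"first": "2022-01-27T00:00:00", "last": "2022-01-27T00:00:00", '
    '"last_pruned": "2022-01-27T00:00:00", '
    '"created_count": 1, "pruned_count": 1, "active": true}'
)
AFTER = json.loads(
    '{"fs": "cephfs", "subvol": null, "path": "/shares/users", '
    '"rel_path": "/shares/users", "schedule": "1h", "retention": {}, '
    '"start": "2021-10-31T18:00:00", "created": "2022-01-26T23:52:03", '
    '"first": null, "last": null, "last_pruned": null, '
    '"created_count": 0, "pruned_count": 0, "active": true}'
)


def is_blank(value) -> bool:
    """True for null, an empty object, or a zero counter (booleans excluded)."""
    if value is None or value == {}:
        return True
    return value == 0 and not isinstance(value, bool)


def reset_fields(before: dict, after: dict) -> list:
    """Names of fields that held a value before and are blank afterwards."""
    return [key for key, value in before.items()
            if not is_blank(value) and is_blank(after.get(key))]


print(reset_fields(BEFORE, AFTER))
```

For the outputs above this lists retention, first, last, last_pruned, created_count and pruned_count, while the schedule definition itself (path, schedule, start, created, active) is unchanged.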

Interesting snippets from the log:

* 2022-01-27T00:51:51 : remove last non-working snapshot schedule
        > 2022-01-27T00:51:51.972+0100 7f03b62c0700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] SnapDB on cephfs changed for /shares/users, updating next Timer
* 2022-01-27T00:52:03 : add new snapshot schedule
        > 2022-01-27T00:52:03.704+0100 7f03b62c0700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] repeat is 3600
        > 2022-01-27T00:52:03.704+0100 7f03b62c0700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] attempting to add schedule /shares/users 1h
        > 2022-01-27T00:52:03.704+0100 7f03b62c0700  0 [snap_schedule DEBUG snap_schedule.fs.schedule] schedule with retention {}
        > 2022-01-27T00:52:03.708+0100 7f03b62c0700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] SnapDB on cephfs changed for /shares/users, updating next Timer
        > 2022-01-27T00:52:03.708+0100 7f03b62c0700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Creating new snapshot timer for /shares/users
        > 2022-01-27T00:52:03.708+0100 7f03b62c0700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Will snapshot /shares/users in fs cephfs in 477s
* 2022-01-27T00:52:08 : add retention
        > 2022-01-27T00:52:08.172+0100 7f03b62c0700  0 [snap_schedule DEBUG snap_schedule.fs.schedule] parse_retention(14h10d12m)
        > 2022-01-27T00:52:08.172+0100 7f03b62c0700  0 [snap_schedule DEBUG snap_schedule.fs.schedule] parse_retention(14h10d12m) -> {'h': 14, 'd': 10, 'm': 12}
        > 2022-01-27T00:52:08.172+0100 7f03b62c0700  0 [snap_schedule DEBUG snap_schedule.fs.schedule] db result is ('{}',)
        > 2022-01-27T00:52:08.172+0100 7f03b62c0700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] SnapDB on cephfs changed for /shares/users, updating next Timer
        > 2022-01-27T00:52:08.172+0100 7f03b62c0700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Creating new snapshot timer for /shares/users
        > 2022-01-27T00:52:08.172+0100 7f03b62c0700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Will snapshot /shares/users in fs cephfs in 472s
* 2022-01-27T01:00:00 : scheduled snapshot created
        > 2022-01-27T01:00:00.175+0100 7f039393b700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Scheduled snapshot of /shares/users triggered
        > 2022-01-27T01:00:00.187+0100 7f039393b700  0 [snap_schedule INFO snap_schedule.fs.schedule_client] created scheduled snapshot of /shares/users
        > 2022-01-27T01:00:00.187+0100 7f039393b700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] created scheduled snapshot /shares/users/.snap/scheduled-2022-01-27-00_00_00
        > 2022-01-27T01:00:00.187+0100 7f039393b700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] SnapDB on cephfs changed for /shares/users, updating next Timer
        > 2022-01-27T01:00:00.187+0100 7f039393b700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Creating new snapshot timer for /shares/users
        > 2022-01-27T01:00:00.187+0100 7f039393b700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Will snapshot /shares/users in fs cephfs in 3600s
        > 2022-01-27T01:00:00.187+0100 7f039393b700  0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Pruning snapshots
* 2022-01-27T01:04:43 : restart of the MGR; after this point the log contains no further references to snapshots being taken
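
The "Will snapshot ... in 477s", "472s" and "3600s" values in the log excerpts above are consistent with a timer that fires on the next multiple of the repeat interval counted from the schedule's start time. A minimal sketch of that arithmetic follows. It is an illustration under stated assumptions, not the snap_schedule module's actual timer code: the function name seconds_until_next is hypothetical, and the timestamps are taken from the status output ("start", "created") on the assumption that they share the timezone of the schedule's start time.

```python
from datetime import datetime


def seconds_until_next(start: datetime, repeat_s: int, now: datetime) -> int:
    """Whole seconds from `now` to the next start + k * repeat_s boundary.

    Hypothetical helper for illustration; not taken from the mgr module.
    """
    if now <= start:
        return int((start - now).total_seconds())
    elapsed = int((now - start).total_seconds())
    # Exactly on a boundary, the next run is one full interval away.
    return repeat_s - (elapsed % repeat_s)


start = datetime.fromisoformat("2021-10-31T18:00:00")

# Schedule added at 23:52:03 ("created" in the status output).
print(seconds_until_next(start, 3600, datetime.fromisoformat("2022-01-26T23:52:03")))  # 477

# Retention added five seconds later.
print(seconds_until_next(start, 3600, datetime.fromisoformat("2022-01-26T23:52:08")))  # 472

# Right after the snapshot taken on the hour.
print(seconds_until_next(start, 3600, datetime.fromisoformat("2022-01-27T00:00:00")))  # 3600
```

After the restart at 01:04:43 the log shows no corresponding "Will snapshot" line, which matches the observation that no timer is set up for the schedule.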

Similar issue: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/7K4T2HI72NJPB6UWEMZAYEUN4MORBL6O/


Related issues

Copied to CephFS - Backport #55055: quincy: mgr/snap-schedule: scheduled snapshots are not created after ceph-mgr restart Resolved
Copied to CephFS - Backport #55056: pacific: mgr/snap-schedule: scheduled snapshots are not created after ceph-mgr restart Resolved

History

#1 Updated by Venky Shankar 10 months ago

  • Assignee set to Milind Changire

Milind, please take a look.

#2 Updated by Milind Changire 10 months ago

  • Pull request ID set to 45115

#3 Updated by Milind Changire 10 months ago

  • Status changed from New to In Progress

#4 Updated by Venky Shankar 9 months ago

  • Status changed from In Progress to Pending Backport

#5 Updated by Backport Bot 9 months ago

  • Copied to Backport #55055: quincy: mgr/snap-schedule: scheduled snapshots are not created after ceph-mgr restart added

#6 Updated by Backport Bot 9 months ago

  • Copied to Backport #55056: pacific: mgr/snap-schedule: scheduled snapshots are not created after ceph-mgr restart added

#7 Updated by Backport Bot 4 months ago

  • Tags set to backport_processed

#8 Updated by Konstantin Shalygin 2 months ago

  • Status changed from Pending Backport to Resolved
  • Tags deleted (backport_processed)
