Bug #54052
mgr/snap-schedule: scheduled snapshots are not created after ceph-mgr restart
Status: Resolved
Priority: High
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags:
Backport: quincy, pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS): task(medium)
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Reported on the ceph-users mailing list: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/SLR3JIGNUV3TNMRJKPSEZUXJK7XBA3HC/
$ ceph fs snap-schedule add /shares/users 1h 2021-10-31T18:00
Schedule set for path /shares/users
$ ceph fs snap-schedule retention add /shares/users 14h10d12m
Retention added to path /shares/users
Wait until the next complete hour.
$ ceph fs snap-schedule status /shares/users
{"fs": "cephfs", "subvol": null, "path": "/shares/users", "rel_path": "/shares/users", "schedule": "1h", "retention": {"h": 14, "d": 10, "m": 12}, "start": "2021-10-31T18:00:00", "created": "2022-01-26T23:52:03", "first": "2022-01-27T00:00:00", "last": "2022-01-27T00:00:00", "last_pruned": "2022-01-27T00:00:00", "created_count": 1, "pruned_count": 1, "active": true}
At this point everything looks and works as expected. However, after I restart the active MGR, no new snapshots are created, and the status command unexpectedly reports null for first, last and last_pruned, an empty retention, and zero for created_count and pruned_count.
$ systemctl restart ceph-mgr@apollon.service
$ ceph fs snap-schedule status /shares/users
{"fs": "cephfs", "subvol": null, "path": "/shares/users", "rel_path": "/shares/users", "schedule": "1h", "retention": {}, "start": "2021-10-31T18:00:00", "created": "2022-01-26T23:52:03", "first": null, "last": null, "last_pruned": null, "created_count": 0, "pruned_count": 0, "active": true}
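The difference between the two status outputs can be checked mechanically. The following is a minimal sketch, not part of the snap-schedule module: the helper name looks_reset is hypothetical, and the two JSON strings are trimmed copies of the outputs above. Note that a freshly added schedule with no retention and no snapshot taken yet would look the same, so the check is only meaningful for a schedule that is known to have run before.

```python
import json


def looks_reset(status_json: str) -> bool:
    """Return True if an active schedule reports no retention and no run history.

    Hypothetical helper for illustration only; field names are taken from the
    `ceph fs snap-schedule status` output shown in this report.
    """
    st = json.loads(status_json)
    history_missing = all(
        st.get(key) is None for key in ("first", "last", "last_pruned")
    )
    return bool(st.get("active")) and not st.get("retention") and history_missing


# Trimmed copies of the status output before and after the MGR restart.
before = (
    '{"retention": {"h": 14, "d": 10, "m": 12}, '
    '"first": "2022-01-27T00:00:00", "last": "2022-01-27T00:00:00", '
    '"last_pruned": "2022-01-27T00:00:00", "active": true}'
)
after = (
    '{"retention": {}, "first": null, "last": null, '
    '"last_pruned": null, "active": true}'
)

print(looks_reset(before))  # False
print(looks_reset(after))   # True
```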
Interesting snippets from the log:
* 2022-01-27T00:51:51 : remove last non-working snapshot schedule
> 2022-01-27T00:51:51.972+0100 7f03b62c0700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] SnapDB on cephfs changed for /shares/users, updating next Timer
* 2022-01-27T00:52:03 : add new snapshot schedule
> 2022-01-27T00:52:03.704+0100 7f03b62c0700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] repeat is 3600
> 2022-01-27T00:52:03.704+0100 7f03b62c0700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] attempting to add schedule /shares/users 1h
> 2022-01-27T00:52:03.704+0100 7f03b62c0700 0 [snap_schedule DEBUG snap_schedule.fs.schedule] schedule with retention {}
> 2022-01-27T00:52:03.708+0100 7f03b62c0700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] SnapDB on cephfs changed for /shares/users, updating next Timer
> 2022-01-27T00:52:03.708+0100 7f03b62c0700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Creating new snapshot timer for /shares/users
> 2022-01-27T00:52:03.708+0100 7f03b62c0700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Will snapshot /shares/users in fs cephfs in 477s
* 2022-01-27T00:52:08 : add retention
> 2022-01-27T00:52:08.172+0100 7f03b62c0700 0 [snap_schedule DEBUG snap_schedule.fs.schedule] parse_retention(14h10d12m)
> 2022-01-27T00:52:08.172+0100 7f03b62c0700 0 [snap_schedule DEBUG snap_schedule.fs.schedule] parse_retention(14h10d12m) -> {'h': 14, 'd': 10, 'm': 12}
> 2022-01-27T00:52:08.172+0100 7f03b62c0700 0 [snap_schedule DEBUG snap_schedule.fs.schedule] db result is ('{}',)
> 2022-01-27T00:52:08.172+0100 7f03b62c0700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] SnapDB on cephfs changed for /shares/users, updating next Timer
> 2022-01-27T00:52:08.172+0100 7f03b62c0700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Creating new snapshot timer for /shares/users
> 2022-01-27T00:52:08.172+0100 7f03b62c0700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Will snapshot /shares/users in fs cephfs in 472s
* 2022-01-27T01:00:00 : scheduled snapshot created
> 2022-01-27T01:00:00.175+0100 7f039393b700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Scheduled snapshot of /shares/users triggered
> 2022-01-27T01:00:00.187+0100 7f039393b700 0 [snap_schedule INFO snap_schedule.fs.schedule_client] created scheduled snapshot of /shares/users
> 2022-01-27T01:00:00.187+0100 7f039393b700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] created scheduled snapshot /shares/users/.snap/scheduled-2022-01-27-00_00_00
> 2022-01-27T01:00:00.187+0100 7f039393b700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] SnapDB on cephfs changed for /shares/users, updating next Timer
> 2022-01-27T01:00:00.187+0100 7f039393b700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Creating new snapshot timer for /shares/users
> 2022-01-27T01:00:00.187+0100 7f039393b700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Will snapshot /shares/users in fs cephfs in 3600s
> 2022-01-27T01:00:00.187+0100 7f039393b700 0 [snap_schedule DEBUG snap_schedule.fs.schedule_client] Pruning snapshots
* 2022-01-27T01:04:43 : restart of the MGR; after this, the log contains no references to snapshots being taken
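The "Will snapshot ... in 477s", "in 472s" and "in 3600s" values in the log are consistent with a timer aligned to the schedule's start time and repeat interval. The sketch below reproduces that arithmetic under stated assumptions: the function name seconds_until_next is hypothetical and this is not the module's actual code; timestamps are converted from the log's +0100 offset to match the "start" and "created" fields of the status output; fractional seconds are dropped.

```python
from datetime import datetime


def seconds_until_next(start: datetime, repeat_s: int, now: datetime) -> int:
    """Seconds from `now` to the next multiple of `repeat_s` counted from `start`.

    Illustrative only; assumes `start` and `now` are in the same timezone.
    """
    elapsed = int((now - start).total_seconds())
    if elapsed < 0:
        # The schedule has not started yet: wait until `start`.
        return -elapsed
    return repeat_s - (elapsed % repeat_s)


start = datetime(2021, 10, 31, 18, 0, 0)  # "start" from the status output
repeat = 3600                             # "repeat is 3600" from the log

# Log time 00:52:03+0100, schedule added
print(seconds_until_next(start, repeat, datetime(2022, 1, 26, 23, 52, 3)))  # 477
# Log time 00:52:08+0100, retention added
print(seconds_until_next(start, repeat, datetime(2022, 1, 26, 23, 52, 8)))  # 472
# Log time 01:00:00+0100, right after the scheduled snapshot
print(seconds_until_next(start, repeat, datetime(2022, 1, 27, 0, 0, 0)))    # 3600
```

Since the timer is re-armed on each "SnapDB ... changed" event in the log, the absence of any such line after the restart at 01:04:43 suggests that no timer was set up for /shares/users once the MGR came back.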
Similar issue: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/7K4T2HI72NJPB6UWEMZAYEUN4MORBL6O/
Related issues
History
#1 Updated by Venky Shankar over 1 year ago
- Assignee set to Milind Changire
Milind, please take a look.
#2 Updated by Milind Changire over 1 year ago
- Pull request ID set to 45115
#3 Updated by Milind Changire over 1 year ago
- Status changed from New to In Progress
#4 Updated by Venky Shankar over 1 year ago
- Status changed from In Progress to Pending Backport
#5 Updated by Backport Bot over 1 year ago
- Copied to Backport #55055: quincy: mgr/snap-schedule: scheduled snapshots are not created after ceph-mgr restart added
#6 Updated by Backport Bot over 1 year ago
- Copied to Backport #55056: pacific: mgr/snap-schedule: scheduled snapshots are not created after ceph-mgr restart added
#7 Updated by Backport Bot about 1 year ago
- Tags set to backport_processed
#8 Updated by Konstantin Shalygin 11 months ago
- Status changed from Pending Backport to Resolved
- Tags deleted (backport_processed)