Bug #52643

snap scheduler: scheduled cephfs snapshots stopped on nfs volume after being created successfully for 24 hours

Added by Milind Changire 4 months ago. Updated 4 months ago.

Status:
Triaged
Priority:
High
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
mgr/snap_schedule
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Created an hourly snapshot schedule on the NFS volume:
# ceph fs snap-schedule list / --recursive=true --fs=cephfs
/nvol5 1h 

# ceph fs snap-schedule status /nvol5 --fs=cephfs
{"fs": "cephfs", "subvol": null, "path": "/nvol5", "rel_path": "/nvol5", "schedule": "1h", "retention": {}, "start": "2021-07-28T00:00:00", "created": "2021-07-28T15:56:12", "first": "2021-07-28T16:00:00", "last": "2021-07-29T16:00:00", "last_pruned": null, "created_count": 23, "pruned_count": 0, "active": true}

# cd .snap/
# ls
scheduled-2021-07-28-16_00_00  scheduled-2021-07-28-20_00_00  scheduled-2021-07-29-00_00_00  scheduled-2021-07-29-04_00_00  scheduled-2021-07-29-08_00_00  scheduled-2021-07-29-12_00_00  scheduled-2021-07-29-16_00_00
scheduled-2021-07-28-17_00_00  scheduled-2021-07-28-21_00_00  scheduled-2021-07-29-01_00_00  scheduled-2021-07-29-05_00_00  scheduled-2021-07-29-09_00_00  scheduled-2021-07-29-13_00_00
scheduled-2021-07-28-18_00_00  scheduled-2021-07-28-22_00_00  scheduled-2021-07-29-02_00_00  scheduled-2021-07-29-06_00_00  scheduled-2021-07-29-10_00_00  scheduled-2021-07-29-14_00_00
scheduled-2021-07-28-19_00_00  scheduled-2021-07-28-23_00_00  scheduled-2021-07-29-03_00_00  scheduled-2021-07-29-07_00_00  scheduled-2021-07-29-11_00_00  scheduled-2021-07-29-15_00_00
# ls | wc -l
25

As per the status, the snapshot schedule was created at 2021-07-28T15:56:12, the first snapshot was taken at 2021-07-28T16:00:00, and the last snapshot was taken at 2021-07-29T16:00:00. After that, no further snapshots were created on the NFS volume.

# date
Fri Jul 30 09:15:22 UTC 2021
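
The timestamps above can be cross-checked with a short calculation. This is only a sketch: the helper name `count_slots` is hypothetical, and it assumes the 1h schedule fires exactly on the hour, as the snapshot names suggest. It counts how many hourly slots fall between the first and last snapshot, and how many slots passed between the last snapshot and the `date` output without a snapshot.

```python
from datetime import datetime, timedelta


def count_slots(start: datetime, end: datetime, period: timedelta) -> int:
    """Count schedule slots in [start, end], stepping by period (hypothetical helper)."""
    count = 0
    slot = start
    while slot <= end:
        count += 1
        slot += period
    return count


# Values copied from the status output and the `date` command in this report.
first = datetime(2021, 7, 28, 16, 0, 0)
last = datetime(2021, 7, 29, 16, 0, 0)
now = datetime(2021, 7, 30, 9, 15, 22)
hour = timedelta(hours=1)

taken = count_slots(first, last, hour)        # slots that produced a snapshot
missed = count_slots(last + hour, now, hour)  # slots with no snapshot since then
print(taken, missed)
```

Under these assumptions the first count agrees with the `ls | wc -l` output, and the second gives the number of hourly snapshots that were skipped by the time the report was written.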

Traceback errors were logged at 2021-07-29T16:00:00.805, which is when the last snapshot was created on the NFS volume, and again at 2021-07-30T02:00:00.

2021-07-29T16:00:00.805+0000 7f89c8584700  0 [snap_schedule ERROR snap_schedule.fs.schedule_client] refresh_snap_timers raised an exception:
2021-07-29T16:00:00.805+0000 7f89c8584700  0 [snap_schedule ERROR snap_schedule.fs.schedule_client] Traceback (most recent call last):
  File "/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py", line 167, in refresh_snap_timers
    rows = [r for r in all_rows if self._is_allowed_repeat(r, path)][0:1]
sqlite3.OperationalError: cannot commit - no transaction is active

2021-07-30T02:00:00.305+0000 7f89c6d81700  0 [snap_schedule ERROR snap_schedule.fs.schedule_client] create_scheduled_snapshot raised an exception:
2021-07-30T02:00:00.306+0000 7f89c6d81700  0 [snap_schedule ERROR snap_schedule.fs.schedule_client] Traceback (most recent call last):
  File "/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py", line 204, in create_scheduled_snapshot
    sched.update_last(time, db)
  File "/usr/share/ceph/mgr/snap_schedule/fs/schedule.py", line 398, in update_last
    self.repeat))
sqlite3.OperationalError: cannot commit - no transaction is active
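
For reference, the same SQLite error text can be produced in isolation with Python's `sqlite3` module. The sketch below is not taken from the snap_schedule code and is not claimed to be the root cause of this bug; it only shows one way the message arises, namely when a `COMMIT` statement reaches SQLite while no transaction is open on the connection. The function name `commit_without_transaction` is hypothetical.

```python
import sqlite3


def commit_without_transaction() -> str:
    """Return the error text SQLite raises for a COMMIT with no open transaction."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # sqlite3 module never issues an implicit BEGIN on our behalf.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    try:
        conn.execute("COMMIT")  # no BEGIN was issued before this statement
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
    return ""


message = commit_without_transaction()
print(message)
```

Whether the scheduler reaches this state through a similar path (for example, a commit issued after the transaction was already ended elsewhere) would need to be confirmed against the module's actual database handling.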

History

#1 Updated by Milind Changire 4 months ago

  • Priority changed from Normal to High

Moving to High priority since the Python traceback causes loss of functionality.

#2 Updated by Patrick Donnelly 4 months ago

  • Status changed from New to Triaged
  • Assignee set to Patrick Donnelly
