Bug #56270

crash: File "mgr/snap_schedule/module.py", in __init__: self.client = SnapSchedClient(self)

Added by Telemetry Bot 6 months ago. Updated about 1 month ago.

Status: New
Priority: Normal
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Telemetry
Tags:
Backport: quincy
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Component(FS): mgr/snap_schedule
Labels (FS):
Pull request ID:
Crash signature (v1): 5a837b17fdc77936b4d5208c9f69af044a1a69d3a95437cba6b8ecfd77265f87


Description

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=c00cdd2659181963c4fcc1eac24b4768df5d9cb2dbd05e210b4dce38ae046e86

Sanitized backtrace:

    File "mgr/snap_schedule/module.py", in __init__: self.client = SnapSchedClient(self)
    File "mgr/snap_schedule/fs/schedule_client.py", in __init__: with self.get_schedule_db(fs_name) as conn_mgr:
    File "mgr/snap_schedule/fs/schedule_client.py", in get_schedule_db: db.executescript(dump)

Crash dump sample:
{
    "backtrace": [
        "  File \"/usr/share/ceph/mgr/snap_schedule/module.py\", line 38, in __init__\n    self.client = SnapSchedClient(self)",
        "  File \"/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py\", line 158, in __init__\n    with self.get_schedule_db(fs_name) as conn_mgr:",
        "  File \"/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py\", line 192, in get_schedule_db\n    db.executescript(dump)",
        "<redacted>" 
    ],
    "ceph_version": "17.2.0",
    "crash_id": "2022-06-20T15:13:02.711351Z_f91dc73d-19af-4069-89f9-51528be9fab7",
    "entity_name": "mgr.165587efc0dd735a4de0efee0479fff494913b05",
    "mgr_module": "snap_schedule",
    "mgr_module_caller": "ActivePyModule::load",
    "mgr_python_exception": "OperationalError",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "c0b46fcd547d24a8595150ee9a6c3b3e5ae43023f0ce3caa788450149499be30",
    "timestamp": "2022-06-20T15:13:02.711351Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-120-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#136-Ubuntu SMP Fri Jun 10 13:40:48 UTC 2022" 
}


Related issues

Related to CephFS - Bug #56269: crash: File "mgr/snap_schedule/module.py", in __init__: self.client = SnapSchedClient(self) Pending Backport

History

#1 Updated by Telemetry Bot 6 months ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
  • Affected Versions v17.2.0 added

#2 Updated by Andreas Teuchert 5 months ago

The full backtrace is:

    "backtrace": [
        "  File \"/usr/share/ceph/mgr/snap_schedule/module.py\", line 38, in __init__\n    self.client = SnapSchedClient(self)",
        "  File \"/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py\", line 158, in __init__\n    with self.get_schedule_db(fs_name) as conn_mgr:",
        "  File \"/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py\", line 192, in get_schedule_db\n    db.executescript(dump)",
        "sqlite3.OperationalError: table schedules already exists" 
    ],

The error is probably caused by the ioctx.remove(SNAP_DB_OBJECT_NAME) call failing, see https://tracker.ceph.com/issues/56269.

When the code runs for the first time, the table doesn't exist yet, so db.executescript(dump) succeeds. ioctx.remove(SNAP_DB_OBJECT_NAME) is then supposed to delete the legacy dump, but that call fails. As a result, the dump is loaded again on every mgr restart, and each repeated db.executescript(dump) call fails because the table already exists.
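The failure mode can be reproduced outside of Ceph with plain sqlite3. The following is only a minimal sketch, not the snap_schedule code: LEGACY_DUMP, its schema, and the helpers load_dump, table_exists, load_dump_once and reproduce are hypothetical names made up for illustration, and a single in-memory connection stands in for the database that persists across mgr restarts. The guard in load_dump_once is one possible way to make the load idempotent, not necessarily how the module handles it.

```python
import sqlite3

# Hypothetical stand-in for the legacy dump; the real schema is not shown in this report.
LEGACY_DUMP = """
CREATE TABLE schedules (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
INSERT INTO schedules (path) VALUES ('/volumes/example');
"""


def load_dump(db: sqlite3.Connection, dump: str) -> None:
    """Load the dump unconditionally, as in the failing code path."""
    db.executescript(dump)


def table_exists(db: sqlite3.Connection, name: str) -> bool:
    """Return True if a table with the given name is already present."""
    row = db.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def load_dump_once(db: sqlite3.Connection, dump: str) -> bool:
    """Load the dump only if the schedules table is missing (assumed guard)."""
    if table_exists(db, "schedules"):
        return False
    db.executescript(dump)
    return True


def reproduce() -> str:
    """Load the same dump twice and return the resulting error message."""
    db = sqlite3.connect(":memory:")
    try:
        load_dump(db, LEGACY_DUMP)  # first start: table is created
        try:
            load_dump(db, LEGACY_DUMP)  # later start: dump was never removed
        except sqlite3.OperationalError as exc:
            return str(exc)
        return ""
    finally:
        db.close()


if __name__ == "__main__":
    print(reproduce())
```

With the guard, a second call to load_dump_once on the same connection returns False instead of raising, so a leftover dump object would no longer prevent the module from loading.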

#3 Updated by Venky Shankar 5 months ago

  • Project changed from mgr to CephFS
  • Category set to Correctness/Safety
  • Assignee set to Milind Changire
  • Target version set to v18.0.0
  • Backport set to quincy
  • Component(FS) mgr/snap_schedule added

Milind, please take a look.

#4 Updated by Patrick Donnelly 5 months ago

  • Related to Bug #56269: crash: File "mgr/snap_schedule/module.py", in __init__: self.client = SnapSchedClient(self) added

#5 Updated by Telemetry Bot 4 months ago

  • Affected Versions v17.2.1, v17.2.2 added

#6 Updated by Alexander Mamonov about 1 month ago

How to fix it?

#7 Updated by Alexander Mamonov about 1 month ago

{"log":"debug 2022-11-03T08:38:12.502+0000 7f46270f5700 -1 mgr load Failed to construct class in 'snap_schedule'\n","stream":"stderr","time":"2022-11-03T08:38:12.504394471Z"}
{"log":" File \"/usr/share/ceph/mgr/snap_schedule/module.py\", line 38, in __init__\n","stream":"stderr","time":"2022-11-03T08:38:12.50754467Z"}
{"log":" File \"/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py\", line 169, in __init__\n","stream":"stderr","time":"2022-11-03T08:38:12.507552024Z"}
{"log":" File \"/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py\", line 203, in get_schedule_db\n","stream":"stderr","time":"2022-11-03T08:38:12.507584909Z"}
{"log":"debug 2022-11-03T08:38:12.502+0000 7f46270f5700 -1 mgr operator() Failed to run module in active mode ('snap_schedule')\n","stream":"stderr","time":"2022-11-03T08:38:12.507786143Z"}

#8 Updated by Andreas Teuchert about 1 month ago

If you're running into this bug after upgrading from Pacific to Quincy, you can manually delete the legacy schedule DB as described here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/6NRHZWC4KHUNHVI7K6OT6MJOMX7CVPHJ/.

Note: This bug is already fixed in recent versions of Quincy, so when upgrading directly to 17.2.5 or later, the bug shouldn't occur.
