Bug #55583

closed

Intermittent ParsingError failure in mgr/volumes module during "clone cancel"

Added by John Mulligan almost 2 years ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Category: Administration/Usability
% Done: 100%
Tags: backport_processed
Backport: quincy, pacific
Regression: Yes
Severity: 2 - major
Component(FS): mgr/volumes

Description

This issue is a bit difficult to explain, so please bear with me.

On quincy, but not on octopus or pacific, a test that attempts to cancel a clone sometimes fails with a Python traceback in the mgr response. The traceback indicates that a configparser.ParsingError was raised.

rados: ret=-22, Invalid argument: "Traceback (most recent call last):\n
  File \"/usr/share/ceph/mgr/mgr_module.py\", line 1701, in _handle_command\n
    return self.handle_command(inbuf, cmd)\n
  File \"/usr/share/ceph/mgr/volumes/module.py\", line 409, in handle_command\n
    return handler(inbuf, cmd)\n
  File \"/usr/share/ceph/mgr/volumes/module.py\", line 38, in wrap\n
    return f(self, inbuf, cmd)\n
  File \"/usr/share/ceph/mgr/volumes/module.py\", line 636, in _cmd_fs_clone_cancel\n
    vol_name=cmd['vol_name'], clone_name=cmd['clone_name'], group_name=cmd.get('group_name', None))\n
  File \"/usr/share/ceph/mgr/volumes/fs/volume.py\", line 582, in clone_cancel\n
    self.cloner.cancel_job(volname, (clonename, groupname))\n
  File \"/usr/share/ceph/mgr/volumes/fs/async_cloner.py\", line 389, in cancel_job\n
    with open_subvol(self.fs_client.mgr, fs_handle, self.vc.volspec, group, clonename, SubvolumeOpType.CLONE_CANCEL) as clone_subvolume:\n
  File \"/usr/lib64/python3.6/contextlib.py\", line 81, in __enter__\n
    return next(self.gen)\n
  File \"/usr/share/ceph/mgr/volumes/fs/operations/subvolume.py\", line 72, in open_subvol\n
    subvolume = loaded_subvolumes.get_subvolume_object(mgr, fs, vol_spec, group, subvolname)\n
  File \"/usr/share/ceph/mgr/volumes/fs/operations/versions/__init__.py\", line 95, in get_subvolume_object\n
    subvolume.discover()\n
  File \"/usr/share/ceph/mgr/volumes/fs/operations/versions/subvolume_base.py\", line 319, in discover\n
    self.metadata_mgr.refresh()\n
  File \"/usr/share/ceph/mgr/volumes/fs/operations/versions/metadata_manager.py\", line 52, in refresh\n
    self.config.readfp(conf_data)\n
  File \"/usr/lib64/python3.6/configparser.py\", line 763, in readfp\n
    self.read_file(fp, source=filename)\n
  File \"/usr/lib64/python3.6/configparser.py\", line 718, in read_file\n
    self._read(f, source)\n
  File \"/usr/lib64/python3.6/configparser.py\", line 1111, in _read\n
    raise e\nconfigparser.ParsingError: Source contains parsing errors: '<???>'\n\t[line 13]: 'a0\\n'\n" 

(I've added newlines back to the traceback to make it more readable. It's possible I missed some.)
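The failing frame is `configparser` rejecting line 13 of the subvolume metadata, which contained only `a0`. The sketch below reproduces the same exception shape locally so you can see what kind of content triggers it. The section and key names are hypothetical placeholders, not the real mgr/volumes metadata schema, and the example makes no claim about how such a line got into the file. It relies only on standard library behavior: `readfp()` forwards to `read_file()` (as the traceback shows), and an `io.StringIO` has no `.name`, so the source is reported as `'<???>'`.

```python
import configparser
import io


def build_metadata(stray_line=None):
    """Build hypothetical INI-style metadata text.

    Line 1 is a section header and lines 2-12 are well-formed
    "key = value" options (placeholder names, not the real schema).
    If stray_line is given, it is appended as line 13.
    """
    lines = ["[GLOBAL]"] + [f"key{i} = value{i}" for i in range(1, 12)]
    if stray_line is not None:
        lines.append(stray_line)
    return "\n".join(lines) + "\n"


def parse_metadata(text):
    """Parse metadata text from an in-memory buffer, as the mgr does.

    Returns the ParsingError if one is raised, otherwise None.
    """
    config = configparser.ConfigParser()
    try:
        # readfp() (used by metadata_manager.py on Python 3.6) forwards to
        # read_file(); readfp() itself was removed in Python 3.12.
        # io.StringIO has no .name attribute, so the source becomes '<???>'.
        config.read_file(io.StringIO(text))
    except configparser.ParsingError as exc:
        return exc
    return None


if __name__ == "__main__":
    # A line with no "=" or ":" delimiter is not a valid option line,
    # so parsing fails at line 13. Compare with the traceback above.
    print(parse_metadata(build_metadata("a0")))
```

Without the stray line the same text parses cleanly, which isolates the single malformed line as the trigger.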

The automated test is part of the go-ceph project. We're tracking the issue on our side here:
https://github.com/ceph/go-ceph/issues/679

Since this one has been hard for me to track down, I'm filing the issue with the express request for some additional hints on how to debug it further.

One additional challenge is that I cannot reproduce this on my desktop system, but it does occur regularly in our CI. We're blocked from officially supporting quincy until this test, as well as some others, is better understood. Thanks!
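Because the failure is intermittent and CI-only, one option is to hammer the same clone/cancel path in a loop on a test cluster. The following is a minimal Python sketch under stated assumptions: the volume, subvolume, and snapshot already exist; the `stress-clone-N` naming is hypothetical; and it assumes the mgr's error text (including the traceback) reaches the CLI's stderr. It uses the standard `ceph fs subvolume snapshot clone`, `ceph fs clone cancel`, and `ceph fs subvolume rm ... --force` commands. The command runner is injectable so the loop can be exercised without a cluster.

```python
import subprocess
from typing import Callable, List, Tuple

# A runner takes an argv list and returns (returncode, stderr text).
Runner = Callable[[List[str]], Tuple[int, str]]


def run_ceph(argv: List[str]) -> Tuple[int, str]:
    """Run a CLI command and capture its exit status and stderr."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    return proc.returncode, proc.stderr


def stress_clone_cancel(vol: str, subvol: str, snap: str,
                        iterations: int, run: Runner = run_ceph) -> int:
    """Repeatedly clone a snapshot, cancel the clone, then remove it.

    Returns how many cancel attempts failed with output mentioning
    ParsingError.
    """
    hits = 0
    for i in range(iterations):
        clone = f"stress-clone-{i}"  # hypothetical naming scheme
        run(["ceph", "fs", "subvolume", "snapshot", "clone",
             vol, subvol, snap, clone])
        rc, err = run(["ceph", "fs", "clone", "cancel", vol, clone])
        # A cancel that loses the race with a finished clone also fails,
        # so only count failures that mention ParsingError.
        if rc != 0 and "ParsingError" in err:
            hits += 1
        # A canceled (partial) clone can only be removed with --force.
        run(["ceph", "fs", "subvolume", "rm", vol, clone, "--force"])
    return hits
```

For example, `stress_clone_cancel("myvol", "mysub", "mysnap", 200)` runs 200 rounds against a live cluster; raising mgr debug logging while it runs may help correlate any hit with the mgr log.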


Files

mgr.log.gz (17.4 KB), John Mulligan, 06/28/2022 01:25 PM

Related issues: 2 (1 open, 1 closed)

Copied to CephFS - Backport #57112: quincy: Intermittent ParsingError failure in mgr/volumes module during "clone cancel" (In Progress, Kotresh Hiremath Ravishankar)
Copied to CephFS - Backport #57113: pacific: Intermittent ParsingError failure in mgr/volumes module during "clone cancel" (Resolved, Kotresh Hiremath Ravishankar)