Bug #52749 (open)

gather-facts: Improperly parsing disk wwid

Added by brent s. over 2 years ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Sorry, wasn't sure which subcomponent to file this under.

I am running Ceph 16.2.6-1 on monitor hosts that run as Hyper-V guests, using the Ceph SIG repository for AlmaLinux 8.4. Full versions of the installed software:

[root@cephmon002-nj1 ~]# rpm -qa | grep ceph | sort
centos-release-ceph-pacific-1.0-1.el8.noarch
cephadm-16.2.1-1.el8.noarch
ceph-base-16.2.6-1.el8.x86_64
ceph-common-16.2.6-1.el8.x86_64
ceph-grafana-dashboards-16.2.6-1.el8.noarch
ceph-mgr-16.2.6-1.el8.x86_64
ceph-mgr-cephadm-16.2.6-1.el8.noarch
ceph-mgr-dashboard-16.2.6-1.el8.noarch
ceph-mgr-diskprediction-local-16.2.6-1.el8.noarch
ceph-mgr-k8sevents-16.2.6-1.el8.noarch
ceph-mgr-modules-core-16.2.6-1.el8.noarch
ceph-mgr-rook-16.2.6-1.el8.noarch
ceph-mon-16.2.6-1.el8.x86_64
ceph-prometheus-alerts-16.2.6-1.el8.noarch
ceph-selinux-16.2.6-1.el8.x86_64
libcephfs2-16.2.6-1.el8.x86_64
libcephsqlite-16.2.6-1.el8.x86_64
python3-ceph-argparse-16.2.6-1.el8.x86_64
python3-ceph-common-16.2.6-1.el8.x86_64
python3-cephfs-16.2.6-1.el8.x86_64

When bootstrapping/configuring/managing the cluster with cephadm, I get the following errors in the log for, e.g., /var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931:

Sep 28 10:59:31 mon1.domain.tld conmon[2485]: cephadm 2021-09-28T14:59:30.104412+0000 mgr.mon2.kgjipz (mgr.144113) 28 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Traceback (most recent call last):
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8230, in <module>
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     main()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8218, in main
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     r = ctx.func(ctx)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6762, in command_gather_facts
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     print(host.dump())
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6750, in dump
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     k: getattr(self, k) for k in dir(self)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6752, in <dictcomp>
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     and isinstance(getattr(self, k), (float, int, str, list, dict, tuple))
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6455, in hdd_list
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     return self._dev_list(devs)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6435, in _dev_list
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     disk_wwid = read_file(['/sys/block/{}/device/wwid'.format(dev)]).strip()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6284, in read_file
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     content = f.read().strip()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/usr/lib64/python3.6/codecs.py", line 321, in decode
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     (result, consumed) = self._buffer_decode(data, self.errors, final)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 12: invalid continuation byte
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: Traceback (most recent call last):
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1347, in _remote_connection
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     yield (conn, connr)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1244, in _run_cephadm
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     code, '\n'.join(err)))
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Traceback (most recent call last):
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8230, in <module>
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     main()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8218, in main
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     r = ctx.func(ctx)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6762, in command_gather_facts
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     print(host.dump())
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6750, in dump
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     k: getattr(self, k) for k in dir(self)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6752, in <dictcomp>
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     and isinstance(getattr(self, k), (float, int, str, list, dict, tuple))
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6455, in hdd_list
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     return self._dev_list(devs)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6435, in _dev_list
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     disk_wwid = read_file(['/sys/block/{}/device/wwid'.format(dev)]).strip()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6284, in read_file
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     content = f.read().strip()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:   File "/usr/lib64/python3.6/codecs.py", line 321, in decode
Sep 28 10:59:31 mon1.domain.tld conmon[2485]:     (result, consumed) = self._buffer_decode(data, self.errors, final)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 12: invalid continuation byte
(The identical cephadm traceback and OrchestratorError then repeat for the next mgr log message, sequence 29.)

Of special note is that it is trying to parse the /sys/block/<device>/device/wwid file and failing, because on Hyper-V these disks expose non-UTF-8 data in that file:

# cat /sys/block/sda/device/wwid
t10.MSFT    hάJ\206!O2n3

# cat /sys/block/sda/device/wwid | xxd
00000000: 7431 302e 4d53 4654 2020 2020 cfdb 68ce  t10.MSFT    ..h.
00000010: ac4a de45 5c32 3036 214f 326e 33d1 cd0a  .J.E\206!O2n3...
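
For reference, the decode failure is reproducible from those bytes alone. A minimal sketch (the hex string below is copied from the xxd dump above):

    # Rebuild the raw wwid bytes from the xxd dump and decode them the way
    # read_file() effectively does (text mode, default UTF-8 codec).
    raw = bytes.fromhex(
        '74 31 30 2e 4d 53 46 54 20 20 20 20 cf db 68 ce'
        ' ac 4a de 45 5c 32 30 36 21 4f 32 6e 33 d1 cd 0a'
    )
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError as exc:
        # "'utf-8' codec can't decode byte 0xcf in position 12: invalid continuation byte"
        print(exc)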

Since the file (if/when it exists, which it does not always) is opened in text mode with the default UTF-8 encoding (line 6282):

   6268 def read_file(path_list, file_name=''):
   6269     # type: (List[str], str) -> str
   6270     """Returns the content of the first file found within the `path_list`
   6271 
   6272     :param path_list: list of file paths to search
   6273     :param file_name: optional file_name to be applied to a file path
   6274     :returns: content of the file or 'Unknown'
   6275     """ 
   6276     for path in path_list:
   6277         if file_name:
   6278             file_path = os.path.join(path, file_name)
   6279         else:
   6280             file_path = path
   6281         if os.path.exists(file_path):
   6282             with open(file_path, 'r') as f:
   6283                 try:
   6284                     content = f.read().strip()
   6285                 except OSError:
   6286                     # sysfs may populate the file, but for devices like
   6287                     # virtio reads can fail
   6288                     return 'Unknown'
   6289                 else:
   6290                     return content
   6291     return 'Unknown'

It quite predictably fails with a UnicodeDecodeError as soon as it hits the non-UTF-8 bytes.
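
A minimal sketch of one possible workaround (not necessarily how upstream will fix it): read the sysfs file in binary mode and decode with errors='replace', so undecodable bytes become U+FFFD instead of raising:

    import os
    from typing import List


    def read_file(path_list, file_name=''):
        # type: (List[str], str) -> str
        """Return the content of the first readable file in `path_list`, or 'Unknown'."""
        for path in path_list:
            file_path = os.path.join(path, file_name) if file_name else path
            if os.path.exists(file_path):
                try:
                    with open(file_path, 'rb') as f:
                        # Decode leniently: sysfs attributes such as the Hyper-V
                        # wwid are not guaranteed to be valid UTF-8.
                        content = f.read().decode('utf-8', errors='replace').strip()
                except OSError:
                    # sysfs may populate the file, but for devices like
                    # virtio reads can fail
                    return 'Unknown'
                return content
        return 'Unknown'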

This, for some reason, renders my cluster unusable.

Please resolve this. What more information do you need from me?

Actions #1

Updated by Greg Farnum over 2 years ago

  • Project changed from Ceph to Orchestrator

Actions #2

Updated by Sebastian Wagner over 2 years ago

  • Description updated (diff)

Actions #3

Updated by Paul Cuzner over 2 years ago

  • Assignee set to Paul Cuzner

Actions #4

Updated by Sebastian Wagner over 2 years ago

  • Subject changed from Improperly parsing disk wwid to gather-facts: Improperly parsing disk wwid

Actions #5

Updated by Frank Filippone over 1 year ago

I have hit this exact same issue of non-UTF-8 data in /sys/block/sda/device/wwid, but with Rocky Linux 8.6 and Ceph 17.2.3 (under a Windows 2019 Hyper-V host).

This was a fresh install this week of both Rocky Linux and Ceph, using the latest versions of everything.

I note this file is OK under Rocky Linux 9 and Ubuntu 20.04 LTS on the same Windows 2019 host, so I wonder whether this is limited to RHEL/RL/AL 8.x only?
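
A quick way to check whether a given host is affected (a minimal sketch that just mirrors the sysfs path cephadm reads):

    # List block devices whose wwid attribute is not valid UTF-8, i.e. the
    # condition that makes cephadm gather-facts fail.
    import glob

    for path in glob.glob('/sys/block/*/device/wwid'):
        try:
            with open(path, 'rb') as f:
                data = f.read()
        except OSError:
            continue  # some sysfs attributes exist but cannot be read
        try:
            data.decode('utf-8')
        except UnicodeDecodeError:
            print('non-UTF-8 wwid:', path)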

Either way, it would be useful to resolve this, as it causes the cluster to sit in a warning state forever (I've not gone past this state to see whether the cluster actually functions).

Do you need any more information?
