Bug #52749 (open)
gather-facts: Improperly parsing disk wwid
Description
Sorry, wasn't sure which subcomponent to file this under.
I am running Ceph 16.2.6-1 on some monitors hosted in Hyper-V, using the Ceph SIG repository for AlmaLinux 8.4. Full versions of installed software:
[root@cephmon002-nj1 ~]# rpm -qa | grep ceph | sort
centos-release-ceph-pacific-1.0-1.el8.noarch
cephadm-16.2.1-1.el8.noarch
ceph-base-16.2.6-1.el8.x86_64
ceph-common-16.2.6-1.el8.x86_64
ceph-grafana-dashboards-16.2.6-1.el8.noarch
ceph-mgr-16.2.6-1.el8.x86_64
ceph-mgr-cephadm-16.2.6-1.el8.noarch
ceph-mgr-dashboard-16.2.6-1.el8.noarch
ceph-mgr-diskprediction-local-16.2.6-1.el8.noarch
ceph-mgr-k8sevents-16.2.6-1.el8.noarch
ceph-mgr-modules-core-16.2.6-1.el8.noarch
ceph-mgr-rook-16.2.6-1.el8.noarch
ceph-mon-16.2.6-1.el8.x86_64
ceph-prometheus-alerts-16.2.6-1.el8.noarch
ceph-selinux-16.2.6-1.el8.x86_64
libcephfs2-16.2.6-1.el8.x86_64
libcephsqlite-16.2.6-1.el8.x86_64
python3-ceph-argparse-16.2.6-1.el8.x86_64
python3-ceph-common-16.2.6-1.el8.x86_64
python3-cephfs-16.2.6-1.el8.x86_64
When bootstrapping, configuring, or managing the cluster with cephadm, I get the following errors in the log, here for /var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931 :
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: cephadm 2021-09-28T14:59:30.104412+0000 mgr.mon2.kgjipz (mgr.144113) 28 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Traceback (most recent call last):
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8230, in <module>
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: main()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8218, in main
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: r = ctx.func(ctx)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6762, in command_gather_facts
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: print(host.dump())
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6750, in dump
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: k: getattr(self, k) for k in dir(self)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6752, in <dictcomp>
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: and isinstance(getattr(self, k), (float, int, str, list, dict, tuple))
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6455, in hdd_list
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: return self._dev_list(devs)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6435, in _dev_list
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: disk_wwid = read_file(['/sys/block/{}/device/wwid'.format(dev)]).strip()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6284, in read_file
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: content = f.read().strip()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/usr/lib64/python3.6/codecs.py", line 321, in decode
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: (result, consumed) = self._buffer_decode(data, self.errors, final)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 12: invalid continuation byte
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: Traceback (most recent call last):
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/usr/share/ceph/mgr/cephadm/serve.py", line 1347, in _remote_connection
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: yield (conn, connr)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/usr/share/ceph/mgr/cephadm/serve.py", line 1244, in _run_cephadm
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: code, '\n'.join(err)))
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Traceback (most recent call last):
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8230, in <module>
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: main()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8218, in main
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: r = ctx.func(ctx)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6762, in command_gather_facts
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: print(host.dump())
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6750, in dump
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: k: getattr(self, k) for k in dir(self)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6752, in <dictcomp>
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: and isinstance(getattr(self, k), (float, int, str, list, dict, tuple))
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6455, in hdd_list
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: return self._dev_list(devs)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6435, in _dev_list
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: disk_wwid = read_file(['/sys/block/{}/device/wwid'.format(dev)]).strip()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6284, in read_file
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: content = f.read().strip()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/usr/lib64/python3.6/codecs.py", line 321, in decode
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: (result, consumed) = self._buffer_decode(data, self.errors, final)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 12: invalid continuation byte
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: cephadm 2021-09-28T14:59:30.113897+0000 mgr.mon2.kgjipz (mgr.144113) 29 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Traceback (most recent call last):
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8230, in <module>
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: main()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8218, in main
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: r = ctx.func(ctx)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6762, in command_gather_facts
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: print(host.dump())
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6750, in dump
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: k: getattr(self, k) for k in dir(self)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6752, in <dictcomp>
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: and isinstance(getattr(self, k), (float, int, str, list, dict, tuple))
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6455, in hdd_list
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: return self._dev_list(devs)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6435, in _dev_list
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: disk_wwid = read_file(['/sys/block/{}/device/wwid'.format(dev)]).strip()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6284, in read_file
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: content = f.read().strip()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/usr/lib64/python3.6/codecs.py", line 321, in decode
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: (result, consumed) = self._buffer_decode(data, self.errors, final)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 12: invalid continuation byte
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: Traceback (most recent call last):
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/usr/share/ceph/mgr/cephadm/serve.py", line 1347, in _remote_connection
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: yield (conn, connr)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/usr/share/ceph/mgr/cephadm/serve.py", line 1244, in _run_cephadm
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: code, '\n'.join(err)))
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Traceback (most recent call last):
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8230, in <module>
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: main()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8218, in main
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: r = ctx.func(ctx)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6762, in command_gather_facts
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: print(host.dump())
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6750, in dump
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: k: getattr(self, k) for k in dir(self)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6752, in <dictcomp>
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: and isinstance(getattr(self, k), (float, int, str, list, dict, tuple))
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6455, in hdd_list
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: return self._dev_list(devs)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6435, in _dev_list
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: disk_wwid = read_file(['/sys/block/{}/device/wwid'.format(dev)]).strip()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/var/lib/ceph/9a17df3a-203f-11ec-9edc-00155d0c761e/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 6284, in read_file
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: content = f.read().strip()
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: File "/usr/lib64/python3.6/codecs.py", line 321, in decode
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: (result, consumed) = self._buffer_decode(data, self.errors, final)
Sep 28 10:59:31 mon1.domain.tld conmon[2485]: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 12: invalid continuation byte
Of special note: cephadm tries to read the /sys/block/<device>/device/wwid file and fails, because on Hyper-V the wwid of these disks contains non-UTF-8 data:
# cat /sys/block/sda/device/wwid
t10.MSFT hάJ\206!O2n3
# cat /sys/block/sda/device/wwid | xxd
00000000: 7431 302e 4d53 4654 2020 2020 cfdb 68ce  t10.MSFT    ..h.
00000010: ac4a de45 5c32 3036 214f 326e 33d1 cd0a  .J.E\206!O2n3...
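The failure can be reproduced without a Hyper-V guest by decoding the bytes from the hex dump above. The following is a minimal sketch; the names `WWID_BYTES` and `first_undecodable` are made up for this illustration and do not exist in cephadm.

```python
# Bytes copied from the xxd dump above (32 bytes, trailing newline included).
WWID_BYTES = bytes.fromhex(
    "7431302e4d53465420202020cfdb68ce"
    "ac4ade455c323036214f326e33d1cd0a"
)


def first_undecodable(raw):
    """Return (offset, byte, reason) for the first byte that is not valid
    UTF-8, or None if the whole buffer decodes cleanly.

    Hypothetical helper, for illustration only.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first offending byte.
        return exc.start, raw[exc.start], exc.reason
    return None


if __name__ == "__main__":
    print(first_undecodable(WWID_BYTES))
```

The reported offset and byte should line up with the `byte 0xcf in position 12` message in the traceback: 0xcf announces a two-byte UTF-8 sequence, but the next byte, 0xdb, is not a continuation byte.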
Since the file (if and when it exists, which is not always the case) is opened in text mode (line 6282):
6268 def read_file(path_list, file_name=''):
6269     # type: (List[str], str) -> str
6270     """Returns the content of the first file found within the `path_list`
6271
6272     :param path_list: list of file paths to search
6273     :param file_name: optional file_name to be applied to a file path
6274     :returns: content of the file or 'Unknown'
6275     """
6276     for path in path_list:
6277         if file_name:
6278             file_path = os.path.join(path, file_name)
6279         else:
6280             file_path = path
6281         if os.path.exists(file_path):
6282             with open(file_path, 'r') as f:
6283                 try:
6284                     content = f.read().strip()
6285                 except OSError:
6286                     # sysfs may populate the file, but for devices like
6287                     # virtio reads can fail
6288                     return 'Unknown'
6289                 else:
6290                     return content
6291     return 'Unknown'
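It quite obviously fails: decoding the contents raises UnicodeDecodeError, which is a subclass of ValueError rather than OSError, so the except clause on line 6285 never catches it.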
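One possible direction, offered as a sketch rather than as the upstream fix, is to read the file in binary mode and decode leniently, so that undecodable bytes are replaced instead of raising. The function name `read_file_tolerant` and the choice of `errors='replace'` are assumptions made for this example.

```python
import os
import tempfile
from typing import List


def read_file_tolerant(path_list: List[str], file_name: str = '') -> str:
    """Hypothetical variant of read_file that tolerates non-UTF-8 bytes."""
    for path in path_list:
        file_path = os.path.join(path, file_name) if file_name else path
        if os.path.exists(file_path):
            # Binary mode: read() performs no implicit decoding.
            with open(file_path, 'rb') as f:
                try:
                    raw = f.read()
                except OSError:
                    # Same fallback as the original for unreadable sysfs files.
                    return 'Unknown'
            # Undecodable bytes become U+FFFD instead of raising.
            return raw.decode('utf-8', errors='replace').strip()
    return 'Unknown'


if __name__ == '__main__':
    # Simulate the Hyper-V wwid with the first 16 bytes of the dump.
    with tempfile.TemporaryDirectory() as tmp:
        fake_wwid = os.path.join(tmp, 'wwid')
        with open(fake_wwid, 'wb') as f:
            f.write(b't10.MSFT    \xcf\xdbh\xce\n')
        print(repr(read_file_tolerant([fake_wwid])))
```

Replacing bytes keeps gather-facts running but makes the reported wwid lossy. If the exact identifier matters, catching UnicodeDecodeError and returning 'Unknown' would be a stricter alternative.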
This, for some reason, renders my cluster unusable.
Please resolve this. What more information do you need from me?
Updated by Greg Farnum over 2 years ago
- Project changed from Ceph to Orchestrator
Updated by Sebastian Wagner over 2 years ago
- Subject changed from Improperly parsing disk wwid to gather-facts: Improperly parsing disk wwid
Updated by Frank Filippone over 1 year ago
I have hit the exact same issue of non-UTF-8 data in /sys/block/sda/device/wwid, but with Rocky Linux 8.6 and Ceph 17.2.3 (on a Windows 2019 Hyper-V host).
This was a fresh install this week of both Rocky Linux and Ceph, using the latest versions of everything.
I note that this file is OK with Rocky Linux 9 and Ubuntu 20.04 LTS on the same Windows 2019 host, so I wonder whether this is limited to RHEL/RL/AL 8.x versions.
Either way, it would be useful to resolve this, as it causes the cluster to sit in a warning state forever (I have not gone past this state to see whether the cluster actually functions).
Do you need any more information?