Bug #63502

open

Regression: Permanent KeyError: 'TYPE' : return self.blkid_api['TYPE'] == 'part'

Added by Harry Coin 6 months ago. Updated 13 days ago.

Status:
Pending Backport
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
backport_processed
Backport:
reef, quincy
Regression:
Yes
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A bug reported long ago, apparently fixed, has re-appeared when migrating within Quincy to 17.2.7

On Tue, 7 Nov 2023, Harry G Coin wrote:

These repeat for every host, only after upgrading from prev release Quincy to 17.2.7. As a result, the cluster is always warned, never indicates healthy.

I'm hitting this error, too.

"/usr/lib/python3.6/site-packages/ceph_volume/util/device.py", line 482, in is_partition
/usr/bin/docker: stderr return self.blkid_api['TYPE'] == 'part'
/usr/bin/docker: stderr KeyError: 'TYPE'

Variable names indicate usage of BLKID. It seems that `blkid` usually returns TYPE="something", but I have devices without TYPE:

/dev/mapper/data-4d323729--8fec--42c6--a1da--bacdea89fb37.disk0_data: PTUUID="c2901603-fae8-45cb-86fe-13d02e6b6dc6" PTTYPE="gpt"
/dev/mapper/data-8d485122--d8ca--4e11--85bb--3f795a4e31e9.disk0_data: PTUUID="2bc7a15e" PTTYPE="dos"
/dev/drbd3: PTUUID="2bc7a15e" PTTYPE="dos"

Maybe this indicates why the key is missing?

Please tell me if there is anything I can do to find the root cause.

Thanks, Sascha.

Detail when it appeared previously, but only in tests and not in production: https://tracker.ceph.com/issues/56573
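
For reference, the failing check in ceph_volume/util/device.py assumes blkid always reports a TYPE key, while the blkid output above shows devices (device-mapper volumes, a DRBD device) that carry only PTUUID/PTTYPE. Below is a minimal Python sketch of a defensive version of that check; it assumes only what the traceback shows (the parsed blkid output is stored in a dict named blkid_api) and is an illustration, not necessarily the patch that was actually merged:

def is_partition(self):
    # blkid omits TYPE for some devices (e.g. device-mapper or DRBD
    # devices that expose only PTUUID/PTTYPE), so use .get() instead
    # of indexing to avoid raising KeyError.
    if not self.blkid_api:
        return False
    return self.blkid_api.get('TYPE') == 'part'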


Related issues 2 (1 open, 1 closed)

Copied to Orchestrator - Backport #64843: quincy: Regression: Permanent KeyError: 'TYPE' : return self.blkid_api['TYPE'] == 'part' (In Progress, Adam King)
Copied to Orchestrator - Backport #64844: reef: Regression: Permanent KeyError: 'TYPE' : return self.blkid_api['TYPE'] == 'part' (Resolved, Adam King)
#1

Updated by Vadym Kukharenko 4 months ago

I got the same problem.
First I tried to upgrade from 17.2.6 to 17.2.7.
Then I reverted from 17.2.7 to 17.2.6 and upgraded to 18.2.1.
18.2.1 has the same error.
So I temporarily muted this error:

ceph health mute CEPHADM_REFRESH_FAILED

I hope the fix comes soon.

Harry Coin wrote:

A bug reported long ago, apparently fixed, has re-appeared when migrating within Quincy to 17.2.7

On Tue, 7 Nov 2023, Harry G Coin wrote:

These repeat for every host, only after upgrading from prev release Quincy to 17.2.7. As a result, the cluster is always warned, never indicates healthy.

I'm hitting this error, too.

"/usr/lib/python3.6/site-packages/ceph_volume/util/device.py", line 482, in is_partition
/usr/bin/docker: stderr return self.blkid_api['TYPE'] == 'part'
/usr/bin/docker: stderr KeyError: 'TYPE'

Variable names indicate usage of BLKID. It seems that `blkid` usually returns TYPE="something", but I have devices without TYPE:

/dev/mapper/data-4d323729--8fec--42c6--a1da--bacdea89fb37.disk0_data: PTUUID="c2901603-fae8-45cb-86fe-13d02e6b6dc6" PTTYPE="gpt"
/dev/mapper/data-8d485122--d8ca--4e11--85bb--3f795a4e31e9.disk0_data: PTUUID="2bc7a15e" PTTYPE="dos"
/dev/drbd3: PTUUID="2bc7a15e" PTTYPE="dos"

Maybe this indicates why the key is missing?

Please tell me if there is anything I can do to find the root cause.

Thanks, Sascha.

Detail when it appeared previously, but only in tests and not in production: https://tracker.ceph.com/issues/56573

#2

Updated by Zac Dover 4 months ago

  • Pull request ID set to 54608
#3

Updated by Harry Coin about 2 months ago

Remains in 18.2.1

#4

Updated by Harry Coin about 2 months ago

Remains in 18.2.2

#5

Updated by Adam King about 2 months ago

  • Project changed from Ceph to Orchestrator
  • Status changed from New to Pending Backport
  • Backport set to reef, quincy

looks like this was never backported.

#6

Updated by Backport Bot about 2 months ago

  • Copied to Backport #64843: quincy: Regression: Permanent KeyError: 'TYPE' : return self.blkid_api['TYPE'] == 'part' added
#7

Updated by Backport Bot about 2 months ago

  • Copied to Backport #64844: reef: Regression: Permanent KeyError: 'TYPE' : return self.blkid_api['TYPE'] == 'part' added
#8

Updated by Backport Bot about 2 months ago

  • Tags set to backport_processed
#9

Updated by Vadym Kukharenko 13 days ago

Vadym Kukharenko wrote in #note-1:

I got the same problem.
First I tried to upgrade from 17.2.6 to 17.2.7.
Then I reverted from 17.2.7 to 17.2.6 and upgraded to 18.2.1.
18.2.1 has the same error.
So I temporarily muted this error:
[...]

I hope the fix comes soon.

Harry Coin wrote:

A bug reported long ago, apparently fixed, has re-appeared when migrating within Quincy to 17.2.7

On Tue, 7 Nov 2023, Harry G Coin wrote:

These repeat for every host, only after upgrading from prev release Quincy to 17.2.7. As a result, the cluster is always warned, never indicates healthy.

I'm hitting this error, too.

"/usr/lib/python3.6/site-packages/ceph_volume/util/device.py", line 482, in is_partition
/usr/bin/docker: stderr return self.blkid_api['TYPE'] == 'part'
/usr/bin/docker: stderr KeyError: 'TYPE'

Variable names indicate usage of BLKID. It seems that `blkid` usually returns TYPE="something", but I have devices without TYPE:

/dev/mapper/data-4d323729--8fec--42c6--a1da--bacdea89fb37.disk0_data: PTUUID="c2901603-fae8-45cb-86fe-13d02e6b6dc6" PTTYPE="gpt"
/dev/mapper/data-8d485122--d8ca--4e11--85bb--3f795a4e31e9.disk0_data: PTUUID="2bc7a15e" PTTYPE="dos"
/dev/drbd3: PTUUID="2bc7a15e" PTTYPE="dos"

Maybe this indicates why the key is missing?

Please tell me if there is anything I can do to find the root cause.

Thanks, Sascha.

Detail when it appeared previously, but only in tests and not in production: https://tracker.ceph.com/issues/56573

The fix hasn't arrived yet, so I found a workaround.
I use NVMe disks for the fast pools, and this error is related to those NVMe disks.
So I removed the OSD, zapped the disk, and created the OSD again.
Before zapping the disk, you must run lvremove for that disk on the relevant host.
