Bug #49212
mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd'
Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific,octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
description: rados/monthrash/{ceph clusters/3-mons msgr-failures/mon-delay msgr/async objectstore/bluestore-bitmap rados supported-random-distro$/{rhel_latest} thrashers/many workloads/rados_mon_workunits}
2021-02-07T17:17:06.081 INFO:tasks.mon_thrash.ceph_manager:quorum is size 2
2021-02-07T17:17:06.081 DEBUG:teuthology.orchestra.run.smithi060:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early tell mon.a mon_status
2021-02-07T17:17:06.207 INFO:tasks.workunit.client.0.smithi204.stderr:Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first
Looking into the mon logs:
2021-02-07T16:57:57.891+0000 7fb612561700 1 -- [v2:172.21.15.37:3300/0,v1:172.21.15.37:6789/0] >> 172.21.15.37:0/1302515522 conn(0x5614574f0800 msgr2=0x5614574e7200 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2021-02-07T16:57:57.891+0000 7fb612561700 1 --2- [v2:172.21.15.37:3300/0,v1:172.21.15.37:6789/0] >> 172.21.15.37:0/1302515522 conn(0x5614574f0800 0x5614574e7200 secure :-1 s=READY pgs=1 cs=0 l=1 rx=0x5614575cc120 tx=0x5614574ee910).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
received signal: Terminated from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
/ceph/teuthology-archive/yuriw-2021-02-07_16:27:00-rados-wip-yuri8-testing-2021-01-27-1208-octopus-distro-basic-smithi/5865221/teuthology.log
Updated by Deepika Upadhyay about 3 years ago
- Related to Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failures/many.yaml added
Updated by Deepika Upadhyay about 3 years ago
- Related to Bug #37786: test fails in mon/crush_ops.sh added
Updated by Deepika Upadhyay about 3 years ago
- Related to deleted (Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failures/many.yaml)
Updated by Deepika Upadhyay about 3 years ago
- Related to deleted (Bug #37786: test fails in mon/crush_ops.sh)
Updated by Deepika Upadhyay about 3 years ago
- Is duplicate of Bug #37786: test fails in mon/crush_ops.sh added
Updated by Neha Ojha about 3 years ago
- Subject changed from mon/crush_ops.sh fails to octopus: mon/crush_ops.sh fails
Updated by Sage Weil about 3 years ago
- Subject changed from octopus: mon/crush_ops.sh fails to mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd'
- Status changed from Duplicate to In Progress
- Priority changed from Normal to High
2021-02-24T03:30:07.464 INFO:tasks.workunit.client.0.smithi190.stderr:2021-02-24T03:30:07.463+0000 7fc9f6ffd700 10 monclient: handle_mon_command_ack 2 [{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["osd.1"]}]
2021-02-24T03:30:07.464 INFO:tasks.workunit.client.0.smithi190.stderr:2021-02-24T03:30:07.463+0000 7fc9f6ffd700 10 monclient: _finish_command 2 = mon:16 osd.1 has already bound to class 'ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first
2021-02-24T03:30:07.465 INFO:tasks.workunit.client.0.smithi190.stderr:Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first
2021-02-24T03:30:07.466 DEBUG:teuthology.orchestra.run:got remote process result: 16
2021-02-24T03:30:07.468 INFO:tasks.workunit:Stopping ['mon/pool_ops.sh', 'mon/crush_ops.sh', 'mon/osd.sh', 'mon/caps.sh'] on client.0...
/a/sage-2021-02-23_19:11:17-rados:monthrash-master-distro-basic-smithi/5907851
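The rule behind the error message can be illustrated with a small sketch. This is a hypothetical in-memory model, not Ceph code: the class `DeviceClasses` and its methods are invented for illustration and only mimic the behaviour the message describes (a device already bound to one class cannot be rebound to another until the old class is removed). The remote process result of 16 in the log matches the value of EBUSY on Linux.

```python
import errno


class DeviceClasses:
    """Toy model of OSD device-class bindings (hypothetical, not Ceph code)."""

    def __init__(self):
        self.bound = {}  # osd name -> device class

    def set_device_class(self, osd, cls):
        current = self.bound.get(osd)
        if current is not None and current != cls:
            # Mirrors the refusal seen in the log: rebinding is not allowed.
            msg = (f"{osd} has already bound to class '{current}', "
                   f"can not reset class to '{cls}'")
            return -errno.EBUSY, msg
        self.bound[osd] = cls
        return 0, ""

    def rm_device_class(self, osd):
        # Removing a class from an unbound device is harmless in this model.
        self.bound.pop(osd, None)
        return 0, ""


classes = DeviceClasses()
classes.set_device_class("osd.1", "ssd")
rc_busy, msg_busy = classes.set_device_class("osd.1", "hdd")  # refused
classes.rm_device_class("osd.1")
rc_ok, _ = classes.set_device_class("osd.1", "hdd")           # now accepted
```

In this model the second `set_device_class` call only succeeds because the removal actually took effect first, which is the precondition the failing test run did not meet.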
Updated by Sage Weil about 3 years ago
- Status changed from In Progress to Fix Under Review
- Backport set to pacific,octopus
- Pull request ID set to 39674
Earlier in the same run:
2021-02-24T03:29:52.687 INFO:tasks.workunit.client.0.smithi190.stderr:2021-02-24T03:29:52.684+0000 7f04c5ffb700 10 monclient: handle_mon_command_ack 3 [{"prefix": "osd crush rm-device-class", "ids": ["all"]}]
2021-02-24T03:29:52.687 INFO:tasks.workunit.client.0.smithi190.stderr:2021-02-24T03:29:52.684+0000 7f04c5ffb700 10 monclient: _finish_command 3 = system:0 osd.0 belongs to no class, osd.1 belongs to no class, osd.2 belongs to no class, osd.3 belongs to no class, osd.4 belongs to no class, osd.5 belongs to no class,
The problem is that this command was sent multiple times (thrashing mons, msgr failure injection, etc.), and the most recent send returned early based on uncommitted state, which was then lost (probably due to a mon restart). As a result, the later command(s) in crush_ops.sh saw unexpected cluster state.
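The sequence described here can be sketched as a toy simulation. Everything in it is an assumption made for illustration: `ToyMon`, its `committed`/`pending` dictionaries, and the `wait_for_commit` flag are hypothetical and do not reflect the real monitor code or the actual change in the pull request. The sketch only shows why a reply based on uncommitted state is unsafe when that state can be lost.

```python
import errno


class ToyMon:
    """Hypothetical monitor: 'committed' survives a restart, 'pending' does not."""

    def __init__(self, classes):
        self.committed = dict(classes)
        self.pending = dict(classes)

    def commit(self):
        self.committed = dict(self.pending)

    def restart(self):
        # Uncommitted changes are discarded.
        self.pending = dict(self.committed)

    def rm_device_class(self, osd, wait_for_commit=False):
        if osd in self.pending:
            # First delivery: stage the removal; the reply would follow a commit.
            del self.pending[osd]
            return None
        # Retried delivery: pending state already shows no class.
        if wait_for_commit:
            self.commit()  # assumed remedy: reply only once the state is durable
        return 0, f"{osd} belongs to no class"

    def set_device_class(self, osd, cls):
        current = self.committed.get(osd)
        if current not in (None, cls):
            return -errno.EBUSY, f"{osd} has already bound to class '{current}'"
        self.pending[osd] = cls
        self.commit()
        return 0, ""


def run(wait_for_commit):
    mon = ToyMon({"osd.1": "ssd"})
    mon.rm_device_class("osd.1")                   # first send, reply never arrives
    mon.rm_device_class("osd.1", wait_for_commit)  # resend is answered immediately
    mon.restart()                                  # pending state is lost
    return mon.set_device_class("osd.1", "hdd")
```

With `run(False)` the resend reports success from pending state, the restart drops the removal, and the following `set_device_class` is refused with EBUSY. With `run(True)` the removal is durable before the reply, so the later command sees the state the test script expects.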
Updated by Sage Weil about 3 years ago
- Status changed from Fix Under Review to Pending Backport
backport for pacific: https://github.com/ceph/ceph/pull/39735
Updated by Backport Bot about 3 years ago
- Copied to Backport #49526: pacific: mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd' added
Updated by Backport Bot about 3 years ago
- Copied to Backport #49527: octopus: mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd' added
Updated by Loïc Dachary about 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".