Bug #49212: mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd' - RADOS - Ceph

Bug #49212

closed

mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd'

Added by Deepika Upadhyay about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: High
Backport: pacific,octopus
Severity: 3 - minor
Pull request ID: 39674

Description

  description: rados/monthrash/{ceph clusters/3-mons msgr-failures/mon-delay msgr/async
    objectstore/bluestore-bitmap rados supported-random-distro$/{rhel_latest} thrashers/many
    workloads/rados_mon_workunits}

2021-02-07T17:17:06.081 INFO:tasks.mon_thrash.ceph_manager:quorum is size 2
2021-02-07T17:17:06.081 DEBUG:teuthology.orchestra.run.smithi060:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early tell mon.a mon_status
2021-02-07T17:17:06.207 INFO:tasks.workunit.client.0.smithi204.stderr:Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first

Looking into the mon logs:

2021-02-07T16:57:57.891+0000 7fb612561700  1 -- [v2:172.21.15.37:3300/0,v1:172.21.15.37:6789/0] >> 172.21.15.37:0/1302515522 conn(0x5614574f0800 msgr2=0x5614574e7200 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2021-02-07T16:57:57.891+0000 7fb612561700  1 --2- [v2:172.21.15.37:3300/0,v1:172.21.15.37:6789/0] >> 172.21.15.37:0/1302515522 conn(0x5614574f0800 0x5614574e7200 secure :-1 s=READY pgs=1 cs=0 l=1 rx=0x5614575cc120 tx=0x5614574ee910).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)

received  signal: Terminated from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0

/ceph/teuthology-archive/yuriw-2021-02-07_16:27:00-rados-wip-yuri8-testing-2021-01-27-1208-octopus-distro-basic-smithi/5865221/teuthology.log

Related issues 3 (0 open — 3 closed)

Updated by Deepika Upadhyay about 3 years ago

Related to Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failures/many.yaml added

Updated by Deepika Upadhyay about 3 years ago

Description updated (diff)

Updated by Deepika Upadhyay about 3 years ago

Related to Bug #37786: test fails in mon/crush_ops.sh added

Updated by Deepika Upadhyay about 3 years ago

Related to deleted (Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failures/many.yaml)

Updated by Deepika Upadhyay about 3 years ago

Related to deleted (Bug #37786: test fails in mon/crush_ops.sh)

Updated by Deepika Upadhyay about 3 years ago

Is duplicate of Bug #37786: test fails in mon/crush_ops.sh added

Updated by Neha Ojha about 3 years ago

Subject changed from mon/crush_ops.sh fails to octopus: mon/crush_ops.sh fails

Updated by Neha Ojha about 3 years ago

Status changed from New to Duplicate

Updated by Sage Weil about 3 years ago

Subject changed from octopus: mon/crush_ops.sh fails to mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd'
Status changed from Duplicate to In Progress
Priority changed from Normal to High

2021-02-24T03:30:07.464 INFO:tasks.workunit.client.0.smithi190.stderr:2021-02-24T03:30:07.463+0000 7fc9f6ffd700 10 monclient: handle_mon_command_ack 2 [{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["osd.1"]}]
2021-02-24T03:30:07.464 INFO:tasks.workunit.client.0.smithi190.stderr:2021-02-24T03:30:07.463+0000 7fc9f6ffd700 10 monclient: _finish_command 2 = mon:16 osd.1 has already bound to class 'ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first
2021-02-24T03:30:07.465 INFO:tasks.workunit.client.0.smithi190.stderr:Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first
2021-02-24T03:30:07.466 DEBUG:teuthology.orchestra.run:got remote process result: 16
2021-02-24T03:30:07.468 INFO:tasks.workunit:Stopping ['mon/pool_ops.sh', 'mon/crush_ops.sh', 'mon/osd.sh', 'mon/caps.sh'] on client.0...

/a/sage-2021-02-23_19:11:17-rados:monthrash-master-distro-basic-smithi/5907851

#10

Updated by Sage Weil about 3 years ago

Status changed from In Progress to Fix Under Review
Backport set to pacific,octopus
Pull request ID set to 39674

Earlier in the same run:

2021-02-24T03:29:52.687 INFO:tasks.workunit.client.0.smithi190.stderr:2021-02-24T03:29:52.684+0000 7f04c5ffb700 10 monclient: handle_mon_command_ack 3 [{"prefix": "osd crush rm-device-class", "ids": ["all"]}]
2021-02-24T03:29:52.687 INFO:tasks.workunit.client.0.smithi190.stderr:2021-02-24T03:29:52.684+0000 7f04c5ffb700 10 monclient: _finish_command 3 = system:0 osd.0 belongs to no class, osd.1 belongs to no class, osd.2 belongs to no class, osd.3 belongs to no class, osd.4 belongs to no class, osd.5 belongs to no class,

The problem is that this command was sent multiple times (thrashing mons, msgr failure injection, etc.), and the most recent send returned early based on uncommitted state, which was then lost (probably due to a mon restart). As a result, the later command(s) in crush_ops.sh saw unexpected cluster state.
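
The sequence described above can be illustrated with a small toy model. This is only a sketch of the hypothesis in this comment, not Ceph's monitor code and not the change made in the pull request: the class `ToyMonitor`, its methods, and `replay_race` are hypothetical names invented for illustration, and the model assumes a single pending proposal that is dropped on restart.

```python
class ToyMonitor:
    """Toy stand-in for a monitor: committed state plus one pending proposal."""

    def __init__(self, classes):
        self.committed = dict(classes)  # osd name -> device class (None = no class)
        self.pending = None             # proposed but not yet committed map

    def rm_device_class_all(self):
        # Assumption: the reply is computed from the pending map when one exists.
        view = self.pending if self.pending is not None else self.committed
        if all(cls is None for cls in view.values()):
            return "belongs to no class"  # early return; nothing new is proposed
        self.pending = {osd: None for osd in self.committed}
        return "proposed"

    def restart(self):
        # Uncommitted state does not survive a restart in this model.
        self.pending = None

    def set_device_class(self, osd, cls):
        current = self.committed[osd]
        if current is not None and current != cls:
            return f"EBUSY: {osd} has already bound to class '{current}'"
        self.committed[osd] = cls
        return "ok"


def replay_race():
    mon = ToyMonitor({"osd.1": "ssd"})
    first = mon.rm_device_class_all()   # reply assumed lost to an injected msgr failure
    resend = mon.rm_device_class_all()  # client retries and is answered from pending state
    mon.restart()                       # thrasher restarts the mon before the commit
    later = mon.set_device_class("osd.1", "hdd")
    return first, resend, later


print(replay_race())
```

In this model the retry is acknowledged as a no-op even though nothing was ever committed, so the later set-device-class call still finds osd.1 bound to 'ssd' and fails with EBUSY, matching the shape of the failure in the logs.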

#11

Updated by Sage Weil about 3 years ago

Status changed from Fix Under Review to Pending Backport

backport for pacific: https://github.com/ceph/ceph/pull/39735

#12

Updated by Backport Bot about 3 years ago

Copied to Backport #49526: pacific: mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd' added

#13

Updated by Backport Bot about 3 years ago

Copied to Backport #49527: octopus: mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd' added

#14

Updated by Loïc Dachary about 3 years ago

Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
