Bug #49212

closed

mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd'

Added by Deepika Upadhyay about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific,octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

  description: rados/monthrash/{ceph clusters/3-mons msgr-failures/mon-delay msgr/async
    objectstore/bluestore-bitmap rados supported-random-distro$/{rhel_latest} thrashers/many
    workloads/rados_mon_workunits}
2021-02-07T17:17:06.081 INFO:tasks.mon_thrash.ceph_manager:quorum is size 2
2021-02-07T17:17:06.081 DEBUG:teuthology.orchestra.run.smithi060:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early tell mon.a mon_status
2021-02-07T17:17:06.207 INFO:tasks.workunit.client.0.smithi204.stderr:Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first

Looking into the mon logs:

2021-02-07T16:57:57.891+0000 7fb612561700  1 -- [v2:172.21.15.37:3300/0,v1:172.21.15.37:6789/0] >> 172.21.15.37:0/1302515522 conn(0x5614574f0800 msgr2=0x5614574e7200 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2021-02-07T16:57:57.891+0000 7fb612561700  1 --2- [v2:172.21.15.37:3300/0,v1:172.21.15.37:6789/0] >> 172.21.15.37:0/1302515522 conn(0x5614574f0800 0x5614574e7200 secure :-1 s=READY pgs=1 cs=0 l=1 rx=0x5614575cc120 tx=0x5614574ee910).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)

received  signal: Terminated from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0

/ceph/teuthology-archive/yuriw-2021-02-07_16:27:00-rados-wip-yuri8-testing-2021-01-27-1208-octopus-distro-basic-smithi/5865221/teuthology.log


Related issues 3 (0 open, 3 closed)

Is duplicate of RADOS - Bug #37786: test fails in mon/crush_ops.sh (Can't reproduce, 01/04/2019)
Copied to RADOS - Backport #49526: pacific: mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd' (Resolved)
Copied to RADOS - Backport #49527: octopus: mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd' (Resolved, singuliere _)
Actions #1

Updated by Deepika Upadhyay about 3 years ago

  • Related to Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failures/many.yaml added
Actions #2

Updated by Deepika Upadhyay about 3 years ago

  • Description updated (diff)
Actions #3

Updated by Deepika Upadhyay about 3 years ago

  • Related to Bug #37786: test fails in mon/crush_ops.sh added
Actions #4

Updated by Deepika Upadhyay about 3 years ago

  • Related to deleted (Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failures/many.yaml)
Actions #5

Updated by Deepika Upadhyay about 3 years ago

  • Related to deleted (Bug #37786: test fails in mon/crush_ops.sh)
Actions #6

Updated by Deepika Upadhyay about 3 years ago

  • Is duplicate of Bug #37786: test fails in mon/crush_ops.sh added
Actions #7

Updated by Neha Ojha about 3 years ago

  • Subject changed from mon/crush_ops.sh fails to octopus: mon/crush_ops.sh fails
Actions #8

Updated by Neha Ojha about 3 years ago

  • Status changed from New to Duplicate
Actions #9

Updated by Sage Weil about 3 years ago

  • Subject changed from octopus: mon/crush_ops.sh fails to mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd'
  • Status changed from Duplicate to In Progress
  • Priority changed from Normal to High
2021-02-24T03:30:07.464 INFO:tasks.workunit.client.0.smithi190.stderr:2021-02-24T03:30:07.463+0000 7fc9f6ffd700 10 monclient: handle_mon_command_ack 2 [{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["osd.1"]}]
2021-02-24T03:30:07.464 INFO:tasks.workunit.client.0.smithi190.stderr:2021-02-24T03:30:07.463+0000 7fc9f6ffd700 10 monclient: _finish_command 2 = mon:16 osd.1 has already bound to class 'ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first
2021-02-24T03:30:07.465 INFO:tasks.workunit.client.0.smithi190.stderr:Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first
2021-02-24T03:30:07.466 DEBUG:teuthology.orchestra.run:got remote process result: 16
2021-02-24T03:30:07.468 INFO:tasks.workunit:Stopping ['mon/pool_ops.sh', 'mon/crush_ops.sh', 'mon/osd.sh', 'mon/caps.sh'] on client.0...

/a/sage-2021-02-23_19:11:17-rados:monthrash-master-distro-basic-smithi/5907851
Actions #10

Updated by Sage Weil about 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to pacific,octopus
  • Pull request ID set to 39674

earlier,

2021-02-24T03:29:52.687 INFO:tasks.workunit.client.0.smithi190.stderr:2021-02-24T03:29:52.684+0000 7f04c5ffb700 10 monclient: handle_mon_command_ack 3 [{"prefix": "osd crush rm-device-class", "ids": ["all"]}]
2021-02-24T03:29:52.687 INFO:tasks.workunit.client.0.smithi190.stderr:2021-02-24T03:29:52.684+0000 7f04c5ffb700 10 monclient: _finish_command 3 = system:0 osd.0 belongs to no class, osd.1 belongs to no class, osd.2 belongs to no class, osd.3 belongs to no class, osd.4 belongs to no class, osd.5 belongs to no class,

The problem is that this command was sent multiple times (thrashing mons, msgr failure injection, etc.), and the most recent send returned early based on uncommitted state (which was then lost, probably due to a mon restart). As a result, the later command(s) in crush_ops.sh saw unexpected cluster state.
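
To make the sequence above concrete, here is a toy model of the race in Python. It is only a sketch, not Ceph monitor code: `ToyMon`, its methods, and `replay()` are hypothetical names, and the model assumes (as described above) that the resent command answers from pending state that has not been committed yet.

```python
import errno


class ToyMon:
    """Toy monitor (hypothetical): 'committed' survives a restart, 'pending' does not."""

    def __init__(self, classes):
        self.committed = dict(classes)  # osd name -> device class or None
        self.pending = None             # proposed map, not yet committed

    def _view(self):
        # Commands are evaluated against pending state when there is any.
        return self.committed if self.pending is None else self.pending

    def rm_device_class_all(self):
        if all(c is None for c in self._view().values()):
            # Early return: nothing to do according to the current view,
            # even if that view is only pending and could still be lost.
            return 0, "all osds belong to no class"
        self.pending = {osd: None for osd in self.committed}
        return None, "proposed, reply deferred until commit"

    def restart(self):
        # Uncommitted state is lost when the mon restarts.
        self.pending = None

    def set_device_class(self, osd, new_class):
        old = self._view()[osd]
        if old is not None and old != new_class:
            return errno.EBUSY, (f"{osd} has already bound to class '{old}', "
                                 f"can not reset class to '{new_class}'")
        self.pending = {**self._view(), osd: new_class}
        return None, "proposed, reply deferred until commit"


def replay():
    mon = ToyMon({"osd.1": "ssd"})
    first = mon.rm_device_class_all()   # proposed; the reply never reaches the client
    resend = mon.rm_device_class_all()  # resend sees pending state, returns 0 early
    mon.restart()                       # pending removal is lost
    later = mon.set_device_class("osd.1", "hdd")  # still sees 'ssd'
    return first, resend, later


if __name__ == "__main__":
    for step in replay():
        print(step)
```

In this model the client gets a success code for the resent `rm-device-class`, yet after the restart `osd.1` is still bound to `ssd`, so the following `set-device-class` fails with EBUSY, matching the return code 16 seen in the log. Within the toy model, the failure goes away if the early-return path consults only committed state or defers its reply until the pending change commits; that is a statement about the sketch, not a description of the actual fix.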

Actions #11

Updated by Sage Weil about 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #12

Updated by Backport Bot about 3 years ago

  • Copied to Backport #49526: pacific: mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd' added
Actions #13

Updated by Backport Bot about 3 years ago

  • Copied to Backport #49527: octopus: mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd' added
Actions #14

Updated by Loïc Dachary about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
