Bug #45235
cephadm: mons are not properly undeployed
Status:
Can't reproduce
Priority:
Low
Assignee:
-
Category:
cephadm
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
$ ceph orch daemon rm mon.ubuntu --force
Removed mon.ubuntu from host 'ubuntu'
$ ceph orch ps
No daemons reported
Suddenly, everything hangs:
$ ceph orch apply mon --placement 'ubuntu:127.0.0.1'
^CCluster connection aborted
$ ceph -s
^CCluster connection aborted
gdb revealed that the cephadm module hangs in set_store:
Thread 40 (Thread 0x7f8917ec4700 (LWP 23648)):
#0  0x00007f894179b9f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f8917ec0400) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f8917ec03b0, cond=0x7f8917ec03d8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7f8917ec03d8, mutex=0x7f8917ec03b0) at pthread_cond_wait.c:655
#3  0x00007f8941274f0c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x0000561b001cd1ee in std::condition_variable::wait<C_SaferCond::wait()::{lambda()#1}>(std::unique_lock<std::mutex>&, C_SaferCond::wait()::{lambda()#1}) (this=0x7f8917ec03d8, __lock=..., __p=...) at /usr/include/c++/9/condition_variable:101
#5  0x0000561b001c91ae in C_SaferCond::wait (this=0x7f8917ec03a8) at /home/sebastian/Repos/ceph/src/common/Cond.h:100
#6  0x0000561b001c9a82 in Command::wait (this=0x7f8917ec03a0) at /home/sebastian/Repos/ceph/src/mgr/MgrContext.h:39
#7  0x0000561b001c188a in ActivePyModules::set_store (this=0x561b03e5e000, module_name="cephadm", key="host.ubuntu", val=...) at /home/sebastian/Repos/ceph/src/mgr/ActivePyModules.cc:679
#8  0x0000561b001e738a in ceph_store_set (self=0x7f89217685e8, args=('host.ubuntu', '{"daemons": {}, "devices": [{"rejected_reasons": ["locked"], "available": false, "path": "/dev/nvme0n1", "sys_api": {"removable": "0", "ro": "0", "vendor": "", "model": "KXG50ZNV512G NVMe TOSHIBA 512GB", "rev": "", "sas_address": "", "sas_device_handle": "", "support_discard": "512", "rotational": "0", "nr_requests": "1023", "scheduler_mode": "none", "partitions": {"nvme0n1p3": {"start": "1288192", "sectors": "998926336", "sectorsize": 512, "size": 511450284032.0, "human_readable_size": "476.33 GB", "holders": []}, "nvme0n1p1": {"start": "2048", "sectors": "1024000", "sectorsize": 512, "size": 524288000.0, "human_readable_size": "500.00 MB", "holders": []}, "nvme0n1p2": {"start": "1026048", "sectors": "262144", "sectorsize": 512, "size": 134217728.0, "human_readable_size": "128.00 MB", "holders": []}}, "sectors": 0, "sectorsize": "512", "size": 512110190592.0, "human_readable_size": "476.94 GB", "path": "/dev/nvme0n1", "locked": 1}, "lvs": [], "human_readable_type": "ssd", "device_id":')) at /home/sebastian/Repos/ceph/src/mgr/BaseMgrModule.cc:511
#9  0x00007f8941e3077b in PyCFunction_Call (func=<unknown at remote 0x7f8917ec0400>, args=<unknown at remote 0x80>, kwds=0x0) at ../Objects/methodobject.c:103
#10 0x0000000000000002 in ?? ()
#11 0x0000000000000000 in ?? ()
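What the backtrace shows is set_store blocking on C_SaferCond::wait(), a completion wait with no timeout: the mgr thread waits for the mons to ack the KV update, but with the removed mon still in the monmap quorum is never regained, so the wait never returns. A minimal Python sketch of the same wait pattern (hypothetical names, not Ceph code):

```python
import threading

class SaferCond:
    """Rough analogue of Ceph's C_SaferCond: a one-shot completion
    that wait() blocks on until complete() is called."""
    def __init__(self):
        self._cv = threading.Condition()
        self._done = False

    def complete(self):
        with self._cv:
            self._done = True
            self._cv.notify_all()

    def wait(self, timeout=None):
        # C_SaferCond::wait() passes no timeout: if the peer never
        # replies (e.g. quorum lost), the caller blocks forever.
        with self._cv:
            return self._cv.wait_for(lambda: self._done, timeout)

cond = SaferCond()
# Nobody calls complete() -- mirrors the mons never acking set_store.
# A bounded wait at least lets the caller observe the stall:
assert cond.wait(timeout=0.1) is False
cond.complete()
assert cond.wait(timeout=0.1) is True
```

This is only an illustration of the blocking pattern, not a proposed fix; the underlying problem is the stale monmap entry.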
The monmap still lists the old mon:
$ ceph daemon mon.a mon_status
  "monmap": {
    "epoch": 2,
    "fsid": "a9df56ad-29b0-4bf4-8e47-a35c7657b332",
    "modified": "2020-04-23T13:26:56.292827Z",
    "created": "2020-04-23T11:14:26.191548Z",
    "min_mon_release": 15,
    "min_mon_release_name": "octopus",
    "features": {
      "persistent": [
        "kraken",
        "luminous",
        "mimic",
        "osdmap-prune",
        "nautilus",
        "octopus"
      ],
      "optional": []
    },
    "mons": [
      {
        "rank": 0,
        "name": "a",
        "public_addrs": {
          "addrvec": [
            { "type": "v2", "addr": "172.17.0.1:40843", "nonce": 0 },
            { "type": "v1", "addr": "172.17.0.1:40844", "nonce": 0 }
          ]
        },
        "addr": "172.17.0.1:40844/0",
        "public_addr": "172.17.0.1:40844/0",
        "priority": 0,
        "weight": 0
      },
      {
        "rank": 1,
        "name": "ubuntu",
        "public_addrs": {
          "addrvec": [
            { "type": "v2", "addr": "127.0.0.1:3300", "nonce": 0 },
            { "type": "v1", "addr": "127.0.0.1:6789", "nonce": 0 }
          ]
        },
        "addr": "127.0.0.1:6789/0",
        "public_addr": "127.0.0.1:6789/0",
        "priority": 0,
        "weight": 0
      }
    ]
  },
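The inconsistency is that `ceph orch ps` reports no daemons while the monmap still contains mon.ubuntu. A hedged sketch of spotting such stale entries by diffing the monmap against what the orchestrator reports (the abbreviated JSON and the `deployed` set are taken from the output above):

```python
import json

# Abbreviated monmap from the mon_status output above; in practice this
# would come from `ceph daemon mon.a mon_status` or `ceph mon dump -f json`.
monmap = json.loads("""
{
  "epoch": 2,
  "mons": [
    {"rank": 0, "name": "a",      "addr": "172.17.0.1:40844/0"},
    {"rank": 1, "name": "ubuntu", "addr": "127.0.0.1:6789/0"}
  ]
}
""")

# mon daemons the orchestrator actually reports as running
# (`ceph orch ps` said "No daemons reported" for ubuntu; mon.a is non-cephadm).
deployed = {"a"}

stale = [m["name"] for m in monmap["mons"] if m["name"] not in deployed]
print(stale)  # -> ['ubuntu']: removed from the host, still in the monmap
```

With a single surviving mon out of two monmap entries, quorum cannot form, which is consistent with every subsequent mon-touching command hanging.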
Related issues
History
#1 Updated by Sebastian Wagner almost 4 years ago
- Description updated (diff)
#2 Updated by Sebastian Wagner almost 4 years ago
- Related to Bug #45167: cephadm: mons are not properly deployed added
#3 Updated by Sebastian Wagner over 3 years ago
- Priority changed from Normal to Low
#4 Updated by Sebastian Wagner over 3 years ago
- Status changed from New to Can't reproduce