Bug #45235

Updated by Sebastian Wagner almost 4 years ago

<pre> 
 $ ceph orch daemon rm mon.ubuntu --force                                                                                                                  
 Removed mon.ubuntu from host 'ubuntu' 
 $ ceph orch ps 
 No daemons reported 
 </pre> 

 Suddenly, everything hangs: 

 <pre> 
 $ ceph orch apply mon --placement 'ubuntu:127.0.0.1' 
 ^CCluster connection aborted 
 $ ceph -s                     
 ^CCluster connection aborted 
 </pre> 

 gdb revealed that cephadm hangs in set_store: 

 <pre> 
 Thread 40 (Thread 0x7f8917ec4700 (LWP 23648)): 
 #0    0x00007f894179b9f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f8917ec0400) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 
 #1    __pthread_cond_wait_common (abstime=0x0, mutex=0x7f8917ec03b0, cond=0x7f8917ec03d8) at pthread_cond_wait.c:502 
 #2    __pthread_cond_wait (cond=0x7f8917ec03d8, mutex=0x7f8917ec03b0) at pthread_cond_wait.c:655 
 #3    0x00007f8941274f0c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 
 #4    0x0000561b001cd1ee in std::condition_variable::wait<C_SaferCond::wait()::{lambda()#1}>(std::unique_lock<std::mutex>&, C_SaferCond::wait()::{lambda()#1}) (this=0x7f8917ec03d8, __lock=..., __p=...) at /usr/include/c++/9/condition_variable:101 
 #5    0x0000561b001c91ae in C_SaferCond::wait (this=0x7f8917ec03a8) at /home/sebastian/Repos/ceph/src/common/Cond.h:100 
 #6    0x0000561b001c9a82 in Command::wait (this=0x7f8917ec03a0) at /home/sebastian/Repos/ceph/src/mgr/MgrContext.h:39 
 #7    0x0000561b001c188a in ActivePyModules::set_store (this=0x561b03e5e000, module_name="cephadm", key="host.ubuntu", val=...) at /home/sebastian/Repos/ceph/src/mgr/ActivePyModules.cc:679 
 #8    0x0000561b001e738a in ceph_store_set (self=0x7f89217685e8,  
     args=('host.ubuntu', '{"daemons": {}, "devices": [{"rejected_reasons": ["locked"], "available": false, "path": "/dev/nvme0n1", "sys_api": {"removable": "0", "ro": "0", "vendor": "", "model": "KXG50ZNV512G NVMe TOSHIBA 512GB", "rev": "", "sas_address": "", "sas_device_handle": "", "support_discard": "512", "rotational": "0", "nr_requests": "1023", "scheduler_mode": "none", "partitions": {"nvme0n1p3": {"start": "1288192", "sectors": "998926336", "sectorsize": 512, "size": 511450284032.0, "human_readable_size": "476.33 GB", "holders": []}, "nvme0n1p1": {"start": "2048", "sectors": "1024000", "sectorsize": 512, "size": 524288000.0, "human_readable_size": "500.00 MB", "holders": []}, "nvme0n1p2": {"start": "1026048", "sectors": "262144", "sectorsize": 512, "size": 134217728.0, "human_readable_size": "128.00 MB", "holders": []}}, "sectors": 0, "sectorsize": "512", "size": 512110190592.0, "human_readable_size": "476.94 GB", "path": "/dev/nvme0n1", "locked": 1}, "lvs": [], "human_readable_type": "ssd", "device_id":')) at /home/sebastian/Repos/ceph/src/mgr/BaseMgrModule.cc:511 
 #9    0x00007f8941e3077b in PyCFunction_Call (func=<unknown at remote 0x7f8917ec0400>, args=<unknown at remote 0x80>, kwds=0x0) at ../Objects/methodobject.c:103 
 #10 0x0000000000000002 in ?? () 
 #11 0x0000000000000000 in ?? () 
 </pre> 
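The interesting frame is C_SaferCond::wait: an untimed condition-variable wait that only returns once the mon command completes, so if the reply never arrives the mgr thread is parked forever. A minimal Python sketch of that pattern (the SaferCond class and the timeout parameter here are illustrative stand-ins, not Ceph's actual API, which takes no timeout):

```python
import threading

class SaferCond:
    """Stand-in for Ceph's C_SaferCond: block until complete() fires."""
    def __init__(self):
        self._done = threading.Event()
        self._rval = None

    def complete(self, rval):
        # Called by the reply path; wakes the waiter.
        self._rval = rval
        self._done.set()

    def wait(self, timeout=None):
        # C_SaferCond::wait() has no timeout: if the mon reply never
        # arrives (e.g. no quorum), the caller blocks indefinitely.
        if not self._done.wait(timeout):
            raise TimeoutError("mon command never completed")
        return self._rval

# With a responsive mon, complete() unblocks the waiter:
cond = SaferCond()
threading.Timer(0.05, cond.complete, args=(0,)).start()
print(cond.wait(timeout=1))  # -> 0

# Without a reply, only the (hypothetical) timeout would save us:
stuck = SaferCond()
try:
    stuck.wait(timeout=0.05)
except TimeoutError as exc:
    print(exc)
```

That matches the backtrace above: set_store sends a mon command and waits on exactly such a condition, with nothing left to call complete().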

 The monmap still lists the old mon: 

 <pre> 
 $ ceph daemon mon.a mon_status 
     "monmap": { 
         "epoch": 2, 
         "fsid": "a9df56ad-29b0-4bf4-8e47-a35c7657b332", 
         "modified": "2020-04-23T13:26:56.292827Z", 
         "created": "2020-04-23T11:14:26.191548Z", 
         "min_mon_release": 15, 
         "min_mon_release_name": "octopus", 
         "features": { 
             "persistent": [ 
                 "kraken", 
                 "luminous", 
                 "mimic", 
                 "osdmap-prune", 
                 "nautilus", 
                 "octopus" 
             ], 
             "optional": [] 
         }, 
         "mons": [ 
             { 
                 "rank": 0, 
                 "name": "a", 
                 "public_addrs": { 
                     "addrvec": [ 
                         { 
                             "type": "v2", 
                             "addr": "172.17.0.1:40843", 
                             "nonce": 0 
                         }, 
                         { 
                             "type": "v1", 
                             "addr": "172.17.0.1:40844", 
                             "nonce": 0 
                         } 
                     ] 
                 }, 
                 "addr": "172.17.0.1:40844/0", 
                 "public_addr": "172.17.0.1:40844/0", 
                 "priority": 0, 
                 "weight": 0 
             }, 
             { 
                 "rank": 1, 
                 "name": "ubuntu", 
                 "public_addrs": { 
                     "addrvec": [ 
                         { 
                             "type": "v2", 
                             "addr": "127.0.0.1:3300", 
                             "nonce": 0 
                         }, 
                         { 
                             "type": "v1", 
                             "addr": "127.0.0.1:6789", 
                             "nonce": 0 
                         } 
                     ] 
                 }, 
                 "addr": "127.0.0.1:6789/0", 
                 "public_addr": "127.0.0.1:6789/0", 
                 "priority": 0, 
                 "weight": 0 
             } 
         ] 
     }, 

 </pre>
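This would explain the hang: the monmap (epoch 2) still lists two mons, but only mon.a is actually running, so a majority can never form and every mon-bound command blocks. A quick sketch of the quorum arithmetic (illustrative, not Ceph code):

```python
def monmap_majority(num_mons: int) -> int:
    """Minimum number of mons needed for quorum: floor(n/2) + 1."""
    return num_mons // 2 + 1

# The monmap above lists two mons, but only mon.a survives:
in_map, alive = 2, 1
print(monmap_majority(in_map))           # -> 2: quorum needs both mons
print(alive >= monmap_majority(in_map))  # -> False: no quorum, commands hang
```

So removing the daemon with --force tore down the container without the mon ever leaving the monmap, and the one remaining mon cannot reach quorum on its own.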
