Bug #51077

Updated by Patrick Donnelly almost 3 years ago

I'm using ceph v16.2.4 deployed with cephadm/docker. 
 When I try to mount the CephFS from a client, all three monitor containers crash. 

 The cephfs (and the client for it) were created with the following commands (--force because bareos_backups_data is an erasure-coded pool): 

 <pre> 
 # ceph fs new bareos_backups bareos_backups_metadata bareos_backups_data --force 
 # ceph fs authorize bareos_backups client.bareos_backups / rw 
 </pre> 
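 For context, the pools were set up as a replicated metadata pool plus an EC data pool. The sketch below shows how such pools are typically created; the pg counts and the default erasure-code profile are assumptions, not necessarily the exact commands used here: 

 <pre> 
 # typical setup of the pools used above (sketch; pg counts/profile are assumptions) 
 ceph osd pool create bareos_backups_metadata 32 
 ceph osd pool create bareos_backups_data 64 64 erasure 
 ceph osd pool set bareos_backups_data allow_ec_overwrites true 
 </pre> 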


 Configuration of the Ubuntu 20.04.2 client: 
 Ceph versions: 
 <pre> 
 # dpkg -l | grep ceph | awk '{ print $2," ",$3, " ",$4}' 
 ceph-common     15.2.11-0ubuntu0.20.04.2     amd64 
 libcephfs2     15.2.11-0ubuntu0.20.04.2     amd64 
 python3-ceph-argparse     15.2.11-0ubuntu0.20.04.2     amd64 
 python3-ceph-common     15.2.11-0ubuntu0.20.04.2     all 
 python3-cephfs     15.2.11-0ubuntu0.20.04.2     amd64 
 </pre> 
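 Note that the fstab ceph mount goes through the in-kernel CephFS client (the ceph kernel module), not libcephfs2, so the kernel version matters as well. A quick sketch of how to capture that side: 

 <pre> 
 # kernel-side client info (the fstab mount uses the in-kernel ceph module, not libcephfs2) 
 uname -r 
 modinfo ceph | head 
 </pre> 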

 Fstab entry (I used only one monitor IP for the test, but all of the monitors crash regardless): 

 <pre> 
 100.90.1.13:/ /mnt/ceph ceph name=bareos_backups,secretfile=/etc/ceph.secret,noatime,_netdev 0 0 
 </pre> 
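 The same mount can also be attempted by hand outside of fstab; a sketch of the equivalent invocation, where /etc/ceph.secret holds the bare key for client.bareos_backups: 

 <pre> 
 # equivalent manual mount (sketch); copy the key output into /etc/ceph.secret first 
 ceph auth get-key client.bareos_backups 
 mount -t ceph 100.90.1.13:/ /mnt/ceph -o name=bareos_backups,secretfile=/etc/ceph.secret,noatime 
 </pre> 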

 When I attempt to run "mount /mnt/ceph", I see the following messages in the client dmesg: 

 <pre> 
 [13436.808890] libceph: mon0 (1)100.90.1.13:6789 session established 
 [13436.810034] libceph: mon0 (1)100.90.1.13:6789 socket closed (con state OPEN) 
 [13436.810055] libceph: mon0 (1)100.90.1.13:6789 session lost, hunting for new mon 
 [13436.816322] libceph: mon2 (1)100.90.1.14:6789 session established 
 [13437.367487] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state OPEN) 
 [13437.367520] libceph: mon2 (1)100.90.1.14:6789 session lost, hunting for new mon 
 [13437.389389] libceph: mon0 (1)100.90.1.12:6789 session established 
 [13438.129616] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state OPEN) 
 [13438.129667] libceph: mon0 (1)100.90.1.12:6789 session lost, hunting for new mon 
 [13444.450124] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state CONNECTING) 
 [13445.410163] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state CONNECTING) 
 [13446.402105] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state CONNECTING) 
 [13448.418115] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state CONNECTING) 
 [13452.647592] libceph: mon2 (1)100.90.1.14:6789 session established 
 [13452.841658] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state OPEN) 
 [13452.841694] libceph: mon2 (1)100.90.1.14:6789 session lost, hunting for new mon 
 [13452.848163] libceph: mon0 (1)100.90.1.12:6789 session established 
 [13453.139576] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state OPEN) 
 [13453.139614] libceph: mon0 (1)100.90.1.12:6789 session lost, hunting for new mon 
 [13453.145211] libceph: mon1 (1)100.90.1.13:6789 session established 
 [13453.585151] libceph: mon1 (1)100.90.1.13:6789 socket closed (con state OPEN) 
 [13453.585185] libceph: mon1 (1)100.90.1.13:6789 session lost, hunting for new mon 
 [13453.586192] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state CONNECTING) 
 [13454.402183] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state CONNECTING) 
 [13455.426124] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state CONNECTING) 
 [13457.410047] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state CONNECTING) 
 [13461.601997] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state CONNECTING) 
 [13465.447114] libceph: mon1 (1)100.90.1.13:6789 session established 
 [13465.624148] libceph: mon1 (1)100.90.1.13:6789 socket closed (con state OPEN) 
 [13465.624172] libceph: mon1 (1)100.90.1.13:6789 session lost, hunting for new mon 
 [13479.809892] libceph: mon2 (1)100.90.1.14:6789 session established 
 [13480.009943] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state OPEN) 
 [13480.009989] libceph: mon2 (1)100.90.1.14:6789 session lost, hunting for new mon 
 [13486.020207] libceph: mon1 (1)100.90.1.13:6789 socket closed (con state OPEN) 
 [13496.928447] ceph: No mds server is up or the cluster is laggy 
 </pre> 
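 On the cluster side, these are the status commands I can run to capture the fs/MDS state around the time of the failure, if that output would help (sketch): 

 <pre> 
 # cluster-side state around the time of the crash (sketch) 
 ceph -s 
 ceph fs status bareos_backups 
 ceph fs dump 
 ceph versions 
 </pre> 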

 


 At the same time, all monitor containers crash with the following message: 

 <pre> 
 debug        0> 2021-06-03T12:14:28.190+0000 7fb14dc17700 -1 *** Caught signal (Aborted) ** 
  in thread 7fb14dc17700 thread_name:ms_dispatch 

  ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable) 
  1: /lib64/libpthread.so.0(+0x12b20) [0x7fb15928ab20] 
  2: gsignal() 
  3: abort() 
  4: /lib64/libstdc++.so.6(+0x9009b) [0x7fb1588a809b] 
  5: /lib64/libstdc++.so.6(+0x9653c) [0x7fb1588ae53c] 
  6: /lib64/libstdc++.so.6(+0x96597) [0x7fb1588ae597] 
  7: /lib64/libstdc++.so.6(+0x967f8) [0x7fb1588ae7f8] 
  8: /lib64/libstdc++.so.6(+0x92045) [0x7fb1588aa045] 
  9: /usr/bin/ceph-mon(+0x4d8da6) [0x559d953eada6] 
  10: (MDSMonitor::check_sub(Subscription*)+0x819) [0x559d953e1329] 
  11: (Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xcd8) [0x559d951d3258] 
  12: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d) [0x559d951f92ed] 
  13: (Monitor::_ms_dispatch(Message*)+0x670) [0x559d951fa910] 
  14: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x559d95228fdc] 
  15: (DispatchQueue::entry()+0x126a) [0x7fb15b9cab1a] 
  16: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fb15ba7ab71] 
  17: /lib64/libpthread.so.0(+0x814a) [0x7fb15928014a] 
  18: clone() 
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 

 --- logging levels --- 
    0/ 5 none 
    0/ 1 lockdep 
    0/ 1 context 
    1/ 1 crush 
    1/ 5 mds 
    1/ 5 mds_balancer 
    1/ 5 mds_locker 
    1/ 5 mds_log 
    1/ 5 mds_log_expire 
    1/ 5 mds_migrator 
    0/ 1 buffer 
    0/ 1 timer 
    0/ 1 filer 
    0/ 1 striper 
    0/ 1 objecter 
    0/ 5 rados 
    0/ 5 rbd 
    0/ 5 rbd_mirror 
    0/ 5 rbd_replay 
    0/ 5 rbd_pwl 
    0/ 5 journaler 
    0/ 5 objectcacher 
    0/ 5 immutable_obj_cache 
    0/ 5 client 
    1/ 5 osd 
    0/ 5 optracker 
    0/ 5 objclass 
    1/ 3 filestore 
    1/ 3 journal 
    0/ 0 ms 
    1/ 5 mon 
    0/10 monc 
    1/ 5 paxos 
    0/ 5 tp 
    1/ 5 auth 
    1/ 5 crypto 
    1/ 1 finisher 
    1/ 1 reserver 
    1/ 5 heartbeatmap 
    1/ 5 perfcounter 
    1/ 5 rgw 
    1/ 5 rgw_sync 
    1/10 civetweb 
    1/ 5 javaclient 
    1/ 5 asok 
    1/ 1 throttle 
    0/ 0 refs 
    1/ 5 compressor 
    1/ 5 bluestore 
    1/ 5 bluefs 
    1/ 3 bdev 
    1/ 5 kstore 
    4/ 5 rocksdb 
    4/ 5 leveldb 
    4/ 5 memdb 
    1/ 5 fuse 
    1/ 5 mgr 
    1/ 5 mgrc 
    1/ 5 dpdk 
    1/ 5 eventtrace 
    1/ 5 prioritycache 
    0/ 5 test 
    0/ 5 cephfs_mirror 
    0/ 5 cephsqlite 
   -2/-2 (syslog threshold) 
   99/99 (stderr threshold) 
 --- pthread ID / name mapping for recent threads --- 
   140399414929152 / rstore_compact 
   140399431714560 / ms_dispatch 
   140399448499968 / rocksdb:dump_st 
   140399473678080 / msgr-worker-0 
   140399490463488 / ms_dispatch 
   140399532427008 / safe_timer 
   140399582783232 / rocksdb:high0 
   140399591175936 / rocksdb:low0 
   max_recent       10000 
   max_new          10000 
   log_file /var/lib/ceph/crash/2021-06-03T12:14:28.191940Z_c1fbea06-3d75-4053-9c28-24de6ab45fd5/log 
 --- end dump of recent events --- 
 </pre> 
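 The crash is also recorded by the crash module; if the full report is useful, it can be pulled roughly as follows (the crash ID argument is a placeholder taken from the listing): 

 <pre> 
 # retrieve the recorded crash report (sketch; <crash-id> is a placeholder from `ceph crash ls`) 
 ceph crash ls 
 ceph crash info <crash-id> 
 </pre> 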

 



 I'd be glad to provide more info or perform other tests if needed.
