Bug #51077 (closed)

MDSMonitor: crash when attempting to mount cephfs

Added by Stanislav Datskevych almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Urgent
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: pacific
Regression: No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm using ceph v16.2.4 deployed with cephadm/docker.
When I try to mount the cephfs from a client, all three monitor containers crash.

The cephfs (and the client for it) were created using the following commands:
  1. ceph fs new bareos_backups bareos_backups_metadata bareos_backups_data --force (--force is needed because the pool bareos_backups_data is an EC pool)
  2. ceph fs authorize bareos_backups client.bareos_backups / rw
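
For completeness, here is a minimal sketch of the full pool and filesystem setup, assuming the data pool is erasure-coded with overwrites enabled. The pool-creation commands, PG counts and the 4+2 EC profile are illustrative assumptions, not copied from my shell history:

# Assumed setup sketch -- the profile name "bareos_ec" and k=4,m=2 are hypothetical
ceph osd erasure-code-profile set bareos_ec k=4 m=2
ceph osd pool create bareos_backups_metadata 32 32 replicated
ceph osd pool create bareos_backups_data 64 64 erasure bareos_ec
# CephFS can only use an EC data pool if overwrites are enabled on it
ceph osd pool set bareos_backups_data allow_ec_overwrites true
# --force is required because the data pool is erasure-coded
ceph fs new bareos_backups bareos_backups_metadata bareos_backups_data --force
ceph fs authorize bareos_backups client.bareos_backups / rw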

The configuration of the Ubuntu 20.04.2 client:
Ceph versions:

# dpkg -l | grep ceph | awk '{ print $2," ",$3, " ",$4}'
ceph-common   15.2.11-0ubuntu0.20.04.2   amd64
libcephfs2   15.2.11-0ubuntu0.20.04.2   amd64
python3-ceph-argparse   15.2.11-0ubuntu0.20.04.2   amd64
python3-ceph-common   15.2.11-0ubuntu0.20.04.2   all
python3-cephfs   15.2.11-0ubuntu0.20.04.2   amd64

Fstab (I only used one of the monitor IPs for the test, but all of the monitors crash regardless of which one is used):

100.90.1.13:/ /mnt/ceph ceph name=bareos_backups,secretfile=/etc/ceph.secret,noatime,_netdev 0 0
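
The equivalent one-off mount command (a sketch, assuming the same secret file and mount point as in the fstab entry) would be:

mount -t ceph 100.90.1.13:6789:/ /mnt/ceph -o name=bareos_backups,secretfile=/etc/ceph.secret,noatime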

When I attempt to run "mount /mnt/ceph" I see the following messages in the client dmesg:

[13436.808890] libceph: mon0 (1)100.90.1.13:6789 session established
[13436.810034] libceph: mon0 (1)100.90.1.13:6789 socket closed (con state OPEN)
[13436.810055] libceph: mon0 (1)100.90.1.13:6789 session lost, hunting for new mon
[13436.816322] libceph: mon2 (1)100.90.1.14:6789 session established
[13437.367487] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state OPEN)
[13437.367520] libceph: mon2 (1)100.90.1.14:6789 session lost, hunting for new mon
[13437.389389] libceph: mon0 (1)100.90.1.12:6789 session established
[13438.129616] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state OPEN)
[13438.129667] libceph: mon0 (1)100.90.1.12:6789 session lost, hunting for new mon
[13444.450124] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state CONNECTING)
[13445.410163] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state CONNECTING)
[13446.402105] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state CONNECTING)
[13448.418115] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state CONNECTING)
[13452.647592] libceph: mon2 (1)100.90.1.14:6789 session established
[13452.841658] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state OPEN)
[13452.841694] libceph: mon2 (1)100.90.1.14:6789 session lost, hunting for new mon
[13452.848163] libceph: mon0 (1)100.90.1.12:6789 session established
[13453.139576] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state OPEN)
[13453.139614] libceph: mon0 (1)100.90.1.12:6789 session lost, hunting for new mon
[13453.145211] libceph: mon1 (1)100.90.1.13:6789 session established
[13453.585151] libceph: mon1 (1)100.90.1.13:6789 socket closed (con state OPEN)
[13453.585185] libceph: mon1 (1)100.90.1.13:6789 session lost, hunting for new mon
[13453.586192] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state CONNECTING)
[13454.402183] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state CONNECTING)
[13455.426124] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state CONNECTING)
[13457.410047] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state CONNECTING)
[13461.601997] libceph: mon0 (1)100.90.1.12:6789 socket closed (con state CONNECTING)
[13465.447114] libceph: mon1 (1)100.90.1.13:6789 session established
[13465.624148] libceph: mon1 (1)100.90.1.13:6789 socket closed (con state OPEN)
[13465.624172] libceph: mon1 (1)100.90.1.13:6789 session lost, hunting for new mon
[13479.809892] libceph: mon2 (1)100.90.1.14:6789 session established
[13480.009943] libceph: mon2 (1)100.90.1.14:6789 socket closed (con state OPEN)
[13480.009989] libceph: mon2 (1)100.90.1.14:6789 session lost, hunting for new mon
[13486.020207] libceph: mon1 (1)100.90.1.13:6789 socket closed (con state OPEN)
[13496.928447] ceph: No mds server is up or the cluster is laggy

At the same time, all monitor containers crash with the following message:

debug      0> 2021-06-03T12:14:28.190+0000 7fb14dc17700 -1 *** Caught signal (Aborted) **
 in thread 7fb14dc17700 thread_name:ms_dispatch

 ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12b20) [0x7fb15928ab20]
 2: gsignal()
 3: abort()
 4: /lib64/libstdc++.so.6(+0x9009b) [0x7fb1588a809b]
 5: /lib64/libstdc++.so.6(+0x9653c) [0x7fb1588ae53c]
 6: /lib64/libstdc++.so.6(+0x96597) [0x7fb1588ae597]
 7: /lib64/libstdc++.so.6(+0x967f8) [0x7fb1588ae7f8]
 8: /lib64/libstdc++.so.6(+0x92045) [0x7fb1588aa045]
 9: /usr/bin/ceph-mon(+0x4d8da6) [0x559d953eada6]
 10: (MDSMonitor::check_sub(Subscription*)+0x819) [0x559d953e1329]
 11: (Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xcd8) [0x559d951d3258]
 12: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d) [0x559d951f92ed]
 13: (Monitor::_ms_dispatch(Message*)+0x670) [0x559d951fa910]
 14: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x559d95228fdc]
 15: (DispatchQueue::entry()+0x126a) [0x7fb15b9cab1a]
 16: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fb15ba7ab71]
 17: /lib64/libpthread.so.0(+0x814a) [0x7fb15928014a]
 18: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 rbd_pwl
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 immutable_obj_cache
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
   0/ 5 cephfs_mirror
   0/ 5 cephsqlite
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
  140399414929152 / rstore_compact
  140399431714560 / ms_dispatch
  140399448499968 / rocksdb:dump_st
  140399473678080 / msgr-worker-0
  140399490463488 / ms_dispatch
  140399532427008 / safe_timer
  140399582783232 / rocksdb:high0
  140399591175936 / rocksdb:low0
  max_recent     10000
  max_new        10000
  log_file /var/lib/ceph/crash/2021-06-03T12:14:28.191940Z_c1fbea06-3d75-4053-9c28-24de6ab45fd5/log
--- end dump of recent events ---
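
If it helps, the crash metadata should also be retrievable from the cluster's crash module. A sketch, assuming the crash ID matches the directory name in the log_file path above:

# List recently reported crashes, then dump the metadata for this one
ceph crash ls
ceph crash info 2021-06-03T12:14:28.191940Z_c1fbea06-3d75-4053-9c28-24de6ab45fd5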

I'd be glad to provide more info or perform other tests if it's needed.


Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #51286: pacific: MDSMonitor: crash when attempting to mount cephfs (Resolved, Patrick Donnelly)