Project

General

Profile

Actions

Bug #58217

open

mon crash when attempting to mount cephfs

Added by hailu huang over 1 year ago. Updated 11 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Monitor
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-ansible
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

OS: ubunut22.04 tls
kern: 5.15.0-53-generic
ceph: 17.2.0 (apt install ceph-17.2.0-0ubuntu0.22.04.2)

the problem is similar to https://tracker.ceph.com/issues/51077 , but i can't solve.

my ceph cluster status:

root@ceph01:~# ceph -s 
  cluster:
    id:     31aafcbd-9e0f-4b83-9dad-147092b5a0e2
    health: HEALTH_WARN
            85 daemons have recently crashed

  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 4h)
    mgr: ceph01(active, since 7d), standbys: ceph02, ceph03
    mds: 1/1 daemons up, 2 standby
    osd: 36 osds: 36 up (since 25h), 36 in (since 3w)

  data:
    volumes: 1/1 healthy
    pools:   7 pools, 169 pgs
    objects: 650.94k objects, 3.7 TiB
    usage:   11 TiB used, 123 TiB / 134 TiB avail
    pgs:     169 active+clean

  io:
    client:   2.4 MiB/s rd, 11 MiB/s wr, 187 op/s rd, 425 op/s wr

root@ceph01:~# ceph fs ls
name: instance, metadata pool: instance_metadata, data pools: [instance_data ]

keying:

root@ceph01:~# ceph auth get client.instance
[client.instance]
    key = AQDTQYhjHxMwFxAAt1mke9tz2cpcAL8t+s0TZg==
    caps mds = "allow rw fsname=instance" 
    caps mon = "allow r fsname=instance" 
    caps osd = "allow rw tag cephfs data=instance" 
exported keyring for client.instance

my operation:

root@compute01:~# mount -t ceph instance@.instance=/ /mnt -o mon_addr=172.28.20.101:6789,secret=AQDTQYhjHxMwFxAAt1mke9tz2cpcAL8t+s0TZg==
mount error: no mds server is up or the cluster is laggy

client_node kern.log:

Dec  7 14:34:25 compute01 kernel: [1028635.737354] libceph: mon1 (1)172.28.20.102:6789 session established
Dec  7 14:34:25 compute01 kernel: [1028635.740938] libceph: client2981042 fsid 31aafcbd-9e0f-4b83-9dad-147092b5a0e2
Dec  7 14:34:25 compute01 kernel: [1028636.056235] libceph: mon1 (1)172.28.20.102:6789 socket closed (con state OPEN)
Dec  7 14:34:25 compute01 kernel: [1028636.056258] libceph: mon1 (1)172.28.20.102:6789 session lost, hunting for new mon
Dec  7 14:34:40 compute01 kernel: [1028650.780437] libceph: mon0 (1)172.28.20.101:6789 session established
Dec  7 14:34:40 compute01 kernel: [1028650.945041] libceph: mon0 (1)172.28.20.101:6789 socket closed (con state OPEN)
Dec  7 14:34:40 compute01 kernel: [1028650.945060] libceph: mon0 (1)172.28.20.101:6789 session lost, hunting for new mon
Dec  7 14:34:45 compute01 kernel: [1028656.557101] libceph: mon1 (1)172.28.20.102:6789 socket closed (con state OPEN)
Dec  7 14:35:04 compute01 kernel: [1028675.487212] libceph: mon0 (1)172.28.20.101:6789 session established
Dec  7 14:35:05 compute01 kernel: [1028675.649419] libceph: mon0 (1)172.28.20.101:6789 socket closed (con state OPEN)
Dec  7 14:35:05 compute01 kernel: [1028675.649466] libceph: mon0 (1)172.28.20.101:6789 session lost, hunting for new mon
Dec  7 14:35:05 compute01 kernel: [1028675.651955] libceph: mon2 (1)172.28.20.103:6789 session established
Dec  7 14:35:05 compute01 kernel: [1028675.819309] libceph: mon2 (1)172.28.20.103:6789 socket closed (con state OPEN)
Dec  7 14:35:05 compute01 kernel: [1028675.819347] libceph: mon2 (1)172.28.20.103:6789 session lost, hunting for new mon
Dec  7 14:35:05 compute01 kernel: [1028675.819499] libceph: mon0 (1)172.28.20.101:6789 socket closed (con state V1_BANNER)
Dec  7 14:35:05 compute01 kernel: [1028676.072497] libceph: mon0 (1)172.28.20.101:6789 socket closed (con state V1_BANNER)
Dec  7 14:35:05 compute01 kernel: [1028676.576644] libceph: mon0 (1)172.28.20.101:6789 socket closed (con state V1_BANNER)
Dec  7 14:35:06 compute01 kernel: [1028677.596565] libceph: mon0 (1)172.28.20.101:6789 socket closed (con state V1_BANNER)
Dec  7 14:35:09 compute01 kernel: [1028679.836608] libceph: mon0 (1)172.28.20.101:6789 socket closed (con state V1_BANNER)
Dec  7 14:35:13 compute01 kernel: [1028684.192490] libceph: mon0 (1)172.28.20.101:6789 socket closed (con state V1_BANNER)
Dec  7 14:35:21 compute01 kernel: [1028692.127143] libceph: mon0 (1)172.28.20.101:6789 session established
Dec  7 14:35:21 compute01 kernel: [1028692.280205] libceph: mon0 (1)172.28.20.101:6789 socket closed (con state OPEN)
Dec  7 14:35:21 compute01 kernel: [1028692.280268] libceph: mon0 (1)172.28.20.101:6789 session lost, hunting for new mon
Dec  7 14:35:25 compute01 kernel: [1028695.964239] ceph: No mds server is up or the cluster is laggy

ceph-mon.log:

Standby daemons:

[mds.ceph03{-1:1039214} state up:standby seq 1 addr [v2:172.28.20.103:6848/819823202,v1:172.28.20.103:6849/819823202] compat {c=[1],r=[1],i=[7ff]}]
[mds.ceph02{-1:1059471} state up:standby seq 1 addr [v2:172.28.20.102:6848/3629915987,v1:172.28.20.102:6849/3629915987] compat {c=[1],r=[1],i=[7ff]}]

   -76> 2022-12-01T17:09:02.288+0800 7f2221064980  0 mon.ceph01@-1(???).osd e2756 crush map has features 3314933000854323200, adjusting msgr requires
   -75> 2022-12-01T17:09:02.288+0800 7f2221064980  0 mon.ceph01@-1(???).osd e2756 crush map has features 432629239337189376, adjusting msgr requires
   -74> 2022-12-01T17:09:02.288+0800 7f2221064980  0 mon.ceph01@-1(???).osd e2756 crush map has features 432629239337189376, adjusting msgr requires
   -73> 2022-12-01T17:09:02.288+0800 7f2221064980  0 mon.ceph01@-1(???).osd e2756 crush map has features 432629239337189376, adjusting msgr requires
   -72> 2022-12-01T17:09:02.288+0800 7f2221064980  1 mon.ceph01@-1(???).paxosservice(auth 503..550) refresh upgraded, format 0 -> 3
   -71> 2022-12-01T17:09:02.288+0800 7f2221064980  4 mon.ceph01@-1(???).mgr e0 loading version 134
   -70> 2022-12-01T17:09:02.292+0800 7f2221064980  4 mon.ceph01@-1(???).mgr e134 active server: [v2:172.28.20.101:6804/40020,v1:172.28.20.101:6808/40020](1054101)
   -69> 2022-12-01T17:09:02.292+0800 7f2221064980  4 mon.ceph01@-1(???).mgr e134 mkfs or daemon transitioned to available, loading commands
   -68> 2022-12-01T17:09:02.292+0800 7f2221064980  4 set_mon_vals no callback set
   -67> 2022-12-01T17:09:02.292+0800 7f2221064980 10 set_mon_vals auth_allow_insecure_global_id_reclaim = false
   -66> 2022-12-01T17:09:02.292+0800 7f2221064980  4 set_mon_vals no callback set
   -65> 2022-12-01T17:09:02.292+0800 7f2221064980  2 auth: KeyRing::load: loaded key file /var/lib/ceph/mon/ceph-ceph01/keyring
   -64> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command compact hook 0x55e49ce57070
   -63> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command smart hook 0x55e49ce57070
   -62> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command mon_status hook 0x55e49ce57070
   -61> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command heap hook 0x55e49ce57070
   -60> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command connection scores dump hook 0x55e49ce57070
   -59> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command connection scores reset hook 0x55e49ce57070
   -58> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command sync_force hook 0x55e49ce57070
   -57> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command add_bootstrap_peer_hint hook 0x55e49ce57070
   -56> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command add_bootstrap_peer_hintv hook 0x55e49ce57070
   -55> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command quorum enter hook 0x55e49ce57070
   -54> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command quorum exit hook 0x55e49ce57070
   -53> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command ops hook 0x55e49ce57070
   -52> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command sessions hook 0x55e49ce57070
   -51> 2022-12-01T17:09:02.292+0800 7f2221064980  5 asok(0x55e49cba00f0) register_command dump_historic_ops hook 0x55e49ce57070
   -50> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding auth protocol: cephx
   -49> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding auth protocol: cephx
   -48> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding auth protocol: cephx
   -47> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding con mode: secure
   -46> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding con mode: crc
   -45> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding con mode: secure
   -44> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding con mode: crc
   -43> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding con mode: secure
   -42> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding con mode: crc
   -41> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding con mode: crc
   -40> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding con mode: secure
   -39> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding con mode: crc
   -38> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding con mode: secure
   -37> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding con mode: crc
   -36> 2022-12-01T17:09:02.292+0800 7f2221064980  5 AuthRegistry(0x55e49cdef490) adding con mode: secure
   -35> 2022-12-01T17:09:02.292+0800 7f2221064980  2 auth: KeyRing::load: loaded key file /var/lib/ceph/mon/ceph-ceph01/keyring
   -34> 2022-12-01T17:09:02.292+0800 7f2221064980  2 mon.ceph01@-1(???) e4 init
   -33> 2022-12-01T17:09:02.296+0800 7f2221064980  4 mgrc handle_mgr_map Got map version 134
   -32> 2022-12-01T17:09:02.296+0800 7f2221064980  4 mgrc handle_mgr_map Active mgr is now [v2:172.28.20.101:6804/40020,v1:172.28.20.101:6808/40020]
   -31> 2022-12-01T17:09:02.296+0800 7f2221064980  4 mgrc reconnect Starting new session with [v2:172.28.20.101:6804/40020,v1:172.28.20.101:6808/40020]
   -30> 2022-12-01T17:09:02.296+0800 7f2221064980  0 mon.ceph01@-1(probing) e4  my rank is now 0 (was -1)
   -29> 2022-12-01T17:09:02.312+0800 7f2208ff9640  5 mon.ceph01@0(probing) e4 _ms_dispatch setting monitor caps on this connection
   -28> 2022-12-01T17:09:02.312+0800 7f2200ff9640  4 mgrc handle_mgr_configure stats_period=5
   -27> 2022-12-01T17:09:02.312+0800 7f2200ff9640  4 mgrc handle_mgr_configure updated stats threshold: 5
   -26> 2022-12-01T17:09:02.312+0800 7f2208ff9640  0 log_channel(cluster) log [INF] : mon.ceph01 calling monitor election
   -25> 2022-12-01T17:09:02.312+0800 7f2208ff9640 10 log_client _send_to_mon log to self
   -24> 2022-12-01T17:09:02.312+0800 7f2208ff9640 10 log_client  log_queue is 1 last_log 1 sent 0 num 1 unsent 1 sending 1
   -23> 2022-12-01T17:09:02.312+0800 7f2208ff9640 10 log_client  will send 2022-12-01T17:09:02.316047+0800 mon.ceph01 (mon.0) 1 : cluster [INF] mon.ceph01 calling monitor election
   -22> 2022-12-01T17:09:02.312+0800 7f2208ff9640  5 paxos.0).electionLogic(0) start -- can i be leader?
   -21> 2022-12-01T17:09:02.312+0800 7f2208ff9640  1 paxos.0).electionLogic(164) init, last seen epoch 164
   -20> 2022-12-01T17:09:02.332+0800 7f2208ff9640  5 mon.ceph01@0(electing) e4 _ms_dispatch setting monitor caps on this connection
   -19> 2022-12-01T17:09:02.332+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.335937+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
   -18> 2022-12-01T17:09:02.332+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.335963+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
   -17> 2022-12-01T17:09:02.332+0800 7f2208ff9640  5 mon.ceph01@0(electing).elector(165) handle_propose from mon.1
   -16> 2022-12-01T17:09:02.336+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.339202+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
   -15> 2022-12-01T17:09:02.336+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.339228+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
   -14> 2022-12-01T17:09:02.336+0800 7f2208ff9640  5 mon.ceph01@0(electing).elector(165) handle_ack from mon.1
   -13> 2022-12-01T17:09:02.336+0800 7f2208ff9640  5 mon.ceph01@0(electing).elector(165)  so far i have { mon.0: features 4540138303579357183 mon_feature_t([kraken,luminous,mimic,osdmap-prune,nautilus,octopus,pacific,elector-pinging,quincy]) }
   -12> 2022-12-01T17:09:02.496+0800 7f2208ff9640  5 mon.ceph01@0(electing) e4 _ms_dispatch setting monitor caps on this connection
   -11> 2022-12-01T17:09:02.500+0800 7f2208ff9640  5 mon.ceph01@0(electing).elector(165) handle_propose from mon.2
   -10> 2022-12-01T17:09:02.504+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.507038+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
    -9> 2022-12-01T17:09:02.504+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.507075+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
    -8> 2022-12-01T17:09:02.512+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.517128+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
    -7> 2022-12-01T17:09:02.512+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.517161+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
    -6> 2022-12-01T17:09:02.516+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.520651+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
    -5> 2022-12-01T17:09:02.516+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.520685+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
    -4> 2022-12-01T17:09:02.540+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.542810+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
    -3> 2022-12-01T17:09:02.540+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.542846+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
    -2> 2022-12-01T17:09:02.540+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.542870+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
    -1> 2022-12-01T17:09:02.540+0800 7f2208ff9640  5 mon.ceph01@0(electing).paxos(paxos recovering c 1667896..1668556) is_readable = 0 - now=2022-12-01T17:09:02.542880+0800 lease_expire=1970-01-01T08:00:00.000000+0800 has v0 lc 1668556
     0> 2022-12-01T17:09:02.592+0800 7f2208ff9640 -1 *** Caught signal (Segmentation fault) **
 in thread 7f2208ff9640 thread_name:ms_dispatch

 ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)
 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f2221797520]
 2: (std::_Rb_tree_increment(std::_Rb_tree_node_base*)+0x24) [0x7f2221b4a544]
 3: (FSMap::filter(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+0x17c) [0x55e49b651c3c]
 4: (MDSMonitor::check_sub(Subscription*)+0x82) [0x55e49b82c682]
 5: (Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0x831) [0x55e49b61ba81]
 6: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x66e) [0x55e49b63a3fe]
 7: (Monitor::_ms_dispatch(Message*)+0x3fa) [0x55e49b63b3ba]
 8: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x4b) [0x55e49b66a2bb]
 9: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x450) [0x7f22221af270]
 10: (DispatchQueue::entry()+0x5ff) [0x7f22221ac65f]
 11: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f222226ee71]
 12: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7f22217e9b43]
 13: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7f222187ba00]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.  ()

Actions #1

Updated by Ilya Dryomov 11 months ago

  • Target version deleted (v17.2.6)
Actions

Also available in: Atom PDF