Bug #20535

mds segmentation fault ceph_lock_state_t::get_overlapping_locks

Added by Webert Lima almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: jewel
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature:

Description

The active MDS crashes, all clients freeze, and the standby (or standby-replay) daemon takes hours to recover.

ceph version 10.2.2-1-gad1a6d7 (ad1a6d77eda05dd1a0190814022e73a56f630117)
1: (()+0x4ec572) [0x563d0e1de572]
2: (()+0x10340) [0x7f7023876340]
3: (ceph_lock_state_t::get_overlapping_locks(ceph_filelock const&, std::list<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> >, std::allocator<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> > > >&, std::list<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> >, std::allocator<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> > > >*)+0x21) [0x563d0e301c51]
4: (ceph_lock_state_t::is_deadlock(ceph_filelock const&, std::list<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> >, std::allocator<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> > > >&, ceph_filelock const*, unsigned int)+0x655) [0x563d0e3034b5]
5: (ceph_lock_state_t::add_lock(ceph_filelock&, bool, bool, bool*)+0x3da) [0x563d0e306b8a]
6: (Server::handle_client_file_setlock(std::shared_ptr<MDRequestImpl>&)+0x409) [0x563d0df58d19]
7: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0x9a1) [0x563d0df84e91]
8: (Server::handle_client_request(MClientRequest*)+0x47f) [0x563d0df854ef]
9: (Server::dispatch(Message*)+0x3bb) [0x563d0df8973b]
10: (MDSRank::handle_deferrable_message(Message*)+0x80c) [0x563d0df0fc0c]
11: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x563d0df18d01]
12: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x563d0df19e55]
13: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x563d0df01c23]
14: (DispatchQueue::entry()+0x78b) [0x563d0e3c705b]
15: (DispatchQueue::DispatchThread::entry()+0xd) [0x563d0e2b5e0d]
16: (()+0x8182) [0x7f702386e182]
17: (clone()+0x6d) [0x7f7021dc547d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Usage scenario:

  • Single Active MDS
  • Single FS
  • Multiple Mountpoints
  • 2 Subtrees
  • Used for Dovecot LMTP/IMAP/POP3

~# ceph mds dump
dumped fsmap epoch 18434
fs_name cephfs
epoch 18434
flags 0
created 2016-08-01 11:07:47.592124
modified 2017-07-03 10:32:44.426431
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 12530
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
max_mds 1
in 0
up {0=1042087}
failed
damaged
stopped
data_pools 8,9
metadata_pool 7
inline_data disabled
1042087: 10.0.2.3:6800/28835 'c' mds.0.18417 up:active seq 289417 (standby for rank 0)
1382729: 10.0.2.4:6800/22945 'd' mds.0.0 up:standby-replay seq 379 (standby for rank 0)

~# ceph -v
ceph version 10.2.2-1-gad1a6d7 (ad1a6d77eda05dd1a0190814022e73a56f630117)

~# ceph osd pool ls detail
pool 5 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 3051 lfor 1736 flags hashpspool tiers 6 read_tier 6 write_tier 6 stripe_width 0
removed_snaps [1~3]
pool 6 'rbd_cache' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 12440 flags hashpspool,incomplete_clones tier_of 5 cache_mode writeback hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 1200s x0 decay_rate 0 search_last_n 0 stripe_width 0
removed_snaps [1~3]
pool 7 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 9597 flags hashpspool stripe_width 0
pool 8 'cephfs_data_ssd' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 9599 flags hashpspool stripe_width 0
pool 9 'cephfs_data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 9617 lfor 9610 flags hashpspool crash_replay_interval 45 tiers 10 read_tier 10 write_tier 10 stripe_width 0
pool 10 'cephfs_data_cache' replicated size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 128 pgp_num 128 last_change 12510 flags hashpspool,incomplete_clones tier_of 9 cache_mode writeback target_bytes 268435456000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 0s x0 decay_rate 0 search_last_n 0 stripe_width 0

~# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data_ssd cephfs_data ]

~# getfattr -d -m ceph.dir.* /srv/dovecot2/mail
getfattr: Removing leading '/' from absolute path names
# file: srv/dovecot2/mail
    ceph.dir.entries="1613"
    ceph.dir.files="0"
    ceph.dir.rbytes="18757341556329"
    ceph.dir.rctime="1499371837.09279502014"
    ceph.dir.rentries="5048251"
    ceph.dir.rfiles="4449946"
    ceph.dir.rsubdirs="598305"
    ceph.dir.subdirs="1613"
~# getfattr -d -m ceph.dir.* /srv/dovecot2/index
getfattr: Removing leading '/' from absolute path names
# file: srv/dovecot2/index
    ceph.dir.entries="1584"
    ceph.dir.files="0"
    ceph.dir.rbytes="52658042908"
    ceph.dir.rctime="1499371833.09531511882"
    ceph.dir.rentries="927423"
    ceph.dir.rfiles="582449"
    ceph.dir.rsubdirs="344974"
    ceph.dir.subdirs="1584"

ceph-mds.tgz (490 KB) Webert Lima, 07/06/2017 08:19 PM


Related issues

Copied to fs - Backport #20564: jewel: mds segmentation fault ceph_lock_state_t::get_overlapping_locks Resolved

History

#1 Updated by Zheng Yan almost 3 years ago

I recently found a bug that could explain this crash.

https://github.com/ceph/ceph/pull/15440

commit 0d71c6120e61f31b803c3fb6488fc7e97134e348
Author: Yan, Zheng <zyan@redhat.com>
Date:   Sat Jun 3 12:06:10 2017 +0800

    mds/flock: properly remove item from global_waiting_locks

    ceph_lock_state_t::remove_waiting() uses wrong key to search
    global_waiting_locks. It should use item in waiting_locks as
    key.

    Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
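Below is a simplified model of the bug described in the commit message. It is written in Python rather than Ceph's C++ and is not Ceph's code. The names FileLock and LockState and the helper names are hypothetical. A plain list compared by full equality stands in for the global_waiting_locks multimap. For illustration, the model also assumes that a cancel request matches stored waiters by start offset and owner only, so its other fields can differ from the stored lock.

The model shows how searching global_waiting_locks with the caller's lock, instead of the stored waiting_locks item, leaves a stale entry behind. A stale entry that still points at a lock state which no longer exists is one way a later deadlock check (is_deadlock calling get_overlapping_locks, as in the backtrace) could touch freed memory.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for ceph_filelock (not Ceph's real struct).
FileLock = namedtuple("FileLock", "start length owner type")

# Global registry of waiting locks: (lock, owning state) pairs.
# Lookups compare the whole lock for equality (stand-in for the multimap key).
global_waiting_locks = []


class LockState:
    """Toy model of a per-inode lock state (hypothetical names)."""

    def __init__(self):
        self.waiting_locks = []  # locks waiting on this inode

    def add_waiting(self, fl):
        self.waiting_locks.append(fl)
        global_waiting_locks.append((fl, self))

    def _remove_global_waiting(self, key):
        # Only an entry whose lock equals `key` exactly, and which belongs
        # to this state, is removed.
        for i, (lock, state) in enumerate(global_waiting_locks):
            if lock == key and state is self:
                del global_waiting_locks[i]
                return

    def remove_waiting(self, fl, fixed):
        # `fl` is the caller's request; it matches stored waiters only by
        # start offset and owner, so its other fields may differ.
        for item in [w for w in self.waiting_locks
                     if w.start == fl.start and w.owner == fl.owner]:
            # Buggy: search with the caller's lock. Fixed: use the stored item.
            self._remove_global_waiting(item if fixed else fl)
            self.waiting_locks.remove(item)


def stale_entries_after_cancel(fixed):
    """Add one waiter, cancel it, and count what is left in the global list."""
    global_waiting_locks.clear()
    state = LockState()
    state.add_waiting(FileLock(start=0, length=100, owner=42, type="write"))
    # Cancel request: same start and owner, but length/type not filled in.
    state.remove_waiting(FileLock(start=0, length=0, owner=42, type=None), fixed)
    return len(global_waiting_locks)


print("buggy:", stale_entries_after_cancel(fixed=False))  # buggy: 1
print("fixed:", stale_entries_after_cancel(fixed=True))   # fixed: 0
```

With the fix, the global entry is found because the stored item is exactly the key that was inserted.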

Regarding the slow recovery: are there "MDS health message (mds.0): Behind on trimming (xxx/100)" warnings in the cluster log? How many caps do the cephfs clients hold (ceph daemon mds.x session ls)?

#2 Updated by Webert Lima almost 3 years ago

Running it today on the same cluster that crashed (client usage is about the same as at the time of the crash):

~# ceph daemon mds.c session ls
[ {
"id": 889364,
"num_leases": 302,
"num_caps": 170838,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.889364 10.0.2.192:0\/77167272",
"client_metadata": {
"entity_id": "",
"hostname": "bhs1-mail02-fe01.m9.network",
"kernel_version": "4.4.0-22-generic"
}
}, {
"id": 864144,
"num_leases": 0,
"num_caps": 18778,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.864144 10.0.2.1:0\/1943224316",
"client_metadata": {
"entity_id": "",
"hostname": "bhs1-mail02-ds01.m9.network",
"kernel_version": "4.4.0-22-generic"
}
}, {
"id": 1030234,
"num_leases": 325,
"num_caps": 183941,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.1030234 10.0.2.200:0\/3543349573",
"client_metadata": {
"entity_id": "",
"hostname": "bhs1-mail00-fe02.m9.network",
"kernel_version": "4.4.0-24-generic"
}
}, {
"id": 945213,
"num_leases": 234,
"num_caps": 138801,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.945213 10.0.2.193:0\/3058601307",
"client_metadata": {
"entity_id": "",
"hostname": "bhs1-mail02-fe02.m9.network",
"kernel_version": "4.4.0-22-generic"
}
}, {
"id": 1029565,
"num_leases": 0,
"num_caps": 2588906,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.1029565 10.0.2.19:0\/1447852959",
"client_metadata": {
"entity_id": "",
"hostname": "bhs1-mail02-ds05.m9.network",
"kernel_version": "4.4.0-62-generic"
}
}
]
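Zheng's question about caps can be answered by totalling num_caps per client host from the session ls output. The following is a minimal Python sketch. It assumes the JSON layout shown above: a list of sessions, each with num_caps and client_metadata.hostname. The sample string is a trimmed copy of that output. On a live cluster it would instead come from running ceph daemon mds.<name> session ls on the active MDS host. The function name is illustrative.

```python
import json

# Trimmed copy of the `ceph daemon mds.c session ls` output above
# (only the fields this sketch uses).
SESSION_LS = """
[
  {"id": 889364, "num_caps": 170838,
   "client_metadata": {"hostname": "bhs1-mail02-fe01.m9.network"}},
  {"id": 864144, "num_caps": 18778,
   "client_metadata": {"hostname": "bhs1-mail02-ds01.m9.network"}},
  {"id": 1030234, "num_caps": 183941,
   "client_metadata": {"hostname": "bhs1-mail00-fe02.m9.network"}},
  {"id": 945213, "num_caps": 138801,
   "client_metadata": {"hostname": "bhs1-mail02-fe02.m9.network"}},
  {"id": 1029565, "num_caps": 2588906,
   "client_metadata": {"hostname": "bhs1-mail02-ds05.m9.network"}}
]
"""


def caps_by_host(session_ls_json):
    """Return {hostname: num_caps}, summing if a host has several sessions."""
    totals = {}
    for session in json.loads(session_ls_json):
        host = session.get("client_metadata", {}).get("hostname", "?")
        totals[host] = totals.get(host, 0) + session["num_caps"]
    return totals


caps = caps_by_host(SESSION_LS)
total = sum(caps.values())
# Print hosts from most to fewest caps, with each host's share of the total.
for host, n in sorted(caps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{host:32} {n:>9} {100 * n / total:5.1f}%")
print(f"{'total':32} {total:>9}")
```

On this data the ds05 session holds about 83.5% of all caps (2,588,906 of 3,101,264).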

#3 Updated by Webert Lima almost 3 years ago

There is something interesting here. The host "bhs1-mail02-ds05.m9.network" has the highest number of caps (over 2.5M) but nothing on this host is actually using cephfs. It's just mounted there.

#4 Updated by Zheng Yan almost 3 years ago

{
"id": 1029565,
"num_leases": 0,
"num_caps": 2588906,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.1029565 10.0.2.19:0\/1447852959",
"client_metadata": {
"entity_id": "",
"hostname": "bhs1-mail02-ds05.m9.network",
"kernel_version": "4.4.0-62-generic"
}

2588906 is quite large. If I am right, the recovering MDS was stuck in the rejoin state for a long time.

Please check how many inodes and open files there are on each machine:
sudo cat /proc/sys/fs/file-nr
sudo cat /proc/sys/fs/inode-nr

If the actual number of open files is not that large, we can do some quick optimization. Otherwise, we need to add a new mechanism to track open files for quick recovery.
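Both files Zheng asks about are plain, whitespace-separated counters. According to the Linux sysctl documentation, file-nr holds three values: the number of allocated file handles, the number of allocated but unused handles (reported as 0 on 2.6 and later kernels), and the maximum. inode-nr holds the number of allocated inodes and the number of free inodes.

The small Python sketch below parses both files. The function names are illustrative. Treating "allocated minus free" as "in use" is an interpretation, not something the kernel reports directly. The sample values are the ones reported for bhs1-mail02-ds05 in the reply below.

```python
def parse_file_nr(text):
    """/proc/sys/fs/file-nr: allocated handles, free allocated handles, max."""
    allocated, free, maximum = (int(x) for x in text.split())
    return {"allocated": allocated, "free": free, "max": maximum,
            "in_use": allocated - free}


def parse_inode_nr(text):
    """/proc/sys/fs/inode-nr: allocated inodes, free inodes."""
    nr_inodes, nr_free = (int(x) for x in text.split())
    return {"nr_inodes": nr_inodes, "nr_free": nr_free,
            "in_use": nr_inodes - nr_free}


def read_host_counts(proc_fs="/proc/sys/fs"):
    """Read both files on the local machine (Linux only)."""
    with open(f"{proc_fs}/file-nr") as f_file, open(f"{proc_fs}/inode-nr") as f_inode:
        return parse_file_nr(f_file.read()), parse_inode_nr(f_inode.read())


# Values reported for bhs1-mail02-ds05.
files = parse_file_nr("3360 0 3273932")
inodes = parse_inode_nr("4961157 916551")
print(files["in_use"], inodes["in_use"])  # 3360 4044606
```

On ds05 this gives 3,360 allocated file handles against roughly 4.04 million in-use inodes.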

#5 Updated by Webert Lima almost 3 years ago

root@bhs1-mail02-ds05:~# cat /proc/sys/fs/file-nr
3360 0 3273932
root@bhs1-mail02-ds05:~# cat /proc/sys/fs/inode-nr
4961157 916551

bhs1-mail02-ds05.m9.network used to run an instance of dovecot too, so cephfs is mounted there, but dovecot hasn't been running there since last week.

#6 Updated by Zheng Yan almost 3 years ago

Webert Lima wrote:

root@bhs1-mail02-ds05:~# cat /proc/sys/fs/file-nr
3360 0 3273932
root@bhs1-mail02-ds05:~# cat /proc/sys/fs/inode-nr
4961157 916551

How about the other machines?

bhs1-mail02-ds05.m9.network used to run an instance of dovecot too, so cephfs is mounted there, but dovecot hasn't been running there since last week.

It's better to unmount it, or run 'echo 3 > /proc/sys/vm/drop_caches'.

#7 Updated by Webert Lima almost 3 years ago

Zheng Yan wrote:

How about the other machines?

Sorry, I didn't check at the time. I'll post everything as it looks today.

It's better to unmount it, or run 'echo 3 > /proc/sys/vm/drop_caches'.

When I went to unmount it, I saw that the caps on that host had dropped to 3. I unmounted it anyway.

These are all the hosts in this cluster today:

root@bhs1-mail00-fe02:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
355409 2659
88544 0 3274282
root@bhs1-mail02-fe01:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
423591 9083
101152 0 3274269
root@bhs1-mail02-fe02:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
401796 3688
102208 0 3274268
root@bhs1-mail02-ds01:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
1767614 1389515
4224 0 3274268
root@bhs1-mail02-ds02:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
2545540 1547328
4032 0 3274268
root@bhs1-mail02-ds03:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
379577 162589
4576 0 3274268
root@bhs1-mail02-ds04:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
497047 203917
6912 0 3274268
root@bhs1-mail02-ds05:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
456680 423484
3520 0 3273932

Hosts with cephfs mounted:

bhs1-mail00-fe02 (used by dovecot)
bhs1-mail02-fe01 (used by dovecot)
bhs1-mail02-fe02 (used by dovecot)
bhs1-mail02-ds01 (used by custom script)

And these are from another, nearly identical cluster:

root@bhs1-mail00-fe01:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
96268 221
7648 0 3274282

root@bhs1-mail01-fe01:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
597439 200002
53824 0 3274268
root@bhs1-mail01-fe02:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
346847 15250
81600 0 3274268
root@bhs1-mail01-ds01:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
344420 252542
4064 0 3274267
root@bhs1-mail01-ds02:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
100984 47608
3936 0 3274268
root@bhs1-mail01-ds03:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
281508 80696
7008 0 3274268
root@bhs1-mail01-ds04:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
274520 77608
4032 0 3274268
root@bhs1-mail01-ds05:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
184926 25445
108544 0 3273933

Hosts with cephfs mounted:

bhs1-mail00-fe01 (mounted but not used)
bhs1-mail01-fe01 (mounted but not used)
bhs1-mail01-fe02 (used by dovecot)
bhs1-mail01-ds01 (used by custom script)
bhs1-mail01-ds05 (used by dovecot)

#8 Updated by Webert Lima almost 3 years ago

And this is the current session ls output from the active MDS of each of the two clusters:

root@bhs1-mail01-ds02:~# ceph daemon mds.b session ls | grep "num_caps\|hostname" 
"num_caps": 1429,
"hostname": "bhs1-mail01-fe01.m9.network",
"num_caps": 100868,
"hostname": "bhs1-mail01-ds05",
"num_caps": 1776,
"hostname": "bhs1-mail01-ds01",
"num_caps": 3,
"hostname": "bhs1-mail00-fe01",
"num_caps": 154609,
"hostname": "bhs1-mail01-fe02",
root@bhs1-mail02-ds03:~# ceph daemon mds.c session ls | grep "num_caps\|hostname" 
"num_caps": 204144,
"hostname": "bhs1-mail02-fe01",
"num_caps": 2788,
"hostname": "bhs1-mail02-ds01",
"num_caps": 198677,
"hostname": "bhs1-mail00-fe02",
"num_caps": 173802,
"hostname": "bhs1-mail02-fe02",

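The grep-filtered listing loses the JSON structure, so caps and hostnames have to be re-paired by position. The following is a Python sketch of that pairing. It assumes, as in this output, that each session's num_caps line comes before its hostname line. Parsing the full JSON that session ls emits avoids relying on that order. The sample string is the mds.c listing above, and the function name is illustrative.

```python
# Output of: ceph daemon mds.c session ls | grep "num_caps\|hostname"
GREP_OUTPUT = '''
"num_caps": 204144,
"hostname": "bhs1-mail02-fe01",
"num_caps": 2788,
"hostname": "bhs1-mail02-ds01",
"num_caps": 198677,
"hostname": "bhs1-mail00-fe02",
"num_caps": 173802,
"hostname": "bhs1-mail02-fe02",
'''


def pair_caps_with_hosts(text):
    """Pair each "num_caps" line with the "hostname" line that follows it."""
    pairs, caps = [], None
    for line in text.splitlines():
        line = line.strip().rstrip(",")
        if not line:
            continue
        key, _, value = line.partition(":")  # split at the first colon only
        key = key.strip('"')
        value = value.strip().strip('"')
        if key == "num_caps":
            caps = int(value)
        elif key == "hostname" and caps is not None:
            pairs.append((value, caps))
            caps = None
    return pairs


pairs = pair_caps_with_hosts(GREP_OUTPUT)
print(sum(n for _, n in pairs))  # 579411
```

For mds.c the four sessions add up to 579,411 caps.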
#9 Updated by Patrick Donnelly almost 3 years ago

  • Status changed from New to Closed

Okay, it appears the deadlock is fixed. Please open a new ticket if you're still seeing issues with rejoin taking unreasonable amounts of time.

#10 Updated by Webert Lima almost 3 years ago

Patrick Donnelly wrote:

Okay, it appears the deadlock is fixed.

I'm sorry, are you referring to that commit? If so, is there a schedule for it to land in an LTS version?

Please open a new ticket if you're still seeing issues with rejoin taking unreasonable amounts of time.

I'm pretty sure the crash will happen again; this was about the 7th time in the last 2 months.
Any suggestions for preventing the MDS crash?

#11 Updated by Patrick Donnelly almost 3 years ago

  • Status changed from Closed to In Progress
  • Assignee set to Zheng Yan

Zheng, PR #15440 indicates it's a multi-MDS fix, but Webert's setup is single-MDS. Do you see any issues with backporting the fix?

I'm sorry, are you referring to that commit? If so, is there a schedule for it to land in an LTS version?

It will be in the imminent Luminous release, but I now think it should also be backported to jewel/kraken if feasible, since this bug apparently affects single-active-MDS setups.

Any suggestions for preventing the MDS crash?

You can backport the fix yourself, wait for the backport (which could happen after Luminous) or for Luminous itself, or update to the latest Luminous RC if you feel comfortable doing that.

#12 Updated by Nathan Cutler almost 3 years ago

it should be backported to jewel/kraken if feasible

Consider backporting to jewel only, since kraken goes EOL the moment Luminous is declared stable.

In other words, there will be an 11.2.1, but I doubt there will be an 11.2.2.

#13 Updated by Webert Lima almost 3 years ago

Patrick Donnelly wrote:

Zheng, PR #15440 indicates it's a multi-MDS fix, but Webert's setup is single-MDS. Do you see any issues with backporting the fix?

You're right, it's a single-active-MDS setup. I used to run active/standby pairs and now I run active/standby-replay pairs.

You can backport the fix yourself, wait for the backport (which could happen after Luminous) or for Luminous itself, or update to the latest Luminous RC if you feel comfortable doing that.

I'm not really comfortable doing an upgrade like that, as service and data availability are very critical here. I'd be more comfortable upgrading to the latest Jewel version and backporting the fix, if that doesn't break anything else. Could you point me in the right direction for doing the backport myself?

Note: our current version is a backport you did for us last year because of this issue: http://tracker.ceph.com/issues/15920

#14 Updated by Patrick Donnelly almost 3 years ago

I'm not really comfortable doing an upgrade like that, as service and data availability are very critical here. I'd be more comfortable upgrading to the latest Jewel version and backporting the fix, if that doesn't break anything else. Could you point me in the right direction for doing the backport myself?

I went ahead and did it since our CI will build the repos for you: https://shaman.ceph.com/builds/ceph/i20535-backport/2223e478c4b770e75cb7db196f5cd9d985929ac9/default/55553/

PR: https://github.com/ceph/ceph/pull/16248

#15 Updated by Patrick Donnelly almost 3 years ago

  • Status changed from In Progress to Pending Backport

#16 Updated by Webert Lima almost 3 years ago

Patrick Donnelly wrote:

I went ahead and did it since our CI will build the repos for you: https://shaman.ceph.com/builds/ceph/i20535-backport/2223e478c4b770e75cb7db196f5cd9d985929ac9/default/55553/

Oh, I really appreciate that! I see DIST=xenial; will that run on trusty too?

#17 Updated by Patrick Donnelly almost 3 years ago

I meant to link: https://shaman.ceph.com/builds/ceph/i20535-backport/2223e478c4b770e75cb7db196f5cd9d985929ac9/

Looks like something went wrong in the build. I'll take a look.

#18 Updated by Webert Lima almost 3 years ago

Awesome! Thanks!

#20 Updated by Nathan Cutler almost 3 years ago

  • Backport set to jewel

#21 Updated by Nathan Cutler almost 3 years ago

  • Copied to Backport #20564: jewel: mds segmentation fault ceph_lock_state_t::get_overlapping_locks added

#22 Updated by Webert Lima almost 3 years ago

Patrick Donnelly wrote:

New run: https://shaman.ceph.com/builds/ceph/i20535-backport/387e184970bc2949e16139db0cbda6acfa3f7b3a/

Thanks! It looks like it's built.
A couple of questions:
- Any recommendation for choosing between the notcmalloc and default flavors?
- Do I just download the packages at ubuntu/trusty/flavors/default/pool/main/c/ceph/ and install them (following some upgrade guide)?

#23 Updated by Patrick Donnelly almost 3 years ago

Webert Lima wrote:

A couple of questions:
- Any recommendation for choosing between the notcmalloc and default flavors?

Generally you want default (i.e. with tcmalloc).

- Do I just download the packages at ubuntu/trusty/flavors/default/pool/main/c/ceph/ and install them (following some upgrade guide)?

I think you should be able to add the keys according to this guide:

http://docs.ceph.com/docs/master/install/get-packages/

Then manually install the downloaded package (which I don't know how to do).

#24 Updated by Webert Lima almost 3 years ago

OK. I'll do it on two small clusters tomorrow and update the troublesome clusters next week.
Thanks a lot for that backport.

#25 Updated by Webert Lima almost 3 years ago

Does this built package include the fix for the MDS regression that was found in 10.2.8? I read about it in the mailing list.

#26 Updated by Patrick Donnelly almost 3 years ago

Webert Lima wrote:

Does this built package include the fix for the MDS regression that was found in 10.2.8? I read about it in the mailing list.

It does not include the fix. Do not use that branch. I'll make a note to update it...

#27 Updated by Webert Lima almost 3 years ago

Patrick Donnelly wrote:

It does not include the fix. Do not use that branch. I'll make a note to update it...

Thanks. I'll be waiting.

#29 Updated by Webert Lima almost 3 years ago

Patrick Donnelly wrote:

https://shaman.ceph.com/builds/ceph/i20535-backport-v10.2.9/

Thanks, I'll be upgrading all clusters by the end of next week.
Will that repo be permanently available or does it expire?

#30 Updated by Patrick Donnelly almost 3 years ago

It will expire in a week or two.

#31 Updated by Webert Lima almost 3 years ago

OK, I'll download them just in case.

#32 Updated by Patrick Donnelly almost 3 years ago

Any update?

#33 Updated by Webert Lima almost 3 years ago

Patrick Donnelly wrote:

Any update?

Hey Patrick, I upgraded one test cluster first, but it stays in HEALTH_WARN:

health HEALTH_WARN
all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set

#34 Updated by Patrick Donnelly almost 3 years ago

Webert Lima wrote:

Patrick Donnelly wrote:

Any update?

Hey Patrick, I upgraded one test cluster first, but it stays in HEALTH_WARN:

health HEALTH_WARN
all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set

See this announcement: http://ceph.com/geen-categorie/v10-2-4-jewel-released/

#35 Updated by Webert Lima almost 3 years ago

Patrick Donnelly wrote:

See this announcement: http://ceph.com/geen-categorie/v10-2-4-jewel-released/

Thank you, Patrick. So that was because I was running 10.2.2, and that flag didn't exist in that version.
The update on the test cluster went well. I've been cautious about it because I had never done a Ceph update before.

I have the update of the 3 production clusters scheduled for Saturday the 29th.
I was supposed to do it tomorrow, but that won't be possible due to other issues.

I'll keep you informed.

#36 Updated by Webert Lima almost 3 years ago

One of our production clusters upgraded.
Next one scheduled for next Wednesday, August 2nd.

#37 Updated by Webert Lima almost 3 years ago

Just upgraded the other 2 production clusters where the problem tends to happen frequently.
Will watch from now on.

#38 Updated by Patrick Donnelly almost 3 years ago

Webert Lima wrote:

Just upgraded the other 2 production clusters where the problem tends to happen frequently.
Will watch from now on.

Thanks for keeping us in the loop!

#39 Updated by Webert Lima almost 3 years ago

Reporting in: I've had the first incident since the version upgrade.

My active MDS committed suicide due to "dne in mds map" (this happens a lot, but I don't think it's related to the upgrade).
After recovery had completed, a new crash happened (I've never seen this one before):

2017-08-19 06:37:54.542005 7f9721654700  1 mds.0.1370 rejoin_joint_start
2017-08-19 06:37:54.542206 7f971ec4d700 1 heartbeat_map reset_timeout 'MDSRank' had timed out after 15
2017-08-19 07:27:07.427682 7f971cc49700 1 mds.0.1370 rejoin_done
2017-08-19 07:27:08.658514 7f9721654700 1 mds.0.1370 handle_mds_map i am now mds.0.1370
2017-08-19 07:27:08.658522 7f9721654700 1 mds.0.1370 handle_mds_map state change up:rejoin --> up:active
2017-08-19 07:27:08.658539 7f9721654700 1 mds.0.1370 recovery_done -- successful recovery!
2017-08-19 07:27:08.658709 7f9721654700 1 mds.0.1370 active_start
2017-08-19 07:27:11.154702 7f9721654700 -1 mds/Locker.cc: In function 'void Locker::file_recover(ScatterLock*)' thread 7f9721654700 time 2017-08-19 07:27:11.152839
mds/Locker.cc: 4924: FAILED assert(lock->get_state() == LOCK_PRE_SCAN)
ceph version 10.2.9-4-gbeaec39 (beaec397f00491079cd74f7b9e3e10660859e26b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x565179789adb]
2: (Locker::file_recover(ScatterLock*)+0x1c9) [0x565179523a19]
3: (MDCache::start_files_to_recover()+0xd3) [0x565179455813]
4: (MDSRank::active_start()+0x8e) [0x5651793bf56e]
5: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x10eb) [0x5651793d0f6b]
6: (MDSDaemon::handle_mds_map(MMDSMap*)+0xe07) [0x5651793aa557]
7: (MDSDaemon::handle_core_message(Message*)+0x78b) [0x5651793ab9fb]
8: (MDSDaemon::ms_dispatch(Message*)+0xab) [0x5651793abc5b]
9: (DispatchQueue::entry()+0x78b) [0x56517988668b]
10: (DispatchQueue::DispatchThread::entry()+0xd) [0x56517976f38d]
11: (()+0x8184) [0x7f972701c184]
12: (clone()+0x6d) [0x7f9725570bed]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

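The log timestamps show how long this MDS spent in the rejoin phase, which Zheng suspected was the slow part. The following is a short Python sketch that measures the gap between the rejoin_joint_start and rejoin_done lines. It assumes the log line layout shown here: date, time with microseconds, thread id, level, message. The event_time helper is hypothetical.

```python
from datetime import datetime

# Two lines from the MDS log above (thread id and level kept as logged).
LOG = """\
2017-08-19 06:37:54.542005 7f9721654700  1 mds.0.1370 rejoin_joint_start
2017-08-19 07:27:07.427682 7f971cc49700 1 mds.0.1370 rejoin_done
"""


def event_time(log, event):
    """Return the timestamp of the first log line whose text ends with `event`."""
    for line in log.splitlines():
        if line.rstrip().endswith(event):
            date, clock = line.split()[:2]
            return datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S.%f")
    raise ValueError(f"event {event!r} not found")


rejoin = event_time(LOG, "rejoin_done") - event_time(LOG, "rejoin_joint_start")
print(rejoin, rejoin.total_seconds())  # 0:49:12.885677 2952.885677
```

Here that is about 49 minutes, spent before the MDS went active and then hit the Locker::file_recover assert.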
#40 Updated by Patrick Donnelly almost 3 years ago

Webert Lima wrote:

My active MDS committed suicide due to "dne in mds map" (this happens a lot, but I don't think it's related to the upgrade).

Probably related to: http://tracker.ceph.com/issues/19706

After recovery had completed, a new crash happened (I've never seen this one before):

2017-08-19 06:37:54.542005 7f9721654700 1 mds.0.1370 rejoin_joint_start
2017-08-19 06:37:54.542206 7f971ec4d700 1 heartbeat_map reset_timeout 'MDSRank' had timed out after 15
2017-08-19 07:27:07.427682 7f971cc49700 1 mds.0.1370 rejoin_done
2017-08-19 07:27:08.658514 7f9721654700 1 mds.0.1370 handle_mds_map i am now mds.0.1370
2017-08-19 07:27:08.658522 7f9721654700 1 mds.0.1370 handle_mds_map state change up:rejoin --> up:active
2017-08-19 07:27:08.658539 7f9721654700 1 mds.0.1370 recovery_done -- successful recovery!
2017-08-19 07:27:08.658709 7f9721654700 1 mds.0.1370 active_start
2017-08-19 07:27:11.154702 7f9721654700 -1 mds/Locker.cc: In function 'void Locker::file_recover(ScatterLock*)' thread 7f9721654700 time 2017-08-19 07:27:11.152839
mds/Locker.cc: 4924: FAILED assert(lock->get_state() == LOCK_PRE_SCAN)
ceph version 10.2.9-4-gbeaec39 (beaec397f00491079cd74f7b9e3e10660859e26b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x565179789adb]
2: (Locker::file_recover(ScatterLock*)+0x1c9) [0x565179523a19]
3: (MDCache::start_files_to_recover()+0xd3) [0x565179455813]
4: (MDSRank::active_start()+0x8e) [0x5651793bf56e]
5: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x10eb) [0x5651793d0f6b]
6: (MDSDaemon::handle_mds_map(MMDSMap*)+0xe07) [0x5651793aa557]
7: (MDSDaemon::handle_core_message(Message*)+0x78b) [0x5651793ab9fb]
8: (MDSDaemon::ms_dispatch(Message*)+0xab) [0x5651793abc5b]
9: (DispatchQueue::entry()+0x78b) [0x56517988668b]
10: (DispatchQueue::DispatchThread::entry()+0xd) [0x56517976f38d]
11: (()+0x8184) [0x7f972701c184]
12: (clone()+0x6d) [0x7f9725570bed]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Please fork another issue for this one.

#41 Updated by Patrick Donnelly almost 3 years ago

  • Status changed from Pending Backport to Resolved

Webert, the backport is merged, so I'm marking this as resolved. If you experience this particular issue again, please reopen it. For any other new problems, please open a new tracker issue.

#42 Updated by Webert Lima over 2 years ago

Probably related to: http://tracker.ceph.com/issues/19706

I'll keep an eye on it. I suspect out-of-sync clocks may be causing it.

Please fork another issue for this one.

I'll do it if it happens again. It still hasn't.

Thanks a lot, Patrick and everyone involved.
