Bug #20535: mds segmentation fault ceph_lock_state_t::get_overlapping_locks - CephFS - Ceph

Bug #20535

closed

mds segmentation fault ceph_lock_state_t::get_overlapping_locks

Added by Webert Lima almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee: Zheng Yan
Category:
Target version:
% Done:
Source:
Tags:
Backport: jewel
Regression:
Severity: 1 - critical
Reviewed:
Affected Versions: Ceph - v10.2.2
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The Active MDS crashes, all clients freeze and the standby (or standby-replay) daemon takes hours to recover.

ceph version 10.2.2-1-gad1a6d7 (ad1a6d77eda05dd1a0190814022e73a56f630117)
 1: (()+0x4ec572) [0x563d0e1de572]
 2: (()+0x10340) [0x7f7023876340]
 3: (ceph_lock_state_t::get_overlapping_locks(ceph_filelock const&, std::list<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> >, std::allocator<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> > > >&, std::list<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> >, std::allocator<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> > > >)+0x21) [0x563d0e301c51]
 4: (ceph_lock_state_t::is_deadlock(ceph_filelock const&, std::list<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> >, std::allocator<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> > > >&, ceph_filelock const, unsigned int)+0x655) [0x563d0e3034b5]
 5: (ceph_lock_state_t::add_lock(ceph_filelock&, bool, bool, bool*)+0x3da) [0x563d0e306b8a]
 6: (Server::handle_client_file_setlock(std::shared_ptr<MDRequestImpl>&)+0x409) [0x563d0df58d19]
 7: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0x9a1) [0x563d0df84e91]
 8: (Server::handle_client_request(MClientRequest*)+0x47f) [0x563d0df854ef]
 9: (Server::dispatch(Message*)+0x3bb) [0x563d0df8973b]
 10: (MDSRank::handle_deferrable_message(Message*)+0x80c) [0x563d0df0fc0c]
 11: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x563d0df18d01]
 12: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x563d0df19e55]
 13: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x563d0df01c23]
 14: (DispatchQueue::entry()+0x78b) [0x563d0e3c705b]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x563d0e2b5e0d]
 16: (()+0x8182) [0x7f702386e182]
 17: (clone()+0x6d) [0x7f7021dc547d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Usage scenario:

Single Active MDS
Single FS
Multiple Mountpoints
2 Subtrees
Used for Dovecot LMTP/IMAP/POP3

~# ceph mds dump
dumped fsmap epoch 18434
fs_name cephfs
epoch 18434
flags 0
created 2016-08-01 11:07:47.592124
modified 2017-07-03 10:32:44.426431
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 12530
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
max_mds 1
in 0
up {0=1042087}
failed
damaged
stopped
data_pools 8,9
metadata_pool 7
inline_data disabled
1042087: 10.0.2.3:6800/28835 'c' mds.0.18417 up:active seq 289417 (standby for rank 0)
1382729: 10.0.2.4:6800/22945 'd' mds.0.0 up:standby-replay seq 379 (standby for rank 0)

~# ceph -v
ceph version 10.2.2-1-gad1a6d7 (ad1a6d77eda05dd1a0190814022e73a56f630117)

~# ceph osd pool ls detail
pool 5 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 3051 lfor 1736 flags hashpspool tiers 6 read_tier 6 write_tier 6 stripe_width 0
removed_snaps [1~3]
pool 6 'rbd_cache' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 12440 flags hashpspool,incomplete_clones tier_of 5 cache_mode writeback hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 1200s x0 decay_rate 0 search_last_n 0 stripe_width 0
removed_snaps [1~3]
pool 7 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 9597 flags hashpspool stripe_width 0
pool 8 'cephfs_data_ssd' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 9599 flags hashpspool stripe_width 0
pool 9 'cephfs_data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 9617 lfor 9610 flags hashpspool crash_replay_interval 45 tiers 10 read_tier 10 write_tier 10 stripe_width 0
pool 10 'cephfs_data_cache' replicated size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 128 pgp_num 128 last_change 12510 flags hashpspool,incomplete_clones tier_of 9 cache_mode writeback target_bytes 268435456000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 0s x0 decay_rate 0 search_last_n 0 stripe_width 0

~# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data_ssd cephfs_data ]

~# getfattr -d -m ceph.dir.* /srv/dovecot2/mail
getfattr: Removing leading '/' from absolute path names

file: srv/dovecot2/mail
ceph.dir.entries="1613"
ceph.dir.files="0"
ceph.dir.rbytes="18757341556329"
ceph.dir.rctime="1499371837.09279502014"
ceph.dir.rentries="5048251"
ceph.dir.rfiles="4449946"
ceph.dir.rsubdirs="598305"
ceph.dir.subdirs="1613"

~# getfattr -d -m ceph.dir.* /srv/dovecot2/index
getfattr: Removing leading '/' from absolute path names

file: srv/dovecot2/index
ceph.dir.entries="1584"
ceph.dir.files="0"
ceph.dir.rbytes="52658042908"
ceph.dir.rctime="1499371833.09531511882"
ceph.dir.rentries="927423"
ceph.dir.rfiles="582449"
ceph.dir.rsubdirs="344974"
ceph.dir.subdirs="1584"
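
To put the recursive statistics above into more readable units, here is a minimal Python sketch that parses `getfattr -d` output and converts `ceph.dir.rbytes` to TiB. The helper name `parse_getfattr` and the trimmed `SAMPLE` text are illustrative assumptions; only the attribute names and values come from the output above.

```python
import re


def parse_getfattr(text):
    """Collect ceph.dir.* attributes from `getfattr -d` output into a dict."""
    return dict(re.findall(r'^(ceph\.dir\.\w+)="([^"]*)"$', text, re.MULTILINE))


# Trimmed copy of the /srv/dovecot2/mail output shown above.
SAMPLE = '''file: srv/dovecot2/mail
ceph.dir.rbytes="18757341556329"
ceph.dir.rfiles="4449946"
'''

attrs = parse_getfattr(SAMPLE)
rbytes = int(attrs["ceph.dir.rbytes"])  # recursive size in bytes
rfiles = int(attrs["ceph.dir.rfiles"])  # recursive file count

print(f"{rbytes / 2**40:.2f} TiB in {rfiles} files")
print(f"average file size: {rbytes / rfiles / 1024:.0f} KiB")
```

For the mail tree this works out to roughly 17 TiB spread over about 4.4 million files.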

Files

ceph-mds.tgz (490 KB)

Webert Lima, 07/06/2017 08:19 PM

Related issues 1 (0 open — 1 closed)

Updated by Zheng Yan almost 7 years ago

I recently found a bug that can explain the crash.

https://github.com/ceph/ceph/pull/15440

commit 0d71c6120e61f31b803c3fb6488fc7e97134e348
Author: Yan, Zheng <zyan@redhat.com>
Date:   Sat Jun 3 12:06:10 2017 +0800

    mds/flock: properly remove item from global_waiting_locks

    ceph_lock_state_t::remove_waiting() uses wrong key to search
    global_waiting_locks. It should use item in waiting_locks as
    key.

    Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
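
To illustrate the class of bug the commit message describes, here is a toy Python model. It is not Ceph code: the class, method and field names are hypothetical, and the real structures are C++ containers inside the MDS. The sketch keeps a per-file waiting list plus a shared global index, and shows how searching the global index with the caller's lock, rather than with the item stored in the waiting list, can leave a stale entry behind. The specific mismatch used here (a different length) is only an assumption chosen to make the effect visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lock:
    """Hypothetical stand-in for a file lock record."""
    owner: int
    start: int
    length: int


class LockState:
    """Toy per-file lock state that shares one global index of waiters."""

    def __init__(self, global_waiting):
        self.waiting = {}                     # start offset -> stored Lock
        self.global_waiting = global_waiting  # stored Lock -> LockState

    def add_waiting(self, lock):
        self.waiting[lock.start] = lock
        self.global_waiting[lock] = self

    def remove_waiting_buggy(self, request):
        stored = self.waiting.pop(request.start, None)
        if stored is not None:
            # Wrong key: the caller's request may differ from the stored item.
            self.global_waiting.pop(request, None)

    def remove_waiting_fixed(self, request):
        stored = self.waiting.pop(request.start, None)
        if stored is not None:
            # Use the item found in the per-file list as the key.
            self.global_waiting.pop(stored, None)


def stale_entries(remove_method_name):
    """Return how many global entries remain after one add and one remove."""
    global_waiting = {}
    state = LockState(global_waiting)
    state.add_waiting(Lock(owner=1, start=0, length=10))
    # The removal request does not compare equal to the stored waiter.
    request = Lock(owner=1, start=0, length=0)
    getattr(state, remove_method_name)(request)
    return len(global_waiting)
```

In this model `stale_entries("remove_waiting_buggy")` leaves one entry in the global index, while `stale_entries("remove_waiting_fixed")` leaves none. In a C++ daemon a leftover entry of this kind could refer to lock state that no longer exists, which would be consistent with a crash during a later lookup, although this sketch does not reproduce that.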

As for the slow recovery: are there "MDS health message (mds.0): Behind on trimming (xxx/100)" warnings in the cluster log? How many caps do the cephfs clients use (ceph daemon mds.x session ls)?

Updated by Webert Lima almost 7 years ago

Running it today on the same cluster that crashed (client usage is about the same as it was at the time of the crash):

~# ceph daemon mds.c session ls
[ {
"id": 889364,
"num_leases": 302,
"num_caps": 170838,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.889364 10.0.2.192:0\/77167272",
"client_metadata": {
"entity_id": "",
"hostname": "bhs1-mail02-fe01.m9.network",
"kernel_version": "4.4.0-22-generic"
}
}, {
"id": 864144,
"num_leases": 0,
"num_caps": 18778,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.864144 10.0.2.1:0\/1943224316",
"client_metadata": {
"entity_id": "",
"hostname": "bhs1-mail02-ds01.m9.network",
"kernel_version": "4.4.0-22-generic"
}
}, {
"id": 1030234,
"num_leases": 325,
"num_caps": 183941,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.1030234 10.0.2.200:0\/3543349573",
"client_metadata": {
"entity_id": "",
"hostname": "bhs1-mail00-fe02.m9.network",
"kernel_version": "4.4.0-24-generic"
}
}, {
"id": 945213,
"num_leases": 234,
"num_caps": 138801,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.945213 10.0.2.193:0\/3058601307",
"client_metadata": {
"entity_id": "",
"hostname": "bhs1-mail02-fe02.m9.network",
"kernel_version": "4.4.0-22-generic"
}
}, {
"id": 1029565,
"num_leases": 0,
"num_caps": 2588906,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.1029565 10.0.2.19:0\/1447852959",
"client_metadata": {
"entity_id": "",
"hostname": "bhs1-mail02-ds05.m9.network",
"kernel_version": "4.4.0-62-generic"
}
}
]
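
With more than a handful of clients, it can help to rank sessions by capability count. The following is a minimal Python sketch that does so for JSON shaped like the `session ls` output above. The function name `rank_sessions_by_caps` is an assumption for illustration, and `SAMPLE` is a trimmed copy of two of the sessions shown.

```python
import json


def rank_sessions_by_caps(session_ls_json):
    """Return (hostname, num_caps) pairs, largest cap count first."""
    sessions = json.loads(session_ls_json)
    pairs = [
        (s.get("client_metadata", {}).get("hostname", "unknown"), s["num_caps"])
        for s in sessions
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Trimmed sample using two of the sessions shown above.
SAMPLE = """
[
  {"id": 889364, "num_caps": 170838,
   "client_metadata": {"hostname": "bhs1-mail02-fe01.m9.network"}},
  {"id": 1029565, "num_caps": 2588906,
   "client_metadata": {"hostname": "bhs1-mail02-ds05.m9.network"}}
]
"""

for host, caps in rank_sessions_by_caps(SAMPLE):
    print(f"{caps:>10}  {host}")
```

The same function could be fed the full command output, for example by reading it from standard input.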

Updated by Webert Lima almost 7 years ago

There is something interesting here. The host "bhs1-mail02-ds05.m9.network" has the highest number of caps (over 2.5M) but nothing on this host is actually using cephfs. It's just mounted there.

Updated by Zheng Yan almost 7 years ago

{
"id": 1029565,
"num_leases": 0,
"num_caps": 2588906,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.1029565 10.0.2.19:0\/1447852959",
"client_metadata": {
"entity_id": "",
"hostname": "bhs1-mail02-ds05.m9.network",
"kernel_version": "4.4.0-62-generic"
}

2588906 is quite large. If I am right, the recovering MDS was stuck in the rejoin state for a long time.

Please check how many inodes and open files there are on each machine:
sudo cat /proc/sys/fs/file-nr
sudo cat /proc/sys/fs/inode-nr

If the actual number of open files is not that large, we can do some quick optimization. Otherwise, we need to add a new mechanism to track open files for quick recovery.
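
For reference, the two files hold plain whitespace-separated counters: `file-nr` lists allocated file handles, allocated-but-unused handles and the system-wide maximum, while `inode-nr` lists allocated inodes and free inodes. Below is a minimal Python sketch that parses them. The function names are illustrative assumptions, and the sample strings are the values reported for bhs1-mail02-ds05 later in this thread.

```python
def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated, unused and maximum handles."""
    allocated, unused, maximum = (int(x) for x in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def parse_inode_nr(text):
    """Parse /proc/sys/fs/inode-nr: allocated inodes and free inodes."""
    nr_inodes, nr_free = (int(x) for x in text.split())
    return {"nr_inodes": nr_inodes, "nr_free": nr_free,
            "in_use": nr_inodes - nr_free}


files = parse_file_nr("3360 0 3273932")
inodes = parse_inode_nr("4961157 916551")
print(f"open file handles: {files['allocated']}")
print(f"inodes held in memory: {inodes['nr_inodes']} ({inodes['nr_free']} free)")
```

A host with few open file handles but millions of allocated inodes, as in this sample, is holding mostly cached inodes rather than files that are actually open.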

Updated by Webert Lima almost 7 years ago

root@bhs1-mail02-ds05:~# cat /proc/sys/fs/file-nr
3360 0 3273932
root@bhs1-mail02-ds05:~# cat /proc/sys/fs/inode-nr
4961157 916551

bhs1-mail02-ds05.m9.network used to run an instance of dovecot too, so cephfs is mounted there, but dovecot has not been running on it since last week.

Updated by Zheng Yan almost 7 years ago

Webert Lima wrote:

root@bhs1-mail02-ds05:~# cat /proc/sys/fs/file-nr
3360 0 3273932
root@bhs1-mail02-ds05:~# cat /proc/sys/fs/inode-nr
4961157 916551

How about the other machines?

bhs1-mail02-ds05.m9.network used to run an instance of dovecot too, so cephfs is mounted there, but dovecot has not been running on it since last week.

It's better to umount it, or run 'echo 3 > /proc/sys/vm/drop_caches'.

Updated by Webert Lima almost 7 years ago

Zheng Yan wrote:

How about the other machines?

Sorry, I didn't check at the time. I'll post everything as it looks today.

It's better to umount it, or run 'echo 3 > /proc/sys/vm/drop_caches'.

I went to umount it and I saw that the CAPs dropped to 3 on that host. I've umounted it anyway.

These are all hosts on this cluster today:

root@bhs1-mail00-fe02:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 355409  2659
 88544   0       3274282

root@bhs1-mail02-fe01:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 423591  9083
 101152  0       3274269

root@bhs1-mail02-fe02:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 401796  3688
 102208  0       3274268

root@bhs1-mail02-ds01:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 1767614 1389515
 4224    0       3274268

root@bhs1-mail02-ds02:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 2545540 1547328
 4032    0       3274268

root@bhs1-mail02-ds03:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 379577  162589
 4576    0       3274268

root@bhs1-mail02-ds04:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 497047  203917
 6912    0       3274268

root@bhs1-mail02-ds05:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 456680  423484
 3520    0       3273932

Hosts with cephfs mounted:

bhs1-mail00-fe02 (used by dovecot)
 bhs1-mail02-fe01 (used by dovecot)
 bhs1-mail02-fe02 (used by dovecot)
 bhs1-mail02-ds01 (used by custom script)

and these are from another cluster that is nearly identical:

root@bhs1-mail00-fe01:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
96268 221
7648 0 3274282

root@bhs1-mail01-fe01:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 597439  200002
 53824   0       3274268

root@bhs1-mail01-fe02:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 346847  15250
 81600   0       3274268

root@bhs1-mail01-ds01:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 344420  252542
 4064    0       3274267

root@bhs1-mail01-ds02:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 100984  47608
 3936    0       3274268

root@bhs1-mail01-ds03:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 281508  80696
 7008    0       3274268

root@bhs1-mail01-ds04:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 274520  77608
 4032    0       3274268

root@bhs1-mail01-ds05:~# cat /proc/sys/fs/inode-nr /proc/sys/fs/file-nr
 184926  25445
 108544  0       3273933

Hosts with cephfs mounted:

bhs1-mail00-fe01 (mounted but not used)
 bhs1-mail01-fe01 (mounted but not used)
 bhs1-mail01-fe02 (used by dovecot)
 bhs1-mail01-ds01 (used by custom script) 
 bhs1-mail01-ds05 (used by dovecot)

Updated by Webert Lima almost 7 years ago

And this is the current session ls from the active MDS on each of the two clusters:

root@bhs1-mail01-ds02:~# ceph daemon mds.b session ls | grep "num_caps\|hostname" 
        "num_caps": 1429,
            "hostname": "bhs1-mail01-fe01.m9.network",
        "num_caps": 100868,
            "hostname": "bhs1-mail01-ds05",
        "num_caps": 1776,
            "hostname": "bhs1-mail01-ds01",
        "num_caps": 3,
            "hostname": "bhs1-mail00-fe01",
        "num_caps": 154609,
            "hostname": "bhs1-mail01-fe02",

root@bhs1-mail02-ds03:~# ceph daemon mds.c session ls | grep "num_caps\|hostname" 
        "num_caps": 204144,
            "hostname": "bhs1-mail02-fe01",
        "num_caps": 2788,
            "hostname": "bhs1-mail02-ds01",
        "num_caps": 198677,
            "hostname": "bhs1-mail00-fe02",
        "num_caps": 173802,
            "hostname": "bhs1-mail02-fe02",

Updated by Patrick Donnelly almost 7 years ago

Status changed from New to Closed

Okay, it appears the deadlock is fixed. Please open a new ticket if you're still seeing issues with rejoin taking unreasonable amounts of time.

#10

Updated by Webert Lima almost 7 years ago

Patrick Donnelly wrote:

Okay, it appears the deadlock is fixed.

Sorry, are you referring to that commit? If so, is there a schedule for it to land in an LTS version?

Please open a new ticket if you're still seeing issues with rejoin taking unreasonable amounts of time.

I'm pretty sure the crash will happen again; this was about the 7th time in the last 2 months.
Any suggestions for preventing the MDS crash from happening?

#11

Updated by Patrick Donnelly almost 7 years ago

Status changed from Closed to In Progress
Assignee set to Zheng Yan

Zheng, PR#15440 indicates it's a multimds fix but Webert's setup is single MDS. Any issues you see backporting the fix?

Sorry, are you referring to that commit? If so, is there a schedule for it to land in an LTS version?

It will be in the imminent Luminous release but now I think it should be backported to jewel/kraken if feasible, since this bug apparently affects single active MDS setups.

Any suggestions for preventing the MDS crash from happening?

You can either backport the fix yourself, wait for the backport (which could happen after Luminous) or for Luminous itself, or update to the latest Luminous RC if you feel comfortable doing that.

#12

Updated by Nathan Cutler almost 7 years ago

it should be backported to jewel/kraken if feasible

Consider backport to jewel only, since kraken goes EOL the moment Luminous is declared stable.

In other words, there will be an 11.2.1 but I doubt there will be an 11.2.2.

#13

Updated by Webert Lima almost 7 years ago

Patrick Donnelly wrote:

Zheng, PR#15440 indicates it's a multimds fix but Webert's setup is single MDS. Any issues you see backporting the fix?

You're right. It's a Single Active-MDS setup. Used to run Active/Standby and now I run Active/Standby-Replay pairs.

You can either backport the fix yourself, wait for the backport (which could happen after Luminous) or for Luminous itself, or update to the latest Luminous RC if you feel comfortable doing that.

I'm not really comfortable doing an upgrade like that, as service and data availability are critical here. I'd feel more comfortable upgrading to the latest Jewel version and backporting the fix, if that doesn't break anything else. Could you point me to directions for doing the backport myself?

Note: our current version is a backport done by you guys last year because of this: http://tracker.ceph.com/issues/15920

#14

Updated by Patrick Donnelly almost 7 years ago

I'm not really comfortable doing an upgrade like that, as service and data availability are critical here. I'd feel more comfortable upgrading to the latest Jewel version and backporting the fix, if that doesn't break anything else. Could you point me to directions for doing the backport myself?

I went ahead and did it since our CI will build the repos for you: https://shaman.ceph.com/builds/ceph/i20535-backport/2223e478c4b770e75cb7db196f5cd9d985929ac9/default/55553/

PR: https://github.com/ceph/ceph/pull/16248

#15

Updated by Patrick Donnelly almost 7 years ago

Status changed from In Progress to Pending Backport

Backport PR: https://github.com/ceph/ceph/pull/16248

#16

Updated by Webert Lima almost 7 years ago

Patrick Donnelly wrote:

I went ahead and did it since our CI will build the repos for you: https://shaman.ceph.com/builds/ceph/i20535-backport/2223e478c4b770e75cb7db196f5cd9d985929ac9/default/55553/

Oh I really appreciate that! I see DIST=xenial, will that run on trusty too?

#17

Updated by Patrick Donnelly almost 7 years ago

I meant to link: https://shaman.ceph.com/builds/ceph/i20535-backport/2223e478c4b770e75cb7db196f5cd9d985929ac9/

Looks like something went wrong in the build. I'll take a look.

#18

Updated by Webert Lima almost 7 years ago

Awesome! Thanks!

#19

Updated by Patrick Donnelly almost 7 years ago

New run: https://shaman.ceph.com/builds/ceph/i20535-backport/387e184970bc2949e16139db0cbda6acfa3f7b3a/

#20

Updated by Nathan Cutler almost 7 years ago

Backport set to jewel

#21

Updated by Nathan Cutler almost 7 years ago

Copied to Backport #20564: jewel: mds segmentation fault ceph_lock_state_t::get_overlapping_locks added

#22

Updated by Webert Lima almost 7 years ago

Patrick Donnelly wrote:

New run: https://shaman.ceph.com/builds/ceph/i20535-backport/387e184970bc2949e16139db0cbda6acfa3f7b3a/

Thanks! It looks like it's built.
A couple of questions:
- Any recommendation on choosing between the notcmalloc and default flavors?
- Do I just have to download the packages from ubuntu/trusty/flavors/default/pool/main/c/ceph/ and install them (following an upgrade guide)?

#23

Updated by Patrick Donnelly almost 7 years ago

Webert Lima wrote:

A couple of questions:
- Any recommendation on choosing between the notcmalloc and default flavors?

Generally you want default (i.e. with tcmalloc).

- Do I just have to download the packages from ubuntu/trusty/flavors/default/pool/main/c/ceph/ and install them (following an upgrade guide)?

I think you should be able to add the keys according to this guide:

http://docs.ceph.com/docs/master/install/get-packages/

Then manually install the downloaded package (which I don't know how to do).

#24

Updated by Webert Lima almost 7 years ago

OK. I'll do it on two small clusters tomorrow and I'll update these troublesome clusters next week.
Thanks a lot for that backport.

#25

Updated by Webert Lima almost 7 years ago

Does this built package include the fix for the MDS regression that was found in 10.2.8? I read about it in the mailing list.

#26

Updated by Patrick Donnelly almost 7 years ago

Webert Lima wrote:

Does this built package include the fix for the MDS regression that was found in 10.2.8? I read about it in the mailing list.

It does not include the fix. Do not use that branch. I'll make a note to update it...

#27

Updated by Webert Lima almost 7 years ago

Patrick Donnelly wrote:

It does not include the fix. Do not use that branch. I'll make a note to update it...

Thanks. I'll be waiting.

#28

Updated by Patrick Donnelly almost 7 years ago

https://shaman.ceph.com/builds/ceph/i20535-backport-v10.2.9/

#29

Updated by Webert Lima almost 7 years ago

Patrick Donnelly wrote:

https://shaman.ceph.com/builds/ceph/i20535-backport-v10.2.9/

Thanks, I'll be upgrading all clusters by the end of next week.
Will that repo be permanently available or does it expire?

#30

Updated by Patrick Donnelly almost 7 years ago

It will expire in a week or two.

#31

Updated by Webert Lima almost 7 years ago

OK, I'll download them just in case.

#32

Updated by Patrick Donnelly over 6 years ago

Any update?

#33

Updated by Webert Lima over 6 years ago

Patrick Donnelly wrote:

Any update?

Hey Patrick, I have upgraded one test cluster first, but it stays in HEALTH_WARN:

health HEALTH_WARN
 all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set

#34

Updated by Patrick Donnelly over 6 years ago

Webert Lima wrote:

Patrick Donnelly wrote:

Any update?

Hey Patrick, I have upgraded one test cluster first, but it stays in HEALTH_WARN:

health HEALTH_WARN
all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set

See this announcement: http://ceph.com/geen-categorie/v10-2-4-jewel-released/

#35

Updated by Webert Lima over 6 years ago

Patrick Donnelly wrote:

See this announcement: http://ceph.com/geen-categorie/v10-2-4-jewel-released/

Thank you Patrick. So that was because I was running 10.2.2 and that flag didn't exist on that version.
The update on the test cluster went well. I've been cautious about it because I had never done a Ceph update before.

I have the 3 production clusters' update scheduled for Saturday 29th.
I was supposed to do it by tomorrow but it won't be possible due to other issues.

I'll keep you informed.

#36

Updated by Webert Lima over 6 years ago

One of our production clusters has been upgraded.
Next one scheduled for next Wednesday, August 2nd.

#37

Updated by Webert Lima over 6 years ago

Just upgraded the other 2 production clusters where the problem tends to happen frequently.
Will watch from now on.

#38

Updated by Patrick Donnelly over 6 years ago

Webert Lima wrote:

Just upgraded the other 2 production clusters where the problem tends to happen frequently.
Will watch from now on.

Thanks for keeping us in the loop!

#39

Updated by Webert Lima over 6 years ago

Reporting in, I've had the first incident after the version upgrade.

My active MDS had committed suicide due to "dne in mds map" (this is happening a lot but I don't think it's related to the upgrade)
When the recovery was already done, a new crash happened (never seen this one before):

2017-08-19 06:37:54.542005 7f9721654700  1 mds.0.1370 rejoin_joint_start
 2017-08-19 06:37:54.542206 7f971ec4d700  1 heartbeat_map reset_timeout 'MDSRank' had timed out after 15
 2017-08-19 07:27:07.427682 7f971cc49700  1 mds.0.1370 rejoin_done
 2017-08-19 07:27:08.658514 7f9721654700  1 mds.0.1370 handle_mds_map i am now mds.0.1370
 2017-08-19 07:27:08.658522 7f9721654700  1 mds.0.1370 handle_mds_map state change up:rejoin --> up:active
 2017-08-19 07:27:08.658539 7f9721654700  1 mds.0.1370 recovery_done -- successful recovery!
 2017-08-19 07:27:08.658709 7f9721654700  1 mds.0.1370 active_start
 2017-08-19 07:27:11.154702 7f9721654700 1 mds/Locker.cc: In function 'void Locker::file_recover(ScatterLock*)' thread 7f9721654700 time 2017-08-19 07:27:11.152839
 mds/Locker.cc: 4924: FAILED assert(lock->get_state() == LOCK_PRE_SCAN)
  ceph version 10.2.9-4-gbeaec39 (beaec397f00491079cd74f7b9e3e10660859e26b)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x565179789adb]
  2: (Locker::file_recover(ScatterLock*)+0x1c9) [0x565179523a19]
  3: (MDCache::start_files_to_recover()+0xd3) [0x565179455813]
  4: (MDSRank::active_start()+0x8e) [0x5651793bf56e]
  5: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x10eb) [0x5651793d0f6b]
  6: (MDSDaemon::handle_mds_map(MMDSMap*)+0xe07) [0x5651793aa557]
  7: (MDSDaemon::handle_core_message(Message*)+0x78b) [0x5651793ab9fb]
  8: (MDSDaemon::ms_dispatch(Message*)+0xab) [0x5651793abc5b]
  9: (DispatchQueue::entry()+0x78b) [0x56517988668b]
  10: (DispatchQueue::DispatchThread::entry()+0xd) [0x56517976f38d]
  11: (()+0x8184) [0x7f972701c184]
  12: (clone()+0x6d) [0x7f9725570bed]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
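
The timestamps in the log above also show how long the rejoin phase took before the assert fired. Here is a minimal Python sketch that computes it from the `rejoin_joint_start` and `rejoin_done` lines. The helper `log_timestamp` is a hypothetical name, and the sketch assumes every log line starts with a date and a time carrying microseconds.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def log_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffffff' stamp of a log line."""
    date, time = line.split()[:2]
    return datetime.strptime(f"{date} {time}", LOG_TIME_FORMAT)


start = log_timestamp(
    "2017-08-19 06:37:54.542005 7f9721654700  1 mds.0.1370 rejoin_joint_start")
done = log_timestamp(
    "2017-08-19 07:27:07.427682 7f971cc49700  1 mds.0.1370 rejoin_done")

rejoin = done - start
print(f"rejoin took {rejoin.total_seconds() / 60:.1f} minutes")
```

For this incident the gap between the two lines is a little over 49 minutes.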

#40

Updated by Patrick Donnelly over 6 years ago

Webert Lima wrote:

My active MDS had committed suicide due to "dne in mds map" (this is happening a lot but I don't think it's related to the upgrade)

Probably related to: http://tracker.ceph.com/issues/19706

When the recovery was already done, a new crash happened (never seen this one before):

2017-08-19 06:37:54.542005 7f9721654700 1 mds.0.1370 rejoin_joint_start
2017-08-19 06:37:54.542206 7f971ec4d700 1 heartbeat_map reset_timeout 'MDSRank' had timed out after 15
2017-08-19 07:27:07.427682 7f971cc49700 1 mds.0.1370 rejoin_done
2017-08-19 07:27:08.658514 7f9721654700 1 mds.0.1370 handle_mds_map i am now mds.0.1370
2017-08-19 07:27:08.658522 7f9721654700 1 mds.0.1370 handle_mds_map state change up:rejoin --> up:active
2017-08-19 07:27:08.658539 7f9721654700 1 mds.0.1370 recovery_done -- successful recovery!
2017-08-19 07:27:08.658709 7f9721654700 1 mds.0.1370 active_start
2017-08-19 07:27:11.154702 7f9721654700 1 mds/Locker.cc: In function 'void Locker::file_recover(ScatterLock*)' thread 7f9721654700 time 2017-08-19 07:27:11.152839
mds/Locker.cc: 4924: FAILED assert(lock->get_state() == LOCK_PRE_SCAN)
ceph version 10.2.9-4-gbeaec39 (beaec397f00491079cd74f7b9e3e10660859e26b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x565179789adb]
2: (Locker::file_recover(ScatterLock*)+0x1c9) [0x565179523a19]
3: (MDCache::start_files_to_recover()+0xd3) [0x565179455813]
4: (MDSRank::active_start()+0x8e) [0x5651793bf56e]
5: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x10eb) [0x5651793d0f6b]
6: (MDSDaemon::handle_mds_map(MMDSMap*)+0xe07) [0x5651793aa557]
7: (MDSDaemon::handle_core_message(Message*)+0x78b) [0x5651793ab9fb]
8: (MDSDaemon::ms_dispatch(Message*)+0xab) [0x5651793abc5b]
9: (DispatchQueue::entry()+0x78b) [0x56517988668b]
10: (DispatchQueue::DispatchThread::entry()+0xd) [0x56517976f38d]
11: (()+0x8184) [0x7f972701c184]
12: (clone()+0x6d) [0x7f9725570bed]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Please fork another issue for this one.

#41

Updated by Patrick Donnelly over 6 years ago

Status changed from Pending Backport to Resolved

Webert, the backport is merged so I'm marking this as resolved. If you experience this particular issue again, please reopen. Otherwise, for any new problems, please open a new tracker issue.

#42

Updated by Webert Lima over 6 years ago

Probably related to: http://tracker.ceph.com/issues/19706

I'll keep an eye on it. I suspect out-of-sync clocks may be causing it.

Please fork another issue for this one.

I'll do it if it happens again. Still hasn't.

Thanks a lot Patrick and @everyone involved.
