Bug #48172

Nautilus 14.2.13 osdmap not trimming on clean cluster

Added by Marcin Śliwiński over 3 years ago. Updated almost 3 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have a cluster running 14.2.13 (some OSDs are on 14.2.9; those run on Debian). The cluster is in active+clean state and all OSDs are up.
We can't get it to trim osdmaps:

root@monb01:~# ceph report | jq .osdmap_first_committed
report 1115531697
67114
root@monb01:~# ceph report | jq .osdmap_last_committed
report 2573211981
72592
root@monb01:~#
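
The size of the untrimmed backlog follows directly from the two fields above; as an illustration (the output is computed from the values shown, and the 2>/dev/null just hides the 'report <checksum>' line that the ceph CLI prints to stderr):

root@monb01:~# ceph report 2>/dev/null | jq '.osdmap_last_committed - .osdmap_first_committed'
5478
root@monb01:~#

On a cluster that is trimming normally, this difference should stay around mon_min_osdmap_epochs (500 by default) instead of growing.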

We did set 'require_min_compat_client luminous' and 'require_osd_release nautilus'.
We tried to find similar bug reports, but those involved either a dead OSD or a missing 'require_osd_release', or were fixed by restarting the MONs. We tried all of that without success.
There's also our post on ceph-users: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ROINIUNPI36Z24YWGAYF6ZZB7LMQ6EZE/
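
For completeness, a sketch of how those two flags can be set and then verified (standard Ceph commands; the values are the ones named above):

root@monb01:~# ceph osd set-require-min-compat-client luminous
root@monb01:~# ceph osd require-osd-release nautilus
root@monb01:~# ceph osd dump | grep -E 'require_min_compat_client|require_osd_release'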

We consider this a bug because our cluster died due to this issue a few days ago, and we had to add "osd_map_cache_size = 5000" to ceph.conf on all OSD nodes as a temporary workaround.
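
For reference, a minimal sketch of what that temporary workaround looks like in /etc/ceph/ceph.conf on an OSD node (placing it under the [osd] section is an assumption; only the option name and value above come from our setup):

[osd]
# temporary workaround: cache more osdmap epochs per OSD so the daemons
# can cope with the large untrimmed range; to be removed once trimming works
osd_map_cache_size = 5000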

report.txt.gz - ceph report (86.1 KB) Marcin Śliwiński, 11/10/2020 08:08 PM

History

#1 Updated by Marcin Śliwiński over 3 years ago

So, we managed to find the reason, and it's weird.

The cluster is not trimming osdmaps because the monitor thinks that all PGs from one of the pools are still being created, while at the same time those PGs are reported as created and are in active use by the cluster.

For example, from the monitor log:
2020-11-16 12:57:00.514 7f131496f700 10 mon.monb01@0(probing).osd e72792 update_creating_pgs will instruct osd.265 to create 28.3ff@67698
2020-11-16 12:57:25.982 7f1315971700 10 mon.monb01@0(leader).osd e72792 update_creating_pgs will instruct osd.265 to create 28.3ff@72792

But:
root@monb01:/var/log/ceph# ceph pg dump |grep 28.3ff
dumped all
28.3ff 3841 0 0 0 0 15970230272 0 0 3028 3028 active+clean 2020-11-16 05:38:27.338826 72792'87928 72792:335764 [265,277,282] 265 [265,277,282] 265 72792'85741 2020-11-16 05:38:27.338783 72588'79082 2020-11-10 18:42:43.182436 0
root@monb01:/var/log/ceph#
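
The same contradiction can be checked from the PG itself; a sketch, assuming the usual 'ceph pg query' JSON layout with a top-level "state" field:

root@monb01:/var/log/ceph# ceph pg 28.3ff query | jq -r .state
active+clean
root@monb01:/var/log/ceph#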

root@monb01:/var/log/ceph# ceph health detail
HEALTH_OK
root@monb01:/var/log/ceph#
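
For anyone wanting to reproduce the observation: the update_creating_pgs lines above are debug output, so something along these lines should surface them (the required debug level is an assumption based on the '10' prefix in those lines; adjust the mon name and log path to your setup):

root@monb01:/var/log/ceph# ceph tell mon.monb01 injectargs '--debug_mon 10'
root@monb01:/var/log/ceph# grep update_creating_pgs /var/log/ceph/ceph-mon.monb01.log | tail -n 2
root@monb01:/var/log/ceph# ceph tell mon.monb01 injectargs '--debug_mon 1/5'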

#2 Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (Monitor)
