Bug #39103
bitmap allocator makes %USE of OSD go over 100%
Status: Closed
Description
After pull request https://github.com/ceph/ceph/pull/26983 was merged, I installed these packages:
ceph-common-13.2.5-101.ga1aa89a.el7.x86_64
ceph-mgr-13.2.5-101.ga1aa89a.el7.x86_64
ceph-mds-13.2.5-101.ga1aa89a.el7.x86_64
ceph-13.2.5-101.ga1aa89a.el7.x86_64
libcephfs2-13.2.5-101.ga1aa89a.el7.x86_64
ceph-selinux-13.2.5-101.ga1aa89a.el7.x86_64
ceph-osd-13.2.5-101.ga1aa89a.el7.x86_64
python-cephfs-13.2.5-101.ga1aa89a.el7.x86_64
ceph-base-13.2.5-101.ga1aa89a.el7.x86_64
ceph-mon-13.2.5-101.ga1aa89a.el7.x86_64
and configured:
[osd] bluestore_allocator = bitmap
After restarting one OSD, the %USE of that OSD went over 100%, as shown in the ceph osd df tree output below.
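For reference, a minimal sketch of how that configuration change would typically be applied and verified; the OSD id (4), systemd unit name, and paths are assumptions for illustration, not taken from this report:

# /etc/ceph/ceph.conf on the OSD host
[osd]
bluestore_allocator = bitmap

# restart a single OSD, e.g. id 4
systemctl restart ceph-osd@4

# confirm the running daemon picked up the allocator via its admin socket
ceph daemon osd.4 config get bluestore_allocator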
ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE          VAR  PGS TYPE NAME
-4       7.19995        - 7.3 TiB 16 EiB  7.3 TiB 230759568.00  1.13   - root ssd
-3       3.59998        - 3.6 TiB 5.3 GiB 3.6 TiB 0.14          0      - host ssd-ceph-2
 0   ssd 0.89999  1.00000 930 GiB 1.3 GiB 929 GiB 0.14          0    128 osd.0
 1   ssd 0.89999  1.00000 930 GiB 1.3 GiB 929 GiB 0.14          0    139 osd.1
 2   ssd 0.89999  1.00000 930 GiB 1.3 GiB 929 GiB 0.14          0    114 osd.2
 3   ssd 0.89999  1.00000 930 GiB 1.3 GiB 929 GiB 0.14          0    131 osd.3
-5       3.59998        - 3.6 TiB 16 EiB  3.7 TiB 461405824.00  2.25   - host ssd-ceph-3
 4   ssd 0.89999  1.00000 930 GiB 16 EiB  965 GiB 1846529920.00 9.00  36 osd.4
 5   ssd 0.89999  1.00000 931 GiB 1.4 GiB 930 GiB 0.15          0    152 osd.5
 6   ssd 0.89999  1.00000 931 GiB 1.3 GiB 930 GiB 0.14          0    133 osd.6
 7   ssd 0.89999  1.00000 931 GiB 1.3 GiB 930 GiB 0.14          0    125 osd.7
This made my cluster stop working: IOPS dropped to 0 and ceph health reported:
1 full osd(s)
1 pool(s) full
Degraded data redundancy: 52/924 objects degraded (5.628%), 38 pgs degraded, 38 pgs undersized
Degraded data redundancy (low space): 2 pgs backfill_toofull, 36 pgs recovery_toofull
The start of the OSD log has:
2019-04-04 15:44:37.580 7fa86eb29700 -1 log_channel(cluster) log [ERR] : full status failsafe engaged, dropping updates, now 1846530048% full
The detailed log is in the attached file.
Files
Updated by Igor Fedotov about 5 years ago
- Pull request ID set to 27366
Probably that's the missed backport which triggers the issue.
See https://github.com/ceph/ceph/pull/27366
Updated by Igor Fedotov about 5 years ago
- Status changed from New to Need More Info
@hoan nv, could you please apply the patch when it's available and report back if it helps?
Updated by hoan nv about 5 years ago
Igor Fedotov wrote:
@hoan nv, could you please apply the patch when it's available and report back if it helps?
I will try it and give feedback.
Updated by hoan nv about 5 years ago
Igor Fedotov wrote:
Probably that's the missed backport which triggers the issue.
See https://github.com/ceph/ceph/pull/27366
This pull request does not fix my issue.
Updated by Igor Fedotov about 5 years ago
Could you please collect OSD startup log with 'debug bluestore' set to 20?
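For example, a minimal way to capture that (the OSD id and log path below are assumptions, not from this report) is to raise the level in ceph.conf on the OSD host and restart the daemon:

# /etc/ceph/ceph.conf on the OSD host
[osd]
debug bluestore = 20/20

systemctl restart ceph-osd@4
# the startup log then ends up in /var/log/ceph/ceph-osd.4.log by default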
Updated by hoan nv about 5 years ago
- File ceph-osd.4.log.tar.gz added
Igor Fedotov wrote:
Could you please collect OSD startup log with 'debug bluestore' set to 20?
Yes, my log is in the attached file.
Updated by Igor Fedotov about 5 years ago
I just managed to reproduce your issue on mimic HEAD.
After applying the patch from https://github.com/ceph/ceph/pull/27366, the reporting is fixed:
before:
ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS TYPE NAME
-1 0.90860 - 930 GiB 16 EiB 965 GiB 1846529920.00 1.00 - root default
-3 0.90860 - 930 GiB 16 EiB 965 GiB 1846529920.00 1.00 - host crius
0 ssd 0.90860 1.00000 930 GiB 16 EiB 965 GiB 1846529920.00 1.00 16 osd.0
TOTAL 930 GiB 16 EiB 965 GiB 1846529920.00
after:
ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS TYPE NAME
-1 0.90860 - 930 GiB 1.0 GiB 929 GiB 0.11 1.00 - root default
-3 0.90860 - 930 GiB 1.0 GiB 929 GiB 0.11 1.00 - host crius
0 ssd 0.90860 1.00000 930 GiB 1.0 GiB 929 GiB 0.11 1.00 16 osd.0
TOTAL 930 GiB 1.0 GiB 929 GiB 0.11
Could you please double check if you applied the patch properly?
Updated by hoan nv about 5 years ago
Igor Fedotov wrote:
Could you please double check if you applied the patch properly?
Yes, I will rebuild and recheck.
Updated by hoan nv about 5 years ago
These are my steps to build the RPM and install it (see the command sketch after the list):
Check out branch remotes/origin/mimic from github.com/ceph/ceph
Apply this patch
./make-srpm.sh
rpmbuild --rebuild ceph-13.2.5-140.g5ae3e4b.el7.src.rpm
Install the packages on the server.
Do I have a wrong step?
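For comparison, a rough command-level sketch of those steps; exactly how the patch is applied (e.g. cherry-picking the commits from https://github.com/ceph/ceph/pull/27366) is an assumption here:

git clone https://github.com/ceph/ceph.git
cd ceph
git checkout -b mimic-patched origin/mimic
# apply the patch, e.g. cherry-pick the commits from the pull request
./make-srpm.sh
rpmbuild --rebuild ceph-13.2.5-*.el7.src.rpm
# install the resulting RPMs on the server, e.g. with yum localinstall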
Updated by Igor Fedotov about 5 years ago
Looks good to me.
I suggest inserting some new logging, e.g.
int BlueStore::_mount(bool kv_only, bool open_db)
{
  dout(1) << __func__ << " path " << path << " MY CODE" << dendl;
  ...
to make sure you run the new code, and check whether the message appears in the log after restart.
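For example, a quick check after restart (the log path and OSD id are assumptions here) could be:

grep "MY CODE" /var/log/ceph/ceph-osd.4.log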
Updated by hoan nv about 5 years ago
Igor Fedotov wrote:
Looks good to me.
I suggest inserting some new logging, e.g.
int BlueStore::_mount(bool kv_only, bool open_db)
{
  dout(1) << __func__ << " path " << path << " MY CODE" << dendl;
  ...
to make sure you run the new code, and check whether the message appears in the log after restart.
I added your log code and rebuilt, but the new code does not seem to be patched in.
I don't know why.
Do you have an SRPM file which has the patch? Please share it with me if you can.
Thanks.
Updated by Igor Fedotov about 5 years ago
Unfortunately I don't use RPMs in my lab and have pretty limited expertise in this area, so I can hardly advise anything.
And I'm not sure whether RPMs (if any) built in my lab, which runs SUSE Linux, would be applicable to your environment.
Updated by Igor Fedotov about 5 years ago
- Status changed from Need More Info to Fix Under Review
Updated by Igor Fedotov almost 5 years ago
- Status changed from Fix Under Review to Resolved