Bug #58052
Status: Open
Empty Pool (zero objects) shows usage.
Description
I have a pool that was/is being used in a CephFS. I have migrated all of the files off of the pool and was preparing to remove it. But even after a day of the pool being empty (zero objects), it still shows as consuming space. It also shows statistics for objects under compression:
POOL_NAME                       USED     OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD       WR_OPS   WR      USED COMPR  UNDER COMPR  ...
CephFS-Erasure-ByOSD-D4F2-Data  6.3 GiB  0        0       0       0                   0        0         5971837  5.9 TiB  2495150  39 GiB  365 MiB     730 MiB      ...
I recently added about a dozen OSDs and it is going to take another day or so for that re-balancing to finish. Is it possible that this is because things are still re-balancing?
I tried a few things to see if I could figure out what was going on:
rbd -p CephFS-Erasure-ByOSD-D4F2-Data du #No results.
rados -p CephFS-Erasure-ByOSD-D4F2-Data ls #No results.
I also made sure that there was nothing leftover from any benchmarks:
rados -p CephFS-Erasure-ByOSD-D4F2-Data cleanup #No change.
This pool was only used for CephFS and was not used for any block devices, nor have I used any snapshots.
The cluster started out as 17.2.3 but is currently running 17.2.4.
I would like to remove this pool, but I am not sure if it is safe to do so. I am not sure if I have run into some sort of an orphaning issue, or if this is just the statistics not calculating correctly.
Thanks!
Files
Updated by Radoslaw Zarzynski over 1 year ago
- Status changed from New to Need More Info
Could you please provide a log from an active mgr with debug_ms=1 and debug_mgr=20? We would like to see which OSDs participate in these numbers.
Also, could you please retake the stats for the CephFS-Erasure-ByOSD-D4F2-Data pool?
Updated by Brian Woods over 1 year ago
Radoslaw Zarzynski wrote:
Could you please provide a log from an active mgr with debug_ms=1 and debug_mgr=20?
I am not 100% sure I know how to do that, but this is what I have so far. I found this:
https://access.redhat.com/solutions/2085183
ceph --admin-daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set debug_ms 1
So I:
# docker exec -it ceph-### bash
# ceph --admin-daemon /var/run/ceph/ceph-client.rgw.rgw-default.### set debug_ms 1
# ceph --admin-daemon /var/run/ceph/ceph-client.rgw.rgw-default.### config set debug_ms 1
{ "success": "" }
# ceph --admin-daemon /var/run/ceph/ceph-client.rgw.rgw-default.### config set debug_mgr 20
{ "success": "" }
That seemed to work, but I am not sure what logs you want me to grab. Just cephadm.log? I will have to scrub it a bit, but can get that soon.
We would like to see which OSDs participate in these numbers.
Is there a specific command I can issue to figure out what OSDs are participating in the space usage?
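One way to answer this is to look at the acting sets of the pool's PGs (e.g. from ceph pg ls-by-pool CephFS-Erasure-ByOSD-D4F2-Data --format json) and count how many PGs each OSD participates in. A minimal sketch, assuming Quincy's JSON layout with a top-level "pg_stats" list; the sample data below is made up for illustration:

```python
import json
from collections import Counter

# Hypothetical, trimmed-down sample of what
#   ceph pg ls-by-pool <pool> --format json
# returns; only "pgid" and "acting" are used here. The "pg_stats"
# top-level key is an assumption based on Quincy's output format.
sample = json.loads("""
{"pg_stats": [
  {"pgid": "7.0", "acting": [2, 17, 1, 7, 20, 13]},
  {"pgid": "7.1", "acting": [5, 3, 9, 12, 0, 18]},
  {"pgid": "7.2", "acting": [2, 9, 7, 13, 4, 11]}
]}
""")

# Count, per OSD, how many of the pool's PGs it participates in.
osd_pg_count = Counter()
for pg in sample["pg_stats"]:
    osd_pg_count.update(pg["acting"])

for osd, count in sorted(osd_pg_count.items()):
    print(f"osd.{osd}: {count} PG(s)")
```

Any OSD that appears in these acting sets holds a share of whatever the pool is reporting as USED.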
Also, could you please retake the stats for the CephFS-Erasure-ByOSD-D4F2-Data pool?
This? It moved a bit this morning when I deleted some files from another pool, but not since then (about 24 hours).
POOL_NAME                       USED     OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD       WR_OPS   WR      USED COMPR  UNDER COMPR
CephFS-Erasure-ByOSD-D4F2-Data  6.2 GiB  0        0       0       0                   0        0         5971837  5.9 TiB  2495150  39 GiB  360 MiB     720 MiB
Updated by Radoslaw Zarzynski over 1 year ago
Well, I think the commands you mentioned took effect for RGW, not the MGR. I'm providing the commands to increase log verbosity on all mgrs below.
ceph config set mgr debug_ms 1
ceph config set mgr debug_mgr 20
Then, please collect the logs and revert the log settings to defaults.
Updated by Brian Woods over 1 year ago
Radoslaw Zarzynski wrote:
Well, I think the commands you mentioned took effect for RGW, not the MGR. I'm providing the commands to increase log verbosity on all mgrs below.
[...]
Then, please collect the logs and revert the log settings to defaults.
Sorry for the delay. Logs attached.
What are the defaults?
Also, an update: I emptied another pool and have a similar case with it. An interesting side note: it took sending "rados -p CephFS-Erasure-ByOSD-D5F1-Data cleanup --prefix benchmark_data" three times before it actually deleted the blocks. On the third attempt it gave a "warning: doing linear search" message, or something along those lines.
So we now look like this:
POOL_NAME                       USED     OBJECTS   CLONES  COPIES     MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS    RD       WR_OPS    WR       USED COMPR  UNDER COMPR
.mgr                            70 MiB   10        0       20         0                   0        0         16218     18 MiB   28266     592 MiB  0 B         0 B
.rgw.root                       48 KiB   6         0       12         0                   0        0         144       144 KiB  6         6 KiB    0 B         0 B
CephFS-Erasure-ByOSD-D4F2-Data  5.9 GiB  0         0       0          0                   0        0         5971837   5.9 TiB  2495150   39 GiB   353 MiB     706 MiB
CephFS-Erasure-ByOSD-D5F1-Data  74 GiB   0         0       0          0                   0        0         33833233  47 TiB   13492697  2.0 TiB  213 MiB     425 MiB
CephFS-Erasure-ByOSD-D5F3-Data  67 TiB   12757283  0       102058264  0                   0        0         3470975   2.5 TiB  19723113  43 TiB   145 GiB     291 GiB
CephFS-Meta                     5.3 GiB  762332    0       1524664    0                   0        0         35039654  71 GiB   41024292  195 GiB  0 B         0 B
CephFS-Root                     72 KiB   3218516   0       9655548    0                   0        0         258607    253 MiB  11827192  289 MiB  0 B         0 B
SSDStorage                      135 GiB  595698    0       1191396    0                   0        0         6160114   3.7 TiB  31158766  734 GiB  37 GiB      92 GiB
default.rgw.control             0 B      8         0       16         0                   0        0         0         0 B      0         0 B      0 B         0 B
default.rgw.log                 272 KiB  209       0       418        0                   0        0         5750159   5.5 GiB  3821043   34 KiB   0 B         0 B
default.rgw.meta                19 KiB   3         0       6          0                   0        0         90        69 KiB   6         2 KiB    0 B         0 B
Updated by Brian Woods over 1 year ago
- File server1.zip server1.zip added
I am realizing those logs are from a single host (server4).
server3 got removed today.
Attaching server1 to this message.
Updated by Brian Woods over 1 year ago
- File server2.zip server2.zip added
Attaching server2 to this message.
Updated by Radoslaw Zarzynski over 1 year ago
Hello. Thanks for the response and the files.
Archive:  server2.zip
  inflating: ceph-client.admin.log
  inflating: ceph-volume.log
  inflating: cephadm.log
  inflating: cephadm.log.1
  inflating: cephadm.log.2
  inflating: cephadm.log.3
  inflating: cephadm.log.4
  inflating: cephadm.log.5
  inflating: cephadm.log.6
  inflating: cephadm.log.7
Unfortunately, they lack the ceph-mgr's logs. Could you please provide the `ceph-mgr.*` log files?
Updated by Brian Woods over 1 year ago
That is every log file from every node. There are no ceph-mgr* logs. :/
Even from inside the docker on the adm node:
# docker exec -u root -it ceph-cdffec50-412e-11ed-a95e-8be7dd44d8f3-mgr-server1-domain-local-miydsy bash
[root@55772 /]# cd /var/log/ceph
[root@55772 log]# cd ceph/
[root@server1 ceph]# ls -l
total 55772
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.0.log
-rw-r--r-- 1 ceph ceph    24839 Oct  1 02:30 ceph-osd.0.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.1.log
-rw-r--r-- 1 ceph ceph    50855 Oct  2 19:44 ceph-osd.1.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct 23 00:00 ceph-osd.11.log
-rw-r--r-- 1 ceph ceph    26394 Oct 21 19:57 ceph-osd.11.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.2.log
-rw-r--r-- 1 ceph ceph    50852 Oct  2 19:45 ceph-osd.2.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.3.log
-rw-r--r-- 1 ceph ceph    50824 Oct  2 19:45 ceph-osd.3.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.4.log
-rw-r--r-- 1 ceph ceph    51142 Oct  2 19:46 ceph-osd.4.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.5.log
-rw-r--r-- 1 ceph ceph    51450 Oct  2 19:47 ceph-osd.5.log.1.gz
-rw-r--r-- 1 root root        0 Dec  5 00:00 ceph-volume.log
-rw-r--r-- 1 root root  4175959 Dec  4 15:33 ceph-volume.log.1.gz
-rw-r--r-- 1 root root  6519855 Dec  3 23:40 ceph-volume.log.2.gz
-rw-r--r-- 1 root root  8749846 Dec  2 23:49 ceph-volume.log.3.gz
-rw-r--r-- 1 root root 10504801 Dec  1 23:59 ceph-volume.log.4.gz
-rw-r--r-- 1 root root  9538837 Nov 30 23:51 ceph-volume.log.5.gz
-rw-r--r-- 1 root root  8649327 Nov 29 23:42 ceph-volume.log.6.gz
-rw-r--r-- 1 root root  8631026 Nov 28 23:41 ceph-volume.log.7.gz
-rw-r--r-- 1 root root        0 Oct 27 00:00 cephadm.log
-rw-r--r-- 1 root root      188 Oct 25 14:02 cephadm.log.1.gz
[root@server1 ceph]#
Ideas? I don't recall seeing mgr specific logs on any of my deployments (I have a few).
Updated by Brian Woods over 1 year ago
Alright, I found that the logs can be accessed from docker itself. I am in the process of pulling them, but I am already at 5 GB of logs and growing... I will compress them and post a Google Drive link as soon as they are complete.
Updated by Brian Woods over 1 year ago
Alright, please let me know when you have the file so I can remove it from my drive:
https://drive.google.com/file/d/1i--yt4tcOn5augmpmRbQmS8pg4yyAPFU/view?usp=sharing
Updated by Radoslaw Zarzynski over 1 year ago
Glad you've found it! Would you mind uploading via ceph-post-file (https://docs.ceph.com/en/quincy/man/8/ceph-post-file/)?
Updated by Brian Woods over 1 year ago
Radoslaw Zarzynski wrote:
Glad you've found it! Would you mind uploading via ceph-post-file (https://docs.ceph.com/en/quincy/man/8/ceph-post-file/)?
The first attempt failed with an unknown RSA key error. Odd, as I have never used the command before, but meh...
$ ceph-post-file --description "All MGR Logs" -u "BRIANWOODS" mgr.zip
args: --description All MGR Logs -u BRIANWOODS -- mgr.zip
args: -u BRIANWOODS -- mgr.zip
args: -- mgr.zip
/usr/bin/ceph-post-file: upload tag f252e192-a62d-4ccd-846a-61906260aae3
/usr/bin/ceph-post-file: user: BRIANWOODS
/usr/bin/ceph-post-file: description: All MGR Logs
/usr/bin/ceph-post-file: will upload file mgr.zip
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:cgG0rSk0W8I4sC+LZ8un9R0D5FKOwxwL5XVBt7PeW8Y.
Please contact your system administrator.
Add correct host key in /home/bpwoods/.ssh/known_hosts to get rid of this message.
Offending RSA key in /usr/share/ceph/known_hosts_drop.ceph.com:1
  remove with:
  ssh-keygen -f "/usr/share/ceph/known_hosts_drop.ceph.com" -R "drop.ceph.com"
RSA host key for drop.ceph.com has changed and you have requested strict checking.
Host key verification failed.
Connection closed.
Connection closed
I cleared the key and attempted the upload again and got a permission denied error:
$ ceph-post-file --description "All MGR Logs" -u "BRIANWOODS" mgr.zip
args: --description All MGR Logs -u BRIANWOODS -- mgr.zip
args: -u BRIANWOODS -- mgr.zip
args: -- mgr.zip
/usr/bin/ceph-post-file: upload tag cf4d360c-5007-4119-a244-1855a1519ebb
/usr/bin/ceph-post-file: user: BRIANWOODS
/usr/bin/ceph-post-file: description: All MGR Logs
/usr/bin/ceph-post-file: will upload file mgr.zip
The authenticity of host 'drop.ceph.com (8.43.84.129)' can't be established.
ECDSA key fingerprint is SHA256:NQf3i0LwF3NK4drP9lDudf1u0UaA9G0uqdWxDefhhrU.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'drop.ceph.com' (ECDSA) to the list of known hosts.
postfile@drop.ceph.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Connection closed.
Connection closed
Updated by Radoslaw Zarzynski over 1 year ago
Downloading manually. Neha is testing ceph-post-file.
Updated by Brian Woods about 1 year ago
Radoslaw Zarzynski wrote:
Downloading manually. Neha is testing ceph-post-file.
I kinda want to kill these pools to help with placement. Do you want to continue debugging this?
I was hoping that they would clear themselves out after the cluster finished re-balancing (maybe a backlog issue), but there has been no progress after several days of everything being clear.
Updated by Brian Woods about 1 year ago
I am concerned that this could be a bigger issue, something like a memory leak, but for storage, and that it could continue to slowly eat up space over time.
Updated by Brian Woods about 1 year ago
Not sure if this helps at all, but I was writing a script that generates a histogram of scrub times, and I noticed that the PGs are still getting scrubbed:
### Pool: CephFS-Meta (ID: 3) ###
There are 8 PGs, and 8 have been scrubbed in the last 24 hours.
At this rate, all PGs should be scrubbed by 2023-02-06 22:26:44
It should be possible to complete a full scrub in 24.0 hours.
The oldest PG scrub is: 2023-02-05 22:48:01
Here is a histogram of scrub times:
February 05, 2023: *** (3)
February 06, 2023: ***** (5)

*** This is one of them:
### Pool: CephFS-Erasure-ByOSD-D4F2-Data (ID: 7) ###
There are 64 PGs, and 15 have been scrubbed in the last 24 hours.
At this rate, all PGs should be scrubbed by 2023-02-10 04:50:44
It should be possible to complete a full scrub in 102.4 hours.
The oldest PG scrub is: 2022-11-13 15:40:15
Here is a histogram of scrub times:
November 13, 2022: * (1)
November 14, 2022: * (1)
November 29, 2022: * (1)
November 30, 2022: * (1)
December 27, 2022: * (1)
December 28, 2022: ** (2)
January 09, 2023: * (1)
February 01, 2023: *** (3)
February 02, 2023: ******** (8)
February 03, 2023: ********* (9)
February 04, 2023: ******* (7)
February 05, 2023: **************** (16)
February 06, 2023: ************* (13)

*** This is one of them:
### Pool: CephFS-Erasure-ByOSD-D5F1-Data (ID: 8) ###
There are 64 PGs, and 18 have been scrubbed in the last 24 hours.
At this rate, all PGs should be scrubbed by 2023-02-09 11:46:44
It should be possible to complete a full scrub in 85.3 hours.
The oldest PG scrub is: 2022-11-14 06:11:59
Here is a histogram of scrub times:
November 14, 2022: * (1)
November 29, 2022: * (1)
December 27, 2022: ** (2)
December 28, 2022: ** (2)
January 09, 2023: * (1)
February 01, 2023: ********* (9)
February 02, 2023: ******* (7)
February 03, 2023: ***** (5)
February 04, 2023: ******* (7)
February 05, 2023: ************* (13)
February 06, 2023: **************** (16)

### Pool: SSDStorage (ID: 9) ###
There are 8 PGs, and 5 have been scrubbed in the last 24 hours.
At this rate, all PGs should be scrubbed by 2023-02-07 12:50:44
It should be possible to complete a full scrub in 38.4 hours.
The oldest PG scrub is: 2023-02-05 12:31:04
Here is a histogram of scrub times:
February 05, 2023: **** (4)
February 06, 2023: **** (4)
I am just now working through my first scrub in a while; it should be done soon.
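For reference, a date histogram like the one above can be built from the per-PG last-scrub stamps (e.g. the last_scrub_stamp field in ceph pg dump --format json). A minimal sketch; the sample stamps below are made up, and the field name and timestamp format are assumptions based on that JSON output:

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample of per-PG "last_scrub_stamp" values; in practice
# these would be collected from `ceph pg dump --format json`.
stamps = [
    "2023-02-05T22:48:01.000000+0000",
    "2023-02-05T23:10:44.000000+0000",
    "2023-02-06T01:02:03.000000+0000",
]

# Bucket scrub times by calendar day, formatted like the report above.
buckets = Counter()
for s in stamps:
    day = datetime.strptime(s[:10], "%Y-%m-%d").strftime("%B %d, %Y")
    buckets[day] += 1

# Print one histogram line per day, oldest first.
for day, n in sorted(buckets.items(),
                     key=lambda kv: datetime.strptime(kv[0], "%B %d, %Y")):
    print(f"{day}: {'*' * n} ({n})")
```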
Updated by Radoslaw Zarzynski about 1 year ago
From the attached mgr-server1.log (the big log):
debug 2022-12-07T20:43:57.165+0000 7efe0d885700 0 [pg_autoscaler DEBUG root] pool: {
  ...
  "CephFS-Erasure-ByOSD-D4F2-Data": {
    ...
    "pool": 7,
    "pool_name": "CephFS-Erasure-ByOSD-D4F2-Data",
"PG_NOT_SCRUBBED": {
  "severity": "HEALTH_WARN",
  "summary": {
    "message": "116 pgs not scrubbed in time",
    "count": 116
  },
  "detail": [
    { "message": "pg 7.3e not scrubbed since 2022-11-13T16:53:51.788941+0000" },
    ...
    { "message": "pg 7.3c not scrubbed since 2022-11-13T13:15:34.048678+0000" },
Could you please provide the results of ceph pg 7.3e query?
Updated by Brian Woods about 1 year ago
Not 100% sure what command you are wanting, "ceph pg 7.3e" isn't complete.
Do you mean this?
ceph pg ls 7 | grep 3e
7.3e 0 0 0 0 0 0 0 2957 active+clean 12h 3339'290310 7103:1137146 [2,17,1,7,20,13]p2 [2,17,1,7,20,13]p2 2023-02-13T07:52:53.369394+0000 2023-02-10T21:23:44.278508+0000 1 periodic scrub scheduled @ 2023-02-14T18:16:32.369265+0000
Updated by Brian Woods about 1 year ago
Scrubbing is almost caught up (though scrubbing doesn't seem to target the oldest PGs first; not sure if that could be improved or not):
### Pool: CephFS-Erasure-ByOSD-D4F2-Data (ID: 7) ###
There are 64 PGs, and 23 have been scrubbed in the last 24 hours.
At this rate, all PGs should be scrubbed by 2023-02-15 15:04:50
It should be possible to complete a full scrub in 66.8 hours.
The oldest PG scrub is: 2023-02-02 04:59:04
Here is a histogram of scrub times:
February 02, 2023: * (1)
February 03, 2023: * (1)
February 05, 2023: ** (2)
February 06, 2023: * (1)
February 07, 2023: * (1)
February 08, 2023: ** (2)
February 09, 2023: ****** (6)
February 10, 2023: ***** (5)
February 11, 2023: *********** (11)
February 12, 2023: **************** (16)
February 13, 2023: ****************** (18)
Updated by Brian Woods about 1 year ago
Radoslaw Zarzynski wrote:
Could you please provide the results of ceph pg 7.3e query?
Did that cover what you needed?
Sorry to poke, but I am hoping to kill these soon. They make balancing a bit tricky.
Updated by Brian Woods about 1 year ago
Well, I need to move on, so I am deleting the pools. I may try to re-create this in a lab later. If I do, I will try and do it in a way that allows me to ship the whole cluster.
Updated by Radoslaw Zarzynski about 1 year ago
Some notes from the last Ceph RADOS Team meeting:
1. It is a very interesting trait that, although USED varies over the invocations of rados df, RD_OPS and WR_OPS stay the same (they are frozen).
2. There is an already-known bug about replicas sending pg stats (only the primary should), which could lead not only to assertions but sometimes to accounting issues as well (see https://github.com/ceph/ceph/pull/50283). This is a long-shot hypothesis, and it needs verification whether the pattern seen in the rados df samples could be explained that way.
Regardless of the hypothesis from #2, it would be super helpful if the problem could be replicated on a lab cluster!
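The frozen-counter pattern from note 1 can be checked mechanically by diffing two rados df samples taken some time apart: a pool whose USED moves while both op counters stay put matches the suspicious signature. A sketch over made-up per-pool stats (the dictionary layout and field names here are my own, not Ceph's JSON schema):

```python
# Two hypothetical snapshots of per-pool stats, taken some time apart.
# Values mirror the rados df samples in this ticket (USED drifted from
# 6.3 GiB to 6.2 GiB while RD_OPS/WR_OPS stayed frozen).
before = {"CephFS-Erasure-ByOSD-D4F2-Data":
          {"used_gib": 6.3, "rd_ops": 5971837, "wr_ops": 2495150}}
after = {"CephFS-Erasure-ByOSD-D4F2-Data":
         {"used_gib": 6.2, "rd_ops": 5971837, "wr_ops": 2495150}}

def suspicious_pools(a, b):
    """Pools where USED changed but neither op counter did."""
    out = []
    for name in a.keys() & b.keys():
        pa, pb = a[name], b[name]
        if (pa["used_gib"] != pb["used_gib"]
                and pa["rd_ops"] == pb["rd_ops"]
                and pa["wr_ops"] == pb["wr_ops"]):
            out.append(name)
    return sorted(out)

print(suspicious_pools(before, after))
```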