Bug #58052
Empty Pool (zero objects) shows usage.
Description
I have a pool that was/is being used in a CephFS. I have migrated all of the files off of the pool and was preparing to remove it. But even after a day of the pool being empty (zero objects), it still shows as consuming space. It also shows statistics for objects under compression:
POOL_NAME                       USED     OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD       WR_OPS   WR      USED COMPR  UNDER COMPR
...
CephFS-Erasure-ByOSD-D4F2-Data  6.3 GiB        0       0       0                   0        0         0  5971837  5.9 TiB  2495150  39 GiB     365 MiB      730 MiB
...
I recently added about a dozen OSDs and it is going to take another day or so for that re-balancing to finish. Is it possible that this is because things are still re-balancing?
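(As a side note, something like the following should show whether recovery or backfill is still actively touching this pool; this is only a sketch, output not included here:)
ceph -s
ceph osd pool stats CephFS-Erasure-ByOSD-D4F2-Data
As I understand it, "ceph osd pool stats" reports per-pool client and recovery I/O, so if it shows nothing going on for this pool, the leftover USED figure is probably not just in-flight backfill.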
I tried a few things to see if I could figure out what was going on:
rbd -p CephFS-Erasure-ByOSD-D4F2-Data du #No results.
rados -p CephFS-Erasure-ByOSD-D4F2-Data ls #No results.
I also made sure that there was nothing leftover from any benchmarks:
rados -p CephFS-Erasure-ByOSD-D4F2-Data cleanup #No change.
This pool was only used for CephFS and was not used for any block devices, nor have I used any snapshots.
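(Another sketch, in case it is useful: the plain "rados ls" above only lists the default namespace, so something along these lines might catch objects hiding elsewhere, or give a more detailed per-pool breakdown; not verified against this cluster:)
rados -p CephFS-Erasure-ByOSD-D4F2-Data ls --all     # list objects in all namespaces, not just the default one
ceph df detail                                       # more detailed per-pool usage, including the compression columns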
The cluster started out as 17.2.3 but is currently running 17.2.4.
I would like to remove this pool, but I am not sure whether it is safe to do so: I may have run into some sort of orphaning issue, or this may just be the statistics being calculated incorrectly.
Thanks!
History
#1 Updated by Radoslaw Zarzynski 2 months ago
- Status changed from New to Need More Info
Could you please provide a log from an active mgr with debug_ms=1 and debug_mgr=20? We would like to see which OSDs participate in these numbers.
Also, could you please retake the stats for the CephFS-Erasure-ByOSD-D4F2-Data pool?
#2 Updated by Brian Woods 2 months ago
Radoslaw Zarzynski wrote:
Could you please provide a log from an active mgr with debug_ms=1 and debug_mgr=20?
I am not 100% sure I know how to do that, but this is what I have so far. I found this:
https://access.redhat.com/solutions/2085183
ceph --admin-daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set debug_ms 1
So I:
# docker exec -it ceph-### bash
ceph --admin-daemon /var/run/ceph/ceph-client.rgw.rgw-default.### set debug_ms 1
# ceph --admin-daemon /var/run/ceph/ceph-client.rgw.rgw-default.### config set debug_ms 1
{ "success": "" }
# ceph --admin-daemon /var/run/ceph/ceph-client.rgw.rgw-default.### config set debug_mgr 20
{ "success": "" }
That seemed to work, but I am not sure what logs you want me to grab. Just cephadm.log? I will have to scrub it a bit, but can get that soon.
We would like to see which OSDs participate in these numbers.
Is there a specific command I can issue to figure out what OSDs are participating in the space usage?
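(A guess at what might answer my own question, sketched here in case it helps; "some-object" below is just a hypothetical object name:)
ceph pg ls-by-pool CephFS-Erasure-ByOSD-D4F2-Data           # list the pool's PGs together with their up/acting OSD sets
ceph osd map CephFS-Erasure-ByOSD-D4F2-Data some-object     # map a single (hypothetical) object name to a PG and its OSDs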
Also, could you please retake the stats for the CephFS-Erasure-ByOSD-D4F2-Data pool?
This? It moved a bit this morning when I deleted some files from another pool, but it has not changed since then (about 24 hours).
POOL_NAME                       USED     OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD       WR_OPS   WR      USED COMPR  UNDER COMPR
CephFS-Erasure-ByOSD-D4F2-Data  6.2 GiB        0       0       0                   0        0         0  5971837  5.9 TiB  2495150  39 GiB     360 MiB      720 MiB
#3 Updated by Radoslaw Zarzynski 2 months ago
Well, I think the command you mentioned took effect for the RGW, not the MGR. I'm providing the commands for increasing log verbosity on all mgrs below.
ceph config set mgr debug_ms 1
ceph config set mgr debug_mgr 20
Then, please collect the logs and revert the log settings to defaults.
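To revert afterwards, removing the overrides should be enough (the daemons fall back to their built-in defaults); something along these lines:
ceph config rm mgr debug_ms
ceph config rm mgr debug_mgr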
#4 Updated by Brian Woods 2 months ago
- File Logs.zip added
Radoslaw Zarzynski wrote:
Well, I think the command you mentioned took effect for the RGW, not the MGR. I'm providing the commands for increasing log verbosity on all mgrs below.
[...]
Then, please collect the logs and revert the log settings to defaults.
Sorry for the delay. Logs attached.
What are the defaults?
Also, an update: I emptied another pool and have a similar case with it. Interesting side note: it took running "rados -p CephFS-Erasure-ByOSD-D5F1-Data cleanup --prefix benchmark_data" three times for it to actually delete the blocks. On the third run it gave a "warning doing linear search" message, or something along those lines.
So we now look like this:
POOL_NAME                       USED     OBJECTS   CLONES  COPIES     MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS    RD       WR_OPS    WR       USED COMPR  UNDER COMPR
.mgr                            70 MiB         10       0        20                    0        0         0     16218   18 MiB     28266  592 MiB         0 B          0 B
.rgw.root                       48 KiB          6       0        12                    0        0         0       144  144 KiB         6    6 KiB         0 B          0 B
CephFS-Erasure-ByOSD-D4F2-Data  5.9 GiB         0       0         0                    0        0         0   5971837  5.9 TiB   2495150   39 GiB     353 MiB      706 MiB
CephFS-Erasure-ByOSD-D5F1-Data  74 GiB          0       0         0                    0        0         0  33833233   47 TiB  13492697  2.0 TiB     213 MiB      425 MiB
CephFS-Erasure-ByOSD-D5F3-Data  67 TiB   12757283       0 102058264                    0        0         0   3470975  2.5 TiB  19723113   43 TiB     145 GiB      291 GiB
CephFS-Meta                     5.3 GiB    762332       0   1524664                    0        0         0  35039654   71 GiB  41024292  195 GiB         0 B          0 B
CephFS-Root                     72 KiB    3218516       0   9655548                    0        0         0    258607  253 MiB  11827192  289 MiB         0 B          0 B
SSDStorage                      135 GiB    595698       0   1191396                    0        0         0   6160114  3.7 TiB  31158766  734 GiB      37 GiB       92 GiB
default.rgw.control             0 B             8       0        16                    0        0         0         0      0 B         0      0 B         0 B          0 B
default.rgw.log                 272 KiB       209       0       418                    0        0         0   5750159  5.5 GiB   3821043   34 KiB         0 B          0 B
default.rgw.meta                19 KiB          3       0         6                    0        0         0        90   69 KiB         6    2 KiB         0 B          0 B
#5 Updated by Brian Woods 2 months ago
- File server1.zip added
I just realized those logs are from a single host (server4).
server3 got removed today.
Attaching server1 to this message.
#7 Updated by Radoslaw Zarzynski about 2 months ago
Hello. Thanks for the response and the files.
Archive:  server2.zip
  inflating: ceph-client.admin.log
  inflating: ceph-volume.log
  inflating: cephadm.log
  inflating: cephadm.log.1
  inflating: cephadm.log.2
  inflating: cephadm.log.3
  inflating: cephadm.log.4
  inflating: cephadm.log.5
  inflating: cephadm.log.6
  inflating: cephadm.log.7
Unfortunately, they lack the ceph-mgr's logs. Could you please provide the `ceph-mgr.*` log files?
#8 Updated by Brian Woods about 2 months ago
That is every log file from every node. There are no ceph-mgr* logs. :/
Even from inside the docker on the adm node:
# docker exec -u root -it ceph-cdffec50-412e-11ed-a95e-8be7dd44d8f3-mgr-server1-domain-local-miydsy bash
[root@55772 /]# cd /var/log/ceph
[root@55772 log]# cd ceph/
[root@server1 ceph]# ls -l
total 55772
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.0.log
-rw-r--r-- 1 ceph ceph    24839 Oct  1 02:30 ceph-osd.0.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.1.log
-rw-r--r-- 1 ceph ceph    50855 Oct  2 19:44 ceph-osd.1.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct 23 00:00 ceph-osd.11.log
-rw-r--r-- 1 ceph ceph    26394 Oct 21 19:57 ceph-osd.11.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.2.log
-rw-r--r-- 1 ceph ceph    50852 Oct  2 19:45 ceph-osd.2.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.3.log
-rw-r--r-- 1 ceph ceph    50824 Oct  2 19:45 ceph-osd.3.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.4.log
-rw-r--r-- 1 ceph ceph    51142 Oct  2 19:46 ceph-osd.4.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.5.log
-rw-r--r-- 1 ceph ceph    51450 Oct  2 19:47 ceph-osd.5.log.1.gz
-rw-r--r-- 1 root root        0 Dec  5 00:00 ceph-volume.log
-rw-r--r-- 1 root root  4175959 Dec  4 15:33 ceph-volume.log.1.gz
-rw-r--r-- 1 root root  6519855 Dec  3 23:40 ceph-volume.log.2.gz
-rw-r--r-- 1 root root  8749846 Dec  2 23:49 ceph-volume.log.3.gz
-rw-r--r-- 1 root root 10504801 Dec  1 23:59 ceph-volume.log.4.gz
-rw-r--r-- 1 root root  9538837 Nov 30 23:51 ceph-volume.log.5.gz
-rw-r--r-- 1 root root  8649327 Nov 29 23:42 ceph-volume.log.6.gz
-rw-r--r-- 1 root root  8631026 Nov 28 23:41 ceph-volume.log.7.gz
-rw-r--r-- 1 root root        0 Oct 27 00:00 cephadm.log
-rw-r--r-- 1 root root      188 Oct 25 14:02 cephadm.log.1.gz
[root@server1 ceph]#
Ideas? I don't recall seeing mgr-specific logs on any of my deployments (I have a few).
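(A sketch for anyone else hitting this: cephadm/containerized daemons log to stderr/journald by default rather than to files under /var/log/ceph, so the mgr log has to be pulled from the journal, or file logging has to be enabled first. The daemon name and fsid below are placeholders; the real daemon name comes from "ceph orch ps".)
cephadm logs --name mgr.<host>.<id>              # dump that daemon's journald log on its host
journalctl -u ceph-<fsid>@mgr.<host>.<id>        # the same logs straight from systemd
ceph config set global log_to_file true          # optionally switch to file-based logging under /var/log/ceph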
#9 Updated by Brian Woods about 2 months ago
Alright, I found that the logs can be accessed from docker itself. I am in the process of pulling them, but I am already at 5 GB of logs and growing... I will compress them and post a Google Drive link as soon as they are complete.
#10 Updated by Brian Woods about 2 months ago
Alright, please let me know when you have the file so I can remove it from my drive:
https://drive.google.com/file/d/1i--yt4tcOn5augmpmRbQmS8pg4yyAPFU/view?usp=sharing
#11 Updated by Radoslaw Zarzynski about 1 month ago
Glad you've found it! Would you mind uploading via ceph-post-file (https://docs.ceph.com/en/quincy/man/8/ceph-post-file/)?
#12 Updated by Brian Woods about 1 month ago
Radoslaw Zarzynski wrote:
Glad you've found it! Would you mind uploading via ceph-post-file (https://docs.ceph.com/en/quincy/man/8/ceph-post-file/)?
The first attempt failed with an RSA host key error. Odd, as I have never used the command before, but meh...
$ ceph-post-file --description "All MGR Logs" -u "BRIANWOODS" mgr.zip
args: --description All MGR Logs -u BRIANWOODS -- mgr.zip
args: -u BRIANWOODS -- mgr.zip
args: -- mgr.zip
/usr/bin/ceph-post-file: upload tag f252e192-a62d-4ccd-846a-61906260aae3
/usr/bin/ceph-post-file: user: BRIANWOODS
/usr/bin/ceph-post-file: description: All MGR Logs
/usr/bin/ceph-post-file: will upload file mgr.zip
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:cgG0rSk0W8I4sC+LZ8un9R0D5FKOwxwL5XVBt7PeW8Y.
Please contact your system administrator.
Add correct host key in /home/bpwoods/.ssh/known_hosts to get rid of this message.
Offending RSA key in /usr/share/ceph/known_hosts_drop.ceph.com:1
  remove with:
  ssh-keygen -f "/usr/share/ceph/known_hosts_drop.ceph.com" -R "drop.ceph.com"
RSA host key for drop.ceph.com has changed and you have requested strict checking.
Host key verification failed.
Connection closed.
Connection closed
I cleared the key and attempted the upload again and got a permission denied error:
$ ceph-post-file --description "All MGR Logs" -u "BRIANWOODS" mgr.zip
args: --description All MGR Logs -u BRIANWOODS -- mgr.zip
args: -u BRIANWOODS -- mgr.zip
args: -- mgr.zip
/usr/bin/ceph-post-file: upload tag cf4d360c-5007-4119-a244-1855a1519ebb
/usr/bin/ceph-post-file: user: BRIANWOODS
/usr/bin/ceph-post-file: description: All MGR Logs
/usr/bin/ceph-post-file: will upload file mgr.zip
The authenticity of host 'drop.ceph.com (8.43.84.129)' can't be established.
ECDSA key fingerprint is SHA256:NQf3i0LwF3NK4drP9lDudf1u0UaA9G0uqdWxDefhhrU.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'drop.ceph.com' (ECDSA) to the list of known hosts.
postfile@drop.ceph.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Connection closed.
Connection closed
#13 Updated by Brian Woods about 1 month ago
Any thoughts on this?
#14 Updated by Radoslaw Zarzynski 27 days ago
Downloading manually. Neha is testing ceph-post-file.