Project

General

Profile

Bug #58052

Empty Pool (zero objects) shows usage.

Added by Brian Woods over 1 year ago. Updated 12 months ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a pool that was/is being used in a CephFS. I have migrated all of the files off of the pool and was preparing to remove it. But even after a day of the pool being empty (zero objects), it still shows as consuming space. It also shows statistics for objects under compression:

POOL_NAME                          USED  OBJECTS  CLONES    COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED    RD_OPS       RD    WR_OPS       WR  USED COMPR  UNDER COMPR
...
CephFS-Erasure-ByOSD-D4F2-Data  6.3 GiB        0       0         0                   0        0         0   5971837  5.9 TiB   2495150   39 GiB     365 MiB      730 MiB
...

I recently added about a dozen OSDs and it is going to take another day or so for that re-balancing to finish. Is it possible that this is because things are still re-balancing?

I tried a few things to see if I could figure out what was going on:

rbd -p CephFS-Erasure-ByOSD-D4F2-Data du
#No results.
rados -p CephFS-Erasure-ByOSD-D4F2-Data ls
#No results.

I also made sure that there was nothing leftover from any benchmarks:

rados -p CephFS-Erasure-ByOSD-D4F2-Data cleanup
#No change.

This pool was only used for CephFS and was not used for any block devices, nor have I used any snapshots.

The cluster started out as 17.2.3 but is currently running 17.2.4.

I would like to remove this pool, but I am not sure if it is safe to do so. I am not sure if I have run into some sort of an orphaning issue, or if this is just the statistics not calculating correctly.

Thanks!

Logs.zip (783 KB) Brian Woods, 12/02/2022 05:39 PM

server1.zip (349 KB) Brian Woods, 12/02/2022 06:09 PM

server2.zip (363 KB) Brian Woods, 12/02/2022 06:09 PM

History

#1 Updated by Radoslaw Zarzynski over 1 year ago

  • Status changed from New to Need More Info

Could you please provide a log from an active mgr with debug_ms=1 and debug_mgr=20? We would like to see which OSDs participate in these numbers.

Also, could you please retake the stats for the CephFS-Erasure-ByOSD-D4F2-Data pool?

#2 Updated by Brian Woods over 1 year ago

Radoslaw Zarzynski wrote:

Could you please provide a log from an active mgr with debug_ms=1 and debug_mgr=20?

I am not 100% sure I know how to do that, but this is what I have so far. I found this:
https://access.redhat.com/solutions/2085183

ceph --admin-daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set debug_ms 1

So I:

# docker exec -it ceph-### bash
ceph --admin-daemon /var/run/ceph/ceph-client.rgw.rgw-default.### set debug_ms 1# ceph --admin-daemon /var/run/ceph/ceph-client.rgw.rgw-default.###  config set debug_ms 1
{
    "success": "" 
}
# ceph --admin-daemon /var/run/ceph/ceph-client.rgw.rgw-default.###  config set debug_mgr 20
{
    "success": "" 
}

That seemed to work, but I am not sure what logs you want me to grab. Just cephadm.log? I will have to scrub it a bit, but can get that soon.

We would like to see which OSDs participate in these numbers.

Is there a specific command I can issue to figure out what OSDs are participating in the space usage?

Also, could you please retake the stats for the CephFS-Erasure-ByOSD-D4F2-Data pool?

This? It moved a bit this morning when I deleted some files from another pool, but not since then (about 24 hours).

POOL_NAME                          USED  OBJECTS  CLONES    COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED    RD_OPS       RD    WR_OPS       WR  USED COMPR  UNDER COMPR

CephFS-Erasure-ByOSD-D4F2-Data  6.2 GiB        0       0         0                   0        0         0   5971837  5.9 TiB   2495150   39 GiB     360 MiB      720 MiB

#3 Updated by Radoslaw Zarzynski about 1 year ago

Well, I think the command you mentioned did effect for RGW, not MGR. I'm providing the commands increasing log verbosity on all mgrs below.

ceph config set mgr debug_ms 1
ceph config set mgr debug_mgr 20

Then, please collect the logs and revert the log settings to defaults.

#4 Updated by Brian Woods about 1 year ago

Radoslaw Zarzynski wrote:

Well, I think the command you mentioned did effect for RGW, not MGR. I'm providing the commands increasing log verbosity on all mgrs below.

[...]

Then, please collect the logs and revert the log settings to defaults.

Sorry for the delay. Logs attached.

What are the defaults?

Also, an update, I emptied another pool and have a similar case with it. Interesting side note for it, it took sending the "rados -p CephFS-Erasure-ByOSD-D5F1-Data cleanup --prefix benchmark_data" three times for it actually delete the blocks. On the third time it gave a "warning doing linear search" or something to the lines of that.

So we now look like this:

POOL_NAME                          USED   OBJECTS  CLONES     COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED    RD_OPS       RD    WR_OPS       WR  USED COMPR  UNDER COMPR
.mgr                             70 MiB        10       0         20                   0        0         0     16218   18 MiB     28266  592 MiB         0 B          0 B
.rgw.root                        48 KiB         6       0         12                   0        0         0       144  144 KiB         6    6 KiB         0 B          0 B
CephFS-Erasure-ByOSD-D4F2-Data  5.9 GiB         0       0          0                   0        0         0   5971837  5.9 TiB   2495150   39 GiB     353 MiB      706 MiB
CephFS-Erasure-ByOSD-D5F1-Data   74 GiB         0       0          0                   0        0         0  33833233   47 TiB  13492697  2.0 TiB     213 MiB      425 MiB
CephFS-Erasure-ByOSD-D5F3-Data   67 TiB  12757283       0  102058264                   0        0         0   3470975  2.5 TiB  19723113   43 TiB     145 GiB      291 GiB
CephFS-Meta                     5.3 GiB    762332       0    1524664                   0        0         0  35039654   71 GiB  41024292  195 GiB         0 B          0 B
CephFS-Root                      72 KiB   3218516       0    9655548                   0        0         0    258607  253 MiB  11827192  289 MiB         0 B          0 B
SSDStorage                      135 GiB    595698       0    1191396                   0        0         0   6160114  3.7 TiB  31158766  734 GiB      37 GiB       92 GiB
default.rgw.control                 0 B         8       0         16                   0        0         0         0      0 B         0      0 B         0 B          0 B
default.rgw.log                 272 KiB       209       0        418                   0        0         0   5750159  5.5 GiB   3821043   34 KiB         0 B          0 B
default.rgw.meta                 19 KiB         3       0          6                   0        0         0        90   69 KiB         6    2 KiB         0 B          0 B

#5 Updated by Brian Woods about 1 year ago

I am realizing those logs are from a single host (server4).
server3 got removed today.
Attaching server1 to this message.

#6 Updated by Brian Woods about 1 year ago

Attaching server2 to this message.

#7 Updated by Radoslaw Zarzynski about 1 year ago

Hello. Thanks for response and the files.

Archive:  server2.zip
  inflating: ceph-client.admin.log   
  inflating: ceph-volume.log         
  inflating: cephadm.log             
  inflating: cephadm.log.1           
  inflating: cephadm.log.2           
  inflating: cephadm.log.3           
  inflating: cephadm.log.4           
  inflating: cephadm.log.5           
  inflating: cephadm.log.6           
  inflating: cephadm.log.7 

Unfortunately, they lack the ceph-mgr's logs. Could you please provide the `ceph-mgr.*` log files?

#8 Updated by Brian Woods about 1 year ago

That is every log file from every node. There are no ceph-mgr* logs. :/

Even from inside the docker on the adm node:

# docker exec -u root -it ceph-cdffec50-412e-11ed-a95e-8be7dd44d8f3-mgr-server1-domain-local-miydsy bash
[root@55772 /]# cd /var/log/ceph
[root@55772 log]# cd ceph/
[root@server1 ceph]# ls -l
total 55772
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.0.log
-rw-r--r-- 1 ceph ceph    24839 Oct  1 02:30 ceph-osd.0.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.1.log
-rw-r--r-- 1 ceph ceph    50855 Oct  2 19:44 ceph-osd.1.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct 23 00:00 ceph-osd.11.log
-rw-r--r-- 1 ceph ceph    26394 Oct 21 19:57 ceph-osd.11.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.2.log
-rw-r--r-- 1 ceph ceph    50852 Oct  2 19:45 ceph-osd.2.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.3.log
-rw-r--r-- 1 ceph ceph    50824 Oct  2 19:45 ceph-osd.3.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.4.log
-rw-r--r-- 1 ceph ceph    51142 Oct  2 19:46 ceph-osd.4.log.1.gz
-rw-r--r-- 1 ceph ceph        0 Oct  3 00:00 ceph-osd.5.log
-rw-r--r-- 1 ceph ceph    51450 Oct  2 19:47 ceph-osd.5.log.1.gz
-rw-r--r-- 1 root root        0 Dec  5 00:00 ceph-volume.log
-rw-r--r-- 1 root root  4175959 Dec  4 15:33 ceph-volume.log.1.gz
-rw-r--r-- 1 root root  6519855 Dec  3 23:40 ceph-volume.log.2.gz
-rw-r--r-- 1 root root  8749846 Dec  2 23:49 ceph-volume.log.3.gz
-rw-r--r-- 1 root root 10504801 Dec  1 23:59 ceph-volume.log.4.gz
-rw-r--r-- 1 root root  9538837 Nov 30 23:51 ceph-volume.log.5.gz
-rw-r--r-- 1 root root  8649327 Nov 29 23:42 ceph-volume.log.6.gz
-rw-r--r-- 1 root root  8631026 Nov 28 23:41 ceph-volume.log.7.gz
-rw-r--r-- 1 root root        0 Oct 27 00:00 cephadm.log
-rw-r--r-- 1 root root      188 Oct 25 14:02 cephadm.log.1.gz
[root@server1 ceph]# 

Ideas? I don't recall seeing mgr specific logs on any of my deployments (I have a few).

#9 Updated by Brian Woods about 1 year ago

Alright, I found the logs can be accessed from docker itself. In the process of pulling them, but I am already at 5GBs of logs and growing.... I will compress and post a Google Drive link as soon as they are complete..

#10 Updated by Brian Woods about 1 year ago

Alright, please let me know when you have the file so I can remove it from my drive:
https://drive.google.com/file/d/1i--yt4tcOn5augmpmRbQmS8pg4yyAPFU/view?usp=sharing

#11 Updated by Radoslaw Zarzynski about 1 year ago

Glad you've found it! Would mind uploading via the ceph-post-file (https://docs.ceph.com/en/quincy/man/8/ceph-post-file/)?

#12 Updated by Brian Woods about 1 year ago

Radoslaw Zarzynski wrote:

Glad you've found it! Would mind uploading via the ceph-post-file (https://docs.ceph.com/en/quincy/man/8/ceph-post-file/)?

First attempt for an unknown RSA key error. Odd as I have never used the command, but meh...

$ ceph-post-file  --description "All MGR Logs" -u "BRIANWOODS" mgr.zip 
args: --description All MGR Logs -u BRIANWOODS -- mgr.zip
args: -u BRIANWOODS -- mgr.zip
args: -- mgr.zip
/usr/bin/ceph-post-file: upload tag f252e192-a62d-4ccd-846a-61906260aae3
/usr/bin/ceph-post-file: user: BRIANWOODS
/usr/bin/ceph-post-file: description: All MGR Logs
/usr/bin/ceph-post-file: will upload file mgr.zip
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:cgG0rSk0W8I4sC+LZ8un9R0D5FKOwxwL5XVBt7PeW8Y.
Please contact your system administrator.
Add correct host key in /home/bpwoods/.ssh/known_hosts to get rid of this message.
Offending RSA key in /usr/share/ceph/known_hosts_drop.ceph.com:1
  remove with:
  ssh-keygen -f "/usr/share/ceph/known_hosts_drop.ceph.com" -R "drop.ceph.com" 
RSA host key for drop.ceph.com has changed and you have requested strict checking.
Host key verification failed.
Connection closed.  
Connection closed

I cleared the key and attempted the upload again and got a permission denied error:

$ ceph-post-file  --description "All MGR Logs" -u "BRIANWOODS" mgr.zip 
args: --description All MGR Logs -u BRIANWOODS -- mgr.zip
args: -u BRIANWOODS -- mgr.zip
args: -- mgr.zip
/usr/bin/ceph-post-file: upload tag cf4d360c-5007-4119-a244-1855a1519ebb
/usr/bin/ceph-post-file: user: BRIANWOODS
/usr/bin/ceph-post-file: description: All MGR Logs
/usr/bin/ceph-post-file: will upload file mgr.zip
The authenticity of host 'drop.ceph.com (8.43.84.129)' can't be established.
ECDSA key fingerprint is SHA256:NQf3i0LwF3NK4drP9lDudf1u0UaA9G0uqdWxDefhhrU.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'drop.ceph.com' (ECDSA) to the list of known hosts.
postfile@drop.ceph.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Connection closed.  
Connection closed

#13 Updated by Brian Woods about 1 year ago

Any thoughts on this?

#14 Updated by Radoslaw Zarzynski about 1 year ago

Downloading manually. Neha is testing ceph-post-file.

#15 Updated by Brian Woods about 1 year ago

Radoslaw Zarzynski wrote:

Downloading manually. Neha is testing ceph-post-file.

I kinda want to kill these pools to help with placement. Do you want to continue debugging this?

I was kinda hoping that they would clear themselves out after the cluster finished re-balancing (maybe a back logging issue), but no progress after several days of everything being clear.

#16 Updated by Brian Woods about 1 year ago

I am concerned that this could be a bigger issue, kinda like a memory leak, but for storage. And that this could continue to slowly eat up space over time.

#17 Updated by Brian Woods about 1 year ago

Not sure if this helps at all, but I was writing a script that generates a histogram of scrub times, and I noticed that the PGs are still getting scrubbed:

### Pool: CephFS-Meta  (ID: 3 ) ####
# There are 8 PGs, and 8 have been scrubbed in the last 24 hours.  At this rate, all PGs should be scrubbed by 2023-02-06 22:26:44  It should be possible to complete a full scrub in 24.0 hours.  The oldest PG scrub is: 2023-02-05 22:48:01

#Here is a histogram of scrub times:
February 05, 2023: *** (3)
February 06, 2023: ***** (5)

*** This is one of them:
### Pool: CephFS-Erasure-ByOSD-D4F2-Data  (ID: 7 ) ####
# There are 64 PGs, and 15 have been scrubbed in the last 24 hours.  At this rate, all PGs should be scrubbed by 2023-02-10 04:50:44  It should be possible to complete a full scrub in 102.4 hours.  The oldest PG scrub is: 2022-11-13 15:40:15

#Here is a histogram of scrub times:
November 13, 2022: * (1)
November 14, 2022: * (1)
November 29, 2022: * (1)
November 30, 2022: * (1)
December 27, 2022: * (1)
December 28, 2022: ** (2)
January 09, 2023: * (1)
February 01, 2023: *** (3)
February 02, 2023: ******** (8)
February 03, 2023: ********* (9)
February 04, 2023: ******* (7)
February 05, 2023: **************** (16)
February 06, 2023: ************* (13)

*** This is one of them:
### Pool: CephFS-Erasure-ByOSD-D5F1-Data  (ID: 8 ) ####
# There are 64 PGs, and 18 have been scrubbed in the last 24 hours.  At this rate, all PGs should be scrubbed by 2023-02-09 11:46:44  It should be possible to complete a full scrub in 85.3 hours.  The oldest PG scrub is: 2022-11-14 06:11:59

#Here is a histogram of scrub times:
November 14, 2022: * (1)
November 29, 2022: * (1)
December 27, 2022: ** (2)
December 28, 2022: ** (2)
January 09, 2023: * (1)
February 01, 2023: ********* (9)
February 02, 2023: ******* (7)
February 03, 2023: ***** (5)
February 04, 2023: ******* (7)
February 05, 2023: ************* (13)
February 06, 2023: **************** (16)

### Pool: SSDStorage  (ID: 9 ) ####
# There are 8 PGs, and 5 have been scrubbed in the last 24 hours.  At this rate, all PGs should be scrubbed by 2023-02-07 12:50:44  It should be possible to complete a full scrub in 38.4 hours.  The oldest PG scrub is: 2023-02-05 12:31:04

#Here is a histogram of scrub times:
February 05, 2023: **** (4)
February 06, 2023: **** (4)

I am just now working though my first scrub in a while, should be done soon.

#18 Updated by Radoslaw Zarzynski about 1 year ago

From the attached mgr-server1.log (the big log):

debug 2022-12-07T20:43:57.165+0000 7efe0d885700  0 [pg_autoscaler DEBUG root] pool: {
...
    "CephFS-Erasure-ByOSD-D4F2-Data": {
...
        "pool": 7,
        "pool_name": "CephFS-Erasure-ByOSD-D4F2-Data",
    "PG_NOT_SCRUBBED": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "116 pgs not scrubbed in time",
            "count": 116
        },
        "detail": [
            {
                "message": "pg 7.3e not scrubbed since 2022-11-13T16:53:51.788941+0000" 
            },
...
            {
                "message": "pg 7.3c not scrubbed since 2022-11-13T13:15:34.048678+0000" 
            },

Could you please provide results of ceph pg 7.3e query?

#19 Updated by Brian Woods about 1 year ago

Not 100% sure what command you are wanting, "ceph pg 7.3e" isn't complete.

Do you mean this?
  1. ceph pg ls 7 | grep 3e
    7.3e 0 0 0 0 0 0 0 2957 active+clean 12h 3339'290310 7103:1137146 [2,17,1,7,20,13]p2 [2,17,1,7,20,13]p2 2023-02-13T07:52:53.369394+0000 2023-02-10T21:23:44.278508+0000 1 periodic scrub scheduled @ 2023-02-14T18:16:32.369265+0000

#20 Updated by Brian Woods about 1 year ago

Scrubbing is almost caught up (though scrubbing doesn't seem to target the oldest first, not sure if that could be improved or now):

### Pool: CephFS-Erasure-ByOSD-D4F2-Data  (ID: 7 ) ####
# There are 64 PGs, and 23 have been scrubbed in the last 24 hours.  At this rate, all PGs should be scrubbed by 2023-02-15 15:04:50  It should be possible to complete a full scrub in 66.8 hours.  The oldest PG scrub is: 2023-02-02 04:59:04

#Here is a histogram of scrub times:
February 02, 2023: * (1)
February 03, 2023: * (1)
February 05, 2023: ** (2)
February 06, 2023: * (1)
February 07, 2023: * (1)
February 08, 2023: ** (2)
February 09, 2023: ****** (6)
February 10, 2023: ***** (5)
February 11, 2023: *********** (11)
February 12, 2023: **************** (16)
February 13, 2023: ****************** (18)

#21 Updated by Brian Woods about 1 year ago

Radoslaw Zarzynski wrote:

Could you please provide results of ceph pg 7.3e query?

Did that cover what you needed?

Sorry to poke, but I am hoping to kill these as soon. They make balancing a bit tricky.

#22 Updated by Brian Woods 12 months ago

Well, I need to move on, so I am deleting the pools. I may try to re-create this in a lab later. If I do, I will try and do it in a way that allows me to ship the whole cluster.

#23 Updated by Radoslaw Zarzynski 12 months ago

Some notes from from the last Ceph RADOS Team meeting:

1. It's very interesting trait that, although the USED varies over the invocations of `rados df`, RD_OPS and WR_OPS stay the same (they are frozen).
2. There is an already known bug on replicas sending pg stats (only primary should) which could lead not only to assertions but sometimes to accounting issues as well (see https://github.com/ceph/ceph/pull/50283). This a long-shot hypothesis and needs verification whether the pattern seen in the rados df samples could be explained that way.

Regardless of the hypothesis from #2 it would be superhelpful if the problem can be replicated on a lab cluster!

Also available in: Atom PDF