Bug #59192
closed
cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
Added by Laura Flores about 1 year ago.
Updated 5 months ago.
Backport: pacific,quincy,reef
Description
/a/lflores-2023-03-27_02:17:31-rados-wip-aclamk-bs-elastic-shared-blob-save-25.03.2023-a-distro-default-smithi/7221015
2023-03-27T07:50:01.978 DEBUG:teuthology.orchestra.run.smithi103:workunit test cls/test_cls_sdk.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=5e717292106ca2d310770101bfebb345837be8e1 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_sdk.sh
...
2023-03-27T07:50:48.129 DEBUG:teuthology.orchestra.run.smithi103:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v '\(PG_AVAILABILITY\)' | head -n 1
2023-03-27T07:50:48.162 INFO:teuthology.orchestra.run.smithi103.stdout:1679903171.7781157 mon.a (mon.0) 589 : cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
2023-03-27T07:50:48.163 WARNING:tasks.ceph:Found errors (ERR|WRN|SEC) in cluster log
2023-03-27T07:50:48.163 DEBUG:teuthology.orchestra.run.smithi103:> sudo egrep '\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v '\(PG_AVAILABILITY\)' | head -n 1
2023-03-27T07:50:48.218 DEBUG:teuthology.orchestra.run.smithi103:> sudo egrep '\[ERR\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v '\(PG_AVAILABILITY\)' | head -n 1
2023-03-27T07:50:48.272 DEBUG:teuthology.orchestra.run.smithi103:> sudo egrep '\[WRN\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v '\(PG_AVAILABILITY\)' | head -n 1
/a/lflores-2023-03-27_13:46:00-rados-wip-aclamk-bs-elastic-shared-blob-save-25.03.2023-a-distro-default-smithi/7221524
/a/yuriw-2023-03-27_23:05:54-rados-wip-yuri4-testing-2023-03-25-0714-distro-default-smithi/7221965
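For context, the "badness" scrape shown above flags any [ERR]/[WRN]/[SEC] line in ceph.log that is not matched by one of the egrep -v exclusions, which come from the suite's log-ignorelist entries. A purely illustrative sketch (not the actual change) of the same scrape with this warning additionally excluded:

# Illustrative only: the scrape from the teuthology log above, with one extra
# exclusion (the last egrep -v) so POOL_APP_NOT_ENABLED would no longer count as badness.
sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/ceph.log \
  | egrep -v '\(MDS_ALL_DOWN\)' \
  | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' \
  | egrep -v '\(OSD_SLOW_PING_TIME' \
  | egrep -v '\(PG_AVAILABILITY\)' \
  | egrep -v '\(POOL_APP_NOT_ENABLED\)' \
  | head -n 1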
- Priority changed from Normal to High
I've seen this several times now in two different branches containing unmerged PRs. Possible regression?
Looking at a previous run very similar to /a/lflores-2023-03-27_02:17:31-rados-wip-aclamk-bs-elastic-shared-blob-save-25.03.2023-a-distro-default-smithi/7221015 (rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/classic msgr-failures/many msgr/async objectstore/bluestore-low-osd-mem-target rados supported-random-distro$/{ubuntu_latest} tasks/rados_cls_all}) that had passed, it appears that the warning existed there too but the badness check just didn't catch it.
/a/yuriw-2023-03-17_23:38:21-rados-reef-distro-default-smithi/7212192 (rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/classic msgr-failures/many msgr/async objectstore/bluestore-low-osd-mem-target rados supported-random-distro$/{ubuntu_latest} tasks/rados_cls_all})
2023-03-19T04:46:59.319 INFO:tasks.ceph:Checking cluster log for badness...
2023-03-19T04:46:59.319 DEBUG:teuthology.orchestra.run.smithi121:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v '\(PG_AVAILABILITY\)' | head -n 1
2023-03-19T04:46:59.319 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mon.b is failed for ~0s
2023-03-19T04:46:59.339 INFO:tasks.ceph:Unmounting /var/lib/ceph/osd/ceph-0 on ubuntu@smithi121.front.sepia
nojha@teuthology:/ceph/teuthology-archive/yuriw-2023-03-17_23:38:21-rados-reef-distro-default-smithi/7212192/remote/smithi121/log$ zgrep "POOL_APP_NOT_ENABLED" ceph.log.gz
1679200138.2431207 mon.a (mon.0) 640 : cluster 3 Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
1679200138.2431207 mon.a (mon.0) 640 : cluster 3 Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
1679200143.161677 mon.a (mon.0) 649 : cluster 1 Health check cleared: POOL_APP_NOT_ENABLED (was: 1 pool(s) do not have an application enabled)
1679200143.161677 mon.a (mon.0) 649 : cluster 1 Health check cleared: POOL_APP_NOT_ENABLED (was: 1 pool(s) do not have an application enabled)
Neha Ojha wrote:
Looking at a previous run very similar to ... that had passed, it appears that the warning existed there too but the badness check just didn't catch it.
Unless I'm missing something, the pools created in this test shouldn't be associated to any application. If that's the case, then we can simply add "POOL_APP_NOT_ENABLED" to the ignore list (as done in other rados suite tests).
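For reference, the warning is expected for any pool that never gets an application tag, so a test whose pools are intentionally untagged can always trigger it. A minimal reproduction on a throwaway cluster (pool name is arbitrary here) might look like:

# Any pool created without an application tag can raise POOL_APP_NOT_ENABLED.
ceph osd pool create sdk-test 8
ceph health detail | grep POOL_APP_NOT_ENABLED
# Tagging the pool with an application clears the check:
ceph osd pool application enable sdk-test rados

Since the cls workunits have no reason to tag their pools, ignoring the warning in the suite configuration is the less invasive option, which is consistent with the whitelist fix discussed below.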
@Matan Breizman that's probably right, although I wonder what changed to make this pop up so frequently in the rados/rgw suites.
/a/yuriw-2023-03-27_23:05:54-rados-wip-yuri4-testing-2023-03-25-0714-distro-default-smithi/7221965
/a/yuriw-2023-03-16_21:59:27-rados-wip-yuri6-testing-2023-03-12-0918-pacific-distro-default-smithi/7211186
/a/yuriw-2023-03-16_21:59:27-rados-wip-yuri6-testing-2023-03-12-0918-pacific-distro-default-smithi/7211167
/a/yuriw-2023-03-30_21:53:20-rados-wip-yuri7-testing-2023-03-29-1100-distro-default-smithi/7227986
- Status changed from New to In Progress
- Assignee set to Radoslaw Zarzynski
/a/yuriw-2023-04-25_14:15:40-rados-pacific-release-distro-default-smithi/7251186
/a/yuriw-2023-04-24_23:35:26-smoke-pacific-release-distro-default-smithi/7250661
/a/yuriw-2023-04-25_21:30:50-rados-wip-yuri3-testing-2023-04-25-1147-distro-default-smithi/7253406
/a/yuriw-2023-04-25_18:56:08-rados-wip-yuri5-testing-2023-04-25-0837-pacific-distro-default-smithi/7252745
/a/yuriw-2023-05-06_14:41:44-rados-pacific-release-distro-default-smithi/7264188
- Assignee changed from Radoslaw Zarzynski to Laura Flores
- Priority changed from High to Normal
Laura, would you mind taking a look? Definitely not an urgent thing.
/a/yuriw-2023-04-26_01:16:19-rados-wip-yuri11-testing-2023-04-25-1605-pacific-distro-default-smithi/7253751
Sure Radek, I will see if something needs to be whitelisted.
- Related to Bug #61168: cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) added
- Status changed from In Progress to Duplicate
- Is duplicate of Bug #61168: cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) added
- Related to deleted (Bug #61168: cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED))
- Status changed from Duplicate to New
Hmm, found another instance in main that looks like this tracker:
/a/yuriw-2023-06-01_19:33:38-rados-wip-yuri-testing-2023-06-01-0746-distro-default-smithi/7294007
The test branch has the commit mentioned above, so perhaps there needs to be an additional fix.
- Is duplicate of deleted (Bug #61168: cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED))
- Related to Bug #61168: cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) added
/a/yuriw-2023-05-30_21:40:46-rados-wip-yuri10-testing-2023-05-30-1244-distro-default-smithi/7290995
- Backport changed from pacific to pacific,quincy,reef
Hi Laura! Do you have the bandwidth to take a deeper look?
Hey Radek, yes. I'm looking into it; it should be a quick whitelist fix. Trying out a fix now.
- Status changed from New to Fix Under Review
- Pull request ID set to 51925
- Status changed from Fix Under Review to Pending Backport
- Copied to Backport #61601: quincy: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) added
- Copied to Backport #61602: pacific: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) added
- Copied to Backport #61603: reef: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) added
- Tags set to backport_processed
- Related to Bug #62595: Health check failed: (POOL_APP_NOT_ENABLED)" in cluster log added
- Category set to Tests
- Status changed from Pending Backport to Resolved
- Target version set to v19.0.0
- Source set to Development