Bug #59192
cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
Status: Closed
Description
/a/lflores-2023-03-27_02:17:31-rados-wip-aclamk-bs-elastic-shared-blob-save-25.03.2023-a-distro-default-smithi/7221015
2023-03-27T07:50:01.978 DEBUG:teuthology.orchestra.run.smithi103:workunit test cls/test_cls_sdk.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=5e717292106ca2d310770101bfebb345837be8e1 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_sdk.sh
...
2023-03-27T07:50:48.129 DEBUG:teuthology.orchestra.run.smithi103:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v '\(PG_AVAILABILITY\)' | head -n 1
2023-03-27T07:50:48.162 INFO:teuthology.orchestra.run.smithi103.stdout:1679903171.7781157 mon.a (mon.0) 589 : cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
2023-03-27T07:50:48.163 WARNING:tasks.ceph:Found errors (ERR|WRN|SEC) in cluster log
2023-03-27T07:50:48.163 DEBUG:teuthology.orchestra.run.smithi103:> sudo egrep '\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v '\(PG_AVAILABILITY\)' | head -n 1
2023-03-27T07:50:48.218 DEBUG:teuthology.orchestra.run.smithi103:> sudo egrep '\[ERR\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v '\(PG_AVAILABILITY\)' | head -n 1
2023-03-27T07:50:48.272 DEBUG:teuthology.orchestra.run.smithi103:> sudo egrep '\[WRN\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v '\(PG_AVAILABILITY\)' | head -n 1
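The badness check above is just a chain of `egrep -v` exclusions, so suppressing this warning amounts to adding one more pattern to the chain. A minimal sketch against a synthetic log file (the two log lines below are fabricated for illustration, not taken from a real run):

```shell
# Synthetic cluster log; both lines are fabricated for illustration.
cat > /tmp/ceph-badness-demo.log <<'EOF'
1679903171.77 mon.a (mon.0) 589 : cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
1679903180.00 osd.3 (osd.3) 12 : cluster [ERR] something actually bad happened
EOF

# Same filter shape as the teuthology badness check, with one extra
# exclusion added for POOL_APP_NOT_ENABLED; only the real error survives.
result=$(egrep '\[ERR\]|\[WRN\]|\[SEC\]' /tmp/ceph-badness-demo.log \
  | egrep -v '\(MDS_ALL_DOWN\)' \
  | egrep -v '\(POOL_APP_NOT_ENABLED\)' \
  | head -n 1)
echo "$result"
```

With the extra exclusion in place, the POOL_APP_NOT_ENABLED warning no longer trips the "Found errors (ERR|WRN|SEC) in cluster log" failure, while genuine errors still do.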
Updated by Laura Flores about 1 year ago
/a/lflores-2023-03-27_13:46:00-rados-wip-aclamk-bs-elastic-shared-blob-save-25.03.2023-a-distro-default-smithi/7221524
/a/yuriw-2023-03-27_23:05:54-rados-wip-yuri4-testing-2023-03-25-0714-distro-default-smithi/7221965
Updated by Laura Flores about 1 year ago
- Priority changed from Normal to High
I've seen this several times now on two different branches with unmerged PRs. Possible regression?
Updated by Neha Ojha about 1 year ago
seen in the rgw test suite too /a/cbodley-2023-03-22_18:01:21-rgw-main-distro-default-smithi/7216444 - see the discussion in https://github.com/ceph/ceph/pull/47560#issuecomment-1487406107
Updated by Neha Ojha about 1 year ago
Looking at a previous run very similar to /a/lflores-2023-03-27_02:17:31-rados-wip-aclamk-bs-elastic-shared-blob-save-25.03.2023-a-distro-default-smithi/7221015 (rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/classic msgr-failures/many msgr/async objectstore/bluestore-low-osd-mem-target rados supported-random-distro$/{ubuntu_latest} tasks/rados_cls_all}) that had passed, it appears that the warning existed there too but the badness check just didn't catch it.
/a/yuriw-2023-03-17_23:38:21-rados-reef-distro-default-smithi/7212192 (rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/classic msgr-failures/many msgr/async objectstore/bluestore-low-osd-mem-target rados supported-random-distro$/{ubuntu_latest} tasks/rados_cls_all})
2023-03-19T04:46:59.319 INFO:tasks.ceph:Checking cluster log for badness...
2023-03-19T04:46:59.319 DEBUG:teuthology.orchestra.run.smithi121:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v '\(PG_AVAILABILITY\)' | head -n 1
2023-03-19T04:46:59.319 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mon.b is failed for ~0s
2023-03-19T04:46:59.339 INFO:tasks.ceph:Unmounting /var/lib/ceph/osd/ceph-0 on ubuntu@smithi121.front.sepia
nojha@teuthology:/ceph/teuthology-archive/yuriw-2023-03-17_23:38:21-rados-reef-distro-default-smithi/7212192/remote/smithi121/log$ zgrep "POOL_APP_NOT_ENABLED" ceph.log.gz
1679200138.2431207 mon.a (mon.0) 640 : cluster 3 Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
1679200138.2431207 mon.a (mon.0) 640 : cluster 3 Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
1679200143.161677 mon.a (mon.0) 649 : cluster 1 Health check cleared: POOL_APP_NOT_ENABLED (was: 1 pool(s) do not have an application enabled)
1679200143.161677 mon.a (mon.0) 649 : cluster 1 Health check cleared: POOL_APP_NOT_ENABLED (was: 1 pool(s) do not have an application enabled)
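One possible explanation (an observation from the pasted output, not a confirmed root cause) for why the badness check missed the warning in the passing run: the zgrep hits above carry a numeric level (`cluster 3`) rather than the bracketed `[WRN]` tag that the badness check greps for. A small sketch with two synthetic lines modeled on the output above:

```shell
# Two synthetic log lines modeled on the zgrep output above: one with the
# numeric level form ("cluster 3"), one with the bracketed form ("[WRN]").
cat > /tmp/ceph-level-demo.log <<'EOF'
1679200138.24 mon.a (mon.0) 640 : cluster 3 Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
1679903171.77 mon.a (mon.0) 589 : cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
EOF

# The badness check only matches the bracketed form, so the numeric-level
# line slips through unflagged.
matches=$(egrep '\[ERR\]|\[WRN\]|\[SEC\]' /tmp/ceph-level-demo.log | wc -l)
echo "$matches"
```

Only one of the two lines matches the filter, even though both describe the same health warning.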
Updated by Matan Breizman about 1 year ago
Neha Ojha wrote:
Looking at a previous run very similar to ... that had passed, it appears that the warning existed there too but the badness check just didn't catch it.
Unless I'm missing something, the pools created in this test aren't expected to be associated with any application. If that's the case, we can simply add "POOL_APP_NOT_ENABLED" to the ignore list (as is done in other rados suite tests).
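For reference, the ignore-list mechanism referred to here is the teuthology `log-ignorelist` override in the suite yaml. A sketch of what such a fragment could look like; the exact file and placement are an assumption for illustration, not the merged fix:

```yaml
# Hypothetical suite override fragment: tells the ceph task's cluster-log
# scan to ignore this specific health warning.
overrides:
  ceph:
    log-ignorelist:
      - \(POOL_APP_NOT_ENABLED\)
```

The pattern is a regex matched against cluster log lines, which is why the parentheses are escaped.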
Updated by Laura Flores about 1 year ago
@Matan Breizman that's probably right, although I wonder what changed to make this pop up so frequently in the rados/rgw suites.
Updated by Casey Bodley about 1 year ago
this seems to happen exclusively against ubuntu 22.04:
https://pulpito.ceph.com/cbodley-2023-03-30_21:31:09-rgw:verify-wip-cbodley-testing-distro-default-smithi/
centos 8 runs were green:
https://pulpito.ceph.com/cbodley-2023-03-30_20:32:49-rgw:verify-wip-cbodley-testing-distro-default-smithi/
ubuntu 20.04 had other failures, but no cluster warnings:
https://pulpito.ceph.com/cbodley-2023-03-30_13:17:22-rgw:verify-wip-cbodley-testing-distro-default-smithi/
Updated by Laura Flores about 1 year ago
/a/yuriw-2023-03-27_23:05:54-rados-wip-yuri4-testing-2023-03-25-0714-distro-default-smithi/7221965
Updated by Laura Flores about 1 year ago
- Backport set to pacific
/a/yuriw-2023-03-16_21:59:27-rados-wip-yuri6-testing-2023-03-12-0918-pacific-distro-default-smithi/7211186
/a/yuriw-2023-03-16_21:59:27-rados-wip-yuri6-testing-2023-03-12-0918-pacific-distro-default-smithi/7211167
Updated by Laura Flores about 1 year ago
/a/yuriw-2023-03-30_21:53:20-rados-wip-yuri7-testing-2023-03-29-1100-distro-default-smithi/7227986
Updated by Radoslaw Zarzynski about 1 year ago
- Status changed from New to In Progress
- Assignee set to Radoslaw Zarzynski
Updated by Laura Flores 12 months ago
/a/yuriw-2023-04-25_14:15:40-rados-pacific-release-distro-default-smithi/7251186
Updated by Laura Flores 12 months ago
/a/yuriw-2023-04-24_23:35:26-smoke-pacific-release-distro-default-smithi/7250661
Updated by Casey Bodley 12 months ago
still failing consistently in the rgw suite
on main: https://pulpito.ceph.com/cbodley-2023-04-26_00:39:50-rgw-wip-cbodley2-testing-distro-default-smithi/
and on reef: https://pulpito.ceph.com/yuriw-2023-04-28_19:03:15-rgw-reef-distro-default-smithi/
Updated by Laura Flores 12 months ago
/a/yuriw-2023-04-25_21:30:50-rados-wip-yuri3-testing-2023-04-25-1147-distro-default-smithi/7253406
Updated by Laura Flores 12 months ago
/a/yuriw-2023-04-25_18:56:08-rados-wip-yuri5-testing-2023-04-25-0837-pacific-distro-default-smithi/7252745
Updated by Laura Flores 12 months ago
/a/yuriw-2023-05-06_14:41:44-rados-pacific-release-distro-default-smithi/7264188
Updated by Radoslaw Zarzynski 12 months ago
- Assignee changed from Radoslaw Zarzynski to Laura Flores
- Priority changed from High to Normal
Laura, would you mind taking a look? Definitely not an urgent thing.
Updated by Laura Flores 12 months ago
/a/yuriw-2023-04-26_01:16:19-rados-wip-yuri11-testing-2023-04-25-1605-pacific-distro-default-smithi/7253751
Sure Radek, I will see if something needs to be whitelisted.
Updated by Casey Bodley 12 months ago
- Related to Bug #61168: cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) added
Updated by Laura Flores 11 months ago
- Status changed from In Progress to Duplicate
Updated by Laura Flores 11 months ago
- Is duplicate of Bug #61168: cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) added
Updated by Laura Flores 11 months ago
- Related to deleted (Bug #61168: cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED))
Updated by Laura Flores 11 months ago
- Status changed from Duplicate to New
Hmm, found another instance that looks like this tracker in main:
/a/yuriw-2023-06-01_19:33:38-rados-wip-yuri-testing-2023-06-01-0746-distro-default-smithi/7294007
The test branch has the commit mentioned above, so perhaps there needs to be an additional fix.
Updated by Laura Flores 11 months ago
- Is duplicate of deleted (Bug #61168: cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED))
Updated by Laura Flores 11 months ago
- Related to Bug #61168: cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) added
Updated by Laura Flores 11 months ago
/a/yuriw-2023-05-30_21:40:46-rados-wip-yuri10-testing-2023-05-30-1244-distro-default-smithi/7290995
Updated by Radoslaw Zarzynski 11 months ago
- Backport changed from pacific to pacific,quincy,reef
Updated by Radoslaw Zarzynski 11 months ago
Hi Laura! Do you have the bandwidth to take a deeper look?
Updated by Laura Flores 11 months ago
Hey Radek, yes. Looking into it, it should be a quick whitelist fix. Trying out a fix now.
Updated by Laura Flores 11 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 51925
Updated by Laura Flores 11 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 11 months ago
- Copied to Backport #61601: quincy: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) added
Updated by Backport Bot 11 months ago
- Copied to Backport #61602: pacific: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) added
Updated by Backport Bot 11 months ago
- Copied to Backport #61603: reef: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) added
Updated by Matan Breizman 8 months ago
- Related to Bug #62595: Health check failed: (POOL_APP_NOT_ENABLED)" in cluster log added
Updated by Konstantin Shalygin 4 months ago
- Category set to Tests
- Status changed from Pending Backport to Resolved
- Target version set to v19.0.0
- Source set to Development