Bug #43637

nautilus: qa: Health check failed: Reduced data availability: 16 pgs inactive (PG_AVAILABILITY)

Added by Ramana Raja about 1 month ago. Updated 15 days ago.

Status:
Triaged
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
fs
Component(FS):
qa-suite
Labels (FS):
qa
Pull request ID:
Crash signature:

Description

2020-01-10T10:41:33.742 INFO:teuthology.orchestra.run.smithi166:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v 'overall HEALTH_' | egrep -v '\(FS_DEGRADED\)' | egrep -v '\(MDS_FAILED\)' | egrep -v '\(MDS_DEGRADED\)' | egrep -v '\(FS_WITH_FAILED_MDS\)' | egrep -v '\(MDS_DAMAGE\)' | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v 'overall HEALTH_' | egrep -v '\(OSD_DOWN\)' | egrep -v '\(OSD_' | egrep -v 'but it is still running' | egrep -v 'is not responding' | egrep -v 'not responding, replacing' | egrep -v '\(MDS_INSUFFICIENT_STANDBY\)' | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2020-01-10T10:41:33.765 INFO:teuthology.orchestra.run.smithi166.stdout:2020-01-10 10:22:15.090135 mon.b (mon.0) 1750 : cluster [WRN] Health check failed: Reduced data availability: 16 pgs inactive (PG_AVAILABILITY)

From /a/yuriw-2020-01-09_22:23:54-fs-wip-yuri6-testing-2020-01-09-1744-nautilus-distro-basic-smithi/4650211/
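The scraper's behavior can be reproduced locally on the quoted log line. The sketch below collapses the duplicated `egrep -v` patterns from the command above into a single inverted match (an assumption for brevity; the filtering result is the same) and shows why this line trips the check: `(PG_AVAILABILITY)` is not among the whitelisted health codes.

```shell
# Cluster-log line from the failed run (quoted from the report above).
line='2020-01-10 10:22:15.090135 mon.b (mon.0) 1750 : cluster [WRN] Health check failed: Reduced data availability: 16 pgs inactive (PG_AVAILABILITY)'

# Same filter chain the teuthology log scraper applies, with the
# duplicated egrep -v patterns collapsed into one inverted match:
# keep [ERR]/[WRN]/[SEC] lines, then drop all whitelisted health codes.
printf '%s\n' "$line" \
  | grep -E '\[ERR\]|\[WRN\]|\[SEC\]' \
  | grep -Ev '\(MDS_ALL_DOWN\)|\(MDS_UP_LESS_THAN_MAX\)|overall HEALTH_|\(FS_DEGRADED\)|\(MDS_FAILED\)|\(MDS_DEGRADED\)|\(FS_WITH_FAILED_MDS\)|\(MDS_DAMAGE\)|\(OSD_DOWN\)|\(OSD_|but it is still running|is not responding|not responding, replacing|\(MDS_INSUFFICIENT_STANDBY\)'
# The line survives every filter because (PG_AVAILABILITY) is not
# whitelisted, so the scraper emits it and the job is marked failed.
```

Adding `(PG_AVAILABILITY)` to the suite's whitelist (or to the inverted-match list above) would make the line disappear from the scraper's output, which is the usual fix when the warning is an expected transient during failover testing.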

This failure has been seen in 12 jobs across the fs and kcephfs suites.

In the fs suite, /a/yuriw-2020-01-09_22:23:54-fs-wip-yuri6-testing-2020-01-09-1744-nautilus-distro-basic-smithi/

Failure: "2020-01-10 10:22:15.090135 mon.b (mon.0) 1750 : cluster [WRN] Health check failed: Reduced data availability: 16 pgs inactive (PG_AVAILABILITY)" in cluster log
5 jobs: ['4650211', '4650242', '4650177', '4650144', '4650276']
suites intersection: ['clusters/1a3s-mds-2c-client.yaml', 'conf/{client.yaml', 'fs/multifs/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'tasks/failover.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['clusters/1a3s-mds-2c-client.yaml', 'conf/{client.yaml', 'fs/multifs/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'objectstore-ec/bluestore-bitmap.yaml', 'objectstore-ec/bluestore-comp-ec-root.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'objectstore-ec/filestore-xfs.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'supported-random-distros$/{centos_latest.yaml}', 'supported-random-distros$/{ubuntu_16.04.yaml}', 'supported-random-distros$/{ubuntu_latest.yaml}', 'tasks/failover.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']

In the kcephfs suite, /a/yuriw-2020-01-09_22:20:53-kcephfs-wip-yuri6-testing-2020-01-09-1744-nautilus-distro-basic-smithi,

Failure: "2020-01-10 01:06:47.884132 mon.a (mon.0) 2248 : cluster [WRN] Health check failed: Reduced data availability: 16 pgs inactive (PG_AVAILABILITY)" in cluster log
7 jobs: ['4649992', '4649994', '4650026', '4650028', '4650030', '4650048', '4649990']
suites intersection: ['clusters/1-mds-4-client.yaml', 'conf/{client.yaml', 'kcephfs/recovery/{begin.yaml', 'kclient/{mount.yaml', 'log-config.yaml', 'mds.yaml', 'mon.yaml', 'ms-die-on-skipped.yaml}}', 'osd-asserts.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['clusters/1-mds-4-client.yaml', 'conf/{client.yaml', 'kcephfs/recovery/{begin.yaml', 'kclient/{mount.yaml', 'log-config.yaml', 'mds.yaml', 'mon.yaml', 'ms-die-on-skipped.yaml}}', 'objectstore-ec/bluestore-bitmap.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'objectstore-ec/filestore-xfs.yaml', 'osd-asserts.yaml', 'osd.yaml}', 'overrides/{distro/random/{k-testing.yaml', 'overrides/{distro/rhel/{k-distro.yaml', 'overrides/{frag_enable.yaml', 'rhel_latest.yaml}', 'supported$/{rhel_latest.yaml}}', 'supported$/{ubuntu_latest.yaml}}', 'tasks/damage.yaml}', 'tasks/data-scan.yaml}', 'tasks/failover.yaml}', 'tasks/volume-client.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']

History

#1 Updated by Ramana Raja about 1 month ago

  • Subject changed from qa: task_failover.yaml failure to nautilus qa: task_failover.yaml failure
  • Target version changed from v14.2.5 to v14.2.7
  • Component(FS) deleted (qa-suite)

#2 Updated by Patrick Donnelly about 1 month ago

  • Subject changed from nautilus qa: task_failover.yaml failure to nautilus: qa: task_failover.yaml failure
  • Status changed from New to Triaged
  • Assignee set to Ramana Raja

#3 Updated by Patrick Donnelly 15 days ago

  • Subject changed from nautilus: qa: task_failover.yaml failure to nautilus: qa: Health check failed: Reduced data availability: 16 pgs inactive (PG_AVAILABILITY)
  • ceph-qa-suite deleted (kcephfs)
  • Component(FS) qa-suite added
