QA Run #65270
Closed
wip-yuri6-testing-2024-04-02-1310
Description
--- Done. These PRs were included:
https://github.com/ceph/ceph/pull/55985 - make-dist: remove old cruft recursively
https://github.com/ceph/ceph/pull/56574 - test/lazy-omap-stats: Convert to boost::regex
https://github.com/ceph/ceph/pull/56640 - common/pick_address: check if address in subnet all public address
rados + upgrades
Updated by Yuri Weinstein about 2 months ago
- Status changed from QA Testing to QA Needs Approval
- QA Runs set to wip-yuri6-testing-2024-04-02-1310
Updated by Yuri Weinstein about 2 months ago
Could not schedule; retriggered on centos8.
Updated by Yuri Weinstein about 2 months ago
- QA Runs deleted (wip-yuri6-testing-2024-04-02-1310)
Updated by Yuri Weinstein about 1 month ago
- QA Runs set to wip-yuri6-testing-2024-04-02-1310
Updated by Yuri Weinstein about 1 month ago
- Assignee changed from Yuri Weinstein to Laura Flores
@laura pls review
Updated by Laura Flores about 1 month ago
I'll review this one; it can stay assigned to me.
Updated by Laura Flores about 1 month ago
Hey @Yuri Weinstein can you also schedule an upgrade suite?
Updated by Laura Flores about 1 month ago
@Yuri Weinstein actually, don't yet; I think I see some issues.
Updated by Yuri Weinstein about 1 month ago
Laura Flores wrote in #note-9:
Hey @Yuri Weinstein can you also schedule an upgrade suite?
@Laura Flores done, sorry missed that
Updated by Laura Flores about 1 month ago
No worries. There do appear to be some issues in the rados suite, though, so we may have to rebase. Stay tuned for updates.
Updated by Laura Flores about 1 month ago
Found this in the cluster log:
/a/yuriw-2024-04-04_19:42:51-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7640475/remote/smithi012/log/ceph.log.gz
2024-04-04T22:44:15.705587+0000 mon.a (mon.0) 1321 : cluster [WRN] Health check update: Degraded data redundancy: 20/276 objects degraded (7.246%), 4 pgs degraded (PG_DEGRADED)
2024-04-04T22:44:15.713899+0000 mon.a (mon.0) 1322 : cluster [DBG] osdmap e208: 16 total, 15 up, 16 in
2024-04-04T22:44:16.374789+0000 mon.a (mon.0) 1323 : cluster [DBG] osd.15 reported immediately failed by osd.12
2024-04-04T22:44:16.374830+0000 mon.a (mon.0) 1324 : cluster [INF] osd.15 failed (root=default,host=smithi110) (connection refused reported by osd.12)
2024-04-04T22:44:16.375040+0000 mon.a (mon.0) 1325 : cluster [DBG] osd.15 reported immediately failed by osd.0
2024-04-04T22:44:16.375126+0000 mon.a (mon.0) 1326 : cluster [DBG] osd.15 reported immediately failed by osd.8
2024-04-04T22:44:16.375318+0000 mon.a (mon.0) 1327 : cluster [DBG] osd.15 reported immediately failed by osd.12
2024-04-04T22:44:16.375396+0000 mon.a (mon.0) 1328 : cluster [DBG] osd.15 reported immediately failed by osd.0
2024-04-04T22:44:16.375566+0000 mon.a (mon.0) 1329 : cluster [DBG] osd.15 reported immediately failed by osd.8
2024-04-04T22:44:16.377488+0000 mon.a (mon.0) 1330 : cluster [DBG] osd.15 reported immediately failed by osd.10
2024-04-04T22:44:16.377566+0000 mon.a (mon.0) 1331 : cluster [DBG] osd.15 reported immediately failed by osd.14
2024-04-04T22:44:16.377741+0000 mon.a (mon.0) 1332 : cluster [DBG] osd.15 reported immediately failed by osd.10
2024-04-04T22:44:16.377880+0000 mon.a (mon.0) 1333 : cluster [DBG] osd.15 reported immediately failed by osd.3
2024-04-04T22:44:16.378054+0000 mon.a (mon.0) 1334 : cluster [DBG] osd.15 reported immediately failed by osd.14
2024-04-04T22:44:16.378241+0000 mon.a (mon.0) 1335 : cluster [DBG] osd.15 reported immediately failed by osd.11
2024-04-04T22:44:16.378429+0000 mon.a (mon.0) 1336 : cluster [DBG] osd.15 reported immediately failed by osd.7
2024-04-04T22:44:16.378547+0000 mon.a (mon.0) 1337 : cluster [DBG] osd.15 reported immediately failed by osd.3
2024-04-04T22:44:16.378773+0000 mon.a (mon.0) 1338 : cluster [DBG] osd.15 reported immediately failed by osd.7
2024-04-04T22:44:16.378943+0000 mon.a (mon.0) 1339 : cluster [DBG] osd.15 reported immediately failed by osd.6
2024-04-04T22:44:16.379063+0000 mon.a (mon.0) 1340 : cluster [DBG] osd.15 reported immediately failed by osd.2
2024-04-04T22:44:16.379282+0000 mon.a (mon.0) 1341 : cluster [DBG] osd.15 reported immediately failed by osd.11
2024-04-04T22:44:16.379407+0000 mon.a (mon.0) 1342 : cluster [DBG] osd.15 reported immediately failed by osd.2
2024-04-04T22:44:16.379623+0000 mon.a (mon.0) 1343 : cluster [DBG] osd.15 reported immediately failed by osd.13
2024-04-04T22:44:16.379795+0000 mon.a (mon.0) 1344 : cluster [DBG] osd.15 reported immediately failed by osd.6
2024-04-04T22:44:16.379983+0000 mon.a (mon.0) 1345 : cluster [DBG] osd.15 reported immediately failed by osd.9
2024-04-04T22:44:16.380104+0000 mon.a (mon.0) 1346 : cluster [DBG] osd.15 reported immediately failed by osd.13
2024-04-04T22:44:16.380334+0000 mon.a (mon.0) 1347 : cluster [DBG] osd.15 reported immediately failed by osd.9
2024-04-04T22:44:16.709029+0000 mon.a (mon.0) 1349 : cluster [WRN] Health check failed: noscrub flag(s) set (OSDMAP_FLAGS)
2024-04-04T22:44:16.709110+0000 mon.a (mon.0) 1350 : cluster [WRN] Health check update: 2 osds down (OSD_DOWN)
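A quick way to triage repeated "reported immediately failed" storms like the one above is to count how many distinct OSDs reported the peer down: many independent reporters (here, connection refused from a dozen peers) usually means the daemon really died rather than a one-off network blip. A minimal sketch, using a few lines copied from the excerpt above as sample input; on a real run the same pipeline would read the full log with `zgrep` on the ceph.log.gz path:

```shell
# Sample lines from the excerpt above; with a real log, replace the printf
# with: zgrep 'reported immediately failed' /path/to/ceph.log.gz
printf '%s\n' \
  'osd.15 reported immediately failed by osd.12' \
  'osd.15 reported immediately failed by osd.0' \
  'osd.15 reported immediately failed by osd.12' \
| grep -o 'by osd\.[0-9]*' \
| sort -u \
| wc -l   # number of distinct reporting OSDs
```

For the log above this yields a dozen-plus distinct reporters for osd.15, consistent with the subsequent OSD_DOWN health warning.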
Updated by Laura Flores about 1 month ago
@Yuri Weinstein can you drop https://github.com/ceph/ceph/pull/56640 and rebase?
You can kill the upgrade runs, since this batch has issues.
Updated by Yuri Weinstein about 1 month ago
Laura Flores wrote in #note-14:
@Yuri Weinstein can you drop https://github.com/ceph/ceph/pull/56640 and rebase?
You can kill the upgrade runs, since this batch has issues.
killed
Updated by Laura Flores about 1 month ago
Hey @Yuri Weinstein just checking that this run is getting / has gotten rebased and rerun?
Updated by Yuri Weinstein about 1 month ago · Edited
Laura Flores wrote in #note-16:
Hey @Yuri Weinstein just checking that this run is getting / has gotten rebased and rerun?
No
I saw no requests :(
I think the status and assignee have to change when the ticket needs to be redone,
and we need to limit comments to the tracker only.
Agree?
@Laura Flores what do you need for this ticket?
PS: I am rebasing in anticipation you'd say yes :)
Updated by Yuri Weinstein about 1 month ago
- Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
- Assignee changed from Laura Flores to Yuri Weinstein
Updated by Yuri Weinstein about 1 month ago
- QA Runs deleted (wip-yuri6-testing-2024-04-02-1310)
Updated by Yuri Weinstein about 1 month ago
- Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
- Assignee changed from Yuri Weinstein to Laura Flores
- QA Runs set to wip-yuri6-testing-2024-04-02-1310
Updated by Laura Flores about 1 month ago
- Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
- Assignee changed from Laura Flores to Yuri Weinstein
@Yuri Weinstein this will need to be rerun. I see a lot of failures from "Failed to establish a new connection" that I suspect may be related to known connection issues in the lab. See https://ceph-storage.slack.com/archives/C1HFJ4VTN/p1712774983720229
Once this is resolved, I need the results rerun.
Updated by Yuri Weinstein about 1 month ago
Laura Flores wrote in #note-23:
Once this is resolved, I need the results rerun.
Attempting a rerun again
Updated by Yuri Weinstein about 1 month ago
- Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
- Assignee changed from Yuri Weinstein to Laura Flores
Updated by Laura Flores about 1 month ago · Edited
- Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
- Assignee changed from Laura Flores to Yuri Weinstein
Hey @Yuri Weinstein, https://github.com/ceph/ceph/pull/53545 caused some regressions. Can you remove it from the batch and rebuild?
And instead of rerunning the whole rados suite, let's just rerun failed+dead from https://pulpito.ceph.com/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/.
And no need to rerun upgrade.
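Rerunning only the failed and dead jobs from a previous run is typically done with teuthology-suite's rerun options rather than rescheduling the whole suite. A sketch of what that invocation might look like; the flag names here are assumptions from memory (verify against `teuthology-suite --help`), and the command is echoed rather than executed since actual scheduling requires lab access:

```shell
# Hypothetical rerun command; --rerun / --rerun-statuses flag names are
# assumed, not verified. Echo instead of executing (needs lab access).
run=yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi
echo teuthology-suite --rerun "$run" --rerun-statuses fail,dead
```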
Updated by Yuri Weinstein about 1 month ago
- QA Runs deleted (wip-yuri6-testing-2024-04-02-1310)
Updated by Yuri Weinstein about 1 month ago
@Laura Flores rebuilding (note it's very slow :()
Updated by Yuri Weinstein 30 days ago
- Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
- Assignee changed from Yuri Weinstein to Laura Flores
- QA Runs set to wip-yuri6-testing-2024-04-02-1310
Updated by Laura Flores 20 days ago
I scheduled another rerun since there were too many infra failures:
https://pulpito.ceph.com/lflores-2024-04-29_19:49:34-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/
Updated by Laura Flores 4 days ago
- Status changed from QA Needs Approval to QA Approved
- Assignee changed from Laura Flores to Yuri Weinstein
Updated by Yuri Weinstein 3 days ago
- Status changed from QA Approved to QA Closed