QA Run #65270

open

wip-yuri6-testing-2024-04-02-1310

Added by Yuri Weinstein 29 days ago. Updated 1 day ago.

Status: QA Needs Approval
Priority: Normal
Assignee:

Description

--- done. These PRs were included:
https://github.com/ceph/ceph/pull/55985 - make-dist: remove old cruft recursively
https://github.com/ceph/ceph/pull/56574 - test/lazy-omap-stats: Convert to boost::regex
https://github.com/ceph/ceph/pull/56640 - common/pick_address: check if address in subnet all public address

rados + upgrades

Actions #1

Updated by Yuri Weinstein 28 days ago

  • Status changed from QA Testing to QA Needs Approval
  • QA Runs set to wip-yuri6-testing-2024-04-02-1310
Actions #2

Updated by Yuri Weinstein 28 days ago

could not schedule, retriggered centos8

Actions #3

Updated by Yuri Weinstein 28 days ago

  • QA Runs deleted (wip-yuri6-testing-2024-04-02-1310)
Actions #4

Updated by Yuri Weinstein 28 days ago

could not schedule, rebased

Actions #6

Updated by Yuri Weinstein 27 days ago

  • QA Runs set to wip-yuri6-testing-2024-04-02-1310
Actions #7

Updated by Yuri Weinstein 26 days ago

  • Assignee changed from Yuri Weinstein to Laura Flores

@laura pls review

Actions #8

Updated by Laura Flores 23 days ago

I'll review this one; it can stay assigned to me.

Actions #9

Updated by Laura Flores 23 days ago

Hey @Yuri Weinstein can you also schedule an upgrade suite?

Actions #10

Updated by Laura Flores 23 days ago

@Yuri Weinstein actually, don't schedule it yet - I think I see some issues.

Actions #11

Updated by Yuri Weinstein 23 days ago

Laura Flores wrote in #note-9:

Hey @Yuri Weinstein can you also schedule an upgrade suite?

@Laura Flores done, sorry missed that

Actions #12

Updated by Laura Flores 23 days ago

No worries - there do appear to be some issues in the rados suite though, so we may have to rebase. Stay tuned for updates.

Actions #13

Updated by Laura Flores 23 days ago

Found this in the cluster log:
/a/yuriw-2024-04-04_19:42:51-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7640475/remote/smithi012/log/ceph.log.gz

2024-04-04T22:44:15.705587+0000 mon.a (mon.0) 1321 : cluster [WRN] Health check update: Degraded data redundancy: 20/276 objects degraded (7.246%), 4 pgs degraded (PG_DEGRADED)
2024-04-04T22:44:15.713899+0000 mon.a (mon.0) 1322 : cluster [DBG] osdmap e208: 16 total, 15 up, 16 in
2024-04-04T22:44:16.374789+0000 mon.a (mon.0) 1323 : cluster [DBG] osd.15 reported immediately failed by osd.12
2024-04-04T22:44:16.374830+0000 mon.a (mon.0) 1324 : cluster [INF] osd.15 failed (root=default,host=smithi110) (connection refused reported by osd.12)
2024-04-04T22:44:16.375040+0000 mon.a (mon.0) 1325 : cluster [DBG] osd.15 reported immediately failed by osd.0
2024-04-04T22:44:16.375126+0000 mon.a (mon.0) 1326 : cluster [DBG] osd.15 reported immediately failed by osd.8
2024-04-04T22:44:16.375318+0000 mon.a (mon.0) 1327 : cluster [DBG] osd.15 reported immediately failed by osd.12
2024-04-04T22:44:16.375396+0000 mon.a (mon.0) 1328 : cluster [DBG] osd.15 reported immediately failed by osd.0
2024-04-04T22:44:16.375566+0000 mon.a (mon.0) 1329 : cluster [DBG] osd.15 reported immediately failed by osd.8
2024-04-04T22:44:16.377488+0000 mon.a (mon.0) 1330 : cluster [DBG] osd.15 reported immediately failed by osd.10
2024-04-04T22:44:16.377566+0000 mon.a (mon.0) 1331 : cluster [DBG] osd.15 reported immediately failed by osd.14
2024-04-04T22:44:16.377741+0000 mon.a (mon.0) 1332 : cluster [DBG] osd.15 reported immediately failed by osd.10
2024-04-04T22:44:16.377880+0000 mon.a (mon.0) 1333 : cluster [DBG] osd.15 reported immediately failed by osd.3
2024-04-04T22:44:16.378054+0000 mon.a (mon.0) 1334 : cluster [DBG] osd.15 reported immediately failed by osd.14
2024-04-04T22:44:16.378241+0000 mon.a (mon.0) 1335 : cluster [DBG] osd.15 reported immediately failed by osd.11
2024-04-04T22:44:16.378429+0000 mon.a (mon.0) 1336 : cluster [DBG] osd.15 reported immediately failed by osd.7
2024-04-04T22:44:16.378547+0000 mon.a (mon.0) 1337 : cluster [DBG] osd.15 reported immediately failed by osd.3
2024-04-04T22:44:16.378773+0000 mon.a (mon.0) 1338 : cluster [DBG] osd.15 reported immediately failed by osd.7
2024-04-04T22:44:16.378943+0000 mon.a (mon.0) 1339 : cluster [DBG] osd.15 reported immediately failed by osd.6
2024-04-04T22:44:16.379063+0000 mon.a (mon.0) 1340 : cluster [DBG] osd.15 reported immediately failed by osd.2
2024-04-04T22:44:16.379282+0000 mon.a (mon.0) 1341 : cluster [DBG] osd.15 reported immediately failed by osd.11
2024-04-04T22:44:16.379407+0000 mon.a (mon.0) 1342 : cluster [DBG] osd.15 reported immediately failed by osd.2
2024-04-04T22:44:16.379623+0000 mon.a (mon.0) 1343 : cluster [DBG] osd.15 reported immediately failed by osd.13
2024-04-04T22:44:16.379795+0000 mon.a (mon.0) 1344 : cluster [DBG] osd.15 reported immediately failed by osd.6
2024-04-04T22:44:16.379983+0000 mon.a (mon.0) 1345 : cluster [DBG] osd.15 reported immediately failed by osd.9
2024-04-04T22:44:16.380104+0000 mon.a (mon.0) 1346 : cluster [DBG] osd.15 reported immediately failed by osd.13
2024-04-04T22:44:16.380334+0000 mon.a (mon.0) 1347 : cluster [DBG] osd.15 reported immediately failed by osd.9
2024-04-04T22:44:16.709029+0000 mon.a (mon.0) 1349 : cluster [WRN] Health check failed: noscrub flag(s) set (OSDMAP_FLAGS)
2024-04-04T22:44:16.709110+0000 mon.a (mon.0) 1350 : cluster [WRN] Health check update: 2 osds down (OSD_DOWN)
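
For anyone digging into the same log, a minimal Python sketch (assuming the ceph.log.gz referenced above has been downloaded locally; the path and pattern here are illustrative, not part of the run) for pulling out the osd.15 failure reports and health-check lines:

import gzip
import re

# Assumed local copy of the cluster log referenced above.
LOG_PATH = "ceph.log.gz"
# Match the osd.15 failure reports and health-check lines quoted in this comment.
PATTERN = re.compile(r"osd\.15 (reported immediately failed|failed)|Health check")

with gzip.open(LOG_PATH, "rt") as log:
    for line in log:
        if PATTERN.search(line):
            print(line.rstrip())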

Actions #14

Updated by Laura Flores 23 days ago

@Yuri Weinstein can you drop https://github.com/ceph/ceph/pull/56640 and rebase?

You can kill the upgrade runs, since this batch has issues.

Actions #15

Updated by Yuri Weinstein 23 days ago

Laura Flores wrote in #note-14:

@Yuri Weinstein can you drop https://github.com/ceph/ceph/pull/56640 and rebase?

You can kill the upgrade runs, since this batch has issues.

killed

Actions #16

Updated by Laura Flores 22 days ago

Hey @Yuri Weinstein just checking that this run is getting / has gotten rebased and rerun?

Actions #17

Updated by Yuri Weinstein 22 days ago · Edited

Laura Flores wrote in #note-16:

Hey @Yuri Weinstein just checking that this run is getting / has gotten rebased and rerun?

No

I saw no requests :(

I think the status and assignee have to change when the ticket needs to be redone,
and we need to limit comments to the tracker only.

Agree?

@Laura Flores what do you need for this ticket?

PS: I am rebasing in anticipation you'd say yes :)

Actions #18

Updated by Yuri Weinstein 22 days ago

  • Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
  • Assignee changed from Laura Flores to Yuri Weinstein
Actions #19

Updated by Yuri Weinstein 22 days ago

  • QA Runs deleted (wip-yuri6-testing-2024-04-02-1310)
Actions #21

Updated by Yuri Weinstein 21 days ago

  • Tags set to core
Actions #22

Updated by Yuri Weinstein 21 days ago

  • Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
  • Assignee changed from Yuri Weinstein to Laura Flores
  • QA Runs set to wip-yuri6-testing-2024-04-02-1310
Actions #23

Updated by Laura Flores 20 days ago

  • Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
  • Assignee changed from Laura Flores to Yuri Weinstein

@Yuri Weinstein this will need to be rerun. I see a lot of failures from "Failed to establish a new connection" that I suspect may be related to known connection issues in the lab. See https://ceph-storage.slack.com/archives/C1HFJ4VTN/p1712774983720229

Once this is resolved, I need the results rerun.

Actions #24

Updated by Yuri Weinstein 20 days ago

Laura Flores wrote in #note-23:

Once this is resolved, I need the results rerun.

Attempting a rerun again

Actions #25

Updated by Yuri Weinstein 20 days ago

  • Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
  • Assignee changed from Yuri Weinstein to Laura Flores
Actions #26

Updated by Laura Flores 14 days ago · Edited

  • Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
  • Assignee changed from Laura Flores to Yuri Weinstein

Hey @Yuri Weinstein, https://github.com/ceph/ceph/pull/53545 caused some regressions. Can you remove it from the batch and rebuild?

And instead of rerunning the whole rados suite, let's just rerun failed+dead from https://pulpito.ceph.com/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/.

And no need to rerun upgrade.

Actions #27

Updated by Yuri Weinstein 13 days ago

  • Description updated (diff)
Actions #28

Updated by Yuri Weinstein 13 days ago

  • QA Runs deleted (wip-yuri6-testing-2024-04-02-1310)
Actions #29

Updated by Yuri Weinstein 13 days ago

@Laura Flores rebuilding (note it's very slow :()

Actions #30

Updated by Yuri Weinstein 12 days ago

  • Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
  • Assignee changed from Yuri Weinstein to Laura Flores
  • QA Runs set to wip-yuri6-testing-2024-04-02-1310