QA Run #65270
open wip-yuri6-testing-2024-04-02-1310
Description
--- done. These PRs were included:
https://github.com/ceph/ceph/pull/55985 - make-dist: remove old cruft recursively
https://github.com/ceph/ceph/pull/56574 - test/lazy-omap-stats: Convert to boost::regex
https://github.com/ceph/ceph/pull/56640 - common/pick_address: check if address in subnet all public address
rados + upgrades
Updated by Yuri Weinstein 28 days ago
- Status changed from QA Testing to QA Needs Approval
- QA Runs set to wip-yuri6-testing-2024-04-02-1310
Updated by Yuri Weinstein 28 days ago
- QA Runs deleted (wip-yuri6-testing-2024-04-02-1310)
Updated by Yuri Weinstein 27 days ago
- QA Runs set to wip-yuri6-testing-2024-04-02-1310
Updated by Yuri Weinstein 26 days ago
- Assignee changed from Yuri Weinstein to Laura Flores
@laura pls review
Updated by Laura Flores 23 days ago
I'll review this one; it can stay assigned to me.
Updated by Laura Flores 23 days ago
Hey @Yuri Weinstein can you also schedule an upgrade suite?
Updated by Laura Flores 23 days ago
@Yuri Weinstein actually, don't yet. I think I see some issues.
Updated by Yuri Weinstein 23 days ago
Laura Flores wrote in #note-9:
Hey @Yuri Weinstein can you also schedule an upgrade suite?
@Laura Flores done, sorry missed that
Updated by Laura Flores 23 days ago
No worries. There do appear to be some issues in the rados suite though, so we may have to rebase. Stay tuned for updates.
Updated by Laura Flores 23 days ago
Found this in the cluster log:
/a/yuriw-2024-04-04_19:42:51-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7640475/remote/smithi012/log/ceph.log.gz
2024-04-04T22:44:15.705587+0000 mon.a (mon.0) 1321 : cluster [WRN] Health check update: Degraded data redundancy: 20/276 objects degraded (7.246%), 4 pgs degraded (PG_DEGRADED)
2024-04-04T22:44:15.713899+0000 mon.a (mon.0) 1322 : cluster [DBG] osdmap e208: 16 total, 15 up, 16 in
2024-04-04T22:44:16.374789+0000 mon.a (mon.0) 1323 : cluster [DBG] osd.15 reported immediately failed by osd.12
2024-04-04T22:44:16.374830+0000 mon.a (mon.0) 1324 : cluster [INF] osd.15 failed (root=default,host=smithi110) (connection refused reported by osd.12)
2024-04-04T22:44:16.375040+0000 mon.a (mon.0) 1325 : cluster [DBG] osd.15 reported immediately failed by osd.0
2024-04-04T22:44:16.375126+0000 mon.a (mon.0) 1326 : cluster [DBG] osd.15 reported immediately failed by osd.8
2024-04-04T22:44:16.375318+0000 mon.a (mon.0) 1327 : cluster [DBG] osd.15 reported immediately failed by osd.12
2024-04-04T22:44:16.375396+0000 mon.a (mon.0) 1328 : cluster [DBG] osd.15 reported immediately failed by osd.0
2024-04-04T22:44:16.375566+0000 mon.a (mon.0) 1329 : cluster [DBG] osd.15 reported immediately failed by osd.8
2024-04-04T22:44:16.377488+0000 mon.a (mon.0) 1330 : cluster [DBG] osd.15 reported immediately failed by osd.10
2024-04-04T22:44:16.377566+0000 mon.a (mon.0) 1331 : cluster [DBG] osd.15 reported immediately failed by osd.14
2024-04-04T22:44:16.377741+0000 mon.a (mon.0) 1332 : cluster [DBG] osd.15 reported immediately failed by osd.10
2024-04-04T22:44:16.377880+0000 mon.a (mon.0) 1333 : cluster [DBG] osd.15 reported immediately failed by osd.3
2024-04-04T22:44:16.378054+0000 mon.a (mon.0) 1334 : cluster [DBG] osd.15 reported immediately failed by osd.14
2024-04-04T22:44:16.378241+0000 mon.a (mon.0) 1335 : cluster [DBG] osd.15 reported immediately failed by osd.11
2024-04-04T22:44:16.378429+0000 mon.a (mon.0) 1336 : cluster [DBG] osd.15 reported immediately failed by osd.7
2024-04-04T22:44:16.378547+0000 mon.a (mon.0) 1337 : cluster [DBG] osd.15 reported immediately failed by osd.3
2024-04-04T22:44:16.378773+0000 mon.a (mon.0) 1338 : cluster [DBG] osd.15 reported immediately failed by osd.7
2024-04-04T22:44:16.378943+0000 mon.a (mon.0) 1339 : cluster [DBG] osd.15 reported immediately failed by osd.6
2024-04-04T22:44:16.379063+0000 mon.a (mon.0) 1340 : cluster [DBG] osd.15 reported immediately failed by osd.2
2024-04-04T22:44:16.379282+0000 mon.a (mon.0) 1341 : cluster [DBG] osd.15 reported immediately failed by osd.11
2024-04-04T22:44:16.379407+0000 mon.a (mon.0) 1342 : cluster [DBG] osd.15 reported immediately failed by osd.2
2024-04-04T22:44:16.379623+0000 mon.a (mon.0) 1343 : cluster [DBG] osd.15 reported immediately failed by osd.13
2024-04-04T22:44:16.379795+0000 mon.a (mon.0) 1344 : cluster [DBG] osd.15 reported immediately failed by osd.6
2024-04-04T22:44:16.379983+0000 mon.a (mon.0) 1345 : cluster [DBG] osd.15 reported immediately failed by osd.9
2024-04-04T22:44:16.380104+0000 mon.a (mon.0) 1346 : cluster [DBG] osd.15 reported immediately failed by osd.13
2024-04-04T22:44:16.380334+0000 mon.a (mon.0) 1347 : cluster [DBG] osd.15 reported immediately failed by osd.9
2024-04-04T22:44:16.709029+0000 mon.a (mon.0) 1349 : cluster [WRN] Health check failed: noscrub flag(s) set (OSDMAP_FLAGS)
2024-04-04T22:44:16.709110+0000 mon.a (mon.0) 1350 : cluster [WRN] Health check update: 2 osds down (OSD_DOWN)
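For anyone triaging a similar excerpt, here is a minimal sketch (not part of the original analysis) that tallies which OSDs reported a peer as failed. The function name `summarize_failure_reports` and the regular expression are illustrative assumptions based only on the line format shown above, not a Ceph or teuthology API.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
#   ... cluster [DBG] osd.15 reported immediately failed by osd.12
FAILED_BY = re.compile(
    r"\[DBG\] (osd\.\d+) reported immediately failed by (osd\.\d+)"
)


def summarize_failure_reports(lines):
    """Return {failed_osd: Counter({reporting_osd: number_of_reports})}."""
    summary = {}
    for line in lines:
        match = FAILED_BY.search(line)
        if match:
            failed, reporter = match.groups()
            summary.setdefault(failed, Counter())[reporter] += 1
    return summary


# A few lines copied from the excerpt above, used as sample input.
sample = [
    "2024-04-04T22:44:15.713899+0000 mon.a (mon.0) 1322 : cluster [DBG] osdmap e208: 16 total, 15 up, 16 in",
    "2024-04-04T22:44:16.374789+0000 mon.a (mon.0) 1323 : cluster [DBG] osd.15 reported immediately failed by osd.12",
    "2024-04-04T22:44:16.375040+0000 mon.a (mon.0) 1325 : cluster [DBG] osd.15 reported immediately failed by osd.0",
    "2024-04-04T22:44:16.375318+0000 mon.a (mon.0) 1327 : cluster [DBG] osd.15 reported immediately failed by osd.12",
]

summary = summarize_failure_reports(sample)
for failed, reporters in summary.items():
    print(failed, dict(reporters))
```

To run it against the full log, the lines could be read with `gzip.open(path, "rt")`, where `path` points at the `ceph.log.gz` file.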
Updated by Laura Flores 23 days ago
@Yuri Weinstein can you drop https://github.com/ceph/ceph/pull/56640 and rebase?
You can kill the upgrade runs, since this batch has issues.
Updated by Yuri Weinstein 23 days ago
Laura Flores wrote in #note-14:
@Yuri Weinstein can you drop https://github.com/ceph/ceph/pull/56640 and rebase?
You can kill the upgrade runs, since this batch has issues.
killed
Updated by Laura Flores 22 days ago
Hey @Yuri Weinstein just checking that this run is getting / has gotten rebased and rerun?
Updated by Yuri Weinstein 22 days ago · Edited
Laura Flores wrote in #note-16:
Hey @Yuri Weinstein just checking that this run is getting / has gotten rebased and rerun?
No
I saw no requests :(
I think the status and assignee have to change when the ticket needs to be redone.
And we need to limit comments to the tracker only.
Agree?
@Laura Flores what do you need for this ticket?
PS: I am rebasing in anticipation that you'd say yes :)
Updated by Yuri Weinstein 22 days ago
- Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
- Assignee changed from Laura Flores to Yuri Weinstein
Updated by Yuri Weinstein 22 days ago
- QA Runs deleted (wip-yuri6-testing-2024-04-02-1310)
Updated by Yuri Weinstein 21 days ago
- Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
- Assignee changed from Yuri Weinstein to Laura Flores
- QA Runs set to wip-yuri6-testing-2024-04-02-1310
Updated by Laura Flores 20 days ago
- Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
- Assignee changed from Laura Flores to Yuri Weinstein
@Yuri Weinstein this will need to be rerun. I see a lot of failures from "Failed to establish a new connection" that I suspect may be related to known connection issues in the lab. See https://ceph-storage.slack.com/archives/C1HFJ4VTN/p1712774983720229
Once this is resolved, I need the results rerun.
Updated by Yuri Weinstein 20 days ago
Laura Flores wrote in #note-23:
Once this is resolved, I need the results rerun.
Attempting a rerun again
Updated by Yuri Weinstein 20 days ago
- Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
- Assignee changed from Yuri Weinstein to Laura Flores
Updated by Laura Flores 14 days ago · Edited
- Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
- Assignee changed from Laura Flores to Yuri Weinstein
Hey @Yuri Weinstein, https://github.com/ceph/ceph/pull/53545 caused some regressions. Can you remove it from the batch and rebuild?
And instead of rerunning the whole rados suite, let's just rerun failed+dead from https://pulpito.ceph.com/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/.
And no need to rerun upgrade.
Updated by Yuri Weinstein 13 days ago
- QA Runs deleted (wip-yuri6-testing-2024-04-02-1310)
Updated by Yuri Weinstein 13 days ago
@Laura Flores rebuilding (note it's very slow :()
Updated by Yuri Weinstein 12 days ago
- Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
- Assignee changed from Yuri Weinstein to Laura Flores
- QA Runs set to wip-yuri6-testing-2024-04-02-1310
Updated by Laura Flores 1 day ago
I scheduled another rerun since there were too many infra failures:
https://pulpito.ceph.com/lflores-2024-04-29_19:49:34-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/