QA Run #65270
Closed
wip-yuri6-testing-2024-04-02-1310
Description
--- Done. These PRs were included:
https://github.com/ceph/ceph/pull/55985 - make-dist: remove old cruft recursively
https://github.com/ceph/ceph/pull/56574 - test/lazy-omap-stats: Convert to boost::regex
https://github.com/ceph/ceph/pull/56640 - common/pick_address: check if address in subnet all public address
rados + upgrades
Updated by Yuri Weinstein about 2 months ago
- Status changed from QA Testing to QA Needs Approval
- QA Runs set to wip-yuri6-testing-2024-04-02-1310
Updated by Yuri Weinstein about 2 months ago
Could not schedule; retriggered on centos8.
Updated by Yuri Weinstein about 2 months ago
- QA Runs deleted (wip-yuri6-testing-2024-04-02-1310)
Updated by Yuri Weinstein about 1 month ago
- QA Runs set to wip-yuri6-testing-2024-04-02-1310
Updated by Yuri Weinstein about 1 month ago
- Assignee changed from Yuri Weinstein to Laura Flores
@laura pls review
Updated by Laura Flores about 1 month ago
I'll review this one; it can stay assigned to me.
Updated by Laura Flores about 1 month ago
Hey @Yuri Weinstein can you also schedule an upgrade suite?
Updated by Laura Flores about 1 month ago
@Yuri Weinstein actually, don't yet; I think I see some issues.
Updated by Yuri Weinstein about 1 month ago
Laura Flores wrote in #note-9:
Hey @Yuri Weinstein can you also schedule an upgrade suite?
@Laura Flores done, sorry missed that
Updated by Laura Flores about 1 month ago
No worries. There do appear to be some issues in the rados suite, though, so we may have to rebase. Stay tuned for updates.
Updated by Laura Flores about 1 month ago
Found this in the cluster log:
/a/yuriw-2024-04-04_19:42:51-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7640475/remote/smithi012/log/ceph.log.gz
2024-04-04T22:44:15.705587+0000 mon.a (mon.0) 1321 : cluster [WRN] Health check update: Degraded data redundancy: 20/276 objects degraded (7.246%), 4 pgs degraded (PG_DEGRADED)
2024-04-04T22:44:15.713899+0000 mon.a (mon.0) 1322 : cluster [DBG] osdmap e208: 16 total, 15 up, 16 in
2024-04-04T22:44:16.374789+0000 mon.a (mon.0) 1323 : cluster [DBG] osd.15 reported immediately failed by osd.12
2024-04-04T22:44:16.374830+0000 mon.a (mon.0) 1324 : cluster [INF] osd.15 failed (root=default,host=smithi110) (connection refused reported by osd.12)
2024-04-04T22:44:16.375040+0000 mon.a (mon.0) 1325 : cluster [DBG] osd.15 reported immediately failed by osd.0
2024-04-04T22:44:16.375126+0000 mon.a (mon.0) 1326 : cluster [DBG] osd.15 reported immediately failed by osd.8
2024-04-04T22:44:16.375318+0000 mon.a (mon.0) 1327 : cluster [DBG] osd.15 reported immediately failed by osd.12
2024-04-04T22:44:16.375396+0000 mon.a (mon.0) 1328 : cluster [DBG] osd.15 reported immediately failed by osd.0
2024-04-04T22:44:16.375566+0000 mon.a (mon.0) 1329 : cluster [DBG] osd.15 reported immediately failed by osd.8
2024-04-04T22:44:16.377488+0000 mon.a (mon.0) 1330 : cluster [DBG] osd.15 reported immediately failed by osd.10
2024-04-04T22:44:16.377566+0000 mon.a (mon.0) 1331 : cluster [DBG] osd.15 reported immediately failed by osd.14
2024-04-04T22:44:16.377741+0000 mon.a (mon.0) 1332 : cluster [DBG] osd.15 reported immediately failed by osd.10
2024-04-04T22:44:16.377880+0000 mon.a (mon.0) 1333 : cluster [DBG] osd.15 reported immediately failed by osd.3
2024-04-04T22:44:16.378054+0000 mon.a (mon.0) 1334 : cluster [DBG] osd.15 reported immediately failed by osd.14
2024-04-04T22:44:16.378241+0000 mon.a (mon.0) 1335 : cluster [DBG] osd.15 reported immediately failed by osd.11
2024-04-04T22:44:16.378429+0000 mon.a (mon.0) 1336 : cluster [DBG] osd.15 reported immediately failed by osd.7
2024-04-04T22:44:16.378547+0000 mon.a (mon.0) 1337 : cluster [DBG] osd.15 reported immediately failed by osd.3
2024-04-04T22:44:16.378773+0000 mon.a (mon.0) 1338 : cluster [DBG] osd.15 reported immediately failed by osd.7
2024-04-04T22:44:16.378943+0000 mon.a (mon.0) 1339 : cluster [DBG] osd.15 reported immediately failed by osd.6
2024-04-04T22:44:16.379063+0000 mon.a (mon.0) 1340 : cluster [DBG] osd.15 reported immediately failed by osd.2
2024-04-04T22:44:16.379282+0000 mon.a (mon.0) 1341 : cluster [DBG] osd.15 reported immediately failed by osd.11
2024-04-04T22:44:16.379407+0000 mon.a (mon.0) 1342 : cluster [DBG] osd.15 reported immediately failed by osd.2
2024-04-04T22:44:16.379623+0000 mon.a (mon.0) 1343 : cluster [DBG] osd.15 reported immediately failed by osd.13
2024-04-04T22:44:16.379795+0000 mon.a (mon.0) 1344 : cluster [DBG] osd.15 reported immediately failed by osd.6
2024-04-04T22:44:16.379983+0000 mon.a (mon.0) 1345 : cluster [DBG] osd.15 reported immediately failed by osd.9
2024-04-04T22:44:16.380104+0000 mon.a (mon.0) 1346 : cluster [DBG] osd.15 reported immediately failed by osd.13
2024-04-04T22:44:16.380334+0000 mon.a (mon.0) 1347 : cluster [DBG] osd.15 reported immediately failed by osd.9
2024-04-04T22:44:16.709029+0000 mon.a (mon.0) 1349 : cluster [WRN] Health check failed: noscrub flag(s) set (OSDMAP_FLAGS)
2024-04-04T22:44:16.709110+0000 mon.a (mon.0) 1350 : cluster [WRN] Health check update: 2 osds down (OSD_DOWN)
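A quick way to triage repeated "reported immediately failed" storms like the one above is to count how many distinct OSDs reported the peer down: many independent reporters (here, connection refused from a dozen peers) usually means the daemon really died rather than a one-off network blip. A minimal sketch, using a few lines copied from the excerpt above as sample input; on a real run the same pipeline would read the full log with `zgrep` on the ceph.log.gz path:

```shell
# Sample lines from the excerpt above; with a real log, replace the printf
# with: zgrep 'reported immediately failed' /path/to/ceph.log.gz
printf '%s\n' \
  'osd.15 reported immediately failed by osd.12' \
  'osd.15 reported immediately failed by osd.0' \
  'osd.15 reported immediately failed by osd.12' \
| grep -o 'by osd\.[0-9]*' \
| sort -u \
| wc -l   # number of distinct reporting OSDs
```

For the log above this yields a dozen-plus distinct reporters for osd.15, consistent with the subsequent OSD_DOWN health warning.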
Updated by Laura Flores about 1 month ago
@Yuri Weinstein can you drop https://github.com/ceph/ceph/pull/56640 and rebase?
You can kill the upgrade runs, since this batch has issues.
Updated by Yuri Weinstein about 1 month ago
Laura Flores wrote in #note-14:
@Yuri Weinstein can you drop https://github.com/ceph/ceph/pull/56640 and rebase?
You can kill the upgrade runs, since this batch has issues.
killed
Updated by Laura Flores about 1 month ago
Hey @Yuri Weinstein just checking that this run is getting / has gotten rebased and rerun?
Updated by Yuri Weinstein about 1 month ago · Edited
Laura Flores wrote in #note-16:
Hey @Yuri Weinstein just checking that this run is getting / has gotten rebased and rerun?
No
I saw no requests :(
I think the status and assignee have to change when the ticket needs to be redone,
and we need to limit comments to the tracker only.
Agree?
@Laura Flores what do you need for this ticket?
PS: I am rebasing in anticipation you'd say yes :)
Updated by Yuri Weinstein about 1 month ago
- Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
- Assignee changed from Laura Flores to Yuri Weinstein
Updated by Yuri Weinstein about 1 month ago
- QA Runs deleted (wip-yuri6-testing-2024-04-02-1310)
Updated by Yuri Weinstein about 1 month ago
- Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
- Assignee changed from Yuri Weinstein to Laura Flores
- QA Runs set to wip-yuri6-testing-2024-04-02-1310
Updated by Laura Flores about 1 month ago
- Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
- Assignee changed from Laura Flores to Yuri Weinstein
@Yuri Weinstein this will need to be rerun. I see a lot of failures from "Failed to establish a new connection" that I suspect may be related to known connection issues in the lab. See https://ceph-storage.slack.com/archives/C1HFJ4VTN/p1712774983720229
Once this is resolved, I need the results rerun.
Updated by Yuri Weinstein about 1 month ago
Laura Flores wrote in #note-23:
Once this is resolved, I need the results rerun.
Attempting a rerun again
Updated by Yuri Weinstein about 1 month ago
- Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
- Assignee changed from Yuri Weinstein to Laura Flores
Updated by Laura Flores about 1 month ago · Edited
- Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
- Assignee changed from Laura Flores to Yuri Weinstein
Hey @Yuri Weinstein, https://github.com/ceph/ceph/pull/53545 caused some regressions. Can you remove it from the batch and rebuild?
And instead of rerunning the whole rados suite, let's just rerun failed+dead from https://pulpito.ceph.com/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/.
And no need to rerun upgrade.
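Rerunning only the failed and dead jobs from a previous run is typically done with teuthology-suite's rerun options rather than rescheduling the whole suite. A sketch of what that invocation might look like; the flag names here are assumptions from memory (verify against `teuthology-suite --help`), and the command is echoed rather than executed since actual scheduling requires lab access:

```shell
# Hypothetical rerun command; --rerun / --rerun-statuses flag names are
# assumed, not verified. Echo instead of executing (needs lab access).
run=yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi
echo teuthology-suite --rerun "$run" --rerun-statuses fail,dead
```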
Updated by Yuri Weinstein about 1 month ago
- QA Runs deleted (wip-yuri6-testing-2024-04-02-1310)
Updated by Yuri Weinstein about 1 month ago
@Laura Flores rebuilding (note it's very slow :()
Updated by Yuri Weinstein 30 days ago
- Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
- Assignee changed from Yuri Weinstein to Laura Flores
- QA Runs set to wip-yuri6-testing-2024-04-02-1310
Updated by Laura Flores 20 days ago
I scheduled another rerun since there were too many infra failures:
https://pulpito.ceph.com/lflores-2024-04-29_19:49:34-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/
Updated by Laura Flores 4 days ago
- Status changed from QA Needs Approval to QA Approved
- Assignee changed from Laura Flores to Yuri Weinstein
Updated by Yuri Weinstein 3 days ago
- Status changed from QA Approved to QA Closed