QA ANALYSIS

Analysis Archives

MAIN

SQUID

REEF

QUINCY

PACIFIC

Scheduling rados runs

Rados suites are typically scheduled with these parameters:

./teuthology/virtualenv/bin/teuthology-suite -v -m smithi -c <branch name> -S <SHA1> -s rados --subset 111/120000 -p <priority>

This will generate around 300 jobs from the rados suite. Choose a priority accordingly (100 is usually good for 300+ job runs).
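
As a rough sketch, the command above can be wrapped in a small shell function so the placeholders are filled in consistently. The function name build_rados_suite_cmd is hypothetical, and the default priority of 100 is only an assumption taken from the note above. The function prints the command rather than running it, so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch only: build_rados_suite_cmd is a hypothetical helper, not part of teuthology.
# It prints the scheduling command shown above with the placeholders filled in.
# Arguments: <branch name> <SHA1> [priority]; priority defaults to 100 (an assumption).
build_rados_suite_cmd() {
    local branch="$1" sha1="$2" priority="${3:-100}"
    printf '%s\n' "./teuthology/virtualenv/bin/teuthology-suite -v -m smithi -c ${branch} -S ${sha1} -s rados --subset 111/120000 -p ${priority}"
}
```

For example, running build_rados_suite_cmd with your branch name and SHA1 prints the full command line, which you can then copy and run on the teuthology host.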

PR Testing Workflow

See this section if you have been assigned a card to review from https://tracker.ceph.com/projects/ceph-qa/issues.

Prerequisites:

  1. Sepia VPN access: https://wiki.sepia.ceph.com/doku.php?id=vpnaccess
  2. Access to teuthology.front.sepia.ceph.com (verify that you can ssh here via your terminal)

Workflow:

  1. To review the rados runs, click on the "https://pulpito.ceph.com" link posted on the card. For example, https://pulpito.ceph.com/yuriw-2023-10-06_20:29:18-rados-wip-yuri6-testing-2023-10-06-0904-quincy-distro-default-smithi/ in https://trello.com/c/25ShNauq/1855-wip-yuri6-testing-2023-10-06-0904-quincy.
  2. Copy the name of the run. In the run above, it would be "yuriw-2023-10-06_20:29:18-rados-wip-yuri6-testing-2023-10-06-0904-quincy-distro-default-smithi".
  3. In your terminal, "ssh teuthology.front.sepia.ceph.com".
  4. Navigate to the run with "cd /a/<run name>" (in the above example, "cd /a/yuriw-2023-10-06_20:29:18-rados-wip-yuri6-testing-2023-10-06-0904-quincy-distro-default-smithi").
  5. Check the "scrape.log" file, which summarizes failures, with "vim scrape.log".
  6. Copy a job id from the "scrape.log" file to investigate (e.g. "7415596" from the example above).
  7. Navigate to the job with "cd 7415596".
  8. Check the teuthology log with "less teuthology.log" or "vim teuthology.log" (less works better with very large teuthology logs).
  9. Analyze the failure (See https://pad.ceph.com/p/analyzing_teuthology_failures for further tips).
  10. Start recording your failure summary in a text pad or any editor you like (I like to use https://pad.ceph.com/). See the archives above for examples.
  11. Repeat until all failures are analyzed and recorded in your summary. There may be multiple rados links to analyze (click "See other runs on branch <branch name>" on the run link you analyzed to see if there are more rados links; for example, https://pulpito.ceph.com/?branch=wip-yuri6-testing-2023-10-06-0904-quincy shows multiple links).
  12. If all failures are unrelated to the PRs under test, paste your summary into the appropriate archive above (for example, if you analyzed a quincy test batch, paste it into "QUINCY"). On each of the tested PRs, you can leave a note, with a link to your summary, stating that the PR has been rados-approved (e.g. https://github.com/ceph/ceph/pull/53860#issuecomment-1753963909).
  13. If new failures are noticed, notify the author on the problematic PR (you can link the failed job, a snippet of the failure, and/or the Trello card, and ask the author to investigate; see an example here: https://github.com/ceph/ceph/pull/54877#issuecomment-1877875099). At this point, you can tag Yuri via the Trello card to remove the problematic PR from the test batch and rebuild/rerun the tests.
  14. If you are unsure about any of the failures, don't hesitate to tag other rados members on the card to help investigate, or ask Yuri to rerun failed jobs!
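
Steps 2 through 8 can be sketched as two small shell helpers. The function names run_name_from_url and archive_dir are hypothetical. The only facts they rely on are those stated in the steps: the run name is the last path component of the pulpito link, and runs live under /a/<run name>/<job id> on the teuthology host.

```shell
#!/usr/bin/env bash
# Sketch only: both helper names are hypothetical and exist just for illustration.

# Step 2: derive the run name from the pulpito link posted on the card.
run_name_from_url() {
    local url="${1%/}"            # drop one trailing slash, if present
    printf '%s\n' "${url##*/}"    # keep everything after the last slash
}

# Steps 4 and 7: path to a run, or to one job inside it, under /a.
archive_dir() {
    local run="$1" job="${2:-}"
    if [ -n "$job" ]; then
        printf '/a/%s/%s\n' "$run" "$job"
    else
        printf '/a/%s\n' "$run"
    fi
}

# Typical use after "ssh teuthology.front.sepia.ceph.com" (not executed here):
#   run="$(run_name_from_url '<pulpito link from the card>')"
#   cd "$(archive_dir "$run")" && less scrape.log
#   cd "$(archive_dir "$run" 7415596)" && less teuthology.log
```

The helpers only build strings, so they are safe to try locally before connecting to the teuthology host.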

Helpful Links

See https://pad.ceph.com/p/analyzing_teuthology_failures for tips on analyzing failures.
See https://tracker.ceph.com/projects/ceph-qa/issues for QA runs in progress.

Updated by Laura Flores 13 days ago · 27 revisions