QA Run #65099

open

wip-yuri10-testing-2024-03-24-1159

Added by Yuri Weinstein about 1 month ago. Updated 20 days ago.

Status: QA Needs Approval
Priority: Normal

Description

--- done. these PRs were included:
https://github.com/ceph/ceph/pull/55196 - osd: EC Partial Stripe Reads (Retry of #23138 and #52746)

Actions #1

Updated by Yuri Weinstein about 1 month ago

  • QA Runs set to wip-yuri10-testing-2024-03-24-1159
Actions #2

Updated by Laura Flores about 1 month ago

  • Assignee set to Laura Flores
Actions #3

Updated by Laura Flores about 1 month ago

@Yuri I can review the core run.

Actions #4

Updated by Laura Flores about 1 month ago

  • Status changed from QA Testing to QA Needs Approval
Actions #5

Updated by Laura Flores about 1 month ago

  • Status changed from QA Needs Approval to QA Testing
Actions #6

Updated by Laura Flores 29 days ago

Found some related failures.

/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620593

2024-03-25T01:02:10.414 INFO:tasks.workunit.client.0.smithi080.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:107: rados_get:  local expect=fail
2024-03-25T01:02:10.414 INFO:tasks.workunit.client.0.smithi080.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:112: rados_get:  '[' fail = fail ']'
2024-03-25T01:02:10.414 INFO:tasks.workunit.client.0.smithi080.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:114: rados_get:  rados --pool pool-jerasure get obj-size-81310-1-10 td/test-erasure-eio/COPY
2024-03-25T01:02:10.615 INFO:tasks.workunit.client.0.smithi080.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:115: rados_get:  return
2024-03-25T01:02:10.615 INFO:tasks.workunit.client.0.smithi080.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:243: rados_get_data_bad_size:  return 1
2024-03-25T01:02:10.615 INFO:tasks.workunit.client.0.smithi080.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:323: TEST_rados_get_bad_size_shard_1:  return 1

/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620482
/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620682
/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620582

2024-03-24T22:41:18.025 INFO:tasks.ceph.osd.1.smithi012.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-2398-g9c429fb6/rpm/el9/BUILD/ceph-19.0.0-2398-g9c429fb6/src/osd/ECUtil.cc: In function 'int ECUtil::decode(const ECUtil::stripe_info_t&, ceph::ErasureCodeInterfaceRef&, std::set<int>, std::map<int, ceph::buffer::v15_2_0::list>&, ceph::bufferlist*)' thread 7fe31fcb7640 time 2024-03-24T22:41:18.025515+0000
2024-03-24T22:41:18.025 INFO:tasks.ceph.osd.1.smithi012.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-2398-g9c429fb6/rpm/el9/BUILD/ceph-19.0.0-2398-g9c429fb6/src/osd/ECUtil.cc: 31: FAILED ceph_assert(total_data_size % sinfo.get_chunk_size() == 0)
2024-03-24T22:41:18.028 INFO:tasks.rados.rados.0.smithi012.stdout:887:  finishing rollback tid 0 to smithi01241422-11
2024-03-24T22:41:18.029 INFO:tasks.ceph.osd.1.smithi012.stderr: ceph version 19.0.0-2398-g9c429fb6 (9c429fb6abfb5deed62697d56d8997d0f0d6d83f) squid (dev)
2024-03-24T22:41:18.029 INFO:tasks.ceph.osd.1.smithi012.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x11e) [0x5614923b30f6]
2024-03-24T22:41:18.029 INFO:tasks.ceph.osd.1.smithi012.stderr: 2: ceph-osd(+0x3f62b2) [0x5614923b32b2]
2024-03-24T22:41:18.029 INFO:tasks.ceph.osd.1.smithi012.stderr: 3: ceph-osd(+0x3b7595) [0x561492374595]
2024-03-24T22:41:18.029 INFO:tasks.ceph.osd.1.smithi012.stderr: 4: ceph-osd(+0x6e490c) [0x5614926a190c]
2024-03-24T22:41:18.029 INFO:tasks.ceph.osd.1.smithi012.stderr: 5: (ECCommon::ReadPipeline::complete_read_op(ECCommon::ReadOp&)+0x252) [0x5614926a39a2]
2024-03-24T22:41:18.029 INFO:tasks.ceph.osd.1.smithi012.stderr: 6: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, ZTracer::Trace const&)+0xd89) [0x5614928b6459]
2024-03-24T22:41:18.029 INFO:tasks.ceph.osd.1.smithi012.stderr: 7: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x9d) [0x5614928b6bcd]
2024-03-24T22:41:18.029 INFO:tasks.ceph.osd.1.smithi012.stderr: 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x56) [0x5614926bfe76]
2024-03-24T22:41:18.030 INFO:tasks.ceph.osd.1.smithi012.stderr: 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x80d) [0x56149260941d]
2024-03-24T22:41:18.030 INFO:tasks.ceph.osd.1.smithi012.stderr: 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x197) [0x561492542417]
2024-03-24T22:41:18.030 INFO:tasks.ceph.osd.1.smithi012.stderr: 11: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x69) [0x5614927902d9]
2024-03-24T22:41:18.030 INFO:tasks.ceph.osd.1.smithi012.stderr: 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xd07) [0x56149255d7a7]
2024-03-24T22:41:18.030 INFO:tasks.ceph.osd.1.smithi012.stderr: 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x2aa) [0x561492a63f2a]
2024-03-24T22:41:18.030 INFO:tasks.ceph.osd.1.smithi012.stderr: 14: ceph-osd(+0xaa74d4) [0x561492a644d4]
2024-03-24T22:41:18.030 INFO:tasks.ceph.osd.1.smithi012.stderr: 15: /lib64/libc.so.6(+0x9f802) [0x7fe34349f802]
2024-03-24T22:41:18.030 INFO:tasks.ceph.osd.1.smithi012.stderr: 16: /lib64/libc.so.6(+0x3f450) [0x7fe34343f450]
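
The assertion that failed at ECUtil.cc line 31 requires total_data_size to be a whole multiple of the stripe's chunk size. Below is a minimal Python sketch of that invariant, not Ceph code. It assumes total_data_size is the byte length of a shard buffer handed to decode (the log does not show how it is computed). The function name, the sample shard map, and the 4096-byte chunk size are all hypothetical.

```python
def misaligned_shards(shards, chunk_size):
    """Return the ids of shards whose length is not a multiple of chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return sorted(
        shard_id
        for shard_id, data in shards.items()
        if len(data) % chunk_size != 0
    )


# Hypothetical input: shard 1 holds a partial chunk (6000 bytes).
example = {0: b"\0" * 8192, 1: b"\0" * 6000, 2: b"\0" * 4096}
print(misaligned_shards(example, 4096))  # prints [1]
```

A check like this, run against the buffers returned by a partial read, would show which shard length trips the assertion.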

/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620629
/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620562
/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620493
/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620766
/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620696

2024-03-25T00:03:47.244 INFO:tasks.ceph:Generating config...
2024-03-25T00:03:47.244 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_9c429fb6abfb5deed62697d56d8997d0f0d6d83f/qa/tasks/ceph.py", line 693, in cluster
    mons = get_mons(
  File "/home/teuthworker/src/github.com_ceph_ceph-c_9c429fb6abfb5deed62697d56d8997d0f0d6d83f/qa/tasks/ceph.py", line 510, in get_mons
    assert mons
AssertionError

/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620713

2024-03-25T01:13:39.669 INFO:tasks.workunit.client.0.smithi057.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mkfs.sh:46: mon_mkfs:  ceph-mon --id a --fsid 1b7721ad-53dd-46de-9e5c-35e413869819 --mkfs --mon-data=mkfs/a --mon-initial-members=a --mon-host=127.0.0.1:7110 '--key=corrupted key'
2024-03-25T01:13:40.053 INFO:tasks.workunit.client.0.smithi057.stderr:2024-03-25T01:13:40.073+0000 7fc429255b00 -1 mon.a@-1(???) e0 error decoding keyring [mon.]
2024-03-25T01:13:40.053 INFO:tasks.workunit.client.0.smithi057.stderr:  key = corrupted key
2024-03-25T01:13:40.053 INFO:tasks.workunit.client.0.smithi057.stderr:  caps mon = "allow *" 
2024-03-25T01:13:40.053 INFO:tasks.workunit.client.0.smithi057.stderr:: error setting modifier for [mon.] type=key val=corrupted key: Malformed input [buffer:3]
2024-03-25T01:13:40.053 INFO:tasks.workunit.client.0.smithi057.stderr:2024-03-25T01:13:40.073+0000 7fc429255b00 -1 ceph-mon: error creating monfs: (22) Invalid argument
2024-03-25T01:13:40.054 INFO:tasks.workunit.client.0.smithi057.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mkfs.sh:132: auth_cephx_key:  rm -fr mkfs/a/store.db
2024-03-25T01:13:40.056 INFO:tasks.workunit.client.0.smithi057.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mkfs.sh:133: auth_cephx_key:  rm -fr mkfs/a/kv_backend
2024-03-25T01:13:40.056 INFO:tasks.workunit.client.0.smithi057.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mkfs.sh:136: auth_cephx_key:  mon_mkfs --key=AQDDzwBme7QCKRAAnQOaDLkESezQnTnQYSXS8g==
2024-03-25T01:13:40.057 INFO:tasks.workunit.client.0.smithi057.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mkfs.sh:43: mon_mkfs:  uuidgen
2024-03-25T01:13:40.058 INFO:tasks.workunit.client.0.smithi057.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mkfs.sh:43: mon_mkfs:  local fsid=f1cd752b-38fe-411f-8959-298738145745
2024-03-25T01:13:40.058 INFO:tasks.workunit.client.0.smithi057.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mkfs.sh:46: mon_mkfs:  ceph-mon --id a --fsid f1cd752b-38fe-411f-8959-298738145745 --mkfs --mon-data=mkfs/a --mon-initial-members=a --mon-host=127.0.0.1:7110 --key=AQDDzwBme7QCKRAAnQOaDLkESezQnTnQYSXS8g==
2024-03-25T01:13:40.080 INFO:tasks.workunit.client.0.smithi057.stderr:2024-03-25T01:13:40.100+0000 7ff2ea62ab00 -1 'mkfs/a' already exists and is not empty: monitor may already exist
2024-03-25T01:13:40.082 DEBUG:teuthology.orchestra.run:got remote process result: 1
2024-03-25T01:13:40.082 INFO:tasks.workunit.client.0.smithi057.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mkfs.sh:138: auth_cephx_key:  '[' -f mkfs/a/keyring ']'
2024-03-25T01:13:40.082 INFO:tasks.workunit.client.0.smithi057.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mkfs.sh:138: auth_cephx_key:  return 1
2024-03-25T01:13:40.082 INFO:tasks.workunit.client.0.smithi057.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mkfs.sh:184: run:  return 1
2024-03-25T01:13:40.083 INFO:tasks.workunit:Stopping ['mon'] on client.0...
2024-03-25T01:13:40.083 DEBUG:teuthology.orchestra.run.smithi057:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2024-03-25T01:13:40.415 ERROR:teuthology.run_tasks:Saw exception from tasks.

/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620568

2024-03-25T00:39:26.381 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mon.a is failed for ~10s
2024-03-25T00:39:28.447 ERROR:teuthology.run_tasks:Manager failed: ceph
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/run_tasks.py", line 154, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_9c429fb6abfb5deed62697d56d8997d0f0d6d83f/qa/tasks/ceph.py", line 1935, in task
    mon0_remote.run(
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/contextutil.py", line 54, in nested
    raise exc[1]
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_9c429fb6abfb5deed62697d56d8997d0f0d6d83f/qa/tasks/ceph.py", line 252, in ceph_log
    yield
  File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/contextutil.py", line 46, in nested
    if exit(*exc):
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_9c429fb6abfb5deed62697d56d8997d0f0d6d83f/qa/tasks/ceph.py", line 1449, in run_daemon
    teuthology.stop_daemons_of_type(ctx, type_, cluster_name)
  File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/misc.py", line 1173, in stop_daemons_of_type
    daemon.stop()
  File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/orchestra/daemon/state.py", line 139, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/orchestra/run.py", line 473, in wait
    check_time()
  File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (51) after waiting for 300 seconds
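
The MaxWhileTries error above means a bounded polling loop gave up while waiting for the daemon to stop. The sketch below illustrates that pattern only; it is not teuthology's implementation. `wait_until` and `TriesExceeded` are hypothetical names, and the 6-second interval is an assumption chosen so that 51 tries span 300 seconds.

```python
import time


class TriesExceeded(Exception):
    """Raised when the condition never became true (hypothetical name)."""


def wait_until(condition, tries=51, interval=6.0, sleep=time.sleep):
    """Poll condition() up to `tries` times, sleeping between attempts.

    Returns the attempt number on which the condition first held.
    """
    for attempt in range(1, tries + 1):
        if condition():
            return attempt
        if attempt < tries:
            sleep(interval)  # no sleep after the final attempt
    raise TriesExceeded(
        f"reached maximum tries ({tries}) after waiting for "
        f"{(tries - 1) * interval:g} seconds"
    )


# Usage with a no-op sleep so the example finishes instantly.
try:
    wait_until(lambda: False, sleep=lambda _seconds: None)
except TriesExceeded as exc:
    print(exc)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.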

Actions #7

Updated by Yuri Weinstein 28 days ago

  • Status changed from QA Testing to QA Needs Approval
Actions #8

Updated by Yuri Weinstein 27 days ago

  • Git Branch set to wip-yuri10-testing-2024-03-24-1159
Actions #9

Updated by Yuri Weinstein 27 days ago

  • Git Branch changed from wip-yuri10-testing-2024-03-24-1159 to yuriw/ceph/commits/wip-yuri10-testing-2024-03-24-1159
Actions #10

Updated by Laura Flores 27 days ago

  • Assignee changed from Laura Flores to Radoslaw Zarzynski
Actions #11

Updated by Radoslaw Zarzynski 26 days ago

Yes, this is ready for a retest.

BTW: https://tracker.ceph.com/issues/65237 will be a reference point for reviewing.

Actions #12

Updated by Yuri Weinstein 26 days ago

Radoslaw Zarzynski wrote:

Yes, this is ready for a retest.

BTW: https://tracker.ceph.com/issues/65237 will be a reference point for reviewing.

I think the latest run already includes the latest commits.

Did you review it yet?

Actions #13

Updated by Radoslaw Zarzynski 26 days ago

I reviewed the reference point (https://tracker.ceph.com/issues/65237#note-20). Since it's broken, there is no point in reviewing this run.

Actions #14

Updated by Yuri Weinstein 26 days ago

Do we want this to stay open?
@Radoslaw Smigielski

Actions #15

Updated by Laura Flores 22 days ago

Yuri Weinstein wrote in #note-14:

Do we want this to stay open?
@Radoslaw Smigielski

Meant for @Radoslaw Zarzynski

Actions #16

Updated by Yuri Weinstein 21 days ago

Seems like it needs a rebase, as I see a recent commit by @Radoslaw Zarzynski.

Rebasing.

Actions #17

Updated by Yuri Weinstein 21 days ago

  • Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
  • Assignee changed from Radoslaw Zarzynski to Yuri Weinstein
Actions #19

Updated by Yuri Weinstein 20 days ago

  • Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
  • Assignee changed from Yuri Weinstein to Radoslaw Zarzynski
  • Tags set to core
Actions #20

Updated by Radoslaw Zarzynski 20 days ago

In the new run (https://pulpito.ceph.com/yuriw-2024-04-10_14:20:47-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/), everything failed because of a lab issue:

HTTPSConnectionPool(host='shaman.ceph.com', port=443): Max retries exceeded with url: /api/search?status=ready&project=ceph&flavor=default&distros=centos%2F9%2Fx86_64&ref=wip-yuri10-testing-2024-03-24-1159 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd335862970>: Failed to establish a new connection: [Errno 110] Connection timed out')) 

I'm going to rerun the failures.
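
The request that timed out was a build search against shaman.ceph.com. To read the search criteria out of the error, the percent-encoded query string can be decoded with the Python standard library. This is only a reading aid; it does not contact the server, and the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# Request path copied from the error message above.
path = (
    "/api/search?status=ready&project=ceph&flavor=default"
    "&distros=centos%2F9%2Fx86_64&ref=wip-yuri10-testing-2024-03-24-1159"
)

# parse_qs returns a list per key; each key appears once here.
params = {key: values[0] for key, values in parse_qs(urlsplit(path).query).items()}
for key, value in params.items():
    print(f"{key}: {value}")
```

The decoded `distros` value is `centos/9/x86_64`, and `ref` is the branch under test.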
