QA Run #65237
openwip-ceph_test_rados-partial-reads
Updated by Yuri Weinstein about 1 month ago
- QA Runs set to wip-ceph_test_rados-partial-reads
Updated by Yuri Weinstein about 1 month ago
laura
radek pls review, lots of errors
Updated by Yuri Weinstein about 1 month ago
- Status changed from QA Testing to QA Needs Approval
- Assignee changed from Yuri Weinstein to Laura Flores
Updated by Yuri Weinstein about 1 month ago
@laura @Radoslaw Smigielski pls see the run, too red
Updated by Radoslaw Zarzynski 30 days ago
Oops, a lot of red color, a lot of failures like:
CEPH_CLIENT_ID=0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_rados --set_chunk --enable_dedup --low_tier_pool low_tier --dedup_chunk_size 131072 --dedup_chunk_algo fastcdc --max-ops 1500 --objects 50 --max-in-flight 16 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0 --op read 100 --op write 25 --op delete 10 --op snap_create 10 --op snap_remove 10 --op rollback 10 --op set_chunk 30 --op tier_promote 10 --op tier_evict 10 --op tier_flush 5 --op write_excl 25 --pool unique_pool_0 ... 2024-03-30T10:50:45.507 INFO:tasks.rados.rados.0.smithi114.stdout: redirect_not_in_use: 100 low_tier 2024-03-30T10:50:45.507 INFO:tasks.rados.rados.0.smithi114.stdout:601: read oid 15 snap -1 2024-03-30T10:50:45.508 INFO:tasks.rados.rados.0.smithi114.stdout:601: expect (ObjNum 14 snap 0 seq_num 14) 2024-03-30T10:50:45.508 INFO:tasks.rados.rados.0.smithi114.stdout:602: tier_flush oid 12 2024-03-30T10:50:45.508 INFO:tasks.rados.rados.0.smithi114.stdout:602: tier_flush oid 12 snap -1 2024-03-30T10:50:45.508 INFO:tasks.rados.rados.0.smithi114.stdout:603: write (excl) oid 6 current snap is 0 2024-03-30T10:50:45.508 INFO:tasks.rados.rados.0.smithi114.stdout:603: seq_num 50 ranges {440350=460568,1309629=542666,2323973=516377} 2024-03-30T10:50:45.509 INFO:tasks.rados.rados.0.smithi114.stdout:603: writing smithi11431840-6 from 440350 to 900918 tid 1 2024-03-30T10:50:45.509 INFO:tasks.rados.rados.0.smithi114.stdout:603: writing smithi11431840-6 from 1309629 to 1852295 tid 2 2024-03-30T10:50:45.509 INFO:tasks.rados.rados.0.smithi114.stdout:603: writing smithi11431840-6 from 2323973 to 2840350 tid 3 2024-03-30T10:50:45.509 INFO:tasks.rados.rados.0.smithi114.stdout:604: set_chunk oid 32 target oid 2024-03-30T10:50:45.510 INFO:teuthology.orchestra.run.smithi114.stderr:nodeep-scrub is unset 2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:604: set_chunk oid 32 offset: 290816 length: 1934848 target oid 0378f8e2b9b9b7874b4ebbd2a2ee0b60ba9d9d03714df09f4f4ed90c21bca5b1 offset: 0 snap -1 2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:605: rollback oid 36 current snap is 0 2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:605: done (4 left) 2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:606: rollback oid 29 current snap is 0 2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:606: done (4 left) 2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:607: read oid 44 snap -1 2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:607: expect (ObjNum 43 snap 0 seq_num 43) 2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:608: delete oid 19 current snap is 0 2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:601: expect (ObjNum 14 snap 0 seq_num 14) 2024-03-30T10:50:45.512 INFO:tasks.rados.rados.0.smithi114.stdout:sparse read omitted non-zero data at 635298 2024-03-30T10:50:45.513 INFO:tasks.rados.rados.0.smithi114.stderr:601: oid 15 contents (ObjNum 14 snap 0 seq_num 14) corrupt
This one seems to come from a cache tiering test. At the moment I don't know whether this a problem with the modified testing tool itself or – which would be worse – from unveiling a bug in OSD.
Updated by Yuri Weinstein 29 days ago
- Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
- QA Runs deleted (
wip-ceph_test_rados-partial-reads)
Updated by Yuri Weinstein 29 days ago
- Git Branch changed from https://github.com/rzarzynski/ceph/tree/wip-ceph_test_rados-partial-reads to yuriw/ceph/commits/wip-ceph_test_rados-partial-reads
Updated by Yuri Weinstein 29 days ago
- Assignee changed from Laura Flores to Yuri Weinstein
Updated by Yuri Weinstein 28 days ago
@Radoslaw Smigielski I need a PR for this
Updated by Yuri Weinstein 28 days ago
- Assignee changed from Yuri Weinstein to Radoslaw Zarzynski
Updated by Laura Flores 27 days ago
From Radek in Slack:
rzarzynski
10:52 AM
@Yuri Weinstein, @Laura Flores: I think I have a fix for `ceph_test_rados`. Would you mind a rerun? I pushed the to already existing branch, the same as in https://tracker.ceph.com/issues/65237
So Yuri you can re-push the same branch to CI and run it through the rados suite on the new SHA.
Updated by Yuri Weinstein 27 days ago
- QA Runs set to wip-ceph_test_rados-partial-reads
Updated by Radoslaw Zarzynski 27 days ago
The sparse_read
problem seems to be fixed. Still there are some mysterious deaths of ceph_test_rados
like:
2024-04-04T20:16:05.852 INFO:tasks.rados.rados.0.smithi052.stdout:update_object_version oid 440 v 117 (ObjNum 503 snap 0 seq_num 503) dirty exists 2024-04-04T20:16:05.852 INFO:tasks.rados.rados.0.smithi052.stdout:504: left oid 440 (ObjNum 503 snap 0 seq_num 503) 2024-04-04T20:16:05.852 INFO:tasks.rados.rados.0.smithi052.stdout:505: finishing write tid 1 to smithi05237511-243 2024-04-04T20:16:05.852 INFO:tasks.rados.rados.0.smithi052.stdout:514: seq_num 505 ranges {487535=480043,1584936=624833} 2024-04-04T20:16:05.855 INFO:tasks.rados.rados.0.smithi052.stdout:514: writing smithi05237511-163 from 487535 to 967578 tid 1 2024-04-04T20:16:05.862 INFO:tasks.rados.rados.0.smithi052.stdout:514: writing smithi05237511-163 from 1584936 to 2209769 tid 2 2024-04-04T20:16:05.862 INFO:tasks.rados.rados.0.smithi052.stdout:502: done (11 left) 2024-04-04T20:16:05.862 INFO:tasks.rados.rados.0.smithi052.stdout:504: done (10 left) 2024-04-04T20:16:05.862 INFO:tasks.rados.rados.0.smithi052.stdout:515: write oid 66 current snap is 0 2024-04-04T20:16:05.862 INFO:tasks.rados.rados.0.smithi052.stdout:515: seq_num 506 ranges {504342=555292,1643418=484844} 2024-04-04T20:16:05.868 INFO:tasks.rados.rados.0.smithi052.stdout:515: writing smithi05237511-66 from 504342 to 1059634 tid 1 2024-04-04T20:16:05.874 INFO:tasks.rados.rados.0.smithi052.stdout:515: writing smithi05237511-66 from 1643418 to 2128262 tid 2 2024-04-04T20:16:05.874 INFO:tasks.rados.rados.0.smithi052.stdout:516: delete oid 182 current snap is 0 2024-04-04T20:16:05.875 INFO:tasks.rados.rados.0.smithi052.stdout:508: expect (ObjNum 407 snap 0 seq_num 407) 2024-04-04T20:16:06.302 DEBUG:teuthology.orchestra.run:got remote process result: None 2024-04-04T20:16:06.304 ERROR:teuthology:Uncaught exception (Hub) Traceback (most recent call last): File "src/gevent/greenlet.py", line 908, in gevent._gevent_cgreenlet.Greenlet.run File "/home/teuthworker/src/github.com_ceph_ceph-c_66deef1598934c3b1ef2f044e380a7d00244cead/qa/tasks/rados.py", line 280, in thread run.wait(tests.values()) File "/home/teuthworker/src/git.ceph.com_teuthology_d7d6e60867cadf19fbfd7655a551ae7d58060757/teuthology/orchestra/run.py", line 479, in wait proc.wait() File "/home/teuthworker/src/git.ceph.com_teuthology_d7d6e60867cadf19fbfd7655a551ae7d58060757/teuthology/orchestra/run.py", line 161, in wait self._raise_for_status() File "/home/teuthworker/src/git.ceph.com_teuthology_d7d6e60867cadf19fbfd7655a551ae7d58060757/teuthology/orchestra/run.py", line 179, in _raise_for_status
Will investigate this further.
Updated by Radoslaw Zarzynski 23 days ago
Replicated locally:
[rzarzynski@o06 build]$ bin/ceph --cluster ceph osd pool create unique_pool_0 16 && bin/ceph --cluster ceph osd pool application enable unique_pool_0 rados --yes-i-really-mean-it && bin/ceph_test_rados --max-ops 4000 --objects 500 --max-in-flight 16 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0 --op read 45 --op write 22 --op delete 10 --op write_excl 22 --pool unique_pool_0 ... 515: writing o061608448-289 from 1644029 to 2416189 tid 2 waiting on 16 502: expect (ObjNum 176 snap 0 seq_num 176) Segmentation fault (core dumped)
WIP!
Updated by Radoslaw Zarzynski 23 days ago
Switching to Thread 0x7fffd3fff700 (LWP 1608643)] 0x00000000004504b6 in ContentsGenerator::iterator::seek (pos=0, this=0x7fffb4002210) at /home/rzarzynski/ceph1/src/test/osd/Object.h:109 109 void seek(uint64_t pos) { impl->seek(pos); } Missing separate debuginfos, use: dnf debuginfo-install brotli-1.0.6-2.el8.x86_64 cyrus-sasl-lib-2.1.27-5.el8.x86_64 glibc-2.28-203.el8.x86_64 keyutils-libs-1.5.10-9.el8.x86_64 krb5-libs-1.18.2-5.el8.x86_64 libb lkid-2.32.1-43.el8.x86_64 libcap-2.48-5.el8.x86_64 libcom_err-1.45.6-1.el8.x86_64 libcurl-7.61.1-34.el8.x86_64 libgcc-8.5.0-13.el8.x86_64 libibverbs-48.0-1.el8.x86_64 libidn2-2.2.0-1.el8.x86_64 libmount-2.32.1-4 3.el8.x86_64 libnghttp2-1.33.0-3.el8_2.1.x86_64 libnl3-3.7.0-1.el8.x86_64 libpsl-0.20.2-6.el8.x86_64 librdmacm-48.0-1.el8.x86_64 libselinux-2.9-3.el8.x86_64 libssh-0.9.6-3.el8.x86_64 libstdc++-8.5.0-13.el8.x86_6 4 libunistring-0.9.9-3.el8.x86_64 libuuid-2.32.1-43.el8.x86_64 libxcrypt-4.1.1-4.el8.x86_64 lttng-ust-2.8.1-11.el8.x86_64 openldap-2.4.46-18.el8.x86_64 openssl-libs-1.1.1k-12.el8.x86_64 pcre2-10.32-2.el8.x86_64 systemd-libs-239-79.el8.x86_64 thrift-0.13.0-2.el8.x86_64 userspace-rcu-0.10.1-2.el8.x86_64 zlib-1.2.11-16.el8_2.x86_64 (gdb) bt #0 0x00000000004504b6 in ContentsGenerator::iterator::seek (pos=0, this=0x7fffb4002210) at /home/rzarzynski/ceph1/src/test/osd/Object.h:109 #1 ObjectDesc::iterator::seek (_pos=0, this=0x7fffd3ffc490) at /home/rzarzynski/ceph1/src/test/osd/Object.h:434 #2 ObjectDesc::check_sparse (this=this@entry=0x6fddc0, extents=std::map with 2 elements = {...}, to_check=..., offlen=...) at /home/rzarzynski/ceph1/src/test/osd/Object.cc:156 #3 0x000000000043e4a8 in ReadOp::_finish (this=0x6fdd50, info=<optimized out>) at /home/rzarzynski/ceph1/src/test/osd/RadosModel.h:1604 #4 0x00000000004540ae in TestOp::finish (this=<optimized out>, info=info@entry=0x0) at /home/rzarzynski/ceph1/src/test/osd/RadosModel.cc:20 #5 0x00000000004540c4 in read_callback (comp=<optimized out>, arg=<optimized out>) at /home/rzarzynski/ceph1/src/test/osd/RadosModel.cc:25 #6 0x00007ffff7eba176 in librados::CB_AioComplete::operator() (this=0x7fffd3ffc768) at /home/rzarzynski/ceph1/src/librados/AioCompletionImpl.h:140
(gdb) print impl $1 = (ContentsGenerator::iterator_impl *) 0x0
So it's the problem with seeking.
Updated by Radoslaw Zarzynski 22 days ago
Fixed the one above and scheduled a rerun: https://pulpito.ceph.com/rzarzynski-2024-04-09_12:48:11-rados-wip-ceph_test_rados-partial-reads-distro-default-smithi/.