Project

General

Profile

Actions

QA Run #65237

open
Actions #1

Updated by Yuri Weinstein 2 months ago

  • Assignee set to Yuri Weinstein
Actions #2

Updated by Yuri Weinstein 2 months ago

  • QA Runs set to wip-ceph_test_rados-partial-reads
Actions #3

Updated by Yuri Weinstein about 2 months ago

laura radek pls review, lots of errors

Actions #4

Updated by Yuri Weinstein about 2 months ago

  • Status changed from QA Testing to QA Needs Approval
  • Assignee changed from Yuri Weinstein to Laura Flores
Actions #5

Updated by Yuri Weinstein about 2 months ago

@laura @Radoslaw Smigielski pls see the run, too red

Actions #6

Updated by Radoslaw Zarzynski about 2 months ago

Oops, a lot of red color, a lot of failures like:

CEPH_CLIENT_ID=0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_rados --set_chunk --enable_dedup --low_tier_pool low_tier --dedup_chunk_size 131072 --dedup_chunk_algo fastcdc --max-ops 1500 --objects 50 --max-in-flight 16 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0 --op read 100 --op write 25 --op delete 10 --op snap_create 10 --op snap_remove 10 --op rollback 10 --op set_chunk 30 --op tier_promote 10 --op tier_evict 10 --op tier_flush 5 --op write_excl 25 --pool unique_pool_0
...
2024-03-30T10:50:45.507 INFO:tasks.rados.rados.0.smithi114.stdout: redirect_not_in_use: 100 low_tier
2024-03-30T10:50:45.507 INFO:tasks.rados.rados.0.smithi114.stdout:601: read oid 15 snap -1
2024-03-30T10:50:45.508 INFO:tasks.rados.rados.0.smithi114.stdout:601:  expect (ObjNum 14 snap 0 seq_num 14)
2024-03-30T10:50:45.508 INFO:tasks.rados.rados.0.smithi114.stdout:602: tier_flush oid 12
2024-03-30T10:50:45.508 INFO:tasks.rados.rados.0.smithi114.stdout:602: tier_flush oid 12 snap -1
2024-03-30T10:50:45.508 INFO:tasks.rados.rados.0.smithi114.stdout:603: write (excl) oid 6 current snap is 0
2024-03-30T10:50:45.508 INFO:tasks.rados.rados.0.smithi114.stdout:603:  seq_num 50 ranges {440350=460568,1309629=542666,2323973=516377}
2024-03-30T10:50:45.509 INFO:tasks.rados.rados.0.smithi114.stdout:603:  writing smithi11431840-6 from 440350 to 900918 tid 1
2024-03-30T10:50:45.509 INFO:tasks.rados.rados.0.smithi114.stdout:603:  writing smithi11431840-6 from 1309629 to 1852295 tid 2
2024-03-30T10:50:45.509 INFO:tasks.rados.rados.0.smithi114.stdout:603:  writing smithi11431840-6 from 2323973 to 2840350 tid 3
2024-03-30T10:50:45.509 INFO:tasks.rados.rados.0.smithi114.stdout:604: set_chunk oid 32 target oid
2024-03-30T10:50:45.510 INFO:teuthology.orchestra.run.smithi114.stderr:nodeep-scrub is unset
2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:604: set_chunk oid 32 offset: 290816 length: 1934848 target oid 0378f8e2b9b9b7874b4ebbd2a2ee0b60ba9d9d03714df09f4f4ed90c21bca5b1 offset: 0 snap -1
2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:605: rollback oid 36 current snap is 0
2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:605: done (4 left)
2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:606: rollback oid 29 current snap is 0
2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:606: done (4 left)
2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:607: read oid 44 snap -1
2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:607:  expect (ObjNum 43 snap 0 seq_num 43)
2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:608: delete oid 19 current snap is 0
2024-03-30T10:50:45.511 INFO:tasks.rados.rados.0.smithi114.stdout:601:  expect (ObjNum 14 snap 0 seq_num 14)
2024-03-30T10:50:45.512 INFO:tasks.rados.rados.0.smithi114.stdout:sparse read omitted non-zero data at 635298
2024-03-30T10:50:45.513 INFO:tasks.rados.rados.0.smithi114.stderr:601: oid 15 contents (ObjNum 14 snap 0 seq_num 14) corrupt

This one seems to come from a cache tiering test. At the moment I don't know whether this a problem with the modified testing tool itself or – which would be worse – from unveiling a bug in OSD.

Actions #7

Updated by Yuri Weinstein about 2 months ago

  • Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
  • QA Runs deleted (wip-ceph_test_rados-partial-reads)
Actions #8

Updated by Yuri Weinstein about 2 months ago

  • Git Branch changed from https://github.com/rzarzynski/ceph/tree/wip-ceph_test_rados-partial-reads to yuriw/ceph/commits/wip-ceph_test_rados-partial-reads
Actions #9

Updated by Yuri Weinstein about 2 months ago

rebuilding

Actions #10

Updated by Yuri Weinstein about 2 months ago

  • Assignee changed from Laura Flores to Yuri Weinstein
Actions #11

Updated by Yuri Weinstein about 2 months ago

  • Description updated (diff)
Actions #12

Updated by Yuri Weinstein about 2 months ago

could not schedule, rebased

Actions #13

Updated by Yuri Weinstein about 2 months ago

@Radoslaw Smigielski I need a PR for this

Actions #14

Updated by Yuri Weinstein about 2 months ago

  • Assignee changed from Yuri Weinstein to Radoslaw Zarzynski
Actions #15

Updated by Laura Flores about 2 months ago

From Radek in Slack:

rzarzynski
  10:52 AM
@Yuri Weinstein, @Laura Flores: I think I have a fix for `ceph_test_rados`. Would you mind a rerun? I pushed the to already existing branch, the same as in https://tracker.ceph.com/issues/65237

So Yuri you can re-push the same branch to CI and run it through the rados suite on the new SHA.

Actions #16

Updated by Yuri Weinstein about 2 months ago

tried to push the branch again

Actions #18

Updated by Yuri Weinstein about 2 months ago

  • QA Runs set to wip-ceph_test_rados-partial-reads
Actions #19

Updated by Yuri Weinstein about 2 months ago

@Radoslaw Smigielski see the rerun

Actions #20

Updated by Radoslaw Zarzynski about 2 months ago

The sparse_read problem seems to be fixed. Still there are some mysterious deaths of ceph_test_rados like:

2024-04-04T20:16:05.852 INFO:tasks.rados.rados.0.smithi052.stdout:update_object_version oid 440 v 117 (ObjNum 503 snap 0 seq_num 503) dirty exists
2024-04-04T20:16:05.852 INFO:tasks.rados.rados.0.smithi052.stdout:504:  left oid 440 (ObjNum 503 snap 0 seq_num 503)
2024-04-04T20:16:05.852 INFO:tasks.rados.rados.0.smithi052.stdout:505:  finishing write tid 1 to smithi05237511-243
2024-04-04T20:16:05.852 INFO:tasks.rados.rados.0.smithi052.stdout:514:  seq_num 505 ranges {487535=480043,1584936=624833}
2024-04-04T20:16:05.855 INFO:tasks.rados.rados.0.smithi052.stdout:514:  writing smithi05237511-163 from 487535 to 967578 tid 1
2024-04-04T20:16:05.862 INFO:tasks.rados.rados.0.smithi052.stdout:514:  writing smithi05237511-163 from 1584936 to 2209769 tid 2
2024-04-04T20:16:05.862 INFO:tasks.rados.rados.0.smithi052.stdout:502: done (11 left)
2024-04-04T20:16:05.862 INFO:tasks.rados.rados.0.smithi052.stdout:504: done (10 left)
2024-04-04T20:16:05.862 INFO:tasks.rados.rados.0.smithi052.stdout:515: write oid 66 current snap is 0
2024-04-04T20:16:05.862 INFO:tasks.rados.rados.0.smithi052.stdout:515:  seq_num 506 ranges {504342=555292,1643418=484844}
2024-04-04T20:16:05.868 INFO:tasks.rados.rados.0.smithi052.stdout:515:  writing smithi05237511-66 from 504342 to 1059634 tid 1
2024-04-04T20:16:05.874 INFO:tasks.rados.rados.0.smithi052.stdout:515:  writing smithi05237511-66 from 1643418 to 2128262 tid 2
2024-04-04T20:16:05.874 INFO:tasks.rados.rados.0.smithi052.stdout:516: delete oid 182 current snap is 0
2024-04-04T20:16:05.875 INFO:tasks.rados.rados.0.smithi052.stdout:508:  expect (ObjNum 407 snap 0 seq_num 407)
2024-04-04T20:16:06.302 DEBUG:teuthology.orchestra.run:got remote process result: None
2024-04-04T20:16:06.304 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "src/gevent/greenlet.py", line 908, in gevent._gevent_cgreenlet.Greenlet.run
  File "/home/teuthworker/src/github.com_ceph_ceph-c_66deef1598934c3b1ef2f044e380a7d00244cead/qa/tasks/rados.py", line 280, in thread
    run.wait(tests.values())
  File "/home/teuthworker/src/git.ceph.com_teuthology_d7d6e60867cadf19fbfd7655a551ae7d58060757/teuthology/orchestra/run.py", line 479, in wait
    proc.wait()
  File "/home/teuthworker/src/git.ceph.com_teuthology_d7d6e60867cadf19fbfd7655a551ae7d58060757/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_teuthology_d7d6e60867cadf19fbfd7655a551ae7d58060757/teuthology/orchestra/run.py", line 179, in _raise_for_status

http://qa-proxy.ceph.com/teuthology/yuriw-2024-04-04_14:50:58-rados-wip-ceph_test_rados-partial-reads-distro-default-smithi/7640060/teuthology.log

Will investigate this further.

Actions #21

Updated by Radoslaw Zarzynski about 2 months ago

Replicated locally:

[rzarzynski@o06 build]$ bin/ceph --cluster ceph osd pool create unique_pool_0 16 && bin/ceph --cluster ceph osd pool application enable unique_pool_0 rados --yes-i-really-mean-it && bin/ceph_test_rados --max-ops 4000 --objects 500 --max-in-flight 16 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0 --op read 45 --op write 22 --op delete 10 --op write_excl 22 --pool unique_pool_0
...
515:  writing o061608448-289 from 1644029 to 2416189 tid 2
 waiting on 16
502:  expect (ObjNum 176 snap 0 seq_num 176)
Segmentation fault (core dumped)

WIP!

Actions #22

Updated by Radoslaw Zarzynski about 2 months ago

Switching to Thread 0x7fffd3fff700 (LWP 1608643)]                                                                                                                                                                
0x00000000004504b6 in ContentsGenerator::iterator::seek (pos=0, this=0x7fffb4002210) at /home/rzarzynski/ceph1/src/test/osd/Object.h:109                                                                          
109         void seek(uint64_t pos) { impl->seek(pos); }
Missing separate debuginfos, use: dnf debuginfo-install brotli-1.0.6-2.el8.x86_64 cyrus-sasl-lib-2.1.27-5.el8.x86_64 glibc-2.28-203.el8.x86_64 keyutils-libs-1.5.10-9.el8.x86_64 krb5-libs-1.18.2-5.el8.x86_64 libb
lkid-2.32.1-43.el8.x86_64 libcap-2.48-5.el8.x86_64 libcom_err-1.45.6-1.el8.x86_64 libcurl-7.61.1-34.el8.x86_64 libgcc-8.5.0-13.el8.x86_64 libibverbs-48.0-1.el8.x86_64 libidn2-2.2.0-1.el8.x86_64 libmount-2.32.1-4
3.el8.x86_64 libnghttp2-1.33.0-3.el8_2.1.x86_64 libnl3-3.7.0-1.el8.x86_64 libpsl-0.20.2-6.el8.x86_64 librdmacm-48.0-1.el8.x86_64 libselinux-2.9-3.el8.x86_64 libssh-0.9.6-3.el8.x86_64 libstdc++-8.5.0-13.el8.x86_6
4 libunistring-0.9.9-3.el8.x86_64 libuuid-2.32.1-43.el8.x86_64 libxcrypt-4.1.1-4.el8.x86_64 lttng-ust-2.8.1-11.el8.x86_64 openldap-2.4.46-18.el8.x86_64 openssl-libs-1.1.1k-12.el8.x86_64 pcre2-10.32-2.el8.x86_64
systemd-libs-239-79.el8.x86_64 thrift-0.13.0-2.el8.x86_64 userspace-rcu-0.10.1-2.el8.x86_64 zlib-1.2.11-16.el8_2.x86_64                                                                                           
(gdb) bt
#0  0x00000000004504b6 in ContentsGenerator::iterator::seek (pos=0, this=0x7fffb4002210) at /home/rzarzynski/ceph1/src/test/osd/Object.h:109                                                                      
#1  ObjectDesc::iterator::seek (_pos=0, this=0x7fffd3ffc490) at /home/rzarzynski/ceph1/src/test/osd/Object.h:434                                                                                                  
#2  ObjectDesc::check_sparse (this=this@entry=0x6fddc0, extents=std::map with 2 elements = {...}, to_check=..., offlen=...) at /home/rzarzynski/ceph1/src/test/osd/Object.cc:156
#3  0x000000000043e4a8 in ReadOp::_finish (this=0x6fdd50, info=<optimized out>) at /home/rzarzynski/ceph1/src/test/osd/RadosModel.h:1604
#4  0x00000000004540ae in TestOp::finish (this=<optimized out>, info=info@entry=0x0) at /home/rzarzynski/ceph1/src/test/osd/RadosModel.cc:20
#5  0x00000000004540c4 in read_callback (comp=<optimized out>, arg=<optimized out>) at /home/rzarzynski/ceph1/src/test/osd/RadosModel.cc:25
#6  0x00007ffff7eba176 in librados::CB_AioComplete::operator() (this=0x7fffd3ffc768) at /home/rzarzynski/ceph1/src/librados/AioCompletionImpl.h:140
(gdb) print impl
$1 = (ContentsGenerator::iterator_impl *) 0x0

So it's the problem with seeking.

Actions

Also available in: Atom PDF