Bug #5874

closed

rgw: cuttlefish cls_rgw tests fail against next

Added by Sage Weil over 10 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2013-08-03T15:11:11.863 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [==========] Running 8 tests from 1 test case.
2013-08-03T15:11:11.863 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [----------] Global test environment set-up.
2013-08-03T15:11:11.863 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [----------] 8 tests from cls_rgw
2013-08-03T15:11:11.863 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [ RUN      ] cls_rgw.init
2013-08-03T15:11:13.066 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [       OK ] cls_rgw.init (1203 ms)
2013-08-03T15:11:13.066 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [ RUN      ] cls_rgw.index_basic
2013-08-03T15:11:13.070 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: test/cls_rgw/test_cls_rgw.cc:99: Failure
2013-08-03T15:11:13.070 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: Value of: ioctx.operate(bucket_oid, op)
2013-08-03T15:11:13.070 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]:   Actual: -5
2013-08-03T15:11:13.070 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: Expected: 0
2013-08-03T15:11:13.070 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [  FAILED  ] cls_rgw.index_basic (4 ms)
2013-08-03T15:11:13.071 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [ RUN      ] cls_rgw.index_multiple_obj_writers
2013-08-03T15:11:13.223 INFO:teuthology.task.ceph.osd.3.err:[10.214.131.30]: daemon-helper: command crashed with signal 11
2013-08-03T15:11:36.930 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [       OK ] cls_rgw.index_multiple_obj_writers (23861 ms)
2013-08-03T15:11:36.930 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [ RUN      ] cls_rgw.index_remove_object
2013-08-03T15:11:37.181 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [       OK ] cls_rgw.index_remove_object (251 ms)
2013-08-03T15:11:37.181 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [ RUN      ] cls_rgw.index_suggest
2013-08-03T15:11:37.185 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: test/cls_rgw/test_cls_rgw.cc:262: Failure
2013-08-03T15:11:37.185 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: Value of: ioctx.operate(bucket_oid, op)
2013-08-03T15:11:37.185 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]:   Actual: -5
2013-08-03T15:11:37.185 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: Expected: 0
2013-08-03T15:11:37.186 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [  FAILED  ] cls_rgw.index_suggest (4 ms)
2013-08-03T15:11:37.186 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [ RUN      ] cls_rgw.gc_set
2013-08-03T15:11:37.774 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [       OK ] cls_rgw.gc_set (589 ms)
2013-08-03T15:11:37.774 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [ RUN      ] cls_rgw.gc_defer
2013-08-03T15:11:46.873 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [       OK ] cls_rgw.gc_defer (9099 ms)
2013-08-03T15:11:46.873 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [ RUN      ] cls_rgw.finalize
2013-08-03T15:11:48.137 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [       OK ] cls_rgw.finalize (1264 ms)
2013-08-03T15:11:48.137 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: [----------] 8 tests from cls_rgw (36275 ms total)
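
To pull the failing cases out of a long teuthology log like the one above, a small filter on the googletest result lines is enough. The following is a minimal sketch, not part of teuthology: the function name `failed_tests` and the `PREFIX` constant are hypothetical, and the sample lines are adapted from the log above with the timestamps collapsed to one prefix for brevity.

```python
import re

# Matches googletest result lines such as "[  FAILED  ] cls_rgw.index_basic (4 ms)".
FAILED_RE = re.compile(r"\[\s*FAILED\s*\]\s+(\S+)\s+\(\d+ ms\)")


def failed_tests(log_lines):
    """Return the names of failed test cases, in order of appearance."""
    failed = []
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match:
            failed.append(match.group(1))
    return failed


# Adapted from the log above; one shared prefix stands in for the per-line timestamps.
PREFIX = "2013-08-03T15:11:13.070 INFO:teuthology.task.workunit.client.0.out:[10.214.131.28]: "
sample = [
    PREFIX + "[  FAILED  ] cls_rgw.index_basic (4 ms)",
    PREFIX + "[       OK ] cls_rgw.gc_set (589 ms)",
    PREFIX + "[  FAILED  ] cls_rgw.index_suggest (4 ms)",
]
print(failed_tests(sample))
```

Both failures in this run are `ioctx.operate(bucket_oid, op)` returning -5, so the two names are the ones to cross-check against the OSD logs.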

The test configuration is:
ubuntu@teuthology:/a/teuthology-2013-08-02_01:30:04-upgrade-next-testing-basic-plana/93844$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 05542c395ce50bb1750cc6fead85727903fc3e72
machine_type: plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
    log-whitelist:
    - slow request
    sha1: ef036bd4bc0e79bff8a5805800fbdeb0cc2db6ae
  ceph-deploy:
    branch:
      dev: next
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: ef036bd4bc0e79bff8a5805800fbdeb0cc2db6ae
  s3tests:
    branch: next
  workunit:
    sha1: ef036bd4bc0e79bff8a5805800fbdeb0cc2db6ae
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
tasks:
- chef: null
- clock.check: null
- install:
    branch: cuttlefish
- ceph: null
- workunit:
    branch: cuttlefish
    clients:
      client.0:
      - rados/load-gen-mix.sh
- install.upgrade:
    osd.0:
      branch: next
    osd.2:
      branch: next
- ceph.restart:
  - osd.0
  - osd.2
- workunit:
    branch: cuttlefish
    clients:
      client.0:
      - rados/test.sh
      - cls
teuthology_branch: next

Note that the cls test is run from client.0, which was not upgraded, but 2 of the 4 OSDs are upgraded and restarted.
Actions #1

Updated by Sage Weil over 10 years ago

  • Project changed from Ceph to rgw
Actions #2

Updated by Ian Colle over 10 years ago

  • Assignee set to Yehuda Sadeh
Actions #3

Updated by Yehuda Sadeh over 10 years ago

It looks like an OSD (osd.3) crashed here:

2013-08-03T15:11:13.223 INFO:teuthology.task.ceph.osd.3.err:[10.214.131.30]: daemon-helper: command crashed with signal 11

Actions #4

Updated by Yehuda Sadeh over 10 years ago

We get this, which looks like #5752:

 128.41c26202 e397) v4 ==== 147+0+20 (3830404673 0 3874395120) 0x15726c0 con 0x1f17580
2013-08-06 15:46:22.967531 7f169c587700  0 _load_class could not open class /usr/lib/rados-classes/libcls_rgw.so (dlopen failed): /usr/lib/rados-classes/libcls_rgw.so: undefined symbol: _Z21cls_current_subop_numPv
2013-08-06 15:46:22.967559 7f169c587700  1 -- 10.214.131.4:6803/4043 --> 10.214.131.38:0/1006175 -- osd_op_reply(2 bucket-0 [call rgw.bucket_init_index] ack = -5 (Input/output error)) v4 -- ?+0 0x1a72600 con 0x1f17580

and then the OSD crashes. Not sure why we see it; shouldn't the OSD have been installed with the new version by now? The crash itself is around here:

#4  <signal handler called>
#5  0x000000000009f406 in ?? ()
#6  0x00007f1691268755 in ?? ()
#7  0x0000000002f8aa00 in ?? ()
#8  0x0000000001536eb0 in ?? ()
#9  0x0000000001536e28 in ?? ()
#10 0x000000000068bd64 in ClassHandler::_load_class(ClassHandler::ClassData*) ()
#11 0x000000000068c3a6 in ClassHandler::open_class(std::string const&, ClassHandler::ClassData**) ()
#12 0x000000000060869b in OSD::init_op_flags(std::tr1::shared_ptr<OpRequest>) ()
#13 0x000000000063ce78 in OSD::handle_op(std::tr1::shared_ptr<OpRequest>) ()
#14 0x0000000000646159 in OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>) ()
#15 0x00000000006521fe in OSD::_dispatch(Message*) ()
#16 0x0000000000652986 in OSD::ms_dispatch(Message*) ()
#17 0x00000000008e7ec1 in DispatchQueue::entry() ()
#18 0x0000000000832e4d in DispatchQueue::DispatchThread::entry() ()
#19 0x00007f16a8711e9a in start_thread (arg=0x7f169c587700) at pthread_create.c:308
#20 0x00007f16a68a74bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
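
The undefined symbol in the dlopen error, `_Z21cls_current_subop_numPv`, is an Itanium C++ ABI mangled name. The sketch below decodes only the simplest form (`_Z`, a length-prefixed name, then builtin parameter codes with optional `P` pointer prefixes); it is an illustration, not a replacement for `c++filt`, and the helper name `demangle_simple` is hypothetical.

```python
import re

# Only a handful of Itanium ABI builtin type codes, enough for simple signatures.
BUILTIN_TYPES = {"v": "void", "i": "int", "c": "char", "b": "bool", "l": "long", "d": "double"}


def demangle_simple(symbol):
    """Demangle names of the form _Z<len><name><param-types>; return None otherwise."""
    match = re.match(r"_Z(\d+)", symbol)
    if not match:
        return None
    length = int(match.group(1))
    start = match.end()
    name = symbol[start:start + length]
    rest = symbol[start + length:]
    if len(name) != length or not rest:
        return None
    params = []
    pointers = 0
    for code in rest:
        if code == "P":
            pointers += 1  # pointer prefix applies to the next type code
        elif code in BUILTIN_TYPES:
            params.append(BUILTIN_TYPES[code] + "*" * pointers)
            pointers = 0
        else:
            return None  # outside the subset this sketch understands
    if pointers:
        return None  # dangling pointer prefix with no type after it
    if params == ["void"]:
        params = []  # a lone "v" means an empty parameter list
    return "{}({})".format(name, ", ".join(params))


print(demangle_simple("_Z21cls_current_subop_numPv"))
```

The decoded name is `cls_current_subop_num(void*)`, the function that `libcls_rgw.so` expects the running OSD binary to provide.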
Actions #5

Updated by Yehuda Sadeh over 10 years ago

The osd hasn't been restarted at this point.
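
One way to see which daemons are exposed to this, assuming that `install.upgrade` replaces the packages for the whole host rather than for a single daemon (an assumption, not stated in this ticket): list the OSDs that share a host with an upgraded role but were not restarted. The function name `stale_osds` is hypothetical; the data is copied from the `roles`, `install.upgrade` and `ceph.restart` sections of the config in the description.

```python
def stale_osds(roles, upgraded, restarted):
    """Return OSDs on a host with upgraded packages that still run the old binary."""
    stale = []
    for host_roles in roles:
        # Assumption: upgrading any role on a host upgrades packages for the whole host.
        if not any(role in upgraded for role in host_roles):
            continue
        for role in host_roles:
            if role.startswith("osd.") and role not in restarted:
                stale.append(role)
    return sorted(stale)


# Values taken from the roles, install.upgrade and ceph.restart sections of the config.
roles = [
    ["mon.a", "mds.a", "osd.0", "osd.1"],
    ["mon.b", "mon.c", "osd.2", "osd.3"],
    ["client.0"],
]
upgraded = {"osd.0", "osd.2"}
restarted = {"osd.0", "osd.2"}
print(stale_osds(roles, upgraded, restarted))
```

Under that assumption the result includes osd.3, the daemon that crashed with signal 11 in the log.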

Actions #6

Updated by Yehuda Sadeh over 10 years ago

So basically this is #5752. We can try working around it by running the objclass unit test before the upgrade (which will hopefully get the OSD to load the class object).

Actions #7

Updated by Sage Weil over 10 years ago

  • Status changed from New to Resolved

Backported the preload OSD class patches to cuttlefish and enabled them in teuthology so we can avoid this problem in testing.
