Bug #6208

closed

rgw: md5 checksum failed on readwrite during upgrade-next tests

Added by Sage Weil over 10 years ago. Updated over 10 years ago.

Status: Can't reproduce
Priority: High
Assignee:
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2013-09-02T19:19:13.165 INFO:teuthology.orchestra.run.err:[10.214.133.34]: Traceback (most recent call last):
2013-09-02T19:19:13.165 INFO:teuthology.orchestra.run.err:[10.214.133.34]:   File "/home/ubuntu/cephtest/17389/s3-tests/virtualenv/bin/s3tests-test-readwrite", line 9, in <module>
2013-09-02T19:19:13.165 INFO:teuthology.orchestra.run.err:[10.214.133.34]:     load_entry_point('s3tests==0.0.1', 'console_scripts', 's3tests-test-readwrite')()
2013-09-02T19:19:13.165 INFO:teuthology.orchestra.run.err:[10.214.133.34]:   File "/home/ubuntu/cephtest/17389/s3-tests/s3tests/readwrite.py", line 255, in main
2013-09-02T19:19:13.165 INFO:teuthology.orchestra.run.err:[10.214.133.34]:     trace=temp_dict['error']['traceback'])
2013-09-02T19:19:13.165 INFO:teuthology.orchestra.run.err:[10.214.133.34]: Exception: exception:
2013-09-02T19:19:13.166 INFO:teuthology.orchestra.run.err:[10.214.133.34]:      md5sum check failed
2013-09-02T19:19:13.166 INFO:teuthology.orchestra.run.err:[10.214.133.34]:      None
ubuntu@teuthology:/a/teuthology-2013-09-02_01:30:04-upgrade-next-testing-basic-plana/17389$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 263cbbcaf605e359a46e30889595d82629f82080
machine_type: plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
    log-whitelist:
    - slow request
    sha1: e48d6cb4023fb3735e9c4288f5d5c7bac44eadde
  ceph-deploy:
    branch:
      dev: next
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: e48d6cb4023fb3735e9c4288f5d5c7bac44eadde
  s3tests:
    branch: next
  workunit:
    sha1: e48d6cb4023fb3735e9c4288f5d5c7bac44eadde
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
tasks:
- chef: null
- clock.check: null
- install:
    branch: bobtail
- ceph: null
- rgw: null
- s3tests:
    client.0:
      force-branch: bobtail
      rgw_server: client.0
- install.upgrade:
    all:
      branch: dumpling
- ceph.restart:
  - mon.a
  - mon.b
  - mon.c
  - mds.a
  - osd.0
  - osd.1
  - osd.2
  - osd.3
  - rgw.client.0
- swift:
    client.0:
      rgw_server: client.0
- install.upgrade:
    all:
      branch: next
- ceph.restart:
  - osd.0
  - osd.1
  - osd.2
  - osd.3
  - mds.a
  - mon.a
  - mon.b
  - mon.c
  - rgw.client.0
- s3readwrite:
    client.0:
      readwrite:
        bucket: rwtest
        duration: 300
        files:
          num: 10
          size: 2000
          stddev: 500
        readers: 10
        writers: 3
      rgw_server: client.0
teuthology_branch: next
Actions #1

Updated by Ian Colle over 10 years ago

  • Assignee set to Yehuda Sadeh
Actions #2

Updated by Sage Weil over 10 years ago

  • Priority changed from Urgent to High
Actions #3

Updated by Yehuda Sadeh over 10 years ago

There's a bunch of these in the apache logs:

10.214.133.34 l - [02/Sep/2013:19:19:24 -0700] "GET /rwtest/cvthpmijcgjudvjqfa HTTP/1.1" 103 24576 "{Referer}i" "Boto/2.11.0 (linux2)" 
10.214.133.34 l - [02/Sep/2013:19:19:24 -0700] "GET /rwtest/ittqrixshckdrse HTTP/1.1" 103 24576 "{Referer}i" "Boto/2.11.0 (linux2)" 
10.214.133.34 l - [02/Sep/2013:19:19:24 -0700] "GET /rwtest/mgjjlsx HTTP/1.1" 200 1502143 "{Referer}i" "Boto/2.11.0 (linux2)" 
10.214.133.34 l - [02/Sep/2013:19:19:24 -0700] "GET /rwtest/ziayoxjyidm HTTP/1.1" 103 24576 "{Referer}i" "Boto/2.11.0 (linux2)" 
10.214.133.34 l - [02/Sep/2013:19:19:24 -0700] "GET /rwtest/ziayoxjyidm HTTP/1.1" 103 24576 "{Referer}i" "Boto/2.11.0 (linux2)" 
10.214.133.34 l - [02/Sep/2013:19:19:24 -0700] "GET /rwtest/mgjjlsx HTTP/1.1" 200 1502143 "{Referer}i" "Boto/2.11.0 (linux2)" 
10.214.133.34 l - [02/Sep/2013:19:19:24 -0700] "GET /rwtest/ittqrixshckdrse HTTP/1.1" 103 24576 "{Referer}i" "Boto/2.11.0 (linux2)" 

and the gateway itself complains:

2013-09-02 19:19:25.603506 7f4b84ff9700  0 ERROR: s->cio->print() returned err=-1
2013-09-02 19:19:25.603509 7f4b607b0700  0 ERROR: s->cio->print() returned err=-1
2013-09-02 19:19:25.603513 7f4b607b0700  0 ERROR: s->cio->print() returned err=-1
2013-09-02 19:19:25.603516 7f4b607b0700  0 ERROR: s->cio->print() returned err=-1

Note that the clocks on the client and on the gateway machine don't completely agree, but these errors seem to have occurred around the same time the test failed.

From what I can tell, it's some miscommunication between apache and the gateway.
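
For anyone who wants to check another run for the same pattern, here is a rough sketch of how access-log lines shaped like the ones quoted above could be tallied by object and status. It is not part of teuthology or s3-tests; the helper name `tally_non_200`, the regex, and the sample list are all made up for illustration, and the regex only assumes the request/status/size layout visible in the quoted lines.

```python
import re
from collections import Counter

# Hypothetical helper, not part of teuthology or s3-tests.
# Picks out the request line, status, and byte count from access-log
# lines laid out like the ones quoted in this comment.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def tally_non_200(lines):
    """Count (path, status, size) triples for requests whose status is not 200."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip lines that do not look like access-log entries
        if m.group("status") != "200":
            key = (m.group("path"), int(m.group("status")), m.group("size"))
            counts[key] += 1
    return counts


# A few of the lines quoted above, used as sample input.
SAMPLE = [
    '10.214.133.34 l - [02/Sep/2013:19:19:24 -0700] "GET /rwtest/cvthpmijcgjudvjqfa HTTP/1.1" 103 24576 "{Referer}i" "Boto/2.11.0 (linux2)"',
    '10.214.133.34 l - [02/Sep/2013:19:19:24 -0700] "GET /rwtest/mgjjlsx HTTP/1.1" 200 1502143 "{Referer}i" "Boto/2.11.0 (linux2)"',
    '10.214.133.34 l - [02/Sep/2013:19:19:24 -0700] "GET /rwtest/ziayoxjyidm HTTP/1.1" 103 24576 "{Referer}i" "Boto/2.11.0 (linux2)"',
    '10.214.133.34 l - [02/Sep/2013:19:19:24 -0700] "GET /rwtest/ziayoxjyidm HTTP/1.1" 103 24576 "{Referer}i" "Boto/2.11.0 (linux2)"',
]

if __name__ == "__main__":
    for (path, status, size), n in sorted(tally_non_200(SAMPLE).items()):
        print(path, status, size, n)
```

Running it over the full apache log would show which objects got the odd status and whether the byte count is always the same, which might help narrow down where the response gets cut off.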

Actions #4

Updated by Ian Colle over 10 years ago

  • Status changed from New to Can't reproduce