Bug #19636

Updated by Nathan Cutler almost 7 years ago

test descriptions: 

 * upgrade:client-upgrade/hammer-client-x/rbd/{0-cluster/start.yaml 1-install/hammer-client-x.yaml 2-workload/rbd_notification_tests.yaml} 
 * upgrade:client-upgrade/jewel-client-x/rbd/{0-cluster/start.yaml 1-install/jewel-client-x.yaml 2-workload/rbd_notification_tests.yaml} 

 test failure reproducible? YES 

 test URLs: 

 * http://pulpito.ceph.com/smithfarm-2017-04-16_18:30:46-upgrade:client-upgrade-wip-kraken-backports-distro-basic-vps/1033423/ 
 * http://pulpito.ceph.com/smithfarm-2017-04-16_18:30:46-upgrade:client-upgrade-wip-kraken-backports-distro-basic-vps/1033424/ 

 test cluster: 

 <pre> 
   roles: 
   - - mon.a 
     - mon.b 
     - mon.c 
     - osd.0 
     - osd.1 
     - osd.2 
     - client.0 
   - - client.1 
 </pre> 
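
 For readers less familiar with the roles layout: each inner list is one test node, so the first node carries all mons, all OSDs and client.0, while client.1 sits alone on the second node. A minimal Python sketch of that mapping follows (the helper name @node_for_role@ is hypothetical and not part of teuthology):

```python
# The roles list from the job description: one inner list per test node.
roles = [
    ["mon.a", "mon.b", "mon.c", "osd.0", "osd.1", "osd.2", "client.0"],
    ["client.1"],
]


def node_for_role(roles, role):
    """Return the index of the node that carries `role`, or None if absent.

    Hypothetical helper for illustration only; not a teuthology API.
    """
    for index, node_roles in enumerate(roles):
        if role in node_roles:
            return index
    return None


# Show which node each client lands on.
for client in ("client.0", "client.1"):
    print(client, "-> node", node_for_role(roles, client))
```

 This is the layout that lets one node be upgraded independently of the node running the cluster daemons.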

 what seems to happen: 

 # hammer is installed on both nodes 
 # the "client.1" node is upgraded to kraken (wip-kraken-backports) 
 # ceph task runs 
 # the "hammer" version of "rbd/notify_master.sh" is run on client.0 and the "hammer" version of "rbd/notify_slave.sh" is run on client.1 (afaict the hammer and kraken versions of these scripts are identical) 

 <pre> 
 2017-04-16T18:44:20.785 INFO:tasks.workunit.client.0.vpm175.stderr:+ dirname /home/ubuntu/cephtest/clone.client.0/qa/workunits/rbd/notify_master.sh 
 2017-04-16T18:44:20.786 INFO:tasks.workunit.client.0.vpm175.stderr:+ relpath=/home/ubuntu/cephtest/clone.client.0/qa/workunits/rbd/../../../src/test/librbd 
 2017-04-16T18:44:20.787 INFO:tasks.workunit.client.0.vpm175.stderr:+ python /home/ubuntu/cephtest/clone.client.0/qa/workunits/rbd/../../../src/test/librbd/test_notify.py master 
 </pre> 

 Three hours later, both workunits are stopped (timeout): 

 <pre> 
 2017-04-16T21:44:02.648 INFO:tasks.workunit:Stopping ['rbd/notify_master.sh'] on client.0... 
 2017-04-16T21:44:02.649 INFO:teuthology.orchestra.run.vpm089:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0' 
 2017-04-16T21:44:02.668 INFO:tasks.workunit:Stopping ['rbd/notify_slave.sh'] on client.1... 
 2017-04-16T21:44:02.669 INFO:teuthology.orchestra.run.vpm101:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list.client.1 /home/ubuntu/cephtest/clone.client.1' 
 </pre> 

 And 1/10 of a second later the job fails with an exception, as if the 'rbd/notify_master.sh' script just gave up without actually doing anything: 

 <pre> 
 2017-04-16T21:44:02.758 ERROR:teuthology.run_tasks:Saw exception from tasks. 
 Traceback (most recent call last): 
   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks 
     manager = run_one_task(taskname, ctx=ctx, config=config) 
   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task 
     return task(**kwargs) 
   File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-kraken-backports/qa/tasks/workunit.py", line 176, in task 
     config.get('env'), timeout=timeout) 
   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 85, in __exit__ 
     for result in self: 
   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 99, in next 
     resurrect_traceback(result) 
   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 22, in capture_traceback 
     return func(*args, **kwargs) 
   File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-kraken-backports/qa/tasks/workunit.py", line 450, in _run_tests 
     label="workunit test {workunit}".format(workunit=workunit) 
   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 193, in run 
     r = self._runner(client=self.ssh, name=self.shortname, **kwargs) 
   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 414, in run 
     r.wait() 
   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 149, in wait 
     self._raise_for_status() 
   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 171, in _raise_for_status 
     node=self.hostname, label=self.label 
 CommandFailedError: Command failed (workunit test rbd/notify_master.sh) on vpm089 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=hammer TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 RBD_FEATURES=13 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rbd/notify_master.sh' 
 </pre> 
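
 Note that status 124 is the exit status GNU coreutils @timeout@ returns when the wrapped command exceeds its time limit, which matches the @timeout 3h@ wrapper in the failed command. The sketch below pulls those fields out of the error message; the message is abbreviated here, and the function and constant names are hypothetical, not part of teuthology:

```python
import re

# Abbreviated copy of the CommandFailedError message from the traceback;
# the "..." stands for the environment setup that is omitted here.
ERROR_LINE = (
    "CommandFailedError: Command failed (workunit test rbd/notify_master.sh) "
    "on vpm089 with status 124: '... timeout 3h "
    "/home/ubuntu/cephtest/clone.client.0/qa/workunits/rbd/notify_master.sh'"
)

# Exit status used by GNU coreutils `timeout` when the time limit is hit.
GNU_TIMEOUT_EXPIRED = 124


def parse_failure(line):
    """Extract workunit, host, status and timeout limit from the message.

    Hypothetical helper for illustration; the regexes only assume the
    message layout shown in this report.
    """
    head = re.search(
        r"\(workunit test (?P<workunit>[^)]+)\) on (?P<host>\S+) "
        r"with status (?P<status>\d+)",
        line,
    )
    if head is None:
        return None
    limit = re.search(r"\btimeout (\S+) ", line)
    status = int(head.group("status"))
    return {
        "workunit": head.group("workunit"),
        "host": head.group("host"),
        "status": status,
        "limit": limit.group(1) if limit else None,
        "timed_out": status == GNU_TIMEOUT_EXPIRED,
    }


info = parse_failure(ERROR_LINE)
print(info)
```

 In other words, the exception does not indicate that the script itself reported an error; it indicates that the script was still running when the three-hour limit expired.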
