Bug #20449

Updated by Nathan Cutler almost 7 years ago

This happened in jewel 10.2.8 integration testing. 

 Test: <code>rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml clusters/{fixed-2.yaml openstack.yaml} fs/btrfs.yaml hobj-sort.yaml msgr-failures/fastclose.yaml msgr/async.yaml rados.yaml thrashers/pggrow.yaml workloads/rgw_snaps.yaml}</code> 

 Test URL: http://pulpito.front.sepia.ceph.com/smithfarm-2017-06-27_19:13:42-rados-wip-jewel-backports-distro-basic-smithi/1333051/ 

 Failure message: Command failed on smithi165 with status 1: '/home/ubuntu/cephtest/s3-tests/virtualenv/bin/s3tests-test-readwrite' 

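 Analysis: due to some race, the <code>.rgw.control</code> pool sometimes gets created, other times not. Since <code>thrash_pool_snaps</code> *assumes* that this pool exists, the test fails if it doesn't. In this run, the rgw task starts and begins creating pools, but bails after creating just two: 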

 <pre> 
 2017-06-28T04:23:33.387 INFO:teuthology.run_tasks:Running task rgw... 
 2017-06-28T04:23:33.429 DEBUG:tasks.rgw:multisite False 
 2017-06-28T04:23:33.432 DEBUG:tasks.rgw:multi_cluster False 
 2017-06-28T04:23:33.436 DEBUG:tasks.rgw:single cluster run 
 2017-06-28T04:23:33.449 INFO:tasks.rgw:Using civetweb as radosgw frontend 
 2017-06-28T04:23:33.453 INFO:tasks.rgw:creating data pools 
 2017-06-28T04:23:33.458 INFO:teuthology.orchestra.run.smithi165:Running: 'sudo ceph osd pool create .rgw.buckets 64 64' 
 2017-06-28T04:23:33.466 INFO:tasks.thrashosds.thrasher:starting do_thrash 
 2017-06-28T04:23:33.468 INFO:tasks.thrashosds.thrasher:in_osds:    [0, 1, 2, 3, 4, 5] out_osds:    [] dead_osds:    [] live_osds:    [1, 0, 3, 2, 5, 4] 
 2017-06-28T04:23:33.470 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0 
 2017-06-28T04:23:33.474 INFO:tasks.thrashosds.thrasher:inject_pause on 1 
 2017-06-28T04:23:33.476 INFO:tasks.thrashosds.thrasher:Testing filestore_inject_stall pause injection for duration 3 
 2017-06-28T04:23:33.478 INFO:tasks.thrashosds.thrasher:Checking after 0, should_be_down=False 
 2017-06-28T04:23:33.481 INFO:teuthology.orchestra.run.smithi165:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config set filestore_inject_stall 3' 
 2017-06-28T04:23:33.484 INFO:tasks.thrashosds.thrasher:starting do_sighup with a delay of 0.1 
 2017-06-28T04:23:33.578 INFO:teuthology.orchestra.run.smithi165.stderr:2017-06-28 04:23:33.586022 7fa3bcc41700 -1 WARNING: the following dangerous and experimental features are enabled: * 
 2017-06-28T04:23:33.596 INFO:teuthology.orchestra.run.smithi165.stderr:2017-06-28 04:23:33.604531 7fa3bcc41700 -1 WARNING: the following dangerous and experimental features are enabled: * 
 2017-06-28T04:23:33.600 INFO:teuthology.orchestra.run.smithi165.stdout:{ 
 2017-06-28T04:23:33.602 INFO:teuthology.orchestra.run.smithi165.stdout:      "success": "filestore_inject_stall = '3' (unchangeable) " 
 2017-06-28T04:23:33.605 INFO:teuthology.orchestra.run.smithi165.stdout:} 
 2017-06-28T04:23:33.608 INFO:teuthology.orchestra.run.smithi165.stdout: 
 2017-06-28T04:23:34.324 INFO:teuthology.orchestra.run.smithi165.stderr:pool '.rgw.buckets' created 
 2017-06-28T04:23:34.338 DEBUG:tasks.rgw:In rgw.configure_regions_and_zones() and regions is None. Bailing 
 </pre> 

 I guess the test should have stopped there, but it continues. The next task is "thrash_pool_snaps", which takes the following config: 

 <pre> 
   - thrash_pool_snaps: 
       pools: 
       - .rgw.buckets 
       - .rgw.root 
       - .rgw.control 
       - .rgw 
       - .users.uid 
       - .users.email 
       - .users 
 </pre> 

 If any of the listed pools do not exist for whatever reason, the test fails. Obviously, that can't lead to a good result when only the first two of the listed pools were created in the previous step. In our case, it's the ".rgw.control" pool: the test fails when thrash_pool_snaps tries to make a snapshot of it, but it doesn't exist: 

 <pre> 
 2017-06-28T04:24:23.316 INFO:teuthology.orchestra.run.smithi165:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph osd pool mksnap .rgw.control 1' 
 2017-06-28T04:24:23.376 INFO:teuthology.orchestra.run.smithi165.stderr:2017-06-28 04:24:23.383075 7f5f5cb77700 -1 WARNING: the following dangerous and experimental features are enabled: * 
 2017-06-28T04:24:23.391 INFO:teuthology.orchestra.run.smithi165.stderr:2017-06-28 04:24:23.393794 7f5f5cb77700 -1 WARNING: the following dangerous and experimental features are enabled: * 
 2017-06-28T04:24:23.452 INFO:teuthology.orchestra.run.smithi165.stderr:Error ENOENT: unrecognized pool '.rgw.control' 
 </pre>
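
 The failure mode above (snapshotting a pool that was never created) can be illustrated with a small sketch. This is not the teuthology implementation; <code>split_existing_pools</code> is a hypothetical helper, and the list of existing pools is an illustrative assumption based on the statement that only the first two configured pools were created. In a real run the list of existing pools would have to come from the cluster, for example from the output of <code>ceph osd pool ls</code>.

```python
def split_existing_pools(configured, existing):
    """Partition configured pool names into (present, missing), keeping order.

    Hypothetical helper for illustration only; not part of teuthology.
    """
    existing_set = set(existing)
    present = [p for p in configured if p in existing_set]
    missing = [p for p in configured if p not in existing_set]
    return present, missing


# Pools listed in the thrash_pool_snaps config quoted in this report.
CONFIGURED_POOLS = [
    ".rgw.buckets",
    ".rgw.root",
    ".rgw.control",
    ".rgw",
    ".users.uid",
    ".users.email",
    ".users",
]

# Assumption for illustration: only the first two pools were created.
EXISTING_POOLS = [".rgw.buckets", ".rgw.root"]

present, missing = split_existing_pools(CONFIGURED_POOLS, EXISTING_POOLS)

# Only pools in `present` would be safe targets for `ceph osd pool mksnap`.
for pool in missing:
    print("would skip snapshot of missing pool: %s" % pool)
```

 With a check like this, a task could either skip the missing pools or fail early with a clear message, rather than hitting ENOENT in the middle of thrashing.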
