Bug #10513
ceph_test_librbd_fsx fails with thrasher
Status: Closed
Description
http://pulpito.ceph.com/loic-2015-01-08_10:36:47-rbd-giant-backports-testing-basic-vps/690698/
2015-01-08T03:31:52.223 INFO:teuthology.orchestra.run.vpm020.stdout:1368 write 0xcdd426c thru 0xcdda38b (0x6120 bytes)
2015-01-08T03:31:52.245 INFO:teuthology.orchestra.run.vpm020.stdout:1370 write 0x1c01757 thru 0x1c111a4 (0xfa4e bytes)
2015-01-08T03:31:52.359 INFO:teuthology.orchestra.run.vpm020.stdout:1371 punch from 0xbd27283 to 0xbd28661, (0x13de bytes)
2015-01-08T03:31:52.360 INFO:teuthology.orchestra.run.vpm020.stdout:1372 read 0x8ec1a40 thru 0x8ed14b4 (0xfa75 bytes)
2015-01-08T03:31:52.362 INFO:teuthology.orchestra.run.vpm020.stdout:1376 read 0x2c2dc2f thru 0x2c3a85c (0xcc2e bytes)
2015-01-08T03:31:52.363 INFO:teuthology.orchestra.run.vpm020.stdout:1377 write 0x49bbbaa thru 0x49c3cee (0x8145 bytes)
2015-01-08T03:31:52.381 INFO:teuthology.orchestra.run.vpm020.stdout:1378 clone 18 order 24 su 65536 sc 10
2015-01-08T03:31:53.488 INFO:teuthology.orchestra.run.vpm020.stdout:leaving image image_client.0-clone17 intact
2015-01-08T03:31:54.094 INFO:tasks.thrashosds.thrasher:in_osds: [0, 5, 4, 3, 2, 1] out_osds: [] dead_osds: [] live_osds: [1, 0, 2, 3, 5, 4]
2015-01-08T03:31:54.094 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2015-01-08T03:31:54.094 INFO:tasks.thrashosds.thrasher:inject_pause on 2
2015-01-08T03:31:54.094 INFO:tasks.thrashosds.thrasher:Testing filestore_inject_stall pause injection for duration 3
2015-01-08T03:31:54.094 INFO:tasks.thrashosds.thrasher:Checking after 0, should_be_down=False
2015-01-08T03:31:54.095 INFO:teuthology.orchestra.run.vpm020:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config set filestore_inject_stall 3'
2015-01-08T03:31:55.767 INFO:teuthology.orchestra.run.vpm020.stdout:checking clone #16, image image_client.0-clone16 against file /home/ubuntu/cephtest/archive/fsx-image_client.0-parent17
2015-01-08T03:31:57.236 INFO:teuthology.orchestra.run.vpm020.stdout:1379 trunc from 0xcdda38c to 0x356fbb0
2015-01-08T03:31:57.276 INFO:teuthology.orchestra.run.vpm020.stdout:1380 punch from 0x2dc498a to 0x2dd41a9, (0xf81f bytes)
2015-01-08T03:31:58.230 INFO:teuthology.orchestra.run.vpm020.stdout:1381 write 0x9812d9b thru 0x981dbff (0xae65 bytes)
2015-01-08T03:31:58.256 INFO:teuthology.orchestra.run.vpm020.stdout:1382 read 0x41e63cb thru 0x41eaaac (0x46e2 bytes)
2015-01-08T03:31:58.260 INFO:teuthology.orchestra.run.vpm020.stdout:1384 punch from 0x56cf68e to 0x56d8789, (0x90fb bytes)
2015-01-08T03:31:58.262 INFO:teuthology.orchestra.run.vpm020.stdout:1385 punch from 0x34d0564 to 0x34de4ae, (0xdf4a bytes)
2015-01-08T03:31:58.503 INFO:teuthology.orchestra.run.vpm020.stdout:1386 read 0x14bc432 thru 0x14cc124 (0xfcf3 bytes)
2015-01-08T03:31:58.516 INFO:teuthology.orchestra.run.vpm020.stdout:1387 trunc from 0x981dc00 to 0x4b15e63
2015-01-08T03:31:59.360 INFO:teuthology.orchestra.run.vpm020.stdout:1388 write 0x9a9f9e9 thru 0x9aa4022 (0x463a bytes)
2015-01-08T03:31:59.564 INFO:teuthology.orchestra.run.vpm020.stdout:1389 read 0x8ab5000 thru 0x8ab9bf3 (0x4bf4 bytes)
2015-01-08T03:31:59.588 INFO:teuthology.orchestra.run.vpm020.stdout:1390 write 0xc97e2a0 thru 0xc980f4e (0x2caf bytes)
2015-01-08T03:31:59.607 INFO:teuthology.orchestra.run.vpm020.stdout:1391 write 0x83e230d thru 0x83eb9c6 (0x96ba bytes)
2015-01-08T03:31:59.926 INFO:teuthology.orchestra.run.vpm020.stdout:1392 read 0x393a1d1 thru 0x3941192 (0x6fc2 bytes)
2015-01-08T03:31:59.929 INFO:teuthology.orchestra.run.vpm020.stdout:1393 read 0xa4b4684 thru 0xa4bf84a (0xb1c7 bytes)
2015-01-08T03:31:59.956 INFO:teuthology.orchestra.run.vpm020.stdout:1394 write 0xea94054 thru 0xea94507 (0x4b4 bytes)
2015-01-08T03:31:59.973 INFO:teuthology.orchestra.run.vpm020.stdout:1398 read 0x4b1ef26 thru 0x4b26525 (0x7600 bytes)
2015-01-08T03:31:59.975 INFO:teuthology.orchestra.run.vpm020.stdout:1399 clone 19 order 21 su 32768 sc 14
2015-01-08T03:32:04.230 INFO:teuthology.orchestra.run.vpm020.stdout:truncating image image_client.0-clone18 from 0xea94508 (overlap 0x356fbb0) to 0x557f91
2015-01-08T03:32:04.549 INFO:tasks.thrashosds.thrasher:in_osds: [0, 5, 4, 3, 2, 1] out_osds: [] dead_osds: [] live_osds: [1, 0, 2, 3, 5, 4]
2015-01-08T03:32:04.549 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2015-01-08T03:32:04.549 INFO:tasks.thrashosds.thrasher:Removing osd 4, in_osds are: [0, 5, 4, 3, 2, 1]
2015-01-08T03:32:04.549 INFO:teuthology.orchestra.run.vpm020:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd out 4'
2015-01-08T03:32:04.973 INFO:teuthology.orchestra.run.vpm020.stdout:checking clone #17, image image_client.0-clone17 against file /home/ubuntu/cephtest/archive/fsx-image_client.0-parent18
2015-01-08T03:32:08.495 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_giant/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_giant/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_giant/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/rbd_fsx.py", line 82, in _run_one_client
    remote.run(args=args)
  File "/home/teuthworker/src/teuthology_giant/teuthology/orchestra/remote.py", line 128, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_giant/teuthology/orchestra/run.py", line 368, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_giant/teuthology/orchestra/run.py", line 103, in wait
    raise CommandCrashedError(command=self.command)
CommandCrashedError: Command crashed: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_librbd_fsx -d -W -R -p 100 -P /home/ubuntu/cephtest/archive -r 1 -w 1 -t 1 -h 1 -l 250000000 -S 0 -N 2000 pool_client.0 image_client.0'
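For reference, the crashed invocation can be broken down flag by flag. The annotations below follow the upstream fsx conventions that ceph_test_librbd_fsx inherits; this is a sketch for rerunning the test by hand against a disposable pool, not part of the original report, so verify the flags against your build's help output.

```shell
#!/usr/bin/env bash
# Sketch: the ceph_test_librbd_fsx invocation from the failed job, with each
# flag annotated per classic fsx semantics (check `ceph_test_librbd_fsx -h`).
args=(
  -d                                 # log every operation as it runs
  -W -R                              # disable mmap writes and mmap reads
  -p 100                             # print a progress line every 100 ops
  -P /home/ubuntu/cephtest/archive   # directory for .fsxlog/.fsxgood artifacts
  -r 1 -w 1 -t 1 -h 1                # 1-byte read/write/truncate/hole alignment
  -l 250000000                       # cap the image size at 250000000 bytes
  -S 0                               # random seed (0 lets fsx pick one)
  -N 2000                            # stop after 2000 operations
  pool_client.0 image_client.0       # target pool and image
)
echo "ceph_test_librbd_fsx ${args[*]}"
```

Replaying with the concrete seed that fsx prints at startup, instead of `-S 0`, is the usual way to reproduce a given operation sequence deterministically.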
Updated by Loïc Dachary over 9 years ago
- Status changed from New to Won't Fix
josh: There's no trace of the actual error in the teuthology or ceph client logs. Looking at the syslogs, this and several other failures from that run were from running out of memory. This happened with and without caching, so I suspect these tests may simply need more memory in their current configuration, since osds are sharing it too, and they use much more with thrashing.
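The out-of-memory diagnosis above can be confirmed from the syslogs archived with the run, since the kernel logs OOM-killer activity when it kills a process. A minimal sketch follows; the log lines in the here-doc are invented samples standing in for the real archived syslog, and the pattern is the only substantive part.

```shell
# Sketch: grep a kernel log for OOM-killer activity. The two lines in the
# here-doc are hypothetical samples; point the grep at the archived syslog
# from the run instead.
grep -Ei 'oom-killer|out of memory' <<'EOF'
Jan  8 03:31:50 vpm020 kernel: ceph_test_librbd invoked oom-killer: gfp_mask=0x201da, order=0
Jan  8 03:31:50 vpm020 kernel: Out of memory: Kill process 12345 (ceph_test_librbd) score 512 or sacrifice child
EOF
```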
Updated by Loïc Dachary about 9 years ago
- Subject changed from ceph_test_librbd_fsx fails with thrasher (giant) to ceph_test_librbd_fsx fails with thrasher
On dumpling this time ( http://pulpito.ceph.com/loic-2015-01-29_15:39:38-rbd-dumpling-backports---basic-multi/730029/ ), the test never completes. It runs on bare metal, so presumably with more RAM, and the failure mode is different: the test does not crash, it hangs forever.