Backport #14873

Updated by Loïc Dachary about 8 years ago

<pre> 
 On my (admittedly crappy) test cluster (Debian Jessie, Hammer 0.94.6) I'm seeing 
 rados bench crash during "seq" runs. 
 Since I'm testing cache tiers at the moment, I also tried a normal 
 replicated pool, with the same result. 

 After creating some benchmark objects with: 
 --- 
 rados -p data bench 20 write -t 32 --no-cleanup 
 --- 

 A subsequent "seq" run against those objects ends in tears: 
 --- 
 # rados -p data bench 10 seq -t 32  
    sec Cur ops     started    finished    avg MB/s    cur MB/s    last lat     avg lat 
      0         0           0           0           0           0           -           0 
 rados: ./common/Mutex.h:96: void Mutex::_pre_unlock(): Assertion `nlock > 0' failed. 
 *** Caught signal (Aborted) ** 
  in thread 7f1894100780 
  ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) 
  1: rados() [0x4e5e23] 
  2: (()+0xf8d0) [0x7f18915268d0] 
  3: (gsignal()+0x37) [0x7f188fde6067] 
  4: (abort()+0x148) [0x7f188fde7448] 
  5: (()+0x2e266) [0x7f188fddf266] 
  6: (()+0x2e312) [0x7f188fddf312] 
  7: (Mutex::Unlock()+0xb3) [0x4fda93] 
  8: (ObjBencher::seq_read_bench(int, int, int, int, bool)+0x127c) [0x4da37c] 
  9: (ObjBencher::aio_bench(int, int, int, int, int, bool, char const*, bool)+0x2df) [0x4ded8f] 
  10: (main()+0xa664) [0x4be834] 
  11: (__libc_start_main()+0xf5) [0x7f188fdd2b45] 
  12: rados() [0x4c2c97] 
 2016-02-26 14:18:52.641052 7f1894100780 -1 *** Caught signal (Aborted) ** 
  in thread 7f1894100780 
 --- 

 There's nothing particularly outstanding or suspicious in the recent events; 
 here are the last 2: 
 --- 
     -2> 2016-02-26 14:23:12.439214 7f18c113f780    1 -- 10.0.0.83:0/877189211 --> 10.0.0.85:6804/2921 -- osd_op(client.31691145.0:34 benchmark_data_engtest03_32406_object32 [read 0~4096] 0.def1bb6e ack+read+known_if_redirected e11724) v5 -- ?+0 0x39090d0 con 0x389bed0 
     -1> 2016-02-26 14:23:12.439930 7f18b4549700    1 -- 10.0.0.83:0/877189211 <== osd.11 10.0.0.34:6802/2973 1 ==== osd_op_reply(9 benchmark_data_engtest03_32406_object7 [read 0~4096] v0'0 uv15 ondisk = 0) v6 ==== 205+0+4096 (2792458300 0 1108541644) 0x7f1864000ca0 con 0x38bbf80 
 --- 

 Note that "rand" works fine, as does "seq" on a 0.95.5 cluster.  

 </pre>
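The failed assertion, `nlock > 0` in `Mutex::_pre_unlock()`, presumably means `Unlock()` was called on a mutex that the code path in `ObjBencher::seq_read_bench()` did not hold at that moment. The following is a minimal C++ sketch of that invariant under a simplified model. `CountingMutex` and `unbalanced_unlock_detected()` are hypothetical names, not Ceph code. The sketch throws where Ceph asserts and aborts, so that the violation can be observed without killing the process.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical, simplified lock wrapper that tracks how many times it is held.
// This is NOT Ceph's Mutex; it only models the "nlock > 0" invariant that
// Mutex::_pre_unlock() checks in the crash report above.
class CountingMutex {
 public:
  void Lock() {
    m_.lock();
    ++nlock_;
  }

  void Unlock() {
    // Ceph asserts (and aborts) here; this sketch throws instead so the
    // violation can be observed in a test.
    if (nlock_ <= 0) {
      throw std::logic_error("Unlock() called on a mutex that is not held");
    }
    --nlock_;
    m_.unlock();
  }

  int nlock() const { return nlock_; }

 private:
  std::mutex m_;
  int nlock_ = 0;  // lock depth; must be > 0 before any Unlock()
};

// Hypothetical helper: a code path that unlocks twice, one kind of bug that
// would trip an assertion like the one reported in seq_read_bench().
bool unbalanced_unlock_detected() {
  CountingMutex m;
  m.Lock();
  m.Unlock();    // balanced: nlock goes 1 -> 0
  try {
    m.Unlock();  // unbalanced: nlock is already 0
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

Throwing instead of asserting is only a testing convenience. In real code an unbalanced unlock is a programming error, so aborting, as the report shows, is the intended outcome.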
