There were plenty of other timeouts before the suicide, and this sort of situation is not specific to ceph.parent batches; in my experience, any sustained workload that exceeds the throughput of the slowest OSD will eventually cause that OSD to suicide because it cannot keep up: I/O on the underlying btrfs presumably gets slower and slower until some operation hits the suicide timeout.
Here is the latest sample, in which the OSD was recovering from earlier failures while other uses of the CephFS were underway:
-398> 2014-04-28 12:18:47.798145 7f617de64700 5 -- op tracker -- , seq: 96045, time: 2014-04-28 12:18:47.798144, event: waiting for subops from (1,255),(10,255), request: osd_op(client.610213.0:731234 1000082579d.00000019 [write 3505880~458752] 0.9515a9ad snapc 1=[] ondisk+write e71456) v4
-397> 2014-04-28 12:18:47.798203 7f617de64700 1 -- 172.31.160.7:6802/16998 --> osd.1 172.31.160.6:6813/2659 -- osd_sub_op(client.610213.0:731234 0.2d 9515a9ad/1000082579d.00000019/head//0 [] v 71456'222312 snapset=0=[]:[] snapc=0=[]) v9 -- ?+459363 0x16195000
-396> 2014-04-28 12:18:47.798279 7f617de64700 1 -- 172.31.160.7:6802/16998 --> osd.10 172.31.160.7:6811/13035 -- osd_sub_op(client.610213.0:731234 0.2d 9515a9ad/1000082579d.00000019/head//0 [] v 71456'222312 snapset=0=[]:[] snapc=0=[]) v9 -- ?+459363 0x16191400
-395> 2014-04-28 12:18:47.798467 7f617de64700 5 -- op tracker -- , seq: 96045, time: 2014-04-28 12:18:47.798467, event: commit_queued_for_journal_write, request: osd_op(client.610213.0:731234 1000082579d.00000019 [write 3505880~458752] 0.9515a9ad snapc 1=[] ondisk+write e71456) v4
-394> 2014-04-28 12:18:47.798536 7f617de64700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f617de64700' had timed out after 30
-393> 2014-04-28 12:18:47.798554 7f617de64700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f617de64700' had suicide timed out after 300
[...]
0> 2014-04-28 12:18:50.190783 7f617de64700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f617de64700 time 2014-04-28 12:18:47.798628
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
ceph version 0.79 (4c2d73a5095f527c3a2168deb5fa54b3c8991a6e)
1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2d9) [0x9b52c9]
2: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, long, long)+0x89) [0x9b55e9]
3: (ThreadPool::worker(ThreadPool::WorkThread*)+0x5ed) [0xa636fd]
4: (ThreadPool::WorkThread::entry()+0x10) [0xa64b10]
5: /lib64/libpthread.so.0() [0x3801607f33]
6: (clone()+0x6d) [0x3800ef4ded]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
[...]
--- end dump of recent events ---
2014-04-28 12:18:50.190784 7f61ffb30700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f61ffb30700 time 2014-04-28 12:18:47.805406
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
ceph version 0.79 (4c2d73a5095f527c3a2168deb5fa54b3c8991a6e)
1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2d9) [0x9b52c9]
2: (ceph::HeartbeatMap::is_healthy()+0xc6) [0x9b5ba6]
3: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x9b61ec]
4: (CephContextServiceThread::entry()+0x15b) [0xa86e6b]
5: /lib64/libpthread.so.0() [0x3801607f33]
6: (clone()+0x6d) [0x3800ef4ded]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
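For reference, my reading of those frames is that the check which fires behaves roughly like the sketch below. This is only a simplified illustration of the reset_timeout/_check logic visible in the backtraces, not the actual HeartbeatMap implementation; the heartbeat_* function names and the handle layout are my own shorthand.

#include <cassert>
#include <ctime>

// One handle per watched thread (here, the OSD::op_tp worker).
struct heartbeat_handle_d {
  const char *name;
  time_t timeout = 0;          // soft deadline: exceeding it only logs "had timed out after 30"
  time_t suicide_timeout = 0;  // hard deadline: exceeding it aborts ("had suicide timed out after 300")
};

// Called both when a worker re-arms its timer and by the periodic health-check
// thread (the second backtrace, via is_healthy()/check_touch_file()).
bool heartbeat_check(heartbeat_handle_d *h, time_t now) {
  bool healthy = true;
  if (h->timeout && now > h->timeout)
    healthy = false;                       // produces the "had timed out" warnings
  if (h->suicide_timeout && now > h->suicide_timeout)
    assert(0 == "hit suicide timeout");    // the abort seen in both backtraces
  return healthy;
}

// A worker calls this between work items. If the previous item took longer than
// suicide_grace (300 s here), the assert fires before the deadlines are re-armed.
void heartbeat_reset_timeout(heartbeat_handle_d *h, time_t grace, time_t suicide_grace) {
  time_t now = time(nullptr);
  heartbeat_check(h, now);
  h->timeout = now + grace;
  h->suicide_timeout = suicide_grace ? now + suicide_grace : 0;
}

In other words, the 30 s timeout only produces warnings, but as soon as a single item on the op_tp worker takes more than 300 s end to end, the daemon asserts out, which is exactly what a sufficiently overloaded btrfs OSD will eventually do.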