Bug #20687 (closed): osd: crashing on ec read partial failure

Added by Ashley Merrick almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: High
Assignee: David Zafman
Category: OSD
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Since upgrading to the latest RC, ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc), we are seeing multiple OSDs crash at a time. It happens at random intervals, but seems to get worse during recovery and backfill traffic.

Most OSDs are Filestore, with a couple converted to Bluestore; the issue is affecting both.

Errors seen during some of the crashes:

1

172.16.3.10:6802/21760 --> 172.16.3.6:6808/15997 -- pg_update_log_missing(6.19ds12 epoch 101931/101928 rep_tid 59 entries 101931'55683 (0'0) error 6:b984d72a:::rbd_data.a1d870238e1f29.0000000000007c0b:head by client.30604127.0:31963 0.000000 2) v2 - 0x55bea0faefc0 con 0

2

log_channel(cluster) log [ERR] : 4.11c required past_interval bounds are empty [101500,100085) but past_intervals is not: ([90726,100084...0083] acting 28)

The above seems to be the same as http://tracker.ceph.com/issues/20167 (marked as Resolved in an earlier luminous dev release).

3

2017-07-19 12:50:57.587194 7f19348f1700 -1 /build/ceph-12.1.1/src/osd/PrimaryLogPG.cc: In function 'virtual void C_CopyFrom_AsyncReadCb::finish(int)' thread 7f19348f1700 time 2017-07-19 12:50:57.583192
/build/ceph-12.1.1/src/osd/PrimaryLogPG.cc: 7585: FAILED assert(len <= reply_obj.data.length())


Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #20167: osd/PG.cc: 806: FAILED assert(past_intervals.empty()) - Resolved - 06/02/2017

Actions #1

Updated by Sage Weil almost 7 years ago

  • Status changed from New to Need More Info
  • Priority changed from Normal to Urgent

Does this reproduce reliably, or did it just happen once? If you can reproduce, can you do it with 'debug osd = 20' and 'debug ms = 1' and attach the OSD log file?

Thanks!
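
For reference, one common way to apply those settings (a sketch only; the OSD id and config section below are examples, adjust to your environment). A ceph.conf stanza takes effect after the OSD restarts; injectargs changes a running daemon:

    # ceph.conf on the affected OSD host, then restart the OSD
    [osd]
        debug osd = 20
        debug ms = 1

    # or at runtime, without a restart (osd.4 is just an example id)
    ceph tell osd.4 injectargs '--debug-osd 20 --debug-ms 1'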

Actions #2

Updated by Ashley Merrick almost 7 years ago

Sage Weil wrote:

Does this reproduce reliably, or did it just happen once? If you can reproduce, can you do it with 'debug osd = 20' and 'debug ms = 1' and attach the OSD log file?

Thanks!

Yes, it happens around once an hour (not sure if anything runs once an hour at the OSD / cluster level?). I have increased the log level on one of the most troublesome OSDs (due to disk space) and will try to catch the next instance.

Actions #3

Updated by Ashley Merrick almost 7 years ago

Sage Weil wrote:

Does this reproduce reliably, or did it just happen once? If you can reproduce, can you do it with 'debug osd = 20' and 'debug ms = 1' and attach the OSD log file?

Thanks!

Filestore:

Was able to get a log file from one of the failing OSDs (FileStore seems to hit a slightly different error).

I have uploaded the file here due to its size: http://amerrick.co.uk/ceph/ceph-osd_split.4.log05

From what I can see, the restart happens around: 2017-07-19 21:45:04.301339

Bluestore: (these OSDs seem to crash a lot more)

Was able to get a log file from one of the failing OSDs (Bluestore).

I have uploaded the file here due to its size: http://amerrick.co.uk/ceph/ceph-osd_split.10.log06

From what I can see, the restart happens around: 03:00:02.719080 7fe376229700

Actions #4

Updated by Ashley Merrick almost 7 years ago

Ashley Merrick wrote:

Sage Weil wrote:

Does this reproduce reliably, or did it just happen once? If you can reproduce, can you do it with 'debug osd = 20' and 'debug ms = 1' and attach the OSD log file?

Thanks!

Filestore:

Was able to get a log file from one of the failing OSDs (FileStore seems to hit a slightly different error).

I have uploaded the file here due to its size: http://amerrick.co.uk/ceph/ceph-osd_split.4.log05

From what I can see, the restart happens around: 2017-07-19 21:45:04.301339

Bluestore: (these OSDs seem to crash a lot more)

Was able to get a log file from one of the failing OSDs (Bluestore).

I have uploaded the file here due to its size: http://amerrick.co.uk/ceph/ceph-osd_split.10.log06

From what I can see, the restart happens around: 03:00:02.719080 7fe376229700

The FileStore error is: FileStore.cc: 5385: FAILED assert(0 == "ERROR: source must exist")

Seems related to this code:

5375       // the source collection/object does not exist.  If we are replaying, we
5376       // should be safe, so just return 0 and move on.
5377       if (replaying) {
5378         dout(10) << __FUNC__ << ": " << c << "/" << o << " from "
5379                  << oldcid << "/" << oldoid << " (dne, continue replay) " << dendl;
5380       } else if (allow_enoent) {
5381         dout(10) << __FUNC__ << ": " << c << "/" << o << " from "
5382                  << oldcid << "/" << oldoid << " (dne, ignoring enoent)"
5383                  << dendl;
5384       } else {
5385         assert(0 == "ERROR: source must exist");

Actions #5

Updated by Sage Weil over 6 years ago

  • Subject changed from OSD: Multiple OSD crashing to osd: crashing on ec read partial failure
  • Status changed from Need More Info to 12

from the bluestore osd:

   -99> 2017-07-20 03:00:02.714534 7fe376229700  1 -- 172.16.3.3:6804/4498 --> 172.16.3.4:6810/27046 -- MOSDECSubOpReadReply(6.2a3s0 103015/103008 ECSubReadReply(tid=885708, attrs_read=0)) v2 -- 0x558085042dc0 con 0
   -98> 2017-07-20 03:00:02.714537 7fe376229700 20 -- 172.16.3.3:6804/4498 >> 172.16.3.4:6810/27046 conn(0x558070745800 :-1 s=STATE_OPEN pgs=1001 cs=1 l=0).prepare_send_message m MOSDECSubOpReadReply(6.2a3s0 103015/103008 ECSubReadReply(tid=885708, attrs_read=0)) v2
   -97> 2017-07-20 03:00:02.714540 7fe376229700 20 -- 172.16.3.3:6804/4498 >> 172.16.3.4:6810/27046 conn(0x558070745800 :-1 s=STATE_OPEN pgs=1001 cs=1 l=0).prepare_send_message encoding features 1152323339925389307 0x558085042dc0 MOSDECSubOpReadReply(6.2a3s0 103015/103008 ECSubReadReply(tid=885708, attrs_read=0)) v2
   -96> 2017-07-20 03:00:02.714551 7fe376229700 15 -- 172.16.3.3:6804/4498 >> 172.16.3.4:6810/27046 conn(0x558070745800 :-1 s=STATE_OPEN pgs=1001 cs=1 l=0).send_message inline write is denied, reschedule m=0x558085042dc0
   -95> 2017-07-20 03:00:02.714555 7fe376229700 20 Event(0x5580015a7e80 nevent=5000 time_id=301).wakeup
   -94> 2017-07-20 03:00:02.714561 7fe376229700 10 osd.10 103015 dequeue_op 0x558086315fa0 finish
   -93> 2017-07-20 03:00:02.714582 7fe376229700 20 osd.10 op_wq(0) _process 6.14s1 item PGQueueable(0x558087528aa0 prio 127 cost 0 e103015) queued
   -92> 2017-07-20 03:00:02.714587 7fe376229700 20 osd.10 op_wq(0) _process 6.14s1 item PGQueueable(0x558087528aa0 prio 127 cost 0 e103015) pg 0x55806c549000
   -91> 2017-07-20 03:00:02.714589 7fe376229700 10 osd.10 103015 dequeue_op 0x558087528aa0 prio 127 cost 0 latency 0.315598 MOSDECSubOpReadReply(6.14s1 103015/103014 ECSubReadReply(tid=23540, attrs_read=0)) v2 pg pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degrade
   -89> 2017-07-20 03:00:02.714602 7fe376229700 20 osd.10 103015 share_map osd.59 172.16.3.7:6818/4465 103015
   -88> 2017-07-20 03:00:02.714609 7fe376229700 20 osd.10 103015 should_share_map osd.59 172.16.3.7:6818/4465 103015
   -87> 2017-07-20 03:00:02.714613 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] handle_message: MOSDECSubOpReadReply(6.14s1 103015/103014 ECSubReadReply(tid=23540, attrs_read=0)) v2
   -85> 2017-07-20 03:00:02.714622 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply: reply ECSubReadReply(tid=23540, attrs_read=0)
   -82> 2017-07-20 03:00:02.714632 7fe376229700 20 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply shard=59(11) error=-2
   -79> 2017-07-20 03:00:02.714642 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply readop not complete: ReadOp(tid=23540, to_read={6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head=read_request_t(to_read=[0,4197440,40], need=59(11),92(12), want_attrs=0)}, complete={6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head=read_result_t(r=0, errors={4(9)=-2,8(8)=-2,42(5)=-2,46(4)=-2,53(10)=-2,59(11)=-2,69(6)=-2,72(7)=-2,83(3)=-2,84(2)=-2}, noattrs, returned=(0, 4197440, [10(1),419744]))}, priority=127, obj_to_source={6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head=4(9),8(8),10(1),42(5),46(4),53(10),59(11),69(6),72(7),83(3),84(2),92(12)}, source_to_obj={4(9)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,8(8)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,10(1)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,42(5)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,46(4)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,53(10)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,59(11)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,69(6)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,72(7)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,83(3)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,84(2)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,92(12)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head}, in_progress=92(12))
   -78> 2017-07-20 03:00:02.714689 7fe376229700 10 osd.10 103015 dequeue_op 0x558087528aa0 finish
   -77> 2017-07-20 03:00:02.714696 7fe376229700 20 osd.10 op_wq(0) _process 6.14s1 item PGQueueable(0x558083c191e0 prio 127 cost 0 e103015) queued
   -76> 2017-07-20 03:00:02.714700 7fe376229700 20 osd.10 op_wq(0) _process 6.14s1 item PGQueueable(0x558083c191e0 prio 127 cost 0 e103015) pg 0x55806c549000
   -75> 2017-07-20 03:00:02.714702 7fe376229700 10 osd.10 103015 dequeue_op 0x558083c191e0 prio 127 cost 0 latency 0.302750 MOSDECSubOpReadReply(6.14s1 103015/103014 ECSubReadReply(tid=23540, attrs_read=0)) v2 pg pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded]
   -74> 2017-07-20 03:00:02.714710 7fe376229700 20 osd.10 103015 share_map osd.92 172.16.3.15:6800/2726 103015
   -73> 2017-07-20 03:00:02.714712 7fe376229700 20 osd.10 103015 should_share_map osd.92 172.16.3.15:6800/2726 103015
   -72> 2017-07-20 03:00:02.714715 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] handle_message: MOSDECSubOpReadReply(6.14s1 103015/103014 ECSubReadReply(tid=23540, attrs_read=0)) v2
   -71> 2017-07-20 03:00:02.714722 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply: reply ECSubReadReply(tid=23540, attrs_read=0)
   -70> 2017-07-20 03:00:02.714729 7fe376229700 20 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply shard=92(12) error=-2
   -69> 2017-07-20 03:00:02.714736 7fe376229700 20 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply have shard=1
   -68> 2017-07-20 03:00:02.714746 7fe376229700 20 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply minimum_to_decode failed
   -67> 2017-07-20 03:00:02.714756 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] send_all_remaining_reads have/error shards=1,2,3,4,5,6,7,8,9,10,11,12
   -66> 2017-07-20 03:00:02.714764 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] get_remaining_shards: checking acting 4(9)
   -65> 2017-07-20 03:00:02.714772 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] get_remaining_shards: checking acting 8(8)
   -64> 2017-07-20 03:00:02.714780 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] get_remaining_shards: checking acting 10(1)
   -63> 2017-07-20 03:00:02.714787 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] get_remaining_shards: checking acting 42(5)
   -62> 2017-07-20 03:00:02.714794 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] get_remaining_shards: checking acting 46(4)
   -61> 2017-07-20 03:00:02.714801 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] get_remaining_shards: checking acting 53(10)
   -60> 2017-07-20 03:00:02.714809 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] get_remaining_shards: checking acting 59(11)
   -59> 2017-07-20 03:00:02.714816 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] get_remaining_shards: checking acting 69(6)
   -58> 2017-07-20 03:00:02.714823 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] get_remaining_shards: checking acting 72(7)
   -57> 2017-07-20 03:00:02.714830 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] get_remaining_shards: checking acting 83(3)
   -56> 2017-07-20 03:00:02.714837 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] get_remaining_shards: checking acting 84(2)
   -55> 2017-07-20 03:00:02.714844 7fe376229700 10 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] get_remaining_shards: checking acting 92(12)
   -50> 2017-07-20 03:00:02.714857 7fe376229700 20 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply: Use one of the shard errors err=-2
   -48> 2017-07-20 03:00:02.714865 7fe376229700 20 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply Complete: ReadOp(tid=23540, to_read={6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head=read_request_t(to_read=[0,4197440,40], need=59(11),92(12), want_attrs=0)}, complete={6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head=read_result_t(r=-2, errors={4(9)=-2,8(8)=-2,42(5)=-2,46(4)=-2,53(10)=-2,59(11)=-2,69(6)=-2,72(7)=-2,83(3)=-2,84(2)=-2,92(12)=-2}, noattrs, returned=(0, 4197440, [10(1),419744]))}, priority=127, obj_to_source={6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head=4(9),8(8),10(1),42(5),46(4),53(10),59(11),69(6),72(7),83(3),84(2),92(12)}, source_to_obj={4(9)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,8(8)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,10(1)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,42(5)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,46(4)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,53(10)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,59(11)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,69(6)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,72(7)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,83(3)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,84(2)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head,92(12)=6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head}, in_progress=)
   -47> 2017-07-20 03:00:02.714904 7fe376229700 20 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] objects_read_async_cb: got: {6:280cff5a:::rbd_data.15d0df2ae8944a.000000000000108c:head=-2,{}}
   -46> 2017-07-20 03:00:02.714913 7fe376229700 20 osd.10 pg_epoch: 103015 pg[6.14s1( v 103013'63454 (97669'60382,103013'63454] local-lis/les=103014/103015 n=25044 ec=31534/31534 lis/c 103014/103008 les/c/f 103015/103013/67943 103014/103014/103014) [2147483647,10,84,83,46,42,69,72,8,4,53,59,92] r=1 lpr=103014 pi=[103008,103014)/1 crt=103013'63454 mlcod 0'0 active+undersized+degraded] objects_read_async_cb: cache: ExtentCache(
     0> 2017-07-20 03:00:02.719080 7fe376229700 -1 /build/ceph-12.1.1/src/osd/PrimaryLogPG.cc: In function 'virtual void C_CopyFrom_AsyncReadCb::finish(int)' thread 7fe376229700 time 2017-07-20 03:00:02.714921
/build/ceph-12.1.1/src/osd/PrimaryLogPG.cc: 7585: FAILED assert(len <= reply_obj.data.length())

 ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x557ff666fe32]
 2: (C_CopyFrom_AsyncReadCb::finish(int)+0x131) [0x557ff629c9e1]
 3: (Context::complete(int)+0x9) [0x557ff611b8b9]
 4: (()+0x79bc70) [0x557ff63bfc70]
 5: (ECBackend::kick_reads()+0x48) [0x557ff63cf908]
 6: (CallClientContexts::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x562) [0x557ff63de162]
 7: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x7f) [0x557ff63b495f]
 8: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*, ZTracer::Trace const&)+0x1077) [0x557ff63c9da7]
 9: (ECBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x2a6) [0x557ff63ca946]
 10: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5e7) [0x557ff623f667]
 11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0x557ff60dfb07]
 12: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x557ff633a0a7]
 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x557ff610b34c]
 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x93d) [0x557ff6675add]
 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x557ff6677d00]
 16: (()+0x8064) [0x7fe38e8c1064]
 17: (clone()+0x6d) [0x7fe38d9b562d]

Actions #6

Updated by David Zafman over 6 years ago

The logging shows that this object got ENOENT from every shard. Error handling is needed in the CEPH_OSD_OP_COPY_GET code path. The function C_CopyFrom_AsyncReadCb::finish() needs to check r (the error value), and we probably need an added field in object_copy_data_t so that fill_in_copy_get() can return an error.
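
A minimal, hypothetical sketch of the kind of check being suggested. The types and names below (ReadResult, CopyReply, finish_copy_read) are invented for illustration and are not the actual Ceph structures or the real fix; the point is simply to check the result code before trusting the buffer length, and to carry the error back to the copy-get caller:

    // Hypothetical sketch: propagate a negative read result instead of asserting.
    #include <cassert>
    #include <cstddef>
    #include <string>

    struct ReadResult {
      int r = 0;           // 0 on success, negative errno (e.g. -ENOENT) on failure
      std::string data;    // bytes actually read
    };

    struct CopyReply {
      int result = 0;      // stand-in for the extra error field suggested above
      std::string data;
    };

    // Analogue of C_CopyFrom_AsyncReadCb::finish(): bail out on error instead of
    // asserting len <= data.length() when the read failed on every shard.
    int finish_copy_read(const ReadResult& res, std::size_t len, CopyReply* out) {
      if (res.r < 0) {
        out->result = res.r;   // let the fill_in_copy_get() path return the error
        return res.r;
      }
      assert(len <= res.data.length());  // only meaningful on a successful read
      out->data = res.data.substr(0, len);
      out->result = 0;
      return 0;
    }

    int main() {
      ReadResult failed;
      failed.r = -2;           // ENOENT, as every shard returned in the log above
      CopyReply reply;
      // The error is propagated instead of tripping the length assert.
      return finish_copy_read(failed, 4096, &reply) < 0 ? 0 : 1;
    }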

Actions #7

Updated by David Zafman over 6 years ago

There isn't anything beyond the crash for FileStore.cc: 5385: FAILED assert(0 == "ERROR: source must exist").

It would be helpful if it could be reproduced with debug_osd=20 and debug_filestore=20.

Actions #8

Updated by David Zafman over 6 years ago

  • Assignee set to David Zafman
Actions #9

Updated by David Zafman over 6 years ago

One interesting thing to note is that both crashes involve missing things: collections (directories) or missing objects (files). It is also possible that an ignored failure to perform a collection_move_rename() could result in ENOENT for the object files.
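
To make the hypothesis concrete, a toy, self-contained sketch (plain C++, not Ceph code): if the result of a collection_move_rename()-style step is checked, a missing source surfaces as ENOENT at the point of failure rather than as an "ERROR: source must exist" assert much later. The paths and function below are invented for illustration:

    #include <cerrno>
    #include <cstdio>

    // Stand-in for a collection_move_rename()-style step (illustration only).
    static int move_rename(const char* from, const char* to) {
      return std::rename(from, to) == 0 ? 0 : -errno;  // e.g. -ENOENT if 'from' is missing
    }

    int main() {
      int r = move_rename("/tmp/does-not-exist", "/tmp/new-name");
      if (r == -ENOENT) {
        // Catching the error here is the alternative to the later assert firing
        // when the destination object is expected to exist but never arrived.
        std::fprintf(stderr, "move_rename failed: source missing (%d)\n", r);
        return 1;
      }
      return 0;
    }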

Actions #10

Updated by Ashley Merrick over 6 years ago

David Zafman wrote:

One interesting thing to note is that both crashes involve missing things: collections (directories) or missing objects (files). It is also possible that an ignored failure to perform a collection_move_rename() could result in ENOENT for the object files.

I have captured a crash on Filestore with additional logging; the crash happened right at the end of the log file.

http://amerrick.co.uk/ceph/ceph_split.log10

Actions #11

Updated by Ashley Merrick over 6 years ago

David Zafman wrote:

One interesting thing to note is that both crashes involve missing things: collections (directories) or missing objects (files). It is also possible that an ignored failure to perform a collection_move_rename() could result in ENOENT for the object files.


   -28> 2017-07-21 03:51:29.156616 7f2a492fc700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 104599'63628, trimmed: , clear_divergent_priors: 0
   -27> 2017-07-21 03:51:29.156645 7f2a492fc700  5 filestore(/var/lib/ceph/osd/ceph-8) queue_transactions(2137): existing 0x55dcec846000 osr(6.14s8 0x55dcea218750)
   -26> 2017-07-21 03:51:29.156766 7f2a492fc700  5 filestore(/var/lib/ceph/osd/ceph-8) queue_transactions(2191): (writeahead) 675064119 [Transaction(0x55dd32c1aa00),Transaction(0x55dd32c1ab78)]
   -25> 2017-07-21 03:51:29.156778 7f2a492fc700 10 osd.8 104599 dequeue_op 0x55dd3148fb80 finish
   -24> 2017-07-21 03:51:29.156782 7f2a492fc700 20 osd.8 op_wq(0) _process empty q, waiting
   -23> 2017-07-21 03:51:29.156991 7f2a59b10700  5 filestore(/var/lib/ceph/osd/ceph-8) _journaled_ahead(2285): 0x55dd36e86b40 seq 675064119 osr(6.14s8 0x55dcea218750) [Transaction(0x55dd32c1aa00),Transaction(0x55dd32c1ab78)]
   -22> 2017-07-21 03:51:29.156999 7f2a59b10700  5 filestore(/var/lib/ceph/osd/ceph-8) queue_op(2014): 0x55dd36e86b40 seq 675064119 osr(6.14s8 0x55dcea218750) 421670 bytes   (queue has 1 ops and 421670 bytes)
   -21> 2017-07-21 03:51:29.157004 7f2a59b10700 10 filestore(/var/lib/ceph/osd/ceph-8)  queueing ondisk 0x55dd3196ae00
   -20> 2017-07-21 03:51:29.157009 7f2a58b0e700  5 filestore(/var/lib/ceph/osd/ceph-8) _do_op(2057): 0x55dd36e86b40 seq 675064119 osr(6.14s8 0x55dcea218750)/0x55dcea218750 start
   -19> 2017-07-21 03:51:29.157015 7f2a58b0e700 10 filestore(/var/lib/ceph/osd/ceph-8) _do_transaction(2598): on 0x55dd32c1aa00
   -18> 2017-07-21 03:51:29.157016 7f2a5830d700 20 osd.8 104599 share_map_peer 0x55dd335bc000 already has epoch 104599
   -17> 2017-07-21 03:51:29.157019 7f2a5830d700  1 -- 172.16.3.1:6818/18216 --> 172.16.3.3:6802/18365 -- MOSDECSubOpWriteReply(6.14s0 104599/104598 ECSubWriteReply(tid=1828, last_complete=104599'63628, committed=1, applied=0)) v2 -- 0x55dd31f12d80 con 0
   -16> 2017-07-21 03:51:29.157019 7f2a58b0e700 15 filestore(/var/lib/ceph/osd/ceph-8) _collection_move_rename(5347): 6.14s8_head/8#6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head#f88c from 6.14s8_head/8#6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head#
   -15> 2017-07-21 03:51:29.157064 7f2a58b0e700 10 filestore(/var/lib/ceph/osd/ceph-8) error opening file /var/lib/ceph/osd/ceph-8/current/6.14s8_head/DIR_4/DIR_1/DIR_8/rbd\udata.552ec5238e1f29.000000000008b013__head_8F596814__6_ffffffffffffffff_8 with flags=2: (2) No such file or directory
   -14> 2017-07-21 03:51:29.157225 7f2a6f0fd700  5 -- 172.16.3.1:6818/18216 >> 172.16.3.4:6812/26486 conn(0x55dd31d58800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2103 cs=1 l=0). rx osd.21 seq 2281 0x55dd35ad1c80 MOSDECSubOpReadReply(6.5as0 104599/104545 ECSubReadReply(tid=41523, attrs_read=0)) v2
   -13> 2017-07-21 03:51:29.157236 7f2a6f0fd700  1 -- 172.16.3.1:6818/18216 <== osd.21 172.16.3.4:6812/26486 2281 ==== MOSDECSubOpReadReply(6.5as0 104599/104545 ECSubReadReply(tid=41523, attrs_read=0)) v2 ==== 1020+0+0 (1187173091 0 0) 0x55dd35ad1c80 con 0x55dd31d58800
   -12> 2017-07-21 03:51:29.157243 7f2a6f0fd700 15 osd.8 104599 enqueue_op 0x55dd3516c9a0 prio 127 cost 0 latency 0.000038 epoch 104599 MOSDECSubOpReadReply(6.5as0 104599/104545 ECSubReadReply(tid=41523, attrs_read=0)) v2
   -11> 2017-07-21 03:51:29.157247 7f2a6f0fd700 20 osd.8 op_wq(0) _enqueue 6.5as0 PGQueueable(0x55dd3516c9a0 prio 127 cost 0 e104599)
   -10> 2017-07-21 03:51:29.157261 7f2a492fc700 20 osd.8 op_wq(0) _process 6.5as0 item PGQueueable(0x55dd3516c9a0 prio 127 cost 0 e104599) queued
    -9> 2017-07-21 03:51:29.157269 7f2a492fc700 20 osd.8 op_wq(0) _process 6.5as0 item PGQueueable(0x55dd3516c9a0 prio 127 cost 0 e104599) pg 0x55dd055d4000
    -8> 2017-07-21 03:51:29.157273 7f2a492fc700 10 osd.8 104599 dequeue_op 0x55dd3516c9a0 prio 127 cost 0 latency 0.000067 MOSDECSubOpReadReply(6.5as0 104599/104545 ECSubReadReply(tid=41523, attrs_read=0)) v2 pg pg[6.5as0( v 104465'55641 (92880'51903,104465'55641] local-lis/les=104545/104546 n=24931 ec=31534/31534 lis/c 104545/101602 les/c/f 104546/101605/67943 104543/104545/104545) [8,7,70,68,80,87,21,22,46,42,2147483647,2147483647,99] r=0 lpr=104545 pi=[101602,104545)/1 crt=104465'55641 lcod 0'0 mlcod 0'0 active+undersized+degraded]
    -7> 2017-07-21 03:51:29.157289 7f2a492fc700 20 osd.8 104599 share_map osd.21 172.16.3.4:6812/26486 104599
    -6> 2017-07-21 03:51:29.157293 7f2a492fc700 20 osd.8 104599 should_share_map osd.21 172.16.3.4:6812/26486 104599
    -5> 2017-07-21 03:51:29.157299 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.5as0( v 104465'55641 (92880'51903,104465'55641] local-lis/les=104545/104546 n=24931 ec=31534/31534 lis/c 104545/101602 les/c/f 104546/101605/67943 104543/104545/104545) [8,7,70,68,80,87,21,22,46,42,2147483647,2147483647,99] r=0 lpr=104545 pi=[101602,104545)/1 crt=104465'55641 lcod 0'0 mlcod 0'0 active+undersized+degraded] handle_message: MOSDECSubOpReadReply(6.5as0 104599/104545 ECSubReadReply(tid=41523, attrs_read=0)) v2
    -4> 2017-07-21 03:51:29.157311 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.5as0( v 104465'55641 (92880'51903,104465'55641] local-lis/les=104545/104546 n=24931 ec=31534/31534 lis/c 104545/101602 les/c/f 104546/101605/67943 104543/104545/104545) [8,7,70,68,80,87,21,22,46,42,2147483647,2147483647,99] r=0 lpr=104545 pi=[101602,104545)/1 crt=104465'55641 lcod 0'0 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply: reply ECSubReadReply(tid=41523, attrs_read=0)
    -3> 2017-07-21 03:51:29.157325 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.5as0( v 104465'55641 (92880'51903,104465'55641] local-lis/les=104545/104546 n=24931 ec=31534/31534 lis/c 104545/101602 les/c/f 104546/101605/67943 104543/104545/104545) [8,7,70,68,80,87,21,22,46,42,2147483647,2147483647,99] r=0 lpr=104545 pi=[101602,104545)/1 crt=104465'55641 lcod 0'0 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply readop not complete: ReadOp(tid=41523, to_read={6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head=read_request_t(to_read=[1048320,8320,0], need=7(1),8(0),21(6),22(7),42(9),46(8),68(3),70(2),80(4),87(5), want_attrs=0)}, complete={6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head=read_result_t(r=0, errors={}, noattrs, returned=(1048320, 8320, [8(0),832, 21(6),832]))}, priority=127, obj_to_source={6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head=7(1),8(0),21(6),22(7),42(9),46(8),68(3),70(2),80(4),87(5)}, source_to_obj={7(1)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,8(0)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,21(6)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,22(7)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,42(9)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,46(8)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,68(3)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,70(2)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,80(4)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,87(5)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head}, in_progress=7(1),22(7),42(9),46(8),68(3),70(2),80(4),87(5))
    -2> 2017-07-21 03:51:29.157376 7f2a492fc700 10 osd.8 104599 dequeue_op 0x55dd3516c9a0 finish
    -1> 2017-07-21 03:51:29.157386 7f2a492fc700 20 osd.8 op_wq(0) _process empty q, waiting
     0> 2017-07-21 03:51:29.159508 7f2a58b0e700 -1 /build/ceph-12.1.1/src/os/filestore/FileStore.cc: In function 'int FileStore::_collection_move_rename(const coll_t&, const ghobject_t&, coll_t, const ghobject_t&, const SequencerPosition&, bool)' thread 7f2a58b0e700 time 2017-07-21 03:51:29.157073
/build/ceph-12.1.1/src/os/filestore/FileStore.cc: 5385: FAILED assert(0 == "ERROR: source must exist")
Actions #12

Updated by David Zafman over 6 years ago

Ashley Merrick wrote:

David Zafman wrote:

One interesting thing to note is that both crashes involve missing things: collections (directories) or missing objects (files). It is also possible that an ignored failure to perform a collection_move_rename() could result in ENOENT for the object files.

[...]

Please grep for "rbd_data.552ec5238e1f29.000000000008b013:head" and post 100 lines of log preceding the assert.

Actions #13

Updated by Ashley Merrick over 6 years ago

David Zafman wrote:

Ashley Merrick wrote:

David Zafman wrote:

One interesting thing to note is that both crashes involve missing things: collections (directories) or missing objects (files). It is also possible that an ignored failure to perform a collection_move_rename() could result in ENOENT for the object files.

[...]

Please grep for "rbd_data.552ec5238e1f29.000000000008b013:head" and post 100 lines of log preceding the assert.

Hopefully this is what you were after.

 cat ceph_split.log10 | grep rbd_data.552ec5238e1f29.000000000008b013:head
2017-07-21 03:51:29.156572 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.14s8( v 104599'63627 (97669'60582,104599'63627] local-lis/les=104598/104599 n=25124 ec=31534/31534 lis/c 104598/104598 les/c/f 104599/104599/67943 104596/104598/104598) [11,10,84,83,46,42,69,72,8,4,53,59,92] r=8 lpr=104598 luod=0'0 crt=104599'63627 active] append_log log((97669'60582,104599'63627], crt=104599'63627) [104599'63628 (68070'24554) modify   6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head by osd.79.104298:167244 2017-07-20 23:51:22.589778 0]
2017-07-21 03:51:29.156596 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.14s8( v 104599'63628 (97669'60582,104599'63628] local-lis/les=104598/104599 n=25124 ec=31534/31534 lis/c 104598/104598 les/c/f 104599/104599/67943 104596/104598/104598) [11,10,84,83,46,42,69,72,8,4,53,59,92] r=8 lpr=104598 luod=0'0 lua=104599'63627 crt=104599'63627 lcod 104599'63627 active] add_log_entry 104599'63628 (68070'24554) modify   6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head by osd.79.104298:167244 2017-07-20 23:51:22.589778 0
2017-07-21 03:51:29.157019 7f2a58b0e700 15 filestore(/var/lib/ceph/osd/ceph-8) _collection_move_rename(5347): 6.14s8_head/8#6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head#f88c from 6.14s8_head/8#6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head#
   -30> 2017-07-21 03:51:29.156572 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.14s8( v 104599'63627 (97669'60582,104599'63627] local-lis/les=104598/104599 n=25124 ec=31534/31534 lis/c 104598/104598 les/c/f 104599/104599/67943 104596/104598/104598) [11,10,84,83,46,42,69,72,8,4,53,59,92] r=8 lpr=104598 luod=0'0 crt=104599'63627 active] append_log log((97669'60582,104599'63627], crt=104599'63627) [104599'63628 (68070'24554) modify   6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head by osd.79.104298:167244 2017-07-20 23:51:22.589778 0]
   -29> 2017-07-21 03:51:29.156596 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.14s8( v 104599'63628 (97669'60582,104599'63628] local-lis/les=104598/104599 n=25124 ec=31534/31534 lis/c 104598/104598 les/c/f 104599/104599/67943 104596/104598/104598) [11,10,84,83,46,42,69,72,8,4,53,59,92] r=8 lpr=104598 luod=0'0 lua=104599'63627 crt=104599'63627 lcod 104599'63627 active] add_log_entry 104599'63628 (68070'24554) modify   6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head by osd.79.104298:167244 2017-07-20 23:51:22.589778 0
   -16> 2017-07-21 03:51:29.157019 7f2a58b0e700 15 filestore(/var/lib/ceph/osd/ceph-8) _collection_move_rename(5347): 6.14s8_head/8#6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head#f88c from 6.14s8_head/8#6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head#
cat ceph_split.log10 | grep -B 100  "2017-07-21 03:51:29.159508 7f2a58b0e700 -1 /build/ceph-12.1.1/src/os/filestore/FileStore.cc: In function 'int FileStore::_collection_move_rename(const coll_t&, " 
2017-07-21 03:51:29.134271 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) read(3264): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 128128~13312/13312
2017-07-21 03:51:29.134276 7f2a47af9700 20 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] handle_sub_read read request=13312 r=13312 len=13312
2017-07-21 03:51:29.134287 7f2a47af9700 20 osd.8 104599 share_map_peer 0x55dd31e42000 already has epoch 104599
2017-07-21 03:51:29.134304 7f2a47af9700 10 osd.8 104599 dequeue_op 0x55dd32b3ef20 finish
2017-07-21 03:51:29.134311 7f2a47af9700 20 osd.8 op_wq(3) _process empty q, waiting
2017-07-21 03:51:29.151105 7f2a6f0fd700 15 osd.8 104599 enqueue_op 0x55dd383403c0 prio 127 cost 0 latency 0.000052 epoch 104599 MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396335, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=141024,13728,0}, attrs_to_read=)) v3
2017-07-21 03:51:29.151112 7f2a6f0fd700 20 osd.8 op_wq(3) _enqueue 6.35fs9 PGQueueable(0x55dd383403c0 prio 127 cost 0 e104599)
2017-07-21 03:51:29.151125 7f2a47af9700 20 osd.8 op_wq(3) _process 6.35fs9 item PGQueueable(0x55dd383403c0 prio 127 cost 0 e104599) queued
2017-07-21 03:51:29.151134 7f2a47af9700 20 osd.8 op_wq(3) _process 6.35fs9 item PGQueueable(0x55dd383403c0 prio 127 cost 0 e104599) pg 0x55dd17b7e000
2017-07-21 03:51:29.151136 7f2a47af9700 10 osd.8 104599 dequeue_op 0x55dd383403c0 prio 127 cost 0 latency 0.000083 MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396335, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=141024,13728,0}, attrs_to_read=)) v3 pg pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active]
2017-07-21 03:51:29.151151 7f2a47af9700 20 osd.8 104599 share_map osd.98 172.16.3.15:6812/3711 104599
2017-07-21 03:51:29.151155 7f2a47af9700 20 osd.8 104599 should_share_map osd.98 172.16.3.15:6812/3711 104599
2017-07-21 03:51:29.151160 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] handle_message: MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396335, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=141024,13728,0}, attrs_to_read=)) v3
2017-07-21 03:51:29.151172 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: Getting attr on 6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head
2017-07-21 03:51:29.151181 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: not in cache 6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head
2017-07-21 03:51:29.151222 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) stat(3177): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# = 0 (size 419744)
2017-07-21 03:51:29.151228 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: found on disk, size 419744
2017-07-21 03:51:29.151236 7f2a47af9700 15 filestore(/var/lib/ceph/osd/ceph-8) getattr(4392): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 'hinfo_key'
2017-07-21 03:51:29.151247 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) getattr(4426): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 'hinfo_key' = 70
2017-07-21 03:51:29.151255 7f2a47af9700 15 filestore(/var/lib/ceph/osd/ceph-8) read(3209): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 141024~13728
2017-07-21 03:51:29.151265 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) read(3264): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 141024~13728/13728
2017-07-21 03:51:29.151270 7f2a47af9700 20 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] handle_sub_read read request=13728 r=13728 len=13728
2017-07-21 03:51:29.151281 7f2a47af9700 20 osd.8 104599 share_map_peer 0x55dd31e42000 already has epoch 104599
2017-07-21 03:51:29.151299 7f2a47af9700 10 osd.8 104599 dequeue_op 0x55dd383403c0 finish
2017-07-21 03:51:29.151308 7f2a47af9700 20 osd.8 op_wq(3) _process empty q, waiting
2017-07-21 03:51:29.151487 7f2a6f0fd700 15 osd.8 104599 enqueue_op 0x55dcdf8cc2c0 prio 127 cost 0 latency 0.000037 epoch 104599 MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396338, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=154336,13312,0}, attrs_to_read=)) v3
2017-07-21 03:51:29.151493 7f2a6f0fd700 20 osd.8 op_wq(3) _enqueue 6.35fs9 PGQueueable(0x55dcdf8cc2c0 prio 127 cost 0 e104599)
2017-07-21 03:51:29.151501 7f2a47af9700 20 osd.8 op_wq(3) _process 6.35fs9 item PGQueueable(0x55dcdf8cc2c0 prio 127 cost 0 e104599) queued
2017-07-21 03:51:29.151505 7f2a47af9700 20 osd.8 op_wq(3) _process 6.35fs9 item PGQueueable(0x55dcdf8cc2c0 prio 127 cost 0 e104599) pg 0x55dd17b7e000
2017-07-21 03:51:29.151507 7f2a47af9700 10 osd.8 104599 dequeue_op 0x55dcdf8cc2c0 prio 127 cost 0 latency 0.000058 MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396338, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=154336,13312,0}, attrs_to_read=)) v3 pg pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active]
2017-07-21 03:51:29.151518 7f2a47af9700 20 osd.8 104599 share_map osd.98 172.16.3.15:6812/3711 104599
2017-07-21 03:51:29.151521 7f2a47af9700 20 osd.8 104599 should_share_map osd.98 172.16.3.15:6812/3711 104599
2017-07-21 03:51:29.151524 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] handle_message: MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396338, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=154336,13312,0}, attrs_to_read=)) v3
2017-07-21 03:51:29.151534 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: Getting attr on 6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head
2017-07-21 03:51:29.151542 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: not in cache 6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head
2017-07-21 03:51:29.151579 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) stat(3177): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# = 0 (size 419744)
2017-07-21 03:51:29.151584 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: found on disk, size 419744
2017-07-21 03:51:29.151592 7f2a47af9700 15 filestore(/var/lib/ceph/osd/ceph-8) getattr(4392): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 'hinfo_key'
2017-07-21 03:51:29.151599 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) getattr(4426): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 'hinfo_key' = 70
2017-07-21 03:51:29.151607 7f2a47af9700 15 filestore(/var/lib/ceph/osd/ceph-8) read(3209): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 154336~13312
2017-07-21 03:51:29.151617 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) read(3264): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 154336~13312/13312
2017-07-21 03:51:29.151621 7f2a47af9700 20 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] handle_sub_read read request=13312 r=13312 len=13312
2017-07-21 03:51:29.151633 7f2a47af9700 20 osd.8 104599 share_map_peer 0x55dd31e42000 already has epoch 104599
2017-07-21 03:51:29.151647 7f2a47af9700 10 osd.8 104599 dequeue_op 0x55dcdf8cc2c0 finish
2017-07-21 03:51:29.151654 7f2a47af9700 20 osd.8 op_wq(3) _process empty q, waiting
2017-07-21 03:51:29.152477 7f2a6f0fd700 15 osd.8 104599 enqueue_op 0x55dd375d66e0 prio 127 cost 0 latency 0.000044 epoch 104599 MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396339, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=167232,13728,0}, attrs_to_read=)) v3
2017-07-21 03:51:29.152483 7f2a6f0fd700 20 osd.8 op_wq(3) _enqueue 6.35fs9 PGQueueable(0x55dd375d66e0 prio 127 cost 0 e104599)
2017-07-21 03:51:29.152493 7f2a47af9700 20 osd.8 op_wq(3) _process 6.35fs9 item PGQueueable(0x55dd375d66e0 prio 127 cost 0 e104599) queued
2017-07-21 03:51:29.152500 7f2a47af9700 20 osd.8 op_wq(3) _process 6.35fs9 item PGQueueable(0x55dd375d66e0 prio 127 cost 0 e104599) pg 0x55dd17b7e000
2017-07-21 03:51:29.152502 7f2a47af9700 10 osd.8 104599 dequeue_op 0x55dd375d66e0 prio 127 cost 0 latency 0.000069 MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396339, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=167232,13728,0}, attrs_to_read=)) v3 pg pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active]
2017-07-21 03:51:29.152513 7f2a47af9700 20 osd.8 104599 share_map osd.98 172.16.3.15:6812/3711 104599
2017-07-21 03:51:29.152517 7f2a47af9700 20 osd.8 104599 should_share_map osd.98 172.16.3.15:6812/3711 104599
2017-07-21 03:51:29.152520 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] handle_message: MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396339, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=167232,13728,0}, attrs_to_read=)) v3
2017-07-21 03:51:29.152531 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: Getting attr on 6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head
2017-07-21 03:51:29.152539 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: not in cache 6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head
2017-07-21 03:51:29.152576 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) stat(3177): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# = 0 (size 419744)
2017-07-21 03:51:29.152582 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: found on disk, size 419744
2017-07-21 03:51:29.152589 7f2a47af9700 15 filestore(/var/lib/ceph/osd/ceph-8) getattr(4392): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 'hinfo_key'
2017-07-21 03:51:29.152598 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) getattr(4426): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 'hinfo_key' = 70
2017-07-21 03:51:29.152605 7f2a47af9700 15 filestore(/var/lib/ceph/osd/ceph-8) read(3209): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 167232~13728
2017-07-21 03:51:29.152616 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) read(3264): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 167232~13728/13728
2017-07-21 03:51:29.152621 7f2a47af9700 20 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] handle_sub_read read request=13728 r=13728 len=13728
2017-07-21 03:51:29.152633 7f2a47af9700 20 osd.8 104599 share_map_peer 0x55dd31e42000 already has epoch 104599
2017-07-21 03:51:29.152649 7f2a47af9700 10 osd.8 104599 dequeue_op 0x55dd375d66e0 finish
2017-07-21 03:51:29.152657 7f2a47af9700 20 osd.8 op_wq(3) _process empty q, waiting
2017-07-21 03:51:29.156488 7f2a6e0fb700 15 osd.8 104599 enqueue_op 0x55dd3148fb80 prio 127 cost 0 latency 0.000704 epoch 104599 MOSDECSubOpWrite(6.14s8 104599/104598 ECSubWrite(tid=1828, reqid=osd.79.104298:167244, at_version=104599'63628, trim_to=0'0, roll_forward_to=104599'63627)) v2
2017-07-21 03:51:29.156494 7f2a6e0fb700 20 osd.8 op_wq(0) _enqueue 6.14s8 PGQueueable(0x55dd3148fb80 prio 127 cost 0 e104599)
2017-07-21 03:51:29.156511 7f2a492fc700 20 osd.8 op_wq(0) _process 6.14s8 item PGQueueable(0x55dd3148fb80 prio 127 cost 0 e104599) queued
2017-07-21 03:51:29.156520 7f2a492fc700 20 osd.8 op_wq(0) _process 6.14s8 item PGQueueable(0x55dd3148fb80 prio 127 cost 0 e104599) pg 0x55dcec296000
2017-07-21 03:51:29.156525 7f2a492fc700 10 osd.8 104599 dequeue_op 0x55dd3148fb80 prio 127 cost 0 latency 0.000741 MOSDECSubOpWrite(6.14s8 104599/104598 ECSubWrite(tid=1828, reqid=osd.79.104298:167244, at_version=104599'63628, trim_to=0'0, roll_forward_to=104599'63627)) v2 pg pg[6.14s8( v 104599'63627 (97669'60582,104599'63627] local-lis/les=104598/104599 n=25124 ec=31534/31534 lis/c 104598/104598 les/c/f 104599/104599/67943 104596/104598/104598) [11,10,84,83,46,42,69,72,8,4,53,59,92] r=8 lpr=104598 luod=0'0 crt=104599'63627 active]
2017-07-21 03:51:29.156542 7f2a492fc700 20 osd.8 104599 share_map osd.11 172.16.3.3:6802/18365 104599
2017-07-21 03:51:29.156547 7f2a492fc700 20 osd.8 104599 should_share_map osd.11 172.16.3.3:6802/18365 104599
2017-07-21 03:51:29.156554 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.14s8( v 104599'63627 (97669'60582,104599'63627] local-lis/les=104598/104599 n=25124 ec=31534/31534 lis/c 104598/104598 les/c/f 104599/104599/67943 104596/104598/104598) [11,10,84,83,46,42,69,72,8,4,53,59,92] r=8 lpr=104598 luod=0'0 crt=104599'63627 active] handle_message: MOSDECSubOpWrite(6.14s8 104599/104598 ECSubWrite(tid=1828, reqid=osd.79.104298:167244, at_version=104599'63628, trim_to=0'0, roll_forward_to=104599'63627)) v2
2017-07-21 03:51:29.156572 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.14s8( v 104599'63627 (97669'60582,104599'63627] local-lis/les=104598/104599 n=25124 ec=31534/31534 lis/c 104598/104598 les/c/f 104599/104599/67943 104596/104598/104598) [11,10,84,83,46,42,69,72,8,4,53,59,92] r=8 lpr=104598 luod=0'0 crt=104599'63627 active] append_log log((97669'60582,104599'63627], crt=104599'63627) [104599'63628 (68070'24554) modify   6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head by osd.79.104298:167244 2017-07-20 23:51:22.589778 0]
2017-07-21 03:51:29.156596 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.14s8( v 104599'63628 (97669'60582,104599'63628] local-lis/les=104598/104599 n=25124 ec=31534/31534 lis/c 104598/104598 les/c/f 104599/104599/67943 104596/104598/104598) [11,10,84,83,46,42,69,72,8,4,53,59,92] r=8 lpr=104598 luod=0'0 lua=104599'63627 crt=104599'63627 lcod 104599'63627 active] add_log_entry 104599'63628 (68070'24554) modify   6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head by osd.79.104298:167244 2017-07-20 23:51:22.589778 0
2017-07-21 03:51:29.156616 7f2a492fc700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 104599'63628, trimmed: , clear_divergent_priors: 0
2017-07-21 03:51:29.156645 7f2a492fc700  5 filestore(/var/lib/ceph/osd/ceph-8) queue_transactions(2137): existing 0x55dcec846000 osr(6.14s8 0x55dcea218750)
2017-07-21 03:51:29.156766 7f2a492fc700  5 filestore(/var/lib/ceph/osd/ceph-8) queue_transactions(2191): (writeahead) 675064119 [Transaction(0x55dd32c1aa00),Transaction(0x55dd32c1ab78)]
2017-07-21 03:51:29.156778 7f2a492fc700 10 osd.8 104599 dequeue_op 0x55dd3148fb80 finish
2017-07-21 03:51:29.156782 7f2a492fc700 20 osd.8 op_wq(0) _process empty q, waiting
2017-07-21 03:51:29.156991 7f2a59b10700  5 filestore(/var/lib/ceph/osd/ceph-8) _journaled_ahead(2285): 0x55dd36e86b40 seq 675064119 osr(6.14s8 0x55dcea218750) [Transaction(0x55dd32c1aa00),Transaction(0x55dd32c1ab78)]
2017-07-21 03:51:29.156999 7f2a59b10700  5 filestore(/var/lib/ceph/osd/ceph-8) queue_op(2014): 0x55dd36e86b40 seq 675064119 osr(6.14s8 0x55dcea218750) 421670 bytes   (queue has 1 ops and 421670 bytes)
2017-07-21 03:51:29.157004 7f2a59b10700 10 filestore(/var/lib/ceph/osd/ceph-8)  queueing ondisk 0x55dd3196ae00
2017-07-21 03:51:29.157009 7f2a58b0e700  5 filestore(/var/lib/ceph/osd/ceph-8) _do_op(2057): 0x55dd36e86b40 seq 675064119 osr(6.14s8 0x55dcea218750)/0x55dcea218750 start
2017-07-21 03:51:29.157015 7f2a58b0e700 10 filestore(/var/lib/ceph/osd/ceph-8) _do_transaction(2598): on 0x55dd32c1aa00
2017-07-21 03:51:29.157016 7f2a5830d700 20 osd.8 104599 share_map_peer 0x55dd335bc000 already has epoch 104599
2017-07-21 03:51:29.157019 7f2a58b0e700 15 filestore(/var/lib/ceph/osd/ceph-8) _collection_move_rename(5347): 6.14s8_head/8#6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head#f88c from 6.14s8_head/8#6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head#
2017-07-21 03:51:29.157064 7f2a58b0e700 10 filestore(/var/lib/ceph/osd/ceph-8) error opening file /var/lib/ceph/osd/ceph-8/current/6.14s8_head/DIR_4/DIR_1/DIR_8/rbd\udata.552ec5238e1f29.000000000008b013__head_8F596814__6_ffffffffffffffff_8 with flags=2: (2) No such file or directory
2017-07-21 03:51:29.157243 7f2a6f0fd700 15 osd.8 104599 enqueue_op 0x55dd3516c9a0 prio 127 cost 0 latency 0.000038 epoch 104599 MOSDECSubOpReadReply(6.5as0 104599/104545 ECSubReadReply(tid=41523, attrs_read=0)) v2
2017-07-21 03:51:29.157247 7f2a6f0fd700 20 osd.8 op_wq(0) _enqueue 6.5as0 PGQueueable(0x55dd3516c9a0 prio 127 cost 0 e104599)
2017-07-21 03:51:29.157261 7f2a492fc700 20 osd.8 op_wq(0) _process 6.5as0 item PGQueueable(0x55dd3516c9a0 prio 127 cost 0 e104599) queued
2017-07-21 03:51:29.157269 7f2a492fc700 20 osd.8 op_wq(0) _process 6.5as0 item PGQueueable(0x55dd3516c9a0 prio 127 cost 0 e104599) pg 0x55dd055d4000
2017-07-21 03:51:29.157273 7f2a492fc700 10 osd.8 104599 dequeue_op 0x55dd3516c9a0 prio 127 cost 0 latency 0.000067 MOSDECSubOpReadReply(6.5as0 104599/104545 ECSubReadReply(tid=41523, attrs_read=0)) v2 pg pg[6.5as0( v 104465'55641 (92880'51903,104465'55641] local-lis/les=104545/104546 n=24931 ec=31534/31534 lis/c 104545/101602 les/c/f 104546/101605/67943 104543/104545/104545) [8,7,70,68,80,87,21,22,46,42,2147483647,2147483647,99] r=0 lpr=104545 pi=[101602,104545)/1 crt=104465'55641 lcod 0'0 mlcod 0'0 active+undersized+degraded]
2017-07-21 03:51:29.157289 7f2a492fc700 20 osd.8 104599 share_map osd.21 172.16.3.4:6812/26486 104599
2017-07-21 03:51:29.157293 7f2a492fc700 20 osd.8 104599 should_share_map osd.21 172.16.3.4:6812/26486 104599
2017-07-21 03:51:29.157299 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.5as0( v 104465'55641 (92880'51903,104465'55641] local-lis/les=104545/104546 n=24931 ec=31534/31534 lis/c 104545/101602 les/c/f 104546/101605/67943 104543/104545/104545) [8,7,70,68,80,87,21,22,46,42,2147483647,2147483647,99] r=0 lpr=104545 pi=[101602,104545)/1 crt=104465'55641 lcod 0'0 mlcod 0'0 active+undersized+degraded] handle_message: MOSDECSubOpReadReply(6.5as0 104599/104545 ECSubReadReply(tid=41523, attrs_read=0)) v2
2017-07-21 03:51:29.157311 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.5as0( v 104465'55641 (92880'51903,104465'55641] local-lis/les=104545/104546 n=24931 ec=31534/31534 lis/c 104545/101602 les/c/f 104546/101605/67943 104543/104545/104545) [8,7,70,68,80,87,21,22,46,42,2147483647,2147483647,99] r=0 lpr=104545 pi=[101602,104545)/1 crt=104465'55641 lcod 0'0 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply: reply ECSubReadReply(tid=41523, attrs_read=0)
2017-07-21 03:51:29.157325 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.5as0( v 104465'55641 (92880'51903,104465'55641] local-lis/les=104545/104546 n=24931 ec=31534/31534 lis/c 104545/101602 les/c/f 104546/101605/67943 104543/104545/104545) [8,7,70,68,80,87,21,22,46,42,2147483647,2147483647,99] r=0 lpr=104545 pi=[101602,104545)/1 crt=104465'55641 lcod 0'0 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply readop not complete: ReadOp(tid=41523, to_read={6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head=read_request_t(to_read=[1048320,8320,0], need=7(1),8(0),21(6),22(7),42(9),46(8),68(3),70(2),80(4),87(5), want_attrs=0)}, complete={6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head=read_result_t(r=0, errors={}, noattrs, returned=(1048320, 8320, [8(0),832, 21(6),832]))}, priority=127, obj_to_source={6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head=7(1),8(0),21(6),22(7),42(9),46(8),68(3),70(2),80(4),87(5)}, source_to_obj={7(1)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,8(0)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,21(6)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,22(7)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,42(9)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,46(8)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,68(3)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,70(2)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,80(4)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,87(5)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head}, in_progress=7(1),22(7),42(9),46(8),68(3),70(2),80(4),87(5))
2017-07-21 03:51:29.157376 7f2a492fc700 10 osd.8 104599 dequeue_op 0x55dd3516c9a0 finish
2017-07-21 03:51:29.157386 7f2a492fc700 20 osd.8 op_wq(0) _process empty q, waiting
2017-07-21 03:51:29.159508 7f2a58b0e700 -1 /build/ceph-12.1.1/src/os/filestore/FileStore.cc: In function 'int FileStore::_collection_move_rename(const coll_t&, const ghobject_t&, coll_t, const ghobject_t&, const SequencerPosition&, bool)' thread 7f2a58b0e700 time 2017-07-21 03:51:29.157073
--
  -100> 2017-07-21 03:51:29.151160 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] handle_message: MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396335, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=141024,13728,0}, attrs_to_read=)) v3
   -99> 2017-07-21 03:51:29.151172 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: Getting attr on 6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head
   -98> 2017-07-21 03:51:29.151181 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: not in cache 6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head
   -97> 2017-07-21 03:51:29.151222 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) stat(3177): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# = 0 (size 419744)
   -96> 2017-07-21 03:51:29.151228 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: found on disk, size 419744
   -95> 2017-07-21 03:51:29.151236 7f2a47af9700 15 filestore(/var/lib/ceph/osd/ceph-8) getattr(4392): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 'hinfo_key'
   -94> 2017-07-21 03:51:29.151247 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) getattr(4426): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 'hinfo_key' = 70
   -93> 2017-07-21 03:51:29.151255 7f2a47af9700 15 filestore(/var/lib/ceph/osd/ceph-8) read(3209): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 141024~13728
   -92> 2017-07-21 03:51:29.151265 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) read(3264): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 141024~13728/13728
   -91> 2017-07-21 03:51:29.151270 7f2a47af9700 20 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] handle_sub_read read request=13728 r=13728 len=13728
   -90> 2017-07-21 03:51:29.151281 7f2a47af9700 20 osd.8 104599 share_map_peer 0x55dd31e42000 already has epoch 104599
   -89> 2017-07-21 03:51:29.151283 7f2a47af9700  1 -- 172.16.3.1:6818/18216 --> 172.16.3.15:6812/3711 -- MOSDECSubOpReadReply(6.35fs0 104599/104544 ECSubReadReply(tid=1396335, attrs_read=0)) v2 -- 0x55dd35564680 con 0
   -88> 2017-07-21 03:51:29.151299 7f2a47af9700 10 osd.8 104599 dequeue_op 0x55dd383403c0 finish
   -87> 2017-07-21 03:51:29.151308 7f2a47af9700 20 osd.8 op_wq(3) _process empty q, waiting
   -86> 2017-07-21 03:51:29.151465 7f2a6f0fd700  5 -- 172.16.3.1:6818/18216 >> 172.16.3.15:6812/3711 conn(0x55dd31e42000 :6818 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1808 cs=1 l=0). rx osd.98 seq 11737 0x55dd35194000 MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396338, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=154336,13312,0}, attrs_to_read=)) v3
   -85> 2017-07-21 03:51:29.151478 7f2a6f0fd700  1 -- 172.16.3.1:6818/18216 <== osd.98 172.16.3.15:6812/3711 11737 ==== MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396338, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=154336,13312,0}, attrs_to_read=)) v3 ==== 192+0+0 (960044583 0 0) 0x55dd35194000 con 0x55dd31e42000
   -84> 2017-07-21 03:51:29.151487 7f2a6f0fd700 15 osd.8 104599 enqueue_op 0x55dcdf8cc2c0 prio 127 cost 0 latency 0.000037 epoch 104599 MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396338, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=154336,13312,0}, attrs_to_read=)) v3
   -83> 2017-07-21 03:51:29.151493 7f2a6f0fd700 20 osd.8 op_wq(3) _enqueue 6.35fs9 PGQueueable(0x55dcdf8cc2c0 prio 127 cost 0 e104599)
   -82> 2017-07-21 03:51:29.151501 7f2a47af9700 20 osd.8 op_wq(3) _process 6.35fs9 item PGQueueable(0x55dcdf8cc2c0 prio 127 cost 0 e104599) queued
   -81> 2017-07-21 03:51:29.151505 7f2a47af9700 20 osd.8 op_wq(3) _process 6.35fs9 item PGQueueable(0x55dcdf8cc2c0 prio 127 cost 0 e104599) pg 0x55dd17b7e000
   -80> 2017-07-21 03:51:29.151507 7f2a47af9700 10 osd.8 104599 dequeue_op 0x55dcdf8cc2c0 prio 127 cost 0 latency 0.000058 MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396338, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=154336,13312,0}, attrs_to_read=)) v3 pg pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active]
   -79> 2017-07-21 03:51:29.151518 7f2a47af9700 20 osd.8 104599 share_map osd.98 172.16.3.15:6812/3711 104599
   -78> 2017-07-21 03:51:29.151521 7f2a47af9700 20 osd.8 104599 should_share_map osd.98 172.16.3.15:6812/3711 104599
   -77> 2017-07-21 03:51:29.151524 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] handle_message: MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396338, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=154336,13312,0}, attrs_to_read=)) v3
   -76> 2017-07-21 03:51:29.151534 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: Getting attr on 6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head
   -75> 2017-07-21 03:51:29.151542 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: not in cache 6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head
   -74> 2017-07-21 03:51:29.151579 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) stat(3177): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# = 0 (size 419744)
   -73> 2017-07-21 03:51:29.151584 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: found on disk, size 419744
   -72> 2017-07-21 03:51:29.151592 7f2a47af9700 15 filestore(/var/lib/ceph/osd/ceph-8) getattr(4392): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 'hinfo_key'
   -71> 2017-07-21 03:51:29.151599 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) getattr(4426): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 'hinfo_key' = 70
   -70> 2017-07-21 03:51:29.151607 7f2a47af9700 15 filestore(/var/lib/ceph/osd/ceph-8) read(3209): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 154336~13312
   -69> 2017-07-21 03:51:29.151617 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) read(3264): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 154336~13312/13312
   -68> 2017-07-21 03:51:29.151621 7f2a47af9700 20 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] handle_sub_read read request=13312 r=13312 len=13312
   -67> 2017-07-21 03:51:29.151633 7f2a47af9700 20 osd.8 104599 share_map_peer 0x55dd31e42000 already has epoch 104599
   -66> 2017-07-21 03:51:29.151634 7f2a47af9700  1 -- 172.16.3.1:6818/18216 --> 172.16.3.15:6812/3711 -- MOSDECSubOpReadReply(6.35fs0 104599/104544 ECSubReadReply(tid=1396338, attrs_read=0)) v2 -- 0x55dd33f058c0 con 0
   -65> 2017-07-21 03:51:29.151647 7f2a47af9700 10 osd.8 104599 dequeue_op 0x55dcdf8cc2c0 finish
   -64> 2017-07-21 03:51:29.151654 7f2a47af9700 20 osd.8 op_wq(3) _process empty q, waiting
   -63> 2017-07-21 03:51:29.152448 7f2a6f0fd700  5 -- 172.16.3.1:6818/18216 >> 172.16.3.15:6812/3711 conn(0x55dd31e42000 :6818 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1808 cs=1 l=0). rx osd.98 seq 11738 0x55dd37d8af80 MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396339, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=167232,13728,0}, attrs_to_read=)) v3
   -62> 2017-07-21 03:51:29.152468 7f2a6f0fd700  1 -- 172.16.3.1:6818/18216 <== osd.98 172.16.3.15:6812/3711 11738 ==== MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396339, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=167232,13728,0}, attrs_to_read=)) v3 ==== 192+0+0 (2756641378 0 0) 0x55dd37d8af80 con 0x55dd31e42000
   -61> 2017-07-21 03:51:29.152477 7f2a6f0fd700 15 osd.8 104599 enqueue_op 0x55dd375d66e0 prio 127 cost 0 latency 0.000044 epoch 104599 MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396339, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=167232,13728,0}, attrs_to_read=)) v3
   -60> 2017-07-21 03:51:29.152483 7f2a6f0fd700 20 osd.8 op_wq(3) _enqueue 6.35fs9 PGQueueable(0x55dd375d66e0 prio 127 cost 0 e104599)
   -59> 2017-07-21 03:51:29.152493 7f2a47af9700 20 osd.8 op_wq(3) _process 6.35fs9 item PGQueueable(0x55dd375d66e0 prio 127 cost 0 e104599) queued
   -58> 2017-07-21 03:51:29.152500 7f2a47af9700 20 osd.8 op_wq(3) _process 6.35fs9 item PGQueueable(0x55dd375d66e0 prio 127 cost 0 e104599) pg 0x55dd17b7e000
   -57> 2017-07-21 03:51:29.152502 7f2a47af9700 10 osd.8 104599 dequeue_op 0x55dd375d66e0 prio 127 cost 0 latency 0.000069 MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396339, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=167232,13728,0}, attrs_to_read=)) v3 pg pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active]
   -56> 2017-07-21 03:51:29.152513 7f2a47af9700 20 osd.8 104599 share_map osd.98 172.16.3.15:6812/3711 104599
   -55> 2017-07-21 03:51:29.152517 7f2a47af9700 20 osd.8 104599 should_share_map osd.98 172.16.3.15:6812/3711 104599
   -54> 2017-07-21 03:51:29.152520 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] handle_message: MOSDECSubOpRead(6.35fs9 104599/104544 ECSubRead(tid=1396339, to_read={6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head=167232,13728,0}, attrs_to_read=)) v3
   -53> 2017-07-21 03:51:29.152531 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: Getting attr on 6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head
   -52> 2017-07-21 03:51:29.152539 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: not in cache 6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head
   -51> 2017-07-21 03:51:29.152576 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) stat(3177): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# = 0 (size 419744)
   -50> 2017-07-21 03:51:29.152582 7f2a47af9700 10 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] get_hash_info: found on disk, size 419744
   -49> 2017-07-21 03:51:29.152589 7f2a47af9700 15 filestore(/var/lib/ceph/osd/ceph-8) getattr(4392): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 'hinfo_key'
   -48> 2017-07-21 03:51:29.152598 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) getattr(4426): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 'hinfo_key' = 70
   -47> 2017-07-21 03:51:29.152605 7f2a47af9700 15 filestore(/var/lib/ceph/osd/ceph-8) read(3209): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 167232~13728
   -46> 2017-07-21 03:51:29.152616 7f2a47af9700 10 filestore(/var/lib/ceph/osd/ceph-8) read(3264): 6.35fs9_head/9#6:fad67619:::rbd_data.a8a4cb2ae8944a.000000000001ca91:head# 167232~13728/13728
   -45> 2017-07-21 03:51:29.152621 7f2a47af9700 20 osd.8 pg_epoch: 104599 pg[6.35fs9( v 104550'56305 (93434'53212,104550'56305] local-lis/les=104544/104545 n=24804 ec=31534/31534 lis/c 104544/104544 les/c/f 104545/104549/67943 104543/104544/100103) [98,100,64,65,23,25,35,37,7,8,52,57,44] r=9 lpr=104544 luod=0'0 crt=104550'56305 active] handle_sub_read read request=13728 r=13728 len=13728
   -44> 2017-07-21 03:51:29.152633 7f2a47af9700 20 osd.8 104599 share_map_peer 0x55dd31e42000 already has epoch 104599
   -43> 2017-07-21 03:51:29.152635 7f2a47af9700  1 -- 172.16.3.1:6818/18216 --> 172.16.3.15:6812/3711 -- MOSDECSubOpReadReply(6.35fs0 104599/104544 ECSubReadReply(tid=1396339, attrs_read=0)) v2 -- 0x55dd35ad1c80 con 0
   -42> 2017-07-21 03:51:29.152649 7f2a47af9700 10 osd.8 104599 dequeue_op 0x55dd375d66e0 finish
   -41> 2017-07-21 03:51:29.152657 7f2a47af9700 20 osd.8 op_wq(3) _process empty q, waiting
   -40> 2017-07-21 03:51:29.156461 7f2a6e0fb700  5 -- 172.16.3.1:6818/18216 >> 172.16.3.3:6802/18365 conn(0x55dd335bc000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=37 cs=1 l=0). rx osd.11 seq 390 0x55dd368fa000 MOSDECSubOpWrite(6.14s8 104599/104598 ECSubWrite(tid=1828, reqid=osd.79.104298:167244, at_version=104599'63628, trim_to=0'0, roll_forward_to=104599'63627)) v2
   -39> 2017-07-21 03:51:29.156476 7f2a6e0fb700  1 -- 172.16.3.1:6818/18216 <== osd.11 172.16.3.3:6802/18365 390 ==== MOSDECSubOpWrite(6.14s8 104599/104598 ECSubWrite(tid=1828, reqid=osd.79.104298:167244, at_version=104599'63628, trim_to=0'0, roll_forward_to=104599'63627)) v2 ==== 422038+0+0 (122823496 0 0) 0x55dd368fa000 con 0x55dd335bc000
   -38> 2017-07-21 03:51:29.156488 7f2a6e0fb700 15 osd.8 104599 enqueue_op 0x55dd3148fb80 prio 127 cost 0 latency 0.000704 epoch 104599 MOSDECSubOpWrite(6.14s8 104599/104598 ECSubWrite(tid=1828, reqid=osd.79.104298:167244, at_version=104599'63628, trim_to=0'0, roll_forward_to=104599'63627)) v2
   -37> 2017-07-21 03:51:29.156494 7f2a6e0fb700 20 osd.8 op_wq(0) _enqueue 6.14s8 PGQueueable(0x55dd3148fb80 prio 127 cost 0 e104599)
   -36> 2017-07-21 03:51:29.156511 7f2a492fc700 20 osd.8 op_wq(0) _process 6.14s8 item PGQueueable(0x55dd3148fb80 prio 127 cost 0 e104599) queued
   -35> 2017-07-21 03:51:29.156520 7f2a492fc700 20 osd.8 op_wq(0) _process 6.14s8 item PGQueueable(0x55dd3148fb80 prio 127 cost 0 e104599) pg 0x55dcec296000
   -34> 2017-07-21 03:51:29.156525 7f2a492fc700 10 osd.8 104599 dequeue_op 0x55dd3148fb80 prio 127 cost 0 latency 0.000741 MOSDECSubOpWrite(6.14s8 104599/104598 ECSubWrite(tid=1828, reqid=osd.79.104298:167244, at_version=104599'63628, trim_to=0'0, roll_forward_to=104599'63627)) v2 pg pg[6.14s8( v 104599'63627 (97669'60582,104599'63627] local-lis/les=104598/104599 n=25124 ec=31534/31534 lis/c 104598/104598 les/c/f 104599/104599/67943 104596/104598/104598) [11,10,84,83,46,42,69,72,8,4,53,59,92] r=8 lpr=104598 luod=0'0 crt=104599'63627 active]
   -33> 2017-07-21 03:51:29.156542 7f2a492fc700 20 osd.8 104599 share_map osd.11 172.16.3.3:6802/18365 104599
   -32> 2017-07-21 03:51:29.156547 7f2a492fc700 20 osd.8 104599 should_share_map osd.11 172.16.3.3:6802/18365 104599
   -31> 2017-07-21 03:51:29.156554 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.14s8( v 104599'63627 (97669'60582,104599'63627] local-lis/les=104598/104599 n=25124 ec=31534/31534 lis/c 104598/104598 les/c/f 104599/104599/67943 104596/104598/104598) [11,10,84,83,46,42,69,72,8,4,53,59,92] r=8 lpr=104598 luod=0'0 crt=104599'63627 active] handle_message: MOSDECSubOpWrite(6.14s8 104599/104598 ECSubWrite(tid=1828, reqid=osd.79.104298:167244, at_version=104599'63628, trim_to=0'0, roll_forward_to=104599'63627)) v2
   -30> 2017-07-21 03:51:29.156572 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.14s8( v 104599'63627 (97669'60582,104599'63627] local-lis/les=104598/104599 n=25124 ec=31534/31534 lis/c 104598/104598 les/c/f 104599/104599/67943 104596/104598/104598) [11,10,84,83,46,42,69,72,8,4,53,59,92] r=8 lpr=104598 luod=0'0 crt=104599'63627 active] append_log log((97669'60582,104599'63627], crt=104599'63627) [104599'63628 (68070'24554) modify   6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head by osd.79.104298:167244 2017-07-20 23:51:22.589778 0]
   -29> 2017-07-21 03:51:29.156596 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.14s8( v 104599'63628 (97669'60582,104599'63628] local-lis/les=104598/104599 n=25124 ec=31534/31534 lis/c 104598/104598 les/c/f 104599/104599/67943 104596/104598/104598) [11,10,84,83,46,42,69,72,8,4,53,59,92] r=8 lpr=104598 luod=0'0 lua=104599'63627 crt=104599'63627 lcod 104599'63627 active] add_log_entry 104599'63628 (68070'24554) modify   6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head by osd.79.104298:167244 2017-07-20 23:51:22.589778 0
   -28> 2017-07-21 03:51:29.156616 7f2a492fc700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 104599'63628, trimmed: , clear_divergent_priors: 0
   -27> 2017-07-21 03:51:29.156645 7f2a492fc700  5 filestore(/var/lib/ceph/osd/ceph-8) queue_transactions(2137): existing 0x55dcec846000 osr(6.14s8 0x55dcea218750)
   -26> 2017-07-21 03:51:29.156766 7f2a492fc700  5 filestore(/var/lib/ceph/osd/ceph-8) queue_transactions(2191): (writeahead) 675064119 [Transaction(0x55dd32c1aa00),Transaction(0x55dd32c1ab78)]
   -25> 2017-07-21 03:51:29.156778 7f2a492fc700 10 osd.8 104599 dequeue_op 0x55dd3148fb80 finish
   -24> 2017-07-21 03:51:29.156782 7f2a492fc700 20 osd.8 op_wq(0) _process empty q, waiting
   -23> 2017-07-21 03:51:29.156991 7f2a59b10700  5 filestore(/var/lib/ceph/osd/ceph-8) _journaled_ahead(2285): 0x55dd36e86b40 seq 675064119 osr(6.14s8 0x55dcea218750) [Transaction(0x55dd32c1aa00),Transaction(0x55dd32c1ab78)]
   -22> 2017-07-21 03:51:29.156999 7f2a59b10700  5 filestore(/var/lib/ceph/osd/ceph-8) queue_op(2014): 0x55dd36e86b40 seq 675064119 osr(6.14s8 0x55dcea218750) 421670 bytes   (queue has 1 ops and 421670 bytes)
   -21> 2017-07-21 03:51:29.157004 7f2a59b10700 10 filestore(/var/lib/ceph/osd/ceph-8)  queueing ondisk 0x55dd3196ae00
   -20> 2017-07-21 03:51:29.157009 7f2a58b0e700  5 filestore(/var/lib/ceph/osd/ceph-8) _do_op(2057): 0x55dd36e86b40 seq 675064119 osr(6.14s8 0x55dcea218750)/0x55dcea218750 start
   -19> 2017-07-21 03:51:29.157015 7f2a58b0e700 10 filestore(/var/lib/ceph/osd/ceph-8) _do_transaction(2598): on 0x55dd32c1aa00
   -18> 2017-07-21 03:51:29.157016 7f2a5830d700 20 osd.8 104599 share_map_peer 0x55dd335bc000 already has epoch 104599
   -17> 2017-07-21 03:51:29.157019 7f2a5830d700  1 -- 172.16.3.1:6818/18216 --> 172.16.3.3:6802/18365 -- MOSDECSubOpWriteReply(6.14s0 104599/104598 ECSubWriteReply(tid=1828, last_complete=104599'63628, committed=1, applied=0)) v2 -- 0x55dd31f12d80 con 0
   -16> 2017-07-21 03:51:29.157019 7f2a58b0e700 15 filestore(/var/lib/ceph/osd/ceph-8) _collection_move_rename(5347): 6.14s8_head/8#6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head#f88c from 6.14s8_head/8#6:28169af1:::rbd_data.552ec5238e1f29.000000000008b013:head#
   -15> 2017-07-21 03:51:29.157064 7f2a58b0e700 10 filestore(/var/lib/ceph/osd/ceph-8) error opening file /var/lib/ceph/osd/ceph-8/current/6.14s8_head/DIR_4/DIR_1/DIR_8/rbd\udata.552ec5238e1f29.000000000008b013__head_8F596814__6_ffffffffffffffff_8 with flags=2: (2) No such file or directory
   -14> 2017-07-21 03:51:29.157225 7f2a6f0fd700  5 -- 172.16.3.1:6818/18216 >> 172.16.3.4:6812/26486 conn(0x55dd31d58800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2103 cs=1 l=0). rx osd.21 seq 2281 0x55dd35ad1c80 MOSDECSubOpReadReply(6.5as0 104599/104545 ECSubReadReply(tid=41523, attrs_read=0)) v2
   -13> 2017-07-21 03:51:29.157236 7f2a6f0fd700  1 -- 172.16.3.1:6818/18216 <== osd.21 172.16.3.4:6812/26486 2281 ==== MOSDECSubOpReadReply(6.5as0 104599/104545 ECSubReadReply(tid=41523, attrs_read=0)) v2 ==== 1020+0+0 (1187173091 0 0) 0x55dd35ad1c80 con 0x55dd31d58800
   -12> 2017-07-21 03:51:29.157243 7f2a6f0fd700 15 osd.8 104599 enqueue_op 0x55dd3516c9a0 prio 127 cost 0 latency 0.000038 epoch 104599 MOSDECSubOpReadReply(6.5as0 104599/104545 ECSubReadReply(tid=41523, attrs_read=0)) v2
   -11> 2017-07-21 03:51:29.157247 7f2a6f0fd700 20 osd.8 op_wq(0) _enqueue 6.5as0 PGQueueable(0x55dd3516c9a0 prio 127 cost 0 e104599)
   -10> 2017-07-21 03:51:29.157261 7f2a492fc700 20 osd.8 op_wq(0) _process 6.5as0 item PGQueueable(0x55dd3516c9a0 prio 127 cost 0 e104599) queued
    -9> 2017-07-21 03:51:29.157269 7f2a492fc700 20 osd.8 op_wq(0) _process 6.5as0 item PGQueueable(0x55dd3516c9a0 prio 127 cost 0 e104599) pg 0x55dd055d4000
    -8> 2017-07-21 03:51:29.157273 7f2a492fc700 10 osd.8 104599 dequeue_op 0x55dd3516c9a0 prio 127 cost 0 latency 0.000067 MOSDECSubOpReadReply(6.5as0 104599/104545 ECSubReadReply(tid=41523, attrs_read=0)) v2 pg pg[6.5as0( v 104465'55641 (92880'51903,104465'55641] local-lis/les=104545/104546 n=24931 ec=31534/31534 lis/c 104545/101602 les/c/f 104546/101605/67943 104543/104545/104545) [8,7,70,68,80,87,21,22,46,42,2147483647,2147483647,99] r=0 lpr=104545 pi=[101602,104545)/1 crt=104465'55641 lcod 0'0 mlcod 0'0 active+undersized+degraded]
    -7> 2017-07-21 03:51:29.157289 7f2a492fc700 20 osd.8 104599 share_map osd.21 172.16.3.4:6812/26486 104599
    -6> 2017-07-21 03:51:29.157293 7f2a492fc700 20 osd.8 104599 should_share_map osd.21 172.16.3.4:6812/26486 104599
    -5> 2017-07-21 03:51:29.157299 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.5as0( v 104465'55641 (92880'51903,104465'55641] local-lis/les=104545/104546 n=24931 ec=31534/31534 lis/c 104545/101602 les/c/f 104546/101605/67943 104543/104545/104545) [8,7,70,68,80,87,21,22,46,42,2147483647,2147483647,99] r=0 lpr=104545 pi=[101602,104545)/1 crt=104465'55641 lcod 0'0 mlcod 0'0 active+undersized+degraded] handle_message: MOSDECSubOpReadReply(6.5as0 104599/104545 ECSubReadReply(tid=41523, attrs_read=0)) v2
    -4> 2017-07-21 03:51:29.157311 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.5as0( v 104465'55641 (92880'51903,104465'55641] local-lis/les=104545/104546 n=24931 ec=31534/31534 lis/c 104545/101602 les/c/f 104546/101605/67943 104543/104545/104545) [8,7,70,68,80,87,21,22,46,42,2147483647,2147483647,99] r=0 lpr=104545 pi=[101602,104545)/1 crt=104465'55641 lcod 0'0 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply: reply ECSubReadReply(tid=41523, attrs_read=0)
    -3> 2017-07-21 03:51:29.157325 7f2a492fc700 10 osd.8 pg_epoch: 104599 pg[6.5as0( v 104465'55641 (92880'51903,104465'55641] local-lis/les=104545/104546 n=24931 ec=31534/31534 lis/c 104545/101602 les/c/f 104546/101605/67943 104543/104545/104545) [8,7,70,68,80,87,21,22,46,42,2147483647,2147483647,99] r=0 lpr=104545 pi=[101602,104545)/1 crt=104465'55641 lcod 0'0 mlcod 0'0 active+undersized+degraded] handle_sub_read_reply readop not complete: ReadOp(tid=41523, to_read={6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head=read_request_t(to_read=[1048320,8320,0], need=7(1),8(0),21(6),22(7),42(9),46(8),68(3),70(2),80(4),87(5), want_attrs=0)}, complete={6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head=read_result_t(r=0, errors={}, noattrs, returned=(1048320, 8320, [8(0),832, 21(6),832]))}, priority=127, obj_to_source={6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head=7(1),8(0),21(6),22(7),42(9),46(8),68(3),70(2),80(4),87(5)}, source_to_obj={7(1)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,8(0)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,21(6)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,22(7)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,42(9)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,46(8)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,68(3)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,70(2)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,80(4)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head,87(5)=6:5a37ca0f:::rbd_data.dac1e1238e1f29.000000000000e300:head}, in_progress=7(1),22(7),42(9),46(8),68(3),70(2),80(4),87(5))
    -2> 2017-07-21 03:51:29.157376 7f2a492fc700 10 osd.8 104599 dequeue_op 0x55dd3516c9a0 finish
    -1> 2017-07-21 03:51:29.157386 7f2a492fc700 20 osd.8 op_wq(0) _process empty q, waiting
     0> 2017-07-21 03:51:29.159508 7f2a58b0e700 -1 /build/ceph-12.1.1/src/os/filestore/FileStore.cc: In function 'int FileStore::_collection_move_rename(const coll_t&, const ghobject_t&, coll_t, const ghobject_t&, const SequencerPosition&, bool)' thread 7f2a58b0e700 time 2017-07-21 03:51:29.157073
/build/ceph-12.1.1/src/os/filestore/FileStore.cc: 5385: FAILED assert(0 == "ERROR: source must exist")
Actions #14

Updated by David Zafman over 6 years ago

I am able to reproduce the assert(0 == "ERROR: source must exist") by directly removing a shard from any non-primary OSD and then writing that object with the rados put command, which performs a write_full() rados operation.

Is there some reason that a bunch of filestore objects would be missing from a disk? Could there be a bunch of files in lost+found after an fsck?
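
For anyone wanting to reproduce this, a minimal sketch of the steps described above could look like the following. This is illustrative only: the OSD id, pg id, shard path, pool name and object name are placeholders, not values taken from this cluster.

# On a non-primary OSD holding a shard of the EC object, remove that shard directly
systemctl stop ceph-osd@<id>
rm /var/lib/ceph/osd/ceph-<id>/current/<pgid>s<shard>_head/<object-shard-file>   # hypothetical filestore path
systemctl start ceph-osd@<id>
# Overwrite the object from a client; rados put performs a write_full() operation
rados -p <ec-pool> put <object> ./payload.bin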

Actions #15

Updated by Ashley Merrick over 6 years ago

David Zafman wrote:

I am able to reproduce the assert(0 == "ERROR: source must exist") by directly removing a shard from any non-primary OSD and then writing that object with the rados put command, which performs a write_full() rados operation.

Is there some reason that a bunch of filestore objects would be missing from a disk? Could there be a bunch of files in lost+found after an fsck?

I will check this; however, the issue started after I removed an old filestore OSD and replaced it with a bluestore OSD, at which point the backfill process kicked off. That then started to cause the mixture of OSDs to crash. A while ago I had an EC PG that was stuck inconsistent, which I was able to fix, but maybe it still has some missing shards and the full read from a backfill is triggering the crash.

Actions #16

Updated by Ashley Merrick over 6 years ago

David Zafman wrote:

I am able to reproduce the assert(0 == "ERROR: source must exist") by directly removing a shard from any non-primary OSD and then writing that object with the rados put command, which performs a write_full() rados operation.

Is there some reason that a bunch of filestore objects would be missing from a disk? Could there be a bunch of files in lost+found after an fsck?

I ran an xfs_repair on the OSD that the log was produced from: no issues were logged and nothing is showing in lost+found.
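
In case it helps, the check was roughly the following (the device name and OSD id are examples, not values from this cluster):

systemctl stop ceph-osd@<id>
umount /var/lib/ceph/osd/ceph-<id>
xfs_repair -n /dev/sdX1                          # dry run: report problems without modifying
xfs_repair /dev/sdX1                             # repair pass
mount /dev/sdX1 /var/lib/ceph/osd/ceph-<id>
systemctl start ceph-osd@<id>
ls /var/lib/ceph/osd/ceph-<id>/lost+found        # empty in this case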

Actions #17

Updated by Ashley Merrick over 6 years ago

David Zafman wrote:

I am able to reproduce the assert(0 == "ERROR: source must exist") by directly removing a shard from any non-primary OSD and then writing that object with the rados put command, which performs a write_full() rados operation.

Is there some reason that a bunch of filestore objects would be missing from a disk? Could there be a bunch of files in lost+found after an fsck?

So I dug into my notes, and it seems the OSDs that keep dropping are all linked to the PG that was previously inconsistent. Even though I managed to fix it, it seems it is still finding the occasional missing file, which causes the OSDs attached to the PG to fail. So maybe this is something that needs to be caught and handled rather than causing a total OSD failure?
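
A quick way to confirm that linkage is to compare the crashing OSD ids against the acting set of the suspect PG (the pg id and OSD id below are placeholders):

ceph pg map <pgid>                               # shows the up and acting OSD sets for the pg
ceph pg ls-by-osd osd.<id> | grep <pgid>         # lists the pgs served by one of the crashing OSDs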

Actions #18

Updated by Ashley Merrick over 6 years ago

Ashley Merrick wrote:

David Zafman wrote:

I am able to reproduce the assert(0 == "ERROR: source must exist") by directly removing a shard from any non-primary OSD and then writing that object with the rados put command, which performs a write_full() rados operation.

Is there some reason that a bunch of filestore objects would be missing from a disk? Could there be a bunch of files in lost+found after an fsck?

So I dug into my notes, and it seems the OSDs that keep dropping are all linked to the PG that was previously inconsistent. Even though I managed to fix it, it seems it is still finding the occasional missing file, which causes the OSDs attached to the PG to fail. So maybe this is something that needs to be caught and handled rather than causing a total OSD failure?

I tried ceph pg force_create_pg, however it returns the following: "Error ENOTSUP: this command is obsolete".

Actions #19

Updated by David Zafman over 6 years ago

A fix for the copy_from() case (#3 in the description), which makes the operation fail when an object is unreadable instead of crashing the OSD:

https://github.com/ceph/ceph/pull/16567

Actions #20

Updated by David Zafman over 6 years ago

Ashley Merrick wrote:

I tried ceph pg force_create_pg, however it returns the following: "Error ENOTSUP: this command is obsolete".

I see a new version of this command has been restored in the current luminous branch. It has been changed to "ceph osd force-create-pg <pgid>".

Experimenting on my cluster, I created these steps to blow away a bad PG and replace it with an empty one (a rough command-line sketch follows the list).

WARNING: THIS CAUSES DATA LOSS (back up any critical data that you can)

1. Install v12.1.2 once it is available or build and install packages using the "luminous" branch.
2. ceph osd set noout
3. Shutdown all OSDs with the bad PG
4. Remove the <pgid>s#_head directory in the filestore from each OSD store
5. Start the OSDs back up
6. ceph pg dump pgs should show the removed pg as "unknown" state
7. ceph osd force-create-pg <pgid>
8. ceph pg dump pgs should show the pg back in "active+clean" state
9. ceph osd unset noout
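
A rough command-line equivalent of the above (the pg id, OSD ids and the filestore path are placeholders and must be adapted to the affected PG; step 4 as written assumes a filestore OSD):

ceph osd set noout
systemctl stop ceph-osd@<id>                                       # on every OSD holding a shard of the bad pg
rm -rf /var/lib/ceph/osd/ceph-<id>/current/<pgid>s<shard>_head     # filestore layout only
systemctl start ceph-osd@<id>
ceph pg dump pgs | grep <pgid>                                     # pg should now show as unknown
ceph osd force-create-pg <pgid>
ceph pg dump pgs | grep <pgid>                                     # pg should now show as active+clean
ceph osd unset noout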

Actions #21

Updated by David Zafman over 6 years ago

  • Priority changed from Urgent to High

I'm lowering the priority to High because the cluster involved has a badly corrupted PG, so this shouldn't hold up the Luminous release.

Actions #22

Updated by Ashley Merrick over 6 years ago

David Zafman wrote:

Ashley Merrick wrote:

I tried ceph pg force_create_pg, however it returns the following: "Error ENOTSUP: this command is obsolete".

I see a new version of this command has been restored in the current luminous branch. It has been changed to "ceph osd force-create-pg <pgid>".

Experimenting on my cluster I created these steps to blow away a bad pg and replace it with an empty one.

WARNING THIS CAUSES DATA LOSS (backup any critical data that you can)

1. Install v12.1.2 once it is available or build and install packages using the "luminous" branch.
2. ceph osd set noout
3. Shutdown all OSDs with the bad PG
4. Remove the <pgid>s#_head directory in the filestore from each OSD store
5. Start the OSDs back up
6. ceph pg dump pgs should show the removed pg as "unknown" state
7. ceph osd force-create-pg <pgid>
8. ceph pg dump pgs should show the pg back in "active+clean" state
9. ceph osd unset noout

How can I do this on a bluestore OSD (some of the OSDs for this PG are bluestore)? I once found a command line to remove an object from a store, but I am unable to find it now.

Actions #23

Updated by David Zafman over 6 years ago

The copy_from() case was fixed by https://github.com/ceph/ceph/pull/15693, so I closed my pull request.

Actions #24

Updated by David Zafman over 6 years ago

Use ceph-objectstore-tool to remove the PG. Here is an example; if --data-path refers to a bluestore OSD, it will remove the PG there as well.


$ ceph-objectstore-tool --data-path dev/osd0 --pgid 1.4 --op remove
 marking collection for removal
setting '_remove' omap key
finish_remove_pgs 1.4_head removing 1.4
Remove successful
$ bin/ceph-objectstore-tool --data-path dev/osd0 --op list-pgs
1.5
1.2
1.0
1.1
1.7
1.3
1.6

Actions #25

Updated by Ashley Merrick over 6 years ago

David Zafman wrote:

Use ceph-objectstore-tool to remove the pg. Here is an example. If data-path refers to bluestore it will remove it.

[...]

That worked fine; the cluster is now back to healthy.

The underlying issue has been resolved; however, I'm not sure whether the filestore/bluestore crash is something we would want to catch rather than crash the OSD.

Actions #26

Updated by David Zafman over 6 years ago

  • Related to Bug #20167: osd/PG.cc: 806: FAILED assert(past_intervals.empty()) added
Actions #27

Updated by David Zafman over 6 years ago

  • Status changed from 12 to Resolved

These issues were precipitated by an improper PG recovery. However, two code fixes came out of it:

https://github.com/ceph/ceph/pull/15693

http://tracker.ceph.com/issues/20167 (not sure which pull request resolved that tracker)

Actions
