Project

General

Profile

Actions

Bug #12645

closed

chunky_scrub assert fail

Added by science luo over 8 years ago. Updated about 7 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Cluster:5 nodes, 4 mons ,each node has 10 osds
Ceph version:0.87
Operation:
(1)reboot node.1
(2)then unmount all osd in node.1
(3)/etc/init.d/ceph start
(4)a few minutes later , assert fail occurred on osd.33 in node.1

log_info:

--- begin dump of recent events ---
122> 2015-07-30 16:21:35.745180 7f6837a33700 5 - op tracker -- seq: 13300, time: 2015-07-30 16:21:35.745180, event: started, op: osd_op(client.8963436.0:55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] 45.71e72ee1 snapc 1f=[1f] ack+ondisk+write+known_if_redirected e83500)
121> 2015-07-30 16:21:35.745356 7f6837232700 1 - 100.100.100.244:6800/6354 --> 100.100.100.242:0/1030262 -- osd_op_reply(301120 rbd_data.791f542ae8944a.0000000000049826 [read 0~4194304] v0'0 uv0 ack = 2 ((2) No such file or directory)) v6 - ?+0 0x11336c00 con 0x11501700
120> 2015-07-30 16:21:35.745441 7f6837a33700 5 - op tracker -- seq: 13300, time: 2015-07-30 16:21:35.745441, event: started, op: osd_op(client.8963436.0:55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] 45.71e72ee1 snapc 1f=[1f] ack+ondisk+write+known_if_redirected e83500)
119> 2015-07-30 16:21:35.745508 7f6837232700 5 - op tracker -- seq: 13272, time: 2015-07-30 16:21:35.745507, event: done, op: osd_op(client.8781641.0:301120 rbd_data.791f542ae8944a.0000000000049826 [read 0~4194304] 39.f26da258 ack+read+known_if_redirected e83500)
118> 2015-07-30 16:21:35.745860 7f6837a33700 5 - op tracker -- seq: 13300, time: 2015-07-30 16:21:35.745859, event: waiting for subops from 0,7,41, op: osd_op(client.8963436.0:55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] 45.71e72ee1 snapc 1f=[1f] ack+ondisk+write+known_if_redirected e83500)
117> 2015-07-30 16:21:35.746131 7f6837a33700 1 - 111.111.111.244:6800/6354 --> 111.111.111.241:6816/5028833 -- osd_sub_op(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] v 83500'402 snapset=0=[]:[] snapc=0=[]) v11 -- ?+939 0x141b6d00 con 0x10626b00
116> 2015-07-30 16:21:35.746205 7f6837a33700 1 - 111.111.111.244:6800/6354 --> 111.111.111.243:6814/9316 -- osd_sub_op(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] v 83500'402 snapset=0=[]:[] snapc=0=[]) v11 -- ?+939 0x1300ac00 con 0xeaaa680
115> 2015-07-30 16:21:35.746328 7f6837a33700 1 - 111.111.111.244:6800/6354 --> 111.111.111.242:6822/22314 -- osd_sub_op(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] v 83500'402 snapset=0=[]:[] snapc=0=[]) v11 -- ?+939 0x13d35700 con 0x10628c00
114> 2015-07-30 16:21:35.746710 7f6837a33700 5 - op tracker -- seq: 13300, time: 2015-07-30 16:21:35.746710, event: commit_queued_for_journal_write, op: osd_op(client.8963436.0:55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] 45.71e72ee1 snapc 1f=[1f] ack+ondisk+write+known_if_redirected e83500)
113> 2015-07-30 16:21:35.746968 7f684c5f4700 5 - op tracker -- seq: 13300, time: 2015-07-30 16:21:35.746968, event: write_thread_in_journal_buffer, op: osd_op(client.8963436.0:55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] 45.71e72ee1 snapc 1f=[1f] ack+ondisk+write+known_if_redirected e83500)
112> 2015-07-30 16:21:35.748340 7f682e2f0700 1 - 111.111.111.244:6800/6354 <== osd.0 111.111.111.241:6816/5028833 505 ==== osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0) v2 ==== 177+0+0 (4135779876 0 0) 0x1226cec0 con 0x10626b00
111> 2015-07-30 16:21:35.748365 7f682e2f0700 5 - op tracker -- seq: 13301, time: 2015-07-30 16:21:35.748238, event: header_read, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
110> 2015-07-30 16:21:35.748377 7f682e2f0700 5 - op tracker -- seq: 13301, time: 2015-07-30 16:21:35.748240, event: throttled, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
109> 2015-07-30 16:21:35.748389 7f682e2f0700 5 - op tracker -- seq: 13301, time: 2015-07-30 16:21:35.748322, event: all_read, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
108> 2015-07-30 16:21:35.748397 7f682e2f0700 5 - op tracker -- seq: 13301, time: 0.000000, event: dispatched, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
107> 2015-07-30 16:21:35.748429 7f683a238700 5 - op tracker -- seq: 13301, time: 2015-07-30 16:21:35.748428, event: reached_pg, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
106> 2015-07-30 16:21:35.748446 7f683a238700 5 - op tracker -- seq: 13301, time: 2015-07-30 16:21:35.748446, event: started, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
105> 2015-07-30 16:21:35.748468 7f682dbe9700 1 - 111.111.111.244:6800/6354 <== osd.41 111.111.111.242:6822/22314 128 ==== osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0) v2 ==== 177+0+0 (4232092591 0 0) 0x12fe2940 con 0x10628c00
104> 2015-07-30 16:21:35.748494 7f682dbe9700 5 - op tracker -- seq: 13302, time: 2015-07-30 16:21:35.748361, event: header_read, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
103> 2015-07-30 16:21:35.748506 7f682dbe9700 5 - op tracker -- seq: 13302, time: 2015-07-30 16:21:35.748362, event: throttled, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
102> 2015-07-30 16:21:35.748513 7f682dbe9700 5 - op tracker -- seq: 13302, time: 2015-07-30 16:21:35.748463, event: all_read, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
101> 2015-07-30 16:21:35.748519 7f682dbe9700 5 - op tracker -- seq: 13302, time: 0.000000, event: dispatched, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
100> 2015-07-30 16:21:35.748488 7f683a238700 5 - op tracker -- seq: 13300, time: 2015-07-30 16:21:35.748488, event: sub_op_commit_rec, op: osd_op(client.8963436.0:55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] 45.71e72ee1 snapc 1f=[1f] ack+ondisk+write+known_if_redirected e83500)
99> 2015-07-30 16:21:35.748641 7f6837a33700 5 - op tracker -- seq: 13302, time: 2015-07-30 16:21:35.748641, event: reached_pg, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
98> 2015-07-30 16:21:35.748657 7f6837a33700 5 - op tracker -- seq: 13302, time: 2015-07-30 16:21:35.748657, event: started, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
97> 2015-07-30 16:21:35.748638 7f683a238700 5 - op tracker -- seq: 13301, time: 2015-07-30 16:21:35.748637, event: done, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
96> 2015-07-30 16:21:35.748665 7f6837a33700 5 - op tracker -- seq: 13300, time: 2015-07-30 16:21:35.748665, event: sub_op_commit_rec, op: osd_op(client.8963436.0:55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] 45.71e72ee1 snapc 1f=[1f] ack+ondisk+write+known_if_redirected e83500)
95> 2015-07-30 16:21:35.748811 7f6837a33700 5 - op tracker -- seq: 13302, time: 2015-07-30 16:21:35.748811, event: done, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
94> 2015-07-30 16:21:35.759747 7f6839a37700 5 - op tracker -- seq: 13283, time: 2015-07-30 16:21:35.759747, event: started, op: osd_op(client.8964076.0:36129 rbd_data.36bbb37843e45.0000000000008d05 [read 0~4194304] 42.8f189ad3 ack+read+known_if_redirected e83500)
93> 2015-07-30 16:21:35.759934 7f6839a37700 1 - 100.100.100.244:6800/6354 --> 100.100.100.242:0/1027184 -- osd_op_reply(36129 rbd_data.36bbb37843e45.0000000000008d05 [read 0~0] v0'0 uv672 ondisk = 0) v6 -- ?+0 0x11980dc0 con 0x111dc260
92> 2015-07-30 16:21:35.759986 7f6839a37700 5 - op tracker -- seq: 13283, time: 2015-07-30 16:21:35.759986, event: done, op: osd_op(client.8964076.0:36129 rbd_data.36bbb37843e45.0000000000008d05 [read 0~0] 42.8f189ad3 ack+read+known_if_redirected e83500)
91> 2015-07-30 16:21:35.760057 7f6839a37700 5 - op tracker -- seq: 13257, time: 2015-07-30 16:21:35.760056, event: reached_pg, op: osd_op(client.8963508.0:79045 rbd_data.791f542ae8944a.000000000000cdd0@6 [read 0~4194304] 39.cc7213d8 ack+read+localize_reads+known_if_redirected e83500)
90> 2015-07-30 16:21:35.760490 7f6839a37700 1 - 100.100.100.244:6800/6354 --> 100.100.100.242:0/1012965 -- osd_op_reply(79045 rbd_data.791f542ae8944a.000000000000cdd0 [read 0~4194304] v0'0 uv0 ack = 2 ((2) No such file or directory)) v6 - ?+0 0x119839c0 con 0x1126e000
89> 2015-07-30 16:21:35.760578 7f6839a37700 5 - op tracker -- seq: 13257, time: 2015-07-30 16:21:35.760578, event: done, op: osd_op(client.8963508.0:79045 rbd_data.791f542ae8944a.000000000000cdd0@6 [read 0~4194304] 39.cc7213d8 ack+read+localize_reads+known_if_redirected e83500)
88> 2015-07-30 16:21:35.760659 7f6839a37700 5 - op tracker -- seq: 13282, time: 2015-07-30 16:21:35.760658, event: reached_pg, op: osd_op(client.8781641.0:301184 rbd_data.791f542ae8944a.0000000000049866 [read 0~4194304] 39.d614fb58 ack+read+known_if_redirected e83500)
87> 2015-07-30 16:21:35.760920 7f6839a37700 1 - 100.100.100.244:6800/6354 --> 100.100.100.242:0/1030262 -- osd_op_reply(301184 rbd_data.791f542ae8944a.0000000000049866 [read 0~4194304] v0'0 uv0 ack = 2 ((2) No such file or directory)) v6 - ?+0 0x119802c0 con 0x11501700
86> 2015-07-30 16:21:35.760992 7f6839a37700 5 - op tracker -- seq: 13282, time: 2015-07-30 16:21:35.760992, event: done, op: osd_op(client.8781641.0:301184 rbd_data.791f542ae8944a.0000000000049866 [read 0~4194304] 39.d614fb58 ack+read+known_if_redirected e83500)
85> 2015-07-30 16:21:35.764237 7f684bdf3700 5 - op tracker -- seq: 13300, time: 2015-07-30 16:21:35.764237, event: journaled_completion_queued, op: osd_op(client.8963436.0:55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] 45.71e72ee1 snapc 1f=[1f] ack+ondisk+write+known_if_redirected e83500)
84> 2015-07-30 16:21:35.764325 7f68495ee700 5 - op tracker -- seq: 13300, time: 2015-07-30 16:21:35.764325, event: op_commit, op: osd_op(client.8963436.0:55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] 45.71e72ee1 snapc 1f=[1f] ack+ondisk+write+known_if_redirected e83500)
83> 2015-07-30 16:21:35.764880 7f6849def700 5 - op tracker -- seq: 13300, time: 2015-07-30 16:21:35.764880, event: op_applied, op: osd_op(client.8963436.0:55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] 45.71e72ee1 snapc 1f=[1f] ack+ondisk+write+known_if_redirected e83500)
82> 2015-07-30 16:21:35.778311 7f6830916700 1 - 111.111.111.244:6800/6354 <== osd.7 111.111.111.243:6814/9316 350 ==== osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0) v2 ==== 177+0+0 (1892352649 0 0) 0x117cb700 con 0xeaaa680
81> 2015-07-30 16:21:35.778330 7f6830916700 5 - op tracker -- seq: 13303, time: 2015-07-30 16:21:35.778265, event: header_read, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
80> 2015-07-30 16:21:35.778342 7f6830916700 5 - op tracker -- seq: 13303, time: 2015-07-30 16:21:35.778266, event: throttled, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
79> 2015-07-30 16:21:35.778348 7f6830916700 5 - op tracker -- seq: 13303, time: 2015-07-30 16:21:35.778306, event: all_read, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
78> 2015-07-30 16:21:35.778355 7f6830916700 5 - op tracker -- seq: 13303, time: 0.000000, event: dispatched, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
77> 2015-07-30 16:21:35.778385 7f683a238700 5 - op tracker -- seq: 13303, time: 2015-07-30 16:21:35.778385, event: reached_pg, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
76> 2015-07-30 16:21:35.778412 7f683a238700 5 - op tracker -- seq: 13303, time: 2015-07-30 16:21:35.778412, event: started, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
75> 2015-07-30 16:21:35.778449 7f683a238700 5 - op tracker -- seq: 13300, time: 2015-07-30 16:21:35.778449, event: sub_op_commit_rec, op: osd_op(client.8963436.0:55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] 45.71e72ee1 snapc 1f=[1f] ack+ondisk+write+known_if_redirected e83500)
74> 2015-07-30 16:21:35.778551 7f683a238700 1 - 100.100.100.244:6800/6354 --> 100.100.100.242:0/1011392 -- osd_op_reply(55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] v83500'402 uv402 ack = 0) v6 -- ?+0 0x11334dc0 con 0x11501440
73> 2015-07-30 16:21:35.778704 7f683a238700 1 - 100.100.100.244:6800/6354 --> 100.100.100.242:0/1011392 -- osd_op_reply(55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] v83500'402 uv402 ondisk = 0) v6 -- ?+0 0x11983700 con 0x11501440
72> 2015-07-30 16:21:35.778840 7f683a238700 5 - op tracker -- seq: 13300, time: 2015-07-30 16:21:35.778840, event: commit_sent, op: osd_op(client.8963436.0:55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] 45.71e72ee1 snapc 1f=[1f] ack+ondisk+write+known_if_redirected e83500)
71> 2015-07-30 16:21:35.778959 7f683a238700 5 - op tracker -- seq: 13300, time: 2015-07-30 16:21:35.778959, event: done, op: osd_op(client.8963436.0:55739 rbd_data.48e7a77843e45.0000000000002447 [set-alloc-hint object_size 16777216 write_size 16777216,write 0~0] 45.71e72ee1 snapc 1f=[1f] ack+ondisk+write+known_if_redirected e83500)
70> 2015-07-30 16:21:35.778979 7f683a238700 5 - op tracker -- seq: 13303, time: 2015-07-30 16:21:35.778978, event: done, op: osd_sub_op_reply(client.8963436.0:55739 45.2e1 71e72ee1/rbd_data.48e7a77843e45.0000000000002447/head//45 [] ondisk, result = 0)
69> 2015-07-30 16:21:35.779976 7f681daf0700 1 - 100.100.100.244:6800/6354 <== client.8964076 100.100.100.242:0/1027184 465 ==== osd_op(client.8964076.0:36159 rbd_data.36bbb37843e45.0000000000008d23 [read 0~4194304] 42.3dc15453 ack+read+known_if_redirected e83500) v4 ==== 190+0+0 (1325724880 0 0) 0x11e28300 con 0x111dc260
68> 2015-07-30 16:21:35.780032 7f681daf0700 5 - op tracker -- seq: 13304, time: 2015-07-30 16:21:35.779815, event: header_read, op: osd_op(client.8964076.0:36159 rbd_data.36bbb37843e45.0000000000008d23 [read 0~4194304] 42.3dc15453 ack+read+known_if_redirected e83500)
67> 2015-07-30 16:21:35.780062 7f681daf0700 5 - op tracker -- seq: 13304, time: 2015-07-30 16:21:35.779818, event: throttled, op: osd_op(client.8964076.0:36159 rbd_data.36bbb37843e45.0000000000008d23 [read 0~4194304] 42.3dc15453 ack+read+known_if_redirected e83500)
66> 2015-07-30 16:21:35.780074 7f681daf0700 5 - op tracker -- seq: 13304, time: 2015-07-30 16:21:35.779971, event: all_read, op: osd_op(client.8964076.0:36159 rbd_data.36bbb37843e45.0000000000008d23 [read 0~4194304] 42.3dc15453 ack+read+known_if_redirected e83500)
65> 2015-07-30 16:21:35.780080 7f681daf0700 5 - op tracker -- seq: 13304, time: 0.000000, event: dispatched, op: osd_op(client.8964076.0:36159 rbd_data.36bbb37843e45.0000000000008d23 [read 0~4194304] 42.3dc15453 ack+read+known_if_redirected e83500)
64> 2015-07-30 16:21:35.780209 7f6839a37700 5 - op tracker -- seq: 13304, time: 2015-07-30 16:21:35.780208, event: reached_pg, op: osd_op(client.8964076.0:36159 rbd_data.36bbb37843e45.0000000000008d23 [read 0~4194304] 42.3dc15453 ack+read+known_if_redirected e83500)
63> 2015-07-30 16:21:35.781438 7f6840345700 1 - 111.111.111.244:6802/6354 <== osd.1 111.111.111.243:0/10101 109 ==== osd_ping(ping e83500 stamp 2015-07-30 16:21:35.782152) v2 ==== 47+0+0 (1494193579 0 0) 0x134d1a20 con 0x1087d1e0
62> 2015-07-30 16:21:35.781470 7f6840345700 1 - 111.111.111.244:6802/6354 --> 111.111.111.243:0/10101 -- osd_ping(ping_reply e83500 stamp 2015-07-30 16:21:35.782152) v2 -- ?+0 0x1429c3c0 con 0x1087d1e0
61> 2015-07-30 16:21:35.781548 7f6841c49700 1 - 100.100.100.244:6802/6354 <== osd.1 111.111.111.243:0/10101 109 ==== osd_ping(ping e83500 stamp 2015-07-30 16:21:35.782152) v2 ==== 47+0+0 (1494193579 0 0) 0x13818580 con 0x1087d8c0
60> 2015-07-30 16:21:35.781575 7f6841c49700 1 - 100.100.100.244:6802/6354 --> 111.111.111.243:0/10101 -- osd_ping(ping_reply e83500 stamp 2015-07-30 16:21:35.782152) v2 -- ?+0 0x11511c20 con 0x1087d8c0
59> 2015-07-30 16:21:35.794789 7f6839a37700 5 - op tracker -- seq: 13304, time: 2015-07-30 16:21:35.794789, event: started, op: osd_op(client.8964076.0:36159 rbd_data.36bbb37843e45.0000000000008d23 [read 0~4194304] 42.3dc15453 ack+read+known_if_redirected e83500)
58> 2015-07-30 16:21:35.794967 7f6839a37700 1 - 100.100.100.244:6800/6354 --> 100.100.100.242:0/1027184 -- osd_op_reply(36159 rbd_data.36bbb37843e45.0000000000008d23 [read 0~0] v0'0 uv673 ondisk = 0) v6 -- ?+0 0x11982ec0 con 0x111dc260
57> 2015-07-30 16:21:35.795039 7f6839a37700 5 - op tracker -- seq: 13304, time: 2015-07-30 16:21:35.795039, event: done, op: osd_op(client.8964076.0:36159 rbd_data.36bbb37843e45.0000000000008d23 [read 0~0] 42.3dc15453 ack+read+known_if_redirected e83500)
56> 2015-07-30 16:21:35.818118 7f6816370700 1 - 100.100.100.244:6800/6354 >> :/0 pipe(0x12fe39c0 sd=283 :6800 s=0 pgs=0 cs=0 l=0 c=0x14571440).accept sd=283 100.100.100.241:48688/0
55> 2015-07-30 16:21:35.818589 7f6816370700 1 - 100.100.100.244:6800/6354 <== client.8961853 100.100.100.241:0/4019426 1 ==== osd_op(client.8961853.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500) v4 ==== 201+0+0 (2905364837 0 0) 0x138dc080 con 0x14571440
54> 2015-07-30 16:21:35.818633 7f6816370700 5 - op tracker -- seq: 13305, time: 2015-07-30 16:21:35.818549, event: header_read, op: osd_op(client.8961853.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
53> 2015-07-30 16:21:35.818655 7f6816370700 5 - op tracker -- seq: 13305, time: 2015-07-30 16:21:35.818552, event: throttled, op: osd_op(client.8961853.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
52> 2015-07-30 16:21:35.818678 7f6816370700 5 - op tracker -- seq: 13305, time: 2015-07-30 16:21:35.818585, event: all_read, op: osd_op(client.8961853.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
51> 2015-07-30 16:21:35.818685 7f6816370700 5 - op tracker -- seq: 13305, time: 0.000000, event: dispatched, op: osd_op(client.8961853.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
50> 2015-07-30 16:21:35.818756 7f6837a33700 5 - op tracker -- seq: 13305, time: 2015-07-30 16:21:35.818756, event: reached_pg, op: osd_op(client.8961853.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
49> 2015-07-30 16:21:35.819405 7f6837a33700 5 - op tracker -- seq: 13305, time: 2015-07-30 16:21:35.819405, event: started, op: osd_op(client.8961853.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
48> 2015-07-30 16:21:35.819563 7f6837a33700 1 - 100.100.100.244:6800/6354 --> 100.100.100.241:0/4019426 -- osd_op_reply(86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] v0'0 uv104257 ondisk = 0) v6 -- ?+0 0x1151c100 con 0x14571440
47> 2015-07-30 16:21:35.819710 7f6837a33700 5 - op tracker -- seq: 13305, time: 2015-07-30 16:21:35.819710, event: done, op: osd_op(client.8961853.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
46> 2015-07-30 16:21:35.822678 7f6816370700 1 - 100.100.100.244:6800/6354 <== client.8961853 100.100.100.241:0/4019426 2 ==== osd_op(client.8961853.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500) v4 ==== 201+0+9 (3541475255 0 2163927993) 0x138ddc00 con 0x14571440
45> 2015-07-30 16:21:35.822708 7f6816370700 5 - op tracker -- seq: 13306, time: 2015-07-30 16:21:35.822627, event: header_read, op: osd_op(client.8961853.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
44> 2015-07-30 16:21:35.822723 7f6816370700 5 - op tracker -- seq: 13306, time: 2015-07-30 16:21:35.822629, event: throttled, op: osd_op(client.8961853.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
43> 2015-07-30 16:21:35.822730 7f6816370700 5 - op tracker -- seq: 13306, time: 2015-07-30 16:21:35.822673, event: all_read, op: osd_op(client.8961853.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
42> 2015-07-30 16:21:35.822737 7f6816370700 5 - op tracker -- seq: 13306, time: 0.000000, event: dispatched, op: osd_op(client.8961853.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
41> 2015-07-30 16:21:35.822813 7f683a238700 5 - op tracker -- seq: 13306, time: 2015-07-30 16:21:35.822813, event: reached_pg, op: osd_op(client.8961853.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
40> 2015-07-30 16:21:35.822907 7f683a238700 5 - op tracker -- seq: 13306, time: 2015-07-30 16:21:35.822907, event: started, op: osd_op(client.8961853.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
39> 2015-07-30 16:21:35.823123 7f683a238700 1 - 100.100.100.244:6800/6354 --> 100.100.100.241:0/4019426 -- osd_op_reply(87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] v0'0 uv104257 ondisk = 0) v6 -- ?+0 0x11983440 con 0x14571440
38> 2015-07-30 16:21:35.823232 7f683a238700 5 - op tracker -- seq: 13306, time: 2015-07-30 16:21:35.823232, event: done, op: osd_op(client.8961853.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
37> 2015-07-30 16:21:35.833232 7f681849b700 1 - 100.100.100.244:6800/6354 >> :/0 pipe(0x12fe1080 sd=346 :6800 s=0 pgs=0 cs=0 l=0 c=0x145707e0).accept sd=346 100.100.100.243:63649/0
36> 2015-07-30 16:21:35.833868 7f681849b700 1 - 100.100.100.244:6800/6354 <== client.8961869 100.100.100.243:0/4032026 1 ==== osd_op(client.8961869.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500) v4 ==== 201+0+0 (2905364837 0 0) 0x138da000 con 0x145707e0
35> 2015-07-30 16:21:35.833888 7f681849b700 5 - op tracker -- seq: 13307, time: 2015-07-30 16:21:35.833821, event: header_read, op: osd_op(client.8961869.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
34> 2015-07-30 16:21:35.833899 7f681849b700 5 - op tracker -- seq: 13307, time: 2015-07-30 16:21:35.833823, event: throttled, op: osd_op(client.8961869.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
33> 2015-07-30 16:21:35.833906 7f681849b700 5 - op tracker -- seq: 13307, time: 2015-07-30 16:21:35.833863, event: all_read, op: osd_op(client.8961869.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
32> 2015-07-30 16:21:35.833912 7f681849b700 5 - op tracker -- seq: 13307, time: 0.000000, event: dispatched, op: osd_op(client.8961869.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
31> 2015-07-30 16:21:35.833962 7f6837a33700 5 - op tracker -- seq: 13307, time: 2015-07-30 16:21:35.833962, event: reached_pg, op: osd_op(client.8961869.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
30> 2015-07-30 16:21:35.834173 7f6837a33700 5 - op tracker -- seq: 13307, time: 2015-07-30 16:21:35.834173, event: started, op: osd_op(client.8961869.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
29> 2015-07-30 16:21:35.834284 7f681daf0700 1 - 100.100.100.244:6800/6354 <== client.8964076 100.100.100.242:0/1027184 466 ==== osd_op(client.8964076.0:36165 rbd_data.36bbb37843e45.0000000000008d29 [read 0~4194304] 42.e84c00e3 ack+read+known_if_redirected e83500) v4 ==== 190+0+0 (4263769100 0 0) 0x11e28d00 con 0x111dc260
28> 2015-07-30 16:21:35.834359 7f681daf0700 5 - op tracker -- seq: 13308, time: 2015-07-30 16:21:35.834094, event: header_read, op: osd_op(client.8964076.0:36165 rbd_data.36bbb37843e45.0000000000008d29 [read 0~4194304] 42.e84c00e3 ack+read+known_if_redirected e83500)
27> 2015-07-30 16:21:35.834389 7f681daf0700 5 - op tracker -- seq: 13308, time: 2015-07-30 16:21:35.834095, event: throttled, op: osd_op(client.8964076.0:36165 rbd_data.36bbb37843e45.0000000000008d29 [read 0~4194304] 42.e84c00e3 ack+read+known_if_redirected e83500)
26> 2015-07-30 16:21:35.834412 7f681daf0700 5 - op tracker -- seq: 13308, time: 2015-07-30 16:21:35.834279, event: all_read, op: osd_op(client.8964076.0:36165 rbd_data.36bbb37843e45.0000000000008d29 [read 0~4194304] 42.e84c00e3 ack+read+known_if_redirected e83500)
25> 2015-07-30 16:21:35.834421 7f681daf0700 5 - op tracker -- seq: 13308, time: 0.000000, event: dispatched, op: osd_op(client.8964076.0:36165 rbd_data.36bbb37843e45.0000000000008d29 [read 0~4194304] 42.e84c00e3 ack+read+known_if_redirected e83500)
24> 2015-07-30 16:21:35.834393 7f6837a33700 1 - 100.100.100.244:6800/6354 --> 100.100.100.243:0/4032026 -- osd_op_reply(86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] v0'0 uv104257 ondisk = 0) v6 -- ?+0 0x11982680 con 0x145707e0
23> 2015-07-30 16:21:35.834475 7f6839236700 5 - op tracker -- seq: 13308, time: 2015-07-30 16:21:35.834475, event: reached_pg, op: osd_op(client.8964076.0:36165 rbd_data.36bbb37843e45.0000000000008d29 [read 0~4194304] 42.e84c00e3 ack+read+known_if_redirected e83500)
22> 2015-07-30 16:21:35.834500 7f6837a33700 5 - op tracker -- seq: 13307, time: 2015-07-30 16:21:35.834500, event: done, op: osd_op(client.8961869.0:86 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [stat] 36.3c9aed11 ack+read+known_if_redirected e83500)
21> 2015-07-30 16:21:35.834610 7f681daf0700 1 - 100.100.100.244:6800/6354 <== client.8964076 100.100.100.242:0/1027184 467 ==== osd_op(client.8964076.0:36167 rbd_data.36bbb37843e45.0000000000008d2b [read 0~4194304] 42.b0ef0d53 ack+read+known_if_redirected e83500) v4 ==== 190+0+0 (2866564848 0 0) 0x11e26c80 con 0x111dc260
20> 2015-07-30 16:21:35.834663 7f681daf0700 5 - op tracker -- seq: 13309, time: 2015-07-30 16:21:35.834464, event: header_read, op: osd_op(client.8964076.0:36167 rbd_data.36bbb37843e45.0000000000008d2b [read 0~4194304] 42.b0ef0d53 ack+read+known_if_redirected e83500)
19> 2015-07-30 16:21:35.834678 7f681daf0700 5 - op tracker -- seq: 13309, time: 2015-07-30 16:21:35.834465, event: throttled, op: osd_op(client.8964076.0:36167 rbd_data.36bbb37843e45.0000000000008d2b [read 0~4194304] 42.b0ef0d53 ack+read+known_if_redirected e83500)
18> 2015-07-30 16:21:35.834694 7f681daf0700 5 - op tracker -- seq: 13309, time: 2015-07-30 16:21:35.834606, event: all_read, op: osd_op(client.8964076.0:36167 rbd_data.36bbb37843e45.0000000000008d2b [read 0~4194304] 42.b0ef0d53 ack+read+known_if_redirected e83500)
17> 2015-07-30 16:21:35.834700 7f681daf0700 5 - op tracker -- seq: 13309, time: 0.000000, event: dispatched, op: osd_op(client.8964076.0:36167 rbd_data.36bbb37843e45.0000000000008d2b [read 0~4194304] 42.b0ef0d53 ack+read+known_if_redirected e83500)
16> 2015-07-30 16:21:35.834748 7f6839a37700 5 - op tracker -- seq: 13309, time: 2015-07-30 16:21:35.834748, event: reached_pg, op: osd_op(client.8964076.0:36167 rbd_data.36bbb37843e45.0000000000008d2b [read 0~4194304] 42.b0ef0d53 ack+read+known_if_redirected e83500)
15> 2015-07-30 16:21:35.834854 7f681daf0700 1 - 100.100.100.244:6800/6354 <== client.8964076 100.100.100.242:0/1027184 468 ==== osd_op(client.8964076.0:36170 rbd_data.36bbb37843e45.0000000000008d2e [read 0~4194304] 42.8b336859 ack+read+known_if_redirected e83500) v4 ==== 190+0+0 (448687112 0 0) 0x11e28f80 con 0x111dc260
14> 2015-07-30 16:21:35.834914 7f681daf0700 5 - op tracker -- seq: 13310, time: 2015-07-30 16:21:35.834745, event: header_read, op: osd_op(client.8964076.0:36170 rbd_data.36bbb37843e45.0000000000008d2e [read 0~4194304] 42.8b336859 ack+read+known_if_redirected e83500)
13> 2015-07-30 16:21:35.834938 7f681daf0700 5 - op tracker -- seq: 13310, time: 2015-07-30 16:21:35.834746, event: throttled, op: osd_op(client.8964076.0:36170 rbd_data.36bbb37843e45.0000000000008d2e [read 0~4194304] 42.8b336859 ack+read+known_if_redirected e83500)
12> 2015-07-30 16:21:35.834948 7f681daf0700 5 - op tracker -- seq: 13310, time: 2015-07-30 16:21:35.834850, event: all_read, op: osd_op(client.8964076.0:36170 rbd_data.36bbb37843e45.0000000000008d2e [read 0~4194304] 42.8b336859 ack+read+known_if_redirected e83500)
11> 2015-07-30 16:21:35.834955 7f681daf0700 5 - op tracker -- seq: 13310, time: 0.000000, event: dispatched, op: osd_op(client.8964076.0:36170 rbd_data.36bbb37843e45.0000000000008d2e [read 0~4194304] 42.8b336859 ack+read+known_if_redirected e83500)
10> 2015-07-30 16:21:35.835023 7f6836a31700 5 - op tracker -- seq: 13310, time: 2015-07-30 16:21:35.835023, event: reached_pg, op: osd_op(client.8964076.0:36170 rbd_data.36bbb37843e45.0000000000008d2e [read 0~4194304] 42.8b336859 ack+read+known_if_redirected e83500)
9> 2015-07-30 16:21:35.837695 7f681849b700 1 - 100.100.100.244:6800/6354 <== client.8961869 100.100.100.243:0/4032026 2 ==== osd_op(client.8961869.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500) v4 ==== 201+0+9 (3541475255 0 2163927993) 0x138da500 con 0x145707e0
8> 2015-07-30 16:21:35.837730 7f681849b700 5 - op tracker -- seq: 13311, time: 2015-07-30 16:21:35.837653, event: header_read, op: osd_op(client.8961869.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
7> 2015-07-30 16:21:35.837743 7f681849b700 5 - op tracker -- seq: 13311, time: 2015-07-30 16:21:35.837654, event: throttled, op: osd_op(client.8961869.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
6> 2015-07-30 16:21:35.837750 7f681849b700 5 - op tracker -- seq: 13311, time: 2015-07-30 16:21:35.837691, event: all_read, op: osd_op(client.8961869.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
5> 2015-07-30 16:21:35.837758 7f681849b700 5 - op tracker -- seq: 13311, time: 0.000000, event: dispatched, op: osd_op(client.8961869.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
4> 2015-07-30 16:21:35.837831 7f683a238700 5 - op tracker -- seq: 13311, time: 2015-07-30 16:21:35.837831, event: reached_pg, op: osd_op(client.8961869.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
3> 2015-07-30 16:21:35.838060 7f683a238700 5 - op tracker -- seq: 13311, time: 2015-07-30 16:21:35.838060, event: started, op: osd_op(client.8961869.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
2> 2015-07-30 16:21:35.838341 7f683a238700 1 - 100.100.100.244:6800/6354 --> 100.100.100.243:0/4032026 -- osd_op_reply(87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] v0'0 uv104257 ondisk = 0) v6 -- ?+0 0x11337c80 con 0x145707e0
1> 2015-07-30 16:21:35.838435 7f683a238700 5 - op tracker -- seq: 13311, time: 2015-07-30 16:21:35.838435, event: done, op: osd_op(client.8961869.0:87 rbd_id.volume-3b3734d3-722d-495a-8253-9ef48b582907 [call rbd.get_id] 36.3c9aed11 ack+read+known_if_redirected e83500)
0> 2015-07-30 16:21:35.848222 7f6835a2f700 -1 ** Caught signal (Aborted) *
in thread 7f6835a2f700

ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
1: /usr/bin/ceph-osd() [0xa8b002]
2: (()+0xf130) [0x7f6856c04130]
3: (gsignal()+0x39) [0x7f685561e5f9]
4: (abort()+0x148) [0x7f6855620068]
5: (_gnu_cxx::_verbose_terminate_handler()+0x165) [0x7f6855f229d5]
6: (()+0x5e946) [0x7f6855f20946]
7: (()+0x5e973) [0x7f6855f20973]
8: (()+0x5eb9f) [0x7f6855f20b9f]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xb7db1a]
10: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x13ad) [0x7bd60d]
11: (PG::scrub(ThreadPool::TPHandle&)+0x316) [0x7bdc46]
12: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x29) [0x69fe59]
13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa66) [0xb6e636]
14: (ThreadPool::WorkThread::entry()+0x10) [0xb6f6c0]
15: (()+0x7df3) [0x7f6856bfcdf3]
16: (clone()+0x6d) [0x7f68556df54d]
NOTE: a copy of the executable, or `objdump -rdS &lt;executable&gt;` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.33.log
--
end dump of recent events ---


Files

lrb_ceph-osd.33.log (564 KB) lrb_ceph-osd.33.log science luo, 08/10/2015 02:04 AM
Actions #1

Updated by science luo over 8 years ago

According to my point,the reason for this bug is :

1.3 scrub request to repica
1936> 2015-07-30 16:21:31.804154 7f6835a2f700 1 - 111.111.111.244:6800/6354 --> 111.111.111.240:6801/20887 -- replica scrub(pg: 45.34b,from:0'0,to:83486'377,epoch:83500,start:0//0//-1,end:a5cd434b//0//-1,chunky:1,deep:1,version:5) v5 -- ?+0 0x14426a00 con 0x10668680
1935> 2015-07-30 16:21:31.804233 7f6835a2f700 1 - 111.111.111.244:6800/6354 --> 111.111.111.241:6817/5025257 -- replica scrub(pg: 45.34b,from:0'0,to:83486'377,epoch:83500,start:0//0//-1,end:a5cd434b//0//-1,chunky:1,deep:1,version:5) v5 -- ?+0 0x12ebe800 con 0x10667b80
1934> 2015-07-30 16:21:31.804296 7f6835a2f700 1 - 111.111.111.244:6800/6354 --> 111.111.111.243:6816/9546 -- replica scrub(pg: 45.34b,from:0'0,to:83486'377,epoch:83500,start:0//0//-1,end:a5cd434b//0//-1,chunky:1,deep:1,version:5) v5 -- ?+0 0x138da500 con 0xeaaac00

2.but 1 scrub response

1048> 2015-07-30 16:21:33.389734 7f682f300700  1 - 111.111.111.244:6800/6354 <== osd.2 111.111.111.240:6801/20887 280 ==== osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v11 ==== 1202+0+11083 (3892521603 0 369299912) 0x14379700 con 0x10668680
1047> 2015-07-30 16:21:33.389767 7f682f300700 5 - op tracker -- seq: 13230, time: 2015-07-30 16:21:33.389583, event: header_read, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
1046> 2015-07-30 16:21:33.389779 7f682f300700 5 - op tracker -- seq: 13230, time: 2015-07-30 16:21:33.389584, event: throttled, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
1045> 2015-07-30 16:21:33.389787 7f682f300700 5 - op tracker -- seq: 13230, time: 2015-07-30 16:21:33.389728, event: all_read, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
1044> 2015-07-30 16:21:33.389800 7f682f300700 5 - op tracker -- seq: 13230, time: 0.000000, event: dispatched, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])

3.then 3 scrub request resend
934> 2015-07-30 16:21:33.447087 7f6835a2f700 1 - 111.111.111.244:6800/6354 --> 111.111.111.240:6801/20887 -- replica scrub(pg: 45.34b,from:0'0,to:83450'370,epoch:83500,start:a5cd434b//0//-1,end:2c9e834b//0//-1,chunky:1,deep:1,version:5) v5 -- ?+0 0x11265700 con 0x10668680
933> 2015-07-30 16:21:33.447142 7f6835a2f700 1 - 111.111.111.244:6800/6354 --> 111.111.111.241:6817/5025257 -- replica scrub(pg: 45.34b,from:0'0,to:83450'370,epoch:83500,start:a5cd434b//0//-1,end:2c9e834b//0//-1,chunky:1,deep:1,version:5) v5 -- ?+0 0x131f4080 con 0x10667b80
932> 2015-07-30 16:21:33.447163 7f6835a2f700 1 - 111.111.111.244:6800/6354 --> 111.111.111.243:6816/9546 -- replica scrub(pg: 45.34b,from:0'0,to:83450'370,epoch:83500,start:a5cd434b//0//-1,end:2c9e834b//0//-1,chunky:1,deep:1,version:5) v5 -- ?+0 0x131f2c80 con 0xeaaac00

4.3 scrub response received
Line 1246: 797> 2015-07-30 16:21:34.019190 7f6830a17700 1 - 111.111.111.244:6800/6354 <== osd.23 111.111.111.243:6816/9546 435 ==== osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v11 ==== 1202+0+11083 (216947441 0 369299912) 0x13d34c00 con 0xeaaac00
Line 1247: 796> 2015-07-30 16:21:34.019290 7f6830a17700 5 - op tracker -- seq: 13243, time: 2015-07-30 16:21:34.018768, event: header_read, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
Line 1248: 795> 2015-07-30 16:21:34.019321 7f6830a17700 5 - op tracker -- seq: 13243, time: 2015-07-30 16:21:34.018771, event: throttled, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
Line 1249: 794> 2015-07-30 16:21:34.019365 7f6830a17700 5 - op tracker -- seq: 13243, time: 2015-07-30 16:21:34.019183, event: all_read, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
Line 1250: 793> 2015-07-30 16:21:34.019375 7f6830a17700 5 - op tracker -- seq: 13243, time: 0.000000, event: dispatched, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
Line 1283: 760> 2015-07-30 16:21:34.226783 7f682f805700 1 - 111.111.111.244:6800/6354 <== osd.9 111.111.111.241:6817/5025257 258 ==== osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v11 ==== 1202+0+11083 (4247077718 0 369299912) 0x1300ac00 con 0x10667b80
Line 1284: 759> 2015-07-30 16:21:34.226826 7f682f805700 5 - op tracker -- seq: 13248, time: 2015-07-30 16:21:34.226450, event: header_read, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
Line 1285: 758> 2015-07-30 16:21:34.226840 7f682f805700 5 - op tracker -- seq: 13248, time: 2015-07-30 16:21:34.226452, event: throttled, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
Line 1286: 757> 2015-07-30 16:21:34.226847 7f682f805700 5 - op tracker -- seq: 13248, time: 2015-07-30 16:21:34.226773, event: all_read, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
Line 1287: 756> 2015-07-30 16:21:34.226854 7f682f805700 5 - op tracker -- seq: 13248, time: 0.000000, event: dispatched, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
Line 1739: 304> 2015-07-30 16:21:35.114375 7f682f300700 1 - 111.111.111.244:6800/6354 <== osd.2 111.111.111.240:6801/20887 281 ==== osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v11 ==== 1202+0+11051 (3892521603 0 116785855) 0x11dced00 con 0x10668680
Line 1740: 303> 2015-07-30 16:21:35.114396 7f682f300700 5 - op tracker -- seq: 13275, time: 2015-07-30 16:21:35.114292, event: header_read, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
Line 1741: 302> 2015-07-30 16:21:35.114409 7f682f300700 5 - op tracker -- seq: 13275, time: 2015-07-30 16:21:35.114293, event: throttled, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
Line 1742: 301> 2015-07-30 16:21:35.114419 7f682f300700 5 - op tracker -- seq: 13275, time: 2015-07-30 16:21:35.114369, event: all_read, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
Line 1743: 300> 2015-07-30 16:21:35.114427 7f682f300700 5 - op tracker -- seq: 13275, time: 0.000000, event: dispatched, op: osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])

5.osd/PG.cc: 3982: FAILED assert(scrubber.waiting_on == 0)

Any idea to resolve this?

Actions #3

Updated by Sage Weil over 8 years ago

  • Priority changed from Normal to High
  • Source changed from other to Community (user)
Actions #4

Updated by Samuel Just over 7 years ago

  • Assignee deleted (Samuel Just)
Actions #5

Updated by Xiaoyan Li over 7 years ago

I am working on this issue.
Until now, I have the following result.

Actions #6

Updated by Xiaoyan Li over 7 years ago

The process from the log:
1. As Luo said, osd 33 sends out scrub map to other 3 nodes:

1936> 2015-07-30 16:21:31.804154 7f6835a2f700  1 - 111.111.111.244:6800/6354 --> 111.111.111.240:6801/20887 -- replica scrub(pg: 45.34b,from:0'0,to:83486'377,epoch:83500,start:0//0//-1,end:a5cd434b//0//-1,chunky:1,deep:1,version:5) v5 -- ?+0 0x14426a00 con 0x10668680
1935> 2015-07-30 16:21:31.804233 7f6835a2f700 1 - 111.111.111.244:6800/6354 --> 111.111.111.241:6817/5025257 -- replica scrub(pg: 45.34b,from:0'0,to:83486'377,epoch:83500,start:0//0//-1,end:a5cd434b//0//-1,chunky:1,deep:1,version:5) v5 -- ?+0 0x12ebe800 con 0x10667b80
1934> 2015-07-30 16:21:31.804296 7f6835a2f700 1 - 111.111.111.244:6800/6354 --> 111.111.111.243:6816/9546 -- replica scrub(pg: 45.34b,from:0'0,to:83486'377,epoch:83500,start:0//0//-1,end:a5cd434b//0//-1,chunky:1,deep:1,version:5) v5 -- ?+0 0x138da500 con 0xeaaac00

2. And received just one reply from osd.2
1048> 2015-07-30 16:21:33.389734 7f682f300700 1 - 111.111.111.244:6800/6354 <== osd.2 111.111.111.240:6801/20887 280 ==== osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v11 ==== 1202+0+11083 (3892521603 0 369299912) 0x14379700 con 0x10668680

3. And then it goes to compare scrub maps. This is wizard, as waiting_on should not be zero here.
-1022> 2015-07-30 16:21:33.443821 7f6835a2f700 2 osd.33 pg_epoch: 83500 pg[45.34b( v 83500'385 (0'0,83500'385] local-les=83486 n=353 ec=67627 les/c 83486/83486 83453/83462/83462) [33,23,9,2] r=0 lpr=83462 crt=83486'383 lcod 83486'384 mlcod 83486'384 active+clean+scrubbing+deep] scrub osd.33 has 25 items
-1021> 2015-07-30 16:21:33.443913 7f6835a2f700 2 osd.33 pg_epoch: 83500 pg[45.34b( v 83500'385 (0'0,83500'385] local-les=83486 n=353 ec=67627 les/c 83486/83486 83453/83462/83462) [33,23,9,2] r=0 lpr=83462 crt=83486'383 lcod 83486'384 mlcod 83486'384 active+clean+scrubbing+deep] scrub replica 2 has 0 items
-1020> 2015-07-30 16:21:33.443988 7f6835a2f700 2 osd.33 pg_epoch: 83500 pg[45.34b( v 83500'385 (0'0,83500'385] local-les=83486 n=353 ec=67627 les/c 83486/83486 83453/83462/83462) [33,23,9,2] r=0 lpr=83462 crt=83486'383 lcod 83486'384 mlcod 83486'384 active+clean+scrubbing+deep] scrub replica 9 has 0 items
-1019> 2015-07-30 16:21:33.444058 7f6835a2f700 2 osd.33 pg_epoch: 83500 pg[45.34b( v 83500'385 (0'0,83500'385] local-les=83486 n=353 ec=67627 les/c 83486/83486 83453/83462/83462) [33,23,9,2] r=0 lpr=83462 crt=83486'383 lcod 83486'384 mlcod 83486'384 active+clean+scrubbing+deep] scrub replica 23 has 0 items

4. After this step, it goes to NEW_CHUNK state again. It computes new start and end, and starts scrub.
934> 2015-07-30 16:21:33.447087 7f6835a2f700 1 - 111.111.111.244:6800/6354 --> 111.111.111.240:6801/20887 -- replica scrub(pg: 45.34b,from:0'0,to:83450'370,epoch:83500,start:a5cd434b//0//-1,end:2c9e834b//0//-1,chunky:1,deep:1,version:5) v5 -- ?+0 0x11265700 con 0x10668680
933> 2015-07-30 16:21:33.447142 7f6835a2f700 1 - 111.111.111.244:6800/6354 --> 111.111.111.241:6817/5025257 -- replica scrub(pg: 45.34b,from:0'0,to:83450'370,epoch:83500,start:a5cd434b//0//-1,end:2c9e834b//0//-1,chunky:1,deep:1,version:5) v5 -- ?+0 0x131f4080 con 0x10667b80
932> 2015-07-30 16:21:33.447163 7f6835a2f700 1 - 111.111.111.244:6800/6354 --> 111.111.111.243:6816/9546 -- replica scrub(pg: 45.34b,from:0'0,to:83450'370,epoch:83500,start:a5cd434b//0//-1,end:2c9e834b//0//-1,chunky:1,deep:1,version:5) v5 -- ?+0 0x131f2c80 con 0xeaaac00

5. Received following scrub map.
797> 2015-07-30 16:21:34.019190 7f6830a17700 1 - 111.111.111.244:6800/6354 <== osd.23 111.111.111.243:6816/9546 435 ==== osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v11 ==== 1202+0+11083 (216947441 0 369299912) 0x13d34c00 con 0xeaaac00
760> 2015-07-30 16:21:34.226783 7f682f805700 1 - 111.111.111.244:6800/6354 <== osd.9 111.111.111.241:6817/5025257 258 ==== osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v11 ==== 1202+0+11083 (4247077718 0 369299912) 0x1300ac00 con 0x10667b80
304> 2015-07-30 16:21:35.114375 7f682f300700 1 - 111.111.111.244:6800/6354 <== osd.2 111.111.111.240:6801/20887 281 ==== osd_sub_op(unknown.0.0:0 45.34b 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v11 ==== 1202+0+11051 (3892521603 0 116785855) 0x11dced00 con 0x10668680

6. In fact we don’t know whether the two request from osd 9 and 23 are from former request or later request. The main problem is that why it starts to compare scrub map when just receiving one request.

When reading the codes, I noticed that there are differences between v0.87 and current master branch. In current codes, it checks whether scrub is active when handling scrub map from replica nodes.
https://github.com/ceph/ceph/blob/0.94.2/src/osd/PG.cc#L3439

I think following process may lead waiting_on wrong:
1. Wen scrubbing, after sends out replica scrub to other replica nodes, it gets something wrong and clears states and scrubs.
2. And then replica nodes reply, and it leads waiting_on become negative number.
https://github.com/ceph/ceph/blob/0.94.2/src/osd/PG.cc#L3454
3. When scrub is scheduled again, it didn't initialize waiting_on, and starts from a negative number.
4. When waiting_on goes to zero, and starts compare scrub map. At that moment, another scrub map comes and leads waiting_on minus one again, it leads assert(waiting_on == 0) fail and assert.

Actions #7

Updated by Xiaoyan Li over 7 years ago

The following step is to verify that above process leads problem.

Actions #8

Updated by Samuel Just over 7 years ago

If this is a bug in giant, it's not going to get backported (though you're free to fix it for your own cluster!)

Actions #9

Updated by Sage Weil about 7 years ago

  • Status changed from New to Can't reproduce

pelase reopen if this occurs on hammer or jewel or later!

Actions

Also available in: Atom PDF