Actions
Bug #13383
closedimport/export screws up same_interval_since
Status:
Can't reproduce
Priority:
Urgent
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
before the export:
remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569395 7fed03dea700 10 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] check_recovery_sources no source osds () went down remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569403 7fed03dea700 10 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] handle_activate_map remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569409 7fed03dea700 10 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] state<Started/Primary/Active>: Active: handling ActMap remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569415 7fed03dea700 7 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] state<Started/Primary>: handle ActMap primary remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569422 7fed03dea700 15 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] publish_stats_to_osd 21: no change since remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569428 7fed03dea700 10 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] take_waiters remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569434 7fed03dea700 20 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] handle_activate_map: Not dirtying info: last_persisted is 21 while current is 99 remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569440 7fed03dea700 10 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] handle_peering_event: epoch_sent: 99 epoch_requested: 99 NullEvt
note, les/c/f 21/21/0 20/20/11
on import,
2015-10-05 22:36:16.272259 7f4a8176a900 5 osd.12 pg_epoch: 99 pg[1.9(unlocked)] enter Initial 2015-10-05 22:36:16.272293 7f4a8176a900 20 osd.12 pg_epoch: 99 pg[1.9(unlocked)] enter NotTrimming 2015-10-05 22:36:16.272605 7f4a8176a900 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=0 crt=0'0 inactive] handle_loaded 2015-10-05 22:36:16.272641 7f4a8176a900 5 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=0 crt=0'0 inactive NOTIFY] exit Initial 0.000382 0 0.000000 2015-10-05 22:36:16.272673 7f4a8176a900 5 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=0 crt=0'0 inactive NOTIFY] enter Reset 2015-10-05 22:36:16.272700 7f4a8176a900 20 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=0 crt=0'0 inactive NOTIFY] set_last_peering_reset 99 2015-10-05 22:36:16.272728 7f4a8176a900 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] Clearing blocked outgoing recovery messages 2015-10-05 22:36:16.272758 7f4a8176a900 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] Not blocking outgoing recovery messages 2015-10-05 22:36:16.272786 7f4a8176a900 10 osd.12 99 load_pgs loaded pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] log((0'0,0'0], crt=0'0) 2015-10-05 22:36:16.350769 7f4a8176a900 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] null 2015-10-05 22:36:16.383015 7f4a66378700 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] handle_peering_event: epoch_sent: 99 epoch_requested: 99 NullEvt 2015-10-05 22:36:18.891128 7f4a6f38a700 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] null 2015-10-05 22:36:18.951343 7f4a65b77700 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] handle_advance_map [4]/[4] -- 4/4 2015-10-05 22:36:18.951356 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] state<Reset>: Reset advmap 2015-10-05 22:36:18.951375 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] generate_past_intervals over epochs 21-99 2015-10-05 22:36:18.951778 7f4a65b77700 20 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] new interval newup [4] newacting [4] 2015-10-05 22:36:18.951787 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] state<Reset>: should restart peering, calling start_peering_interval again 2015-10-05 22:36:18.951793 7f4a65b77700 20 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] set_last_peering_reset 100 2015-10-05 22:36:18.951798 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=100 crt=0'0 inactive NOTIFY] Clearing blocked outgoing recovery messages 2015-10-05 22:36:18.951804 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=100 crt=0'0 inactive NOTIFY] Not blocking outgoing recovery messages 2015-10-05 22:36:18.951816 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [4] r=-1 lpr=100 pi=99-99/1 crt=0'0 inactive NOTIFY] start_peering_interval: check_new_interval output: generate_past_intervals interval(99-99 up [1,4](1) acting [1,4](1)): not rw, up_thru 98 up_from 11 last_epoch_clean 21 2015-10-05 22:36:18.951834 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [4] r=-1 lpr=100 pi=99-99/1 crt=0'0 inactive NOTIFY] noting past interval(99-99 up [1,4](1) acting [1,4](1)) 2015-10-05 22:36:18.951842 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 100/100/100) [4] r=-1 lpr=100 pi=99-99/1 crt=0'0 inactive NOTIFY] up [1,4] -> [4], acting [1,4] -> [4], acting_primary 1 -> 4, up_primary 1 -> 4, role -1 -> -1 ...
same_interval_since is reset to 99.. and the past interval generated by genreate_past_intervals is wrong (99-99 intead of 21-99). that means that forever after (until clean, at least) other osds will also try to recalculate that interval too (and not correct it).
indirectly led to #13382
/a/sage-2015-10-05_14:30:41-upgrade:hammer-x-infernalis---basic-vps/1089801
Updated by David Zafman about 8 years ago
- Status changed from New to Can't reproduce
Not sure if this is really fix, but the osd logs aren't available anymore.
We'll reopen if is is seen again.
Actions