Project

General

Profile

Actions

Bug #13383

closed

import/export screws up same_interval_since

Added by Sage Weil over 8 years ago. Updated about 8 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

before the export:

remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569395 7fed03dea700 10 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] check_recovery_sources no source osds () went down
remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569403 7fed03dea700 10 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] handle_activate_map 
remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569409 7fed03dea700 10 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] state<Started/Primary/Active>: Active: handling ActMap
remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569415 7fed03dea700  7 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] state<Started/Primary>: handle ActMap primary
remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569422 7fed03dea700 15 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] publish_stats_to_osd 21: no change since
remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569428 7fed03dea700 10 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] take_waiters
remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569434 7fed03dea700 20 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] handle_activate_map: Not dirtying info: last_persisted is 21 while current is 99
remote/vpm057/log/ceph-osd.1.log.gz:2015-10-05 22:35:36.569440 7fed03dea700 10 osd.1 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c/f 21/21/0 20/20/11) [1,4] r=0 lpr=20 crt=0'0 mlcod 0'0 active+clean NIBBLEWISE] handle_peering_event: epoch_sent: 99 epoch_requested: 99 NullEvt

note, les/c/f 21/21/0 20/20/11

on import,

2015-10-05 22:36:16.272259 7f4a8176a900  5 osd.12 pg_epoch: 99 pg[1.9(unlocked)] enter Initial
2015-10-05 22:36:16.272293 7f4a8176a900 20 osd.12 pg_epoch: 99 pg[1.9(unlocked)] enter NotTrimming
2015-10-05 22:36:16.272605 7f4a8176a900 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=0 crt=0'0 inactive] handle_loaded
2015-10-05 22:36:16.272641 7f4a8176a900  5 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=0 crt=0'0 inactive NOTIFY] exit Initial 0.000382 0 0.000000
2015-10-05 22:36:16.272673 7f4a8176a900  5 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=0 crt=0'0 inactive NOTIFY] enter Reset
2015-10-05 22:36:16.272700 7f4a8176a900 20 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=0 crt=0'0 inactive NOTIFY] set_last_peering_reset 99
2015-10-05 22:36:16.272728 7f4a8176a900 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] Clearing blocked outgoing recovery messages
2015-10-05 22:36:16.272758 7f4a8176a900 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] Not blocking outgoing recovery messages
2015-10-05 22:36:16.272786 7f4a8176a900 10 osd.12 99 load_pgs loaded pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] log((0'0,0'0], crt=0'0)
2015-10-05 22:36:16.350769 7f4a8176a900 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] null
2015-10-05 22:36:16.383015 7f4a66378700 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] handle_peering_event: epoch_sent: 99 epoch_requested: 99 NullEvt
2015-10-05 22:36:18.891128 7f4a6f38a700 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] null
2015-10-05 22:36:18.951343 7f4a65b77700 10 osd.12 pg_epoch: 99 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] handle_advance_map [4]/[4] -- 4/4
2015-10-05 22:36:18.951356 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] state<Reset>: Reset advmap
2015-10-05 22:36:18.951375 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] generate_past_intervals over epochs 21-99
2015-10-05 22:36:18.951778 7f4a65b77700 20 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] new interval newup [4] newacting [4]
2015-10-05 22:36:18.951787 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] state<Reset>: should restart peering, calling start_peering_interval again
2015-10-05 22:36:18.951793 7f4a65b77700 20 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=99 crt=0'0 inactive NOTIFY] set_last_peering_reset 100
2015-10-05 22:36:18.951798 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=100 crt=0'0 inactive NOTIFY] Clearing blocked outgoing recovery messages
2015-10-05 22:36:18.951804 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [1,4] r=-1 lpr=100 crt=0'0 inactive NOTIFY] Not blocking outgoing recovery messages
2015-10-05 22:36:18.951816 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [4] r=-1 lpr=100 pi=99-99/1 crt=0'0 inactive NOTIFY] start_peering_interval: check_new_interval output: generate_past_intervals interval(99-99 up [1,4](1) acting [1,4](1)): not rw, up_thru 98 up_from 11 last_epoch_clean 21
2015-10-05 22:36:18.951834 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 20/99/11) [4] r=-1 lpr=100 pi=99-99/1 crt=0'0 inactive NOTIFY]  noting past interval(99-99 up [1,4](1) acting [1,4](1))
2015-10-05 22:36:18.951842 7f4a65b77700 10 osd.12 pg_epoch: 100 pg[1.9( empty local-les=21 n=0 ec=4 les/c 21/21 100/100/100) [4] r=-1 lpr=100 pi=99-99/1 crt=0'0 inactive NOTIFY]  up [1,4] -> [4], acting [1,4] -> [4], acting_primary 1 -> 4, up_primary 1 -> 4, role -1 -> -1
...

same_interval_since is reset to 99.. and the past interval generated by genreate_past_intervals is wrong (99-99 intead of 21-99). that means that forever after (until clean, at least) other osds will also try to recalculate that interval too (and not correct it).

indirectly led to #13382

/a/sage-2015-10-05_14:30:41-upgrade:hammer-x-infernalis---basic-vps/1089801

Actions #1

Updated by David Zafman about 8 years ago

  • Status changed from New to Can't reproduce

Not sure if this is really fix, but the osd logs aren't available anymore.

We'll reopen if is is seen again.

Actions #2

Updated by David Zafman about 8 years ago

  • Assignee set to David Zafman
Actions

Also available in: Atom PDF