Project

General

Profile

Actions

Bug #12720

closed

probably import/export related: osd/PG.cc: 6893: FAILED assert(pg->peer_info.count(so))

Added by Samuel Just over 8 years ago. Updated about 8 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2015-08-17 15:39:51.750439 7facf0b6c700 5 osd.4 pg_epoch: 309 pg[0.30(unlocked)] enter Initial
2015-08-17 15:39:51.750453 7facf0b6c700 20 osd.4 pg_epoch: 309 pg[0.30(unlocked)] enter NotTrimming
2015-08-17 15:39:51.750461 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( DNE empty local-les=0 n=0 ec=0 les/c 0/0 0/0/0) [] r=0 lpr=0 crt=0'0 inactive] init role 0 up [4] acting [4,1] history ec=1 les/c 249/249 303/307/307 9 past_intervals
2015-08-17 15:39:51.750502 7facf0b6c700 5 write_log with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, dirty_divergent_priors: false, divergent_priors: 0, writeout_from: 4294967295'18446744073709551615, trimmed:
2015-08-17 15:39:51.750519 7facf0b6c700 7 osd.4 310 _create_lock_pg pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=0 pi=249-275/9 crt=0'0 mlcod 0'0 inactive]
2015-08-17 15:39:51.750527 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=0 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] handle_create
2015-08-17 15:39:51.750537 7facf0b6c700 5 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=0 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] exit Initial 0.000097 0 0.000000
2015-08-17 15:39:51.750548 7facf0b6c700 5 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=0 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] enter Reset
2015-08-17 15:39:51.750557 7facf0b6c700 20 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=0 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] set_last_peering_reset 309
2015-08-17 15:39:51.750565 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] Clearing blocked outgoing recovery messages
2015-08-17 15:39:51.750574 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] Not blocking outgoing recovery messages
2015-08-17 15:39:51.750585 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] update_heartbeat_peers -> 1,4
2015-08-17 15:39:51.750594 7facf0b6c700 20 osd.4 310 need_heartbeat_peer_update
2015-08-17 15:39:51.750596 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] take_waiters
2015-08-17 15:39:51.750605 7facf0b6c700 5 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] exit Reset 0.000056 1 0.000047
2015-08-17 15:39:51.750615 7facf0b6c700 5 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] enter Started
2015-08-17 15:39:51.750624 7facf0b6c700 5 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] enter Start
2015-08-17 15:39:51.750633 7facf0b6c700 1 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] state<Start>: transitioning to Primary
2015-08-17 15:39:51.750644 7facf0b6c700 5 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] exit Start 0.000019 0 0.000000
2015-08-17 15:39:51.750654 7facf0b6c700 5 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] enter Started/Primary
2015-08-17 15:39:51.750664 7facf0b6c700 5 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 inactive] enter Started/Primary/Peering
2015-08-17 15:39:51.750673 7facf0b6c700 5 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetInfo
2015-08-17 15:39:51.750684 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] _calc_past_interval_range: already have past intervals back to 249
2015-08-17 15:39:51.750695 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior interval(307-275 up [5](5) acting [5,2](5))
2015-08-17 15:39:51.750706 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior interval(303-306 up [4](4) acting [1,5](1) maybe_went_rw)
2015-08-17 15:39:51.750718 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior interval(280-302 up [1](1) acting [1,5](1) maybe_went_rw)
2015-08-17 15:39:51.750729 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior interval(279-279 up [1](1) acting [5,2](5))
2015-08-17 15:39:51.750739 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior interval(276-278 up [](-1) acting [5,2](5) maybe_went_rw)
2015-08-17 15:39:51.750749 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior interval(270-275 up [5](5) acting [5,2](5) maybe_went_rw)
2015-08-17 15:39:51.750760 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior interval(268-269 up [5](5) acting [5](5))
2015-08-17 15:39:51.750770 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior interval(264-267 up [2](2) acting [2](2))
2015-08-17 15:39:51.750780 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior interval(249-263 up [3,2](3) acting [3,2](3) maybe_went_rw)
2015-08-17 15:39:51.750790 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior prior osd.3 is down
2015-08-17 15:39:51.750799 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior final: probe 1,2,4,5 down 3 blocked_by {}
2015-08-17 15:39:51.750810 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] up_thru 303 < same_since 307, must notify monitor
2015-08-17 15:39:51.750821 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: querying info from osd.1
2015-08-17 15:39:51.750832 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: querying info from osd.2
2015-08-17 15:39:51.750842 7facf0b6c700 10 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: querying info from osd.5
2015-08-17 15:39:51.750854 7facf0b6c700 15 osd.4 pg_epoch: 309 pg[0.30( empty local-les=0 n=0 ec=1 les/c 249/249 303/307/307) [4]/[4,1] r=0 lpr=309 pi=249-275/9 crt=0'0 mlcod 0'0 peering] publish_stats_to_osd 309:1

PriorSet: build_prior interval(307-275 up [5](5) acting [5,2](5)) indicates probably an issue rebuilding the prior set.

ubuntu@teuthology:/ceph/teuthology-archive/samuelj-2015-08-17_14:38:16-rados-wip-sam-working-distro-basic-multi/1019231/remote

Based on master around f4853d88318a3c18ffd3f067020d486f92dc8a97 (proxy-write changes, shouldn't be related)


Related issues 1 (0 open1 closed)

Has duplicate Ceph - Bug #12688: pg import can cause bogus interval (first > last)Duplicate08/13/2015

Actions
Actions #1

Updated by Samuel Just over 8 years ago

Sage says same_primary_since was probably wrong.

Actions #2

Updated by Sage Weil over 8 years ago

  • Status changed from New to 12
Actions #3

Updated by Sage Weil over 8 years ago

-2042> 2015-09-09 01:08:28.147484 7f823a00c700 10 osd.0 pg_epoch: 1242 pg[0.2c7( empty local-les=0 n=0 ec=1 les/c 1138/1140 1242/1242/1242) [0,3] r=0 lpr=1242 pi=1140-1206/11 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior interval(1242-1206 up [0,3](0) acting [0
,3](0))

another instance of the bogus pg interval, /var/lib/teuthworker/archive/sage-2015-09-08_17:29:00-rados-wip-sage-testing---basic-multi/1047194

Actions #4

Updated by Sage Weil about 8 years ago

  • Status changed from 12 to Can't reproduce
Actions

Also available in: Atom PDF