Project

General

Profile

Actions

Bug #7686

closed

osd spinning in agent_work

Added by Samuel Just about 10 years ago. Updated about 10 years ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Probably related to the num_dirty/archived hitset issue?

2014-03-11 16:05:15.959964 7fc99340a700 20 osd.0 212 agent_entry oids
2014-03-11 16:05:15.959998 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work max 4, flush idle, evict some, pos 0//0//-1
2014-03-11 16:05:15.961350 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work got 1 objects
2014-03-11 16:05:15.962256 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] populate_obc_watchers e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19
2014-03-11 16:05:15.962466 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] ReplicatedPG::check_blacklisted_obc_watchers for obc e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19
2014-03-11 16:05:15.962606 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] get_object_context 0x104d5180 e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19 rwstate(none n=0 w=0) oi:e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0) 0 > 1 read e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0)
2014-03-11 16:05:15.962710 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work skip (hit set) e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0)
2014-03-11 16:05:15.962976 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work final position 0//0//-1
2014-03-11 16:05:15.963106 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_choose_mode dirty 0 full 1
2014-03-11 16:05:15.963320 7fc99340a700 10 osd.0 212 agent_entry tiers 3, top is 1000000 with pgs 59, ops 0/4 active
2014-03-11 16:05:15.963356 7fc99340a700 20 osd.0 212 agent_entry oids
2014-03-11 16:05:15.963389 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work max 4, flush idle, evict some, pos 0//0//-1
2014-03-11 16:05:15.964674 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work got 1 objects
2014-03-11 16:05:15.965586 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] populate_obc_watchers e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19
2014-03-11 16:05:15.965803 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] ReplicatedPG::check_blacklisted_obc_watchers for obc e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19
2014-03-11 16:05:15.965968 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] get_object_context 0x104d2b00 e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19 rwstate(none n=0 w=0) oi:e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0) 0 -> 1 read e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0)
2014-03-11 16:05:15.966078 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work skip (hit set) e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0)
2014-03-11 16:05:15.966370 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work final position 0//0//-1
2014-03-11 16:05:15.966496 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_choose_mode dirty 0 full 1
2014-03-11 16:05:15.966706 7fc99340a700 10 osd.0 212 agent_entry tiers 3, top is 1000000 with pgs 59, ops 0/4 active
2014-03-11 16:05:15.966742 7fc99340a700 20 osd.0 212 agent_entry oids
2014-03-11 16:05:15.966773 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work max 4, flush idle, evict some, pos 0//0//-1
2014-03-11 16:05:15.968040 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work got 1 objects
2014-03-11 16:05:15.968906 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] populate_obc_watchers e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19
2014-03-11 16:05:15.969120 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] ReplicatedPG::check_blacklisted_obc_watchers for obc e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19
2014-03-11 16:05:15.969260 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] get_object_context 0x104d4100 e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19 rwstate(none n=0 w=0) oi:e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0) 0 -> 1 read e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0)
2014-03-11 16:05:15.969364 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work skip (hit set) e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0)
2014-03-11 16:05:15.969659 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work final position 0//0//-1
2014-03-11 16:05:15.969793 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_choose_mode dirty 0 full 1
2014-03-11 16:05:15.970003 7fc99340a700 10 osd.0 212 agent_entry tiers 3, top is 1000000 with pgs 59, ops 0/4 active
2014-03-11 16:05:15.970034 7fc99340a700 20 osd.0 212 agent_entry oids
2014-03-11 16:05:15.970063 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work max 4, flush idle, evict some, pos 0//0//-1
2014-03-11 16:05:15.971410 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work got 1 objects
2014-03-11 16:05:15.972305 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] populate_obc_watchers e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19
2014-03-11 16:05:15.972511 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] ReplicatedPG::check_blacklisted_obc_watchers for obc e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19
2014-03-11 16:05:15.972655 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] get_object_context 0x104d5180 e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19 rwstate(none n=0 w=0) oi:e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0) 0 -> 1 read e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0)
2014-03-11 16:05:15.972765 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work skip (hit set) e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0)
2014-03-11 16:05:15.973040 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work final position 0//0//-1
2014-03-11 16:05:15.973176 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_choose_mode dirty 0 full 1
2014-03-11 16:05:15.973408 7fc99340a700 10 osd.0 212 agent_entry tiers 3, top is 1000000 with pgs 59, ops 0/4 active
2014-03-11 16:05:15.973445 7fc99340a700 20 osd.0 212 agent_entry oids
2014-03-11 16:05:15.973483 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work max 4, flush idle, evict some, pos 0//0//-1
2014-03-11 16:05:15.973891 7fc99e420700 1 -
192.168.10.1:6802/6102 <== osd.14 192.168.10.1:0/13635 100812 ==== osd_ping(ping e212 stamp 2014-03-11 16:05:15.973529) v2 ==== 47+0+0 (3014438055 0 0) 0xb826a00 con 0x3b54940
2014-03-11 16:05:15.973928 7fc99e420700 1 -- 192.168.10.1:6802/6102 --> 192.168.10.1:0/13635 -- osd_ping(ping_reply e212 stamp 2014-03-11 16:05:15.973529) v2 -- ?+0 0xb872a00 con 0x3b54940
2014-03-11 16:05:15.973956 7fc99e420700 20 osd.0 212 _share_map_outgoing 0x3b53ce0 already has epoch 212
2014-03-11 16:05:15.975125 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work got 1 objects
2014-03-11 16:05:15.976042 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] populate_obc_watchers e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19
2014-03-11 16:05:15.976251 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] ReplicatedPG::check_blacklisted_obc_watchers for obc e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19
2014-03-11 16:05:15.976390 7fc99340a700 10 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] get_object_context 0x104d2b00 e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19 rwstate(none n=0 w=0) oi:e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0) 0 -> 1 read e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0)
2014-03-11 16:05:15.976494 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work skip (hit set) e6d/hit_set_19.e6d_archive_2014-03-10 15:03:33.014187_2014-03-10 15:04:33.164462/head/.ceph-internal/19(158'3 unknown.0.0:0 wrlock_by=unknown.0.0:0 s 83 uv0)
2014-03-11 16:05:15.976772 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_work final position 0//0//-1
2014-03-11 16:05:15.976907 7fc99340a700 20 osd.0 pg_epoch: 212 pg[19.e6d( v 158'5 (0'0,158'5] local-les=156 n=1 ec=148 les/c 156/156 150/150/148) [0] r=-1 lpr=150 crt=0'0 lcod 158'4 mlcod 158'4 active+clean] agent_choose_mode dirty 0 full 1


Related issues 1 (0 open1 closed)

Related to Ceph - Bug #7674: Cache pool configuration stallsResolvedSage Weil03/10/2014

Actions
Actions #1

Updated by Mark Nelson about 10 years ago

FYI, this was/is on burnupiX after setting an OSD out and bringing it back in.

Actions #2

Updated by David Zafman about 10 years ago

agent_entry() is hitting a case where there appears to be work to do, but we never evict a hit set archive. In the log above every time agent_work() is called there is only 1 object found which happens to be an hit set archive so it is skipped. But agent_choose_mode() doesn't disables that pg so that agent_entry() can sleep waiting to be triggered by real activity.

Actions #3

Updated by Sage Weil about 10 years ago

  • Status changed from New to Duplicate
  • Source changed from other to Development
Actions

Also available in: Atom PDF