
Support #20046

PGs stuck in active+remapped state

Added by Bjoern Teipel almost 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
% Done:

0%

Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)

I have attached a stuck PG dump to this ticket.
Here is the high-level status.

At this point we have not yet found the issue leading to this condition.

ceph pg dump_stuck unclean
ok

pg_stat state up up_primary acting acting_primary
6.1a3 active+remapped [16,15] 16 [16,15,12] 16
6.6b3 active+remapped [25,15] 25 [25,15,5] 25
6.4c active+remapped [26,15] 26 [26,15,5] 26

ceph pg map 6.1a3
osdmap e18487 pg 6.1a3 (6.1a3) -> up [16,15] acting [16,15,12]
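As an aside, the "remapped" condition above is simply the up set differing from the acting set: CRUSH could only map two OSDs, so the cluster keeps a third OSD in the acting set to hold the extra replica. A minimal Python sketch of that check, assuming the column layout of the `ceph pg dump_stuck` output pasted above (the `DUMP` string is just the three rows from this ticket):

```python
import ast

# Rows copied from the `ceph pg dump_stuck unclean` output above:
# pg_stat state up up_primary acting acting_primary
DUMP = """\
6.1a3 active+remapped [16,15] 16 [16,15,12] 16
6.6b3 active+remapped [25,15] 25 [25,15,5] 25
6.4c active+remapped [26,15] 26 [26,15,5] 26
"""

def remapped_pgs(dump):
    """Return (pgid, up, acting) for every PG whose up set != acting set."""
    out = []
    for line in dump.strip().splitlines():
        pgid, state, up, _up_primary, acting, _acting_primary = line.split()
        up = ast.literal_eval(up)          # e.g. "[16,15]" -> [16, 15]
        acting = ast.literal_eval(acting)  # e.g. "[16,15,12]" -> [16, 15, 12]
        if up != acting:
            out.append((pgid, up, acting))
    return out

for pgid, up, acting in remapped_pgs(DUMP):
    # CRUSH only produced len(up) OSDs; the acting set keeps one extra OSD
    # (osd.12 or osd.5 here) so the PG can stay active at full size.
    print(pgid, "up", up, "acting", acting)
```

In all three PGs the acting set is a strict superset of the up set, which matches the explanation given in the resolution below: CRUSH gave up before finding a full-size mapping.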

Log from osd.12:

2017-05-20 12:44:37.962450 7fb0ea2028c0 -1 ** ERROR: error flushing journal /var/lib/ceph/osd/ceph-12/journal for object store /var/lib/ceph/osd/ceph-12: (16) Device or resource busy

2017-05-22 08:09:31.750950 7f87a8cf5700 0 -- 10.40.204.55:6801/1367931 submit_message osd_op_reply(8059177 rbd_data.25b9ab28cb25d9.00000000000000b5 [set-alloc-hint object_size 8388608 write_size 8388608,write 2510848~4096] v18487'18885804 uv18885804 ondisk = 0) v6 remote, 10.40.204.102:0/971063084, failed lossy con, dropping message 0x38325080
2017-05-22 08:26:32.613715 7f879d546700 0 -- 10.40.204.55:6801/1367931 >> 10.40.204.101:0/915052960 pipe(0x3eeef000 sd=231 :6801 s=0 pgs=0 cs=0 l=1 c=0x2c829760).accept replacing existing (lossy) channel (new one lossy=1)
2017-05-22 09:49:56.145224 7f87732c4700 0 -- 10.40.204.55:6801/1367931 >> 10.40.204.106:0/1974653405 pipe(0x4663b000 sd=623 :6801 s=0 pgs=0 cs=0 l=1 c=0x1c4be840).accept replacing existing (lossy) channel (new one lossy=1)
2017-05-22 09:50:41.252918 7f8752323700 0 -- 10.40.204.55:6801/1367931 >> 10.40.204.104:0/947951625 pipe(0x3497c000 sd=112 :6801 s=0 pgs=0 cs=0 l=1 c=0x1c4bf760).accept replacing existing (lossy) channel (new one lossy=1)
2017-05-22 10:54:37.325969 7f8768896700 0 -- 10.40.204.55:6801/1367931 >> 10.40.204.101:0/1987788711 pipe(0x4e03e000 sd=484 :6801 s=0 pgs=0 cs=0 l=1 c=0xd7287e0).accept replacing existing (lossy) channel (new one lossy=1)
2017-05-22 11:44:34.683734 7f8777f10700 0 -- 10.40.204.55:6801/1367931 >> 10.40.204.110:0/3123734214 pipe(0x1a96a000 sd=441 :6801 s=0 pgs=0 cs=0 l=1 c=0x34f1bfa0).accept replacing existing (lossy) channel (new one lossy=1)
2017-05-22 12:44:56.073541 7f876aaea700 0 -- 10.40.204.55:6801/1367931 >> 10.40.204.110:0/4178892124 pipe(0x3669b000 sd=233 :6801 s=0 pgs=0 cs=0 l=1 c=0x2c828dc0).accept replacing existing (lossy) channel (new one lossy=1)
2017-05-22 13:27:39.462251 7f879ff80700 0 -- 10.40.204.55:6801/1367931 >> 10.40.204.108:0/318776049 pipe(0x3c69000 sd=494 :6801 s=0 pgs=0 cs=0 l=1 c=0x492ed860).accept replacing existing (lossy) channel (new one lossy=1)
2017-05-22 13:28:47.320133 7f8777e0f700 0 -- 10.40.204.55:6801/1367931 >> 10.40.204.107:0/3931882591 pipe(0x2965d000 sd=476 :6801 s=0 pgs=0 cs=0 l=1 c=0x492eab00).accept replacing existing (lossy) channel (new one lossy=1)

pg_dump_61a3.txt View (14.2 KB) Bjoern Teipel, 05/22/2017 07:53 PM

History

#1 Updated by Greg Farnum almost 3 years ago

  • Tracker changed from Bug to Support
  • Status changed from New to Resolved

This is a result of CRUSH failing to map the PG in your configuration. Look at the CRUSH tunables and max_tries.
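One hedged way to follow that advice on a Hammer (0.94) cluster, using only standard `ceph`/`crushtool` commands; the file names `cm.bin`/`cm.txt`/`cm.new` and the value 100 are illustrative, not a recommendation for this cluster:

```shell
# Dump the in-use crushmap and decompile it to text.
ceph osd getcrushmap -o cm.bin
crushtool -d cm.bin -o cm.txt

# Inspect the tunables block; "tunable choose_total_tries N" is the retry
# budget CRUSH gets before giving up on a placement attempt.
grep ^tunable cm.txt

# After editing cm.txt (e.g. raising choose_total_tries from 50 to 100),
# recompile and dry-run the map to see whether any PGs still map short.
crushtool -c cm.txt -o cm.new
crushtool -i cm.new --test --num-rep 3 --show-bad-mappings

# Only if the test mappings look sane, inject the new map.
ceph osd setcrushmap -i cm.new
```

`crushtool --test --show-bad-mappings` reports inputs for which CRUSH returned fewer than `--num-rep` OSDs, which is exactly the symptom behind the up set `[16,15]` seen in this ticket.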
