Bug #7116

closed

mon: pg_temp left behind if pool deleted while pg_temp exist

Added by David Zafman over 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: High
Assignee: Joao Eduardo Luis
Category: Monitor
Target version: -
% Done: 0%
Source: Development
Severity: 3 - minor

Description

I bumped the pg_num/pgp_num for pool 3 and then deleted pool 3 quickly afterwards:

./ceph osd dump
....
pg_temp 3.2 [2,3]
pg_temp 3.5 [2,1]

I modified the mon to remove any pg_temp whose pool doesn't exist, but the OSD seemed to keep re-adding them to the OSDMap.
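One way to check a dump for leftovers like the two entries above is to compare the pool id in each pg_temp line against the pools still listed. The following is only a rough sketch: it assumes the dump contains "pool <id> ..." lines for existing pools and "pg_temp <pool>.<pg> [...]" lines as shown above, and list_orphaned_pg_temp is a hypothetical helper name, not a Ceph command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): reads "ceph osd dump"-style text
# on stdin and prints the pg_temp lines whose pool id has no "pool" line.
list_orphaned_pg_temp() {
  awk '
    # "pool 3 ..." lines name the pools that still exist.
    $1 == "pool" { live[$2] = 1 }
    # Buffer pg_temp lines so the check does not depend on line order.
    $1 == "pg_temp" { entries[++n] = $0 }
    END {
      for (i = 1; i <= n; i++) {
        split(entries[i], f, " ")   # f[2] is the PG id, e.g. "3.2"
        split(f[2], pg, ".")        # pg[1] is the pool id, e.g. "3"
        if (!(pg[1] in live)) print entries[i]
      }
    }
  '
}
```

From "src" in a build tree it could be used as: ./ceph osd dump | list_orphaned_pg_temp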

Actions #1

Updated by David Zafman over 10 years ago

  • Assignee set to David Zafman
Actions #2

Updated by Ian Colle over 10 years ago

  • Priority changed from Normal to High
Actions #3

Updated by David Zafman over 10 years ago

  • Status changed from New to Fix Under Review
Actions #4

Updated by David Zafman about 10 years ago

  • Status changed from Fix Under Review to 12
  • Assignee changed from David Zafman to Joao Eduardo Luis

The first cut at fixing this problem is in wip-7116. Sage had comments on the pull request for this branch https://github.com/ceph/ceph/pull/1087.

Below is a script which created the orphaned pg_temps when run on my build machine in the dir "src" in a build tree.

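#!/bin/sh
# Reproducer for orphaned pg_temp entries; run from "src" in a build tree.

# Start a 4-OSD vstart cluster with small PG log limits.
OSD=4 ./vstart.sh -l -n -d -o "osd_min_pg_log_entries=5" -o "osd_max_pg_log_entries=10"
sleep 10

while true
do
  # Recreate the test pool (the rmpool fails on the first pass, when no pool exists yet).
  ./rados rmpool testpgtemp testpgtemp --yes-i-really-really-mean-it
  ./rados mkpool testpgtemp
  ./ceph osd pool set testpgtemp size 1
  ./rados -p testpgtemp bench 240 write --no-cleanup
  ./ceph osd pool set testpgtemp size 4

  # Poll the OSDMap until at least one pg_temp entry shows up.
  while true
  do
    ./ceph osd dump > foo
    clear
    date
    cat foo
    if grep pg_temp foo
    then
      break
    fi
  done

  # Delete the pool while pg_temp entries still exist.
  echo removing...
  ./rados rmpool testpgtemp testpgtemp --yes-i-really-really-mean-it

  # Stop as soon as pg_temp entries are still present after the pool removal.
  if ./ceph osd dump | grep pg_temp > /dev/null
  then
    break
  fi
done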
Actions #5

Updated by Joao Eduardo Luis about 10 years ago

  • Subject changed from pg_temp left behind if pool deleted while pg_temp exist to mon: pg_temp left behind if pool deleted while pg_temp exist
  • Category set to Monitor
Actions #6

Updated by Joao Eduardo Luis about 10 years ago

  • Status changed from 12 to Fix Under Review

wip-7116-joao ; https://github.com/ceph/ceph/pull/1153

Haven't been able to reproduce using David's test for well over an hour now, and it used to be a matter of minutes.

Actions #7

Updated by Sage Weil about 10 years ago

  • Status changed from Fix Under Review to Resolved
  • Source changed from other to Development
Actions #8

Updated by Dan van der Ster about 10 years ago

Is there a way to clean up orphaned pg_temps that are in a cluster from before this patch existed? We still have quite a few

# ceph osd dump | grep pg_temp | wc -l
1352

even after upgrading to 0.67.7.

Sorry to update this resolved ticket.
