Bug #7116
mon: pg_temp left behind if pool deleted while pg_temp exist
Description
I bumped the pg_num/pgp_num for pool 3 and then deleted pool 3 quickly afterwards:
./ceph osd dump
....
pg_temp 3.2 [2,3]
pg_temp 3.5 [2,1]
I modified the mon to remove any pg_temp whose pool doesn't exist, but the OSD seemed to keep re-adding them to the OSDMap.
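The condition described above (a pg_temp entry is stale when its pool no longer exists) can also be checked from outside the monitor. The following is only a rough sketch, not the monitor change itself: it assumes that each existing pool appears in the dump as a line starting with "pool <id>", and the function name find_orphaned_pg_temp is made up for illustration.

```shell
#!/bin/sh
# Sketch: print pg_temp lines whose pool id has no matching "pool <id> ..." line.
# Reads `ceph osd dump`-style text on stdin.
# Assumption: existing pools are listed as lines beginning with "pool <id>".
# find_orphaned_pg_temp is a hypothetical helper name, not a ceph command.
find_orphaned_pg_temp() {
    awk '
        # Remember every pool id that still exists.
        $1 == "pool"    { pools[$2] = 1 }
        # Buffer pg_temp lines together with the pool part of the pgid (e.g. 3 in 3.2).
        $1 == "pg_temp" { split($2, parts, "."); temp[++n] = $0; temp_pool[n] = parts[1] }
        END {
            for (i = 1; i <= n; i++)
                if (!(temp_pool[i] in pools))
                    print temp[i]
        }
    '
}
```

Usage would be along the lines of "./ceph osd dump | find_orphaned_pg_temp". The pg_temp lines are buffered until the end of input so that the result does not depend on whether pool lines come before or after them in the dump.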
History
#1 Updated by David Zafman over 9 years ago
- Assignee set to David Zafman
#2 Updated by Ian Colle over 9 years ago
- Priority changed from Normal to High
#3 Updated by David Zafman over 9 years ago
- Status changed from New to Fix Under Review
#4 Updated by David Zafman over 9 years ago
- Status changed from Fix Under Review to 12
- Assignee changed from David Zafman to Joao Eduardo Luis
The first cut at fixing this problem is in wip-7116. Sage had comments on the pull request for this branch https://github.com/ceph/ceph/pull/1087.
Below is a script that created the orphaned pg_temps when run on my build machine from the "src" directory of a build tree.
#! /bin/sh
OSD=4 ./vstart.sh -l -n -d -o "osd_min_pg_log_entries=5" -o "osd_max_pg_log_entries=10"
sleep 10
while(true)
do
  ./rados rmpool testpgtemp testpgtemp --yes-i-really-really-mean-it
  ./rados mkpool testpgtemp
  ./ceph osd pool set testpgtemp size 1
  ./rados -p testpgtemp bench 240 write --no-cleanup
  ./ceph osd pool set testpgtemp size 4
  while(true)
  do
    ./ceph osd dump > foo
    clear
    date
    cat foo
    grep pg_temp foo
    if [ $? = "0" ]; then
      break
    fi
  done
  echo removing...
  ./rados rmpool testpgtemp testpgtemp --yes-i-really-really-mean-it
  ./ceph osd dump | grep pg_temp > /dev/null
  if [ $? = "0" ]; then
    break
  fi
done
#5 Updated by Joao Eduardo Luis over 9 years ago
- Subject changed from pg_temp left behind if pool deleted while pg_temp exist to mon: pg_temp left behind if pool deleted while pg_temp exist
- Category set to Monitor
#6 Updated by Joao Eduardo Luis over 9 years ago
- Status changed from 12 to Fix Under Review
wip-7116-joao ; https://github.com/ceph/ceph/pull/1153
Haven't been able to reproduce using David's test for well over an hour now, whereas it used to be a matter of minutes.
#7 Updated by Sage Weil over 9 years ago
- Status changed from Fix Under Review to Resolved
- Source changed from other to Development
#8 Updated by Dan van der Ster over 9 years ago
Is there a way to clean up orphaned pg_temps left in a cluster from before this patch existed? We still have quite a few:
- ceph osd dump | grep pg_temp | wc -l
1352
even after upgrading to 0.67.7.
Sorry to update this resolved ticket.
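To see which pool ids the leftover entries counted above belong to, the pg_temp lines can be grouped by the pool part of the pgid. This is a minimal sketch that only reports counts and removes nothing; the function name count_pg_temp_by_pool is hypothetical.

```shell
#!/bin/sh
# Sketch: count pg_temp entries per pool id.
# Reads `ceph osd dump`-style text on stdin; prints "<pool id> <count>" per line.
# count_pg_temp_by_pool is a hypothetical helper name, not a ceph command.
count_pg_temp_by_pool() {
    awk '
        # The pgid is "<pool id>.<pg>", so the pool id is the part before the dot.
        $1 == "pg_temp" { split($2, parts, "."); count[parts[1]]++ }
        END { for (p in count) print p, count[p] }
    ' | sort -n
}
```

Usage would be along the lines of "ceph osd dump | count_pg_temp_by_pool". Pool ids that show up here but have no corresponding pool in the dump are the orphaned ones.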