Bug #2210
osd: some PGs remain remapped or degraded
Description
Some PGs remain in 'remapped' or 'degraded' status after adding an osd server.
The steps to reproduce the bug:
1. Create a new ceph cluster on one server. The ceph.conf file is attached; a hypothetical sketch of what such a configuration might look like is given after this step's output below.
mkdir /tmp/ceph
mkcephfs -c /etc/ceph/ceph.conf --prepare-monmap -d /tmp/ceph
mkfs.btrfs /dev/sdb
sudo mount /dev/sdb /mnt/osd.0
sudo mkcephfs --init-local-daemons osd -d /tmp/ceph/
sudo mkcephfs --init-local-daemons mds -d /tmp/ceph/
mkcephfs --prepare-mon -d /tmp/ceph
sudo mkcephfs --init-local-daemons mon -d /tmp/ceph
sudo service ceph start
sudo cp /tmp/ceph/keyring.admin /etc/ceph/keyring.client.admin
As we have only one osd server and the replication level is set to 2 by default, all PGs are degraded. Now 'sudo ceph -s' outputs:
2012-03-26 10:43:24.101281    pg v6: 198 pgs: 198 active+degraded; 8730 bytes data, 1268 KB used, 67930 MB / 70006 MB avail
2012-03-26 10:43:24.102380   mds e4: 1/1/1 up {0=0=up:active}
2012-03-26 10:43:24.102462   osd e3: 1 osds: 1 up, 1 in
2012-03-26 10:43:24.102616   log 2012-03-26 10:41:14.676241 mon.0 192.168.12.201:6789/0 5 : [INF] mds.0 192.168.12.201:6800/18839 up:active
2012-03-26 10:43:24.102731   mon e1: 1 mons at {0=192.168.12.201:6789/0}
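For reference, the attached ceph.conf is not reproduced in this report. A minimal single-host configuration of that era would look roughly like the sketch below; it is only an assumption for orientation (the hostname 'server01' is hypothetical), with the monitor address, osd.0 device, and keyring paths inferred from the commands and output above.

[global]
        auth supported = cephx
        keyring = /etc/ceph/keyring.$name

[mon]
        mon data = /mnt/mon.$id
[mon.0]
        host = server01                 # hypothetical hostname of the first server
        mon addr = 192.168.12.201:6789

[mds]
[mds.0]
        host = server01

[osd]
        osd data = /mnt/osd.$id
        osd journal = /mnt/osd.$id/journal
[osd.0]
        host = server01
        btrfs devs = /dev/sdb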
2. Add a new osd server on another computer to the cluster
a. copy ceph.conf to the new osd server and add a section:
[osd.1]
        host = server03
        btrfs devs = /dev/sdb2
        osd journal = /dev/sda3
b. get the monmap on the mon server with 'sudo ceph mon getmap -o /tmp/monmap' and copy it to '/tmp/monmap' on the new osd server
c. create the new osd on the new server:
sudo mkfs.btrfs /dev/sdb2
sudo mount /dev/sdb2 /mnt/osd.1
sudo ceph-osd -i 1 --mkfs --monmap /tmp/monmap --mkkey
d. copy /etc/ceph/keyring.osd.1 from the new osd host to /tmp on the mon host, then on the mon host run:
sudo ceph auth add osd.1 osd 'allow *' mon 'allow rwx' -i /tmp/keyring.osd.1
sudo ceph osd setmaxosd 2
e. start the osd daemon on the new osd host
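The exact command used to start the daemon is not shown in the report. On a sysvinit-based installation of that time it would typically have been one of the following, given here only as plausible examples:

sudo service ceph start osd.1
# or start the daemon directly:
sudo ceph-osd -i 1 -c /etc/ceph/ceph.conf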
Now, 'ceph -s' outputs:
2012-03-26 11:23:20.080953    pg v30: 198 pgs: 198 active+degraded; 8730 bytes data, 1884 KB used, 1923 GB / 1927 GB avail
2012-03-26 11:23:20.082102   mds e9: 1/1/1 up {0=0=up:active}
2012-03-26 11:23:20.082184   osd e14: 2 osds: 2 up, 2 in
2012-03-26 11:23:20.082339   log 2012-03-26 11:21:54.361487 mon.0 192.168.12.201:6789/0 7 : [INF] osd.1 192.168.12.203:6800/9713 boot
2012-03-26 11:23:20.082484   mon e1: 1 mons at {0=192.168.12.201:6789/0}
3. Include the new osd in data placement
sudo ceph osd getcrushmap -o /tmp/crush
crushtool -d /tmp/crush -o /tmp/crush.txt
# edit crush.txt; the edited file is attached
vi /tmp/crush.txt
crushtool -c /tmp/crush.txt -o /tmp/crush.new
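The edited crush.txt is attached rather than reproduced here. Presumably the edit adds osd.1 as a device and places it in its own host bucket so the placement rules can select it; a hypothetical fragment of such an edit (the bucket id and weight are assumptions) would look roughly like:

# devices
device 0 osd.0
device 1 osd.1

# buckets (fragment)
host server03 {
        id -4                   # hypothetical bucket id
        alg straw
        hash 0                  # rjenkins1
        item osd.1 weight 1.000
}

The commands above stop at compiling the new map; given the osdmap epoch increase seen in the next step, the recompiled map was then loaded back into the cluster, presumably with:

sudo ceph osd setcrushmap -i /tmp/crush.new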
4. Watch cluster activity with 'ceph -w'. Finally, 'ceph -s' outputs:
2012-03-26 11:38:25.954504    pg v121: 198 pgs: 174 active+clean, 9 active+remapped, 15 active+degraded; 8730 bytes data, 2784 KB used, 1923 GB / 1927 GB avail
2012-03-26 11:38:25.956305   mds e9: 1/1/1 up {0=0=up:active}
2012-03-26 11:38:25.956522   osd e18: 2 osds: 2 up, 2 in
2012-03-26 11:38:25.956814   log 2012-03-26 11:35:23.955189 osd.1 192.168.12.203:6800/9713 97 : [INF] 2.3d scrub ok
2012-03-26 11:38:25.957011   mon e1: 1 mons at {0=192.168.12.201:6789/0}
'ceph osd dump' outputs:
2012-03-26 11:41:00.413699 mon <- [osd,dump]
2012-03-26 11:41:00.414630 mon.0 -> 'dumped osdmap epoch 18' (0)
epoch 18
fsid 3a82be90-56e1-4f57-ae53-94c46ef325aa
created 2012-03-26 10:41:11.824618
modifed 2012-03-26 11:30:45.981408
flags
pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 1 owner 0
max_osd 2
osd.0 up in weight 1 up_from 11 up_thru 15 down_at 10 last_clean_interval [2,9) 192.168.12.201:6801/20205 192.168.12.201:6802/20205 192.168.12.201:6803/20205 exists,up
osd.1 up in weight 1 up_from 14 up_thru 17 down_at 8 last_clean_interval [5,7) 192.168.12.203:6800/9713 192.168.12.203:6801/9713 192.168.12.203:6802/9713 exists,up
pg_temp 0.e [1,0]
pg_temp 0.14 [1,0]
pg_temp 0.21 [1,0]
pg_temp 1.d [1,0]
pg_temp 1.13 [1,0]
pg_temp 1.20 [1,0]
pg_temp 2.c [1,0]
pg_temp 2.12 [1,0]
pg_temp 2.1f [1,0]
blacklist 192.168.12.201:6800/18839 expires 2012-03-26 11:44:27.224895
History
#1 Updated by Josh Durgin over 11 years ago
- Category set to OSD
- Source changed from Development to Community (user)
#2173 has some osd logs and related info for the same problem on a less clean cluster. Thanks for the detailed steps to reproduce!
#2 Updated by Sage Weil over 11 years ago
- Status changed from New to Duplicate
This is actually a CRUSH problem; see #2047.