Bug #2210
osd: some PGs remain remapped or degraded
Description
Some PGs remain in 'remapped' or 'degraded' status after adding an osd server.
The steps to reproduce the bug:
1. Create a new ceph cluster on one server. The ceph.conf file is attached; a hypothetical sketch of what such a configuration might look like is given after this step's output below.
mkdir /tmp/ceph
mkcephfs -c /etc/ceph/ceph.conf --prepare-monmap -d /tmp/ceph
mkfs.btrfs /dev/sdb
sudo mount /dev/sdb /mnt/osd.0
sudo mkcephfs --init-local-daemons osd -d /tmp/ceph/
sudo mkcephfs --init-local-daemons mds -d /tmp/ceph/
mkcephfs --prepare-mon -d /tmp/ceph
sudo mkcephfs --init-local-daemons mon -d /tmp/ceph
sudo service ceph start
sudo cp /tmp/ceph/keyring.admin /etc/ceph/keyring.client.admin
As we have only one osd server and the replication level is set to 2 by default, all PGs are degraded. Now 'sudo ceph -s' outputs:
2012-03-26 10:43:24.101281    pg v6: 198 pgs: 198 active+degraded; 8730 bytes data, 1268 KB used, 67930 MB / 70006 MB avail
2012-03-26 10:43:24.102380   mds e4: 1/1/1 up {0=0=up:active}
2012-03-26 10:43:24.102462   osd e3: 1 osds: 1 up, 1 in
2012-03-26 10:43:24.102616   log 2012-03-26 10:41:14.676241 mon.0 192.168.12.201:6789/0 5 : [INF] mds.0 192.168.12.201:6800/18839 up:active
2012-03-26 10:43:24.102731   mon e1: 1 mons at {0=192.168.12.201:6789/0}
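For reference, the attached ceph.conf is not reproduced in this report. A minimal single-host configuration of that era would look roughly like the sketch below; it is only an assumption for orientation (the hostname 'server01' is hypothetical), with the monitor address, osd.0 device, and keyring paths inferred from the commands and output above.

[global]
        auth supported = cephx
        keyring = /etc/ceph/keyring.$name

[mon]
        mon data = /mnt/mon.$id
[mon.0]
        host = server01                 # hypothetical hostname of the first server
        mon addr = 192.168.12.201:6789

[mds]
[mds.0]
        host = server01

[osd]
        osd data = /mnt/osd.$id
        osd journal = /mnt/osd.$id/journal
[osd.0]
        host = server01
        btrfs devs = /dev/sdb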
2. Add a new osd server on another computer to the cluster
a. copy ceph.conf to the new osd server and add a section:
[osd.1]
        host = server03
        btrfs devs = /dev/sdb2
        osd journal = /dev/sda3
b. get the monmap on the mon server with 'sudo ceph mon getmap -o /tmp/monmap' and copy it to '/tmp/monmap' on the new osd server
c. create the new osd on the new server:
sudo mkfs.btrfs /dev/sdb2
sudo mount /dev/sdb2 /mnt/osd.1
sudo ceph-osd -i 1 --mkfs --monmap /tmp/monmap --mkkey
d. copy /etc/ceph/keyring.osd.1 from the new osd host to /tmp on the mon host, then on the mon host run:
sudo ceph auth add osd.1 osd 'allow *' mon 'allow rwx' -i /tmp/keyring.osd.1
sudo ceph osd setmaxosd 2
e. start the osd daemon on the new osd host
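The exact command used to start the daemon is not shown in the report. On a sysvinit-based installation of that time it would typically have been one of the following, given here only as plausible examples:

sudo service ceph start osd.1
# or start the daemon directly:
sudo ceph-osd -i 1 -c /etc/ceph/ceph.conf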
Now, 'ceph -s' outputs:
2012-03-26 11:23:20.080953    pg v30: 198 pgs: 198 active+degraded; 8730 bytes data, 1884 KB used, 1923 GB / 1927 GB avail
2012-03-26 11:23:20.082102   mds e9: 1/1/1 up {0=0=up:active}
2012-03-26 11:23:20.082184   osd e14: 2 osds: 2 up, 2 in
2012-03-26 11:23:20.082339   log 2012-03-26 11:21:54.361487 mon.0 192.168.12.201:6789/0 7 : [INF] osd.1 192.168.12.203:6800/9713 boot
2012-03-26 11:23:20.082484   mon e1: 1 mons at {0=192.168.12.201:6789/0}
3. Include the new osd in data placement
sudo ceph osd getcrushmap -o /tmp/crush
crushtool -d /tmp/crush -o /tmp/crush.txt
# edit crush.txt; the edited file is attached
vi /tmp/crush.txt
crushtool -c /tmp/crush.txt -o /tmp/crush.new
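The edited crush.txt is attached rather than reproduced here. Presumably the edit adds osd.1 as a device and places it in its own host bucket so the placement rules can select it; a hypothetical fragment of such an edit (the bucket id and weight are assumptions) would look roughly like:

# devices
device 0 osd.0
device 1 osd.1

# buckets (fragment)
host server03 {
        id -4                   # hypothetical bucket id
        alg straw
        hash 0                  # rjenkins1
        item osd.1 weight 1.000
}

The commands above stop at compiling the new map; given the osdmap epoch increase seen in the next step, the recompiled map was then loaded back into the cluster, presumably with:

sudo ceph osd setcrushmap -i /tmp/crush.new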
4. Watch cluster activity with 'ceph -w'. Finally, 'ceph -s' outputs:
2012-03-26 11:38:25.954504    pg v121: 198 pgs: 174 active+clean, 9 active+remapped, 15 active+degraded; 8730 bytes data, 2784 KB used, 1923 GB / 1927 GB avail
2012-03-26 11:38:25.956305   mds e9: 1/1/1 up {0=0=up:active}
2012-03-26 11:38:25.956522   osd e18: 2 osds: 2 up, 2 in
2012-03-26 11:38:25.956814   log 2012-03-26 11:35:23.955189 osd.1 192.168.12.203:6800/9713 97 : [INF] 2.3d scrub ok
2012-03-26 11:38:25.957011   mon e1: 1 mons at {0=192.168.12.201:6789/0}
'ceph osd dump' outputs:
2012-03-26 11:41:00.413699 mon <- [osd,dump]
2012-03-26 11:41:00.414630 mon.0 -> 'dumped osdmap epoch 18' (0)
epoch 18
fsid 3a82be90-56e1-4f57-ae53-94c46ef325aa
created 2012-03-26 10:41:11.824618
modifed 2012-03-26 11:30:45.981408
flags
pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 1 owner 0
max_osd 2
osd.0 up in weight 1 up_from 11 up_thru 15 down_at 10 last_clean_interval [2,9) 192.168.12.201:6801/20205 192.168.12.201:6802/20205 192.168.12.201:6803/20205 exists,up
osd.1 up in weight 1 up_from 14 up_thru 17 down_at 8 last_clean_interval [5,7) 192.168.12.203:6800/9713 192.168.12.203:6801/9713 192.168.12.203:6802/9713 exists,up
pg_temp 0.e [1,0]
pg_temp 0.14 [1,0]
pg_temp 0.21 [1,0]
pg_temp 1.d [1,0]
pg_temp 1.13 [1,0]
pg_temp 1.20 [1,0]
pg_temp 2.c [1,0]
pg_temp 2.12 [1,0]
pg_temp 2.1f [1,0]
blacklist 192.168.12.201:6800/18839 expires 2012-03-26 11:44:27.224895
History
#1 Updated by Josh Durgin over 11 years ago
- Category set to OSD
- Source changed from Development to Community (user)
#2173 has some osd logs and related info for the same problem on a less clean cluster. Thanks for the detailed steps to reproduce!
#2 Updated by Sage Weil over 11 years ago
- Status changed from New to Duplicate
This is actually a CRUSH problem; see #2047.