Bug #6796
closed
ceph mons interpretting pg splits very wrong
Added by Tamilarasi muthamizhan over 10 years ago.
Updated over 10 years ago.
Description
logs are copied to mira055.front.sepia.ceph.com:/home/ubuntu/bug_6776_1
to reproduce the bug,
roles:
- [mon.a, mon.b, mds.a, osd.0, osd.1, osd.2]
- [mon.c, osd.3, osd.4, client.0]
targets:
ubuntu@mira055.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCuGXr9u0bZulWME/jgz9DahtJZoS6GQtyck22cSVLQ62NRidREQwSlJ/1VQTdfhiFehKQVw/U9mVt1ljlF/xbrbqpq+p7cdrZLnePGUau7d12cF4yedp/BV4xWdIT2MBjY1tX2a3G4OJc9fQdep96BtkdKhSGm5TUOcu/Avwhr9rbcSwQM/ZmbKGwIXEZJUHtUY+dH4IZmBoxwieF+bQmfTpk1E/VFg8VaApcdBTA3vJ7FtsB5mu+ktYAuiCpAHCW4eBDkaZ7kQoBg1WguzqYW6N1GGUuVstGazEgNaJ3A7dTt212phZbLZVJRcJnX5F+poMVMo4OgwJszZs/6gsb5
ubuntu@mira080.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDGnNwnG4C8IU3gTmWaNB9YU0gbNeoBAcyD18JRuLJlLMKaZgvD2qvDgjis/4n1Fn0w7yY9YILNAI+fRlifaZRjg0nNyjIB3MpYbK/7oB12sO3R/fpNhA8FU6bt0V/9XQWrdWLe2s1PlTVgucMOVEJnp+3eFSR+3thfR9XXHqjdpOJ7Q1Ra/dLzjk/SP94i0EshvBlPl4kClyEWqKLlkMvZNGzKeJj+J9g8jagvsSJ65fdyi9qcaLVvicuOeL6T4ZGRypPaYfUNLRsPGRjTlqE2IZxC8RBtuyL4sVdFl2hBGT3HhOF4IQuFYTWgWzh9fUDJdwCVI7FH8O3id0QLirE7
tasks:
#- chef:
- install:
branch: dumpling
- ceph:
conf:
mon:
debug mon: 20
debug ms: 1
debug paxos: 20
osd:
debug ms: 1
debug osd: 20
client:
debug objecter: 20
- install.upgrade:
osd.0:
branch: emperor
- ceph.restart: [osd.0, osd.1, osd.2]
- thrashosds:
timeout: 1200
chance_pgnum_grow: 1
chance_pgpnum_fix: 1
- ceph.restart:
daemons: [mon.a]
wait-for-healthy: false
wait-for-osds-up: true
- rados:
clients: [client.0]
ops: 4000
objects: 50
op_weights:
read: 100
write: 100
delete: 50
snap_create: 50
snap_remove: 50
rollback: 50
copy_from: 50
- ceph.restart:
daemons: [mon.b]
wait-for-healthy: false
wait-for-osds-up: true
- ceph.wait_for_mon_quorum: [a, b]
- workunit:
branch: dumpling
clients:
client.0:
- rados/test.sh
The actual problem is that the mon sees
2013-11-18T15:04:50.683 DEBUG:teuthology.orchestra.run:Running [10.214.135.126]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool set rbd pgp_num 20'
2013-11-18T15:04:50.824 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.650170 7f90b0c5c700 10 client.?.objecter ms_handle_connect 0x7f90ac01e990
2013-11-18T15:04:50.825 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.650209 7f90b0c5c700 10 client.?.objecter resend_mon_ops
2013-11-18T15:04:50.827 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.653450 7f90b2c60700 10 client.4217.objecter maybe_request_map subscribing (onetime) to next osd map
2013-11-18T15:04:50.829 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.654188 7f90b0c5c700 3 client.4217.objecter handle_osd_map got epochs [11,11] > 0
2013-11-18T15:04:50.829 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.654200 7f90b0c5c700 3 client.4217.objecter handle_osd_map decoding full epoch 11
2013-11-18T15:04:50.829 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.654337 7f90b0c5c700 20 client.4217.objecter dump_active .. 0 homeless
2013-11-18T15:04:50.830 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.654424 7f90b0c5c700 3 client.4217.objecter handle_osd_map ignoring epochs [11,11] <= 11
2013-11-18T15:04:50.830 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.654435 7f90b0c5c700 20 client.4217.objecter dump_active .. 0 homeless
2013-11-18T15:04:50.907 INFO:teuthology.task.ceph.mon.c.err:[10.214.134.140]: 2013-11-18 15:04:56.733108 7fe7b234f700 -1 bad boost::get: key val is not type long
2013-11-18T15:04:50.934 INFO:teuthology.task.ceph.mon.c.err:[10.214.134.140]: 2013-11-18 15:04:56.759372 7fe7b234f700 -1 0x7fe7b234ab18
2013-11-18T15:04:51.020 INFO:teuthology.orchestra.run.err:[10.214.135.126]: specified pgp_num 7497019 > pg_num 20
and eventually ends up with
2013-11-18T15:05:06.618 DEBUG:teuthology.orchestra.run:Running [10.214.135.126]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool set metadata pg_num 30'
2013-11-18T15:05:06.759 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.583891 7f951cbb9700 10 client.?.objecter ms_handle_connect 0x7f951801e990
2013-11-18T15:05:06.759 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.583964 7f951cbb9700 10 client.?.objecter resend_mon_ops
2013-11-18T15:05:06.761 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.587143 7f951ebbd700 10 client.4207.objecter maybe_request_map subscribing (onetime) to next osd map
2013-11-18T15:05:06.762 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.587909 7f951cbb9700 3 client.4207.objecter handle_osd_map got epochs [12,12] > 0
2013-11-18T15:05:06.762 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.587917 7f951cbb9700 3 client.4207.objecter handle_osd_map decoding full epoch 12
2013-11-18T15:05:06.763 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.588014 7f951cbb9700 20 client.4207.objecter dump_active .. 0 homeless
2013-11-18T15:05:06.763 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.588061 7f951cbb9700 3 client.4207.objecter handle_osd_map ignoring epochs [12,12] <= 12
2013-11-18T15:05:06.763 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.588071 7f951cbb9700 20 client.4207.objecter dump_active .. 0 homeless
2013-11-18T15:05:06.839 INFO:teuthology.task.ceph.mon.c.err:[10.214.134.140]: 2013-11-18 15:05:12.665072 7fe7b234f700 -1 bad boost::get: key val is not type long
2013-11-18T15:05:06.841 INFO:teuthology.task.ceph.mon.c.err:[10.214.134.140]: 2013-11-18 15:05:12.667426 7fe7b234f700 -1 0x7fe7b234ab18
2013-11-18T15:05:06.902 INFO:teuthology.orchestra.run.err:[10.214.135.126]: set pool 1 pg_num to 7497019
- Subject changed from ceph osds crashed when running a mix of dumpling and emperor to ceph mons interpretting pg splits very wrong
- Assignee changed from Samuel Just to Greg Farnum
Summarizing discussion with Sam and Dan:
the problem is caused by 2fe0d0d97af95c22db80800f5b9da51f672d9407, which changed the parameter type on "ceph osd pool set" from "CephInt" to "CephString". There are a number of ways this can cause issues with mixed-version monitor clusters:
1) on the client side, it can get a command description from an Emperor monitor and send the required commands to a Dumpling monitor (this is not a new issue, but is one that a fix needs to handle),
2) on the monitor side, an Emperor monitor can get well-formed commands and then forward them on to a Dumpling leader which does not understand them.
The proposed solution is to
1) be more careful about error checking on calls to cmd_getval(),
2) force monitors to only accept the intersection of commands which all monitors can interpret,
3) develop a different system for changing parameter types than this one (ie, duplicating commands with the new type?)
The real issue here is that the Dumpling code is not considering the return value of cmd_getval() when obtaining the pgnum, leaving an uninitialized variable to be used. This became noticeable due to the change in Emperor, which lets the client know it expects a string instead of an integer -- and being the Dumpling monitor unable to read the string, instead of failing it just goes on causing mayhem.
The fix we discussed this afternoon means to patch Emperor such that we revert the 'CephString' change in 2fe0d0d97af95c22db80800f5b9da51f672d9407 along with making the required adjustments so that a client is able to still set 'hashpspool' (instead of using 'true' or 'false' strings, use '1' and '0' integers).
The fix itself is little more than a temporary workaround until we reach a proper way of extending the monitor's existing commands without breaking backward compatibility. A possible way of achieving this would be defining an 'api_version', but there are a few challenges involved in supporting this. For this purpose I've opened ticket #6805 to deal with this.
Also opened ticket #6806 to deal with all the cmd_getval() calls ignoring the return value.
Proposed fix in wip-6796, pull request https://github.com/ceph/ceph/pull/857
- Status changed from New to In Progress
- Assignee changed from Greg Farnum to Joao Eduardo Luis
- Status changed from In Progress to Pending Backport
- Category changed from OSD to Monitor
- Priority changed from Immediate to Urgent
Downgrading to Urgent, since only waiting on Backport
- Status changed from Pending Backport to Resolved
Also available in: Atom
PDF