Bug #6796
closedceph mons interpretting pg splits very wrong
0%
Description
logs are copied to mira055.front.sepia.ceph.com:/home/ubuntu/bug_6776_1
to reproduce the bug,
roles: - [mon.a, mon.b, mds.a, osd.0, osd.1, osd.2] - [mon.c, osd.3, osd.4, client.0] targets: ubuntu@mira055.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCuGXr9u0bZulWME/jgz9DahtJZoS6GQtyck22cSVLQ62NRidREQwSlJ/1VQTdfhiFehKQVw/U9mVt1ljlF/xbrbqpq+p7cdrZLnePGUau7d12cF4yedp/BV4xWdIT2MBjY1tX2a3G4OJc9fQdep96BtkdKhSGm5TUOcu/Avwhr9rbcSwQM/ZmbKGwIXEZJUHtUY+dH4IZmBoxwieF+bQmfTpk1E/VFg8VaApcdBTA3vJ7FtsB5mu+ktYAuiCpAHCW4eBDkaZ7kQoBg1WguzqYW6N1GGUuVstGazEgNaJ3A7dTt212phZbLZVJRcJnX5F+poMVMo4OgwJszZs/6gsb5 ubuntu@mira080.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDGnNwnG4C8IU3gTmWaNB9YU0gbNeoBAcyD18JRuLJlLMKaZgvD2qvDgjis/4n1Fn0w7yY9YILNAI+fRlifaZRjg0nNyjIB3MpYbK/7oB12sO3R/fpNhA8FU6bt0V/9XQWrdWLe2s1PlTVgucMOVEJnp+3eFSR+3thfR9XXHqjdpOJ7Q1Ra/dLzjk/SP94i0EshvBlPl4kClyEWqKLlkMvZNGzKeJj+J9g8jagvsSJ65fdyi9qcaLVvicuOeL6T4ZGRypPaYfUNLRsPGRjTlqE2IZxC8RBtuyL4sVdFl2hBGT3HhOF4IQuFYTWgWzh9fUDJdwCVI7FH8O3id0QLirE7 tasks: #- chef: - install: branch: dumpling - ceph: conf: mon: debug mon: 20 debug ms: 1 debug paxos: 20 osd: debug ms: 1 debug osd: 20 client: debug objecter: 20 - install.upgrade: osd.0: branch: emperor - ceph.restart: [osd.0, osd.1, osd.2] - thrashosds: timeout: 1200 chance_pgnum_grow: 1 chance_pgpnum_fix: 1 - ceph.restart: daemons: [mon.a] wait-for-healthy: false wait-for-osds-up: true - rados: clients: [client.0] ops: 4000 objects: 50 op_weights: read: 100 write: 100 delete: 50 snap_create: 50 snap_remove: 50 rollback: 50 copy_from: 50 - ceph.restart: daemons: [mon.b] wait-for-healthy: false wait-for-osds-up: true - ceph.wait_for_mon_quorum: [a, b] - workunit: branch: dumpling clients: client.0: - rados/test.sh
Updated by Samuel Just over 10 years ago
The actual problem is that the mon sees
2013-11-18T15:04:50.683 DEBUG:teuthology.orchestra.run:Running [10.214.135.126]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool set rbd pgp_num 20'
2013-11-18T15:04:50.824 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.650170 7f90b0c5c700 10 client.?.objecter ms_handle_connect 0x7f90ac01e990
2013-11-18T15:04:50.825 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.650209 7f90b0c5c700 10 client.?.objecter resend_mon_ops
2013-11-18T15:04:50.827 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.653450 7f90b2c60700 10 client.4217.objecter maybe_request_map subscribing (onetime) to next osd map
2013-11-18T15:04:50.829 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.654188 7f90b0c5c700 3 client.4217.objecter handle_osd_map got epochs [11,11] > 0
2013-11-18T15:04:50.829 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.654200 7f90b0c5c700 3 client.4217.objecter handle_osd_map decoding full epoch 11
2013-11-18T15:04:50.829 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.654337 7f90b0c5c700 20 client.4217.objecter dump_active .. 0 homeless
2013-11-18T15:04:50.830 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.654424 7f90b0c5c700 3 client.4217.objecter handle_osd_map ignoring epochs [11,11] <= 11
2013-11-18T15:04:50.830 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:04:56.654435 7f90b0c5c700 20 client.4217.objecter dump_active .. 0 homeless
2013-11-18T15:04:50.907 INFO:teuthology.task.ceph.mon.c.err:[10.214.134.140]: 2013-11-18 15:04:56.733108 7fe7b234f700 -1 bad boost::get: key val is not type long
2013-11-18T15:04:50.934 INFO:teuthology.task.ceph.mon.c.err:[10.214.134.140]: 2013-11-18 15:04:56.759372 7fe7b234f700 -1 0x7fe7b234ab18
2013-11-18T15:04:51.020 INFO:teuthology.orchestra.run.err:[10.214.135.126]: specified pgp_num 7497019 > pg_num 20
and eventually ends up with
2013-11-18T15:05:06.618 DEBUG:teuthology.orchestra.run:Running [10.214.135.126]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool set metadata pg_num 30'
2013-11-18T15:05:06.759 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.583891 7f951cbb9700 10 client.?.objecter ms_handle_connect 0x7f951801e990
2013-11-18T15:05:06.759 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.583964 7f951cbb9700 10 client.?.objecter resend_mon_ops
2013-11-18T15:05:06.761 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.587143 7f951ebbd700 10 client.4207.objecter maybe_request_map subscribing (onetime) to next osd map
2013-11-18T15:05:06.762 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.587909 7f951cbb9700 3 client.4207.objecter handle_osd_map got epochs [12,12] > 0
2013-11-18T15:05:06.762 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.587917 7f951cbb9700 3 client.4207.objecter handle_osd_map decoding full epoch 12
2013-11-18T15:05:06.763 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.588014 7f951cbb9700 20 client.4207.objecter dump_active .. 0 homeless
2013-11-18T15:05:06.763 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.588061 7f951cbb9700 3 client.4207.objecter handle_osd_map ignoring epochs [12,12] <= 12
2013-11-18T15:05:06.763 INFO:teuthology.orchestra.run.err:[10.214.135.126]: 2013-11-18 15:05:12.588071 7f951cbb9700 20 client.4207.objecter dump_active .. 0 homeless
2013-11-18T15:05:06.839 INFO:teuthology.task.ceph.mon.c.err:[10.214.134.140]: 2013-11-18 15:05:12.665072 7fe7b234f700 -1 bad boost::get: key val is not type long
2013-11-18T15:05:06.841 INFO:teuthology.task.ceph.mon.c.err:[10.214.134.140]: 2013-11-18 15:05:12.667426 7fe7b234f700 -1 0x7fe7b234ab18
2013-11-18T15:05:06.902 INFO:teuthology.orchestra.run.err:[10.214.135.126]: set pool 1 pg_num to 7497019
Updated by Samuel Just over 10 years ago
- Subject changed from ceph osds crashed when running a mix of dumpling and emperor to ceph mons interpretting pg splits very wrong
Updated by Samuel Just over 10 years ago
- Assignee changed from Samuel Just to Greg Farnum
Updated by Greg Farnum over 10 years ago
Summarizing discussion with Sam and Dan:
the problem is caused by 2fe0d0d97af95c22db80800f5b9da51f672d9407, which changed the parameter type on "ceph osd pool set" from "CephInt" to "CephString". There are a number of ways this can cause issues with mixed-version monitor clusters:
1) on the client side, it can get a command description from an Emperor monitor and send the required commands to a Dumpling monitor (this is not a new issue, but is one that a fix needs to handle),
2) on the monitor side, an Emperor monitor can get well-formed commands and then forward them on to a Dumpling leader which does not understand them.
The proposed solution is to
1) be more careful about error checking on calls to cmd_getval(),
2) force monitors to only accept the intersection of commands which all monitors can interpret,
3) develop a different system for changing parameter types than this one (ie, duplicating commands with the new type?)
Updated by Joao Eduardo Luis over 10 years ago
The real issue here is that the Dumpling code is not considering the return value of cmd_getval() when obtaining the pgnum, leaving an uninitialized variable to be used. This became noticeable due to the change in Emperor, which lets the client know it expects a string instead of an integer -- and being the Dumpling monitor unable to read the string, instead of failing it just goes on causing mayhem.
The fix we discussed this afternoon means to patch Emperor such that we revert the 'CephString' change in 2fe0d0d97af95c22db80800f5b9da51f672d9407 along with making the required adjustments so that a client is able to still set 'hashpspool' (instead of using 'true' or 'false' strings, use '1' and '0' integers).
The fix itself is little more than a temporary workaround until we reach a proper way of extending the monitor's existing commands without breaking backward compatibility. A possible way of achieving this would be defining an 'api_version', but there are a few challenges involved in supporting this. For this purpose I've opened ticket #6805 to deal with this.
Also opened ticket #6806 to deal with all the cmd_getval() calls ignoring the return value.
Proposed fix in wip-6796, pull request https://github.com/ceph/ceph/pull/857
Updated by Greg Farnum over 10 years ago
- Status changed from New to In Progress
- Assignee changed from Greg Farnum to Joao Eduardo Luis
Updated by Joao Eduardo Luis over 10 years ago
- Status changed from In Progress to Pending Backport
Updated by Joao Eduardo Luis over 10 years ago
- Category changed from OSD to Monitor
Updated by Ian Colle over 10 years ago
- Priority changed from Immediate to Urgent
Downgrading to Urgent, since only waiting on Backport
Updated by Sage Weil over 10 years ago
- Status changed from Pending Backport to Resolved