Bug #39355
closed
Running ceph command on a partially upgraded cluster might fail
Description
An upgrade from mimic to octopus fails because running a ceph command on a partially upgraded cluster can end with an error.
Environment:
mon0 192.168.1.10
mon1 192.168.1.11
mon2 192.168.1.12
mgr0 192.168.1.30
osd0 192.168.1.100
osd1 192.168.1.101
The upgrade proceeds node by node: mon0, mon1, mon2, mgr0, osd0, and osd1.
Once mon0 has been upgraded, this is what the cluster looks like:
{
    "mon": {
        "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)": 2,
        "ceph version 15.0.0-408-gc74cffd (c74cffd8a8529f99a46bad67803112be483a81ba) octopus (dev)": 1
    },
    "mgr": {
        "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)": 1
    },
    "osd": {
        "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)": 4
    },
    "mds": {},
    "overall": {
        "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)": 7,
        "ceph version 15.0.0-408-gc74cffd (c74cffd8a8529f99a46bad67803112be483a81ba) octopus (dev)": 1
    }
}
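A script can spot this mixed state by looking at the "overall" section of the output above. The following is a minimal sketch, assuming the JSON came from `ceph versions` (the report does not name the command) and has already been captured as a string. `release_names` and `is_mixed_version` are hypothetical helper names, not part of Ceph.

```python
import json


def release_names(versions_json):
    """Return the set of release names found in the "overall" section."""
    overall = json.loads(versions_json)["overall"]
    names = set()
    for version_string in overall:
        # Keys look like "ceph version 13.2.5 (<sha>) mimic (stable)",
        # so the release name is the second-to-last word.
        names.add(version_string.split()[-2])
    return names


def is_mixed_version(versions_json):
    """True when more than one distinct version string is running."""
    return len(json.loads(versions_json)["overall"]) > 1


# Trimmed copy of the output shown in this report.
sample = json.dumps({
    "overall": {
        "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)": 7,
        "ceph version 15.0.0-408-gc74cffd (c74cffd8a8529f99a46bad67803112be483a81ba) octopus (dev)": 1,
    }
})

if is_mixed_version(sample):
    print("partially upgraded:", sorted(release_names(sample)))
```

A check like this only reports that the cluster is mid-upgrade; it does not say which monitor a later command will reach.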
From either mon1 or mon2, a basic command like `ceph -s` might fail, depending on which monitor actually executes the command:
[root@mon1 ~]# ceph -s
Traceback (most recent call last):
File "/bin/ceph", line 1222, in <module>
retval = main()
File "/bin/ceph", line 1146, in main
sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs
cmd['sig'] = parse_funcsig(cmd['sig'])
File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig
raise JsonFormat(s)
ceph_argparse.JsonFormat: unknown type CephBool
[root@mon1 ~]# ceph -s
Traceback (most recent call last):
File "/bin/ceph", line 1222, in <module>
retval = main()
File "/bin/ceph", line 1146, in main
sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs
cmd['sig'] = parse_funcsig(cmd['sig'])
File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig
raise JsonFormat(s)
ceph_argparse.JsonFormat: unknown type CephBool
[root@mon1 ~]# ceph -s # <---- this time the command must have run on mon0.
cluster:
id: 68d9bc4b-ac11-43e0-850c-61a78a188b78
health: HEALTH_WARN
noout,norebalance flag(s) set
too few PGs per OSD (4 < min 30)
services:
mon: 3 daemons, quorum mon0,mon1,mon2 (age 2h)
mgr: mon0(active)
osd: 4 osds: 4 up, 4 in
flags noout,norebalance
To verify this, I added the -m flag to the same command, still run from mon1, to force execution on mon0, which is already upgraded:
[root@mon1 ~]# ceph -m 192.168.1.10:6789 -s
cluster:
id: 68d9bc4b-ac11-43e0-850c-61a78a188b78
health: HEALTH_WARN
noout,norebalance flag(s) set
too few PGs per OSD (4 < min 30)
services:
mon: 3 daemons, quorum mon0,mon1,mon2 (age 2h)
mgr: mon0(active)
osd: 4 osds: 4 up, 4 in
flags noout,norebalance
Adding `-m 192.168.1.10:6789` makes the command succeed every time.
As soon as I run the same command with `-m 192.168.1.11:6789` or `-m 192.168.1.12:6789`, it always fails:
[root@mon1 ~]# ceph -m 192.168.1.11:6789 -s
Traceback (most recent call last):
File "/bin/ceph", line 1222, in <module>
retval = main()
File "/bin/ceph", line 1146, in main
sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs
cmd['sig'] = parse_funcsig(cmd['sig'])
File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig
raise JsonFormat(s)
ceph_argparse.JsonFormat: unknown type CephBool
[root@mon1 ~]# ceph -m 192.168.1.12:6789 -s
Traceback (most recent call last):
File "/bin/ceph", line 1222, in <module>
retval = main()
File "/bin/ceph", line 1146, in main
sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs
cmd['sig'] = parse_funcsig(cmd['sig'])
File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig
raise JsonFormat(s)
ceph_argparse.JsonFormat: unknown type CephBool
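The tracebacks above show the client-side parser rejecting a command signature that names a type, CephBool, which this client version does not recognize. The sketch below illustrates that failure mode in simplified form. It is not the actual `ceph_argparse` code: the `KNOWN_TYPES` table, the descriptor layout, and the example signatures are assumptions made for illustration only.

```python
class JsonFormat(Exception):
    """Stand-in for the exception named in the traceback."""


# Hypothetical type table for an older client that predates CephBool.
KNOWN_TYPES = {"CephString", "CephInt", "CephChoices"}


def parse_funcsig(sig):
    """Validate a simplified signature.

    Each element is either a literal word (str) or a dict with
    "name" and "type" keys describing one argument.
    """
    parsed = []
    for desc in sig:
        if isinstance(desc, str):
            parsed.append(desc)
            continue
        if desc["type"] not in KNOWN_TYPES:
            # One unknown type aborts parsing of the whole command table,
            # so even unrelated commands can no longer be run.
            raise JsonFormat("unknown type " + desc["type"])
        parsed.append((desc["name"], desc["type"]))
    return parsed


try:
    parse_funcsig(["example", {"name": "flag", "type": "CephBool"}])
except JsonFormat as err:
    print("JsonFormat:", err)
```

Under this assumption, the failure does not depend on the command being run: the client fails while loading the command descriptions, before `-s` is ever evaluated.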
In ceph-ansible, when we upgrade a cluster, we have to run ceph commands before and after each node is upgraded.
Until now, depending on which monitor these commands landed on, the upgrade could fail right after the first monitor was upgraded.
I worked around this bug by using the -m flag, but I think it is still worth opening an issue for this behavior.
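For scripted upgrades, the -m workaround can be wrapped in a small helper. This is a rough sketch rather than what ceph-ansible does: `ceph_argv` and `run_on_first_working_mon` are hypothetical names, and falling back across several monitors is an assumption added here on top of the single pinned monitor used in this report. It relies only on the `-m` flag shown above and on the command exiting non-zero when it ends in a traceback.

```python
import subprocess


def ceph_argv(mon_addr, args):
    """Build a ceph command line pinned to one monitor with -m."""
    return ["ceph", "-m", mon_addr] + list(args)


def run_on_first_working_mon(mon_addrs, args, runner=subprocess.run):
    """Try each monitor address in order; return (address, result) for the first success."""
    for addr in mon_addrs:
        result = runner(ceph_argv(addr, args), capture_output=True, text=True)
        if result.returncode == 0:
            return addr, result
    raise RuntimeError("ceph %s failed on all monitors" % " ".join(args))


# Example call (requires a reachable cluster, so it is left commented out):
# addr, result = run_on_first_working_mon(
#     ["192.168.1.10:6789", "192.168.1.11:6789", "192.168.1.12:6789"], ["-s"])
# print(addr, result.stdout)
```

Listing the already upgraded monitor first keeps the common case to a single call; the `runner` parameter exists only so the helper can be exercised without a cluster.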
Updated by Igor Fedotov about 5 years ago
- Project changed from bluestore to Ceph
Updated by Greg Farnum almost 5 years ago
- Status changed from New to Closed
15.0.0 is obviously an in-development release; I believe I saw PRs go by fixing up issues with Ceph bool.
Updated by Chris MacNaughton over 4 years ago
I've seen exactly this error when upgrading from Mimic to Nautilus
Updated by Chris MacNaughton over 4 years ago
I've opened https://tracker.ceph.com/issues/41535 to track this issue on Mimic->Nautilus
Updated by Nathan Cutler over 4 years ago
- Related to Bug #41535: Trying to upgrade from Ceph Mimic to Nautilus can fail added