Bug #57890
cmd_getval() throws but many callers don't catch the exception
0%
Description
In https://github.com/ceph/ceph/pull/23557 we switched cmd_getval()
to throw on error. This family of functions have around 500 of users while only few were fitted with try {} catch
as – possibly – even an unhandled exception can be deemed a lesser evil than a byzantine fault (due to letting a program to execute further after ignoring an error). However, recently https://tracker.ceph.com/issues/54558 popped up and ignited the discussion (see https://github.com/ceph/ceph/pull/45547#pullrequestreview-1099607585).
Could we tackle the problem by switching back to error codes while using [[nodiscard]]
to hunt down all callers who ignore the errors?
History
#1 Updated by Radoslaw Zarzynski over 1 year ago
- Description updated (diff)
#2 Updated by Yaarit Hatuka over 1 year ago
For reference, here are crashes with `cmd_getval` in their backtrace:
http://telemetry.front.sepia.ceph.com:4000/d/Nvj6XTaMk/spec-search?orgId=1&var-substr_1=cmd_getval&var-substr_2=&var-substr_3=&var-majors_affected=&var-minors_affected=&var-assert_function=&var-assert_condition=&var-sig_v1=&var-sig_v2=&var-daemons=&var-only_new_fingerprints=false&var-status_description=All&var-only_open=false
#3 Updated by Laura Flores over 1 year ago
- Priority changed from Normal to High
#4 Updated by Ben Gao over 1 year ago
I am working on this for the potential damage to storage cluster. Altogether there are 360 call(cmd_getval) to be looked at. they are located at:
src/common(14)
src/messages(6)
src/mon(308)
src/osd(29)
Will commit changes by the folder.
#5 Updated by Radoslaw Zarzynski 6 months ago
There is a related PR: https://github.com/ceph/ceph/pull/49279. However, IIUC it fixes part of the occurrences.