pre-luminous: aio_read returns erroneous data when rados_osd_op_timeout is set but not reached
In Gnocchi, we use the python-rados API, and we recently encountered data corruption when "rados_osd_op_timeout" is set.
After digging, we found that aio_read() neither returns the expected data nor reports any error.
The issue on the Gnocchi side: https://github.com/gnocchixyz/gnocchi/pull/190
This has been worked around by calling read() instead of aio_read().
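For illustration, the workaround could look roughly like the sketch below. It is not the code from the Gnocchi pull request: the helper name read_whole_object and the chunk size are assumptions, and the only python-rados call it relies on is the synchronous Ioctx.read(key, length, offset).

```python
def read_whole_object(ioctx, name, chunk_size=8192):
    """Read an object with synchronous read() calls only.

    Hypothetical helper: `ioctx` is expected to be a rados.Ioctx (or anything
    exposing the same read(key, length, offset) method).
    """
    chunks = []
    offset = 0
    while True:
        # Synchronous read; an empty result means we reached the end.
        data = ioctx.read(name, chunk_size, offset)
        if not data:
            break
        chunks.append(data)
        offset += len(data)
    return b"".join(chunks)
```

With a connected cluster this would be called as read_whole_object(ioctx, "my-object"), where ioctx comes from cluster.open_ioctx(pool).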
The Ceph version was 10.2.7, but I can reproduce it on many other versions.
I have attached a script to reproduce the issue. Its actual output:
no timeout read(): 'my fancy blob' : True
with timeout read(): 'my fancy blob' : True
no timeout aio_read(): 'my fancy blob' (length or errno: 13): True
with timeout aio_read(): 'exc_traceback' (length or errno: 13): False
The last line shows that, with the timeout set, aio_read() reports the expected length (13) but returns unrelated data instead of the expected blob.
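For readers without the attachment, the comparison can be sketched as follows. This is not the attached script, only an approximation of what it checks: the helper names (aio_read_blocking, compare_reads, open_ioctx), the conffile path, and the wait value are assumptions. The python-rados calls used are Rados(conffile=...), conf_set(), connect(), open_ioctx(), write_full(), read(), and aio_read(object_name, length, offset, oncomplete), whose callback receives the completion and the data read.

```python
import threading


def aio_read_blocking(ioctx, name, length=8192, offset=0, wait=30.0):
    """Issue aio_read() and block until its callback has delivered the data."""
    done = threading.Event()
    result = {}

    def on_complete(completion, data):
        # The callback runs on a librados thread; hand the result back.
        result["data"] = data
        result["rc"] = completion.get_return_value()
        done.set()

    # Keep a reference to the completion while the operation is in flight.
    completion = ioctx.aio_read(name, length, offset, on_complete)
    if not done.wait(wait):
        raise RuntimeError("aio_read callback did not fire")
    return result["data"], result["rc"]


def compare_reads(ioctx, name, blob):
    """Write blob, then report whether read() and aio_read() return it."""
    ioctx.write_full(name, blob)
    sync_data = ioctx.read(name)
    async_data, rc = aio_read_blocking(ioctx, name)
    return sync_data == blob, async_data == blob, rc


def open_ioctx(pool, conffile="/etc/ceph/ceph.conf", osd_op_timeout=None):
    """Connect to the cluster, optionally setting rados_osd_op_timeout first."""
    import rados  # imported lazily so the helpers above work without it

    cluster = rados.Rados(conffile=conffile)
    if osd_op_timeout is not None:
        cluster.conf_set("rados_osd_op_timeout", str(osd_op_timeout))
    cluster.connect()
    return cluster, cluster.open_ioctx(pool)
```

Running compare_reads() once on an ioctx from open_ioctx(pool) and once on one from open_ioctx(pool, osd_op_timeout=10) should, on an affected version, show the aio_read() comparison failing only in the second case.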
#6 Updated by Kefu Chai about 2 years ago
- Category set to Correctness/Safety
- Status changed from Verified to Need Review
- Assignee set to Kefu Chai
- Release set to jewel
- Component(RADOS) librados added
#8 Updated by Nathan Cutler about 2 years ago
- Status changed from Need Review to Pending Backport
- Backport set to jewel
Fixed in Infernalis by https://github.com/ceph/ceph/commit/64bca33ae76646879e6801c45e6d91852e488f8b
Needs backport to jewel.