Bug #7610
closed
Memory corruption during rados bench
Description
This is ceph 0.77-655-g195d53a-1saucy
After doing parallel 4194306-byte object writes/reads to erasure-coded (EC) pools via rados bench, any subsequent sequential read smaller than 4194281 bytes seems to cause rados to segfault with memory corruption errors. It appears to be repeatable. Interestingly, if the 4MB writes/reads are not done first, smaller reads appear to work fine.
Looking at the core file with gdb, it looks like the source of this is in common/timer.cc, where some kind of callback is supposed to be erased on line 99. That's all gdb is telling me right now with a simple backtrace. Core dump and rados executable included.
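For anyone trying to reproduce this, the sequence described above might look roughly like the sketch below. It is a dry run that only prints the commands. The pool name is a hypothetical placeholder; the config path, the 300-second duration, the 32 concurrent ops and the 4096-byte small read size are borrowed from the valgrind command later in this report, and the large size is the figure quoted above. Whether these exact invocations trigger the crash is an assumption, not something confirmed here.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction sequence; by default it only prints commands.
# POOL is a hypothetical pool name (assumption). CONF, the 300 s duration,
# 32 concurrent ops and SMALL_SIZE mirror the valgrind command in this report.
POOL="${POOL:-ec-bench-pool}"
CONF="${CONF:-/tmp/cbt/ceph/ceph.conf}"
BIG_SIZE=4194306     # object size quoted in the description
SMALL_SIZE=4096      # read size used in the valgrind run

run() {
    # Print the command unless DRY_RUN=0 is set explicitly.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

repro() {
    # Phase 1: large-object writes, then sequential reads of the same objects.
    run rados -c "$CONF" -p "$POOL" -b "$BIG_SIZE" bench 300 write --concurrent-ios 32 --no-cleanup
    run rados -c "$CONF" -p "$POOL" -b "$BIG_SIZE" bench 300 seq --concurrent-ios 32 --no-cleanup
    # Phase 2: the smaller sequential read that reportedly segfaults.
    run rados -c "$CONF" -p "$POOL" -b "$SMALL_SIZE" bench 300 seq --concurrent-ios 32 --no-cleanup
}

repro
```

Setting DRY_RUN=0 would execute the commands against a real cluster instead of printing them.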
Files
Updated by Mark Nelson about 10 years ago
- File valgrind.out added
- Severity deleted (3 - minor)
Ran rados under valgrind and valgrind is not happy, but I'm not sure it's really telling us much. Sam suspects general memory corruption (which perhaps this backs up).
Valgrind command used:
valgrind --tool=memcheck --leak-check=yes --soname-synonyms=somalloc=*tcmalloc* /usr/bin/rados -c /tmp/cbt/ceph/ceph.conf -p rados-bench`hostname -s`-0 -b 4096 bench 300 seq --concurrent-ios 32 --no-cleanup 2> valgrind.out
Updated by Mark Nelson about 10 years ago
Performed further tests with tcmalloc's HEAPCHECK but without much luck.
I also ran through the same tests with 3x replication pools and have not been able to reproduce with replication instead of EC.