Bug #7610: Memory corruption during rados bench - Ceph - Ceph

Bug #7610

closed

Memory corruption during rados bench

Added by Mark Nelson about 10 years ago. Updated about 10 years ago.

Status:

Resolved

Priority:

Urgent

Assignee:

Samuel Just

Source:

other

Description

This is ceph 0.77-655-g195d53a-1saucy

After doing parallel 4194306-byte object writes and reads to erasure-coded (EC) pools via rados bench, any further sequential read smaller than 4194281 bytes seems to cause rados to segfault with memory corruption errors. This appears to be repeatable. Interestingly, if the 4MB writes/reads are not done first, smaller reads appear to work fine.
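
The sequence above can be sketched as a small dry-run script. This is only a sketch under assumptions: the config path, pool name, 300-second duration, 32 concurrent I/Os and the 4096-byte size of the small read are borrowed from the valgrind command later in this report, and it is not confirmed that the original 4MB runs used the same duration and concurrency. The helper names `bench_cmd` and `repro_plan` are hypothetical.

```shell
#!/bin/sh
# Assumed defaults, taken from the valgrind command in this report;
# adjust them for the cluster under test.
CONF="${CONF:-/tmp/cbt/ceph/ceph.conf}"
POOL="${POOL:-rados-bench-$(hostname -s)-0}"
SECS="${SECS:-300}"
IOS="${IOS:-32}"

# Print one rados bench invocation: $1 = object size in bytes, $2 = mode.
bench_cmd() {
    echo "rados -c $CONF -p $POOL -b $1 bench $SECS $2 --concurrent-ios $IOS --no-cleanup"
}

# The three phases described in the report, in order.
repro_plan() {
    bench_cmd 4194306 write   # large object writes to the EC pool
    bench_cmd 4194306 seq     # large sequential reads
    bench_cmd 4096 seq        # smaller read that reportedly segfaults
}

# Dry run by default: only print the commands. Set RUN=1 to execute them.
if [ "${RUN:-0}" = "1" ]; then
    repro_plan | while IFS= read -r cmd; do sh -c "$cmd"; done
else
    repro_plan
fi
```

Running it without RUN=1 just lists the three commands, which makes it easy to check the sizes before touching a real pool.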

Looking at the core file with gdb, the source appears to be in common/timer.cc, where some kind of callback is supposed to be erased on line 99. That's all gdb is telling me right now with a simple backtrace. Core dump and rados executable are included.
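
For anyone picking up the attached core, a backtrace of every thread may show more than the simple one mentioned above. The following is a minimal sketch that only prints the gdb command to run. The file names are assumptions, since the names inside core.tgz are not stated here, and `bt_cmd` is a hypothetical helper.

```shell
#!/bin/sh
# Print a non-interactive gdb invocation that dumps a backtrace of all threads.
# $1 = path to the rados executable, $2 = path to the core file.
bt_cmd() {
    printf "gdb -batch -ex 'thread apply all bt' %s %s\n" "$1" "$2"
}

# Assumed paths; replace with the files extracted from core.tgz.
bt_cmd "${BIN:-/usr/bin/rados}" "${CORE:-core}"
```

The executable passed to gdb should be the one that produced the core, otherwise the symbols will not line up.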

Files

core.tgz (412 KB), Mark Nelson, 03/05/2014 06:24 AM
valgrind.out (7.34 KB), Mark Nelson, 03/05/2014 11:08 AM

Updated by Ian Colle about 10 years ago

Priority changed from Normal to Urgent

Updated by Mark Nelson about 10 years ago

File valgrind.out added
Severity deleted (~~3 - minor~~)

Ran rados under valgrind and valgrind is not happy, but I'm not sure it's really telling us much. Sam suspects general memory corruption (which perhaps this backs up).

Valgrind command used:

valgrind --tool=memcheck --leak-check=yes --soname-synonyms=somalloc=*tcmalloc* /usr/bin/rados -c /tmp/cbt/ceph/ceph.conf -p rados-bench-`hostname -s`-0 -b 4096 bench 300 seq --concurrent-ios 32 --no-cleanup 2> valgrind.out
