Bug #7610

closed

Memory corruption during rados bench

Added by Mark Nelson about 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other

Description

This is ceph 0.77-655-g195d53a-1saucy

After doing parallel 4194306-byte object writes/reads to erasure-coded (EC) pools via rados bench, any further sequential read smaller than 4194281 bytes seems to cause rados to segfault with memory corruption errors. This appears to be repeatable. Interestingly, if the 4MB writes/reads are not done first, smaller reads appear to work fine.
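For anyone trying to reproduce this, the sequence described above can be sketched roughly as follows. This is a dry-run outline only, not the exact commands that were run: it prints the rados bench invocations rather than executing them. The pool name `ec-test-pool` is hypothetical, the 300-second duration and 32 concurrent I/Os are borrowed from the valgrind command later in this report, and 4096 is just one example of a read size below 4194281.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction sequence: prints commands, runs nothing.
# Assumptions (not from the original report): the pool name, and reusing
# the 300 s duration and 32 concurrent I/Os for every phase.
CONF="${CONF:-/tmp/cbt/ceph/ceph.conf}"
POOL="${POOL:-ec-test-pool}"   # hypothetical EC pool name
LARGE=4194306                  # object size used for the large writes/reads
SMALL=4096                     # example read size below 4194281 bytes

repro_cmds() {
    # Phase 1: large-object writes, leaving the objects in place
    echo "rados -c $CONF -p $POOL -b $LARGE bench 300 write --concurrent-ios 32 --no-cleanup"
    # Phase 2: sequential reads at the same large size
    echo "rados -c $CONF -p $POOL -b $LARGE bench 300 seq --concurrent-ios 32 --no-cleanup"
    # Phase 3: smaller sequential read, the step reported to segfault
    echo "rados -c $CONF -p $POOL -b $SMALL bench 300 seq --concurrent-ios 32 --no-cleanup"
}

repro_cmds
```

Running the three printed commands in order against an EC pool should correspond to the failing case; skipping the first two should correspond to the case where smaller reads work fine.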

Looking at the core file with gdb, it looks like the source of this is in common/timer.cc, where some kind of callback is supposed to be getting erased on line 99. That's all gdb is telling me right now with a simple backtrace. The core dump and rados executable are attached.


Files

core.tgz (412 KB) Mark Nelson, 03/05/2014 06:24 AM
valgrind.out (7.34 KB) Mark Nelson, 03/05/2014 11:08 AM
Actions #1

Updated by Ian Colle about 10 years ago

  • Priority changed from Normal to Urgent
Actions #2

Updated by Mark Nelson about 10 years ago

Ran rados under valgrind, and valgrind is not happy, but I'm not sure it's really telling us much. Sam suspects general memory corruption (which perhaps this backs up).

Valgrind command used:

valgrind --tool=memcheck --leak-check=yes --soname-synonyms=somalloc=*tcmalloc* /usr/bin/rados -c /tmp/cbt/ceph/ceph.conf -p rados-bench`hostname -s`-0 -b 4096 bench 300 seq --concurrent-ios 32 --no-cleanup 2> valgrind.out

Actions #3

Updated by Mark Nelson about 10 years ago

Performed further tests with tcmalloc's HEAPCHECK but without much luck.

I also ran through the same tests with 3x replication pools and have not been able to reproduce with replication instead of EC.

Actions #4

Updated by Samuel Just about 10 years ago

  • Status changed from New to 7
Actions #5

Updated by Sage Weil about 10 years ago

  • Status changed from 7 to Resolved