Bug #11399

closed

unittest_osd_types fails on wip-newstore

Added by Loïc Dachary about 9 years ago. Updated about 9 years ago.

Status: Rejected
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

It is a transient error (see log.txt.gz for the full log).

[ RUN      ] PGLogTest.rewind_divergent_log
osd/PGLog.cc: In function 'void PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7fb60e4bb7c0 time 2015-04-15 09:52:58.549159
osd/PGLog.cc: 472: FAILED assert(newhead >= log.tail)
 ceph version 0.93-1031-g24196c9 (24196c9152a6cf3c1877c0e119702186f45d0549)

This run comes from https://github.com/ceph/ceph/pull/4369, which makes a trivial change that cannot plausibly be blamed for this error.


Files

log.txt.gz (45.9 KB): test log output, Loïc Dachary, 04/15/2015 10:18 AM
Actions #1

Updated by Loïc Dachary about 9 years ago

  • Description updated (diff)
Actions #2

Updated by Loïc Dachary about 9 years ago

FAIL: ./src/unittest_osd_types.log
Running main() from gmock_main.cc
[==========] Running 29 tests from 10 test cases.
[----------] Global test environment set-up.
[----------] 6 tests from hobject
[ RUN      ] hobject.prefixes0
test/osd/types.cc:37: Failure
Value of: prefixes_correct
  Actual: { "0000000000000000.02A" }
Expected: prefixes_out
Which is: { "0000000000000000.045" }
[  FAILED  ] hobject.prefixes0 (0 ms)
[ RUN      ] hobject.prefixes1
test/osd/types.cc:53: Failure
Value of: prefixes_correct
  Actual: { "0000000000000014.F0", "0000000000000014.F4", "0000000000000014.F8", "0000000000000014.FC" }
Expected: prefixes_out
Which is: { "0000000000000014.F0", "0000000000000014.F1", "0000000000000014.F2", "0000000000000014.F3" }
[  FAILED  ] hobject.prefixes1 (0 ms)
[ RUN      ] hobject.prefixes2
test/osd/types.cc:73: Failure

These failures appear earlier in the log.
Actions #3

Updated by Loïc Dachary about 9 years ago

  • Project changed from Ceph to sepia
  • Subject changed from osd/PGLog.cc: 472: FAILED assert(newhead >= log.tail) to rex003 RAM failure

In the context of https://github.com/ceph/ceph/pull/3946, Kefu was able to reproduce the problem on rex003 but never on his own machine. Since hobject_t::get_prefixes is pure computation and can hardly fail because of an environmental issue, this suggests that some of the RAM on rex003 is faulty.
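
One way to compare machines is to run the same test many times on each and count how often it fails. The helper below is only a sketch: count_failures is a hypothetical name, not part of Ceph or its test tooling. It runs any command a given number of times and prints the number of non-zero exits.

```shell
# Hypothetical helper (not part of Ceph): run a command N times and
# print how many of those runs exited with a non-zero status.
count_failures() {
    runs=$1
    shift
    failures=0
    for i in $(seq 1 "$runs"); do
        # Discard the command's output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    done
    echo "$failures"
}
```

Assuming the test binary has been built as ./src/unittest_osd_types (the path is a guess based on the log name in this report), count_failures 100 ./src/unittest_osd_types could be run on both rex003 and another machine. A non-zero count on only one of them would point at the host rather than the code.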

Actions #4

Updated by Loïc Dachary about 9 years ago

  • Assignee set to Loïc Dachary

memtester is running on ubuntu@rex003 inside screen, started with sudo memtester 60G.

Actions #5

Updated by Loïc Dachary about 9 years ago

The make check bot is paused.

Actions #6

Updated by Loïc Dachary about 9 years ago

Increase CPU activity to maximize the chances of exposing bad RAM:

[ubuntu@rex003 ~]$ for i in $(seq 1 20); do while : ; do : ; done & done
[1] 13978
[2] 13979
[3] 13980
[4] 13981
[5] 13982
[6] 13983
[7] 13984
[8] 13985
[9] 13986
[10] 13987
[11] 13988
[12] 13989
[13] 13990
[14] 13991
[15] 13992
[16] 13993
[17] 13994
[18] 13995
[19] 13996
[20] 13997
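
The loops above keep running until they are killed one by one. As a sketch only, the same load can be started and stopped through two small functions that remember the background PIDs. start_busy_loops, stop_busy_loops and BUSY_PIDS are hypothetical names introduced for illustration.

```shell
# Hypothetical helpers: start N busy loops in the background and
# remember their PIDs so they can be stopped later.
BUSY_PIDS=""

start_busy_loops() {
    count=$1
    BUSY_PIDS=""
    for i in $(seq 1 "$count"); do
        while : ; do : ; done &
        BUSY_PIDS="$BUSY_PIDS $!"
    done
}

stop_busy_loops() {
    # Word splitting of $BUSY_PIDS is intentional: one argument per PID.
    kill $BUSY_PIDS 2>/dev/null || true
    # Reap the killed loops; wait reports their signal status, which is ignored.
    wait $BUSY_PIDS 2>/dev/null || true
    BUSY_PIDS=""
}
```

For example, start_busy_loops 20 reproduces the load used here, start_busy_loops "$(nproc)" would start one loop per available core, and stop_busy_loops removes the load again before the next memtester run.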

Actions #7

Updated by Loïc Dachary about 9 years ago

memtester hung: kill -9 has no effect and strace -p shows nothing. Restarted sudo memtester 60G after a reboot --force, hoping for better results, this time without the extra CPU activity.

Actions #8

Updated by Loïc Dachary about 9 years ago

The first run was successful. During the second run, busy loops were running on 20 cores. The third run is in progress and shows no sign of error.

  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok

Actions #9

Updated by Loïc Dachary about 9 years ago

  • Project changed from sepia to Ceph
  • Subject changed from rex003 RAM failure to unittest_osd_types fails on wip-newstore
  • Status changed from 12 to Rejected

At this point, all pull requests targeting wip-newstore are expected to fail make check.
