Bug #11399
closed
unittest_osd_types fails on wip-newstore
Added by Loïc Dachary about 9 years ago.
Updated about 9 years ago.
Description
It is a transient error (see log.txt.gz for the full log).
[ RUN ] PGLogTest.rewind_divergent_log
osd/PGLog.cc: In function 'void PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7fb60e4bb7c0 time 2015-04-15 09:52:58.549159
osd/PGLog.cc: 472: FAILED assert(newhead >= log.tail)
ceph version 0.93-1031-g24196c9 (24196c9152a6cf3c1877c0e119702186f45d0549)
This run comes from
https://github.com/ceph/ceph/pull/4369 which modifies something trivial that could not be blamed for this error.
Files
- Description updated (diff)
FAIL: ./src/unittest_osd_types.log
Running main() from gmock_main.cc
[==========] Running 29 tests from 10 test cases.
[----------] Global test environment set-up.
[----------] 6 tests from hobject
[ RUN ] hobject.prefixes0
test/osd/types.cc:37: Failure
Value of: prefixes_correct
Actual: { "0000000000000000.02A" }
Expected: prefixes_out
Which is: { "0000000000000000.045" }
[ FAILED ] hobject.prefixes0 (0 ms)
[ RUN ] hobject.prefixes1
test/osd/types.cc:53: Failure
Value of: prefixes_correct
Actual: { "0000000000000014.F0", "0000000000000014.F4", "0000000000000014.F8", "0000000000000014.FC" }
Expected: prefixes_out
Which is: { "0000000000000014.F0", "0000000000000014.F1", "0000000000000014.F2", "0000000000000014.F3" }
[ FAILED ] hobject.prefixes1 (0 ms)
[ RUN ] hobject.prefixes2
test/osd/types.cc:73: Failure
earlier in the log.
- Project changed from Ceph to sepia
- Subject changed from osd/PGLog.cc: 472: FAILED assert(newhead >= log.tail) to rex003 RAM failure
In the context of https://github.com/ceph/ceph/pull/3946, Kefu was able to reproduce the problem on rex003 but never on his own machine. Combined with the fact that hobject_t::get_prefixes is pure computation and can hardly fail because of an environmental issue, this suggests that some of the RAM on rex003 is not working as it should.
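One way to quantify a transient failure like this is to rerun the same deterministic test many times on the suspect machine and on a known-good one, then compare the failure counts. The sketch below is only an illustration: count_failures is a hypothetical helper, not part of Ceph, and the test binary path in the usage comment is an assumption derived from the log name above.

```shell
# count_failures RUNS CMD [ARGS...]
# Hypothetical helper: runs CMD the given number of times and prints how
# many runs exited with a non-zero status. The body runs in a subshell
# (parentheses) so its variables do not leak into the caller.
count_failures() (
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Only the exit status matters here, so output is discarded.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
)

# Usage (path is an assumption, adjust to the local build tree):
#   count_failures 100 ./src/unittest_osd_types
```

A non-zero count on one host only, for a test that is pure computation, points at that host rather than at the code.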
- Assignee set to Loïc Dachary
ubuntu@rex003 runs memtester in screen as sudo memtester 60G
The make check bot is paused.
Increase CPU activity to maximize the chances of discovering bad RAM:
[ubuntu@rex003 ~]$ for i in $(seq 1 20); do while : ; do : ; done & done
[1] 13978
[2] 13979
[3] 13980
[4] 13981
[5] 13982
[6] 13983
[7] 13984
[8] 13985
[9] 13986
[10] 13987
[11] 13988
[12] 13989
[13] 13990
[14] 13991
[15] 13992
[16] 13993
[17] 13994
[18] 13995
[19] 13996
[20] 13997
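The one-liner above leaves 20 background jobs that have to be killed by hand. A minimal sketch of the same idea with explicit cleanup follows; start_busy_loops, stop_busy_loops and BUSY_PIDS are hypothetical names introduced here for illustration and were not part of the original session.

```shell
# PIDs of the busy loops started by start_busy_loops (hypothetical helper).
BUSY_PIDS=""

# start_busy_loops N: start N background busy loops and remember their PIDs.
start_busy_loops() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        # Same busy loop as in the one-liner, detached from the terminal output.
        while : ; do : ; done >/dev/null 2>&1 &
        BUSY_PIDS="$BUSY_PIDS $!"
        i=$((i + 1))
    done
}

# stop_busy_loops: send SIGTERM to every loop that was started and forget the PIDs.
stop_busy_loops() {
    for pid in $BUSY_PIDS; do
        kill "$pid" 2>/dev/null
    done
    BUSY_PIDS=""
}

# Usage:
#   start_busy_loops 20
#   ... run the memory test ...
#   stop_busy_loops
```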
memtester hung: kill -9 did nothing and strace -p showed nothing. Restarted sudo memtester 60G after a reboot --force, hoping for better results, this time without the CPU activity.
The first run was successful. During the second run a busy loop was run on 20 cores. The third run is in progress and shows no sign of error.
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok
- Project changed from sepia to Ceph
- Subject changed from rex003 RAM failure to unittest_osd_types fails on wip-newstore
- Status changed from 12 to Rejected
At this point, all pull requests targeting wip-newstore are expected to fail make check.