Bug #11399
closed
unittest_osd_types fails on wip-newstore
Description
It is a transient error (see log.txt.gz for the full log).
[ RUN ] PGLogTest.rewind_divergent_log
osd/PGLog.cc: In function 'void PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7fb60e4bb7c0 time 2015-04-15 09:52:58.549159
osd/PGLog.cc: 472: FAILED assert(newhead >= log.tail)
ceph version 0.93-1031-g24196c9 (24196c9152a6cf3c1877c0e119702186f45d0549)
This run comes from https://github.com/ceph/ceph/pull/4369, which makes a trivial change that cannot be blamed for this error.
Files
Updated by Loïc Dachary about 9 years ago
FAIL: ./src/unittest_osd_types.log
Running main() from gmock_main.cc
[==========] Running 29 tests from 10 test cases.
[----------] Global test environment set-up.
[----------] 6 tests from hobject
[ RUN ] hobject.prefixes0
test/osd/types.cc:37: Failure
Value of: prefixes_correct
Actual: { "0000000000000000.02A" }
Expected: prefixes_out
Which is: { "0000000000000000.045" }
[ FAILED ] hobject.prefixes0 (0 ms)
[ RUN ] hobject.prefixes1
test/osd/types.cc:53: Failure
Value of: prefixes_correct
Actual: { "0000000000000014.F0", "0000000000000014.F4", "0000000000000014.F8", "0000000000000014.FC" }
Expected: prefixes_out
Which is: { "0000000000000014.F0", "0000000000000014.F1", "0000000000000014.F2", "0000000000000014.F3" }
[ FAILED ] hobject.prefixes1 (0 ms)
[ RUN ] hobject.prefixes2
test/osd/types.cc:73: Failure
The output above appears earlier in the log.
Updated by Loïc Dachary about 9 years ago
- Project changed from Ceph to sepia
- Subject changed from osd/PGLog.cc: 472: FAILED assert(newhead >= log.tail) to rex003 RAM failure
In the context of https://github.com/ceph/ceph/pull/3946, Kefu was able to reproduce the problem on rex003 but never on his own machine. Combined with the fact that hobject_t::get_prefixes is pure computation and is unlikely to fail because of an environmental issue, this suggests that some of the RAM on rex003 is faulty.
Updated by Loïc Dachary about 9 years ago
- Assignee set to Loïc Dachary
memtester is running on ubuntu@rex003 in a screen session, started as sudo memtester 60G
Updated by Loïc Dachary about 9 years ago
Increased CPU activity to maximize the chances of exposing bad RAM:
[ubuntu@rex003 ~]$ for i in $(seq 1 20); do while : ; do : ; done & done
[1] 13978
[2] 13979
[3] 13980
[4] 13981
[5] 13982
[6] 13983
[7] 13984
[8] 13985
[9] 13986
[10] 13987
[11] 13988
[12] 13989
[13] 13990
[14] 13991
[15] 13992
[16] 13993
[17] 13994
[18] 13995
[19] 13996
[20] 13997
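The one-liner above leaves its 20 busy loops running until they are killed by hand. A minimal sketch of a variant that stops them after a fixed time, assuming a POSIX shell; burn_cpu and its two parameters are hypothetical names, not something used in this ticket:

```shell
# Hypothetical helper (not from the ticket): start COUNT busy loops,
# keep them running for DURATION seconds, then stop them all.
burn_cpu() {
    count=$1
    duration=$2
    pids=""
    i=0
    while [ "$i" -lt "$count" ]; do
        # Same busy loop as in the one-liner above, started in the background.
        while : ; do : ; done &
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep "$duration"
    # Word splitting of $pids is intentional: one argument per PID.
    kill $pids
    # Reap the loops; wait reports the signal as a non-zero status, so ignore it.
    wait $pids 2>/dev/null || true
    echo "stopped $count busy loops"
}
```

For example, burn_cpu 20 600 would load 20 cores for ten minutes while memtester runs in another terminal.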
Updated by Loïc Dachary about 9 years ago
memtester hung: kill -9 does nothing and strace -p shows nothing. Restarted sudo memtester 60G after a reboot --force, hoping for better results, this time without the CPU activity.
Updated by Loïc Dachary about 9 years ago
The first run was successful. During the second run, a busy loop was running on 20 cores. The third run is in progress and shows no sign of error.
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok
Updated by Loïc Dachary about 9 years ago
- Project changed from sepia to Ceph
- Subject changed from rex003 RAM failure to unittest_osd_types fails on wip-newstore
- Status changed from 12 to Rejected
At this point in time, all pull requests targeting wip-newstore are expected to fail make check.