Project

General

Profile

Bug #15011

TEST_crush_reject_empty: ceph osd setcrushmap error on BTRFS

Added by Anonymous about 3 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
ceph cli
Target version:
-
Start date:
03/08/2016
Due date:
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

When running a make check on current master on my local btrfs system I have the following trace :

../qa/workunits/ceph-helpers.sh:1152: expect_failure: ./ceph osd setcrushmap -i testdir/osd-crush/empty_map.map
../qa/workunits/ceph-helpers.sh:1155: expect_failure: success=false
../qa/workunits/ceph-helpers.sh:1158: expect_failure: false
../qa/workunits/ceph-helpers.sh:1158: expect_failure: grep --quiet 'Error EINVAL' testdir/osd-crush/out
../qa/workunits/ceph-helpers.sh:1159: expect_failure: cat testdir/osd-crush/out
  • DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
    Traceback (most recent call last):
    File "./ceph", line 953, in <module>
    retval = main()
    File "./ceph", line 887, in main
    sigdict, inbuf, verbose)
    File "./ceph", line 470, in new_style_command
    inbuf=inbuf)
    File "/ceph/src/pybind/ceph_argparse.py", line 1270, in json_command
    raise RuntimeError('"{0}": exception {1}'.format(argdict, e))
    RuntimeError: "{'prefix': u'osd setcrushmap'}": exception "['{"prefix": "osd setcrushmap"}']": exception 'utf8' codec can't decode byte 0xef in position 23: invalid continuation byte

I'm adding to this bug the complete log file & the empty_map.map file.

osd-crush.sh.log View (902 KB) Anonymous, 03/08/2016 04:17 PM

empty_map.map (51 Bytes) Anonymous, 03/08/2016 04:17 PM

Associated revisions

Revision aec9cd23 (diff)
Added by Erwan Velu about 3 years ago

SubProcess: Avoid buffer corruption when calling err()

Some code like crush/CrushTester.cc, uses err() to get the output status of a
spawned command.

As per reported in bug #15011, some tests were failing because of some utf8
decoding issues. The implied buffers were generated by the output of err()
function of SubProcess.

err() is returning the str().c_str() value of 'errstr' object which is not
living much longer that the function itself.

This patch returns a std::string to avoid the temporary allocation and fixes
the associated err() calls accordingly.

Since this commit, the make check is now passing everytime.

Fixes: #15011

Signed-off-by: Erwan Velu <>

History

#1 Updated by Anonymous about 3 years ago

I got the same trace in another test :

../qa/workunits/cephtool/test.sh:1702: test_mon_crushmap_validation: printf '%s\n' '#!/bin/sh
cat > /dev/null
exit 0'
../qa/workunits/cephtool/test.sh:1707: test_mon_crushmap_validation: ceph osd setcrushmap -i /tmp/cephtool27928/map
  • DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
    2016-03-08 17:39:47.522648 7f8e0a1a2700 -1 WARNING: the following dangerous and experimental features are enabled: *
    2016-03-08 17:39:47.522655 7f8e0a1a2700 0 lockdep start
    2016-03-08 17:39:47.528534 7f8e0a1a2700 -1 WARNING: the following dangerous and experimental features are enabled: *
    set crush map
    2016-03-08 17:39:47.793462 7f8e0a1a2700 0 lockdep stop
    ../qa/workunits/cephtool/test.sh:1709: test_mon_crushmap_validation: printf '%s\n' '#!/bin/sh
    cat > /dev/null
    exit 1'
    ../qa/workunits/cephtool/test.sh:1714: test_mon_crushmap_validation: expect_false ceph osd setcrushmap -i /tmp/cephtool27928/map
    ../qa/workunits/cephtool/test.sh:30: expect_false: set -x
    ../qa/workunits/cephtool/test.sh:31: expect_false: ceph osd setcrushmap -i /tmp/cephtool27928/map
  • DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
    2016-03-08 17:39:47.975821 7fd1eb8f0700 -1 WARNING: the following dangerous and experimental features are enabled: *
    2016-03-08 17:39:47.975828 7fd1eb8f0700 0 lockdep start
    2016-03-08 17:39:47.981257 7fd1eb8f0700 -1 WARNING: the following dangerous and experimental features are enabled: *
    Traceback (most recent call last):
    File "/ceph/src/ceph", line 953, in <module>
    retval = main()
    File "/ceph/src/ceph", line 887, in main
    sigdict, inbuf, verbose)
    File "/ceph/src/ceph", line 470, in new_style_command
    inbuf=inbuf)
    File "/ceph/src/pybind/ceph_argparse.py", line 1270, in json_command
    raise RuntimeError('"{0}": exception {1}'.format(argdict, e))
    RuntimeError: "{'prefix': u'osd setcrushmap'}": exception "['{"prefix": "osd setcrushmap"}']": exception 'utf8' codec can't decode byte 0xf0 in position 22: invalid continuation byte

#2 Updated by Anonymous about 3 years ago

And more suprisingly in osd pool create testing :

./test/mon/osd-crush.sh:129: TEST_crush_rule_create_erasure: expect_failure testdir/osd-crush 'Error EINVAL' ./ceph osd pool create mypool 1 1 erasure
../qa/workunits/ceph-helpers.sh:1146: expect_failure: local dir=testdir/osd-crush
../qa/workunits/ceph-helpers.sh:1147: expect_failure: shift
../qa/workunits/ceph-helpers.sh:1148: expect_failure: local 'expected=Error EINVAL'
../qa/workunits/ceph-helpers.sh:1149: expect_failure: shift
../qa/workunits/ceph-helpers.sh:1150: expect_failure: local success
../qa/workunits/ceph-helpers.sh:1152: expect_failure: ./ceph osd pool create mypool 1 1 erasure
../qa/workunits/ceph-helpers.sh:1155: expect_failure: success=false
../qa/workunits/ceph-helpers.sh:1158: expect_failure: false
../qa/workunits/ceph-helpers.sh:1158: expect_failure: grep --quiet 'Error EINVAL' testdir/osd-crush/out
../qa/workunits/ceph-helpers.sh:1159: expect_failure: cat testdir/osd-crush/out
  • DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
    Traceback (most recent call last):
    File "./ceph", line 953, in <module>
    retval = main()
    File "./ceph", line 887, in main
    sigdict, inbuf, verbose)
    File "./ceph", line 470, in new_style_command
    inbuf=inbuf)
    File "/ceph/src/pybind/ceph_argparse.py", line 1270, in json_command
    raise RuntimeError('"{0}": exception {1}'.format(argdict, e))
    RuntimeError: "{u'pool_type': u'erasure', 'prefix': u'osd pool create', u'pg_num': 1L, u'pool': u'mypool', u'pgp_num': 1L}": exception "['{"pool_type": "erasure", "prefix": "osd pool create", "pg_num": 1, "pool": "mypool", "pgp_num": 1}']": exception 'utf8' codec can't decode byte 0xe0 in position 33: invalid continuation byte
    ../qa/workunits/ceph-helpers.sh:1160: expect_failure: return 1
    ./test/mon/osd-crush.sh:130: TEST_crush_rule_create_erasure: return 1
    ./test/mon/osd-crush.sh:32: run: return 1

#3 Updated by Anonymous about 3 years ago

I worked on it trying to understand where this decoding error comes from.

The decoding complains comes from https://github.com/ceph/ceph/blob/master/src/pybind/rados/rados.pyx#L1155
Getting deeper in the calls, it happen that _outs looks like corrupted like this :
2016-03-10 09:23:01.549686 7fb54ebd3700 -1 mon.a@0(leader).osd e7 error on crush map: ÀeƵ<8d>U

We are able to find this kind of corrupted string inside outs (some other exists to).

Looking at the code, I ended up here : https://github.com/ceph/ceph/blob/2b9775db2c062ddac6794a25a041beced5f6784f/src/mon/OSDMonitor.cc#L5400
This call is pretty strange as since commit https://github.com/ceph/ceph/commit/aa238e5ed50f44a94caf84567267e4f6be8732a2, CrushTester::test_with_crushtool() expects 4 arguments and not 3, which could explains such issue.

I'm not confident enough on this part to make a patch for adding the missing ruleset but something is really strange here.

#4 Updated by Anonymous about 3 years ago

Ok I got wrong on my assumptions. The prototypes makes the last argument optional. I have so to determine why and who is corrupting this buffer.

#5 Updated by Anonymous about 3 years ago

Continued my debugging and reach that state :

In https://github.com/ceph/ceph/blob/master/src/common/SubProcess.h#L364, errstr is perfectly set (I added instrement code to prove it)
but at this particular time, if we access errstr via its method (https://github.com/ceph/ceph/blob/master/src/common/SubProcess.h#L222) it is just corrupted
to prove that, I added the following code
+ FILE * f;
+ f=fopen("/tmp/join", "w");
+ fprintf(f, "|cmd=%s| exit status=%d; err=|%s| err2=|%s|\n", cmd.c_str(), WEXITSTATUS,err(),errstr.str().c_str());
+ fclose(f);

and I get
cmd=crushtool exit status=1; err= `����U err2= crushtool: exit status: 1

Note that err() function does only {return (errstr.str().c_str()}

#6 Updated by Greg Farnum about 2 years ago

  • Status changed from New to Resolved

Also available in: Atom PDF