<p>Ceph - Bug #18126: Illegal instruction from Messenger::create -> std::random_device::_M_getval (https://tracker.ceph.com/issues/18126)</p>
<p><strong>Sage Weil</strong> (sage@newdream.net), 2016-12-04T14:49:27Z, https://tracker.ceph.com/issues/18126?journal_id=82496</p>
<ul><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>Immediate</i></li></ul><p>/a/sage-2016-12-03_19:33:05-rados-master---basic-smithi/599046</p>
<p><strong>Sage Weil</strong> (sage@newdream.net), 2016-12-04T15:08:44Z, https://tracker.ceph.com/issues/18126?journal_id=82499</p>
<ul></ul><p>/a/sage-2016-12-03_19:34:03-rados-wip-sage-testing---basic-smithi/599343</p>
<p><strong>Sage Weil</strong> (sage@newdream.net), 2016-12-05T19:07:54Z, https://tracker.ceph.com/issues/18126?journal_id=82571</p>
<ul></ul><p>Commit bf7d77a84b144ffdc92efd7d19d3038b75911b54 looks like it could be the culprit: it moved a global static to a function-local static.</p>
<p><strong>Sage Weil</strong> (sage@newdream.net), 2016-12-05T19:17:25Z, https://tracker.ceph.com/issues/18126?journal_id=82572</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>15</i></li><li><strong>Priority</strong> changed from <i>Immediate</i> to <i>Urgent</i></li></ul><p>It's a valgrind bug: <a class="external" href="http://stackoverflow.com/questions/37032339/valgrind-unrecognised-instruction">http://stackoverflow.com/questions/37032339/valgrind-unrecognised-instruction</a></p>
<p><a class="external" href="https://github.com/svn2github/valgrind-vex/commit/1ab61656f71e94ce12b68de87f1e28cf3dc0c18c">https://github.com/svn2github/valgrind-vex/commit/1ab61656f71e94ce12b68de87f1e28cf3dc0c18c</a> fixes it.</p>
<p>So far we're only seeing this with the xenial version of valgrind, 1:3.11.0-1ubuntu4.1.</p>
<p>Ubuntu Launchpad bug: <a class="external" href="https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1501545">https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1501545</a></p>
<p><strong>Sage Weil</strong> (sage@newdream.net), 2016-12-05T19:21:47Z, https://tracker.ceph.com/issues/18126?journal_id=82573</p>
<ul></ul><p>In the meantime, we can force valgrind runs onto centos.</p>
<p><a class="external" href="https://github.com/ceph/ceph-qa-suite/pull/1301">https://github.com/ceph/ceph-qa-suite/pull/1301</a></p>
<p><strong>Sage Weil</strong> (sage@newdream.net), 2016-12-07T03:53:38Z, https://tracker.ceph.com/issues/18126?journal_id=82638</p>
<ul></ul><pre>
&lt;sage&gt; jamespage: any chance we can poke the valgrind package maintainer to update the (xenial) package? the bug was fixed about a year ago.
&lt;frickler&gt; sage: backporting changes may take some time, could you test whether installing the package from yakkety would solve your issue? https://launchpad.net/ubuntu/+source/valgrind/1:3.12.0~svn20160714-1ubuntu2/+build/10602185
</pre>
<p><strong>David Galloway</strong>, 2016-12-13T19:09:59Z, https://tracker.ceph.com/issues/18126?journal_id=82983</p>
<ul></ul><p>Sepia smithis running Xenial now have valgrind 3.12.0~svn20160714-1ubuntu2 installed.</p>
<pre>
ansible -a "wget -O /tmp/valgrind.deb https://launchpad.net/ubuntu/+source/valgrind/1:3.12.0~svn20160714-1ubuntu2/+build/10602185/+files/valgrind_3.12.0~svn20160714-1ubuntu2_amd64.deb" smithi[079:116]
ansible -a "sudo dpkg -i /tmp/valgrind.deb" smithi[079:116]
</pre>
<p>(Also did smithi126 separately.)</p>
<p><strong>Greg Farnum</strong> (gfarnum@redhat.com), 2017-05-31T15:38:51Z, https://tracker.ceph.com/issues/18126?journal_id=91861</p>
<ul></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/15389">https://github.com/ceph/ceph/pull/15389</a></p>
<p><strong>Sage Weil</strong> (sage@newdream.net), 2017-06-01T16:48:50Z, https://tracker.ceph.com/issues/18126?journal_id=91954</p>
<ul></ul><pre>
2017-06-01T03:26:56.134 INFO:tasks.ceph.osd.1.smithi192.stderr:vex amd64->IR: unhandled instruction bytes: 0xF 0xC7 0xF0 0x89 0x6 0xF 0x42 0xC1
2017-06-01T03:26:56.138 INFO:tasks.ceph.osd.1.smithi192.stderr:vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
2017-06-01T03:26:56.140 INFO:tasks.ceph.osd.1.smithi192.stderr:vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
2017-06-01T03:26:56.143 INFO:tasks.ceph.osd.1.smithi192.stderr:vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
2017-06-01T03:26:56.147 INFO:tasks.ceph.osd.1.smithi192.stderr:==00:00:00:06.020 91335== valgrind: Unrecognised instruction at address 0xc053b15.
2017-06-01T03:26:56.150 INFO:tasks.ceph.osd.1.smithi192.stderr:==00:00:00:06.020 91335== Your program just tried to execute an instruction that Valgrind
2017-06-01T03:26:56.153 INFO:tasks.ceph.osd.1.smithi192.stderr:==00:00:00:06.020 91335== did not recognise. There are two possible reasons for this.
2017-06-01T03:26:56.156 INFO:tasks.ceph.osd.1.smithi192.stderr:==00:00:00:06.020 91335== 1. Your program has a bug and erroneously jumped to a non-code
2017-06-01T03:26:56.158 INFO:tasks.ceph.osd.1.smithi192.stderr:==00:00:00:06.020 91335== location. If you are running Memcheck and you just saw a
2017-06-01T03:26:56.161 INFO:tasks.ceph.osd.1.smithi192.stderr:==00:00:00:06.020 91335== warning about a bad jump, it's probably your program's fault.
2017-06-01T03:26:56.164 INFO:tasks.ceph.osd.1.smithi192.stderr:==00:00:00:06.020 91335== 2. The instruction is legitimate but Valgrind doesn't handle it,
2017-06-01T03:26:56.167 INFO:tasks.ceph.osd.1.smithi192.stderr:==00:00:00:06.020 91335== i.e. it's Valgrind's fault. If you think this is the case or
2017-06-01T03:26:56.170 INFO:tasks.ceph.osd.1.smithi192.stderr:==00:00:00:06.020 91335== you are not sure, please let us know and we'll try to fix it.
2017-06-01T03:26:56.173 INFO:tasks.ceph.osd.1.smithi192.stderr:==00:00:00:06.020 91335== Either way, Valgrind will now raise a SIGILL signal which will
2017-06-01T03:26:56.176 INFO:tasks.ceph.osd.1.smithi192.stderr:==00:00:00:06.020 91335== probably kill your program.
2017-06-01T03:26:56.180 INFO:tasks.ceph.osd.1.smithi192.stderr:*** Caught signal (Illegal instruction) **
2017-06-01T03:26:56.183 INFO:tasks.ceph.osd.1.smithi192.stderr: in thread 96db6c0 thread_name:memcheck-amd64-
2017-06-01T03:26:56.187 INFO:tasks.ceph.osd.1.smithi192.stderr: ceph version 12.0.2-1874-g9581c1e (9581c1ec2323fe8aeeb9e60dc3397298b2350970) luminous (dev)
2017-06-01T03:26:56.190 INFO:tasks.ceph.osd.1.smithi192.stderr: 1: (()+0x9e05d2) [0xae85d2]
2017-06-01T03:26:56.193 INFO:tasks.ceph.osd.1.smithi192.stderr: 2: (()+0x11390) [0xb972390]
2017-06-01T03:26:56.196 INFO:tasks.ceph.osd.1.smithi192.stderr: 3: (()+0xb7b15) [0xc053b15]
2017-06-01T03:26:56.200 INFO:tasks.ceph.osd.1.smithi192.stderr: 4: (std::random_device::_M_getval()+0x92) [0xc053cb2]
2017-06-01T03:26:56.202 INFO:tasks.ceph.osd.1.smithi192.stderr: 5: (MonClient::_add_conns(unsigned long)+0xe9) [0xb414e9]
2017-06-01T03:26:56.206 INFO:tasks.ceph.osd.1.smithi192.stderr: 6: (MonClient::_reopen_session(int)+0x45f) [0xb42e0f]
2017-06-01T03:26:56.209 INFO:tasks.ceph.osd.1.smithi192.stderr: 7: (MonClient::authenticate(double)+0x62d) [0xb4452d]
2017-06-01T03:26:56.212 INFO:tasks.ceph.osd.1.smithi192.stderr: 8: (OSD::init()+0x265a) [0x5936ca]
2017-06-01T03:26:56.216 INFO:tasks.ceph.osd.1.smithi192.stderr: 9: (main()+0x2ebc) [0x4a8b5c]
2017-06-01T03:26:56.221 INFO:tasks.ceph.osd.1.smithi192.stderr: 10: (__libc_start_main()+0xf0) [0xc85d830]
2017-06-01T03:26:56.224 INFO:tasks.ceph.osd.1.smithi192.stderr: 11: (_start()+0x29) [0x52eb59]
2017-06-01T03:26:56.229 INFO:tasks.ceph.osd.1.smithi192.stderr:2017-06-01 03:26:56.075549 96db6c0 -1 osd.1 0 log_to_monitors {default=true}
2017-06-01T03:26:56.232 INFO:tasks.ceph.osd.1.smithi192.stderr:2017-06-01 03:26:56.174695 96db6c0 -1 *** Caught signal (Illegal instruction) **
</pre><br />/a/sage-2017-06-01_02:27:12-rados-wip-sage-testing2---basic-smithi/1249784
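<p>For reference, the first three bytes in the VEX report above, <code>0xF 0xC7 0xF0</code>, decode as <code>RDRAND EAX</code> (opcode 0F C7 /6 with a register-form ModRM byte). That fits the backtrace through <code>std::random_device::_M_getval()</code>: as we understand it, libstdc++'s <code>std::random_device</code> reads from RDRAND with its default token on CPUs that advertise the instruction, and the xenial valgrind build does not handle it. The helper below is a hypothetical shell sketch (not part of teuthology or valgrind) that splits the ModRM byte to tell RDRAND from RDSEED; it assumes no operand-size or REX prefixes, which is what the log's <code>REX=0</code> and <code>PFX.66=0</code> lines report.</p>

```shell
#!/bin/sh
# Hypothetical helper: classify the first three "unhandled instruction bytes"
# printed by valgrind's VEX front end, e.g. "0xF 0xC7 0xF0".
# Assumes no legacy or REX prefixes precede the opcode (the log reports REX=0
# and PFX.66=0, so the operand size is 32 bits).
classify_0fc7() {
    b0=$(( $1 )); b1=$(( $2 )); modrm=$(( $3 ))

    # RDRAND and RDSEED share the two-byte opcode 0F C7.
    if [ "$b0" -ne 15 ] || [ "$b1" -ne 199 ]; then
        echo other
        return
    fi

    mod=$(( (modrm >> 6) & 3 ))   # addressing mode; 3 means register operand
    reg=$(( (modrm >> 3) & 7 ))   # opcode extension (/6 or /7)
    # (modrm & 7) would be the destination register; 0 selects eax here.

    if [ "$mod" -ne 3 ]; then
        echo other                # memory forms of 0F C7 (e.g. CMPXCHG8B)
        return
    fi
    case "$reg" in
        6) echo rdrand ;;
        7) echo rdseed ;;
        *) echo other ;;
    esac
}
```

<p>For example, <code>classify_0fc7 0xF 0xC7 0xF0</code> prints <code>rdrand</code>.</p>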
<p>Not fixed yet on Xenial:<br /><pre>
teuthology:1248735 04:47 PM $ ssh smithi192
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.12.0-rc3-ceph-gdc9938ed50b8 x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
Last login: Thu Jun 1 16:39:53 2017 from 172.21.0.51
ubuntu@smithi192:~$ dpkg -l | grep valgrind
ii valgrind 1:3.11.0-1ubuntu4.1 amd64 instrumentation framework for building dynamic analysis tools
</pre><br /><pre>
valgrind is already the newest version (1:3.11.0-1ubuntu4.1).
</pre></p>
<p><strong>Greg Farnum</strong> (gfarnum@redhat.com), 2017-06-14T20:24:13Z, https://tracker.ceph.com/issues/18126?journal_id=92914</p>
<ul><li><strong>Assignee</strong> set to <i>David Galloway</i></li></ul><p>Hmm, any idea why the valgrind versions seem to have regressed, David?</p>
<p><strong>David Galloway</strong>, 2017-06-14T20:59:20Z, https://tracker.ceph.com/issues/18126?journal_id=92924</p>
<ul></ul><p>Greg Farnum wrote:</p>
<blockquote>
<p>Hmm, any idea why the valgrind versions seem to have regressed, David?</p>
</blockquote>
<p>A few of the smithi have definitely been reimaged since I last touched this. At the time, I didn't have the foresight to put this in ceph-cm-ansible. I'll do that now.</p>
<p>The version that'll get installed is intended for Zesty. Is that OK? <a class="external" href="https://launchpad.net/ubuntu/zesty/amd64/valgrind/1:3.12.0-1ubuntu1">https://launchpad.net/ubuntu/zesty/amd64/valgrind/1:3.12.0-1ubuntu1</a></p>
<p><strong>David Galloway</strong>, 2017-06-15T17:18:46Z, https://tracker.ceph.com/issues/18126?journal_id=92953</p>
<ul></ul><p><a class="external" href="https://github.com/ceph/ceph-cm-ansible/pull/319">https://github.com/ceph/ceph-cm-ansible/pull/319</a></p>
<p><strong>Sage Weil</strong> (sage@newdream.net), 2017-06-26T22:46:28Z, https://tracker.ceph.com/issues/18126?journal_id=93771</p>
<ul></ul><p>Version 1:3.12.0-1.1ubuntu1 on smithi107 showed the error reported in <a class="issue tracker-1 status-1 priority-4 priority-default" title="Bug: rados/verify valgrind tests: osds fail to start (xenial valgrind) (New)" href="https://tracker.ceph.com/issues/20360">#20360</a>.</p>
<p><strong>David Galloway</strong>, 2017-06-29T22:47:53Z, https://tracker.ceph.com/issues/18126?journal_id=94083</p>
<ul></ul><p>Okay, I've uploaded <a class="external" href="https://launchpad.net/ubuntu/+source/valgrind/1:3.12.0~svn20160714-1ubuntu2/+build/10602185/+files/valgrind_3.12.0~svn20160714-1ubuntu2_amd64.deb">https://launchpad.net/ubuntu/+source/valgrind/1:3.12.0~svn20160714-1ubuntu2/+build/10602185/+files/valgrind_3.12.0~svn20160714-1ubuntu2_amd64.deb</a> to chacra and removed the zesty version (3.12.0-1ubuntu1) which introduced a new bug <a class="external" href="http://tracker.ceph.com/issues/20360">http://tracker.ceph.com/issues/20360</a>.</p>
<p>The hope is that 3.12.0~svn20160714-1ubuntu2 will fix the original valgrind bug in this issue and won't have the new bug in the 3.12.0-1ubuntu1 version.</p>
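<p>Because the Zesty build regressed, the useful per-node question is "is this exact build installed?" rather than "is valgrind new enough?". A minimal shell sketch under that assumption follows; the pinned string is copied from this comment, and the helper names are hypothetical (they are not part of ceph-cm-ansible).</p>

```shell
#!/bin/sh
# Pinned build, copied from the comment above (an assumption of this sketch).
PINNED_VALGRIND='1:3.12.0~svn20160714-1ubuntu2'

# Exact string match on purpose: Debian version ordering sorts "~" before
# everything, so the Zesty build 1:3.12.0-1ubuntu1 (which hit #20360)
# compares newer than the pinned svn snapshot and would pass a ">=" check.
is_pinned_valgrind() {
    [ "$1" = "$PINNED_VALGRIND" ]
}

# Installed version as reported by dpkg-query; empty if not installed.
installed_valgrind_version() {
    dpkg-query -W -f='${Version}' valgrind 2>/dev/null
}

# Report this node's status; returns non-zero if it needs attention.
check_node() {
    v=$(installed_valgrind_version)
    if is_pinned_valgrind "$v"; then
        echo "ok: valgrind $v"
    else
        echo "needs update: valgrind ${v:-<not installed>}"
        return 1
    fi
}
```

<p>The ordering claim can be confirmed with <code>dpkg --compare-versions '1:3.12.0-1ubuntu1' gt '1:3.12.0~svn20160714-1ubuntu2'</code>, which succeeds. Running <code>check_node</code> on each Xenial smithi would flag nodes that a reimage put back on 1:3.11.0-1ubuntu4.1.</p>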
<p>I've removed valgrind from all the Xenial testnodes [1] so the svn version will get installed on the next ceph-cm-ansible run.</p>
<p>[1] <code>for host in $(tl --brief -a -m smithi --os-type ubuntu --os-version 16.04 | grep -v 'slave\|dfuller\|tracker' | cut -d ' ' -f1); do ssh $host "sudo apt-get purge -y valgrind"; done</code></p>
<p><strong>Sage Weil</strong> (sage@newdream.net), 2017-06-30T13:31:06Z, https://tracker.ceph.com/issues/18126?journal_id=94107</p>
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-1 priority-4 priority-default" href="/issues/20360">Bug #20360</a>: rados/verify valgrind tests: osds fail to start (xenial valgrind)</i> added</li></ul>
<p><strong>Sage Weil</strong> (sage@newdream.net), 2017-07-07T03:12:07Z, https://tracker.ceph.com/issues/18126?journal_id=94608</p>
<ul><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>Normal</i></li></ul><p>Confining valgrind tests to CentOS again, so this is no longer a high priority.</p>
<p><strong>Patrick Donnelly</strong> (pdonnell@redhat.com), 2019-12-05T21:45:42Z, https://tracker.ceph.com/issues/18126?journal_id=153674</p>
<ul><li><strong>Status</strong> changed from <i>15</i> to <i>Fix Under Review</i></li></ul>
<p><strong>David Galloway</strong>, 2019-12-06T14:30:23Z, https://tracker.ceph.com/issues/18126?journal_id=153796</p>
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Resolved</i></li></ul>