CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1888832021-03-29T08:19:01ZXiubo Lixiubli@redhat.com
<ul></ul><p>I checked the client logs in `smithi016/log/ceph-client.0.25180.log.gz`; everything looks fine up to this point, and I didn't see any exceptions.</p>
<pre>
2021-03-25T09:56:04.247 INFO:teuthology.orchestra.run.smithi016.stderr:2021-03-25T09:56:04.241+0000 7fbb9e669700 1 -- 172.21.15.16:0/3146238485 wait complete.
2021-03-25T09:56:06.572 DEBUG:teuthology.orchestra.run:got remote process result: 124
2021-03-25T09:56:07.260 DEBUG:teuthology.orchestra.run.smithi016:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph quorum_status
</pre>
<p>[Edit]<br />The 124 means the command timed out.</p> CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1889012021-03-29T12:58:06ZXiubo Lixiubli@redhat.com
<ul></ul><p>I checked all the mds/osd/mon/client/kernel/misc logs and didn't find any errors around the time of the failure, 2021-03-25T09:56:06.572.</p> CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1889022021-03-29T13:05:34ZXiubo Lixiubli@redhat.com
<ul></ul><p>The exception occurred just before the snap test at:</p>
<pre>
2021-03-25T09:56:06.572 DEBUG:teuthology.orchestra.run:got remote process result: 124
</pre>
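<p>For reference, 124 is the exit status GNU coreutils `timeout` uses when it has to kill a command that ran past its limit. A minimal sketch, assuming GNU `timeout` is on the PATH; the 1-second limit and the `sleep 5` stand-in are only illustrative and are not taken from the test run:</p>

```shell
# Run a command that outlives its limit and capture the exit status.
# `|| status=$?` keeps the snippet safe under `set -e`.
status=0
timeout 1 sleep 5 || status=$?

# With GNU coreutils this is 124, the status timeout uses after killing the command.
echo "exit status: $status"
```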
<p>Right after that remote process returned exit status 124, and before teuthology stopped the ['fs/snaps'] task, the snap test was still running fine:</p>
<pre>
2021-03-25T09:56:11.131 INFO:tasks.workunit.client.0.smithi016.stderr:Checking out files: 2% (6/227) ^MChecking out files: 3% (7/227) ^MChecking out files: 4% (10/227) ^MChecking out files: 5% (12/227) ^MChecking out files: 6% (14/227) ^MChecking out files: 7% (16/227) ^MChecking out files: 7% (18/227) ^MChecking out files: 8% (19/227) ^MChecking out files: 8% (20/227) ^MChecking out files: 9% (21/227) ^MChecking out files: 10% (23/227) ^MChecking out files: 11% (25/227) ^MChecking out files: 12% (28/227) ^MChecking out files: 13% (30/227) ^MChecking out files: 14% (32/227) ^MChecking out files: 15% (35/227) ^MChecking out files: 15% (36/227) ^MChecking out files: 16% (37/227) ^MChecking out files: 16% (38/227) ^MChecking out files: 17% (39/227) ^MChecking out files: 18% (41/227) ^MChecking out files: 19% (44/227) ^MChecking out files: 19% (45/227) ^MChecking out files: 20% (46/227) ^MChecking out files: 21% (48/227) ^MChecking out files: 22% (50/227) ^MChecking out files: 23% (53/227) ^MChecking out files: 24% (55/227) ^MChecking out files: 25% (57/227) ^MChecking out files: 26% (60/227) ^MChecking out files: 27% (62/227) ^MChecking out files: 28% (64/227) ^MChecking out files: 29% (66/227) ^MChecking out files: 30% (69/227) ^MChecking out files: 31% (71/227) ^MChecking out files: 32% (73/227) ^MChecking out files: 33% (75/227) ^MChecking out files: 34% (78/227) ^MChecking out files: 35% (80/227) ^MChecking out files: 36% (82/227) ^MChecking out files: 37% (84/227) ^MChecking out files: 37% (86/227) ^MChecking out files: 38% (87/227) ^MChecking out files: 39% (89/227) ^MChecking out files: 39% (90/227) ^MChecking out files: 40% (91/227) ^MChecking out files: 40% (92/227) ^MChecking out files: 40% (93/227) ^MChecking out files: 41% (94/227) ^MChecking out files: 41% (95/227) ^MChecking out files: 42% (96/227) ^MChecking out files: 42% (97/227) ^MChecking out files: 43% (98/227) ^MChecking out files: 43% (99/227) ^MChecking out files: 44% (100/227) ^MChecking out 
files: 44% (101/227) ^MChecking out files: 44% (102/227) ^MChecking out files: 45% (103/227) ^MChecking out files: 45% (104/227) ^MChecking out files: 46% (105/227) ^MChecking out files: 46% (106/227) ^MChecking out files: 47% (107/227) ^MChecking out files: 47% (108/227) ^MChecking out files: 48% (109/227) ^MChecking out files: 48% (110/227) ^MChecking out files: 48% (111/227) ^MChecking out files: 49% (112/227) ^MChecking out files: 49% (113/227) ^MChecking out files: 50% (114/227) ^MChecking out files: 50% (115/227) ^MChecking out files: 51% (116/227)
2021-03-25T09:56:11.132 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...
</pre>
<p>No warnings or errors were found in the client/mds/osd logs either.</p>
<p>I really suspect it was a bug in the remote process...</p> CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1889072021-03-29T13:51:55ZXiubo Lixiubli@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>Xiubo Li</i></li></ul> CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1889272021-03-29T23:55:06ZXiubo Lixiubli@redhat.com
<ul></ul><p>Okay, I was heading in the wrong direction yesterday.</p>
<p>I think the problem was the `git clone ceph...` command; it took too long:</p>
<p>The script is:<br /><pre>
#!/bin/sh -x

set -e

git clone git://git.ceph.com/ceph.git
cd ceph

versions=`seq 1 21`
...
</pre></p>
<p>The logs for this workunit start at 2021-03-25T06:56:06.474:</p>
<pre>
2021-03-25T06:56:06.474 INFO:tasks.workunit:Running workunit fs/snaps/snaptest-git-ceph.sh...
2021-03-25T06:56:06.475 DEBUG:teuthology.orchestra.run.smithi016:workunit test fs/snaps/snaptest-git-ceph.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=0691d6bed3e3aaf89688b125297e25f6f6c3fae2 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/snaps/snaptest-git-ceph.sh
2021-03-25T06:56:06.574 INFO:tasks.workunit.client.0.smithi016.stderr:+ set -e
2021-03-25T06:56:06.574 INFO:tasks.workunit.client.0.smithi016.stderr:+ git clone git://git.ceph.com/ceph.git
2021-03-25T06:56:06.582 INFO:tasks.workunit.client.0.smithi016.stderr:Cloning into 'ceph'...
...
</pre>
<p>And `cd ceph` did not run until 2021-03-25T09:33:06.323:</p>
<pre>
2021-03-25T09:33:06.323 INFO:tasks.workunit.client.0.smithi016.stderr:+ cd ceph
2021-03-25T09:33:06.323 INFO:tasks.workunit.client.0.smithi016.stderr:+ seq 1 21
2021-03-25T09:33:06.340 INFO:tasks.workunit.client.0.smithi016.stderr:+ versions=1
2021-03-25T09:33:06.340 INFO:tasks.workunit.client.0.smithi016.stderr:2
2021-03-25T09:33:06.340 INFO:tasks.workunit.client.0.smithi016.stderr:3
2021-03-25T09:33:06.340 INFO:tasks.workunit.client.0.smithi016.stderr:4
2021-03-25T09:33:06.340 INFO:tasks.workunit.client.0.smithi016.stderr:5
2021-03-25T09:33:06.340 INFO:tasks.workunit.client.0.smithi016.stderr:6
2021-03-25T09:33:06.340 INFO:tasks.workunit.client.0.smithi016.stderr:7
2021-03-25T09:33:06.341 INFO:tasks.workunit.client.0.smithi016.stderr:8
2021-03-25T09:33:06.341 INFO:tasks.workunit.client.0.smithi016.stderr:9
2021-03-25T09:33:06.341 INFO:tasks.workunit.client.0.smithi016.stderr:10
2021-03-25T09:33:06.341 INFO:tasks.workunit.client.0.smithi016.stderr:11
2021-03-25T09:33:06.341 INFO:tasks.workunit.client.0.smithi016.stderr:12
2021-03-25T09:33:06.341 INFO:tasks.workunit.client.0.smithi016.stderr:13
2021-03-25T09:33:06.341 INFO:tasks.workunit.client.0.smithi016.stderr:14
2021-03-25T09:33:06.341 INFO:tasks.workunit.client.0.smithi016.stderr:15
2021-03-25T09:33:06.341 INFO:tasks.workunit.client.0.smithi016.stderr:16
2021-03-25T09:33:06.341 INFO:tasks.workunit.client.0.smithi016.stderr:17
2021-03-25T09:33:06.341 INFO:tasks.workunit.client.0.smithi016.stderr:18
2021-03-25T09:33:06.341 INFO:tasks.workunit.client.0.smithi016.stderr:19
2021-03-25T09:33:06.342 INFO:tasks.workunit.client.0.smithi016.stderr:20
2021-03-25T09:33:06.342 INFO:tasks.workunit.client.0.smithi016.stderr:21
2021-03-25T09:33:06.342 INFO:tasks.workunit.client.0.smithi016.stderr:+ ver=v0.1
2021-03-25T09:33:06.342 INFO:tasks.workunit.client.0.smithi016.stderr:+ echo v0.1
2021-03-25T09:33:06.342 INFO:tasks.workunit.client.0.smithi016.stdout:v0.1
2021-03-25T09:33:06.342 INFO:tasks.workunit.client.0.smithi016.stderr:+ git reset --hard v0.1
</pre>
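<p>So the clone alone had already taken 2h 37m, leaving only about 23 minutes of the workunit's `timeout 3h` before it expired at roughly 09:56:06, which matches the exit status 124 seen earlier.</p>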
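<p>The elapsed time can be checked from the two log timestamps. A small sketch, assuming GNU `date` (for `-d`); the timestamps are copied from the logs with fractional seconds dropped, and the 3h budget is the `timeout 3h` from the workunit command line:</p>

```shell
# Timestamps from the log: when `git clone` started and when `cd ceph` ran.
start="2021-03-25 06:56:06"
end="2021-03-25 09:33:06"

# Convert both to epoch seconds (GNU date; -u avoids local timezone effects).
start_s=$(date -u -d "$start" +%s)
end_s=$(date -u -d "$end" +%s)
elapsed=$(( end_s - start_s ))

# Format as hours and minutes.
clone_time=$(printf '%dh %dm' $(( elapsed / 3600 )) $(( elapsed % 3600 / 60 )))
echo "clone took: $clone_time"

# Minutes left of the 3h (10800 s) workunit timeout once the clone finished.
left_min=$(( (10800 - elapsed) / 60 ))
echo "left of the 3h budget: ${left_min}m"
```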
<p>Recently, when I clone the ceph repo from GitHub locally, it sometimes gets stuck and then times out with a connection failure to GitHub, so I have to retry many times. I'm not sure whether a similar problem caused the git clone to take too long here.</p>
<p>Without the VPN, git clone and git pull mostly work better for me and are very fast.</p>
<p>@Patrick,</p>
<p>Maybe we could keep a copy of the ceph repo somewhere local in the test lab and clone it from there. Would that make sense?</p> CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1889332021-03-30T03:20:08ZPatrick Donnellypdonnell@redhat.com
<ul></ul><p>Xiubo Li wrote:</p>
<blockquote>
<p>Okay, I was heading in the wrong direction yesterday.</p>
<p>I think the problem was the `git clone ceph...` command; it took too long:</p>
<p>The script is:<br />[...]</p>
<p>The logs for this workunit start at 2021-03-25T06:56:06.474:</p>
<p>[...]</p>
<p>And `cd ceph` did not run until 2021-03-25T09:33:06.323:</p>
<p>[...]</p>
<p>It had already taken 2h 37m.</p>
<p>Recently, when I clone the ceph repo from GitHub locally, it sometimes gets stuck and then times out with a connection failure to GitHub, so I have to retry many times. I'm not sure whether a similar problem caused the git clone to take too long here.</p>
<p>Without the VPN, git clone and git pull mostly work better for me and are very fast.</p>
<p>@Patrick,</p>
<p>Maybe we could keep a copy of the ceph repo somewhere local in the test lab and clone it from there. Would that make sense?</p>
</blockquote>
<p>!! Yes. We should definitely do that. We've done it for other QA downloads/git clones. Please talk with David Galloway via email about a mirror. I think we may already have one for ceph.git.</p> CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1889342021-03-30T03:48:19ZXiubo Lixiubli@redhat.com
<ul></ul><p>Patrick Donnelly wrote:</p>
<blockquote>
<p>Xiubo Li wrote:</p>
<p>[...]</p>
<blockquote>
<p>@Patrick,</p>
<p>Maybe we could keep a copy of the ceph repo somewhere local in the test lab and clone it from there. Would that make sense?</p>
</blockquote>
<p>!! Yes. We should definitely do that. We've done it for other QA downloads/git clones. Please talk with David Galloway via email about a mirror. I think we may already have one for ceph.git.</p>
</blockquote>
<p>Sure.</p> CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1891052021-04-01T14:58:20ZDavid Galloway
<ul></ul><p>Xiubo Li wrote:</p>
<blockquote>
<p>Patrick Donnelly wrote:</p>
<blockquote>
<p>Xiubo Li wrote:</p>
<p>[...]</p>
<blockquote>
<p>@Patrick,</p>
<p>Maybe we could keep a copy of the ceph repo somewhere local in the test lab and clone it from there. Would that make sense?</p>
</blockquote>
<p>!! Yes. We should definitely do that. We've done it for other QA downloads/git clones. Please talk with David Galloway via email about a mirror. I think we may already have one for ceph.git.</p>
</blockquote>
<p>Sure.</p>
</blockquote>
<p>What repo do you want to mirror?</p> CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1895052021-04-05T16:48:30ZPatrick Donnellypdonnell@redhat.com
<ul></ul><p>David Galloway wrote:</p>
<blockquote>
<p>[...]</p>
<p>What repo do you want to mirror?</p>
</blockquote>
<p>ceph.git. Sorry for pulling you into this David; it's actually already using the ceph.com mirror.</p>
<p>@xiubo, I think we can chalk this up to random noise and NAB.</p> CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1895332021-04-06T02:15:47ZXiubo Lixiubli@redhat.com
<ul></ul><p>Patrick Donnelly wrote:</p>
<blockquote>
<p>[...]</p>
<p>ceph.git. Sorry for pulling you into this David; it's actually already using the ceph.com mirror.</p>
<p>@xiubo, I think we can chalk this up to random noise and NAB.</p>
</blockquote>
<p>Yeah. Agree.</p>
<p>Still, I wonder whether running the clone as `timeout 1200 git clone git://git.ceph.com/ceph.git` and retrying up to 3 times would help here. In my experience, a stuck or slow socket connection can sometimes be fixed by retrying.</p>
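<p>A rough sketch of that idea, for illustration only: the `retry` helper below is hypothetical and is not the contents of the eventual pull request. It assumes GNU `timeout` and uses the clone URL from the script quoted earlier in this thread.</p>

```shell
#!/bin/sh
# retry MAX_TRIES PER_TRY_TIMEOUT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND under `timeout`, trying again on failure.
retry() {
    max_tries=$1
    per_try=$2
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # timeout kills the command (and exits non-zero) if it runs
        # longer than per_try seconds.
        if timeout "$per_try" "$@"; then
            return 0
        fi
        echo "attempt $try/$max_tries failed" >&2
        try=$((try + 1))
    done
    return 1
}

# Intended use in the workunit (not executed here):
#   retry 3 1200 git clone git://git.ceph.com/ceph.git
```

<p>A real version would probably also want to make sure no partially cloned `ceph` directory is left behind before the next attempt.</p>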
<p>Does that make sense?</p> CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1895342021-04-06T03:21:47ZPatrick Donnellypdonnell@redhat.com
<ul></ul><p>Xiubo Li wrote:</p>
<blockquote>
<p>[...]</p>
<p>Yeah. Agree.</p>
<p>Still, I wonder whether running the clone as `timeout 1200 git clone git://git.ceph.com/ceph.git` and retrying up to 3 times would help here. In my experience, a stuck or slow socket connection can sometimes be fixed by retrying.</p>
<p>Does that make sense?</p>
</blockquote>
<p>Makes sense.</p> CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1895352021-04-06T05:54:43ZXiubo Lixiubli@redhat.com
<ul></ul><p>Patrick Donnelly wrote:</p>
<blockquote>
<p>Xiubo Li wrote:</p>
</blockquote>
<p>[...]</p>
<blockquote><blockquote><blockquote><blockquote>
<p>What repo do you want to mirror?</p>
</blockquote>
<p>ceph.git. Sorry for pulling you into this David; it's actually already using the ceph.com mirror.</p>
<p>@xiubo, I think we can chalk this up to random noise and NAB.</p>
</blockquote>
<p>Yeah. Agree.</p>
<p>Still, I wonder whether running the clone as `timeout 1200 git clone git://git.ceph.com/ceph.git` and retrying up to 3 times would help here. In my experience, a stuck or slow socket connection can sometimes be fixed by retrying.</p>
<p>Does that make sense?</p>
</blockquote>
<p>Makes sense.</p>
</blockquote>
<p>Okay, will fix it.</p> CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1895662021-04-06T09:22:40ZXiubo Lixiubli@redhat.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Fix Under Review</i></li><li><strong>Pull request ID</strong> set to <i>40611</i></li></ul> CephFS - Bug #50021: qa: snaptest-git-ceph failure during mon thrashinghttps://tracker.ceph.com/issues/50021?journal_id=1897492021-04-08T02:36:34ZPatrick Donnellypdonnell@redhat.com
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Resolved</i></li></ul>