Project

General

Profile

Bug #45512

Jobs are running on the same machine at the same time

Added by David Galloway 7 months ago. Updated 7 months ago.

Status:
New
Priority:
Immediate
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature:

History

#1 Updated by David Galloway 7 months ago

Related?

2020-05-11 20:14:57,859.859 DEBUG:teuthology.lock.ops:lock_many request: {'locked_by': 'scheduled_varsha@teuthology', 'count': 1, 'machine_type': 'smithi', 'description': '/home/teuthworker/archive/varsha-2020-05-11_15:54:35-rados-wip-varsha-testing-distro-basic-smithi/5046616'}
2020-05-11 20:14:57,879.879 DEBUG:teuthology.lock.ops:locked ubuntu@smithi078.front.sepia.ceph.com
2020-05-11 20:14:57,903.903 DEBUG:teuthology.exit:Installing handler: Handler(exiter=<teuthology.exit.Exiter object at 0x7fb9a894c668>, func=<function ConsoleLog.start_logging.<locals>.kill_console_loggers at 0x7fb9a8b07950>, signals=[15])
2020-05-11 20:14:57,911.911 INFO:teuthology.task.internal.lock_machines:Locking machines...
2020-05-11 20:14:57,911.911 DEBUG:teuthology.report:Pushing job info to http://paddles.front.sepia.ceph.com/
Traceback (most recent call last):
  File "<string>", line 1, in <module>
NameError: name 'file' is not defined
2020-05-11 20:14:57,961.961 DEBUG:teuthology.lock.ops:lock_many request: {'locked_by': 'scheduled_varsha@teuthology', 'count': 1, 'machine_type': 'smithi', 'description': '/home/teuthworker/archive/varsha-2020-05-11_15:58:25-rados-wip-varsha-testing-distro-basic-smithi/5046639'}
2020-05-11 20:14:57,981.981 DEBUG:teuthology.lock.ops:locked ubuntu@smithi012.front.sepia.ceph.com
2020-05-11 20:14:58,006.006 DEBUG:teuthology.exit:Installing handler: Handler(exiter=<teuthology.exit.Exiter object at 0x7f1bd11316d8>, func=<function ConsoleLog.start_logging.<locals>.kill_console_loggers at 0x7f1bd129e950>, signals=[15])
Traceback (most recent call last):
  File "<string>", line 1, in <module>
NameError: name 'file' is not defined

#2 Updated by Zack Cerza 7 months ago

I don't have the answer yet, but here's some information from the paddles server:


ubuntu@pulpito:~$ grep -a smithi104 paddles.out.log
...
2020-05-11 14:17:08,997 INFO [paddles.controllers.nodes] Locked <Node smithi104.front.sepia.ceph.com> for scheduled_teuthology@teuthology with description /home/teuthworker/archive/teuthology-2020-05-11_02:01:02-rbd-master-distro-basic-smithi/5043278
2020-05-11 14:18:02,202 DEBUG [paddles.controllers.nodes] Unlocking <Node smithi104.front.sepia.ceph.com> for scheduled_teuthology@teuthology
2020-05-11 14:18:02,204 INFO [paddles.controllers.nodes] Unlocked <Node smithi104.front.sepia.ceph.com> for scheduled_teuthology@teuthology
2020-05-11 14:19:22,832 INFO [paddles.controllers.nodes] Locked <Node smithi104.front.sepia.ceph.com> <Node smithi068.front.sepia.ceph.com> <Node smithi065.front.sepia.ceph.com> for scheduled_teuthology@teuthology with description /home/teuthworker/archive/teuthology-2020-05-11_02:01:02-rbd-master-distro-basic-smithi/5043260
2020-05-11 14:20:15,937 DEBUG [paddles.controllers.nodes] Unlocking <Node smithi104.front.sepia.ceph.com> for scheduled_teuthology@teuthology
2020-05-11 14:20:15,939 INFO [paddles.controllers.nodes] Unlocked <Node smithi104.front.sepia.ceph.com> for scheduled_teuthology@teuthology
2020-05-11 14:20:34,843 INFO [paddles.controllers.nodes] Locked <Node smithi104.front.sepia.ceph.com> for scheduled_teuthology@teuthology with description /home/teuthworker/archive/teuthology-2020-05-11_02:01:02-rbd-master-distro-basic-smithi/5043279
2020-05-11 14:22:29,870 DEBUG [paddles.controllers.nodes] Unlocking <Node smithi104.front.sepia.ceph.com> for scheduled_teuthology@teuthology
2020-05-11 14:22:29,871 INFO [paddles.controllers.nodes] Unlocked <Node smithi104.front.sepia.ceph.com> for scheduled_teuthology@teuthology
...

smithi104 was unlocked very shortly after 5043278 locked it, and the request came from teuthology.front, though it looks like it didn't come from the job's process. This all happened before 5043279 had even started, let alone attempted any locks.

#3 Updated by David Galloway 7 months ago

Another instance: smithi146

http://qa-proxy.ceph.com/teuthology/teuthology-2020-05-12_02:30:02-upgrade:mimic-x-nautilus-distro-basic-smithi/5048079/teuthology.log

2020-05-12T11:57:04.811 INFO:teuthology.task.ansible.out:fatal: [smithi146.front.sepia.ceph.com]: FAILED! => {"changed": true, "cmd": ["rpm", "--rebuilddb"], "delta": "0:00:00.034381", "end": "2020-05-12 11:57:04.779147", "msg": "non-zero return code", "rc": 255, "start": "2020-05-12 11:57:04.744766", "stderr": "error: can't create transaction lock on /var/lib/rpm/.rpm.lock (Resource temporarily unavailable)", "stderr_lines": ["error: can't create transaction lock on /var/lib/rpm/.rpm.lock (Resource temporarily unavailable)"], "stdout": "", "stdout_lines": []}

http://qa-proxy.ceph.com/teuthology/teuthology-2020-05-12_05:00:02-smoke-master-testing-basic-smithi/5048546/teuthology.log

2020-05-12T11:56:34.186 INFO:teuthology.task.kernel:Installing kernel 2df39dc136ccbbccbbb0d080cebc99614118a329 on mon.b...
2020-05-12T11:56:34.186 INFO:teuthology.orchestra.run.smithi146:> sudo rpm -ivh --oldpackage --replacefiles --replacepkgs /tmp/kernel.x86_64.rpm
2020-05-12T11:56:35.953 INFO:teuthology.orchestra.run.smithi146.stdout:Verifying...                          ########################################
2020-05-12T11:56:36.255 INFO:teuthology.orchestra.run.smithi146.stdout:Preparing...                          ########################################
2020-05-12T11:56:36.290 INFO:teuthology.orchestra.run.smithi146.stdout:Updating / installing...
2020-05-12T11:57:11.180 INFO:teuthology.orchestra.run.smithi146.stdout:kernel-5.7.0_rc5_ceph_g2df39dc136cc-1 ########################################
2020-05-12T11:57:31.587 DEBUG:teuthology.task.kernel:install_kernel(remote=ubuntu@smithi146.front.sepia.ceph.com, path=/tmp/kernel.x86_64.rpm, version=None)
2020-05-12T11:57:31.588 INFO:teuthology.orchestra.run.smithi146:> rpm -qlp /tmp/kernel.x86_64.rpm

This is very bad.

2020-05-12 11:47:09,086 INFO  [paddles.controllers.nodes] Locked <Node smithi164.front.sepia.ceph.com> <Node smithi189.front.sepia.ceph.com> <Node smithi043.front.sepia.ceph.com> <Node smithi146.front.sepia.ceph.com> <Node smithi060.front.sepia.ceph.com> for scheduled_teuthology@teuthology with description /home/teuthworker/archive/teuthology-2020-05-12_02:30:02-upgrade:mimic-x-nautilus-distro-basic-smithi/5048079
2020-05-12 11:49:17,688 DEBUG [paddles.controllers.nodes] Unlocking <Node smithi146.front.sepia.ceph.com> for scheduled_teuthology@teuthology
2020-05-12 11:49:17,689 INFO  [paddles.controllers.nodes] Unlocked <Node smithi146.front.sepia.ceph.com> for scheduled_teuthology@teuthology
2020-05-12 11:49:20,932 INFO  [paddles.controllers.nodes] Locked <Node smithi043.front.sepia.ceph.com> <Node smithi146.front.sepia.ceph.com> <Node smithi060.front.sepia.ceph.com> for scheduled_teuthology@teuthology with description /home/teuthworker/archive/teuthology-2020-05-12_05:00:02-smoke-master-testing-basic-smithi/5048546
2020-05-12 11:53:51,613 INFO  [paddles.controllers.nodes] Updating <Node smithi146.front.sepia.ceph.com>: {u'ssh_pub_key': u'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC0HT4TiSwtVS+c5nPdyrnm6k8MKCC+3NMPcrvOkLEImO/jRJ97HY0b0PSw0KKglQEu98LjHeRqAZ6FoDAR9un59KxI9OBjwMlaaGFvAlY5EW5C5bUCsBFX0qnJ/BKRlkuQ7NH2GpaeCDJmfw2bO4W0HKtLjASivoWdsw+6PTRcRSZJwRvgTbL+/TQEbh7IYbvIjMo2u7Zj8mmwTCAys6leCTIEI2OgKlhlZAkWZh2c8E99QzEPdqURZCYQj6IeGMhN2wOq9eLcHGaItCsDaWR4Kpanj95mn1H0tS0dFBgdA0OP5UIhECV7fHyM9Bluc1H6xidFAWJXWUVEKzRgZjjB1nKO3SBhuGGpY/8XJAVj904AsIsAEmig4+6oQWbAS5BwqHlXzmGhyenjeKpQVGiSCtrPZLq7ucY/gAm0ixI/Sj8Ax4xQwKwug9Q44PfCiz99ra3dOEjrYqOCgOyTO2LCqUaeEW/vrDfDnxvcsnTZuU4uu1Fe/WFkC6hTf3JarUU='}
2020-05-12 11:53:52,465 INFO  [paddles.controllers.nodes] Updating <Node smithi146.front.sepia.ceph.com>: {u'name': u'smithi146.front.sepia.ceph.com', u'up': True, u'os_version': u'8.1', u'user': u'ubuntu', u'os_type': u'rhel', u'arch': u'x86_64', u'ssh_pub_key': u'ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBGhz/E0m6Y9T1S6IqTAoM7P0vMFhIp1dFVD6lWtcKOArRiRZly3wztdD4MHbOpypD7y4R5Bvh8KlF6FEdQTS01M='}
2020-05-12 11:53:54,311 INFO  [paddles.controllers.nodes] Updating <Node smithi146.front.sepia.ceph.com>: {u'name': u'smithi146.front.sepia.ceph.com', u'up': True, u'os_version': u'8.1', u'user': u'ubuntu', u'os_type': u'rhel', u'arch': u'x86_64', u'ssh_pub_key': u'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC0HT4TiSwtVS+c5nPdyrnm6k8MKCC+3NMPcrvOkLEImO/jRJ97HY0b0PSw0KKglQEu98LjHeRqAZ6FoDAR9un59KxI9OBjwMlaaGFvAlY5EW5C5bUCsBFX0qnJ/BKRlkuQ7NH2GpaeCDJmfw2bO4W0HKtLjASivoWdsw+6PTRcRSZJwRvgTbL+/TQEbh7IYbvIjMo2u7Zj8mmwTCAys6leCTIEI2OgKlhlZAkWZh2c8E99QzEPdqURZCYQj6IeGMhN2wOq9eLcHGaItCsDaWR4Kpanj95mn1H0tS0dFBgdA0OP5UIhECV7fHyM9Bluc1H6xidFAWJXWUVEKzRgZjjB1nKO3SBhuGGpY/8XJAVj904AsIsAEmig4+6oQWbAS5BwqHlXzmGhyenjeKpQVGiSCtrPZLq7ucY/gAm0ixI/Sj8Ax4xQwKwug9Q44PfCiz99ra3dOEjrYqOCgOyTO2LCqUaeEW/vrDfDnxvcsnTZuU4uu1Fe/WFkC6hTf3JarUU='}
2020-05-12 11:54:20,966 INFO  [paddles.controllers.nodes] Updating <Node smithi146.front.sepia.ceph.com>: {u'name': u'smithi146.front.sepia.ceph.com', u'up': True, u'os_version': u'8.1', u'user': u'ubuntu', u'os_type': u'rhel', u'arch': u'x86_64', u'ssh_pub_key': u'ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBGhz/E0m6Y9T1S6IqTAoM7P0vMFhIp1dFVD6lWtcKOArRiRZly3wztdD4MHbOpypD7y4R5Bvh8KlF6FEdQTS01M='}
2020-05-12 11:54:22,944 INFO  [paddles.controllers.nodes] Updating <Node smithi146.front.sepia.ceph.com>: {u'name': u'smithi146.front.sepia.ceph.com', u'up': True, u'os_version': u'8.1', u'user': u'ubuntu', u'os_type': u'rhel', u'arch': u'x86_64', u'ssh_pub_key': u'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC0HT4TiSwtVS+c5nPdyrnm6k8MKCC+3NMPcrvOkLEImO/jRJ97HY0b0PSw0KKglQEu98LjHeRqAZ6FoDAR9un59KxI9OBjwMlaaGFvAlY5EW5C5bUCsBFX0qnJ/BKRlkuQ7NH2GpaeCDJmfw2bO4W0HKtLjASivoWdsw+6PTRcRSZJwRvgTbL+/TQEbh7IYbvIjMo2u7Zj8mmwTCAys6leCTIEI2OgKlhlZAkWZh2c8E99QzEPdqURZCYQj6IeGMhN2wOq9eLcHGaItCsDaWR4Kpanj95mn1H0tS0dFBgdA0OP5UIhECV7fHyM9Bluc1H6xidFAWJXWUVEKzRgZjjB1nKO3SBhuGGpY/8XJAVj904AsIsAEmig4+6oQWbAS5BwqHlXzmGhyenjeKpQVGiSCtrPZLq7ucY/gAm0ixI/Sj8Ax4xQwKwug9Q44PfCiz99ra3dOEjrYqOCgOyTO2LCqUaeEW/vrDfDnxvcsnTZuU4uu1Fe/WFkC6hTf3JarUU='}
2020-05-12 12:09:52,403 DEBUG [paddles.controllers.nodes] Unlocking <Node smithi146.front.sepia.ceph.com> for scheduled_teuthology@teuthology
2020-05-12 12:09:52,405 INFO  [paddles.controllers.nodes] Unlocked <Node smithi146.front.sepia.ceph.com> for scheduled_teuthology@teuthology

#4 Updated by Kefu Chai 7 months ago

  • Assignee set to Kefu Chai

#5 Updated by Kefu Chai 7 months ago

  • Status changed from New to Resolved

#6 Updated by David Galloway 7 months ago

  • Status changed from Resolved to In Progress

This is still happening :(

http://pulpito.ceph.com/nodes/smithi040.front.sepia.ceph.com/

5058215 started at 2020-05-15 15:17:50 but 5058307 started at 2020-05-15 14:31:32 and is still going as of 2020-05-15 20:31:40.

Also 5058318 started at 2020-05-15 15:00:53 and somehow managed to pass while 5058307 was going?!

#7 Updated by Kefu Chai 7 months ago

  • Status changed from In Progress to New
  • Assignee deleted (Kefu Chai)

#8 Updated by David Galloway 7 months ago

Another example

Also available in: Atom PDF