Bug #55531

closed

octopus: No remote osd logs captured in dead jobs

Added by Laura Flores almost 2 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On dead jobs, the osd logs were not captured because of a problem shown in the supervisor log.

/a/lflores-2022-04-27_17:53:15-rados:thrash-erasure-code-big-wip-yuri2-testing-2022-04-26-1132-octopus-distro-default-smithi/6810170/supervisor.6810170.log

2022-05-01T13:08:57.006 INFO:teuthology.orchestra.remote:Trying to reconnect to host 'ubuntu@smithi007.front.sepia.ceph.com'
2022-05-01T13:08:57.852 INFO:teuthology.orchestra.remote:Trying to reconnect to host 'ubuntu@smithi038.front.sepia.ceph.com'
2022-05-01T13:08:58.564 INFO:teuthology.orchestra.remote:Trying to reconnect to host 'ubuntu@smithi073.front.sepia.ceph.com'
2022-05-01T13:08:59.894 INFO:teuthology.kill:No teuthology processes running
2022-05-01T13:09:01.233 INFO:teuthology.kill:Nuking machines: ['smithi007', 'smithi038', 'smithi073']
2022-05-01T13:09:02.397 INFO:teuthology.kill:/home/teuthworker/src/git.ceph.com_git_teuthology_135488acc76a85490d37201ec506706cc4ab2d62/virtualenv/lib/python3.6/site-packages/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
2022-05-01T13:09:02.397 INFO:teuthology.kill:  from cryptography.hazmat.backends import default_backend
2022-05-01T13:09:02.397 INFO:teuthology.kill:/home/teuthworker/src/git.ceph.com_git_teuthology_135488acc76a85490d37201ec506706cc4ab2d62/virtualenv/lib/python3.6/site-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
2022-05-01T13:09:02.398 INFO:teuthology.kill:  "class": algorithms.Blowfish,
2022-05-01T13:09:02.398 INFO:teuthology.kill:2022-05-01 13:09:02,396.396 INFO:teuthology.nuke:targets:
2022-05-01T13:09:02.398 INFO:teuthology.kill:  smithi007.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDRaQ698SHzXcOGpU5f/jZngZEAdSU4XU5OYUGhQw42PBRIBABGliKkUngNLGrAXbek7mS/cc41fT/bOvAcZcRBuItQDy0uW2uvVUoqp5UtAcJB85vtvIWMUaqPcEPrBbFHT3UmbSI5+p+fQ1o5HV9KAq7p34b38yIKHSLAjHHRpsbOEmFO0MyI8vNXU7/TS7ygK939xcS9UYa6iXdYoDd+NKBFioyfYoPNIbb8oFjBKdsZ64d+yr1pBXVKSGKmmOLlD+vTf5z3x52ejbg6KNBq+/6K3/kLNro3ZNo1ckNw/F1+FCyNL6H3lzYArJOMlm0jo4urWdBcUqHTIxYVBamP
2022-05-01T13:09:02.398 INFO:teuthology.kill:  smithi038.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDDS8ZQAM9xHhnVm1A+X931k1UDtD5Jtcbo6bf5RF3AMYjJAoqH0gDWCUBfz1J/Uewr0TKZ9J1tWiJd0/PWMAfS6euUewmBU1hpGevelXJdvsGwWHjGubumUPuG8UsCnVfDhcVXUpzVxDBnY6O8AuV9EkJf6QsU3mh7Fiq10GkcnLwfxBLT7+pF8OXF5sXSgHOrADkF6/0KijjQ9/NH1H8RjNk6c6ZQaJZlDUb9mvWLdiYjltB7gJZw4iXPxYp2xg2x8YLBJ77Dwhw9f1NACW0Vjrtl79td6wuAL1GK8tpGyvl7eJdGJ2vavyXCdThx0Ryix6LRkDjlEEiCj4tNbRjH
2022-05-01T13:09:02.398 INFO:teuthology.kill:  smithi073.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDt9ypOWYcR7WQFhxF0NLutaMCrn13b/EnITCKtDuNq9dp5p4VBbHHL+SrgFlN4fw1zAZMV3Q2xHQ8TPqF5LqbPbtv68bX1pQfSBeNYqQhP8HOwXcJFrdWHTioZn0yWRSl0lbSxxn7MiP59658hDxaYFLRFE0gWJnBXoxzeUx9GqdS/kjxUGMCP0/VToUmi5R/tdRLBETrJRXJl744lgeZeI/kSQZa1WsQIOFT4gboEkdySegu8/PnkZGiWCAdw+bg37610GN5adzgyOTAKQRqf6+kS/dL8h5e66jWXVWj8vm3NKvHwXtGddJEV6KNfBw2skFY5HdUETBdk8UQjDkvr
2022-05-01T13:09:02.427 INFO:teuthology.kill:2022-05-01 13:09:02,427.427 INFO:teuthology.task.internal.check_lock:Checking locks...
2022-05-01T13:09:02.430 INFO:teuthology.kill:2022-05-01 13:09:02,430.430 INFO:teuthology.task.internal.check_lock:Checking locks...
2022-05-01T13:09:02.433 INFO:teuthology.kill:2022-05-01 13:09:02,433.433 INFO:teuthology.task.internal.check_lock:Checking locks...
2022-05-01T13:09:02.511 INFO:teuthology.kill:2022-05-01 13:09:02,510.510 INFO:teuthology.orchestra.console:Power off smithi038
2022-05-01T13:09:02.523 INFO:teuthology.kill:2022-05-01 13:09:02,523.523 INFO:teuthology.orchestra.console:Power off smithi073
2022-05-01T13:09:02.532 INFO:teuthology.kill:2022-05-01 13:09:02,532.532 INFO:teuthology.orchestra.console:Power off smithi007
2022-05-01T13:09:10.883 INFO:teuthology.kill:2022-05-01 13:09:10,883.883 INFO:teuthology.orchestra.console:Power off for smithi038 completed
2022-05-01T13:09:11.028 INFO:teuthology.kill:2022-05-01 13:09:11,027.027 INFO:teuthology.lock.ops:unlocked: smithi038.front.sepia.ceph.com
2022-05-01T13:09:15.001 INFO:teuthology.kill:2022-05-01 13:09:15,000.000 INFO:teuthology.orchestra.console:Power off for smithi073 completed
2022-05-01T13:09:15.004 INFO:teuthology.kill:2022-05-01 13:09:15,004.004 INFO:teuthology.orchestra.console:Power off for smithi007 completed
2022-05-01T13:09:15.156 INFO:teuthology.kill:2022-05-01 13:09:15,156.156 INFO:teuthology.lock.ops:unlocked: smithi073.front.sepia.ceph.com
2022-05-01T13:09:15.159 INFO:teuthology.kill:2022-05-01 13:09:15,159.159 INFO:teuthology.lock.ops:unlocked: smithi007.front.sepia.ceph.com
2022-05-01T13:11:15.516 ERROR:teuthology.dispatcher.supervisor:Child exited with code -15
2022-05-01T13:11:15.637 WARNING:teuthology.dispatcher.supervisor:Was going to unlock smithi007 but it was locked by another job: /home/teuthworker/archive/soumyakoduri-2022-05-01_13:09:03-rgw:cloud-transition-wip-skoduri-dbstore-tests-distro-basic-smithi/6817493
2022-05-01T13:11:15.638 WARNING:teuthology.dispatcher.supervisor:Was going to unlock smithi038 but it was locked by another job: /home/teuthworker/archive/soumyakoduri-2022-05-01_12:44:18-rgw-wip-skoduri-dbstore-tests-distro-basic-smithi/6817448
2022-05-01T13:11:15.638 WARNING:teuthology.dispatcher.supervisor:Was going to unlock smithi073 but it was locked by another job: /home/teuthworker/archive/soumyakoduri-2022-05-01_12:44:18-rgw-wip-skoduri-dbstore-tests-distro-basic-smithi/6817448

Actions #1

Updated by Laura Flores almost 2 years ago

  • Subject changed from No remote osd logs captured to No remote osd logs captured in dead jobs
Actions #2

Updated by Zack Cerza almost 2 years ago

  • Category set to QA Suite

Some important context from the supervisor log is:

2022-05-01T13:08:55.379 WARNING:teuthology.dispatcher.supervisor:Job ran longer than 23400s. Killing...

So, teuthology-kill is what would have gathered logs here. This is a separate mechanism from the ceph task's normal log collection, and it looks to be a relatively recent feature.
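
To check quickly whether a dead job went down this timeout path, one can scan its supervisor log for the warning quoted above. This is a minimal sketch, not part of teuthology: the function name and the regular expression are assumptions built from that single log line.

```python
import re

# Pattern assumed from the supervisor warning quoted above; not a teuthology API.
TIMEOUT_RE = re.compile(r"Job ran longer than (\d+)s\. Killing")


def find_timeout_seconds(log_lines):
    """Return the timeout in seconds from the first matching line, else None."""
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            return int(match.group(1))
    return None


sample = [
    "2022-05-01T13:08:55.379 WARNING:teuthology.dispatcher.supervisor:"
    "Job ran longer than 23400s. Killing...",
    "2022-05-01T13:08:59.894 INFO:teuthology.kill:No teuthology processes running",
]
print(find_timeout_seconds(sample))
```

A result other than None suggests the job was killed by the supervisor, so log gathering would have depended on teuthology-kill rather than on the ceph task.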

We know transfer_archives was called, but in info.yaml the archive section contains only:

archive:
  init: /home/ubuntu/cephtest/archive

Compare this with a random job using the master branch:
archive:
  crash: /var/lib/ceph/crash
  init: /home/ubuntu/cephtest/archive
  log: /var/log/ceph

Normally the crash and log entries would be added by the ceph task, but the branch in use here has no mention of the feature. It looks like it was added back in 2020 and further refined in (at least) one other PR, but it didn't end up in this test branch.
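
The comparison between the two archive sections can be sketched as a small check. This is illustrative only: the expected key set is taken from the master-branch example in this comment, the helper name is hypothetical, and the dicts stand in for the parsed archive section of info.yaml.

```python
# Keys seen in the archive section of the master-branch job quoted above.
# This set is an assumption drawn from that one example, not a teuthology constant.
EXPECTED_ARCHIVE_KEYS = {"crash", "init", "log"}


def missing_archive_keys(archive):
    """Return expected archive keys absent from a parsed archive mapping."""
    return sorted(EXPECTED_ARCHIVE_KEYS - set(archive))


# Archive section from the octopus job in this ticket.
octopus_archive = {"init": "/home/ubuntu/cephtest/archive"}

# Archive section from the master-branch job used for comparison.
master_archive = {
    "crash": "/var/lib/ceph/crash",
    "init": "/home/ubuntu/cephtest/archive",
    "log": "/var/log/ceph",
}

print(missing_archive_keys(octopus_archive))
print(missing_archive_keys(master_archive))
```

For the octopus job the check reports crash and log as missing, which matches the observation that nothing under /var/log/ceph was registered for transfer.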

Actions #3

Updated by Neha Ojha almost 2 years ago

  • Project changed from teuthology to Ceph
  • Subject changed from No remote osd logs captured in dead jobs to octopus: No remote osd logs captured in dead jobs
  • Category deleted (QA Suite)
Actions #4

Updated by Laura Flores almost 2 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Laura Flores
  • Pull request ID set to 46149
Actions #5

Updated by Laura Flores almost 2 years ago

/a/yuriw-2022-05-09_21:42:51-rados-wip-yuri2-testing-2022-04-26-1132-octopus-distro-default-smithi/6829091

Actions #6

Updated by Nitzan Mordechai almost 2 years ago

/home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi/6813848

Actions #7

Updated by Laura Flores almost 2 years ago

  • Status changed from Fix Under Review to Resolved