Bug #12031

closed

unable to schedule jobs on sepia (beanstalkd unresponsive)

Added by Loïc Dachary almost 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

$ cat ~/.teuthology.yaml
lock_server: http://paddles.front.sepia.ceph.com/
results_server: http://10.214.137.104/
queue_host: teuthology.front.sepia.ceph.com
queue_port: 11300
$ ./virtualenv/bin/teuthology-suite -l1 --priority 1000 --suite rgw --subset 1/18 --suite-branch hammer --distro ubuntu --email abhishek.lekshmanan@gmail.com --owner abhishek.lekshmanan@gmail.com --ceph hammer-backports --machine-type plana,burnupi,mira
2015-06-16 09:45:11,557.557 INFO:teuthology.suite:Passed subset=1/18
2015-06-16 09:45:12,103.103 INFO:teuthology.suite:ceph sha1: 3677fd2708856587ac76fde086d1a4f7a20339a8
2015-06-16 09:45:12,677.677 INFO:teuthology.suite:ceph version: 0.94.2-70-g3677fd2-1trusty
2015-06-16 09:45:12,677.677 INFO:teuthology.suite:teuthology branch: master
2015-06-16 09:45:13,509.509 INFO:teuthology.suite:ceph-qa-suite branch: hammer
2015-06-16 09:45:13,510.510 INFO:teuthology.repo_utils:Fetching from upstream into /home/loic/src/ceph-qa-suite_hammer
2015-06-16 09:45:14,586.586 INFO:teuthology.repo_utils:Resetting repo at /home/loic/src/ceph-qa-suite_hammer to branch hammer
2015-06-16 09:45:15,721.721 INFO:teuthology.suite:Suite rgw in /home/loic/src/ceph-qa-suite_hammer/suites/rgw generated 15 jobs (not yet filtered)
2015-06-16 09:45:15,727.727 INFO:teuthology.suite:Stopped after 1 jobs due to --limit=1
2015-06-16 09:45:15,727.727 INFO:teuthology.suite:Scheduling rgw/verify/{overrides.yaml clusters/fixed-2.yaml frontend/civetweb.yaml fs/btrfs.yaml msgr-failures/few.yaml rgw_pool_type/replicated.yaml tasks/rgw_s3tests.yaml validater/valgrind.yaml}
Traceback (most recent call last):
  File "./virtualenv/bin/teuthology-schedule", line 9, in <module>
    load_entry_point('teuthology==0.1.0', 'console_scripts', 'teuthology-schedule')()
  File "/home/loic/software/ceph/teuthology/scripts/schedule.py", line 44, in main
    teuthology.schedule.main(args)
  File "/home/loic/software/ceph/teuthology/teuthology/schedule.py", line 25, in main
    schedule_job(job_config, args['--num'])
  File "/home/loic/software/ceph/teuthology/teuthology/schedule.py", line 75, in schedule_job
    beanstalk.use(tube)
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 190, in use
    return self._interact_value('use %s\r\n' % name, ['USING'])
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 110, in _interact_value
    return self._interact(command, expected_ok, expected_err)[0]
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 87, in _interact
    status, results = self._read_response()
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 96, in _read_response
    line = SocketError.wrap(self._socket_file.readline)
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 43, in wrap
    raise SocketError(err)
beanstalkc.SocketError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "./virtualenv/bin/teuthology-suite", line 9, in <module>
    load_entry_point('teuthology==0.1.0', 'console_scripts', 'teuthology-suite')()
  File "/home/loic/software/ceph/teuthology/scripts/suite.py", line 86, in main
    teuthology.suite.main(args)
  File "/home/loic/software/ceph/teuthology/teuthology/suite.py", line 114, in main
    subset=subset,
  File "/home/loic/software/ceph/teuthology/teuthology/suite.py", line 297, in prepare_and_schedule
    subset=subset
  File "/home/loic/software/ceph/teuthology/teuthology/suite.py", line 617, in schedule_suite
    args=job['args'],
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['./virtualenv/bin/teuthology-schedule', '--name', 'loic-2015-06-16_09:45:11-rgw-hammer-backports---basic-multi', '--num', '1', '--worker', 'multi', '--priority', '1000', '--owner', 'abhishek.lekshmanan@gmail.com', '--description', 'rgw/verify/{overrides.yaml clusters/fixed-2.yaml frontend/civetweb.yaml fs/btrfs.yaml msgr-failures/few.yaml rgw_pool_type/replicated.yaml tasks/rgw_s3tests.yaml validater/valgrind.yaml}', '--', '/tmp/schedule_suite_GRUXTS', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/clusters/fixed-2.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/frontend/civetweb.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/fs/btrfs.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/msgr-failures/few.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/overrides.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/rgw_pool_type/replicated.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/tasks/rgw_s3tests.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/validater/valgrind.yaml']' returned non-zero exit status 1
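
The failing step is the beanstalkc use command issued by teuthology-schedule: the TCP connection to the queue host is accepted, but the daemon resets it as soon as a command is sent (errno 104). A minimal probe along the following lines, assuming the same queue_host/queue_port as in ~/.teuthology.yaml above and only the standard beanstalkc API (the script itself is just a sketch, not part of teuthology), would reproduce the failure outside of teuthology-suite:

# probe_beanstalk.py -- hypothetical standalone check, not part of teuthology
import beanstalkc

HOST = 'teuthology.front.sepia.ceph.com'  # queue_host from ~/.teuthology.yaml
PORT = 11300                              # queue_port from ~/.teuthology.yaml

try:
    # Connecting can succeed even when the daemon is wedged; the reset only
    # shows up once a command such as 'use' is sent, as in schedule_job().
    conn = beanstalkc.Connection(host=HOST, port=PORT)
    conn.use('multi')  # same tube as the --worker multi run above
    print('beanstalkd is responsive: %r' % conn.stats())
except beanstalkc.SocketError as err:
    print('beanstalkd unresponsive: %s' % err)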

Related issues 1 (0 open, 1 closed)

Related to sepia - Bug #12032: activate beanstalkd logs (Resolved, Zack Cerza)

Actions #1

Updated by Loïc Dachary almost 9 years ago

Looks like the beanstalkd daemon is down. Or maybe it moved and is no longer at teuthology.front.sepia.ceph.com?

Actions #2

Updated by Loïc Dachary almost 9 years ago

I see no announcement in the sepia mailing list.

Actions #3

Updated by Loïc Dachary almost 9 years ago

ubuntu@teuthology:~$ sudo netstat -tlpn | grep beans
tcp        0      0 0.0.0.0:11300           0.0.0.0:*               LISTEN      1763/beanstalkd 
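
A listening socket only shows that the beanstalkd process still holds port 11300; it does not prove the daemon answers protocol commands. A rough responsiveness check, assuming only the documented beanstalkd protocol ("stats" returns "OK <bytes>" followed by a YAML body) and nothing teuthology-specific, would be to send a raw stats command and see whether a reply comes back before the connection is reset:

# hypothetical check, run from any host that can reach the queue server
import socket

sock = socket.create_connection(('teuthology.front.sepia.ceph.com', 11300), timeout=5)
sock.sendall(b'stats\r\n')
reply = sock.recv(4096)
# a healthy daemon answers "OK <bytes>" followed by YAML stats;
# a wedged one times out or resets the connection (errno 104)
print(reply.splitlines()[0] if reply else 'no reply')
sock.close()
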
Actions #4

Updated by Loïc Dachary almost 9 years ago

It's difficult to figure out the source of the problem because there are no logs (http://tracker.ceph.com/issues/12032). I'm hesitant to just restart beanstalkd because I've never done it before and http://tracker.ceph.com/projects/lab/search?utf8=%E2%9C%93&issues=1&q=beanstalkd shows no sign that it might actually be needed from time to time.

Actions #5

Updated by Loïc Dachary almost 9 years ago

  • Subject changed from unable to schedule jobs on sepia to unable to schedule jobs on sepia (beanstalkd unresponsive)
Actions #6

Updated by Loïc Dachary almost 9 years ago

$ ./virtualenv/bin/teuthology-suite --filter="upgrade:firefly-x/parallel/{0-cluster/start.yaml 1-firefly-install/firefly.yaml 2-workload/{ec-rados-parallel.yaml rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml} 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/ubuntu_12.04.yaml}" --priority 101 --suite upgrade/firefly-x --suite-branch master --machine-type vps --email loic@dachary.org --ceph master
2015-06-16 10:10:45,162.162 INFO:teuthology.suite:ceph sha1: 81605bdd1def4f26df03d28bc22cc9bed39037ed
2015-06-16 10:10:45,731.731 INFO:teuthology.suite:ceph version: v9.0.1-915.g81605bd
2015-06-16 10:10:45,731.731 INFO:teuthology.suite:teuthology branch: master
2015-06-16 10:10:45,731.731 INFO:teuthology.suite:ceph-qa-suite branch: master
2015-06-16 10:10:45,731.731 INFO:teuthology.repo_utils:Fetching from upstream into /home/loic/src/ceph-qa-suite_master
2015-06-16 10:10:46,843.843 INFO:teuthology.repo_utils:Resetting repo at /home/loic/src/ceph-qa-suite_master to branch master
2015-06-16 10:10:48,222.222 INFO:teuthology.suite:Suite upgrade:firefly-x in /home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x generated 63 jobs (not yet filtered)
2015-06-16 10:10:48,251.251 INFO:teuthology.suite:Scheduling upgrade:firefly-x/parallel/{0-cluster/start.yaml 1-firefly-install/firefly.yaml 2-workload/{ec-rados-parallel.yaml rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml} 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/ubuntu_12.04.yaml}
Traceback (most recent call last):
  File "./virtualenv/bin/teuthology-schedule", line 9, in <module>
    load_entry_point('teuthology==0.1.0', 'console_scripts', 'teuthology-schedule')()
  File "/home/loic/software/ceph/teuthology/scripts/schedule.py", line 44, in main
    teuthology.schedule.main(args)
  File "/home/loic/software/ceph/teuthology/teuthology/schedule.py", line 25, in main
    schedule_job(job_config, args['--num'])
  File "/home/loic/software/ceph/teuthology/teuthology/schedule.py", line 75, in schedule_job
    beanstalk.use(tube)
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 190, in use
    return self._interact_value('use %s\r\n' % name, ['USING'])
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 110, in _interact_value
    return self._interact(command, expected_ok, expected_err)[0]
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 87, in _interact
    status, results = self._read_response()
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 96, in _read_response
    line = SocketError.wrap(self._socket_file.readline)
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 43, in wrap
    raise SocketError(err)
beanstalkc.SocketError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "./virtualenv/bin/teuthology-suite", line 9, in <module>
    load_entry_point('teuthology==0.1.0', 'console_scripts', 'teuthology-suite')()
  File "/home/loic/software/ceph/teuthology/scripts/suite.py", line 86, in main
    teuthology.suite.main(args)
  File "/home/loic/software/ceph/teuthology/teuthology/suite.py", line 114, in main
    subset=subset,
  File "/home/loic/software/ceph/teuthology/teuthology/suite.py", line 297, in prepare_and_schedule
    subset=subset
  File "/home/loic/software/ceph/teuthology/teuthology/suite.py", line 617, in schedule_suite
    args=job['args'],
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['./virtualenv/bin/teuthology-schedule', '--name', 'loic-2015-06-16_10:10:44-upgrade:firefly-x-master---basic-vps', '--num', '1', '--worker', 'vps', '--priority', '101', '--description', 'upgrade:firefly-x/parallel/{0-cluster/start.yaml 1-firefly-install/firefly.yaml 2-workload/{ec-rados-parallel.yaml rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml} 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/ubuntu_12.04.yaml}', '--', '/tmp/schedule_suite_zZzjH_', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/0-cluster/start.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/1-firefly-install/firefly.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/2-workload/ec-rados-parallel.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/2-workload/rados_api.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/2-workload/rados_loadgenbig.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/2-workload/test_rbd_api.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/2-workload/test_rbd_python.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/3-upgrade-sequence/upgrade-mon-osd-mds.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/4-final-workload/rados-snaps-few-objects.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/4-final-workload/rados_loadgenmix.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/4-final-workload/rados_mon_thrash.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/4-final-workload/rbd_cls.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/4-final-workload/rbd_import_export.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/4-final-workload/rgw_swift.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/distros/ubuntu_12.04.yaml']' returned non-zero exit status 1
Actions #7

Updated by Loïc Dachary almost 9 years ago

  • Priority changed from High to Urgent

Kefu confirms he has the same issue.

Actions #8

Updated by Loïc Dachary almost 9 years ago

  • Status changed from 12 to Resolved

Works now, no idea how it was fixed.
