Bug #12031
unable to schedule jobs on sepia (beanstalkd unresponsive)
Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
$ cat ~/.teuthology.yaml
lock_server: http://paddles.front.sepia.ceph.com/
results_server: http://10.214.137.104/
queue_host: teuthology.front.sepia.ceph.com
queue_port: 11300
$ ./virtualenv/bin/teuthology-suite -l1 --priority 1000 --suite rgw --subset 1/18 --suite-branch hammer --distro ubuntu --email abhishek.lekshmanan@gmail.com --owner abhishek.lekshmanan@gmail.com --ceph hammer-backports --machine-type plana,burnupi,mira
2015-06-16 09:45:11,557.557 INFO:teuthology.suite:Passed subset=1/18
2015-06-16 09:45:12,103.103 INFO:teuthology.suite:ceph sha1: 3677fd2708856587ac76fde086d1a4f7a20339a8
2015-06-16 09:45:12,677.677 INFO:teuthology.suite:ceph version: 0.94.2-70-g3677fd2-1trusty
2015-06-16 09:45:12,677.677 INFO:teuthology.suite:teuthology branch: master
2015-06-16 09:45:13,509.509 INFO:teuthology.suite:ceph-qa-suite branch: hammer
2015-06-16 09:45:13,510.510 INFO:teuthology.repo_utils:Fetching from upstream into /home/loic/src/ceph-qa-suite_hammer
2015-06-16 09:45:14,586.586 INFO:teuthology.repo_utils:Resetting repo at /home/loic/src/ceph-qa-suite_hammer to branch hammer
2015-06-16 09:45:15,721.721 INFO:teuthology.suite:Suite rgw in /home/loic/src/ceph-qa-suite_hammer/suites/rgw generated 15 jobs (not yet filtered)
2015-06-16 09:45:15,727.727 INFO:teuthology.suite:Stopped after 1 jobs due to --limit=1
2015-06-16 09:45:15,727.727 INFO:teuthology.suite:Scheduling rgw/verify/{overrides.yaml clusters/fixed-2.yaml frontend/civetweb.yaml fs/btrfs.yaml msgr-failures/few.yaml rgw_pool_type/replicated.yaml tasks/rgw_s3tests.yaml validater/valgrind.yaml}
Traceback (most recent call last):
  File "./virtualenv/bin/teuthology-schedule", line 9, in <module>
    load_entry_point('teuthology==0.1.0', 'console_scripts', 'teuthology-schedule')()
  File "/home/loic/software/ceph/teuthology/scripts/schedule.py", line 44, in main
    teuthology.schedule.main(args)
  File "/home/loic/software/ceph/teuthology/teuthology/schedule.py", line 25, in main
    schedule_job(job_config, args['--num'])
  File "/home/loic/software/ceph/teuthology/teuthology/schedule.py", line 75, in schedule_job
    beanstalk.use(tube)
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 190, in use
    return self._interact_value('use %s\r\n' % name, ['USING'])
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 110, in _interact_value
    return self._interact(command, expected_ok, expected_err)[0]
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 87, in _interact
    status, results = self._read_response()
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 96, in _read_response
    line = SocketError.wrap(self._socket_file.readline)
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 43, in wrap
    raise SocketError(err)
beanstalkc.SocketError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "./virtualenv/bin/teuthology-suite", line 9, in <module>
    load_entry_point('teuthology==0.1.0', 'console_scripts', 'teuthology-suite')()
  File "/home/loic/software/ceph/teuthology/scripts/suite.py", line 86, in main
    teuthology.suite.main(args)
  File "/home/loic/software/ceph/teuthology/teuthology/suite.py", line 114, in main
    subset=subset,
  File "/home/loic/software/ceph/teuthology/teuthology/suite.py", line 297, in prepare_and_schedule
    subset=subset
  File "/home/loic/software/ceph/teuthology/teuthology/suite.py", line 617, in schedule_suite
    args=job['args'],
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['./virtualenv/bin/teuthology-schedule', '--name', 'loic-2015-06-16_09:45:11-rgw-hammer-backports---basic-multi', '--num', '1', '--worker', 'multi', '--priority', '1000', '--owner', 'abhishek.lekshmanan@gmail.com', '--description', 'rgw/verify/{overrides.yaml clusters/fixed-2.yaml frontend/civetweb.yaml fs/btrfs.yaml msgr-failures/few.yaml rgw_pool_type/replicated.yaml tasks/rgw_s3tests.yaml validater/valgrind.yaml}', '--', '/tmp/schedule_suite_GRUXTS', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/clusters/fixed-2.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/frontend/civetweb.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/fs/btrfs.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/msgr-failures/few.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/overrides.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/rgw_pool_type/replicated.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/tasks/rgw_s3tests.yaml', '/home/loic/src/ceph-qa-suite_hammer/suites/rgw/verify/validater/valgrind.yaml']' returned non-zero exit status 1
Updated by Loïc Dachary almost 9 years ago
Looks like the beanstalkd daemon is down. Or maybe it moved and is no longer at teuthology.front.sepia.ceph.com?
Updated by Loïc Dachary almost 9 years ago
I see no announcement on the sepia mailing list.
Updated by Loïc Dachary almost 9 years ago
ubuntu@teuthology:~$ sudo netstat -tlpn | grep beans
tcp        0      0 0.0.0.0:11300           0.0.0.0:*               LISTEN      1763/beanstalkd
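
Note that netstat only proves the socket is in LISTEN state; the kernel can keep accepting connections even if the daemon behind them no longer services anything. A raw-socket probe of the beanstalk protocol can tell the two apart. This is a hypothetical check, not part of teuthology: it sends the protocol's stats command, to which a healthy daemon replies "OK <bytes>" followed by a YAML body:

# Hypothetical liveness probe: a healthy beanstalkd replies "OK <bytes>\r\n"
# followed by YAML; a connection reset here matches the SocketError in the
# tracebacks above.
import socket

s = socket.create_connection(('teuthology.front.sepia.ceph.com', 11300), timeout=5)
s.sendall(b'stats\r\n')
reply = s.recv(4096)
print(reply.decode('ascii', 'replace').splitlines()[0])  # expect: OK <bytes>
s.close()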
Updated by Loïc Dachary almost 9 years ago
It's difficult to figure out the source of the problem because there are no logs (http://tracker.ceph.com/issues/12032). I'm hesitant to just restart beanstalkd because I've never done it before, and http://tracker.ceph.com/projects/lab/search?utf8=%E2%9C%93&issues=1&q=beanstalkd shows no sign that restarting it is something that is actually needed from time to time.
Updated by Loïc Dachary almost 9 years ago
- Subject changed from unable to schedule jobs on sepia to unable to schedule jobs on sepia (beanstalkd unresponsive)
Updated by Loïc Dachary almost 9 years ago
$ ./virtualenv/bin/teuthology-suite --filter="upgrade:firefly-x/parallel/{0-cluster/start.yaml 1-firefly-install/firefly.yaml 2-workload/{ec-rados-parallel.yaml rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml} 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/ubuntu_12.04.yaml}" --priority 101 --suite upgrade/firefly-x --suite-branch master --machine-type vps --email loic@dachary.org --ceph master
2015-06-16 10:10:45,162.162 INFO:teuthology.suite:ceph sha1: 81605bdd1def4f26df03d28bc22cc9bed39037ed
2015-06-16 10:10:45,731.731 INFO:teuthology.suite:ceph version: v9.0.1-915.g81605bd
2015-06-16 10:10:45,731.731 INFO:teuthology.suite:teuthology branch: master
2015-06-16 10:10:45,731.731 INFO:teuthology.suite:ceph-qa-suite branch: master
2015-06-16 10:10:45,731.731 INFO:teuthology.repo_utils:Fetching from upstream into /home/loic/src/ceph-qa-suite_master
2015-06-16 10:10:46,843.843 INFO:teuthology.repo_utils:Resetting repo at /home/loic/src/ceph-qa-suite_master to branch master
2015-06-16 10:10:48,222.222 INFO:teuthology.suite:Suite upgrade:firefly-x in /home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x generated 63 jobs (not yet filtered)
2015-06-16 10:10:48,251.251 INFO:teuthology.suite:Scheduling upgrade:firefly-x/parallel/{0-cluster/start.yaml 1-firefly-install/firefly.yaml 2-workload/{ec-rados-parallel.yaml rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml} 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/ubuntu_12.04.yaml}
Traceback (most recent call last):
  File "./virtualenv/bin/teuthology-schedule", line 9, in <module>
    load_entry_point('teuthology==0.1.0', 'console_scripts', 'teuthology-schedule')()
  File "/home/loic/software/ceph/teuthology/scripts/schedule.py", line 44, in main
    teuthology.schedule.main(args)
  File "/home/loic/software/ceph/teuthology/teuthology/schedule.py", line 25, in main
    schedule_job(job_config, args['--num'])
  File "/home/loic/software/ceph/teuthology/teuthology/schedule.py", line 75, in schedule_job
    beanstalk.use(tube)
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 190, in use
    return self._interact_value('use %s\r\n' % name, ['USING'])
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 110, in _interact_value
    return self._interact(command, expected_ok, expected_err)[0]
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 87, in _interact
    status, results = self._read_response()
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 96, in _read_response
    line = SocketError.wrap(self._socket_file.readline)
  File "/home/loic/software/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/beanstalkc.py", line 43, in wrap
    raise SocketError(err)
beanstalkc.SocketError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "./virtualenv/bin/teuthology-suite", line 9, in <module>
    load_entry_point('teuthology==0.1.0', 'console_scripts', 'teuthology-suite')()
  File "/home/loic/software/ceph/teuthology/scripts/suite.py", line 86, in main
    teuthology.suite.main(args)
  File "/home/loic/software/ceph/teuthology/teuthology/suite.py", line 114, in main
    subset=subset,
  File "/home/loic/software/ceph/teuthology/teuthology/suite.py", line 297, in prepare_and_schedule
    subset=subset
  File "/home/loic/software/ceph/teuthology/teuthology/suite.py", line 617, in schedule_suite
    args=job['args'],
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['./virtualenv/bin/teuthology-schedule', '--name', 'loic-2015-06-16_10:10:44-upgrade:firefly-x-master---basic-vps', '--num', '1', '--worker', 'vps', '--priority', '101', '--description', 'upgrade:firefly-x/parallel/{0-cluster/start.yaml 1-firefly-install/firefly.yaml 2-workload/{ec-rados-parallel.yaml rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml} 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/ubuntu_12.04.yaml}', '--', '/tmp/schedule_suite_zZzjH_', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/0-cluster/start.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/1-firefly-install/firefly.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/2-workload/ec-rados-parallel.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/2-workload/rados_api.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/2-workload/rados_loadgenbig.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/2-workload/test_rbd_api.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/2-workload/test_rbd_python.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/3-upgrade-sequence/upgrade-mon-osd-mds.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/4-final-workload/rados-snaps-few-objects.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/4-final-workload/rados_loadgenmix.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/4-final-workload/rados_mon_thrash.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/4-final-workload/rbd_cls.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/4-final-workload/rbd_import_export.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/4-final-workload/rgw_swift.yaml', '/home/loic/src/ceph-qa-suite_master/suites/upgrade/firefly-x/parallel/distros/ubuntu_12.04.yaml']' returned non-zero exit status 1
Updated by Loïc Dachary almost 9 years ago
- Priority changed from High to Urgent
Kefu confirms he has the same issue.
Updated by Loïc Dachary almost 9 years ago
- Status changed from 12 to Resolved
Works now, no idea how it was fixed.