Bug #13879
ovh: hadoop wordcount is timing out on OpenStack nodes
Status:
New
Priority:
Low
Assignee:
-
Category:
Testing
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
We don't really have any logging, and I think this runs pretty fast on sepia. We've also got some other issues with MPI not being able to communicate, which makes me wonder if we're trying to use ports that are blocked, although the other job in the suite is passing.
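One way to check the blocked-ports theory (a suggestion, not part of the original report; the host and port below are taken from the ResourceManager line in the log, so substitute your own targets) is to probe TCP reachability from the client node using bash's built-in /dev/tcp:

```shell
# Probe TCP reachability of a host:port (values below are taken from the
# ResourceManager line in the log; substitute your own targets).
# Uses bash's /dev/tcp so no extra tools are needed; a 5s timeout
# avoids hanging on filtered ports.
check_port() {
  local host=$1 port=$2
  if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "$host:$port open"
  else
    echo "$host:$port blocked or unreachable"
  fi
}

check_port 158.69.71.28 8050   # YARN ResourceManager port from the log
```

If the ResourceManager port reports blocked from the OpenStack node but open from a sepia node, that would point at security-group/firewall rules rather than the workunit itself.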
2015-11-14T18:44:58.402 INFO:tasks.workunit.client.0.target071029.stderr:15/11/14 18:44:58 INFO client.RMProxy: Connecting to ResourceManager at /158.69.71.28:8050
2015-11-14T18:44:59.092 INFO:tasks.workunit.client.0.target071029.stderr:15/11/14 18:44:58 INFO ceph.CephFileSystem: selectDataPool path=/tmp/hadoop-yarn/staging/ubuntu/.staging/job_1447526634752_0001/job.jar pool:repl=data:2 wanted=3
2015-11-14T18:44:59.203 INFO:tasks.workunit.client.0.target071029.stderr:15/11/14 18:44:59 INFO input.FileInputFormat: Total input paths to process : 3
2015-11-14T18:44:59.225 INFO:tasks.workunit.client.0.target071029.stderr:15/11/14 18:44:59 INFO ceph.CephFileSystem: selectDataPool path=/tmp/hadoop-yarn/staging/ubuntu/.staging/job_1447526634752_0001/job.split pool:repl=data:2 wanted=3
2015-11-14T18:44:59.249 INFO:tasks.workunit.client.0.target071029.stderr:15/11/14 18:44:59 INFO ceph.CephFileSystem: selectDataPool path=/tmp/hadoop-yarn/staging/ubuntu/.staging/job_1447526634752_0001/job.splitmetainfo pool:repl=data:2 wanted=3
2015-11-14T18:44:59.266 INFO:tasks.workunit.client.0.target071029.stderr:15/11/14 18:44:59 INFO mapreduce.JobSubmitter: number of splits:3
2015-11-14T18:44:59.274 INFO:tasks.workunit.client.0.target071029.stderr:15/11/14 18:44:59 INFO ceph.CephFileSystem: selectDataPool path=/tmp/hadoop-yarn/staging/ubuntu/.staging/job_1447526634752_0001/job.xml pool:repl=data:2 wanted=3
2015-11-14T18:44:59.525 INFO:tasks.workunit.client.0.target071029.stderr:15/11/14 18:44:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1447526634752_0001
2015-11-14T18:45:00.863 INFO:tasks.workunit.client.0.target071029.stderr:15/11/14 18:45:00 INFO impl.YarnClientImpl: Submitted application application_1447526634752_0001
2015-11-14T18:45:00.907 INFO:tasks.workunit.client.0.target071029.stderr:15/11/14 18:45:00 INFO mapreduce.Job: The url to track the job: http://target071028.ovh.sepia.ceph.com:8088/proxy/application_1447526634752_0001/
2015-11-14T18:45:00.908 INFO:tasks.workunit.client.0.target071029.stderr:15/11/14 18:45:00 INFO mapreduce.Job: Running job: job_1447526634752_0001
2015-11-14T21:44:47.105 INFO:tasks.workunit:Stopping ['hadoop/wordcount.sh'] on client.0...
2015-11-14T21:44:47.106 INFO:teuthology.orchestra.run.target071029:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/workunit.client.0'
2015-11-14T21:44:47.410 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/ceph-qa-suite_hammer/tasks/workunit.py", line 361, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 156, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 378, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed (workunit test hadoop/wordcount.sh) on target071029 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=a79acd41187e6b049432bdc314f192e3fbb560a3 TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" PATH=$PATH:/usr/sbin adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.0/hadoop/wordcount.sh'
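The "status 124" in the traceback above is the exit code GNU timeout uses when it has to kill a command that exceeds its limit (the workunit is wrapped in `timeout 3h`), i.e. wordcount.sh was killed after three hours rather than failing on its own. A minimal sketch of that failure mode:

```shell
# GNU timeout exits with 124 when it has to kill the command; here
# 'sleep 5' stands in for the long-running hadoop workunit.
timeout 1 sleep 5
echo "exit status: $?"   # prints "exit status: 124"
```

So the job was still stuck at "Running job: job_1447526634752_0001" three hours after submission, which is consistent with the cluster accepting the job but the mappers never making progress.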
Updated by John Spray over 7 years ago
- Subject changed from qa: hadoop wordcount is timing out on OpenStack nodes to ovh: hadoop wordcount is timing out on OpenStack nodes
- Priority changed from Normal to Low