Bug #9679
Ceph hadoop terasort job failure (Status: Closed)
Added by Huamin Chen over 9 years ago. Updated about 5 years ago.
Description
Hadoop version: 2.4.1
Ceph version:
ceph --version
ceph version 0.85-986-g031ef05 (031ef0551ebc98d824075558e884faf23ad92f2d)
How to reproduce:
Run 10GB terasort on plana85
su -l hadoop
./hadoop/bin/hadoop jar ./hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar terasort -Dmapreduce.job.maps=32 /in-dir /out-dir
Error:
14/10/07 08:02:36 INFO mapreduce.Job: Task Id : attempt_1412690505174_0006_m_000034_0, Status : FAILED
Error: java.io.EOFException: read past eof
at org.apache.hadoop.examples.terasort.TeraInputFormat$TeraRecordReader.nextKeyValue(TeraInputFormat.java:261)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Files
client.admin.11179.log (722 KB) | bad client | Noah Watkins, 10/10/2014 09:13 AM
client.admin.339.log (746 KB) | good client | Noah Watkins, 10/10/2014 09:13 AM
Updated by Noah Watkins over 9 years ago
Thanks for adding this. What command did you use to generate the input?
Updated by Huamin Chen over 9 years ago
Teragen command:
./hadoop/bin/hadoop jar ./hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar teragen -Dmapreduce.job.maps=32 100000000 /in-dir
Updated by Noah Watkins over 9 years ago
My bet at this point is on the generation of the input data set. Teragen creates a file with X 100-byte entries. When multiple mappers are used, they divide up the work. Judging by their sizes, some of the input files clearly do not contain a multiple of 100 bytes.
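Since every teragen record is exactly 100 bytes (a 10-byte key plus a 90-byte value), a simple divisibility check flags the truncated part files. A small sketch, using a few of the sizes reported later in this ticket:

```python
def is_complete(size, record_bytes=100):
    """A finished teragen part file must hold a whole number of
    fixed-size 100-byte records (10-byte key + 90-byte value)."""
    return size % record_bytes == 0

# a few sizes taken from the 10-mapper listing in this ticket
sizes = {
    "part-m-00000": 995098624,
    "part-m-00001": 1000000000,
    "part-m-00006": 958398464,
}
for name, size in sizes.items():
    status = "ok" if is_complete(size) else "not a multiple of 100: truncated"
    print(name, size, status)
```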
Here is what 1, 10, and 16 mappers created for the same requested input data set size.
1 mapper
hadoop@plana85:~$ hadoop/bin/hadoop fs -ls /in-dir-1m
14/10/08 22:43:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   2 hadoop           0 2014-10-08 21:45 /in-dir-1m/_SUCCESS
-rw-r--r--   2 hadoop 10000000000 2014-10-08 21:45 /in-dir-1m/part-m-00000
10 mappers
hadoop@plana85:~$ hadoop/bin/hadoop fs -ls /in-dir-10m
14/10/08 22:41:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 11 items
-rw-r--r--   2 hadoop          0 2014-10-08 22:40 /in-dir-10m/_SUCCESS
-rw-r--r--   2 hadoop  995098624 2014-10-08 22:39 /in-dir-10m/part-m-00000
-rw-r--r--   2 hadoop 1000000000 2014-10-08 22:39 /in-dir-10m/part-m-00001
-rw-r--r--   2 hadoop 1000000000 2014-10-08 22:39 /in-dir-10m/part-m-00002
-rw-r--r--   2 hadoop 1000000000 2014-10-08 22:39 /in-dir-10m/part-m-00003
-rw-r--r--   2 hadoop 1000000000 2014-10-08 22:39 /in-dir-10m/part-m-00004
-rw-r--r--   2 hadoop 1000000000 2014-10-08 22:39 /in-dir-10m/part-m-00005
-rw-r--r--   2 hadoop  958398464 2014-10-08 22:39 /in-dir-10m/part-m-00006
-rw-r--r--   2 hadoop 1000000000 2014-10-08 22:39 /in-dir-10m/part-m-00007
-rw-r--r--   2 hadoop  967835648 2014-10-08 22:39 /in-dir-10m/part-m-00008
-rw-r--r--   2 hadoop 1000000000 2014-10-08 22:39 /in-dir-10m/part-m-00009
16 mappers
hadoop@plana85:~$ hadoop/bin/hadoop fs -ls /in-dir-16m
14/10/08 22:32:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 17 items
-rw-r--r--   2 hadoop         0 2014-10-08 22:08 /in-dir-16m/_SUCCESS
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:07 /in-dir-16m/part-m-00000
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:06 /in-dir-16m/part-m-00001
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:06 /in-dir-16m/part-m-00002
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:07 /in-dir-16m/part-m-00003
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:07 /in-dir-16m/part-m-00004
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:07 /in-dir-16m/part-m-00005
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:06 /in-dir-16m/part-m-00006
-rw-r--r--   2 hadoop 612376576 2014-10-08 22:07 /in-dir-16m/part-m-00007
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:07 /in-dir-16m/part-m-00008
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:07 /in-dir-16m/part-m-00009
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:07 /in-dir-16m/part-m-00010
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:07 /in-dir-16m/part-m-00011
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:07 /in-dir-16m/part-m-00012
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:07 /in-dir-16m/part-m-00013
-rw-r--r--   2 hadoop 625000000 2014-10-08 22:07 /in-dir-16m/part-m-00014
-rw-r--r--   2 hadoop 612372480 2014-10-08 22:07 /in-dir-16m/part-m-00015
Updated by Noah Watkins over 9 years ago
Missing one of these?
int ceph_sync_fs(struct ceph_mount_info *cmount);
int ceph_fsync(struct ceph_mount_info *cmount, int fd, int syncdataonly);
Updated by Huamin Chen over 9 years ago
Ran the same tests on HDFS 2.4.1, though on a different setup. Terasort finished without any problem.
Cmd:
./hadoop/bin/hadoop jar ./hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar teragen -Dmapreduce.job.maps=32 100000000 /in-dir
teragen files:
./hadoop/bin/hadoop fs -ls /in-dir
14/10/09 10:01:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 33 items
-rw-r--r--   1 hadoop supergroup         0 2014-10-09 10:00 /in-dir/_SUCCESS
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00000
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00001
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00002
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00003
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00004
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00005
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00006
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00007
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00008
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00009
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00010
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00011
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00012
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00013
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00014
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00015
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00016
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00017
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00018
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00019
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00020
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00021
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00022
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00023
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00024
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00025
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00026
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00027
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00028
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00029
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00030
-rw-r--r--   1 hadoop supergroup 312500000 2014-10-09 10:00 /in-dir/part-m-00031
Updated by Huamin Chen over 9 years ago
For comparison, teragen files on CephFS
./hadoop/bin/hadoop fs -ls /in-dir-3
14/10/09 08:05:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 33 items
-rw-r--r--   2 hadoop         0 2014-10-09 08:04 /in-dir-3/_SUCCESS
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:02 /in-dir-3/part-m-00000
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:02 /in-dir-3/part-m-00001
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:02 /in-dir-3/part-m-00002
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:02 /in-dir-3/part-m-00003
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:02 /in-dir-3/part-m-00004
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:02 /in-dir-3/part-m-00005
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:02 /in-dir-3/part-m-00006
-rw-r--r--   2 hadoop 306184192 2014-10-09 08:02 /in-dir-3/part-m-00007
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:03 /in-dir-3/part-m-00008
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:02 /in-dir-3/part-m-00009
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:03 /in-dir-3/part-m-00010
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:02 /in-dir-3/part-m-00011
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:03 /in-dir-3/part-m-00012
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:03 /in-dir-3/part-m-00013
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:03 /in-dir-3/part-m-00014
-rw-r--r--   2 hadoop 287309824 2014-10-09 08:03 /in-dir-3/part-m-00015
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:03 /in-dir-3/part-m-00016
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:03 /in-dir-3/part-m-00017
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:03 /in-dir-3/part-m-00018
-rw-r--r--   2 hadoop 293601280 2014-10-09 08:03 /in-dir-3/part-m-00019
-rw-r--r--   2 hadoop 287309824 2014-10-09 08:03 /in-dir-3/part-m-00020
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:03 /in-dir-3/part-m-00021
-rw-r--r--   2 hadoop 268435456 2014-10-09 08:03 /in-dir-3/part-m-00022
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:03 /in-dir-3/part-m-00023
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:03 /in-dir-3/part-m-00024
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:03 /in-dir-3/part-m-00025
-rw-r--r--   2 hadoop 283115520 2014-10-09 08:03 /in-dir-3/part-m-00026
-rw-r--r--   2 hadoop 287309824 2014-10-09 08:03 /in-dir-3/part-m-00027
-rw-r--r--   2 hadoop 306184192 2014-10-09 08:03 /in-dir-3/part-m-00028
-rw-r--r--   2 hadoop 312500000 2014-10-09 08:03 /in-dir-3/part-m-00029
-rw-r--r--   2 hadoop 306184192 2014-10-09 08:03 /in-dir-3/part-m-00030
-rw-r--r--   2 hadoop 306184192 2014-10-09 08:03 /in-dir-3/part-m-00031
Updated by Noah Watkins over 9 years ago
Thanks Huamin. Yeah, it looks like some writes are being lost, probably due to an unclean shutdown. I'll get some traces and look into this.
Updated by Noah Watkins over 9 years ago
empty fs:
nwatkins@martini ~$ cat log
ubuntu@plana85:~$ sudo rados -p cephfs_data ls
ubuntu@plana85:~$
generate data with teragen:
hadoop@plana85:~$ hadoop/bin/hadoop fs -ls /in-dir-10m
14/10/09 11:47:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 11 items
-rw-r--r--   2 hadoop          0 2014-10-09 11:39 /in-dir-10m/_SUCCESS
-rw-r--r--   2 hadoop  958398464 2014-10-09 11:38 /in-dir-10m/part-m-00000
-rw-r--r--   2 hadoop 1000000000 2014-10-09 11:38 /in-dir-10m/part-m-00001
-rw-r--r--   2 hadoop  939524096 2014-10-09 11:38 /in-dir-10m/part-m-00002
-rw-r--r--   2 hadoop 1000000000 2014-10-09 11:38 /in-dir-10m/part-m-00003
-rw-r--r--   2 hadoop  996147200 2014-10-09 11:38 /in-dir-10m/part-m-00004
-rw-r--r--   2 hadoop 1000000000 2014-10-09 11:38 /in-dir-10m/part-m-00005
-rw-r--r--   2 hadoop 1000000000 2014-10-09 11:38 /in-dir-10m/part-m-00006
-rw-r--r--   2 hadoop 1000000000 2014-10-09 11:38 /in-dir-10m/part-m-00007
-rw-r--r--   2 hadoop 1000000000 2014-10-09 11:38 /in-dir-10m/part-m-00008
-rw-r--r--   2 hadoop 1000000000 2014-10-09 11:38 /in-dir-10m/part-m-00009
Summing up the sizes of the objects in RADOS, the totals match what's reported by CephFS. The extra 3 inodes are probably directories or something.
ubuntu@plana85:~$ cat out | python sum.py
13 inodes
10000001c8d 958398464
10000002c65 1000000000
100000004a7 334
100000008b6 1000000000
100000014a1 1000000000
100000004a9 92039
100000004a8 83772
10000002477 1000000000
100000010aa 1000000000
10000001898 1000000000
10000002082 1000000000
1000000286e 996147200
10000000cb3 939524096
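The `sum.py` helper isn't attached to the ticket; a hypothetical reconstruction, assuming its input is `name size` pairs for the RADOS data objects (which CephFS names `<inode>.<stripe index>`), might look like:

```python
from collections import defaultdict

def sum_by_inode(lines):
    """Group RADOS data objects (named <inode>.<stripe-index>) by their
    inode prefix and sum the per-object sizes, giving a per-file total
    that can be compared against the CephFS directory listing."""
    totals = defaultdict(int)
    for line in lines:
        name, size = line.split()
        inode = name.split(".")[0]
        totals[inode] += int(size)
    return dict(totals)

# two 4 MiB stripe objects belonging to the same file
sample = [
    "10000001c8d.00000000 4194304",
    "10000001c8d.00000001 4194304",
]
print(sum_by_inode(sample))
```

The exact input format fed to the real script is unknown, so treat this as a sketch of the grouping idea only.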
Updated by Noah Watkins over 9 years ago
- File client.admin.11179.log client.admin.11179.log added
- File client.admin.339.log client.admin.339.log added
Here is the directory listing. All of the files should be the same size.
ubuntu@plana55:~$ ls -l ceph-mount/in-dir-1g-noah/
total 880475
-rw-r--r-- 1 hadoop hadoop 100000000 Oct 10 07:54 part-m-00000
-rw-r--r-- 1 hadoop hadoop  76546048 Oct 10 07:54 part-m-00001
-rw-r--r-- 1 hadoop hadoop  67108864 Oct 10 07:54 part-m-00002
-rw-r--r-- 1 hadoop hadoop  95420416 Oct 10 07:54 part-m-00003
-rw-r--r-- 1 hadoop hadoop 100000000 Oct 10 07:54 part-m-00004
-rw-r--r-- 1 hadoop hadoop 100000000 Oct 10 07:54 part-m-00005
-rw-r--r-- 1 hadoop hadoop  76546048 Oct 10 07:54 part-m-00006
-rw-r--r-- 1 hadoop hadoop  85983232 Oct 10 07:54 part-m-00007
-rw-r--r-- 1 hadoop hadoop 100000000 Oct 10 07:54 part-m-00008
-rw-r--r-- 1 hadoop hadoop 100000000 Oct 10 07:54 part-m-00009
-rw-r--r-- 1 hadoop hadoop         0 Oct 10 07:55 _SUCCESS
I have traces for each client that wrote these. Let's look at part-m-00002.
nwatkins@martini 96-writes$ grep -l "part-m-00002" *
client.admin.11179.log
There are 96 write calls:
nwatkins@martini 96-writes$ cat client.admin.11179.log | grep "write(" | wc -l
96
And here is what the calls look like:
nwatkins@martini 96-writes$ cat client.admin.11179.log | grep "write("
2014-10-10 07:54:02.904735 7fd0175d9700 3 client.5521 write(12, "...", 1048576, -1) = 1048576
2014-10-10 07:54:02.968177 7fd0175d9700 3 client.5521 write(12, "...", 1048576, -1) = 1048576
... clip ...
2014-10-10 07:54:04.889349 7fd0175d9700 3 client.5521 write(12, "...", 1048576, -1) = 1048576
2014-10-10 07:54:04.898805 7fd0175d9700 3 client.5521 write(12, "...", 385280, -1) = 385280
So, 96 writes. All but the last are 1048576 bytes, and 1048576 * 95 + 385280 = 100000000, so all the writes at least reach the client. The file is also successfully closed, right after the last write in the previous trace:
3456 2014-10-10 07:54:04.898805 7fd0175d9700 3 client.5521 write(12, "...", 385280, -1) = 385280
3457 2014-10-10 07:54:04.898808 7fd0175d9700 10 jni: write: exit ret 385280
3458 2014-10-10 07:54:04.898829 7fd0175d9700 10 jni: close: fd 12
3459 2014-10-10 07:54:04.898832 7fd0175d9700 3 client.5521 close enter(12)
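A quick sanity check of that size arithmetic:

```python
# 95 full 1 MiB write() calls plus the final short write from the trace.
full = 95 * 1048576   # 99614720 bytes
tail = 385280         # the last, short write
total = full + tail
print(total)          # 100000000, the expected part-file size
print(total % 100)    # 0 -> a whole number of 100-byte teragen records
```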
What's interesting is that there appears to be a race between unmount and the Hadoop client being killed. After the close, the trace shows the client entering ceph_unmount but never exiting. A trace from a client that wrote a complete file also doesn't exit unmount, but at the very end of that trace I do see something about caps, and that is not present in the trace from the client that generated a bad file.
I included both traces. Bad file client: client.admin.11179.log; good file client: client.admin.339.log.
The only other piece of information I know about regarding these files is that after the writing client is done, another process in the system renames the file. But that should be serialized with the writing client's close of the file.
nwatkins@martini teragen-10m-1g$ grep "in-dir-1g-noah\/part-" *.log | grep rename client.admin.10009.log:2014-10-10 07:55:08.139031 7fd17903d700 10 jni: rename: from /in-dir-1g-noah/_temporary/1/task_1412832704798_0006_m_000002/part-m-00002 to /in-dir-1g-noah/part-m-00002 client.admin.10009.log:2014-10-10 07:55:08.197841 7fd17903d700 10 jni: rename: from /in-dir-1g-noah/_temporary/1/task_1412832704798_0006_m_000009/part-m-00009 to /in-dir-1g-noah/part-m-00009
Updated by Greg Farnum over 9 years ago
Looking at the bad client (11179), the first thing I notice is that the messaging is way backed up. What's the network pipe here, and is it being shared?
2014-10-10 07:54:04.102583 7fcff7439700 20 -- 10.214.132.23:0/1819159467 submit_message osd_op(client.5521.0:6 10000000cd0.00000000 [write 37748736~9437184] 1.4b78efdc snapc 1=[] ondisk+write+localize_reads+known_if_redirected e33) v4 remote, 10.214.132.27:6800/7357, have pipe.
...
2014-10-10 07:54:04.937606 7fcff3f2e700 20 -- 10.214.132.23:0/1819159467 >> 10.214.132.27:6800/7357 pipe(0x7fd010ee4ee0 sd=194 :50309 s=2 pgs=1009 cs=1 l=1 c=0x7fd010ee5170).writer encoding 6 features 35184372088831 0x7fd0001c08b0 osd_op(client.5521.0:6 10000000cd0.00000000 [write 37748736~9437184] 1.4b78efdc snapc 1=[] ondisk+write+localize_reads+known_if_redirected e33) v4
2014-10-10 07:54:04.939968 7fcff3f2e700 20 -- 10.214.132.23:0/1819159467 >> 10.214.132.27:6800/7357 pipe(0x7fd010ee4ee0 sd=194 :50309 s=2 pgs=1009 cs=1 l=1 c=0x7fd010ee5170).writer signed seq # 6): sig = 6511400259401246671
2014-10-10 07:54:04.940000 7fcff3f2e700 20 -- 10.214.132.23:0/1819159467 >> 10.214.132.27:6800/7357 pipe(0x7fd010ee4ee0 sd=194 :50309 s=2 pgs=1009 cs=1 l=1 c=0x7fd010ee5170).writer sending 6 0x7fd0001c08b0
So here we have the sixth message (the fifth write) being encoded onto the wire some .84 seconds after it was queued by the upper layers. That's fairly unusual. If I do some sampling on the Pipe writer thread:
2014-10-10 07:54:03.199084 7fcff3f2e700 20 -- 10.214.132.23:0/1819159467 >> 10.214.132.27:6800/7357 pipe(0x7fd010ee4ee0 sd=194 :50309 s=2 pgs=1009 cs=1 l=1 c=0x7fd010ee5170).writer encoding 2 features 35184372088831 0x7fd0001c08b0 osd_op(client.5521.0:2 10000000cd0.00000000 [write 0~9437184] 1.4b78efdc snapc 1=[] ondisk+write+localize_reads+known_if_redirected e33) v4
2014-10-10 07:54:03.201066 7fcff3f2e700 20 -- 10.214.132.23:0/1819159467 >> 10.214.132.27:6800/7357 pipe(0x7fd010ee4ee0 sd=194 :50309 s=2 pgs=1009 cs=1 l=1 c=0x7fd010ee5170).writer signed seq # 2): sig = 7993448059363581721
2014-10-10 07:54:03.201095 7fcff3f2e700 20 -- 10.214.132.23:0/1819159467 >> 10.214.132.27:6800/7357 pipe(0x7fd010ee4ee0 sd=194 :50309 s=2 pgs=1009 cs=1 l=1 c=0x7fd010ee5170).writer sending 2 0x7fd0001c08b0
2014-10-10 07:54:03.869905 7fcff3f2e700 10 -- 10.214.132.23:0/1819159467 >> 10.214.132.27:6800/7357 pipe(0x7fd010ee4ee0 sd=194 :50309 s=2 pgs=1009 cs=1 l=1 c=0x7fd010ee5170).writer: state = open policy.server=0
2014-10-10 07:54:03.869950 7fcff3f2e700 20 -- 10.214.132.23:0/1819159467 >> 10.214.132.27:6800/7357 pipe(0x7fd010ee4ee0 sd=194 :50309 s=2 pgs=1009 cs=1 l=1 c=0x7fd010ee5170).writer encoding 3 features 35184372088831 0x7fd0001c10b0 osd_op(client.5521.0:3 10000000cd0.00000000 [write 9437184~9437184] 1.4b78efdc snapc 1=[] ondisk+write+localize_reads+known_if_redirected e33) v4
2014-10-10 07:54:03.872856 7fcff3f2e700 20 -- 10.214.132.23:0/1819159467 >> 10.214.132.27:6800/7357 pipe(0x7fd010ee4ee0 sd=194 :50309 s=2 pgs=1009 cs=1 l=1 c=0x7fd010ee5170).writer signed seq # 3): sig = 17390466052185533091
2014-10-10 07:54:03.872902 7fcff3f2e700 20 -- 10.214.132.23:0/1819159467 >> 10.214.132.27:6800/7357 pipe(0x7fd010ee4ee0 sd=194 :50309 s=2 pgs=1009 cs=1 l=1 c=0x7fd010ee5170).writer sending 3 0x7fd0001c10b0
2014-10-10 07:54:04.247741 7fcff3f2e700 10 -- 10.214.132.23:0/1819159467 >> 10.214.132.27:6800/7357 pipe(0x7fd010ee4ee0 sd=194 :50309 s=2 pgs=1009 cs=1 l=1 c=0x7fd010ee5170).writer: state = open policy.server=0
It looks like simply writing the second message out to (not necessarily even over) the socket took some 0.66 seconds; the third took ~0.37. Those are 9 MB of data each, which on a 1Gb connection is ~0.075 seconds per message.
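A back-of-the-envelope version of that estimate, under the same 1 Gb/s link assumption:

```python
# Each OSD write op in the trace carries a 9 MiB payload
# ([write ...~9437184]).
msg_bytes = 9437184
link_bps = 1_000_000_000          # assumed 1 Gb/s link
ideal_s = msg_bytes * 8 / link_bps
print(round(ideal_s, 3))          # 0.075 s per message at line rate
# versus ~0.66 s and ~0.37 s observed for messages 2 and 3 above
```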
Then the end of the log...doesn't look like an end. I can see the unmount starting and waiting on an MDS request, but there should also be lots of flushing of data and things.
Is it possible that the Java or Hadoop runtime is terminating the process when it blocks for a little while?
The good client (339) is not suffering from the strange messenger blocked-up-ness.
Updated by Noah Watkins over 9 years ago
I do believe that Hadoop kills the clients once the runtime believes everything has been flushed. I wrongly assumed that close flushed everything out. Adding an fsync before close seems to make things work as expected.
hadoop@plana85:~$ hadoop/bin/hadoop fs -ls /in-dir-1g-noah
Found 11 items
-rw-r--r--   2 hadoop         0 2014-10-10 16:06 /in-dir-1g-noah/_SUCCESS
-rw-r--r--   2 hadoop 100000000 2014-10-10 16:05 /in-dir-1g-noah/part-m-00000
-rw-r--r--   2 hadoop 100000000 2014-10-10 16:05 /in-dir-1g-noah/part-m-00001
-rw-r--r--   2 hadoop 100000000 2014-10-10 16:05 /in-dir-1g-noah/part-m-00002
-rw-r--r--   2 hadoop 100000000 2014-10-10 16:05 /in-dir-1g-noah/part-m-00003
-rw-r--r--   2 hadoop 100000000 2014-10-10 16:05 /in-dir-1g-noah/part-m-00004
-rw-r--r--   2 hadoop 100000000 2014-10-10 16:05 /in-dir-1g-noah/part-m-00005
-rw-r--r--   2 hadoop 100000000 2014-10-10 16:05 /in-dir-1g-noah/part-m-00006
-rw-r--r--   2 hadoop 100000000 2014-10-10 16:05 /in-dir-1g-noah/part-m-00007
-rw-r--r--   2 hadoop 100000000 2014-10-10 16:05 /in-dir-1g-noah/part-m-00008
-rw-r--r--   2 hadoop 100000000 2014-10-10 16:05 /in-dir-1g-noah/part-m-00009
hadoop@plana85:~$ hadoop/bin/hadoop fs -ls /in-dir-10g-noah
14/10/10 16:10:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 32 items
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00000
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:07 /in-dir-10g-noah/part-m-00001
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00002
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00003
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00004
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:07 /in-dir-10g-noah/part-m-00005
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00006
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00007
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00008
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:07 /in-dir-10g-noah/part-m-00009
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00010
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00011
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00012
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00013
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00014
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00015
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00016
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00017
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00018
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00019
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00020
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00021
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00022
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00023
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00024
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00025
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00026
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00027
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00028
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00029
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00030
-rw-r--r--   2 hadoop 312500000 2014-10-10 16:08 /in-dir-10g-noah/part-m-00031
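The ordering the fix enforces can be sketched with plain POSIX file APIs. This is an illustration only: the actual fix calls ceph_fsync() in the cephfs-hadoop close path, and the `write_durably` helper name here is hypothetical.

```python
import os
import tempfile

def write_durably(path, data):
    """Write data and force it to stable storage before close. With
    libcephfs, data could still be buffered client-side after close(),
    so a process killed immediately afterward lost its trailing writes;
    an explicit fsync before close avoids that."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # flush buffered writes before closing
    finally:
        os.close(fd)

path = os.path.join(tempfile.mkdtemp(), "part-m-00000")
write_durably(path, b"x" * 100)
print(os.path.getsize(path))  # 100
```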
Updated by Noah Watkins over 9 years ago
- Status changed from New to Closed
Fixed in cephfs-hadoop repo.
Updated by Dmitry Buzz about 8 years ago
I'm testing ceph version 10.0.3-1325-g98fba62 on Hadoop 3.0.0 and I see the same error in the Hadoop terasort.
I'm not sure whether this is a Hadoop or Ceph issue, but I can upload logs if the Ceph developers want to pursue this further.
Updated by Greg Farnum about 8 years ago
Dmitry, can you tell us more about your environment? Which Hadoop, and which bindings did you use to connect Ceph and Hadoop?
(Also, ping Noah.)
Updated by Dmitry Buzz about 8 years ago
Greg,
My setup was simple: the ceph JNI bindings were built from the ceph repo.
The Ceph hadoop2 integration was built from https://github.com/GregBowyer/cephfs-hadoop.git using Maven. All .so and jar files were exposed accordingly, and a simple put/get of a file worked OK, but the terasort benchmark showed this error.
For what it's worth, I think Ceph shouldn't chase Hadoop HDFS compatibility and should instead design a proper interface to Spark RDDs and DataFrames that exposes data locality. That is a better approach than impersonating HDFS, since it decouples Ceph from the HDFS dependency. I'm not sure if this is possible, though. I've also found in terasort tests that Ceph (x2 replication) is 25%-40% slower than plain HDFS (x2 replication), which came as a bit of a surprise to me. On the other hand, if the Ceph community wants to maintain proper integration with Hadoop, it should really take over cephfs-hadoop and keep it in sync with Hadoop versions. It takes an unimaginable amount of effort to integrate the two, since the Ceph site documentation and user list archives are outdated on this subject, and googling every possible hiccup along the way is not for the fainthearted.
The other route is to develop a data locality standard for parallel data stores like Ceph and expose it to host processes in a universal manner, independent of the underlying data format (CSV, object storage, parallel HDF5, etc.). This is a daunting task, but without it Ceph is not an attractive option as underlying storage for parallel computation frameworks in the ML and data analytics domains.
Updated by Dmitry Buzz about 8 years ago
I forgot to mention, I tested both Hadoop 2.7.1 and 2.7.2 with the same outcome.
64-bit Arch Linux circa Jan 2015, 2-10 node cluster.
Updated by Patrick Donnelly about 5 years ago
- Category deleted (48)
- Labels (FS) Java/Hadoop added