Bug #4358
Closed
kclient: ENOENT during kernel build on kclient
Description
...
2013-03-06T04:44:54.240 INFO:teuthology.task.workunit.client.0.out: HOSTCC scripts/conmakehash
2013-03-06T04:44:54.293 INFO:teuthology.task.workunit.client.0.out: CC kernel/bounds.s
2013-03-06T04:44:54.508 INFO:teuthology.task.workunit.client.0.out: HOSTLD scripts/mod/modpost
2013-03-06T04:44:54.522 INFO:teuthology.task.workunit.client.0.out: GEN include/generated/bounds.h
2013-03-06T04:44:54.547 INFO:teuthology.task.workunit.client.0.out: CC arch/x86/kernel/asm-offsets.s
2013-03-06T04:44:54.666 INFO:teuthology.task.workunit.client.0.err:In file included from include/asm-generic/bitops/le.h:5:0,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err: from /home/ubuntu/cephtest/mnt.0/client.0/tmp/t/linux-3.2.9/arch/x86/include/asm/bitops.h:459,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err: from include/linux/bitops.h:22,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err: from include/linux/kernel.h:17,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err: from /home/ubuntu/cephtest/mnt.0/client.0/tmp/t/linux-3.2.9/arch/x86/include/asm/percpu.h:44,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err: from /home/ubuntu/cephtest/mnt.0/client.0/tmp/t/linux-3.2.9/arch/x86/include/asm/current.h:5,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err: from /home/ubuntu/cephtest/mnt.0/client.0/tmp/t/linux-3.2.9/arch/x86/include/asm/processor.h:15,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err: from /home/ubuntu/cephtest/mnt.0/client.0/tmp/t/linux-3.2.9/arch/x86/include/asm/atomic.h:6,
2013-03-06T04:44:54.668 INFO:teuthology.task.workunit.client.0.err: from include/linux/atomic.h:4,
2013-03-06T04:44:54.668 INFO:teuthology.task.workunit.client.0.err: from include/linux/crypto.h:20,
2013-03-06T04:44:54.668 INFO:teuthology.task.workunit.client.0.err: from arch/x86/kernel/asm-offsets.c:8:
2013-03-06T04:44:54.668 INFO:teuthology.task.workunit.client.0.err:/home/ubuntu/cephtest/mnt.0/client.0/tmp/t/linux-3.2.9/arch/x86/include/asm/byteorder.h:4:43: fatal error: linux/byteorder/little_endian.h: No such file or directory
2013-03-06T04:44:54.668 INFO:teuthology.task.workunit.client.0.err:compilation terminated.
2013-03-06T04:44:54.805 INFO:teuthology.task.workunit.client.0.err:make[1]: *** [arch/x86/kernel/asm-offsets.s] Error 1
2013-03-06T04:44:54.806 INFO:teuthology.task.workunit.client.0.err:make: *** [prepare0] Error 2
2013-03-06T04:44:54.806 INFO:teuthology.task.workunit.client.0.err:make: *** Waiting for unfinished jobs....
job was
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-06_01:00:04-regression-master-testing-gcov/16994$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: d11db70976a80343930d0889d0042bb9956503bf
nuke-on-error: true
overrides:
  ceph:
    conf:
      osd:
        osd op thread timeout: 60
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: af6b6eddae073315c70bba4712511125360a0aad
  s3tests:
    branch: master
  workunit:
    sha1: af6b6eddae073315c70bba4712511125360a0aad
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- kclient: null
- workunit:
    clients:
      all:
      - kernel_untar_build.sh
Updated by Zheng Yan about 11 years ago
I got the following message for the kernel build error: "find: `./include/generated': No such file or directory".
It's strange that there is no ceph_d_prune in the log. Maybe dentry_lru_prune() should not check
"!list_empty(&dentry->d_lru)".
[26812.210412] ceph: mdsc put_session ffff880036a01000 503 -> 502
[26812.210418] ceph: destroy_inode ffff880091993c18 ino 10000018dc8.fffffffffffffffe
[26812.210423] ceph: adding 10000018dc8 release to mds0 msg ffff8800361b0a00 (40078 left)
[26812.210425] ceph: release msg ffff8800361b0a00 at 43/170 (1036)
[26812.210426] ceph: __ceph_remove_cap ffff880116977b80 from ffff880091993c18
[26812.210428] ceph: put_cap ffff880116977b80 41130 = 39288 used + 1025 resv + 817 avail
[26812.210429] ceph: put_snap_realm 1 ffff880135b04d00 39288 -> 39287
[26812.210430] ceph: __cap_delay_cancel ffff880091993c18
[26812.210432] ceph: __ceph_destroy_xattrs p= (null)
[26812.210433] ceph: d_release ffff88011ea6efc0
[26812.210434] ceph: dentry_lru_del ffff88010d621500 ffff88011ea6efc0 'generated'
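The hypothesis above can be illustrated with a toy Python model. It assumes that the prune path only invokes the filesystem's prune callback when the dentry is still on the LRU list, so a dentry that has already left the list is never reported to the filesystem. `Dentry`, `lru_prune_guarded`, `lru_prune_unguarded` and `run` are hypothetical names for illustration only; this is not the kernel code.

```python
class Dentry:
    """Hypothetical stand-in for a kernel dentry; not the real struct."""

    def __init__(self, name, on_lru, d_prune=None):
        self.name = name
        self.on_lru = on_lru      # models !list_empty(&dentry->d_lru)
        self.d_prune = d_prune    # models the filesystem's prune callback


def lru_prune_guarded(dentry):
    """Call the prune callback only when the dentry is on the LRU list."""
    if dentry.on_lru:
        if dentry.d_prune is not None:
            dentry.d_prune(dentry)
        dentry.on_lru = False


def lru_prune_unguarded(dentry):
    """Always call the prune callback, then take the dentry off the LRU."""
    if dentry.d_prune is not None:
        dentry.d_prune(dentry)
    dentry.on_lru = False


def run(prune_fn, on_lru):
    """Prune one dentry and return the names the callback was told about."""
    pruned = []
    d = Dentry("generated", on_lru, d_prune=lambda x: pruned.append(x.name))
    prune_fn(d)
    return pruned
```

With the guarded variant, `run(lru_prune_guarded, on_lru=False)` never reaches the callback, which would match a log with no ceph_d_prune line; the unguarded variant reports the dentry either way.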
Updated by Ian Colle about 11 years ago
Let's see if this happens in testing branch after Yan's patches are all applied.
Updated by Greg Farnum about 11 years ago
- Status changed from 12 to 7
An initial patch from Yan is in our testing branch and should fix this issue. (Or at least fixes one cause.) It may get edited a bit before final, though.
Updated by Alex Elder about 11 years ago
I hit this today while testing. Sorry, I don't remember
which test but Sage says he knows what happened.
http://pastebin.com/CsM9QYn2
Updated by Zheng Yan about 11 years ago
Any idea how to fix the locking issue? Use atomic bit operations to modify i_ceph_flags?
Updated by Sage Weil about 11 years ago
That might work, as long as we don't need to update the flags and i_release_count atomically... that'd have to become an atomic_t, and we'd need to be careful about ordering. It is probably simpler to just introduce a new spinlock for the flags + that field.
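A minimal userspace sketch of the "one lock for the flags plus that field" option, written in Python rather than kernel C. It assumes a reader records the release count, does some work, and may set a flag only if the count has not changed in the meantime. `InodeState`, `COMPLETE`, `invalidate`, `snapshot_count` and `try_mark_complete` are hypothetical names, and `threading.Lock` stands in for a spinlock.

```python
import threading

COMPLETE = 0x1  # hypothetical flag bit, for illustration only


class InodeState:
    """Flags and release count guarded by one dedicated lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.flags = 0
        self.release_count = 0

    def invalidate(self):
        # Bump the count and clear the flag as one atomic step.
        with self._lock:
            self.release_count += 1
            self.flags &= ~COMPLETE

    def snapshot_count(self):
        # Record the count before starting work that may later set the flag.
        with self._lock:
            return self.release_count

    def try_mark_complete(self, seen_count):
        # Set the flag only if nothing was invalidated since the snapshot.
        with self._lock:
            if self.release_count != seen_count:
                return False
            self.flags |= COMPLETE
            return True
```

Because the comparison and the flag update happen under the same lock, an invalidation cannot slip in between them. Doing the same with an atomic counter and atomic bit operations would need explicit care about ordering, which is the concern raised above.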
Updated by Greg Farnum about 11 years ago
- Assignee changed from Greg Farnum to Sage Weil
Updated by Sage Weil about 11 years ago
20 iterations on testing branch. I ran a bunch on master to make sure I could trigger the old bug, but then couldn't after 100 iterations... so I can't draw any great conclusions. :( I'll hammer more on testing just to make sure it's holding up to that at least. :)
Updated by Sage Weil about 11 years ago
passed another 100 iterations (modulo a machine lockup on the server side)