Bug #4358

closed

kclient: ENOENT during kernel build on kclient

Added by Sage Weil about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

...
2013-03-06T04:44:54.240 INFO:teuthology.task.workunit.client.0.out:  HOSTCC  scripts/conmakehash
2013-03-06T04:44:54.293 INFO:teuthology.task.workunit.client.0.out:  CC      kernel/bounds.s
2013-03-06T04:44:54.508 INFO:teuthology.task.workunit.client.0.out:  HOSTLD  scripts/mod/modpost
2013-03-06T04:44:54.522 INFO:teuthology.task.workunit.client.0.out:  GEN     include/generated/bounds.h
2013-03-06T04:44:54.547 INFO:teuthology.task.workunit.client.0.out:  CC      arch/x86/kernel/asm-offsets.s
2013-03-06T04:44:54.666 INFO:teuthology.task.workunit.client.0.err:In file included from include/asm-generic/bitops/le.h:5:0,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err:                 from /home/ubuntu/cephtest/mnt.0/client.0/tmp/t/linux-3.2.9/arch/x86/include/asm/bitops.h:459,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err:                 from include/linux/bitops.h:22,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err:                 from include/linux/kernel.h:17,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err:                 from /home/ubuntu/cephtest/mnt.0/client.0/tmp/t/linux-3.2.9/arch/x86/include/asm/percpu.h:44,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err:                 from /home/ubuntu/cephtest/mnt.0/client.0/tmp/t/linux-3.2.9/arch/x86/include/asm/current.h:5,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err:                 from /home/ubuntu/cephtest/mnt.0/client.0/tmp/t/linux-3.2.9/arch/x86/include/asm/processor.h:15,
2013-03-06T04:44:54.667 INFO:teuthology.task.workunit.client.0.err:                 from /home/ubuntu/cephtest/mnt.0/client.0/tmp/t/linux-3.2.9/arch/x86/include/asm/atomic.h:6,
2013-03-06T04:44:54.668 INFO:teuthology.task.workunit.client.0.err:                 from include/linux/atomic.h:4,
2013-03-06T04:44:54.668 INFO:teuthology.task.workunit.client.0.err:                 from include/linux/crypto.h:20,
2013-03-06T04:44:54.668 INFO:teuthology.task.workunit.client.0.err:                 from arch/x86/kernel/asm-offsets.c:8:
2013-03-06T04:44:54.668 INFO:teuthology.task.workunit.client.0.err:/home/ubuntu/cephtest/mnt.0/client.0/tmp/t/linux-3.2.9/arch/x86/include/asm/byteorder.h:4:43: fatal error: linux/byteorder/little_endian.h: No such file or directory
2013-03-06T04:44:54.668 INFO:teuthology.task.workunit.client.0.err:compilation terminated.
2013-03-06T04:44:54.805 INFO:teuthology.task.workunit.client.0.err:make[1]: *** [arch/x86/kernel/asm-offsets.s] Error 1
2013-03-06T04:44:54.806 INFO:teuthology.task.workunit.client.0.err:make: *** [prepare0] Error 2
2013-03-06T04:44:54.806 INFO:teuthology.task.workunit.client.0.err:make: *** Waiting for unfinished jobs....
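
For long teuthology logs like the one above, it can help to pull out only the compiler's fatal errors from the workunit stderr stream. The following is a minimal sketch, assuming every line follows the `<timestamp> INFO:teuthology.task.workunit.client.N.{out,err}:<message>` layout seen in this log; the helper name `fatal_errors` is hypothetical and not part of teuthology.

```python
import re

# Assumed line layout, taken from the log excerpt in this report:
#   <timestamp> INFO:teuthology.task.workunit.client.N.<out|err>:<message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+) INFO:teuthology\.task\.workunit\."
    r"(?P<client>client\.\d+)\.(?P<stream>out|err):(?P<msg>.*)$"
)


def fatal_errors(lines):
    """Return (timestamp, client, message) for stderr lines with 'fatal error:'."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("stream") == "err" and "fatal error:" in m.group("msg"):
            hits.append((m.group("ts"), m.group("client"), m.group("msg")))
    return hits
```

Calling `fatal_errors(open("teuthology.log"))` on a saved log (the file name is only an example) would return one tuple per fatal compiler error, which makes it easier to see which header went missing.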

The job was:
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-06_01:00:04-regression-master-testing-gcov/16994$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: d11db70976a80343930d0889d0042bb9956503bf
nuke-on-error: true
overrides:
  ceph:
    conf:
      osd:
        osd op thread timeout: 60
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: af6b6eddae073315c70bba4712511125360a0aad
  s3tests:
    branch: master
  workunit:
    sha1: af6b6eddae073315c70bba4712511125360a0aad
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- kclient: null
- workunit:
    clients:
      all:
      - kernel_untar_build.sh

Related issues 1 (0 open, 1 closed)

Has duplicate CephFS - Bug #4398: fix kclient_workunit_misc.yaml in the nightlies (Duplicate, 03/08/2013)

Actions #1

Updated by Zheng Yan about 11 years ago

I got the following message for the kernel build error: "find: `./include/generated': No such file or directory".
It's strange that there is no ceph_d_prune in the log. Maybe dentry_lru_prune() should not check
"!list_empty(&dentry->d_lru)".

[26812.210412] ceph: mdsc put_session ffff880036a01000 503 -> 502
[26812.210418] ceph: destroy_inode ffff880091993c18 ino 10000018dc8.fffffffffffffffe
[26812.210423] ceph: adding 10000018dc8 release to mds0 msg ffff8800361b0a00 (40078 left)
[26812.210425] ceph: release msg ffff8800361b0a00 at 43/170 (1036)
[26812.210426] ceph: __ceph_remove_cap ffff880116977b80 from ffff880091993c18
[26812.210428] ceph: put_cap ffff880116977b80 41130 = 39288 used + 1025 resv + 817 avail
[26812.210429] ceph: put_snap_realm 1 ffff880135b04d00 39288 -> 39287
[26812.210430] ceph: __cap_delay_cancel ffff880091993c18
[26812.210432] ceph: __ceph_destroy_xattrs p= (null)
[26812.210433] ceph: d_release ffff88011ea6efc0
[26812.210434] ceph: dentry_lru_del ffff88010d621500 ffff88011ea6efc0 'generated'
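
The hypothesis in this comment can be illustrated with a toy userspace model. This is only a sketch, not kernel code. It assumes that the filesystem's prune hook invalidates a "complete" marker on the parent directory, and that the generic prune path calls the hook only when the dentry is on the LRU list. All class and function names below are hypothetical, and whether ceph_d_prune really maintains such a marker is an assumption here.

```python
class Directory:
    """Toy directory; 'complete' means every entry is believed to be cached."""

    def __init__(self):
        self.complete = True


class Dentry:
    def __init__(self, name, parent, on_lru):
        self.name = name
        self.parent = parent
        self.on_lru = on_lru


def d_prune(dentry):
    # Hypothetical filesystem hook: once an entry is dropped, the parent's
    # cached listing can no longer be trusted as complete.
    dentry.parent.complete = False


def prune_guarded(dentry):
    # Suspected behaviour: the hook is skipped for dentries not on the LRU.
    if dentry.on_lru:
        d_prune(dentry)
        dentry.on_lru = False


def prune_unconditional(dentry):
    # Proposed behaviour: the hook runs regardless of LRU membership.
    d_prune(dentry)
    dentry.on_lru = False
```

In this model, pruning a dentry that is not on the LRU through `prune_guarded` leaves the parent marked complete even though an entry is gone, so a later lookup answered from the cache could wrongly report that the entry does not exist. With `prune_unconditional` the marker is always cleared.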

Actions #2

Updated by Ian Colle about 11 years ago

  • Assignee set to Greg Farnum
Actions #3

Updated by Ian Colle about 11 years ago

Let's see if this happens in testing branch after Yan's patches are all applied.

Actions #4

Updated by Greg Farnum about 11 years ago

  • Status changed from 12 to 7

An initial patch from Yan is in our testing branch and should fix this issue. (Or at least fixes one cause.) It may get edited a bit before final, though.

Actions #5

Updated by Alex Elder about 11 years ago

I hit this today while testing. Sorry, I don't remember
which test but Sage says he knows what happened.
http://pastebin.com/CsM9QYn2

Actions #6

Updated by Zheng Yan about 11 years ago

Any ideas on how to fix the locking issue? Could we use atomic bit operations to modify i_ceph_flags?

Actions #7

Updated by Sage Weil about 11 years ago

That might work, as long as we don't need to update the flags and i_release_count atomically... that'd have to become an atomic_t, and we'd need to be careful about ordering. It is probably simpler to just introduce a new spinlock for the flags + that field.
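
The "one lock for the flags plus that field" option can be sketched in userspace as follows. This is a Python analogy under stated assumptions, not the kernel change: `DirState`, `COMPLETE`, and the method names are hypothetical, a `threading.Lock` stands in for the spinlock, and the idea that a flag is only valid while the release counter is unchanged is an assumption made for illustration.

```python
import threading


class DirState:
    """Userspace stand-in for per-inode state; all names are hypothetical."""

    COMPLETE = 0x1

    def __init__(self):
        self._lock = threading.Lock()  # one lock covering both fields
        self.flags = 0
        self.release_count = 0

    def note_release(self):
        """An entry was dropped: bump the counter and clear COMPLETE together."""
        with self._lock:
            self.release_count += 1
            self.flags &= ~self.COMPLETE

    def snapshot(self):
        """Sample the counter before starting a long operation."""
        with self._lock:
            return self.release_count

    def try_mark_complete(self, seen_count):
        """Set COMPLETE only if nothing was released since seen_count."""
        with self._lock:
            if self.release_count != seen_count:
                return False
            self.flags |= self.COMPLETE
            return True
```

Because the check of `release_count` and the update of `flags` happen under the same lock, a concurrent `note_release` cannot slip in between them. With separate atomic operations on each field, that window would have to be closed by careful ordering instead.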

Actions #8

Updated by Greg Farnum about 11 years ago

  • Assignee changed from Greg Farnum to Sage Weil
Actions #9

Updated by Sage Weil about 11 years ago

20 iterations on the testing branch. I ran a bunch on master to make sure I could trigger the old bug, but then couldn't after 100 iterations... so I can't draw any great conclusions. :( I'll hammer more on testing just to make sure it's holding up to that at least. :)

Actions #10

Updated by Sage Weil about 11 years ago

passed another 100 iterations (modulo a machine lockup on the server side)

Actions #11

Updated by Sage Weil about 11 years ago

  • Status changed from 7 to Resolved