Feature #5000
Get Teuthology to run on ARMs (Status: Closed)
Done: 0%
Description
This one is a tough one to estimate.
Updated by Ian Colle almost 11 years ago
- Story points changed from 20.00 to 21.00
Updated by Tamilarasi muthamizhan almost 11 years ago
- Status changed from New to In Progress
- Assignee changed from Anonymous to Tamilarasi muthamizhan
Just started on this.
Updated by Anonymous almost 11 years ago
- Target version changed from v0.64 to v0.65
Updated by Tamilarasi muthamizhan almost 11 years ago
- Assignee changed from Tamilarasi muthamizhan to Anonymous
Updated by Anonymous almost 11 years ago
I am using http://gitbuilder.ceph.com/kernel-deb-quantal-armv7l-basic/ref/master/linux-image-3.9.0-ceph-b5b09be3-highbank_3.9.0-ceph-b5b09be3-highbank-1_armhf.deb as my kernel right now. The most recent rgw suite tests failed because teuthology could not find /boot/grub/grub.cfg.
Updated by Sage Weil almost 11 years ago
- Target version changed from v0.65 to v0.66
Updated by Anonymous almost 11 years ago
Note: The current version attempts the same grub operations as the x86_64 code. This needs to be skipped on ARM.
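[Editor's note] A minimal sketch of gating the grub steps on architecture. This is illustrative only: needs_grub_update, select_kernel_commands, and the grub entry name are hypothetical, not teuthology's actual API.

```python
def needs_grub_update(machine_arch):
    """Grub-based boot selection only applies to x86 machines; ARM test
    nodes such as the highbank boxes boot via other mechanisms and have
    no /boot/grub/grub.cfg."""
    return machine_arch in ("x86_64", "i686")

def select_kernel_commands(machine_arch, kernel_sha1):
    # Build the commands to activate a test kernel; on ARM, skip the
    # grub steps entirely instead of failing on a missing grub.cfg.
    cmds = []
    if needs_grub_update(machine_arch):
        cmds.append("sudo update-grub")
        # hypothetical menu-entry naming scheme
        cmds.append("sudo grub-set-default 'ceph-%s'" % kernel_sha1)
    return cmds
```

The point is only that the architecture check happens before any grub command is issued, so ARM targets never touch grub at all.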
Updated by Anonymous almost 11 years ago
I am now getting the following on an rbd kernel_untar_build test:
failure_reason: '"2013-06-19 14:41:28.266297 osd.2 10.214.143.3:6800/28925 14 :
[WRN] 1 slow requests, 1 included below; oldest blocked for > 30.744780 secs"
in cluster log', flavor: basic, mon.a-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f,
mon.b-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f, owner: wusui@aardvark,
success: false}
Updated by Anonymous almost 11 years ago
The yaml file is:
machine_type: tala
kernel:
branch: testing
roles:
- [mon.a, mds.a, osd.0, osd.1]
- [mon.b, mon.c, osd.2, osd.3]
- [client.0]
tasks:
- install:
branch: cuttlefish
- ceph:
- rbd:
all:
- workunit:
clients:
all:
- kernel_untar_build.sh
Updated by Anonymous almost 11 years ago
I also got the following:
failure_reason: '"2013-06-20 06:43:21.988558 osd.2 10.214.143.3:6800/456 124039 : [WRN] 2 slow requests, 2 included below; oldest blocked for > 45483.204801 secs" in cluster log', flavor: basic, mon.a-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f, mon.b-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f, owner: wusui@aardvark, success: false}
while running this yaml:
machine_type: tala
kernel:
  branch: testing
roles:
- [mon.a, mds.a, osd.0, osd.1]
- [mon.b, mon.c, osd.2, osd.3]
- [client.0]
tasks:
- install:
    branch: cuttlefish
- ceph:
- rbd:
    all:
- workunit:
    clients:
      all: [misc/trivial_sync.sh]
This test ran in less than 15 minutes (I have no idea how something could be blocked for 45483 seconds).
Updated by Anonymous almost 11 years ago
Another run:
INFO:teuthology.run:Summary data: {client.0-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f, duration: 193.49849891662598, flavor: basic, mon.a-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f, mon.b-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f, owner: wusui@aardvark, success: true} INFO:teuthology.run:pass
Updated by Anonymous almost 11 years ago
I can get this to run by artificially messing with the timeout values in the need_to_install() code. However, the code here makes assumptions about the output of uname -r that are incorrect for both ARM and virtual machines. I think this needs to be figured out.
Is there a better Linux command for getting the kernel version than uname -r? The information returned does not seem correct for the teuthology kernel task.
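[Editor's note] On Linux, /proc/sys/kernel/osrelease holds the same string that uname -r prints, and both describe the currently booted kernel, not a newly installed one, so no command will show the new version until after a reboot (consistent with the power-cycle observation later in this thread). A minimal sketch; the path parameter exists only to make the function testable:

```python
def kernel_release(path="/proc/sys/kernel/osrelease"):
    """Read the running kernel's release string.

    This is the same value `uname -r` reports; both reflect the kernel
    that is currently booted, so a stale-looking version persists until
    the machine is rebooted into the newly installed kernel.
    """
    with open(path) as f:
        return f.read().strip()
```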
Updated by Anonymous almost 11 years ago
- Status changed from In Progress to Fix Under Review
- Assignee changed from Anonymous to Sage Weil
Changes have been made in wip-teutharm-wusui.
Updated by Anonymous almost 11 years ago
- Status changed from Fix Under Review to In Progress
- Assignee changed from Sage Weil to Anonymous
uname -r returns 3.4.0-34-highbank or 3.4.0-1000-highbank. After I used ipmitool to power cycle one of the machines, uname -r returned 3.9.0-ceph-19bb6a83-highbank.
Updated by Anonymous almost 11 years ago
A reboot also fixes /proc/version and uname -r, so I think I need to reboot and get rid of the bogus uname workaround code.
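[Editor's note] A sketch of the reboot-then-verify loop that would replace the uname workaround. This is hypothetical: `remote` stands in for teuthology's remote-execution handle, and a `.run()` method returning stdout is an assumed interface.

```python
import time

def reboot_and_wait(remote, expected_release, timeout=600.0, poll=10.0):
    """Reboot a test node and wait until it comes back up reporting
    the expected kernel release via `uname -r`."""
    remote.run("sudo reboot")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if remote.run("uname -r").strip() == expected_release:
                return  # node is back on the right kernel
        except ConnectionError:
            pass  # node still rebooting; keep polling
        time.sleep(poll)
    raise TimeoutError("node did not come back on %s" % expected_release)
```

Polling uname -r after the reboot, rather than trusting the pre-reboot value, sidesteps the stale-version problem entirely.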
Updated by Anonymous almost 11 years ago
The kernel tests have generated the following crash:
[ 957.905812] kernel BUG at /srv/autobuild-ceph/gitbuilder.git/build/include/linux/ceph/decode.h:164!
[ 957.914849] Internal error: Oops - BUG: 0 [#1] SMP ARM
[ 957.919978] Modules linked in: rbd libceph libcrc32c ipmi_devintf ipmi_si ipmi_msghandler nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc
[ 957.932547] CPU: 1 Tainted: G W (3.9.0-ceph-19bb6a83-highbank #1)
[ 957.939881] PC is at ceph_osdc_build_request+0x8c/0x4f8 [libceph]
[ 957.945967] LR is at 0xec520904
[ 957.949103] pc : [<bf13e76c>] lr : [<ec520904>] psr: 20000153
[ 957.949103] sp : ec753df8 ip : 00000001 fp : ec53e100
[ 957.960571] r10: ebef25c0 r9 : ec5fa400 r8 : ecbcc000
[ 957.965788] r7 : 00000000 r6 : 00000000 r5 : ffffffff r4 : 00000020
[ 957.972307] r3 : 51cc8143 r2 : ec520900 r1 : ec753e58 r0 : ec520908
[ 957.978827] Flags: nzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment user
[ 957.986039] Control: 10c5387d Table: 2c59c04a DAC: 00000015
[ 957.991777] Process rbd (pid: 2138, stack limit = 0xec752238)
[ 957.997514] Stack: (0xec753df8 to 0xec754000)
[ 958.001864] 3de0: 00000001 00000001
[ 958.010032] 3e00: 00000001 bf139744 ecbcc000 ec55a0a0 00000024 00000000 ebef25c0 fffffffe
[ 958.018204] 3e20: ffffffff 00000000 00000000 00000001 ec5fa400 ebef25c0 ec53e100 bf166b68
[ 958.026377] 3e40: 00000000 0000220f fffffffe ffffffff ec753e58 bf13ff24 51cc8143 05b25ed2
[ 958.034548] 3e60: 00000001 00000000 00000000 bf1688d4 00000001 00000000 00000000 00000000
[ 958.042720] 3e80: 00000001 00000060 ec5fa400 ed53d200 ed439600 ed439300 00000001 00000060
[ 958.050888] 3ea0: ec5fa400 ed53d200 00000000 bf16a320 00000000 ec53e100 00000040 ec753eb8
[ 958.059059] 3ec0: ec51df00 ed53d7c0 ed53d200 ed53d7c0 00000000 ed53d7c0 ec5fa400 bf16ed70
[ 958.067230] 3ee0: 00000000 00000060 00000002 ed53d200 00000000 bf16acf4 ed53d7c0 ec752000
[ 958.075402] 3f00: ed980e50 e954f5d8 00000000 00000060 ed53d240 ed53d258 ec753f80 c04f44a8
[ 958.083574] 3f20: edb7910c ec664700 01ade920 c02e4c44 00000060 c016b3dc ec51de40 01adfb84
[ 958.091745] 3f40: 00000060 ec752000 ec753f80 ec752000 00000060 c0108444 00000007 ec51de48
[ 958.099914] 3f60: ed0eb8c0 00000000 00000000 ec51de40 01adfb84 00000001 00000060 c0108858
[ 958.108085] 3f80: 00000000 00000000 51cc8143 00000060 01adfb84 00000007 00000004 c000dd68
[ 958.116257] 3fa0: 00000000 c000dbc0 00000060 01adfb84 00000007 01adfb84 00000060 01adfb80
[ 958.124429] 3fc0: 00000060 01adfb84 00000007 00000004 beded1a8 00000000 01adf2f0 01ade920
[ 958.132599] 3fe0: 00000000 beded180 b6811324 b6811334 800f0010 00000007 2e7f5821 2e7f5c21
[ 958.140815] [<bf13e76c>] (ceph_osdc_build_request+0x8c/0x4f8 [libceph]) from [<bf166b68>] (rbd_osd_req_format_write+0x50/0x7c [rbd])
[ 958.152739] [<bf166b68>] (rbd_osd_req_format_write+0x50/0x7c [rbd]) from [<bf1688d4>] (rbd_dev_header_watch_sync+0xe0/0x204 [rbd])
[ 958.164486] [<bf1688d4>] (rbd_dev_header_watch_sync+0xe0/0x204 [rbd]) from [<bf16a320>] (rbd_dev_image_probe+0x23c/0x850 [rbd])
[ 958.175967] [<bf16a320>] (rbd_dev_image_probe+0x23c/0x850 [rbd]) from [<bf16acf4>] (rbd_add+0x3c0/0x918 [rbd])
[ 958.185975] [<bf16acf4>] (rbd_add+0x3c0/0x918 [rbd]) from [<c02e4c44>] (bus_attr_store+0x20/0x2c)
[ 958.194850] [<c02e4c44>] (bus_attr_store+0x20/0x2c) from [<c016b3dc>] (sysfs_write_file+0x168/0x198)
[ 958.203984] [<c016b3dc>] (sysfs_write_file+0x168/0x198) from [<c0108444>] (vfs_write+0x9c/0x170)
[ 958.212768] [<c0108444>] (vfs_write+0x9c/0x170) from [<c0108858>] (sys_write+0x3c/0x70)
[ 958.220768] [<c0108858>] (sys_write+0x3c/0x70) from [<c000dbc0>] (ret_fast_syscall+0x0/0x30)
[ 958.229199] Code: e59d1058 e5913000 e3530000 ba000114 (e7f001f2)
[ 958.235300] ---[ end trace da227214a82491ba ]---
Updated by Anonymous almost 11 years ago
This problem can be reproduced by running teuthology with the following yaml file.
machine_type: tala
kernel:
  branch: testing
roles:
- [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2, osd.3]
- [client.0]
tasks:
- install:
    branch: cuttlefish
- ceph:
- rbd:
    all:
- workunit:
    clients:
      all: [misc/trivial_sync.sh]
targets:
  ubuntu@tala002.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD5mFt7raxufuhfx3dxQDJg5mzJ4N+n94rHC/pEqCFvXSp5Fly9cZZxdmn6N5vNUerXIt7/ui2AlVii/bSNjBJrXGYwi+IK+tRPpHb1e5OaS1FdNeHHIeIofeTmUVC7wzsit7sWCcN0I+FjlVqWjXs4qsjI56MbAMC+YVAepbhOUT/j8tFFLXgMN4xFKx10G4TqGWJqsMA1+WD4DLHWI8GrqccGTdokzaotSFHH3uMJIzXfTpCLts1n6yX2iogmK2ayFyD7TmMPRI9ZQ2E5yvkMsYrAOyyPp7h3RVGRRYWR47mmdrENfjuVKQcK30tBSO3tl13BXxWNl1+rfMOk9Cqz
  ubuntu@tala004.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC4XQUem3ze9TfBfsJ3pL8kPm+Y98TEJDQ76rOcdjMR4Rs8mte1Q1B93hT0CI8uRjFcv9uiKaOlweiqKXSx6N/20dsPQ2LN54FtXLB346vsxDmZH0RRzg7KfHja/AilEW3pN3nlLlYkCN/9yWuId3g1sN1L6Shylyc96OL2b++O5fZhZnzbbaHSvyngU73GY/sfRWWA6bB6suXRe/QMbHA/ge/+EvcjJ74nZynenujAchjcVmY6xzpXsXYtSSpYcdgkVh+7P1H0KkfWJwH8aRvsni7TE/6Zp8AtaROelCW1v5vMaLAUjjFtz2nVy2KSViktX3jIpwHDXoFd3eJumXxT
Updated by Anonymous almost 11 years ago
I also sometimes get these messages.
huh, entered softirq 3 NET_RX c03e7670 preempt_count 00000100, exited with 00000000?
[52975.023874] huh, entered softirq 4 BLOCK c0270bf8 preempt_count 00000100, exited with 00000000?
[53795.858433] huh, entered softirq 4 BLOCK c0270bf8 preempt_count 00000100, exited with 00000000?
[53819.409115] huh, entered softirq 4 BLOCK c0270bf8 preempt_count 00000100, exited with 00000000?
Updated by Anonymous almost 11 years ago
Another test generates the following message on the console:
BUG: scheduling while atomic: swaper/0/0/0xffff0000
I've gotten a few million of these messages.
Updated by Josh Durgin almost 11 years ago
Waiting for a fix to the first issue to build. What's the yaml that triggers the "BUG: scheduling while atomic: swaper/0/0/0xffff0000", Warren?
Updated by Anonymous almost 11 years ago
I think that the following causes the atomic swapper message...
machine_type: tala
kernel:
  branch: testing
roles:
- [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2, osd.3]
- [client.0]
tasks:
- install:
    branch: cuttlefish
- ceph:
- rbd:
    all:
- workunit:
    clients:
      all:
      - kernel_untar_build.sh
targets:
  ubuntu@tala002.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD5mFt7raxufuhfx3dxQDJg5mzJ4N+n94rHC/pEqCFvXSp5Fly9cZZxdmn6N5vNUerXIt7/ui2AlVii/bSNjBJrXGYwi+IK+tRPpHb1e5OaS1FdNeHHIeIofeTmUVC7wzsit7sWCcN0I+FjlVqWjXs4qsjI56MbAMC+YVAepbhOUT/j8tFFLXgMN4xFKx10G4TqGWJqsMA1+WD4DLHWI8GrqccGTdokzaotSFHH3uMJIzXfTpCLts1n6yX2iogmK2ayFyD7TmMPRI9ZQ2E5yvkMsYrAOyyPp7h3RVGRRYWR47mmdrENfjuVKQcK30tBSO3tl13BXxWNl1+rfMOk9Cqz
  ubuntu@tala004.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC4XQUem3ze9TfBfsJ3pL8kPm+Y98TEJDQ76rOcdjMR4Rs8mte1Q1B93hT0CI8uRjFcv9uiKaOlweiqKXSx6N/20dsPQ2LN54FtXLB346vsxDmZH0RRzg7KfHja/AilEW3pN3nlLlYkCN/9yWuId3g1sN1L6Shylyc96OL2b++O5fZhZnzbbaHSvyngU73GY/sfRWWA6bB6suXRe/QMbHA/ge/+EvcjJ74nZynenujAchjcVmY6xzpXsXYtSSpYcdgkVh+7P1H0KkfWJwH8aRvsni7TE/6Zp8AtaROelCW1v5vMaLAUjjFtz2nVy2KSViktX3jIpwHDXoFd3eJumXxT
Updated by Josh Durgin almost 11 years ago
The "scheduling while atomic" stuff that sometimes halts boot seems like it's not ceph-related. There's probably a driver problem or something wrong with our newer kernel's config, since it doesn't boot reliably. Once it did boot, using wip-arm, I was able to map, do I/O, mkfs, and unmap an rbd image. We should figure out the kernel issue before trying to test more, though, so we don't get false positives from it.
Updated by Anonymous almost 11 years ago
- Status changed from In Progress to Fix Under Review
- Assignee changed from Anonymous to Sage Weil
Using Josh's fixed kernel, I was able, after several reboot attempts, to run an rbd kernel test to completion. I used this kernel to repeat the test several times and tried other rbd runs.
At this point, I believe that fixes for the ARM ceph issues (Josh's changes in wip-arm) and the teuthology issues (my changes in wip-teutharm-wusui) have been implemented in the branches just mentioned. I will submit this for review and open another item for the kernel problems with a more specific description of these bugs.
Updated by Sage Weil almost 11 years ago
- Target version changed from v0.66 to v0.67rc
Updated by Sage Weil almost 11 years ago
- Target version changed from v0.67rc to v0.67rc - continued
Updated by Anonymous almost 11 years ago
I have rebased this with the latest master version.
Updated by Sage Weil over 10 years ago
- Target version changed from v0.67rc - continued to v0.68 - continued
Updated by Anonymous over 10 years ago
- Status changed from Fix Under Review to Resolved