Feature #5000: Get Teuthology to run on ARM's - teuthology - Ceph

failure_reason: '"2013-06-19 14:41:28.266297 osd.2 10.214.143.3:6800/28925 14 :
[WRN] 1 slow requests, 1 included below; oldest blocked for > 30.744780 secs"
in cluster log', flavor: basic, mon.a-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f,
mon.b-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f, owner: wusui@aardvark,
success: false}

#11

Updated by Anonymous almost 11 years ago

The yaml file is:

machine_type: tala
kernel:
  branch: testing
roles:
- [mon.a, mds.a, osd.0, osd.1]
- [mon.b, mon.c, osd.2, osd.3]
- [client.0]
tasks:
- install:
    branch: cuttlefish
- ceph:
- rbd:
    all:
- workunit:
    clients:
      all:
        - kernel_untar_build.sh

#12

Updated by Anonymous almost 11 years ago

I also got the following:

failure_reason: '"2013-06-20 06:43:21.988558 osd.2 10.214.143.3:6800/456 124039
    : [WRN] 2 slow requests, 2 included below; oldest blocked for > 45483.204801 secs" 
    in cluster log', flavor: basic, mon.a-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f,
  mon.b-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f, owner: wusui@aardvark,
  success: false}

while running this yaml:

machine_type: tala
kernel:
  branch: testing
roles:
- [mon.a, mds.a, osd.0, osd.1,]
- [mon.b, mon.c, osd.2, osd.3,]
- [client.0]
tasks:
- install:
    branch: cuttlefish
- ceph:
- rbd:
    all:
- workunit:
    clients:
      all: [misc/trivial_sync.sh]

This test ran in less than 15 minutes, so I have no idea how something could be blocked for 45483 seconds.
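
Slow-request warnings like the two above can be pulled out of the cluster log and compared against the run's wall-clock duration. Doing so makes the 45483-second figure stand out: that is about 12.6 hours, for a run that took under 15 minutes. Below is a minimal Python sketch. The regex is written against the exact message format quoted in this ticket, and `parse_slow_request` / `exceeds_run_duration` are hypothetical helpers, not part of teuthology.

```python
import re

# Matches cluster-log warnings in the format quoted in this ticket, e.g.
# "[WRN] 2 slow requests, 2 included below; oldest blocked for > 45483.204801 secs"
SLOW_REQ_RE = re.compile(
    r"\[WRN\] (?P<count>\d+) slow requests?, \d+ included below; "
    r"oldest blocked for > (?P<secs>[\d.]+) secs"
)


def parse_slow_request(line):
    """Return (count, oldest_blocked_secs), or None if the line doesn't match."""
    m = SLOW_REQ_RE.search(line)
    if m is None:
        return None
    return int(m.group("count")), float(m.group("secs"))


def exceeds_run_duration(line, run_duration_secs):
    """True if the reported block time is longer than the whole run."""
    parsed = parse_slow_request(line)
    return parsed is not None and parsed[1] > run_duration_secs


if __name__ == "__main__":
    wrn = ("2013-06-20 06:43:21.988558 osd.2 10.214.143.3:6800/456 124039 : "
           "[WRN] 2 slow requests, 2 included below; "
           "oldest blocked for > 45483.204801 secs")
    count, secs = parse_slow_request(wrn)
    print(count, round(secs / 3600, 1))  # prints: 2 12.6 (hours)
    print(exceeds_run_duration(wrn, 15 * 60))  # prints: True
```

A block time longer than the run itself points at a bookkeeping or clock problem rather than a genuinely stuck request. This sketch only flags that case; it does not diagnose it.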

#13

Updated by Anonymous almost 11 years ago

Another run:

INFO:teuthology.run:Summary data:
{client.0-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f, duration: 193.49849891662598,
  flavor: basic, mon.a-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f, mon.b-kernel-sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f,
  owner: wusui@aardvark, success: true}

INFO:teuthology.run:pass

#14

Updated by Anonymous almost 11 years ago

I can get this to run by artificially adjusting timeout values in the need_to_install() code. However, that code makes assumptions about the output of uname -r that do not hold on either ARM or virtual machines. This needs to be figured out.

Is there a better Linux command than uname -r for getting the kernel version? The information it returns does not seem correct for the teuthology kernel task.
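
One way to avoid guessing from version numbers alone is to look for the build sha embedded in the release string. The sketch below assumes that Ceph-built kernels carry a `-ceph-<short sha>` component, based on the `uname -r` values reported later in this ticket (for example `3.9.0-ceph-19bb6a83-highbank`). The helper names are hypothetical, and this is not the actual need_to_install() logic. Keep in mind that `uname -r` only describes the kernel that is currently booted, so it cannot confirm a freshly installed kernel until after a reboot.

```python
import re

# Hypothetical helper, not teuthology's need_to_install(): decide whether the
# running kernel was built from the wanted sha1 using only the `uname -r` string.
# Assumption: Ceph-built kernels embed "-ceph-<short sha>"; stock Ubuntu kernels
# such as "3.4.0-34-highbank" do not.
CEPH_RELEASE_RE = re.compile(r"-ceph-(?P<sha>[0-9a-f]{7,40})(?:-|$)")


def running_sha(release):
    """Return the short sha embedded in a Ceph kernel release, or None."""
    m = CEPH_RELEASE_RE.search(release)
    return m.group("sha") if m else None


def kernel_matches(release, wanted_sha1):
    """True if the release string looks like it was built from wanted_sha1."""
    sha = running_sha(release)
    return sha is not None and wanted_sha1.startswith(sha)
```

A stock kernel simply yields no sha, so the check fails closed and does not rely on timeouts or version ordering.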

#15

Updated by Anonymous almost 11 years ago

Status changed from In Progress to Fix Under Review
Assignee changed from Anonymous to Sage Weil

Changes have been made in wip-teutharm-wusui.

#16

Updated by Anonymous almost 11 years ago

Status changed from Fix Under Review to In Progress
Assignee changed from Sage Weil to Anonymous

uname -r returns 3.4.0-34-highbank or 3.4.0-1000-highbank. After I used ipmitool to power cycle one of the machines, uname -r returned 3.9.0-ceph-19bb6a83-highbank.

#17

Updated by Anonymous almost 11 years ago

Rebooting also fixes /proc/version and uname -r, so I think I need to reboot and get rid of the bogus uname workaround code.
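
Instead of working around a stale `uname -r`, the flow is: install, reboot, then confirm the new kernel is actually running. That can be sketched as a polling loop. The Python below is a minimal sketch, not teuthology code. `get_release` is a hypothetical callable standing in for running `uname -r` on the remote host, and the `-ceph-<short sha>` release format is assumed from the strings quoted above.

```python
import time


def wait_for_kernel(get_release, wanted_sha1, timeout=600.0, interval=10.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_release() until it reports a kernel built from wanted_sha1.

    get_release is a hypothetical callable returning the remote `uname -r`
    output, or None while the host is still rebooting. Returns the matching
    release string, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        release = get_release()
        if release and "-ceph-" in release:
            # Assumed format: "<version>-ceph-<short sha>-<flavor>"
            short = release.split("-ceph-", 1)[1].split("-", 1)[0]
            if short and wanted_sha1.startswith(short):
                return release
        if clock() >= deadline:
            raise TimeoutError("kernel %s not running after %.0fs (last saw %r)"
                               % (wanted_sha1[:8], timeout, release))
        sleep(interval)
```

Because `sleep` and `clock` are injectable, the loop can be exercised with a fake sequence of `uname -r` results without touching real hardware.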

#18

Updated by Anonymous almost 11 years ago

The kernel tests have generated the following crash:

[  957.905812] kernel BUG at /srv/autobuild-ceph/gitbuilder.git/build/include/linux/ceph/decode.h:164!
[  957.914849] Internal error: Oops - BUG: 0 [#1] SMP ARM
[  957.919978] Modules linked in: rbd libceph libcrc32c ipmi_devintf ipmi_si ipmi_msghandler nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc
[  957.932547] CPU: 1    Tainted: G        W     (3.9.0-ceph-19bb6a83-highbank #1)
[  957.939881] PC is at ceph_osdc_build_request+0x8c/0x4f8 [libceph]
[  957.945967] LR is at 0xec520904
[  957.949103] pc : [<bf13e76c>]    lr : [<ec520904>]    psr: 20000153
[  957.949103] sp : ec753df8  ip : 00000001  fp : ec53e100
[  957.960571] r10: ebef25c0  r9 : ec5fa400  r8 : ecbcc000
[  957.965788] r7 : 00000000  r6 : 00000000  r5 : ffffffff  r4 : 00000020
[  957.972307] r3 : 51cc8143  r2 : ec520900  r1 : ec753e58  r0 : ec520908
[  957.978827] Flags: nzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
[  957.986039] Control: 10c5387d  Table: 2c59c04a  DAC: 00000015
[  957.991777] Process rbd (pid: 2138, stack limit = 0xec752238)
[  957.997514] Stack: (0xec753df8 to 0xec754000)
[  958.001864] 3de0:                                                       00000001 00000001
[  958.010032] 3e00: 00000001 bf139744 ecbcc000 ec55a0a0 00000024 00000000 ebef25c0 fffffffe
[  958.018204] 3e20: ffffffff 00000000 00000000 00000001 ec5fa400 ebef25c0 ec53e100 bf166b68
[  958.026377] 3e40: 00000000 0000220f fffffffe ffffffff ec753e58 bf13ff24 51cc8143 05b25ed2
[  958.034548] 3e60: 00000001 00000000 00000000 bf1688d4 00000001 00000000 00000000 00000000
[  958.042720] 3e80: 00000001 00000060 ec5fa400 ed53d200 ed439600 ed439300 00000001 00000060
[  958.050888] 3ea0: ec5fa400 ed53d200 00000000 bf16a320 00000000 ec53e100 00000040 ec753eb8
[  958.059059] 3ec0: ec51df00 ed53d7c0 ed53d200 ed53d7c0 00000000 ed53d7c0 ec5fa400 bf16ed70
[  958.067230] 3ee0: 00000000 00000060 00000002 ed53d200 00000000 bf16acf4 ed53d7c0 ec752000
[  958.075402] 3f00: ed980e50 e954f5d8 00000000 00000060 ed53d240 ed53d258 ec753f80 c04f44a8
[  958.083574] 3f20: edb7910c ec664700 01ade920 c02e4c44 00000060 c016b3dc ec51de40 01adfb84
[  958.091745] 3f40: 00000060 ec752000 ec753f80 ec752000 00000060 c0108444 00000007 ec51de48
[  958.099914] 3f60: ed0eb8c0 00000000 00000000 ec51de40 01adfb84 00000001 00000060 c0108858
[  958.108085] 3f80: 00000000 00000000 51cc8143 00000060 01adfb84 00000007 00000004 c000dd68
[  958.116257] 3fa0: 00000000 c000dbc0 00000060 01adfb84 00000007 01adfb84 00000060 01adfb80
[  958.124429] 3fc0: 00000060 01adfb84 00000007 00000004 beded1a8 00000000 01adf2f0 01ade920
[  958.132599] 3fe0: 00000000 beded180 b6811324 b6811334 800f0010 00000007 2e7f5821 2e7f5c21
[  958.140815] [<bf13e76c>] (ceph_osdc_build_request+0x8c/0x4f8 [libceph]) from [<bf166b68>] (rbd_osd_req_format_write+0x50/0x7c [rbd])
[  958.152739] [<bf166b68>] (rbd_osd_req_format_write+0x50/0x7c [rbd]) from [<bf1688d4>] (rbd_dev_header_watch_sync+0xe0/0x204 [rbd])
[  958.164486] [<bf1688d4>] (rbd_dev_header_watch_sync+0xe0/0x204 [rbd]) from [<bf16a320>] (rbd_dev_image_probe+0x23c/0x850 [rbd])
[  958.175967] [<bf16a320>] (rbd_dev_image_probe+0x23c/0x850 [rbd]) from [<bf16acf4>] (rbd_add+0x3c0/0x918 [rbd])
[  958.185975] [<bf16acf4>] (rbd_add+0x3c0/0x918 [rbd]) from [<c02e4c44>] (bus_attr_store+0x20/0x2c)
[  958.194850] [<c02e4c44>] (bus_attr_store+0x20/0x2c) from [<c016b3dc>] (sysfs_write_file+0x168/0x198)
[  958.203984] [<c016b3dc>] (sysfs_write_file+0x168/0x198) from [<c0108444>] (vfs_write+0x9c/0x170)
[  958.212768] [<c0108444>] (vfs_write+0x9c/0x170) from [<c0108858>] (sys_write+0x3c/0x70)
[  958.220768] [<c0108858>] (sys_write+0x3c/0x70) from [<c000dbc0>] (ret_fast_syscall+0x0/0x30)
[  958.229199] Code: e59d1058 e5913000 e3530000 ba000114 (e7f001f2) 
[  958.235300] ---[ end trace da227214a82491ba ]---
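
When comparing crashes across runs, it can help to reduce an oops like this to its call chain. Here the chain runs sysfs write → rbd_add → rbd_dev_image_probe → rbd_dev_header_watch_sync → rbd_osd_req_format_write → ceph_osdc_build_request. Below is a minimal Python sketch that extracts the innermost frame from each `[<pc>] (func+off/size [module]) from ...` line. The regex is written against the ARM backtrace format shown above and has not been checked against other architectures. Labeling frames that have no module tag as `vmlinux` is a convention of this sketch.

```python
import re

# Matches the first "[<pc>] (func+0xoff/0xsize [module])" on an ARM backtrace line,
# i.e. the callee; the "from [<...>] (...)" part names its caller.
FRAME_RE = re.compile(
    r"\[<(?P<pc>[0-9a-f]+)>\] \((?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)"
    r"/0x(?P<size>[0-9a-f]+)(?: \[(?P<module>\w+)\])?\)"
)


def parse_call_trace(lines):
    """Return the call chain, innermost frame first, as a list of dicts."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "func": m.group("func"),
                "offset": int(m.group("off"), 16),
                "size": int(m.group("size"), 16),
                # Frames without a [module] tag are in the core kernel image.
                "module": m.group("module") or "vmlinux",
            })
    return frames
```

Lines that carry no frame, such as the register dump, the stack words, or the `Code:` line, are skipped because they contain no `[<pc>]` marker.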

#19

Updated by Anonymous almost 11 years ago

This problem can be reproduced by running teuthology with the following yaml file.

machine_type: tala
kernel:
  branch: testing
roles:
- [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2, osd.3,]
- [client.0]
tasks:
- install:
    branch: cuttlefish
- ceph:
- rbd:
    all:
- workunit:
    clients:
      all: [misc/trivial_sync.sh]
targets:
  ubuntu@tala002.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD5mFt7raxufuhfx3dxQDJg5mzJ4N+n94rHC/pEqCFvXSp5Fly9cZZxdmn6N5vNUerXIt7/ui2AlVii/bSNjBJrXGYwi+IK+tRPpHb1e5OaS1FdNeHHIeIofeTmUVC7wzsit7sWCcN0I+FjlVqWjXs4qsjI56MbAMC+YVAepbhOUT/j8tFFLXgMN4xFKx10G4TqGWJqsMA1+WD4DLHWI8GrqccGTdokzaotSFHH3uMJIzXfTpCLts1n6yX2iogmK2ayFyD7TmMPRI9ZQ2E5yvkMsYrAOyyPp7h3RVGRRYWR47mmdrENfjuVKQcK30tBSO3tl13BXxWNl1+rfMOk9Cqz
  ubuntu@tala004.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC4XQUem3ze9TfBfsJ3pL8kPm+Y98TEJDQ76rOcdjMR4Rs8mte1Q1B93hT0CI8uRjFcv9uiKaOlweiqKXSx6N/20dsPQ2LN54FtXLB346vsxDmZH0RRzg7KfHja/AilEW3pN3nlLlYkCN/9yWuId3g1sN1L6Shylyc96OL2b++O5fZhZnzbbaHSvyngU73GY/sfRWWA6bB6suXRe/QMbHA/ge/+EvcjJ74nZynenujAchjcVmY6xzpXsXYtSSpYcdgkVh+7P1H0KkfWJwH8aRvsni7TE/6Zp8AtaROelCW1v5vMaLAUjjFtz2nVy2KSViktX3jIpwHDXoFd3eJumXxT
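
The roles and targets in this job line up one role group per machine: two groups, two tala hosts. A quick sanity check of that pairing, plus a per-daemon count, can be sketched in Python over a hand-transcribed copy of the job. The SSH host keys are elided in the transcription. `check_job` and `roles_by_type` are hypothetical helpers, not teuthology functions, and the one-group-per-target rule is an assumption drawn from how this job is laid out.

```python
from collections import Counter

# The job above, transcribed by hand (host keys elided).
JOB = {
    "roles": [
        ["mon.a", "mon.b", "mon.c", "mds.a", "osd.0", "osd.1", "osd.2", "osd.3"],
        ["client.0"],
    ],
    "targets": {
        "ubuntu@tala002.front.sepia.ceph.com": "ssh-rsa ...",
        "ubuntu@tala004.front.sepia.ceph.com": "ssh-rsa ...",
    },
}


def check_job(job):
    """Return a list of problems found in a teuthology-style job dict.

    Assumption: each entry of `roles` is placed on exactly one target.
    """
    problems = []
    roles, targets = job["roles"], job.get("targets", {})
    if targets and len(roles) != len(targets):
        problems.append("%d role groups but %d targets" % (len(roles), len(targets)))
    counts = Counter(role for group in roles for role in group)
    dupes = sorted(role for role, n in counts.items() if n > 1)
    if dupes:
        problems.append("duplicate roles: %s" % ", ".join(dupes))
    return problems


def roles_by_type(job):
    """Count roles per daemon type, e.g. {'mon': 3, 'osd': 4, ...}."""
    return dict(Counter(r.split(".", 1)[0] for group in job["roles"] for r in group))
```

For this job, `check_job(JOB)` reports no problems and `roles_by_type(JOB)` gives three mons, one mds, four osds and one client.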

#20

Updated by Anonymous almost 11 years ago

I also sometimes get these messages.

huh, entered softirq 3 NET_RX c03e7670 preempt_count 00000100, exited with 00000000?
[52975.023874] huh, entered softirq 4 BLOCK c0270bf8 preempt_count 00000100, exited with 00000000?
[53795.858433] huh, entered softirq 4 BLOCK c0270bf8 preempt_count 00000100, exited with 00000000?
[53819.409115] huh, entered softirq 4 BLOCK c0270bf8 preempt_count 00000100, exited with 00000000?

#21

Updated by Anonymous almost 11 years ago

Another test generates the following message on the console:

BUG: scheduling while atomic: swaper/0/0/0xffff0000

I've gotten a few million of these messages.

#22

Updated by Josh Durgin almost 11 years ago

Waiting for a fix to the first issue to build. Warren, what's the yaml that triggers the "BUG: scheduling while atomic: swaper/0/0/0xffff0000" message?

#23

Updated by Anonymous almost 11 years ago

I think the following yaml causes the "scheduling while atomic" (swapper) message:

machine_type: tala
kernel:
  branch: testing
roles:
- [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2, osd.3,]
- [client.0]
tasks:
- install:
    branch: cuttlefish
- ceph:
- rbd:
    all:
- workunit:
    clients:
      all:
        - kernel_untar_build.sh
targets:
  ubuntu@tala002.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD5mFt7raxufuhfx3dxQDJg5mzJ4N+n94rHC/pEqCFvXSp5Fly9cZZxdmn6N5vNUerXIt7/ui2AlVii/bSNjBJrXGYwi+IK+tRPpHb1e5OaS1FdNeHHIeIofeTmUVC7wzsit7sWCcN0I+FjlVqWjXs4qsjI56MbAMC+YVAepbhOUT/j8tFFLXgMN4xFKx10G4TqGWJqsMA1+WD4DLHWI8GrqccGTdokzaotSFHH3uMJIzXfTpCLts1n6yX2iogmK2ayFyD7TmMPRI9ZQ2E5yvkMsYrAOyyPp7h3RVGRRYWR47mmdrENfjuVKQcK30tBSO3tl13BXxWNl1+rfMOk9Cqz
  ubuntu@tala004.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC4XQUem3ze9TfBfsJ3pL8kPm+Y98TEJDQ76rOcdjMR4Rs8mte1Q1B93hT0CI8uRjFcv9uiKaOlweiqKXSx6N/20dsPQ2LN54FtXLB346vsxDmZH0RRzg7KfHja/AilEW3pN3nlLlYkCN/9yWuId3g1sN1L6Shylyc96OL2b++O5fZhZnzbbaHSvyngU73GY/sfRWWA6bB6suXRe/QMbHA/ge/+EvcjJ74nZynenujAchjcVmY6xzpXsXYtSSpYcdgkVh+7P1H0KkfWJwH8aRvsni7TE/6Zp8AtaROelCW1v5vMaLAUjjFtz2nVy2KSViktX3jIpwHDXoFd3eJumXxT

#24

Updated by Josh Durgin almost 11 years ago

The "scheduling while atomic" stuff that sometimes halts boot seems like it's not ceph-related. There's probably a driver problem or something wrong with our newer kernel's config since it doesn't boot reliably. Once it did boot, using wip-arm, I was able to map, do I/O, mkfs, and unmap an rbd image. We should figure out the kernel issue before trying to test more though, so we don't get false positives from it.
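
The manual check described here (map an image, do some I/O, mkfs, unmap) can be scripted so it is easy to repeat each time a new kernel boots. Below is a minimal Python sketch that builds the command sequence and runs it with `subprocess.check_call`. Several details are assumptions, not what was actually run here: the image name, size and pool, the `/dev/rbd/<pool>/<image>` device path (which relies on the Ceph udev rules being installed), and `rbd unmap` accepting that symlink. The runner is injectable, so the sequence can be inspected without a cluster.

```python
import subprocess


def rbd_smoke_commands(image, size_mb=1024, pool="rbd"):
    """Command lines for a create / map / write / mkfs / unmap / remove round trip.

    Assumes mapped images appear at /dev/rbd/<pool>/<image> via udev and that
    `rbd unmap` accepts that path; pass the /dev/rbdN node instead if not.
    """
    dev = "/dev/rbd/%s/%s" % (pool, image)
    return [
        ["rbd", "create", "--pool", pool, "--size", str(size_mb), image],
        ["sudo", "rbd", "map", "--pool", pool, image],
        # Some raw I/O straight to the block device before putting a filesystem on it.
        ["sudo", "dd", "if=/dev/zero", "of=" + dev, "bs=1M", "count=16", "oflag=direct"],
        ["sudo", "mkfs.ext4", "-q", dev],
        ["sudo", "rbd", "unmap", dev],
        ["rbd", "rm", "--pool", pool, image],
    ]


def run_smoke(image, runner=subprocess.check_call, **kwargs):
    """Run each command in order; check_call raises on the first failure."""
    for cmd in rbd_smoke_commands(image, **kwargs):
        runner(cmd)
```

For example, `run_smoke("testimg", runner=print)` prints the sequence instead of executing it.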

#25

Updated by Anonymous almost 11 years ago

Status changed from In Progress to Fix Under Review
Assignee changed from Anonymous to Sage Weil

Using Josh's fixed kernel, after several reboot attempts I was able to run an rbd kernel test to completion. I used this kernel to repeat the test several times and tried other rbd runs.

At this point, I believe that the fixes for the ARM ceph issues (Josh's changes in wip-arm) and the teuthology issues (my changes in wip-teutharm-wusui) have been implemented in those branches. I will submit this for review and open another item for the kernel problems, with a more specific description of these bugs.

#26

Updated by Sage Weil almost 11 years ago

Target version changed from v0.66 to v0.67rc

#27

Updated by Sage Weil almost 11 years ago

Target version changed from v0.67rc to v0.67rc - continued

#28

Updated by Anonymous almost 11 years ago

I have rebased this onto the latest master.

#29

Updated by Sage Weil over 10 years ago

Target version changed from v0.67rc - continued to v0.68 - continued

#30

Updated by Anonymous over 10 years ago

Status changed from Fix Under Review to Resolved
