Bug #4661 (closed)

xfstest 139 hung

Added by Sage Weil about 11 years ago. Updated over 10 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Target version:
-
% Done:
0%
Source:
Q/A
Severity:
3 - minor

Description

I thought there was a bug open for this (saw it last week), but I guess not.

 3215 ?        Ss     0:00 sshd: ubuntu [priv] 
 3231 ?        S      0:01  \_ sshd: ubuntu@notty  
29871 ?        Ss     0:00      \_ /usr/bin/sudo TESTDIR=/home/ubuntu/cephtest /bin/bash /home/ubuntu/cephtest/run_xfstests.sh -c 1 -f xfs -t /dev/rbd1 -s /dev/rbd2 133-135 137-141 164-167 184 187-190 192 194 196 199 201 203 214-216 220-227 234 236-238 241 243-24
29872 ?        S      0:00          \_ /bin/bash /home/ubuntu/cephtest/run_xfstests.sh -c 1 -f xfs -t /dev/rbd1 -s /dev/rbd2 133-135 137-141 164-167 184 187-190 192 194 196 199 201 203 214-216 220-227 234 236-238 241 243-249 253 257-259 261-262 269 273 275 277-27
 5846 ?        S      0:00              \_ /bin/bash ./check 133-135 137-141 164-167 184 187-190 192 194 196 199 201 203 214-216 220-227 234 236-238 241 243-249 253 257-259 261-262 269 273 275 277-278
22068 ?        S      0:00                  \_ /bin/bash ./139
24258 ?        D      0:00                      \_ umount /home/ubuntu/cephtest/scratch_mnt.GfLsAfPi62

The job was:
ubuntu@teuthology:/a/teuthology-2013-04-04_19:47:57-kernel-next-testing-basic/9342$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 85b6aabe740024f9f6aaa54afc3195940e5fa12c
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 3c0debf99d51a8ec1cbd76d96c436674d56dfc6e
  s3tests:
    branch: next
  workunit:
    sha1: 3c0debf99d51a8ec1cbd76d96c436674d56dfc6e
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
- - client.1
- - client.2
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- rbd.xfstests:
    client.0:
      scratch_image: scratch_image-0
      test_image: test_image-0
      tests: 1-9 11-15 17 19-21 26-29 31-34 41 46-54 56 61 63-67 69-70 74-76 78-79
        84-89 91
    client.1:
      scratch_image: scratch_image-1
      test_image: test_image-1
      tests: 92 100 103 105 108 110 116-121 124 126 129-132
    client.2:
      scratch_image: scratch_image-2
      test_image: test_image-2
      tests: 133-135 137-141 164-167 184 187-190 192 194 196 199 201 203 214-216 220-227
        234 236-238 241 243-249 253 257-259 261-262 269 273 275 277-278

Actions #1

Updated by Ian Colle about 11 years ago

  • Assignee set to Alex Elder
Actions #2

Updated by Alex Elder about 11 years ago

Test 139 does a forceful shutdown of the file system (without syncing)
to check on how well XFS handles it.

In the first phase, it creates and fills 1000 files with 64KB of 0xff
bytes and truncates each file to 32KB after creation. It then sleeps
a bit so some data can start getting written (but no sync, so most
likely not a lot of the data). Then it abruptly shuts down the file
system and unmounts it.

In the second phase it re-mounts and unmounts it, and does an fsck.
Finally it mounts it once more and takes a look at what happened to
the files that were created during the first phase.
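
For reference, here is a rough userspace sketch of what that first phase amounts to. It is not the actual xfstests script: the mount point, file naming, sleep length, and the choice of NOLOGFLUSH flag are illustrative assumptions, and the real test drives the shutdown through the xfstests src/godown helper, which wraps the same XFS_IOC_GOINGDOWN ioctl used below (assumed to come from the xfsprogs header <xfs/xfs_fs.h>).

/*
 * Rough sketch of test 139's first phase (illustrative, not the real test):
 * fill the scratch filesystem with 1000 files of 0xff, truncate each to
 * half size, give writeback a moment to start, then force an unsynced
 * shutdown.  The unmount that follows is the step seen stuck in "ps".
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <xfs/xfs_fs.h>              /* XFS_IOC_GOINGDOWN, XFS_FSOP_GOING_FLAGS_* */

#define SCRATCH_MNT "/mnt/scratch"   /* assumed scratch mount point */
#define NFILES      1000
#define FULL_SIZE   (64 * 1024)
#define TRUNC_SIZE  (32 * 1024)

int main(void)
{
    static char buf[FULL_SIZE];
    char path[256];
    uint32_t flags = XFS_FSOP_GOING_FLAGS_NOLOGFLUSH;  /* assumed: no log flush, no sync */
    int i, fd;

    memset(buf, 0xff, sizeof(buf));

    /* Phase 1a: create and fill 1000 files, truncating each to 32KB. */
    for (i = 0; i < NFILES; i++) {
        snprintf(path, sizeof(path), SCRATCH_MNT "/file_%d", i);
        fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, buf, FULL_SIZE) != FULL_SIZE) { perror("write"); return 1; }
        if (ftruncate(fd, TRUNC_SIZE) < 0) { perror("ftruncate"); return 1; }
        close(fd);
    }

    /* Phase 1b: sleep so some (but probably not much) data gets written. */
    sleep(10);

    /* Phase 1c: forced shutdown of the filesystem, with no sync beforehand. */
    fd = open(SCRATCH_MNT, O_RDONLY);
    if (fd < 0) { perror("open mnt"); return 1; }
    if (ioctl(fd, XFS_IOC_GOINGDOWN, &flags) < 0) { perror("ioctl"); return 1; }
    close(fd);

    /* The test harness then runs umount on SCRATCH_MNT. */
    return 0;
}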

I looked in /a/teuthology-2013-04-04_19:47:57-kernel-next-testing-basic
and found very little to work with. There is only the teuthology log,
which doesn't contain anything about the task that was running test 139.

The "ps" shown indicates it hung during unmount, but there are two
of those in the script, and possibly one run at the beginning
(which might point to test 138, but that's basically identical
to test 139). In any case, it's hard to tell without more logged
data what caused the unmount to get stuck.

So unless we can reproduce this, or get access to more information
about this particular incident, I don't think I can do anything with
this.

Actions #3

Updated by Ian Colle about 11 years ago

  • Status changed from New to Need More Info
Actions #4

Updated by Sage Weil about 11 years ago

  • Priority changed from Urgent to High
Actions #5

Updated by Sage Weil almost 11 years ago

Happened again:

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-04-28_21:32:47-kernel-next-testing-basic/2713

plana73 hung with

15643 ?        S      0:01  \_ sshd: ubuntu@notty  
21452 ?        Ss     0:00      \_ /usr/bin/sudo TESTDIR=/home/ubuntu/cephtest/2713 /bin/bash /home/ubuntu/cephtest/2713/run_xfstests.sh -c 1 -f xfs -t /dev/rbd1 -s /dev/rbd2 133-135 137-141 164-167 184 187-190 192 194 196 199 201 203 214-216 220-227 234 236-238 241 243-249 2
21453 ?        S      0:00          \_ /bin/bash /home/ubuntu/cephtest/2713/run_xfstests.sh -c 1 -f xfs -t /dev/rbd1 -s /dev/rbd2 133-135 137-141 164-167 184 187-190 192 194 196 199 201 203 214-216 220-227 234 236-238 241 243-249 253 257-259 261-262 269 273 275 277-278
29819 ?        S      0:00              \_ /bin/bash ./check 133-135 137-141 164-167 184 187-190 192 194 196 199 201 203 214-216 220-227 234 236-238 241 243-249 253 257-259 261-262 269 273 275 277-278
14068 ?        S      0:00                  \_ /bin/bash ./139
16264 ?        D      0:00                      \_ umount /home/ubuntu/cephtest/2713/scratch_mnt.B0SqNFA5bv

Actions #6

Updated by Sage Weil almost 11 years ago

and again!

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-04-29_01:01:03-kernel-next-testing-basic/3026

Actions #7

Updated by Alex Elder almost 11 years ago

  • Status changed from Need More Info to In Progress

Once again, we have the same evidence in these last two
crashes as we did before, which is to say, not enough to
really conclude which of the 2-3 possible unmount commands
hung, and why.

I'll allocate a small cluster and see if I can reproduce
the problem myself.

Actions #8

Updated by Alex Elder almost 11 years ago

I have been trying to reproduce this problem by having just
one client run tests 137-141, repeatedly. At this point I've
completed almost 400 of these runs consecutively without a
hang.

I'll now expand the test to include a second client running
xfstests over rbd concurrently. I'm trying to limit things
a bit, so I'm going to try to run the same set of 5 tests on
the second client as well.

Actions #9

Updated by Alex Elder almost 11 years ago

135 iterations completed without hanging, running two clients
each running tests 137-141.

Quite honestly, though, I think I'm only seeing the output of
one of those, and I'm not sure what happens to the other one.
(I never saw interleaved output, even right after the tests
started.)

I will try again, this time with 3 clients, but again they
will all be running those same tests only.

Actions #10

Updated by Alex Elder almost 11 years ago

Still no hangs.

HOWEVER, I screwed up. I set the iteration count to 1000 for
running xfstests, but I only did that on the first client. When
I enabled the second I forgot to update its parameters for the
run too, and the same when I enabled the third.

I'm going to try again, with three clients actually
doing 100 iterations concurrently.

Actions #11

Updated by Sage Weil almost 11 years ago

What might be more effective is taking the full sequence of tests run by the node when we do the full run and doing that on N hosts. It is starting to sound like an interaction with a previous test?

Actions #12

Updated by Alex Elder almost 11 years ago

OK, I hit something.

Things were definitely running concurrently this time--I saw
interleaved output from all three machines.

Then I got a crash of some kind on one of the kernel clients.
The machine seemed to have reset and rebooted or something.

Anyway, it's a weird error. There were lots of stack traces
in the log, but here was the first one:

[ 8687.949500] libceph: osd3 10.214.135.124:6800 socket closed (con state OPEN)
[ 8700.821388] XFS (rbd2): xfs_log_force: error 5 returned.
[ 8700.829064] XFS (rbd2): xfs_log_force: error 5 returned.
[ 8700.857978] XFS (rbd2): xfs_log_force: error 5 returned.
[ 8700.864971] XFS (rbd2): xfs_log_force: error 5 returned.
[ 8700.887474] XFS (rbd2): Mounting Filesystem
[ 8701.427919] BUG: spinlock bad magic on CPU#0, kworker/0:1H/693
[ 8701.433792]  lock: 0xffff880413cd8800, .magic: ffff8804, .owner: <none>/-1, .owner_cpu: -1
[ 8701.443256] Pid: 693, comm: kworker/0:1H Not tainted 3.8.0-ceph-00339-g42c6070 #1
[ 8701.450777] Call Trace:
[ 8701.453251]  [<ffffffff81658113>] spin_dump+0x8a/0x8f
[ 8701.458329]  [<ffffffff81658139>] spin_bug+0x21/0x26
[ 8701.463321]  [<ffffffff813412e4>] do_raw_spin_lock+0x84/0x120
[ 8701.469091]  [<ffffffff8165e3b6>] _raw_spin_lock_irqsave+0x56/0x70
[ 8701.475305]  [<ffffffff81085322>] ? __wake_up+0x32/0x70
[ 8701.480609]  [<ffffffff81085322>] __wake_up+0x32/0x70
[ 8701.485723]  [<ffffffffa0431fc9>] xlog_state_do_callback+0x209/0x300 [xfs]
[ 8701.492628]  [<ffffffff8134141d>] ? do_raw_spin_unlock+0x5d/0xb0
[ 8701.498694]  [<ffffffffa0432149>] xlog_state_done_syncing+0x89/0xb0 [xfs]
[ 8701.505537]  [<ffffffffa04321be>] xlog_iodone+0x4e/0xa0 [xfs]
[ 8701.511325]  [<ffffffffa03da2e3>] xfs_buf_iodone_work+0x53/0xa0 [xfs]
[ 8701.517841]  [<ffffffff81072ba9>] process_one_work+0x199/0x510
[ 8701.523696]  [<ffffffff81072b3c>] ? process_one_work+0x12c/0x510
[ 8701.529741]  [<ffffffffa03da290>] ? xfs_buf_get_map+0x1a0/0x1a0 [xfs]
[ 8701.536206]  [<ffffffff81074895>] worker_thread+0x165/0x3f0
[ 8701.541809]  [<ffffffff81074730>] ? manage_workers+0x2a0/0x2a0
[ 8701.547665]  [<ffffffff8107a4ba>] kthread+0xea/0xf0
[ 8701.552617]  [<ffffffff8107a3d0>] ? flush_kthread_work+0x1a0/0x1a0
[ 8701.558823]  [<ffffffff81666dac>] ret_from_fork+0x7c/0xb0
[ 8701.564252]  [<ffffffff8107a3d0>] ? flush_kthread_work+0x1a0/0x1a0

Actions #13

Updated by Alex Elder almost 11 years ago

The kernel did not drop into kdb, but it's still alive.
(Machine is mira064, by the way.)

Well, not fully alive. This doesn't look good (and
it has to do with a root disk or something, not rbd).

mira064 login: ubuntu
[12598.677481] sd 4:0:0:0: rejecting I/O to offline device
[12598.682906] sd 4:0:0:0: rejecting I/O to offline device
[12608.683727] init: ttyS2 main process (1289) terminated with status 1
[12608.690200] init: ttyS2 main process ended, respawning

Ubuntu 12.04.2 LTS mira064 ttyS2

mira064 login: [12629.615723] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 1, t=393012 jiffies, g=1098697, c=1098696, q=197907)

Actions #14

Updated by Alex Elder almost 11 years ago

A little more info... I think this machine may need some repair
so I'm going to reboot it--I can't get anything more out of it
at this point.

The problem originally reported is corruption of a spinlock.

[ 8701.427919] BUG: spinlock bad magic on CPU#0, kworker/0:1H/693
[ 8701.433792] lock: 0xffff880413cd8800, .magic: ffff8804, .owner: <none>/-1, .owner_cpu: -1

It was due to this wake_up_all() call in xlog_state_do_callback():

                        /*
                         * Transition from DIRTY to ACTIVE if applicable.
                         * NOP if STATE_IOERROR.
                         */
                        xlog_state_clean_log(log);

                        /* wake up threads waiting in xfs_log_force() */
                        wake_up_all(&iclog->ic_force_wait);   <---------------

                        iclog = iclog->ic_next;
                } while (first_iclog != iclog);

The spinlock that's corrupted is just q->head in the
ic_force_wait wait queue in the xfs in-core log. There's
nothing special about that. It is, however, at the top
of the structure, so it could have been the victim of
someone writing beyond their own memory.

The in-core log is not embedded in a larger structure; it's
just a pointed-to structure allocated (with kmem_zalloc())
by xlog_alloc_log(). I point this out because it means that
if something did scribble on the structure, it may not even
be something in XFS.

The value of "magic" shown is obviously the top 4 bytes of
a pointer in the same range as the spinlock's address.
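
To make the "bad magic" report concrete, here is a minimal userspace sketch of the failure mode being described. It is not the kernel code: the struct layout and the overrun are purely hypothetical, but the idea matches how CONFIG_DEBUG_SPINLOCK works, which stores a known constant (SPINLOCK_MAGIC, 0xdead4ead) in every lock and complains when a lock operation finds something else there, such as the bytes of a neighbouring pointer-sized write.

/*
 * Illustration only: a "magic" field guarding a lock-like structure gets
 * clobbered by an out-of-bounds write from the object sitting just in
 * front of it.  The high 4 bytes of the scribbled pointer end up in the
 * magic field, which is why the real report shows ".magic: ffff8804".
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SPINLOCK_MAGIC 0xdead4eadU    /* constant used by the kernel's debug spinlocks */

struct fake_lock {
    uint32_t magic;                   /* checked on every lock operation */
    uint32_t owner_cpu;
};

struct neighbour {                    /* hypothetical object placed just before the lock */
    uint64_t data[4];
};

struct blob {                         /* adjacency guaranteed by a single enclosing struct */
    struct neighbour n;
    struct fake_lock lock;
};

static void check_magic(const struct fake_lock *l)
{
    if (l->magic != SPINLOCK_MAGIC)
        printf("BUG: spinlock bad magic, .magic: %08x\n", (unsigned)l->magic);
    else
        printf("magic ok\n");
}

int main(void)
{
    struct blob b;
    /* A kernel-pointer-like value in the same range as the reported lock address. */
    uint64_t scribble = 0xffff880413cd8800ULL;

    b.lock.magic = SPINLOCK_MAGIC;
    b.lock.owner_cpu = (uint32_t)-1;
    check_magic(&b.lock);             /* "magic ok" */

    /* Overrun: an 8-byte write starting 4 bytes before the end of n.data
     * spills into lock.magic.  On little-endian, magic now holds the top
     * 4 bytes of the pointer value: 0xffff8804. */
    memcpy((char *)b.n.data + sizeof(b.n.data) - 4, &scribble, sizeof(scribble));
    check_magic(&b.lock);             /* "BUG: spinlock bad magic, .magic: ffff8804" */

    return 0;
}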

I don't know, maybe this isn't even related to the original
problem. I'm going to see about getting this machine rebooted
and maybe fixed, and will restart testing.

Actions #15

Updated by Alex Elder almost 11 years ago

I spent a little time looking at the crash I've got, and
I'm getting fairly convinced this is an XFS problem. It
appears that xlog_state_do_callback() is traversing a list
of in-core log buffers and hitting one with a null next
pointer. This isn't supposed to happen, because they're
supposed to be linked together in a ring.

I still would like to examine this a little more, to make
sure the null/zero is not somehow something coming back
from rbd.
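
To make the suspected failure mode concrete, here is a minimal sketch of the do/while ring walk quoted in the previous update. The structures are simplified stand-ins, not the real xlog_in_core layout: the point is that the loop only terminates when it gets back to the first iclog, so a NULL ic_next link gets dereferenced on the next pass rather than ending the walk; the oops shown in the next update, faulting at a small offset (0xbc) from NULL, fits that picture.

/*
 * Illustration of why a NULL link in what should be a ring of in-core
 * log buffers is fatal.  The second do_callbacks() call below is expected
 * to crash with a NULL pointer dereference.
 */
#include <stddef.h>
#include <stdio.h>

struct iclog {
    int state;
    struct iclog *ic_next;           /* ring link: the last element points back to the first */
};

static void do_callbacks(struct iclog *first_iclog)
{
    struct iclog *iclog = first_iclog;

    do {
        /* ... per-iclog state transitions and wakeups would happen here ... */
        printf("visiting iclog in state %d\n", iclog->state);
        iclog = iclog->ic_next;      /* no NULL check: a well-formed ring never contains one */
    } while (first_iclog != iclog);  /* with a NULL link, the next pass dereferences NULL above */
}

int main(void)
{
    struct iclog a = { 1, NULL }, b = { 2, NULL }, c = { 3, NULL };

    /* Well-formed ring: a -> b -> c -> a.  Each element is visited once. */
    a.ic_next = &b;
    b.ic_next = &c;
    c.ic_next = &a;
    do_callbacks(&a);

    /* Corrupt the ring by zeroing one link, as suspected in this crash. */
    b.ic_next = NULL;
    do_callbacks(&a);                /* crashes instead of terminating */

    return 0;
}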

Actions #16

Updated by Alex Elder almost 11 years ago

Finally getting back to this.

Here is the end of the log:

<4>[16597.863442] libceph: osd1 10.214.134.124:6803 socket closed (con state OPEN)
<4>[16599.704274] libceph: osd4 10.214.135.124:6803 socket closed (con state OPEN)
<4>[16613.693720] XFS (rbd2): xfs_log_force: error 5 returned.
<4>[16613.702082] XFS (rbd2): xfs_log_force: error 5 returned.
<4>[16613.724109] XFS (rbd2): xfs_log_force: error 5 returned.
<4>[16613.731398] XFS (rbd2): xfs_log_force: error 5 returned.
<4>[16613.946993] INFO: trying to register non-static key.
<4>[16613.951995] the code is fine but needs lockdep annotation.
<4>[16613.957512] turning off the locking correctness validator.
<4>[16613.963030] Pid: 643, comm: kworker/0:1H Not tainted 3.9.0-ceph-00254-gb5b09be #1
<4>[16613.970559] Call Trace:
<4>[16613.973037]  [<ffffffff810b6872>] __lock_acquire+0xa72/0x1d90
<4>[16613.978858]  [<ffffffffa05e2269>] ? xlog_state_done_syncing+0x39/0xb0 [xfs]
<4>[16613.985855]  [<ffffffff810b8162>] lock_acquire+0xa2/0x140
<4>[16613.991319]  [<ffffffffa05e2269>] ? xlog_state_done_syncing+0x39/0xb0 [xfs]
<4>[16613.998321]  [<ffffffff81672071>] _raw_spin_lock+0x31/0x40
<4>[16614.003869]  [<ffffffffa05e2269>] ? xlog_state_done_syncing+0x39/0xb0 [xfs]
<4>[16614.010898]  [<ffffffffa05e2269>] xlog_state_done_syncing+0x39/0xb0 [xfs]
<4>[16614.017749]  [<ffffffffa05e232e>] xlog_iodone+0x4e/0xa0 [xfs]
<4>[16614.023545]  [<ffffffffa058a2a3>] xfs_buf_iodone_work+0x53/0xa0 [xfs]
<4>[16614.030017]  [<ffffffff81074b12>] process_one_work+0x1d2/0x4b0
<4>[16614.035888]  [<ffffffff81074aa1>] ? process_one_work+0x161/0x4b0
<4>[16614.041927]  [<ffffffff81076778>] worker_thread+0x118/0x340
<4>[16614.047535]  [<ffffffff81076660>] ? manage_workers+0x320/0x320
<4>[16614.053401]  [<ffffffff8107c46a>] kthread+0xea/0xf0
<4>[16614.058316]  [<ffffffff8107c380>] ? flush_kthread_work+0x1a0/0x1a0
<4>[16614.064528]  [<ffffffff8167b4ac>] ret_from_fork+0x7c/0xb0
<4>[16614.069965]  [<ffffffff8107c380>] ? flush_kthread_work+0x1a0/0x1a0
<1>[16614.076265] BUG: unable to handle kernel NULL pointer dereference at 00000000000000bc
<1>[16614.084192] IP: [<ffffffffa05e1fa8>] xlog_state_do_callback+0x78/0x300 [xfs]
<4>[16614.091334] PGD 0 
<4>[16614.093395] Oops: 0000 [#1] SMP 

And here's a stack trace of the crash:

Stack traceback for pid 643
0xffff88042bb5bf20      643        2  1    0   R  0xffff88042bb5c3a8 *kworker/0:1H
 ffff88042bbf1c58 0000000000000018 ffffffffa05e1f62 ffff88041314b600
 ffff88042c095188 ffff88042c095190 0000000100000000 ffff88042c095000
 ffffffff8166c0e3 ffff880400000000 ffff88042c095198 ffff88042bbf1cb8
Call Trace:
 [<ffffffffa05e1f62>] ? xlog_state_do_callback+0x32/0x300 [xfs]
 [<ffffffff8166c0e3>] ? spin_bug+0x12/0x26
 [<ffffffff8134a68d>] ? do_raw_spin_unlock+0x5d/0xb0
 [<ffffffffa05e22b9>] ? xlog_state_done_syncing+0x89/0xb0 [xfs]
 [<ffffffffa05e232e>] ? xlog_iodone+0x4e/0xa0 [xfs]
 [<ffffffffa058a2a3>] ? xfs_buf_iodone_work+0x53/0xa0 [xfs]
 [<ffffffff81074b12>] ? process_one_work+0x1d2/0x4b0
 [<ffffffff81074aa1>] ? process_one_work+0x161/0x4b0
 [<ffffffff81076778>] ? worker_thread+0x118/0x340
 [<ffffffff81076660>] ? manage_workers+0x320/0x320
 [<ffffffff8107c46a>] ? kthread+0xea/0xf0
 [<ffffffff8107c380>] ? flush_kthread_work+0x1a0/0x1a0
 [<ffffffff8167b4ac>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff8107c380>] ? flush_kthread_work+0x1a0/0x1a0

And the contents of the registers:

ax: ffff88042c095188  bx: 0000000000000002  cx: 0000000000000201
dx: ffff88042c095190  si: ffff88042c095190  di: ffff88042c095198
bp: ffff88042bbf1cd8  sp: ffff88042bbf1c58  r8: 0000000000000002
r9: 0000000000000000  r10: 00000000000004ab  r11: 0000000000000000
r12: 0000000000000000  r13: 0000000000000000  r14: ffff88042c095000
r15: 0000000000000000  ip: ffffffffa05e1fa8  flags: 00010246  cs: 00000010
ss: 00000018  ds: 00000018  es: 00000018  fs: 00000018  gs: 00000018

Actions #17

Updated by Alex Elder almost 11 years ago

Note that both the original crash and this latest one
(probably) involve some corruption hit along this same partial
path through the code:

xfs_buf_iodone_work
xlog_iodone
xlog_state_done_syncing

Actions #18

Updated by Ian Colle over 10 years ago

  • Status changed from In Progress to New
  • Assignee deleted (Alex Elder)
Actions #19

Updated by Sage Weil over 10 years ago

  • Status changed from New to Can't reproduce