Bug #3269

closed

nightly failure-kclient_workunit_suites_fsstress

Added by Tamilarasi muthamizhan over 11 years ago. Updated about 11 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs: ubuntu@teuthology:/a/teuthology-2012-10-03_19:00:11-regression-master-testing-gcov/1453

ubuntu@teuthology:/a/teuthology-2012-10-03_19:00:11-regression-master-testing-gcov/1453$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
nuke-on-error: true
overrides:
  ceph:
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: db7c41934b6e894c7d5a01ddf1a3592744c3d73c
  s3tests:
    branch: master
  workunit:
    sha1: db7c41934b6e894c7d5a01ddf1a3592744c3d73c
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana70.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPImI9TdsBYWc5EAuRnrmrYFBJs+4HKRPVv/BLAhQRqpzeqXZNWsUwTCsAEqPGKDl3vt/6RuizxlSkWL1AdjomaVfVX6nULEzf0q3yEOrdSlpcPUUG8UFrFHo3IM+6AseIb9BtvV85WrSV8mYR+duhqV/UpgtFQTn5HhHmvP9Umx7cNvkmtYbM5kqPdWKIJlIMlDr/T7iGMd50ZcA1QFn2DeJyhsB1Izux793rS6r3EJfsQqaVO7W+sJ47zB0Q+wgDghX9LKxVV0B8ShU7ho7EzL97ZLSqKDyoDcqqP/N8CA59wKwmar//OuyBLAliqukTDGBTdrfQU+YVK13KCk2h
  ubuntu@plana72.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCwa+tIQskKvsvQ1/J9QOYGeunl0M+mIAvxqGeKFawdMKoXCNNb5YLRL6Fr9y/smJVCBIGugSb8LBrjVWF14gUOXAk1j16qJ6rsvIF8176L8tsIMxhXx3dw7dCacaGEHCrmK6KO8YhFm6ky1qftGCu7amfyzeJTj8kvrY4tl1ifwH0sv2M0iEzLXx63xn2UpAMAIvo5v+eqPjo+1w2PFXe+r2dViDN5wjlwQTKOXAPFRewmDy2o6K/rW/iRRg0tHLO4atCZr5Y8XlXYQkIBQVlXrL9CRwe8rmrxHyH8wnFqYcvpzLFk1oKw93sFNauy4mCxVntvIm3WT1S8nQuEE6HJ
  ubuntu@plana73.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxoJnvRI1V0OJuQI9SosOedC7mj9O627LjoQWPKilJiBbHduPe1byBaKrgwTeEghl43VNf+EBs1+MwVH7zlDolnwN4tAlW9bRpC2SzURJfhZskp2CSQY3l8ca7a5f0J3hdOhx47oSSapN7O2cqmPzwlL/+MrFKGi+ITT613nUtzCjduZRPdhjyqZ0cQWeb0p1neDw5hbDBKd+HAH+ek/E6DK2PaqN6YAtmIgP76q0fQ85Omd0oDlmGXpKe3jlxlPT0W/5KD1+mpobPsh/EF2qar7IG/WqHHJ6NZAcXbdZ4KiMf9erP+Pk4KkD5SJ+e3GF7OEOwXtahKIIR1An4P2GD
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/fsstress.sh
ubuntu@teuthology:/a/teuthology-2012-10-03_19:00:11-regression-master-testing-gcov/1453$ cat summary.yaml 
ceph-sha1: db7c41934b6e894c7d5a01ddf1a3592744c3d73c
client.0-kernel-sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
description: collection:kernel-basic clusters:fixed-3.yaml fs:btrfs.yaml tasks:kclient_workunit_suites_fsstress.yaml
duration: 1175.7552409172058
failure_reason: SSH session not active
flavor: gcov
mon.a-kernel-sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
mon.b-kernel-sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
owner: scheduled_teuthology@teuthology
success: false

From Alex:

Here is some information about #1453.

The machine crashed dereferencing a null pointer.

It occurred in this code in ceph_sync_write():

    req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout,
                                ceph_vino(inode), pos, &len,
                                CEPH_OSD_OP_WRITE, flags,
                                ci->i_snap_realm->cached_context,
                                do_sync,
                                ci->i_truncate_seq, ci->i_truncate_size,
                                &mtime, false, 2, page_align);

The problem was that the ceph inode's snap realm (ci->i_snap_realm)
was a null pointer.

The crash occurred right after this message appeared:

    ceph: ceph_add_cap: couldn't find snap realm 100

That message is printed by ceph_add_cap().
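The sequence described above can be modelled in user space. This is a minimal sketch, not kernel code: the types (struct snap_context, struct snap_realm, struct inode_info), the helpers lookup_snap_realm(), add_cap(), build_request() and sync_write(), the one-entry realm table, and the -EIO return are all hypothetical stand-ins. It shows how a failed realm lookup in the add-cap step leaves i_snap_realm NULL, and why the fault lands in the write path itself: the cached_context argument is evaluated in the caller before the request-building function is entered. The NULL check is only an illustrative guard; this ticket does not record what fix, if any, was applied.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures named in the report. */
struct snap_context { unsigned long long seq; };
struct snap_realm   { unsigned long long ino; struct snap_context *cached_context; };
struct inode_info   { struct snap_realm *i_snap_realm; };

/* Hypothetical realm table: in this model only realm 1 exists. */
static struct snap_context ctx1 = { .seq = 1 };
static struct snap_realm realms[] = { { .ino = 1, .cached_context = &ctx1 } };

static struct snap_realm *lookup_snap_realm(unsigned long long ino)
{
    for (size_t i = 0; i < sizeof(realms) / sizeof(realms[0]); i++)
        if (realms[i].ino == ino)
            return &realms[i];
    return NULL;
}

/* Models the add-cap step: a failed lookup is only logged, so the
 * inode is left holding a NULL realm pointer. */
static void add_cap(struct inode_info *ci, unsigned long long realm_ino)
{
    struct snap_realm *realm = lookup_snap_realm(realm_ino);
    if (realm == NULL)
        fprintf(stderr, "ceph_add_cap: couldn't find snap realm %llx\n", realm_ino);
    ci->i_snap_realm = realm;
}

/* Models the request builder: it only ever sees the already-loaded
 * snap context value and assumes it is valid. */
static int build_request(struct snap_context *snapc)
{
    assert(snapc != NULL);
    return 0;
}

/* Models the write path. Without the guard, evaluating the
 * cached_context argument would dereference NULL here, in the caller,
 * before build_request() runs. */
static int sync_write(struct inode_info *ci)
{
    if (ci->i_snap_realm == NULL) /* illustrative guard only */
        return -EIO;
    return build_request(ci->i_snap_realm->cached_context);
}
```

For example, calling add_cap() with a realm number missing from the table leaves i_snap_realm NULL, and sync_write() on that inode then returns -EIO instead of faulting, while an inode whose realm was found writes normally.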

Hopefully that's enough information to go on. I don't have a lot of
time this morning. My main goal in looking at this was to rule out
recent changes as a cause, which I think I have.

Actions #1

Updated by Sage Weil about 11 years ago

  • Status changed from New to Closed