Bug #3269
nightly failure - kclient_workunit_suites_fsstress
Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Q/A
Description
Logs: ubuntu@teuthology:/a/teuthology-2012-10-03_19:00:11-regression-master-testing-gcov/1453
ubuntu@teuthology:/a/teuthology-2012-10-03_19:00:11-regression-master-testing-gcov/1453$ cat config.yaml
kernel: &id001
  kdb: true
  sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
nuke-on-error: true
overrides:
  ceph:
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: db7c41934b6e894c7d5a01ddf1a3592744c3d73c
  s3tests:
    branch: master
  workunit:
    sha1: db7c41934b6e894c7d5a01ddf1a3592744c3d73c
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana70.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPImI9TdsBYWc5EAuRnrmrYFBJs+4HKRPVv/BLAhQRqpzeqXZNWsUwTCsAEqPGKDl3vt/6RuizxlSkWL1AdjomaVfVX6nULEzf0q3yEOrdSlpcPUUG8UFrFHo3IM+6AseIb9BtvV85WrSV8mYR+duhqV/UpgtFQTn5HhHmvP9Umx7cNvkmtYbM5kqPdWKIJlIMlDr/T7iGMd50ZcA1QFn2DeJyhsB1Izux793rS6r3EJfsQqaVO7W+sJ47zB0Q+wgDghX9LKxVV0B8ShU7ho7EzL97ZLSqKDyoDcqqP/N8CA59wKwmar//OuyBLAliqukTDGBTdrfQU+YVK13KCk2h
  ubuntu@plana72.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCwa+tIQskKvsvQ1/J9QOYGeunl0M+mIAvxqGeKFawdMKoXCNNb5YLRL6Fr9y/smJVCBIGugSb8LBrjVWF14gUOXAk1j16qJ6rsvIF8176L8tsIMxhXx3dw7dCacaGEHCrmK6KO8YhFm6ky1qftGCu7amfyzeJTj8kvrY4tl1ifwH0sv2M0iEzLXx63xn2UpAMAIvo5v+eqPjo+1w2PFXe+r2dViDN5wjlwQTKOXAPFRewmDy2o6K/rW/iRRg0tHLO4atCZr5Y8XlXYQkIBQVlXrL9CRwe8rmrxHyH8wnFqYcvpzLFk1oKw93sFNauy4mCxVntvIm3WT1S8nQuEE6HJ
  ubuntu@plana73.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxoJnvRI1V0OJuQI9SosOedC7mj9O627LjoQWPKilJiBbHduPe1byBaKrgwTeEghl43VNf+EBs1+MwVH7zlDolnwN4tAlW9bRpC2SzURJfhZskp2CSQY3l8ca7a5f0J3hdOhx47oSSapN7O2cqmPzwlL/+MrFKGi+ITT613nUtzCjduZRPdhjyqZ0cQWeb0p1neDw5hbDBKd+HAH+ek/E6DK2PaqN6YAtmIgP76q0fQ85Omd0oDlmGXpKe3jlxlPT0W/5KD1+mpobPsh/EF2qar7IG/WqHHJ6NZAcXbdZ4KiMf9erP+Pk4KkD5SJ+e3GF7OEOwXtahKIIR1An4P2GD
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/fsstress.sh

ubuntu@teuthology:/a/teuthology-2012-10-03_19:00:11-regression-master-testing-gcov/1453$ cat summary.yaml
ceph-sha1: db7c41934b6e894c7d5a01ddf1a3592744c3d73c
client.0-kernel-sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
description: collection:kernel-basic clusters:fixed-3.yaml fs:btrfs.yaml tasks:kclient_workunit_suites_fsstress.yaml
duration: 1175.7552409172058
failure_reason: SSH session not active
flavor: gcov
mon.a-kernel-sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
mon.b-kernel-sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
owner: scheduled_teuthology@teuthology
success: false
From Alex:
Here is some information about #1453.
The machine crashed dereferencing a null pointer.
It occurred in this code in ceph_sync_write():
        req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout,
                                    ceph_vino(inode), pos, &len,
                                    CEPH_OSD_OP_WRITE, flags,
                                    ci->i_snap_realm->cached_context,
                                    do_sync,
                                    ci->i_truncate_seq, ci->i_truncate_size,
                                    &mtime, false, 2, page_align);
The problem was that the ceph inode's snap realm (ci->i_snap_realm)
was a null pointer.
The crash occurred right after this message appeared:
ceph: ceph_add_cap: couldn't find snap realm 100
which gets printed in ceph_add_cap().
Hopefully that's enough information to go on. I don't have a lot of
time this morning. My main goal in looking at this was to rule out
recent changes as a cause, which I think I have.
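
To illustrate the failure mode Alex describes, here is a minimal user-space sketch in C. It is not the kernel code and not a proposed fix: the struct and function names (snap_context, snap_realm, inode_info, inode_snap_context, sync_write_sketch) are hypothetical, simplified stand-ins that keep only the i_snap_realm and cached_context fields from the snippet above. The sketch shows why the expression ci->i_snap_realm->cached_context faults when i_snap_realm is null, and how a guarded accessor would turn that into an error return instead.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel structures.
 * Only the fields named in the report are modeled. */
struct snap_context {
    unsigned long long seq;
};

struct snap_realm {
    struct snap_context *cached_context;
};

struct inode_info {
    struct snap_realm *i_snap_realm;   /* null in the crashing case */
};

/* Return the cached snap context, or NULL when the inode has no
 * snap realm attached. The unguarded form
 *     ci->i_snap_realm->cached_context
 * dereferences a null pointer in that situation. */
struct snap_context *inode_snap_context(const struct inode_info *ci)
{
    if (ci == NULL || ci->i_snap_realm == NULL)
        return NULL;
    return ci->i_snap_realm->cached_context;
}

/* Model of a write path that needs a snap context.
 * Returns 0 on success and -1 when no snap context is available. */
int sync_write_sketch(const struct inode_info *ci)
{
    struct snap_context *snapc = inode_snap_context(ci);

    if (snapc == NULL) {
        fprintf(stderr, "sketch: inode has no snap realm, refusing write\n");
        return -1;
    }
    /* A real implementation would build and submit the request here. */
    return 0;
}
```

A guard like this would only hide the symptom. The underlying question raised by the report is why ceph_add_cap() could not find the snap realm in the first place, leaving the inode without one.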