Bug #4540

libceph: problem in ceph_osdc_msg_data_set

Added by Sage Weil about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Urgent
Target version: -
% Done: 0%
Source: Q/A
Severity: 3 - minor
Description

[6]kdb> bt
Stack traceback for pid 2785
0xffff88020b665eb0     2785        2  1    6   R  0xffff88020b666330 *flush-ceph-1
 ffff88014f1a58d8 0000000000000018 ffffea00015adf00 ffff88014f1a58e8
 ffffffffa033077d ffff88014f1a5938 ffffffffa03313cb ffff88015e532e18
 ffff8801f2a1c858 ffffea00002d60c0 0000000000000001 000000000000000d
Call Trace:
 [<ffffffffa033077d>] ? ceph_osdc_msg_data_set+0x3d/0x70 [libceph]
 [<ffffffffa03313cb>] ? ceph_osdc_start_request+0x5b/0x160 [libceph]
 [<ffffffffa0662508>] ? ceph_writepages_start+0x8d8/0xf50 [ceph]
 [<ffffffff811aed09>] ? writeback_sb_inodes+0x1c9/0x3e0
 [<ffffffff8113c7e3>] ? do_writepages+0x23/0x40
 [<ffffffff811ac95b>] ? __writeback_single_inode+0x3b/0x180
 [<ffffffff8134141d>] ? do_raw_spin_unlock+0x5d/0xb0
 [<ffffffff811aedc0>] ? writeback_sb_inodes+0x280/0x3e0
 [<ffffffff811aefbe>] ? __writeback_inodes_wb+0x9e/0xd0
 [<ffffffff811af1eb>] ? wb_writeback+0x1fb/0x320
 [<ffffffff8113a4da>] ? global_dirtyable_memory+0x1a/0x50
 [<ffffffff811b0920>] ? wb_do_writeback+0x130/0x1d0
 [<ffffffff811b0a53>] ? bdi_writeback_thread+0x93/0x270
 [<ffffffff811b09c0>] ? wb_do_writeback+0x1d0/0x1d0
 [<ffffffff8107a4ba>] ? kthread+0xea/0xf0
 [<ffffffff8107a3d0>] ? flush_kthread_work+0x1a0/0x1a0
 [<ffffffff81666dac>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff8107a3d0>] ? flush_kthread_work+0x1a0/0x1a0

This was on plana66.

The job was:

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-24_08:45:56-kernel-master-testing-basic/2505$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 06fb6a9f87bb1377a6549602fff230d4b352afe9
nuke-on-error: true
overrides:
  ceph:
    conf:
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 8befbca77aa50a1188969892aabedaf11d8f8ce7
  s3tests:
    branch: master
  workunit:
    sha1: 8befbca77aa50a1188969892aabedaf11d8f8ce7
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph:
    conf:
      mds:
        debug journaler: 0/20
        debug mds: 0/20
        debug ms: 0/1
- kclient: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh

Actions #1

Updated by Alex Elder about 11 years ago

Is there any chance this problem started with the addition of that last commit?

06fb6a9f ceph: fix buffer pointer advance in ceph_sync_write

Do we have any evidence that this problem also occurs without that commit in place?

I don't expect that commit is the cause of the trouble (it seems to correct a real problem); I'm just trying to eliminate variables.

Actions #2

Updated by Alex Elder about 11 years ago

I took the liberty of connecting to the console of plana66, which
appears to have been left in this state for that purpose...

The actual current instruction pointer is:
ffffffffa0328156 = 0xffffffffa0328156 ([libceph]ceph_msg_data_set_pages+0x36)

And that corresponds to this line of code:
BUG_ON(msg->p.type != CEPH_MSG_DATA_NONE);

That indicates that the data for the given message was set more than once, which is assumed never to happen. That assumption could have been wrong; I'll have to look at the context again.

The message data type that had been set previously was CEPH_MSG_DATA_PAGES, which makes more sense than any other value would, but it still violates the assumption.
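
For context, the invariant that BUG_ON enforces is roughly the following (a minimal sketch with a simplified field layout, not the actual libceph source):

static void ceph_msg_data_set_pages(struct ceph_msg *msg,
                                    struct page **pages, size_t length)
{
        /* A message's data may be set at most once; if this fires,
         * the message was reused without being re-initialized. */
        BUG_ON(msg->p.type != CEPH_MSG_DATA_NONE);

        msg->p.type = CEPH_MSG_DATA_PAGES;
        msg->p.pages = pages;
        msg->p.length = length;
}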

Actions #3

Updated by Alex Elder about 11 years ago

It is the data-out field in the request message that's being set more than once. (That's what I expected, but I've now verified it.)

Actions #4

Updated by Alex Elder about 11 years ago

I should probably talk with someone about what's going on in ceph_osdc_start_request(). It has some logic that's a bit convoluted, and I may have missed something in my recent changes.

Here is the reasoning for how it works now (sketched in code below):
- req is a local variable initialized to NULL.
- Deep in the function, if we haven't locked any pages yet, we allocate a new osd request with req = ceph_osdc_new_request(...).
- ceph_osdc_new_request() creates a new osd request. This includes allocating new request and reply messages that are tied to the osd request. The types for both the in and out data fields are set to NONE at that time.
- req is never reassigned until later in the function, right after a call to initiate the osd request:
    rc = ceph_osdc_start_request(...);
    BUG_ON(rc);
    req = NULL;
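
Condensed into code, that flow looks something like this (a sketch of the calling function only; page locking and error handling are omitted):

struct ceph_osd_request *req = NULL;
int rc;

/* ... walking the dirty pages ... */

if (req == NULL) {
        /* No pages locked yet: allocate the osd request.  This also
         * allocates its request and reply messages, whose in and out
         * data types start out as CEPH_MSG_DATA_NONE. */
        req = ceph_osdc_new_request(/* ... */);
}

/* ... lock pages and attach them to the request ... */

rc = ceph_osdc_start_request(/* ... */);
BUG_ON(rc);
req = NULL;     /* req is not reassigned until this point */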

And now, I think I may have found the problem...

Actions #5

Updated by Alex Elder about 11 years ago

I think the problem is that the message data fields need to be initialized. That happens if the message is allocated via ceph_msg_new(), but not if the message is allocated from the mempool.

A long time ago I found a similar issue, where mempool-allocated objects were not getting zeroed when they were taken from a mempool rather than allocated via kmalloc.

So rather than simply adding the initialization here, I'd like to restructure things a bit so that after allocation (whether via mempool or kmalloc) a separate initialization routine is called in both cases. This is a pattern that should probably be repeated anywhere a mempool is used.
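
As a generic illustration of that pattern (hypothetical names; this is not the actual libceph code), both allocation paths would funnel through one initializer:

/* ceph_msg_init_data() is a hypothetical name for the common
 * initializer.  mempool_alloc() can return a recycled, non-zeroed
 * object, while kzalloc() returns zeroed memory, so neither path may
 * assume the other's state. */
static void ceph_msg_init_data(struct ceph_msg *msg)
{
        msg->p.type = CEPH_MSG_DATA_NONE;
        /* ... reset any other fields that must start in a known state ... */
}

struct ceph_msg *msg;

if (pool)
        msg = mempool_alloc(pool, GFP_NOFS);    /* recycled, not re-zeroed */
else
        msg = kzalloc(sizeof(*msg), GFP_NOFS);  /* freshly zeroed */
if (!msg)
        return NULL;
ceph_msg_init_data(msg);        /* same init regardless of allocator */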

Actions #6

Updated by Alex Elder about 11 years ago

I have dropped my connection to the plana66 console. I
think I'm done with it and will pursue a fix for this.

I'm going to leave it up to someone else (named "Sage")
to free up those resources if that's appropriate.

Actions #7

Updated by Alex Elder about 11 years ago

OK, I have a fix, and it doesn't do what I said it should...

Basically it just re-initializes the message when the last reference to it is dropped. To really test this we'd need to reproduce the problem, but I'm pretty comfortable that the explanation I gave earlier is the root cause.

I'm going to post the patch for review without testing it (other than compiling it).
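
Conceptually, the fix does something like this (a sketch, not the actual patch; ceph_msg_init_data() again stands in for whatever re-initialization the real code performs):

static void ceph_msg_last_put(struct kref *kref)
{
        struct ceph_msg *m = container_of(kref, struct ceph_msg, kref);

        /* ... release pages, pagelists, and other attached resources ... */

        if (m->pool) {
                /* Reset the data fields so the next user of this
                 * recycled message sees CEPH_MSG_DATA_NONE again. */
                ceph_msg_init_data(m);          /* hypothetical name */
                ceph_msgpool_put(m->pool, m);
        } else {
                ceph_msg_kfree(m);
        }
}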

Actions #8

Updated by Alex Elder about 11 years ago

  • Status changed from New to Fix Under Review

Posted for review. And, uh, whoops, I didn't mark this "in progress."

From 50688843c0a8f4a7bd56a27f943972351c4b9988 Mon Sep 17 00:00:00 2001
From: Alex Elder <>
Date: Mon, 25 Mar 2013 11:54:30 -0500
Subject: [PATCH] libceph: initialize data fields on last msg put

When the last reference to a ceph message is dropped,
ceph_msg_last_put() is called to clean things up. For "normal"
messages (allocated via ceph_msg_new() rather than being allocated
from a memory pool) it's sufficient to just release resources. But
for a mempool-allocated message we actually have to re-initialize
the data fields in the message back to initial state so they're
ready to go in the event the message gets reused.

Some of this was already done; this fleshes it out so it's done
more completely.

This resolves:
http://tracker.ceph.com/issues/4540

Signed-off-by: Alex Elder <>

Actions #9

Updated by Alex Elder about 11 years ago

This commit has been pushed to the ceph-client "testing" branch.
d51342b libceph: initialize data fields on last msg put

I am not marking this resolved just yet; I had no good way of verifying the fix. We'll let it run in nightly testing for a few days, and if the problem doesn't happen again we'll assume the fix worked and close this then.

Actions #10

Updated by Alex Elder about 11 years ago

  • Status changed from Fix Under Review to 7

It may not be the right status, but I'm marking
this "testing" for now.

Actions #11

Updated by Sage Weil about 11 years ago

  • Status changed from 7 to Resolved