Bug #4679

ceph: hang while running blogbench on mira nodes

Added by Alex Elder almost 11 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have seen this only on mira nodes, now twice on two
consecutive attempts. I've run the same set of tests
with the same configuration on a plana cluster but this
has never occurred.

I see three processes (potentially) involved:
- One waiting in futex()
- One waiting for the inode mutex in ceph_aio_write()
- One waiting for completion of a setattr request.

The code I'm running is not all committed yet. It includes
a lot of new stuff in the messenger and osd client so it
is clearly suspect, but I'm going to run with the current
"testing" branch to see if I hit the problem again.

0001-ceph-add-osd-request-to-inode-unsafe-list-in-advance.patch (2.75 KB) Zheng Yan, 04/12/2013 01:08 AM

0001-mds-pass-proper-mask-to-CInode-get_caps_issued.patch (1.8 KB) Zheng Yan, 04/12/2013 01:08 AM

0002-ceph-take-i_mutex-before-getting-Fw-cap.patch (3.54 KB) Zheng Yan, 04/12/2013 01:08 AM

0002-mds-change-XLOCK-XLOCKDONE-s-next-state-to-LOCK.patch (2.55 KB) Zheng Yan, 04/12/2013 01:08 AM

0005-ceph-fix-race-between-writepages-and-truncate.patch (2.69 KB) Zheng Yan, 04/12/2013 07:35 AM


Related issues

Related to CephFS - Bug #4706: kclient: Oops when two clients concurrently write a file Resolved 04/10/2013

History

#1 Updated by Alex Elder almost 11 years ago

Here is an excerpt of the stack trace generated using:
echo t > /proc/sysrq-trigger

[31482.585095] blogbench.sh S ffff880414719f90 0 22609 22607 0x00000000
[31482.592259] ffff88040e2d1e58 0000000000000046 ffff88040e2d1e68 ffffffff8105958d
[31482.599787] ffff880414719f90 ffff88040e2d1fd8 ffff88040e2d1fd8 ffff88040e2d1fd8
[31482.607435] ffffffff81c14440 ffff880414719f90 ffff88040e2d1ef0 0000000000000000
[31482.615111] Call Trace:
[31482.617588] [<ffffffff8105958d>] ? wait_consider_task+0x9d/0xbe0
[31482.623729] [<ffffffff8165cb69>] schedule+0x29/0x70
[31482.628715] [<ffffffff8105a2b9>] do_wait+0x1e9/0x260
[31482.633810] [<ffffffff8105b230>] sys_wait4+0xa0/0xf0
[31482.638885] [<ffffffff811536c5>] ? might_fault+0x45/0xa0
[31482.644438] [<ffffffff81058f60>] ? task_stopped_code+0x50/0x50
[31482.650382] [<ffffffff81666e59>] system_call_fastpath+0x16/0x1b
[31482.656408] blogbench S ffff880413033f20 0 25421 22609 0x00000000
[31482.663590] ffff88040ea89c28 0000000000000046 ffff88040ea89bc8 ffffffff82687ea8
[31482.671110] ffff880413033f20 ffff88040ea89fd8 ffff88040ea89fd8 ffff88040ea89fd8
[31482.678663] ffff88040d475eb0 ffff880413033f20 ffff88040ea89c18 ffff88040ea89cd0
[31482.686199] Call Trace:
[31482.688670] [<ffffffff8165cb69>] schedule+0x29/0x70
[31482.693750] [<ffffffff810b9007>] futex_wait_queue_me+0xc7/0x100
[31482.699783] [<ffffffff810ba467>] futex_wait+0x1a7/0x2c0
[31482.705151] [<ffffffff8119e1b1>] ? update_time+0x81/0xc0
[31482.710577] [<ffffffff811c4d2a>] ? fsnotify+0x8a/0x2f0
[31482.715900] [<ffffffff810bbb24>] do_futex+0x344/0xbb0
[31482.721064] [<ffffffff811c4eaa>] ? fsnotify+0x20a/0x2f0
[31482.726442] [<ffffffff811c4d2a>] ? fsnotify+0x8a/0x2f0
[31482.731691] [<ffffffff810bc4d7>] sys_futex+0x147/0x1a0
[31482.736957] [<ffffffff81183f50>] ? vfs_write+0x110/0x180
[31482.742417] [<ffffffff8133b0ae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[31482.748881] [<ffffffff81666e59>] system_call_fastpath+0x16/0x1b
[31482.754958] blogbench D 0000000000000080 0 25426 22609 0x00000000
[31482.762133] ffff88042d5d59a8 0000000000000046 ffff88043fd146a8 ffff88041479bf20
[31482.769667] ffff88041479bf20 ffff88042d5d5fd8 ffff88042d5d5fd8 ffff88042d5d5fd8
[31482.777246] ffff88042d541f90 ffff88041479bf20 ffff880429fb5eb0 ffff880421ef0ee8
[31482.784818] Call Trace:
[31482.787296] [<ffffffff8165cb69>] schedule+0x29/0x70
[31482.792394] [<ffffffff8165a56d>] schedule_timeout+0x1bd/0x270
[31482.798296] [<ffffffff8165c930>] ? wait_for_common+0x30/0x160
[31482.804281] [<ffffffffa08c3a80>] ? queue_con+0x10/0x20 [libceph]
[31482.810397] [<ffffffff8165c9d6>] wait_for_common+0xd6/0x160
[31482.816113] [<ffffffff8108cd20>] ? try_to_wake_up+0x2f0/0x2f0
[31482.822016] [<ffffffff8165ca9d>] wait_for_completion_killable+0x1d/0x30
[31482.829910] [<ffffffffa092701d>] ceph_mdsc_do_request+0x12d/0x2a0 [ceph]
[31482.836788] [<ffffffffa090ded2>] ceph_setattr+0x3b2/0x920 [ceph]
[31482.842975] [<ffffffff8119fdda>] notify_change+0x1fa/0x3c0
[31482.848570] [<ffffffff81182330>] do_truncate+0x60/0xa0
[31482.853860] [<ffffffff81192751>] do_last+0x5f1/0xf20
[31482.858935] [<ffffffff8118f83d>] ? link_path_walk+0x24d/0x920
[31482.864873] [<ffffffff81193133>] path_openat+0xb3/0x4d0
[31482.870216] [<ffffffff811a1641>] ? __alloc_fd+0x31/0x120
[31482.875733] [<ffffffff811a1641>] ? __alloc_fd+0x31/0x120
[31482.881156] [<ffffffff81193cf2>] do_filp_open+0x42/0xa0
[31482.886535] [<ffffffff811a16e7>] ? __alloc_fd+0xd7/0x120
[31482.892011] [<ffffffff811834ee>] do_sys_open+0xfe/0x1e0
[31482.897354] [<ffffffff811835f1>] sys_open+0x21/0x30
[31482.902421] [<ffffffff81666e59>] system_call_fastpath+0x16/0x1b
[31482.908448] blogbench D 00000000ffffffff 0 25427 22609 0x00000000
[31482.915577] ffff88042c15dba8 0000000000000046 ffff88042c15dbc8 0000000000000246
[31482.923221] ffff880413c43f20 ffff88042c15dfd8 ffff88042c15dfd8 ffff88042c15dfd8
[31482.930746] ffff88042d540000 ffff880413c43f20 ffff8804276b6fa0 ffff8804276b74c8
[31482.938274] Call Trace:
[31482.940744] [<ffffffff8165cb69>] schedule+0x29/0x70
[31482.945783] [<ffffffff8165ce8e>] schedule_preempt_disabled+0xe/0x10
[31482.952230] [<ffffffff8165abe1>] mutex_lock_nested+0x151/0x320
[31482.958230] [<ffffffffa0912f00>] ? ceph_aio_write+0x940/0xae0 [ceph]
[31482.964745] [<ffffffffa0912f00>] ceph_aio_write+0x940/0xae0 [ceph]
[31482.971036] [<ffffffff8117a46a>] ? unfreeze_partials.isra.52+0x6a/0x1a0
[31482.977784] [<ffffffff8118dda2>] ? path_put+0x22/0x30
[31482.982987] [<ffffffff81183843>] do_sync_write+0xa3/0xe0
[31482.988411] [<ffffffff81183ef3>] vfs_write+0xb3/0x180
[31482.993614] [<ffffffff81184232>] sys_write+0x52/0xa0
[31482.998690] [<ffffffff81666e59>] system_call_fastpath+0x16/0x1b

#2 Updated by Alex Elder almost 11 years ago

Here is a log of the commits in place during these
tests. (I know, quite a few...) The last one is
the current testing branch.

ce4a368 libceph: make method call data be a separate data item
a527358 libceph: add, don't set data for a message
89fdf9c libceph: implement multiple data items in a message
a8c8368 libceph: replace message data pointer with list
33025f3 libceph: have cursor point to data
e83c1d5 libceph: move cursor into message
a81c8e8 libceph: record bio length
a6f7e36 libceph: skip message if too big to receive
dece343 libceph: fix possible CONFIG_BLOCK build problem
cd43efb libceph: kill off osd request r_data_in and r_data_out
7f79b45 libceph: set the data pointers when encoding ops
1dccacd libceph: combine initializing and setting osd data
20e214d libceph: set message data when building osd request
85634eb libceph: move ceph_osdc_build_request()
4d174e9 libceph: format class info at init time
2478e73 rbd: rearrange some code for consistency
7936dcc rbd: separate initialization of osd data
56bf6de rbd: don't set data in rbd_osd_req_format_op()
86a49e9 libceph: specify osd op by index in request
f6c637c libceph: add data pointers in osd op structures
83185f1 libceph: rename data out field in osd request op
0b8320d libceph: keep source rather than message osd op array
e642226 rbd: define rbd_osd_req_format_op()
e860930 libceph: a few more osd data cleanups
9b61fa2 libceph: define ceph_osd_data_length()
35a8106 libceph: define a few more helpers
4046404 libceph: define osd data initialization helpers
b0ae840 libceph: compute incoming bytes once
7b67a62 rbd: define inbound data size for method ops
0af231f libceph: provide data length when preparing message
b0bb70d ceph: build osd request message later for writepages

#3 Updated by Alex Elder almost 11 years ago

Here are the versions of ceph and teuthology I'm using
while running these tests:

ceph
f5ba0fb mon: make 'osd crush move ...' idempotent
teuthology
fa70eb8 radosgw-admin: Test bucket list for bucket starting with underscore.

#4 Updated by Alex Elder almost 11 years ago

Here is an excerpt of the yaml file driving the
tests, leading up to the blogbench run:

- kclient:
- workunit:
    clients:
      all:
        - misc/trivial_sync.sh
        - kernel_untar_build.sh
        - suites/fsstress.sh
        - suites/ffsb.sh
        - suites/blogbench.sh

#5 Updated by Ian Colle almost 11 years ago

  • Assignee set to Sandon Van Ness
  • Priority changed from Normal to High

#6 Updated by Ian Colle almost 11 years ago

  • Priority changed from High to Urgent

#7 Updated by Alex Elder almost 11 years ago

I ran those tests a few times with the testing branch and
the problem did not show up. I reduced the test to just
running "suites/blogbench.sh" and tried again with my
newer branch. One iteration passed. I'm going to keep
running with just this test to see if it shows up; if not
I'll expand it to include ffsb.sh again.

PS I'm not sure why Sandon should be looking at this.

#8 Updated by Alex Elder almost 11 years ago

  • Status changed from New to In Progress
  • Assignee changed from Sandon Van Ness to Alex Elder

Unfortunately it looks like I've reproduced the problem
with my patches. The common theme is ceph_aio_write(), so
I guess I should start there.

I'm really not sure why it is happening on the mira
machines but I still haven't seen it occur on my plana
cluster.

#9 Updated by Alex Elder almost 11 years ago

Actually, the other common theme (maybe more important)
is the involvement of an in-progress ceph_setattr() call.

It's waiting here, in ceph_mdsc_do_request():
        } else {
                err = wait_for_completion_killable(&req->r_completion);
        }
after sending a CEPH_MDS_OP_SETATTR request to the chosen mds.

#10 Updated by Alex Elder almost 11 years ago

[ 2229.030720] Call Trace:
[ 2229.033194]  [<ffffffff8165cb69>] schedule+0x29/0x70
[ 2229.038278]  [<ffffffff8165a56d>] schedule_timeout+0x1bd/0x270
[ 2229.044131]  [<ffffffff810b75e6>] ? mark_held_locks+0x86/0x140
[ 2229.049979]  [<ffffffff8165de30>] ? _raw_spin_unlock_irq+0x30/0x40
[ 2229.056177]  [<ffffffff810b77a5>] ? trace_hardirqs_on_caller+0x105/0x190
[ 2229.062900]  [<ffffffff8165c9d6>] wait_for_common+0xd6/0x160
[ 2229.068619]  [<ffffffff8108cd20>] ? try_to_wake_up+0x2f0/0x2f0
[ 2229.074469]  [<ffffffff8165ca9d>] wait_for_completion_killable+0x1d/0x30
[ 2229.081204]  [<ffffffffa062001d>] ceph_mdsc_do_request+0x12d/0x2a0 [ceph]
[ 2229.088019]  [<ffffffffa0606ed2>] ceph_setattr+0x3b2/0x920 [ceph]
[ 2229.094180]  [<ffffffff8165acd9>] ? mutex_lock_nested+0x249/0x320
[ 2229.100290]  [<ffffffff8119fdda>] notify_change+0x1fa/0x3c0

Note:  do_truncate() contains this:

        mutex_lock(&dentry->d_inode->i_mutex);
        ret = notify_change(dentry, &newattrs);
        mutex_unlock(&dentry->d_inode->i_mutex);

So this is indeed the task that holds the inode mutex that the
aio_read is waiting for.

[ 2229.105982]  [<ffffffff81182330>] do_truncate+0x60/0xa0
[ 2229.111226]  [<ffffffff81192751>] do_last+0x5f1/0xf20
[ 2229.116295]  [<ffffffff8118f83d>] ? link_path_walk+0x24d/0x920
[ 2229.122148]  [<ffffffff81193133>] path_openat+0xb3/0x4d0
[ 2229.127535]  [<ffffffff811a1641>] ? __alloc_fd+0x31/0x120
[ 2229.132954]  [<ffffffff811a1641>] ? __alloc_fd+0x31/0x120
[ 2229.138500]  [<ffffffff81193cf2>] do_filp_open+0x42/0xa0
[ 2229.143830]  [<ffffffff811a16e7>] ? __alloc_fd+0xd7/0x120
[ 2229.149247]  [<ffffffff811834ee>] do_sys_open+0xfe/0x1e0
[ 2229.154577]  [<ffffffff811835f1>] sys_open+0x21/0x30
[ 2229.159560]  [<ffffffff81666e59>] system_call_fastpath+0x16/0x1b

So it looks to me like the request to the mds has not
received a response (or it didn't get handled properly).

I think I'd like to have someone who is proficient at interpreting
mds client logs take a look at what I've got.

#11 Updated by Alex Elder almost 11 years ago

  • Priority changed from Urgent to Normal

I talked with Sam Lang, who said I should try again with
mds debugging on. That led to more info getting dumped
to the teuthology log, indicating a problem in the mds:
http://pastebin.com/4UE3PMDE

He said a fix had just gone in that addressed this.
http://tracker.ceph.com/issues/4660

I'm starting my testing up again...

#12 Updated by Alex Elder almost 11 years ago

One pass succeeded, so it's looking good.

I'll let it run 5 times and if all are successful, I'll just
close this and make a note in http://tracker.ceph.com/issues/4660
to indicate it seems to have fixed this problem.

#13 Updated by Alex Elder almost 11 years ago

It looked very promising. 4 successful passes, but the
last one hung again. This time there were two blogbench
tasks in play. Sam took another look at the mds log and
found nothing especially interesting. Trying again, this
time with:
- ceph:
    conf:
      mds:
        debug ms: 1
        debug mds: 20

[ 4845.883434] [<ffffffff8165cb69>] schedule+0x29/0x70
[ 4845.888432] [<ffffffffa06127a5>] ceph_get_caps+0x125/0x210 [ceph]
[ 4845.894674] [<ffffffff8107ae10>] ? __init_waitqueue_head+0x60/0x60
[ 4845.901031] [<ffffffffa0607823>] ceph_aio_read+0xf3/0x890 [ceph]
[ 4845.907143] [<ffffffff8118e925>] ? terminate_walk+0x55/0x60
[ 4845.912851] [<ffffffff811923b7>] ? do_last+0x257/0xf20
[ 4845.918095] [<ffffffff8114b821>] ? kzfree+0x31/0x40
[ 4845.923277] [<ffffffff81183923>] do_sync_read+0xa3/0xe0
[ 4845.928609] [<ffffffff81184070>] vfs_read+0xb0/0x180
[ 4845.933740] [<ffffffff81184192>] sys_read+0x52/0xa0
[ 4845.938725] [<ffffffff81666e59>] system_call_fastpath+0x16/0x1b

[ 4845.748619] [<ffffffff8165cb69>] schedule+0x29/0x70
[ 4845.753608] [<ffffffff810b9007>] futex_wait_queue_me+0xc7/0x100
[ 4845.759634] [<ffffffff810ba467>] futex_wait+0x1a7/0x2c0
[ 4845.765049] [<ffffffff8113a233>] ? free_hot_cold_page_list+0x53/0xc0
[ 4845.771526] [<ffffffff8104bf94>] ? flush_tlb_mm_range+0x54/0x230
[ 4845.777640] [<ffffffff810bbb24>] do_futex+0x344/0xbb0
[ 4845.782857] [<ffffffff81349cc0>] ? __percpu_counter_add+0x50/0x80
[ 4845.789056] [<ffffffff8117a7e9>] ? kmem_cache_free+0x99/0x1b0
[ 4845.794962] [<ffffffff8115b226>] ? remove_vma+0x66/0x70
[ 4845.800407] [<ffffffff8134141d>] ? do_raw_spin_unlock+0x5d/0xb0
[ 4845.806430] [<ffffffff8165de6b>] ? _raw_spin_unlock+0x2b/0x40
[ 4845.812381] [<ffffffff81349cd7>] ? __percpu_counter_add+0x67/0x80
[ 4845.818580] [<ffffffff810bc4d7>] sys_futex+0x147/0x1a0
[ 4845.823959] [<ffffffff810b77a5>] ? trace_hardirqs_on_caller+0x105/0x190
[ 4845.831944] [<ffffffff8133b0ae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 4845.838491] [<ffffffff81666e59>] system_call_fastpath+0x16/0x1b

#14 Updated by Alex Elder almost 11 years ago

I got another hang without any debug info being dumped
from the MDS. This time I just abandoned it. I'm about
to try Josh's suggestion, running with ceph branch "bobtail"
to see if it happens there.

#15 Updated by Alex Elder almost 11 years ago

  • Status changed from In Progress to Rejected

I re-ran the blogbench test 10 times using the "bobtail"
branch of ceph and never saw a hang.

I'm going to call this Not My Problem(tm).

I'm rejecting this bug even though I suppose someone working
on the mds might be interested in repurposing it...

#16 Updated by Greg Farnum almost 11 years ago

  • Project changed from Ceph to CephFS
  • Category set to 47
  • Status changed from Rejected to 12
  • Assignee deleted (Alex Elder)
  • Priority changed from Normal to High

sigh Yep...

I've marked this as an MDS issue for now, but it could be a broader protocol change or something as well.

#17 Updated by Sam Lang almost 11 years ago

We've only seen a certain set of errors at the mds with the kernel client (this one and #4660 - although they may be the same issue). If it's only reproducible with the kernel client and not fuse, does that help narrow it down?

#18 Updated by Greg Farnum almost 11 years ago

Not off-hand, but I haven't spent any time thinking about it yet. This one could be differences between how aggressively the kclient and uclient keep or return caps, but #4660 is an MDS assert and I don't see how the client could pierce the MDS wall to cause that one.

#19 Updated by Zheng Yan almost 11 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Zheng Yan

I reproduced a hang; it is an 'i_mutex + cap revoking' deadlock.

thread1 (write)           thread2 (truncate)          MDS
-----------------------------------------------------------------
get Fw cap
                          lock i_mutex
lock i_mutex (blocked)
                          request setattr.size  ->
                                                <-   revoke Fw cap
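
For what it's worth, here is a minimal user-space sketch (a plain C/pthreads
analogy, not the actual kernel fix) of why a single acquisition order breaks
the cycle above: if the write path takes i_mutex before it asks for the Fw
cap, both paths acquire the two resources in the same order and neither can
end up holding one while waiting forever for the other. That ordering appears
to be the idea behind the attached 0002-ceph-take-i_mutex-before-getting-Fw-cap.patch;
the fw_cap/i_mutex mutexes and function names below are purely illustrative
stand-ins, not real kernel or libceph symbols.

    /* Illustration only: a user-space analogy of the i_mutex / Fw cap ordering.
     * "fw_cap" stands in for the capability the MDS may revoke, "i_mutex" for
     * the inode mutex. Build with: cc -pthread example.c */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t i_mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t fw_cap  = PTHREAD_MUTEX_INITIALIZER;

    /* write path: with the fix, i_mutex is taken before the cap, matching
     * the order the truncate path effectively uses. */
    static void *writer(void *arg)
    {
            (void)arg;
            pthread_mutex_lock(&i_mutex);   /* first i_mutex ... */
            pthread_mutex_lock(&fw_cap);    /* ... then the "cap" */
            puts("writer: writing with Fw");
            pthread_mutex_unlock(&fw_cap);
            pthread_mutex_unlock(&i_mutex);
            return NULL;
    }

    /* truncate path: same order, so no thread ever holds the "cap" while
     * blocked on i_mutex. */
    static void *truncater(void *arg)
    {
            (void)arg;
            pthread_mutex_lock(&i_mutex);
            pthread_mutex_lock(&fw_cap);    /* stands in for "MDS revokes Fw" */
            puts("truncate: shrinking file");
            pthread_mutex_unlock(&fw_cap);
            pthread_mutex_unlock(&i_mutex);
            return NULL;
    }

    int main(void)
    {
            pthread_t t1, t2;
            pthread_create(&t1, NULL, writer, NULL);
            pthread_create(&t2, NULL, truncater, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            return 0;
    }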

#20 Updated by Zheng Yan almost 11 years ago

Found 5 bugs, fixed 4. The remaining one is a race between truncate and writepages. A truncate message from the MDS can change the inode's size while ceph_writepages_start is executing, leading to a mismatch between the data length of the osd request and the pages marked as writeback, so some pages can be left in the writeback state forever. The bug is caused by removing r_num_pages from struct ceph_osd_request. I have some urgent tasks at the moment and no time to fix it. Alex, please take a look.
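
For illustration only, here is a minimal user-space sketch of the general
"snapshot the size once" idea implied above: read the size a single time
(under the appropriate lock in real code) and use that one value both for the
osd request's data length and for the pages accounted as writeback, so a
concurrent truncate cannot make the two disagree. The fake_inode structure and
build_writeback_request() below are hypothetical stand-ins, not the attached
patch or any real kernel/libceph code.

    /* Illustration only: none of these names are real kernel or libceph symbols. */
    #include <stdio.h>
    #include <stddef.h>

    struct fake_inode {
            size_t i_size;          /* may shrink concurrently via truncate */
    };

    /* Compute the writeback length from a single snapshot of the size so the
     * request length and the writeback page accounting always agree, even if
     * i_size changes right after the snapshot is taken. */
    static size_t build_writeback_request(const struct fake_inode *inode,
                                          size_t offset, size_t max_len)
    {
            size_t snap = inode->i_size;    /* read the size exactly once */
            size_t len;

            if (offset >= snap)
                    return 0;               /* nothing left to write back */
            len = snap - offset;
            if (len > max_len)
                    len = max_len;
            /* use 'len' consistently for the osd request *and* for the
             * number of pages marked writeback */
            return len;
    }

    int main(void)
    {
            struct fake_inode ino = { .i_size = 10000 };

            printf("request length: %zu\n",
                   build_writeback_request(&ino, 4096, 8192));
            return 0;
    }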

#21 Updated by Zheng Yan almost 11 years ago

The fix for the writepages race is easier than I thought; the patch is attached.

#22 Updated by Alex Elder almost 11 years ago

> The fix for the writepages race is easier than I thought; the patch is attached.

This is interesting. When I was working with this particular block
of code I thought it was odd to re-read the inode, and was sure there
was a reason to do so before submitting the osd request, so I remember
being very intentional about preserving the logic.

This patch looks good to me but someone who knows the file system
code better than I do should verify that.

Reviewed-by: Alex Elder <>

#23 Updated by Alex Elder almost 11 years ago

  • Status changed from In Progress to Fix Under Review
  • Assignee changed from Alex Elder to Greg Farnum

> Found 5 bugs, fixed 4.

I reviewed the four kernel patches (they were posted on the mailing
list). I also privately showed Zheng a fix for the interrupt crash
I saw, along with my own fix for this original issue (which replaces his
PATCH 1/4), and he indicated he reviewed/approved of both.

I tested my patches and found they avoided the problems they intended
to fix. However I hit some other hangs. I then applied the follow-on
fixes provided by Zheng and they seemed to get rid of those hangs I
was seeing.

So... I'd say that as a group, this set of patches is probably
good to go, but I'd like to coordinate with Greg and/or Sage.

I posted my two patches to the list for visibility (though they're
both marked reviewed by Zheng).

I have not done anything with the two patches to the MDS code
that Zheng provided. Greg, I'll let you handle that.

#24 Updated by Alex Elder almost 11 years ago

FYI, these kernel patches (Zheng's and mine) are available on
the ceph-client git repository branch "review/wip-4706".

There was some overlap (and I guess confusion) between this
issue and #4706, so I've linked the two. Together, I think these
patches resolve both problems.

#25 Updated by Alex Elder almost 11 years ago

I ran the blogbench test with all of the above-mentioned
patches applied on a mira cluster and I never saw it hang.

I would say that the kernel portion of this is complete,
and I am content to mark this bug resolved once I get the
patches committed (preferably after a second opinion on
them for review, but that's not strictly necessary).

Greg, will you consider the two MDS patches, and if
you think they warrant more scrutiny will you please
create a new issue for them?

Otherwise I'd like to close this one.

#26 Updated by Greg Farnum almost 11 years ago

  • Assignee changed from Greg Farnum to Alex Elder

I believe Sage has been over all of these now. I'm trying to go over the newest versions from the mailing list as well (I will reply there with comments or reviewed-by tags), but I'm giving this back to you so you can poke me if you need more, or close it otherwise. :)

#27 Updated by Alex Elder almost 11 years ago

  • Status changed from Fix Under Review to Resolved
  • Target version set to v0.62a

Sorry Greg, I should have been in better communication
with you. I have been testing these all afternoon and
Sage gave a verbal "looks good" to the last four of these,
along with two others I posted separately (the second of
which addressed the same thing as Yan's first patch).

In any case, the following have been committed to the
ceph-client "testing" branch now:

625d6ec ceph: fix race between writepages and truncate
8bc726f ceph: apply write checks in ceph_aio_write
e7a4c3e ceph: take i_mutex before getting Fw cap

Two others were committed at the same time, but they were
more directly related to http://tracker.ceph.com/issues/4706
so I'm recording them there.

#28 Updated by Greg Farnum over 7 years ago

  • Component(FS) MDS added
