Activity

From 03/25/2012 to 04/23/2012

04/23/2012

06:23 PM Bug #2338: mon: adding new monitors simultaneously can allow a new mon to become leader
Looks like the original monitor doesn't believe in the existence of the new monitors; I'll need to check out why.
...
Greg Farnum
05:46 PM Bug #2338: mon: adding new monitors simultaneously can allow a new mon to become leader
... Greg Farnum
05:42 PM Bug #2338 (Rejected): mon: adding new monitors simultaneously can allow a new mon to become leader
When you add two new monitors (out of 3 total) to a cluster you can end up with one of the new monitors being the lea... Greg Farnum
05:26 PM Bug #2286: mon: different full/near_full values on different monitors
Hi Guys,
After upgrading the patched-kernel btrfs test cluster from 0.45-1 to 0.45-281-g0777613, the full_ratio an...
Mark Nelson
04:38 PM Feature #2337 (Resolved): rgw and rados performance numbers
Sage Weil
04:16 PM Feature #2336 (Resolved): qemu: wire up discard
Sage Weil
04:10 PM Feature #2335 (Resolved): librbd: write-thru cache mode
Sage Weil
04:02 PM rgw Documentation #1813 (Fix Under Review): doc: document radosgw api diffs with s3
Sage Weil
03:51 PM Feature #2334 (Resolved): mon: set max mark-out or mark-down
Sage Weil
02:34 PM Feature #1044 (Resolved): librbd: discard support
Sage Weil
02:34 PM Feature #2296 (Resolved): librbd: allow resizing to arbitrary sizes
Sage Weil
02:29 PM Feature #1451 (Resolved): librbd: instrument via perfcounter
Sage Weil
02:22 PM Feature #1888 (Rejected): log: per-thread ring buffer
Sage Weil
02:20 PM Subtask #2333 (Resolved): create queueing for peering messages
Currently, the osd dispatch calls directly into the PG peering state machine. Instead, we need to queue the events g... Samuel Just
02:10 PM Subtask #2332 (Resolved): move pg queueing into pgs
Currently, the osd reaches into the pg to manipulate the pg queue during message receipt and during handle_osd_map. ... Samuel Just
02:07 PM Cleanup #2041 (In Progress): osd: move peering into worker threads
Sage Weil
11:31 AM Cleanup #2331 (Resolved): Makefile.am:182: `lib/libgtest.a' is not a standard libtool library name
Warning is still happening, despite git clean -fdx, git submodule freshening of various sorts, etc.
This should prob...
Dan Mick
11:27 AM Bug #2276 (Rejected): osd: eat cpu on restart
it's up now.. i think i just didn't wait long enough. Sage Weil
11:26 AM Bug #2266 (Resolved): teuthology: nuke after failure is failing
ignore errors caused by ntpdate vs ntpd race Sage Weil
11:25 AM Bug #2322 (Need More Info): osd/ReplicatedPG.cc: 3832: FAILED assert(!object_contexts.size())
also going to wait until the threading refactor is complete before diving into this further.
at this point asserti...
Sage Weil
11:24 AM Feature #2330 (Resolved): dump open files, sockets when we run out of fds
Sage Weil
11:23 AM Bug #2310 (Resolved): osd: too many open files
this is just sockets and hitting the flusher limit. we're both increasing 'max open files' and switching to vm limit... Sage Weil
11:04 AM Bug #2329 (Resolved): fix detection of C++11 atomic header
the C++11 atomic header is now <atomic> (I've checked gcc 4.6 and 4.7) and not <cstdatomic> Dan Horák
10:23 AM rgw Bug #1681 (Resolved): rgw: user rm with --purge doesn't remove data
Sage Weil
09:56 AM Feature #2251 (Resolved): rgw long run workloads
Sage Weil

04/21/2012

03:16 PM Feature #1451 (Fix Under Review): librbd: instrument via perfcounter
see commit:07ddff427145e109eb820b6ed0ddb6cca74b65b6 Sage Weil
03:15 PM Bug #2328 (Resolved): osd: mapext/fiemap doesn't work for small extents
see commit:c8377e466caace018eea06c1739265111ce72c48 for a kludge that detects the bug and disables fiemap. Sage Weil
02:35 PM Bug #2328: osd: mapext/fiemap doesn't work for small extents
this works on a newer kernel (3.2.0-2-amd64).
should we check kernel versions in filestore and magically disable f...
Sage Weil
02:20 PM Bug #2328 (Resolved): osd: mapext/fiemap doesn't work for small extents
If you query the mapping for an extent that is inside a larger allocated extent, the fiemap ioctl won't tell you:
...
Sage Weil
11:15 AM CephFS Bug #2218: CephFS "mismatch between child accounted_rstats and my rstats!"
Logs from a clean cluster at http://matthew.royhousehold.net/cephLogs/cephLogs.mds.tar (4382MB md5 aaf9364c7e35bc6b5d... Matthew Roy

04/20/2012

09:18 PM Feature #2327 (Resolved): mon: use external keyring for inter-mon auth
Currently the mon. key is part of the internal mon auth database. It can't be modified without a running mon cluster... Sage Weil
09:17 PM rbd Feature #2326 (Resolved): krbd: use new class interfaces, new image format
Update rbd_types.h to match the userspace version, and add support for opening new-format images while keeping suppor... Sage Weil
09:04 PM Feature #2325 (Resolved): setup new email/etc
Sage Weil
08:32 PM Linux kernel client Tasks #2138 (Resolved): rbd: run xfstests on a local XFS filesystem over RBD
I have two files that implement automated testing using
xfstests over rbd devices.
One is now in the ceph git tre...
Alex Elder
08:28 PM Linux kernel client Bug #2287: rbd: crashes with 10Gbit network and fio
A kernel dump would likely help, but there's no guarantee because
of the delayed execution of the operation. It wou...
Alex Elder
07:38 PM Linux kernel client Bug #2287: rbd: crashes with 10Gbit network and fio
Would a kernel core dump help here? Sage Weil
07:37 PM Linux kernel client Bug #2287: rbd: crashes with 10Gbit network and fio
This is one of a family of bugs we've been trying to understand.
Here is another one:
http://tracker.newdream.n...
Alex Elder
08:00 PM Linux kernel client Bug #2243: btrfs: warning in orphan_commit_root
I mentioned this somewhat informally to Chris Mason last week. I
provided him the message, and he said:
Well, ...
Alex Elder
07:39 PM Feature #2251 (In Progress): rgw long run workloads
Sage Weil
07:37 PM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
Another report, likely related:
http://tracker.newdream.net/issues/2287
I don't understand it well enough yet...
Alex Elder
07:37 PM rgw Documentation #1813 (In Progress): doc: document radosgw api diffs with s3
Sage Weil
07:35 PM Subtask #825 (In Progress): osd: remove pg map updating from handle_osd_map
Sage Weil
07:35 PM Feature #2314 (Fix Under Review): remove localized pgs
wip-lpg
Did a basic test of a cluster with localized pgs and upgraded to this, no problems. A reasonably thorough...
Sage Weil
07:12 PM Linux kernel client Bug #2298 (Resolved): rbd: broken encode_op for big-endian hosts?
This has been fixed. I have been testing it in a private branch
and will shortly be updating the ceph-client testin...
Alex Elder
07:10 PM Linux kernel client Bug #2242 (Resolved): rbd: spinlock on wrong cpu
This was fixed a couple of weeks ago, and the result has been committed
both to the testing and master branches of t...
Alex Elder
05:00 PM Bug #2324 (Resolved): osd: assert("q.empty()") failed in OpSequencer destructor
sam fixed this in commit:888a082f23974b1f7a63f302e29a326182e7dc41 Sage Weil
03:56 PM Bug #2324 (Resolved): osd: assert("q.empty()") failed in OpSequencer destructor
This is consistently reproducible with 2 osds started by vstart on vit, but only happens intermittently with 1 osd.
...
Josh Durgin
04:32 PM Feature #2245 (Resolved): rgw long run ceph install
Sage Weil
03:09 PM Bug #2079 (Duplicate): rbd: creating a snapshot with the same name doesn't return an error
i think this was caused by the rados class return values. in any case, it works correctly now. Sage Weil
03:07 PM Bug #2084 (Can't reproduce): segfault in tcmalloc
Sage Weil
12:16 PM Feature #1618 (Resolved): libvirt: make sure migration works
we demoed this Sage Weil
12:05 PM Feature #2323 (Resolved): osd: limit 'old request' messages generated
If there are hundreds of old requests queued, let's say that, instead of generating gigabytes of logs.
Maybe a simp...
Sage Weil
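A minimal sketch of the idea in Feature #2323, assuming a list of request ages in seconds: report a handful of the oldest requests individually and fold the rest into one summary line. The function name, the 30-second threshold, and the five-line cap are hypothetical choices for illustration, not taken from the OSD code.

```python
def summarize_old_requests(ages, threshold=30.0, max_lines=5):
    """Return log lines for requests that have waited at least `threshold` seconds.

    At most `max_lines` requests are reported individually (oldest first);
    any remaining old requests are folded into a single summary line.
    """
    old = sorted((a for a in ages if a >= threshold), reverse=True)
    lines = [f"old request: waiting {age:.1f}s" for age in old[:max_lines]]
    hidden = len(old) - max_lines
    if hidden > 0:
        lines.append(f"... and {hidden} more old requests (oldest {old[0]:.1f}s)")
    return lines
```

With hundreds of queued requests this emits at most `max_lines + 1` lines per pass, so log volume stays bounded regardless of queue depth.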
10:44 AM Bug #2322 (Resolved): osd/ReplicatedPG.cc: 3832: FAILED assert(!object_contexts.size())
... Sage Weil

04/19/2012

09:49 PM Bug #2291 (Can't reproduce): objectcacher perfcounters don't work with test_librbd_fsx
This works fine for me. Maybe it was fixed when the objectcacher naming thing was changed?... Sage Weil
09:17 PM Messengers Cleanup #2150: repair the Simple/Messenger interface
Looks good to me, provided it makes it through the regression suite without problems! Sage Weil
07:11 PM Messengers Cleanup #2150: repair the Simple/Messenger interface
wip-msgr-interface Greg Farnum
09:08 PM Feature #2321 (Resolved): osd: investigate memory consumption from peering backlog
Sage Weil
09:07 PM Feature #2320 (Duplicate): mon: detect and throttle osd flapping
Sage Weil
09:07 PM Feature #2319 (Resolved): mon: block osd mark-down
Sage Weil
09:07 PM Feature #2318 (Resolved): mon: block osd boot
Sage Weil
09:06 PM Feature #2317 (Resolved): mon: pause/unpause auto-mark-out
Sage Weil
04:37 PM rgw Feature #2313 (Resolved): rgw: expose extra bucket info trough S3 api
Already pushed a test to the s3-tests functional. Yehuda Sadeh
04:33 PM rgw Feature #2313 (In Progress): rgw: expose extra bucket info trough S3 api
wip-2313 looks sane.. let's add a test and merge for 0.46. Sage Weil
04:25 PM rgw Bug #1681 (In Progress): rgw: user rm with --purge doesn't remove data
merged the radosgw-admin change to make it fail. let's add a test for it and then close this bug. Sage Weil
04:17 PM Feature #2296 (Fix Under Review): librbd: allow resizing to arbitrary sizes
see wip-discard Sage Weil
02:13 PM Feature #2296: librbd: allow resizing to arbitrary sizes
more importantly, we need to either error out with EINVAL if it's not a block multiple, or do it... currently we sile... Sage Weil
02:37 PM Bug #2307 (Resolved): OSD & Monitor disagree on the contents of pg_temp
Just changing the pg_num and pgp_num did fix it up, so with the osdmap workaround we should be all good now. Greg Farnum
02:10 PM Feature #1044 (Fix Under Review): librbd: discard support
Sage Weil
12:47 PM Bug #2263 (Resolved): obsync: move man page to section 1
Sage Weil
12:45 PM Bug #2311 (Need More Info): rbd: delete + create image led to EEXIST
Sage Weil
07:20 AM Bug #2311 (In Progress): rbd: delete + create image led to EEXIST
Can you generate a log? Ideally 'debug ms = 1'?
Also, attach the output of 'ceph --show-config'?
Thanks!
Sage Weil
02:43 AM Bug #2311: rbd: delete + create image led to EEXIST
Hi Sage,
uhm, not solved yet as per ceph version 0.45-207-g3053e47 (commit:3053e4773bae93cfa3158882aa4963803862f9b...
Oliver Francke
12:44 PM Bug #2262 (Resolved): qa: osd-recovery tasks fails on flush_pg_stats
fixed by teuthology commit:6a58314d4627d106c5fd6df186e191c19a01f64b Sage Weil
10:47 AM Bug #2192: ceph-mon hangs consuming 100% CPU
It was some 3.0.0 Ubuntu kernel, backed by btrfs. Vladimir Kulev
10:06 AM Bug #2192: ceph-mon hangs consuming 100% CPU
I missed this when it came in, and I don't know where the 100% CPU usage is coming from, but the hung filesystem soun... Greg Farnum
03:19 AM Bug #2316 (Resolved): rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g30...
Hi,
in my current test-setup all four VM's are started with rbd_cache parm. After all VM's are started and began t...
Oliver Francke

04/18/2012

10:51 PM Bug #2263: obsync: move man page to section 1
Sage Weil
10:42 PM CephFS Bug #2293 (Resolved): admin sockets don't persist with ceph-fuse
commit:e82c33099a0efda027bc7fa991dcd2073baea539 Sage Weil
09:46 PM rgw Bug #2027: rgw -> apache miscommunication
Not completely unlikely. We can set it to "can't reproduce", and reopen if we see it again. Yehuda Sadeh
06:12 PM rgw Bug #2027: rgw -> apache miscommunication
do we think this is fixed now by the rgw throttling? Sage Weil
06:00 PM Bug #2310: osd: too many open files
failed to capture a full strace.. try it again (once we find a failing osd on congress) with
strace -e trace=open,...
Sage Weil
09:48 AM Bug #2310 (Resolved): osd: too many open files
... Sage Weil
04:39 PM Bug #2315 (Resolved): unrecognized admin socket command 'objecter_requests'
From teuthology:/a/nightly_coverage_2012-04-18-a/1602/teuthology.log:... Josh Durgin
03:57 PM rgw Bug #2312 (Resolved): rgw: create user and subuser in a single radosgw-admin command
Fixed, commit:aab516da7f89310445be4e4fb61836084d2dac32. Yehuda Sadeh
02:01 PM rgw Bug #2312 (Resolved): rgw: create user and subuser in a single radosgw-admin command
Yehuda Sadeh
03:41 PM Bug #2211 (Resolved): osd: entity_inst_t OSDMap::get_inst(int) const
Sage Weil
03:41 PM Bug #2262 (In Progress): qa: osd-recovery tasks fails on flush_pg_stats
Sage Weil
03:27 PM Feature #2314 (Resolved): remove localized pgs
Sage Weil
02:47 PM rgw Feature #2313: rgw: expose extra bucket info trough S3 api
Ok, let's just send those extra headers anyway. Otherwise we'd have some issue creating the request signature for the... Yehuda Sadeh
02:11 PM rgw Feature #2313 (Resolved): rgw: expose extra bucket info trough S3 api
syntax:
HEAD /<bucket>
X-RGW-Params: extrainfo
extra response headers:
X-RGW-Object-Count: <object count>
X-RG...
Yehuda Sadeh
02:23 PM Feature #2252 (Resolved): rgw long run kernels
Sage Weil
02:22 PM Feature #2250 (Resolved): rgw long run raid config
Sage Weil
02:14 PM Feature #2265 (Rejected): make sure objecter/kclient error out when localized pgs don't exist
Sage Weil
01:58 PM rgw Feature #2308 (Resolved): radosgw-admin: make user create idempotent
done, commit:5a6bbd0c473e15aa7642da367e7936015d19d77a. Yehuda Sadeh
01:46 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
And I gave him a patched monitor so he could set pg_num, which should fix it. Waiting to hear back, and will apply th... Greg Farnum
01:16 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
pushed workaround that will repair osdmaps that saw your corruption, commit:eea982e56739a7a91ca907ccc5c5ec1f78d9460d. Sage Weil
01:30 PM Bug #2311 (Resolved): rbd: delete + create image led to EEXIST
this is 'rbd writeback window' at its best. long live 'rbd cache'! Sage Weil
01:06 PM Bug #2311: rbd: delete + create image led to EEXIST
Congrats for closing the annoying ticket #2178 :-D
Fair enough, to have a new one on this issue, here my last note...
Oliver Francke
12:46 PM Bug #2311: rbd: delete + create image led to EEXIST
Is it possible there is some other user, or the logs are from the wrong cluster?
I see:
- client.13507 deletes 90...
Sage Weil
12:45 PM Bug #2311 (Resolved): rbd: delete + create image led to EEXIST
Here is a sequence copy-n-pasted:
rbd rm data/905-testdisk.rbd
Removing image: 100% complete...done.
rbd create ...
Sage Weil
01:28 PM Bug #2286 (Resolved): mon: different full/near_full values on different monitors
commit:9ef953b5e20c3d232cfe4aa90f26476a2a2f911b Sage Weil
11:18 AM Bug #2286 (Fix Under Review): mon: different full/near_full values on different monitors
Check out wip-2286-ratio-a and see what you think. It fills in the ratios from g_conf on create_initial, only changes... Greg Farnum
12:51 PM Bug #2178: rbd: corruption of first block
Hi Sage,
sorry, was not clear enough. The logfiles provide information for "907-testdisk.rbd..." not "906..."
Th...
Oliver Francke
12:46 PM Bug #2178 (Resolved): rbd: corruption of first block
moved this new issue to #2311, and resolving this bug. hooray! Sage Weil
12:45 PM Bug #2178: rbd: corruption of first block
Oliver Francke wrote:
> Here is a sequence copy-n-pasted:
>
> rbd rm data/905-testdisk.rbd
> Removing image: 100...
Sage Weil
10:41 AM Bug #2178: rbd: corruption of first block
Oliver Francke wrote:
> Hi Sage,
>
> here my notes, after almost 40 tests no bad things happened, only once a min...
Sage Weil
07:22 AM Bug #2178: rbd: corruption of first block
second logfile here, sorry. Oliver Francke
07:18 AM Bug #2178: rbd: corruption of first block
Here is a sequence copy-n-pasted:
rbd rm data/905-testdisk.rbd
Removing image: 100% complete...done.
rbd create ...
Oliver Francke
05:51 AM Bug #2178: rbd: corruption of first block
Meanwhile continued to test...:
I noticed some negative degradation:
2012-04-18 14:43:37.282634 pg v128104: ...
Oliver Francke
05:36 AM Bug #2178: rbd: corruption of first block
Hi Sage,
here my notes, after almost 40 tests no bad things happened, only once a minor hiccup, where the rbd-head...
Oliver Francke
08:22 AM Linux kernel client Bug #2298 (In Progress): rbd: broken encode_op for big-endian hosts?
I sent a note to the various lists Al Viro posted to, to confirm the
bug (wasn't sure whether Sage had or not).
I...
Alex Elder
05:58 AM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
Interesting... The warning showed up again despite test 232 being
removed from the list. Based on the time stamp o...
Alex Elder

04/17/2012

08:47 PM Bug #2286: mon: different full/near_full values on different monitors
Greg Farnum wrote:
> Hmm. I looked at redoing this and got stuck on the semantics we want. If we're interested in fu...
Sage Weil
04:52 PM Bug #2286 (In Progress): mon: different full/near_full values on different monitors
Hmm. I looked at redoing this and got stuck on the semantics we want. If we're interested in full_ratio == 0 being an... Greg Farnum
11:00 AM Bug #2286: mon: different full/near_full values on different monitors
yeah. actually, i think the check should go in tick() inside the is_leader() block, and not update_from_paxos().
Sage Weil
10:54 AM Bug #2286: mon: different full/near_full values on different monitors
Oh, I see...I wasn't following that need_*_ratio_update stuff properly. And update_full_ratios() will be called on th... Greg Farnum
10:30 AM Bug #2286: mon: different full/near_full values on different monitors
Greg Farnum wrote:
> I'm looking at your patch and it doesn't make a lot of sense to me.
> First off, when do you t...
Sage Weil
09:45 AM Bug #2286: mon: different full/near_full values on different monitors
I'm looking at your patch and it doesn't make a lot of sense to me.
First off, when do you think that peon monitors ...
Greg Farnum
08:43 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
Greg Farnum wrote:
> I'm confused how you're getting that pool_max printout — I don't see it at all when I run that ...
Sage Weil
06:57 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
I'm confused how you're getting that pool_max printout — I don't see it at all when I run that command with a ceph-de... Greg Farnum
04:16 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
at some point the osdmap pool_max got set to -1.
nine:2307 04:15 PM $ ~/src/ceph/src/ceph-dencoder type OSDMap i...
Sage Weil
03:56 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
nine:2307 03:56 PM $ osdmaptool osdmap_full/5754 -p | grep ^pool
pool 0 'data' rep size 2 crush_ruleset 0 object_has...
Sage Weil
03:52 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
It looks to me like the 'data' pool (0) was deleted, and then a new one (vmimages) was created.  but somehow that was... Sage Weil
10:34 AM Bug #2307 (Resolved): OSD & Monitor disagree on the contents of pg_temp
See: http://marc.info/?t=133352732900001&r=1&w=2
It seems that (for example) pg 0.138 is in pg_temp, but the OSD c...
Greg Farnum
03:03 PM Feature #2309 (Duplicate): rados namespaces
Sage Weil
01:23 PM rgw Bug #2289 (Resolved): rgw: listing a bucket hangs after removing inexisting object
Fixes merged into master at commit:3053e4773bae93cfa3158882aa4963803862f9b2. Yehuda Sadeh
01:13 PM Bug #2304 (Resolved): rbd import fails on block device
Sage Weil
11:57 AM CephFS Bug #2299 (Rejected): all MDS commit suicide on startup
Sage Weil
11:54 AM Bug #2219 (Can't reproduce): OSD's commit suicide with 0.44
Let us know if you see this again! Thanks Sage Weil
11:40 AM Bug #2301 (Resolved): librados: LibRadosMisc.AioOperatePP failure
Sage Weil
11:27 AM rgw Feature #2308 (Resolved): radosgw-admin: make user create idempotent
radosgw-admin user create should be idempotent and work similar to user modify. We would need to verify that the same... Yehuda Sadeh
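One way to read the request in Feature #2308 is that repeating a create should behave like a modify instead of failing. The sketch below uses a hypothetical in-memory `UserStore`; it does not reflect how radosgw-admin stores users, and the verification step mentioned in the (truncated) description is not modeled.

```python
class UserStore:
    """Hypothetical in-memory user table, for illustration only."""

    def __init__(self):
        self.users = {}

    def create(self, uid, **attrs):
        """Create `uid`, or update it if it already exists (like a modify).

        Returns True if a new user was created, False if an existing
        user was updated in place.
        """
        created = uid not in self.users
        self.users.setdefault(uid, {}).update(attrs)
        return created
```

Calling `create` twice with the same arguments leaves the store in the same state as calling it once, which is the property "idempotent" asks for.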
08:11 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
I believe we are seeing the same problem here. I have been able to reproduce it each time I have tried. The hardwar... Nick Bartos
06:36 AM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
I have updated the run_xfstests.sh script so that it simply no longer
runs test 232. That way we can still benefit ...
Alex Elder

04/16/2012

10:02 PM Bug #2178: rbd: corruption of first block
The most recent occurrence has been confirmed to be a replay issue with non-btrfs filesystems. The wip-guard branch ... Sage Weil
09:54 PM Bug #2255 (Resolved): osd: fix object name collisions between pools in temp collection
Sage Weil
09:52 PM Bug #2286: mon: different full/near_full values on different monitors
pushed a patch that confines the logic of when to update this into a single bit of code. look okay?
i think the b...
Sage Weil
12:57 PM Bug #2286: mon: different full/near_full values on different monitors
Sage asked on irc about just setting it up on the initial create_empty. The problem with that is that the only data which is ... Greg Farnum
11:26 AM Bug #2286 (Fix Under Review): mon: different full/near_full values on different monitors
This got (obviously) broken by commit:b6d1c0c9b7290a237560528b6ff0d6b2b2998ee2, which put in the use of magic numbers... Greg Farnum
09:37 PM Feature #2113 (Resolved): objectcacher perfcounters
Sage Weil
11:24 AM Feature #2113: objectcacher perfcounters
My bad — I'll try and do that today! Greg Farnum
11:13 AM Feature #2113 (Fix Under Review): objectcacher perfcounters
not merged yet! i wanted to get feedback first on my naming kludge... Sage Weil
10:02 AM Feature #2113 (Resolved): objectcacher perfcounters
Sage merged this. Greg Farnum
04:12 PM Bug #2306: objecter: accessing empty object maps to pool 0
that looks right to me.
and yeah, i don't think object operations should be possible on an empty object name...
Sage Weil
04:03 PM Bug #2306: objecter: accessing empty object maps to pool 0
Yep, that's pretty much exactly what I was thinking.
The only other question is if this fix is the right approach ...
Greg Farnum
04:00 PM Bug #2306: objecter: accessing empty object maps to pool 0
Would something like this work (not tested)?... Yehuda Sadeh
03:52 PM Bug #2306: objecter: accessing empty object maps to pool 0
i prefer an explicit separate field for oid-vs-pg mode so that we can distinguish between pg 0.0 (really) and no pg/n... Sage Weil
03:07 PM Bug #2306: objecter: accessing empty object maps to pool 0
Ah, nope. list_objects is broken. Yehuda Sadeh
03:06 PM Bug #2306: objecter: accessing empty object maps to pool 0
From what I see, the pg ops call pool_op_submit() and not op_submit() so Greg's fix might be ok? Yehuda Sadeh
02:53 PM Bug #2306: objecter: accessing empty object maps to pool 0
Ah, you're right. I missed that function when looking to see who filled in the op->pgid.
In that case we should ma...
Greg Farnum
02:33 PM Bug #2306: objecter: accessing empty object maps to pool 0
i think that if was there for the pg ops (PGLS) where there is no object... the list_objects code is filling in the p... Sage Weil
02:07 PM Bug #2306 (Fix Under Review): objecter: accessing empty object maps to pool 0
Yep, the Objecter doesn't calculate pg placement for objects with a zero-length name. I'm pretty sure the if guard th... Greg Farnum
01:51 PM Bug #2306: objecter: accessing empty object maps to pool 0
Empty object <== object with empty name Yehuda Sadeh
01:51 PM Bug #2306 (Resolved): objecter: accessing empty object maps to pool 0
Even if different pool was specified. Yehuda Sadeh
03:34 PM CephFS Bug #2299: all MDS commit suicide on startup
this issue can be closed, there was an error in the underlying filesystem of osd.0 :) Martin Scheffler
02:59 PM CephFS Bug #2277: qa: flock test broken
I was going to move this over to the kernel client project and then realized I can't — should we close this bug (reje... Greg Farnum
02:46 PM CephFS Bug #2277: qa: flock test broken
... Greg Farnum
02:15 PM Linux kernel client Bug #2298: rbd: broken encode_op for big-endian hosts?
there are some old g5's in the closet here at aon that we can use.
in the past we've found/fixed these issues with...
Sage Weil
01:46 PM Linux kernel client Bug #2298: rbd: broken encode_op for big-endian hosts?
I haven't looked at this in any detail but I presume Al is correct.
We don't have any big endian hardware anywhere, ...
Alex Elder
01:49 PM CephFS Bug #2288: libcephfs: setxattr returns EEXIST following removexattr
More info:
That branch has a patch which adds a call to removexattr before the setxattr. If you run testceph twice i...
Greg Farnum
01:38 PM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
Should have waited. I have reproduced the problem by running test 232.
Alex Elder
01:37 PM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
After a lot of repetitions, I've narrowed it down to test 232 or 234. Alex Elder
10:12 AM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
I ran subsets of that list at least three times and never
reproduced it. I tried again after a reboot, and again,
...
Alex Elder
06:43 AM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
Looking at the list of tests that indicate they include quota testing,
the ones that are currently being run by the ...
Alex Elder
06:36 AM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
I sent a report to the XFS mailing list about the warning. I have to try
to narrow down which test was running when...
Alex Elder
01:18 PM CephFS Bug #2285: libcephfs: failure with empty name components
Yep, it's client-local; there's no request to the MDS for this either.
Guess that means we don't care right now?
Greg Farnum
01:04 PM CephFS Bug #2285 (In Progress): libcephfs: failure with empty name components
Oddly, this looks like it's a race. I can't reproduce it with any client debugging on... Greg Farnum
11:33 AM Feature #2305: Moving rbd images between pools
Not quite; copy works, but slowly (because of course it's duplicating all the data). I don't know if mv/rename could... Dan Mick
11:24 AM Feature #2305 (Rejected): Moving rbd images between pools
We discovered it does work if you keep the image names the same and vary the pool names. :) Greg Farnum
11:01 AM Feature #2305 (Resolved): Moving rbd images between pools
It would be nice to have an option to move rbd's between pools with a syntax like:
rbd mv <first poolname>/<image na...
Stefan Kleijkers
10:02 AM Messengers Cleanup #2150 (In Progress): repair the Simple/Messenger interface
Not really done! ;) Greg Farnum
08:44 AM rbd Feature #2297: ObjectCacher: mark buffers mergeable for ksm
I'm really not sure this is something we want to do, especially unconditionally. Let's wait until we get some idea of... Greg Farnum
07:27 AM Bug #2304 (Resolved): rbd import fails on block device
root@burnupi30:~# rbd import /dev/sda burnupi30.sda
fiemap ioctl() failed
Importing image: 100% complete...done.
...
Sage Weil

04/15/2012

08:30 PM Bug #2303 (Can't reproduce): osd: failed to peer on startup
ubuntu@teuthology:/a/nightly_coverage_2012-04-14-b/994 Sage Weil
08:24 PM Linux kernel client Bug #2302 (Can't reproduce): xfs: warning at mutex_remove_waiter
... Sage Weil
03:33 PM Feature #1044 (In Progress): librbd: discard support
Sage Weil
03:33 PM Feature #2163 (Resolved): qa: full xfstests on rbd
Sage Weil
03:33 PM Subtask #2249 (Resolved): teuthology task (3)
Sage Weil
03:33 PM Feature #2226 (Resolved): osd: better filestore idempotency test
Sage Weil
05:39 AM Linux kernel client Bug #2287: rbd: crashes with 10Gbit network and fio
Here some more info from the crash:
[58113.180039] libceph: tid 387083 timed out on osd92, will reset osd
[5818...
Danny Kukawka

04/14/2012

08:07 PM CephFS Bug #2299: all MDS commit suicide on startup
after i told osd.0 to get lost and reformatted it, the cluster started resyncing.
then (magically) mds.0 started up ...
Martin Scheffler
09:39 AM CephFS Bug #2299 (Rejected): all MDS commit suicide on startup
my setup is: 1 MON, 2 MDS and 4 OSD.
ceph version is commit:1e76a8713feac6883c648512dcdc28c83f7ff69e.
after copyi...
Martin Scheffler
04:41 PM Bug #2301: librados: LibRadosMisc.AioOperatePP failure
the problem is that the completion callback is now async, but wait_for_complete() is not.
do we think that is ok?
Sage Weil
02:59 PM Bug #2301 (Resolved): librados: LibRadosMisc.AioOperatePP failure
2012-04-14T00:11:00.763 INFO:teuthology.task.workunit.client.0.out:[ RUN ] LibRadosMisc.AioOperatePP
2012-04-14...
Sage Weil
01:58 PM Bug #2300 (Rejected): objecter: not sending stat request
Yehuda Sadeh
01:50 PM Bug #2300: objecter: not sending stat request
Ah, actually we try to access an object with empty oid, which is obviously wrong. Probably due to #2289 issues. Yehuda Sadeh
12:34 PM Bug #2300 (Rejected): objecter: not sending stat request
Happens in rgw (can only see it on congress). Following a rgw.bucket_list call response, we call librados io_ctx->sta... Yehuda Sadeh
12:38 PM rgw Bug #2289: rgw: listing a bucket hangs after removing inexisting object
Pushed several fixes to wip-2289. The scenario was:
creating bucket
trying to remove object that does not exist
...
Yehuda Sadeh

04/13/2012

11:03 PM Feature #1044 (Resolved): librbd: discard support
Sage Weil
11:03 PM Feature #2163: qa: full xfstests on rbd
Sage Weil
11:02 PM Feature #2052 (Resolved): librbd: caching
Sage Weil
06:37 PM Feature #2052: librbd: caching
This is passing long-running fsx with osd thrashing consistently, and all the other rbd tests. I think the branch (wi... Josh Durgin
10:40 PM Linux kernel client Bug #2298 (Resolved): rbd: broken encode_op for big-endian hosts?
... Sage Weil
10:17 PM Subtask #2249: teuthology task (3)
Sage Weil
09:26 PM Subtask #2237 (Resolved): failure+replay tester (8)
Sage Weil
06:39 PM Bug #2278 (Resolved): librados: python read has arguments swapped
Fixed by 76799680546a79fc73ad7bbc58960a31ae2290ad. Josh Durgin
10:10 AM Bug #2278: librados: python read has arguments swapped
Sage Weil
07:56 AM Bug #2278 (Resolved): librados: python read has arguments swapped
Object.read from rados.py is passing arguments to ioctx.read in the wrong order.
--- rados.py.dist 2012-04-13 16:5...
Tomasz Paskowski
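The class of bug reported in #2278 can be illustrated with a small sketch. `FakeIoctx` and `FakeObject` below are hypothetical stand-ins, not the real rados.py classes, and their signatures are assumptions made for the example. It shows how swapping two positional integers fails silently, and how keyword arguments avoid the problem.

```python
class FakeIoctx:
    """Hypothetical stand-in for an I/O context; not the real rados.py class."""

    def __init__(self, data):
        self._data = data

    def read(self, key, length, offset=0):
        # Return up to `length` bytes of object `key`, starting at `offset`.
        return self._data[key][offset:offset + length]


class FakeObject:
    """Hypothetical file-like wrapper that tracks its own read offset."""

    def __init__(self, ioctx, key):
        self.ioctx = ioctx
        self.key = key
        self.offset = 0

    def read_swapped(self, length):
        # Bug pattern: offset and length are passed positionally in the wrong order.
        return self.ioctx.read(self.key, self.offset, length)

    def read(self, length):
        # Keyword arguments make the call independent of parameter order.
        data = self.ioctx.read(self.key, length=length, offset=self.offset)
        self.offset += len(data)
        return data
```

Because both arguments are integers, the swapped call raises no error; it just reads zero bytes from the wrong position, which is why this kind of mistake tends to surface only in testing.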
06:38 PM rbd Feature #2297 (New): ObjectCacher: mark buffers mergeable for ksm
This is done with a simple madvise call, but we should test that it works with ksm and verify that all the buffers ar... Josh Durgin
06:29 PM Feature #2296 (Resolved): librbd: allow resizing to arbitrary sizes
Right now resizing to a non-object-size multiple will round down the remainder. With discard support, we support this... Josh Durgin
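The two behaviours discussed for Feature #2296 can be sketched as follows: silently rounding a requested size down to a whole number of objects, versus rejecting a size with EINVAL when it is not a block multiple. The 4 MiB object size and 512-byte block size are assumptions for illustration, and the function names are hypothetical, not librbd's.

```python
import errno

OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size (4 MiB), illustration only
BLOCK_SIZE = 512               # assumed block size, illustration only


def round_down_to_object(size, object_size=OBJECT_SIZE):
    """Drop the remainder so the size is a whole number of objects."""
    return (size // object_size) * object_size


def check_resize(size, block_size=BLOCK_SIZE):
    """Stricter alternative: refuse sizes that are not a block multiple."""
    if size < 0 or size % block_size != 0:
        raise OSError(errno.EINVAL, "size is not a multiple of the block size")
    return size
```

For example, a 10 MiB request rounds down to 8 MiB under the first policy, losing 2 MiB without any error, which is the silent behaviour the comments on this feature object to.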
06:25 PM Feature #2295 (Resolved): make qemu cache=writeback,writethrough option turn on librbd caching
This will enable more familiar use of caching with qemu/rbd, and let people configure it with libvirt's existing xml. Josh Durgin
05:51 PM rbd Feature #2294 (New): librbd: optionally cache entire objects, instead of only requesting the part...
This may save many round trips for small read sizes (common to vms). Josh Durgin
05:49 PM Feature #2113 (Fix Under Review): objectcacher perfcounters
Okay, I checked and these work — if you run ceph-fuse -f and play around you can do a dump_perfcounters and see the v... Greg Farnum
05:46 PM CephFS Bug #2293 (Resolved): admin sockets don't persist with ceph-fuse
It looks like the admin socket is associated with the launching process, rather than the background process that cont... Greg Farnum
05:46 PM rbd Feature #2292 (New): ObjectCacher: support sparse objects
The ObjectCacher doesn't store which objects or parts of objects don't exist. This info could improve read performance. Josh Durgin
05:46 PM Bug #2291 (Can't reproduce): objectcacher perfcounters don't work with test_librbd_fsx
The admin socket perfcounters_dump command only outputs objecter data. I'm speculating that it has to do with the obj... Greg Farnum
05:30 PM Feature #2290 (Resolved): ObjectCacher: handle read/write errors
Currently the return value of the underlying read/write calls is ignored (I left TODO notes there). We should figure ... Josh Durgin
05:02 PM rgw Bug #2289 (Resolved): rgw: listing a bucket hangs after removing nonexistent object
Yehuda Sadeh
03:12 PM Subtask #2235 (Resolved): generate deterministic sequence of transactions (5)
Sage Weil
02:01 PM CephFS Bug #2288 (Resolved): libcephfs: setxattr returns EEXIST following removexattr
Running cephtest a couple of times (out of wip-testlibcephfs):... Yehuda Sadeh
01:48 PM Linux kernel client Bug #2287 (Resolved): rbd: crashes with 10Gbit network and fio
From http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/5968:... Josh Durgin
01:47 PM Bug #2286 (Resolved): mon: different full/near_full values on different monitors
If you run vstart, you get... Greg Farnum
01:39 PM CephFS Bug #2285 (Resolved): libcephfs: failure with empty name components
the following in client/testceph.cc fails:
// test empty name components
my_fd = ret = ceph_open(cmount, "rea...
Yehuda Sadeh
11:19 AM rgw Feature #2284 (Resolved): rgw: bench based on rados_bench
Yehuda Sadeh
11:17 AM rgw Feature #2171 (Rejected): rgw: asynchronously calculate md5
Sage Weil
11:16 AM Feature #2283: The ceph command should time out
Sage Weil
10:21 AM Feature #2283 (New): The ceph command should time out
When using ceph to query certain parts of the cluster, there should be an option to time out after a certain set numb... Bernard Grymonpon
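Until the command itself grows such an option, a caller can impose a deadline from the outside. The sketch below is a generic wrapper around any command line; it does not assume any timeout flag exists in the ceph tool, and the function name is hypothetical.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run argv and return (returncode, stdout), or (None, None) on timeout."""
    try:
        result = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None, None
    return result.returncode, result.stdout


# Stand-in commands so the sketch runs without a cluster.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
fast = [sys.executable, "-c", "print('ok')"]
```

In practice the argument list would be something like `["ceph", "-s"]`, with the caller deciding how to report a `(None, None)` result.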
09:44 AM Subtask #2282 (Resolved): Handle map updates on a per-pg basis
Currently, we advance all pgs to the next map at once. This requires us to flush the filestore queue and basically h... Samuel Just
09:27 AM Feature #2281 (Resolved): build big burnupi cluster for testing
Sage Weil
09:23 AM Feature #2280 (Resolved): improve gitbuilder infrastructure
* do not fill up local disk; sync results out immediately
* resolve branches immediately, not after each full pass
?
Sage Weil
09:20 AM rbd Feature #2279 (Resolved): rbd: trivial layering design doc
- how parent images are marked read-only
- how parent/child relationship is represented
- possibly how this allow...
Sage Weil
09:16 AM Bug #2192 (Need More Info): ceph-mon hangs consuming 100% CPU
Sage Weil
09:14 AM Feature #2246 (Resolved): force10s on sepia
Sage Weil
09:13 AM Messengers Cleanup #2150 (Resolved): repair the Simple/Messenger interface
Sage Weil
09:13 AM Feature #2240 (Resolved): osd: new default locations
Sage Weil

04/12/2012

11:17 PM Subtask #2237 (In Progress): failure+replay tester (8)
Sage Weil
11:17 PM Subtask #2235: generate deterministic sequence of transactions (5)
Sage Weil
11:15 PM Feature #2240: osd: new default locations
Sage Weil
10:58 PM CephFS Bug #2277 (New): qa: flock test broken
ubuntu@teuthology:/a/nightly_coverage_2012-04-12-b/687
ubuntu@teuthology:/a/nightly_coverage_2012-04-11-b/525
thi...
Sage Weil
10:48 PM CephFS Bug #1737: ceph-fuse crash in xlist::remove
ubuntu@teuthology:/a/nightly_coverage_2012-04-12-b/717
- chef: null
- ceph: null
- ceph-fuse: null
- workunit:
...
Sage Weil
10:45 PM CephFS Bug #2187: pjd chown/00.t failed test 97
2012-04-12T13:09:27.496 INFO:teuthology.task.workunit.client.0.out:../pjd-fstest-20080816/tests/chown/00.t (Wstat: ... Sage Weil
10:35 PM Bug #2276 (Rejected): osd: eat cpu on restart
osd.856 on congress. Sage Weil
09:35 PM Bug #2275 (Resolved): osd: crash in FileJournal::wrap_read_bl
... Sage Weil
04:29 PM Documentation #2274 (Closed): Basic Availability Model
(1) Construct a continuous-time markov availability model for a basic cluster (3 mons, 4 osds, 2 copy)
(Petri ne...
Anonymous
04:19 PM Documentation #2273 (Closed): basic reliability models
1. construct a probabilistic model for data loss in 1, 2, and 3 copy systems, assuming independent failures
2. plug ...
Anonymous
04:13 PM RADOS Documentation #2272 (Closed): FAQs: RADOS reliability and availability
I expect others to improve this, but this is just to capture the ideas.
It is probably more of a white paper than an...
Anonymous
04:06 PM Documentation #2271 (Resolved): FAQ: BTRFS vs XFS
I expect others to improve this list, but to start it out ...
what file systems we run on (and test on)
how you...
Anonymous
12:15 PM Feature #2223 (Resolved): Tracing facility on FileStore
Sage Weil
09:05 AM RADOS Feature #2268 (Resolved): crush: update item's position in crush map
via crushtool and 'ceph osd crush ...' Sage Weil
03:55 AM Bug #2267 (Closed): Ceph client crashed after shutting down one mds and osd
Ceph version: 0.44.1-1~bpo70+1
Kernel version: 3.2.12-1
Ceph config:
[global]
auth supported = cephx
keyri...
Maciej Galkiewicz

04/11/2012

06:18 PM Messengers Cleanup #2150 (In Progress): repair the Simple/Messenger interface
I haven't done it, but I had enough time to glance over it and see at least a couple things that need fixing before t... Greg Farnum
05:49 PM Feature #2113: objectcacher perfcounters
Sage asked me to run it under an rbd mount and look at it. Need to get tests from Josh and then figure out how to do ... Greg Farnum
04:30 PM Feature #2113 (Fix Under Review): objectcacher perfcounters
Compile-tested. Greg Farnum
10:51 AM Feature #2113 (In Progress): objectcacher perfcounters
Yoink. Greg Farnum
04:30 PM Bug #2266 (Resolved): teuthology: nuke after failure is failing
it fails, and then fails to unlock, and eats up machines.
for example, ubuntu@teuthology:/a/nightly_coverage_2012-...
Sage Weil
03:08 PM Feature #2265 (Rejected): make sure objecter/kclient error out when localized pgs don't exist
Sage Weil
11:02 AM Bug #2264 (Can't reproduce): mon: failed assert in bump_epoch
During startup of a teuthology run on commit 1775301bb46379648f3f88914ef56aa1982db020 (before the cluster was healthy... Josh Durgin
10:48 AM Bug #2263 (Resolved): obsync: move man page to section 1
Sage Weil
09:25 AM Bug #2262 (Resolved): qa: osd-recovery tasks fails on flush_pg_stats
consistently Sage Weil
08:09 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
Looks like the problem arose while running fsstress on the xfs loop
mount on top of a file on the ext2 filesystem.
...
Alex Elder
07:56 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
FYI, xfstests 49 tests running XFS on a loop device. I have to wait for a
reboot in order to see if I can tell at w...
Alex Elder
07:49 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
Looks like xfstests #49 is a reproducer for this problem, at least
after running the tests that lead up to it first ...
Alex Elder
05:29 AM Linux kernel client Bug #2261 (In Progress): paging error in libceph after crashed osd comes back online
Alex Elder
05:22 AM Linux kernel client Bug #2261 (Can't reproduce): paging error in libceph after crashed osd comes back online
... Pim van Riezen
02:25 AM Bug #2178: rbd: corruption of first block
Well Sage,
I have a torture-test already :-D
OK, so it's independent from yours and that's good. It sounds, we ar...
Oliver Francke

04/10/2012

11:24 PM Feature #2223: Tracing facility on FileStore
did some cleanup, changed the way the output is structured wrt the transaction lists, and tweaked a few other things.... Sage Weil
10:23 PM Bug #2002 (Resolved): osd: racy push/pull for clones
Sage Weil
10:19 PM Bug #2161 (Resolved): nonlinear scaling for PGMap::pg_stat encode
commit:bd518e998c0ff12d611db19a8cff6da3622597cb Sage Weil
10:18 PM Bug #1953 (Resolved): teuthology: core files aren't archived when using valgrind
it works! Sage Weil
10:10 PM Bug #2225 (Resolved): gitbuilder.ceph.com returning 503: Service Temporarily Unavailable.
Yehuda found the bad Apache option; override it in the domain_service (maxconnperip=1000 param). Sage Weil
09:49 PM Messengers Cleanup #2150 (Resolved): repair the Simple/Messenger interface
Sage Weil
09:49 PM Feature #1044 (Fix Under Review): librbd: discard support
Sage Weil
09:04 PM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
I'm going to have to look at this again in the morning, but I think
we're in this block of code:
#ifdef CONFIG_BL...
Alex Elder
08:37 PM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
Here's a disassembled block of the code where the fault occurred.
The address listed corresponds to offset 3468 belo...
Alex Elder
08:10 PM Linux kernel client Bug #2260 (Resolved): libceph: null pointer dereference at try_write+0x638+0xfb0
It's not an exact match but it's close enough that I wanted to reopen
bug 1793 or 1866, but found myself unable to. ...
Alex Elder
03:27 PM Feature #2246: force10s on sepia
Fabric brought up by Networking group. Interfaces up, configured, and working (nuttcp shows 9.5GB/s or so with
defa...
Dan Mick
01:26 PM Feature #2111: msgr workloads
I think the messenger tester may be at a point where we can call this bug satisfied. Greg Farnum
01:18 PM Bug #2178: rbd: corruption of first block
The good news is I see the problem. The bad news is it's the exact bug we thought we fixed. The other good news is w... Sage Weil
07:38 AM Bug #2178: rbd: corruption of first block
Hi Sage,
just in case, the reply from yesterday did not reach you:
--- 8-< ---
Good morning,
it's already...
Oliver Francke
12:27 PM Feature #2258 (Resolved): use external leveldb package
autoconf lets you use the installed library. not doing so by default to avoid the pain of building on older distros. Sage Weil

04/09/2012

04:30 PM rgw Bug #2259 (Resolved): rgw: object name cut after slash when virtual host style is used
Fixed, commit:8d5c87a86e070b4e95ef0d58a469bdbbef4a826c. Yehuda Sadeh
03:42 PM rgw Bug #2259 (Resolved): rgw: object name cut after slash when virtual host style is used
Yehuda Sadeh
09:32 AM Bug #2178: rbd: corruption of first block
The missing piece of information is mapping the file offset to a block device offset. Can you, inside the VM,... Sage Weil

04/08/2012

09:53 PM Feature #2258 (Resolved): use external leveldb package
- make our configure take/require a --with-system-leveldb or similar to not use the bundled leveldb
- update the deb...
Sage Weil
08:31 AM Bug #2178: rbd: corruption of first block
Hi Sage and *Happy easter*,
yesterday I had some "luck" after 10 tries....
Here is what I have for you:
first ...
Oliver Francke

04/06/2012

09:27 PM Feature #1692 (Duplicate): librbd: Support TRIM (hole punching) (userspace client)
dup of #1044 Sage Weil
03:47 PM rgw Feature #2257 (Rejected): rgw: detect fastcgi module 100-continue support automatically
The current default that is used doesn't work with vanilla fastcgi module. It'd be great if that could be set automat... Yehuda Sadeh
02:46 PM rbd Feature #2256 (Resolved): rbd: parallelize deletions
There are a few places where we delete things one at a time: resizing to a smaller size, deleting all snapshots, and ... Josh Durgin
02:04 PM Feature #2240 (Fix Under Review): osd: new default locations
wip-defaults Sage Weil
12:05 PM Bug #2161: nonlinear scaling for PGMap::pg_stat encode
wip-encoding Sage Weil
09:18 AM Bug #2161: nonlinear scaling for PGMap::pg_stat encode
Ake van der Meer wrote:
> My ceph-osd processes run at 100% CPU for many minutes at a time doing this: http://pasteb...
Sage Weil
08:25 AM Bug #2161: nonlinear scaling for PGMap::pg_stat encode
My ceph-osd processes run at 100% CPU for many minutes at a time doing this: http://pastebin.com/wYnPKWeJ
In src/i...
Ake van der Meer
10:05 AM Feature #2246 (In Progress): force10s on sepia
Ports being mapped yesterday and today in preparation for switch config review. Dan Mick
09:21 AM Bug #2255 (Resolved): osd: fix object name collisions between pools in temp collection
Sage Weil
08:28 AM Feature #2223: Tracing facility on FileStore
Made some changes to the ObjectStore.cc, regarding code duplication of the transaction's dump methods. Feedback would... Joao Eduardo Luis

04/05/2012

02:21 PM Feature #2248 (Resolved): cluster naming
Sage Weil
02:20 PM Subtask #2236 (Resolved): filestore failure injection (3)
wip-filestore-failure
I don't think enumerating/identifying the callers is needed here. For the idempotency teste...
Sage Weil
01:19 PM Feature #2226: osd: better filestore idempotency test
Thought about this a bit more. The filestore failure injection is easiest to implement with an _exit(1) or something,... Sage Weil
01:13 PM Feature #1890 (Resolved): log: async log writeout
Sage Weil
01:13 PM Feature #1889 (Resolved): log: structure log records
Sage Weil
12:30 PM Feature #2254 (Resolved): doc: cephx
pending improved documentation:
* what is, and is not, protected
* how to convert/upgrade a non-cephx cluster to cephx (e...
Sage Weil
12:22 PM Subtask #2235 (In Progress): generate deterministic sequence of transactions (5)
Joao Eduardo Luis
10:51 AM Bug #2178: rbd: corruption of first block
Ok, my attempts to parse the log to find out-of-order replies are quickly snowballing. (complexity of dropped replies... Sage Weil
08:21 AM Bug #2178: rbd: corruption of first block
Oliver Francke wrote:
> Uhm...
>
> ... I thought, we were talking about the same issue since the very beginning.....
Sage Weil
01:25 AM Bug #2178: rbd: corruption of first block
Uhm...
... I thought, we were talking about the same issue since the very beginning... corruption of .rbd-blocks.....
Oliver Francke

04/04/2012

11:11 PM Feature #2248 (Fix Under Review): cluster naming
Sage Weil
11:00 AM Feature #2248: cluster naming
- new command line arg (-C, --cluster)
- controls default config files
- becomes another subst ($cluster) to be use...
Sage Weil
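The idea in #2248 of a cluster name that is given on the command line and then substituted into paths could look roughly like the sketch below. The default name "ceph" and the path template are assumptions for illustration; only the -C/--cluster option and the $cluster variable come from the notes above.

```python
import argparse
from string import Template


def parse_cluster(argv):
    """Return the cluster name given with -C/--cluster, or an assumed default."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-C", "--cluster", default="ceph")  # default is an assumption
    return parser.parse_args(argv).cluster


def expand(path_template, cluster):
    """Substitute $cluster in a path template."""
    return Template(path_template).substitute(cluster=cluster)


# Hypothetical template showing how the name could select a config file.
DEFAULT_CONF = "/etc/ceph/$cluster.conf"

conf_path = expand(DEFAULT_CONF, parse_cluster(["--cluster", "backup"]))
```

The same `expand` helper could be applied to any other setting that should differ per cluster.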
10:38 AM Feature #2248 (Resolved): cluster naming
Sage Weil
04:09 PM Bug #2253 (Resolved): rados import: uploaded objects are empty
Fixed, commit:0df6fbd3a66741ad02c7556b0c4026dc3577d797. Yehuda Sadeh
03:37 PM Bug #2253 (Resolved): rados import: uploaded objects are empty
Yehuda Sadeh
03:33 PM rgw Documentation #1813: doc: document radosgw api diffs with s3
We'd like to have it for the current sprint, or at least no later than the next sprint. 5/1 as an upper-bound target d... Yehuda Sadeh
12:45 PM Bug #2233: Throttle when there are lots of large concurrent IOs
Yeah, it's the failing gracefully bit that I'm interested in. :) Mark Nelson
12:38 PM Bug #2233: Throttle when there are lots of large concurrent IOs
Just the rados bench tool itself is allocating 16GB to feed into librados.
Now that you mention it, librados might...
Greg Farnum
12:29 PM Bug #2233: Throttle when there are lots of large concurrent IOs
Aha! The plana nodes appear to only have 8GB of ram and 8GB of swap.
Is the allocation of that memory part of libra...
Mark Nelson
11:20 AM Linux kernel client Bug #2242: rbd: spinlock on wrong cpu
OK, I think this problem arises because of the switch to a spinlock to
protect the client list. Doing so was the ri...
Alex Elder
09:53 AM Linux kernel client Bug #2242 (Resolved): rbd: spinlock on wrong cpu
... Sage Weil
11:19 AM Bug #2178: rbd: corruption of first block
Oliver Francke wrote:
> Hi Sage,
>
> I was talking about the verbose logfiles from monday. TBH, I don't expect Ba...
Sage Weil
10:32 AM Bug #2178: rbd: corruption of first block
Hi Sage,
I was talking about the verbose logfiles from monday. TBH, I don't expect BadThings without "rbd_writebac...
Oliver Francke
09:49 AM Bug #2178: rbd: corruption of first block
Oliver Francke wrote:
> Whew, that was fast,
>
> after second run I had some errors in one file with:
> [osd]
>...
Sage Weil
07:01 AM Bug #2178: rbd: corruption of first block
Whew, that was fast,
after second run I had some errors in one file with:
[osd]
filestore fiemap threshol...
Oliver Francke
05:43 AM Bug #2178: rbd: corruption of first block
Well Sage,
it's harder these days to reproduce, because I think the current version has made "something more stable"(...
Oliver Francke
10:57 AM Feature #2252 (Resolved): rgw long run kernels
Sage Weil
10:54 AM Feature #2251 (Resolved): rgw long run workloads
Sage Weil
10:53 AM Feature #2250 (Resolved): rgw long run raid config
Sage Weil
10:47 AM Subtask #2249 (Resolved): teuthology task (3)
Sage Weil
10:35 AM Feature #2246 (Resolved): force10s on sepia
Sage Weil
10:32 AM Feature #2245 (Resolved): rgw long run ceph install
Sage Weil
10:29 AM Messengers Feature #2244 (New): msgr: performance tester
Sage Weil
09:54 AM Linux kernel client Bug #2243 (Resolved): btrfs: warning in orphan_commit_root
2012-04-04T01:02:59.191518-07:00 plana32 kernel: [ 8815.371555] ------------[ cut here ]------------
2012-04-04T01:0...
Sage Weil
09:45 AM Feature #2241 (Rejected): upstart
Sage Weil
09:45 AM Feature #2240 (Resolved): osd: new default locations
Sage Weil
09:42 AM Subtask #2239 (New): install + configure package everywhere
chef! Sage Weil
09:42 AM Subtask #2238 (Rejected): vm for coredump archive
Sage Weil
09:41 AM Subtask #2237 (Resolved): failure+replay tester (8)
Sage Weil
09:39 AM Subtask #2236 (Resolved): filestore failure injection (3)
add a hook to operations that we want to potentially fail.
need to identify the caller so that the tester can pote...
Sage Weil
09:38 AM Subtask #2235 (Resolved): generate deterministic sequence of transactions (5)
Sage Weil
09:22 AM Bug #2234 (Resolved): Sometimes 'ceph -s' is unable to show pg data and crashes
ceph -s / ceph -w sometimes gives me output as below:... Szymon Szypulski
09:15 AM CephFS Feature #1237: mds caps limit mount to some subdir
Nope — as with all the other MDS stuff, this is currently not a priority. Greg Farnum
07:10 AM CephFS Feature #1237: mds caps limit mount to some subdir
Is there any progress on this issue? Maciej Galkiewicz

04/03/2012

10:37 PM Messengers Bug #1674 (Need More Info): daemons crash when sent random data
FWIW I was unable to reproduce this with the current code, with or without cephx enabled. Sage Weil
10:07 PM Bug #1627 (Can't reproduce): ceph-mon memleak if ceph-osd cluster ip is not reachable, but public...
Sage Weil
04:52 PM rgw Bug #1681: rgw: user rm with --purge doesn't remove data
Maybe we should disallow removal of a user that has data? We can suspend it instead. Yehuda Sadeh
03:57 PM Bug #1921 (Resolved): teuthology: silently continues when len(targets) != len(roles)
Sage Weil
02:43 PM Feature #2226: osd: better filestore idempotency test
Sage Weil
02:32 PM Documentation #2175 (Resolved): doc: fix doc build errors
got this to yellow (only warnings), yay! Sage Weil
01:39 PM Feature #1890: log: async log writeout
Sage Weil
01:39 PM Feature #1889: log: structure log records
Sage Weil
10:45 AM Feature #2134 (Resolved): qa: smoke suite
Sage Weil
10:31 AM Bug #2178: rbd: corruption of first block
Hi Oliver,
I have two things to try:
- 'rbd writeback window = 0'. I know it's not what you want to run, but t...
Sage Weil
10:29 AM Bug #2233: Throttle when there are lots of large concurrent IOs
That is 16GB of RAM being allocated and used — I don't remember what hardware these are running on and have no idea w... Greg Farnum
09:47 AM Bug #2233 (Won't Fix): Throttle when there are lots of large concurrent IOs
When sending large amounts of data via a single client (ie 256 concurrent 64MB IOs) we can hit a bad_alloc on the cli... Mark Nelson
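The numbers in this report can be checked with simple arithmetic, and the requested throttling can be sketched as a byte budget. This is only an illustration of the general technique; the ByteThrottle class is hypothetical and says nothing about how librados or rados bench handle memory.

```python
import threading

MIB = 1024 * 1024

concurrent_ios = 256
io_size = 64 * MIB
in_flight_bytes = concurrent_ios * io_size  # 16 GiB of buffers at once


class ByteThrottle:
    """Block callers until enough of a fixed byte budget is free."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.cond = threading.Condition()

    def acquire(self, nbytes):
        if nbytes > self.max_bytes:
            raise ValueError("request larger than the whole budget")
        with self.cond:
            # Wait until the request fits under the budget.
            while self.used + nbytes > self.max_bytes:
                self.cond.wait()
            self.used += nbytes

    def release(self, nbytes):
        with self.cond:
            self.used -= nbytes
            self.cond.notify_all()  # wake any writers waiting for space


throttle = ByteThrottle(128 * MIB)  # illustrative budget: two 64 MiB IOs
throttle.acquire(io_size)
throttle.acquire(io_size)
used_when_full = throttle.used
throttle.release(io_size)
```

A third `acquire(io_size)` issued while the budget is full would block until some other thread calls `release`, which bounds memory use instead of letting allocation fail.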
09:15 AM Cleanup #2191 (Resolved): reexamine simple_spinlock
Sage Weil
08:51 AM Feature #2087 (Resolved): lightweight filestore workload generator
Sage Weil

04/02/2012

02:30 PM rgw Bug #1853 (Resolved): rgw: qa test to verify bucket recreation does not override bucket
Implemented, commit:1551c5b08714b415c49fc759002b7c6a6d4d611a. Yehuda Sadeh
01:26 PM rgw Bug #1856 (Resolved): It is possible to look up an rgw user by a subuser that does not exist as l...
Fixed, commit:addc744692f60885a747c4531cd12bf19b3a7f2a. Yehuda Sadeh
11:15 AM rgw Feature #2171: rgw: asynchronously calculate md5
Thinking about it some more, it's probably not the best use of time and effort. We initiate the md5 calculation after... Yehuda Sadeh
08:29 AM Bug #2178: rbd: corruption of first block
Hi Sage,
here we go again, with ceph-0.44.1-1-g41f84fa
One bad file with following infos:
20120402 171642.12...
Oliver Francke

04/01/2012

07:23 PM Bug #2221: Monitor setup bugs
2) ... Greg Farnum
06:35 PM rbd Feature #2232: qemu: resize guest disk when rbd image is resized
I tested this on Friday, and qemu rereads the size (at least when using virtio) when the guest requests it (i.e. echo... Josh Durgin
04:21 PM rbd Feature #2232 (New): qemu: resize guest disk when rbd image is resized
According to Christoph, this is probably just a matter of calling bdrv_truncate() with the new size. If that doesn't... Sage Weil
04:19 PM rbd Feature #2231 (Resolved): librbd: expose header change (resize?) via api
we need a callback or something so that users (qemu) can be informed when the header changes. this will let them, sa... Sage Weil

03/31/2012

03:22 PM Feature #1655: gitbuilder aggregator page
I took some inspiration from the updated aggregator script that is now at http://ceph.newdream.net/gitbuilder.cgi. I'... Jimmy Tang

03/30/2012

09:11 PM Cleanup #2230 (Resolved): deprecate 'btrfs devs'
Sage Weil
09:00 PM rgw Feature #2229 (New): rgw: functional tests for rgw class
A series of simple functional tests to verify the rgw class methods behave as they should. Sage Weil
08:58 PM Bug #2148 (Resolved): osd: class error return not propagated to client
commit:f8a53869f6db4c76516ee525f00f87f930920692 Sage Weil
05:27 PM Bug #2221: Monitor setup bugs
(1) is a problem due to options parsing collisions...fixed!
(2) is directly contradicted by my testing...?
(3) I ne...
Greg Farnum
04:25 PM Bug #2026 (Can't reproduce): osd: ceph::HeartbeatMap::check_touch_file
Sage Weil
04:25 PM Bug #2045 (Can't reproduce): osd: dout_lock deadlock
haven't seen this in a while.
also, this code is about to go away anyway with wip-log.
Sage Weil
04:16 PM Bug #2102 (Can't reproduce): osd: pg stuck in backfill
Sage Weil
04:15 PM Bug #2102 (Duplicate): osd: pg stuck in backfill
Sage Weil
04:14 PM Bug #2002: osd: racy push/pull for clones
i take that back; this wasn't enabled in qa. adding to the teuthology ceph.conf file. Sage Weil
04:12 PM Bug #2002 (Resolved): osd: racy push/pull for clones
haven't seen this in forever; looks fixed. Sage Weil
04:11 PM Bug #2209 (Resolved): osd: read kb stats not tracked?
commit:aa31035e555129e56888320b84f16264f28bd7df Sage Weil
03:59 PM Bug #2116 (Resolved): Repeated messages of "heartbeat_check: no heartbeat from"
fixed by commit:374bef9c97266600b4c6b83100485d7250363213 Sage Weil
03:59 PM Bug #2165 (Resolved): osd: recovering ending with missing
fixed with merge of commit:75e3b9b309e5365975e3e5855c065bd4fe28b64c Sage Weil
03:58 PM Bug #2178: rbd: corruption of first block
Sage Weil
02:51 PM Bug #2178: rbd: corruption of first block
Please build the current git stable branch, which includes 41f84fac1ae4b4c72bf9bfe07614c4066c916fd1. The version sho... Sage Weil
07:35 AM Bug #2178: rbd: corruption of first block
Here the remaining timestamps from the other VM's with bad blocks:
VM-2:
20120330 105139.579830 filling block 171...
Oliver Francke
07:12 AM Bug #2178: rbd: corruption of first block
Hi *,
I needed a couple of runs, but managed now to provide some 81MiB/97MiB osd.X.log-files, where in between sh....
Oliver Francke
03:58 PM Bug #2164 (Resolved): osd: scrub missing _, snapset attrs
commit:41f84fac1ae4b4c72bf9bfe07614c4066c916fd1 Sage Weil
12:50 PM Feature #2227 (Closed): QA: create a test to verify operation with non-default layouts
I submitted a patch that modified ceph_calc_file_object_mapping()
in the ceph client, and when reviewing it Sage poi...
Alex Elder
09:53 AM Feature #2226 (Resolved): osd: better filestore idempotency test
... Sage Weil
12:35 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
I think I can be optimistic :)... Wido den Hollander

03/29/2012

10:06 PM Bug #2178: rbd: corruption of first block
Okay, I suspect this is actually bug #2164, which was causing the _ xattr to get lost when ceph-osd restarts on non-b... Sage Weil
09:52 PM Bug #2225 (Resolved): gitbuilder.ceph.com returning 503: Service Temporarily Unavailable.
I can't find any 503 in the apache logs on this machine. Could it be on the client side? Sage Weil
09:48 PM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
Well, I fixed one problem, but I can't see how it could have resulted in the log you posted.
Pushed a few more pat...
Sage Weil
11:36 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
I collected logs from 4 OSDs, they can be downloaded at: http://logger.ceph.widodh.nl/ceph/issues/2212/
At 10:13 t...
Wido den Hollander
09:21 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
Der.. do you have a log you can attach/post? Sage Weil
02:59 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
I reverted the extra debugging for the heartbeat stuff, but that didn't seem to consume all the CPU time.
The load...
Wido den Hollander
01:40 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
I just installed the code on my cluster and things do not seem to behave yet.
The cluster is still jumping around...
Wido den Hollander
08:54 PM Linux kernel client Bug #1940 (Resolved): locking cycle in ceph_osdc_start_request
commit:ab434b60ab07f8c44246b6fb0cddee436687a09a Sage Weil
07:53 PM Linux kernel client Bug #1793 (Can't reproduce): NULL pointer dereference at try_write+0x627/0x1060
Marking this Can't Reproduce. Will reopen if it shows up again. Alex Elder
03:21 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
Another 100 iterations of kernel_untar_build.sh using the current
master branch (c666601a935b94cc0f3310339411b6940de...
Alex Elder
07:51 AM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
Bugs 1793 and 2081 have a signature of a page fault/bad memory reference
from process_one_work() -> con_work(), and ...
Alex Elder
07:53 PM Linux kernel client Bug #2069 (Can't reproduce): client crash during kernel_untar_build rm -r step
I just finished at least 150 iterations of kernel_untar.sh and never
hit this using the current master branch of cep...
Alex Elder
07:51 PM Linux kernel client Bug #2081 (Can't reproduce): msgr: spinlock badness?
Marking this Can't Reproduce. Will reopen if it happens again. Alex Elder
07:43 PM Linux kernel client Bug #2081: msgr: spinlock badness?
Another 100 iterations of kernel_untar_build.sh using the current
master branch (c666601a935b94cc0f3310339411b6940de...
Alex Elder
07:51 AM Linux kernel client Bug #2081 (Need More Info): msgr: spinlock badness?
Bugs 1793 and 2081 have a signature of a page fault/bad memory reference
from process_one_work() -> con_work(), and ...
Alex Elder
07:50 PM Linux kernel client Bug #2174 (Can't reproduce): rbd: iozone thrashing failure
OK, I'll go ahead and state that I can't reproduce this... Alex Elder
07:46 PM Linux kernel client Bug #2174: rbd: iozone thrashing failure
Status was Verified. Changing it to Need More Info because I can't even
seem to reproduce it at this point. (I sup...
Alex Elder
07:44 PM Linux kernel client Bug #2174: rbd: iozone thrashing failure
Another 12 iterations of suites/iozone.sh using the current
master branch (c666601a935b94cc0f3310339411b6940de751ba)...
Alex Elder
07:59 AM Linux kernel client Bug #2174: rbd: iozone thrashing failure
I don't know whether we've adequately captured the signature or symptoms
of this problem. I believe though that it ...
Alex Elder
07:20 AM Linux kernel client Bug #2174: rbd: iozone thrashing failure
I have been trying to reproduce this using the latest testing/master/for-linus
branch (they're the same right now) a...
Alex Elder
09:27 AM Linux kernel client Bug #2224 (Rejected): Oops in __cfh_to_dentry
I set up an HA pair of NFS servers which re-export Ceph to NFS clients.
The HA pair is in active/standby mode, using...
Henry Chang
07:42 AM Feature #2087: lightweight filestore workload generator
Memory leak fixed.
Apparently, the FileStore does not cleanup after transactions once they are applied, which may ...
Joao Eduardo Luis
06:21 AM Feature #2087 (In Progress): lightweight filestore workload generator
Looks like some memory is leaking badly, such that valgrind hangs on exit.
==19080==
==19080== HEAP SUMMARY...
Joao Eduardo Luis
07:24 AM Linux kernel client Bug #2064 (Resolved): ceph-client: messenger: nocrc flag not implemented correctly
Linus pulled in the changes without any immediate trouble, so
I'm marking this and a few others resolved.
Alex Elder
07:12 AM Linux kernel client Bug #2157 (Resolved): ceph: xattr: fix nanosecond display on i_rctime
Linus pulled in the changes without any immediate trouble, so
I'm marking this and a few others resolved.
Alex Elder
07:12 AM Linux kernel client Bug #2156 (Resolved): ceph: xattr: fix a possible buffer overrun bug
Linus pulled in the changes without any immediate trouble, so
I'm marking this and a few others resolved.
Alex Elder
07:11 AM Linux kernel client Bug #2155 (Resolved): ceph: xattr: wrong value assumed for "no preferred PG"
Linus pulled in the changes without any immediate trouble, so
I'm marking this and a few others resolved.
Alex Elder
05:56 AM Feature #2223 (Resolved): Tracing facility on FileStore
Allow a user to specify a file in which to log the transactions that come through OSDs' FileStores.
This should all...
Joao Eduardo Luis

03/28/2012

11:12 PM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
Ah, I see the bug now. Pushed a fix to wip-osd-hb, thanks!
Let us know if this behaves for you.. if so I'll pull ...
Sage Weil
04:23 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
It's quite large (222MB), so I uploaded the file, available at: http://logger.ceph.widodh.nl/ceph/osd.1.log_27-03-201... Wido den Hollander
10:51 PM Bug #2165: osd: recovering ending with missing
see wip-osd-recovery-sources Sage Weil
10:46 PM CephFS Bug #1811: 2 pjd chown tests failed on cfuse
... Sage Weil
03:21 PM Feature #2222: osd: distinguish between 'degraded' and 'misplaced'
We should pick a designator that doesn't make it sound like the objects are lost. Greg Farnum
02:27 PM Feature #2222 (Resolved): osd: distinguish between 'degraded' and 'misplaced'
normal data migration happens with an acting set > the up set, so that we never drop below N replicas, but we still ca... Sage Weil
02:45 PM Feature #2087: lightweight filestore workload generator
Joao Eduardo Luis
02:07 PM Bug #2221 (Resolved): Monitor setup bugs
Carl reported several configuration issues when creating new monitors (based on the instructions at http://ceph.newdr... Greg Farnum
08:35 AM rgw Bug #2220 (Resolved): rgw: librgw dep on g_ceph_context
Fixed, commit:18d219e512a8e0f427a2229a71e15869cac3b593. Yehuda Sadeh
07:16 AM rgw Bug #2220 (Resolved): rgw: librgw dep on g_ceph_context
from last night's qa,... Sage Weil
04:37 AM Bug #2219: OSD's commit suicide with 0.44
I accidentally removed the core file(s) :(
Hope this one pops up again so I have a core file.
Wido den Hollander
04:11 AM Linux kernel client Tasks #2138: rbd: run xfstests on a local XFS filesystem over RBD
After setting up two rbd devices and making some fairly simple changes
to xfstests, then setting up appropriate envi...
Alex Elder
04:04 AM Linux kernel client Bug #2155: ceph: xattr: wrong value assumed for "no preferred PG"
This got rebased: 3489b42a72a41d477665ab37f196ae9257180abb
This has been sent as part of a pull request to Linus ...
Alex Elder
04:04 AM Linux kernel client Bug #2156: ceph: xattr: fix a possible buffer overrun bug
This got rebased: 3489b42a72a41d477665ab37f196ae9257180abb
This has been sent as part of a pull request to Linus ...
Alex Elder
04:03 AM Linux kernel client Bug #2157: ceph: xattr: fix nanosecond display on i_rctime
This got rebased: 3489b42a72a41d477665ab37f196ae9257180abb
This has been sent as part of a pull request to Linus ...
Alex Elder
04:01 AM Linux kernel client Bug #2064: ceph-client: messenger: nocrc flag not implemented correctly
It got rebased once more, and this should be the last:
37675b0f42a8f7699c3602350d1c3b2a1698a3d3
This has been s...
Alex Elder
03:52 AM Bug #2178: rbd: corruption of first block
Hi,
I decided to upgrade to "latest-n-greatest" in the test-cluster, to make sure, that if I hit the error again w...
Oliver Francke

03/27/2012

06:31 PM CephFS Bug #2218: CephFS "mismatch between child accounted_rstats and my rstats!"
The MDS log is at https://matthew.royhousehold.net/mds.a.log.1.gz (1505MB, md5 197ef232d50d27e2b7c2f62370c9c6b6) Matthew Roy
02:45 PM CephFS Bug #2218 (Need More Info): CephFS "mismatch between child accounted_rstats and my rstats!"
There's not enough info in the attached log to figure out what happened. I can tell you that your home directory beli... Greg Farnum
04:26 PM rgw Bug #2197 (Resolved): rgw: need to throttle incoming requests
Fixed, commit:a52d048ac429c3d2b6a9286d96253308f6588762. Yehuda Sadeh
04:10 PM Bug #2178: rbd: corruption of first block
The next step is to reproduce the corruption on the test cluster with logs:
debug osd = 20
debug ms = 1
debug...
Sage Weil
08:37 AM Bug #2178: rbd: corruption of first block
Well,
one more comment:
my guess would be that it has something to do with expansion of the "sparse-file" while writin...
Oliver Francke
05:24 AM Bug #2178: rbd: corruption of first block
Good morning ;)
meanwhile I have not been lazy. I've managed - with current setup in test-cluster - to produce "in...
Oliver Francke
04:07 PM Bug #2164: osd: scrub missing _, snapset attrs
wip-2164
it's a problem with the collection_move guard (or lack thereof)
Sage Weil
03:40 PM rgw Bug #2208 (Resolved): rgw: radosgw-admin temp remove failure
Fixed, merged at commit:93ba4c004a9269148a75b67da2522855cb1842a3. Yehuda Sadeh
02:19 PM Bug #2219 (Need More Info): OSD's commit suicide with 0.44
Can you look at the core file and 'thread apply all bt'? Sage Weil
05:57 AM Bug #2219: OSD's commit suicide with 0.44
... Wido den Hollander
05:03 AM Bug #2219 (Can't reproduce): OSD's commit suicide with 0.44
I noticed this myself today, but on IRC somebody else came along:... Wido den Hollander
02:03 PM Bug #2199 (Resolved): mon: get_bl osdmap_full/9583 No such file or directory
Merged to master in commit:1814aac17593dee0fa4c774d5b462f277f6698da, reviewed by Sage — even though I forgot to add t... Greg Farnum
12:25 PM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
Can you attach the full osd.1 log? Sage Weil
12:36 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
Overnight I saw 16 OSDs go down with the same backtrace.
All OSDs were running with debug ms/osd set to 1, this...
Wido den Hollander
09:07 AM Linux kernel client Bug #2174: rbd: iozone thrashing failure
I've been off on other things, but this problem apparently recurred
even with the latest check-in (Josh's change) in p...
Alex Elder
08:38 AM CephFS Bug #2217: sync and O_DIRECT writes only write first extent in iov vector
The code should not be written that way.
However I think it doesn't matter at this point, because the only caller
...
Alex Elder

03/26/2012

06:24 PM CephFS Bug #2218 (Resolved): CephFS "mismatch between child accounted_rstats and my rstats!"
The mismatch is detected at 2012-03-26 18:39:54.306661... Matthew Roy
03:51 PM Bug #2192: ceph-mon hangs consuming 100% CPU
It reproduced every time, on 0.44 as well. After I adjusted the cluster to have only one monitor, the problem went away. (U... Vladimir Kulev
02:44 PM CephFS Bug #2217 (Resolved): sync and O_DIRECT writes only write first extent in iov vector
static ssize_t ceph_aio_write(struct kiocb *iocb, const struct iovec *iov,
unsigned long nr_segs, loff_t po...
Sage Weil
01:34 PM Bug #2199 (Fix Under Review): mon: get_bl osdmap_full/9583 No such file or directory
Re-pushed misc-fixes-for-review. Greg Farnum
09:59 AM Bug #2199 (In Progress): mon: get_bl osdmap_full/9583 No such file or directory
Sage pointed out the stash data structure isn't necessarily the same as the other stored data structures, so this nee... Greg Farnum
12:47 PM Messengers Cleanup #2216 (Resolved): SimpleMessenger should make sure it owns passed-in Connections
Sage Weil
10:50 AM Messengers Cleanup #2216 (Resolved): SimpleMessenger should make sure it owns passed-in Connections
Otherwise we get weird issues like #2212. Greg Farnum
12:38 PM Cleanup #2191: reexamine simple_spinlock
my log branch drops this for the dout logging. the last user is the buffer.h debugging (enabled manually via a macro... Sage Weil
12:06 PM RADOS Bug #2047: crush: with a rack->host->device hierarchy, several down devices are likely to cause b...
fwiw dropping the local search behavior fixes this bad behavior. the question is what probably was the local search ... Sage Weil
11:27 AM RADOS Bug #2047: crush: with a rack->host->device hierarchy, several down devices are likely to cause b...
Sage Weil
11:27 AM Bug #2210 (Duplicate): osd: some PGs remains remapped or degraded
this is actually a crush problem, see #2047. Sage Weil
09:45 AM Bug #2210: osd: some PGs remains remapped or degraded
#2173 has some osd logs and related info for the same problem on a less clean cluster. Thanks for the detailed steps ... Josh Durgin
10:36 AM CephFS Fix #2215 (Resolved): ceph-fuse does not invalidate page cache
Right now the userspace client doesn't invalidate the page cache when it loses the cache capability on an inode. Appa... Greg Farnum
09:58 AM Bug #2212 (Resolved): osd: FAILED assert(msgr->lock.is_locked())
ah, i was using wrong msgr, fixing! Sage Weil
05:50 AM Bug #2212 (Resolved): osd: FAILED assert(msgr->lock.is_locked())
With the new heartbeat code I noticed a couple of OSDs go down with:... Wido den Hollander
09:58 AM RADOS Bug #2214 (Resolved): crush: pgs only mapped to 2 devices with replication level 3
This is from #2173. Note that all 3 osds are up.... Josh Durgin
09:43 AM Bug #2173 (Resolved): MDS crash when start with end of buffer
Josh Durgin
06:04 AM Feature #2213 (Resolved): rbd: shouldn't need config file to get help
I just ran "rbd --help" on a pretty much un-configured machine and got:
global_init: unable to open config file.
...
Alex Elder
05:22 AM Bug #2211 (Resolved): osd: entity_inst_t OSDMap::get_inst(int) const
While trying out the new heartbeat code I encountered this crash:... Wido den Hollander

03/25/2012

08:39 PM Bug #2173: MDS crash when start with end of buffer
Shall we close this bug, as the MDS server was recovered by providing an empty session map and we cannot reproduce ... soft crack
08:39 PM Bug #2210 (Duplicate): osd: some PGs remains remapped or degraded
Some PGs remain in 'remapped' or 'degraded' status after adding an OSD server.
The steps to reproduce the bug:
1....
soft crack
09:54 AM Feature #2087: lightweight filestore workload generator
Pushed a new commit to [1], making the code compliant with the CodeStyle and with Sage's suggestions on GitHub.
[1...
Joao Eduardo Luis
 
