Activity
From 03/25/2012 to 04/23/2012
04/23/2012
- 06:23 PM Bug #2338: mon: adding new monitors simultaneously can allow a new mon to become leader
- Looks like the original monitor doesn't believe in the existence of the new monitors; I'll need to check out why. ...
- 05:46 PM Bug #2338: mon: adding new monitors simultaneously can allow a new mon to become leader
- ...
- 05:42 PM Bug #2338 (Rejected): mon: adding new monitors simultaneously can allow a new mon to become leader
- When you add two new monitors (out of 3 total) to a cluster you can end up with one of the new monitors being the lea...
- 05:26 PM Bug #2286: mon: different full/near_full values on different monitors
- Hi Guys,
After upgrading the patched-kernel btrfs test cluster from 0.45-1 to 0.45-281-g0777613, the full_ratio an...
- 04:38 PM Feature #2337 (Resolved): rgw and rados performance numbers
- 04:16 PM Feature #2336 (Resolved): qemu: wire up discard
- 04:10 PM Feature #2335 (Resolved): librbd: write-thru cache mode
- 04:02 PM rgw Documentation #1813 (Fix Under Review): doc: document radosgw api diffs with s3
- 03:51 PM Feature #2334 (Resolved): mon: set max mark-out or mark-down
- 02:34 PM Feature #1044 (Resolved): librbd: discard support
- 02:34 PM Feature #2296 (Resolved): librbd: allow resizing to arbitrary sizes
- 02:29 PM Feature #1451 (Resolved): librbd: instrument via perfcounter
- 02:22 PM Feature #1888 (Rejected): log: per-thread ring buffer
- 02:20 PM Subtask #2333 (Resolved): create queueing for peering messages
- Currently, the osd dispatch calls directly into the PG peering state machine. Instead, we need to queue the events g...
- 02:10 PM Subtask #2332 (Resolved): move pg queueing into pgs
- Currently, the osd reaches into the pg to manipulate the pg queue during message receipt and during handle_osd_map. ...
- 02:07 PM Cleanup #2041 (In Progress): osd: move peering into worker threads
- 11:31 AM Cleanup #2331 (Resolved): Makefile.am:182: `lib/libgtest.a' is not a standard libtool library name
- Warning is still happening, despite git clean -fdx, git submodule freshening of various sorts, etc.
This should prob...
- 11:27 AM Bug #2276 (Rejected): osd: eat cpu on restart
- it's up now.. i think i just didn't wait long enough.
- 11:26 AM Bug #2266 (Resolved): teuthology: nuke after failure is failing
- ignore errors caused by ntpdate vs ntpd race
- 11:25 AM Bug #2322 (Need More Info): osd/ReplicatedPG.cc: 3832: FAILED assert(!object_contexts.size())
- also going to wait until the threading refactor is complete before diving into this further.
at this point asserti...
- 11:24 AM Feature #2330 (Resolved): dump open files, sockets when we run out of fds
- 11:23 AM Bug #2310 (Resolved): osd: too many open files
- this is just sockets and hitting the flusher limit. we're both increasing 'max open files' and switching to vm limit...
- 11:04 AM Bug #2329 (Resolved): fix detection of C++11 atomic header
- the C++11 atomic header is now <atomic> (I've checked gcc 4.6 and 4.7) and not <cstdatomic>
- 10:23 AM rgw Bug #1681 (Resolved): rgw: user rm with --purge doesn't remove data
- 09:56 AM Feature #2251 (Resolved): rgw long run workloads
04/21/2012
- 03:16 PM Feature #1451 (Fix Under Review): librbd: instrument via perfcounter
- see commit:07ddff427145e109eb820b6ed0ddb6cca74b65b6
- 03:15 PM Bug #2328 (Resolved): osd: mapext/fiemap doesn't work for small extents
- see commit:c8377e466caace018eea06c1739265111ce72c48 for a kludge that detects the bug and disables fiemap.
- 02:35 PM Bug #2328: osd: mapext/fiemap doesn't work for small extents
- this works on a newer kernel (3.2.0-2-amd64).
should we check kernel versions in filestore and magically disable f...
- 02:20 PM Bug #2328 (Resolved): osd: mapext/fiemap doesn't work for small extents
- If you query the mapping for an extent that is inside a larger allocated extent, the fiemap ioctl won't tell you: ...
- 11:15 AM CephFS Bug #2218: CephFS "mismatch between child accounted_rstats and my rstats!"
- Logs from a clean cluster at http://matthew.royhousehold.net/cephLogs/cephLogs.mds.tar (4382MB md5 aaf9364c7e35bc6b5d...
04/20/2012
- 09:18 PM Feature #2327 (Resolved): mon: use external keyring for inter-mon auth
- Currently the mon. key is part of the internal mon auth database. It can't be modified without a running mon cluster...
- 09:17 PM rbd Feature #2326 (Resolved): krbd: use new class interfaces, new image format
- Update rbd_types.h to match the userspace version, and add support for opening new-format images while keeping suppor...
- 09:04 PM Feature #2325 (Resolved): setup new email/etc
- 08:32 PM Linux kernel client Tasks #2138 (Resolved): rbd: run xfstests on a local XFS filesystem over RBD
- I have two files that implement automated testing using
xfstests over rbd devices.
One is now in the ceph git tre...
- 08:28 PM Linux kernel client Bug #2287: rbd: crashes with 10Gbit network and fio
- A kernel dump would likely help, but there's no guarantee because
of the delayed execution of the operation. It wou...
- 07:38 PM Linux kernel client Bug #2287: rbd: crashes with 10Gbit network and fio
- Would a kernel core dump help here?
- 07:37 PM Linux kernel client Bug #2287: rbd: crashes with 10Gbit network and fio
- This is one of a family of bugs we've been trying to understand.
Here is another one:
http://tracker.newdream.n...
- 08:00 PM Linux kernel client Bug #2243: btrfs: warning in orphan_commit_root
- I mentioned this somewhat informally to Chris Mason last week. I
provided him the message, and he said:
Well, ...
- 07:39 PM Feature #2251 (In Progress): rgw long run workloads
- 07:37 PM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
- Another report, likely related:
http://tracker.newdream.net/issues/2287
I don't understand it well enough yet...
- 07:37 PM rgw Documentation #1813 (In Progress): doc: document radosgw api diffs with s3
- 07:35 PM Subtask #825 (In Progress): osd: remove pg map updating from handle_osd_map
- 07:35 PM Feature #2314 (Fix Under Review): remove localized pgs
- wip-lpg
Did a basic test of a cluster with localized pgs and upgraded to this, no problems. A reasonably thorough...
- 07:12 PM Linux kernel client Bug #2298 (Resolved): rbd: broken encode_op for big-endian hosts?
- This has been fixed. I have been testing it in a private branch
and will shortly be updating the ceph-client testin...
- 07:10 PM Linux kernel client Bug #2242 (Resolved): rbd: spinlock on wrong cpu
- This was fixed a couple of weeks ago, and the result has been committed
both to the testing and master branches of t...
- 05:00 PM Bug #2324 (Resolved): osd: assert("q.empty()") failed in OpSequencer destructor
- sam fixed this in commit:888a082f23974b1f7a63f302e29a326182e7dc41
- 03:56 PM Bug #2324 (Resolved): osd: assert("q.empty()") failed in OpSequencer destructor
- This is consistently reproducible with 2 osds started by vstart on vit, but only happens intermittently with 1 osd.
...
- 04:32 PM Feature #2245 (Resolved): rgw long run ceph install
- 03:09 PM Bug #2079 (Duplicate): rbd: creating a snapshot with the same name doesn't return an error
- i think this was caused by the rados class return values. in any case, it works correctly now.
- 03:07 PM Bug #2084 (Can't reproduce): segfault in tcmalloc
- 12:16 PM Feature #1618 (Resolved): libvirt: make sure migration works
- we demoed this
- 12:05 PM Feature #2323 (Resolved): osd: limit 'old request' messages generated
- If there are hundreds of old requests queued, let's say that, instead of generating gigabytes of logs.
Maybe a simp...
- 10:44 AM Bug #2322 (Resolved): osd/ReplicatedPG.cc: 3832: FAILED assert(!object_contexts.size())
- ...
04/19/2012
- 09:49 PM Bug #2291 (Can't reproduce): objectcacher perfcounters don't work with test_librbd_fsx
- This works fine for me. Maybe it was fixed when the objectcacher naming thing was changed?...
- 09:17 PM Messengers Cleanup #2150: repair the Simple/Messenger interface
- Looks good to me, provided it makes it through the regression suite without problems!
- 07:11 PM Messengers Cleanup #2150: repair the Simple/Messenger interface
- wip-msgr-interface
- 09:08 PM Feature #2321 (Resolved): osd: investigate memory consumption from peering backlog
- 09:07 PM Feature #2320 (Duplicate): mon: detect and throttle osd flapping
- 09:07 PM Feature #2319 (Resolved): mon: block osd mark-down
- 09:07 PM Feature #2318 (Resolved): mon: block osd boot
- 09:06 PM Feature #2317 (Resolved): mon: pause/unpause auto-mark-out
- 04:37 PM rgw Feature #2313 (Resolved): rgw: expose extra bucket info trough S3 api
- Already pushed a test to the s3-tests functional.
- 04:33 PM rgw Feature #2313 (In Progress): rgw: expose extra bucket info trough S3 api
- wip-2313 looks sane.. let's add a test and merge for 0.46.
- 04:25 PM rgw Bug #1681 (In Progress): rgw: user rm with --purge doesn't remove data
- merged the radosgw-admin change to make it fail. let's add a test for it and then close this bug.
- 04:17 PM Feature #2296 (Fix Under Review): librbd: allow resizing to arbitrary sizes
- see wip-discard
- 02:13 PM Feature #2296: librbd: allow resizing to arbitrary sizes
- more importantly, we need to either error out with EINVAL if it's not a block multiple, or do it... currently we sile...
- 02:37 PM Bug #2307 (Resolved): OSD & Monitor disagree on the contents of pg_temp
- Just changing the pg_num and pgp_num did fix it up, so with the osdmap workaround we should be all good now.
- 02:10 PM Feature #1044 (Fix Under Review): librbd: discard support
- 12:47 PM Bug #2263 (Resolved): obsync: move man page to section 1
- 12:45 PM Bug #2311 (Need More Info): rbd: delete + create image led to EEXIST
- 07:20 AM Bug #2311 (In Progress): rbd: delete + create image led to EEXIST
- Can you generate a log? Ideally 'debug ms = 1'?
Also, attach the output of 'ceph --show-config'?
Thanks!
- 02:43 AM Bug #2311: rbd: delete + create image led to EEXIST
- Hi Sage,
uhm, not solved yet as per ceph version 0.45-207-g3053e47 (commit:3053e4773bae93cfa3158882aa4963803862f9b...
- 12:44 PM Bug #2262 (Resolved): qa: osd-recovery tasks fails on flush_pg_stats
- fixed by teuthology commit:6a58314d4627d106c5fd6df186e191c19a01f64b
- 10:47 AM Bug #2192: ceph-mon hangs consuming 100% CPU
- It was some 3.0.0 Ubuntu kernel, backed by btrfs.
- 10:06 AM Bug #2192: ceph-mon hangs consuming 100% CPU
- I missed this when it came in, and I don't know where the 100% CPU usage is coming from, but the hung filesystem soun...
- 03:19 AM Bug #2316 (Resolved): rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g30...
- Hi,
in my current test-setup all four VM's are started with rbd_cache parm. After all VM's are started and began t...
04/18/2012
- 10:51 PM Bug #2263: obsync: move man page to section 1
- 10:42 PM CephFS Bug #2293 (Resolved): admin sockets don't persist with ceph-fuse
- commit:e82c33099a0efda027bc7fa991dcd2073baea539
- 09:46 PM rgw Bug #2027: rgw -> apache miscommunication
- Not completely unlikely. We can set it to "can't reproduce", and reopen if we see it again.
- 06:12 PM rgw Bug #2027: rgw -> apache miscommunication
- do we think this is fixed now by the rgw throttling?
- 06:00 PM Bug #2310: osd: too many open files
- failed to capture a full strace.. try it again (once we find a failing osd on congress) with
strace -e trace=open,... - 09:48 AM Bug #2310 (Resolved): osd: too many open files
- ...
- 04:39 PM Bug #2315 (Resolved): unrecognized admin socket command 'objecter_requests'
- From teuthology:/a/nightly_coverage_2012-04-18-a/1602/teuthology.log:...
- 03:57 PM rgw Bug #2312 (Resolved): rgw: create user and subuser in a single radosgw-admin command
- Fixed, commit:aab516da7f89310445be4e4fb61836084d2dac32.
- 02:01 PM rgw Bug #2312 (Resolved): rgw: create user and subuser in a single radosgw-admin command
- 03:41 PM Bug #2211 (Resolved): osd: entity_inst_t OSDMap::get_inst(int) const
- 03:41 PM Bug #2262 (In Progress): qa: osd-recovery tasks fails on flush_pg_stats
- 03:27 PM Feature #2314 (Resolved): remove localized pgs
- 02:47 PM rgw Feature #2313: rgw: expose extra bucket info trough S3 api
- Ok, let's just send those extra headers anyway. Otherwise we'd have some issue creating the request signature for the...
- 02:11 PM rgw Feature #2313 (Resolved): rgw: expose extra bucket info trough S3 api
- syntax:
HEAD /<bucket>
X-RGW-Params: extrainfo
extra response headers:
X-RGW-Object-Count: <object count>
X-RG...
- 02:23 PM Feature #2252 (Resolved): rgw long run kernels
- 02:22 PM Feature #2250 (Resolved): rgw long run raid config
- 02:14 PM Feature #2265 (Rejected): make sure objecter/kclient error out when localized pgs don't exist
- 01:58 PM rgw Feature #2308 (Resolved): radosgw-admin: make user create idempotent
- done, commit:5a6bbd0c473e15aa7642da367e7936015d19d77a.
- 01:46 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
- And I gave him a patched monitor so he could set pg_num, which should fix it. Waiting to hear back, and will apply th...
- 01:16 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
- pushed workaround that will repair osdmaps that saw your corruption, commit:eea982e56739a7a91ca907ccc5c5ec1f78d9460d.
- 01:30 PM Bug #2311 (Resolved): rbd: delete + create image led to EEXIST
- this is 'rbd writeback window' at its best. long live 'rbd cache'!
- 01:06 PM Bug #2311: rbd: delete + create image led to EEXIST
- Congrats on closing the annoying ticket #2178 :-D
Fair enough, to have a new one on this issue, here my last note...
- 12:46 PM Bug #2311: rbd: delete + create image led to EEXIST
- Is it possible there is some other user, or the logs are from the wrong cluster?
I see:
- client.13507 deletes 90...
- 12:45 PM Bug #2311 (Resolved): rbd: delete + create image led to EEXIST
- Here is a sequence copy-n-pasted:
rbd rm data/905-testdisk.rbd
Removing image: 100% complete...done.
rbd create ...
- 01:28 PM Bug #2286 (Resolved): mon: different full/near_full values on different monitors
- commit:9ef953b5e20c3d232cfe4aa90f26476a2a2f911b
- 11:18 AM Bug #2286 (Fix Under Review): mon: different full/near_full values on different monitors
- Check out wip-2286-ratio-a and see what you think. It fills in the ratios from g_conf on create_initial, only changes...
- 12:51 PM Bug #2178: rbd: corruption of first block
- Hi Sage,
sorry, I was not clear enough. The logfiles provide information for "907-testdisk.rbd..." not "906..."
Th...
- 12:46 PM Bug #2178 (Resolved): rbd: corruption of first block
- moved this new issue to #2311, and resolving this bug. hooray!
- 12:45 PM Bug #2178: rbd: corruption of first block
- Oliver Francke wrote:
> Here is a sequence copy-n-pasted:
>
> rbd rm data/905-testdisk.rbd
> Removing image: 100...
- 10:41 AM Bug #2178: rbd: corruption of first block
- Oliver Francke wrote:
> Hi Sage,
>
> here my notes, after almost 40 tests no bad things happened, only once a min...
- 07:22 AM Bug #2178: rbd: corruption of first block
- second logfile here, sorry.
- 07:18 AM Bug #2178: rbd: corruption of first block
- Here is a sequence copy-n-pasted:
rbd rm data/905-testdisk.rbd
Removing image: 100% complete...done.
rbd create ...
- 05:51 AM Bug #2178: rbd: corruption of first block
- Meanwhile continued to test...:
I noticed some negative degradation:
2012-04-18 14:43:37.282634 pg v128104: ...
- 05:36 AM Bug #2178: rbd: corruption of first block
- Hi Sage,
here my notes, after almost 40 tests no bad things happened, only once a minor hiccup, where the rbd-head...
- 08:22 AM Linux kernel client Bug #2298 (In Progress): rbd: broken encode_op for big-endian hosts?
- I sent a note to the various lists Al Viro posted to, to confirm the
bug (wasn't sure whether Sage had or not).
I...
- 05:58 AM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
- Interesting... The warning showed up again despite test 232 being
removed from the list. Based on the time stamp o...
04/17/2012
- 08:47 PM Bug #2286: mon: different full/near_full values on different monitors
- Greg Farnum wrote:
> Hmm. I looked at redoing this and got stuck on the semantics we want. If we're interested in fu...
- 04:52 PM Bug #2286 (In Progress): mon: different full/near_full values on different monitors
- Hmm. I looked at redoing this and got stuck on the semantics we want. If we're interested in full_ratio == 0 being an...
- 11:00 AM Bug #2286: mon: different full/near_full values on different monitors
- yeah. actually, i think the check should go in tick() inside the is_leader() block, and not update_from_paxos().
- 10:54 AM Bug #2286: mon: different full/near_full values on different monitors
- Oh, I see...I wasn't following that need_*_ratio_update stuff properly. And update_full_ratios() will be called on th...
- 10:30 AM Bug #2286: mon: different full/near_full values on different monitors
- Greg Farnum wrote:
> I'm looking at your patch and it doesn't make a lot of sense to me.
> First off, when do you t...
- 09:45 AM Bug #2286: mon: different full/near_full values on different monitors
- I'm looking at your patch and it doesn't make a lot of sense to me.
First off, when do you think that peon monitors ...
- 08:43 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
- Greg Farnum wrote:
> I'm confused how you're getting that pool_max printout — I don't see it at all when I run that ...
- 06:57 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
- I'm confused how you're getting that pool_max printout — I don't see it at all when I run that command with a ceph-de...
- 04:16 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
- at some point the osdmap pool_max got set to -1.
nine:2307 04:15 PM $ ~/src/ceph/src/ceph-dencoder type OSDMap i...
- 03:56 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
- nine:2307 03:56 PM $ osdmaptool osdmap_full/5754 -p | grep ^pool
pool 0 'data' rep size 2 crush_ruleset 0 object_has...
- 03:52 PM Bug #2307: OSD & Monitor disagree on the contents of pg_temp
- It looks to me like the 'data' pool (0) was deleted, and then a new one (vmimages) was created, but somehow that was...
- 10:34 AM Bug #2307 (Resolved): OSD & Monitor disagree on the contents of pg_temp
- See: http://marc.info/?t=133352732900001&r=1&w=2
It seems that (for example) pg 0.138 is in pg_temp, but the OSD c...
- 03:03 PM Feature #2309 (Duplicate): rados namespaces
- 01:23 PM rgw Bug #2289 (Resolved): rgw: listing a bucket hangs after removing inexisting object
- Fixes merged into master at commit:3053e4773bae93cfa3158882aa4963803862f9b2.
- 01:13 PM Bug #2304 (Resolved): rbd import fails on block device
- 11:57 AM CephFS Bug #2299 (Rejected): all MDS commit suicide on startup
- 11:54 AM Bug #2219 (Can't reproduce): OSD's commit suicide with 0.44
- Let us know if you see this again! Thanks
- 11:40 AM Bug #2301 (Resolved): librados: LibRadosMisc.AioOperatePP failure
- 11:27 AM rgw Feature #2308 (Resolved): radosgw-admin: make user create idempotent
- radosgw-admin user create should be idempotent and work similar to user modify. We would need to verify that the same...
- 08:11 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
- I believe we are seeing the same problem here. I have been able to reproduce it each time I have tried. The hardwar...
- 06:36 AM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
- I have updated the run_xfstests.sh script so that it simply no longer
runs test 232. That way we can still benefit ...
04/16/2012
- 10:02 PM Bug #2178: rbd: corruption of first block
- The most recent occurrence has been confirmed to be a replay issue with non-btrfs filesystems. The wip-guard branch ...
- 09:54 PM Bug #2255 (Resolved): osd: fix object name collisions between pools in temp collection
- 09:52 PM Bug #2286: mon: different full/near_full values on different monitors
- pushed a patch that confines the logic of when to update this into a single bit of code. look okay?
i think the b...
- 12:57 PM Bug #2286: mon: different full/near_full values on different monitors
- Sage asked on IRC about just setting it up on the initial create_empty. The problem with that is that the only data which is ...
- 11:26 AM Bug #2286 (Fix Under Review): mon: different full/near_full values on different monitors
- This got (obviously) broken by commit:b6d1c0c9b7290a237560528b6ff0d6b2b2998ee2, which put in the use of magic numbers...
- 09:37 PM Feature #2113 (Resolved): objectcacher perfcounters
- 11:24 AM Feature #2113: objectcacher perfcounters
- My bad — I'll try and do that today!
- 11:13 AM Feature #2113 (Fix Under Review): objectcacher perfcounters
- not merged yet! i wanted to get feedback first on my naming kludge...
- 10:02 AM Feature #2113 (Resolved): objectcacher perfcounters
- Sage merged this.
- 04:12 PM Bug #2306: objecter: accessing empty object maps to pool 0
- that looks right to me.
and yeah, i don't think object operations should be possible on an empty object name...
- 04:03 PM Bug #2306: objecter: accessing empty object maps to pool 0
- Yep, that's pretty much exactly what I was thinking.
The only other question is if this fix is the right approach ...
- 04:00 PM Bug #2306: objecter: accessing empty object maps to pool 0
- Would something like this work (not tested)?...
- 03:52 PM Bug #2306: objecter: accessing empty object maps to pool 0
- i prefer an explicit separate field for oid-vs-pg mode so that we can distinguish between pg 0.0 (really) and no pg/n...
- 03:07 PM Bug #2306: objecter: accessing empty object maps to pool 0
- Ah, nope. list_objects is broken.
- 03:06 PM Bug #2306: objecter: accessing empty object maps to pool 0
- From what I see, the pg ops call pool_op_submit() and not op_submit() so Greg's fix might be ok?
- 02:53 PM Bug #2306: objecter: accessing empty object maps to pool 0
- Ah, you're right. I missed that function when looking to see who filled in the op->pgid.
In that case we should ma...
- 02:33 PM Bug #2306: objecter: accessing empty object maps to pool 0
- i think that if was there for the pg ops (PGLS) where there is no object... the list_objects code is filling in the p...
- 02:07 PM Bug #2306 (Fix Under Review): objecter: accessing empty object maps to pool 0
- Yep, the Objecter doesn't calculate pg placement for objects with a zero-length name. I'm pretty sure the if guard th...
- 01:51 PM Bug #2306: objecter: accessing empty object maps to pool 0
- Empty object <== object with empty name
- 01:51 PM Bug #2306 (Resolved): objecter: accessing empty object maps to pool 0
- Even if different pool was specified.
- 03:34 PM CephFS Bug #2299: all MDS commit suicide on startup
- this issue can be closed; there was an error in the underlying filesystem of osd.0 :)
- 02:59 PM CephFS Bug #2277: qa: flock test broken
- I was going to move this over to the kernel client project and then realized I can't — should we close this bug (reje...
- 02:46 PM CephFS Bug #2277: qa: flock test broken
- ...
- 02:15 PM Linux kernel client Bug #2298: rbd: broken encode_op for big-endian hosts?
- there are some old g5's in the closet here at aon that we can use.
in the past we've found/fixed these issues with...
- 01:46 PM Linux kernel client Bug #2298: rbd: broken encode_op for big-endian hosts?
- I haven't looked at this in any detail but I presume Al is correct.
We don't have any big endian hardware anywhere, ...
- 01:49 PM CephFS Bug #2288: libcephfs: setxattr returns EEXIST following removexattr
- More info:
That branch has a patch which adds a call to removexattr before the setxattr. If you run testceph twice i...
- 01:38 PM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
- Should have waited. I have reproduced the problem by running test 232.
- 01:37 PM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
- After a lot of repetitions, I've narrowed it down to test 232 or 234.
- 10:12 AM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
- I ran subsets of that list at least three times and never
reproduced it. I tried again after a reboot, and again,
...
- 06:43 AM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
- Looking at the list of tests that indicate they include quota testing,
the ones that are currently being run by the ...
- 06:36 AM Linux kernel client Bug #2302: xfs: warning at mutex_remove_waiter
- I sent a report to the XFS mailing list about the warning. I have to try
to narrow down which test was running when...
- 01:18 PM CephFS Bug #2285: libcephfs: failure with empty name components
- Yep, it's client-local; there's no request to the MDS for this either.
Guess that means we don't care right now?
- 01:04 PM CephFS Bug #2285 (In Progress): libcephfs: failure with empty name components
- Oddly, this looks like it's a race. I can't reproduce it with any client debugging on...
- 11:33 AM Feature #2305: Moving rbd images between pools
- Not quite; copy works, but slowly (because of course it's duplicating all the data). I don't know if mv/rename could...
- 11:24 AM Feature #2305 (Rejected): Moving rbd images between pools
- We discovered it does work if you keep the image names the same and vary the pool names. :)
- 11:01 AM Feature #2305 (Resolved): Moving rbd images between pools
- It would be nice to have an option to move rbd's between pools with a syntax like:
rbd mv <first poolname>/<image na...
- 10:02 AM Messengers Cleanup #2150 (In Progress): repair the Simple/Messenger interface
- Not really done! ;)
- 08:44 AM rbd Feature #2297: ObjectCacher: mark buffers mergeable for ksm
- I'm really not sure this is something we want to do, especially unconditionally. Let's wait until we get some idea of...
- 07:27 AM Bug #2304 (Resolved): rbd import fails on block device
- root@burnupi30:~# rbd import /dev/sda burnupi30.sda
fiemap ioctl() failed
Importing image: 100% complete...done.
...
04/15/2012
- 08:30 PM Bug #2303 (Can't reproduce): osd: failed to peer on startup
- ubuntu@teuthology:/a/nightly_coverage_2012-04-14-b/994
- 08:24 PM Linux kernel client Bug #2302 (Can't reproduce): xfs: warning at mutex_remove_waiter
- ...
- 03:33 PM Feature #1044 (In Progress): librbd: discard support
- 03:33 PM Feature #2163 (Resolved): qa: full xfstests on rbd
- 03:33 PM Subtask #2249 (Resolved): teuthology task (3)
- 03:33 PM Feature #2226 (Resolved): osd: better filestore idempotency test
- 05:39 AM Linux kernel client Bug #2287: rbd: crashes with 10Gbit network and fio
- Here some more info from the crash:
[58113.180039] libceph: tid 387083 timed out on osd92, will reset osd
[5818...
04/14/2012
- 08:07 PM CephFS Bug #2299: all MDS commit suicide on startup
- after i told osd.0 to get lost and reformatted it, the cluster started resyncing.
then (magically) mds.0 started up ... - 09:39 AM CephFS Bug #2299 (Rejected): all MDS commit suicide on startup
- my setup is: 1 MON, 2 MDS and 4 OSD.
ceph version is commit:1e76a8713feac6883c648512dcdc28c83f7ff69e.
after copyi...
- 04:41 PM Bug #2301: librados: LibRadosMisc.AioOperatePP failure
- the problem is that the completion callback is now async, but wait_for_complete() is not.
do we think that is ok?
- 02:59 PM Bug #2301 (Resolved): librados: LibRadosMisc.AioOperatePP failure
- 2012-04-14T00:11:00.763 INFO:teuthology.task.workunit.client.0.out:[ RUN ] LibRadosMisc.AioOperatePP
2012-04-14...
- 01:58 PM Bug #2300 (Rejected): objecter: not sending stat request
- 01:50 PM Bug #2300: objecter: not sending stat request
- Ah, actually we try to access an object with empty oid, which is obviously wrong. Probably due to #2289 issues.
- 12:34 PM Bug #2300 (Rejected): objecter: not sending stat request
- Happens in rgw (can only see it on congress). Following a rgw.bucket_list call response, we call librados io_ctx->sta...
- 12:38 PM rgw Bug #2289: rgw: listing a bucket hangs after removing inexisting object
- Pushed several fixes to wip-2289. The scenario was:
creating bucket
trying to remove object that does not exist
...
04/13/2012
- 11:03 PM Feature #1044 (Resolved): librbd: discard support
- 11:03 PM Feature #2163: qa: full xfstests on rbd
- 11:02 PM Feature #2052 (Resolved): librbd: caching
- 06:37 PM Feature #2052: librbd: caching
- This is passing long-running fsx with osd thrashing consistently, and all the other rbd tests. I think the branch (wi...
- 10:40 PM Linux kernel client Bug #2298 (Resolved): rbd: broken encode_op for big-endian hosts?
- ...
- 10:17 PM Subtask #2249: teuthology task (3)
- 09:26 PM Subtask #2237 (Resolved): failure+replay tester (8)
- 06:39 PM Bug #2278 (Resolved): librados: python read has arguments swapped
- Fixed by 76799680546a79fc73ad7bbc58960a31ae2290ad.
- 10:10 AM Bug #2278: librados: python read has arguments swapped
- 07:56 AM Bug #2278 (Resolved): librados: python read has arguments swapped
- Object.read from rados.py is passing arguments to ioctx.read in a wrong order.
--- rados.py.dist 2012-04-13 16:5...
- 06:38 PM rbd Feature #2297 (New): ObjectCacher: mark buffers mergeable for ksm
- This is done with a simple madvise call, but we should test that it works with ksm and verify that all the buffers ar...
- 06:29 PM Feature #2296 (Resolved): librbd: allow resizing to arbitrary sizes
- Right now resizing to a non-object-size multiple will round down the remainder. With discard support, we support this...
- 06:25 PM Feature #2295 (Resolved): make qemu cache=writeback,writethrough option turn on librbd caching
- This will enable more familiar use of caching with qemu/rbd, and let people configure it with libvirt's existing xml.
- 05:51 PM rbd Feature #2294 (New): librbd: optionally cache entire objects, instead of only requesting the part...
- This may save many round trips for small read sizes (common to vms).
- 05:49 PM Feature #2113 (Fix Under Review): objectcacher perfcounters
- Okay, I checked and these work — if you run ceph-fuse -f and play around you can do a dump_perfcounters and see the v...
- 05:46 PM CephFS Bug #2293 (Resolved): admin sockets don't persist with ceph-fuse
- It looks like the admin socket is associated with the launching process, rather than the background process that cont...
- 05:46 PM rbd Feature #2292 (New): ObjectCacher: support sparse objects
- The ObjectCacher doesn't store which objects or parts of objects don't exist. This info could improve read performance.
- 05:46 PM Bug #2291 (Can't reproduce): objectcacher perfcounters don't work with test_librbd_fsx
- The admin socket perfcounters_dump command only outputs objecter data. I'm speculating that it has to do with the obj...
- 05:30 PM Feature #2290 (Resolved): ObjectCacher: handle read/write errors
- Currently the return value of the underlying read/write calls is ignored (I left TODO notes there). We should figure ...
- 05:02 PM rgw Bug #2289 (Resolved): rgw: listing a bucket hangs after removing inexisting object
- 03:12 PM Subtask #2235 (Resolved): generate deterministic sequence of transactions (5)
- 02:01 PM CephFS Bug #2288 (Resolved): libcephfs: setxattr returns EEXIST following removexattr
- running cephtest a couple of time (out of wip-testlibcephfs):...
- 01:48 PM Linux kernel client Bug #2287 (Resolved): rbd: crashes with 10Gbit network and fio
- From http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/5968:...
- 01:47 PM Bug #2286 (Resolved): mon: different full/near_full values on different monitors
- If you run vstart, you get...
- 01:39 PM CephFS Bug #2285 (Resolved): libcephfs: failure with empty name components
- the following in client/testceph.cc fails:
// test empty name components
my_fd = ret = ceph_open(cmount, "rea...
- 11:19 AM rgw Feature #2284 (Resolved): rgw: bench based on rados_bench
- 11:17 AM rgw Feature #2171 (Rejected): rgw: asynchronously calculate md5
- 11:16 AM Feature #2283: The ceph command should time out
- 10:21 AM Feature #2283 (New): The ceph command should time out
- When using ceph to query certain parts of the cluster, there should be an option to time out after a certain set numb...
- 09:44 AM Subtask #2282 (Resolved): Handle map updates on a per-pg basis
- Currently, we advance all pgs to the next map at once. This requires us to flush the filestore queue and basically h...
- 09:27 AM Feature #2281 (Resolved): build big burnupi cluster for testing
- 09:23 AM Feature #2280 (Resolved): improve gitbuilder infrastructure
- * do not fill up local disk; sync results out immediately
* resolve branches immediately, not after each full pass
- 09:20 AM rbd Feature #2279 (Resolved): rbd: trivial layering design doc
- - how parent images are marked read-only
- how parent/child relationship is represented
- possibly how this allow...
- 09:16 AM Bug #2192 (Need More Info): ceph-mon hangs consuming 100% CPU
- 09:14 AM Feature #2246 (Resolved): force10s on sepia
- 09:13 AM Messengers Cleanup #2150 (Resolved): repair the Simple/Messenger interface
- 09:13 AM Feature #2240 (Resolved): osd: new default locations
04/12/2012
- 11:17 PM Subtask #2237 (In Progress): failure+replay tester (8)
- 11:17 PM Subtask #2235: generate deterministic sequence of transactions (5)
- 11:15 PM Feature #2240: osd: new default locations
- 10:58 PM CephFS Bug #2277 (New): qa: flock test broken
- ubuntu@teuthology:/a/nightly_coverage_2012-04-12-b/687
ubuntu@teuthology:/a/nightly_coverage_2012-04-11-b/525
thi...
- 10:48 PM CephFS Bug #1737: ceph-fuse crash in xlist::remove
- ubuntu@teuthology:/a/nightly_coverage_2012-04-12-b/717
- chef: null
- ceph: null
- ceph-fuse: null
- workunit:
...
- 10:45 PM CephFS Bug #2187: pjd chown/00.t failed test 97
- 2012-04-12T13:09:27.496 INFO:teuthology.task.workunit.client.0.out:../pjd-fstest-20080816/tests/chown/00.t (Wstat: ...
- 10:35 PM Bug #2276 (Rejected): osd: eat cpu on restart
- osd.856 on congress.
- 09:35 PM Bug #2275 (Resolved): osd: crash in FileJournal::wrap_read_bl
- ...
- 04:29 PM Documentation #2274 (Closed): Basic Availability Model
- (1) Construct a continuous-time Markov availability model for a basic cluster (3 mons, 4 osds, 2 copy)
(Petri ne...
- 04:19 PM Documentation #2273 (Closed): basic reliability models
- 1. construct a probabilistic model for data loss in 1, 2, and 3 copy systems, assuming independent failures
2. plug ...
- 04:13 PM RADOS Documentation #2272 (Closed): FAQs: RADOS reliability and availability
- I expect others to improve this, but this is just to capture the ideas.
It is probably more of a white paper than an...
- 04:06 PM Documentation #2271 (Resolved): FAQ: BTRFS vs XFS
- I expect others to improve this list, but to start it out ...
what file systems we run on (and test on)
how you...
- 12:15 PM Feature #2223 (Resolved): Tracing facility on FileStore
- 09:05 AM RADOS Feature #2268 (Resolved): crush: update item's position in crush map
- via crushtool and 'ceph osd crush ...'
- 03:55 AM Bug #2267 (Closed): Ceph client crashed after shutting down one mds and osd
- Ceph version: 0.44.1-1~bpo70+1
Kernel version: 3.2.12-1
Ceph config:
[global]
auth supported = cephx
keyri...
04/11/2012
- 06:18 PM Messengers Cleanup #2150 (In Progress): repair the Simple/Messenger interface
- I haven't done it, but I had enough time to glance over it and see at least a couple things that need fixing before t...
- 05:49 PM Feature #2113: objectcacher perfcounters
- Sage asked me to run it under an rbd mount and look at it. Need to get tests from Josh and then figure out how to do ...
- 04:30 PM Feature #2113 (Fix Under Review): objectcacher perfcounters
- Compile-tested.
- 10:51 AM Feature #2113 (In Progress): objectcacher perfcounters
- Yoink.
- 04:30 PM Bug #2266 (Resolved): teuthology: nuke after failure is failing
- it fails, and then fails to unlock, and eats up machines.
for example, ubuntu@teuthology:/a/nightly_coverage_2012-...
- 03:08 PM Feature #2265 (Rejected): make sure objecter/kclient error out when localized pgs don't exist
- 11:02 AM Bug #2264 (Can't reproduce): mon: failed assert in bump_epoch
- During startup of a teuthology run on commit 1775301bb46379648f3f88914ef56aa1982db020 (before the cluster was healthy...
- 10:48 AM Bug #2263 (Resolved): obsync: move man page to section 1
- 09:25 AM Bug #2262 (Resolved): qa: osd-recovery tasks fails on flush_pg_stats
- consistently
- 08:09 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
- Looks like the problem arose while running fsstress on the xfs loop
mount on top of a file on the ext2 filesystem.
...
- 07:56 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
- FYI, xfstests 49 tests running XFS on a loop device. I have to wait for a
reboot in order to see if I can tell at w...
- 07:49 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
- Looks like xfstests #49 is a reproducer for this problem, at least
after running the tests that lead up to it first ...
- 05:29 AM Linux kernel client Bug #2261 (In Progress): paging error in libceph after crashed osd comes back online
- 05:22 AM Linux kernel client Bug #2261 (Can't reproduce): paging error in libceph after crashed osd comes back online
- ...
- 02:25 AM Bug #2178: rbd: corruption of first block
- Well Sage,
I have a torture-test already :-D
OK, so it's independent from yours and that's good. It sounds, we ar...
04/10/2012
- 11:24 PM Feature #2223: Tracing facility on FileStore
- did some cleanup, changed the way the output is structured wrt the transaction lists, and tweaked a few other things....
- 10:23 PM Bug #2002 (Resolved): osd: racy push/pull for clones
- 10:19 PM Bug #2161 (Resolved): nonlinear scaling for PGMap::pg_stat encode
- commit:bd518e998c0ff12d611db19a8cff6da3622597cb
- 10:18 PM Bug #1953 (Resolved): teuthology: core files aren't archived when using valgrind
- it works!
- 10:10 PM Bug #2225 (Resolved): gitbuilder.ceph.com returning 503: Service Temporarily Unavailable.
- Yehuda found the bad apache option.. override it in the domain_service (maxconnperip=1000 param)
- 09:49 PM Messengers Cleanup #2150 (Resolved): repair the Simple/Messenger interface
- 09:49 PM Feature #1044 (Fix Under Review): librbd: discard support
- 09:04 PM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
- I'm going to have to look at this again in the morning, but I think
we're in this block of code:
#ifdef CONFIG_BL...
- 08:37 PM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
- Here's a disassembled block of the code where the fault occurred.
The address listed corresponds to offset 3468 belo...
- 08:10 PM Linux kernel client Bug #2260 (Resolved): libceph: null pointer dereference at try_write+0x638+0xfb0
- It's not an exact match but it's close enough that I wanted to reopen
bug 1793 or 1866, but found myself unable to. ...
- 03:27 PM Feature #2246: force10s on sepia
- Fabric brought up by Networking group. Interfaces up, configured, and working (nuttcp shows 9.5Gb/s or so with
defa...
- 01:26 PM Feature #2111: msgr workloads
- I think the messenger tester may be at a point where we can call this bug satisfied.
- 01:18 PM Bug #2178: rbd: corruption of first block
- the good news is i see the problem. the bad news is it's the exact bug we thought we fixed. the other good news is w...
- 07:38 AM Bug #2178: rbd: corruption of first block
- Hi Sage,
just in case, the reply from yesterday did not reach you:
--- 8-< ---
Good morning,
it's already...
- 12:27 PM Feature #2258 (Resolved): use external leveldb package
- autoconf lets you use the installed library. not doing so by default to avoid the pain of building on older distros.
04/09/2012
- 04:30 PM rgw Bug #2259 (Resolved): rgw: object name cut after slash when virtual host style is used
- Fixed, commit:8d5c87a86e070b4e95ef0d58a469bdbbef4a826c.
- 03:42 PM rgw Bug #2259 (Resolved): rgw: object name cut after slash when virtual host style is used
- 09:32 AM Bug #2178: rbd: corruption of first block
- The missing piece of information is mapping the file offset to a block device offset. Can you, inside the VM,...
04/08/2012
- 09:53 PM Feature #2258 (Resolved): use external leveldb package
- - make our configure take/require a --with-system-leveldb or similar to not use the bundled leveldb
- update the deb...
- 08:31 AM Bug #2178: rbd: corruption of first block
- Hi Sage and *Happy easter*,
yesterday I had some "luck" after 10 tries....
Here is what I have for you:
first ...
04/06/2012
- 09:27 PM Feature #1692 (Duplicate): librbd: Support TRIM (hole punching) (userspace client)
- dup of #1044
- 03:47 PM rgw Feature #2257 (Rejected): rgw: detect fastcgi module 100-continue support automatically
- The current default that is used doesn't work with vanilla fastcgi module. It'd be great if that could be set automat...
- 02:46 PM rbd Feature #2256 (Resolved): rbd: parallelize deletions
- There are a few places where we delete things one at a time: resizing to a smaller size, deleting all snapshots, and ...
- 02:04 PM Feature #2240 (Fix Under Review): osd: new default locations
- wip-defaults
- 12:05 PM Bug #2161: nonlinear scaling for PGMap::pg_stat encode
- wip-encoding
- 09:18 AM Bug #2161: nonlinear scaling for PGMap::pg_stat encode
- Ake van der Meer wrote:
> My ceph-osd processes run at 100% CPU for many minutes at a time doing this: http://pasteb...
- 08:25 AM Bug #2161: nonlinear scaling for PGMap::pg_stat encode
- My ceph-osd processes run at 100% CPU for many minutes at a time doing this: http://pastebin.com/wYnPKWeJ
In src/i...
- 10:05 AM Feature #2246 (In Progress): force10s on sepia
- Ports being mapped yesterday and today in preparation for switch config review.
- 09:21 AM Bug #2255 (Resolved): osd: fix object name collisions between pools in temp collection
- 08:28 AM Feature #2223: Tracing facility on FileStore
- Made some changes to the ObjectStore.cc, regarding code duplication of the transaction's dump methods. Feedback would...
04/05/2012
- 02:21 PM Feature #2248 (Resolved): cluster naming
- 02:20 PM Subtask #2236 (Resolved): filestore failure injection (3)
- wip-filestore-failure
I don't think enumerating/identifying the callers is needed here. For the idempotency teste...
- 01:19 PM Feature #2226: osd: better filestore idempotency test
- Thought about this a bit more. The filestore failure injection is easiest to implement with an _exit(1) or something,...
- 01:13 PM Feature #1890 (Resolved): log: async log writeout
- 01:13 PM Feature #1889 (Resolved): log: structure log records
- 12:30 PM Feature #2254 (Resolved): doc: cephx
- pending improved documentation:
* what is, and is not, protected
* how to convert/upgrade a non-cephx cluster to cephx (e...
- 12:22 PM Subtask #2235 (In Progress): generate deterministic sequence of transactions (5)
- 10:51 AM Bug #2178: rbd: corruption of first block
- Ok, my attempts to parse the log to find out of order replies is quickly snowballing. (complexity of dropped replies...
- 08:21 AM Bug #2178: rbd: corruption of first block
- Oliver Francke wrote:
> Uhm...
>
> ... I thought, we were talking about the same issue since the very beginning.....
- 01:25 AM Bug #2178: rbd: corruption of first block
- Uhm...
... I thought, we were talking about the same issue since the very beginning... corruption of .rbd-blocks.....
04/04/2012
- 11:11 PM Feature #2248 (Fix Under Review): cluster naming
- 11:00 AM Feature #2248: cluster naming
- - new command line arg (-C, --cluster)
- controls default config files
- becomes another subst ($cluster) to be use...
- 10:38 AM Feature #2248 (Resolved): cluster naming
- 04:09 PM Bug #2253 (Resolved): rados import: uploaded objects are empty
- Fixed, commit:0df6fbd3a66741ad02c7556b0c4026dc3577d797.
- 03:37 PM Bug #2253 (Resolved): rados import: uploaded objects are empty
- 03:33 PM rgw Documentation #1813: doc: document radosgw api diffs with s3
- We'd like to have it for the current sprint, or at least no later than the next sprint. 5/1 as an upperbound target d...
- 12:45 PM Bug #2233: Throttle when there are lots of large conccurent IOs
- Yeah, it's the failing gracefully bit that I'm interested in. :)
- 12:38 PM Bug #2233: Throttle when there are lots of large conccurent IOs
- Just the rados bench tool itself is allocating 16GB to feed into librados.
Now that you mention it, librados might...
- 12:29 PM Bug #2233: Throttle when there are lots of large conccurent IOs
- Aha! The plana nodes appear to only have 8GB of ram and 8GB of swap.
Is the allocation of that memory part of libra...
- 11:20 AM Linux kernel client Bug #2242: rbd: spinlock on wrong cpu
- OK, I think this problem arises because of the switch to a spinlock to
protect the client list. Doing so was the ri...
- 09:53 AM Linux kernel client Bug #2242 (Resolved): rbd: spinlock on wrong cpu
- ...
- 11:19 AM Bug #2178: rbd: corruption of first block
- Oliver Francke wrote:
> Hi Sage,
>
> I was talking about the verbose logfiles from monday. TBH, I don't expect Ba...
- 10:32 AM Bug #2178: rbd: corruption of first block
- Hi Sage,
I was talking about the verbose logfiles from monday. TBH, I don't expect BadThings without "rbd_writebac...
- 09:49 AM Bug #2178: rbd: corruption of first block
- Oliver Francke wrote:
> Whew, that was fast,
>
> after second run I had some errors in one file with:
> [osd]
>...
- 07:01 AM Bug #2178: rbd: corruption of first block
- Whew, that was fast,
after second run I had some errors in one file with:
[osd]
filestore fiemap threshol...
- 05:43 AM Bug #2178: rbd: corruption of first block
- Well Sage,
it's harder these days to reproduce, cause I think the current version has made "something more stable"(...
- 10:57 AM Feature #2252 (Resolved): rgw long run kernels
- 10:54 AM Feature #2251 (Resolved): rgw long run workloads
- 10:53 AM Feature #2250 (Resolved): rgw long run raid config
- 10:47 AM Subtask #2249 (Resolved): teuthology task (3)
- 10:35 AM Feature #2246 (Resolved): force10s on sepia
- 10:32 AM Feature #2245 (Resolved): rgw long run ceph install
- 10:29 AM Messengers Feature #2244 (New): msgr: performance tester
- 09:54 AM Linux kernel client Bug #2243 (Resolved): btrfs: warning in orphan_commit_root
- 2012-04-04T01:02:59.191518-07:00 plana32 kernel: [ 8815.371555] ------------[ cut here ]------------
2012-04-04T01:0...
- 09:45 AM Feature #2241 (Rejected): upstart
- 09:45 AM Feature #2240 (Resolved): osd: new default locations
- 09:42 AM Subtask #2239 (New): install + configure package everywhere
- chef!
- 09:42 AM Subtask #2238 (Rejected): vm for coredump archive
- 09:41 AM Subtask #2237 (Resolved): failure+replay tester (8)
- 09:39 AM Subtask #2236 (Resolved): filestore failure injection (3)
- add a hook to operations that we want to potentially fail.
need to identify the caller so that the tester can pote...
- 09:38 AM Subtask #2235 (Resolved): generate deterministic sequence of transactions (5)
- 09:22 AM Bug #2234 (Resolved): Sometimes 'ceph -s' is unable to show pg data and crashes
- ceph -s / ceph -w sometimes gives me output as below:...
- 09:15 AM CephFS Feature #1237: mds caps limit mount to some subdir
- Nope — as with all the other MDS stuff, this is currently not a priority.
- 07:10 AM CephFS Feature #1237: mds caps limit mount to some subdir
- Is there any progress on this issue?
04/03/2012
- 10:37 PM Messengers Bug #1674 (Need More Info): daemons crash when sent random data
- FWIW I was unable to reproduce this with the current code, with or without cephx enabled.
- 10:07 PM Bug #1627 (Can't reproduce): ceph-mon memleak if ceph-osd cluster ip is not reachable, but public...
- 04:52 PM rgw Bug #1681: rgw: user rm with --purge doesn't remove data
- Maybe we should disallow removal of user that has data? We can suspend it instead.
- 03:57 PM Bug #1921 (Resolved): teuthology: silently continues when len(targets) != len(roles)
- 02:43 PM Feature #2226: osd: better filestore idempotency test
- 02:32 PM Documentation #2175 (Resolved): doc: fix doc build errors
- got this to yellow (only warnings), yay!
- 01:39 PM Feature #1890: log: async log writeout
- 01:39 PM Feature #1889: log: structure log records
- 10:45 AM Feature #2134 (Resolved): qa: smoke suite
- 10:31 AM Bug #2178: rbd: corruption of first block
- Hi Oliver,
I have two things to try:
- 'rbd writeback window = 0'. I know it's not what you want to run, but t...
- 10:29 AM Bug #2233: Throttle when there are lots of large conccurent IOs
- That is 16GB of RAM being allocated and used — I don't remember what hardware these are running on and have no idea w...
- 09:47 AM Bug #2233 (Won't Fix): Throttle when there are lots of large conccurent IOs
- When sending large amounts of data via a single client (ie 256 concurrent 64MB IOs) we can hit a bad_alloc on the cli...
- 09:15 AM Cleanup #2191 (Resolved): reexamine simple_spinlock
- 08:51 AM Feature #2087 (Resolved): lightweight filestore workload generator
04/02/2012
- 02:30 PM rgw Bug #1853 (Resolved): rgw: qa test to verify bucket recreation does not override bucket
- Implemented, commit:1551c5b08714b415c49fc759002b7c6a6d4d611a.
- 01:26 PM rgw Bug #1856 (Resolved): It is possible to look up an rgw user by a subuser that does not exist as l...
- Fixed, commit:addc744692f60885a747c4531cd12bf19b3a7f2a.
- 11:15 AM rgw Feature #2171: rgw: asynchronously calculate md5
- Thinking about it some more, it's probably not the best use of time and effort. We initiate the md5 calculation after...
- 08:29 AM Bug #2178: rbd: corruption of first block
- Hi Sage,
here we go again, with ceph-0.44.1-1-g41f84fa
One bad file with following infos:
20120402 171642.12...
04/01/2012
- 07:23 PM Bug #2221: Monitor setup bugs
- 2) ...
- 06:35 PM rbd Feature #2232: qemu: resize guest disk when rbd image is resized
- I tested this on Friday, and qemu rereads the size (at least when using virtio) when the guest requests it (i.e. echo...
- 04:21 PM rbd Feature #2232 (New): qemu: resize guest disk when rbd image is resized
- According to Christoph, this is probably just a matter of calling bdrv_truncate() with the new size. If that doesn't...
- 04:19 PM rbd Feature #2231 (Resolved): librbd: expose header change (resize?) via api
- we need a callback or something so that users (qemu) can be informed when the header changes. this will let them, sa...
03/31/2012
- 03:22 PM Feature #1655: gitbuilder aggregator page
- I took some inspiration from the updated aggregator script that is now at http://ceph.newdream.net/gitbuilder.cgi. I'...
03/30/2012
- 09:11 PM Cleanup #2230 (Resolved): deprecate 'btrfs devs'
- 09:00 PM rgw Feature #2229 (New): rgw: functional tests for rgw class
- A series of simple functional tests to verify the rgw class methods behave as they should.
- 08:58 PM Bug #2148 (Resolved): osd: class error return not propagated to client
- commit:f8a53869f6db4c76516ee525f00f87f930920692
- 05:27 PM Bug #2221: Monitor setup bugs
- (1) is a problem due to options parsing collisions...fixed!
(2) is directly contradicted by my testing...?
(3) I ne...
- 04:25 PM Bug #2026 (Can't reproduce): osd: ceph::HeartbeatMap::check_touch_file
- 04:25 PM Bug #2045 (Can't reproduce): osd: dout_lock deadlock
- haven't seen this in a while.
also, this code is about to go away anyway with wip-log.
- 04:16 PM Bug #2102 (Can't reproduce): osd: pg stuck in backfill
- 04:15 PM Bug #2102 (Duplicate): osd: pg stuck in backfill
- 04:14 PM Bug #2002: osd: racy push/pull for clones
- i take that back; this wasn't enabled in qa. adding to the teuthology ceph.conf file.
- 04:12 PM Bug #2002 (Resolved): osd: racy push/pull for clones
- haven't seen this in forever; looks fixed.
- 04:11 PM Bug #2209 (Resolved): osd: read kb stats not tracked?
- commit:aa31035e555129e56888320b84f16264f28bd7df
- 03:59 PM Bug #2116 (Resolved): Repeated messages of "heartbeat_check: no heartbeat from"
- fixed by commit:374bef9c97266600b4c6b83100485d7250363213
- 03:59 PM Bug #2165 (Resolved): osd: recovering ending with missing
- fixed with merge of commit:75e3b9b309e5365975e3e5855c065bd4fe28b64c
- 03:58 PM Bug #2178: rbd: corruption of first block
- 02:51 PM Bug #2178: rbd: corruption of first block
- Please build the current git stable branch, which includes 41f84fac1ae4b4c72bf9bfe07614c4066c916fd1. The version sho...
- 07:35 AM Bug #2178: rbd: corruption of first block
- Here the remaining timestamps from the other VM's with bad blocks:
VM-2:
20120330 105139.579830 filling block 171...
- 07:12 AM Bug #2178: rbd: corruption of first block
- Hi *,
I needed a couple of runs, but managed now to provide some 81MiB/97MiB osd.X.log-files, where in between sh....
- 03:58 PM Bug #2164 (Resolved): osd: scrub missing _, snapset attrs
- commit:41f84fac1ae4b4c72bf9bfe07614c4066c916fd1
- 12:50 PM Feature #2227 (Closed): QA: create a test to verify operation with non-default layouts
- I submitted a patch that modified ceph_calc_file_object_mapping()
in the ceph client, and when reviewing it Sage poi...
- 09:53 AM Feature #2226 (Resolved): osd: better filestore idempotency test
- ...
- 12:35 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
- I think I can be optimistic :)...
03/29/2012
- 10:06 PM Bug #2178: rbd: corruption of first block
- Okay, I suspect this is actually bug #2164, which was causing the _ xattr to get lost when ceph-osd restarts on non-b...
- 09:52 PM Bug #2225 (Resolved): gitbuilder.ceph.com returning 503: Service Temporarily Unavailable.
- I can't find any 503 in the apache logs on this machine. Could it be on the client side?
- 09:48 PM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
- Well, I fixed one problem, but I can't see how it could have resulted in the log you posted.
Pushed a few more pat...
- 11:36 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
- I collected logs from 4 OSDs, they can be downloaded at: http://logger.ceph.widodh.nl/ceph/issues/2212/
At 10:13 t...
- 09:21 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
- Der.. do you have a log you can attach/post?
- 02:59 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
- I reverted the extra debugging for the heartbeat stuff, but that didn't seem to consume all the CPU time.
The load...
- 01:40 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
- I just installed the code on my cluster and things do not seem to behave yet.
The cluster is still jumping around...
- 08:54 PM Linux kernel client Bug #1940 (Resolved): locking cycle in ceph_osdc_start_request
- commit:ab434b60ab07f8c44246b6fb0cddee436687a09a
- 07:53 PM Linux kernel client Bug #1793 (Can't reproduce): NULL pointer dereference at try_write+0x627/0x1060
- Marking this Can't Reproduce. Will reopen if it shows up again.
- 03:21 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
- Another 100 iterations of kernel_untar_build.sh using the current
master branch (c666601a935b94cc0f3310339411b6940de...
- 07:51 AM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
- Bugs 1793 and 2081 have a signature of a page fault/bad memory reference
from process_one_work() -> con_work(), and ...
- 07:53 PM Linux kernel client Bug #2069 (Can't reproduce): client crash during kernel_untar_build rm -r step
- I just finished at least 150 iterations of kernel_untar.sh and never
hit this using the current master branch of cep...
- 07:51 PM Linux kernel client Bug #2081 (Can't reproduce): msgr: spinlock badness?
- Marking this Can't Reproduce. Will reopen if it happens again.
- 07:43 PM Linux kernel client Bug #2081: msgr: spinlock badness?
- Another 100 iterations of kernel_untar_build.sh using the current
master branch (c666601a935b94cc0f3310339411b6940de...
- 07:51 AM Linux kernel client Bug #2081 (Need More Info): msgr: spinlock badness?
- Bugs 1793 and 2081 have a signature of a page fault/bad memory reference
from process_one_work() -> con_work(), and ...
- 07:50 PM Linux kernel client Bug #2174 (Can't reproduce): rbd: iozone thrashing failure
- OK, I'll go ahead and state that I can't reproduce this...
- 07:46 PM Linux kernel client Bug #2174: rbd: iozone thrashing failure
- Status was Verified. Changing it to Need More Info because I can't even
seem to reproduce it at this point. (I sup...
- 07:44 PM Linux kernel client Bug #2174: rbd: iozone thrashing failure
- Another 12 iterations of suites/iozone.sh using the current
master branch (c666601a935b94cc0f3310339411b6940de751ba)...
- 07:59 AM Linux kernel client Bug #2174: rbd: iozone thrashing failure
- I don't know whether we've adequately captured the signature or symptoms
of this problem. I believe though that it ...
- 07:20 AM Linux kernel client Bug #2174: rbd: iozone thrashing failure
- I have been trying to reproduce this using the latest testing/master/for-linus
branch (they're the same right now) a...
- 09:27 AM Linux kernel client Bug #2224 (Rejected): Oops in __cfh_to_dentry
- I setup an HA pair of NFS servers which re-export Ceph to NFS clients.
The HA pair is in active/standby mode, using...
- 07:42 AM Feature #2087: lightweight filestore workload generator
- Memory leak fixed.
Apparently, the FileStore does not clean up after transactions once they are applied, which may ...
- 06:21 AM Feature #2087 (In Progress): lightweight filestore workload generator
- Looks like some memory should be leaking bad, such that valgrind hangs on exit.
==19080==
==19080== HEAP SUMMARY...
- 07:24 AM Linux kernel client Bug #2064 (Resolved): ceph-client: messenger: nocrc flag not implemented correctly
- Linus pulled in the changes without any immediate trouble, so
I'm marking this and a few others resolved.
- 07:12 AM Linux kernel client Bug #2157 (Resolved): ceph: xattr: fix nanosecond display on i_rctime
- Linus pulled in the changes without any immediate trouble, so
I'm marking this and a few others resolved.
- 07:12 AM Linux kernel client Bug #2156 (Resolved): ceph: xattr: fix a possible buffer overrun bug
- Linus pulled in the changes without any immediate trouble, so
I'm marking this and a few others resolved.
- 07:11 AM Linux kernel client Bug #2155 (Resolved): ceph: xattr: wrong value assumed for "no preferred PG"
- Linus pulled in the changes without any immediate trouble, so
I'm marking this and a few others resolved.
- 05:56 AM Feature #2223 (Resolved): Tracing facility on FileStore
- Allow a user to specify a file onto which log the transactions that come through OSDs' FileStores.
This should all...
03/28/2012
- 11:12 PM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
- Ah, I see the bug now. Pushed a fix to wip-osd-hb, thanks!
Let us know if this behaves for you.. if so I'll pull ...
- 04:23 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
- It's quite large (222MB), so I uploaded the file, available at: http://logger.ceph.widodh.nl/ceph/osd.1.log_27-03-201...
- 10:51 PM Bug #2165: osd: recovering ending with missing
- see wip-osd-recovery-sources
- 10:46 PM CephFS Bug #1811: 2 pjd chown tests failed on cfuse
- ...
- 03:21 PM Feature #2222: osd: distinguish between 'degraded' and 'misplaced'
- We should pick a designator that doesn't make it sound like the objects are lost.
- 02:27 PM Feature #2222 (Resolved): osd: distinguish between 'degraded' and 'misplaced'
- normal data migration happens with an acting set > the up set, so that we never drop below N replicas, but we still ca...
- 02:45 PM Feature #2087: lightweight filestore workload generator
- 02:07 PM Bug #2221 (Resolved): Monitor setup bugs
- Carl reported several configuration issues when creating new monitors (based on the instructions at http://ceph.newdr...
- 08:35 AM rgw Bug #2220 (Resolved): rgw: librgw dep on g_ceph_context
- Fixed, commit:18d219e512a8e0f427a2229a71e15869cac3b593.
- 07:16 AM rgw Bug #2220 (Resolved): rgw: librgw dep on g_ceph_context
- from last night's qa,...
- 04:37 AM Bug #2219: OSD's commit suicide with 0.44
- I accidentally removed the core file(s) :(
Hope this one pops up again so I have a core file.
- 04:11 AM Linux kernel client Tasks #2138: rbd: run xfstests on a local XFS filesystem over RBD
- After setting up two rbd devices and making some fairly simple changes
to xfstests, then setting up appropriate envi...
- 04:04 AM Linux kernel client Bug #2155: ceph: xattr: wrong value assumed for "no preferred PG"
- This got rebased: 3489b42a72a41d477665ab37f196ae9257180abb
This has been sent as part of a pull request to Linus ...
- 04:04 AM Linux kernel client Bug #2156: ceph: xattr: fix a possible buffer overrun bug
- This got rebased: 3489b42a72a41d477665ab37f196ae9257180abb
This has been sent as part of a pull request to Linus ...
- 04:03 AM Linux kernel client Bug #2157: ceph: xattr: fix nanosecond display on i_rctime
- This got rebased: 3489b42a72a41d477665ab37f196ae9257180abb
This has been sent as part of a pull request to Linus ...
- 04:01 AM Linux kernel client Bug #2064: ceph-client: messenger: nocrc flag not implemented correctly
- It got rebased once more, and this should be the last:
37675b0f42a8f7699c3602350d1c3b2a1698a3d3
This has been s...
- 03:52 AM Bug #2178: rbd: corruption of first block
- Hi,
I decided to upgrade to "latest-n-greatest" in the test-cluster, to make sure, that if I hit the error again w...
03/27/2012
- 06:31 PM CephFS Bug #2218: CephFS "mismatch between child accounted_rstats and my rstats!"
- The MDS log is at https://matthew.royhousehold.net/mds.a.log.1.gz (1505MB, md5 197ef232d50d27e2b7c2f62370c9c6b6)
- 02:45 PM CephFS Bug #2218 (Need More Info): CephFS "mismatch between child accounted_rstats and my rstats!"
- There's not enough info in the attached log to figure out what happened. I can tell you that your home directory beli...
- 04:26 PM rgw Bug #2197 (Resolved): rgw: need to throttle incoming requests
- Fixed, commit:a52d048ac429c3d2b6a9286d96253308f6588762.
- 04:10 PM Bug #2178: rbd: corruption of first block
- The next step is to reproduce the corruption on the test cluster with logs:
debug osd = 20
debug ms = 1
debug...
- 08:37 AM Bug #2178: rbd: corruption of first block
- Well,
one more comment:
my guess would be, it has something to do with expansion of the "sparse-file" while writin...
- 05:24 AM Bug #2178: rbd: corruption of first block
- Good morning ;)
meanwhile I have not been lazy. I've managed - with current setup in test-cluster - to produce "in...
- 04:07 PM Bug #2164: osd: scrub missing _, snapset attrs
- wip-2164
it's a problem with the collection_move guard (or lack thereof)
- 03:40 PM rgw Bug #2208 (Resolved): rgw: radosgw-admin temp remove failure
- Fixed, merged at commit:93ba4c004a9269148a75b67da2522855cb1842a3.
- 02:19 PM Bug #2219 (Need More Info): OSD's commit suicide with 0.44
- Can you look at the core file and 'thread apply all bt'?
- 05:57 AM Bug #2219: OSD's commit suicide with 0.44
- ...
- 05:03 AM Bug #2219 (Can't reproduce): OSD's commit suicide with 0.44
- I noticed this myself today, but on IRC somebody else came along:...
- 02:03 PM Bug #2199 (Resolved): mon: get_bl osdmap_full/9583 No such file or directory
- Merged to master in commit:1814aac17593dee0fa4c774d5b462f277f6698da, reviewed by Sage — even though I forgot to add t...
- 12:25 PM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
- Can you attach the full osd.1 log?
- 12:36 AM Bug #2211: osd: entity_inst_t OSDMap::get_inst(int) const
- Over night I saw 16 OSD's go down with the same backtrace.
All OSD's were running with debug ms/osd set to 1, this...
- 09:07 AM Linux kernel client Bug #2174: rbd: iozone thrashing failure
- I've been off on other things, but this problem apparently recurred
even if the latest check-in (Josh's change) in p...
- 08:38 AM CephFS Bug #2217: sync and O_DIRECT writes only write first extent in iov vector
- The code should not be written that way.
However I think it doesn't matter at this point, because the only caller
...
03/26/2012
- 06:24 PM CephFS Bug #2218 (Resolved): CephFS "mismatch between child accounted_rstats and my rstats!"
- The mismatch is detected at 2012-03-26 18:39:54.306661...
- 03:51 PM Bug #2192: ceph-mon hangs consuming 100% CPU
- It reproduced every time, with 0.44 as well. After I reconfigured the cluster to have only one monitor, the problem went away. (U...
- 02:44 PM CephFS Bug #2217 (Resolved): sync and O_DIRECT writes only write first extent in iov vector
- static ssize_t ceph_aio_write(struct kiocb *iocb, const struct iovec *iov,
unsigned long nr_segs, loff_t po...
- 01:34 PM Bug #2199 (Fix Under Review): mon: get_bl osdmap_full/9583 No such file or directory
- Re-pushed misc-fixes-for-review.
- 09:59 AM Bug #2199 (In Progress): mon: get_bl osdmap_full/9583 No such file or directory
- Sage pointed out the stash data structure isn't necessarily the same as the other stored data structures, so this nee...
- 12:47 PM Messengers Cleanup #2216 (Resolved): SimpleMessenger should make sure it owns passed-in Connections
- 10:50 AM Messengers Cleanup #2216 (Resolved): SimpleMessenger should make sure it owns passed-in Connections
- Otherwise we get weird issues like #2212.
- 12:38 PM Cleanup #2191: reexamine simple_spinlock
- My log branch drops this for the dout logging. The last user is the buffer.h debugging (enabled manually via a macro...
- 12:06 PM RADOS Bug #2047: crush: with a rack->host->device hierarchy, several down devices are likely to cause b...
- FWIW, dropping the local search behavior fixes this. The question is what probably was the local search ...
- 11:27 AM RADOS Bug #2047: crush: with a rack->host->device hierarchy, several down devices are likely to cause b...
- 11:27 AM Bug #2210 (Duplicate): osd: some PGs remains remapped or degraded
- This is actually a CRUSH problem; see #2047.
- 09:45 AM Bug #2210: osd: some PGs remains remapped or degraded
- #2173 has some osd logs and related info for the same problem on a less clean cluster. Thanks for the detailed steps ...
- 10:36 AM CephFS Fix #2215 (Resolved): ceph-fuse does not invalidate page cache
- Right now the userspace client doesn't invalidate the page cache when it loses the cache capability on an inode. Appa...
- 09:58 AM Bug #2212 (Resolved): osd: FAILED assert(msgr->lock.is_locked())
- Ah, I was using the wrong msgr; fixing!
- 05:50 AM Bug #2212 (Resolved): osd: FAILED assert(msgr->lock.is_locked())
- With the new heartbeat code I noticed a couple of OSDs go down with:...
- 09:58 AM RADOS Bug #2214 (Resolved): crush: pgs only mapped to 2 devices with replication level 3
- This is from #2173. Note that all 3 osds are up....
- 09:43 AM Bug #2173 (Resolved): MDS crash when start with end of buffer
- 06:04 AM Feature #2213 (Resolved): rbd: shouldn't need config file to get help
- I just ran "rbd --help" on a pretty much unconfigured machine and got:
global_init: unable to open config file.
...
- 05:22 AM Bug #2211 (Resolved): osd: entity_inst_t OSDMap::get_inst(int) const
- While trying out the new heartbeat code I encountered this crash:...
03/25/2012
- 08:39 PM Bug #2173: MDS crash when start with end of buffer
- Shall we close this bug, since the MDS server was recovered by providing an empty session map and we cannot reproduce ...
- 08:39 PM Bug #2210 (Duplicate): osd: some PGs remains remapped or degraded
- Some PGs remain in 'remapped' or 'degraded' status after adding an OSD server.
The steps to reproduce the bug:
1....
- 09:54 AM Feature #2087: lightweight filestore workload generator
- Pushed a new commit to [1], making the code compliant with the CodeStyle and with Sage's suggestions on GitHub.
[1...