Activity

From 04/24/2012 to 05/23/2012

05/23/2012

07:05 PM Bug #2475 (Resolved): rbd.py can leave Image object in inconsistent state on failure to construct
Constructing an Image object with bad parameters (say, a nonexistent rbd image name) can leave the
resulting object...
Dan Mick
05:48 PM Documentation #2474 (Resolved): re-document using autobuilt branches
It seems John removed the docs for setting up autobuilt apt repos, the signing key etc. That's still needed. Anonymous
04:56 PM rgw Feature #2473 (Resolved): rgw: revisit operation logging
Sending append for each client operation is expensive. We can definitely find a better solution. Yehuda Sadeh
04:49 PM CephFS Bug #733: cmds crash: mds/LogEvent.cc:88: FAILED assert(p.end())
We'll need a detailed log (and possibly access to the data that's causing the crash) to diagnose this. Can you turn o... Greg Farnum
02:25 PM CephFS Bug #733: cmds crash: mds/LogEvent.cc:88: FAILED assert(p.end())
here is a backtrace:
Core was generated by `/usr/bin/ceph-mds -i alpha --pid-file /var/run/ceph/mds.alpha.pid -c ...
Eric Dold
10:55 AM CephFS Bug #733: cmds crash: mds/LogEvent.cc:88: FAILED assert(p.end())
I get the same with v0.47.1:
0> 2012-05-23 19:50:20.105956 7f7c87482700 -1 mds/LogEvent.cc: In function 'stat...
Eric Dold
04:41 PM Feature #2405 (Resolved): osd: Make ceph-osd --mkfs idempotent
Sage Weil
04:41 PM Bug #2443 (Resolved): Anyone can list all keys, even with caps mon 'allow rwx' and not 'allow *'
commit:b5e7fdd3d7a413e03bb5bb43b689e06c9cd6ffd9 Sage Weil
04:20 PM Bug #2470 (Closed): cookbook: keyword "relase" in apt.rb causes wget to fail.
Fixed in ab51f4dcd69774411015548db46dc18c198e4181. Anonymous
03:15 PM Bug #2470 (Closed): cookbook: keyword "relase" in apt.rb causes wget to fail.
In ceph-cookbooks/ceph/recipes: the keyword "relase" should be "release". The wget fails in its current form but co... Ken Franklin
04:08 PM Feature #2413: qa: Test co-existence of sysvinit and upstart, 3: upstart controlled
Branch to test is now available as "chef-3". Anonymous
04:08 PM Feature #2412: qa: Test co-existence of sysvinit and upstart, 2: sysvinit controlled, using /var/...
Branch to test is now available as "chef-3". Anonymous
04:08 PM Feature #2411: qa: Test co-existence of sysvinit and upstart, 1: sysvinit controlled, outside of ...
Branch to test is now available as "chef-3". Anonymous
04:05 PM Feature #2472: osd: add opaque 'class <name> <foo>' cap that class can interpret/enforce
allow class 'foo' bar
Should allow the class 'foo' to do the operation IF it is happy with 'bar'. That is, the cl...
Sage Weil
04:02 PM Feature #2472 (New): osd: add opaque 'class <name> <foo>' cap that class can interpret/enforce
Sage Weil
04:03 PM Feature #2471: osd: add prefix match to OSDCaps
"allow prefix 'foo' r" should allow 'r' access only to objects (and locator keys) that are prefixed by 'foo'. Sage Weil
04:02 PM Feature #2471 (Resolved): osd: add prefix match to OSDCaps
Sage Weil
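The prefix rule described for #2471 can be illustrated with a minimal sketch. This is not the OSDCaps implementation: the function name `prefix_allows` and its parameters are hypothetical, and the sketch assumes a plain string-prefix comparison on the object name (locator keys would be checked the same way).

```python
def prefix_allows(cap_prefix: str, cap_perms: str, name: str, requested: str) -> bool:
    """Hypothetical check for a cap such as: allow prefix 'foo' r.

    Access is granted only if the object name (or locator key) starts with
    the cap's prefix AND every requested permission letter is in the cap.
    """
    if not name.startswith(cap_prefix):
        return False
    return all(perm in cap_perms for perm in requested)


# Example: a cap equivalent to "allow prefix 'foo' r"
can_read_foobar = prefix_allows("foo", "r", "foobar", "r")    # prefix matches, 'r' granted
can_read_bar = prefix_allows("foo", "r", "bar", "r")          # prefix does not match
can_write_foobar = prefix_allows("foo", "r", "foobar", "w")   # 'w' not granted
```

Under these assumptions, only the first request is allowed.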
02:17 PM Documentation #2271 (In Progress): FAQ: BTRFS vs XFS
http://ceph.com/docs/master/rec/filesystem/ still needs some info on how easy it is to change from XFS to btrfs. John Wilkins
12:59 PM Feature #2399 (Resolved): qa: haproxy + rgw + jenkins
Yehuda Sadeh
12:57 PM rgw Cleanup #2469 (Resolved): rgw: replace Formatter->dump_format(..., "%d", ...) with Formatter->dum...
Yehuda Sadeh
12:52 PM Feature #2426 (Resolved): precise packages for apache2, fastcgi
OK, I think the packages are workable now. Dan Mick
12:52 PM rbd Feature #2468 (Resolved): librbd: provide a way for a user to flush and invalidate the cache
This could be an admin socket command. This would make live migration work with older qemu when rbd caching is enabled. Josh Durgin
12:37 PM rbd Feature #2467 (Resolved): qemu: implement bdrv_invalidate_cache
This is used during live migration to clear librbd's cache on the destination host before starting the guest there. I... Josh Durgin
12:35 PM rbd Feature #2466 (Resolved): librbd: add invalidate_cache function to interface
Qemu requires this to make live migration work when caching is enabled. Josh Durgin
12:10 PM Bug #2219: OSD's commit suicide with 0.44
I added one node to my small test cluster. It now has three nodes in total, so rados is filling the new node.
the mac...
Eric Dold
11:29 AM Bug #2219: OSD's commit suicide with 0.44
Eric Dold wrote:
> dmesg looks ok to me.
was the system heavily loaded?
do you have a core file?
this basic...
Sage Weil
11:09 AM Bug #2219: OSD's commit suicide with 0.44
dmesg looks ok to me. Eric Dold
09:05 AM Bug #2219: OSD's commit suicide with 0.44
Eric Dold wrote:
> I just hit this with v0.47.1:
>
> 2012-05-23 13:33:08.958564 7fe61124d700 -1 common/Heartbeat...
Sage Weil
05:31 AM Bug #2219: OSD's commit suicide with 0.44
I just hit this with v0.47.1:
2012-05-23 13:33:08.958564 7fe61124d700 -1 common/HeartbeatMap.cc: In function 'boo...
Eric Dold
11:21 AM rgw Bug #2465: rgw: bad marker output when listing a bucket
Actually, it'll only happen with '%'. We were using formatter->dump_format() instead of formatter->dump_string(). Yehuda Sadeh
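The class of bug described in #2465 (user data passed as a format string) can be shown with a small Python analogy. This is not the rgw Formatter code: the two helper functions below only borrow the names `dump_format` and `dump_string` for illustration, and the marker value is made up.

```python
def dump_format(fmt, *args):
    """Analogy only: interprets its first argument as a format string."""
    return fmt % args


def dump_string(value):
    """Analogy only: emits the value verbatim, with no format interpretation."""
    return value


marker = "50%done"  # hypothetical marker containing '%'

# Wrong: the marker itself is treated as the format string.
try:
    unsafe = dump_format(marker)
except (TypeError, ValueError) as exc:
    unsafe = "error: " + type(exc).__name__

# Right: either emit the string verbatim, or pass it as an argument.
safe = dump_string(marker)
also_safe = dump_format("%s", marker)
```

In this sketch the unsafe call fails outright; in C-style formatting the same mistake typically produces corrupted output instead, which matches the "bad marker output" symptom in the report.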
11:03 AM rgw Bug #2465 (Resolved): rgw: bad marker output when listing a bucket
When providing a marker that contains '%' (and possibly other characters that are url-escaped), the returned result c... Yehuda Sadeh
09:32 AM Bug #2464 (Resolved): osdmap: assert in get_inst()
INFO:teuthology.orchestra.run.err: ceph version 0.47.1-128-gc31ab04 (commit:c31ab04b572ef5b5cf622b33b92c9e17c7070662)... Sage Weil
09:17 AM Linux kernel client Bug #2389: rbd: hung xfstest 67
ubuntu@teuthology:/a/nightly_coverage_2012-05-22-b/3062 Sage Weil
09:16 AM CephFS Bug #2187: pjd chown/00.t failed test 97
2012-05-22T12:08:06.355 INFO:teuthology.task.workunit.client.0.out:Test Summary Report
2012-05-22T12:08:06.355 INFO:...
Sage Weil
07:56 AM Feature #2463 (Resolved): adminsocket: 'show_config' command
Sage Weil
01:50 AM Bug #2462 (Resolved): osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)
2012-05-23 06:16:37.080317 7f18f6012700 -1 osd/PG.cc: In function 'void PG::merge_log(ObjectStore::Transaction&, pg_... Eric Dold

05/22/2012

10:03 PM Bug #2461 (Resolved): DBObjectMap is incompatible with collection_rename
Objects are stored using a (collection_name, object_name) prefix. When a collection is renamed from A to B, objects ... Samuel Just
09:41 PM Bug #2234 (Resolved): Sometimes 'ceph -s' is unable to show pg data and crashes
This code has all been replaced! Sage Weil
09:40 PM Bug #2391 (Resolved): librados docs bug
commit:c31ab04b572ef5b5cf622b33b92c9e17c7070662 Sage Weil
09:34 PM Bug #2379: Mon crash after start
If this happens again, can you grab a tarball of the mon data directory before fixing/restarting?
Also, if you cou...
Sage Weil
09:24 PM Bug #2443 (Fix Under Review): Anyone can list all keys, even with caps mon 'allow rwx' and not 'a...
see wip-mon-auth Sage Weil
07:33 PM Feature #2426: precise packages for apache2, fastcgi
IIUC it's apt and not dpkg that checks sigs. Creating and signing a repo and pointing apt at it is enough to generate... Sage Weil
07:27 PM Feature #2426: precise packages for apache2, fastcgi
So I've built the packages from the git repo versions (not source, but binary) and they install on my
precise deskto...
Dan Mick
05:22 PM Feature #2405 (Fix Under Review): osd: Make ceph-osd --mkfs idempotent
Sage Weil
05:22 PM Feature #2418 (Resolved): mon: Take mandatory initial quorum members from ceph.conf
Sage Weil
05:22 PM Feature #2419 (Resolved): mon: Take peer hints via admin socket
Sage Weil
05:06 PM Feature #2427 (Resolved): precise gitbuilder http
Sage Weil
10:38 AM Feature #2427: precise gitbuilder http
Fixed commit:0e4f131ebfd0dd43593d1d95b544318fb749ad53
lighttpd is now installed and configured
Dan Mick
04:55 PM rgw Bug #2027 (Can't reproduce): rgw -> apache miscommunication
Yehuda Sadeh
04:50 PM rgw Feature #1726 (Rejected): rgw: improve multipart upload performance
Actually, this is obsolete. Nowadays when we complete the upload we only create a new index, which we point at all th... Yehuda Sadeh
12:45 PM Bug #2454: "rbd info xyz" hanging forever sometimes
I notice that the pgid output on these lines doesn't match, even though they're using the same output function and the ... Greg Farnum
11:40 AM Bug #2454: "rbd info xyz" hanging forever sometimes
Some additional information:
I'm running Kernel 3.3.6 from kernel.org compiled two days ago from kernel.org on the t...
Simon Frerichs
11:27 AM Bug #2454: "rbd info xyz" hanging forever sometimes
I had some spare time to do some testing, today.
As mentioned in my initial post we're running KVM VPS on Ceph.
I r...
Simon Frerichs
11:01 AM Bug #2454: "rbd info xyz" hanging forever sometimes
Oh geeze, my bad! Greg Farnum
10:50 AM Bug #2454: "rbd info xyz" hanging forever sometimes
Greg, this has nothing to do with the kernel - it's the rbd command line tool. I can't seem to change the tracker bac... Josh Durgin
10:39 AM Bug #2454: "rbd info xyz" hanging forever sometimes
Argh, you're right. This looks like a problem with a CRUSH mismatch between the userspace and kernelspace implementat... Greg Farnum
12:36 AM Bug #2454: "rbd info xyz" hanging forever sometimes
I've attached rbd and osd log. The hanging rbd was started at 09:10:33 and ended at 09:12:18.
One thing came to my...
Simon Frerichs
12:02 PM rgw Feature #2460 (Rejected): rgw: support multiple ceph backends
Yehuda Sadeh
11:58 AM Feature #2459 (Rejected): admin socket: config reload
Currently, the only way to reload configuration is by restarting daemons. Yehuda Sadeh
09:27 AM Feature #2295 (Resolved): make qemu cache=writeback,writethrough option turn on librbd caching
Applied upstream, should be in qemu 1.2. Josh Durgin
08:18 AM Bug #2446 (Resolved): libceph: corrupt inc osdmap epoch 24630 off 702 (ffff88001e5d876c of ffff88...
Thanks for testing! Sage Weil
08:04 AM Bug #2446: libceph: corrupt inc osdmap epoch 24630 off 702 (ffff88001e5d876c of ffff88001e5d84ae-...
I have tested this patch for a couple hours today and there were no 'corrupt inc osdmap' messages. Thanks. Karol Jurak

05/21/2012

08:59 PM Cleanup #2458 (Resolved): filestore: backend abstraction
We should create a backend abstraction layer in the filestore. This layer will hold all the filesystem specific opera... Yehuda Sadeh
04:24 PM Bug #2454: "rbd info xyz" hanging forever sometimes
Yep, the OSD gets the message, acks it, and acks several following pings. Looks like some strange bug on the OSD side... Greg Farnum
02:48 PM rbd Bug #2457 (Resolved): libvirt: migration fails with rbd in 0.9.11 and 0.9.12
As reported on libvirt-users:
https://www.redhat.com/archives/libvirt-users/2012-May/msg00088.html
Looks like a...
Josh Durgin
11:16 AM Linux kernel client Bug #2395: kernel crash after unmap a rdb device while the cluster is down
I am unable to reproduce this with the current code. There was a lot of rbd setup/teardown code that got cleaned up ... Sage Weil
10:58 AM Linux kernel client Bug #2287: rbd: crashes with 10Gbit network and fio
Danny Kukawka wrote:
> We used the attached patch to resolve the immediate problem.
That attachment didn't seem ...
Sage Weil
09:07 AM Bug #2446: libceph: corrupt inc osdmap epoch 24630 off 702 (ffff88001e5d876c of ffff88001e5d84ae-...
Aha, I see the bug. You can apply the following patch and the problem should go away:... Sage Weil
02:45 AM Bug #2446: libceph: corrupt inc osdmap epoch 24630 off 702 (ffff88001e5d876c of ffff88001e5d84ae-...
The monitors deleted older osdmaps from their mondata directories over the weekend, however I managed to reproduce th... Karol Jurak
04:12 AM Bug #2267: Ceph client crashed after shutting down one mds and osd
I fairly often see almost identical crashes. They're triggered by simply restarting an OSD.
Ceph version: 0.46
Ke...
Karol Jurak

05/20/2012

10:34 PM Bug #2454: "rbd info xyz" hanging forever sometimes
I wrote a little script which calls "rbd ls" and then loops through "rbd info $x" for every volume.
We've about 80 ...
Simon Frerichs
04:22 PM Bug #2454 (Need More Info): "rbd info xyz" hanging forever sometimes
How reproducible is this? If you can reproduce with 'debug ms = 20' on the client side, we can be sure it's the osd'... Sage Weil
09:18 AM Bug #2454 (Resolved): "rbd info xyz" hanging forever sometimes
We're running ceph with 3 mon and 21 osds to host about 80 KVM VMs.
Sometimes "rbd info" is hanging forever without ...
Simon Frerichs
09:01 PM Bug #2456 (Resolved): librbd: failed LibRBD.TestIOToSnapshot
... Sage Weil
05:09 PM Bug #2446 (Need More Info): libceph: corrupt inc osdmap epoch 24630 off 702 (ffff88001e5d876c of ...
Is the osdmap/24630 present on all monitors? Is it identical on all of them?
The attachment is 1386 bytes.
The d...
Sage Weil
03:03 PM Bug #2455 (Resolved): debian: lintian errors
commit:31102d317d7a091f49f9126a6df9087cde0d8118 Sage Weil
01:07 PM Bug #2455 (Resolved): debian: lintian errors
W: ceph-kdump-copy: binary-without-manpage usr/bin/ceph-kdump-copy
W: ceph-kdump-copy: init.d-script-missing-start /...
Sage Weil
02:57 PM Feature #2449 (Resolved): dho rsync email
commit:31102d317d7a091f49f9126a6df9087cde0d8118 Sage Weil
12:29 PM Linux kernel client Bug #2389: rbd: hung xfstest 67
again, ubuntu@teuthology:/a/nightly_coverage_2012-05-20-a/2523 Sage Weil

05/19/2012

06:11 PM Bug #2453 (Resolved): osd/OSD.h: 840: FAILED assert(last_scrub_pg.count(p))
... Sage Weil
05:06 PM Feature #2405: osd: Make ceph-osd --mkfs idempotent
Sage Weil
03:48 PM Bug #2452 (Resolved): filestore: running daemon check broken
fixed by commit:1314a00798ed4b7ef2f2686f0195c5c53c98c2ce Sage Weil
03:48 PM Bug #2452 (Resolved): filestore: running daemon check broken
the xattr checks in _detect_fs on fsid file break the fcntl lock.
broken by commit:f03dc34f7e2fc1707fa00339b917c0d...
Sage Weil
02:19 PM Bug #2448 (Resolved): osdmap: mapping doesn't work without encoding and decoding
Sage Weil
02:19 PM Bug #2448: osdmap: mapping doesn't work without encoding and decoding
fixed by commit:ba2488f238923199534d56a8a86df4e48c2ddd96 Sage Weil
10:39 AM Bug #2451 (Can't reproduce): qa: networking doesn't always start after reboot
... Sage Weil

05/18/2012

05:55 PM Feature #2450 (Resolved): dho git commit emails
Send git commit emails for dho branch to dho folk. Just need some git hook to send this to the dho list for the dho ... Sage Weil
05:54 PM Feature #2449 (Resolved): dho rsync email
Sage Weil
05:52 PM Bug #2420 (Resolved): ceph crash while under iogen load
yay! Sage Weil
04:49 PM Bug #2420: ceph crash while under iogen load
Updated the cluster with the kernel 3.3.0-ceph-00110-g1d4a9bf and ran iogen for 2 hrs, cleared the mounted file store... Ken Franklin
05:35 PM Linux kernel client Bug #2447 (Resolved): prepare_write_connect NULL pointer dereference
this was a bug in the new messenger refactor, pushed an updated commit:3da54776e2c0385c32d143fd497a7f40a88e29dd Sage Weil
05:00 PM Linux kernel client Bug #2447 (Resolved): prepare_write_connect NULL pointer dereference
... Sage Weil
05:09 PM Bug #2448 (Resolved): osdmap: mapping doesn't work without encoding and decoding
adamcrume on irc was having trouble with this sample program only mapping to the same two osds for all objects. encod... Josh Durgin
04:23 PM Feature #2418 (Fix Under Review): mon: Take mandatory initial quorum members from ceph.conf
Sage Weil
04:23 PM Feature #2419 (Fix Under Review): mon: Take peer hints via admin socket
Sage Weil
04:15 PM Feature #2423 (Resolved): gceph: remove it
Sage Weil
04:15 PM Feature #717 (Resolved): cephtool: make -s/-w use subscribe instead of paxos watch; deprecate pax...
Sage Weil
12:54 PM Feature #717: cephtool: make -s/-w use subscribe instead of paxos watch; deprecate paxos watch
Sage Weil wrote:
> The --watch-error etc. options don't seem to work.. otherwise this is basically ready. Pushed a ...
Joao Eduardo Luis
08:38 AM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
Hi Oliver,
The fix will be in 0.47, which should be out in the next couple days. Glad to hear it's fixed!
Josh
Josh Durgin
06:00 AM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
Hi Josh,
yeah, seems to be fixed. When do we have to expect it in a more "stable" release?
Thnx for all efforts...
Oliver Francke
02:14 AM Bug #2446 (Resolved): libceph: corrupt inc osdmap epoch 24630 off 702 (ffff88001e5d876c of ffff88...
Ceph version: 0.46
Kernel client version: Debian's linux-image-3.3.0-trunk-amd64=3.3.4-1~experimental.1 patched so t...
Karol Jurak

05/17/2012

09:24 PM Bug #2441 (Resolved): haproxy, rgw: returns 502
Ok, hasn't reproduced anymore. Resolving. Yehuda Sadeh
04:37 PM Bug #2441: haproxy, rgw: returns 502
Our haproxy configuration was broken (values are in milliseconds, not in seconds):... Yehuda Sadeh
09:37 AM Bug #2441 (Resolved): haproxy, rgw: returns 502
We see occasional 502 responses that originate at haproxy. The apache and rgw logs don't have any indications about s... Yehuda Sadeh
09:22 PM Feature #2327 (Resolved): mon: use external keyring for inter-mon auth
Sage Weil
09:19 PM Feature #2405 (In Progress): osd: Make ceph-osd --mkfs idempotent
Sage Weil
09:18 PM Feature #717: cephtool: make -s/-w use subscribe instead of paxos watch; deprecate paxos watch
The --watch-error etc. options don't seem to work.. otherwise this is basically ready. Pushed a bunch of cleanups to... Sage Weil
06:20 PM Feature #2419: mon: Take peer hints via admin socket
wip-quorum
passing my test, qa/mon/bootstrap/initial_members_asok.sh
Sage Weil
03:39 PM Feature #2418: mon: Take mandatory initial quorum members from ceph.conf
dropped mon_initial_hosts.. that was a bad idea.
see wip-quorum
Sage Weil
03:35 PM CephFS Bug #2445 (Can't reproduce): crash when removing a non-empty directory
From http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/6237:... Josh Durgin
03:11 PM CephFS Bug #2444 (Can't reproduce): null pointer deference in ceph_d_prune inside kvm
From http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/6180:... Josh Durgin
10:09 AM Bug #2443 (Resolved): Anyone can list all keys, even with caps mon 'allow rwx' and not 'allow *'
Caps are kind of pointless if I can just ask for any secret I want.
ubuntu@inst03:~$ sudo ceph --name=osd.4 --keyr...
Anonymous
10:06 AM Bug #2442 (Resolved): bash completion is broken
"ceph --keyring=/et" TAB TAB TAB nothing happens. Even the built-in, non-customized completion for "doesnotexist --ke... Anonymous
09:17 AM Feature #2440 (Resolved): osd: understand btrfs performance
Sage Weil
08:46 AM Bug #2420: ceph crash while under iogen load
it should still be reproducible. I left the configuration up and I was able to reproduce it a couple of days ago. Ken Franklin
05:24 AM Linux kernel client Bug #2439 (Resolved): ceph-client: auth: handle null verify_authorizer_reply method
I added code to the client messenger to avoid dereferencing a null
auth_client->ops or auth_client->ops->(method) point...
Alex Elder
05:18 AM Linux kernel client Cleanup #2438 (Closed): ceph-client: use BUG_ON() for null auth_client->ops pointers
I added code in the client messenger to verify auth_client->ops
and auth_client->ops->(method) were non-null befo...
Alex Elder

05/16/2012

09:47 PM Feature #2426: precise packages for apache2, fastcgi
Pretty sure I built this on pudgy, whose disk has just died.
I'd apt-get source these and verify we can rebuild ...
Sage Weil
07:14 AM Feature #2426: precise packages for apache2, fastcgi
Ok, I just rsynced the dho packages (manually built) over to the gitbuilder url. At the very least, need to document... Sage Weil
07:14 AM Feature #2426: precise packages for apache2, fastcgi
Either set up a gitbuilder, or build the packages manually and put in a repo, or just use the dho packages+repo (buil... Sage Weil
07:10 AM Feature #2426 (Resolved): precise packages for apache2, fastcgi
Sage Weil
09:44 PM Bug #2420 (Fix Under Review): ceph crash while under iogen load
I think fix-unregister-race will fix this.. Alex or Yehuda, does that make sense?
Hopefully the crash is reproduci...
Sage Weil
12:03 PM Bug #2420: ceph crash while under iogen load
The iogen command used was:
sudo iogen -s 2g -b 128k -t 1 -d /mnt/osd -n 5
Ken Franklin
08:50 PM Feature #2295: make qemu cache=writeback,writethrough option turn on librbd caching
Looks good! Reviewed-by: etc. Sage Weil
03:43 PM Feature #2295 (Fix Under Review): make qemu cache=writeback,writethrough option turn on librbd ca...
See wip-cache Josh Durgin
08:46 PM Bug #2437 (Resolved): osd: very slow during recovery
On congress, s3tests runtimes go from 1-2 minutes to 7+ minutes when there is any recovery going on. This appears to... Sage Weil
06:08 PM Feature #2290 (Resolved): ObjectCacher: handle read/write errors
Josh Durgin
02:01 PM Feature #2290 (Fix Under Review): ObjectCacher: handle read/write errors
Updated wip-oc-error-handling to use a separate BufferHead state for read errors, and just reset the state to dirty o... Josh Durgin
04:53 PM Cleanup #2344 (Resolved): convert Monitor maps to use ENCODE_START
Sage Weil
04:50 PM Cleanup #2344: convert Monitor maps to use ENCODE_START
Looks good. Reviewed-by: Greg Farnum. :) Greg Farnum
04:10 PM Cleanup #2344 (Fix Under Review): convert Monitor maps to use ENCODE_START
Sage Weil
02:23 PM Cleanup #2344: convert Monitor maps to use ENCODE_START
MonCaps look good; I'm a bit concerned about the "all features" default used for the MonMap (left a note on GitHub) b... Greg Farnum
04:52 PM Feature #2407 (Resolved): auth: "ceph auth get NAME"
Sage Weil
04:52 PM Feature #2404 (Resolved): init-ceph: Make /etc/init.d/ceph ignore entries without explicit host= ...
Sage Weil
04:52 PM Feature #2406 (Resolved): auth: "ceph auth get-or-create NAME CAPS.."
Sage Weil
03:36 PM Bug #2436 (Resolved): mon: suicides when trying to join an existing quorum
commit:515649558d5edebfd705e63bc34cd74d2db1f682 Sage Weil
03:18 PM Bug #2436: mon: suicides when trying to join an existing quorum
This is because when you provide a new monitor with a monmap that includes both itself and existing monitors, the new... Greg Farnum
03:11 PM Bug #2436 (Resolved): mon: suicides when trying to join an existing quorum
I have mon.single running happily at 192.168.122.91. On 192.168.122.159, I run (using the right values for key and fs... Anonymous
02:39 PM Feature #717 (In Progress): cephtool: make -s/-w use subscribe instead of paxos watch; deprecate ...
Joao Eduardo Luis
01:54 PM Documentation #2388 (Resolved): librbd python doc lacks ioctx parameter to rbd.Image() constructor
commit:48d97fe79634d4d4bb7ea2237083c3cd694ff3fe Greg Farnum
01:51 PM Cleanup #2435 (Resolved): Remove binary keyring support
Plaintext keyrings have been supported (and default) for almost a year and a half (#705, cfae10b8f8b0d91f37dc6eb72f3b... Anonymous
01:32 PM rgw Bug #2433 (Resolved): rgw: failing atomic reads/writes
Problem was in the s3-tests test suite.
Fixed, commit:adabd0ba7def8fc12e00b2c19a37d5936d53eff6 and commit:f1f86a0d...
Yehuda Sadeh
11:39 AM rgw Bug #2433: rgw: failing atomic reads/writes
Triggered on boto 2.4.0. Yehuda Sadeh
11:18 AM rgw Bug #2433 (Resolved): rgw: failing atomic reads/writes
... Sage Weil
01:04 PM Linux kernel client Bug #2392: First read of symlink after ceph filesystem mounted gives error
The problem is the lookup open intents stuff. We try to do a lookup + open, but it ends up that the lookup result is... Sage Weil
11:45 AM Linux kernel client Bug #2392: First read of symlink after ceph filesystem mounted gives error
Okay, this looks to me like it has to be a problem with the kernel client. The MDS definitely knows it's a symlink at... Greg Farnum
11:37 AM rgw Bug #2434 (Duplicate): rgw: failing readwrite test
Duplicates #2433. Yehuda Sadeh
11:18 AM rgw Bug #2434 (Duplicate): rgw: failing readwrite test
... Sage Weil
10:36 AM Linux kernel client Cleanup #2432 (Resolved): ceph-client: messenger: refactor to simplify state model
There is a mix of states and flags used in the client messenger code
to track what's going on. The result is a litt...
Alex Elder
10:21 AM Feature #2431 (Duplicate): teuthology: qemu + rbd testing
Sage Weil
10:13 AM Feature #1711: chef: multiple monitor support
Depends: #2406 Anonymous
10:08 AM Feature #2241 (Rejected): upstart
Sage Weil
10:05 AM Feature #2399 (In Progress): qa: haproxy + rgw + jenkins
Sage Weil
09:55 AM Subtask #2430 (Resolved): simplify pg removal
PG resurrection complicates the implementation and can be removed with little harm. Samuel Just
09:03 AM Feature #2428: auth: revise auth config params
We need to keep backwards compatibility with auth supported so libvirt doesn't break. Josh Durgin
08:03 AM Feature #2428 (Resolved): auth: revise auth config params
new:
* auth cluster required = [cephx] what mon requires of mon, mds, osd daemons
* auth service required = [cephx]...
Sage Weil
08:58 AM Linux kernel client Bug #2429: ceph-client: verify_authrizer_reply con method never called
Whoops, forgot to fill in a real subject/title. Not sure how to
fix it either.
Alex Elder
08:54 AM Linux kernel client Bug #2429 (Resolved): ceph-client: verify_authrizer_reply con method never called
Both ceph_connection_operations and ceph_auth_client_ops define
a verify_authorizer_reply method.
The only caller...
Alex Elder
07:11 AM Feature #2427: precise gitbuilder http
* thttpd does not exist on precise. Make a package, or use apache, nginx, or something. Sage Weil
07:10 AM Feature #2427 (Resolved): precise gitbuilder http
Sage Weil
06:22 AM Linux kernel client Bug #2424 (Resolved): ceph-client: messenger: badness in prepare_write_connect()
At the end of prepare_write_connect() there is a call to prepare_connect_authorizer().
That function gets an authori...
Alex Elder

05/15/2012

09:00 PM Feature #2423: gceph: remove it
This is the only remaining user of paxos observer framework. We could easily add a "pgmap" subscribe to support it, ... Sage Weil
08:59 PM Feature #2423 (Resolved): gceph: remove it
Sage Weil
08:56 PM Feature #1651 (Resolved): command line tool to interact with admin socket
ceph --admin-daemon <socket> <command ...> Sage Weil
08:05 PM Feature #1488 (Resolved): chef: spec/break down osd addition, replacement
Sage Weil
08:02 PM RADOS Feature #2422 (Resolved): crush: test that mapping result is uncorrelated
Verify that crush outputs are not correlated between devices. e.g., that all items with primaries on device A also g... Sage Weil
07:57 PM RADOS Feature #2421 (Resolved): crush: quantitatively validate mapping quality
- measure variance of resulting mapping vs expected (based on weights)
- give some quantitative comparison with expe...
Sage Weil
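One way to read "measure variance of resulting mapping vs expected (based on weights)" in #2421 is sketched below. It is an assumption about the intent, not CRUSH tooling: the function names, device names, weights, and counts are all hypothetical, and every weight is assumed to be positive.

```python
def expected_counts(weights, total):
    """Split `total` placements across devices in proportion to their weights."""
    weight_sum = sum(weights.values())
    return {dev: total * w / weight_sum for dev, w in weights.items()}


def relative_deviation(observed, weights):
    """Per-device (observed - expected) / expected for a set of placements."""
    total = sum(observed.values())
    expected = expected_counts(weights, total)
    return {
        dev: (observed.get(dev, 0) - exp) / exp
        for dev, exp in expected.items()
    }


# Hypothetical example: device "b" has three times the weight of device "a".
weights = {"a": 1.0, "b": 3.0}
observed = {"a": 30, "b": 70}   # made-up placement counts
deviation = relative_deviation(observed, weights)
```

With these made-up numbers, device "a" holds more than its weighted share and device "b" holds less; a real validation would run this over many mapped objects and compare the spread against what a uniform random placement would give.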
05:07 PM Bug #2420 (Resolved): ceph crash while under iogen load
ken.franklin@inktank.com was running a cluster with three nodes, one OSD and one monitor per node, and one mds.
ceph...
Dan Mick
04:26 PM Linux kernel client Bug #2392: First read of symlink after ceph filesystem mounted gives error
Sorry - remembering to enable debugging would have been more helpful! Logs with debugging turned on attached. Mark Kirkwood
09:51 AM Linux kernel client Bug #2392: First read of symlink after ceph filesystem mounted gives error
Mark, can you repeat these with debug logging turned up? It'll take a fair bit of disk space but there's not very muc... Greg Farnum
11:27 AM Feature #1711: chef: multiple monitor support
Depends: #2418 #2419
Current architecture:
Assumption: mon id is always chef nodename (= short hostname)
Assum...
Anonymous
11:19 AM Feature #2419 (Resolved): mon: Take peer hints via admin socket
Allow starting a mon with an empty monmap and "mon hosts". Make it
take "peer hints" via the admin socket. Peer hint...
Anonymous
11:16 AM Feature #2418 (Resolved): mon: Take mandatory initial quorum members from ceph.conf
Make it so that forming an initial quorum is only possible if the
majority of members from an "initial members" are p...
Anonymous
11:04 AM Bug #2316 (Resolved): rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g30...
More than 100 runs with no problems, I'd say this is fixed. commit:d7343814a01257a4f727fdfc752361b930ab5719 Josh Durgin
09:37 AM devops Feature #2417: chef: support radosgw
Depends on #2415. Anonymous
09:36 AM devops Feature #2417 (Rejected): chef: support radosgw
Anonymous
09:36 AM devops Feature #2416: chef: support mds
Depends on #2414. Anonymous
09:36 AM devops Feature #2416 (Rejected): chef: support mds
Anonymous
09:36 AM devops Feature #2415 (Resolved): upstart: support radosgw
Anonymous
09:35 AM Feature #2414 (Resolved): upstart: support mds
Anonymous
09:08 AM Feature #2413 (Resolved): qa: Test co-existence of sysvinit and upstart, 3: upstart controlled
Depends on #2404. Depends on branch chef-2. TODO merge not available currently.
Deploy using ceph-cookbooks.git.
En...
Anonymous
09:07 AM Feature #2412 (Resolved): qa: Test co-existence of sysvinit and upstart, 2: sysvinit controlled, ...
Depends on #2404. Depends on branch chef-2. TODO merge not available currently.
Ensure osd data disks do NOT use GPT...
Anonymous
09:07 AM Feature #2411 (Resolved): qa: Test co-existence of sysvinit and upstart, 1: sysvinit controlled, ...
Depends on #2404. Depends on branch chef-2. TODO merge not available currently.
Ensure osd data disks do NOT use GPT...
Anonymous
03:39 AM Bug #2379: Mon crash after start
The problem occurred again.... Maciej Galkiewicz

05/14/2012

09:34 PM rbd Bug #2410 (Closed): hung xfstest #68
... Sage Weil
09:22 PM Feature #2408: librbd: track latency with perfcounters
Sage Weil
04:49 PM Feature #2408 (Resolved): librbd: track latency with perfcounters
We're currently counting read/write/etc op and byte counts. We also want to know latencies in each category.
(For...
Sage Weil
09:07 PM Bug #2409 (Resolved): osd: pgs stuck in active
congress has several pgs stuck in active, and objects counted as degraded. looks like we missed some corner case in ... Sage Weil
04:47 PM Linux kernel client Bug #2392: First read of symlink after ceph filesystem mounted gives error
Okay, no guarantees but I will try and check this out at least briefly in the next day or two. :) Greg Farnum
04:37 PM Linux kernel client Bug #2392: First read of symlink after ceph filesystem mounted gives error
Ah - good point, no I had not updated max_mds. I redid the setup with 1 mds and 1 osd. Same issue, logs attached. Mark Kirkwood
02:26 PM Linux kernel client Bug #2392: First read of symlink after ceph filesystem mounted gives error
I notice looking at your conf file that you have 3 MDSes. Are they all active? (ie, did you increase max_mds to 3)
I...
Greg Farnum
04:31 PM Feature #2327: mon: use external keyring for inter-mon auth
wip-mon-keyring Sage Weil
03:44 PM Feature #2327: mon: use external keyring for inter-mon auth
Which branch? Greg Farnum
03:23 PM Feature #2327 (Fix Under Review): mon: use external keyring for inter-mon auth
Sage Weil
03:28 PM Bug #2393 (Duplicate): objecter: dropping messages (old connection being used)
Sage Weil
02:34 PM Bug #2393: objecter: dropping messages (old connection being used)
Just to make sure we're on the same page: there is nothing in that snippet indicating that there is an active Connect... Greg Farnum
03:27 PM RADOS Bug #2401 (Resolved): Remove OSD from CRUSH map fails if OSD Running
this bug was just fixed by wip-osdmap branch, yay! Sage Weil
10:28 AM RADOS Bug #2401 (Resolved): Remove OSD from CRUSH map fails if OSD Running
Removing an OSD from the CRUSH map requires the user to stop the OSD first. If the user doesn't stop the OSD and runs... John Wilkins
03:23 PM Cleanup #2344 (In Progress): convert Monitor maps to use ENCODE_START
Sage Weil
11:23 AM Cleanup #2344: convert Monitor maps to use ENCODE_START
The PGMap changes there are fine, but there are several others:
AuthMonitor::Incremental
MonCaps
MonMap
OSDMap
...
Greg Farnum
03:20 PM Feature #2404 (Fix Under Review): init-ceph: Make /etc/init.d/ceph ignore entries without explici...
wip-upstart Sage Weil
02:09 PM Feature #2404 (Resolved): init-ceph: Make /etc/init.d/ceph ignore entries without explicit host= ...
Anonymous
03:20 PM Feature #2407 (Fix Under Review): auth: "ceph auth get NAME"
wip-upstart Sage Weil
02:12 PM Feature #2407 (Resolved): auth: "ceph auth get NAME"
Comparable to "ceph auth list" and filtering the results, but doesn't leak as many secrets.
Not a dependency of anyt...
Anonymous
03:19 PM Feature #2406 (Fix Under Review): auth: "ceph auth get-or-create NAME CAPS.."
wip-upstart Sage Weil
02:11 PM Feature #2406 (Resolved): auth: "ceph auth get-or-create NAME CAPS.."
If the key does not exist, atomically generate a key and add it to the monitor. Output the new key.
If the key exist...
Anonymous
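A minimal sketch of the get-or-create semantics described in #2406, assuming that an existing key is simply returned (the entry above is truncated, so that branch is a guess). `KeyStore` and its in-memory dict are hypothetical stand-ins for the monitor's auth database, not Ceph code.

```python
import secrets
import threading


class KeyStore:
    """Toy in-memory stand-in for the monitor's auth database (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._keys = {}  # entity name -> (key, caps)

    def get_or_create(self, name, caps):
        # Hold the lock across the lookup and the insert so two concurrent
        # callers cannot both generate a key for the same name.
        with self._lock:
            if name not in self._keys:
                self._keys[name] = (secrets.token_hex(16), dict(caps))
            # Assumption: an existing key is returned unchanged.
            return self._keys[name][0]


store = KeyStore()
first = store.get_or_create("client.foo", {"mon": "allow r"})
second = store.get_or_create("client.foo", {"mon": "allow r"})
```

Calling it twice with the same name yields the same key, which is the property that makes the command safe to re-run from deployment scripts.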
03:18 PM Feature #2290 (In Progress): ObjectCacher: handle read/write errors
Sage actually reviewed this a while ago, and I was changing it to separate the read/write errors and not return the w... Josh Durgin
02:46 PM Feature #2290: ObjectCacher: handle read/write errors
This looks okay to me, although I didn't check it for comprehensive-ness (just looked at the diff). Greg Farnum
02:20 PM CephFS Bug #2385: max mds = 2, mds hang and crash
The full Ceph filesystem is not currently well-tested, but if you can recreate this with MDS logging on and post the ... Greg Farnum
01:09 PM CephFS Bug #2385: max mds = 2, mds hang and crash
Ceph hates symlinks. :)
I can't rsync Debian and Ubuntu mirrors from local disk to CephFS.
Always, always when...
Yavuz Selim Komur
02:10 PM Feature #2405 (Resolved): osd: Make ceph-osd --mkfs idempotent
This will help upgrades from versions where mkfs did not set the "ready" flag. Without this, the upgrade would re-mkf... Anonymous
01:31 PM Bug #2396 (Resolved): don't fail on mismatched size CRUSH and OSD maps
Merged to master in commit:84f335a6894db6d6b993d9cba584b4c3ab5365d0 Greg Farnum
01:31 PM Bug #2397 (Resolved): mon: prevent addition of CRUSH items past the max_osd
Merged to master in commit:84f335a6894db6d6b993d9cba584b4c3ab5365d0 Greg Farnum
01:10 PM Subtask #2403 (Resolved): remove osd pointer from PG
In order to more clearly delineate the osd methods which can be correctly called from PG, those services should be pr... Samuel Just
01:09 PM Subtask #2402 (Resolved): audit calls into osd from pg for locking correctness
With pg peering in a workqueue, we can't hold the osd_lock while handling peering events. Thus, pg calls into the os... Samuel Just
12:38 PM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
Er, the important hung thread:... Josh Durgin
12:28 PM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
The deadlock was due to throttling of resent linger requests during map changes. wip-objecter-throttle should fix thi... Josh Durgin
11:29 AM Bug #2379: Mon crash after start
Sage, did you check out the logging for slurping yet?
Maciej, do you still need instructions on core files?
Greg Farnum
10:11 AM Bug #2022 (Need More Info): osd: misdirectect request
not fixed after all, failed again on
ubuntu@teuthology:/a/nightly_coverage_2012-05-13-b/1268
Sage Weil
10:08 AM rbd Bug #2400 (Resolved): xfstest: failed #84
2012-05-13T01:51:49.654 INFO:teuthology.orchestra.run.out:084 - output mismatch (see 084.out.bad)
2012-05-13T01:5...
Sage Weil
09:47 AM Feature #2399 (Resolved): qa: haproxy + rgw + jenkins
Sage Weil
09:45 AM devops Feature #2398 (Rejected): chef: external osd journal support
Sage Weil
09:35 AM RADOS Feature #2268 (Resolved): crush: update item's position in crush map
Sage Weil
09:26 AM rgw Bug #2369 (Resolved): rgw: bucket attr update does not propagate correctly to all rgw instances
There were two different issues, both are fixed now:
1. Directly using un-normalized rgw_obj with the cache (Fixed...
Yehuda Sadeh
03:23 AM rbd Feature #1484: libvirt: map rbd via kernel driver
In the current design of libvirt I don't see how you could achieve this.
With my storage pool work I found out tha...
Wido den Hollander
03:21 AM Feature #1422: libvirt: rbd storage pool
I just submitted my patch to the libvirt mailinglist. This is a revised version of the patch and should hopefully mak... Wido den Hollander

05/12/2012

04:38 PM Bug #2390: dencoder: depends on expat
Hmm, ideally ceph-dencoder would still understand the types but not need expat to compile. Which it shouldn't, in pr... Sage Weil
04:15 PM Feature #2305 (Resolved): Moving rbd images between pools
commit:ee26c5d73a48b64292d16a87ebe69908c142048e Sage Weil
03:53 PM Bug #2396 (Fix Under Review): don't fail on mismatched size CRUSH and OSD maps
wip-osdmap Sage Weil
03:53 PM Bug #2397: mon: prevent addition of CRUSH items past the max_osd
wip-osdmap Sage Weil
03:53 PM Bug #2397 (Fix Under Review): mon: prevent addition of CRUSH items past the max_osd
Sage Weil

05/11/2012

11:46 PM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
Hi Josh,
should've been done, if I missed s/t, stuff is in /usr/src...
Hope it helps,
Oliver.
Oliver Francke
09:03 PM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
Here's a log in which the dispatcher never runs again, despite the osd_map_ack and other messages being received.
...
Josh Durgin
09:14 AM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
The dispatch thread hasn't run since this point in the log:... Josh Durgin
09:12 AM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
So the connection with no pipe problem only happens when the monitor is restarted. The issue still happens otherwise ... Josh Durgin
11:12 PM Bug #2397 (Resolved): mon: prevent addition of CRUSH items past the max_osd
To make osd addition a little friendlier, we should add checks to make sure that people run "ceph osd create" before ... Greg Farnum
11:10 PM Bug #2396 (Resolved): don't fail on mismatched size CRUSH and OSD maps
See osd-crush-resize. Come up with a better fix to apply to master (or next, if that's going). Greg Farnum
11:22 AM Linux kernel client Bug #2287: rbd: crashes with 10Gbit network and fio
We used the attached patch to resolve the immediate problem.
But we still see other crashes over time. I foun...
Danny Kukawka
11:05 AM Linux kernel client Bug #2395 (Resolved): kernel crash after unmap a rdb device while the cluster is down
1) get a ceph cluster running
2) create an RBD and map it to a client
3) shutdown the cluster
4) as soon as the cl...
Danny Kukawka

05/10/2012

05:44 PM Feature #2394 (Resolved): Provide tool to answer: "when is it safe to kill this osd"
After "ceph osd out 123", when is it safe to kill the ceph-osd daemon?
Assume a busy cluster where there's other f...
Anonymous
04:13 PM Bug #2393: objecter: dropping messages (old connection being used)
Ah, too bad. Yehuda Sadeh
03:55 PM Bug #2393: objecter: dropping messages (old connection being used)
The caller always holds the lock for the objecter. RadosClient::ms_handle_reset grabs the lock and calls objecter->ms... Josh Durgin
01:49 PM Bug #2393: objecter: dropping messages (old connection being used)
Objecter::ms_handle_reset() was not acquiring a lock. ms_handle_reset() racing with any operation that grabs the sess... Yehuda Sadeh
10:57 AM Bug #2393: objecter: dropping messages (old connection being used)
Ah, need to look at the logs again. There's nothing in this excerpt to say that both requests were supposed to go to ... Yehuda Sadeh
03:39 PM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
With more debugging, this looks like a problem with the messenger or the monitor client's use of it. After a fault, t... Josh Durgin

05/09/2012

11:32 PM Bug #2393: objecter: dropping messages (old connection being used)
I think the ping is a red herring. In tick() we go over all the regular sessions, and then over all the lingering ses... Yehuda Sadeh
09:21 PM Bug #2393: objecter: dropping messages (old connection being used)
One more point to note is that all the following ping messages show the same issue (dropped message). Yehuda Sadeh
09:11 PM Bug #2393 (Duplicate): objecter: dropping messages (old connection being used)
... Yehuda Sadeh
04:55 PM Linux kernel client Bug #2392 (Resolved): First read of symlink after ceph filesystem mounted gives error
On client machine (Ubuntu 12.04):... Mark Kirkwood
02:39 PM Bug #2391 (Resolved): librados docs bug
This was reported on IRC:
jeffp> hi, i just noticed an error in the librados docs. http://ceph.com/docs/master/api/...
Alex Elder
11:04 AM Bug #2390 (Resolved): dencoder: depends on expat
dencoder uses rgw_dencoder.cc which requires the expat library. Yehuda Sadeh

05/08/2012

04:04 PM Feature #2335 (Resolved): librbd: write-thru cache mode
Sage Weil
01:02 PM Linux kernel client Bug #2389 (Duplicate): rbd: hung xfstest 67
2012-05-08T12:18:39.909946-07:00 plana52 kernel: [43825.460582] libceph: osd4 10.214.132.28:6803 socket closed
2012-...
Sage Weil
12:08 AM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
You're right that one of the requests was dropped, but it was just because osd.1 was killed, and it was the second me... Josh Durgin

05/07/2012

10:32 PM Feature #2325 (Resolved): setup new email/etc
Sage Weil
06:51 PM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
op 5400 was resent in epoch 8565, but the osd never saw it arrive the second time around. looking at the other laggy... Sage Weil
06:05 PM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
The most obvious problem is with clients thinking their requests are outstanding, when the osds think they've completed a... Josh Durgin
04:19 PM Documentation #2388 (Resolved): librbd python doc lacks ioctx parameter to rbd.Image() constructor
The librbd Python binding doc (http://ceph.com/docs/master/api/librbdpy/) has the following in its example:
image ...
Dan Mick
04:00 PM rgw Feature #2284 (Resolved): rgw: bench based on rados_bench
Sage Weil
03:51 PM RADOS Subtask #2340 (Resolved): crush: remove forcefeeding
Sage Weil
03:50 PM Bug #2022 (Resolved): osd: misdirectect request
Sage Weil
12:47 PM Bug #2022: osd: misdirectect request
running this in a loop against the wip-crush kernel branch and the problem seems to be gone! Sage Weil
11:08 AM Bug #2379: Mon crash after start
the problem was that latest was at a higher version than last_committed but slurping wasn't set.
checking whethe...
Sage Weil
09:45 AM Bug #2379 (Need More Info): Mon crash after start
What would actually be better is if you could fire up gdb on ceph-mon and the core file and paste a backtrace here. ... Sage Weil
09:42 AM Bug #2379: Mon crash after start
I can provide it for you. Please describe how to do this. Maciej Galkiewicz
09:40 AM Bug #2379: Mon crash after start
Do you have a core file for this? Sage Weil
09:24 AM Bug #2373 (Resolved): osd: last_epoch_clean > last_epoch_started
commit:efc0701cf97f6a936c8f253b5449216f309fe4a3 Sage Weil
09:07 AM Bug #2387 (Duplicate): mon: could not get service secret for auth subsystem
... Sage Weil
09:04 AM Bug #2386 (Resolved): xfstests: failed #34
... Sage Weil

05/05/2012

09:27 PM RADOS Subtask #2340 (Fix Under Review): crush: remove forcefeeding
Sage Weil
02:11 PM Bug #2188 (Resolved): mon: mds rm should be harder to break things with
added checks that the gid exists and that it is not active. with those in place, this is no longer dangerous... all y... Sage Weil
02:02 PM Cleanup #2344 (Fix Under Review): convert Monitor maps to use ENCODE_START
wip-mon-encoding Sage Weil
01:57 PM Bug #2373 (Fix Under Review): osd: last_epoch_clean > last_epoch_started
wip-osd-peering Sage Weil
11:43 AM Bug #2373: osd: last_epoch_clean > last_epoch_started
the problem is that we set the CLEAN state bit and report that to the monitor before last_epoch_started has propagate... Sage Weil
11:10 AM Bug #2192 (Won't Fix): ceph-mon hangs consuming 100% CPU
Yep, this sounds like the writeback sync deadlock:
- ceph-mon calls sync
- the kernel client flushes its dirty...
Sage Weil
11:01 AM Bug #2124 (Resolved): crash when malformed auth key is provided
commit:ae0ca7be3c8f1aaf5f0bf7534363ffd60b6a04e2 Sage Weil
10:53 AM Bug #2275 (Resolved): osd: crash in FileJournal::wrap_read_bl
converted int -> ssize_t in common/safe_io.c, commit:3509b039a28d41c7ae1b3d482d67a27f8e5739e8 Sage Weil
01:40 AM CephFS Bug #2385: max mds = 2, mds hang and crash
Sorry..
ceph.com/debian repo wheezy binaries. hang all client task....
Yavuz Selim Komur
01:38 AM CephFS Bug #2385 (Can't reproduce): max mds = 2, mds hang and crash
Yavuz Selim Komur

05/04/2012

03:48 PM Bug #2022 (Fix Under Review): osd: misdirectect request
i just found a crush algorithm fix/change that was in ceph.git but on the kernel... i bet that is the problem. now s... Sage Weil
02:17 PM RADOS Subtask #2340 (In Progress): crush: remove forcefeeding
just need to apply these to the kernel implementation, too Sage Weil
02:09 PM Feature #2335 (Fix Under Review): librbd: write-thru cache mode
wip-rbd-wt Sage Weil
12:33 PM Feature #2290 (Fix Under Review): ObjectCacher: handle read/write errors
Forgot to update this before, but it's in wip-oc-error-handling. Josh Durgin
12:24 PM Linux kernel client Bug #2384 (Resolved): libceph: fix all vmalloc (buffer_new) callers to use GFP_KERNEL
GFP_NOFS is not safe because __vmalloc may have to adjust page tables with GFP_KERNEL. need to fix our callers to no... Sage Weil
09:55 AM rbd Feature #2383 (New): tool for exposing block existence and location for an rbd image
This would give users some transparency about rbd block placement and usage. Samuel Just
01:25 AM CephFS Bug #2375: rrdtoll data malfuntion..
it's still in ceph-0.46 from ceph.com/debian repository.. Yavuz Selim Komur

05/03/2012

09:44 PM RADOS Subtask #2340 (Fix Under Review): crush: remove forcefeeding
Sage Weil
09:41 PM Feature #2367 (Resolved): mon: osd crush add should move item if it exists elsewhere
Sage Weil
09:40 PM Bug #2372 (Resolved): librbd: unit tests get different error codes
Sage Weil
08:13 PM Bug #2382 (Resolved): osd: unable to start due to 1 child already started
Sounds good to me; I never liked depending on /proc for that anyway.
Merged into master. We probably want to put it ...
Greg Farnum
07:38 PM Bug #2382 (Fix Under Review): osd: unable to start due to 1 child already started
wip-2382 Sage Weil
07:35 PM Bug #2382: osd: unable to start due to 1 child already started
ok, this is just a bad check. we're verifying there aren't threads because fork()/daemonize() will destroy them. th... Sage Weil
06:11 PM Bug #2382: osd: unable to start due to 1 child already started
I just built fresh 0.46 rpms (ran 0.45 before) and now I'm seeing this too.
Notice the timestamps. I had to call thi...
Dennis Jacobfeuerborn
06:09 PM Bug #2382: osd: unable to start due to 1 child already started
Also, occasionally, this also happens with ./init-ceph when starting all services, or each one individually. For inst... Joao Eduardo Luis
06:04 PM Bug #2382: osd: unable to start due to 1 child already started
I re-triggered this using 'CEPH_NUM_OSD=1 CEPH_NUM_MDS=1 CEPH_NUM_MON=1 ./vstart.sh' on my desktop (granted, it's a d... Joao Eduardo Luis
04:21 PM Bug #2382: osd: unable to start due to 1 child already started
i saw this on congress too. will reproduce on my burnupi cluster and investigate. Sage Weil
04:12 PM Bug #2382: osd: unable to start due to 1 child already started
update: if i start the osd's with ceph-osd manually this doesn't seem to happen - just with /etc/init.d/ceph start an... Jeff Plaisance
04:01 PM Bug #2382 (Resolved): osd: unable to start due to 1 child already started
I had seen this bug a few days ago while setting up ceph on my desktop, but it went away by rerunning ./ceph-osd so I... Joao Eduardo Luis
12:51 PM rgw Bug #2381 (Resolved): rgw: radosgw-admin operations don't trigger notifications
Fixed, commit:3228643f1ec34a500246ddc1e16025f05b587342 Yehuda Sadeh
12:31 PM rgw Bug #2381 (Resolved): rgw: radosgw-admin operations don't trigger notifications
radosgw-admin operations that update user info are not triggering notifications to the rgw daemons, thus rgw instance... Yehuda Sadeh
12:51 PM Bug #2377 (Resolved): watch is lost if tcp timeout reached on the connection
Fixed, commit:a5f533a7ed210c274c44a7fa922377e1a0ab9900. Yehuda Sadeh
10:35 AM CephFS Bug #2380 (Rejected): kclient: aufs over a cephfs mount fails with Stale NFS file handle
The aufs directory is inaccessible.
# mkdir /tmp/dir{1..3}
# mount ...
Eric Dold
03:06 AM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
Hi Josh, Sage,
as we are now proud to be mentioned on ceph.com, we should push things a bit, cause have to breathe...
Oliver Francke
02:28 AM Bug #2379 (Resolved): Mon crash after start
... Maciej Galkiewicz

05/02/2012

11:39 PM CephFS Cleanup #2378 (Resolved): "ceph -s" MDS output is confusing
If you're running an RBD/RGW cluster without an MDS daemon, having output like... Greg Farnum
11:13 PM Feature #685: libcephmon: interact with ceph monitors via a library
Has there been any progress on this?
My idea is still to make some sort of nice WebGUI where you can see the state...
Wido den Hollander
04:38 PM Bug #2377: watch is lost if tcp timeout reached on the connection
Anything that relies on the underlying TCP connection to stick around forever is going to be a problem for us as we s... Greg Farnum
04:23 PM Bug #2377 (Resolved): watch is lost if tcp timeout reached on the connection
Once tcp timeout is reached on a connection that has a watch on, we don't try to reconnect immediately (should we eve... Yehuda Sadeh
04:23 PM Feature #2335 (In Progress): librbd: write-thru cache mode
just setting the dirty limit to 0 won't work (well), because we'll end up with a single io in flight at any point in ... Sage Weil
03:32 PM Feature #2141 (Resolved): ceph: 'object map <poolname> <objectname>' or similar
Sage Weil
03:22 PM Feature #2141: ceph: 'object map <poolname> <objectname>' or similar
Looks good, assuming you actually ran it and tested something. Greg Farnum
03:02 PM Feature #2141 (Fix Under Review): ceph: 'object map <poolname> <objectname>' or similar
ceph osd map poolname objectorkeyname Sage Weil
02:47 PM Feature #2349: rados bench: Extra statistics
Also, a header showing how the command was invoked! Mark Nelson
02:19 PM Bug #2370: rbd doesn't support mv/rename across pools; tool should detect
Trying to fix this:
commit:ee26c5d73a48b64292d16a87ebe69908c142048e
Dan Mick
01:59 PM Bug #2372: librbd: unit tests get different error codes
merged into master Sage Weil
10:30 AM Bug #2372 (Fix Under Review): librbd: unit tests get different error codes
Fixed in wip-rbdpy Josh Durgin
01:55 PM Bug #2221 (Resolved): Monitor setup bugs
I have spent way too long doing monitor additions lately, and I've gotten a few things cleaned up and checking the do... Greg Farnum
01:54 PM Bug #2338 (Rejected): mon: adding new monitors simultaneously can allow a new mon to become leader
Everything *does* work properly if you aren't located in the monmap. I would love to come up with a way of providing ... Greg Farnum
01:43 PM Cleanup #2376 (Resolved): ceph-authtool -C option doesn't work
commit:e20cd4baf912414b089adb2e2386c4dc94088d30 Sage Weil
01:29 PM Cleanup #2376 (In Progress): ceph-authtool -C option doesn't work
I broke this.. -C is short for --cluster. reverting and pushing to stable. Sage Weil
01:27 PM Cleanup #2376: ceph-authtool -C option doesn't work
Also:... Greg Farnum
01:26 PM Cleanup #2376 (Resolved): ceph-authtool -C option doesn't work
... Greg Farnum
01:43 PM Feature #2367 (Fix Under Review): mon: osd crush add should move item if it exists elsewhere
wip-crush-update Sage Weil
10:31 AM Feature #2336 (Resolved): qemu: wire up discard
Accepted upstream in Kevin's block branch. Josh Durgin
09:18 AM CephFS Bug #2375 (Closed): rrdtoll data malfuntion..
Sorry for my english.
i prepared 4 debian-amd64-wheezy host cephfs. mount in /mnt
when copy rrd db data to mnt
...
Yavuz Selim Komur
09:03 AM Linux kernel client Feature #2374: ceph-client: start laying the groundwork for Linux tracepoints
Sounds like a great idea to me! Sage Weil
05:46 AM Linux kernel client Feature #2374 (New): ceph-client: start laying the groundwork for Linux tracepoints
Linux supports a mechanism for very efficiently inserting trace points
in code, which allow for an arbitrary functio...
Alex Elder

05/01/2012

10:42 PM Bug #2373 (Resolved): osd: last_epoch_clean > last_epoch_started
... Sage Weil
10:24 PM Bug #2372 (Resolved): librbd: unit tests get different error codes
2012-05-01T02:28:28.447 INFO:teuthology.task.workunit.client.0.err:test_rbd.TestImage.test_remove_with_snap ... ERROR... Sage Weil
10:21 PM Linux kernel client Bug #2371: pjd chown/00.t 141, 145, 153 fail on kclient
maybe related to other pjd failures on ceph-fuse, #2187, #1811, #1586 Sage Weil
10:19 PM Linux kernel client Bug #2371 (Can't reproduce): pjd chown/00.t 141, 145, 153 fail on kclient
... Sage Weil
10:12 PM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
Not sure if that's a problem, need to dig deeper into that, but note that in the OPENING case there we're not jumping... Yehuda Sadeh
08:00 PM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
Well, after getting my dout() lines to produce useful information
I have a new theory. Basically I think this error...
Alex Elder
07:20 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
Wish I knew about the "pahole" utility (part of the "dwarves" package)
last week.
% pahole libceph.ko -C ceph_co...
Alex Elder
05:20 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
Last night I continued to decode assembly, working through making
sense of the particular instructions and the state...
Alex Elder
08:13 PM Bug #2370 (Resolved): rbd doesn't support mv/rename across pools; tool should detect
Dan Mick
08:10 PM Bug #2370: rbd doesn't support mv/rename across pools; tool should detect
commit: ee26c5d73a48b64292d16a87ebe69908c142048e Dan Mick
07:30 PM Bug #2370 (Resolved): rbd doesn't support mv/rename across pools; tool should detect
rbd doesn't support moving an image between RADOS pools; the rbd CLI should detect this case and issue an error. rbd... Dan Mick
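The check requested in #2370 could look roughly like the sketch below. It assumes image specs of the form `pool/image` with a bare name falling back to a default pool named `rbd`; `split_spec` and `check_rename` are hypothetical helpers for illustration, not functions from the rbd tool.

```python
def split_spec(spec, default_pool="rbd"):
    """Split 'pool/image' into (pool, image); bare names use default_pool."""
    pool, sep, image = spec.partition("/")
    if not sep:
        return default_pool, spec
    return pool, image


def check_rename(src, dest):
    """Raise ValueError if a rename would move the image to another pool."""
    src_pool, _ = split_spec(src)
    dest_pool, _ = split_spec(dest)
    if src_pool != dest_pool:
        raise ValueError(
            f"cannot rename across pools: {src_pool!r} -> {dest_pool!r}")


check_rename("rbd/foo", "rbd/bar")  # same pool: accepted silently
```

Rejecting the request up front gives the user a clear error instead of a failure partway through the operation.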
04:49 PM Bug #2364 (Won't Fix): mon: can't specify monitor to join with -m
I didn't have any troubles doing any of this with multiple machines so it does appear to only be a problem if you're ... Greg Farnum
04:01 PM Bug #2364: mon: can't specify monitor to join with -m
I've investigated this and it only tries to bind to existing monitor addrs if they have a local IP in the list provid... Greg Farnum
04:39 PM rgw Bug #2369 (Resolved): rgw: bucket attr update does not propagate correctly to all rgw instances
Yehuda Sadeh
01:49 PM Bug #2365 (Resolved): osd: don't notice when nodes in probe set are down
commit:bb7e5da322871bfa1ac596411c596f83083b02a6 Samuel Just
01:46 PM Feature #2323 (Resolved): osd: limit 'old request' messages generated
Joao Eduardo Luis
12:05 PM Feature #2323 (Fix Under Review): osd: limit 'old request' messages generated
Joao Eduardo Luis
12:39 PM Feature #2358 (Resolved): throttle: expose via perfcounters
Sage Weil
12:37 PM Bug #2306 (Resolved): objecter: accessing empty object maps to pool 0
Sage Weil
12:35 PM Bug #2368 (Resolved): injectargs not working for fliestore flusher
commit:06fd0b68f655376f1c468edacb8655f1c2b163c1 Sage Weil
12:24 PM Bug #2368 (Resolved): injectargs not working for fliestore flusher
Sage and I were trying to turn off the filestore flusher and doing a:
ceph osd tell \* injectargs '--no-filestore-...
Mark Nelson
12:25 PM Bug #2303 (Can't reproduce): osd: failed to peer on startup
i'm going to guess this is a dup of #2355, or fixed.. haven't seen it since. Sage Weil
10:50 AM Bug #2355 (Resolved): pgs stuck creating (with thrashing)
commit:81f51d28d67c2a58ab621405c3da65aac726d719 Sage Weil
10:42 AM Bug #2355 (Fix Under Review): pgs stuck creating (with thrashing)
see wip-2355 Sage Weil
10:31 AM Feature #2367 (Resolved): mon: osd crush add should move item if it exists elsewhere
Sage Weil
09:57 AM rgw Bug #2366 (Resolved): rgw: bucket index update rely on pg state
Since the object version number is being used to update the bucket index, if we back up, remove, and restore the pool th... Yehuda Sadeh
09:41 AM Bug #2357: mds takes down ceph
Are you running with both of the MDSes active, or just one of them? You'll have better luck with one active and one s... Greg Farnum
09:33 AM Bug #2363 (Resolved): mon: ./ceph-mon -i b --mkfs -c ceph.conf segfaults
Merged into master in commit:22bd5dfa25a1b93ccbacf882cea2e55dc8744004 Greg Farnum

04/30/2012

05:25 PM Bug #2345 (Resolved): mon: users can create both pool snapshots and self-managed snapshots on a s...
Sage Weil
05:09 PM Feature #2358 (Fix Under Review): throttle: expose via perfcounters
see wip-throttle
Sage Weil
02:05 PM Feature #2358 (In Progress): throttle: expose via perfcounters
Sage Weil
04:58 PM Bug #2364: mon: can't specify monitor to join with -m
It does work if you have a monmap, though (although it's noisy for things like lack of keyrings, admin socket locatio... Greg Farnum
04:39 PM Bug #2364 (In Progress): mon: can't specify monitor to join with -m
Of course, at that point the -m is essentially ignored.
If I merge in my no-conf-necessary changes and run without...
Greg Farnum
04:09 PM Bug #2364: mon: can't specify monitor to join with -m
Oh, and if you do specify the fsid in the above step (apparently required, my bad) and try to start up the mon:
<pre...
Greg Farnum
03:48 PM Bug #2364 (Won't Fix): mon: can't specify monitor to join with -m
... Greg Farnum
04:47 PM Bug #2365 (Resolved): osd: don't notice when nodes in probe set are down
we currently do
set<int> new_peers;
if (role == 0) {
for (unsigned i=0; i<acting.size(); i++)
n...
Sage Weil
03:46 PM Bug #2363 (Resolved): mon: ./ceph-mon -i b --mkfs -c ceph.conf segfaults
It tries to identify if there's a local monitor based on the network interfaces, and segfaults inside of have_local_a... Greg Farnum
02:04 PM Feature #2321 (Resolved): osd: investigate memory consumption from peering backlog
Sage Weil
01:51 PM Bug #2360 (Resolved): osd: inconsistent use of dirty_info = true vs write_info()
wip-pi merged into master. may still be some cleanup possible, but that's an optimization at this point. Sage Weil
01:16 PM Bug #2356 (Resolved): make dist rebuilds libcommon.la
Fixed in a477d6be7effd26bd018e6af907ce2d8a691db85 Dan Mick
11:52 AM Feature #2213 (Resolved): rbd: shouldn't need config file to get help
Okay, just made it not care about config files in wip-2352. Greg Farnum
11:52 AM Bug #2352 (Resolved): ceph -s without a conf file doesn't work when it should
Okay, just made it not care about config files in wip-2352. Greg Farnum
10:50 AM Feature #2362 (New): rados: support omap (leveldb) and locator key in import/export
Observed by Henry C Chang: http://marc.info/?l=ceph-devel&m=133575684918035&w=2
Also see #1069 for similar ticket ...
Anonymous
08:43 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
I've completed decoding all of the fields of the ceph_connection
structure, along with some other stuff dereferenced...
Alex Elder
08:36 AM Linux kernel client Bug #2359: xfstest 62 failing
No I don't think that's the issue.
The problem lies in this function, defined in xfstests "common.attr":
_sort_ge...
Alex Elder
01:51 AM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
Hi Josh, ( sending this second time, "bad browser request..." grrr)
I had some spare-time over the weekend and dec...
Oliver Francke

04/29/2012

10:28 AM Bug #2361 (Resolved): osd: failed exists() assertion for identify_osd()
Sage Weil
10:05 AM Bug #2361 (Resolved): osd: failed exists() assertion for identify_osd()
2012-04-29T01:43:43.581 INFO:teuthology.task.ceph.mds.a.err:osd/OSDMap.h: In function 'const entity_addr_t& OSDMap::g... Sage Weil

04/28/2012

11:00 PM Bug #2360: osd: inconsistent use of dirty_info = true vs write_info()
void PG::all_activated_and_committed()
creates and submits its own transaction. it's called from _activate_commit...
Sage Weil
09:40 PM Bug #2360 (Resolved): osd: inconsistent use of dirty_info = true vs write_info()
The dirty_info flag was added ages ago to avoid stuff done during advance*N + activate_map from rewriting the pg info... Sage Weil
06:14 PM Feature #2334 (Resolved): mon: set max mark-out or mark-down
Sage Weil
06:14 PM Feature #2317 (Resolved): mon: pause/unpause auto-mark-out
Sage Weil
06:14 PM Feature #2318 (Resolved): mon: block osd boot
Sage Weil
06:14 PM Feature #2319 (Resolved): mon: block osd mark-down
Sage Weil
05:50 PM Bug #2350 (Resolved): conf: can't set subsystem settings (debug levels, logging settings) via lib...
Sage Weil
05:49 PM Bug #2322 (Resolved): osd/ReplicatedPG.cc: 3832: FAILED assert(!object_contexts.size())
commit:92becb696bde7f0aa9687b2fe7505ed1ac9f493b Sage Weil
04:39 PM Bug #2353 (Resolved): osd: current/ snap check problem
Sage Weil
03:41 PM Linux kernel client Bug #2359 (Can't reproduce): xfstest 62 failing
> 324: (3320s) collection:basic clusters:fixed-3.yaml fs:btrfs.yaml tasks:rbd_xfstests.yaml
> Command failed wit...
Sage Weil
02:29 PM Feature #2358: throttle: expose via perfcounters
see wip-throttle Sage Weil
01:58 PM Feature #2358 (Resolved): throttle: expose via perfcounters
it would be nice to track all throttlers for
- current utilization
- peak utilization
- times they've actuall...
Sage Weil
11:27 AM Bug #2357 (Can't reproduce): mds takes down ceph
Hi guys !
Really impressive DFS... nice performance... cool stuff !
But while starting to use it live I unfortunate...
Jörg Ebeling

04/27/2012

09:23 PM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
Revisiting my decoding from before, in light of my new clarity on
padding and field layout, things in the message st...
Alex Elder
08:14 PM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
Clarifying something I described earlier, with respect to alignment
and sizing of fields, including those that have ...
Alex Elder
07:27 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
The skipped 7 bytes could be explained this way.
The sender sends out over the wire the sub-components of the
str...
Alex Elder
07:13 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
Here is my hunch about what's going on, and what I mentioned was a
cause for trouble interpreting the memory content...
Alex Elder
07:02 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
After some false starts I decoded the beginning of the connection
structure, and I found a problem, but I have not y...
Alex Elder
05:51 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
I meant to update this yesterday morning... KDB on the system definitely
works, which is really great. But since i...
Alex Elder
05:47 PM Bug #2356 (Resolved): make dist rebuilds libcommon.la
Sage noticed this and asked me to investigate; it turns out a _SOURCES macro was mistakenly including
libcommon.la r...
Dan Mick
03:38 PM Bug #2353: osd: current/ snap check problem
Problem was that more than one daemon was brought up for the same osd, racing for the fs type check before mounting i... Yehuda Sadeh
03:00 PM Bug #2353: osd: current/ snap check problem
... Sage Weil
02:50 PM Bug #2353: osd: current/ snap check problem
I'd say the error message is bad:... Yehuda Sadeh
09:21 AM Bug #2353 (Resolved): osd: current/ snap check problem
2012-04-27 12:18:08.015605 7f281c7c17a0 0 filestore(/var/lib/ceph/osd/ceph-153) mount found snaps <42996,42997,42998... Sage Weil
01:51 PM Bug #2355 (Resolved): pgs stuck creating (with thrashing)
I was running the following teuthology config, and since test_librbd_fsx creates a pool on each run, new pgs were cre... Josh Durgin
01:29 PM Bug #2354 (Resolved): osd: make watch timeout configurable
The *notify* timeout is configurable, but the *watch* timeout is still hardcoded as 30 seconds. From ReplicatedPG.cc:... Josh Durgin
11:34 AM Bug #2352 (In Progress): ceph -s without a conf file doesn't work when it should
I've got a stupid hack that makes this work, but I'm going to see if I can generalize our common config stuff so this... Greg Farnum
11:34 AM Feature #2213 (In Progress): rbd: shouldn't need config file to get help
This is actually a little different from #2352. But keeping both in mind is better than providing a specific hack, so... Greg Farnum
10:49 AM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
There was a fix to librbd's handling of short reads that span objects (commit:b94d6a6cf459478b27539eb8eccd30e19a67bbc... Josh Durgin
09:06 AM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
Hi Josh,
well, just allowed myself another time-window for some investigations. Do you have some pointers to the f...
Oliver Francke

04/26/2012

10:51 PM Feature #2314 (Resolved): remove localized pgs
Sage Weil
10:40 PM Bug #2350: conf: can't set subsystem settings (debug levels, logging settings) via librados or li...
commit:4e2e87941b5f078eadf90a2c58a12765375ac66b, tested by commit:dbd99129ce58b321c0ecd641a1f8b45bd94b3b33 Sage Weil
09:55 AM Bug #2350: conf: can't set subsystem settings (debug levels, logging settings) via librados or li...
Sage Weil
10:21 PM Bug #2351 (Resolved): osd: bad state machine event
commit:3e880174dd233a3df88c63785186d36f9b12a137 Sage Weil
09:43 AM Bug #2351: osd: bad state machine event
I saw this on my dev copy; might be the same... Sage Weil
09:42 AM Bug #2351 (Resolved): osd: bad state machine event
... Sage Weil
07:41 PM Feature #2323 (In Progress): osd: limit 'old request' messages generated
Sage Weil
07:40 PM Feature #2334 (Fix Under Review): mon: set max mark-out or mark-down
Sage Weil
06:55 PM Bug #2342 (Resolved): librados: notify deadlock
Yes. RadosClient::watch_notify() doesn't call ->notify() .. -> _notify_ack() directly anymore.
Fixed, commit:70f7...
Yehuda Sadeh
06:31 PM Bug #2342: librados: notify deadlock
Is this fixed with the unwatch change? Sage Weil
04:02 PM Feature #2349: rados bench: Extra statistics
Update: Also, please include a timestamp for each line of output. This can be in addition to or as a replacement for... Mark Nelson
02:25 PM Feature #2290 (In Progress): ObjectCacher: handle read/write errors
Josh Durgin
02:16 PM Bug #2316: rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-g3053e47"
There were a couple bugs fixed that may have caused this problem. Could you try upgrading to the 'next' branch (at co... Josh Durgin
10:40 AM Feature #2213 (Duplicate): rbd: shouldn't need config file to get help
Josh Durgin
09:58 AM Bug #2352 (Resolved): ceph -s without a conf file doesn't work when it should
... Greg Farnum

04/25/2012

07:05 PM Bug #2350 (Resolved): conf: can't set subsystem settings (debug levels, logging settings) via lib...
With the recent rework of the logging infrastructure, md_config_t::set_val() was not updated to take into account sub... Josh Durgin
01:57 PM Feature #2321 (In Progress): osd: investigate memory consumption from peering backlog
Sage Weil
01:57 PM Feature #2281 (Resolved): build big burnupi cluster for testing
Sage Weil
01:45 PM Feature #2349 (Resolved): rados bench: Extra statistics
* min/max for throughput, and std deviations for both throughput and latency.
* timestamp each line of output
* hea...
Mark Nelson
01:42 PM Feature #2111 (Fix Under Review): msgr workloads
This has been sitting in my issue area for a while and it's distracting, so I'm reassigning for review. :) Greg Farnum
01:42 PM Bug #2348 (Resolved): osd: peer_info_requested not pruned with prior_set.probe
commit:9023aedf143aa22c14f7a73b73837080b455942d Sage Weil
01:41 PM Bug #2345 (Fix Under Review): mon: users can create both pool snapshots and self-managed snapshot...
Pushed a fix (and tested it using rados snap create and rbd snap create) to wip-2345-snaps. Greg Farnum
01:14 PM Bug #2345 (In Progress): mon: users can create both pool snapshots and self-managed snapshots on ...
Ah, there are Monitor checks for this but they aren't quite right: they look at the contents of snaps and removed_sna... Greg Farnum
11:42 AM Bug #2345 (Resolved): mon: users can create both pool snapshots and self-managed snapshots on a s...
When the OSD gets the map it breaks, but the Monitor will happily set both. There should be a guard against that.
(See...
Greg Farnum
01:33 PM rbd Cleanup #2347 (Resolved): The rbd help text is misleading on required arguments
The help text has a lot of arguments that look optional (being enclosed by brackets or angle brackets) that are actua... Greg Farnum
01:29 PM Feature #2336 (Fix Under Review): qemu: wire up discard
See wip-discard in qemu-kvm.git. This works with ide and scsi, but qemu doesn't have trim support for virtio right no... Josh Durgin
01:02 PM Bug #2316 (In Progress): rbd: restart of OSD leeds to stale qem-VM's with "ceph version 0.45-207-...
Josh Durgin
12:35 PM Bug #2346 (Resolved): xfs filesystem on top of rbd volume corrupts
Ceph version: 0.44.1-1~bpo70+1
Kernel version: 3.2.12-1
Ceph config:
[global]
auth supported = cephx
keyring =...
Maciej Galkiewicz
10:35 AM Linux kernel client Bug #2260: libceph: null pointer dereference at try_write+0x638+0xfb0
Yay!!! I reproduced this with KDB enabled. Now to start poking around on the
machine to see what I can learn.
Alex Elder

04/24/2012

07:03 PM Feature #2336: qemu: wire up discard
Trying to get gitbuilders to work again so I can test with packages... Josh Durgin
12:17 PM Feature #2336 (In Progress): qemu: wire up discard
Josh Durgin
06:04 PM Feature #1937 (Resolved): teuthology: --unlock option for -nuke
Added in teuthology commit:25114bf9a4e2e237e0d3cb8fcf77b66d4ff87234. Josh Durgin
12:16 PM Feature #1937 (In Progress): teuthology: --unlock option for -nuke
Josh Durgin
05:07 PM Bug #2286 (Resolved): mon: different full/near_full values on different monitors
Sage Weil
03:58 PM Bug #2286 (Fix Under Review): mon: different full/near_full values on different monitors
Actually I broke out the ENCODE_START bit into #2344.
So Sage, you want to merge the new wip-2286-map-fix into nex...
Greg Farnum
02:21 PM Bug #2286 (In Progress): mon: different full/near_full values on different monitors
Sage points out we need to convert the PGMap (and Incremental) encode/decode to use the auto-versioning stuff. I'll p... Greg Farnum
01:37 PM Bug #2286 (Fix Under Review): mon: different full/near_full values on different monitors
wip-2286-map-fix has the commit to fix this.
I couldn't come up with a great test for it but I did create a cluster ...
Greg Farnum
01:08 PM Bug #2286: mon: different full/near_full values on different monitors
Okay, pretty sure this is a result of a pre-upgrade PGMap::Incremental being committed by post-upgrade code. Sage poi... Greg Farnum
09:43 AM Bug #2286 (In Progress): mon: different full/near_full values on different monitors
Greg Farnum
04:58 PM Feature #2319 (Fix Under Review): mon: block osd mark-down
Sage Weil
10:21 AM Feature #2319 (In Progress): mon: block osd mark-down
Sage Weil
04:58 PM Feature #2317 (Fix Under Review): mon: pause/unpause auto-mark-out
Sage Weil
10:21 AM Feature #2317 (In Progress): mon: pause/unpause auto-mark-out
Sage Weil
04:58 PM Feature #2318 (Fix Under Review): mon: block osd boot
Sage Weil
10:21 AM Feature #2318 (In Progress): mon: block osd boot
Sage Weil
03:54 PM Bug #2338: mon: adding new monitors simultaneously can allow a new mon to become leader
Okay, looks like the new Monitor isn't joining properly because in this case it's been fed a monmap that contains its... Greg Farnum
03:51 PM Bug #2267: Ceph client crashed after shutting down one mds and osd
One more update. Maciej reports that after restoring the server again,
and restarting things (this again on the pro...
Alex Elder
12:52 PM Bug #2267: Ceph client crashed after shutting down one mds and osd

Staging cluster:
- running 0.45 on node n2cc: mds.n2cc, osd.1
- running 0.44 on node cc: mon.cc, mds.cc, osd.0
...
Alex Elder
12:29 PM Bug #2267: Ceph client crashed after shutting down one mds and osd
Reportedly, with the ceph configuration defined herein, the problem
shows up after simply restarting one of the osd ...
Alex Elder
12:15 PM Bug #2267: Ceph client crashed after shutting down one mds and osd
This could be in the same family as other problems in con_work() we've
seen lately (for example, 2260). Hoping a si...
Alex Elder
03:26 PM Cleanup #2344 (Resolved): convert Monitor maps to use ENCODE_START
Looks like the Monitor stuff doesn't use the ENCODE_START idiom at all, and it should. Greg Farnum
02:01 PM Bug #2339: osd: EBUSY on object delete when watchers present
See #2343 Sage Weil
01:49 PM Bug #2339: osd: EBUSY on object delete when watchers present
After some discussion, it seems like the right path is to:
- associate an error code with notify events (e.g., 0 =...
Sage Weil
06:44 AM Bug #2339 (Resolved): osd: EBUSY on object delete when watchers present
In particular, you can't delete an rbd image that has watchers. That may be okay if there is someone actively mounti... Sage Weil
01:52 PM Fix #2343 (Resolved): librados, osd: notify with special type on object deletion
If a watched object is deleted, it would be useful to have a different type of notify sent to the watchers, letting t... Josh Durgin
01:37 PM Bug #2342: librados: notify deadlock
This bug was seen repeatedly on the 3 test clusters built for congress. After several hours of testing rgw would hang... Mark Nelson
01:15 PM Bug #2342 (Resolved): librados: notify deadlock
... Yehuda Sadeh
10:21 AM Feature #2334 (In Progress): mon: set max mark-out or mark-down
Sage Weil
10:03 AM Bug #2341 (Resolved): mon: pgstats timeout still broken
commit:2fa90e02711100170423565302ec07b68d5e0aef Sage Weil
09:48 AM Bug #2341 (Resolved): mon: pgstats timeout still broken
2012-04-24 13:22:47.888291 7fa5bc587700 mon.peon5752@0(leader).osd e21633 OSDMonitor::handle_osd_timeouts: never got ... Sage Weil
07:08 AM Bug #2311: rbd: delete + create image led to EEXIST
Hi Sage,
yeah, this time it might have been from a stale VM, but the other tests should have shown that I normally s...
Oliver Francke
06:45 AM Bug #2311 (Resolved): rbd: delete + create image led to EEXIST
Ah, the problem is that the rbd head object has watchers (it is mounted) and the delete request returned EBUSY, but l... Sage Weil
12:51 AM Bug #2311: rbd: delete + create image led to EEXIST
Well, here we go with some output:
rbd create --size 2048 data/906-testdisk.rbd
create error: 2012-04-24 09:36:01...
Oliver Francke
07:01 AM RADOS Subtask #2340 (Resolved): crush: remove forcefeeding
Sage Weil
07:01 AM Feature #2314: remove localized pgs
Did another test that included broken lpgs (wouldn't create) and verified things behaved. Merged the first piece of ... Sage Weil
 
