Activity

From 10/31/2011 to 11/29/2011

11/29/2011

11:36 PM Revision 30ede648 (ceph): Makefile: ipaddr.h, pick_address.h
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
10:05 PM rbd Cleanup #1761: krbd: make block/segment naming consistent
Segment refers to a partial range, a part of an object, so I think we should keep it in this context. So object shoul... Yehuda Sadeh
09:15 PM rbd Cleanup #1761 (Resolved): krbd: make block/segment naming consistent
pick consistent term for an object (segment or object, but not block) and use throughout. Sage Weil
09:31 PM Revision 77a62fdc (ceph): Makefile: add missing uuid.h to tarball
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:30 PM Revision ebb585d9 (ceph): Objecter: fix local reads in recalc_op_target
We want to use the actual OSD, not the index into the array!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum
05:27 PM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
Actually, maybe you could run with the wip-truncate branch on the mds and see if you trigger a failed assertion on the MDS... Sage Weil
05:19 PM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
Do you by chance have the log preceding the first crash?
Working around this is probably a matter of patching wit...
Sage Weil
11:28 AM Bug #1759 (Resolved): mds/client: truncate size overflow, fails with EINVAL
My version of ceph is a minor variant of 0.38, running with ext4, and ceph-fuse. It looks like my fs has gotten corr... Sam Lang
05:07 PM CephFS Cleanup #814: hadoop: refactor hadoop shim in terms of java libceph bindings
http://www.debian.org/doc/packaging-manuals/java-policy/x105.html Sage Weil
04:28 PM Revision 8788a404 (ceph): osd: subscribe to next map if flagged FULL
This ensures the osd finds out when we become un-full in a timely manner.
Fixes: #1755
Signed-off-by: Sage Weil <sag...
Sage Weil
04:26 PM CephFS Bug #1760 (Resolved): multiple_rsync workunit cannot remove non-empty directory intermittently
This has occurred in half of the regression runs since 11/24: ... Josh Durgin
10:52 AM Bug #1757: oi disagrees with stat, or error code on stat
As we talked at #ceph, I've updated kernel to 3.2-rc2 and patched osd with this workaround http://fpaste.org/PKwW/, n... Szymon Szypulski
08:25 AM Bug #1757: oi disagrees with stat, or error code on stat
The fix for #1612 is upstream kernel commit:ed3ee9f44ba55eb6acfbfc8caa881e0253710d2a. Does your kernel on the osds h... Sage Weil
01:52 AM Bug #1757 (Closed): oi disagrees with stat, or error code on stat
I have similar bugs, #1334 and #1473, which should be solved by #1612, but it doesn't help.
Ubuntu natty, ceph 0.38 with ...
Szymon Szypulski
09:05 AM Bug #1758 (Can't reproduce): OSD segfault in SimpleMessenger::send_message
in the 11/29 nightlies, cfuse_workunit_misc (3335) the osd on sepia5 seg-faulted.
The end of the osd log is:
2011-1...
Anonymous
08:59 AM Bug #1755 (Resolved): OSD: subscribe to map updates on FULL flag
commit:8788a404ae4a10cd10ec8048f0b32d473640a607 Sage Weil
08:25 AM Bug #1612: osd/PG.cc: 3839: FAILED assert(missing[oid].need <= v)
upstream kernel commit:ed3ee9f44ba55eb6acfbfc8caa881e0253710d2a Sage Weil
05:39 AM Revision c2889fef (ceph): mds: encode truncate_pending in inode
Otherwise we don't actually journal this value, and we get confused when
we replay a start_truncate and try to restar...
Sage Weil

11/28/2011

10:11 PM CephFS Bug #1756: mds crash right after successful recovery
This should let you restart your mds:... Sage Weil
09:28 AM CephFS Bug #1756 (Resolved): mds crash right after successful recovery
Ubuntu Natty, ceph 0.38, kernel 2.6.38-12-server, 2x separate mds daemons crashed in the middle of the night
* sho...
Szymon Szypulski
08:52 PM Revision 98e0a6fd (ceph): uclient: remove filer_flags and use Objecter::global_op_flags instead
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Greg Farnum
08:52 PM Revision da2e0c3c (ceph): Objecter: add a new global_op_flags that is passed to every Op construc...
We can use this for a global use of LOCALIZE_READS (and are about
to do so!).
Signed-off-by: Greg Farnum <gregory.fa...
Greg Farnum
08:30 PM Revision 51385930 (ceph): Objecter: remove unused variable in op_submit
These flags are probably relics from when the function got split;
they belong in send_op now.
Signed-off-by: Greg Fa...
Greg Farnum
06:32 PM Revision 4974a9c2 (ceph): uclient: remove useless if-else based on snapid
These are the same command anyway!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum
05:01 PM Revision cef16732 (ceph): debian init: Do not stop or start daemons when installing or upgrading
Signed-off-by: Wido den Hollander <wido@widodh.nl> Wido den Hollander
03:49 PM CephFS Bug #1753: ceph copy raw images from qemu incorrectly
This is using the ceph filesystem, not rbd. Josh Durgin
11:12 AM CephFS Bug #1749: nonexistent directory in kclient_workunit_kernel_untar_build
This could have the same (unknown) root cause as #1741. Anonymous
09:46 AM Feature #1736 (Resolved): collectd: hacky script to generate types.db from perfcounter schema
Sage Weil
09:26 AM Bug #1755 (Resolved): OSD: subscribe to map updates on FULL flag
When the OSDs get a full flag they stop most of their activity, which shuts down the usual map propagation methods. T... Greg Farnum
09:14 AM Bug #1631: osd: failed assert(repop_queue.front() == repop)
Ok, pretty sure this is related to the reconnect. We need to put together a test that artificially triggers messenge... Sage Weil
12:11 AM Revision ce657227 (ceph): mon: search for local ip during mkfs
If an address isn't explicitly specified during mkfs, look for an unnamed
monitor in the (generated) monmap and see i...
Sage Weil
12:11 AM Revision 61b9db3a (ceph): pick_address: implement have_local_addr()
Check for a local ip from within a list of addresses.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:04 AM Revision 84b00597 (ceph): monclient: name nameless monitors noname-<foo>
This makes them easy to pick out as unnamed.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil

11/27/2011

10:50 PM Revision 7a453402 (ceph): pick_address: whitespace
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:44 PM Bug #1751: Copy in CEPH too slow
rbd only. There is no plan yet for reflink(2) in the ceph filesystem. Sage Weil
02:48 PM Bug #1751: Copy in CEPH too slow
Is clone for rbd only, or for files too?
Copying files is slow too.
max mikheev
02:45 PM Bug #1751 (Duplicate): Copy in CEPH too slow
A 'clone' operation that does copy-on-write is coming in the next couple weeks. See #988 Sage Weil
05:39 PM Feature #1754 (Resolved): qa: run other suites nightly as well
stick suite name in mail subject?
run all suites nightly (not just regression)
Sage Weil
04:32 PM CephFS Bug #1746 (Resolved): PerfCounters::set segfault
Sage Weil
04:32 PM Bug #1727 (Resolved): osd: failed assert(pending_ops > 0) in dequeue_op
Sage Weil
04:30 PM Feature #1647 (Resolved): mon: robust bootstrap
Sage Weil

11/25/2011

02:08 PM CephFS Bug #1753 (Won't Fix): ceph copy raw images from qemu incorrectly
Hi,
Ceph cannot correctly handle raw images from qemu:
oneadmin@s2-8core:~/OpenNebula/var/images/tm...
max mikheev

11/24/2011

01:02 PM CephFS Bug #1752 (Can't reproduce): ceph-fuse isn't releasing caps without flushing data?
Xiaofei Du reported on the mailing list that running an "ls" on a directory with multiple writers takes a while (much... Greg Farnum
10:16 AM Bug #1751 (Duplicate): Copy in CEPH too slow
Hi,
The copy operations for files and for rbd images are too slow. Ceph is a copy-on-write system; I think the c...
max mikheev

11/23/2011

11:56 PM Revision 30def38d (ceph): corrected variable (con) to be consistent with prior examples (cluster)
Signed-off-by: Mark Kampe <mark.kampe@dreamhost.com> Mark Kampe
10:07 PM Revision 934e1e52 (ceph): ReplicatedPG: Also count overlaps for snapsets on snapdirs
Previously, the overlaps for snapdirs would not be included in
cstat causing the computed total to be incorrect.
Sig...
Samuel Just
10:07 PM Revision 97d82ed9 (ceph): ReplicatedPG: Account for clone space usage in make_writeable
Previously, we accounted for clone space usage inconsistently in
write_update_size_and_usage etc when walking through...
Samuel Just
05:09 PM Bug #1631: osd: failed assert(repop_queue.front() == repop)
This happened again with the same workload in /var/lib/teuthworker/archive/nightly_coverage_2011-11-23-b/3034/remote/... Josh Durgin
05:06 PM Bug #1530: osd crash during build_inc_scrub_map
A new crash during scrub from /var/lib/teuthworker/archive/nightly_coverage_2011-11-23-b/3051/remote/ubuntu@sepia71.c... Josh Durgin
05:02 PM Bug #1676 (Resolved): stats mismatch during snaps workunit
97d82ed950b26cfaef4267ee44edd9ad927fb828 and 934e1e52514b6036c91c1c7db1c8b6727ac8c6d8 should take care of the size di... Samuel Just
09:41 AM Bug #1676: stats mismatch during snaps workunit
I do not know if this is likely to be related, but in the 11/23a nightlies, 3027 (rgw_s3tests)
1 Aborts found in 3...
Anonymous
05:00 PM Bug #1750 (Closed): xattr errors silently ignored, cause trouble later
Comment
I do not know if this is likely to be related, but in the 11/23a nightlies, 3027 (rgw_s3tests)
1 Aborts f...
Samuel Just
02:45 PM Revision 32a68378 (ceph): Merge branch 'wip-mon'
Sage Weil
02:44 PM Revision ad13d0b7 (ceph): ceph: fix shutdown race
Shut down MonClient before messenger, to avoid race with MonClient::tick()
and MonClient::shutdown().
Fixes
#0 __l...
Sage Weil
01:33 PM Bug #1744: teuthology: race with daemon shutdown?
Josh saw similar, it seems the ctx.daemons data structure loses entries / they never get added / something. So far, r... Anonymous
09:27 AM CephFS Bug #1749 (Can't reproduce): nonexistent directory in kclient_workunit_kernel_untar_build
In the 11/23a nightlies, 3003, there may have been
a transient directory access error:
... lots of stuff works
2...
Anonymous
09:11 AM CephFS Bug #1748 (Can't reproduce): mds segfault CDir::project_fnode
In the 11/23a nightlies, 2995/remote/ubuntu@sepia75.ceph.dreamhost.com/log/mds.0.log.gz
2011-11-22 23:59:14.857453 ...
Anonymous
07:16 AM Feature #1487 (Resolved): config: {cluster,public}_subnets
Sage Weil
04:52 AM Revision 414caa7d (ceph): common/pick_address: Fix IP address stringification.
Different sockaddr_* have the actual address (sin_addr, sin6_addr)
at different offsets, and sockaddr->sa_data just i...
Tommi Virtanen
12:28 AM Revision 9870e2f7 (ceph): mon: pick_addresses before common_init_finish
We can't modify g_conf->public_addr after that.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:22 AM Revision 036ad4c7 (ceph): mon: set default port if not specified...
...when looking for self in monmap during mkfs.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:04 AM Revision 0045c901 (ceph): monmap: assign rank by sorting addr, not name
This allows monitors to bootstrap knowing peer addrs but not their names,
as when we specify mon_host.
Signed-off-by...
Sage Weil
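The idea behind ranking monitors by address rather than by name can be sketched as follows. This is only an illustration in Python, not the monmap code itself: the function name `assign_ranks` and the IPv4-only `host:port` string format are assumptions, and the exact comparison order Ceph uses is not stated in the commit message.

```python
import ipaddress


def assign_ranks(mon_addrs):
    """Return {addr: rank}, ranking monitors by sorted (IP, port).

    Hypothetical helper: addresses are assumed to be IPv4 "host:port" strings.
    """
    def sort_key(addr):
        host, port = addr.rsplit(":", 1)
        # Compare numerically so that 10.0.0.10 sorts after 10.0.0.2.
        return (int(ipaddress.ip_address(host)), int(port))

    ordered = sorted(mon_addrs, key=sort_key)
    return {addr: rank for rank, addr in enumerate(ordered)}


ranks = assign_ranks(["10.0.0.2:6789", "10.0.0.10:6789", "10.0.0.1:6789"])
```

Because the rank depends only on the set of addresses, every peer that knows the same addresses computes the same ranks, even when the monitors have no names yet.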
12:04 AM Revision 36978a63 (ceph): mon: calculate rank by addr, not name
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil

11/22/2011

11:06 PM Revision ebe5fc60 (ceph): obsync: tear out rgw
Yehuda Sadeh
10:53 PM Revision 3a20b425 (ceph): mon: name self in monmap if --public-addr specified during mkfs
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:40 PM Messengers Bug #1747 (Resolved): msgr: osd connection originates from wrong port
osd.2 sends a couple messages to osd.1:... Sage Weil
06:31 PM Revision a859763b (ceph): rgw: don't remove tail of lru if that's what we touch
Yehuda Sadeh
06:09 PM Revision aeeeade6 (ceph): mon: mark down all connections when rank changes
The election and some other stuff depend on msg->get_source().num() to get
the peer rank, and that is part of the con...
Sage Weil
06:08 PM Revision bed3c472 (ceph): mon: handle rank change in bootstrap
The rank can change either because we probe and get a new monmap, or
because we get one via paxos. Move the checks t...
Sage Weil
05:53 PM Revision 8b464093 (ceph): mon: pick an address when joining an existing cluster
If we are joining an existing cluster, we can pick whatever address we
want (e.g., one specified by public_addr or pu...
Sage Weil
05:52 PM Revision 5ba356b3 (ceph): mon: remove unused myaddr
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:52 PM Revision 0c9724d6 (ceph): mon: simplify suicide when removed from map
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
03:02 PM rgw Feature #1697 (Resolved): s3-tests: test bucket headers
Fixed, added the following tests:
s3tests.functional.test_headers.test_bucket_put_bad_canned_acl
s3tests.function...
Yehuda Sadeh
10:33 AM rgw Bug #1719 (Resolved): rgw: crash in ObjectCache::touch_lru
should be fixed by commit:a859763b1cba844d0d56b861a372e5f63f87c607. Yehuda Sadeh
05:58 AM Revision 24ee09b0 (ceph): Revert "more logs (yuck) for #1682"
This reverts commit ea00114f08440563bce8e27ae2cd887bbc85aba5. Sage Weil
01:46 AM Revision eb8d91fe (ceph): PG: it's not necessary to call build_inc_scrub_map in build_scrub_map
Because we have called osr.flush(), it's safe to tag map.valid_through
as last_update. We will still have to catch ...
Samuel Just
12:17 AM Revision 0f4b59a4 (ceph): Merge remote branch 'gh/subnet'
Sage Weil
12:00 AM Revision c651c88e (ceph): Properly handle case where first error is inside a context manager __ex...
Closes: http://tracker.newdream.net/issues/1743 Tommi Virtanen
12:00 AM Revision fab1e55e (ceph): Merge remote branch 'gh/wip-mon'
Sage Weil

11/21/2011

10:27 PM Revision eec61b48 (ceph): common/ipaddr: Add utility function to parse ip/cidr style networks.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
10:27 PM Revision 0477f238 (ceph): common/pickaddr: Pick cluster_addr/public_addr based on *_network.
Tommi Virtanen
10:27 PM Revision c066e926 (ceph): mds, osd, synclient: Pick cluster_addr/public_addr based on *_network.
Instead of specifying an IP address in ceph.conf like
[global]
cluster_addr = 10.1.2.3
you can now avoid the node...
Tommi Virtanen
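A rough sketch of the selection this commit describes, choosing a locally configured address that falls inside a configured network, might look like the following. It is written in Python for illustration only; `pick_address` and its arguments are hypothetical names, and the real implementation lives in Ceph's C++ code and enumerates the host's interfaces itself.

```python
import ipaddress


def pick_address(local_addrs, network):
    """Return the first address in `local_addrs` inside `network`, else None.

    Hypothetical helper: `local_addrs` stands in for the addresses
    configured on this node; `network` is an ip/cidr string.
    """
    net = ipaddress.ip_network(network, strict=False)
    for addr in local_addrs:
        ip = ipaddress.ip_address(addr)
        # An IPv4 address is never "in" an IPv6 network, and vice versa.
        if ip in net:
            return str(ip)
    return None


chosen = pick_address(["192.168.0.5", "10.1.2.3"], "10.1.2.0/24")
```

With a lookup like this, the same configuration file can be shared by every node, since each one resolves the network to its own address.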
10:27 PM Revision 0f748d4c (ceph): common/ipaddr: Find a configured IP address in given subnet.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
10:07 PM CephFS Bug #1549 (In Progress): mds: zeroed root CDir* vtable in scatter_writebehind_finish
Sage Weil
09:56 PM Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
happened again on /var/lib/teuthworker/archive/nightly_coverage_2011-11-21-b/2818
This may be the same root cause ...
Sage Weil
09:37 PM Revision 2bae3506 (ceph): osd: Remove unused variable.
Tommi Virtanen
09:37 PM Revision 0f9a0605 (ceph): common/str_list: Make unused return value void.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
09:37 PM Revision 97464bca (ceph): msg: Move public_addr use outside ->bind()
Tommi Virtanen
09:28 PM Revision 3c8fec2d (ceph): osd: fix 'stop' command
Special case. We can't join the command_tp thread from itself.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
09:23 PM Revision b47347bd (ceph): osd: protect handle_osd_map requeueing with queue lock
pending_ops was protected by osd_lock, but it tracks something in the
queue, which has its own lock. Messy. Also, ...
Sage Weil
07:15 PM Revision 70dfe8e9 (ceph): osd: lock pg when requeuing requests
The op queue is shut down, so this is mostly safe, unless someone comes
through and does requeue_ops() from a callbac...
Sage Weil
06:33 PM Revision 811145f7 (ceph): paxosservice: tolerate _active() call when not active
This can happen when multiple C_Active events are queued, and the first
does a propose_pending() (moving us into upda...
Sage Weil
05:19 PM Revision 88963a18 (ceph): objecter: simplify map request check
We should request a missing/intervening map if it appears to exist.
Otherwise, skip it.
Signed-off-by: Sage Weil <sa...
Sage Weil
05:19 PM Revision cd2e523f (ceph): objecter: cancel tick event on shutdown
Hopefully this is the root cause for
2011-11-20 23:57:41.555292 7f75dd743780 ceph version 0.38-205-g3b53b72
(commit:...
Sage Weil
05:01 PM rgw Bug #1719: rgw: crash in ObjectCache::touch_lru
I think what happens here is that the entry that we touch happens to be the one that we dispose of (at the tail of th... Yehuda Sadeh
04:02 PM Bug #1743 (Closed): teuthology: not exiting with error when ceph-fuse shutdown fails
commit c651c88eacf9c3bbf1f037be3a5dc0425308c730
Author: Tommi Virtanen <tv@eagain.net>
Date: 2011-11-21 16:00:19 ...
Anonymous
03:42 PM Bug #1743: teuthology: not exiting with error when ceph-fuse shutdown fails
This reproduced it nicely:
diff --git a/teuthology/task/internal.py b/teuthology/task/internal.py
index 58e7f14...
Anonymous
03:57 PM Bug #1744: teuthology: race with daemon shutdown?
Tommi Virtanen wrote:
> Was this using any one of the following?
>
> teuthology/task/lost_unfound.py
> teutholog...
Sage Weil
03:33 PM Bug #1744: teuthology: race with daemon shutdown?
Was this using any one of the following?
teuthology/task/lost_unfound.py
teuthology/task/mon_recovery.py
teuthol...
Anonymous
02:57 PM Bug #1741: teuthology: failed to untar
The path mentioned above is incorrect. Run nightly_coverage_2011-11-18-2/2663 failed because of network failure.
T...
Anonymous
02:52 PM Bug #1741: teuthology: failed to untar
This is exactly what would happen if someone nuked the machine, or locking failed and someone else ran a faster test ... Anonymous
01:29 PM Bug #1727: osd: failed assert(pending_ops > 0) in dequeue_op
hopefully fixed by commit:b47347bd7c377037f7fbc199f0c88b447c9626d1 Sage Weil
08:59 AM Bug #1727: osd: failed assert(pending_ops > 0) in dequeue_op
Happened again in the 11/21 nightlies - 2791, sepia33 Anonymous
09:53 AM Bug #1742 (Rejected): qa: s3-tests failed 100-continue test on sepia
This was due to an old entry in /etc/apt/sources.list - older versions of the apache packages were still used. The ch... Josh Durgin
09:43 AM rbd Feature #1713: teuthology: qemu tasks, tests
Sorry comment #2 was meant for another bug.
Anonymous
09:42 AM rbd Feature #1713: teuthology: qemu tasks, tests
This is in the plans after the new sepia hardware is in place; current sepia re-install is too slow & painful to dare... Anonymous
09:23 AM CephFS Bug #1746: PerfCounters::set segfault
i think this is objecter event teardown. see commit:cd2e523fba1d6cf8d15e7a349ad700b744f24ecf Sage Weil
09:05 AM CephFS Bug #1746 (Resolved): PerfCounters::set segfault
In the 11/21 nightlies, while trying to run workunit/ffsb,
2779/remote/ubuntu@sepia57.ceph.dreamhost.com/log/mon.2.l...
Anonymous
08:57 AM Bug #1530: osd crash during build_inc_scrub_map
Both of the above described variants occurred in the 11/21 nightlies
(2775:sepia17, 2783:sepia81, 2805:sepia82)
Anonymous

11/20/2011

11:24 PM Revision ea00114f (ceph): more logs (yuck) for #1682
Sage Weil
10:26 PM Revision f6070282 (ceph): paxos: fix sharing of learned commits during collect/last
We can learn either an uncommitted or committed value during the
collect/last recovery phase. For the committed valu...
Sage Weil
09:18 PM Revision 3b53b722 (ceph): rgw: support alternative date formatting
being used by s3cmd Yehuda Sadeh
09:05 PM Feature #1745 (Closed): teuthology: make interactive-on-error stop further cleanup
It would be nice if a failure in cleanup would prevent further cleanup when interactive-on-error is true. For example... Sage Weil
09:03 PM Bug #1744 (Resolved): teuthology: race with daemon shutdown?
... Sage Weil
08:02 PM Bug #1743 (Closed): teuthology: not exiting with error when ceph-fuse shutdown fails
here's the log tail:... Sage Weil
03:23 PM CephFS Bug #1682: mds: segfault in CInode::authority
Hrm, this has me stumped.
The log leading up is...
Sage Weil
04:56 AM Revision 4b53288b (ceph): ceph_manager: %
Sage Weil
04:56 AM Revision 721c0e97 (ceph): nuke: don't specify full path
/tmp/cephtest/binary may have been removed; kill stray daemons by name
only. we really don't care about false positi...
Sage Weil
03:28 AM Revision dcab329b (ceph): fix conf thinko
'int' object has no attribute 'iteritems' Sage Weil

11/19/2011

10:30 PM Revision becfce35 (ceph): mon: share random osd map from update_from_paxos, not committed()
This will let us remove committed() entirely.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:30 PM Revision b521710f (ceph): mon: mdsmon: tick() from on_active() instead of committed()
Same effect, and avoids useless committed().
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:30 PM Revision 10fed791 (ceph): paxosservice: remove unused committed() callback
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
10:30 PM Revision 9aabd398 (ceph): paxosservice: consolidate _active and _commit
Use the same callback for when paxos goes active and for when it commits
something. The response in both cases is th...
Sage Weil
09:56 PM Revision 9920a168 (ceph): config: support --no-<foo> for bool options
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
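As an illustration of the `--no-<foo>` convention for boolean options, here is a minimal Python sketch. The function `parse_bool_flags` and the option names `foo` and `bar_baz` are hypothetical, and this does not reproduce Ceph's actual argument parser.

```python
def parse_bool_flags(argv, bool_options):
    """Map --<foo> to True and --no-<foo> to False for known bool options.

    Hypothetical helper: dashes in flag names are treated as underscores,
    and unknown arguments are ignored.
    """
    values = {}
    for arg in argv:
        if not arg.startswith("--"):
            continue
        name = arg[2:].replace("-", "_")
        if name in bool_options:
            values[name] = True
        elif name.startswith("no_") and name[3:] in bool_options:
            values[name[3:]] = False
    return values


flags = parse_bool_flags(["--no-foo", "--bar-baz", "positional"], {"foo", "bar_baz"})
```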
09:56 PM Revision 1a468c7e (ceph): config: whitespace
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:56 PM Revision a08e7f12 (ceph): regression/basic/tasks/kclient_workunit_misc: turn on mds log
Hopefully will catch #1682 Sage Weil
09:45 PM Revision 13c98df9 (ceph): regression/basic/tasks/cfuse_dbench: turn up client debugging
Hopefully we'll hit #1737... Sage Weil
02:28 PM Bug #1732 (Can't reproduce): osdmap assert fail during rados bench
Sage Weil
02:03 PM Bug #1742 (Rejected): qa: s3-tests failed 100-continue test on sepia
/var/lib/teuthworker/archive/nightly_coverage_2011-11-18-2/2683
and the chef task _did_ run...
Sage Weil
01:59 PM Bug #1741 (Can't reproduce): teuthology: failed to untar
teuthology:/var/lib/teuthworker/archive/nightly_coverage_2011-11-18-2/2662... Sage Weil
01:54 PM CephFS Bug #1573 (Duplicate): mds crash during multiple_rsync workunit
Sage Weil
12:13 AM Revision cc5b5e17 (ceph): osdmon: set the maps-to-keep floor to be at least epoch 0
Looks like this conditional was just set backwards by mistake. There
have been a number of issues with OSDMap version...
Greg Farnum

11/18/2011

11:57 PM Revision 45cf89c1 (ceph): Revert "osd: simplify finalizing scrub on replica"
This reverts commit dd5087fabb2a743741a96ee4610379afa8431f68.
Calling osr.flush() is not quite enough since the onre...
Samuel Just
11:56 PM Revision 57ad8b2e (ceph): FileStore.cc: onreadable callbacks in OpSequencer order is enough
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
10:19 PM rbd Bug #1740: krbd: don't return head data when reading from a non-existent snapshot
The requests are made for the head version, since the removed snapid is not found when looking up the snapshot name i... Josh Durgin
08:58 PM rbd Bug #1740: krbd: don't return head data when reading from a non-existent snapshot
Hmm, what should they return? -ENXIO or -EIO or something? What is the OSD returning in this case?
Sage Weil
05:11 PM rbd Bug #1740 (Resolved): krbd: don't return head data when reading from a non-existent snapshot
If you have an rbd image mapped at a snapshot, and then delete the snapshot, any subsequent reads succeed and give yo... Josh Durgin
09:53 PM Revision 508f4f83 (ceph): Save summary after nuking machines.
This way you can tell when tests are entirely finished running. Josh Durgin
08:22 PM Revision 91cfdfea (ceph): Add an example overrides file for running regression tests.
Josh Durgin
06:21 PM Revision 7c8a7a89 (ceph): Move multimds tests to a new suite, 'experimental'.
This suite is for testing features that aren't expected to be stable yet. Josh Durgin
05:49 PM Revision 09c20c51 (ceph): objecter: trigger oncommit acks if the request returns an error code.
Many users only set oncommit acks, so if they get an error code
(which comes only as a CEPH_OSD_OP_ACK right now) the...
Greg Farnum
05:49 PM Revision dedf2c4a (ceph): osd: error responses should trigger all requested notifications.
There's no good reason I can find to limit error code responses to
the ACK.
Signed-off-by: Greg Farnum <gregory.farn...
Greg Farnum
05:49 PM Revision 9800faeb (ceph): paxos: do not create_pending if !active
This avoids a scenario like:
- _active()
- proposes value
- _commit()
- creates new pending, even though in upda...
Sage Weil
05:43 PM Revision fa587687 (ceph): Revert "mon: don't propose new state from update_from_paxos"
This reverts commit 66c628acc8be71a92e801179431e4b938b857b3d. Sage Weil
05:15 PM rgw Feature #1482 (Resolved): qa: swift-tests
testswift was added to teuthology. Yehuda Sadeh
05:14 PM rgw Feature #1664 (Resolved): rgw: pass swift tests
We pass most of the tests, other than a few which we don't intend to fix at this point (different enforced limits) an... Yehuda Sadeh
05:00 PM rgw Feature #1739 (Resolved): rgw: multipart upload should use manifest object
Yehuda Sadeh
04:39 PM RADOS Bug #1738 (Duplicate): bad crushmap behavior
./osdmaptool --test-map-pg 1.21 <attached osdmap>
pg 1.21 ends up mapped only to osd3 despite there being two othe...
Samuel Just
02:40 PM Bug #1530: osd crash during build_inc_scrub_map
Got a couple more of these today: teuthworker/archive/nightly_coverage_2011-11-18-2/2649/remote/ubuntu@sepia56.ceph.d... Josh Durgin
02:37 PM CephFS Bug #1682: mds: segfault in CInode::authority
Another crash in CInode::authority happened today, although with a different backtrace.
From teuthology:~teuthworker/arc...
Josh Durgin
02:35 PM CephFS Bug #1737 (Resolved): ceph-fuse crash in xlist::remove
From teuthology:~teuthworker/archive/nightly_coverage_2011-11-18-2/2645/remote/ubuntu@sepia13.ceph.dreamhost.com/log/... Josh Durgin
10:11 AM Bug #1351 (Resolved): rados bench should report errors
Fixed by commit:dedf2c4a066876bdab9a0b0154196194cefc1340. Greg Farnum
04:45 AM Revision 66c628ac (ceph): mon: don't propose new state from update_from_paxos
Proposing a new state from within update_from_paxos() confuses some callers,
like PaxosService::_active(). Instead, ...
Sage Weil
04:28 AM phprados Tasks #869 (Resolved): Update to new librados API
Ok, it took some time, but it's done.
v0.9.3 is updated to the librados2 API and wraps all the C functions into PHP.
Wido den Hollander
01:57 AM Revision 94100ad0 (ceph): Move collections into separate suites
For now, there are just two suites:
* regression - tests that should always pass
* stress - tests that have p...
Josh Durgin
01:26 AM Revision 42cecb5e (ceph): suite: put common config before facets
This lets you add tasks to the beginning of a run, like the chef task. Josh Durgin
01:16 AM Revision 044a88ce (ceph): suite: schedule a list of collections for running instead of a single s...
Josh Durgin
01:00 AM Revision d8fc1513 (ceph): Clean up C++isms.
Tommi Virtanen
12:55 AM Revision 6ae0f81e (ceph): rgw: if swift url is not set up, just use whatever client used
Yehuda Sadeh
12:53 AM Revision 23aae67a (ceph): testswift: fix config
Yehuda Sadeh
12:53 AM Revision 6236e7db (ceph): testswift: fix config
Yehuda Sadeh
12:49 AM Revision c5450948 (ceph): Add a task for easily running chef-solo on all the nodes.
Tommi Virtanen

11/17/2011

11:01 PM Revision ef5ca293 (ceph): fuse: fix readdir return code
Ignore ENOSPC generated by our own callback, as it is only used to
terminate the loop.
Broken by commit cd90061239a5...
Sage Weil
10:11 PM Revision d61ba644 (ceph): paxos: fix trimming when we skip over incrementals
Remove open-coded trimming of old states and use our method (that also
removes additional per-state files). Fixes ol...
Sage Weil
10:10 PM Revision 367ab142 (ceph): paxos: store stashed state _and_ incrementals
Paxos::share_state() may share a stashed state and incrementals that
follow; we need to store the same.
Signed-off-b...
Sage Weil
09:53 PM Revision 6bc9a544 (ceph): mon: elector: always start election via monitor
Don't go from active -> electing without passing (monitor) go.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
09:46 PM Revision 89f80412 (ceph): ceph_manager: fix logging
Sage Weil
09:23 PM Bug #1708 (Resolved): mon/PGMonitor.cc: 218: FAILED assert(paxos->get_version() + 1 == pending_in...
This latest variation should be fixed by commit:66c628acc8be71a92e801179431e4b938b857b3d. Thanks for the log! Sage Weil
05:18 PM Bug #1708: mon/PGMonitor.cc: 218: FAILED assert(paxos->get_version() + 1 == pending_inc.version)
Yes, I still get the problem with an updated master 6bc9a544b62bb21f6ee7ef51bfbe9111f7add9cb
I had monitor debuggi...
Josh Pieper
09:07 PM Revision f85f5dd7 (ceph): ceph: deep merge overrides, so e.g. log whitelists can be overridden
Josh Durgin
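A deep merge of override settings into a base config could be sketched like this. It is a minimal Python illustration, not teuthology's actual `deep_merge`: the choice to concatenate lists and the config keys used are assumptions made for the example.

```python
def deep_merge(base, override):
    """Recursively merge `override` into `base` and return a new value.

    Assumed semantics (not taken from the source): dicts merge key by key,
    lists are concatenated, and any other value is replaced by the override.
    """
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = deep_merge(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(override, list):
        return base + override
    return override


# Illustrative keys only.
base = {"ceph": {"whitelist": ["slow request"], "fs": "btrfs"}}
override = {"ceph": {"whitelist": ["wrongly marked"]}}
merged = deep_merge(base, override)
```

A shallow `dict.update` would replace the whole nested `ceph` section, which is why nested settings need a recursive merge to be overridden individually.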
09:06 PM Revision a7632976 (ceph): misc: move deep_merge out of the MergeConfig class - it's generic
Josh Durgin
08:07 PM Revision 685450b7 (ceph): common: libraries should not log to stdout/stderr
Certainly not by default.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
07:57 PM Revision c6988a07 (ceph): Save config after locking nodes, so targets are included.
Josh Durgin
07:56 PM Revision f1dd56d9 (ceph): objecter: set skipped_map if we skip a map
This ensures that we resend _all_ requests, since we aren't sure which
may have mapped to a different primary and the...
Sage Weil
07:39 PM Revision 5afef020 (ceph): objecter: add is_locked() asserts
Sanity check.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
07:39 PM Revision bf91177e (ceph): objecter: send slow osd MPing via Connection*
This may address #1732 indirectly because we have a Connection* reference
here. However, it's still not clear how we...
Sage Weil
07:18 PM Revision 4e6cd55c (ceph): filestore_idempotent: remove unused import
Josh Durgin
07:16 PM Revision 7d51e3d3 (ceph): mon_recovery: remove unused code and import
Josh Durgin
07:11 PM Revision f4d527e7 (ceph): thrashosds: timeout for every clean check, not just the last one
Josh Durgin
07:05 PM Revision 9d12b720 (ceph): ceph_manager: add a default timeout of 5 minutes for mon quorum
Josh Durgin
06:45 PM Revision cb9ac089 (ceph): ceph_manager: log mon quorum status so the logs show progress (or lack ...
Josh Durgin
05:42 PM Bug #1351: rados bench should report errors
Quick skim analysis:
If there's an error, the OSD returns it as an ACK.
The objecter only sends back data on the re...
Greg Farnum
11:05 AM Bug #1351: rados bench should report errors
This is probably what caused #1734. Josh Durgin
05:03 PM Feature #1736 (Resolved): collectd: hacky script to generate types.db from perfcounter schema
... Sage Weil
04:48 PM rgw Bug #1729 (Resolved): test_object_create_bad_expect_empty
Sage Weil
03:22 PM rgw Bug #1729: test_object_create_bad_expect_empty
Yehuda thinks this was a problem with not having the right Apache package installed; I think he's right and I've seen... Greg Farnum
04:43 PM Feature #1387 (Closed): teuthology-nuke: don't fail on down nodes
Josh Durgin
04:36 PM Bug #1723 (Rejected): timeouts during ffsb
Sage Weil
04:36 PM Bug #1723: timeouts during ffsb
also didn't have the umount bug fix.
i think the osd timeouts are just sluggish server, not actual errors per se.....
Sage Weil
04:33 PM Bug #1724 (Resolved): timeout during tiobench test
this test ran commit:dfc3ddc8983fbc7c376394067335b360c68cd314, which did not include the root dentry fix in commit:77... Sage Weil
03:06 PM CephFS Bug #1728 (Resolved): multiple cfuse tests failing with non-empty directories
fixed by commit:ef5ca293a7eee6fd37c1ea8e8027a5f6d83b66da Sage Weil
02:13 PM CephFS Bug #1728: multiple cfuse tests failing with non-empty directories
My guess is the warning cleanup patch that added an error check in the readdir code, commit:cd90061239a598f6fca94326b... Sage Weil
02:41 PM Bug #1731 (Resolved): PAXOS assert(begin->last_committed == last_committed)
fixed by commit:367ab142d7bc938c5a8b40027acd2431a11c8022 Sage Weil
11:56 AM Bug #1732: osdmap assert fail during rados bench
with commit:bf91177e57a4fae54882d78aa6b2bcf1adccae5d this won't crash, but it's still not clear how we got an OSDSessi... Sage Weil
08:51 AM Bug #1732 (Can't reproduce): osdmap assert fail during rados bench
... Josh Durgin
11:39 AM Feature #1262 (Closed): teuthology: monitor health during run
Duplicate of #1240. Josh Durgin
11:06 AM Bug #1733 (Duplicate): rados bench duration can be ignored
Probably caused by #1351. Josh Durgin
09:05 AM Bug #1733: rados bench duration can be ignored
Is it generating new writes, or waiting for old writes to complete?
The time you give rados bench was never intend...
Greg Farnum
08:58 AM Bug #1733 (Duplicate): rados bench duration can be ignored
Sometimes a thrashing run with rados bench will continue indefinitely, with rados bench continuing to write after its... Josh Durgin
10:57 AM Bug #1730 (Rejected): mysterious compilation error
These were actually just warnings - the test passed. Josh Durgin
12:00 AM Revision f3c569ee (ceph): rgw: add swift task
still not completely working (for some reason it skips all the tests) Yehuda Sadeh
12:00 AM Revision 1dd607ca (ceph): rgw: add swift task
still not completely working (for some reason it skips all the tests) Yehuda Sadeh

11/16/2011

09:11 PM Revision fa4b0fb9 (ceph): osd: add pending_ops assert
Just a sanity check, hopefully helping us track down #1727.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
07:01 PM Revision 17fa1e0d (ceph): mon: renamed get_latest* -> get_stashed*
This makes e.g. get_latest_version() vs get_last_committed() less
confusing.
Signed-off-by: Sage Weil <sage@newdream...
Sage Weil
06:57 PM Revision b9d5fbe4 (ceph): mon: fix ver tracking for auth database
Local variable keys_ver needs to be updated when we slurp up latest stashed
version.
Signed-off-by: Sage Weil <sage@...
Sage Weil
06:54 PM Revision b425f6d6 (ceph): mon: always load stashed version when version doesn't match
The slurp process can happen after the monitor has started and has some
in-memory version of the state, and that proc...
Sage Weil
06:30 PM Bug #1731 (Resolved): PAXOS assert(begin->last_committed == last_committed)
In the 11/16 nightlies, there were numerous coredumps in:
sepia72 mon.{f,l,o,r,u}.log
sepia74 mon.q.log
All ...
Anonymous
06:23 PM Bug #1730 (Rejected): mysterious compilation error
In the 11/16 nightlies, 2071 rbd_dbench a compile failed ... with some warnings.
Has this worked in the past?
20...
Anonymous
06:19 PM rgw Bug #1729 (Resolved): test_object_create_bad_expect_empty
in the 11/16 nightly, 2080 rgw_s3tests
2011-11-16T00:51:18.914 INFO:teuthology.orchestra.run.err:s3tests.functional....
Anonymous
05:59 PM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
This happened again on 11/16, 2056 kclient_workunit_kernel_untar_build
2011-11-16T00:36:30.996 INFO:teuthology.task....
Anonymous
05:51 PM CephFS Bug #1728 (Resolved): multiple cfuse tests failing with non-empty directories
All from the 11/16 nightlies:
2044 cfuse_workunit_snaps ...
2011-11-16T00:05:11.781 INFO:teuthology.task.workunit...
Anonymous
01:10 PM Bug #1727 (Resolved): osd: failed assert(pending_ops > 0) in dequeue_op
from ml:... Sage Weil

11/15/2011

04:55 PM Bug #1432 (Resolved): libvirt: fix definition for rbd params/sources/etc
Merged upstream. Josh Durgin
11:12 AM rgw Cleanup #1716: rgw: remove curl use
We might want to hold this until we figure out whether and how we want to support openstack keystone. Yehuda Sadeh
11:08 AM rgw Bug #1721: rgw: spurious multipart-upload failures
It seems that the osd is a bit sluggish when we see those errors. Basically the complete (or abort) multipart takes t... Yehuda Sadeh
11:04 AM rgw Feature #1726 (Rejected): rgw: improve multipart upload performance
Currently when the upload completes, for each part we do:
- prepare index
- remove object
- complete index
E...
Yehuda Sadeh
10:24 AM Bug #1725 (Rejected): osd: os/FileStore.cc: 2426: FAILED assert(0 == "unexpected error")
btrfs bug, fixable by http://article.gmane.org/gmane.comp.file-systems.btrfs/13630/match=large+xattr Samuel Just
07:00 AM Bug #1725 (Rejected): osd: os/FileStore.cc: 2426: FAILED assert(0 == "unexpected error")
Getting a crash on one OSD when it tries to start up after upgrading to 0.38.
Here is the log of start up to crash...
Damien Churchill
01:02 AM Revision 2e195500 (ceph): rgw: don't log entries with bad utf8
Yehuda Sadeh

11/14/2011

10:39 PM Revision 0276eab4 (ceph): rgw: adjust error code in swift copy failures
Yehuda Sadeh
09:55 PM Revision 1fe16923 (ceph): rgw: fix swift responses encoding
Yehuda Sadeh
09:23 PM Revision 2445fd84 (ceph): rgw: Fix some merge problems uncovered by gcc warnings:
* a refactor in e2100bce left the mod_ptr and unmod_ptr members set
incorrectly in RGWCopyObj::init_common
* a fi...
Josh Pieper
09:23 PM Revision cd900612 (ceph): Resolve gcc warnings.
These should have no functional changes:
* Check errors from functions that currently cannot return any
* Initializ...
Josh Pieper
08:15 PM Revision a5b8c851 (ceph): osd: remove dead osd_max_opq code
This is no longer used as of a while ago!
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:02 PM rgw Bug #1698 (Resolved): radosgw-admin log list returns invalid json when a log object was created w...
Fixed, commit:2e195500b5d3a8ab8512bcf2a219a6b7ff922c97. Not logging entries with non-utf8 bucket name. Yehuda Sadeh
04:30 PM Bug #1676: stats mismatch during snaps workunit
Still happening in 11/11 nightly
1812/remote/ubuntu@sepia69.ceph.dreamhost.com/log/osd.1.log.gz
Anonymous
04:27 PM Bug #1530: osd crash during build_inc_scrub_map
Still happening in 11/11 nightly
1814/remote/ubuntu@sepia55.ceph.dreamhost.com/log/osd.1.log.gz
Anonymous
04:24 PM Bug #1722: osd_class_dir must reflect autoconf libdir
the original commit is commit:7e5dee907a8218647a88d1c7d3316cc277e1c44b. iirc that approach didn't work because autom... Sage Weil
02:11 PM Bug #1722: osd_class_dir must reflect autoconf libdir
See also #1614, which for some reason doesn't let me edit it anymore. Anonymous
02:11 PM Bug #1722 (Resolved): osd_class_dir must reflect autoconf libdir
These two end up at different values for systems using /usr/lib64:
src/common/config_opts.h:285:OPTION(osd_class_d...
Anonymous
04:19 PM Bug #1614 (Duplicate): default rados class location needs to depend on autoconf libdir
Sage Weil
02:08 PM Bug #1614: default rados class location needs to depend on autoconf libdir
Sage Weil
04:18 PM Revision f418775d (ceph): workunits: rados python workunit should be executable
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
04:12 PM Bug #1659: Upgrade from 0.27 -> 0.37 going wrong, OSDs miss map updates
I saw a very similar stack trace in the 11/11 Nightly
1862/remote/ubuntu@sepia9.ceph.dreamhost.com/log/osd.5.log.gz
Anonymous
04:06 PM Revision b43981b8 (ceph): multimon: need at least 2 osds to go healthy
Josh Durgin
04:04 PM Bug #1724: timeout during tiobench test
(I should have said the other problem was filed as bug 1723) Anonymous
04:03 PM Bug #1724 (Resolved): timeout during tiobench test
During the 11/11 nightlies, the tiotest task blocked multiple times. The first stack trace
(from 1831/remote/ubuntu...
Anonymous
03:57 PM Bug #1723 (Rejected): timeouts during ffsb
During the 11/11 nightlies, in suite 1827, sepia65 experienced multiple timeout events.
The first (from 1827/remote/...
Anonymous
12:34 PM rgw Bug #1721 (Can't reproduce): rgw: spurious multipart-upload failures
Sage Weil
11:56 AM Bug #1707 (Resolved): After fresh install, OSD initialization fails with: error error 17: File ex...
great, thanks! Sage Weil
03:53 AM Bug #1707: After fresh install, OSD initialization fails with: error error 17: File exists not ha...
Yes, I tested after that revision and could not reproduce the problem. Josh Pieper

11/13/2011

10:18 PM Revision 102c4342 (ceph): crush: send debug output to dout, not stdout/err
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
10:16 PM Revision 25eee416 (ceph): test/run_cmd: use mkstemp instead of mkstemps
my box didn't have mkstemps
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
10:07 PM Revision 18009866 (ceph): ceph-authtool: fix clitests
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
02:20 PM Bug #1688: Benjamin: pg stuck in scrub
is this addressed by the pg lock vs transaction submit ordering changes? Sage Weil
02:13 PM Bug #1707: After fresh install, OSD initialization fails with: error error 17: File exists not ha...
I think this is fixed by commit:7fb182a17b703002c1bd098391fb688b5b1e2749. Can you retest against latest master? Sage Weil
02:06 PM Bug #1708 (Can't reproduce): mon/PGMonitor.cc: 218: FAILED assert(paxos->get_version() + 1 == pen...
I fixed a number of bugs in this area, and there was a big refactor. Can you retest the latest and see if you run in... Sage Weil
02:05 PM Feature #1720 (Duplicate): qa: rpm autobuilders
probably start with opensuse and fedora, but eventually we probably want
- fedora (+ rawhide)
- opensuse (+ tumbl...
Sage Weil

11/12/2011

11:17 PM Revision d476ae25 (ceph): test_str_list: make sure ' ' and ', ' separators work for str lists
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
10:55 PM Revision ecd713c5 (ceph): ceph-authtool: make error msg more helpful
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
10:55 PM Revision 4f39aaa7 (ceph): keyring: don't print auid if it is the default
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
10:55 PM Revision ee02a1e1 (ceph): mon: implement 'fsid' command
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
10:19 PM Revision 5a3004e2 (ceph): Merge branch 'stable'
Sage Weil
10:08 PM Revision 73f99a18 (ceph): mon: fix 'osd crush add ..' weight
This was changed to floating point in commit 3f67893.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
10:05 PM Revision 1b843e0e (ceph): osdmap: build_simple with normal osd/host/rack/pool hierarchy
This will be useful in the general case where the cluster is created with
an empty map and useful crush hierarchy.
S...
Sage Weil
10:04 PM Revision ec97c852 (ceph): mon: fix 'osd crush add ..' weight
This was changed to floating point in commit 3f67893.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
09:42 PM Revision 0349fa96 (ceph): vstart.sh: don't generate initial osdmap explicitly
This is simpler and exercises the monitor's ability to start with a generic
osdmap and build it out as new osds are ad...
Sage Weil
09:41 PM Revision 30ddc85e (ceph): mon: make initial osdmap optional
If an initial osdmap is not provided, we generate an empty one. The user
adds osds on their own after that.
Signed-o...
Sage Weil
09:41 PM Revision 0d812252 (ceph): osdmap: build_simple: create reasonable pools when numosd==0
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
09:16 PM Revision 8e150fb4 (ceph): mon: add '--fsid foo' arg for setting generated monmap fsid
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
05:04 AM Revision b51d817e (ceph): mon: take '--fsid foo' arg with --mkfs
This will set the seed monmap's fsid. This is useful if the monmap is
dynamically generated (e.g., based on ceph.con...
Sage Weil
05:04 AM Revision 0c731ed7 (ceph): osd: fix warnings
osd/ReplicatedPG.cc: In member function 'virtual void ReplicatedPG::remove_watchers_and_notifies()':
osd/ReplicatedPG...
Sage Weil
04:52 AM Revision 73705f66 (ceph): monmaptool: fix clitests
Initial map is epoch 0. Modifications still bump epoch by one.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
04:49 AM Revision 36241da4 (ceph): paxos: discard waiting_for_active events on reset
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
04:48 AM Revision 2253c016 (ceph): use libuuid for fsid
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
04:48 AM Revision 80ab6568 (ceph): monclient: use blank fsid (instead of epoch==0) for monmap checks
We can safely mkfs with an epoch=0 monmap as long as the fsid is set. And
that is what commit f31825cee5300c708800a0...
Sage Weil

11/11/2011

10:59 PM Revision 07950bb8 (ceph): crush: grammar: allow '.' in name token
These are now in the generated crush maps, so it seems appropriate to
recompile them :).
Reported-by: Martin Mailand...
Sage Weil
10:54 PM Revision cf0a53e1 (ceph): mon: fix seed monmap removal
Remove if we previously had no latest, not based on which map we now have.
It's possible we join when monmap epoch is s...
Sage Weil
10:52 PM Revision 6d370f3b (ceph): mon: allow monitor to automagically join cluster
If a monitor starts up with the correct fsid and auth keys, it will now
add itself to the monmap (and subsequently tr...
Sage Weil
08:52 PM Revision d56485a8 (ceph): osd: pass monclient::init errors up the stack
Fixes crash like
ceph version 0.38-149-gbf254de (commit:bf254de5cf8a17ce9467d166d87f3ab93170ae13)
1: (ceph::BackTr...
Sage Weil
08:37 PM Revision bf254de5 (ceph): mon: verify fsid during probe and election
This will keep mismatched fsids out of the same quorum.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
08:22 PM Revision f1a98fb8 (ceph): mon: tolerate won election while active
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:22 PM Revision cd736b9d (ceph): mon: clean up logic a bit
More explicit.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
08:22 PM Revision 2633d71d (ceph): mon: only re bootstrap if monmap actually changes
If we go thru here just to update latest, that's fine; no need to restart
the bootstrap process.
Signed-off-by: Sage...
Sage Weil
08:15 PM Revision 622fbadd (ceph): paxos: fix off-by-one in share_state
We hit this on adding a new monitor to an existing cluster.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
08:05 PM Revision 6c663d85 (ceph): mon: fix monmap update
It's on the stack; update in place.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
08:02 PM Revision 1134fdfe (ceph): mon: properly process monmaps even when i have the latest
We may get the latest monmap when we are doing our probing, but we still
need to process it in update_from_paxos(). ...
Sage Weil
07:55 PM Revision c097e634 (ceph): mon: fix up update_from_paxos() methods
Make sure they behave when the initial state is learned from paxos.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
07:41 PM Revision aea7563f (ceph): mon: create initial states after quorum is formed
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
07:41 PM Revision e545af2d (ceph): mon: remove empty monstore dirs
This is sloppy, but it works well enough since we mkdir dirs as needed
too.
Signed-off-by: Sage Weil <sage@newdream....
Sage Weil
07:41 PM Revision 65f797ea (ceph): mon: clean up mkfs seed data
And make sure the monmap/latest gets written properly.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
07:41 PM Revision f31825ce (ceph): monmaptool: new maps get epoch 0
Just for consistency's sake.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:45 PM Revision 1533f1c0 (ceph): mon: stage mkfs seed info in mkfs/ dir
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:34 PM Revision 9e941c43 (ceph): mon: eliminate PaxosService::init()
update_from_paxos() is sufficient
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:19 PM Revision 0a926ef5 (ceph): mon: include monmap dump in mon_status and quorum_status
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:15 PM Revision 8c3d872e (ceph): mon: pull initial monmap from monmap/latest OR mkfs/monmap
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:05 PM Revision 0ecae996 (ceph): mon: take explicit initial monmap -or- generate one via MonClient
This will simplify bootstrapping a cluster via e.g. mon_host.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
09:58 AM Linux kernel client Bug #1704 (Resolved): oid limited to 40 chars, rbd images can be longer
fixed by commit:224736d9113ab4a7cf3f05c05377492bd99b4b02
still need to do some cleanup here
Sage Weil
09:57 AM Linux kernel client Bug #1696 (Resolved): kclient: crash in ceph_d_prune
fixed by commit:774ac21da76f5c3018428725074e27a3fd40b128 Sage Weil
07:17 AM rgw Bug #1719 (Resolved): rgw: crash in ObjectCache::touch_lru
... Sage Weil
05:36 AM Revision 2bad0115 (ceph): filestore-idempotent
run filestore_idempotent.py task. Sage Weil
05:35 AM Revision c5f070b8 (ceph): filestore_idempotent.py: simple task to test non-idempotent osd ops
Write some non-idempotent events to the osd. Simulate a failure. Verify
the result is correct on replay.
This must...
Sage Weil
05:12 AM Revision 69cd3625 (ceph): filestore: sync after non-idempotent operations
This is a big hammer to fix journal replay on non-btrfs fs backends (extN,
xfs, whatever). The problem is that it is...
Sage Weil
05:12 AM Revision 09811120 (ceph): filestore: document the btrfs_* fields
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:12 AM Revision 8df0cd38 (ceph): filestore: make trigger_commit() wake up sync; adjust locking
We need to wake up the sync thread (duh).
Also, we need to obey the FileJournal::lock -> journal_lock locking
order....
Sage Weil
05:12 AM Revision 9f1673c1 (ceph): test_filestore_idempotent: transactions are individually idempotent
Make individual transactions idempotent, but their interactions
non-idempotent. I.e. A A A A is okay, but A B A is n...
Sage Weil
05:12 AM Revision add04d15 (ceph): filejournal: fix replay of non-idempotent ops
- start sync thread prior to replay, so that we can commit as we replay
operations
- keep applied_seq accurate
- pa...
Sage Weil
05:12 AM Revision dae6c956 (ceph): test_filestore_idempotent: detect commit cycles due to non-idempotent ops
If we do a non-idempotent op and it does a commit itself, we don't see
fs->is_committed() true ever. Also count full...
Sage Weil
04:50 AM Revision fa5047b3 (ceph): Merge remote branch 'gh/stable'
Sage Weil
01:15 AM Revision 1c1ebb4d (ceph): Add rados python tests.
Josh Durgin
01:10 AM Revision 2fb70297 (ceph): rgw: remove warning
Yehuda Sadeh
01:03 AM Revision 5407fa70 (ceph): workunits: add workunit for running rgw and rados python tests
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
12:52 AM Revision 71bfe897 (ceph): test/pybind: add test_rgw
Forgot to add this in the previous commit.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
12:46 AM Revision ea42e02c (ceph): test/pybind: convert python rados and rgw tests to be runnable by nose
These tests can now be run automatically more easily.
Fixes: #1653
Signed-off-by: Josh Durgin <josh.durgin@dreamhost...
Josh Durgin
12:37 AM CephFS Bug #1702: Ceph MDS crash + client mount problem
Yes I am stopping the clients and remounting...but if im doing a mkcephfs, i make sure to umount all the clients befo... Gokul Krishnan
12:33 AM Revision 25cde7f9 (ceph): rados.py: fix Snap.get_timestamp
This now uses datetime, imports the right things, and calls the right function.
Fixes #1577
Signed-off-by: Josh Durg...
Josh Durgin

11/10/2011

11:07 PM Revision b600ec2a (ceph): v0.38
Sage Weil
11:05 PM Revision 2a7fbe0c (ceph): common: return null if mc.init() unsuccessful
Prevents ceph.cc from segfaulting on missing keyring.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just
11:05 PM Revision a177a702 (ceph): rbd.py: fix list when there are no images
It should return [], not [''].
Reported-by: Eric Chen <Eric_YH_Chen@wistron.com>
Signed-off-by: Josh Durgin <josh.du...
Josh Durgin
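
The symptom described for a177a702 (an empty listing coming back as [''] instead of []) is what a naive split produces in Python. The sketch below is only an illustration and assumes the names arrive as a NUL-separated string; that encoding and the helper name parse_names are hypothetical and are not taken from the actual rbd.py code.

```python
def parse_names(buf):
    """Split a NUL-separated name buffer, dropping empty entries.

    Hypothetical helper for illustration; not the rbd.py implementation.
    """
    return [name for name in buf.split('\0') if name]


# Splitting an empty string still yields one (empty) element.
naive_empty = ''.split('\0')

# Filtering out empty strings gives the expected empty list.
fixed_empty = parse_names('')
fixed_two = parse_names('img1\0img2\0')
```

Filtering on truthiness also drops the trailing empty element left by a final NUL terminator.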
11:05 PM Revision 27bb48c5 (ceph): mon: overwrite in put_bl
This fixes a situation where we accept a large value, there is some failure
and recovery, and then we commit a smalle...
Sage Weil
11:05 PM Revision 2f97a222 (ceph): PG: mark scrubmap entry as not absent when we see an update
Previously, there would be an assert failure in _scan_list if we see an
object deleted and then recreated.
Signed-of...
Samuel Just
10:58 PM Revision 87941128 (ceph): rgw: implement swift copy, fix copy auth
Yehuda Sadeh
10:13 PM Revision 77c977c1 (ceph): misc: allow >1 monitor per role in get_mon_names()
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
10:09 PM Revision 704644bc (ceph): PG: gen_prefix: use osdmap_ref rather than osd->osdmap
Otherwise, the debug output might not match the map used by
the pg logic.
Signed-off-by: Samuel Just <samuel.just@dr...
Samuel Just
10:09 PM Revision 7fb182a1 (ceph): OSD: sync_and_flush after mkfs to create first snap
Previously, if we kill the OSD process before the filestore
does its first sync, we end up replaying the journal on t...
Samuel Just
09:41 PM Bug #1670 (Can't reproduce): osd: crash in update_heartbeat_peers
Sage Weil
09:38 PM Bug #213 (Resolved): non-idempotent transactions (clone) under ext3 may not replay correct result
commit:dae6c956543276e103a272eb1e897db17b840348 Sage Weil
08:54 PM Bug #1530: osd crash during build_inc_scrub_map
Sage Weil
05:29 PM Bug #1530: osd crash during build_inc_scrub_map
We just found surprisingly similar stack traces in three of last night's failures:
nightly_coverage_2011-11-10/1740/...
Anonymous
06:45 PM Feature #1516 (Resolved): openstack: single node dev environment
Josh Durgin
05:06 PM rgw Feature #1717 (Resolved): rgw: support json input
Yehuda Sadeh
05:06 PM Feature #1653 (Resolved): librados: python binding nose tests
Fixed by commit:ea42e02ca2fd3655dbaf2e720e31d78da5022e21. Josh Durgin
05:05 PM rgw Cleanup #1716 (Closed): rgw: remove curl use
Yehuda Sadeh
05:05 PM Bug #1577 (Resolved): rados.py: Snap.get_timestamp does not work
Fixed by commit:25cde7f98ac195b0458830a3e345db54a994384b. Josh Durgin
04:57 PM Feature #1539 (Duplicate): libvirt: make sure snapshots work
Sage Weil
04:11 PM rgw Feature #1715 (Rejected): rgw: use RENAME osd operation to avoid slow CLONE operations
add to osd too Sage Weil
04:03 PM rbd Feature #1713 (Resolved): teuthology: qemu tasks, tests
gitbuilder
teuthology task
some tests that run in it
Sage Weil
03:29 PM CephFS Bug #1702: Ceph MDS crash + client mount problem
Gokul Krishnan wrote:
> Thank you for reverting back so quickly.
>
> Well in my scenario, i just have one Ceph se...
Sage Weil
03:29 PM CephFS Bug #1702: Ceph MDS crash + client mount problem
Gokul Krishnan wrote:
> by the way,
> you have assigned a target version as v0.39...but in the site i can find only...
Sage Weil
01:50 AM CephFS Bug #1702: Ceph MDS crash + client mount problem
by the way,
you have assigned a target version as v0.39...but in the site i can find only the source for v0.37...
e...
Gokul Krishnan
12:45 AM CephFS Bug #1702: Ceph MDS crash + client mount problem
Thank you for reverting back so quickly.
Well in my scenario, i just have one Ceph server running. And yes, every ...
Gokul Krishnan
03:29 PM rgw Feature #1712 (Resolved): rgw: support swift manifest objects
Yehuda Sadeh
03:22 PM Feature #1711 (Resolved): chef: multiple monitor support
Sage Weil
03:22 PM Bug #1669 (Resolved): linux 32 bit kernel client ld libraries and rm issue
Sage Weil
03:14 PM Feature #1709 (Resolved): specfile: merge suse spec file changes
Sage Weil
03:00 PM rgw Bug #1706 (Resolved): rgw: copy object auth verification (probably) broken
Yehuda Sadeh
02:59 PM rgw Bug #1706: rgw: copy object auth verification (probably) broken
Fixed, commit:87941128b60608d66dc5327038f099a1fb2a99c3. Yehuda Sadeh
02:59 PM rgw Bug #1705 (Resolved): rgw: swift copy is broken
Fixed, commit:87941128b60608d66dc5327038f099a1fb2a99c3. Yehuda Sadeh
02:57 PM CephFS Feature #1448: test hadoop on sepia
The following benchmark, TestDFSIO, is for 12 OSDs, 1 MDS/MON. There is a single ext4 disk per node dedicated to Ceph... Noah Watkins
02:46 PM Bug #1632 (Can't reproduce): osd: crash in dequeue_op
Sage Weil
01:54 PM Bug #1708 (In Progress): mon/PGMonitor.cc: 218: FAILED assert(paxos->get_version() + 1 == pending...
Sage Weil
01:45 PM Bug #1708 (Resolved): mon/PGMonitor.cc: 218: FAILED assert(paxos->get_version() + 1 == pending_in...
Running ceph version from git: a3dd5bd67ba19aae51a51318138ef10213a91449
Slaves are all ubuntu 11.10, 3.0.0-12
Files...
Josh Pieper
12:06 PM Bug #1707 (Resolved): After fresh install, OSD initialization fails with: error error 17: File ex...
Running ceph from git @ a3dd5bd6 with btrfs
Ubuntu 11.10, 3.0.0-12 on all machines
After installing my compiled c...
Josh Pieper
01:17 AM Revision a3dd5bd6 (ceph): PG: update info.history even if lastmap is absent
Previously, we did not update same_interval_since etc if
we do not have the previous map.
Signed-off-by: Samuel Just...
Samuel Just
12:36 AM Revision 023ff590 (ceph): Makefile: add MMonProbe.h
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
12:33 AM Revision fd5fb993 (ceph): osd: remove useless proc_replica_log() side-effect
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil

11/09/2011

11:38 PM Revision 78ad144a (ceph): hadoop: update patch and Readme.
Patch generated by Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum
11:30 PM Revision 386c0db3 (ceph): rgw: swift guesses mime type if not specified
Yehuda Sadeh
10:50 PM Revision 78ccb2a9 (ceph): osd: comment PG::lock*(), whitespace
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
10:46 PM Revision 87318389 (ceph): Merge branch 'master' of github.com:NewDreamNetwork/ceph
Conflicts:
src/osd/PG.cc
Sage Weil
10:32 PM Revision 5fa8df1e (ceph): osd: improve last_peering_reset debugging
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
10:32 PM Revision 383dfa33 (ceph): crypto: make crypto handlers non-static
These were static in auth/Crypto.cc, which was mostly fine, except when
we got a signal shutting everything down for ...
Sage Weil
10:15 PM Revision 9db994a5 (ceph): PG: always add backlog entry
Previously, we did not add a backlog entry if the object already had an
entry in the log along with an entry for that...
Samuel Just
10:15 PM Revision 0dffddf3 (ceph): osd/: change type of osd::osdmap to a shared_ptr
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
10:15 PM Revision 5df28ece (ceph): OSDMap,CrushWrapper: const cleanup on OSDMap
The osd's cached maps are not actually modified once cached. Marking
these methods const (which they should be) allo...
Samuel Just
10:15 PM Revision b41b1fa5 (ceph): PG: cache read-only reference to the current osdmap on pg lock
Previously, we needed to grab an osd_map read lock to send messages,
among other things. Now, we grab a reference to...
Samuel Just
10:04 PM Revision 15da4787 (ceph): rbd: Fix the showmapped cmd usage
If the rbd showmapped cmd is given any extra arguments, rbd will fail
with "assert(0)". Fix it by exiting with "usage...
Stratos Psomadakis
09:37 PM Revision 303e863d (ceph): add hammer.sh
simple script to repeat a test until it fails. can probably do something much more sophisticated
here, but this works.
Sage Weil
09:28 PM Revision 33549333 (ceph): hadoop: return all replica hostnames
Updates CephFileSystem to return all replica locations,
and in addition attempts to use reverse DNS to convert
the OS...
Noah Watkins
09:23 PM Revision e6035a62 (ceph): hadoop: make listStatus quiet
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Noah Watkins
09:23 PM Revision d7f911fb (ceph): hadoop: handle new ceph_get_file_stripe_address
Updates the Hadoop JNI/CephFileSystem to handle
the new version of ceph_get_file_stripe_address
which returns the loc...
Noah Watkins
09:23 PM Revision 619430a7 (ceph): client: return stripe address replicas
Changes ceph_get_file_stripe_address to return a
vector of entity_addr_t's for the primary and the
replicas. libcephf...
Noah Watkins
09:15 PM Revision c5c50377 (ceph): client: fix bad perfcounter fset callers
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:50 PM Revision 808c6442 (ceph): Improve use of syncfs.
Test syncfs return value and fallback to btrfs sync and then sync.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unic...
Alexandre Oliva
08:48 PM Revision c51e2f72 (ceph): osd: fix perfcounter typo
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
07:43 PM Revision 1ac6b47c (ceph): os: rename and make use of the split_threshold parameter.
This was accidentally left out of the must_split calculation. Put it
in, and rename it to split_multiplier (as that i...
Greg Farnum
07:03 PM Revision 09455eea (ceph): perfcounters: fix users of fset on averages
I forgot to audit these before merging the assert and they popped up
in teuthology and stuff. :(
Signed-off-by: Greg...
Greg Farnum
06:49 PM Revision afa56f16 (ceph): nuke: increase reboot timeout
Some sepia nodes are very slow to reboot. Josh Durgin
05:35 PM Bug #1690: osd re-created from scratch will crash on start-up
I was using v0.37; in order to debug this, I first built top of the tree stable (b8979f4d292f6a739daac81ce8e59aa084e1... Alexandre Oliva
05:11 PM rgw Bug #1706 (Resolved): rgw: copy object auth verification (probably) broken
Looking at RGWCopyObj::verify_permission(), we don't look at the source acl, but rather at the source bucket's acl. Yehuda Sadeh
05:07 PM rgw Bug #1705 (Resolved): rgw: swift copy is broken
Swift can accept alternative HTTP COPY method (with src/dest transposed). Yehuda Sadeh
04:38 PM Bug #213: non-idempotent transactions (clone) under ext3 may not replay correct result
Sage Weil
02:55 PM Bug #213: non-idempotent transactions (clone) under ext3 may not replay correct result
Update: the current first pass plan is to initiate a FileStore sync after any non-idempotent operation. This updates... Sage Weil
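
As a toy illustration of why replaying a clone can give the wrong result, consider the sketch below. It models a store as a plain dict and a journal as a list of tuples; these structures and the apply_ops helper are assumptions made for illustration and do not reflect the FileStore or FileJournal code.

```python
def apply_ops(store, ops):
    """Apply a list of toy journal operations to a dict-based store."""
    for op in ops:
        if op[0] == "write":
            _, key, value = op
            store[key] = value
        elif op[0] == "clone":
            _, src, dst = op
            store[dst] = store[src]
    return store


# Journal: snapshot "a" into "b", then overwrite "a".
journal = [("clone", "a", "b"), ("write", "a", "new")]

# Applied exactly once, "b" keeps the old contents of "a".
once = apply_ops({"a": "old"}, journal)

# If the same journal is replayed on top of the already-applied state,
# the clone copies the overwritten value and "b" no longer holds "old".
replayed = apply_ops(dict(once), journal)
```

In this model a write replays harmlessly, while the clone depends on state that a later operation changed. Committing the state before the journal can be replayed past a clone avoids that, which is the effect a sync after each non-idempotent operation is meant to achieve.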
03:35 PM Linux kernel client Bug #1701: krbd: limits and constants are not consistent in kernel and userspace
Also related: we have MAX_POOL_NAME_SIZE and MAX_SNAP_NAME_SIZE as 128 in qemu right now. Josh Durgin
02:37 PM Linux kernel client Bug #1701: krbd: limits and constants are not consistent in kernel and userspace
Stratos Psomadakis wrote:
> Instead of opening a new issue, I think I can add it here.
>
> Besides those limits o...
Sage Weil
02:18 PM Linux kernel client Bug #1701: krbd: limits and constants are not consistent in kernel and userspace
Instead of opening a new issue, I think I can add it here.
Besides those limits on the RBD images, there's also a ...
Stratos Psomadakis
12:44 PM Linux kernel client Bug #1701 (New): krbd: limits and constants are not consistent in kernel and userspace
There are a few things that exist in the kernel but not userspace:
* SNAP_NAME_LEN
* (MIN|MAX)_OBJECT_ORDER
Also...
Josh Durgin
03:00 PM CephFS Bug #1702: Ceph MDS crash + client mount problem
Ok, so generally speaking, the only time you should see fsid mismatches like that is if you have daemons from multipl... Sage Weil
02:55 PM CephFS Bug #1702: Ceph MDS crash + client mount problem
Hello,
thank you for the reply.
no, unfortunately i am not able to reproduce the error using debug ms = 20(for MD...
Gokul Krishnan
01:23 PM CephFS Bug #1702 (Need More Info): Ceph MDS crash + client mount problem
Are you able to reproduce this with 'debug mds = 20' and 'debug ms = 20' in your ceph.conf [mds section]?
Not sure...
Sage Weil
12:51 PM CephFS Bug #1702 (Can't reproduce): Ceph MDS crash + client mount problem
Hello,
i have configured ceph using a configuration as shown here[[http://pastebin.com/sQb8WZbx]].
The Ceph serve...
Gokul Krishnan
02:43 PM Bug #1684 (Duplicate): mon: crash in CryptoKey::encrypt
Sage Weil
02:42 PM Bug #1633 (Resolved): osd crash in CryptoKey::decrypt
should be fixed by commit:383dfa33682abeae7348655fc103dd80c41b7ba7 Sage Weil
02:39 PM Linux kernel client Feature #962 (Resolved): d_prune
Sage Weil
02:39 PM Linux kernel client Bug #850 (Resolved): make NULL lookup using I_COMPLETE work
Sage Weil
02:39 PM Linux kernel client Bug #851 (Resolved): make dcache readdir with I_COMPLETE work
Sage Weil
02:38 PM Linux kernel client Bug #1704 (Resolved): oid limited to 40 chars, rbd images can be longer
From Stratos Psomadakis:
"Besides those limits on the RBD images, there's also a hardcoded limit in
libceph (mess...
Sage Weil
02:27 PM rgw Bug #1698: radosgw-admin log list returns invalid json when a log object was created with a name ...
This is my vote for "let's not allow radosgw clients to create artifacts with non-utf8 names in the first place". Anonymous
02:19 PM Bug #1530 (Resolved): osd crash during build_inc_scrub_map
Samuel Just
02:08 PM Bug #1703 (Resolved): rbd: showmapped cmd fails, when extra args are present
Sage Weil
02:00 PM Bug #1703 (Resolved): rbd: showmapped cmd fails, when extra args are present
rbd showmapped cmd will fail with assert(0), when given any extra arguments.
Patch to fix it attached (exiting wit...
Stratos Psomadakis
01:02 PM Bug #1695 (Rejected): wrong path to ceph's libs / bash scripts in /etc/init.d/ceph
Serge Rittscher wrote:
> ok, the output is:
> @
> rm -f init-ceph init-ceph.tmp
> sed -e 's|@bindir[@]|/usr/local...
Sage Weil
11:39 AM Bug #1695: wrong path to ceph's libs / bash scripts in /etc/init.d/ceph
ok, the output is:
@
rm -f init-ceph init-ceph.tmp
sed -e 's|@bindir[@]|/usr/local/bin|g' -e 's|@libdir[@]|/usr/lo...
Serge Rittscher
11:04 AM Bug #1695: wrong path to ceph's libs / bash scripts in /etc/init.d/ceph
oops, 'touch init-ceph.in' first, then 'make init-ceph' Sage Weil
12:49 AM Bug #1695: wrong path to ceph's libs / bash scripts in /etc/init.d/ceph
@make init-ceph@
returns:
@make: `init-ceph' is up to date.@
Serge Rittscher
11:10 AM Bug #1700 (Resolved): osd: invalid perfcounter usage
Should be fixed in commit:09455eeac4fb37c31998202ad9503901f53c21dc. My bad! Greg Farnum
10:14 AM Bug #1700 (Resolved): osd: invalid perfcounter usage
During dbench, two osds crashed on this assert:... Josh Durgin
11:09 AM Bug #1694 (Resolved): monitor crash: FAILED assert(get_max_osd() >= crush.get_max_devices())
Sage Weil
11:09 AM Bug #1694: monitor crash: FAILED assert(get_max_osd() >= crush.get_max_devices())
oh nevermind, didn't see that second comment. the fix is commit:0bcdd4f3b2a2dba405639122b84f7aad978f347b, which come... Sage Weil
11:06 AM Bug #1694: monitor crash: FAILED assert(get_max_osd() >= crush.get_max_devices())
Great. Can you attach (or email) the ceph.conf you're using?
Thanks!
Sage Weil
07:55 AM Bug #1694: monitor crash: FAILED assert(get_max_osd() >= crush.get_max_devices())
The monitor that was generating the osdmap was running commit:5bd029ef01fcb59bea9170af563c3499cce1e8c4 and that faile... Wido den Hollander
02:25 AM Bug #1694: monitor crash: FAILED assert(get_max_osd() >= crush.get_max_devices())
Ok, I've run those commands and it gives me:... Wido den Hollander
07:19 AM CephFS Bug #1472: cfuse hangs with v0.34
Some of the hangs we've been seeing on the client may have been related to having two nics on each node. We had seen... Sam Lang
06:17 AM Revision 6d39cc11 (ceph): ceph: keep ceph.conf at ctx.ceph.conf
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:17 AM Revision 60863f70 (ceph): ceph_manager: manipulate monitors
Sage Weil
06:17 AM Revision 6618a027 (ceph): mon_recovery: add task to test monitor cluster failure recovery
Some simple tests to start with. We still need some sort of mon cluster
thrashing.
Signed-off-by: Sage Weil <sage@n...
Sage Weil
06:16 AM Revision 9acea7a6 (ceph): multimon mon_recovery tests on variously sized monitor clusters
Sage Weil
06:11 AM Revision 6ab14874 (ceph): Merge branch 'wip-mon'
Sage Weil
05:58 AM Revision 87634ce1 (ceph): osd: don't open deleted map from generate_past_intervals
The first get_map() call needs to be avoided when stop < last_epoch. This
fixes a crash like
2011-11-08 21:51:09.04...
Sage Weil
05:13 AM Revision 20cf1e96 (ceph): automake: enable 'make V=0'
Enables silent mode for automake generated Makefiles,
and silent mode is _off_ by default. Using V=0 the output
is mu...
Sage Weil
12:45 AM Revision 4b0cf89b (ceph): Add rbd python binding test.
Josh Durgin
12:24 AM Revision 1bc1a244 (ceph): mon: handle active -> electing transition properly
If we are already active, make sure we reset things properly before going
into an election.
Signed-off-by: Sage Weil...
Sage Weil
12:09 AM Revision 5d32bcae (ceph): Add nuke-on-error option.
This lets automated jobs nuke and unlock machines after failed
tests. Each machine is nuked individually, so one down ...
Josh Durgin
12:09 AM Revision 006a0dd4 (ceph): Remove unused imports and variable.
Josh Durgin

11/08/2011

10:21 PM Feature #1007 (Resolved): qa: osd failure and cluster recovery test(s)
yay thrashing Sage Weil
10:20 PM Bug #1694 (Need More Info): monitor crash: FAILED assert(get_max_osd() >= crush.get_max_devices())
Sage Weil
09:28 PM Bug #1694: monitor crash: FAILED assert(get_max_osd() >= crush.get_max_devices())
Can you try this and see if there is a mismatch?... Sage Weil
10:06 AM Bug #1694: monitor crash: FAILED assert(get_max_osd() >= crush.get_max_devices())
Aha! Read that wrong, tnx.
I used mkcephfs to generate the crushmap, I did not write my own.
Wido den Hollander
09:17 AM Bug #1694: monitor crash: FAILED assert(get_max_osd() >= crush.get_max_devices())
max_osd in the osdmap needs to be >= the max_devices in the crush map. how did you set up the cluster? did mkcephfs... Sage Weil
07:18 AM Bug #1694: monitor crash: FAILED assert(get_max_osd() >= crush.get_max_devices())
I just made a small adjustment to crushtool so it would print max_devices:... Wido den Hollander
07:01 AM Bug #1694 (Resolved): monitor crash: FAILED assert(get_max_osd() >= crush.get_max_devices())
I just did a fresh install of my cluster and after starting I saw my monitors go down with:... Wido den Hollander
10:18 PM Feature #1646 (Resolved): mon: catch up on committed items before attempting to join quorum
Sage Weil
10:17 PM Revision 7a32cc60 (ceph): rgw: swift bucket report returns both bytes size and actual size
Yehuda Sadeh
10:17 PM Revision 76090324 (ceph): rgw: don't return partial content response with bad header
Yehuda Sadeh
10:17 PM Revision a04afd09 (ceph): rgw: abort early on incorrect method
Yehuda Sadeh
09:33 PM Bug #1695: wrong path to ceph's libs / bash scripts in /etc/init.d/ceph
What is the output if you... Sage Weil
09:06 AM Bug #1695 (Rejected): wrong path to ceph's libs / bash scripts in /etc/init.d/ceph
After installing Ceph from sources (version ceph-0.37.tar.gz) on Ubuntu by executing
$ ./autogen.sh
$ ./configure...
Serge Rittscher
09:09 PM Revision 2fb73bdd (ceph): paxos: fix race between active and commit
If paxos reproposes an old learned value, we have a C_Active waiter, and
also a commit in progress.
When we reach qu...
Sage Weil
08:56 PM Revision 1ffb7b97 (ceph): mon: add 'quorum_status' command
Show status of the current quorum. Block until there is one.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
08:52 PM Revision a8b28ee5 (ceph): mon: do not participate in the election unless we are in electing state
If we participate, we may be included in the quorum, even though we are
probing, slurping, whatever.
Signed-off-by: Sag...
Sage Weil
07:50 PM Revision 64350c0b (ceph): rgw: guard perfcounter accesses in rgw_cache.
This gets called by radosgw-admin, so it needs to handle
perfcounter being a null pointer.
Signed-off-by: Greg Farnu...
Greg Farnum
07:28 PM Revision 42f5f024 (ceph): rgw: initialize all the perfcounters, in order
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Greg Farnum
06:42 PM Revision e952e10f (ceph): ReplicatedPG: use finc, not fset, on average counters
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Greg Farnum
06:42 PM Revision 29e091b5 (ceph): mon: 'mon_status' command to dump individual mon state
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:04 PM Revision f0b9a331 (ceph): rgw: use l_rgw_qactive perfcounter
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Greg Farnum
05:58 PM Revision 9035ffb2 (ceph): mon: add probe+slurp timeouts
A short timeout on probe, so we can form new quorums quickly.
A longer timeout on slurp, so we will tolerate a slow ...
Sage Weil
05:50 PM Revision 0fe0f9db (ceph): rgw: create and tear down a radosgw perfcounter
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil
05:50 PM Revision d0b226e7 (ceph): perfcounter: assert when you try and set an average.
If you're trying to set an average, you're probably doing it wrong.
Signed-off-by: Greg Farnum <gregory.farnum@dream...
Greg Farnum
05:50 PM Revision 57b60b8a (ceph): perfcounter: add some minimal documentation.
The data model is a bit obtuse if you're just looking at the code.
Signed-off-by: Greg Farnum <gregory.farnum@dreamh...
Greg Farnum
05:50 PM Revision cf566550 (ceph): rgw: implement perfcounters
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Greg Farnum
04:59 PM Linux kernel client Bug #1696: kclient: crash in ceph_d_prune
Here is the code:... Sage Weil
11:50 AM Linux kernel client Bug #1696 (Resolved): kclient: crash in ceph_d_prune
During the 11/08 nightly, several suites:
1606 autotest dbench
1607 workunit direct_io
1608 workunit kc...
Anonymous
04:57 PM Bug #1684: mon: crash in CryptoKey::encrypt
This happened on an mds during a thrashing run:... Josh Durgin
04:29 PM Linux kernel client Feature #1699 (Resolved): debug symbols in autobuilt (sepia) kernels
We need debug symbols in the .ko objects:... Sage Weil
03:49 PM rgw Bug #1698: radosgw-admin log list returns invalid json when a log object was created with a name ...
The two preceding days show similar errors as well. Matthew Wodrich
03:48 PM rgw Bug #1698: radosgw-admin log list returns invalid json when a log object was created with a name ...
The description above is malformed for whatever reason, so I'll try again:
radosgw-admin log list is producing bad J...
Matthew Wodrich
03:44 PM rgw Bug #1698 (Resolved): radosgw-admin log list returns invalid json when a log object was created w...
2011-11-07-12-0-<80>.. Matthew Wodrich
02:34 PM rgw Feature #1697 (Resolved): s3-tests: test bucket headers
Sage Weil
12:04 PM rgw Feature #1591 (Resolved): rgw: instrument with perfcounter
Finally sat down and did this. Merged in commit:64350c0b4d3ba2061cebed87f4cd6f513d2ba6ed and passed s3tests. Greg Farnum
06:46 AM Revision 2523b70e (ceph): mon: slurp latest state from active monitors before joining quorum
If a monitor has been down and is behind, and joins the quorum, the
other nodes will try to send it all of the needed...
Sage Weil
06:41 AM Revision c2fc986e (ceph): monmap: simplify constructor
Explicitly set created, last_changed where appropriate.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
06:41 AM Revision 279661f3 (ceph): paxos: last_consumed == latest_stashed; behave accordingly
Initialize on startup.
Don't re-read off of disk on every trim_to() call.
Signed-off-by: Sage Weil <sage.weil@dreamh...
Sage Weil
06:41 AM Revision 100fba8e (ceph): mon: fix osdmap trim
We can raise the floor even when min_last_epoch_clean is very close to
the current version, as long as it is still ab...
Sage Weil
04:40 AM Revision 628de548 (ceph): mon: don't call out to mon->call_election for internal election restarts
This lets us drop the is_new kludge.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
04:40 AM Revision 18941dd0 (ceph): mon: rename election_starting -> restart
These callbacks reset monitor/paxos/paxosservice state, which used to
happen when an election started, but will now n...
Sage Weil
04:40 AM Revision 2f46e8cd (ceph): mon: revamp monitor states
starting -> probing, electing
some cleanup
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
04:40 AM Revision 40843eb3 (ceph): rgw: fix warning
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
01:08 AM Revision 2836104a (ceph): rgw: fix accept-range for suffix format, other related issues
Yehuda Sadeh

11/07/2011

11:04 PM Revision 2f881e12 (ceph): Timer.cc: remove global thread variable
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
11:04 PM Revision d4ef9215 (ceph): common: return null if mc.init() unsuccessful
Prevents ceph.cc from segfaulting on missing keyring.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just
09:05 PM Revision c764b247 (ceph): Fix leftover orchestra import clause.
This seems to be a leftover from
a2372fce12b6bd1818e155d1d8ed5134dbd8fd4a,
no idea how it stayed hidden this long.
Tommi Virtanen
05:27 PM Revision 480b8260 (ceph): rbd: add showmapped to clitests and rst man page
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
05:27 PM Revision 4e518ed3 (ceph): rbd: Document the rbd showmapped cmd
Document the rbd showmapped cmd in rbd.usage(), and rbd's man page,
and add it to the bash completion script.
Signed...
Stratos Psomadakis
05:10 PM Revision 34d80397 (ceph): rbd.py: fix list when there are no images
It should return [], not [''].
Reported-by: Eric Chen <Eric_YH_Chen@wistron.com>
Signed-off-by: Josh Durgin <josh.du...
Josh Durgin
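The empty-list case fixed in revision 34d80397 can be illustrated with a minimal sketch. It assumes the binding receives image names as a NUL-separated byte buffer, which the entry above does not state; `split_image_names` is a hypothetical helper for illustration, not the actual rbd.py code.

```python
from typing import List


def split_image_names(raw: bytes) -> List[str]:
    """Split a NUL-separated buffer of names, dropping empty entries.

    Hypothetical helper: the buffer layout is an assumption, not taken
    from rbd.py itself.
    """
    # A plain split on an empty buffer yields one empty entry, which is
    # how a result like [''] can appear instead of [].
    return [name.decode("utf-8") for name in raw.split(b"\0") if name]


# The naive split keeps the empty entry; the filtered version does not.
naive = b"".split(b"\0")
filtered = split_image_names(b"")
```

Filtering out empty entries also handles a trailing NUL terminator, so a buffer such as `b"foo\0bar\0"` yields two names rather than three.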
03:35 PM Bug #1690: osd re-created from scratch will crash on start-up
I seem to be having some trouble reproducing this. What version are you running? Could you repeat the procedure wit... Samuel Just
10:33 AM Bug #1690 (Can't reproduce): osd re-created from scratch will crash on start-up
Some time ago, it was possible to re-create an osd after its filesystem failed as simply as running “cosd -i # --mkfs... Alexandre Oliva
02:59 PM CephFS Feature #1693: libcephfs: Support TRIM (hole punching)
Kernelside ceph.ko ticket is #591. Let this ticket stand for the userspace libcephfs (and ceph-fuse) support. Anonymous
02:12 PM CephFS Feature #1693 (Resolved): libcephfs: Support TRIM (hole punching)
Anonymous
02:57 PM Feature #1692: librbd: Support TRIM (hole punching) (userspace client)
Kernel-side rbd.ko ticket is #190. Let this ticket stand for the librbd (userspace) support. Anonymous
02:11 PM Feature #1692 (Duplicate): librbd: Support TRIM (hole punching) (userspace client)
Anonymous
01:56 PM Bug #1691 (Can't reproduce): rados export failures
... Sage Weil
11:36 AM Linux kernel client Bug #1667 (Resolved): BUG at fs/inode.c line 1375
Sage Weil
11:17 AM rbd Feature #1662 (In Progress): libvirt: obscure qemu/rbd secrets
Sage Weil

11/06/2011

03:08 PM Linux kernel client Bug #1667: BUG at fs/inode.c line 1375
Sage Weil

11/05/2011

09:37 PM Linux kernel client Bug #1686 (Resolved): directory not empty errors
fixed commit:c6ffe10015f4e6fba8a915318b319c43aed1836f clear helper Sage Weil
09:37 PM Linux kernel client Bug #1687 (Resolved): directory existence failures
fixed commit:c6ffe10015f4e6fba8a915318b319c43aed1836f clear helper Sage Weil
01:38 AM Revision ae41f323 (ceph): OSD: write_info/log before dropping lock in generate_backlog
Bug #1530
This should fix the following race:
1) osd->generate_backlog does pg->assemble_backlog
2) osd->generate_ba...
Samuel Just
12:30 AM Revision fb70f5cc (ceph): FileJournal: stop using sync_file_range
Using sync_file_range means that neither any required metadata gets committed,
nor the disk cache gets flushed. Stop ...
Christoph Hellwig
12:29 AM Revision 585a46c5 (ceph): monclient: simplify auth_supported set
Use AuthSupported class instead of repopulating it ourselves.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:23 AM Revision a38c0054 (ceph): test_libcephfs
Greg Farnum
12:21 AM Revision 10141673 (ceph): Makefile: use static add for test_libcephfs_readdir.
Otherwise it doesn't seem to play nicely with teuthology/sepia
due to requiring the host to have gtest installed.
Si...
Greg Farnum

11/04/2011

09:57 PM Revision 5b4e9d31 (ceph): RadosModel: add DeleteOp to test object deletions
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
08:40 PM Revision 280a4d1d (ceph): rgw: fix tmp objects leakage
Yehuda Sadeh
08:13 PM Revision 8d914f0e (ceph): rgw: list system buckets through rados api
Yehuda Sadeh
08:13 PM Revision fc6522a8 (ceph): rgw: don't purge pools in any case
Yehuda Sadeh
06:44 PM Bug #1530: osd crash during build_inc_scrub_map
ae41f3232a39dbf33487ab02cbac292f58debea8 Samuel Just
04:59 PM Bug #1530: osd crash during build_inc_scrub_map
My best guess about this bug goes something like this:
1) osd->generate_backlog does pg->assemble_backlog
2) osd->g...
Samuel Just
05:20 PM Linux kernel client Bug #1686: directory not empty errors
this is probably due to the d_prune stuff i just pushed to master. need to do some serious debugging here.
the re...
Sage Weil
01:43 PM Linux kernel client Bug #1686 (Resolved): directory not empty errors
Today, many of the kclient ceph fs tests failed due to problems removing directories. This did not happen with yester... Josh Durgin
04:52 PM Bug #1689 (Can't reproduce): osd: segfault in recover_primary
This happened in run 1497, thrashing with the snaps workload, on 3 osds.... Josh Durgin
04:50 PM Bug #1529: cosd: os/FileStore.cc: 2390: FAILED assert(0 == "ENOENT on clone suggests osd bug")
Thrashing with the snaps workload triggered this on several osds in run 1497 today. Josh Durgin
03:21 PM Bug #1683: librados: list objects should also return locator key
Apparently, I implemented this about 2 months ago but didn't merge it... Samuel Just
01:19 PM Bug #1683 (Resolved): librados: list objects should also return locator key
Yehuda Sadeh
02:47 PM CephFS Bug #1472: cfuse hangs with v0.34
We're seeing similar hangs again. One thing I didn't mention in my previous posts, we are always adjusting the repli... Sam Lang
02:43 PM Bug #1688 (Closed): Benjamin: pg stuck in scrub
Looks like the bug is related to last_update_applied not getting up to last_update on primary. No further scrubbing ... Samuel Just
01:46 PM Linux kernel client Bug #1687 (Resolved): directory existence failures
Some benchmarks today failed to cd to directories. These worked yesterday.
From blogbench and ffsb:...
Josh Durgin
01:40 PM rgw Bug #1685 (Resolved): rgw: tmp objects leakage
Yes, but the problem was elsewhere. Fixed, commit:280a4d1ded4b83974805c60bcd410ee00ccc3884. Yehuda Sadeh
01:38 PM rgw Bug #1685: rgw: tmp objects leakage
This is probably due to #1683, as tmp objects are all placed using locators, right? Greg Farnum
01:27 PM rgw Bug #1685 (Resolved): rgw: tmp objects leakage
After running radosgw-admin temp remove, we're still left with objects from the tmp namespace. Either we fail to ... Yehuda Sadeh
01:21 PM Bug #1684 (Duplicate): mon: crash in CryptoKey::encrypt
From teuthology:~teuthworker/archive/nightly_coverage_2011-11-04/1472/teuthology.log:... Josh Durgin
01:17 PM rgw Bug #1672 (Resolved): rgw: support chunked transfer encoding
Done. Yehuda Sadeh
12:55 PM CephFS Bug #1682 (Resolved): mds: segfault in CInode::authority
From teuthology:~teuthworker/archive/nightly_coverage_2011-11-04/1469/teuthology.log:... Josh Durgin
12:30 PM rgw Bug #1681 (Resolved): rgw: user rm with --purge doesn't remove data
I just disabled it as it did it incorrectly Yehuda Sadeh
10:25 AM Feature #1618: libvirt: make sure migration works
Mike Lowe emailed me and mentioned it works for him on Oneiric with a custom kvm 0.15.1, no other changes. I still wa... Anonymous
09:53 AM CephFS Feature #1680 (New): support reflink (cheap file copy/clone)
It seems the API is still fs-specific ioctls, but there's repeated discussion about reflink(2).
If a nice common API...
Anonymous
08:19 AM Bug #1679: assertion failure is_replica()
Upon trying to restart the failed osds, other osds (7) fail:
*** Caught signal (Aborted) **
in thread 0x7fcceb...
Sam Lang
08:12 AM Bug #1679 (Can't reproduce): assertion failure is_replica()
3 boxes, 12 osds per box. 4 osds (9,11,20,24) crashed at the following assertion. This was triggered by first setti... Sam Lang

11/03/2011

11:01 PM Revision 0f98006c (ceph): rgw: fix PUT without content length (non chunked)
Yehuda Sadeh
10:46 PM Revision 256ac72a (ceph): rbd: document --order and list required args where they're necessary
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
09:43 PM Revision 0df3f036 (ceph): Merge remote branch 'nwatkins/for-master'
Greg Farnum
09:11 PM Revision 90249069 (ceph): Merge branch 'wip-getdir'
Greg Farnum
08:59 PM Revision b8733476 (ceph): gitignore: just ignore all test_ files
We don't want to add a new ignore for each test!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum
08:55 PM Revision d4faf588 (ceph): qa: workunit to run test_libcephfs_readder
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Greg Farnum
08:49 PM Revision 120c3fbd (ceph): test: write a test to try and check on Client::readdir_r_cb.
It's made difficult by having to go through libcephfs, but it's better
than nothing and should catch most of the erro...
Greg Farnum
08:39 PM Feature #1678 (Resolved): rados tool: ability to specify object locator
We need to be able to access objects with non-default locators. Yehuda Sadeh
08:27 PM Revision 4f3b1138 (ceph): ceph_manager: log ceph -s output so progress is visible in the logs
Josh Durgin
08:08 PM Revision 0b451f94 (ceph): Keep each ssh connection alive.
With long-running jobs like thrashing, ssh connections were timing
out.
Josh Durgin
08:07 PM Revision 6e3e0d7c (ceph): connection: allow the caller to specify whether keep-alive should be used
Josh Durgin
06:45 PM Revision 58eb8c5e (ceph): rgw: fix null deref, cleanups
Yehuda Sadeh
06:29 PM Revision 0d4987d9 (ceph): rgw: fix crash when accessing swift auth without user
Yehuda Sadeh
06:29 PM Revision 7726e78d (ceph): rgw: add support for chunked upload
Yehuda Sadeh
06:29 PM Revision b1a0c1ad (ceph): locker: fix race in locking
The isolation level is lower than I thought. This made it possible for
two clients to think they both locked the same...
Josh Durgin
04:39 PM CephFS Bug #1663 (Resolved): Hadoop: file ownership/permission not available in hadoop
This is still a pretty cheap fix :), but I think it's enough to close out this bug. Greg Farnum
04:12 PM CephFS Bug #1663: Hadoop: file ownership/permission not available in hadoop
a79b7e17ebbc70cedae80216986ae5fd52a1c0b7 provides an OK fix for now. Basically it makes any file look like the curren... Noah Watkins
04:08 PM CephFS Bug #1666: hadoop: time-related meta-data problems
Bummer. Well... for the time being it may be sufficient to force FileStatus.getModificationTime() to go directly to t... Noah Watkins
03:58 PM CephFS Bug #1666: hadoop: time-related meta-data problems
Yeah, it's not impossible, I just would have thought that one of the other updates would have prompted the server to ... Greg Farnum
03:52 PM CephFS Bug #1666: hadoop: time-related meta-data problems
Do you mean that you are surprised that client-1's inode didn't get updated from the server's change before the stat ... Noah Watkins
03:49 PM CephFS Bug #1666: hadoop: time-related meta-data problems
If that's the case then I'm surprised the mtime didn't get updated at an earlier time. If nothing else we can probabl... Greg Farnum
03:44 PM CephFS Bug #1666: hadoop: time-related meta-data problems
Greg Farnum wrote:
> So the "bad" mtime is the same time the inode was created on the MDS server?
I think so. Her...
Noah Watkins
03:35 PM CephFS Bug #1666: hadoop: time-related meta-data problems
So the "bad" mtime is the same time the inode was created on the MDS server? Greg Farnum
03:30 PM CephFS Bug #1666: hadoop: time-related meta-data problems
If Client-1 is seeing a cached copy of the inode's mtime, then the following server-side scenario may explain what's ... Noah Watkins
02:44 PM CephFS Bug #1666: hadoop: time-related meta-data problems
Grepping for the inode number got me this:... Greg Farnum
01:20 PM CephFS Bug #1666: hadoop: time-related meta-data problems
Sage Weil wrote:
> If you can generate client logs for C1 and C2 (debug ms = 1, debug client = 10) that should tell ...
Noah Watkins
11:44 AM CephFS Bug #1666: hadoop: time-related meta-data problems
If you can generate client logs for C1 and C2 (debug ms = 1, debug client = 10) that should tell us everything. Sage Weil
11:07 AM CephFS Bug #1666: hadoop: time-related meta-data problems
Just ran a little experiment that may shed some light on this.... Noah Watkins
03:49 PM CephFS Bug #1677: mds interval_set.h: 385: FAILED assert(p->first <= start)
Here is the log from the MDS that caused this. I have from the other mds's, mon, and osd if it is relevant -- but not... Noah Watkins
03:44 PM CephFS Bug #1677 (Resolved): mds interval_set.h: 385: FAILED assert(p->first <= start)
Noah got this and sent it to the mailing list on Oct 28, 2011:... Greg Farnum
02:15 PM Bug #1617 (New): pgs stuck down and peering with only one osd down and out
Happened again today in teuthology:~teuthworker/archive/nightly_coverage_2011-11-03/1433:... Josh Durgin
02:06 PM Messengers Bug #1674: daemons crash when sent random data
This is actually going to be pretty unpleasant. Removing the asserts that deliberately crash on unexpected types is e... Greg Farnum
06:29 AM Messengers Bug #1674 (Can't reproduce): daemons crash when sent random data
mon seem to crash every time, osd seem to take a few attempts (similar stack trace). not tested mds... John Leach
12:04 PM Bug #1676 (Resolved): stats mismatch during snaps workunit
It looks like this started failing between 10-20 and 10-24.... Josh Durgin
11:54 AM CephFS Bug #1675 (Can't reproduce): mds: failed rstat assert
This happened during the multiple_rsync workunit.
From teuthology:~teuthworker/archive/nightly_coverage_2011-11-03/1...
Josh Durgin
11:29 AM Bug #1671: rgw: access to swift auth url without user info crashes gateway
Ah, failed to push. Rebased commit:0d4987d990e9795fda75d9e7903ba2d449b11fec. Yehuda Sadeh
02:52 AM Revision 376dad92 (ceph): hadoop: remove unused fs_default_name
The variable fs_default_name is effectively unused
and the same effect is achieved by treating paths
in a standard wa...
Noah Watkins
02:51 AM Revision 3191e0db (ceph): hadoop: FileSystem.rename should not return FileNotFound
This fixes several unit test failure cases.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins
02:51 AM Revision 60e1e148 (ceph): hadoop: ENOTDIR should be negative
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Noah Watkins
02:51 AM Revision 6deea1c2 (ceph): hadoop: fix unit test: testWorkingDirectory
The working directory should be set in initialize() and
is expected by the unit tests to be fully qualified (i.e.
wit...
Noah Watkins
02:51 AM Revision ccb08e21 (ceph): hadoop: remove deprecation warning
The routine cannot be fully removed yet because it
still exists as an abstract function in FileSystem class.
Signed-...
Noah Watkins
02:51 AM Revision 1c24fc7a (ceph): hadoop: remove deprecated isDirectory()
Uses the suggested getFileStatus() method for
replacing the deprecated isDirectory(). This is
only marginally slower ...
Noah Watkins
02:51 AM Revision a407da0e (ceph): hadoop: remove statistics initialization
This is already handled by super.initialize()
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins
02:51 AM Revision dcf2d629 (ceph): hadoop: remove unused variable
Remove CephFileSystem.debug as log4j is now
used for debug level control.
Signed-off-by: Noah Watkins <noahwatkins@g...
Noah Watkins
02:51 AM Revision 9e8fa029 (ceph): hadoop: remove initialization check
The initialization check is removed because
it is part of Hadoop's treatment of file systems
that initialize() is cal...
Noah Watkins
02:51 AM Revision 3006c6e5 (ceph): hadoop: simplify workingDir handling; add home directory
1. Simplifies the handling of paths by allowing them to be passed
around and manipulated in their fully qualified for...
Noah Watkins
02:50 AM Revision a79b7e17 (ceph): hadoop: emulate Ceph file owner as current user
Make CephFileSystem tell Hadoop that the owner
of all files is the current user. This provides
zero security or isola...
Noah Watkins
02:49 AM Revision e9adf735 (ceph): hadoop: use standard log4j logging facility
Replace ceph.debug(msg, level) with LOG.level(msg)
provided by the log4j facility used by Hadoop. The
level can now b...
Noah Watkins
02:06 AM Bug #1529: cosd: os/FileStore.cc: 2390: FAILED assert(0 == "ENOENT on clone suggests osd bug")
Sorry for the slow response! Somehow I didn't get an e-mail update.
I do have logs preceding the crash, but they a...
Wido den Hollander

11/02/2011

08:45 PM CephFS Bug #1666: hadoop: time-related meta-data problems
Something like this would make the most sense to me. (I'd have to check the specifics of mtime updating to see exactly.) Greg Farnum
08:30 PM CephFS Bug #1666: hadoop: time-related meta-data problems
Formatting oops:... Noah Watkins
08:29 PM CephFS Bug #1666: hadoop: time-related meta-data problems
You're right about that last point Greg, it doesn't quite add up--not thinking straight today.
Here is what happen...
Noah Watkins
07:46 PM CephFS Bug #1666: hadoop: time-related meta-data problems
I'd have to look at the specifics again -- but it probably can't be done. If the client buffers a write and then flus... Greg Farnum
06:39 PM CephFS Bug #1666: hadoop: time-related meta-data problems
So, I think I've got this nailed down. The good news is that the error was a clock sync issue. The bad news is that i... Noah Watkins
06:51 PM Revision c861ee10 (ceph): PG: mark scrubmap entry as not absent when we see an update
Previously, there would be an assert failure in _scan_list if we see an
object deleted and then recreated.
Signed-of...
Samuel Just
06:33 PM Revision a2f406ef (ceph): testrados: set CEPH_CLIENT_ID without a ;
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
04:23 PM Bug #1633: osd crash in CryptoKey::decrypt
Happened again today. I put the core and tarball on the gcov gitbuilder in ~ubuntu/bug_1633. Josh Durgin
03:45 PM Revision 78111d07 (ceph): Merge branch 'wip-freebsd'
Conflicts:
src/osd/OSD.cc
Sage Weil
03:44 PM Revision 47b70367 (ceph): debian: update VCS sources
Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu> Laszlo Boszormenyi
03:44 PM Revision 0b0f65a4 (ceph): add missingok to logrotate
When ceph is not running, it has no logs. Thus logrotate has nothing to
rotate. The missingok directive handles this ...
Laszlo Boszormenyi
03:44 PM Revision f4971328 (ceph): debian: empty dependency_libs in *.la files
Per policy and multiarch support.
Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>
Laszlo Boszormenyi
03:44 PM Revision 26787ce3 (ceph): debian: add watch
Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu> Laszlo Boszormenyi
03:44 PM Revision ee34e09c (ceph): debian: fix libceph1 -> libcephfs1 rename
Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu> Laszlo Boszormenyi
02:34 PM rgw Bug #1673 (Won't Fix): rgw: mod_fastcgi needs to be backward compatible
The changes we introduced for 100-continue break the protocol; we need to make that optional one way or another. Yehuda Sadeh
02:18 PM rgw Bug #1672 (Resolved): rgw: support chunked transfer encoding
This is required for swift support. Currently mod_fastcgi doesn't support chunked transfer and we can't just use mod_... Yehuda Sadeh
01:33 PM Bug #1530: osd crash during build_inc_scrub_map
Alright, in irc, slb seems to have hit a related bug...with logging! Samuel Just
11:49 AM Bug #1530: osd crash during build_inc_scrub_map
c861ee105475b3f20f64f51b8611f9b69207ca8c should take care of the assert(!o.negative) error. Still trying to reproduc... Samuel Just
09:02 AM Bug #1530: osd crash during build_inc_scrub_map
Possibly related: the snaps workunit failed yesterday and today with bad stats:... Josh Durgin
08:53 AM Bug #1530: osd crash during build_inc_scrub_map
Two more tests hit this last night, and two other osds crashed due to an assert in build_inc_scrub_map:... Josh Durgin
12:41 PM Bug #1671 (Resolved): rgw: access to swift auth url without user info crashes gateway
Fixed, commit:add8f59df9b6ef63a8431d3415e791b14ce1fe3c. Yehuda Sadeh
12:36 PM Bug #1671 (Resolved): rgw: access to swift auth url without user info crashes gateway
Yehuda Sadeh
11:31 AM Bug #1657 (Resolved): teuthology: testrados failed to find conf
Forgot to include my fix for that, pushed: a2f406ef49a1e5ec31d90957122e14addf56901c. Samuel Just
08:58 AM Bug #1657 (New): teuthology: testrados failed to find conf
Failed due to escaped env setting:... Josh Durgin
09:35 AM Bug #1670 (Can't reproduce): osd: crash in update_heartbeat_peers
... Sage Weil
04:20 AM Revision 2fc01b52 (ceph): osdmaptool: test --create-with-conf with racks
Make sure we generate a map that will map (and not assert about bad
max_osd/max_device mismatch).
Signed-off-by: Sag...
Sage Weil
04:14 AM Revision 885d7148 (ceph): osdmap: assert that osdmap max_osds >= crushmap max_devices
This will catch potential array overruns before they happen.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
04:14 AM Revision 0bcdd4f3 (ceph): osdmap: fix off-by-one in build_simple_from_conf
maxosd is the highest osd id. set_max_osd(that + 1), since that is
setting the array size. This fixes references of...
Sage Weil
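The two osdmap revisions above both concern the gap between the highest OSD id and the size of the array indexed by it. A minimal Python sketch of that off-by-one, using hypothetical names (`size_for_ids`, `check_sizes`) that are illustrations only and not Ceph's API:

```python
def size_for_ids(osd_ids):
    """Return the array size needed to index every id in osd_ids.

    Ids start at 0, so the required size is the highest id plus one.
    """
    return max(osd_ids) + 1 if osd_ids else 0


def check_sizes(max_osd, max_devices):
    """Fail early if the OSD array is smaller than the device count."""
    assert max_osd >= max_devices, "max_osd smaller than max_devices"


ids = [0, 1, 2, 5]            # hypothetical OSD ids read from a config
max_osd = size_for_ids(ids)   # highest id is 5, so the size must be 6
check_sizes(max_osd, max_devices=6)
```

Passing the highest id itself as the size would leave id 5 out of range, which is the kind of overrun the added assert is meant to catch before it happens.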
03:04 AM Revision b66847ea (ceph): osd: fix assert include
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil

11/01/2011

11:07 PM Bug #1669: linux 32 bit kernel client ld libraries and rm issue
Yes it is much better. I used a git version of the kernel and its version is 3.1.0+. It seems ldconfig and rm are ... Hong Cho
08:59 PM Bug #1669: linux 32 bit kernel client ld libraries and rm issue
There was a recent fix for 32-bit ino generation that will avoid this problem most of the time, although in theory yo... Sage Weil
07:00 PM Bug #1669 (Resolved): linux 32 bit kernel client ld libraries and rm issue
I am running ceph on 64 bit OS (Debian linux-image-3.0.0-2-x86_64). It is on two machines each of them having 1 mon,... Hong Cho
11:02 PM Revision 219141e9 (ceph): rgw: swift prefix and path params fixes
Yehuda Sadeh
08:12 PM Revision 143c572b (ceph): .gitignore: test_str_list
Sage Weil
08:10 PM Revision aa5f697f (ceph): Makefile: include/compat.h in tarball
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
07:35 PM Revision 9252dccc (ceph): Merge branch 'master' into wip-freebsd
Sage Weil
06:49 PM Revision b3b45bf9 (ceph): Merge remote-tracking branch 'gh/wip-auth'
Sage Weil
06:43 PM Revision 79d9718d (ceph): common: make get_str_list work with other delimiters, and skip the
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:43 PM Revision 99bcd7b5 (ceph): common: get_str_list unit tests
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
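As a rough illustration of splitting a string on a caller-supplied set of delimiters, here is a Python sketch. The function name `split_list`, its default delimiters, and the choice to drop empty tokens are all assumptions made for the example; they are not taken from the Ceph source or from the truncated commit title.

```python
def split_list(text, delims=", "):
    """Split text on any character in delims, dropping empty tokens.

    Both the default delimiter set and the empty-token handling are
    assumptions for this sketch, not Ceph's get_str_list behavior.
    """
    tokens = []
    current = []
    for ch in text:
        if ch in delims:
            if current:                      # close the token in progress
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:                              # flush the trailing token
        tokens.append("".join(current))
    return tokens


hosts = split_list("a, b,,c")
paths = split_list("x:y::z", delims=":")
```

Treating the delimiters as a set of single characters keeps the loop simple; multi-character separators would need a different approach.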
06:19 PM Revision ba8c345b (ceph): monclient: fail fast when our auth protocols aren't supported
This handles the case where the server does not support any of the
authentication protocols that the client does. Pre...
Josh Durgin
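The fail-fast idea in this commit message can be sketched in Python as follows. The names `pick_auth_protocol` and `NoSupportedAuth` are hypothetical and the logic is an assumption about the general technique, not the monclient implementation:

```python
class NoSupportedAuth(Exception):
    """Raised when client and server share no authentication protocol."""


def pick_auth_protocol(client_protocols, server_protocols):
    """Return the first client-preferred protocol the server also supports.

    Raises NoSupportedAuth immediately when there is no overlap, so the
    caller does not keep retrying a negotiation that cannot succeed.
    """
    server = set(server_protocols)
    for proto in client_protocols:           # client order is the preference
        if proto in server:
            return proto
    raise NoSupportedAuth(
        "no common auth protocol: client=%r server=%r"
        % (list(client_protocols), sorted(server))
    )


chosen = pick_auth_protocol(["cephx", "none"], ["none"])
```

Raising as soon as the overlap is known to be empty gives the caller a clear error instead of a hang or a timeout.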
06:19 PM Revision 7a4c232f (ceph): monclient: fix else formatting
If one branch has braces, the other should too.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
06:16 PM Revision d1e95134 (ceph): PG: set_last_peering_reset in Reset constructor
If an osd in the prior set comes up, we can restart peering without a
new peering interval starting. However, we sti...
Samuel Just
05:46 PM Revision e15177ab (ceph): monclient: fail fast when our auth protocols aren't supported
This handles the case where the server does not support any of the
authentication protocols that the client does. Pre...
Josh Durgin
05:46 PM Revision ef51f0fa (ceph): monclient: fix else formatting
If one branch has braces, the other should too.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
02:56 PM Bug #1633: osd crash in CryptoKey::decrypt
have a core but no matching binary :(. need to reproduce again, and save the build tarball. Sage Weil
01:03 PM devops Feature #1668 (New): collectd: push ceph plugin upstream
Rebase the perfcounter ceph plugin in the dho collectd repo against mainline collectd and push upstream. Sage Weil
11:09 AM Bug #1530: osd crash during build_inc_scrub_map
can someone work on reproducing this? see metropolis:~sage/src/teuthology/j.1530 and hammer.sh Sage Weil
10:11 AM Bug #1530: osd crash during build_inc_scrub_map
This happened again in teuthology:~teuthworker/archive/nightly_coverage_2011-11-01/1254/remote/ubuntu@sepia68.ceph.dr... Josh Durgin
11:08 AM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
Someone needs to try to reproduce this with logs. fwiw metropolis:~sage/src/teuthology/hammer.sh is what i've been u... Sage Weil
10:22 AM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
This happened after the misc workunit today. Josh Durgin
08:49 AM Linux kernel client Bug #1667 (Resolved): BUG at fs/inode.c line 1375
... Sage Weil

10/31/2011

10:03 PM Revision 9ea02239 (ceph): osd: kill unused on_osd_failure() hook
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
10:00 PM Revision 1d9e8065 (ceph): RadosModel.h: use default conf location
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
09:54 PM Revision 810cae1a (ceph): testrados: specify CEPH_CONF directly
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
09:02 PM Revision b9a0b2b7 (ceph): Revert "PG: call set_last_peering_reset in Started contructor"
Unfortunately, the Started constructor doesn't occur until map
activation. We need to reset last_peering_reset exact...
Samuel Just
06:15 PM Revision f9b7ecdb (ceph): hadoop: Return NULL when the path does not exist.
Although unspecified in the declaration header, other file
systems return a single result when the path is a file.
T...
Noah Watkins
05:53 PM Bug #1633: osd crash in CryptoKey::decrypt
Another occurrence in teuthology:~teuthology/archive/nightly_coverage_2011-10-28/1170/remote/ubuntu@sepia50.ceph.drea... Josh Durgin
05:32 PM CephFS Bug #1666: hadoop: time-related meta-data problems
It looks like the check is equality of timestamps. So, I think Hadoop is setting an explicit timestamp, and sometime ... Noah Watkins
05:30 PM CephFS Bug #1666: hadoop: time-related meta-data problems
All of the local clocks on the nodes look good. The code is comparing timestamps (I assume since epoch), so maybe the... Noah Watkins
05:06 PM CephFS Bug #1666: hadoop: time-related meta-data problems
Neither of these errors is in code that's remotely familiar to me. So my first favorite question is:
Are your clock...
Greg Farnum
04:55 PM CephFS Bug #1666 (Resolved): hadoop: time-related meta-data problems
The following exceptions are being thrown. It looks like something related to lstat?
java.io.IOException: Th...
Noah Watkins
02:59 PM Bug #1657 (Resolved): teuthology: testrados failed to find conf
Should work now
ceph: 1d9e8065c835c343608930585c2853984cde2fa8
teuthology: 810cae1a1d03138abfa54cd31059723ec0c22ab1
Samuel Just
02:04 PM Bug #1665 (Resolved): osd: last_peering_reset incorrect on stray?
b9a0b2b7a4d3b5a7db1f942af0158712199377a8 reverted 6d123067ce1ba99522281d5c72623bd5ba3e0fc8 Samuel Just
12:09 PM Bug #1665: osd: last_peering_reset incorrect on stray?
this is why. the interval starts at 150, and that is when the query is sent. on the stray, we hit it in 151:... Sage Weil
11:46 AM Bug #1665 (Resolved): osd: last_peering_reset incorrect on stray?
on alexandria,... Sage Weil
01:55 PM Bug #1588 (Can't reproduce): blogbench on kclient possibly made machine die
I think this is fixed - the nightly tests haven't hit it in the past week, since 339573406737461cfb17bebabf7ba536a302... Josh Durgin
11:35 AM CephFS Bug #1661 (Resolved): Hadoop: expected system directories not present
Apparently this was actually the result of an API mismatch. Fixed by Noah's patch in commit:f9b7ecdb5bba1439dc4c13005... Greg Farnum
11:26 AM Feature #1618: libvirt: make sure migration works
Braindump of what I did for the earlier libvirt migration demo:
- on each vm host, install kvm 0.15 (0.14 is too o...
Anonymous
09:13 AM Bug #1415 (Duplicate): cosd assertion: existing->state == STATE_CONNECTING || existing->state ==...
Sage Weil
09:11 AM rgw Feature #1664 (Resolved): rgw: pass swift tests
Sage Weil
09:06 AM Messengers Feature #1648 (Duplicate): msgr: choose ip to bind to based on network
Sage Weil
09:02 AM Messengers Feature #1648: msgr: choose ip to bind to based on network
duplicates #1487 Sage Weil
07:58 AM Bug #1529: cosd: os/FileStore.cc: 2390: FAILED assert(0 == "ENOENT on clone suggests osd bug")
Sage Weil wrote:
> Do you have the odd log preferring the restart?
Er, osd log preceding ...
Sage Weil
07:54 AM Bug #1529: cosd: os/FileStore.cc: 2390: FAILED assert(0 == "ENOENT on clone suggests osd bug")
Do you have the odd log preferring the restart? Sage Weil
06:46 AM Bug #1529: cosd: os/FileStore.cc: 2390: FAILED assert(0 == "ENOENT on clone suggests osd bug")
I'm still seeing this one. All my 6 OSDs went down and after starting them most of them would crash:... Wido den Hollander
 
