Activity

From 12/29/2011 to 01/27/2012

01/27/2012

09:27 PM Revision 56d164c8 (ceph): mon: stale pgs -> HEALTH_WARN
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
09:21 PM Revision 61c54a79 (ceph): mon: mark pgs stale in pg_map if primary osd is down
This alerts the administrator when all OSDs for a PG have failed and the
monitor doesn't receive any further updates....
Sage Weil
09:09 PM Bug #1997 (Resolved): teuthology: wait for clean osd shutdown before umount
/a/master-2012-01-27_13:29:47/9361... Sage Weil
09:02 PM Revision 6e44af9f (ceph): osd: add STALE pg state bit
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
09:00 PM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
again,... Sage Weil
08:58 PM CephFS Bug #1996 (Duplicate): mds: scatter_nudge() bad pointer on shutdown?
... Sage Weil
08:35 PM Revision c1345f71 (ceph): v0.41
Sage Weil
08:23 PM Revision 374fec47 (ceph): objector: document Objecter::init_ops()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
08:23 PM Revision 6d37d5c9 (ceph): objecter: fix out_* initialization
This looks more like the real cause for #1986. Op ctor gets a vector of
ops but out_* aren't initialized to match.
...
Sage Weil
07:21 PM Revision 995ff222 (ceph): osd: remove unused PG::block_if_wrlocked declaration
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Greg Farnum
07:21 PM Revision 94729206 (ceph): Revert "common/Throttle: Remove unused return type on Throttle::get()"
This reverts commit 4549501c9b0968ce4243e06ff7e9ef03b19de667.
We're about to use it to avoid a time lookup if possibl...
Greg Farnum
06:45 PM Revision 946da5a3 (ceph): filestore: dump offending transaction on any error
Clean this code up to explicitly whitelist what is ok so that the flow is
less annoying to follow/maintain, and so th...
Sage Weil
06:40 PM Revision 6453123c (ceph): objecter: warn when OSD returns mismatched op vector
The osd shouldn't do this (even though we should tolerate it).
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed...
Sage Weil
06:39 PM Revision 0cc26a94 (ceph): objecter: fix bounds checking on op reply demuxing
We can't assume that the size of out_ops (from the reply) matches the
op->out_* vectors from our request state. In p...
Sage Weil
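The two objecter changes above (warn on a mismatched op vector, then bounds-check the demux) can be illustrated with a minimal Python sketch. This is not the Objecter code: `demux_reply`, `reply_ops`, and `out_handlers` are hypothetical names, and the sketch only shows the general idea of never indexing request-side state by a length taken from the reply.

```python
import logging

log = logging.getLogger("objecter_sketch")


def demux_reply(reply_ops, out_handlers):
    """Hand each reply payload to the matching request-side handler.

    reply_ops    -- list of payloads taken from the reply (hypothetical)
    out_handlers -- list of callables or None recorded when the request
                    was built (hypothetical stand-in for op->out_*)
    Returns the number of payloads actually delivered.
    """
    if len(reply_ops) != len(out_handlers):
        # Tolerate the mismatch, but make it visible.
        log.warning("reply has %d ops, request expected %d",
                    len(reply_ops), len(out_handlers))

    delivered = 0
    # Only walk the indices that exist on both sides.
    for i in range(min(len(reply_ops), len(out_handlers))):
        handler = out_handlers[i]
        if handler is not None:
            handler(reply_ops[i])
            delivered += 1
    return delivered
```

Clamping to the shorter of the two lengths means an oversized reply cannot index past the request state, while the warning keeps the mismatch from going unnoticed.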
06:28 PM Feature #1881 (Resolved): objecter: expose in-progress request state via admin socket
Implemented in commit:097bc5cb1dbc83d8b09d4cb95c3c5abd1874de77 and added to the qa suite. Josh Durgin
04:50 PM Feature #1881: objecter: expose in-progress request state via admin socket
This is implemented in the wip-track-objecter-reqs branch in ceph.git, and testing is enabled by wip-admin-socket in ... Josh Durgin
06:01 PM Revision 9b554d4c (ceph): mds: remove test assert
Grr!
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
02:32 PM Revision b8e6a6bd (ceph): assert: include timestamp
Also drop quotes around thread id.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
01:28 PM Feature #1993: mon: warn admin about down pgs
Sage Weil
11:18 AM Feature #1993 (Resolved): mon: warn admin about down pgs
Sage Weil
01:21 PM Bug #1995 (Resolved): Turn down non-btrfs warning in FileStore
... Greg Farnum
12:20 PM RADOS Feature #1994 (New): osd: expire objects using scrubbing
We can set an attribute on the object that would set its expiration, and check that attribute when doing the scrubbing. Yehuda Sadeh
11:46 AM Bug #1992: OSD::get_or_create_pg
Er, actually, the OSD is getting an MOSDPGLog with info DNE (presumably uninitialized). That appears to be non-kosher... Greg Farnum
11:00 AM Bug #1992: OSD::get_or_create_pg
The assert here is because the PG doesn't exist yet but the OSD is not the primary for that PG. It's getting into get... Greg Farnum
07:26 AM Bug #1992 (Can't reproduce): OSD::get_or_create_pg
I've just upgraded my 0.39 cluster to 0.40 and that didn't go that well.
The whole cluster started bouncing and cr...
Wido den Hollander
09:50 AM Bug #1943: osd: bad clone transaction on journal replay
hit this again on ubuntu@teuthology:/var/lib/teuthworker/archive/nightly_coverage_2012-01-27-a/9261. thrashing.
a...
Sage Weil
06:49 AM Bug #1986: objecter: segfault during osd op reply demux
wip-1986 Sage Weil

01/26/2012

09:36 PM Bug #1986: objecter: segfault during osd op reply demux
nevermind, wrong branch Sage Weil
09:34 PM Bug #1986: objecter: segfault during osd op reply demux
I can't find 'if (*p)' anywhere in osdc/Objecter.cc... what commit was this on? Sage Weil
10:41 AM Bug #1988 (Won't Fix): osd: scrub stat mismatch
removed num_kb code entirely. Sage Weil
10:12 AM Bug #1984: osd: failed assert, got into finish_recovery_ops without any recovery ops active?
http://85.214.49.87/ceph/20120124/osd.0.log.bz2 Sage Weil
08:17 AM Linux kernel client Bug #147: lockdep: possible irq lock inversion dependency w/ osdc->request_mutex and con->mutex
I'll take this. I may have reordered something in my recent commits
to cause this to surface.
Alex Elder
07:54 AM Revision b3c80bcb (ceph): rgw: acls cleanup wip
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Yehuda Sadeh
07:17 AM Bug #1974: osd: radosmodel crash on thrashing
... Sage Weil

01/25/2012

11:40 PM Revision 91b547b9 (ceph): osd: remove the unused require_current_map
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Greg Farnum
11:09 PM Revision b5371403 (ceph): Merge branch 'master' into wip-encoding
Conflicts:
src/osd/osd_types.h
Sage Weil
10:07 PM Revision 2bc71056 (ceph): filestore: fix typo
Grr
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
10:04 PM Revision fe2834f6 (ceph): remove snap thrashing from regression suite for time being
Sage Weil
10:03 PM Revision 0b088eb5 (ceph): Merge branch 'wip-kb'
Reviewed-by: Samuel Just <samuel.just@dreamhost.com> Sage Weil
09:58 PM Revision 4454d391 (ceph): Merge remote branch 'upstream/wip-osd-clone-obc'
Samuel Just
09:52 PM Revision ec7a1402 (ceph): filestore: zero btrfs vol_args prior to ioctl
Just to be paranoid. Nothing we haven't set *should* affect the ABI,
but...
Always do this immediately after declar...
Sage Weil
08:40 PM Revision 625b0b02 (ceph): osd: remove num_kb from object_stat_sum_t stats
This is redundant--we can just use num_bytes. If we're worried about the
per-object overhead or rounding, we can fac...
Sage Weil
08:40 PM Revision dedf5758 (ceph): mon: num_kb -> num_bytes in cluster perfcounters
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:15 PM CephFS Bug #1991 (Duplicate): mds: crash during clean shutdown
Teuthology config:... Josh Durgin
06:01 PM Linux kernel client Bug #147: lockdep: possible irq lock inversion dependency w/ osdc->request_mutex and con->mutex
This happened in a teuthology run of iozone on rbd. From teuthology:~teuthworker/archive/master-2012-01-25_14:30:58/9... Josh Durgin
05:56 PM Revision acb164c8 (ceph): osd: improve object context debug output
Include pointer. This may help with #1979.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
03:59 PM Bug #1980 (Closed): osd/PG.cc: 3562: FAILED assert(p->second.need <= v)
f16b38deaee3d0ed35229aecba4f8f12b8404f03 should take care of this. Samuel Just
03:13 PM Linux kernel client Bug #1990 (In Progress): rbd: null pointer dereference during map
I was already looking at another commit which I found after
review to be suspect:
rbd: adequately protect rbd cli...
Alex Elder
11:32 AM Linux kernel client Bug #1990 (Resolved): rbd: null pointer dereference during map
These commands from teuthology:... Josh Durgin
01:59 PM Bug #1978 (Resolved): osd: FAILED assert(!object_contexts.size())
Sage Weil
10:28 AM Bug #1978: osd: FAILED assert(!object_contexts.size())
this should be resolved by 44b11441ad3ef231ff207476bbb0d2e8ab130f26 once it's in master. Samuel Just
01:41 PM Feature #1885: identify top 10 expected failures and process to diagnose
Mark Kampe wrote:
> Additional issues from Carl's list:
> * RGW request timeouts
That's a symptom, not a cause...
...
Greg Farnum
01:25 PM Feature #1885: identify top 10 expected failures and process to diagnose
Additional issues from Carl's list:
* RGW request timeouts
* OSD file system timeouts
* OSD that is "down" but sti...
Anonymous
01:17 PM Bug #1949: osd: ENOTEMPTY on collection removal from snaptrimmer
i have a full log (osd 20 filestore 20) leading up to this at metropolis:/home/sage/osd.enotempty.log Sage Weil
10:45 AM Bug #1981 (Won't Fix): pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(...
Glad to hear it! Greg Farnum
10:30 AM Bug #1982 (Closed): osd: failed assert (obc->watchers.size())
should be fixed in b17736a12611a12461df26fb184acc5d85f82fea Samuel Just
10:29 AM Bug #1979 (Duplicate): osd: suicide timeout on recovery_tp... heap corruption?
same as 1978 Samuel Just
09:56 AM Bug #1979 (Need More Info): osd: suicide timeout on recovery_tp... heap corruption?
pushed a patch to include pointer in get/put object_context to help narrow this down. Sage Weil
10:05 AM Bug #1989: teuthology: error in ceph.log didn't make teuthology return error code
we whitelist log entries. it only prints that (and sets success=False) if it sees something unexpected... Sage Weil
10:02 AM Bug #1989: teuthology: error in ceph.log didn't make teuthology return error code
I thought we turned this off on purpose because thrashing always triggered it. Am I remembering incorrectly? Greg Farnum
09:50 AM Bug #1989 (Resolved): teuthology: error in ceph.log didn't make teuthology return error code
in a bash while loop, I saw... Sage Weil
10:01 AM Bug #1988: osd: scrub stat mismatch
I've never understood why we track them separately to begin with, myself. :) Greg Farnum
09:48 AM Bug #1988: osd: scrub stat mismatch
I suspect the problem is due to the rounding off when we are doing the complicated dance of keeping these stats corre... Sage Weil
09:35 AM Bug #1988 (Won't Fix): osd: scrub stat mismatch
on thrash + radosmodel workload. bytes match, but kb don't:... Sage Weil
09:15 AM Bug #1975 (Need More Info): btrfs: EINVAL on snap create
added btrfs printks to figure out where that EINVAL is coming from. also changed it to return EBADF if the fd is inva... Sage Weil
09:08 AM Bug #1975: btrfs: EINVAL on snap create
... Sage Weil
06:03 AM Revision f16b38de (ceph): osd: track obc for clone from log replay
We need to keep an in-memory obc to track the state of the in-flight io
to disk. This is analogous to when an object...
Sage Weil
05:34 AM Revision 44b11441 (ceph): osd: set object_info_t::oid properly when recovering clones
I saw a case (#1973) where the clone had the oid set to the head. That is
clearly wrong. Not sure what damage this ...
Sage Weil
05:19 AM Revision abc005a5 (ceph): Merge remote branch 'gh/wip-filestore-errors'
Sage Weil
05:18 AM Revision eec87bb8 (ceph): package *.py* files
Some post-install rpmbuild defaults byte-compile all packaged python
files, so don't bother removing the .pyc files, ...
Alexandre Oliva
01:08 AM Revision 2c2cc159 (ceph): librbd: don't infinite loop when header is too large
Since snapshots are currently stored at the end of the header, having
many snapshots made the header larger than the ...
Josh Durgin
12:50 AM Revision 746a2302 (ceph): ReplicatedPG: data_subset may be empty during sub_op_push
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
Samuel Just

01/24/2012

10:15 PM Revision 72cd1210 (ceph): rgw: acl changes compile
and link. Doesn't work though.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh
09:23 PM Revision f3d200c0 (ceph): filestore: fix non-::-prefixed close
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
09:22 PM Revision a49a53d7 (ceph): filestore: add debugging to each error case in lfn_open
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
09:16 PM Revision 0fd6ca9a (ceph): filestore: audit + clean up error checks
- use temp var for errno
- in general return -errno from helpers
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
09:16 PM Revision a43937f0 (ceph): filestore: return -errno from lfn_open
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
09:16 PM Revision ae36f599 (ceph): filestore: TEMP_FAILURE_RETRY on ::close(2)
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
08:57 PM Revision 2835d402 (ceph): rgw: rgw_acl_s3.* compiles
very much wip
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh
07:28 PM Revision 4aa9ca45 (ceph): CephManager: base timeout on time since last change in active+clean
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
06:56 PM Bug #1981: pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(ret == 0)
Greg Farnum wrote:
> Okay, that's a cross-version message incompatibility. You should be able to resolve your issue ...
James Oakley
06:18 PM Bug #1981: pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(ret == 0)
Okay, that's a cross-version message incompatibility. You should be able to resolve your issue by just upgrading the ... Greg Farnum
04:59 PM Bug #1981: pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(ret == 0)
Greg Farnum wrote:
> Okay, I see the bug which is the immediate cause of the OOM[1], but I haven't yet tracked down ...
James Oakley
04:15 PM Bug #1981: pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(ret == 0)
Okay, I see the bug which is the immediate cause of the OOM[1], but I haven't yet tracked down the actual triggering ... Greg Farnum
02:02 PM Bug #1981: pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(ret == 0)
Greg Farnum wrote:
> Actually James, could you tell me what actions you took that resulted in this log? It looks lik...
James Oakley
01:34 PM Bug #1981: pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(ret == 0)
Actually James, could you tell me what actions you took that resulted in this log? It looks like:
started OSD at 10:...
Greg Farnum
11:54 AM Bug #1981: pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(ret == 0)
That log contains enough to get started on; yesterday's won't matter. If you could make sure that your logs from the ... Greg Farnum
11:44 AM Bug #1981: pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(ret == 0)
I am attaching the logs for today. I did shut this box down yesterday to upgrade it, but due to build system delays, ... James Oakley
11:42 AM Bug #1981: pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(ret == 0)
Dirk Meister wrote:
> Greg Farnum wrote:
> > A 33GB core file is...very large! Did you have any logging enabled th...
Greg Farnum
11:37 AM Bug #1981: pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(ret == 0)
Greg Farnum wrote:
> A 33GB core file is...very large! Did you have any logging enabled that might let us see what h...
Dirk Meister
11:25 AM Bug #1981: pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(ret == 0)
A 33GB core file is...very large! Did you have any logging enabled that might let us see what happened?
When you s...
Greg Farnum
11:22 AM Bug #1981 (Won't Fix): pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(...
I just upgraded one of my boxes from 0.39 to 0.40, and I am getting this shortly after starting the osds:... James Oakley
05:28 PM Bug #1987 (Resolved): librbd: listing an image with more than ~200 snapshots infinite loops and c...
Fixed by commit:2c2cc1596cd63b6368d13a2665c6c85d3d8ed532. Josh Durgin
05:01 PM Bug #1987 (Resolved): librbd: listing an image with more than ~200 snapshots infinite loops and c...
As reported in http://article.gmane.org/gmane.comp.file-systems.ceph.devel/5038. Josh Durgin
04:58 PM Bug #1986 (Resolved): objecter: segfault during osd op reply demux
This happened on master + an rbd fix when running 'rbd snap purge blah', when the blah image had > 200 snapshots. Cor... Josh Durgin
04:25 PM Messengers Bug #1985 (Won't Fix): msgr: creating new Pipe for pre-existing connection leaks Pipe if they don...
See #1981. Turns out that if we have an existing Pipe but don't replace it, we never clean ourselves up and we need to. Greg Farnum
01:20 PM Bug #1979: osd: suicide timeout on recovery_tp... heap corruption?
Ooh, I saw exactly that yesterday. IIRC there was a get_object_context for some object (refcount now one), and then ... Sage Weil
11:21 AM Bug #1979: osd: suicide timeout on recovery_tp... heap corruption?
It looks like there was a use after free or heap corruption - the recovery thread was stuck waiting on an invalid obj... Josh Durgin
09:52 AM Bug #1979 (Duplicate): osd: suicide timeout on recovery_tp... heap corruption?
/var/lib/teuthworker/archive/nightly_coverage_2012-01-24-a/8882... Sage Weil
01:10 PM Bug #1984 (Can't reproduce): osd: failed assert, got into finish_recovery_ops without any recover...
osd/PG.cc: In function 'void PG::finish_recovery_op(const hobject_t&, bool)', in thread '7f1fdab26700'
osd/PG.cc: 15...
Greg Farnum
01:09 PM Bug #1983 (Resolved): osd: failed assert, info does not match peer info
... Greg Farnum
01:07 PM Bug #1982 (Closed): osd: failed assert (obc->watchers.size())
... Greg Farnum
11:50 AM Feature #1885: identify top 10 expected failures and process to diagnose
OSD:
* cascading failures
* single OSD failure
* failure to complete peering/recovery
* unfound objects after rec...
Anonymous
11:28 AM Bug #1976 (Closed): osd: timeout getting clean
In the case of 8880 at least, it looked like the osds were still making progress. I've changed wait_till_clean to ti... Samuel Just
09:38 AM Bug #1976 (Closed): osd: timeout getting clean
actually, this may be a monitor thing. seen it twice now:
/var/lib/teuthworker/archive/nightly_coverage_2012-01-2...
Sage Weil
10:46 AM Bug #1975: btrfs: EINVAL on snap create
wip-filestore-errors looks good to me except for one comment on github. Samuel Just
10:03 AM Bug #1975: btrfs: EINVAL on snap create
wip-filestore-errors check should be reviewed+merged so we can see the actual error code. sadly,... Sage Weil
09:35 AM Bug #1975 (Won't Fix): btrfs: EINVAL on snap create
/var/lib/teuthworker/archive/nightly_coverage_2012-01-24-a/8879... Sage Weil
09:52 AM Bug #1980 (Closed): osd/PG.cc: 3562: FAILED assert(p->second.need <= v)
/var/lib/teuthworker/archive/nightly_coverage_2012-01-24-a/8882... Sage Weil
09:50 AM Bug #1978 (Resolved): osd: FAILED assert(!object_contexts.size())
/var/lib/teuthworker/archive/nightly_coverage_2012-01-24-a/8882... Sage Weil
09:45 AM Bug #1977 (Can't reproduce): mon: ceph command hang
/var/lib/teuthworker/archive/nightly_coverage_2012-01-24-a/8881... Sage Weil
09:31 AM Feature #1655: gitbuilder aggregator page
It's just a quick perl hack. What it really should do is make javascript and <div>s to fetch the results for each bu... Sage Weil
04:47 AM Feature #1655: gitbuilder aggregator page
Is the source available for the gitbuilders.cgi aggregator? It looks like a pretty useful script for other projects w... Jimmy Tang

01/23/2012

09:50 PM Revision 1e421093 (ceph): Merge commit '9dc7b9233b985bf859751fc89a5b02253e829836'
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com> Sage Weil
09:16 PM Bug #1974 (Resolved): osd: radosmodel crash on thrashing
... Sage Weil
09:11 PM Bug #1973: osd: segfault in ReplicatedPG::remove_object_with_snap_hardlinks
this is the _9 snap object_info_t:... Sage Weil
09:02 PM Bug #1973 (Can't reproduce): osd: segfault in ReplicatedPG::remove_object_with_snap_hardlinks
/var/lib/teuthworker/archive/nightly_coverage_2012-01-23-b/8775... Sage Weil
08:50 PM Revision 54a76734 (ceph): ceph: don't write output on error
Accumulate all output, and write it at the end. This way we can avoid
writing it if any of the commands fail.
Fixes...
Sage Weil
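The behavior this commit describes (buffer all output, write it only if every command succeeds) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ceph tool's implementation: `run_commands` is a hypothetical helper, and each command is modeled as a callable returning a return code and its output bytes.

```python
def run_commands(commands, out_path):
    """Run commands in order; create out_path only if all of them succeed.

    commands -- list of zero-argument callables (hypothetical stand-ins
                for tool commands), each returning (return_code, bytes)
    out_path -- file to receive the concatenated output
    Returns 0 on success, or the first non-zero return code.
    """
    buffered = []
    for cmd in commands:
        rc, output = cmd()
        if rc != 0:
            # Bail out on the first failure; out_path is never opened,
            # so no empty or partial file is left behind.
            return rc
        buffered.append(output)

    # Every command succeeded: write the accumulated output in one go.
    with open(out_path, "wb") as f:
        f.write(b"".join(buffered))
    return 0
```

Because the file is opened only after the loop finishes, a failing command cannot leave a zero-length output file behind, which is the symptom reported in Bug #1954.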
08:50 PM Revision cfe1d011 (ceph): ceph: bail out on first failing command
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
08:50 PM Revision 9dc7b923 (ceph): rgw: fix warning
rgw/rgw_rest.cc:258: warning: comparison between signed and unsigned integer expressions
Signed-off-by: Sage Weil <s...
Sage Weil
06:24 PM Revision c5e7a74c (ceph): .gitignore: ceph-dencoder
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:21 PM Revision 7ce544e6 (ceph): osd: ignore MInfoRec, MNotifyRec in WaitActingChange
We should ignore logs, infos, and notifies while we are waiting for the
map to change. Peering has reached a dead-en...
Sage Weil
05:53 PM Revision d9eedf53 (ceph): rgw: fix warning in 32bit arch
Yehuda Sadeh
05:19 PM Revision 1a10b517 (ceph): ceph-dencoder: needs ceph_ver.h dependency
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
05:02 PM Revision 92a8f5e7 (ceph): pg: unindex entries when clearing or removing from the log
Leaving the index around could cause use of the indexes to access
freed memory.
Signed-off-by: Josh Durgin <josh.dur...
Josh Durgin
05:02 PM Revision 5451d871 (ceph): osd: do not clobber log on backfill progress update
This is unnecessary and counterproductive, since the log is used to detect
dup ops. It's an artifact of an earlier b...
Sage Weil
03:58 PM Bug #1936: teuthology: github downtime -> failed runs
Greg Farnum wrote:
> Is this going to be okay if the update hangs in the middle and then qa clones the repository?
...
Anonymous
02:18 PM Revision f002ed4c (ceph): features: #include ceph_features directly where needed
Less rebuild time when touched.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
01:58 PM Bug #1954 (Resolved): ceph tool: don't create output files when an error occurs
commit:54a76734b11c87f1edab993f15dc0a754b843019 Sage Weil
12:44 PM Tasks #1923 (Resolved): document required properties and features for alternative backend file sy...
Updating wiki also.
Ceph requirements for running over alternative backend filesystems (non btrfs):
1. support ...
Yehuda Sadeh
11:20 AM Feature #1972 (Resolved): encoding: cross-version test repo, scripts
Sage Weil
11:16 AM Feature #1971 (Resolved): encoding: adapt to messages
Sage Weil
11:13 AM Feature #1970 (Resolved): osd: migrate to new encoding schemes
Sage Weil
11:11 AM Feature #1934 (Closed): Get new Sepia machines into service
Sage Weil
11:10 AM Feature #1969 (Resolved): gitbuilder for 11.10, 12.04
Sage Weil
11:06 AM Feature #1968 (Rejected): ferro: Batch resource allocation (not fair, no quotas yet)
Anonymous
11:06 AM Feature #1967 (Rejected): ferro: Single API endpoint that delegates to machine managers
Anonymous
11:05 AM Feature #1966 (Rejected): ferro: Connect actions to state machine
Anonymous
11:05 AM Feature #1965 (Rejected): ferro: Machine management state machine (fake actions)
Anonymous
11:04 AM Feature #1964 (Rejected): ferro: Create a cloud-init OVF config that reimages a machine
Anonymous
11:04 AM Feature #1963 (Closed): ferro: OVF Environment creation as a library
Anonymous
11:03 AM Feature #1962 (Rejected): ferro: Trigger vMedia boot via IPMI/DRAC
Anonymous
11:03 AM Feature #1881 (In Progress): objecter: expose in-progress request state via admin socket
Sage Weil
11:03 AM Feature #1961 (Rejected): ferro: Python wrapper for vmcli (using gevent)
Anonymous
10:29 AM Bug #1958 (Resolved): osd: crash during peering due to receiving an info msg in WaitActingChange
commit:7ce544e640d45e901ef67e8268c963c958a66eff Sage Weil
06:59 AM Bug #1958: osd: crash during peering due to receiving an info msg in WaitActingChange
fix pushed to commit:2f6205e57c7b8a21da72f0af8f1edd38a5989149 Sage Weil
10:00 AM Cleanup #1960: You should be able to print daemon options without specifying a config file
Greg Farnum wrote:
> Ew! This problem probably occurs with the mon and mds, but maybe not.
I can confirm this occ...
Adam Jacob Muller
09:43 AM Cleanup #1960 (Resolved): You should be able to print daemon options without specifying a config ...
... Greg Farnum
09:46 AM Bug #1959 (Resolved): qa: half of nightlies failing with chef+ruby error
sepia29 has bad disk, marked out. Sage Weil
06:16 AM Revision e1044712 (ceph): osd: pg_stat_t generator
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:16 AM Revision c095c354 (ceph): osd: uninline pool_stat_t methods
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:16 AM Revision 37d38f77 (ceph): osd: pool_stat_t generator
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:16 AM Revision f09c01f7 (ceph): osd: watch_info_t generator
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:16 AM Revision abb0510b (ceph): objectstore: implement generator
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:16 AM Revision 58988920 (ceph): ceph-dencoder: reenable generated types
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:16 AM Revision 10b87ed0 (ceph): objectstore: drop unused Transaction::p
This should have gone away when we added the iterator.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
06:16 AM Revision 99851450 (ceph): ceph-dencoder: fix up usage a bit
- verb_noun
- "in-memory"
- 1-based test index
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
06:16 AM Revision 56c5e851 (ceph): osd: initialize fields in watch_info_t constructor
Go-go unit tests!
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
06:16 AM Revision 0ced79db (ceph): objectstore: remove unused setattr variants
No callers. Fugly.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
06:16 AM Revision 0b2cf7de (ceph): osd: pool_snap_info_t generator
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:16 AM Revision e07178fb (ceph): osd: pg_pool_t generator
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:16 AM Revision adcc9bf7 (ceph): osd: object_sum_stat_t generator
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:16 AM Revision ee823972 (ceph): osd: uninline object_stat_sum_t methods
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:16 AM Revision 8e9c229b (ceph): osd: generator for object_stat_collection_t; uninline too
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:16 AM Revision 89b189ab (ceph): osd: uninline pg_stat_t methods
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:16 AM Revision 40f59b80 (ceph): msg: entity_{name,addr}_t generators
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:43 AM Revision 7ece2787 (ceph): osd: dump and generators for OSDSuperblock
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:43 AM Revision f1af44f6 (ceph): test-generated.sh
Test built-in test instances
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
04:43 AM Revision 47e20068 (ceph): ceph-dencoder: implement 'version' command
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
04:43 AM Revision fa20be37 (ceph): qa: misc encoding scripts, in various states of usefulness.
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
04:43 AM Revision 2baa6c0d (ceph): encoding: adjust ENCODE/DECODE macros
- make argument order consistent
- 'v' for code version
- 'compat' for compat version (lower bound)
- *_FINISH needs ...
Sage Weil
04:41 AM Revision 354f2cbe (ceph): ceph-dencoder: encode/decode/dump test tool
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:41 AM Revision 942f302b (ceph): move feature bit definition to separate header file
It was sloppy to have this in SimpleMessenger.h
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
04:41 AM Revision e674f2b4 (ceph): features: add missing features to default set
UID, MONCLOCKCHECK. No practical impact here, since they only mattered to
the mon and that specified them explicitly...
Sage Weil
04:41 AM Revision c0bdb071 (ceph): ceph-dencoder: support feature bits
Print our version's feature bits with --get-features.
Specify bits to encode with via -f <val>.
Signed-off-by: Sage...
Sage Weil
04:41 AM Revision 3f30a6f2 (ceph): objectstore: implement Transaction::dump(Formatter*)
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:41 AM Revision 7c0a4da3 (ceph): ceph-dencoder: ObjectStore::Transaction
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:41 AM Revision 7393ad19 (ceph): ceph-dencoder: fix build
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:41 AM Revision 7d4c3dbc (ceph): osd: osd_stat_t generator
Generate some test object instances.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
04:41 AM Revision bbf18268 (ceph): ceph-dencoder: generate object instances from static method
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:41 AM Revision 03b6113a (ceph): ceph-dencoder: clean up usage
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:41 AM Revision 45649214 (ceph): osd: implement watch_info_t, object_info_t::dump()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:41 AM Revision 80b80b05 (ceph): filejournal: use ::encode() wrapper func for Transaction
So we can capture the output.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
04:41 AM Revision cdeeed6c (ceph): msgr: dump() entity_name_t and entity_addr_t
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:36 AM Revision a7106421 (ceph): encoding: instrument to dump encoded objects
If built with -DENCODE_DUMP=path, dump encoded copies of objects to that
directory. Limit the copies of each class w...
Sage Weil
04:36 AM Revision d3213935 (ceph): encoding: new {ENCODE,DECODE}_{START,FINISH} macros
New macros to bracket encode/decode methods. The encoding scheme:
1 byte - version
1 byte - incompat version
4 b...
Sage Weil
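A versioned envelope of the kind this commit message outlines can be sketched in Python. The first two fields (a 1-byte version and a 1-byte incompat version) come from the message above; the third field is truncated in this log, so the 4-byte little-endian payload length used here is an assumption, as are the names `encode` and `decode`. This is not the Ceph macro implementation.

```python
import struct

# version (1 byte), compat version (1 byte), payload length (4 bytes).
# The length field and its little-endian layout are assumptions.
HEADER = struct.Struct("<BBI")


def encode(version, compat, payload):
    """Prefix payload with a version / compat / length header."""
    return HEADER.pack(version, compat, len(payload)) + payload


def decode(buf, supported_version):
    """Return (version, payload, end_offset) or raise ValueError.

    supported_version is the newest encoding this decoder understands.
    """
    version, compat, length = HEADER.unpack_from(buf, 0)
    if compat > supported_version:
        # The encoder declared that decoders older than `compat`
        # cannot interpret this data.
        raise ValueError("encoding requires version >= %d" % compat)
    start = HEADER.size
    end = start + length
    if end > len(buf):
        raise ValueError("buffer shorter than declared payload length")
    # end_offset lets the caller skip the whole payload, including any
    # trailing fields added by a newer encoder.
    return version, buf[start:end], end
```

With a length in the header, an older decoder can step over fields appended by a newer encoder instead of failing, and the compat field gives the encoder a way to refuse decoders that are too old.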

01/22/2012

02:55 PM Bug #1959 (Resolved): qa: half of nightlies failing with chef+ruby error
Sage Weil
02:55 PM Bug #1936 (Resolved): teuthology: github downtime -> failed runs
Sage Weil
02:54 PM Messengers Bug #1942 (Won't Fix): msgr: Address family not supported by protocol
Sage Weil
02:44 PM Feature #1884 (Resolved): plan encoding strategy to test+facilitate non-disruptive upgrades
Sage Weil
06:06 AM Bug #1849: directories' timestamps in snapshots sometimes change when directory is modified
I'm not sure this was a dupe, but I can see why it would seem like it.
When I reported this, plenty of directories...
Alexandre Oliva
06:00 AM CephFS Bug #1435: mds: loss of layout policies upon mds restart
I've been looking at the MDS implementation, and I have a theory now.
It was probably not the MDS restarts that we...
Alexandre Oliva

01/20/2012

08:56 PM Revision aea6a305 (ceph): rgw: read_user_buckets() fix redone
The problem with the original fix is that it wasn't atomic. Going back
to the original inefficient (though atomic) me...
Yehuda Sadeh
06:55 PM Revision fdaf91e2 (ceph): osd: implement --dump-journal
Dump the contents of the journal to stdout in text form. Useful for
debugging.
Signed-off-by: Sage Weil <sage.weil@...
Sage Weil
06:50 PM Revision a52762ac (ceph): rgw: read large bucket directory correctly
Issue #1955. When there were too many buckets, we failed reading
the bucket directory.
Signed-off-by: Yehuda Sadeh <y...
Yehuda Sadeh
03:48 PM Bug #1958 (Resolved): osd: crash during peering due to receiving an info msg in WaitActingChange
This happened during a teuthology run with thrashing and reads/writes/deletes.
Logs are in vit:~joshd/bug_1958
...
Josh Durgin
03:30 PM CephFS Bug #1957: ceph-fuse: have "." and ".." entries consistently
this is specifically me not knowing how to handle .. on the root directory with fuse. sshfs does it, though, so it's po... Sage Weil
02:56 PM CephFS Bug #1957 (Resolved): ceph-fuse: have "." and ".." entries consistently
I was cleaning old emails and found this: http://marc.info/?l=ceph-devel&m=130688351921306&w=2
Quick experiment sa...
Anonymous
12:58 PM rgw Bug #1955 (Resolved): rgw: cannot list user buckets when number of buckets is large
The issue is in the rgw code, not librados. read_user_buckets() was broken. Fixed with commit:aea6a305e61c1fa54828b71eff29070c3f... Yehuda Sadeh
09:29 AM rgw Bug #1955 (Resolved): rgw: cannot list user buckets when number of buckets is large
could be an issue with librados::tmap_get(). Yehuda Sadeh
11:10 AM Feature #1956 (Resolved): rgw: revisit atomic GET/PUT
Discuss the following different options to simplify the process:
1. Instead of writing tmp object and then clone i...
Yehuda Sadeh
11:02 AM Feature #1944 (Resolved): osd: dump journal
Sage Weil
12:54 AM Revision 802acb11 (ceph): rgw: refactor acls, separate protocol dependent code
Does not compile yet, part of swift acls work.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh

01/19/2012

05:11 PM Revision 6c275c81 (ceph): rgw: fix warning
Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com> Yehuda Sadeh
01:36 PM Bug #1954 (Resolved): ceph tool: don't create output files when an error occurs
Doing something like 'ceph osd getmap 1000000 -o osdmap' results in a 0 length file if epoch 1000000 doesn't exist. Josh Durgin
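One possible way to avoid leaving a 0-length file behind is to write to a temporary file and rename it into place only once the data has been fetched. The following is a minimal Python sketch of that idea, not the ceph tool's actual code; `write_output_atomically` and `fetch` are hypothetical names used only for illustration.

```python
import os
import tempfile


def write_output_atomically(path, fetch):
    """Write fetch()'s bytes to path, creating path only on success.

    `fetch` is a hypothetical callable standing in for the map lookup;
    it returns bytes or raises on error.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(fetch())  # may raise before anything reaches `path`
        os.replace(tmp_path, path)  # rename within the same directory
    except BaseException:
        # Clean up the temporary file and leave `path` untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this shape, a failed lookup (for example a nonexistent epoch) propagates the error and no output file appears at `path`.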
11:15 AM Bug #1953 (Resolved): teuthology: core files aren't archived when using valgrind
When daemons crash while running under valgrind, the core file is being saved in the home dir instead of /tmp/cephtes... Josh Durgin
10:50 AM Bug #1952 (Resolved): rgw: test suite times out
We have seen a number of instances where the s3-tests test suite (running via jenkins) does not complete after 20 min... Yehuda Sadeh
10:01 AM rgw Feature #830: rgw: swift per-object ACLs
Decisions:
(a) buckets used only through S3 see strict S3 behavior
(b) buckets used only through Swift see strict...
Yehuda Sadeh
07:51 AM Linux kernel client Bug #1907: rbd: don't reuse device ids while they're still in use elsewhere
I can ask on fsdevel, but right now I feel the need to understand a
little better what's going on inside rbd in orde...
Alex Elder
04:41 AM Revision 3650fd61 (ceph): Merge remote branch 'gh/wip-op-data-mux'
Reviewed-by: Greg Farnum <greg.farnum@dreamhost.com>
Reviewed-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
Sage Weil
01:49 AM Revision 29885f3e (ceph): kernel: ignore connection problems while waiting for reboot
Josh Durgin

01/18/2012

11:52 PM Revision e016cca9 (ceph): Convert mount.ceph to use KEY_SPEC_PROCESS_KEYRING
having mount.ceph use KEY_SPEC_USER_KEYRING to pass keys to the kernel has
several disadvantages:
1) It leaves the k...
Neil Horman
09:16 PM Cleanup #1886 (Resolved): objecter/osd: mux/demux in MOSDOpReply encoding
Sage Weil
07:46 PM Revision f1f75dd4 (ceph): Merge branch 'wip-rgw-simplelog'
Yehuda Sadeh
07:37 PM Revision 8a9252f9 (ceph): rgw: adjust high level debug level
setting it to 2 instead of 1
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh
07:25 PM Revision b890838c (ceph): Merge remote branch 'gh/wip-rgw-simplelog'
* gh/wip-rgw-simplelog:
rgw: add timestamp to high level log
rgw: log host_bucket, http status
rgw: simple requ...
Sage Weil
04:05 PM Bug #1849 (Duplicate): directories' timestamps in snapshots sometimes change when directory is mo...
I believe this is now a duplicate of #1946? Greg Farnum
03:41 PM Feature #1951 (Resolved): store history of configs and changes
During a discussion today it occurred to me that there will be situations under which we'll be a lot happier if we ca... Greg Farnum
02:57 PM rgw Feature #1950 (New): rgw: create S3/Swift ACL interoperability suite
We need that in order to test swift and s3 ACLs interactions. Yehuda Sadeh
11:49 AM rgw Feature #1882 (Resolved): rgw: high-level log entries for request state transitions
Done, merged at commit:f1f75dd4768e017c61088b44e7457bf96916d1a9. Log level 2 now dumps a plain request state with tim... Yehuda Sadeh
11:45 AM Bug #1949 (Resolved): osd: ENOTEMPTY on collection removal from snaptrimmer
... Sage Weil
11:27 AM Linux kernel client Bug #1907: rbd: don't reuse device ids while they're still in use elsewhere
my gut feeling is also that 'echo > /sys/bus/rbd/remove' should return EBUSY (along with rbd unmap). if you can't te... Sage Weil
10:40 AM Linux kernel client Bug #1907: rbd: don't reuse device ids while they're still in use elsewhere
Alex Elder wrote:
> From the linked message:
> > root <at> cephnode3:/# rbd unmap /dev/rbd0
> >
> < -> works wit...
Josh Durgin
09:51 AM Linux kernel client Bug #1907: rbd: don't reuse device ids while they're still in use elsewhere
From the linked message:
> root <at> cephnode3:/# rbd unmap /dev/rbd0
>
< -> works without any error message (sho...
Alex Elder
07:42 AM Revision 148031b7 (ceph): rgw: fix intent log processing
Intent log processing was completely broken. First, it wasn't
parsing the date correctly (due to failure to initialize...
Yehuda Sadeh
07:40 AM Revision 731c8832 (ceph): rgw: initialize tm before calling strptime
strptime assumes tm is already initialized.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh
05:59 AM Revision 0aab0890 (ceph): objecter: some helpful multiop result debug output
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:57 AM Linux kernel client Bug #1795 (Resolved): break d_lock > s_cap_lock ordering
I have verified under UML that no new problems arise with the
fix in place. I have not verified the lockdep warning...
Alex Elder
05:32 AM Revision 6f35e322 (ceph): objecter: make getxattrs set rval on decode error
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:31 AM Revision f441adfd (ceph): objecter: add stat ops to op vector!
They work better that way.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:10 AM Revision 1d5c8fd3 (ceph): objecter: gift reply data to outbl _after_ demuxing
Divvy up the result bl first, then gift the whole shebang to outbl. If
we gift it first, there's nothing to demux (s...
Sage Weil
01:33 AM Revision 2bffed3b (ceph): Merge remote branch 'gh/master' into wip-op-data-mux
Sage Weil
01:33 AM Revision 905e8d80 (ceph): osd: make in/outdata split/merge helpers static OSDOp methods
Avoid defining new global functions.
Also add basic doxygen descriptions.
Signed-off-by: Sage Weil <sage.weil@dream...
Sage Weil

01/17/2012

11:10 PM Revision 1a7c8b49 (ceph): rgw: log_show_next() fix reading of the next buffer
Bug #1939. Failed reading large logs.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh
11:08 PM Revision 5bb9a9d6 (ceph): Add small cluster thrashing tasks
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
11:05 PM Revision 019a0d4c (ceph): Merge branch 'master' of ssh://github.com/NewDreamNetwork/ceph
Yehuda Sadeh
10:23 PM Revision 956a4b43 (ceph): Merge remote branch 'gh/wip-backfill'
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
Conflicts:
src/ceph_mds.cc
src/ceph_osd.cc
Sage Weil
10:21 PM Revision 06e7562f (ceph): filestore: overwrite fsid during --mkfs
This mainly matters because read_fsid() now looks at the file size to
determine if it's an old- or new-style fsid, an...
Sage Weil
09:42 PM Revision 4c6c4430 (ceph): rgw: reset timestamp when processing starts
otherwise we'd count also the time waiting for the request.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh
09:00 PM Revision bd8e32d9 (ceph): doc: update control file for setting pg num on pool create
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Greg Farnum
09:00 PM Revision a85ea475 (ceph): hadoop: check for valid filehandler, before using in next calls
In case of nonexistent file, calling Client::replication()
triggers assert.
Signed-off-by: Andrey Stepachev <octo@ya...
Andrey Stepachev
09:00 PM Revision 127bbd17 (ceph): hadoop: fix unix timestamp calculation in hadoop lib
Hadoop always sees wrong dates because of a wrong timestamp calculation. Properly
convert nanoseconds to millis when adding....
Andrey Stepachev
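The unit conversion this commit message refers to can be illustrated with a short Python sketch. It assumes the timestamp arrives as a (seconds, nanoseconds) pair; the function name `to_unix_millis` is hypothetical, and this is not the code from the commit itself.

```python
def to_unix_millis(seconds, nanoseconds):
    """Combine whole seconds and a nanosecond remainder into milliseconds."""
    # 1 s = 1000 ms, and 1 ms = 1,000,000 ns, so each part is scaled
    # to milliseconds before the two are added.
    return seconds * 1000 + nanoseconds // 1_000_000
```

Adding the nanosecond field without scaling it first would inflate the result by up to roughly a billion milliseconds, which would match the symptom of wrong dates.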
07:43 PM Revision 94f55f48 (ceph): TestRados: fix {min,max}_stride_size initialization
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:54 PM Revision 0c2f2b76 (ceph): Merge branch 'master' of ssh://ceph.newdream.net/git/ceph
Yehuda Sadeh
06:51 PM Revision 79d19320 (ceph): osd: fix bind error checks
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:44 PM Revision 7804046d (ceph): Makefile: fix testkeys non-tcmalloc linkage
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:56 PM Revision 241cbebe (ceph): rgw: add timestamp to high level log
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Yehuda Sadeh
05:56 PM Revision db65295b (ceph): rgw: log host_bucket, http status
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Yehuda Sadeh
05:56 PM Revision 0e8b12cd (ceph): rgw: simple request logging
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Yehuda Sadeh
05:36 PM Revision 63b94b6f (ceph): mds: abort startup if we fail to bind
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:36 PM Revision 4f70acfa (ceph): osd: abort on startup if we fail to bind to a port
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:24 PM Revision 45e4c924 (ceph): thrashosds: maxdead default to 0
This avoids any possibility of blocking peering. Sage Weil
04:21 PM Revision 47db4d04 (ceph): ceph: fix "run_uml.sh" script
Last-minute cleverness prior to checkin broke the "run-uml.sh" script.
Rearrange where a few definitions are done to m...
Alex Elder
03:54 PM rgw Bug #1948 (Resolved): rgw: need to read intent log in chunks
Yehuda Sadeh
03:14 PM rgw Bug #1939 (Resolved): rgw: error processing large logs
Fixed, commit:1a7c8b49f099268ee468877f7f1f7ad747995547. Yehuda Sadeh
02:58 PM Feature #1658 (Resolved): osd: backfill instead of backlog
merged in commit:956a4b439759e46424fde3551971cd66b6d682e6 Sage Weil
02:18 PM Messengers Bug #1942: msgr: Address family not supported by protocol
commit:dcceb8e835cbf40173c334de18bd68c2cf7f3716 add the osd_fsid to the OSDSuperblock message and reved the version. ... Sage Weil
11:27 AM Messengers Bug #1942: msgr: Address family not supported by protocol
we now report a connection fault instead of asserting. and during initialization we check for bind() errors. afaics... Sage Weil
11:47 AM CephFS Bug #1947 (Resolved): mds: SIGBUS during _mark_dirty
This happened on umount after ffsb with the kernel client.
From teuthology:~teuthworker/archive/nightly_coverage_201...
Josh Durgin
11:40 AM Bug #1936: teuthology: github downtime -> failed runs
Is this going to be okay if the update hangs in the middle and then qa clones the repository? Greg Farnum
11:25 AM Bug #1936: teuthology: github downtime -> failed runs
I set up github mirrors for ceph.git, ceph-qa-chef.git, and s3-tests.git. They are at ceph.newdream.net/git, and upd... Sage Weil
11:34 AM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
Happened again in teuthology:~teuthworker/archive/nightly_coverage_2012-01-15-b/7721/remote/ubuntu@sepia6.ceph.dreamh... Josh Durgin
11:28 AM Bug #1943 (Need More Info): osd: bad clone transaction on journal replay
Sage Weil
11:23 AM CephFS Bug #1946: snapshot inherits timestamp/size/etc from modified trunk dir upon mds restart
It looks to me like the problem is that CInode::old_inodes isn't included in EMetaBlob::fullbit. CInode::pick_old_in... Sage Weil
11:18 AM CephFS Bug #1946 (Resolved): snapshot inherits timestamp/size/etc from modified trunk dir upon mds restart
mkdir .snap/name
ls -ld . .snap/name
# both have the same timestamp
touch .
ls -ld . .snap/name
# now . has a di...
Alexandre Oliva
09:15 AM Bug #1638: Can't create object with large xattrs in a single operation (on extN)
So now there's an assert on the ENOSPC. I triggered it by running s3tests under teuthology. Adding this here so we kno... Anonymous
09:11 AM CephFS Bug #1945 (Can't reproduce): blogbench hang on caps
... Sage Weil
12:54 AM Revision 549b7806 (ceph): TestRados: implement max_seconds, reimplement argument parsing
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
12:53 AM Revision bf22a4fb (ceph): task/rados: use new usage for radosmodel tool
Sage Weil
12:22 AM Revision 20f3f686 (ceph): RadosModel: prefix line with m_op
So we can gauge progress...
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:16 AM Revision 7b2fd45b (ceph): mds: fix uninitialized value in MClientLease::h
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil

01/16/2012

11:09 PM Revision b2c07d8a (ceph): add simple thrash workload to regression suite
Sage Weil
11:05 PM Revision 8fc60869 (ceph): thrashosds: make actions less nonsensical
Make marking OSD up/down and in/out totally orthogonal.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
11:05 PM Revision 71390f97 (ceph): thrashosds: fix action selection
I'm not sure what the old code was trying to do, but I'm pretty sure it
wasn't doing it correctly.. a .1 chance_down ...
Sage Weil
10:38 PM Revision ec6e57c9 (ceph): Merge remote branch 'gh/master' into wip-op-data-mux
Sage Weil
10:06 PM Revision b5f8de7b (ceph): msgr: move operator<< for sockaddr_storage to msg_types.cc
tcp.{cc,h} aren't built/linked cleanly.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
09:26 PM Revision e93999f9 (ceph): qa/workunits/rados/load-gen-mix.sh
10k objects, not 100k!
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
09:25 PM Revision ba83e8c6 (ceph): qa: rados load-gen: use rbd pool
No replay interval. Sage Weil
09:18 PM Revision 9419f583 (ceph): ls: include duration, less noise
Sage Weil
09:18 PM Revision c5bbfffa (ceph): hammer.sh: new -nuke syntax
Sage Weil
08:39 PM Revision 8fb115fe (ceph): include run duration in summary.yaml
Sage Weil
07:08 PM Revision 8e126db1 (ceph): mon.0 -> mon.a
Sage Weil
07:08 PM Revision 43da161d (ceph): mds.0 -> mds.a
Sage Weil
06:47 PM Revision 7b47e49f (ceph): ls: fix extraneous newline
Sage Weil
06:40 PM Revision b7a11026 (ceph): rados: load-gen: wake up on reply
So we can send requests more than once per second.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:40 PM Revision 51e402e3 (ceph): rados: fix load-gen 'max-ops'
This was mixed up with min/max_op_len. And max_ops wasn't being used in
the initial object creation stage, flooding the...
Sage Weil
06:19 PM Revision 7d3b2c41 (ceph): librados: allow ObjectReadOperation::stat() to get time_t mtime
We can't use the internal utime_t type here.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
06:16 PM Revision ecd6dec6 (ceph): Merge remote branch 'gh/master' into wip-op-data-mux
Conflicts:
src/librados.cc
src/objclass/class_api.cc
src/rgw/rgw_rados.cc
Sage Weil
05:55 PM Revision b58f9560 (ceph): ceph: ignore all leaks
unless/until we figure out where the DefinitelyLost records are coming
from.. at first glance they look bogus.
Sage Weil
05:46 PM Revision 706b6910 (ceph): osd: recover_primary_got() -> recover_got()
This is called on primary and replicas alike.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:34 PM Revision a4e2395f (ceph): osd: clear missing set on replica when restarting backfill
The primary does the same in PG::activate().
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:22 PM Revision 40fb86ff (ceph): ceph: take single arg or list for valgrind args
Sage Weil
06:54 AM Revision c88ec571 (ceph): combined mon, osd, mds starter functions
Sage Weil
06:53 AM Revision f8ec23e7 (ceph): rbd: default to all:
Sage Weil
06:52 AM Revision f7952614 (ceph): lost_unfound: make test work with backfill
If we backfill, we fail to peer instead of having every object show up as
'unfound'. Avoid that by preventing log tr...
Sage Weil
06:52 AM Revision f70b158c (ceph): show host -> roles mapping on startup
Less guessing when manually inspecting an in-progress or hung run. Sage Weil
06:52 AM Revision fbfa94bb (ceph): teuthology-ls: show pid, last line of output for running jobs
Sage Weil
06:52 AM Revision 72057a9c (ceph): use local mirrors for (most) github urls
A cronjob on ceph.newdream.net updates these every 15 minutes. Sigh. Sage Weil
06:52 AM Revision 709d9441 (ceph): use local mirrors for (most) github urls
A cronjob on ceph.newdream.net updates these every 15 minutes. Sigh. Sage Weil
05:56 AM Revision a4642946 (ceph): msgr: don't assert on socket(2) failure
This can happen if we're connecting to an invalid address. Generate an
error message instead of crashing.
See #1942...
Sage Weil

01/15/2012

10:59 PM Feature #390 (Resolved): Implement bdrv_snapshot_goto (Rollback), bdrv_snapshot_delete
Sage Weil
10:57 PM Feature #1781 (Resolved): qa: readwrite and roundtrip rgw tests in qa suite
Sage Weil
10:56 PM rgw Feature #1911 (Closed): rgw: plan handling for large and/or manifest objects, s3 and/or swift
the plan is to do manifest objects for large s3 objects. that means the pieces won't get a locator and will be distr... Sage Weil
10:28 PM Messengers Bug #1942: msgr: Address family not supported by protocol
Still not sure how the bad address made it into the map (or OSDBoot) message, but at least it won't crash now as of c... Sage Weil
05:16 AM Revision a6c06103 (ceph): msgr: uninline operator<< on sockaddr_storage
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil

01/14/2012

10:01 PM Bug #1928 (Resolved): osd: scrub stat mismatch after fsstress on kernel client
Sage Weil
09:51 PM Feature #1944: osd: dump journal
wip-osd-dump-journal Sage Weil
06:47 PM Feature #1944 (Resolved): osd: dump journal
dump text summary of journal contents. this would help debug #1943 Sage Weil
06:46 PM Bug #1943 (Duplicate): osd: bad clone transaction on journal replay
Martin got this with v0.40:... Sage Weil
02:29 PM Messengers Bug #1942 (Won't Fix): msgr: Address family not supported by protocol
http://joshp.no-ip.com:8080/20120114-osd-family-error.log.bz2 Sage Weil
11:01 AM rgw Feature #1941 (Rejected): rgw: revisit bucket removal
We can try to look again at the steps we're doing when removing buckets, exploring ways to reduce osd operations that c... Yehuda Sadeh
01:13 AM Revision 6b02f9fa (ceph): osd: rev osd internal cluster protocol
Prevent backfill code from talking to pre-backfill code.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil

01/13/2012

11:57 PM Revision 8d271f43 (ceph): Merge branch 'stable'
Sage Weil
11:08 PM Revision 7f123de8 (ceph): mds: require OSDREPLYMUX feature bit
We use ObjectOperations now and need a new server to decompose replies
into their constituent components.
Signed-off...
Sage Weil
11:07 PM Revision 012a9855 (ceph): librados: require OSDREPLYMUX feature
We need this since we now rely on the server telling us rvals and
payload_lens for each OSDOp.
Signed-off-by: Sage W...
Sage Weil
11:07 PM Revision 436f8cac (ceph): define new OSDREPLYMUX feature bit
This corresponds to the OSD's ability to pass payload_len hints and
return values for each OSDOp in the MOSDOpReply me...
Sage Weil
10:55 PM Linux kernel client Bug #1940: locking cycle in ceph_osdc_start_request
this was causing teuthology runs to fail.
patch in master, testing!
Sage Weil
10:28 PM Linux kernel client Bug #1940 (Resolved): locking cycle in ceph_osdc_start_request
... Sage Weil
10:50 PM Revision 9d0476c5 (ceph): objecter: fix add_*() calls to use proper helper
The helper resizes the other vectors; need that everywhere.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
09:04 PM Revision 42a6cefe (ceph): ReplicatedPG: munge truncate_seq 1/truncate_size -1 to seq 0/size 0
Truncate with seq 1 and size -1 is a noop.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Sage ...
Samuel Just
08:34 PM Revision 0ded7e4d (ceph): ReplicatedPG: munge truncate_seq 1/truncate_size -1 to seq 0/size 0
Truncate with seq 1 and size -1 is a noop.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Sage ...
Samuel Just
08:19 PM Revision 44cb0763 (ceph): rgw: limit object PUT size
Yehuda Sadeh
07:26 PM Revision 3bfa41cf (ceph): Use yaml.safe_dump so unicode doesn't mess up the yaml files.
In general, yaml.dump is comparable to pickle, and my personal
coding standard says *never* use it. yaml.safe_dump is...
Tommi Virtanen
07:14 PM Linux kernel client Bug #1795: break d_lock > s_cap_lock ordering
I was hitting some odd behavior while testing. Will try again over
the weekend or early next week.
Also, a note:...
Alex Elder
02:28 PM Linux kernel client Bug #1795: break d_lock > s_cap_lock ordering
OK, after several iterations and some discussion we
concluded the last two patches (turning these things
into atomi...
Alex Elder
11:54 AM Linux kernel client Bug #1795: break d_lock > s_cap_lock ordering
I have posted a series of four proposed patches to the list to address
this, along with a few other issues identifie...
Alex Elder
05:06 PM Revision d5753374 (ceph): objecter: fix up stat, getxattrs handlers
- try/catch
- stat mtime
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
04:36 PM Revision 7eea40ea (ceph): v0.40
Sage Weil
04:35 PM Revision 224a65a7 (ceph): Merge remote branch 'gh/master' into wip-backfill
Sage Weil
03:08 PM rgw Bug #1939 (Resolved): rgw: error processing large logs
Yehuda Sadeh
02:06 PM Bug #1928: osd: scrub stat mismatch after fsstress on kernel client
Looks like the fixes for this introduced a new bug - 3 runs so far today failed with a similar scrub stat mismatch:
...
Josh Durgin
10:49 AM Bug #1928 (Closed): osd: scrub stat mismatch after fsstress on kernel client
4815cafddf46e968501ac3b96e593c5e8db6218b Samuel Just
11:40 AM Bug #1935: teuthology: readwrite/roundtrip jobs run manually, but not in suite
teuthworker is updated. Josh Durgin
11:31 AM Bug #1935 (Resolved): teuthology: readwrite/roundtrip jobs run manually, but not in suite
Alright, I was reading the wrong file. s3readwrite.py and s3roundtrip.py do use the yaml format. The above cleanups w... Anonymous
10:04 AM Bug #1935: teuthology: readwrite/roundtrip jobs run manually, but not in suite
I'm wrong, ignore me for a while. Anonymous
09:58 AM Bug #1935: teuthology: readwrite/roundtrip jobs run manually, but not in suite
More specific plan for change:
- change s3tests/functional/__init__.py to read env var S3TEST_YAML, raise if it is...
Anonymous
09:50 AM Bug #1935: teuthology: readwrite/roundtrip jobs run manually, but not in suite
s3tests/functional (run via nosetests) reads a .ini -style configuration.
This was not flexible enough for all the...
Anonymous
11:36 AM CephFS Bug #1938 (Resolved): mds: snaptest-2 doesn't pass with 3 MDS system
run vstart, mount ceph-fuse, run snaptest-2; mds.a crashes:... Greg Farnum
09:24 AM Cleanup #1886: objecter/osd: mux/demux in MOSDOpReply encoding
Sage Weil
08:52 AM Feature #1937 (Resolved): teuthology: --unlock option for -nuke
this'd make cleanup slightly less painful (no need to check back on old terminal to unlock nuked nodes) Sage Weil
08:38 AM CephFS Bug #1682: mds: segfault in CInode::authority
happened again on /var/lib/teuthworker/archive/nightly_coverage_2012-01-13-a/7335... Sage Weil
01:50 AM Revision 81c0ad82 (ceph): librados: make new ObjectReadOperations arguments non-optional
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
01:50 AM Revision 7347538d (ceph): rgw: use new librados ObjectReadOperation method arguments
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
01:47 AM Revision 4815cafd (ceph): ReplicatedPG: Update stat accounting for truncate during write
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
Samuel Just
12:39 AM Revision 8ceb3883 (ceph): rgw: wrap cls_cxx_map_* with try/catch around decoding
Yehuda Sadeh
12:23 AM Revision 876d829a (ceph): librados: add ObjectOperation::exec
Yehuda Sadeh
12:23 AM Revision 05d8ecbe (ceph): rgw: bucket index creation and init in a single operation
Yehuda Sadeh
12:14 AM Revision dc628a5b (ceph): secret: move null check before strlen(key_name) deref
Coverity cid: 98
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:10 AM Revision d41ddcdf (ceph): osd: stat op, don't compare in memory state to object
might be that object is being created by the current compound request. Yehuda Sadeh

01/12/2012

11:44 PM Revision 4f4b79cc (ceph): osd: include return code in OSDOp
This will expose the per-operation return values to the caller.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
11:44 PM Revision a8558284 (ceph): osd: mux/demux OSDOp::outdata in MOSDOpReply
Bump encoding, so that we don't try to demux old encoded messages, which
will likely have OSDOp::payload_len == indat...
Sage Weil
11:44 PM Revision ff55d2f3 (ceph): osd: put result data in OSDOp.outdata
This removes an argument from do_osd_ops() and cleans up the surrounding
code a bit.
Signed-off-by: Sage Weil <sage@n...
Sage Weil
11:44 PM Revision fe077832 (ceph): objecter: specify read return values pointers in ObjectOperation methods
This lets Objecter do the demuxing work for compound read operations.
Signed-off-by: Sage Weil <sage.weil@dreamhost...
Sage Weil
11:44 PM Revision 920bd568 (ceph): librados: specify read return value pointers in ObjectReadOperation met...
This lets librados do the work of parsing the reply from compound
operations, instead of requiring callers to have kn...
Sage Weil
11:09 PM Revision f42c658d (ceph): osd: fill in empty item in peer_missing for strays
If we search_for_missing() on a host, make a corresponding entry in our
peer_missing map (if it isn't already there)....
Sage Weil
11:06 PM Bug #1936 (Resolved): teuthology: github downtime -> failed runs
we should use a github mirror for anything on github. maybe we can/should piggyback off whatever carl set up. Sage Weil
11:02 PM Revision 10b00316 (ceph): rgw: don't crash when copying a zero sized object
Yehuda Sadeh
10:48 PM Revision 0da44591 (ceph): nuke: take config files from -t argument
teuthology-lock and teuthology-updatekeys both use -t for this already Josh Durgin
09:21 PM Revision 80f57f96 (ceph): ReplicatedPG: fix stat accounting error in CEPH_OSD_OP_WRITEFULL
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
09:21 PM Revision 845aa534 (ceph): ReplicatedPG: Do a write even for 0 length operation
Otherwise, a 0 length write to an offset past the end of the file will
cause the internal accounting to reflect the f...
Samuel Just
09:02 PM Revision 96e89d30 (ceph): kernel: loop reconnecting in case we race with shutdown
Previously, if we reconnected before shutdown completed we asserted
that the kernel did not boot into the new version...
Josh Durgin
08:59 PM Revision cfa39bfb (ceph): qa/client/gen-1774.sh
Capture Alexandre's script for reproducing #1774 here for posterity, until
we write a properly harnessed test for thi...
Sage Weil
07:46 PM Revision 6cf77532 (ceph): osd: fix PG::Log::copy_up_to() tail
The tail needs to refer to the entry preceding the first entry in the
log. This updates copy_up_to() to match the b...
Sage Weil
07:07 PM Revision 805513be (ceph): osd: reset last_complete on backfill restart
Since last_backfill is hobject_t(), we can set this equal to last_update.
This fixes a problem where last_complete pr...
Sage Weil
06:38 PM Revision 1e56367e (ceph): client: avoid taking inode ref in case of nonexistent dir
Signed-off-by: Andrey Stepachev <octo@yandex-team.ru>
Signed-off-by: Sage Weil <sage@newdream.net>
Andrey Stepachev
06:35 PM Revision cedd92be (ceph): Merge branch 'wip-makefile'
Sage Weil
06:03 PM Revision 71131371 (ceph): COPYING: note licenses for all files, not just the default
This (mostly) copies debian/copyright for now, but there are format
restrictions for that file. Suggestions for a cl...
Sage Weil
06:03 PM Revision 54e0dfc1 (ceph): debian/copyright: note acx_pthread.m4 license
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
05:42 PM Bug #1935 (Resolved): teuthology: readwrite/roundtrip jobs run manually, but not in suite
I'm not sure what's going on here.. roundtrip and readwrite are failing when scheduled, but when i copy/paste the sam... Sage Weil
05:17 PM Revision 6b55b6b3 (ceph): Makefile: Add headers that were omitted in make dist and prevented test...
Signed-off-by: Kacper Kowalik (Xarthisius) <xarthisius@gentoo.org> Kacper Kowalik (Xarthisius)
05:17 PM Revision c9e028f4 (ceph): Makefile: Handle corner case of crypto++ correctly
i.e. use c++ while compiling, append to CRYPTO_LIBS instead of LIBS
Signed-off-by: Kacper Kowalik (Xarthisius) <xart...
Kacper Kowalik (Xarthisius)
05:17 PM Revision c5144eed (ceph): Makefile: Use ACX_PTHREAD in configure.ac and resulting flags in src/Ma...
instead of hardcoded flags
Signed-off-by: Kacper Kowalik (Xarthisius) <xarthisius@gentoo.org>
Kacper Kowalik (Xarthisius)
05:17 PM Revision 7bf01b11 (ceph): Makefile: Add recent acx_pthread.m4 that has a fix for nostdlib issue.
See http://code.google.com/p/protobuf/issues/detail?id=188 for details
Signed-off-by: Kacper Kowalik (Xarthisius) <x...
Kacper Kowalik (Xarthisius)
04:58 PM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
Ok, I reproduced this with osd debugging, but not mds unfortunately. The logs are at slider:/home/samuelj/archived_l... Samuel Just
04:45 PM Bug #1930 (Resolved): objclass: need to wrap cls_cxx_map_* with try/catch, protect against bad de...
Fixed, commit:8ceb388396d02daefe53e3bdb68c08b4855ceaf7. Yehuda Sadeh
09:48 AM Bug #1930 (Resolved): objclass: need to wrap cls_cxx_map_* with try/catch, protect against bad de...
Yehuda Sadeh
04:26 PM Bug #1931 (Resolved): rgw: bucket index should create and init index atomically
Fixed, commit:05d8ecbe1cea8d7c7bd33f9b53f4cbf06b2c4e61.
Yehuda Sadeh
09:56 AM Bug #1931 (Resolved): rgw: bucket index should create and init index atomically
Yehuda Sadeh
04:25 PM Linux kernel client Bug #1795 (In Progress): break d_lock > s_cap_lock ordering
Discussed this with Sage. The problem arose because dentry_lease_is_valid()
is using the MDS session's s_cap_lock f...
Alex Elder
04:13 PM Feature #1934 (Closed): Get new Sepia machines into service
Anonymous
03:47 PM rgw Bug #1933 (Resolved): rgw: crash in swift copy
Yehuda Sadeh
03:08 PM rgw Bug #1933: rgw: crash in swift copy
copy of zero sized objects indeed. Affects both S3 and swift.
Fixed, at commit:10b00316b7778f6aecbf46ec0aea2aca8b8...
Yehuda Sadeh
01:36 PM rgw Bug #1933: rgw: crash in swift copy
Might be a copy of a zero sized object. Yehuda Sadeh
01:24 PM rgw Bug #1933 (Resolved): rgw: crash in swift copy
... Yehuda Sadeh
03:38 PM Bug #1924 (Resolved): teuthology: installing kernels can fail due to reconnecting too soon
Fixed by 96e89d30ec5f912f3c1b4844328e0966a2266e05 in teuthology.git. Josh Durgin
03:24 PM Bug #1909: Two mons crash after starting the third one
I have reinstalled ceph mon like I wrote but it has a different IP address now. Even though I have changed DNS record... Maciej Galkiewicz
01:13 PM Bug #1928: osd: scrub stat mismatch after fsstress on kernel client
Samuel Just wrote:
> It seems that fstress will do that: 2012-01-11T14:30:04.867 INFO:teuthology.task.workunit.clien...
Sage Weil
01:07 PM Bug #1928: osd: scrub stat mismatch after fsstress on kernel client
It seems that fstress will do that: 2012-01-11T14:30:04.867 INFO:teuthology.task.workunit.client.0.out:8/17: dwrite f... Samuel Just
12:58 PM Bug #1928: osd: scrub stat mismatch after fsstress on kernel client
One possibility: in CEPH_OSD_OP_WRITE in ReplicatedPG::do_op we pass op.extent.offset and op.extent.length to write_u... Samuel Just
11:07 AM Linux kernel client Feature #1922 (Resolved): rbd: annotate for lockdep
The fix was to initialize the semaphore in rbd_add().
I have verified that this eliminates the lockdep warning.
...
Alex Elder
10:56 AM Bug #1898 (Duplicate): very long scrub blocked write operation
Even though this is not the same complaint as 1783, we plan on
fixing it with the same changes, so I am calling this...
Anonymous
10:30 AM Feature #1932 (Resolved): mon: before accepting a new crushmap, monitor should validate and test ...
Samuel Just
01:04 AM Cleanup #1899: use acx_pthread instead of hardcoding libs and cflags into build system
Sage Weil wrote:
> Looks good.. can I add your Signed-off-by to this?
Sure
Anonymous

01/11/2012

09:50 PM Revision b93bf285 (ceph): PG: gen_prefix should grab a map reference atomically
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
09:46 PM Feature #1929 (Resolved): teuthology: log runtime
- include run time in summary.yaml
- include run time in teuthology-ls output
this will help a bit in identifying...
Sage Weil
09:37 PM Revision 38b9b503 (ceph): rgw-admin: add pool rm and pools list
Yehuda Sadeh
09:05 PM Revision e2c02543 (ceph): rgw-admin: clean up unused commands
Yehuda Sadeh
09:04 PM Revision ac1e105e (ceph): osd: bound log we send when restarting backfill
Use the new tunable from b1da5115aa0756aefa4f0aad36395911e82fce28.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
08:54 PM Revision 6dae2f8a (ceph): thrasher: adjust min_dead default
Make this 1, not 2. That's a bit more friendly. It doesn't strictly
matter, tho, since we revive osds before waitin...
Sage Weil
08:54 PM Revision 3c0346b4 (ceph): lost_unfound: typo
Sage Weil
08:54 PM Revision 59369237 (ceph): thrasher: don't mark down osds out; tell monitor same
Stopping ceph-osd doesn't make it out (immediately). Prevent monitor
from doing this after a delay too so we can kee...
Sage Weil
08:54 PM Revision 50463ffd (ceph): verify all osds start before checking health
Just checking health isn't good enough, since it races with OSD startup:
we can have a healthy cluster with 0 (or som...
Sage Weil
08:54 PM Revision fb74b901 (ceph): thrasher: add max_dead
Add max_dead, and revive osds prior to waiting for clean. Otherwise we
can leave too many OSDs down and the cluster ...
Sage Weil
08:22 PM Revision 79085ad0 (ceph): rados.py: avoid getting return value of void function
rados_ioctx_locator_set_key is void. The return value seems to have
been uninitialized, so the tests failed rarely.
...
Josh Durgin
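A minimal sketch of the general pitfall, not the actual rados.py change (the commit text above is truncated): by default ctypes treats every foreign function as returning a C int, so reading the result of a void function yields whatever happened to be in the return register. Declaring `restype = None` tells ctypes there is nothing to read. The example uses `PyErr_Clear` from the CPython C API purely as a stand-in void function that is always available; `call_void` is a hypothetical helper name.

```python
import ctypes

# PyErr_Clear is declared in the CPython C API as `void PyErr_Clear(void)`.
# It is used here only as a convenient, always-available void function.
clear = ctypes.pythonapi.PyErr_Clear
clear.argtypes = []
clear.restype = None  # void: ctypes will not try to interpret a return value


def call_void():
    """Call the void function; the wrapper returns None, never a stray int."""
    return clear()


result = call_void()
```

With `restype` left at its default, the same call would return an arbitrary integer, which matches the kind of rare, nondeterministic test failure described above.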
08:12 PM Linux kernel client Feature #1922: rbd: annotate for lockdep
I believe the problem is that when a new rbd device structure gets
initialized in rbd_add(), the rw_semaphore contai...
Alex Elder
07:13 PM Revision 85552cf8 (ceph): pg: remove unnecessary guard from calc_trim_to()
The num_objects check doesn't make sense, and could only make trimming
happen more often than it should. Sage did not...
Josh Durgin
07:13 PM Revision b1da5115 (ceph): pg: add a configurable lower bound on log size
This helps prevent problems with retrying requests being detected as
duplicates. The problem occurs when the log is t...
Josh Durgin
06:34 PM Revision 8a9dbc47 (ceph): Merge remote branch 'gh/master' into wip-backfill
Sage Weil
05:29 PM Bug #1928 (Resolved): osd: scrub stat mismatch after fsstress on kernel client
... Josh Durgin
04:24 PM CephFS Bug #1774: client: files become inaccessible in large directories (with snapshots?)
This script (properly adjusted to actually mount and remount the ceph-fuse tree) should be enough to trigger the bug. Alexandre Oliva
02:41 PM Revision 734737f3 (ceph): osd: limit size of log sent to reset backfill targets
Need to replace magic number with new tunable, once that is merged.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
01:47 PM Bug #1925 (Closed): osd: segfault during _scan_list
b93bf285c9f05ab943e8e506ea2125af0f1f97ad should fix it. Samuel Just
06:58 AM Bug #1925 (Closed): osd: segfault during _scan_list
... Sage Weil
01:39 PM rgw Feature #1927 (Resolved): rgw: add radosgw-admin pool list
Fixed, commit:38b9b5030747349acf657946133ef57736542310. Yehuda Sadeh
12:50 PM rgw Feature #1927 (Resolved): rgw: add radosgw-admin pool list
listing the active set of pools. Yehuda Sadeh
01:38 PM rgw Feature #1926 (Resolved): rgw: add radosgw-admin pool remove
Fixed, commit:38b9b5030747349acf657946133ef57736542310. Yehuda Sadeh
12:49 PM rgw Feature #1926 (Resolved): rgw: add radosgw-admin pool remove
Would allow removing pool from the active set of pools. Yehuda Sadeh
01:17 PM Cleanup #1899: use acx_pthread instead of hardcoding libs and cflags into build system
Looks good.. can I add your Signed-off-by to this? Sage Weil
04:25 AM Revision 8f9549f0 (ceph): client: start caching readdir results after readdir_start
Use upper_bound rather than lower_bound to compute the initial pd within
insert_trace, so that we don't attempt to re...
Alexandre Oliva
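For readers unfamiliar with the two lookups named above, here is a generic illustration of how they differ on a sorted sequence. It is a sketch in Python using `bisect`, not the client code itself; the entry names and the helper names `lower_bound` and `upper_bound` are made up for the example.

```python
import bisect


def lower_bound(sorted_keys, key):
    """Index of the first element that is >= key."""
    return bisect.bisect_left(sorted_keys, key)


def upper_bound(sorted_keys, key):
    """Index of the first element that is > key."""
    return bisect.bisect_right(sorted_keys, key)


# Hypothetical sorted directory entries.
entries = ["alpha", "beta", "gamma"]

at_or_after = lower_bound(entries, "beta")   # points at "beta" itself
strictly_after = upper_bound(entries, "beta")  # points past "beta"
```

When the key is already present, the lower bound lands on that element while the upper bound lands just after it, so the two choices differ by exactly the entry being looked up.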
12:39 AM Revision 5d989608 (ceph): monclient: fix resolve_addrs() call
This was broken in def36668a13459d9c0851e4d4da440a288f9a34f it looks like.
Passing uninitialized memory to resolve_ad...
Sage Weil
12:35 AM Revision f09b21ef (ceph): resolve_addrs: return ipv4 and ipv6 addrs
Fixes: #1891
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:22 AM Revision 9e9b5c6f (ceph): ReplicatedPG: fix typo in stats accounting in _rollback_to
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
12:14 AM Revision d4815e5b (ceph): osd: send log with backfill restart
This makes backfill restart less of a special case: we send an info AND
log, just like we do normally. Code paths ar...
Sage Weil
12:07 AM Revision f4883ebf (ceph): ceph: let the user running ceph-osd remove subvolumes
This will prevent EPERM when using the SNAP_DESTROY ioctl,
so the filestore will use btrfs snaps.
Josh Durgin

01/10/2012

11:30 PM Revision 2317b9ae (ceph): add rgw readwrite and roundtrip tasks
Yehuda Sadeh
11:28 PM Revision d2fadf9f (ceph): syslog: ignore lockdep non-static key warning
It looks like this warning was made default in linux 3.2.
This will keep happening until #1922 is done.
Josh Durgin
09:23 PM Revision c7d92d1d (ceph): osd: fail to peer if interval lacks any !incomplete replicas
We need at least one non-incomplete replica during a rw interval in order
to peer. The backfilling/incomplete replic...
Sage Weil
09:00 PM CephFS Bug #1774 (Resolved): client: files become inaccessible in large directories (with snapshots?)
Yay, that's way better than a log. Merged.
I wanted to make a test case for this but wasn't able to easily reprod...
Sage Weil
07:26 PM Revision 3b81fa58 (ceph): mon: allow specifying pg_num and pgp_num when creating new pools.
Right now this is only exposed via the monitor command interface:
osd pool create <poolname> [pg_num [pgp_num]]
but i...
Greg Farnum
07:20 PM Revision b8916528 (ceph): librados: Make API docs use @note instead of @bug for now.
Asphyxiate doesn't yet support all of the Doxygen markup.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Tommi Virtanen
07:20 PM Revision 741f6d54 (ceph): Fix several doxygen warnings, to minimize noise. Only changes comments.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:20 PM Revision 69aface0 (ceph): auth: Fix Doxygen warnings.
Match prototype and implementation argument names and types
(textually, that is use std:: prefix).
Signed-off-by: To...
Tommi Virtanen
07:17 PM Revision 1f46e7c9 (ceph): FileStore: assert on ENOSPC even for SETXATTR
Otherwise we can get corrupt object attributes on ext*.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just
06:47 PM Revision 45897b52 (ceph): mds: remove beacon_killer code.
This no longer does *anything* except print out
useless warning messages.
Signed-off-by: Greg Farnum <gregory.farnum...
Greg Farnum
06:47 PM Revision ac9b2d09 (ceph): mds: initiate monitor reconnect if beacon acks take too long
If it takes 2*mds_beacon_grace seconds (default 30 seconds total)
to get an ack back, maybe it's the monitor and not ...
Greg Farnum
05:49 PM Revision ba00f95a (ceph): osd: make less noise when filestore is already up to date
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:17 PM Bug #1924 (Resolved): teuthology: installing kernels can fail due to reconnecting too soon
... Josh Durgin
04:48 PM Bug #1623 (Can't reproduce): ceph-osd fails to bind socket
Sage Weil
04:47 PM Bug #1891 (Resolved): monclient: try ipv6 if ipv4 fails
commit:f09b21eff689c2597c1f006b8a03086d63e1a138 Sage Weil
04:24 PM Bug #1905 (Resolved): osd: preserve pg log when resetting backfill
commit:d4815e5bd4727f499643f2bebe2715cc85faaa42 Sage Weil
04:20 PM Bug #1832 (Closed): osd: size tracking discrepancy (scrub stat mismatch)
9e9b5c6f15c6de1af3f804eb10129ffc5d1c9c90 should take care of the more recent occurrence. Samuel Just
04:14 PM Tasks #1923 (Resolved): document required properties and features for alternative backend file sy...
Yehuda Sadeh
03:32 PM Linux kernel client Feature #1922 (Resolved): rbd: annotate for lockdep
This happened when lockdep was turned on:... Josh Durgin
02:00 PM Feature #1879 (In Progress): osd: track list of in-progress requests, log slow ones
Greg Farnum
01:18 PM Feature #1884 (In Progress): plan encoding strategy to test+facilitate non-disruptive upgrades
Sage Weil
12:01 PM Feature #390: Implement bdrv_snapshot_goto (Rollback), bdrv_snapshot_delete
I sent this in patch form to the QEMU list; we'll have to wait on them now... Greg Farnum
11:45 AM Bug #1921 (Resolved): teuthology: silently continues when len(targets) != len(roles)
iirc this used to assert, but that probably was lost when the auto-locking was added? Sage Weil
11:43 AM Feature #1887 (Resolved): mon: ability to specify pg_num for new pools
commit:3b81fa5845c158120c66584f3f8d2868cf018e35 Greg Farnum
11:32 AM CephFS Bug #1737: ceph-fuse crash in xlist::remove
Happened again in teuthology:~teuthworker/archive/nightly_coverage_2012-01-09-b/6741/teuthology.log. Josh Durgin
11:16 AM CephFS Feature #1912: mds: should time out slow monitors
Errm, commit:ac9b2d092f1f075621f30f77cdbb49bdf35b9ae5 Greg Farnum
11:13 AM CephFS Feature #1912 (Resolved): mds: should time out slow monitors
This was actually very easy to do. Ripping out the utterly useless beacon_kill code was actually a larger diff! Greg Farnum
11:16 AM Bug #1898 (Won't Fix): very long scrub blocked write operation
We will fix this as part of the incremental scrubbing #1783 Samuel Just
09:51 AM Bug #1898: very long scrub blocked write operation
2012-01-06 16:22:52.45141 another case where osd.220 required 4 minutes to scan the 150000 object pg. osd.97 seems t... Samuel Just
11:15 AM Bug #1750 (Closed): xattr errors silently ignored, cause trouble later
We now assert on ENOSPC for setxattr (1f46e7c91534676f23eaaf8e9740d882928b5685). Samuel Just
09:45 AM Linux kernel client Feature #1870 (Resolved): usedcache mount option
Sage Weil
07:57 AM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
Hi,
I've triggered this failure mode again with a later snapshot of master (commit: a1252463055e2d6816407bd6465e74...
David McBride
03:10 AM Revision ce53447d (ceph): doc: add librados pool creation defaults
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
03:10 AM Revision f99b3e04 (ceph): doc: describe some rados_pool_stat_t members
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
03:10 AM Revision 9490b36e (ceph): doc: add librados C aio example
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
03:10 AM Revision 1e8b8a8d (ceph): doc: standardize rados_tmap_* docs
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
03:10 AM Revision 547e9e34 (ceph): doc: @return -> @returns to match the sphinx output
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
03:10 AM Revision 0e19062c (ceph): doc: clarify librados return codes
Adding a second @returns for specific error codes makes the sphinx output more readable.
Signed-off-by: Josh Durgin ...
Josh Durgin
03:10 AM Revision dd31ff2e (ceph): doc: add short section on documenting code
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
03:10 AM Revision 590520c1 (ceph): doc: fix rados_version todo formatting
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
03:04 AM Revision b464b757 (ceph): doc: move rados_ioctx_get_id to the pool group
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
03:04 AM Revision d9d9e6d3 (ceph): doc: Put rados_ioctx_locator_set_key in a group so it can be cross-refe...
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
03:04 AM Revision 50c9cb16 (ceph): doc: add a prefix to group names in librados.h
doxygen groups are in a global namespace.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
03:03 AM Revision 43952a3b (ceph): doc: add librados C api docs
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
03:03 AM Revision b5759df8 (ceph): doc: add configuration and connecting to librados C api example
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
03:03 AM Revision aca3c621 (ceph): doxygen: Use first sentence as brief description.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
03:03 AM Revision 78cc07f6 (ceph): librados: Avoid using "crush_rule" as name of function argument.
"struct crush_rule" exists already, using the same identifier
confuses Doxygen.
Signed-off-by: Tommi Virtanen <tommi...
Tommi Virtanen
03:03 AM Revision c9606416 (ceph): doc: Switch doxygen integration from breathe to asphyxiate.
TODO: path of librados.h is now just the basename
TODO: no enum support for now
TODO: no @bug support for now
Sign...
Tommi Virtanen
03:03 AM Revision b148bef3 (ceph): doc: fix some typos in librados C API
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
02:13 AM Revision d3e34773 (ceph): ceph: add a new "run_uml.sh" script to manage running a UML client
This script is used to automate most of what's required to run a
User-Mode Linux (UML) instance. This is mainly of i...
Alex Elder

01/09/2012

07:59 PM CephFS Bug #1774: client: files become inaccessible in large directories (with snapshots?)
How about a patch instead of logs? :-)
It turned out that the problem occurred while caching the readdir responses...
Alexandre Oliva
07:40 PM Revision 162ac06f (ceph): rgw: adjust log level
Yehuda Sadeh
07:12 PM Documentation #1815 (Resolved): document librados C api
Josh Durgin
06:37 PM Revision a90f34cd (ceph): rgw: only use plain PUT processor when !chunked_upload
Yehuda Sadeh
06:37 PM Revision 1e0afb62 (ceph): rgw: some cleanup
Yehuda Sadeh
05:50 PM Bug #1898: very long scrub blocked write operation
It seems that osd.220 took 5 minutes between starting _scan_list and relocking the pg. Samuel Just
05:13 PM Revision 1ae42d65 (ceph): Merge remote branch 'gh/master' into wip-backfill
Sage Weil
04:47 PM Subtask #1920 (Resolved): Update OSD to use ObjectStore tmap implementation (object recovery must c...
Samuel Just
04:46 PM Subtask #1919 (Resolved): implement the key value interface in terms of leveldb
Samuel Just
04:39 PM Subtask #1918 (Resolved): create mock key-value store and tests for the related object map implen...
Samuel Just
04:38 PM Subtask #1917 (Resolved): create interface for backing key-value store and create object map impl...
Samuel Just
04:37 PM Subtask #1916 (Resolved): add ObjectStore level tests for tmap operations
Samuel Just
04:30 PM Subtask #1915 (Resolved): Create trivial implementation for the object map interface (using curre...
Samuel Just
04:28 PM Subtask #1914 (Resolved): Create interface for object map implementation
Samuel Just
04:27 PM Subtask #1913 (Resolved): Add tmap operations to ObjectStore interface
Samuel Just
03:59 PM CephFS Feature #1912 (Resolved): mds: should time out slow monitors
Similar to #1841 for the OSDs, the MDS should reconnect to a different monitor if its beacons go too long without get... Greg Farnum
03:53 PM Bug #1909 (Need More Info): Two mons crash after starting the third one
can you generate a log with 'debug mon = 20' and 'debug ms = 1' for the existing monitors leading up to the crash? Sage Weil
11:32 AM Bug #1909 (Resolved): Two mons crash after starting the third one
I had three mons. One of them was reinstalled without removing it from the cluster. Now after starting reinstalled mo... Maciej Galkiewicz
03:40 PM Linux kernel client Bug #1871 (Resolved): ceph_client: crash after running xfstests 002
Sage Weil
03:38 PM rgw Feature #1911 (Closed): rgw: plan handling for large and/or manifest objects, s3 and/or swift
Sage Weil
03:27 PM rgw Bug #1859 (Resolved): rgw: bucket creation is not atomic
Sage Weil
03:15 PM CephFS Bug #1047 (In Progress): mds: crash on anchor table query
All right, looking through a log that Amon got me after I specified the right debug arguments. I see that there are t... Greg Farnum
02:57 PM Feature #1658 (In Progress): osd: backfill instead of backlog
Sage Weil
02:46 PM Bug #1903 (Resolved): osd/ReplicatedPG.cc: 3189: FAILED assert(obc->unconnected_watchers.size() =...
Sage Weil
11:31 AM rbd Feature #1908 (Resolved): rbd: test map/unmap more extensively
Include unmapping and remapping while reading and writing through an FS or with dd, and simply having an FS mounted. Josh Durgin
11:23 AM Linux kernel client Bug #1907 (Resolved): rbd: don't reuse device ids while they're still in use elsewhere
If an FS on top of rbd is mounted, and the rbd device is unmapped, and another one is mapped, the old sysfs entry is ... Josh Durgin
09:44 AM rgw Bug #1906 (Can't reproduce): rgw: total_time isn't logged consistently
Just looked at some rgw log, and it seems that the total_time field is usually zero, although there are a few cases w... Yehuda Sadeh
09:40 AM Bug #1896 (Rejected): osd: FAILED assert(is_up(osd)) from do_queries() during activate_map()
this was a bug in an early version of commit:a59ee8f91bc879beb614ac10fa2f9a4a284ebfb6 Sage Weil
01:21 AM Revision c23cc237 (ceph): osd: add OSDOp::outdata
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
01:15 AM Revision 6b389218 (ceph): osd: move OSDOp vector indata split/merge into helpers
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
01:13 AM Revision d8ebbf48 (ceph): osd: OSDOp::data -> indata
We'll soon add outdata...
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
12:23 AM Revision b17736a1 (ceph): osd: populate_obc_watchers when object pulled to primary
We don't care about degraded state, only whether the object is on the
primary so that we can load the object_info_t.
...
Sage Weil

01/08/2012

11:15 PM Revision a59ee8f9 (ceph): osd: handle case where no acceptable info exists
This happens when the only available replica has last_backfill != MAX.
In that case, revert to up, and then set the...
Sage Weil
10:39 PM Revision b354ce4e (ceph): run: put pid in archive dir
This will make it easy for teuthology-ls to show you the running process's
pid (if it's still running). Or for other...
Sage Weil
06:15 PM Revision c5affd6c (ceph): Merge remote branch 'gh/wip-osd-retry-attempt'
Sage Weil
04:16 PM Revision 5abe49d6 (ceph): Merge remote branch 'gh/wip-admin-socket'
Sage Weil
03:27 PM Bug #1904 (Resolved): osd: calc_acting bad iterator deref
commit:a59ee8f91bc879beb614ac10fa2f9a4a284ebfb6 Sage Weil
08:39 AM Bug #1904 (Resolved): osd: calc_acting bad iterator deref
... Sage Weil
10:46 AM Bug #1905 (Resolved): osd: preserve pg log when resetting backfill
If we use the pg log to detect dups, we need to preserve some pg log history when we backfill. Sage Weil
10:45 AM Feature #1883 (Resolved): admin socket: string based protocol
Sage Weil
08:36 AM Bug #1896: osd: FAILED assert(is_up(osd)) from do_queries() during activate_map()
again,... Sage Weil

01/07/2012

06:16 PM Revision fbf79121 (ceph): do not put monitors on the same nodes as clients
Otherwise, for kernel clients (rbd or kclient), ceph-mon can cause a deadlock when it calls sync(2). Sage Weil
10:22 AM Bug #1903 (Resolved): osd/ReplicatedPG.cc: 3189: FAILED assert(obc->unconnected_watchers.size() =...
... Sage Weil
10:21 AM CephFS Bug #1902 (Won't Fix): mds: unittest_interval_tree bad memory access
after a segfault:... Sage Weil
07:25 AM Bug #1901 (Resolved): Missing files in ceph packages results in build failure of tests
make dist in src/Makefile.am is missing several header files that are used by tests; as a result ceph fails to build. Anonymous
07:23 AM Bug #1900 (Resolved): Fix detection and build issues with libcrypto++
Currently ceph's build system uses AC_SEARCH_LIBS (in some cases) to search for libcrypto++. As a result C++ library ... Anonymous
07:18 AM Cleanup #1899 (Resolved): use acx_pthread instead of hardcoding libs and cflags into build system
Hardcoding -lpthread is not the most portable way of including threads support. Please use upstream macros instead.
...
Anonymous
04:54 AM Revision 92ca3ef7 (ceph): perfcounters: fix unittest for new admin_socket interface
Broken by b389685afa1be00b5147855bf71c50042bfbfa6c.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
04:39 AM Revision d8e54994 (ceph): Makefile: disable unittest_interval_tree
Segfaults. Valgrind errors. Accessing uninitialized memory.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
04:38 AM Revision bcf21467 (ceph): unittest_interval_tree: make it compile
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
01:21 AM Revision 13445d23 (ceph): ceph_manager: a booting osd is no longer automatically marked in
as of ceph.git commit 96b7b0d83e5fe70a4efb4e284e18b4b40840bfec Sage Weil
01:18 AM Revision a774d500 (ceph): osd: clean up src_oid, src_obc map key calculation
Be consistent about how we generate the src_oid and src_oloc, so that we
feed good value into find_object_context and...
Sage Weil
12:55 AM Revision 3c60e804 (ceph): osd: read op should claim_append data instead of claim
Yehuda Sadeh
12:53 AM Revision 0d175cd6 (ceph): rgw: remove object before writing both xattrs and data
otherwise we'll leak xattrs from previous incarnation Yehuda Sadeh
12:53 AM Revision 469f3eb4 (ceph): rgw: create plain processor for small objects
Yehuda Sadeh
12:53 AM Revision 75acc0a3 (ceph): rgw: fix multipart PUT
latest revamp broke it, missed calling RGWPutObjProcessor::prepare(s)
where needed.
Yehuda Sadeh
12:53 AM Revision 26b54ae5 (ceph): rgw: rearrange PutObj::execute()
groundwork for different handling of small object PUTs Yehuda Sadeh
12:53 AM Revision a0b55397 (ceph): rgw: different atomic handling for small objects
Yehuda Sadeh
12:44 AM Revision 66f38254 (ceph): Merge remote branch 'gh/master' into wip-backfill
Sage Weil
12:22 AM Revision 199b14d8 (ceph): mon: fix uninitialized cluster_logger_registered
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil

01/06/2012

11:12 PM Revision 001701a0 (ceph): mon_recovery: need n/2 + 1 monitors for quorum
Sage Weil
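The n/2 + 1 rule in the revision title can be written out as a small helper. This is a sketch of the arithmetic only, assuming integer (floor) division; `quorum_size` is a hypothetical name and is not taken from the teuthology task.

```python
def quorum_size(num_mons: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    if num_mons < 1:
        raise ValueError("need at least one monitor")
    return num_mons // 2 + 1


# Majority sizes for a few cluster sizes.
sizes = {n: quorum_size(n) for n in (1, 2, 3, 4, 5)}
```

Note that an even monitor count tolerates no more failures than the odd count below it: both 3 and 4 monitors can lose only one member and keep a majority.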
11:08 PM Revision cfeaef45 (ceph): move multimon failure thrashing tests into regression
We need to test these nightly. Sage Weil
09:36 PM Revision da921077 (ceph): ceph: don't skip monitor ports
We can use the same port multiple times if they are on different hosts. Sage Weil
09:00 PM CephFS Bug #1682: mds: segfault in CInode::authority
hit this again:... Sage Weil
08:50 PM Revision bebd393b (ceph): objecter: ignore replies from old request attempts
If we know the request attempt, ignore old attempts.
If we do not know the attempt (because the server is old), acce...
Sage Weil
08:49 PM Revision ac177d78 (ceph): osd: encode retry attempt in MOSDOp[Reply]
In addition to the boolean flag, also encode the exact retry attempt.
Return -1 if we don't know.
Signed-off-by: Sa...
Sage Weil
08:23 PM Revision b501efda (ceph): mon: document quorum_status, mon_status
Fixes: #1824
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
08:22 PM Revision ca8df7ea (ceph): mon: fix misplaced else
Broken by 435c29448a10ec343f5a2b7195d94c72de5b1a25.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:20 PM Revision 13f1debb (ceph): Merge remote branch 'gh/wip-mon-timeouts'
Sage Weil
05:31 PM Revision b389685a (ceph): admin_socket: string commands
Commands are strings. Old __be32 works too. 'help' to list available
commands.
Signed-off-by: Sage Weil <sage@newd...
Sage Weil
05:31 PM Revision 5a5dece1 (ceph): admin_socket: fix, extend admin_socket unit tests
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:31 PM Revision 643b9dbd (ceph): ceph: speak new admin socket protocol
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:19 PM Bug #1896: osd: FAILED assert(is_up(osd)) from do_queries() during activate_map()
... Sage Weil
03:48 PM Bug #1896 (Rejected): osd: FAILED assert(is_up(osd)) from do_queries() during activate_map()
... Sage Weil
05:18 PM Bug #1897 (Resolved): osd: failed assert assert(src_obc) under rgw workload
commit:a774d5002132cffb7b408e7de3d75ee841301fbf Sage Weil
03:53 PM Bug #1897 (Resolved): osd: failed assert assert(src_obc) under rgw workload
... Sage Weil
04:32 PM Bug #1898 (Duplicate): very long scrub blocked write operation
On cephstore6235 we saw a write operation get blocked for 4 minutes by scrub. The log is available in /var/log/ceph/l... Greg Farnum
04:23 PM Feature #1895: osd: detect duplicate requests by tracking per-client last_acked_tid instead of us...
my first thought would be something like:
- set<osd_reqid_t> in the Session
- on session open, load above set fro...
Sage Weil
04:18 PM Feature #1895: osd: detect duplicate requests by tracking per-client last_acked_tid instead of us...
Naively this data can just go in the OSD::Session struct. However, it might be a bit of a hassle dealing with thrashi... Greg Farnum
03:15 PM Feature #1895 (Rejected): osd: detect duplicate requests by tracking per-client last_acked_tid in...
Currently duplicate request detection uses the PG log, which may be trimmed too much to contain actual duplicates. Th... Josh Durgin
03:16 PM Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
One server side cause of this (I'm not sure there aren't more) is duplicate request tracking. Currently requests are ... Josh Durgin
12:42 PM Bug #1835 (Resolved): Monclient crash when keyring is not readable
Thanks, Wido! Sage Weil
12:38 PM Bug #1835: Monclient crash when keyring is not readable
Confirmed, this fixed it for me. Wido den Hollander
12:41 PM rgw Tasks #1824 (Resolved): ceph monitor status should be available and documented
documented in commit:b501efdab9798e30b8bf7990e9a58d076553cd2f
the in-quorum monitors don't keep track of out-of-qu...
Sage Weil
11:53 AM rgw Bug #1859: rgw: bucket creation is not atomic
resolved, as of commit:bd3ccf7f7b8370bd0e8d52f9f74fcccc7e62c902. Yehuda Sadeh
10:48 AM CephFS Bug #1435 (Need More Info): mds: loss of layout policies upon mds restart
Alexandre-
Same situation here. If you can produce a full mds log that includes the creation of the initial layou...
Sage Weil
10:35 AM CephFS Bug #1318: directories disappear across multiple rsyncs
Alexandre, have you seen this recently? We haven't turned it up in our testing.
Could this be the same as #1774?
Sage Weil
10:31 AM Feature #1808: filestore: gracefully handle EMFILE
I'm going to call this a feature. The more pressing problem is to find any fd leaks (#1788), and to limit msgr conne... Sage Weil
10:28 AM Bug #1831 (Resolved): mon: should not accept (and should disconnect) session when not in quorum
commit:13f1debbf054612fbb2c9f4dafbe12c8f937cf14 Sage Weil
10:27 AM Bug #1804 (Closed): filestore: unexpected EINVAL
ok, i'm going to assume this was the mds truncate thing and close it out. we can reopen later if it crops up again! Sage Weil
10:10 AM CephFS Bug #1774 (New): client: files become inaccessible in large directories (with snapshots?)
Sage Weil
10:09 AM CephFS Bug #1774 (Need More Info): client: files become inaccessible in large directories (with snapshots?)
Alexandre-
We're heavily focusing on rados for the next couple of weeks, so I don't have time to try to reproduce ...
Sage Weil
10:07 AM CephFS Bug #1586 (Can't reproduce): failed pjd chmod test 00 on kclient
haven't seen this since... Sage Weil
09:52 AM Bug #1842 (Can't reproduce): osd: failed authorizations leak memory somehow?
I instrumented the rados client to send bad authenticators and hammered ceph-osd, but massif showed no leaks. I thin... Sage Weil
09:43 AM Feature #1798 (Rejected): qa: add rados/librados tests (RadosModel)
Sage Weil
01:27 AM Revision 561f06cf (ceph): suite: make email-on-success the default behavior
This way you can tell when a run is complete, instead of wondering if
it's stuck in the queue.
Josh Durgin

01/05/2012

11:42 PM Revision 435c2944 (ceph): mon: instrument elector so you can stop participating in the quorum
Add new monitor commands "quorum exit" and "quorum enter" to use it.
Signed-off-by: Greg Farnum <gregory.farnum@drea...
Greg Farnum
11:42 PM Revision 14a49433 (ceph): mon: elector needs to reset leader_acked on every election start
Otherwise you never reset the leader_acked after a failed
election attempt, so if mon 0 is available on the first rou...
Greg Farnum
11:41 PM Revision 99e5f85e (ceph): mon: kill client sessions when we're not in quorum
After a timeout of 2*mon_lease length (ie, two election rounds),
kill existing client sessions so they can reconnect ...
Greg Farnum
09:46 PM Revision c83b2a0b (ceph): OCF RA: fix variable name
Florian Haas
09:46 PM Revision b3f8b55d (ceph): debian: build ceph-resource-agents
Florian Haas
06:51 PM Revision 49a96fa7 (ceph): osd: parameterize min/max values for backfill scanning
For local scans, use the optimal value for the local filestore.
For remote scans, make it configurable, so we can co...
Sage Weil
06:46 PM Revision 98881a11 (ceph): admin_socket: refactor
Combine AdminSocketConfigObs with AdminSocket so that we can interact
with it via the cct. Simpler class structure. ...
Sage Weil
06:19 PM Revision b6c43d2a (ceph): rbd: add a command to delete all snapshots of an image
This makes deleting images with many snapshots easier.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
05:39 PM Feature #1879 (New): osd: track list of in-progress requests, log slow ones
Sage Weil
03:17 PM Feature #1879 (Duplicate): osd: track list of in-progress requests, log slow ones
Anonymous
05:38 PM rgw Tasks #1823 (Rejected): radosgw should have internal timeouts
Sage Weil
02:44 PM rgw Tasks #1823: radosgw should have internal timeouts
Sage suggests that this can more properly be detected in the OSD:
- add request to tail list when started
- remove ...
Anonymous
05:32 PM Revision a94b7314 (ceph): admin_socket: whitespace
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:30 PM Revision 96b7b0d8 (ceph): common: default 'mon osd auto mark in = false'
This way an osd that was explicitly marked out will stay out, even when
it is restarted.
Signed-off-by: Sage Weil <s...
Sage Weil
05:26 PM Revision a71a0d36 (ceph): osd: log backfill restart
This is interesting, particularly in determining when a peer that was
partially backfilled needs to be restarted.
Si...
Sage Weil
05:08 PM Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
It looks like there are at least two bugs here: one client side, and one server side. I'm reproducing with more logs ... Josh Durgin
04:46 PM Bug #1831: mon: should not accept (and should disconnect) session when not in quorum
Updated the branch, it now includes monitor commands and instrumentation so you can drop a monitor out of the quorum.... Greg Farnum
02:21 PM Bug #1831 (In Progress): mon: should not accept (and should disconnect) session when not in quorum
A basic stab at this is staggeringly boring — pushed to wip-mon-timeouts. I want to discuss instrumenting Monitor to ... Greg Farnum
02:30 PM RADOS Feature #1894 (New): mon: implement internal heartbeating
Right now the monitor doesn't really detect if it starts breaking. It should (probably using the HeartbeatMap thingy)... Greg Farnum
02:03 PM Bug #1893 (Rejected): crushtool can't decompile crushmap
This is just the cephtool writing excess information to stdout. Josh Durgin
01:56 PM Bug #1893 (Rejected): crushtool can't decompile crushmap
This happens on 0.39 and master.... Josh Durgin
02:03 PM Bug #1892 (Rejected): osdmaptool can't decode osdmap
This is just the cephtool writing excess information to stdout. Josh Durgin
01:55 PM Bug #1892 (Rejected): osdmaptool can't decode osdmap
This happens on 0.39 and master.... Josh Durgin
12:58 PM Bug #1891: monclient: try ipv6 if ipv4 fails
I wonder if changing MonClient::build_initial_monmap() to put both the A and AAAA records in the search pool is suffi... Sage Weil
11:30 AM Bug #1891 (Resolved): monclient: try ipv6 if ipv4 fails
When a hostname is specified, and it has an A and AAAA record, only the ipv4 address is tried.
If this fails, the...
Josh Durgin
10:50 AM Bug #1771 (Resolved): rbd: delete snapshots when image is deleted
Made image deletion fail when snapshots are present (commit:bd2339102f0c650d87203fdc2336f9533a18b755), and added a co... Josh Durgin
10:08 AM Bug #1832: osd: size tracking discrepancy (scrub stat mismatch)
This happened in the other direction with python rbd tests:... Josh Durgin
10:04 AM Bug #1758 (Need More Info): OSD segfault in SimpleMessenger::send_message
Sage Weil
09:10 AM Feature #1890 (Resolved): log: async log writeout
dump logs asynchronously (to a file, syslog, or whatever other sink)
make log level and emit level independently a...
Sage Weil
09:09 AM Feature #1889 (Resolved): log: structure log records
break out some structure from the log entries. for starters,
- threadid
- timestamp
- log level
- subsystem
...
Sage Weil
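As a rough illustration of what Feature #1889 asks for, the sketch below splits a log entry into the four fields listed (thread id, timestamp, log level, subsystem) plus the message. It is written in Python rather than Ceph's C++, and the names `LogRecord` and `format_record` are hypothetical; this is not Ceph's actual logging code.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogRecord:
    """One structured log entry (hypothetical layout, not Ceph's)."""
    subsystem: str
    level: int
    message: str
    # Captured automatically when the record is created unless given explicitly.
    thread_id: int = field(default_factory=threading.get_ident)
    timestamp: float = field(default_factory=time.time)


def format_record(rec: LogRecord) -> str:
    """Render a record as one text line; the fields remain separately accessible."""
    return (
        f"{rec.timestamp:.6f} {rec.thread_id:x} "
        f"{rec.level:2d} {rec.subsystem}: {rec.message}"
    )
```

Keeping the fields separate until the last moment is what would let a writer filter by level or subsystem without parsing text.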
09:07 AM Feature #1888 (Rejected): log: per-thread ring buffer
use per thread buffer for log messages to reduce lock contention by logging code Sage Weil
08:47 AM Feature #1887 (Resolved): mon: ability to specify pg_num for new pools
Sage Weil
01:46 AM Revision bd233910 (ceph): librbd: don't remove an image that still has snapshots
Return -EBUSY instead. After the header is removed, the snapshots
can't be removed or read, so make sure they're gone...
Josh Durgin
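The behaviour described in revision bd233910 can be sketched as follows. This is a minimal Python model, not librbd: the `images` dictionary and the `remove_image` function are hypothetical, and the -ENOENT case is an assumption added for completeness.

```python
import errno


def remove_image(images: dict, name: str) -> int:
    """Remove `name` from `images` (image name -> list of snapshot names).

    Returns 0 on success, -ENOENT if the image is unknown (assumed here),
    and -EBUSY if the image still has snapshots.
    """
    if name not in images:
        return -errno.ENOENT
    if images[name]:
        # Snapshots remain: refuse, so they are not orphaned by removing the header.
        return -errno.EBUSY
    del images[name]
    return 0
```

The caller is expected to delete the snapshots first and then retry the removal.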
01:34 AM Revision 4728f4f8 (ceph): SimpleMessenger: clarify when ms_bind_ipv6 is used
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
01:11 AM Revision c8a13f2b (ceph): qa: fix mdstable script for proper injectargs use.
This script is fairly primitive, but somebody might find it useful...
Signed-off-by: Greg Farnum <gregory.farnum@dre...
Greg Farnum
01:11 AM Revision 4ea8ad43 (ceph): qa: add a slightly more stressful anchortable test
This creates more than 8 links.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum
12:38 AM Revision 3935551d (ceph): Merge remote branch 'gh/master' into wip-backfill
Sage Weil
12:25 AM Revision 61c3a918 (ceph): rados: fix run-length option parsing for rados load-gen
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil

01/04/2012

11:57 PM Revision 857b243b (ceph): osdmap: include state names in dump()
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
11:57 PM Revision b8be3382 (ceph): mon: rev cluster protocol
The OSDMap NEW and AUTOOUT bit additions subtly change the decoding of
the incremental maps in a reasonably harmless...
Sage Weil
11:57 PM Revision 9986553c (ceph): msgr: explicitly specify internal cluster protocol
Replace case statement based on my_type.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
11:57 PM Revision 0fb88b6f (ceph): move cluster protocol definitions out of ceph_fs.h
Among other things, we don't recompile the whole system when we touch
these.
Signed-off-by: Sage Weil <sage@newdream...
Sage Weil
11:57 PM Revision 9510f9c9 (ceph): mon: track auto-marked-out osds
Mark OSDs that were automatically marked OUT by the monitor because they
were down for too long. Clear the bit as so...
Sage Weil
11:57 PM Revision fcb87701 (ceph): mon: independently control whether AUTOOUT OSDs are marked in on boot
Add separate config option to control whether the monitor will mark
AUTOOUT OSDs in on boot.
Signed-off-by: Sage Wei...
Sage Weil
11:57 PM Revision af535077 (ceph): mon: maintain CEPH_OSD_NEW bit for new, unused OSDs
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
11:57 PM Revision 80d90109 (ceph): mon: separately control auto-mark-in of new OSDs
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
11:56 PM Revision adea084a (ceph): osd: mark degraded only when < desired replica count
Having extra replicas is not 'degraded' per se. Although it's weird that
we ever do that!
Signed-off-by: Sage Weil ...
Sage Weil
11:56 PM Revision becb71b0 (ceph): osd: don't add all strays in calc_acting()
We weren't counting up usable strays, which meant we added all of them.
This could result in acting sets with more ac...
Sage Weil
11:56 PM Revision d3395335 (ceph): osd: fix backfill reset on activate
Look at peer's info, not our own!
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
11:56 PM Revision 3166b373 (ceph): osd: avoid querying missing set from (full) backfill target
If we are doing a complete backfill, we don't care about missing; it will
clearly all be below last_backfill anyway a...
Sage Weil
11:53 PM Revision 6a918feb (ceph): Merge pull request #8 from kylemarsh/master
Remove cloudfiles requirement from obsync. Sage Weil
10:21 PM Revision 8658f0d5 (ceph): qa: load-gen-mix-small-long
30 minutes
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:14 PM Revision 5cebc740 (ceph): obsync: make obsync run without cloudfiles installed
Cloudfiles probably shouldn't be a requirement for running obsync, so this
commit makes it optional.
Kyle Marsh
09:59 PM Revision 4bcdb37c (ceph): osd: do not use incomplete peer for best info/log
For one, their stats are incomplete; if we use them we'll screw up everyone
else. For another, it doesn't do us any ...
Sage Weil
09:59 PM Revision dcd1ebab (ceph): osd: initialize backfill_pos on activate
Handling of writes depends on backfill_pos being initialized (to know what
is between the leading and trailing edge o...
Sage Weil
09:59 PM Revision 6a1cac92 (ceph): osd: initialize backfill_target; include in PG operator<<
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:46 PM Revision a5d99add (ceph): osd: fix misdirect check for requests with old epochs
get_map() assumes the epoch passed is valid. Check here in the caller.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:44 PM Revision 172dd3e0 (ceph): osd: check that we're supposed to be getting a PG before waitlisting re...
This was broken in fa722de6708d3e92037df6289cc29ece12c8ea66.
Fix it by checking if the mapping was correct in the se...
Sage Weil
06:40 PM Revision 54f36f0b (ceph): rados: gracefully report errors from 'ls'
Catch the exception thrown by the iterator when the OSD returns errors.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:38 PM Revision ed9a4a09 (ceph): osd: return EINVAL on bad PGLS[_FILTER] handle
Fixes: #1875
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:25 PM CephFS Bug #1047: mds: crash on anchor table query
After looking into this a lot today, none of my ideas panned out. I also tried a few simple tests to reproduce locall... Greg Farnum
04:50 PM Bug #1858 (Resolved): OSD needs to check for misdirected ops before putting non-existent PGs on hold
Sage merged a modified form of this in commit: 172dd3e0e40ead65349904e604962eb0724bc230. Greg Farnum
04:35 PM Cleanup #1886 (Resolved): objecter/osd: mux/demux in MOSDOpReply encoding
- osd: return read results via OSDOp, not odata, in do_osd_ops()
- MOSDOpReply: mux/demux based on payload_len
- ob...
Sage Weil
03:37 PM Feature #1885 (Resolved): identify top 10 expected failures and process to diagnose
- peering failures
- unfound objects
Sage Weil
03:36 PM Feature #1884 (Resolved): plan encoding strategy to test+facilitate non-disruptive upgrades
- best practices for encoding structures, like
- single encoding version vs compat+incompat version value
- pr...
Sage Weil
03:35 PM Feature #1883 (Resolved): admin socket: string based protocol
switch admin socket from a u32 based binary protocol to a string based one. e.g., commands like "perfcounter_dump\n"... Sage Weil
03:33 PM rgw Feature #1882 (Resolved): rgw: high-level log entries for request state transitions
log request start and transition through high-level stages (start, authenticate, read, write, etc.)
probably just ...
Sage Weil
03:32 PM Feature #1881 (Resolved): objecter: expose in-progress request state via admin socket
an admin socket command that will dump current in-progress requests, similar to cat /sys/kernel/debug/ceph/*/osdc Sage Weil
03:31 PM Feature #1880 (Rejected): osd: optionally log all request latencies
start/stop+dump via admin socket?
need something like this to evaluate distribution of latencies (e.g. 99 percenti...
Sage Weil
03:31 PM Feature #1879 (Resolved): osd: track list of in-progress requests, log slow ones
- add request to tail list when started
- remove when complete
- periodically scan start of list and log slow reque...
Sage Weil
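The three steps listed for Feature #1879 can be sketched as below. This is an illustrative Python model, not the OSD implementation; `RequestTracker`, its methods, and the `slow_after` threshold are hypothetical names, and request ids are assumed to be unique.

```python
import time
from collections import OrderedDict


class RequestTracker:
    """Tracks in-flight requests in start order (hypothetical sketch)."""

    def __init__(self, slow_after: float, clock=time.monotonic):
        self.slow_after = slow_after
        self.clock = clock
        self._inflight = OrderedDict()  # request id -> start time, oldest first

    def start(self, req_id) -> None:
        # New requests go on the tail of the list.
        self._inflight[req_id] = self.clock()

    def complete(self, req_id) -> None:
        # Remove when complete; unknown ids are ignored.
        self._inflight.pop(req_id, None)

    def slow_requests(self) -> list:
        """Scan from the head; return (id, age) for requests at least slow_after old."""
        now = self.clock()
        slow = []
        for req_id, started in self._inflight.items():
            age = now - started
            if age < self.slow_after:
                break  # entries are in start order, so the rest are younger
            slow.append((req_id, age))
        return slow
```

Because the list is ordered by start time, the periodic scan can stop at the first request that is not yet slow instead of walking every entry.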
11:32 AM Feature #1422: libvirt: rbd storage pool
I've done some work on this, see: http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/4846 Wido den Hollander
10:48 AM Bug #1875 (Resolved): osd: ReplicatedPG::do_op
commit:ed9a4a0996a9879bb2798a0771b263312f5d88fc Sage Weil
09:55 AM Bug #1875: osd: ReplicatedPG::do_op
The PGLS iterator handle format was recently changed, and this crashed while decoding it. My guess is an old binary ... Sage Weil
05:35 AM Bug #1875 (Resolved): osd: ReplicatedPG::do_op
I just noticed two OSDs (osd.11 and osd.20) go down in my cluster.
The backtrace of both OSDs:...
Wido den Hollander
10:47 AM Bug #1831: mon: should not accept (and should disconnect) session when not in quorum
Last night a test hit this. The MDS got stuck connected to an out-of-quorum mon, and never stopped being laggy. Josh Durgin
10:33 AM CephFS Bug #1874: Running `git gc` on a bare git repository hosted by ceph results in a bus error.
Hi,
Drat, I was hoping this would be a simple-to-reproduce case. Never mind, here are some more details:
Kerne...
David McBride
10:06 AM CephFS Bug #1874 (Need More Info): Running `git gc` on a bare git repository hosted by ceph results in a...
Sage Weil
10:06 AM CephFS Bug #1874: Running `git gc` on a bare git repository hosted by ceph results in a bus error.
Which version of the kernel client and server are you running?
Can you attach an strace -f of the 'git gc' run s...
Sage Weil
03:27 AM CephFS Bug #1874 (Can't reproduce): Running `git gc` on a bare git repository hosted by ceph results in ...
When @git gc@ is run on a bare git repository hosted by a local test ceph filesystem mounted via the kernel client, i... David McBride
10:19 AM CephFS Bug #1878 (Resolved): ceph.ko doesn't setattr (lchown, utimes) on symlinks
rsync -a /my/symlink/pool/ /mnt/ceph.ko/pool/ silently fails to set times and ownership of symlinks, whereas the same... Alexandre Oliva
10:02 AM CephFS Bug #1877 (Can't reproduce): ceph.ko (3.1.6) oopses upon cephfs set_layout of a symlink to a dir
I moved elsewhere a directory that had a layout policy set. Later on, next time the mds lost directory policy inform... Alexandre Oliva
09:57 AM Cleanup #1876 (Resolved): osd: EINVAL on all client command decoding errors
There are various other places in the osd (besides #1875) where we decode data that is provided as part of the user c... Sage Weil
01:28 AM Revision a97aca74 (ceph): rados.py: use uint64_t for auids
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
01:28 AM Revision 2f1720d0 (ceph): librados: return int64_t pool ids
468e28ee60ee2fe625d2680c792a4bcb9ef19951 missed the get_id() functions.
Signed-off-by: Josh Durgin <josh.durgin@drea...
Josh Durgin

01/03/2012

10:08 PM Revision 8e56e99c (ceph): radosgw-admin: add eol following info
Yehuda Sadeh
10:07 PM Revision ec3a3a96 (ceph): rados: fix example config
Josh Durgin
09:55 PM Revision 71d5bcbb (ceph): Adjust rados model workloads for new config format
Josh Durgin
09:10 PM Revision ec6530df (ceph): RadosModel: make object write ranges configurable
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
09:10 PM Revision f03e770f (ceph): RadosModel: allow TestOps to pass data to their finish methods
This will allow nested writes to keep track of which write actually
completed. Also remove finish() and _finish() fr...
Josh Durgin
09:10 PM Revision 91124051 (ceph): RadosModel: check for out of order replies within WriteOps
A single WriteOp already does multiple aio_writes. Each aio_write
gets a unique tid that is checked upon completion. ...
Josh Durgin
09:10 PM Revision 0e470c50 (ceph): testrados: replace testreadwrite and testsnaps with testrados
testrados can act as testreadwrite or testsnaps by changing the
command line options for the weight of each operation...
Josh Durgin
09:02 PM Revision 0176c9ab (ceph): Remove unused mon.0 variables.
Josh Durgin
09:02 PM Revision cdd5c456 (ceph): nuke-on-error: only unlock if this run locked the machines
Josh Durgin
09:02 PM Revision 2e9b1c75 (ceph): rados: use testrados instead of testsnaps and testreadwrite
Josh Durgin
07:37 PM Revision a66d90e3 (ceph): osd: add a monitor timeout via MPGStatsAck messages
Keep track of when we have outstanding updates, and while we do, make
sure the monitor responds within a timeout (def...
Greg Farnum
05:55 PM Feature #1863 (Resolved): qa: tester for osd op reply order
Josh Durgin
05:13 PM Bug #1873 (Won't Fix): crush_rule type is inconsistent
Here's a table of crush_rule's type in various places:... Josh Durgin
04:49 PM rgw Feature #1872: rgw: only use shadow objects for large objects
Once a race has been detected, operation needs to be restarted (unless we already have all requested data). Yehuda Sadeh
04:29 PM rgw Feature #1872: rgw: only use shadow objects for large objects
This will also require being careful to check both current and new sizes — the new object might be < chunk size while... Greg Farnum
04:23 PM rgw Feature #1872 (Resolved): rgw: only use shadow objects for large objects
Currently we use shadow objects for every write that overwrites an object. We can avoid that by assuming that objects... Yehuda Sadeh
04:07 PM Feature #1836: filejournal: use async directio to write to the journal
Presumably AIO writes can be combined or reordered by the block device/interfaces, right? So having a bunch of them i... Greg Farnum
09:27 AM Feature #1836: filejournal: use async directio to write to the journal
Say we have a cap of N aio ops, to prevent a stream of small ops resulting in a stream of tiny aio writes to the jour... Sage Weil
04:04 PM Revision f4b0cda1 (ceph): Fix invalid docdir_SCRIPTS usage with >=automake-1.11.2
Alphat-PC
04:04 PM Bug #1865 (Duplicate): mon: need to disconnect clients when we drop out of quorum
Adding active ping requirements to the monitors is contrary to the direction we want to take them with clients, thoug... Greg Farnum
04:02 PM Bug #1841 (Resolved): OSDs should disconnect from Monitor before their MOSDPGStat timeouts happen
Sage merged this into master. Greg Farnum
03:43 PM CephFS Bug #1047: mds: crash on anchor table query
This log is less illuminating than I'd hoped since it's just replay commits. :(
However, it did give me one idea t...
Greg Farnum
02:33 PM Linux kernel client Bug #1866 (Duplicate): null pointer dereference after osd went down
same as #1793 Sage Weil
01:10 PM Linux kernel client Feature #1870: usedcache mount option
maybe call it use_dcache? usedcache is easy to misread as 'used cache'. Josh Durgin
11:14 AM Linux kernel client Feature #1870 (Resolved): usedcache mount option
Sage Weil
12:16 PM rgw Bug #1867 (Resolved): rgw crash
should be fixed by commit:68ec8d8ee900642cdb594c67b7d2c416d55bec80. Yehuda Sadeh
11:46 AM Linux kernel client Bug #1871 (Resolved): ceph_client: crash after running xfstests 002
Running xfstests under UML against a ceph filesystem, I get a
client crash due to dereferencing a null pointer in ce...
Alex Elder
11:00 AM Linux kernel client Bug #1767 (Resolved): osd_client: send_request() cannot fail
commit:24e08cacc999503a069003364565116c923342b9 Sage Weil
10:53 AM Linux kernel client Bug #1762 (Resolved): i_lock vs s_cap_lock vs inodes_with_caps_lock lock ordering
merged into mainline for 3.2 Sage Weil
10:47 AM Linux kernel client Bug #1652 (Resolved): rbd: rollback correctly after resizing
we removed rollback functionality from the kernel. Sage Weil
10:42 AM Linux kernel client Bug #1812 (Resolved): iput scheduling while atomic
Sage Weil
10:42 AM Linux kernel client Bug #1812: iput scheduling while atomic
fixed by commit:aab26905f6a1df8e6a14f41a32a93b9af0c8b7c5 Sage Weil
09:16 AM Bug #1868 (Resolved): Ceph client crashed after shutting down one mds and osd
This bug was fixed by commit:935b639a049053d0ccbcf7422f2f9cd221642f58 in v3.1.
You should have better luck with th...
Sage Weil
08:37 AM Bug #1869 (Resolved): automake fails with >=automake-1.11.2 due to "docdir" usage
applied, commit:f4b0cda17875c27d8b945be6cf5db9b356bb2dab
Thanks!
Sage Weil
07:40 AM Bug #1869 (Resolved): automake fails with >=automake-1.11.2 due to "docdir" usage
doc_SCRIPTS is no longer allowed in >=automake-1.11.2 [1], as a result ceph fails to configure:
configure.ac:40: A...
Anonymous
08:34 AM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
David McBride wrote:
> Unfortunately, my Ceph cluster which was presenting these symptoms has now died -- the remain...
Sage Weil
07:45 AM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
Unfortunately, my Ceph cluster which was presenting these symptoms has now died -- the remaining set of OSDs didn't h... David McBride

01/02/2012

08:01 PM CephFS Bug #1682: mds: segfault in CInode::authority
This probably isn't all that useful for anyone who knows the code well, but I threw together a quick run down of plac... Mark Nelson
02:24 PM Bug #1868 (Resolved): Ceph client crashed after shutting down one mds and osd
Here is my cluster configuration before shutting down ceph components on n2cc (2.2.2.2).... Maciej Galkiewicz
05:47 AM CephFS Bug #1047: mds: crash on anchor table query
Here is what MDS logs with debug 20. No idea if it really helps. The cluster
is still in the broken state, should I...
Amon Ott

12/31/2011

11:09 PM Revision f8929bad (ceph): osd: trigger RecoveryFinished event on recovery completion
Unconditionally trigger the RecoveryFinished event when start_recovery_ops
thinks it may be done. This lets us trigg...
Sage Weil
02:49 PM CephFS Bug #1774: client: files become inaccessible in large directories (with snapshots?)
Oh, interesting! With a debug client = 20, debug ms = 1 log from ceph-fuse this should be pretty straightforward to ... Sage Weil
01:04 AM Revision a1252463 (ceph): librados: take lock in rollback
We're poking through the osdmap; need to hold the lock here.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
01:04 AM Revision 4c23e9e4 (ceph): objecter: assert lock held in op_submit
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
01:04 AM Revision 68ec8d8e (ceph): librados: call aio_operate() with lock held
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
12:45 AM Revision 0692bed8 (ceph): cmp: fix 5-uple operator==
Doh!
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
12:45 AM Revision 1c754187 (ceph): osd: be a bit more verbose during backfill
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil

12/30/2011

11:51 PM Revision 3dcaf6c3 (ceph): osd: do not backfill if any objects are missing on the primary
Someday we need to do something smarter so that a single unfound object
doesn't hold up replication of other objects....
Sage Weil
10:37 PM Revision cdf142b5 (ceph): rados: fix documentation format
Josh Durgin
10:37 PM Revision 6df4ce50 (ceph): rados: fix references to testrados
Josh Durgin
10:37 PM Revision 0af9c0a2 (ceph): rados: clean up argument construction
Only the client id varies, so it can be done outside the loop. Also
handle coredumps and coverage, and use LD_LIBRARY...
Josh Durgin
10:37 PM Revision 932257fb (ceph): rados: remove unused variable
Josh Durgin
10:37 PM Revision 2f71f03f (ceph): misc: simplify reconnect logic
Ignore all errors until the timeout expires so we don't have to worry
about whitelisting them.
Josh Durgin
10:31 PM Revision 949f24d5 (ceph): rgw: create default constructors for some structs
this will silence valgrind a bit Yehuda Sadeh
08:23 PM Revision 251fc3d5 (ceph): osd: handle backfill_target for pick_newest_available
It may not be missing on the backfill_target if it is after the
last_backfill marker.
Signed-off-by: Sage Weil <...
Sage Weil
08:19 PM Revision a3525891 (ceph): osd: return EINVAL if multi op specified with no src object name
This avoids crashing later in do_osd_ops() with something like
osd/ReplicatedPG.cc: In function 'int ReplicatedPG::d...
Sage Weil
07:39 PM Revision e686c1b6 (ceph): hobject_t: fix operator==, !=
These weren't comparing key.
While we're at it, clean this up by using generic macros for writing
these operators, s...
Sage Weil
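The bug class fixed in revision e686c1b6 (a comparison operator that forgets a field) can be avoided by deriving every operator from one field tuple. The sketch below shows that idea in Python; it is not the cmp.h macros, and `ObjectId` and its field names other than `key` are hypothetical.

```python
from functools import total_ordering


@total_ordering
class ObjectId:
    """Hypothetical object identifier; field names are illustrative only."""

    def __init__(self, name: str, key: str, snap: int, hash_: int):
        self.name, self.key, self.snap, self.hash_ = name, key, snap, hash_

    def _fields(self) -> tuple:
        # Single source of truth: every comparison sees every field, key included.
        return (self.hash_, self.name, self.key, self.snap)

    def __eq__(self, other):
        if not isinstance(other, ObjectId):
            return NotImplemented
        return self._fields() == other._fields()

    def __lt__(self, other):
        if not isinstance(other, ObjectId):
            return NotImplemented
        return self._fields() < other._fields()

    def __hash__(self):
        return hash(self._fields())
```

Adding a field then means touching `_fields` once, and equality, ordering, and hashing all pick it up.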
07:38 PM Revision 063ab2e4 (ceph): cmp.h: define macros for creating comparison operators
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
06:43 PM Revision 7969cc4f (ceph): Merge remote branch 'gh/master' into wip-backfill
Sage Weil
06:32 PM Revision 6687ccf5 (ceph): workunits: update rbd test for new error format
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
05:50 PM Revision 85719b0e (ceph): config: use autoconf $libdir for default rados class dir
Fixes: #1722
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:17 PM Revision 0d9507c2 (ceph): .gitignore: src/ocf/ceph
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:00 PM Revision 9b6422db (ceph): Spec: conditionally build ceph-resource-agents package
Put OCF resource agents in a separate subpackage,
to be enabled with a separate build conditional
(--with ocf).
Make...
Florian Haas
05:00 PM Revision 92cfad42 (ceph): Add OCF-compliant resource agent for Ceph daemons
Add a wrapper around the ceph init script that makes
MDS, OSD and MON configurable as Open Cluster Framework
(OCF) co...
Florian Haas
04:31 PM rgw Bug #1867: rgw crash
This is the same assert as #1737. It may not be related, tho.. the bug may just be unlocked concurrent access to the... Sage Weil
04:16 PM rgw Bug #1867 (Resolved): rgw crash
logs were turned off, so not much of that, but here's the backtrace. Happened on 7fc97e6 during a performance test (r... Yehuda Sadeh
04:06 PM Revision 66170633 (ceph): mon: fix full ratio updates
- update them independently
- only if we are leader
- fix type for nearfull_ratio
Signed-off-by: Sage Weil <sage@new...
Sage Weil
04:06 PM Revision f2e41097 (ceph): mon: don't ignore first full ratio update callback
We get a callback on startup. Don't ignore it.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
03:45 PM Revision a693438e (ceph): mon: only update full_ratio if we're the leader
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
03:42 PM Revision 47d02271 (ceph): Merge remote branch 'gh/wip-cleanup'
Sage Weil
03:28 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
looks like a null pointer dereference.. look for a struct member that's 0x48 bytes in? Sage Weil
02:59 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
This happened again with the same workload on sepia81. Josh Durgin
10:35 AM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
hit this again, nightly_coverage_2011-12-29-b/5388... Sage Weil
09:59 AM Bug #1722 (Resolved): osd_class_dir must reflect autoconf libdir
fixed by commit:85719b0ed81a38c3bd36c6be411f29181c969cda. duh Sage Weil
08:47 AM Bug #1865: mon: need to disconnect clients when we drop out of quorum
the ceph-mon is deadlocked by... Sage Weil
08:44 AM Bug #1865: mon: need to disconnect clients when we drop out of quorum
the kernel client is repeatedly reconnecting to a down osd and resending its requests because its osdmap is out of date... Sage Weil
06:52 AM Bug #1862: filestore: EINVAL on replay
Hello Sage,
int64_t does the trick and now the OSDs are back online again!
Thank you
Marco
Marco Aroldi
01:15 AM Revision df84594f (ceph): mon: make full ratio config change callback safe
We can't propose_pending() from any context; do this in the tick() thread,
with the proper locking. Among other thin...
Sage Weil

12/29/2011

11:43 PM Revision 585fb5ce (ceph): clitests: update for new error format
This was changed in 1f434da8a3ca4db830d1f3b0d87e5df941d85f2d
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
11:28 PM Revision cec2692e (ceph): clitests: update monmaptool test
e93961c11119942eae3a4cd14a79f779a5a4d277 changed output format.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
09:09 PM Revision f04e2955 (ceph): teuthology rgw-admin: annotated test cases for inventory
this is not a nose suite, so I simply added test case
descriptions in csv format, and put a file to extract
the...
Mark Kampe
08:00 PM Revision 48df71c8 (ceph): init script: be LSB compliant for exit code on status
An exit code of 1 on status is defined in LSB as
"program is dead, but pid file exists". Check for existence
of this ...
Florian Haas
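For context on revision 48df71c8, the sketch below maps daemon state to the LSB `status` exit codes. It is a Python illustration, not the ceph init script (which is a shell script); the function name and its two boolean inputs are hypothetical.

```python
# Exit codes for the `status` action as defined by the LSB init script spec.
LSB_RUNNING = 0        # program is running
LSB_DEAD_PIDFILE = 1   # program is dead, but pid file exists
LSB_NOT_RUNNING = 3    # program is not running


def status_exit_code(pidfile_exists: bool, process_alive: bool) -> int:
    """Pick the LSB status code from the pid file and process state."""
    if process_alive:
        return LSB_RUNNING
    if pidfile_exists:
        return LSB_DEAD_PIDFILE
    return LSB_NOT_RUNNING
```

Distinguishing 1 from 3 matters to cluster managers, which treat a dead daemon with a stale pid file differently from a daemon that was stopped cleanly.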
07:58 PM Revision 3b2ca7cf (ceph): keyring: print more useful errors to log/err
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
07:57 PM Revision eba235f2 (ceph): common: trigger all observers on startup
Among other things, this makes err-to-stderr and friends initialize
properly in the DoutStreamBuf.
Signed-off-by: Sa...
Sage Weil
07:24 PM Revision 1f434da8 (ceph): common: make cpp_strerror output prettier
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
07:24 PM Revision 04c8db00 (ceph): librados: check for monclient::init() error
I think this fixes #1835.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:59 PM Revision 1a59405c (ceph): rgw: turn on cache by default
Yehuda Sadeh
05:59 PM Revision 37013b6f (ceph): qa: load-gen-mix-small.sh
Sage Weil
05:41 PM Revision 959fd71f (ceph): osd: explicitly track leading edge of backfill
backfill_pos is the leading edge; last_backfill is the trailing edge.
Anything in between is either pushed, doesn't ex...
Sage Weil
05:31 PM CephFS Bug #1682: mds: segfault in CInode::authority
Not sure if this is the same underlying problem, but here's another CInode::authority crash from teuthology:~teuthwor... Josh Durgin
05:09 PM Revision d24ea235 (ceph): mds: assert if we get an EINVAL on our truncate
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:00 PM Revision 47013c28 (ceph): osd: get fsid from monmap, not osdmap
We may not have a valid OSDMap in all of these cases (notably, during
boot). Always take the fsid from the monmap, w...
Sage Weil
04:59 PM Revision 05cc4eb9 (ceph): monc: get latest monmap during authentication
Tell the monitor which monmap version we have in our initial auth message.
Make the monitor send the latest monmap if...
Sage Weil
04:44 PM Revision 5d5c9b6f (ceph): osdmap: add const markers to some unfixed functions
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Greg Farnum
04:44 PM Revision 300c7584 (ceph): osd: catch authenticate error on startup
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
03:30 PM Linux kernel client Bug #1866 (Duplicate): null pointer dereference after osd went down
This was during a kernel_untar_build workunit on rbd:... Josh Durgin
12:15 PM Bug #1835: Monclient crash when keyring is not readable
should be fixed by commit:04c8db001a4ed02ef7335ed01ce73ce9ab28dc9d .. can you verify, Wido? Sage Weil
11:16 AM Feature #1863 (In Progress): qa: tester for osd op reply order
Josh Durgin
11:14 AM Bug #1865 (Duplicate): mon: need to disconnect clients when we drop out of quorum
From sepia4:/tmp/cephtest/archive/log/osd.0.log:... Josh Durgin
10:59 AM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
Happened again in teuthology:~teuthworker/archive/nightly_coverage_2011-12-29-a/5318/remote/ubuntu@sepia60.ceph.dream... Josh Durgin
10:33 AM Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
Happened again in teuthology:~teuthworker/archive/nightly_coverage_2011-12-28-b/5258/teuthology.log. Josh Durgin
10:04 AM Bug #1862: filestore: EINVAL on replay
Marco Aroldi wrote:
> Hmmm
> I have another problem: I've tried the patch in #1759 but I have an error at compile ti...
Sage Weil
10:01 AM Bug #1862: filestore: EINVAL on replay
Hmmm
I have another problem: I've tried the patch in #1759 but I have an error at compile time:
CXX libos_l...
Marco Aroldi
08:33 AM Bug #1862 (Duplicate): filestore: EINVAL on replay
Aha, this is actually #1759. If you apply the patch in that bug report it'll get your OSDs up and running again. Th... Sage Weil
03:33 AM Bug #1862: filestore: EINVAL on replay
Hi,
I've downloaded and compiled the latest code from the git repository.
I've issued a "ceph-osd -i 1 --debug_ms 2...
Marco Aroldi
09:53 AM Bug #1804: filestore: unexpected EINVAL
My money is that this is caused by #1759. Which hopefully means that the qa suite will eventually trigger the new as... Sage Weil
09:17 AM Bug #1741 (Can't reproduce): teuthology: failed to untar
Sage Weil
09:16 AM Bug #1759 (Need More Info): mds/client: truncate size overflow, fails with EINVAL
The OSD now returns EINVAL, the MDS asserts if it gets EINVAL, and we have some MDS-side assertions that should catch... Sage Weil
09:10 AM Bug #1846 (Resolved): Mds crash immediately after start (segmentation fault)
Great! Sage Weil
06:05 AM Bug #1846: Mds crash immediately after start (segmentation fault)
I have built debian package from master branch and upgraded ceph on both servers. Mds and osd started properly. Thank... Maciej Galkiewicz
09:08 AM Bug #1848 (Resolved): osd got zeroed out fsid
Sage Weil
09:08 AM Bug #1848: osd got zeroed out fsid
fixed by commit:47013c289e6ad6638b0f77152dafbc9f4723c032 and commit:05cc4eb93ce6d193c6aea4918144006fb4d1c187 Sage Weil
01:00 AM Revision e18b1c97 (ceph): rgw: removing swift user index when removing user
Yehuda Sadeh
12:50 AM Revision 997e35ae (ceph): rgw-admin: remove subuser index when required
Yehuda Sadeh
12:42 AM Revision 1f40031f (ceph): osd: fix push completion check
Only check backfill if we pushed to the backfill target. And avoid the hash
lookup in the general case.
Signed-off-...
Sage Weil
12:34 AM Revision 2dc90d03 (ceph): rgw: clone operation should only update index for main category
Yehuda Sadeh
12:33 AM Revision bb52b187 (ceph): rgw: fix cache interface (was not overloading method)
Yehuda Sadeh
 
