Activity

From 12/10/2011 to 01/08/2012

01/08/2012

11:15 PM Revision a59ee8f9 (ceph): osd: handle case where no acceptable info exists
This happens when the only available replicas have last_backfill != MAX.
In that case, revert to up, and then set the...
Sage Weil
10:39 PM Revision b354ce4e (ceph): run: put pid in archive dir
This will make it easy for teuthology-ls to show you the running process's
pid (if it's still running). Or for other...
Sage Weil
06:15 PM Revision c5affd6c (ceph): Merge remote branch 'gh/wip-osd-retry-attempt'
Sage Weil
04:16 PM Revision 5abe49d6 (ceph): Merge remote branch 'gh/wip-admin-socket'
Sage Weil
03:27 PM Bug #1904 (Resolved): osd: calc_acting bad iterator deref
commit:a59ee8f91bc879beb614ac10fa2f9a4a284ebfb6 Sage Weil
08:39 AM Bug #1904 (Resolved): osd: calc_acting bad iterator deref
... Sage Weil
10:46 AM Bug #1905 (Resolved): osd: preserve pg log when resetting backfill
If we use the pg log to detect dups, we need to preserve some pg log history when we backfill. Sage Weil
10:45 AM Feature #1883 (Resolved): admin socket: string based protocol
Sage Weil
08:36 AM Bug #1896: osd: FAILED assert(is_up(osd)) from do_queries() during activate_map()
again,... Sage Weil

01/07/2012

06:16 PM Revision fbf79121 (ceph): do not put monitors on the same nodes as clients
Otherwise, for kernel clients (rbd or kclient), ceph-mon can cause a deadlock when it calls sync(2). Sage Weil
10:22 AM Bug #1903 (Resolved): osd/ReplicatedPG.cc: 3189: FAILED assert(obc->unconnected_watchers.size() =...
... Sage Weil
10:21 AM CephFS Bug #1902 (Won't Fix): mds: unittest_interval_tree bad memory access
after a segfault:... Sage Weil
07:25 AM Bug #1901 (Resolved): Missing files in ceph packages results in build failure of tests
make dist in src/Makefile.am is missing several header files that are used by tests; as a result, ceph fails to build. Anonymous
07:23 AM Bug #1900 (Resolved): Fix detection and build issues with libcrypto++
Currently ceph's build system uses AC_SEARCH_LIBS (in some cases) to search for libcrypto++. As a result, C++ library ... Anonymous
07:18 AM Cleanup #1899 (Resolved): use acx_pthread instead of hardcoding libs and cflags into build system
Hardcoding -lpthread is not the most portable way of including thread support. Please use upstream macros instead.
...
Anonymous
04:54 AM Revision 92ca3ef7 (ceph): perfcounters: fix unittest for new admin_socket interface
Broken by b389685afa1be00b5147855bf71c50042bfbfa6c.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
04:39 AM Revision d8e54994 (ceph): Makefile: disable unittest_interval_tree
Segfaults. Valgrind errors. Accessing uninitialized memory.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
04:38 AM Revision bcf21467 (ceph): unittest_interval_tree: make it compile
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
01:21 AM Revision 13445d23 (ceph): ceph_manager: a booting osd is no longer automatically marked in
as of ceph.git commit 96b7b0d83e5fe70a4efb4e284e18b4b40840bfec Sage Weil
01:18 AM Revision a774d500 (ceph): osd: clean up src_oid, src_obc map key calculation
Be consistent about how we generate the src_oid and src_oloc, so that we
feed good values into find_object_context and...
Sage Weil
12:55 AM Revision 3c60e804 (ceph): osd: read op should claim_append data instead of claim
Yehuda Sadeh
12:53 AM Revision 0d175cd6 (ceph): rgw: remove object before writing both xattrs and data
otherwise we'll leak xattrs from previous incarnation Yehuda Sadeh
12:53 AM Revision 469f3eb4 (ceph): rgw: create plain processor for small objects
Yehuda Sadeh
12:53 AM Revision 75acc0a3 (ceph): rgw: fix multipart PUT
latest revamp broke it, missed calling RGWPutObjProcessor::prepare(s)
where needed.
Yehuda Sadeh
12:53 AM Revision 26b54ae5 (ceph): rgw: rearrange PutObj::execute()
groundwork for different handling of small object PUTs Yehuda Sadeh
12:53 AM Revision a0b55397 (ceph): rgw: different atomic handling for small objects
Yehuda Sadeh
12:44 AM Revision 66f38254 (ceph): Merge remote branch 'gh/master' into wip-backfill
Sage Weil
12:22 AM Revision 199b14d8 (ceph): mon: fix uninitialized cluster_logger_registered
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil

01/06/2012

11:12 PM Revision 001701a0 (ceph): mon_recovery: need n/2 + 1 monitors for quorum
Sage Weil
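The commit title states the monitor quorum rule as a strict majority of the monitors. A minimal Python sketch of that arithmetic; `quorum_size` and `has_quorum` are hypothetical helper names, not the monitor's actual code.

```python
# Hypothetical sketch of the "n/2 + 1" majority rule; not Ceph monitor code.

def quorum_size(num_monitors: int) -> int:
    """Smallest strict majority of num_monitors: n/2 + 1 with integer division."""
    if num_monitors < 1:
        raise ValueError("a cluster needs at least one monitor")
    return num_monitors // 2 + 1


def has_quorum(alive: int, num_monitors: int) -> bool:
    """True if `alive` reachable monitors are enough to form a quorum."""
    return alive >= quorum_size(num_monitors)


# Even monitor counts tolerate no more failures than the odd count just below them.
for n in range(1, 6):
    print(f"{n} monitors: quorum of {quorum_size(n)}, tolerates {n - quorum_size(n)} failure(s)")
```

For example, 4 monitors need 3 for quorum and so survive only one failure, the same as 3 monitors.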
11:08 PM Revision cfeaef45 (ceph): move multimon failure thrashing tests into regression
We need to test these nightly. Sage Weil
09:36 PM Revision da921077 (ceph): ceph: don't skip monitor ports
We can use the same port multiple times if they are on different hosts. Sage Weil
09:00 PM CephFS Bug #1682: mds: segfault in CInode::authority
hit this again:... Sage Weil
08:50 PM Revision bebd393b (ceph): objecter: ignore replies from old request attempts
If we know the request attempt, ignore old attempts.
If we do not know the attempt (because the server is old), acce...
Sage Weil
08:49 PM Revision ac177d78 (ceph): osd: encode retry attempt in MOSDOp[Reply]
In addition to the boolean flag, also encode the exact retry attempt.
Return -1 if we don't know.
Signed-off-by: Sa...
Sage Weil
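Taken together, these two commits describe how a client can discard stale replies after resending a request. A minimal Python sketch, assuming attempts are numbered from 0 and that a reply carrying the "unknown" value -1 is accepted; `PendingRequest` and its methods are hypothetical names, not the Objecter API.

```python
# Hypothetical sketch of attempt-based reply filtering; not Ceph Objecter code.

UNKNOWN_ATTEMPT = -1  # value carried by a reply when the server cannot report the attempt


class PendingRequest:
    """Client-side record of one in-flight request (hypothetical)."""

    def __init__(self, tid: int):
        self.tid = tid
        self.attempts = 0  # number of sends so far; attempt numbers start at 0

    def send(self) -> int:
        """Record a (re)send and return the attempt number carried by this send."""
        attempt = self.attempts
        self.attempts += 1
        return attempt

    def should_accept(self, reply_attempt: int) -> bool:
        """Accept only the reply to the latest send; accept unknown attempts (assumption)."""
        if reply_attempt == UNKNOWN_ATTEMPT:
            return True
        return reply_attempt == self.attempts - 1
```

A reply to the first send that arrives after a resend is ignored, while a reply from an old server that reports -1 is still accepted.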
08:23 PM Revision b501efda (ceph): mon: document quorum_status, mon_status
Fixes: #1824
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
08:22 PM Revision ca8df7ea (ceph): mon: fix misplaced else
Broken by 435c29448a10ec343f5a2b7195d94c72de5b1a25.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:20 PM Revision 13f1debb (ceph): Merge remote branch 'gh/wip-mon-timeouts'
Sage Weil
05:31 PM Revision b389685a (ceph): admin_socket: string commands
Commands are strings. Old __be32 works too. 'help' to list available
commands.
Signed-off-by: Sage Weil <sage@newd...
Sage Weil
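A minimal Python sketch of a string-command dispatcher in the spirit of this change (see also Feature #1883): a newline-terminated command name is looked up in a registry, and `help` lists the registered commands. `AdminCommands`, the handlers, and their output are hypothetical, and the legacy `__be32` request path is not modeled.

```python
# Hypothetical sketch of a string-based admin command registry; not Ceph's AdminSocket.
from typing import Callable, Dict


class AdminCommands:
    """Maps admin-socket command strings to handler functions."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], str]] = {}
        self.register("help", self._help)

    def register(self, name: str, handler: Callable[[], str]) -> None:
        self._handlers[name] = handler

    def _help(self) -> str:
        # List every registered command, one per line.
        return "\n".join(sorted(self._handlers))

    def handle(self, request: bytes) -> str:
        # One command per request, terminated by a newline, e.g. b"perfcounter_dump\n".
        name = request.decode("utf-8").rstrip("\n").strip()
        handler = self._handlers.get(name)
        if handler is None:
            return f"unknown command '{name}'; try 'help'"
        return handler()
```

Adding a command then only requires registering a new name, and `help` picks it up automatically.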
05:31 PM Revision 5a5dece1 (ceph): admin_socket: fix, extend admin_socket unit tests
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:31 PM Revision 643b9dbd (ceph): ceph: speak new admin socket protocol
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:19 PM Bug #1896: osd: FAILED assert(is_up(osd)) from do_queries() during activate_map()
... Sage Weil
03:48 PM Bug #1896 (Rejected): osd: FAILED assert(is_up(osd)) from do_queries() during activate_map()
... Sage Weil
05:18 PM Bug #1897 (Resolved): osd: failed assert assert(src_obc) under rgw workload
commit:a774d5002132cffb7b408e7de3d75ee841301fbf Sage Weil
03:53 PM Bug #1897 (Resolved): osd: failed assert assert(src_obc) under rgw workload
... Sage Weil
04:32 PM Bug #1898 (Duplicate): very long scrub blocked write operation
On cephstore6235 we saw a write operation get blocked for 4 minutes by scrub. The log is available in /var/log/ceph/l... Greg Farnum
04:23 PM Feature #1895: osd: detect duplicate requests by tracking per-client last_acked_tid instead of us...
my first thought would be something like:
- set<osd_reqid_t> in the Session
- on session open, load above set fro...
Sage Weil
04:18 PM Feature #1895: osd: detect duplicate requests by tracking per-client last_acked_tid instead of us...
Naively this data can just go in the OSD::Session struct. However, it might be a bit of a hassle dealing with thrashi... Greg Farnum
03:15 PM Feature #1895 (Rejected): osd: detect duplicate requests by tracking per-client last_acked_tid in...
Currently duplicate request detection uses the PG log, which may be trimmed too much to contain actual duplicates. Th... Josh Durgin
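The ticket (eventually marked Rejected) proposes keeping duplicate-detection state per client session instead of relying on the PG log. A minimal Python sketch of that idea, assuming client tids increase monotonically from 1 and that the client reports the highest tid for which it has received every reply; `ClientSession` and its methods are hypothetical and do not come from the Ceph source.

```python
# Hypothetical sketch of per-session duplicate detection; not Ceph code.
from dataclasses import dataclass, field


@dataclass
class ClientSession:
    last_acked_tid: int = 0                      # client has replies for every tid <= this
    completed: set = field(default_factory=set)  # tids applied but not yet acked

    def is_duplicate(self, tid: int) -> bool:
        # At or below the acked watermark, or already applied: treat as a resend.
        return tid <= self.last_acked_tid or tid in self.completed

    def record_completed(self, tid: int) -> None:
        self.completed.add(tid)

    def ack_through(self, tid: int) -> None:
        """Advance the watermark and trim tids the client no longer needs answered."""
        self.last_acked_tid = max(self.last_acked_tid, tid)
        self.completed = {t for t in self.completed if t > self.last_acked_tid}
```

The per-session set stays small because every ack trims it, which is the property the PG log cannot guarantee once it is trimmed.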
03:16 PM Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
One server side cause of this (I'm not sure there aren't more) is duplicate request tracking. Currently requests are ... Josh Durgin
12:42 PM Bug #1835 (Resolved): Monclient crash when keyring is not readable
Thanks, Wido! Sage Weil
12:38 PM Bug #1835: Monclient crash when keyring is not readable
Confirmed, this fixed it for me. Wido den Hollander
12:41 PM rgw Tasks #1824 (Resolved): ceph monitor status should be available and documented
documented in commit:b501efdab9798e30b8bf7990e9a58d076553cd2f
the in-quorum monitors don't keep track of out-of-qu...
Sage Weil
11:53 AM rgw Bug #1859: rgw: bucket creation is not atomic
resolved, as of commit:bd3ccf7f7b8370bd0e8d52f9f74fcccc7e62c902. Yehuda Sadeh
10:48 AM CephFS Bug #1435 (Need More Info): mds: loss of layout policies upon mds restart
Alexandre-
Same situation here. If you can produce a full mds log that includes the creation of the initial layou...
Sage Weil
10:35 AM CephFS Bug #1318: directories disappear across multiple rsyncs
Alexandre, have you seen this recently? We haven't turned it up in our testing.
Could this be the same as #1774?
Sage Weil
10:31 AM Feature #1808: filestore: gracefully handle EMFILE
I'm going to call this a feature. The more pressing problem is to find any fd leaks (#1788), and to limit msgr conne... Sage Weil
10:28 AM Bug #1831 (Resolved): mon: should not accept (and should disconnect) session when not in quorum
commit:13f1debbf054612fbb2c9f4dafbe12c8f937cf14 Sage Weil
10:27 AM Bug #1804 (Closed): filestore: unexpected EINVAL
ok, i'm going to assume this was the mds truncate thing and close it out. we can reopen later if it crops up again! Sage Weil
10:10 AM CephFS Bug #1774 (New): client: files become inaccessible in large directories (with snapshots?)
Sage Weil
10:09 AM CephFS Bug #1774 (Need More Info): client: files become inaccessible in large directories (with snapshots?)
Alexandre-
We're heavily focusing on rados for the next couple of weeks, so I don't have time to try to reproduce ...
Sage Weil
10:07 AM CephFS Bug #1586 (Can't reproduce): failed pjd chmod test 00 on kclient
haven't seen this since... Sage Weil
09:52 AM Bug #1842 (Can't reproduce): osd: failed authorizations leak memory somehow?
I instrumented the rados client to send bad authenticators and hammered ceph-osd, but massif showed no leaks. I thin... Sage Weil
09:43 AM Feature #1798 (Rejected): qa: add rados/librados tests (RadosModel)
Sage Weil
01:27 AM Revision 561f06cf (ceph): suite: make email-on-success the default behavior
This way you can tell when a run is complete, instead of wondering if
it's stuck in the queue.
Josh Durgin

01/05/2012

11:42 PM Revision 435c2944 (ceph): mon: instrument elector so you can stop participating in the quorum
Add new monitor commands "quorum exit" and "quorum enter" to use it.
Signed-off-by: Greg Farnum <gregory.farnum@drea...
Greg Farnum
11:42 PM Revision 14a49433 (ceph): mon: elector needs to reset leader_acked on every election start
Otherwise you never reset the leader_acked after a failed
election attempt, so if mon 0 is available on the first rou...
Greg Farnum
11:41 PM Revision 99e5f85e (ceph): mon: kill client sessions when we're not in quorum
After a timeout of 2*mon_lease length (i.e., two election rounds),
kill existing client sessions so they can reconnect ...
Greg Farnum
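A minimal Python sketch of the timeout rule in this commit message. `should_kill_sessions` is a hypothetical helper, times are in seconds, and whether the real comparison is inclusive or strict is an assumption.

```python
# Hypothetical sketch of the "2 * mon_lease" session timeout; not Ceph monitor code.
from typing import Optional


def should_kill_sessions(out_of_quorum_since: Optional[float], now: float, mon_lease: float) -> bool:
    """True once the monitor has been out of quorum for at least 2 * mon_lease seconds.

    out_of_quorum_since is None while the monitor is in quorum.
    """
    if out_of_quorum_since is None:
        return False
    return now - out_of_quorum_since >= 2 * mon_lease
```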
09:46 PM Revision c83b2a0b (ceph): OCF RA: fix variable name
Florian Haas
09:46 PM Revision b3f8b55d (ceph): debian: build ceph-resource-agents
Florian Haas
06:51 PM Revision 49a96fa7 (ceph): osd: parameterize min/max values for backfill scanning
For local scans, use the optimal value for the local filestore.
For remote scans, make it configurable, so we can co...
Sage Weil
06:46 PM Revision 98881a11 (ceph): admin_socket: refactor
Combine AdminSocketConfigObs with AdminSocket so that we can interact
with it via the cct. Simpler class structure. ...
Sage Weil
06:19 PM Revision b6c43d2a (ceph): rbd: add a command to delete all snapshots of an image
This makes deleting images with many snapshots easier.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
05:39 PM Feature #1879 (New): osd: track list of in-progress requests, log slow ones
Sage Weil
03:17 PM Feature #1879 (Duplicate): osd: track list of in-progress requests, log slow ones
Anonymous
05:38 PM rgw Tasks #1823 (Rejected): radosgw should have internal timeouts
Sage Weil
02:44 PM rgw Tasks #1823: radosgw should have internal timeouts
Sage suggests that this can more properly be detected in the OSD:
- add request to tail list when started
- remove ...
Anonymous
05:32 PM Revision a94b7314 (ceph): admin_socket: whitespace
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:30 PM Revision 96b7b0d8 (ceph): common: default 'mon osd auto mark in = false'
This way an osd that was explicitly marked out will stay out, even when
it is restarted.
Signed-off-by: Sage Weil <s...
Sage Weil
05:26 PM Revision a71a0d36 (ceph): osd: log backfill restart
This is interesting, particularly in determining when a peer that was
partially backfilled needs to be restarted.
Si...
Sage Weil
05:08 PM Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
It looks like there are at least two bugs here: one client side, and one server side. I'm reproducing with more logs ... Josh Durgin
04:46 PM Bug #1831: mon: should not accept (and should disconnect) session when not in quorum
Updated the branch, it now includes monitor commands and instrumentation so you can drop a monitor out of the quorum.... Greg Farnum
02:21 PM Bug #1831 (In Progress): mon: should not accept (and should disconnect) session when not in quorum
A basic stab at this is staggeringly boring — pushed to wip-mon-timeouts. I want to discuss instrumenting Monitor to ... Greg Farnum
02:30 PM RADOS Feature #1894 (New): mon: implement internal heartbeating
Right now the monitor doesn't really detect if it starts breaking. It should (probably using the HeartbeatMap thingy)... Greg Farnum
02:03 PM Bug #1893 (Rejected): crushtool can't decompile crushmap
This is just the cephtool writing excess information to stdout. Josh Durgin
01:56 PM Bug #1893 (Rejected): crushtool can't decompile crushmap
This happens on 0.39 and master.... Josh Durgin
02:03 PM Bug #1892 (Rejected): osdmaptool can't decode osdmap
This is just the cephtool writing excess information to stdout. Josh Durgin
01:55 PM Bug #1892 (Rejected): osdmaptool can't decode osdmap
This happens on 0.39 and master.... Josh Durgin
12:58 PM Bug #1891: monclient: try ipv6 if ipv4 fails
I wonder if changing MonClient::build_initial_monmap() to put both the A and AAAA records in the search pool is suffi... Sage Weil
11:30 AM Bug #1891 (Resolved): monclient: try ipv6 if ipv4 fails
When a hostname is specified, and it has an A and AAAA record, only the ipv4 address is tried.
If this fails, the...
Josh Durgin
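Sage's comment suggests putting both the A and AAAA results into the monitor search pool. A minimal Python sketch of the general idea using the standard `socket.getaddrinfo` with `AF_UNSPEC`; it is a generic illustration, not `MonClient::build_initial_monmap()`, and `candidate_addresses`/`connect_any` are hypothetical names.

```python
# Hypothetical sketch: resolve both IPv4 and IPv6 addresses and try each; not MonClient code.
import socket


def candidate_addresses(host: str, port: int) -> list:
    """Resolve host to every address (A and AAAA records) in resolver order, without duplicates."""
    seen, result = set(), []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        key = (family, sockaddr)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


def connect_any(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Try each resolved address in turn instead of giving up after the first one."""
    last_error = None
    for family, sockaddr in candidate_addresses(host, port):
        sock = socket.socket(family, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            sock.close()
            last_error = exc
    raise last_error if last_error else OSError(f"no addresses found for {host!r}")
```

Python's `socket.create_connection` performs essentially the same loop; it is spelled out here to mirror the ticket.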
10:50 AM Bug #1771 (Resolved): rbd: delete snapshots when image is deleted
Made image deletion fail when snapshots are present (commit:bd2339102f0c650d87203fdc2336f9533a18b755), and added a co... Josh Durgin
10:08 AM Bug #1832: osd: size tracking discrepancy (scrub stat mismatch)
This happened in the other direction with python rbd tests:... Josh Durgin
10:04 AM Bug #1758 (Need More Info): OSD segfault in SimpleMessenger::send_message
Sage Weil
09:10 AM Feature #1890 (Resolved): log: async log writeout
dump logs asynchronously (to a file, syslog, or whatever other sink)
make log level and emit level independently a...
Sage Weil
09:09 AM Feature #1889 (Resolved): log: structure log records
break out some structure from the log entries. for starters,
- threadid
- timestamp
- log level
- subsystem
...
Sage Weil
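Features #1889 and #1890 together describe structured log records that are handed off to a background writer. A minimal Python sketch, assuming (as with Ceph debug levels) that a lower level number means a more important message; `LogEntry`, `AsyncLog`, `gather_level`, and `emit_level` are hypothetical names. The ticket text about gathered-versus-emitted levels is truncated, so entries that are gathered but not emitted are simply discarded here.

```python
# Hypothetical sketch of structured, asynchronously written log records; not Ceph's log code.
import queue
import threading
import time
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    subsystem: str
    level: int
    message: str
    thread_id: int = field(default_factory=threading.get_ident)  # captured in the caller's thread
    timestamp: float = field(default_factory=time.time)


class AsyncLog:
    """Callers only enqueue; a background thread writes entries to `sink`."""

    def __init__(self, sink, gather_level: int, emit_level: int):
        self._sink = sink                  # e.g. a file writer, a syslog handler, or list.append
        self._gather_level = gather_level  # entries above this are dropped at the call site
        self._emit_level = emit_level      # gathered entries above this are not written out
        self._queue = queue.Queue()
        self._writer = threading.Thread(target=self._run, daemon=True)
        self._writer.start()

    def log(self, subsystem: str, level: int, message: str) -> None:
        if level <= self._gather_level:
            self._queue.put(LogEntry(subsystem, level, message))

    def _run(self) -> None:
        while True:
            entry = self._queue.get()
            if entry is None:  # shutdown sentinel
                return
            if entry.level <= self._emit_level:
                self._sink(entry)

    def stop(self) -> None:
        """Write out everything queued so far, then stop the writer thread."""
        self._queue.put(None)
        self._writer.join()
```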
09:07 AM Feature #1888 (Rejected): log: per-thread ring buffer
use per thread buffer for log messages to reduce lock contention by logging code Sage Weil
08:47 AM Feature #1887 (Resolved): mon: ability to specify pg_num for new pools
Sage Weil
01:46 AM Revision bd233910 (ceph): librbd: don't remove an image that still has snapshots
Return -EBUSY instead. After the header is removed, the snapshots
can't be removed or read, so make sure they're gone...
Josh Durgin
01:34 AM Revision 4728f4f8 (ceph): SimpleMessenger: clarify when ms_bind_ipv6 is used
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
01:11 AM Revision c8a13f2b (ceph): qa: fix mdstable script for proper injectargs use.
This script is fairly primitive, but somebody might find it useful...
Signed-off-by: Greg Farnum <gregory.farnum@dre...
Greg Farnum
01:11 AM Revision 4ea8ad43 (ceph): qa: add a slightly more stressful anchortable test
This creates more than 8 links.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum
12:38 AM Revision 3935551d (ceph): Merge remote branch 'gh/master' into wip-backfill
Sage Weil
12:25 AM Revision 61c3a918 (ceph): rados: fix run-length option parsing for rados load-gen
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil

01/04/2012

11:57 PM Revision 857b243b (ceph): osdmap: include state names in dump()
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
11:57 PM Revision b8be3382 (ceph): mon: rev cluster protocol
The OSDMap NEW and AUTOOUT bit additions subtly change the decoding of
the incremental maps in a reasonably harmless...
Sage Weil
11:57 PM Revision 9986553c (ceph): msgr: explicitly specify internal cluster protocol
Replace case statement based on my_type.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
11:57 PM Revision 0fb88b6f (ceph): move cluster protocol definitions out of ceph_fs.h
Among other things, we don't recompile the whole system when we touch
these.
Signed-off-by: Sage Weil <sage@newdream...
Sage Weil
11:57 PM Revision 9510f9c9 (ceph): mon: track auto-marked-out osds
Mark OSDs that were automatically marked OUT by the monitor because they
were down for too long. Clear the bit as so...
Sage Weil
11:57 PM Revision fcb87701 (ceph): mon: independently control whether AUTOOUT OSDs are marked in on boot
Add separate config option to control whether the monitor will mark
AUTOOUT OSDs in on boot.
Signed-off-by: Sage Wei...
Sage Weil
11:57 PM Revision af535077 (ceph): mon: maintain CEPH_OSD_NEW bit for new, unused OSDs
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
11:57 PM Revision 80d90109 (ceph): mon: separately control auto-mark-in of new OSDs
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
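These commits, together with the `mon osd auto mark in = false` default from 96b7b0d8, split the boot-time mark-in decision into independent switches for all OSDs, AUTOOUT OSDs, and NEW OSDs. A minimal Python sketch of how such a decision could combine them; the parameter names only loosely mirror the config options, and the precedence shown is an assumption rather than the monitor's actual logic.

```python
# Hypothetical sketch of the boot-time mark-in decision; not Ceph monitor code.

def mark_in_on_boot(is_new: bool, is_auto_out: bool, *,
                    auto_mark_in: bool,
                    auto_mark_auto_out_in: bool,
                    auto_mark_new_in: bool) -> bool:
    """Decide whether a booting OSD that is currently OUT should be marked IN.

    is_new:      the OSD still carries the NEW bit (never used).
    is_auto_out: the OSD carries the AUTOOUT bit (marked out by the monitor
                 because it was down too long, not by an administrator).
    """
    if auto_mark_in:
        return True  # blanket policy: every booting OSD is marked in
    if is_new and auto_mark_new_in:
        return True
    if is_auto_out and auto_mark_auto_out_in:
        return True
    return False  # an explicitly marked-out OSD stays out across restarts
```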
11:56 PM Revision adea084a (ceph): osd: mark degraded only when < desired replica count
Having extra replicas is not 'degraded' per se. Although it's weird that
we ever do that!
Signed-off-by: Sage Weil ...
Sage Weil
11:56 PM Revision becb71b0 (ceph): osd: don't add all strays in calc_acting()
We weren't counting up usable strays, which meant we added all of them.
This could result in acting sets with more ac...
Sage Weil
11:56 PM Revision d3395335 (ceph): osd: fix backfill reset on activate
Look at peer's info, not our own!
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
11:56 PM Revision 3166b373 (ceph): osd: avoid querying missing set from (full) backfill target
If we are doing a complete backfill, we don't care about missing; it will
clearly all be below last_backfill anyway a...
Sage Weil
11:53 PM Revision 6a918feb (ceph): Merge pull request #8 from kylemarsh/master
Remove cloudfiles requirement from obsync. Sage Weil
10:21 PM Revision 8658f0d5 (ceph): qa: load-gen-mix-small-long
30 minutes
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:14 PM Revision 5cebc740 (ceph): obsync: make obsync run without cloudfiles installed
Cloudfiles probably shouldn't be a requirement for running obsync, so this
commit makes it optional.
Kyle Marsh
09:59 PM Revision 4bcdb37c (ceph): osd: do not use incomplete peer for best info/log
For one, their stats are incomplete; if we use them we'll screw up everyone
else. For another, it doesn't do us any ...
Sage Weil
09:59 PM Revision dcd1ebab (ceph): osd: initialize backfill_pos on activate
Handling of writes depends on backfill_pos being initialized (to know what
is between the leading and trailing edge o...
Sage Weil
09:59 PM Revision 6a1cac92 (ceph): osd: initialize backfill_target; include in PG operator<<
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:46 PM Revision a5d99add (ceph): osd: fix misdirect check for requests with old epochs
get_map() assumes the epoch passed is valid. Check here in the caller.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:44 PM Revision 172dd3e0 (ceph): osd: check that we're supposed to be getting a PG before waitlisting re...
This was broken in fa722de6708d3e92037df6289cc29ece12c8ea66.
Fix it by checking if the mapping was correct in the se...
Sage Weil
06:40 PM Revision 54f36f0b (ceph): rados: gracefully report errors from 'ls'
Catch the exception thrown by the iterator when the OSD returns errors.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:38 PM Revision ed9a4a09 (ceph): osd: return EINVAL on bad PGLS[_FILTER] handle
Fixes: #1875
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:25 PM CephFS Bug #1047: mds: crash on anchor table query
After looking into this a lot today, none of my ideas panned out. I also tried a few simple tests to reproduce locall... Greg Farnum
04:50 PM Bug #1858 (Resolved): OSD needs to check for misdirected ops before putting non-existent PGs on hold
Sage merged a modified form of this in commit:172dd3e0e40ead65349904e604962eb0724bc230. Greg Farnum
04:35 PM Cleanup #1886 (Resolved): objecter/osd: mux/demux in MOSDOpReply encoding
- osd: return read results via OSDOp, not odata, in do_osd_ops()
- MOSDOpReply: mux/demux based on payload_len
- ob...
Sage Weil
03:37 PM Feature #1885 (Resolved): identify top 10 expected failures and process to diagnose
- peering failures
- unfound objects
Sage Weil
03:36 PM Feature #1884 (Resolved): plan encoding strategy to test+facilitate non-disruptive upgrades
- best practices for encoding structures, like
- single encoding version vs compat+incompat version value
- pr...
Sage Weil
03:35 PM Feature #1883 (Resolved): admin socket: string based protocol
switch admin socket from a u32 based binary protocol to a string based one. e.g., commands like "perfcounter_dump\n"... Sage Weil
03:33 PM rgw Feature #1882 (Resolved): rgw: high-level log entries for request state transitions
log request start and transition through high-level stages (start, authenticate, read, write, etc.)
probably just ...
Sage Weil
03:32 PM Feature #1881 (Resolved): objecter: expose in-progress request state via admin socket
an admin socket command that will dump current in-progress requests, similar to cat /sys/kernel/debug/ceph/*/osdc Sage Weil
03:31 PM Feature #1880 (Rejected): osd: optionally log all request latencies
start/stop+dump via admin socket?
need something like this to evaluate distribution of latencies (e.g. 99 percenti...
Sage Weil
03:31 PM Feature #1879 (Resolved): osd: track list of in-progress requests, log slow ones
- add request to tail list when started
- remove when complete
- periodically scan start of list and log slow reque...
Sage Weil
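The bullet list in this ticket describes a start-ordered list of in-flight requests. A minimal Python sketch, assuming unique request ids and a monotonic clock supplied by the caller; `InFlightTracker` and its methods are hypothetical names. Because requests are appended in start order, the periodic scan can stop at the first request that is not yet slow.

```python
# Hypothetical sketch of in-progress request tracking; not Ceph OSD code.
from collections import OrderedDict
from typing import Callable, Hashable, List, Tuple


class InFlightTracker:
    def __init__(self, slow_after: float, clock: Callable[[], float]):
        self.slow_after = slow_after
        self.clock = clock
        self._ops = OrderedDict()  # request id -> start time, in start order

    def start(self, op_id: Hashable) -> None:
        self._ops[op_id] = self.clock()  # new ids land at the tail

    def complete(self, op_id: Hashable) -> None:
        self._ops.pop(op_id, None)  # O(1) removal, even from the middle

    def slow_ops(self) -> List[Tuple[Hashable, float]]:
        """Scan from the head (oldest) and return (id, age) for requests older than slow_after."""
        now = self.clock()
        slow = []
        for op_id, started in self._ops.items():
            age = now - started
            if age < self.slow_after:
                break  # later entries started no earlier, so they cannot be slow either
            slow.append((op_id, age))
        return slow
```

In a daemon, `slow_ops()` would be called from a periodic timer and each result logged.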
11:32 AM Feature #1422: libvirt: rbd storage pool
I've done some work on this, see: http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/4846 Wido den Hollander
10:48 AM Bug #1875 (Resolved): osd: ReplicatedPG::do_op
commit:ed9a4a0996a9879bb2798a0771b263312f5d88fc Sage Weil
09:55 AM Bug #1875: osd: ReplicatedPG::do_op
The PGLS iterator handle format was recently changed, and this crashed while decoding it. My guess is an old binary ... Sage Weil
05:35 AM Bug #1875 (Resolved): osd: ReplicatedPG::do_op
I just noticed two OSDs (osd.11 and osd.20) go down in my cluster.
The backtrace of both OSDs:...
Wido den Hollander
10:47 AM Bug #1831: mon: should not accept (and should disconnect) session when not in quorum
Last night a test hit this. The MDS got stuck connected to an out-of-quorum mon, and never stopped being laggy. Josh Durgin
10:33 AM CephFS Bug #1874: Running `git gc` on a bare git repository hosted by ceph results in a bus error.
Hi,
Drat, I was hoping this would be a simple-to-reproduce case. Never mind, here are some more details:
Kerne...
David McBride
10:06 AM CephFS Bug #1874 (Need More Info): Running `git gc` on a bare git repository hosted by ceph results in a...
Sage Weil
10:06 AM CephFS Bug #1874: Running `git gc` on a bare git repository hosted by ceph results in a bus error.
Which version of the kernel client and server are you running?
Can you attach an strace -f of the 'git gc' run s...
Sage Weil
03:27 AM CephFS Bug #1874 (Can't reproduce): Running `git gc` on a bare git repository hosted by ceph results in ...
When `git gc` is run on a bare git repository hosted by a local test ceph filesystem mounted via the kernel client, i... David McBride
10:19 AM CephFS Bug #1878 (Resolved): ceph.ko doesn't setattr (lchown, utimes) on symlinks
rsync -a /my/symlink/pool/ /mnt/ceph.ko/pool/ silently fails to set times and ownership of symlinks, whereas the same... Alexandre Oliva
10:02 AM CephFS Bug #1877 (Can't reproduce): ceph.ko (3.1.6) oopses upon cephfs set_layout of a symlink to a dir
I moved elsewhere a directory that had a layout policy set. Later on, next time the mds lost directory policy inform... Alexandre Oliva
09:57 AM Cleanup #1876 (Resolved): osd: EINVAL on all client command decoding errors
There are various other places in the osd (besides #1875) where we decode data that is provided as part of the user c... Sage Weil
01:28 AM Revision a97aca74 (ceph): rados.py: use uint64_t for auids
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
01:28 AM Revision 2f1720d0 (ceph): librados: return int64_t pool ids
468e28ee60ee2fe625d2680c792a4bcb9ef19951 missed the get_id() functions.
Signed-off-by: Josh Durgin <josh.durgin@drea...
Josh Durgin

01/03/2012

10:08 PM Revision 8e56e99c (ceph): radosgw-admin: add eol following info
Yehuda Sadeh
10:07 PM Revision ec3a3a96 (ceph): rados: fix example config
Josh Durgin
09:55 PM Revision 71d5bcbb (ceph): Adjust rados model workloads for new config format
Josh Durgin
09:10 PM Revision ec6530df (ceph): RadosModel: make object write ranges configurable
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
09:10 PM Revision f03e770f (ceph): RadosModel: allow TestOps to pass data to their finish methods
This will allow nested writes to keep track of which write actually
completed. Also remove finish() and _finish() fr...
Josh Durgin
09:10 PM Revision 91124051 (ceph): RadosModel: check for out of order replies within WriteOps
A single WriteOp already does multiple aio_writes. Each aio_write
gets a unique tid that is checked upon completion. ...
Josh Durgin
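The check described here relies on each aio_write getting a unique, increasing tid. A minimal Python sketch, assuming the test expects the writes issued by one WriteOp to complete in issue order; `ReplyOrderChecker` is a hypothetical name and this is not the RadosModel code.

```python
# Hypothetical sketch of an in-order completion check; not RadosModel code.
from collections import deque


class ReplyOrderChecker:
    """Replies must complete in the order their requests were issued."""

    def __init__(self) -> None:
        self._next_tid = 1
        self._outstanding = deque()  # tids in issue order

    def issue(self) -> int:
        tid = self._next_tid
        self._next_tid += 1
        self._outstanding.append(tid)
        return tid

    def complete(self, tid: int) -> None:
        if not self._outstanding:
            raise AssertionError(f"reply for tid {tid} with nothing outstanding")
        expected = self._outstanding.popleft()
        if tid != expected:
            raise AssertionError(f"out of order reply: got tid {tid}, expected {expected}")
```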
09:10 PM Revision 0e470c50 (ceph): testrados: replace testreadwrite and testsnaps with testrados
testrados can act as testreadwrite or testsnaps by changing the
command line options for the weight of each operation...
Josh Durgin
09:02 PM Revision 0176c9ab (ceph): Remove unused mon.0 variables.
Josh Durgin
09:02 PM Revision cdd5c456 (ceph): nuke-on-error: only unlock if this run locked the machines
Josh Durgin
09:02 PM Revision 2e9b1c75 (ceph): rados: use testrados instead of testsnaps and testreadwrite
Josh Durgin
07:37 PM Revision a66d90e3 (ceph): osd: add a monitor timeout via MPGStatsAck messages
Keep track of when we have outstanding updates, and while we do, make
sure the monitor responds within a timeout (def...
Greg Farnum
05:55 PM Feature #1863 (Resolved): qa: tester for osd op reply order
Josh Durgin
05:13 PM Bug #1873 (Won't Fix): crush_rule type is inconsistent
Here's a table of crush_rule's type in various places:... Josh Durgin
04:49 PM rgw Feature #1872: rgw: only use shadow objects for large objects
Once a race has been detected, operation needs to be restarted (unless we already have all requested data). Yehuda Sadeh
04:29 PM rgw Feature #1872: rgw: only use shadow objects for large objects
This will also require being careful to check both current and new sizes — the new object might be < chunk size while... Greg Farnum
04:23 PM rgw Feature #1872 (Resolved): rgw: only use shadow objects for large objects
Currently we use shadow objects for every write that overwrites an object. We can avoid that by assuming that objects... Yehuda Sadeh
04:07 PM Feature #1836: filejournal: use async directio to write to the journal
Presumably AIO writes can be combined or reordered by the block device/interfaces, right? So having a bunch of them i... Greg Farnum
09:27 AM Feature #1836: filejournal: use async directio to write to the journal
Say we have a cap of N aio ops, to prevent a stream of small ops resulting in a stream of tiny aio writes to the jour... Sage Weil
04:04 PM Revision f4b0cda1 (ceph): Fix invalid docdir_SCRIPTS usage with >=automake-1.11.2
Alphat-PC
04:04 PM Bug #1865 (Duplicate): mon: need to disconnect clients when we drop out of quorum
Adding active ping requirements to the monitors is contrary to the direction we want to take them with clients, thoug... Greg Farnum
04:02 PM Bug #1841 (Resolved): OSDs should disconnect from Monitor before their MOSDPGStat timeouts happen
Sage merged this into master. Greg Farnum
03:43 PM CephFS Bug #1047: mds: crash on anchor table query
This log is less illuminating than I'd hoped since it's just replay commits. :(
However, it did give me one idea t...
Greg Farnum
02:33 PM Linux kernel client Bug #1866 (Duplicate): null pointer dereference after osd went down
same as #1793 Sage Weil
01:10 PM Linux kernel client Feature #1870: usedcache mount option
maybe call it use_dcache? usedcache is easy to misread as 'used cache'. Josh Durgin
11:14 AM Linux kernel client Feature #1870 (Resolved): usedcache mount option
Sage Weil
12:16 PM rgw Bug #1867 (Resolved): rgw crash
should be fixed by commit:68ec8d8ee900642cdb594c67b7d2c416d55bec80. Yehuda Sadeh
11:46 AM Linux kernel client Bug #1871 (Resolved): ceph_client: crash after running xfstests 002
Running xfstests under UML against a ceph filesystem, I get a
client crash due to dereferencing a null pointer in ce...
Alex Elder
11:00 AM Linux kernel client Bug #1767 (Resolved): osd_client: send_request() cannot fail
commit:24e08cacc999503a069003364565116c923342b9 Sage Weil
10:53 AM Linux kernel client Bug #1762 (Resolved): i_lock vs s_cap_lock vs inodes_with_caps_lock lock ordering
merged into mainline for 3.2 Sage Weil
10:47 AM Linux kernel client Bug #1652 (Resolved): rbd: rollback correctly after resizing
we removed rollback functionality from the kernel. Sage Weil
10:42 AM Linux kernel client Bug #1812 (Resolved): iput scheduling while atomic
Sage Weil
10:42 AM Linux kernel client Bug #1812: iput scheduling while atomic
fixed by commit:aab26905f6a1df8e6a14f41a32a93b9af0c8b7c5 Sage Weil
09:16 AM Bug #1868 (Resolved): Ceph client crashed after shutting down one mds and osd
This bug was fixed by commit:935b639a049053d0ccbcf7422f2f9cd221642f58 in v3.1.
You should have better luck with th...
Sage Weil
08:37 AM Bug #1869 (Resolved): automake fails with >=automake-1.11.2 due to "docdir" usage
applied, commit:f4b0cda17875c27d8b945be6cf5db9b356bb2dab
Thanks!
Sage Weil
07:40 AM Bug #1869 (Resolved): automake fails with >=automake-1.11.2 due to "docdir" usage
doc_SCRIPTS is no longer allowed in >=automake-1.11.2 [1], as a result ceph fails to configure:
configure.ac:40: A...
Anonymous
08:34 AM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
David McBride wrote:
> Unfortunately, my Ceph cluster which was presenting these symptoms has now died -- the remain...
Sage Weil
07:45 AM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
Unfortunately, my Ceph cluster which was presenting these symptoms has now died -- the remaining set of OSDs didn't h... David McBride

01/02/2012

08:01 PM CephFS Bug #1682: mds: segfault in CInode::authority
This probably isn't all that useful for anyone who knows the code well, but I threw together a quick run down of plac... Mark Nelson
02:24 PM Bug #1868 (Resolved): Ceph client crashed after shutting down one mds and osd
Here is my cluster configuration before shutting down ceph components on n2cc (2.2.2.2).... Maciej Galkiewicz
05:47 AM CephFS Bug #1047: mds: crash on anchor table query
Here is what MDS logs with debug 20. No idea if it really helps. The cluster
is still in the broken state, should I...
Amon Ott

12/31/2011

11:09 PM Revision f8929bad (ceph): osd: trigger RecoveryFinished event on recovery completion
Unconditionally trigger the RecoveryFinished event when start_recovery_ops
thinks it may be done. This lets us trigg...
Sage Weil
02:49 PM CephFS Bug #1774: client: files become inaccessible in large directories (with snapshots?)
Oh, interesting! With a debug client = 20, debug ms = 1 log from ceph-fuse this should be pretty straightforward to ... Sage Weil
01:04 AM Revision a1252463 (ceph): librados: take lock in rollback
We're poking through the osdmap; need to hold the lock here.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
01:04 AM Revision 4c23e9e4 (ceph): objecter: assert lock held in op_submit
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
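A minimal C++ sketch of the "assert lock held" idea from the two commits above. It assumes a hypothetical `TrackedMutex` that records which thread owns it; Ceph's own mutex class and the real `Objecter::op_submit()` are not shown in this log and may differ.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical mutex wrapper that remembers its owning thread so callers
// can assert that the lock is held on entry.
class TrackedMutex {
 public:
  void lock() {
    m_.lock();
    owner_.store(std::this_thread::get_id());
  }
  void unlock() {
    owner_.store(std::thread::id());
    m_.unlock();
  }
  bool is_locked_by_me() const {
    return owner_.load() == std::this_thread::get_id();
  }

 private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Hypothetical submit path: the caller must already hold the lock.
struct HypotheticalObjecter {
  TrackedMutex lock;
  int submitted = 0;

  void op_submit() {
    assert(lock.is_locked_by_me());  // catch callers that forgot the lock
    ++submitted;
  }
};
```

Callers take the lock with `std::lock_guard<TrackedMutex>` before calling `op_submit()`; a caller that skips it trips the assertion in a debug build instead of racing silently.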
01:04 AM Revision 68ec8d8e (ceph): librados: call aio_operate() with lock held
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
12:45 AM Revision 0692bed8 (ceph): cmp: fix 5-uple operator==
Doh!
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
12:45 AM Revision 1c754187 (ceph): osd: be a bit more verbose during backfill
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil

12/30/2011

11:51 PM Revision 3dcaf6c3 (ceph): osd: do not backfill if any objects are missing on the primary
Someday we need to do something smarter so that a single unfound object
doesn't hold up replication of other objects....
Sage Weil
10:37 PM Revision cdf142b5 (ceph): rados: fix documentation format
Josh Durgin
10:37 PM Revision 6df4ce50 (ceph): rados: fix references to testrados
Josh Durgin
10:37 PM Revision 0af9c0a2 (ceph): rados: clean up argument construction
Only the client id varies, so it can be done outside the loop. Also
handle coredumps and coverage, and use LD_LIBRARY...
Josh Durgin
10:37 PM Revision 932257fb (ceph): rados: remove unused variable
Josh Durgin
10:37 PM Revision 2f71f03f (ceph): misc: simplify reconnect logic
Ignore all errors until the timeout expires so we don't have to worry
about whitelisting them.
Josh Durgin
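A rough C++ sketch of the simplified reconnect policy described above: retry, swallowing every error, until a deadline passes. The helper name `reconnect_until()` and its signature are assumptions for illustration; the actual change is not reproduced in this log.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Retry `try_connect` until it succeeds or `timeout` elapses. Every failure
// (false return or thrown exception) is ignored, so there is no whitelist
// of "expected" errors to maintain; only the deadline matters.
bool reconnect_until(const std::function<bool()>& try_connect,
                     std::chrono::milliseconds timeout,
                     std::chrono::milliseconds interval) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    bool ok = false;
    try {
      ok = try_connect();
    } catch (...) {
      ok = false;  // ignore the error; keep trying until the deadline
    }
    if (ok) return true;
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::sleep_for(interval);
  }
}
```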
10:31 PM Revision 949f24d5 (ceph): rgw: create default constructors for some structs
this will silence valgrind a bit Yehuda Sadeh
08:23 PM Revision 251fc3d5 (ceph): osd: handle backfill_target for pick_newest_available
It may not be missing on the backfill_target if it is after the
last_backfill marker.
Signed-off-by: Sage Weil <...
Sage Weil
08:19 PM Revision a3525891 (ceph): osd: return EINVAL if multi op specified with no src object name
This avoids crashing later in do_osd_ops() with something like
osd/ReplicatedPG.cc: In function 'int ReplicatedPG::d...
Sage Weil
07:39 PM Revision e686c1b6 (ceph): hobject_t: fix operator==, !=
These weren't comparing key.
While we're at it, clean this up by using generic macros for writing
these operators, s...
Sage Weil
07:38 PM Revision 063ab2e4 (ceph): cmp.h: define macros for creating comparison operators
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
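A sketch of the "generic macros for comparison operators" approach from the two commits above, under stated assumptions: the macro name `WRITE_CMP_OPERATORS`, the `tie()` helper, and the simplified `hobject_like` struct are hypothetical and are not the actual contents of cmp.h or hobject_t. Listing the fields once means `==`, `!=` and `<` cannot disagree about which fields (such as `key`) participate.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>

// Simplified stand-in for hobject_t; each field is listed exactly once, in tie().
struct hobject_like {
  std::string oid;
  uint64_t snap;
  uint32_t hash;
  std::string key;  // the field the old operator== forgot to compare

  auto tie() const { return std::tie(key, oid, snap, hash); }
};

// Hypothetical macro: generate all operators from the single tie() list.
#define WRITE_CMP_OPERATORS(T)                                   \
  inline bool operator==(const T& l, const T& r) {               \
    return l.tie() == r.tie();                                   \
  }                                                              \
  inline bool operator!=(const T& l, const T& r) {               \
    return !(l == r);                                            \
  }                                                              \
  inline bool operator<(const T& l, const T& r) {                \
    return l.tie() < r.tie();                                    \
  }

WRITE_CMP_OPERATORS(hobject_like)
```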
06:43 PM Revision 7969cc4f (ceph): Merge remote branch 'gh/master' into wip-backfill
Sage Weil
06:32 PM Revision 6687ccf5 (ceph): workunits: update rbd test for new error format
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
05:50 PM Revision 85719b0e (ceph): config: use autoconf $libdir for default rados class dir
Fixes: #1722
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:17 PM Revision 0d9507c2 (ceph): .gitignore: src/ocf/ceph
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:00 PM Revision 9b6422db (ceph): Spec: conditionally build ceph-resource-agents package
Put OCF resource agents in a separate subpackage,
to be enabled with a separate build conditional
(--with ocf).
Make...
Florian Haas
05:00 PM Revision 92cfad42 (ceph): Add OCF-compliant resource agent for Ceph daemons
Add a wrapper around the ceph init script that makes
MDS, OSD and MON configurable as Open Cluster Framework
(OCF) co...
Florian Haas
04:31 PM rgw Bug #1867: rgw crash
This is the same assert as #1737. It may not be related, tho.. the bug may just be unlocked concurrent access to the... Sage Weil
04:16 PM rgw Bug #1867 (Resolved): rgw crash
logs were turned off, so not much of that, but here's the backtrace. Happened on 7fc97e6 during a performance test (r... Yehuda Sadeh
04:06 PM Revision 66170633 (ceph): mon: fix full ratio updates
- update them independently
- only if we are leader
- fix type for nearfull_ratio
Signed-off-by: Sage Weil <sage@new...
Sage Weil
04:06 PM Revision f2e41097 (ceph): mon: don't ignore first full ratio update callback
We get a callback on startup. Don't ignore it.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
03:45 PM Revision a693438e (ceph): mon: only update full_ratio if we're the leader
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
03:42 PM Revision 47d02271 (ceph): Merge remote branch 'gh/wip-cleanup'
Sage Weil
03:28 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
looks like a null pointer dereference.. look for a struct member that's 0x48 bytes in? Sage Weil
02:59 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
This happened again with the same workload on sepia81. Josh Durgin
10:35 AM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
hit this again, nightly_coverage_2011-12-29-b/5388... Sage Weil
09:59 AM Bug #1722 (Resolved): osd_class_dir must reflect autoconf libdir
fixed by commit:85719b0ed81a38c3bd36c6be411f29181c969cda. duh Sage Weil
08:47 AM Bug #1865: mon: need to disconnect clients when we drop out of quorum
the ceph-mon is deadlocked by... Sage Weil
08:44 AM Bug #1865: mon: need to disconnect clients when we drop out of quorum
the kernel client is repeatedly reconnecting to a down osd and resending its requests because its osdmap is out of date... Sage Weil
06:52 AM Bug #1862: filestore: EINVAL on replay
Hello Sage,
Using int64_t did the trick, and now the OSDs are back online!
Thank you
Marco
Marco Aroldi
01:15 AM Revision df84594f (ceph): mon: make full ratio config change callback safe
We can't propose_pending() from any context; do this in the tick() thread,
with the proper locking. Among other thin...
Sage Weil

12/29/2011

11:43 PM Revision 585fb5ce (ceph): clitests: update for new error format
This was changed in 1f434da8a3ca4db830d1f3b0d87e5df941d85f2d
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
11:28 PM Revision cec2692e (ceph): clitests: update monmaptool test
e93961c11119942eae3a4cd14a79f779a5a4d277 changed output format.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
09:09 PM Revision f04e2955 (ceph): teuthology rgw-admin: annotated test cases for inventory
this is not a nose suite, so I simply added test case
descriptions in csv format, and put a file to extract
the...
Mark Kampe
08:00 PM Revision 48df71c8 (ceph): init script: be LSB compliant for exit code on status
An exit code of 1 on status is defined in LSB as
"program is dead, but pid file exists". Check for existence
of this ...
Florian Haas
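For reference, the LSB `status` exit codes relevant to the commit above are 0 (program is running or service is OK), 1 (program is dead and the /var/run pid file exists) and 3 (program is not running). A small C++ sketch of that decision; the inputs are hypothetical, since a real init script would inspect the pid file and probe the process itself.

```cpp
#include <cassert>

// Map observed daemon state to an LSB "status" exit code.
//   0 = running, 1 = dead but pid file exists, 3 = not running
int lsb_status_code(bool pid_file_exists, bool process_alive) {
  if (process_alive) return 0;
  if (pid_file_exists) return 1;  // stale pid file: dead, but pid file exists
  return 3;
}
```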
07:58 PM Revision 3b2ca7cf (ceph): keyring: print more useful errors to log/err
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
07:57 PM Revision eba235f2 (ceph): common: trigger all observers on startup
Among other things, this makes err-to-stderr and friends initialize
properly in the DoutStreamBuf.
Signed-off-by: Sa...
Sage Weil
07:24 PM Revision 1f434da8 (ceph): common: make cpp_strerror output prettier
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
07:24 PM Revision 04c8db00 (ceph): librados: check for monclient::init() error
I think this fixes #1835.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:59 PM Revision 1a59405c (ceph): rgw: turn on cache by default
Yehuda Sadeh
05:59 PM Revision 37013b6f (ceph): qa: load-gen-mix-small.sh
Sage Weil
05:41 PM Revision 959fd71f (ceph): osd: explicitly track leading edge of backfill
backfill_pos is the leading edge; last_backfill is the trailing edge.
Anything in between is either pushed, doesn't ex...
Sage Weil
05:31 PM CephFS Bug #1682: mds: segfault in CInode::authority
Not sure if this is the same underlying problem, but here's another CInode::authority crash from teuthology:~teuthwor... Josh Durgin
05:09 PM Revision d24ea235 (ceph): mds: assert if we get an EINVAL on our truncate
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:00 PM Revision 47013c28 (ceph): osd: get fsid from monmap, not osdmap
We may not have a valid OSDMap in all of these cases (notably, during
boot). Always take the fsid from the monmap, w...
Sage Weil
04:59 PM Revision 05cc4eb9 (ceph): monc: get latest monmap during authentication
Tell the monitor which monmap version we have in our initial auth message.
Make the monitor send the latest monmap if...
Sage Weil
04:44 PM Revision 5d5c9b6f (ceph): osdmap: add const markers to some unfixed functions
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Greg Farnum
04:44 PM Revision 300c7584 (ceph): osd: catch authenticate error on startup
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
03:30 PM Linux kernel client Bug #1866 (Duplicate): null pointer dereference after osd went down
This was during a kernel_untar_build workunit on rbd:... Josh Durgin
12:15 PM Bug #1835: Monclient crash when keyring is not readable
should be fixed by commit:04c8db001a4ed02ef7335ed01ce73ce9ab28dc9d .. can you verify, Wido? Sage Weil
11:16 AM Feature #1863 (In Progress): qa: tester for osd op reply order
Josh Durgin
11:14 AM Bug #1865 (Duplicate): mon: need to disconnect clients when we drop out of quorum
From sepia4:/tmp/cephtest/archive/log/osd.0.log:... Josh Durgin
10:59 AM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
Happened again in teuthology:~teuthworker/archive/nightly_coverage_2011-12-29-a/5318/remote/ubuntu@sepia60.ceph.dream... Josh Durgin
10:33 AM Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
Happened again in teuthology:~teuthworker/archive/nightly_coverage_2011-12-28-b/5258/teuthology.log. Josh Durgin
10:04 AM Bug #1862: filestore: EINVAL on replay
Marco Aroldi wrote:
> Hmmm
> I have another problem: I've tried the patch in #1759 but I get an error at compile ti...
Sage Weil
10:01 AM Bug #1862: filestore: EINVAL on replay
Hmmm
I have another problem: I've tried the patch in #1759 but I get an error at compile time:
CXX libos_l...
Marco Aroldi
08:33 AM Bug #1862 (Duplicate): filestore: EINVAL on replay
Aha, this is actually #1759. If you apply the patch in that bug report it'll get your OSDs up and running again. Th... Sage Weil
03:33 AM Bug #1862: filestore: EINVAL on replay
Hi,
I've downloaded and compiled the latest code from the git repository.
I've issued a "ceph-osd -i 1 --debug_ms 2...
Marco Aroldi
09:53 AM Bug #1804: filestore: unexpected EINVAL
My money is that this is caused by #1759. Which hopefully means that the qa suite will eventually trigger the new as... Sage Weil
09:17 AM Bug #1741 (Can't reproduce): teuthology: failed to untar
Sage Weil
09:16 AM Bug #1759 (Need More Info): mds/client: truncate size overflow, fails with EINVAL
The OSD now returns EINVAL, the MDS asserts if it gets EINVAL, and we have some MDS-side assertions that should catch... Sage Weil
09:10 AM Bug #1846 (Resolved): Mds crash immediately after start (segmentation fault)
Great! Sage Weil
06:05 AM Bug #1846: Mds crash immediately after start (segmentation fault)
I built a Debian package from the master branch and upgraded ceph on both servers. Mds and osd started properly. Thank... Maciej Galkiewicz
09:08 AM Bug #1848 (Resolved): osd got zeroed out fsid
Sage Weil
09:08 AM Bug #1848: osd got zeroed out fsid
fixed by commit:47013c289e6ad6638b0f77152dafbc9f4723c032 and commit:05cc4eb93ce6d193c6aea4918144006fb4d1c187 Sage Weil
01:00 AM Revision e18b1c97 (ceph): rgw: removing swift user index when removing user
Yehuda Sadeh
12:50 AM Revision 997e35ae (ceph): rgw-admin: remove subuser index when required
Yehuda Sadeh
12:42 AM Revision 1f40031f (ceph): osd: fix push completion check
Only check backfill if we pushed to the backfill target. And avoid the hash
lookup in the general case.
Signed-off-...
Sage Weil
12:34 AM Revision 2dc90d03 (ceph): rgw: clone operation should only update index for main category
Yehuda Sadeh
12:33 AM Revision bb52b187 (ceph): rgw: fix cache interface (was not overloading method)
Yehuda Sadeh

12/28/2011

11:10 PM Revision 0db9a423 (ceph): rgw: fix bucket creation
Yehuda Sadeh
06:43 PM Feature #1709 (Resolved): specfile: merge suse spec file changes
Sage Weil
06:42 PM Feature #1678 (Resolved): rados tool: ability to specify object locator
Sage Weil
06:42 PM Bug #1683 (Resolved): librados: list objects should also return locator key
Sage Weil
06:41 PM Bug #1508 (Can't reproduce): iozone stuck on kernel rbd mount
haven't seen this recently Sage Weil
05:06 PM Bug #1848: osd got zeroed out fsid
Happened again in teuthology:~teuthworker/archive/nightly_coverage_2011-12-28/5238/remote/ubuntu@sepia56.ceph.dreamho... Josh Durgin
05:05 PM rgw Bug #1854 (Resolved): Deletion of an rgw user that has a subuser with a swift key leaves behind a...
Fixed, commit:e18b1c9734e88e3b779ba2d70cdd54f8fb94743d. Yehuda Sadeh
03:26 PM Bug #1846: Mds crash immediately after start (segmentation fault)
Oh, I think Henry sent in a fix for this. Can you apply commit:bfbeae68c045de76ede86ca4f72d2a760a19c84b (or use late... Sage Weil
02:45 PM Bug #1846: Mds crash immediately after start (segmentation fault)
... Maciej Galkiewicz
08:12 AM Bug #1846: Mds crash immediately after start (segmentation fault)
I looked at the attached monmap and didn't see anything odd. This is fully reproducible, I take it? That's good news.
...
Sage Weil
02:35 PM rgw Bug #1864 (Resolved): rgw: atomic bucket info
Yehuda Sadeh
01:46 PM Bug #1828 (Resolved): osd: preserve write order when ops wait for recovery of src_oids
Sage Weil
01:45 PM Feature #1863 (Resolved): qa: tester for osd op reply order
Out-of-order osd replies currently trigger an ObjectCacher assert. Presumably there are lots more of them than the one... Sage Weil
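One possible core for such a tester, sketched in C++ under the assumption that ops on the same object must be acknowledged in submission order; the class name and interface are hypothetical and not part of the qa suite.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <string>

// Hypothetical reply-order checker: tracks outstanding tids per object and
// flags any reply that skips ahead of an older, still-pending op.
class ReplyOrderChecker {
 public:
  void on_submit(const std::string& oid, uint64_t tid) {
    pending_[oid].push_back(tid);
  }

  // Returns false for an unexpected or out-of-order reply.
  bool on_reply(const std::string& oid, uint64_t tid) {
    auto it = pending_.find(oid);
    if (it == pending_.end() || it->second.empty()) return false;  // unexpected
    if (it->second.front() != tid) return false;  // out of order
    it->second.pop_front();
    return true;
  }

 private:
  std::map<std::string, std::deque<uint64_t>> pending_;
};
```

Ordering is only checked per object here, since replies for different objects are not expected to be ordered relative to each other.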
09:14 AM Bug #1862: filestore: EINVAL on replay
Hi Sage,
I'm sorry but I don't understand the steps requested.
Please, could you explain a little bit more?
Marco Aroldi
08:19 AM Bug #1862 (Need More Info): filestore: EINVAL on replay
Can you try running the latest master code and restart ceph-osd? Specifically, commit:7133a2faf0ae0710b7cbd9801c6476... Sage Weil
07:25 AM Bug #1862 (Duplicate): filestore: EINVAL on replay
Hello,
I'm testing ceph 0.39 on two VMs (Ubuntu 11.10) on Hyper-V with all Linux Integration Components installed.
2...
Marco Aroldi

12/27/2011

03:18 PM Bug #1846: Mds crash immediately after start (segmentation fault)
Do you have any suggestions for temporarily working around this problem? Maciej Galkiewicz

12/23/2011

08:47 PM Revision 4ac04e89 (ceph): rgw: write bucket info in one operation
Yehuda Sadeh
05:56 PM Revision 60bbf688 (ceph): Objecter: fix local reads one more time.
Document it a little since we've gotten it wrong so often.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum
04:59 PM Bug #1841: OSDs should disconnect from Monitor before their MOSDPGStat timeouts happen
Pushed a wip-osd-mon-communication branch that implements this. It's untested, though! Greg Farnum
02:50 PM RADOS Feature #1861 (New): qa: test OSD handling of misdirected operations
This will probably require some new facilities in a client in order to do automatically, but we had a regression (I t... Greg Farnum
02:48 PM Messengers Bug #1803 (New): msgr: behave better when ending TCP connections
Greg Farnum
02:47 PM Bug #1858: OSD needs to check for misdirected ops before putting non-existent PGs on hold
Pushed the branch osd-misdirected-checks. So far it's untested. Greg Farnum
07:56 AM Bug #1858: OSD needs to check for misdirected ops before putting non-existent PGs on hold
we should also include a read-from-replicas workload in the qa suite.. probably combined with osd thrashing. that ma... Sage Weil
10:56 AM rados-java Feature #1860 (New): qa: write tests for local reads and random replica reads
We need to test the client options that randomize the read load and attempt to read from local replicas. The local re... Greg Farnum
10:07 AM rgw Bug #1859 (Resolved): rgw: bucket creation is not atomic
the backing bucket object is being created in two operations: create and write. We need to combine these into a single... Yehuda Sadeh
01:41 AM Revision eb37637f (ceph): Merge remote branch 'upstream/master' into wip-backfill
Conflicts:
src/include/object.h
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just
01:38 AM Revision 1f02e34c (ceph): ReplicatedPG: objects currently being backfilled are degraded
pending_stat_updates has also been renamed to pending_backfill_updates.
Signed-off-by: Samuel Just <samuel.just@drea...
Samuel Just
01:24 AM Revision d2eb119a (ceph): ReplicatedPG: fill in backfill_peer in on_activate
Previously, there was a race between issue_repop/do_op and
start_recovery_ops.
Signed-off-by: Samuel Just <samuel.ju...
Samuel Just
01:17 AM Revision 517ddf84 (ceph): ReplicatedPG: only pull in one backfill peer at a time
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just

12/22/2011

11:25 PM Revision 855e93b6 (ceph): filestore: fix config observer
Actually, I don't think this was fully implemented to begin with, so it's
not a 'fix' per se. This will let you use ...
Sage Weil
11:18 PM Revision decdc363 (ceph): MOSDPGRepScrub: Fix typo in MOSDPGRepScrub
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
10:33 PM Revision db04d680 (ceph): ReplicatedPG: update last_backfill when pushes complete
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
10:33 PM Revision 3b90df0d (ceph): ReplicatedPG: init backfill infos to last_backfill
We can scan starting from last_backfill to avoid rescanning portions
of the collection recovered by normal recovery. ...
Samuel Just
10:33 PM Revision 298b1349 (ceph): PG: backfill info should be cleared on recovery reset
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
10:33 PM Revision 8b8aab84 (ceph): PG: update stats from master only if not backfilling
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
10:33 PM Revision fa6bd38a (ceph): ReplicatedPG: simplify recover_backfill
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
10:00 PM Revision 9fc060c0 (ceph): Merge branch 'wip-signal'
Sage Weil
09:48 PM Bug #1858 (Resolved): OSD needs to check for misdirected ops before putting non-existent PGs on hold
Right now it doesn't. If it had, then diagnosing Noah's local reads problem would have been much, much faster. :( Greg Farnum
08:33 PM Revision c7fee72d (ceph): MOSDRepScrub: use header.version for payload version
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
08:16 PM Revision 3cb53cc9 (ceph): Merge branch 'stable'
Sage Weil
08:15 PM Revision e93961c1 (ceph): monmap: iterate over addr_name when printing summary
The rank is now ordered by IP address. We should iterate over
addr_name.
Signed-off-by: Henry C Chang <henry.cy.chan...
Henry Chang
08:15 PM Revision bfbeae68 (ceph): monmap: clear addr_name map on calculating ranks
We should clear addr_name before filling it. Otherwise, the removed
mon will stay there and cause incorrect rank assi...
Henry Chang
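A simplified C++ sketch of why `addr_name` must be cleared before ranks are recomputed. The `MiniMonMap` struct and its field layout are assumptions, and addresses are compared as plain strings rather than real entity addresses; without the `clear()`, a removed monitor's entry would survive and still receive a rank.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the monmap: ranks follow address order.
struct MiniMonMap {
  std::map<std::string, std::string> mon_addr;   // name -> addr
  std::map<std::string, std::string> addr_name;  // addr -> name, sorted by addr
  std::vector<std::string> rank_name;            // rank -> name

  void calc_ranks() {
    addr_name.clear();  // drop stale entries for monitors that were removed
    for (const auto& p : mon_addr) addr_name[p.second] = p.first;
    rank_name.clear();
    for (const auto& p : addr_name) rank_name.push_back(p.second);
  }
};
```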
08:15 PM Revision ea9f2f62 (ceph): interval_set: fix truncation of _size
_size is of type int64_t. Using an int to store the value of _size
causes value truncation.
Signed-off-by: Henry C Ch...
Henry Chang
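An illustrative C++ sketch of the truncation described above (the function names are hypothetical): any value above INT_MAX, such as the 5,000,000,000 in the usage note, cannot survive a round trip through `int`.

```cpp
#include <cassert>
#include <cstdint>

// Narrowing a 64-bit total into an int: the result cannot exceed INT_MAX,
// so large sizes come back wrong.
int64_t size_via_int(int64_t size) {
  int s = static_cast<int>(size);
  return s;
}

// Keeping the full width preserves the value.
int64_t size_via_int64(int64_t size) {
  int64_t s = size;
  return s;
}
```

For example, `size_via_int64(5000000000LL)` returns the input unchanged, while `size_via_int(5000000000LL)` cannot.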
04:28 PM CephFS Bug #1047: mds: crash on anchor table query
Unfortunately there's not enough info in this log either. We're going to need a log with (at minimum) level 10 mds de... Greg Farnum
04:27 PM Bug #1850 (Duplicate): mds sometimes crashes removing trees with plenty of hardlinks
I'm pretty sure you're looking at #1047 here. :) Greg Farnum
12:45 PM Fix #1857 (Resolved): osd: reimplement shutdown()
on sigterm, go through OSD::shutdown() and try to clean things up in an orderly fashion. This will be useful for lea... Sage Weil
11:02 AM rgw Bug #1856 (Resolved): It is possible to look up an rgw user by a subuser that does not exist as l...
Matthew Wodrich
11:00 AM rgw Bug #1855 (Resolved): Creation of a subuser that appears to own an s3 key is possible, and removi...
Matthew Wodrich
10:58 AM rgw Bug #1854 (Resolved): Deletion of an rgw user that has a subuser with a swift key leaves behind a...
Creating an rgw user with a subuser and swift key and then deleting the user appears to orphan the object for that su... Matthew Wodrich

12/21/2011

10:21 PM Revision 9eee1ecb (ceph): osd: remove SIGTERM cruft
The default handler will exit(0). The got_sigterm stuff was dead code.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:17 PM Revision e04109a3 (ceph): mon: drop special SIGTERM handler
Default does exit(0).
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:17 PM Revision 2daa655f (ceph): mds: drop special SIGTERM handler
Default does exit(0).
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:17 PM Revision d1dbeaf5 (ceph): exit(0) on SIGTERM by default
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
07:20 PM Revision 07e11862 (ceph): ReplicatedPG: Initialize blocked_by in ObjectContext constructor
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
03:07 PM rgw Bug #1853 (Resolved): rgw: qa test to verify bucket recreation does not override bucket
Yehuda Sadeh
09:11 AM RADOS Feature #1852 (New): librados: don't do memory copies for the C interface
The current implementation of the librados C interface (well, the one I'm working on now) uses in-memory copies for a... Greg Farnum
09:10 AM Messengers Feature #1851 (Rejected): SimpleMessenger: use non-blocking io
This will allow some great stuff, like doing a revoke on sending messages. And decoupling threads from sockets. Etc.
...
Greg Farnum
04:11 AM Revision dcedda84 (ceph): Merge pull request #7 from kylemarsh/wip-obsync-swift-metadata
obsync: pull object metadata from swift store Sage Weil
02:58 AM Bug #1850 (Duplicate): mds sometimes crashes removing trees with plenty of hardlinks
rsync -aH /usr/share/zoneinfo/ /mnt/ceph-fuse/subdir/ (the H and a hardlink-plentiful /usr/share/zoneinfo are essenti... Alexandre Oliva
02:42 AM Bug #1849 (Duplicate): directories' timestamps in snapshots sometimes change when directory is mo...
When trying to load a series of backups from directories trees large and small, I noticed one particularly undesirabl... Alexandre Oliva
01:35 AM Revision 78030bc7 (ceph): ReplicatedPG: take references for blocked_by and blocking
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
01:08 AM Revision a85ab1ea (ceph): obsync: pull object metadata from swift store
Obsync wasn't pulling object metadata from swift stores and thus wasn't
syncing metadata when reading from a swift st...
Kyle Marsh
01:05 AM Revision fd3231c6 (ceph): Merge remote branch 'upstream/wip-backfill-ordering' into wip-backfill
Samuel Just
12:52 AM Revision 7eb28730 (ceph): PG: add some documentation
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
12:50 AM Revision ffd1b437 (ceph): ReplicatedPG: delay op while snapdir is missing/degraded
We cannot get/update a snapcontext if snapdir is missing/degraded.
Signed-off-by: Samuel Just <samuel.just@dreamhost...
Samuel Just

12/20/2011

11:28 PM Revision 45b9659f (ceph): ReplicatedPG: don't manage waiting_on_backfill in start/finish_recovery_op
Set waiting_on_backfill in recover_backfill and clear in do_scan.
Signed-off-by: Samuel Just <samuel.just@dreamhost....
Samuel Just
07:39 PM Revision 3bea1ed4 (ceph): rgw: fix subuser key name when purging subuser keys
Yehuda Sadeh
07:00 PM Revision 9ddb802c (ceph): radosgw-admin: add --purge-keys option
Yehuda Sadeh
06:53 PM Revision 5e9d1019 (ceph): ReplicatedPG: apply_repop: apply local_t before op_t
We create snap_collections in local_t and clone into them in op_t.
Signed-off-by: Samuel Just <samuel.just@dreamhost...
Samuel Just
04:25 PM CephFS Bug #1737: ceph-fuse crash in xlist::remove
Another occurrence today in teuthology:~teuthworker/archive/nightly_coverage_2011-12-20-b/4585/teuthology.log Josh Durgin
11:05 AM rgw Bug #1801 (Resolved): rgw: radosgw-admin remove subuser and related swift key in a single command
done, commit:9ddb802c72fc805ce400f9bf5cceffb88b0f3d47
radosgw-admin subuser rm --subuser=<name> --purge-keys
Yehuda Sadeh
10:09 AM Bug #1848 (Resolved): osd got zeroed out fsid
From teuthology:~teuthworker/archive/nightly_coverage_2011-12-20-a/4579/remote/ubuntu@sepia51.ceph.dreamhost.com/log/... Josh Durgin
10:00 AM rgw Feature #1847 (Resolved): rgw: revisit the way we store large objects
We should probably keep large objects in chunks, and not coalesce into a single large object. Chunks shouldn't use th... Yehuda Sadeh
08:38 AM Bug #1846: Mds crash immediately after start (segmentation fault)
I got the monmap from the machine with the working mds, because the other one does not have the admin key. I hope that this is not a pro... Maciej Galkiewicz
08:34 AM Bug #1846: Mds crash immediately after start (segmentation fault)
can you 'ceph mon getmap -o /tmp/monmap' and attach that file to this bug? Sage Weil
03:25 AM Bug #1846: Mds crash immediately after start (segmentation fault)
The osd on this machine crashes in the same way. Maciej Galkiewicz
03:23 AM Bug #1846 (Resolved): Mds crash immediately after start (segmentation fault)
I have two mds' in my configuration. One of them works fine and the other crashes immediately after reboot:
@2011-...
Maciej Galkiewicz
02:03 AM Revision 97dd28c0 (ceph): librados: return -EROFS when trying to write to a snapshot
operate_read doesn't need this check because it does not write.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
02:00 AM Revision 68ba1862 (ceph): librados: make getxattrs ENOMEM return negative
This is more consistent with the rest of librados.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
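A C++ sketch of the error convention these two commits move toward: 0 on success, a negative errno on failure, with writes refused while a snapshot is selected. `HypotheticalIoCtx` and its `NOSNAP` marker are illustrative assumptions, not the librados API or its actual snapshot constant.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Hypothetical I/O context; names and fields are for illustration only.
struct HypotheticalIoCtx {
  static constexpr uint64_t NOSNAP = UINT64_MAX;  // "head" (writable) marker
  uint64_t read_snap = NOSNAP;

  // Returns 0 on success or a negative errno on failure.
  int write(uint64_t /*len*/) {
    if (read_snap != NOSNAP) return -EROFS;  // snapshots are read-only
    return 0;
  }
};
```

Returning errors as negative values keeps every call site's check uniform (`if (r < 0)`), which is the consistency the getxattrs change refers to.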
12:26 AM Revision a798a85f (ceph): PG: Do not update_snap_collections for log entries > last_backfill
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
12:26 AM Revision 2401176b (ceph): PG: Fix stat debug output
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
12:23 AM Revision 1362d3e1 (ceph): calc_acting: Prefer up[0] as primary if possible
Previously, we could get into a state where although up[0] has been
fully backfilled, acting[0] could be selected as ...
Samuel Just
12:01 AM Revision 01f3f6a6 (ceph): rgw: add timeout to init path
Yehuda Sadeh

12/19/2011

10:57 PM Revision cc22f154 (ceph): MOSDRepScrub,ReplicatedPG: Add scrub_to to MOSDRepScrub
When scrub_from is set, also set scrub_to to the primary's
last_update_applied (which will also be the official last_...
Samuel Just
10:02 PM Revision 720bab94 (ceph): osd: EINVAL on truncate to huge object size
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
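A C++ sketch of rejecting an implausible truncate target with `-EINVAL`, for example a size that underflowed and wrapped to nearly 2^64 (compare #1759). The limit `kMaxObjectSize` and the function `do_truncate()` are hypothetical; the bound the OSD actually uses is not stated in this log.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Hypothetical upper bound on a sane object size (assumption).
constexpr uint64_t kMaxObjectSize = 1ULL << 40;

// Validate before acting: a bogus size leaves the object untouched.
int do_truncate(uint64_t& object_size, uint64_t new_size) {
  if (new_size > kMaxObjectSize) return -EINVAL;
  object_size = new_size;
  return 0;
}
```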
10:02 PM Revision ed780fdd (ceph): mds: misc assertions about truncation
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
10:02 PM Revision 2710bd85 (ceph): mon: update man page to document --mkfs stuff
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
10:00 PM Revision 29e6d6c8 (ceph): Merge pull request #6 from kylemarsh/wip-obsync-swift
Wip obsync swift Sage Weil
09:57 PM Revision 33cb2796 (ceph): rgw: remove temp context in prepare_get_obj
Yehuda Sadeh
09:57 PM Revision 5e739335 (ceph): rgw: fix xml parser internal structure leak
Yehuda Sadeh
09:57 PM Revision a72348ea (ceph): rgw: fix a leak of acl structure (in req_state)
Yehuda Sadeh
09:54 PM Revision 002eb581 (ceph): rgw: remove temp context in prepare_get_obj
Yehuda Sadeh
09:38 PM Revision 27da89f4 (ceph): rgw: fix xml parser internal structure leak
Yehuda Sadeh
09:38 PM Revision 3a8af0f7 (ceph): rgw: fix a leak of acl structure (in req_state)
Yehuda Sadeh
09:25 PM Revision 42980922 (ceph): Merge branch 'wip-osd-maybe-created'
Greg Farnum
09:24 PM Revision 98a4809a (ceph): Merge branch 'wip-osd-fsid'
Sage Weil
09:24 PM Revision 3af5fff5 (ceph): doc: fix typo
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
04:15 PM Revision dc977901 (ceph): osd: --get-journal-fsid
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
04:13 PM Revision c8c5e5d6 (ceph): filestore: make fsid uuid_d instead of uint64_t
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:13 PM Revision ae8fbb88 (ceph): filejournal: uuid for fsid
Decode old header struct, but encode new class using more normal encoding
style. Embed in a bufferlist for later ext...
Sage Weil
04:12 PM Revision dcceb8e8 (ceph): osd: include osd_fsid in OSDSuperblock
Generated during mkfs.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
04:12 PM Revision a5822095 (ceph): osd: store osd_fsid as text in osd_data dir
along with ceph_fsid (the cluster fsid) and a few other things.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
04:12 PM Revision c59eb8ca (ceph): osd: --get-osd-fsid and --get-cluster-fsid
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:12 PM Revision 237b19cd (ceph): osd: rename OSDSuperblock::fsid -> cluster_fsid
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:04 PM Revision cd909aca (ceph): doc: fix mon cluster expansion docs
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
04:03 PM Revision f2a95990 (ceph): mon: pull addr from ceph.conf, mon_host as needed when joining mon cluster
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
03:57 PM Revision d9593342 (ceph): mon: fix setting of mon addr when joining a cluster
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
02:02 PM rgw Bug #1844 (Resolved): radosgw memory leak
Fixed a few leaks, as of commit:33cb27961e1b20f188d2a83a764ae3f2fabeb141. Current run with massif looks flat. Yehuda Sadeh
11:15 AM rgw Bug #1844 (Resolved): radosgw memory leak
apparently radosgw is leaking. Yehuda Sadeh
01:38 PM Bug #1825 (Resolved): osd loses object deletes by some creates in the same transaction
Merged to master in commit:42980922f253ed29718bfac64e17c85cdf9805a6. Still haven't written tests but I have a persona... Greg Farnum
01:22 PM rgw Feature #1838 (Resolved): rgw: update man page
Sage Weil
01:21 PM Bug #1845 (Rejected): "recovery_ops" performance counter isn't decreased
Right, it should never decrement. Closing! Sage Weil
11:51 AM Bug #1845: "recovery_ops" performance counter isn't decreased
I'm using munin with a plugin I wrote myself, and I don't divide this value by anything. If it shouldn't decrement, please clos... Szymon Szypulski
11:47 AM Bug #1845: "recovery_ops" performance counter isn't decreased
The counters are counting events and never decrement. Normally collectd will divide the change by time to give you s... Sage Weil
11:15 AM Bug #1845 (Rejected): "recovery_ops" performance counter isn't decreased
I'm generating osd statistics based on performance sockets like described here - http://ceph.newdream.net/wiki/Perfom... Szymon Szypulski
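Because counters like `recovery_ops` only count events and never decrement, a graphing plugin has to turn them into a rate by differencing two samples and dividing by the elapsed time. A C++ sketch of that calculation; `CounterSample` and `counter_rate()` are illustrative names, not part of Ceph.

```cpp
#include <cassert>
#include <cstdint>

// One reading of a monotonically increasing event counter.
struct CounterSample {
  uint64_t value;  // cumulative count at sample time
  double seconds;  // sample timestamp
};

// Events per second between two samples. Returns 0 when no time passed or
// when the counter went backwards (e.g. the daemon restarted and reset it).
double counter_rate(const CounterSample& prev, const CounterSample& cur) {
  double dt = cur.seconds - prev.seconds;
  if (dt <= 0 || cur.value < prev.value) return 0.0;
  return static_cast<double>(cur.value - prev.value) / dt;
}
```

For example, going from 100 to 160 recovery ops over 30 seconds gives 2 ops/s.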
11:07 AM Bug #1758: OSD segfault in SimpleMessenger::send_message
For the life of me I cannot seem to get useful symbols out of this, though I'm not sure why. I've been using LD_LIBRA... Greg Farnum
10:49 AM Bug #1688: Benjamin: pg stuck in scrub
Still happening, I'm looking into an instance on benjamin now. Samuel Just
10:07 AM Bug #1530: osd crash during build_inc_scrub_map
This was the only failure in the run last night. Core at teuthology:~teuthworker/archive/nightly_coverage_2011-12-19-... Josh Durgin
10:00 AM Bug #1490 (New): cfuse assert failure: assert(ob->last_commit_tid < tid)
Happened again in teuthology:~teuthworker/archive/nightly_coverage_2011-12-15-b/4357/remote/ubuntu@sepia63.ceph.dream... Josh Durgin
08:09 AM Bug #1839 (Resolved): osd: assert in send_incremental_map_msg
this was the hobject_t::max initialization patch that wasn't in master. Sage Weil
08:08 AM Documentation #1840 (Resolved): doc: fix mon addition steps
Sage Weil

12/17/2011

03:34 PM Feature #1655 (Resolved): gitbuilder aggregator page
http://ceph.newdream.net/gitbuilder.cgi Sage Weil
06:21 AM Revision 37e7a521 (ceph): rgw: fix updating of object metadata
being used in swift POST. We were updating wrong object
size and etag
Yehuda Sadeh
06:21 AM Revision 44b4e029 (ceph): rgw: bucket cannot be recreated if already exists
Yehuda Sadeh
06:15 AM Revision e5f49104 (ceph): man: Update the configuration example for radosgw
Signed-off-by: Wido den Hollander <wido@widodh.nl> Wido den Hollander
06:15 AM Revision 83cf1b62 (ceph): man: It is capital -C instead of -c for creating a new keyring
Signed-off-by: Wido den Hollander <wido@widodh.nl> Wido den Hollander
06:04 AM Revision 3e323e6a (ceph): rgw: fix updating of object metadata
being used in swift POST. We were updating wrong object
size and etag
Yehuda Sadeh
02:09 AM Revision d0e90d71 (ceph): syslog checking: forgot a pipe
Josh Durgin
01:14 AM Revision 08f968f8 (ceph): rgw: bucket cannot be recreated if already exists
Yehuda Sadeh
12:07 AM Revision f54f4aa0 (ceph): obsync: add authurl to CLI
s3 connections require the hostname and swift connections require the
authurl. obsync treats these as equivalent int...
Kyle Marsh

12/16/2011

10:42 PM Revision bfbde5b1 (ceph): object.h: initialize max in hobject_t(sobject_t) constructor
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
10:09 PM rgw Bug #1830 (Resolved): RGW Swift Metadata Bug
Fixed, commit:3e323e6adbf87d794be39fd4f75c6626e8968ce1. Yehuda Sadeh
05:36 PM rgw Bug #1830: RGW Swift Metadata Bug
Ok, was able to reproduce it. Problem is in the swift specific update metadata operation. Fix should be pretty easy. Yehuda Sadeh
08:41 PM Revision 061e7619 (ceph): ReplicatedPG: fix handle_watch_timeout ctx->at_version
ctx->at_version should match the head of the new log entries
during issue_repop. This could cause the scrub hang bug...
Samuel Just
07:43 PM Revision 5274e88d (ceph): ReplicatedPG: add asserts to catch scrub error
If last_update_applied skipped over last_update, we would see
scrub hang.
Signed-off-by: Samuel Just <samuel.just@dr...
Samuel Just
06:39 PM Revision 3f3913c9 (ceph): doc: fix filename in mon addition process
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:28 PM rgw Bug #1843 (Resolved): rgw: recreation of bucket overrides old one
Fixed with commit 08f968f8cd74a2e782257eea91a97b52598ef6f1. Yehuda Sadeh
05:08 PM rgw Bug #1843 (Resolved): rgw: recreation of bucket overrides old one
Instead of returning success and not doing anything, we actually create a new bucket and override the old one. This i... Yehuda Sadeh
05:19 PM Revision 7d81a3b5 (ceph): filejournal: preallocate journal bytes on create
This should reduce fragmentation for large journals that are written
slowly the first time around.
Signed-off-by: Sa...
Sage Weil
05:08 PM Revision 92cb2a20 (ceph): Merge pull request #5 from homac/master
Minor fix for init files and cleaned up spec file. Please pull Sage Weil
03:20 PM Bug #1841 (In Progress): OSDs should disconnect from Monitor before their MOSDPGStat timeouts happen
Yep; it is easy enough to add a check in tick based on how long it's been since we sent a PGStat without getting an a... Greg Farnum
01:28 PM Bug #1841: OSDs should disconnect from Monitor before their MOSDPGStat timeouts happen
My memory is a bit fuzzy, but I think they're waiting on acks for the MOSDPGStat messages they're sending.. checking ... Sage Weil
11:23 AM Bug #1841 (Resolved): OSDs should disconnect from Monitor before their MOSDPGStat timeouts happen
Right now OSDs don't notice their monitor connection has dropped until after the (by default) 15 minute TCP connectio... Greg Farnum
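The tick-based check proposed for #1841 could look roughly like the following Python sketch. It records when the oldest still-unacked MOSDPGStat was sent, and a periodic tick() decides to reconnect once that report has been outstanding too long. The class, its methods and the 30-second timeout are hypothetical. Clearing all pending state on a single ack assumes that one ack covers every earlier report.

```python
class PGStatAckTracker:
    """Hypothetical sketch: track how long an MOSDPGStat report has gone
    unacked so a periodic tick() can decide to abandon the current monitor."""

    def __init__(self, ack_timeout_s: float):
        self.ack_timeout_s = ack_timeout_s
        self.unacked_since = None  # send time of the oldest unacked stat report

    def on_send(self, now: float):
        # Only the oldest outstanding report matters for the timeout.
        if self.unacked_since is None:
            self.unacked_since = now

    def on_ack(self):
        # Assumption: an ack covers every report sent before it.
        self.unacked_since = None

    def tick(self, now: float) -> bool:
        """Return True if the caller should drop this monitor session and
        reconnect to another monitor."""
        return (self.unacked_since is not None
                and now - self.unacked_since > self.ack_timeout_s)

t = PGStatAckTracker(ack_timeout_s=30.0)
t.on_send(now=0.0)
print(t.tick(now=10.0), t.tick(now=31.0))  # False True
t.on_ack()
print(t.tick(now=100.0))  # False
```

A check like this reacts in tens of seconds instead of waiting for the TCP connection timeout described in the report.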
03:18 PM Bug #1842 (Can't reproduce): osd: failed authorizations leak memory somehow?
I've got a log from todin showing lots of "fault initiating reconnect" that I suspect are on failed auths. Log in kai... Greg Farnum
01:03 PM rgw Feature #1838: rgw: update man page
I just submitted a patch on the ml with an updated version of the manpage. This works in my setup. Wido den Hollander
10:10 AM rgw Feature #1838 (Resolved): rgw: update man page
Use current alexandria as a model, probably, minus the now-unneeded setenv stuff.
Sage Weil
10:51 AM Documentation #1840 (Resolved): doc: fix mon addition steps
--public-addr
make sure port is correct, too
Sage Weil
10:37 AM Bug #1839 (Resolved): osd: assert in send_incremental_map_msg
... Sage Weil
09:59 AM Linux kernel client Feature #1837 (New): krbd: freeze filesystem on snapshot
The block device can ask for an fs freeze (dm currently does this). We can do this with rbd when we see that the rbd... Sage Weil
09:22 AM Feature #1836 (Resolved): filejournal: use async directio to write to the journal
Currently we're doing a sync direct io write, which means we pay a full rotation between each io. Sage Weil
08:44 AM Bug #1833 (Resolved): mon: failed decode in LogMonitor::update_from_paxos
Yeah, this is one of the things I hit (and fixed) in a few different ways when doing the mon thrashing on the new code. Sage Weil
06:33 AM Bug #1835: Monclient crash when keyring is not readable
Btw, I know I can use the built-in 'secret' functions of libvirt, but I haven't modified my XMLs yet. Wido den Hollander
06:32 AM Bug #1835 (Resolved): Monclient crash when keyring is not readable
I had some issues getting my Qemu-RBD VMs online; I saw Qemu segfault and started tracing this back with GD... Wido den Hollander
05:18 AM Bug #1834 (Closed): 'High' memory usage of monitors
Actually, I seem to be wrong here. My other monitor, running on a 4GB box, is using about 240MB of memory; I did a smal... Wido den Hollander
04:43 AM Bug #1834 (Closed): 'High' memory usage of monitors
I'm still hunting this one, but I'm seeing high memory usage of my monitors (three in total).
My monitor configura...
Wido den Hollander
04:34 AM phprados Feature #424: Stream wrappers
It took some time to find docs about this, but I'm currently on track. Wido den Hollander

12/15/2011

10:03 PM Revision 739fd9fe (ceph): man: clarify mount.ceph auth options
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
09:49 PM Revision e5a5ae12 (ceph): man: update rule definition for ceph-rbdnamer
This is the rule we install since 891025e539a92b5d75011e2e75c475fc0c272042.
Signed-off-by: Josh Durgin <josh.durgin@...
Josh Durgin
09:43 PM Revision 4eb83654 (ceph): authx -> cephx everywhere it's used
The term authx was in the mount.ceph man page, and got accidentally
copied into rbd help.
Signed-off-by: Josh Durgin...
Josh Durgin
09:24 PM Revision 7eec3094 (ceph): roundtrip: add task
Yehuda Sadeh
09:15 PM Revision 41f64be0 (ceph): ReplicatedPG: calc_clone_subsets fix other clone_overlap case
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
09:15 PM Revision b5c32590 (ceph): ReplicatedPG: fix backfill mismatch error output
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
09:15 PM Revision 5b41c470 (ceph): OSD: use disk_tp.pause() without osd_lock
Previously, we called disk_tp.pause_new(). This can cause a race
where snap_trimmer queues more transactions after w...
Samuel Just
08:39 PM Revision 97cc6c29 (ceph): readwrite: fix task with default conf
Yehuda Sadeh
04:51 PM Revision ec776f4b (ceph): ceph.spec: Clean up and fix spec file and build for a couple of distrib...
Clean up and fix the spec file. This includes cleaning up of build and
installed system dependencies, LSB compliance ...
Holger Macht
04:49 PM Revision 0e0583f8 (ceph): init-ceph/init-radosgw: Don't use unspecified runlevel 4
Don't use runlevel 4 in init scripts. AFAIK, no distribution is using it
and at least the Open Build Service complain...
Holger Macht
02:32 PM Bug #1833 (Resolved): mon: failed decode in LogMonitor::update_from_paxos
Saw this on benjamin today. It was during catchup; mon.beta had been out for a day or more and was catching up. Perha... Greg Farnum
03:08 AM Revision 0c547046 (ceph): osd: preserve write order when waiting on src_oids
We need to preserve the order of write operations on each object. If we
have a write on X that needs to read from Y,...
Sage Weil
03:08 AM Revision ca2e8e5a (ceph): osd: EINVAL on mismatched locator without waiting for degraded
No reason to recover before returning an error.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
03:08 AM Revision 7a7aab25 (ceph): osd: wait for src_oid if it is on the other side of last_backfill from oid
If the target object is before last_backfill, then the backfill_target
will be asked to apply the operation. If one ...
Sage Weil
01:43 AM Revision da286059 (ceph): client: fix logger deregistration
Only unregister logger if it is non-NULL (and thus registered) to avoid
running afoul of the cct assertions.
Signed-...
Sage Weil
01:14 AM Revision 659e66aa (ceph): readwrite: fix conf, task runs
Yehuda Sadeh
12:12 AM Revision 7d085ad9 (ceph): readwrite: add readwrite task
still not really running, but at least getting configured Yehuda Sadeh

12/14/2011

11:51 PM Revision 62c830f0 (ceph): ReplicatedPG: add_object_context_to_pg_stat, obc->ssc may be null
obc->ssc is not necessarily filled in by get_object_context.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just
11:37 PM Revision 5a400935 (ceph): obsync: add vvprint back in
Commit ebe5fc60d20f92a0037c53c1e7bd7ae512be3da4 removed the definition of
vvprint without removing all the places tha...
Kyle Marsh
11:19 PM Revision cda5f0d3 (ceph): PG: clear waiting_on_backfill during clear_recovery_state
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
11:17 PM Revision d32fd8c5 (ceph): ReplicatedPG: list snapid 0 on collection_list_partial for backfill
0 will list all objects, CEPH_NO_SNAP will list only head objects.
Signed-off-by: Samuel Just <samuel.just@dreamhost...
Samuel Just
10:10 PM Bug #1831: mon: should not accept (and should disconnect) session when not in quorum
There are two things here, the second being the monitor changes you're focusing on. I need to investigate further why t... Greg Farnum
07:03 PM Bug #1831: mon: should not accept (and should disconnect) session when not in quorum
I think there are two parts here:
- the mon shouldn't let sessions start if it is not in the quorum. that may ac...
Sage Weil
03:39 PM Bug #1831 (Resolved): mon: should not accept (and should disconnect) session when not in quorum
This happened on Benjamin. The OSDs ought to be failing the connection and going to a new monitor, but they failed to... Greg Farnum
07:40 PM Revision d9d05117 (ceph): Merge remote branch 'upstream/master' into wip_backfill_merged
Samuel Just
07:39 PM Revision 07b3ba81 (ceph): ReplicatedPG: collection_list_partial also takes a snapid
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
07:38 PM Revision 1430c8ab (ceph): doc: Make overview.rst valid reStructuredText, so I can stop seeing war...
It's still wrong, but now it won't clutter the output.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Tommi Virtanen
07:33 PM Revision 53f7323c (ceph): doc: reStructuredText syntax fix.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:33 PM Revision c1190740 (ceph): pybind: Add a description to docstring.
This avoids a Sphinx warning like this:
.../src/pybind/rbd.py:docstring of rbd.RBD.version:2: WARNING: Field list en...
Tommi Virtanen
07:32 PM Revision 9d633a4f (ceph): PG: A backfill osd can have last_complete < log_tail
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
07:32 PM Revision 51deeef6 (ceph): ReplicatedPG: calc_*_subsets must consider last_backfill
Objects yet to be backfilled do not show up in the missing set. Thus,
we cannot use an object past last_backfill to ...
Samuel Just
07:32 PM Revision 7832e17e (ceph): PG: activate, backfill replica can have last_complete < log_tail
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
07:32 PM Revision b9eea709 (ceph): osd: object_stat_sum_t::clear()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:32 PM Revision 940a55e0 (ceph): osd: track backfill target pg stats
Maintain backfill target pg stats to be the summation over objects to
the left of last_backfill. Reflect this in the...
Sage Weil
07:32 PM Revision 7213c457 (ceph): PG: Ask for digest at most once at a time
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
07:32 PM Revision 9bb77b49 (ceph): osd: observe last_backfill in merge_log() and helpers
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:32 PM Revision e1006d76 (ceph): osd: more backfill changes
Always ship log for updates to backfill targets to preserve the repgather
ordering.
Fix up recover_backfill() bounds...
Sage Weil
07:32 PM Revision af7536d0 (ceph): hobject_t: fix hobject(sobject_t) constructor
Initialize max
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:32 PM Revision cd0c8fb3 (ceph): osd: add incomplete, backfill states; simplify calculation
Set/clear states in peering state machine state ctor/dtors where possible.
Set degraded if the number of non-backfil...
Sage Weil
07:32 PM Revision f83a787e (ceph): osd: some recover_backfill() comments
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:32 PM Revision f1caaa37 (ceph): osd: fix calc_acting()
Look at usable, not want.size(), so we don't count backfill targets.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:32 PM Revision 57baf9ef (ceph): osd: fix signed/unsigned comp
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:32 PM Revision 71893b0e (ceph): osd: remove bad !is_incomplete() assert
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:32 PM Revision 999846f7 (ceph): PG: fix phantom entry in peer_info
In GetLog, do not call pg->peer_info[newest_update_osd] if
newest_update_osd is osd->whoami.
Signed-off-by: Samuel J...
Samuel Just
07:32 PM Revision f483df15 (ceph): PG: there may now be backfill entries in the acting set
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
07:32 PM Revision f1ae9ed5 (ceph): objectstore: make list by hash *next > instead of >=
This means we should set it to a hash boundary or the last item of our
result set (not the next item we didn't includ...
Sage Weil
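A minimal Python sketch of the exclusive-bound listing described in this commit. Each call returns keys strictly greater than the cursor, so the cursor handed back is the last key actually returned, not the next key left out. list_after() is a hypothetical helper over a sorted Python list, not the ObjectStore interface, and it does not model the hash-boundary case.

```python
import bisect

def list_after(sorted_keys, after, max_count):
    """Return up to max_count keys strictly greater than `after` (None means
    start from the beginning), plus the cursor for the next call."""
    start = 0 if after is None else bisect.bisect_right(sorted_keys, after)
    page = sorted_keys[start:start + max_count]
    # With an exclusive (>) bound, the next cursor is the last key included.
    next_cursor = page[-1] if page else after
    return page, next_cursor

keys = ["a", "b", "c", "d", "e"]
cursor, pages = None, []
while True:
    page, cursor = list_after(keys, cursor, 2)
    if not page:
        break
    pages.append(page)
print(pages)  # [['a', 'b'], ['c', 'd'], ['e']]
```

With an inclusive (>=) bound, the cursor would instead have to be the first key not yet returned, which the caller may not have seen.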
07:31 PM Revision f7a0b9c5 (ceph): hobject_t: fix sorting by hash key
Use get_effective_key() to return key (if explicit) or object name. Sort
by that within each hash value.
Clean up o...
Sage Weil
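A simplified Python sketch of the ordering described in this commit. Objects sort by hash value first, then by effective key: the explicit key if one is set, otherwise the object name. HObject is a hypothetical stand-in for hobject_t with only these fields, and the real hobject_t carries more (for example the snap id). The final tie-break on name is an assumption added so that the order is total.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HObject:
    """Hypothetical, simplified stand-in for hobject_t."""
    hash_value: int
    name: str
    key: str = ""  # explicit locator key; empty means "not set"

    def get_effective_key(self) -> str:
        # The explicit key if one was given, otherwise the object name.
        return self.key if self.key else self.name

def sort_key(o: HObject):
    # Group by hash value, then order by effective key within each hash.
    # The trailing name tie-break is an assumption for a total order.
    return (o.hash_value, o.get_effective_key(), o.name)

objs = [HObject(2, "zeta"), HObject(1, "beta"),
        HObject(1, "alpha", key="omega"), HObject(1, "gamma")]
print([o.name for o in sorted(objs, key=sort_key)])  # ['beta', 'gamma', 'alpha', 'zeta']
```

Here "alpha" sorts after "gamma" because its explicit key "omega" is used in place of its name.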
07:31 PM Revision 9288f0e0 (ceph): osd: advance last_backfill by keys only
This ensures that transactions are never split by last_backfill.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:31 PM Revision 88ee86d0 (ceph): osd: keep backfill targets in acting set
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision b99e1358 (ceph): osd: make backfill (basically) work again
Still need to handle concurrent updates, log recovery vs backfill, etc.
Signed-off-by: Sage Weil <sage.weil@dreamhos...
Sage Weil
07:31 PM Revision de19a6bb (ceph): Revert "osd: don't keep push state on replicas"
This reverts commit 69c77e33f8530993dbc280525bd21218ea6f9ddb.
sub_op_pull() calls send_push_op directly, does not pa...
Sage Weil
07:31 PM Revision baa21c9b (ceph): osd: implement PG::copy_range()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision c03c49ca (ceph): osd: initialize repop gather set in issue_repop instead of new_repop
Simpler. It will also make the last_backfill correction live in one
place.
Signed-off-by: Sage Weil <sage.weil@drea...
Sage Weil
07:31 PM Revision 5b558dc4 (ceph): osd: strip out some backlog logic
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 82a23dbe (ceph): osd: strip backlog case out of merge_log
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 3f5ced69 (ceph): osd: kill backlog_requested
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 6d299552 (ceph): osd: strip backlog logic out of PG::activate()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision e7514f75 (ceph): osd: state machine whitespace
I feel better now
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:31 PM Revision 257b85d8 (ceph): osd: remove log_backlog from PG::Info
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 7521c51a (ceph): osd: remove backlog case from clean_up_local
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 9ceecc89 (ceph): osd: kill PG::Info::backlog
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision d7f7bbdc (ceph): osd: remove recovery-from-backlog kludge last_update
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 722ec7e5 (ceph): osd: kill unused PG_STATE_SCANNING
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision d84a9f6f (ceph): osd: cleanup
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 693950bf (ceph): osd: cleanup lingering backlog refs
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision e63c595a (ceph): osd: kill unused PG::Log::copy_after_unless_divergent
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision b5de19b5 (ceph): osd: kill unused PG::trim_write_ahead
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 0e7f4aff (ceph): osd: pg whitespace
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 400c27da (ceph): osd: track backfill with last_backfill, not interval_set<>
We always fill from the bottom up anyway. Using an hobject_t also gives us
a precise bound. It also makes things co...
Sage Weil
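A minimal Python sketch of why one watermark is enough when backfill always proceeds bottom-up. Everything sorting at or below last_backfill has been copied to the target, and nothing above it has. BackfillProgress is hypothetical and uses plain strings where the OSD uses hobject_t.

```python
class BackfillProgress:
    """Hypothetical sketch: backfill progress as a single sorted watermark
    instead of a set of intervals."""

    def __init__(self):
        self.last_backfill = None  # nothing backfilled yet

    def advance(self, obj):
        # Backfill copies objects in sorted order, so it only moves forward.
        if self.last_backfill is not None and obj < self.last_backfill:
            raise ValueError("backfill must proceed in sorted order")
        self.last_backfill = obj

    def is_backfilled(self, obj) -> bool:
        # Everything at or below the watermark is already on the target.
        return self.last_backfill is not None and obj <= self.last_backfill

p = BackfillProgress()
for obj in ["a", "b", "c"]:
    p.advance(obj)
print(p.is_backfilled("b"), p.is_backfilled("d"))  # True False
```

Compared with an interval set, the watermark is constant-size and gives an exact bound for deciding whether an incoming write must also be applied on the backfill target.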
07:31 PM Revision 91ee3375 (ceph): osd: osd_kill_backfill_at
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 99c614fa (ceph): osd: don't keep push state on replicas
Primaries need this, but replicas don't: the primary will explicitly pull
the pieces of the object that it wants.
Si...
Sage Weil
07:31 PM Revision 2cdc6b4e (ceph): osd: rewrite choose_acting process
Consolidate callers, eliminate obsolete backlog ones.
New process:
- pick best log, with preferences for those that...
Sage Weil
07:31 PM Revision 9e51c639 (ceph): osd: MOSDPGScan
Message to query hash ranges of a PG.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
07:31 PM Revision 8f14a358 (ceph): osd: add PG::BackfillInterval type
Describe a range of objects for the purposes of backfilling a PG.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:31 PM Revision 55c24813 (ceph): osd: implement ReplicatedPG::_lookup_object_context
Look up an existing ObjectContext without taking a reference.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:31 PM Revision 92d290d6 (ceph): osd: implement ReplicatedPG::scan_range
Scan a range of the local collection.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:31 PM Revision 17b5d5c3 (ceph): osd: implement do_scan
Handle MOSDPGScan messages to request or send a digest of a range of
objects in a collection, sorted in hobject_t (ha...
Sage Weil
07:31 PM Revision 353195d6 (ceph): types: operator<< for multimaps
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision e4ab0e3b (ceph): osd: add MOSDPGBackfill message
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 910398fe (ceph): osd: recover discontiguous peers using backfill instead of backlog
Instead of generating a huge list of objects to recover, and then pushing
them, iterate over the collection and copy ...
Sage Weil
07:31 PM Revision 4509e619 (ceph): test_backfill.sh
Sage Weil
07:31 PM Revision 004e7c92 (ceph): osd: add Incomplete peering state
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 73d15e01 (ceph): osd: do not read backlog off disk
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision b0664856 (ceph): osd: remove backlog generation code
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 6e9d135a (ceph): osd: simplify replica queries for finding divergent objects
No need to request backlog here, clearly, since those don't exist anymore.
Signed-off-by: Sage Weil <sage.weil@dream...
Sage Weil
07:31 PM Revision b8ee27a3 (ceph): osd: remove Query::BACKLOG processing
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 78b64473 (ceph): osd: kill PG::Log::copy_non_backlog
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 10e481d1 (ceph): osd: fix push_to_replica typo
We are always pushing soid. If we are missing snapdir locally, that means
we can't do an informed efficient clone, a...
Sage Weil
07:19 PM Revision b7a5a6a6 (ceph): doc: More consistency on formatting placeholder names.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:19 PM Revision 196d4273 (ceph): doc: Link to manpage when command is mentioned.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:19 PM Revision 75fd16a5 (ceph): doc: Use todo directive, rescue list of missing commands from wiki.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:19 PM Revision 81feae12 (ceph): doc: Add misc explanations of Ceph internals from email.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:19 PM Revision 034dd58f (ceph): doc: Add more missing commands to control.
This is too unstructured, that will have to be fixed later.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost....
Tommi Virtanen
07:19 PM Revision f5cfdbb7 (ceph): doc: Split intro to talk about the DFS separately. Mention petabytes.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:19 PM Revision bc16ac3b (ceph): doc: Fix sentence that ended too abruptly.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:19 PM Revision d745ff8d (ceph): doc: "ceph -w" clarification.
Stop saying "watch cluster state" so many times.
Don't say stdout, that's the assumption.
Don't call showing things...
Tommi Virtanen
07:14 PM Revision 18d99637 (ceph): Merge branch 'wip-messenger'
Greg Farnum
07:11 PM Revision 55639dcd (ceph): msgr: unset did_bind in stop().
We use did_bind as a flag on whether or not to stop the Accepter thread
and we should clear it when we do the stoppin...
Greg Farnum
06:59 PM Revision 41049f30 (ceph): objecter: fix use-after-free
messenger consumes the m reference. Yay valgrind.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:51 PM Revision 041d0456 (ceph): client: move PerfCounter into Client
globals are evil.
Fixes: #1826
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:50 PM Revision e8e1e5df (ceph): swift: auth response returns X-Auth-Token instead of X-Storage-Token
Yehuda Sadeh
05:31 PM Revision c9d0e556 (ceph): osd: fix build_incremental_map_msg
We keep both the inc and the full for our oldest osdmap.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
05:27 PM Revision 1a473b7a (ceph): osd: clean up _delete_head
Might be fixing a subtle logic bug, but old flow was confusing, so not
sure. :)
Signed-off-by: Sage Weil <sage@newd...
Sage Weil
05:26 PM Revision 6c8f60f6 (ceph): osd: simplify creation logic in do_osd_ops
Drop the maybe_created variable, and track exists over the course of the
transaction.
Fixes: #1825
Signed-off-by: Sa...
Sage Weil
05:16 PM Bug #1832 (Closed): osd: size tracking discrepancy (scrub stat mismatch)
During fsstress on the kernel client, this occurred:... Josh Durgin
01:53 PM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
Hi,
I've run into this precise problem on a small testing cluster that I'm running -- down to the large 64-bit tru...
David McBride
11:55 AM rgw Bug #1830 (Resolved): RGW Swift Metadata Bug
I believe the rados gateway has a bug in the way it's talking swift. When I ask it to list the objects in a container... Kyle Marsh
11:44 AM Feature #1782 (Resolved): mon: dump key cluster stats via perfcounter
Sage Weil
11:32 AM CephFS Bug #1788: msgr file descriptor leak
Forgot to update this. Haven't run into it yet and wip-messenger seemed to have fixed things. Thanks Greg! Noah Watkins
11:27 AM CephFS Bug #1788 (Resolved): msgr file descriptor leak
Haven't heard any new issues from Noah; merged to master in commit:18d996370efc2fc32d4973e9e6934901558bcbaf. Greg Farnum
11:26 AM Messengers Bug #1829 (Resolved): SimpleMessenger tries to shut down threads that aren't running
Oh, even simpler than I expected. Fixed in commit:55639dcd87fe985059355afe5fab787e4d139b11 (compile tested). Greg Farnum
11:12 AM Messengers Bug #1829 (Resolved): SimpleMessenger tries to shut down threads that aren't running
Saw this on benjamin yesterday. Looks like the OSD repeatedly restarted its messengers and was eventually unable to r... Greg Farnum
11:01 AM CephFS Cleanup #1826 (Resolved): client: kill static perfcounter
commit:041d04563e7cfdb837a345787a1569b07a064307 Sage Weil
10:54 AM rgw Bug #1780 (Resolved): swift: auth response should return X-Auth-Token instead of X-Storage-Token
Fixed, commit:e8e1e5dffbd25e2124331e607264e1bc4120676c. Yehuda Sadeh
10:12 AM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
This happened again on sepia70 during the kernel untar build workunit on rbd. Josh Durgin
09:40 AM Bug #1804 (Need More Info): filestore: unexpected EINVAL
Sage Weil
09:39 AM Bug #1828 (Resolved): osd: preserve write order when ops wait for recovery of src_oids
This affects current code.
It will need a minor adjustment so that "recovery" includes both is_missing() and osd >...
Sage Weil
09:33 AM CephFS Bug #1549 (Need More Info): mds: zeroed root CDir* vtable in scatter_writebehind_finish
Sage Weil
09:32 AM Bug #1530: osd crash during build_inc_scrub_map
Fixed that last thing with commit:c9d0e556c7ad294819c60ca4e3cd4d0191811f18, but I think it's unrelated to the rest of... Sage Weil
09:22 AM Bug #1825: osd loses object deletes by some creates in the same transaction
Fix looks good; I'm working on tests to verify and check regressions. Greg Farnum
02:08 AM Revision abecbc59 (ceph): OSDMonitor: remove useless check
Session was already verified to exist before this.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
12:31 AM Revision 5804477b (ceph): qa: trivial_libceph test
This currently fails... see #1827
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:29 AM Revision c87f31e0 (ceph): client: return errors from init
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
12:29 AM Revision 2f281d1f (ceph): libceph: catch errors from Client::init()
And clean up error paths.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:29 AM Revision 207c40b0 (ceph): libceph: add missing #includes
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
12:16 AM Revision 31b5ccbf (ceph): coverage: use locally stored build instead of downloading from a gitbui...
Josh Durgin

12/13/2011

05:31 PM CephFS Bug #1827: libceph: hang on creating a file
see commit:5804477b20f89a2b02218b518a44e73073b393c9 for reproducer.
fwiw I ran with vstart and 'LD_PRELOAD=../../s...
Sage Weil
04:36 PM CephFS Bug #1827 (Resolved): libceph: hang on creating a file
Using trivial thinger from Noah. Sage Weil
05:15 PM Revision 6b425676 (ceph): objectstore: implement Transaction::dump()
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:15 PM Revision 7133a2fa (ceph): filestore: dump transaction to log if we hit an error
This will let us see which operation in the transaction failed.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:05 PM Revision 3d13f003 (ceph): objectstore: create Transaction::iterator class
Remove iterator state from Transaction itself.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
04:32 PM CephFS Cleanup #1826 (Resolved): client: kill static perfcounter
Make it a Client member. The CephContext stuff tracks "per-process" state now, so no need to be weird. Also, these ... Sage Weil
04:28 PM Revision 4da96ff3 (ceph): rados load-gen workunits
Sage Weil
04:19 PM Revision 6ff95e9d (ceph): qa: rados load-gen workunits
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
03:10 PM Bug #1825: osd loses object deletes by some creates in the same transaction
see wip-osd-maybe-created Sage Weil
02:11 PM Bug #1825 (Resolved): osd loses object deletes by some creates in the same transaction
We found a missing object in alexandria, caused by the gateway trying to delete an object that seems to not actually ... Greg Farnum
11:07 AM rgw Tasks #1823: radosgw should have internal timeouts
I think I wasn't clear enough. RGW doesn't need to do that in the I/O path. Anyway, we need to think of the functiona... Yehuda Sadeh
10:55 AM rgw Tasks #1823: radosgw should have internal timeouts
RGW ought to be able to grab information about IOs which are taking too long and figure out what OSD that IO resides ... Greg Farnum
10:52 AM rgw Tasks #1823: radosgw should have internal timeouts
We can have timeouts for the init process; for other operations, I'm not sure it'll make sense doing it in the rgw laye... Yehuda Sadeh
10:44 AM rgw Tasks #1823 (Rejected): radosgw should have internal timeouts
Letting Apache time out the rados gateway makes admins sad, since there's no visibility into what is actually timing ... Greg Farnum
10:53 AM rgw Tasks #1824 (Resolved): ceph monitor status should be available and documented
I saw last night that I think we can run "ceph quorum_status" to see which monitors are in the quorum, "ceph mon_stat... Greg Farnum
10:49 AM Bug #1821: librados: rados_create_with_context is unusable
Josh Durgin wrote:
> The C++ variant librados::Rados::init_with_context is used by librbd, radosgw, and some command...
Sage Weil
10:44 AM Bug #1821: librados: rados_create_with_context is unusable
The C++ variant librados::Rados::init_with_context is used by librbd, radosgw, and some command line tools, but this ... Josh Durgin
10:49 AM Bug #1820: deprecate "ceph stop"
It's not being run because getting the parsing and isolating leaks is a pain, but there are teuthology tasks to run v... Greg Farnum
10:28 AM Bug #1820: deprecate "ceph stop"
none of this is tested anywhere.. it's for when you manually want to check for leaks, and need the osd to try to shut... Sage Weil
10:08 AM Bug #1820: deprecate "ceph stop"
I don't see anything in teuthology sending stop commands to the OSDs; I believe the valgrind stuff just uses SIGTERM. Greg Farnum
09:59 AM Bug #1820: deprecate "ceph stop"
exit(0) on SIGTERM is perfectly valid.
If we do need more than SIGUSR1 & SIGUSR2, the communication mechanism shou...
Anonymous
09:38 AM Bug #1820: deprecate "ceph stop"
... Sage Weil
09:31 AM Bug #1820: deprecate "ceph stop"
gcov is already using SIGTERM. Anonymous
10:33 AM Bug #1530: osd crash during build_inc_scrub_map
I'm guessing this is the new incarnation of this issue?
From teuthology:~teuthworker/archive/nightly_coverage_2011-1...
Josh Durgin
10:31 AM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
Happened again in teuthology:teuthworker~/archive/nightly_coverage_2011-12-13-a/4183/remote/ubuntu@sepia74.ceph.dream... Josh Durgin
10:12 AM rgw Bug #1822 (Closed): radosgw can be slow to respond to requests
The DHO admins are having problems where sometimes requests take so long that Apache issues an ISE 500. It's often bu... Greg Farnum
09:48 AM Bug #1789 (Need More Info): mon: failed assert(paxosv == pg_map.version)
have core, but no matching binary. not clear from code inspection what happened.
Sage Weil
09:30 AM Bug #1804: filestore: unexpected EINVAL
as of commit:7133a2faf0ae0710b7cbd9801c64767172d48faf we dump the failed transaction to the log. Sage Weil
08:28 AM Feature #1799 (Resolved): qa: add 'rados --load-gen' test(s)
Sage Weil
12:29 AM Revision c9e4504f (ceph): Ignore lockdep being turned off for now.
Some machines are hitting this udev issue:
http://marc.info/?l=linux-kernel&m=132033587908426&w=2 and lockdep is
turn...
Josh Durgin
12:00 AM Revision 6d5e5bdb (ceph): pybind/rados: add asynchronous write,append,read,write_full operations
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just

12/12/2011

10:31 PM Revision 78b7a255 (ceph): doc: Import the list of ceph subcommands from wiki.
This adds the content of the wiki page at
http://ceph.newdream.net/wiki/Monitor_commands
to doc/control.rst in orde...
Andre Noll
10:31 PM Revision 9aadd41b (ceph): doc: Add documentation of missing osd commands.
The set of OSD commands which was added by the previous commit is
incomplete. This patch adds documentation for the follo...
Andre Noll
10:31 PM Revision 1867a745 (ceph): doc: Document pause and unpause osd commands.
These two commands were undocumented so far. This patch adds a short
description.
Signed-Off-By: Andre Noll <maan@sy...
Andre Noll
10:31 PM Revision 7dce3e6f (ceph): doc: Update the list of fields for the pool set command.
This list was lacking a few fields: crash_replay_interval, pg_num,
pgp_num and crush_ruleset. Include these fields an...
Andre Noll
10:31 PM Revision db30716b (ceph): doc: Add missing documentation for osd pool get.
"osd pool set" was already documented, but the corresponding "get"
command was not. This patch adds the list of valid...
Andre Noll
10:31 PM Revision fb8fd186 (ceph): doc: Clarify documentation of reweight command.
This caused some discussions on the mailing list, so let's try to be clear
about the meaning of an OSD weight.
Signe...
Andre Noll
09:35 PM Bug #1821: librados: rados_create_with_context is unusable
I think radosgw uses it. It creates a CephContext by linking directly against the ceph internals... Sage Weil
05:12 PM Bug #1821 (Resolved): librados: rados_create_with_context is unusable
There's no way to get a CephContext using the C api, so you can't pass one to rados_create_with_context. Maybe a rado... Josh Durgin
09:24 PM Revision 06046470 (ceph): SimpleMessenger: remove void send_keepalive.
Nobody uses this; they all call the version that returns an int.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhos...
Greg Farnum
09:24 PM Revision e6e66232 (ceph): mds: mark_disposable when closing a Client connection.
This is causing issues since the Client's ack of the MClientSession
is somehow not getting back to the MDS. We should...
Greg Farnum
09:24 PM Revision 1dd173a2 (ceph): messenger: fix up fault()'s "onconnect" parameter.
We should be setting this true when calling fault() from connect().
And rename it in the header -- it does produce le...
Greg Farnum
07:25 PM Bug #1820: deprecate "ceph stop"
IIRC the real purpose is to make the daemon shut down cleanly. This is important for gprof, valgrind memcheck, etc. ... Sage Weil
02:38 PM Bug #1820 (Resolved): deprecate "ceph stop"
A good daemon supervision system would try to restart any daemons that just exited. For "ceph stop" to work in the wo... Anonymous
05:29 PM Revision 5e215c7e (ceph): Merge branch 'wip-mon-stats'
Sage Weil
05:27 PM Revision 808a851d (ceph): mdsmap: rename get_num_*_mds() methods
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:27 PM Revision 711447d8 (ceph): mon: add mds, mon info to cluster_logger
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:24 PM Revision ac31d526 (ceph): mon: report basic cluster stats via perfcounters
These are basic point-in-time cluster stats.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:22 PM Revision 1f1b5fdf (ceph): crush: drop unused label
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:20 PM Revision 62b78de7 (ceph): Merge remote branch 'gh/stable'
Sage Weil
05:18 PM Revision 495307a1 (ceph): crush: fix force to behave with non-root TAKE
If the (first) TAKE in the crush rule is not the root, see if they picked
a point somewhere beneath the appropriate p...
Sage Weil
05:17 PM Revision 14f8f00e (ceph): crush: simplify force argument check
force isn't used past this point, only force_pos. Collapse the if
conditions.
Signed-off-by: Sage Weil <sage@newdre...
Sage Weil
04:45 PM Messengers Bug #1803: msgr: behave better when ending TCP connections
And I've flipped back and forth umpteen times today about what's going on. At this point I can conclude that nobody o... Greg Farnum
10:49 AM Messengers Bug #1803 (In Progress): msgr: behave better when ending TCP connections
From the little I'm reading in Unix Network Programming, it looks like we're just doing this wrong — we call shutdown... Greg Farnum
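The entry above is truncated, so the actual messenger change is not shown here. The following is only an illustration of one orderly-close pattern described in UNIX Network Programming: half-close with `shutdown(SHUT_WR)`, keep reading until the peer's end-of-file, and only then close the socket. This Python sketch uses a hypothetical `graceful_close` helper that is not part of Ceph's messenger.

```python
import socket


def graceful_close(sock, timeout=5.0):
    """Half-close our side, drain until the peer's EOF, then close the socket.

    Returns any bytes the peer sent after we stopped writing. If the peer
    never closes, the socket is still closed and the timeout error propagates.
    """
    sock.shutdown(socket.SHUT_WR)  # send FIN: done writing, but still able to read
    sock.settimeout(timeout)
    drained = bytearray()
    try:
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # an empty read means the peer has closed its side
                break
            drained.extend(chunk)
    finally:
        sock.close()
    return bytes(drained)
```

Calling `close()` while unread data is still queued can make the kernel (Linux, for example) send an RST instead of completing a normal FIN exchange. Draining first avoids that.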
11:21 AM Documentation #1819 (Resolved): document librados python api
Josh Durgin
11:21 AM rbd Documentation #1818 (Closed): document librbd C++ api
Josh Durgin
11:20 AM Documentation #1817 (Closed): document librados C++ api
Josh Durgin
11:20 AM rbd Documentation #1816 (Closed): document librbd C api
Use examples similar to those in the python api docs. Josh Durgin
11:19 AM Documentation #1815 (Resolved): document librados C api
Document the librados C api with doxygen. Josh Durgin
10:00 AM Documentation #1814 (Resolved): doc: openstack + ceph install howto
Sage Weil
09:58 AM rgw Documentation #1813 (Resolved): doc: document radosgw api diffs with s3
Move from Google Docs or wherever, clean it up, and maintain it going forward. Sage Weil
09:50 AM Bug #1683 (In Progress): librados: list objects should also return locator key
Sage Weil
09:48 AM Bug #1744: teuthology: race with daemon shutdown?
Is there any additional teuthology logging we can add to sort out what is happening? Sage Weil
09:47 AM RADOS Bug #1794 (Resolved): crush: creating/destroying buckets of zero items
Fixed by commit:ca002a3389877f5e150659649e27e7ae59d7d402 Sage Weil
09:45 AM Feature #1782: mon: dump key cluster stats via perfcounter
Sage Weil
08:53 AM Bug #1758: OSD segfault in SimpleMessenger::send_message
Can we verify that the last failure was running a commit that included the fix? Sage Weil
08:38 AM Linux kernel client Bug #1812 (Resolved): iput scheduling while atomic
iput can sleep, but is called with spinlocks held in some cases.... Sage Weil
08:34 AM Bug #1750 (In Progress): xattr errors silently ignored, cause trouble later
Sage Weil
08:31 AM Bug #1750: xattr errors silently ignored, cause trouble later
Shouldn't the FileStore have asserted on the -28 (ENOSPC)? Sage Weil
03:19 AM Linux kernel client Bug #1795: break d_lock > s_cap_lock ordering
Seems fixed here now with git branch wip-d-lock. Amon Ott
03:18 AM Linux kernel client Bug #1762: i_lock vs s_cap_lock vs inodes_with_caps_lock lock ordering
Seems to be fixed here now with git commits be655596b3de5873f994ddbe205751a5ffb4de39 (for-linus) and 1a2fe05d296a35da... Amon Ott

12/10/2011

12:31 AM Revision cf279a8b (ceph): workunits: print tests pjd runs
This will tell us which ones actually failed within a test suite.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost....
Josh Durgin