Activity

From 11/16/2011 to 12/15/2011

12/15/2011

10:03 PM Revision 739fd9fe (ceph): man: clarify mount.ceph auth options
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
09:49 PM Revision e5a5ae12 (ceph): man: update rule definition for ceph-rbdnamer
This is the rule we install since 891025e539a92b5d75011e2e75c475fc0c272042.
Signed-off-by: Josh Durgin <josh.durgin@...
Josh Durgin
09:43 PM Revision 4eb83654 (ceph): authx -> cephx everywhere it's used
The term authx was in the mount.ceph man page, and got accidentally
copied into rbd help.
Signed-off-by: Josh Durgin...
Josh Durgin
09:24 PM Revision 7eec3094 (ceph): rountrip: add task
Yehuda Sadeh
09:15 PM Revision 41f64be0 (ceph): ReplicatedPG: calc_clone_subsets fix other clone_overlap case
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
09:15 PM Revision b5c32590 (ceph): ReplicatedPG: fix backfill mismatch error output
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
09:15 PM Revision 5b41c470 (ceph): OSD: use disk_tp.pause() without osd_lock
Previously, we called disk_tp.pause_new(). This can cause a race
where snap_trimmer queues more transactions after w...
Samuel Just
08:39 PM Revision 97cc6c29 (ceph): readwrite: fix task with default conf
Yehuda Sadeh
04:51 PM Revision ec776f4b (ceph): ceph.spec: Clean up and fix spec file and build for a couple of distrib...
Clean up and fix the spec file. This includes cleaning up of build and
installed system dependencies, LSB compliance ...
Holger Macht
04:49 PM Revision 0e0583f8 (ceph): init-ceph/init-radosgw: Don't use unspecified runlevel 4
Don't use runlevel 4 in init scripts. AFAIK, no distribution is using it
and at least the Open Build Service complain...
Holger Macht
02:32 PM Bug #1833 (Resolved): mon: failed decode in LogMonitor::update_from_paxos
Saw this on benjamin today. It was during catchup; mon.beta had been out for a day or more and was catching up. Perha... Greg Farnum
03:08 AM Revision 0c547046 (ceph): osd: preserve write order when waiting on src_oids
We need to preserve the order of write operations on each object. If we
have a write on X that needs to read from Y,...
Sage Weil
03:08 AM Revision ca2e8e5a (ceph): osd: EINVAL on mismatched locator without waiting for degraded
No reason to recover before returning an error.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
03:08 AM Revision 7a7aab25 (ceph): osd: wait for src_oid if it on other side of last_backfill from oid
If the target object is before last_backfill, then the backfill_target
will be asked to apply the operation. If one ...
Sage Weil
01:43 AM Revision da286059 (ceph): client: fix logger deregistration
Only unregister logger if it is non-NULL (and thus registered) to avoid
running afoul of the cct assertions.
Signed-...
Sage Weil
01:14 AM Revision 659e66aa (ceph): readwrite: fix conf, task runs
Yehuda Sadeh
12:12 AM Revision 7d085ad9 (ceph): readwrite: add readwrite task
still not really running, but at least getting configured
Yehuda Sadeh

12/14/2011

11:51 PM Revision 62c830f0 (ceph): ReplicatedPG: add_object_context_to_pg_stat, obc->ssc may be null
obc->ssc is not necessarily filled in by get_object_context.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just
11:37 PM Revision 5a400935 (ceph): obsync: add vvprint back in
Commit ebe5fc60d20f92a0037c53c1e7bd7ae512be3da4 removed the definition of
vvprint without removing all the places tha...
Kyle Marsh
11:19 PM Revision cda5f0d3 (ceph): PG: clear waiting_on_backfill during clear_recovery_state
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
11:17 PM Revision d32fd8c5 (ceph): ReplicatedPG: list snapid 0 on collection_list_partial for backfill
0 will list all objects, CEPH_NO_SNAP will list only head objects.
Signed-off-by: Samuel Just <samuel.just@dreamhost...
Samuel Just
10:10 PM Bug #1831: mon: should not accept (and should disconnect) session when not in quorum
There's two things here, the second being the monitor changes you're focusing on. I need to investigate further why t... Greg Farnum
07:03 PM Bug #1831: mon: should not accept (and should disconnect) session when not in quorum
I think there are two parts here:
- the mon shouldn't let sessions start if it is not in the quorum. that may ac...
Sage Weil
03:39 PM Bug #1831 (Resolved): mon: should not accept (and should disconnect) session when not in quorum
This happened on Benjamin. The OSDs ought to be failing the connection and going to a new monitor, but they failed to... Greg Farnum
07:40 PM Revision d9d05117 (ceph): Merge remote branch 'upstream/master' into wip_backfill_merged
Samuel Just
07:39 PM Revision 07b3ba81 (ceph): ReplicatedPG: collection_list_partial also takes a snapid
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
07:38 PM Revision 1430c8ab (ceph): doc: Make overview.rst valid reStructuredText, so I can stop seeing war...
It's still wrong, but now it won't clutter the output.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Tommi Virtanen
07:33 PM Revision 53f7323c (ceph): doc: reStructuredText syntax fix.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:33 PM Revision c1190740 (ceph): pybind: Add a description to docstring.
This avoids a Sphinx warning like this:
.../src/pybind/rbd.py:docstring of rbd.RBD.version:2: WARNING: Field list en...
Tommi Virtanen
07:32 PM Revision 9d633a4f (ceph): PG: A backfill osd can have last_complete < log_tail
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
07:32 PM Revision 51deeef6 (ceph): ReplicatedPG: calc_*_subsets must consider last_backfill
Objects yet to be backfilled do not show up in the missing set. Thus,
we cannot use an object past last_backfill to ...
Samuel Just
07:32 PM Revision 7832e17e (ceph): PG: activate, backfill replica can have last_complete < log_tail
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
07:32 PM Revision b9eea709 (ceph): osd: object_stat_sum_t::clear()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:32 PM Revision 940a55e0 (ceph): osd: track backfill target pg stats
Maintain backfill target pg stats to be the summation over objects to
the left of last_backfill. Reflect this in the...
Sage Weil
07:32 PM Revision 7213c457 (ceph): PG: Ask for digest at most once at a time
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
07:32 PM Revision 9bb77b49 (ceph): osd: observe last_backfill in merge_log() and helpers
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:32 PM Revision e1006d76 (ceph): osd: more backfill changes
Always ship log for updates to backfill targets to preserve the repgather
ordering.
Fix up recover_backfill() bounds...
Sage Weil
07:32 PM Revision af7536d0 (ceph): hobject_t: fix hobject(sobject_t) constructor
Initialize max
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:32 PM Revision cd0c8fb3 (ceph): osd: add incomplete, backfill states; simplify calculation
Set/clear states in peering state machine state ctor/dtors where possible.
Set degraded if the number of non-backfil...
Sage Weil
07:32 PM Revision f83a787e (ceph): osd: some recover_backfill() comments
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:32 PM Revision f1caaa37 (ceph): osd: fix calc_acting()
Look at usable, not want.size(), so we don't count backfill targets.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:32 PM Revision 57baf9ef (ceph): osd: fix signed/unsigned comp
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:32 PM Revision 71893b0e (ceph): osd: remove bad !is_incomplete() assert
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:32 PM Revision 999846f7 (ceph): PG: fix phantom entry in peer_info
In GetLog, do not call pg->peer_info[newest_update_osd] if
newest_update_osd is osd->whoami.
Signed-off-by: Samuel J...
Samuel Just
07:32 PM Revision f483df15 (ceph): PG: there may now be backfill entries in the acting set
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
07:32 PM Revision f1ae9ed5 (ceph): objectstore: make list by hash *next > instead of >=
This means we should set it to a hash boundary or the last item of our
result set (not the next item we didn't includ...
Sage Weil
07:31 PM Revision f7a0b9c5 (ceph): hobject_t: fix sorting by hash key
Use get_effective_key() to return key (if explicit) or object name. Sort
by that within each hash value.
Clean up o...
Sage Weil
07:31 PM Revision 9288f0e0 (ceph): osd: advance last_backfill by keys only
This ensures that transactions are never split by last_backfill.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:31 PM Revision 88ee86d0 (ceph): osd: keep backfill targets in acting set
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision b99e1358 (ceph): osd: make backfill (basically) work again
Still need to handle concurrent updates, log recovery vs backfill, etc.
Signed-off-by: Sage Weil <sage.weil@dreamhos...
Sage Weil
07:31 PM Revision de19a6bb (ceph): Revert "osd: don't keep push state on replicas"
This reverts commit 69c77e33f8530993dbc280525bd21218ea6f9ddb.
sub_op_pull() calls send_push_op directly, does not pa...
Sage Weil
07:31 PM Revision baa21c9b (ceph): osd: implement PG::copy_range()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision c03c49ca (ceph): osd: initialize repop gather set in issue_repop instead of new_repop
Simpler. It will also make the last_backfill correction live in one
place.
Signed-off-by: Sage Weil <sage.weil@drea...
Sage Weil
07:31 PM Revision 5b558dc4 (ceph): osd: strip out some backlog logic
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 82a23dbe (ceph): osd: strip backlog case out of merge_log
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 3f5ced69 (ceph): osd: kill backlog_requested
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 6d299552 (ceph): osd: strip backlog logic out of PG::activate()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision e7514f75 (ceph): osd: state machine whitespace
I feel better now
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:31 PM Revision 257b85d8 (ceph): osd: remove log_backlog from PG::Info
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 7521c51a (ceph): osd: remove backlog case from clean_up_local
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 9ceecc89 (ceph): osd: kill PG::Info::backlog
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision d7f7bbdc (ceph): osd: remove recovery-from-backlog kludge last_update
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 722ec7e5 (ceph): osd: kill unused PG_STATE_SCANNING
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision d84a9f6f (ceph): osd: cleanup
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 693950bf (ceph): osd: cleanup lingering backlog refs
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision e63c595a (ceph): osd: kill unused PG::Log::copy_after_unless_divergent
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision b5de19b5 (ceph): osd: kill unused PG::trim_write_ahead
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 0e7f4aff (ceph): osd: pg whitespace
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 400c27da (ceph): osd: track backfill with last_backfill, not interval_set<>
We always fill from the bottom up anyway. Using an hobject_t also gives us
a precise bound. It also makes things co...
Sage Weil
07:31 PM Revision 91ee3375 (ceph): osd: osd_kill_backfill_at
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 99c614fa (ceph): osd: don't keep push state on replicas
Primaries need this, but replicas don't: the primary will explicitly pull
the pieces of the object that it wants.
Si...
Sage Weil
07:31 PM Revision 2cdc6b4e (ceph): osd: rewrite choose_acting process
Consolidate callers, eliminate obsolete backlog ones.
New process:
- pick best log, with preferences for those that...
Sage Weil
07:31 PM Revision 9e51c639 (ceph): osd: MOSDPGScan
Message to query hash ranges of a PG.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
07:31 PM Revision 8f14a358 (ceph): osd: add PG::BackfillInterval type
Describe a range of objects for the purposes of backfilling a PG.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:31 PM Revision 55c24813 (ceph): osd: implement ReplicatedPG::_lookup_object_context
Look up an existing ObjectContext without taking a reference.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:31 PM Revision 92d290d6 (ceph): osd: implement ReplicatedPG::scan_range
Scan a range of the local collection.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:31 PM Revision 17b5d5c3 (ceph): osd: implement do_scan
Handle MOSDPGScan messages to request or send a digest of a range of
objects in a collection, sorted in hobject_t (ha...
Sage Weil
07:31 PM Revision 353195d6 (ceph): types: operator<< for multimaps
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision e4ab0e3b (ceph): osd: add MOSDPGBackfill message
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 910398fe (ceph): osd: recover discontiguous peers using backfill instead of backlog
Instead of generating a huge list of objects to recover, and then pushing
them, iterate over the collection and copy ...
Sage Weil
07:31 PM Revision 4509e619 (ceph): test_backfill.sh
Sage Weil
07:31 PM Revision 004e7c92 (ceph): osd: add Incomplete peering state
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 73d15e01 (ceph): osd: do not read backlog off disk
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision b0664856 (ceph): osd: remove backlog generation code
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 6e9d135a (ceph): osd: simplify replica queries for finding divergent objects
No need to request backlog here, clearly, since those don't exist anymore.
Signed-off-by: Sage Weil <sage.weil@dream...
Sage Weil
07:31 PM Revision b8ee27a3 (ceph): osd: remove Query::BACKLOG processing
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 78b64473 (ceph): osd: kill PG::Log::copy_non_backlog
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:31 PM Revision 10e481d1 (ceph): osd: fix push_to_replica typo
We are always pushing soid. If we are missing snapdir locally, that means
we can't do an informed efficient clone, a...
Sage Weil
07:19 PM Revision b7a5a6a6 (ceph): doc: More consistency on formatting placeholder names.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:19 PM Revision 196d4273 (ceph): doc: Link to manpage when command is mentioned.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:19 PM Revision 75fd16a5 (ceph): doc: Use todo directive, rescue list of missing commands from wiki.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:19 PM Revision 81feae12 (ceph): doc: Add misc explanations of Ceph internals from email.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:19 PM Revision 034dd58f (ceph): doc: Add more missing commands to control.
This is too unstructured, that will have to be fixed later.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost....
Tommi Virtanen
07:19 PM Revision f5cfdbb7 (ceph): doc: Split intro to talk about the DFS separately. Mention petabytes.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:19 PM Revision bc16ac3b (ceph): doc: Fix sentence that ended too abruptly.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
07:19 PM Revision d745ff8d (ceph): doc: "ceph -w" clarification.
Stop saying "watch cluster state" so many times.
Don't say stdout, that's the assumption.
Don't call showing things...
Tommi Virtanen
07:14 PM Revision 18d99637 (ceph): Merge branch 'wip-messenger'
Greg Farnum
07:11 PM Revision 55639dcd (ceph): msgr: unset did_bind in stop().
We use did_bind as a flag on whether or not to stop the Accepter thread
and we should clear it when we do the stoppin...
Greg Farnum
06:59 PM Revision 41049f30 (ceph): objecter: fix use-after-free
messenger consumes the m reference. Yay valgrind.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:51 PM Revision 041d0456 (ceph): client: move PerfCounter into Client
globals are evil.
Fixes: #1826
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
06:50 PM Revision e8e1e5df (ceph): swift: auth response returns X-Auth-Token instead of X-Storage-Token
Yehuda Sadeh
05:31 PM Revision c9d0e556 (ceph): osd: fix build_incremental_map_msg
We keep both the inc and the full for our oldest osdmap.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
05:27 PM Revision 1a473b7a (ceph): osd: clean up _delete_head
Might be fixing a subtle logic bug, but old flow was confusing, so not
sure. :)
Signed-off-by: Sage Weil <sage@newd...
Sage Weil
05:26 PM Revision 6c8f60f6 (ceph): osd: simplify creation logic in do_osd_ops
Drop the maybe_created variable, and track exists over the course of the
transaction.
Fixes: #1825
Signed-off-by: Sa...
Sage Weil
05:16 PM Bug #1832 (Closed): osd: size tracking discrepancy (scrub stat mismatch)
During fsstress on the kernel client, this occurred:... Josh Durgin
01:53 PM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
Hi,
I've run into this precise problem on a small testing cluster that I'm running -- down to the large 64-bit tru...
David McBride
11:55 AM rgw Bug #1830 (Resolved): RGW Swift Metadata Bug
I believe the rados gateway has a bug in the way it's talking swift. When I ask it to list the objects in a container... Kyle Marsh
11:44 AM Feature #1782 (Resolved): mon: dump key cluster stats via perfcounter
Sage Weil
11:32 AM CephFS Bug #1788: msgr file descriptor leak
Forgot to update this. Haven't run into it yet and wip-messenger seemed to have fixed things. Thanks Greg!
Noah Watkins
11:27 AM CephFS Bug #1788 (Resolved): msgr file descriptor leak
Haven't heard any new issues from Noah; merged to master in commit:18d996370efc2fc32d4973e9e6934901558bcbaf. Greg Farnum
11:26 AM Messengers Bug #1829 (Resolved): SimpleMessenger tries to shut down threads that aren't running
Oh, even simpler than I expected. Fixed in commit:55639dcd87fe985059355afe5fab787e4d139b11 (compile tested). Greg Farnum
11:12 AM Messengers Bug #1829 (Resolved): SimpleMessenger tries to shut down threads that aren't running
Saw this on benjamin yesterday. Looks like the OSD repeatedly restarted its messengers and was eventually unable to r... Greg Farnum
11:01 AM CephFS Cleanup #1826 (Resolved): client: kill static perfcounter
commit:041d04563e7cfdb837a345787a1569b07a064307 Sage Weil
10:54 AM rgw Bug #1780 (Resolved): swift: auth response should return X-Auth-Token instead of X-Storage-Token
Fixed, commit:e8e1e5dffbd25e2124331e607264e1bc4120676c. Yehuda Sadeh
10:12 AM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
This happened again on sepia70 during the kernel untar build workunit on rbd. Josh Durgin
09:40 AM Bug #1804 (Need More Info): filestore: unexpected EINVAL
Sage Weil
09:39 AM Bug #1828 (Resolved): osd: preserve write order when ops wait for recovery of src_oids
This affects current code.
It will need a minor adjustment so that "recovery" includes both is_missing() and osd >...
Sage Weil
09:33 AM CephFS Bug #1549 (Need More Info): mds: zeroed root CDir* vtable in scatter_writebehind_finish
Sage Weil
09:32 AM Bug #1530: osd crash during build_inc_scrub_map
fixed that last thing with commit:c9d0e556c7ad294819c60ca4e3cd4d0191811f18, but i think it's unrelated to the rest of... Sage Weil
09:22 AM Bug #1825: osd loses object deletes by some creates in the same transaction
Fix looks good; I'm working on tests to verify and check regressions. Greg Farnum
02:08 AM Revision abecbc59 (ceph): OSDMonitor: remove useless check
Session was already verified to exist before this.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
12:31 AM Revision 5804477b (ceph): qa: trivial_libceph test
This currently fails... see #1827
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:29 AM Revision c87f31e0 (ceph): client: return errors from init
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
12:29 AM Revision 2f281d1f (ceph): libceph: catch errors from Client::init()
And clean up error paths.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:29 AM Revision 207c40b0 (ceph): libceph: add missing #includes
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
12:16 AM Revision 31b5ccbf (ceph): coverage: use locally stored build instead of downloading from a gitbui...
Josh Durgin

12/13/2011

05:31 PM CephFS Bug #1827: libceph: hang on creating a file
see commit:5804477b20f89a2b02218b518a44e73073b393c9 for reproducer.
fwiw i ran with vstart and 'LD_PRELOAD=../../s...
Sage Weil
04:36 PM CephFS Bug #1827 (Resolved): libceph: hang on creating a file
Using trivial thinger from Noah.
Sage Weil
05:15 PM Revision 6b425676 (ceph): objectstore: implement Transaction::dump()
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:15 PM Revision 7133a2fa (ceph): filestore: dump transaction to log if we hit an error
This will let us see which operation in the transaction failed.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:05 PM Revision 3d13f003 (ceph): objectstore: create Transaction::iterator class
Remove iterator state from Transaction itself.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
04:32 PM CephFS Cleanup #1826 (Resolved): client: kill static perfcounter
Make it a Client member. The CephContext stuff tracks "per-process" state now, so no need to be weird. Also, these ... Sage Weil
04:28 PM Revision 4da96ff3 (ceph): rados load-gen workunits
Sage Weil
04:19 PM Revision 6ff95e9d (ceph): qa: rados load-gen workunits
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
03:10 PM Bug #1825: osd loses object deletes by some creates in the same transaction
see wip-osd-maybe-created
Sage Weil
02:11 PM Bug #1825 (Resolved): osd loses object deletes by some creates in the same transaction
We found a missing object in alexandria, caused by the gateway trying to delete an object that seems to not actually ... Greg Farnum
11:07 AM rgw Tasks #1823: radosgw should have internal timeouts
I think I wasn't clear enough. RGW doesn't need to do that in the I/O path. Anyway, we need to think of the functiona... Yehuda Sadeh
10:55 AM rgw Tasks #1823: radosgw should have internal timeouts
RGW ought to be able to grab information about IOs which are taking too long and figure out what OSD that IO resides ... Greg Farnum
10:52 AM rgw Tasks #1823: radosgw should have internal timeouts
We can have timeouts for the init process; for other operations I'm not sure it'll make sense doing it in the rgw laye... Yehuda Sadeh
10:44 AM rgw Tasks #1823 (Rejected): radosgw should have internal timeouts
Letting Apache time out the rados gateway makes admins sad, since there's no visibility into what is actually timing ... Greg Farnum
10:53 AM rgw Tasks #1824 (Resolved): ceph monitor status should be available and documented
I saw last night that I think we can run "ceph quorum_status" to see which monitors are in the quorum, "ceph mon_stat... Greg Farnum
10:49 AM Bug #1821: librados: rados_create_with_context is unusable
Josh Durgin wrote:
> The C++ variant librados::Rados::init_with_context is used by librbd, radosgw, and some command...
Sage Weil
10:44 AM Bug #1821: librados: rados_create_with_context is unusable
The C++ variant librados::Rados::init_with_context is used by librbd, radosgw, and some command line tools, but this ... Josh Durgin
10:49 AM Bug #1820: deprecate "ceph stop"
It's not being run because getting the parsing and isolating leaks is a pain, but there are teuthology tasks to run v... Greg Farnum
10:28 AM Bug #1820: deprecate "ceph stop"
none of this is tested anywhere.. it's for when you manually want to check for leaks, and need the osd to try to shut... Sage Weil
10:08 AM Bug #1820: deprecate "ceph stop"
I don't see anything in teuthology sending stop commands to the OSDs; I believe the valgrind stuff just uses SIGTERM.
Greg Farnum
09:59 AM Bug #1820: deprecate "ceph stop"
exit(0) on SIGTERM is perfectly valid.
If we do need more than SIGUSR1 & SIGUSR2, the communication mechanism shou...
Anonymous
09:38 AM Bug #1820: deprecate "ceph stop"
... Sage Weil
09:31 AM Bug #1820: deprecate "ceph stop"
gcov is already using SIGTERM.
Anonymous
10:33 AM Bug #1530: osd crash during build_inc_scrub_map
I'm guessing this is the new incarnation of this issue?
From teuthology:~teuthworker/archive/nightly_coverage_2011-1...
Josh Durgin
10:31 AM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
Happened again in teuthology:teuthworker~/archive/nightly_coverage_2011-12-13-a/4183/remote/ubuntu@sepia74.ceph.dream... Josh Durgin
10:12 AM rgw Bug #1822 (Closed): radosgw can be slow to respond to requests
The DHO admins are having problems where sometimes requests take so long that Apache issues an ISE 500. It's often bu... Greg Farnum
09:48 AM Bug #1789 (Need More Info): mon: failed assert(paxosv == pg_map.version)
have core, but no matching binary. not clear from code inspection what happened.
Sage Weil
09:30 AM Bug #1804: filestore: unexpected EINVAL
as of commit:7133a2faf0ae0710b7cbd9801c64767172d48faf we dump the failed transaction to the log.
Sage Weil
08:28 AM Feature #1799 (Resolved): qa: add 'rados --load-gen' test(s)
Sage Weil
12:29 AM Revision c9e4504f (ceph): Ignore lockdep being turned off for now.
Some machines are hitting this udev issue:
http://marc.info/?l=linux-kernel&m=132033587908426&w=2 and lockdep is
turn...
Josh Durgin
12:00 AM Revision 6d5e5bdb (ceph): pybind/rados: add asynchronous write,append,read,write_full operations
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just

12/12/2011

10:31 PM Revision 78b7a255 (ceph): doc: Import the list of ceph subcommands from wiki.
This adds the content of the wiki page at
http://ceph.newdream.net/wiki/Monitor_commands
to doc/control.rst in orde...
Andre Noll
10:31 PM Revision 9aadd41b (ceph): doc: Add documentation of missing osd commands.
The set of OSD commands which was added by the previous commit is
incomplete. This patch adds documentation for the follo...
Andre Noll
10:31 PM Revision 1867a745 (ceph): doc: Document pause and unpause osd commands.
These two commands were undocumented so far. This patch adds a short
description.
Signed-Off-By: Andre Noll <maan@sy...
Andre Noll
10:31 PM Revision 7dce3e6f (ceph): doc: Update the list of fields for the pool set command.
This list was lacking a few fields: crash_replay_interval, pg_num,
pgp_num and crush_ruleset. Include these fields an...
Andre Noll
10:31 PM Revision db30716b (ceph): doc: Add missing documentation for osd pool get.
"osd pool set" was already documented, but the corresponding "get"
command was not. This patch adds the list of valid...
Andre Noll
10:31 PM Revision fb8fd186 (ceph): doc: Clarify documentation of reweight command.
This caused some discussions on the mailing list, so let's try to be clear
about the meaning of an OSD weight.
Signe...
Andre Noll
09:35 PM Bug #1821: librados: rados_create_with_context is unusable
i think radosgw uses it. it creates a CephContext by linking directly the ceph internals... Sage Weil
05:12 PM Bug #1821 (Resolved): librados: rados_create_with_context is unusable
There's no way to get a CephContext using the C api, so you can't pass one to rados_create_with_context. Maybe a rado... Josh Durgin
09:24 PM Revision 06046470 (ceph): SimpleMessenger: remove void send_keepalive.
Nobody uses this; they all call the version that returns an int.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhos...
Greg Farnum
09:24 PM Revision e6e66232 (ceph): mds: mark_disposable when closing a Client connection.
This is causing issues since the Client's ack of the MClientSession
is somehow not getting back to the MDS. We should...
Greg Farnum
09:24 PM Revision 1dd173a2 (ceph): messenger: fix up fault()'s "onconnect" parameter.
We should be setting this true when calling fault() from connect().
And rename it in the header -- it does produce le...
Greg Farnum
07:25 PM Bug #1820: deprecate "ceph stop"
Iirc the real purpose is to make the daemon shut down cleanly. This is important for gprof, valgrind memcheck, etc. ... Sage Weil
02:38 PM Bug #1820 (Resolved): deprecate "ceph stop"
A good daemon supervision system would try to restart any daemons that just exited. For "ceph stop" to work in the wo... Anonymous
05:29 PM Revision 5e215c7e (ceph): Merge branch 'wip-mon-stats'
Sage Weil
05:27 PM Revision 808a851d (ceph): mdsmap: rename get_num_*_mds() methods
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:27 PM Revision 711447d8 (ceph): mon: add mds, mon info to cluster_logger
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:24 PM Revision ac31d526 (ceph): mon: report basic cluster stats via perfcounters
These are basic point-in-time cluster stats.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
05:22 PM Revision 1f1b5fdf (ceph): crush: drop unused label
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:20 PM Revision 62b78de7 (ceph): Merge remote branch 'gh/stable'
Sage Weil
05:18 PM Revision 495307a1 (ceph): crush: fix force to behave with non-root TAKE
If the (first) TAKE in the crush rule is not the root, see if they picked
a point somewhere beneath the appropriate p...
Sage Weil
05:17 PM Revision 14f8f00e (ceph): crush: simplify force argument check
force isn't used past this point, only force_pos. Collapse the if
conditions.
Signed-off-by: Sage Weil <sage@newdre...
Sage Weil
04:45 PM Messengers Bug #1803: msgr: behave better when ending TCP connections
And I've flipped back and forth umpteen times today about what's going on. At this point I can conclude that nobody o... Greg Farnum
10:49 AM Messengers Bug #1803 (In Progress): msgr: behave better when ending TCP connections
From the little I'm reading in Unix Network Programming, it looks like we're just doing this wrong — we call shutdown... Greg Farnum
11:21 AM Documentation #1819 (Resolved): document librados python api
Josh Durgin
11:21 AM rbd Documentation #1818 (Closed): document librbd C++ api
Josh Durgin
11:20 AM Documentation #1817 (Closed): document librados C++ api
Josh Durgin
11:20 AM rbd Documentation #1816 (Closed): document librbd C api
Use similar examples to the python api docs. Josh Durgin
11:19 AM Documentation #1815 (Resolved): document librados C api
Document the librados C api with doxygen. Josh Durgin
10:00 AM Documentation #1814 (Resolved): doc: openstack + ceph install howto
Sage Weil
09:58 AM rgw Documentation #1813 (Resolved): doc: document radosgw api diffs with s3
move from google docs or wherever. clean up. maintain going forward. Sage Weil
09:50 AM Bug #1683 (In Progress): librados: list objects should also return locator key
Sage Weil
09:48 AM Bug #1744: teuthology: race with daemon shutdown?
any additional teuthology logging we can add to sort out what is happening?
Sage Weil
09:47 AM RADOS Bug #1794 (Resolved): crush: creating/destroying buckets of zero items
fixed by commit:ca002a3389877f5e150659649e27e7ae59d7d402 Sage Weil
09:45 AM Feature #1782: mon: dump key cluster stats via perfcounter
Sage Weil
08:53 AM Bug #1758: OSD segfault in SimpleMessenger::send_message
Verify that last failure was running a commit that included the fix? Sage Weil
08:38 AM Linux kernel client Bug #1812 (Resolved): iput scheduling while atomic
iput can sleep, but is called with spinlocks held in some cases.... Sage Weil
08:34 AM Bug #1750 (In Progress): xattr errors silently ignored, cause trouble later
Sage Weil
08:31 AM Bug #1750: xattr errors silently ignored, cause trouble later
Shouldn't the FileStore have asserted on the -28? Sage Weil
03:19 AM Linux kernel client Bug #1795: break d_lock > s_cap_lock ordering
Seems fixed here now with git branch wip-d-lock. Amon Ott
03:18 AM Linux kernel client Bug #1762: i_lock vs s_cap_lock vs inodes_with_caps_lock lock ordering
Seems to be fixed here now with git commits be655596b3de5873f994ddbe205751a5ffb4de39 (for-linus) and 1a2fe05d296a35da... Amon Ott

12/10/2011

12:31 AM Revision cf279a8b (ceph): workunits: print tests pjd runs
This will tell us which ones actually failed within a test suite.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost....
Josh Durgin

12/09/2011

11:23 PM Revision 8064440d (ceph): Merge branch 'wip_pgls'
Samuel Just
11:22 PM Revision 864847b2 (ceph): pybind: add object locator support to pybind pool listing
list_objects returns Object(). Object therefore now has an optional
locator_key parameter which will set up the obje...
Samuel Just
09:44 PM Revision 111c12ce (ceph): ReplicatedPG: collection_list_handle_t is now an hobject_t
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
09:44 PM Revision 4ce7dd48 (ceph): rados.cc: add --object-locator and object locator output to ls
--object-locator locator causes io to use the specified locator. For
objects with non-empty locators, rados pool ls ...
Samuel Just
09:44 PM Revision 798ef38b (ceph): osd: delay pg list on a snapid until missing is empty
We cannot determine from the missing set whether an object existed
at a given snap.
Signed-off-by: Samuel Just <samu...
Samuel Just
04:53 PM CephFS Bug #1811 (Duplicate): 2 pjd chown tests failed on cfuse
From teuthology:~teuthworker/archive/nightly_coverage_2011-12-09-a/4061/teuthology.log:... Josh Durgin
04:32 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
A disk error prevented me from getting logs before:... Josh Durgin
03:42 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
Got the same trace on sepia18 while running mkfs.ext3 on an rbd image. Josh Durgin
03:18 PM Bug #1758 (New): OSD segfault in SimpleMessenger::send_message
This happened again yesterday. Core is in teuthology:~teuthworker/archive/nightly_coverage_2011-12-08-a/3954/remote/u... Josh Durgin
11:18 AM Messengers Bug #1803: msgr: behave better when ending TCP connections
I'm going to see if I can handle this in userspace today — fixing it in the kernel client will be another ticket. Greg Farnum
11:14 AM Feature #1810 (Resolved): monclient: timeouts?
It's been suggested that maybe certain categories of clients which are used for gathering statistics rather than comm... Greg Farnum
11:13 AM Messengers Feature #1809 (New): msgr: limit simultaneous connections
Right now SimpleMessenger has no mechanism for limiting the number of simultaneous connections it holds open. This is... Greg Farnum
11:10 AM Feature #1808 (Rejected): filestore: gracefully handle EMFILE
If the FileStore gets an EMFILE error it asserts out without attempting to handle the problem. I don't know whether t... Greg Farnum
09:34 AM Revision e2a94505 (ceph): obsync: add swift support to obsync
A single "url" doesn't make sense for a swift object store the way it does
for an S3 store or local file, so this com...
Kyle Marsh
07:15 AM Bug #1797: configure doesn't link to pthread on Fedora 14 on linking librados-config
I just found out that it works when you call configure with
LIBS="-lpthread" ./configure
Still a bug, though, the c...
Guido Winkelmann
02:01 AM Revision d21f4abc (ceph): msgr: turn up socket debug printouts
These shouldn't be too common and will help in debugging
socket leaks.
Signed-off-by: Greg Farnum <gregory.farnum@dr...
Greg Farnum
01:47 AM Revision a768ad73 (ceph): coverage: don't generate html reports for each test
These can always be generated from the lcov files later, right now they just waste space. Josh Durgin
01:17 AM Revision 7b52dd14 (ceph): syslog: ignore 'task blocked' warnings
These will happen under heavy load (usually on the osd). Josh Durgin
12:36 AM Revision 891025e5 (ceph): udev: drop device number from name
The device number depends on how many rbd images have been
mapped. Removing it makes the name determined solely by th...
Josh Durgin

12/08/2011

11:35 PM Revision 6b8588b7 (ceph): Use btrfs for regression tests
Some of the tests (particularly the s3 tests) use very long filenames
which trigger bugs related to ext4 xattr handli...
Samuel Just
09:10 PM Revision a5606ca4 (ceph): pybind: trivial fix of missing argument
Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com> Henry Chang
06:40 PM Bug #1805 (Rejected): OSD: fd leak
I was trying to figure out why the OSD was generating ~600 new sessions in the 4.5 seconds after starting up, when I ... Greg Farnum
06:20 PM Bug #1805 (Need More Info): OSD: fd leak
*sigh* It appears that I didn't manage to gather the correlated data that I thought I did. After an audit of who uses... Greg Farnum
02:10 PM Bug #1805 (Rejected): OSD: fd leak
There's an fd leak in the OSD. It looks like it's probably related to doing lots of OSDMap advancements at once, base... Greg Farnum
06:35 PM Bug #1807 (Can't reproduce): CentOS compile error in perfglue/heap_profiler.cc
on a CentOS system, I did a git fetch/merge followed by a make clean,
and got a compilation error in perf
CXX ...
Anonymous
05:59 PM Bug #1741: teuthology: failed to untar
Doesn't look like any other tests that day had the same machines locked while this was run. I think this might just b... Josh Durgin
05:40 PM Bug #1741: teuthology: failed to untar
It was 2662 that had this error. Josh Durgin
05:21 PM Feature #1800 (Resolved): qa: run osd tests on btrfs
Josh Durgin
04:42 PM Revision e4db1297 (ceph): crush: whitespace
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:41 PM Revision 808763ea (ceph): osdmap: initialize cluster_snapshot_epoch
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:41 PM Revision c94590ab (ceph): crush: set max_devices=0 for map with empty buckets
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
04:06 PM Revision ca002a33 (ceph): crush: fix stepping on unallocated memory
If size is 0 we can't write here.
Reported-by: pankaj singh <psingh.ait@gmail.com>
Signed-off-by: Sage Weil <sage.we...
Sage Weil
03:56 PM CephFS Bug #1806 (Can't reproduce): MDS won't start
ceph-mds fails to enter replay on start even though mon appears to instruct it to do so, all 3 mds processes remain i... Adam Jacob Muller
03:34 PM Bug #1750 (Rejected): xattr errors silently ignored, cause trouble later
I've updated the regression suite to use btrfs. Samuel Just
02:16 PM Bug #1750: xattr errors silently ignored, cause trouble later
I was able to reproduce this once with logging. It appears to be the ext4 xattr limitation.
2011-12-08 12:45:41.2...
Samuel Just
11:31 AM CephFS Bug #1788: msgr file descriptor leak
I guess this bug should be considered fixed by commit:8c4f4748e8b683f5b4ea939295793421c0ab7b61 in the wip-messenger b... Greg Farnum
05:19 AM Revision d940d68d (ceph): client: trim lru after flushing dirty data
Shouldn't matter, but it would be interesting to see if this affects
#1737.
Signed-off-by: Sage Weil <sage.weil@drea...
Sage Weil
05:19 AM Revision 1545d03c (ceph): client: unmount cleanup
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
05:19 AM Revision f3c90f8d (ceph): client: wait for sync writes even with cache enabled
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
05:19 AM Revision adbe3639 (ceph): client: send umount warnings to log, not stderr
stderr isn't usually open anyway.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil

12/07/2011

11:20 PM Revision e69057e4 (ceph): internal: check syslog for errors
This should catch lockdep warnings and mark tests with them as failed. Josh Durgin
07:40 PM Revision 9ab445a4 (ceph): ObjectStore: Add collection_list_partial for hash order
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Sage Weil
07:40 PM Revision 997265a2 (ceph): os/HashIndex: some minimal debug output
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:40 PM Revision 0807e7d5 (ceph): hobject_t: make filestore_hobject_key_t 64 bits
So we can return 0x100000000 when max=true.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:40 PM Revision 322f93a2 (ceph): hobject_t: encode max properly
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:40 PM Revision 717621f6 (ceph): librados,Objecter,PG: list objects now includes the locator key
Previously, there was no way to recover the locator key used to create
an object. Now, rados_objects_list_next and ...
Samuel Just
07:40 PM Revision 2d3721c6 (ceph): ObjectStore,ReplicatedPG: remove old collection_list_partial
No need for the old collection_list_partial instance: it's cleaner to
just use an hobject_t as the collection list ha...
Samuel Just
07:40 PM Revision 2026450b (ceph): hobject_t: define max value
Create a max value that is greater than all other values.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:40 PM Revision 348321a5 (ceph): hobject_t: sort by (max, hash, oid, snap)
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
07:40 PM Revision cada2f2e (ceph): object.h: Sort hobject_t by nibble reversed hash
To match the HashIndex ordering, we need to sort hobject_t by the nibble
reversed hash. We store objects in the file...
Samuel Just
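The hobject_t entries above describe an ordering by (max, hash, oid, snap), with the hash compared in nibble-reversed form and a 64-bit key of 0x100000000 reserved for the max value. A minimal Python sketch of that ordering follows. The names HObject, reverse_nibbles, filestore_key and sort_key are hypothetical, chosen for illustration only, and the assumption that the filestore key is the nibble-reversed 32-bit hash is not confirmed by the truncated messages; this is not the Ceph C++ code.

```python
from typing import NamedTuple


class HObject(NamedTuple):
    """Hypothetical stand-in for hobject_t, for illustration only."""
    oid: str
    snap: int
    hash32: int
    is_max: bool = False


def reverse_nibbles(value: int) -> int:
    """Reverse the order of the eight 4-bit nibbles of a 32-bit value."""
    result = 0
    for _ in range(8):
        result = (result << 4) | (value & 0xF)
        value >>= 4
    return result


def filestore_key(obj: HObject) -> int:
    """Assumed key: nibble-reversed hash, or 0x100000000 for the max object.

    0x100000000 does not fit in 32 bits, which is why the key type
    would need to be 64 bits wide.
    """
    return 0x100000000 if obj.is_max else reverse_nibbles(obj.hash32)


def sort_key(obj: HObject):
    """Order by (max, nibble-reversed hash, oid, snap)."""
    return (obj.is_max, reverse_nibbles(obj.hash32), obj.oid, obj.snap)


a = HObject("a", 0, 0x00000001)
b = HObject("b", 0, 0x00000010)
top = HObject("", 0, 0, is_max=True)
ordered = sorted([a, top, b], key=sort_key)
```

In this sketch b sorts before a because its reversed hash (0x01000000) is smaller than that of a (0x10000000), and the max object always sorts last.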
07:40 PM Revision 63e3d864 (ceph): hobject_t: define explicit hash, operator<<; drop implicit sobject_t()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
05:56 PM Bug #1804 (Closed): filestore: unexpected EINVAL
Core file and binary are on gitbuilder-gcov-amd64:~/bug_1804.
The data is still on sepia24 for inspection....
Josh Durgin
05:20 PM Messengers Bug #1803: msgr: behave better when ending TCP connections
This actually caused a deadlock with ffsb on the kernel client - ffsb ended up with 1006 connections in the CLOSING s... Josh Durgin
04:56 PM Messengers Bug #1803 (Won't Fix): msgr: behave better when ending TCP connections
TV is telling me that if we're not confirming that each side of the connection calls ::shutdown() on the socket, we'r... Greg Farnum
04:51 PM Bug #1791 (Resolved): osd: assert(0) in sub_op_modify
This looks like the objecter bug, fixed by commit:2f5bd5f737e831a03beb93c3928c74b59a59052e Sage Weil
03:38 PM Bug #1763 (Resolved): qa: need to run qa tests on kernel with lockdep enabled
Lockdep was already enabled, but we weren't marking runs as failed if errors appeared in syslog. Teuthology commit e6... Josh Durgin
01:49 PM CephFS Bug #1737: ceph-fuse crash in xlist::remove
This happened again from a different path in teuthology:~teuthworker/archive/nightly_coverage_2011-12-07-a/3843/remot... Josh Durgin
11:46 AM Feature #1802 (Resolved): qa: test to exercise divergent osd logs
- generate some write/overwrite workload with many concurrent writes
- extend ceph_manager to pause (kill -STOP) an ...
Sage Weil
11:18 AM rgw Bug #1801 (Resolved): rgw: radosgw-admin remove subuser and related swift key in a single command
Yehuda Sadeh
11:15 AM Feature #1800 (Resolved): qa: run osd tests on btrfs
i think all the code is there, but we need to make the night runs actually do it. Sage Weil
10:41 AM Feature #1799 (Resolved): qa: add 'rados --load-gen' test(s)
maybe a few tests with a range of options, if appropriate Sage Weil
10:41 AM Feature #1798 (Rejected): qa: add rados/librados tests (RadosModel)
Sage Weil
10:10 AM Feature #1784 (Duplicate): osd: redo pgls api
Sage Weil
09:27 AM rbd Feature #1790: rbd: have a way of establishing configured mappings at boot time
Single-file configuration is more annoying to handle with automated tools, file-per-device gives you good atomicity o... Anonymous
09:01 AM Bug #1778 (Resolved): Error after installing an iso-image via qemu / rbd-image
Hi Oliver,
You can use rbd to take live snapshots with the same consistency as with snapshotting images on nfs. Th...
Josh Durgin
03:31 AM Bug #1778: Error after installing an iso-image via qemu / rbd-image
Hi Josh,
well, the small fix does it, no more crashes.
But, of course I would love to have back my live-snapsho...
Oliver Francke
08:57 AM Bug #1797 (Resolved): configure doesn't link to pthread on Fedora 14 on linking librados-config
When building ceph 0.39 on Fedora 14, the build process fails with the
following messages:
CXXLD librados-con...
Guido Winkelmann
08:43 AM CephFS Bug #1796 (Resolved): mds: exit cleanly on EBLACKLISTED
... Sage Weil
08:31 AM Linux kernel client Bug #1795 (Resolved): break d_lock > s_cap_lock ordering
... Sage Weil
08:01 AM RADOS Bug #1794 (Resolved): crush: creating/destroying buckets of zero items
we still try to calloc the length zero array
and then try to free it later...
Sage Weil
07:32 AM CephFS Bug #1047: mds: crash on anchor table query
Got it again with 0.39. Still there. Amon Ott
12:16 AM Revision 95e63247 (ceph): workunit: set client id and secretfile env vars
These are used by the kernel rbd workunit to know how to map images.
Signed-off-by: Josh Durgin <josh.durgin@dreamho...
Josh Durgin

12/06/2011

11:56 PM rbd Feature #1790: rbd: have a way of establishing configured mappings at boot time
What if your image is not in the pool "rbd" ?
I was thinking about a 'rbdtab' file:...
Wido den Hollander
11:10 AM rbd Feature #1790 (Resolved): rbd: have a way of establishing configured mappings at boot time
We need to be careful about the config format, to make automatic editing easy (think Chef).
First draft:
/etc/c...
Anonymous
11:22 PM Revision 745be30f (ceph): gitignore: Ignore src/keyring, as created by vstart.sh
Commit 86c34ba9ee8c883b71a8449c3c261154365c35ae changed
the filename but not .gitignore.
Signed-off-by: Tommi Virtan...
Tommi Virtanen
10:44 PM Revision a1ebd725 (ceph): ReplicatedPG: don't crash on empty data_subset in sub_op_push
If data_subset is empty (i.e., the data we pulled is no longer useful),
we should mark complete false and continue ra...
Samuel Just
10:24 PM Revision 03b03553 (ceph): ReplicatedPG: do not ->put() scrub messages when adding to a WorkQueue.
This function is passing a reference from PG::active_rep_scrub to
the req_scrub_wq, not eliminating the reference (an...
Greg Farnum
10:20 PM Revision 8afa5a5d (ceph): workunits: fix secret file and temp file removal for kernel rbd
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
09:36 PM Revision bcd26fca (ceph): workunits: make rbd kernel workunit executable
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
08:13 PM Revision 2bdf9078 (ceph): doc: Reorganize pip calls to use a requirements file.
The conditional before running pip install was unnecessary,
"pip install" on already installed packages is fast (as l...
Tommi Virtanen
08:07 PM Revision 200d7c89 (ceph): doc: Switch diagram tools from dia to ditaa.
Now you can create diagrams easily with the ".. ditaa::"
directive in the Sphinx documents.
admin/build-doc now chec...
Tommi Virtanen
06:50 PM Revision 20b7af79 (ceph): doc: fix typo
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:50 PM Revision 33753c82 (ceph): filestore: send back op error to log, not stderr
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
06:31 PM Revision 66b6b1bf (ceph): workunits: add some tests for kernel rbd
This covers some snapshot and resize functions that aren't tested by fs benchmarks.
Signed-off-by: Josh Durgin <josh...
Josh Durgin
06:26 PM Revision 575f717f (ceph): rbd: allow snapshots to be mapped
unmap and showmapped already support snapshots. map should too.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
06:26 PM Revision 01d30e6a (ceph): secret: fix error check
add_key will return -1 when an error occurs, which should be handled at a higher level and not printed here.
Signed-...
Josh Durgin
06:26 PM Revision 0ad0fbfe (ceph): secret: add is_kernel_secret function
This will let us know whether we can add a key mount option
if no secret is specified.
Signed-off-by: Josh Durgin <j...
Josh Durgin
06:26 PM Revision 274f4890 (ceph): rbd, mount.ceph: use pre-stored secret if available
If a secret is specified, store and use it, but otherwise
check for a pre-existing secret to use.
Signed-off-by: Jos...
Josh Durgin
06:26 PM Revision 16a211bf (ceph): ceph-rbdnamer: include snapshot name if present
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
06:26 PM Revision fd9556f0 (ceph): rbd: the showmapped command shouldn't connect to the cluster
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Josh Durgin
06:02 PM Linux kernel client Bug #1793 (Can't reproduce): NULL pointer dereference at try_write+0x627/0x1060
Found in sepia50's console:... Josh Durgin
04:44 PM Bug #1778: Error after installing an iso-image via qemu / rbd-image
Josh Durgin
04:44 PM Bug #1778: Error after installing an iso-image via qemu / rbd-image
The bug is in the qemu driver - the fix is "in our qemu repo":https://github.com/NewDreamNetwork/qemu-kvm/commit/7ee2... Josh Durgin
09:28 AM Bug #1778: Error after installing an iso-image via qemu / rbd-image
Hi Oliver,
That gdb session is actually an entirely different crash - I'll take a closer look at both of these tod...
Josh Durgin
02:14 AM Bug #1778: Error after installing an iso-image via qemu / rbd-image
Well Josh,
being quite busy... and need to understand ( not a "real-coder" these days anymore ;-) ) how to configu...
Oliver Francke
04:34 PM Revision ddc11a8f (ceph): test_rados.py: clean up after EEXIST test
This extra pool caused subsequent pool tests to fail.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
02:35 PM Bug #1758 (Resolved): OSD segfault in SimpleMessenger::send_message
I checked out a core dump, and the OSD is calling send_message with a null Connection* from PG::replica_scrub::2895. ... Greg Farnum
11:53 AM Bug #1758: OSD segfault in SimpleMessenger::send_message
And in teuthology:~teuthworker/archive/nightly_coverage_2011-12-06-a/3757/remote/ubuntu@sepia66.ceph.dreamhost.com/lo... Josh Durgin
11:52 AM Bug #1758: OSD segfault in SimpleMessenger::send_message
Happened again today in teuthology:~teuthworker/archive/nightly_coverage_2011-12-06-a/3772/remote/ubuntu@sepia66.ceph... Josh Durgin
02:01 PM CephFS Bug #1702 (Can't reproduce): Ceph MDS crash + client mount problem
Sage Weil
02:01 PM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
I think the next step here is to run the mds under valgrind. Sage Weil
02:00 PM Bug #1490 (Resolved): cfuse assert failure: assert(ob->last_commit_tid < tid)
Sage Weil
11:34 AM CephFS Bug #1792 (Can't reproduce): crash in ceph-mds
This is the full log from teuthology:~teuthworker/archive/nightly_coverage_2011-12-01-b/3516/remote/ubuntu@sepia70.ce... Josh Durgin
11:25 AM Bug #1791 (Resolved): osd: assert(0) in sub_op_modify
From teuthology:~teuthworker/archive/nightly_coverage_2011-12-02-a/3569/remote/ubuntu@sepia6.ceph.dreamhost.com/log/o... Josh Durgin
11:19 AM Bug #1750 (New): xattr errors silently ignored, cause trouble later
Happened again after s3tests in teuthology:~teuthworker/archive/nightly_coverage_2011-12-02-b/3624/teuthology.log. Josh Durgin
11:09 AM CephFS Bug #1675: mds: failed rstat assert
Happened during fsstress in teuthology:~teuthworker/archive/nightly_coverage_2011-12-02-b/3593/remote/ubuntu@sepia92.... Josh Durgin
11:07 AM Bug #1789 (Resolved): mon: failed assert(paxosv == pg_map.version)
From teuthology:~teuthworker/archive/nightly_coverage_2011-12-02-b/3603/remote/ubuntu@sepia44.ceph.dreamhost.com/log/... Josh Durgin
10:54 AM Bug #1530: osd crash during build_inc_scrub_map
Another one crashed in PG::replica_scrub yesterday. core is in teuthology:~teuthworker/archive/nightly_coverage_2011-... Josh Durgin
06:01 AM CephFS Bug #1047: mds: crash on anchor table query
Updated Ceph to 0.39 and the bug seems to be gone. Amon Ott
01:33 AM Revision 54758abc (ceph): Merge remote branch 'gh/stable'
Sage Weil
12:16 AM Revision 9512aed5 (ceph): doc: fix rst syntax
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil

12/05/2011

10:07 PM Revision 7178f1ca (ceph): doc: document monitor cluster expansion/contraction
Pretty sure my rst syntax is wrong.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
09:33 PM Revision 16f79282 (ceph): cephtool: fix shutdown
Fix 'ceph -w' brokenness from commit ad13d0b7.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
07:21 PM Revision 019597e6 (ceph): filejournal: make FileJournal::open() arg slightly less weird
Pass in fs_op_seq (last_committed_seq), not the next expected seq, so we
can avoid subtracting and adding 1 in odd pl...
Sage Weil
07:21 PM Revision bfbc4324 (ceph): Merge branch 'stable'
Sage Weil
07:21 PM Revision 86c34ba9 (ceph): vstart.sh: .ceph_keyring -> keyring
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
07:15 PM CephFS Bug #1774: client: files become inaccessible in large directories (with snapshots?)
Some interesting findings... It appears that the problem has nothing to do with the mds, but with the fuse client. ... Alexandre Oliva
06:53 PM Revision 1e3da7ed (ceph): filejournal: remove bogus check in read_entry
It is perfectly fine to read events that are older than the fs's seq from
the journal; open() will skip them when pos...
Sage Weil
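The filejournal entry above says that journal events older than the filesystem's committed seq may be read and are skipped on open. A rough Python sketch of that idea follows, assuming the journal is a sequence of (seq, payload) pairs and that "older" means seq less than or equal to the committed seq. The function name entries_to_replay and the data layout are hypothetical and do not reflect the FileJournal implementation.

```python
def entries_to_replay(journal, fs_committed_seq):
    """Return the journal entries that still need to be applied.

    journal: iterable of (seq, payload) pairs in increasing seq order.
    fs_committed_seq: last seq already committed to the filesystem
    (assumed to correspond to last_committed_seq in the entries above).
    Entries at or below that seq are read but skipped, not treated as errors.
    """
    return [(seq, payload) for seq, payload in journal if seq > fs_committed_seq]


pending = entries_to_replay([(1, "a"), (2, "b"), (3, "c")], fs_committed_seq=2)
```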
06:08 PM Revision dbd7a3b4 (ceph): Rename "testrados" task to not begin with "test".
See commit e80c32c44293e6453cce1bf89ad3cf5b1b4917ab in
teuthology.git
Tommi Virtanen
06:07 PM Revision e80c32c4 (ceph): Rename "testrados" and "testswift" tasks to not begin with "test".
Anything "test*" looks like a unit test, and shouldn't be used for
actual code.
Tommi Virtanen
06:07 PM Revision 9598e479 (ceph): Rename "testrados" and "testswift" tasks to not begin with "test".
Anything "test*" looks like a unit test, and shouldn't be used for
actual code.
Tommi Virtanen
06:02 PM Revision 0dd4d69f (ceph): Fix unit tests for SSH keep-alive setting.
Commit 6e3e0d7cdcb5ba70f938f0850a8828aca2753ab5 failed to pass
unit tests.
Tommi Virtanen
05:37 PM Revision dc167bac (ceph): filejournal: set last_committed_seq based on fs, not journal
last_committed_seq is the last seq committed to the fs, not the journal.
Set it when we begin replay with the fs prov...
Sage Weil
04:15 PM CephFS Bug #1788 (Resolved): msgr file descriptor leak
With our Hadoop workload (lots of client connections), this problem occurs every couple hours -- although this is the... Noah Watkins
02:18 PM Bug #1786 (Resolved): ceph -w goes dead after 5 minutes
commit:16f79282cd0132c3633216f51fbbf0f93a0aec61 Sage Weil
11:13 AM Bug #1786 (Resolved): ceph -w goes dead after 5 minutes
Sage Weil
02:18 PM Bug #1785 (Resolved): osd: os/FileJournal.cc: 1011: FAILED assert(seq >= last_committed_seq)
commit:1e3da7edcf8881b10f35879e4b5b6be93167c636 Sage Weil
09:14 AM Bug #1785 (Resolved): osd: os/FileJournal.cc: 1011: FAILED assert(seq >= last_committed_seq)
Sage Weil
11:22 AM CephFS Bug #1787 (Closed): mds: laggy oneshot replays pollute mdsmap
... Sage Weil
10:53 AM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
I lost my setup over the weekend, so I'm not going to be able to try the wip-truncate branch on the deployment to see... Sam Lang

12/03/2011

03:11 PM Feature #1784 (Duplicate): osd: redo pgls api
include locators
use hobject_t as iterator (and hopefully make the objecter split/merge coping logic less ugly in th...
Sage Weil
03:09 PM Feature #1783 (Resolved): osd: scrub incrementally across hash range using MOSDPGScan
Current scrub will not scale to large PGs. Sage Weil
01:01 AM CephFS Bug #1047: mds: crash on anchor table query
Attached a log of a full run up to the crash. MDS tries to recover from some problem, replays and crashes. Amon Ott

12/02/2011

11:35 PM Revision 4a0b00a0 (ceph): mon: stub perfcounters for monitor, cluster
The 'mon' perfcounter is for the local daemon and is always registered.
The 'cluster' perfcounter is for cluster sta...
Sage Weil
11:27 PM Revision 6dd81485 (ceph): osd: rename {take -> requeue}_object_waiters
It calls osd->requeue_ops(), so make naming more consistent and avoid
confusing people like me.
Signed-off-by: Sage ...
Sage Weil
11:27 PM Revision 8bbe576c (ceph): osd: safely requeue waiting_for_ondisk waiters on_role_change
This could conceivably cause the reply ordering mismatch seen in bug
#1490. Not sure why we didn't also fix this cal...
Sage Weil
09:38 PM Revision c8831004 (ceph): rados.py: add list_pools method
Signed-off-by: Eric Chen <Eric_YH_Chen@wistron.com>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin
08:06 PM Revision 6b4b6595 (ceph): Merge branch 'stable'
Sage Weil
07:28 PM Revision 06228716 (ceph): Doc: add a conceptual overview of the peering process
Signed-off-by: Mark Kampe <mark.kampe@dreamhost.com> Mark Kampe
07:19 PM Revision c45a8491 (ceph): mds: remove obsolete doc
Sage Weil
06:52 PM Bug #1778: Error after installing an iso-image via qemu / rbd-image
Hi Oliver,
With snapshot=on data is never saved to the backing device - the original file is not modified unless y...
Josh Durgin
05:31 AM Bug #1778: Error after installing an iso-image via qemu / rbd-image
Well Josh,
attached you will find a crash, qemu-system... started without "-daemonize" to see what's going on ;-)
...
Oliver Francke
04:46 AM Bug #1778: Error after installing an iso-image via qemu / rbd-image
Hi Josh,
I have just made a session with savevm/loadvm, once without/with the snapshot-option, now with qemu-1.0. ...
Oliver Francke
05:58 PM Revision 0c183ec7 (ceph): crush: ignore forcefed input that doesn't exist
This might happen if, e.g., the file_layout specifies an osd that later
is removed from the cluster entirely. Just i...
Sage Weil
05:47 PM Revision faf5ce62 (ceph): Revert "CrushWrapper: ignore forcefeed if it does not exist"
This reverts commit 6fbab6da6942c238d40a6b4f1680a7e6da463289.
This fails a unit test.
And I change my mind.. I thin...
Sage Weil
05:01 PM Revision 321ecdab (ceph): v0.39
Sage Weil
05:00 PM Revision 75aff023 (ceph): OSDMap: build_simple_from_conf pg_num should not be 0 with one osd
Previously, pg_num would end up set to 0 if osd.0 is the only osd.
Signed-off-by: Samuel Just <samuel.just@dreamhost...
Samuel Just
03:51 PM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
Sorry - haven't had a chance yet. I'll try it on Monday. Sam Lang
11:50 AM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
Sam, did you get a chance to try this? Sage Weil
03:43 PM Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
If we're lucky this was caused by taking waiters improperly, which Sage fixed in commit:8bbe576cab9ecdbfea939ad3d7866... Greg Farnum
03:40 PM Feature #1782: mon: dump key cluster stats via perfcounter
commit:4a0b00a0f29a87965925e0b44c997bece96b9936 stubs this out. just need to populate the perfcounter with the relev... Sage Weil
02:20 PM Feature #1782 (Resolved): mon: dump key cluster stats via perfcounter
This may be a minor abuse of the perfcounter intent, but it lets us get cluster stats using a common mechanism (via c... Sage Weil
03:22 PM Feature #390 (In Progress): Implement bdrv_snapshot_goto (Rollback), bdrv_snapshot_delete
Have some functions, trying to get a setup to test them with. Greg Farnum
01:54 PM Feature #1082 (Rejected): obsync: swift support
dho guys are doing this. Sage Weil
01:27 PM Feature #1781 (Resolved): qa: readwrite and roundtrip rgw tests in qa suite
Sage Weil
01:01 PM rgw Bug #1780 (Resolved): swift: auth response should return X-Auth-Token instead of X-Storage-Token
Yehuda Sadeh
11:56 AM Bug #1750 (Resolved): xattr errors silently ignored, cause trouble later
Sage Weil
11:54 AM Bug #1757 (Closed): oi disagrees with stat, or error code on stat
Sage Weil
11:52 AM Bug #1679 (Can't reproduce): assertion failure is_replica()
and old code, pending new code. Sage Weil
11:52 AM Bug #1688 (Won't Fix): Benjamin: pg stuck in scrub
old code. Sage Weil
11:50 AM Bug #1689 (Can't reproduce): osd: segfault in recover_primary
going to ignore this and see how the new backfill code fares. Sage Weil
11:48 AM CephFS Bug #1775 (Need More Info): mds startup: _replay journaler got error -22, aborting, possible regr...
Without logs, it's hard to say, but it looks like something caused the OSD to drop a write (or series of writes). No... Sage Weil
11:46 AM Bug #1617 (Won't Fix): pgs stuck down and peering with only one osd down and out
the new code will have an explicit 'incomplete' state when peering fails, instead of being 'stuck'. let's ignore thi... Sage Weil
09:44 AM CephFS Bug #1047 (Need More Info): mds: crash on anchor table query
Amon Ott just hit this one. Sage Weil
04:36 AM Revision 2f5bd5f7 (ceph): objecter: initialize global_op_flags to zero
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Sage Weil
12:13 AM Revision 813523a6 (ceph): Doc: delete gratuitous index.html
It was not an index, and seems to contain recommendations
for system configuration. I have renamed it to confusing.t...
Mark Kampe
12:12 AM Revision 48165af5 (ceph): Doc: complete reversion of architecture.rst
(abandon in progress improvements until everything works)
Signed-off-by: Mark Kampe <mark.kampe@dreamhost.com>
Mark Kampe
12:12 AM Revision 3c7a82a6 (ceph): Doc: deleted gratuitous PlanningImplementation.html,
which was a copy of PlanningImplementation.txt
(and not html at all).
restored previous index.rst, which was overwri...
Mark Kampe
12:11 AM Revision fdf3f7bd (ceph): Doc: Restore the previous version of architecture.rst
it was accidentally overwritten with a version of the product that
had a somewhat different audience/focus and a few sphin...
Mark Kampe
12:07 AM Revision 4cfe0815 (ceph): doc: change state model from .svg to .png
Signed-off-by: Mark Kampe <mark.kampe@dreamhost.com> Mark Kampe

12/01/2011

10:41 PM Revision 1bbf9ae6 (ceph): fixed ubuntu version typo
Steve MacGregor
10:20 PM Revision 6fbab6da (ceph): CrushWrapper: ignore forcefeed if it does not exist
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
08:38 PM Revision 363ebb6c (ceph): librbd: report an error if rbd header does not match
This will fail on future incompatible versions of the header format.
Signed-off-by: Josh Durgin <josh.durgin@dreamho...
Josh Durgin
07:15 PM Revision cce67171 (ceph): Merge branch 'wip_local_reads'
Greg Farnum
07:15 PM Revision d4aef202 (ceph): hadoop: apache license.
We haven't made explicit that the Hadoop Java code is under the Apache
License. Do so (with permission from the other...
Greg Farnum
05:40 PM Messengers Bug #1747 (Need More Info): msgr: osd connection originates from wrong port
The blank address isn't a problem; it's due to the in_hbmsgr not being bound (deliberately). Unfortunately I've been ... Greg Farnum
05:17 PM Revision 348c71c4 (ceph): mds: fix blocking in standby replay thread
We need to hold mylock before waiting on the cond or else we get
./common/Cond.h: In function 'int Cond::Wait(Mutex&...
Sage Weil
05:17 PM Revision f6ee3699 (ceph): global: make daemon banner print explicit
This eliminates some flags and avoids annoying cases where the banner is
printed but we don't want to see it.
Signed...
Sage Weil
04:19 PM Revision 5828009e (ceph): mds: fix usage text
Filename is not optional.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
01:16 PM Bug #1778: Error after installing an iso-image via qemu / rbd-image
There's certainly a difference with the snapshot parameter - it doesn't store anything in the rbd image unless you us... Josh Durgin
12:09 PM Bug #1778: Error after installing an iso-image via qemu / rbd-image
Hi Josh,
at least my experience showed a different behaviour: no reliable snapshots and even crashes of qemu-syste...
Oliver Francke
10:54 AM Bug #1778: Error after installing an iso-image via qemu / rbd-image
You don't need any special qemu options to use snapshots - the snapshot option is confusingly named. The qemu 'snapsh... Josh Durgin
09:30 AM Bug #1778 (Resolved): Error after installing an iso-image via qemu / rbd-image
Hi *,
we are currently running:
ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9) fro...
Oliver Francke
12:10 PM CephFS Bug #1775: mds startup: _replay journaler got error -22, aborting, possible regresion?
stick a
continue;
after the set_read_pos() call to avoid the second crash.
Sage Weil
08:36 AM CephFS Bug #1775: mds startup: _replay journaler got error -22, aborting, possible regresion?
No, I didn't have osd logging enabled. I'll provide you with the journal in a few minutes. Szymon Szypulski
08:26 AM CephFS Bug #1775: mds startup: _replay journaler got error -22, aborting, possible regresion?
Can you dump the mds journal so we can get a closer look at the corruption? Something like
ceph-mds -i foo --dum...
Sage Weil
12:24 AM CephFS Bug #1775 (Resolved): mds startup: _replay journaler got error -22, aborting, possible regresion?
ubuntu natty, kernel 3.2-rc2, ceph 0.38 (stable from git) with patch from #1756 and workaround for #1757
setup
s1...
Szymon Szypulski
10:13 AM rgw Bug #1779 (Resolved): rgw: swift auth returns wrong error code when unexisting user is given
returns 404 instead of 403 Yehuda Sadeh
09:12 AM rgw Bug #1777 (Resolved): rgw: user info modification is not atomic
e.g., adding keys, etc.
I think it's more important to identify cases where operations left system in an inconsist...
Yehuda Sadeh
09:05 AM rgw Feature #1776 (Resolved): rgw: swift auth prefix should be configurable (and optional)
Yehuda Sadeh
01:07 AM Revision 50c4b312 (ceph): Handle interactive-on-error also when error is from contextmanager exit.
Closes: http://tracker.newdream.net/issues/1745 Tommi Virtanen

11/30/2011

07:21 PM CephFS Bug #1774 (Resolved): client: files become inaccessible in large directories (with snapshots?)
Taking snapshots of certain directories within ceph that hold backups of root filesystems of my openmoko phone causes... Alexandre Oliva
05:57 PM Revision 353ee000 (ceph): mds: adjust flock lock state on export
Looks like this was missed when flocklock was added. Did a quick grep and
it doesn't look like it is missing anywher...
Sage Weil
05:49 PM Feature #1773 (Resolved): rbd: class interface for header interaction
This will include:
* create(size, order, features)
* get_info(image)
* get_snapc
* snap_add
* later snap_add...
Josh Durgin
05:43 PM Feature #1772 (Resolved): rbd: define new on-disk header format
This should include several new things:
* CompatSet
* read-only flag
* parent_{pool, image_id, snap_id}
* list<...
Josh Durgin
05:28 PM Bug #1771 (Resolved): rbd: delete snapshots when image is deleted
Currently the snapshots are left around with no way to access them. Josh Durgin
05:23 PM CephFS Bug #1770 (Can't reproduce): directory nonexistent on kernel_untar_build.sh
... Sage Weil
05:18 PM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
the tasks were in nightly_coverage_2011-11-30-a
3433: collection:basic clusters:fixed-3.yaml tasks:kclient_workuni...
Sage Weil
05:13 PM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
Happened twice today:... Sage Weil
05:08 PM Feature #1745 (Closed): teuthology: make interactive-on-error stop further cleanup
... Anonymous
05:06 PM Bug #1690 (Can't reproduce): osd re-created from scratch will crash on start-up
Sage Weil
03:19 PM CephFS Bug #1753 (Won't Fix): ceph copy raw images from qemu incorrectly
Unfortunately, right now making Ceph report sparse files correctly would be prohibitively expensive. It can be done, ... Greg Farnum
02:57 PM CephFS Bug #1753: ceph copy raw images from qemu incorrectly
To create the sparse file qemu-img just calls ftruncate. It does nothing fs-specific, so this can be replicated with ... Josh Durgin
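The sparse-file behaviour described in this comment can be illustrated without qemu-img. The following is a minimal Python sketch, not taken from the ticket: make_sparse_file is a hypothetical helper name, and it assumes a filesystem that supports sparse files. It extends an empty file with ftruncate and reports the apparent size next to the allocated blocks, where the platform exposes them.

```python
import os


def make_sparse_file(path, size):
    """Create an empty file and extend it to `size` bytes with ftruncate.

    No data is written, so on filesystems that support sparse files the
    apparent size (st_size) is `size` while few or no blocks are allocated.
    """
    with open(path, "wb") as f:
        os.ftruncate(f.fileno(), size)
    return os.stat(path)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        st = make_sparse_file(os.path.join(tmp, "image.raw"), 100 * 1024)
        # st_blocks is not available on every platform, hence getattr.
        print("apparent size:", st.st_size)
        print("allocated 512-byte blocks:", getattr(st, "st_blocks", "n/a"))
```

Comparing the two printed values on a local filesystem and on a Ceph mount is one way to see the difference in reporting that this ticket discusses.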
11:10 AM CephFS Bug #1753: ceph copy raw images from qemu incorrectly
The file copy took 3 minutes. That is OK for a 3Gb file but not for a 100Kb file. max mikheev
09:43 AM CephFS Bug #1753: ceph copy raw images from qemu incorrectly
I'm a little confused here. Ceph has never reported only the used space for a file; doing so is prohibitively expensi... Greg Farnum
02:20 PM Messengers Bug #1747 (In Progress): msgr: osd connection originates from wrong port
The problem here is somewhere on osd.2 — osd.1 is using the address that osd.2 is providing, and you can see that osd... Greg Farnum
01:17 PM CephFS Bug #1756 (Resolved): mds crash right after successful recovery
Sage Weil
11:28 AM Linux kernel client Bug #1769 (New): osd_client: susceptibility to low memory deadlocks
We could be trying to flush the cache in order to free up memory, and find ourselves unable to allocate a ceph_osd or... Anonymous
11:21 AM Linux kernel client Cleanup #1768 (Closed): osd_client: gratuitous ceph_monc_request_next_osdmap calls
kick_requests() is called from within a loop that iterates through multiple OSD map updates ... which means that it m... Anonymous
11:15 AM Linux kernel client Bug #1767 (Resolved): osd_client: send_request() cannot fail
The static __send_request() routine is sure to succeed in queuing its request for the specified osd client, yet ceph_... Anonymous
11:12 AM Linux kernel client Bug #1766 (New): mon_client: sends request before authentication
The passed request is sent unconditionally, whether or not we have finished authenticating.
If we have not yet com...
Anonymous
10:11 AM Bug #1765 (Resolved): osd: 'call' op can return data even if op is modifying
Not sure if it'd actually return data, but in any case the api is ambiguous. If it does return data it breaks idempot... Yehuda Sadeh
10:07 AM Feature #1764 (Rejected): osd classes: add an optional source object
This can be very useful. Source object should have the same locator as the target object. Similar to clone-range. An ... Yehuda Sadeh
10:03 AM Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
This didn't turn out to have anything to do with #1727, did it? Greg Farnum
09:36 AM Linux kernel client Bug #1762: i_lock vs s_cap_lock vs inodes_with_caps_lock lock ordering
Argh, this is a real pain. igrab() requires i_lock, which we use extensively to protect complicated changes. In the... Sage Weil
09:19 AM Linux kernel client Bug #1762 (Resolved): i_lock vs s_cap_lock vs inodes_with_caps_lock lock ordering
Reported by Amon Ott on ML.... Sage Weil
09:25 AM Bug #1763 (Resolved): qa: need to run qa tests on kernel with lockdep enabled
We need to catch lock ordering regressions like #1762 in our nightly runs. Sage Weil
02:14 AM Revision 2443878b (ceph): Objecter: loop the right direction when searching for local replicas
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Greg Farnum
12:35 AM Revision 1c696b65 (ceph): doc: Add peering state diagram
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
12:20 AM Revision 2918b501 (ceph): Move kclient multiple_rsync workunit to stress collection.
Bug #1760 keeps being triggered by this. Josh Durgin

11/29/2011

11:36 PM Revision 30ede648 (ceph): Makefile: ipaddr.h, pick_address.h
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
10:05 PM rbd Cleanup #1761: krbd: make block/segment naming consistent
Segment refers to a partial range, a part of an object, so I think we should keep it in this context. So object shoul... Yehuda Sadeh
09:15 PM rbd Cleanup #1761 (Resolved): krbd: make block/segment naming consistent
Pick a consistent term for an object (segment or object, but not block) and use it throughout. Sage Weil
09:31 PM Revision 77a62fdc (ceph): Makefile: add missing uuid.h to tarball
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:30 PM Revision ebb585d9 (ceph): Objecter: fix local reads in recalc_op_target
We want to use the actual OSD, not the index into the array!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum
05:27 PM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
Actually, maybe you can run with the wip-truncate branch on the mds and see if you trigger a failed assertion on the MDS... Sage Weil
05:19 PM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
Do you by chance have the log preceding the first crash?
Working around this is probably a matter of patching wit...
Sage Weil
11:28 AM Bug #1759 (Resolved): mds/client: truncate size overflow, fails with EINVAL
My version of ceph is a minor variant of 0.38, running with ext4, and ceph-fuse. It looks like my fs has gotten corr... Sam Lang
05:07 PM CephFS Cleanup #814: hadoop: refactor hadoop shim in terms of java libceph bindings
http://www.debian.org/doc/packaging-manuals/java-policy/x105.html Sage Weil
04:28 PM Revision 8788a404 (ceph): osd: subscribe to next map if flagged FULL
This ensures the osd finds out when we become un-full in a timely manner.
Fixes: #1755
Signed-off-by: Sage Weil <sag...
Sage Weil
04:26 PM CephFS Bug #1760 (Resolved): multiple_rsync workunit cannot remove non-empty directory intermittently
This has occurred in half of the regression runs since 11/24: ... Josh Durgin
10:52 AM Bug #1757: oi disagrees with stat, or error code on stat
As we discussed on #ceph, I've updated the kernel to 3.2-rc2 and patched the osd with this workaround http://fpaste.org/PKwW/, n... Szymon Szypulski
08:25 AM Bug #1757: oi disagrees with stat, or error code on stat
The fix for #1612 is upstream kernel commit:ed3ee9f44ba55eb6acfbfc8caa881e0253710d2a. Does your kernel on the osds h... Sage Weil
01:52 AM Bug #1757 (Closed): oi disagrees with stat, or error code on stat
I've similar bugs #1334, #1473 which should be solved by #1612, but it doesn't help.
Ubuntu natty, ceph 0.38 with ...
Szymon Szypulski
09:05 AM Bug #1758 (Can't reproduce): OSD segfault in SimpleMessenger::send_message
in the 11/29 nightlies, cfuse_workunit_misc (3335) the osd on sepia5 seg-faulted.
The end of the osd log is:
2011-1...
Anonymous
08:59 AM Bug #1755 (Resolved): OSD: subscribe to map updates on FULL flag
commit:8788a404ae4a10cd10ec8048f0b32d473640a607 Sage Weil
08:25 AM Bug #1612: osd/PG.cc: 3839: FAILED assert(missing[oid].need <= v)
upstream kernel commit:ed3ee9f44ba55eb6acfbfc8caa881e0253710d2a Sage Weil
05:39 AM Revision c2889fef (ceph): mds: encode truncate_pending in inode
Otherwise we don't actually journal this value, and we get confused when
we replay a start_truncate and try to restar...
Sage Weil

11/28/2011

10:11 PM CephFS Bug #1756: mds crash right after successful recovery
This should let you restart your mds:... Sage Weil
09:28 AM CephFS Bug #1756 (Resolved): mds crash right after successful recovery
Ubuntu Natty, ceph 0.38, kernel 2.6.38-12-server, 2x separate mds daemons crashed in the middle of the night
* sho...
Szymon Szypulski
08:52 PM Revision 98e0a6fd (ceph): uclient: remove filer_flags and use Objecter::global_op_flags instead
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Greg Farnum
08:52 PM Revision da2e0c3c (ceph): Objecter: add a new global_op_flags that is passed to every Op construc...
We can use this for a global use of LOCALIZE_READS (and are about
to do so!).
Signed-off-by: Greg Farnum <gregory.fa...
Greg Farnum
08:30 PM Revision 51385930 (ceph): Objecter: remove unused variable in op_submit
These flags are probably relics from when the function got split;
they belong in send_op now.
Signed-off-by: Greg Fa...
Greg Farnum
06:32 PM Revision 4974a9c2 (ceph): uclient: remove useless if-else based on snapid
These are the same command anyway!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum
05:01 PM Revision cef16732 (ceph): debian init: Do not stop or start daemons when installing or upgrading
Signed-off-by: Wido den Hollander <wido@widodh.nl> Wido den Hollander
03:49 PM CephFS Bug #1753: ceph copy raw images from qemu incorrectly
This is using the ceph filesystem, not rbd. Josh Durgin
11:12 AM CephFS Bug #1749: nonexistent directory in kclient_workunit_kernel_untar_build
This could have the same (unknown) root cause as #1741. Anonymous
09:46 AM Feature #1736 (Resolved): collectd: hacky script to generate types.db from perfcounter schema
Sage Weil
09:26 AM Bug #1755 (Resolved): OSD: subscribe to map updates on FULL flag
When the OSDs get a full flag they stop most of their activity, which shuts down the usual map propagation methods. T... Greg Farnum
09:14 AM Bug #1631: osd: failed assert(repop_queue.front() == repop)
Ok, pretty sure this is related to the reconnect. We need to put together a test that artificially triggers messenge... Sage Weil
12:11 AM Revision ce657227 (ceph): mon: search for local ip during mkfs
If an address isn't explicitly specified during mkfs, look for an unnamed
monitor in the (generated) monmap and see i...
Sage Weil
12:11 AM Revision 61b9db3a (ceph): pick_address: implement have_local_addr()
Check for a local ip from within a list of addresses.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:04 AM Revision 84b00597 (ceph): monclient: name nameless monitors noname-<foo>
This makes them easy to pick out as unnamed.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil

11/27/2011

10:50 PM Revision 7a453402 (ceph): pick_address: whitespace
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:44 PM Bug #1751: Copy in CEPH too slow
rbd only. There is no plan yet for reflink(2) in the ceph filesystem. Sage Weil
02:48 PM Bug #1751: Copy in CEPH too slow
Is clone for rbd only, or for files too?
Copying files is slow too.
max mikheev
02:45 PM Bug #1751 (Duplicate): Copy in CEPH too slow
A 'clone' operation that does copy-on-write is coming in the next couple weeks. See #988 Sage Weil
05:39 PM Feature #1754 (Resolved): qa: run other suites nightly as well
stick suite name in mail subject?
run all suites nightly (not just regression)
Sage Weil
04:32 PM CephFS Bug #1746 (Resolved): PerfCounters::set segfault
Sage Weil
04:32 PM Bug #1727 (Resolved): osd: failed assert(pending_ops > 0) in dequeue_op
Sage Weil
04:30 PM Feature #1647 (Resolved): mon: robust bootstrap
Sage Weil

11/25/2011

02:08 PM CephFS Bug #1753 (Won't Fix): ceph copy raw images from qemu incorrectly
Hi,
Ceph cannot correctly handle raw images from qemu:
oneadmin@s2-8core:~/OpenNebula/var/images/tm...
max mikheev

11/24/2011

01:02 PM CephFS Bug #1752 (Can't reproduce): ceph-fuse isn't releasing caps without flushing data?
Xiaofei Du reported on the mailing list that running an "ls" on a directory with multiple writers takes a while (much... Greg Farnum
10:16 AM Bug #1751 (Duplicate): Copy in CEPH too slow
Hi,
The copy operations for files and for rbd images are too slow. Ceph is a copy-on-write system; I think the c...
max mikheev

11/23/2011

11:56 PM Revision 30def38d (ceph): corrected variable (con) to be consistent with prior examples (cluster)
Signed-off-by: Mark Kampe <mark.kampe@dreamhost.com> Mark Kampe
10:07 PM Revision 934e1e52 (ceph): ReplicatedPG: Also count overlaps for snapsets on snapdirs
Previously, the overlaps for snapdirs would not be included in
cstat causing the computed total to be incorrect.
Sig...
Samuel Just
10:07 PM Revision 97d82ed9 (ceph): ReplicatedPG: Account for clone space usage in make_writeable
Previously, we accounted for clone space usage inconsistently in
write_update_size_and_usage etc when walking through...
Samuel Just
05:09 PM Bug #1631: osd: failed assert(repop_queue.front() == repop)
This happened again with the same workload in /var/lib/teuthworker/archive/nightly_coverage_2011-11-23-b/3034/remote/... Josh Durgin
05:06 PM Bug #1530: osd crash during build_inc_scrub_map
A new crash during scrub from /var/lib/teuthworker/archive/nightly_coverage_2011-11-23-b/3051/remote/ubuntu@sepia71.c... Josh Durgin
05:02 PM Bug #1676 (Resolved): stats mismatch during snaps workunit
97d82ed950b26cfaef4267ee44edd9ad927fb828 and 934e1e52514b6036c91c1c7db1c8b6727ac8c6d8 should take care of the size di... Samuel Just
09:41 AM Bug #1676: stats mismatch during snaps workunit
I do not know if this is likely to be related, but in the 11/23a nightlies, 3027 (rgw_s3tests)
1 Aborts found in 3...
Anonymous
05:00 PM Bug #1750 (Closed): xattr errors silently ignored, cause trouble later
Comment
I do not know if this is likely to be related, but in the 11/23a nightlies, 3027 (rgw_s3tests)
1 Aborts f...
Samuel Just
02:45 PM Revision 32a68378 (ceph): Merge branch 'wip-mon'
Sage Weil
02:44 PM Revision ad13d0b7 (ceph): ceph: fix shutdown race
Shut down MonClient before messenger, to avoid race with MonClient::tick()
and MonClient::shutdown().
Fixes
#0 __l...
Sage Weil
01:33 PM Bug #1744: teuthology: race with daemon shutdown?
Josh saw similar, it seems the ctx.daemons data structure loses entries / they never get added / something. So far, r... Anonymous
09:27 AM CephFS Bug #1749 (Can't reproduce): nonexistent directory in kclient_workunit_kernel_untar_build
In the 11/23a nightlies, 3003, there may have been
a transient directory access error:
... lots of stuff works
2...
Anonymous
09:11 AM CephFS Bug #1748 (Can't reproduce): mds segfault CDir::project_fnode
In the 11/23a nightlies, 2995/remote/ubuntu@sepia75.ceph.dreamhost.com/log/mds.0.log.gz
2011-11-22 23:59:14.857453 ...
Anonymous
07:16 AM Feature #1487 (Resolved): config: {cluster,public}_subnets
Sage Weil
04:52 AM Revision 414caa7d (ceph): common/pick_address: Fix IP address stringification.
Different sockaddr_* have the actual address (sin_addr, sin6_addr)
at different offsets, and sockaddr->sa_data just i...
Tommi Virtanen
12:28 AM Revision 9870e2f7 (ceph): mon: pick_addresses before common_init_finish
We can't modify g_conf->public_addr after that.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:22 AM Revision 036ad4c7 (ceph): mon: set default port if not specified...
...when looking for self in monmap during mkfs.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
12:04 AM Revision 0045c901 (ceph): monmap: assign rank by sorting addr, not name
This allows monitors to bootstrap knowing peer addrs but not their names,
as when we specify mon_host.
Signed-off-by...
Sage Weil
12:04 AM Revision 36978a63 (ceph): mon: calculate rank by addr, not name
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil

11/22/2011

11:06 PM Revision ebe5fc60 (ceph): obsync: tear out rgw
Yehuda Sadeh
10:53 PM Revision 3a20b425 (ceph): mon: name self in monmap if --public-addr specified during mkfs
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:40 PM Messengers Bug #1747 (Resolved): msgr: osd connection originates from wrong port
osd.2 sends a couple messages to osd.1:... Sage Weil
06:31 PM Revision a859763b (ceph): rgw: don't remove tail of lru if that's what we touch
Yehuda Sadeh
06:09 PM Revision aeeeade6 (ceph): mon: mark down all connections when rank changes
The election and some other stuff depend on msg->get_source().num() to get
the peer rank, and that is part of the con...
Sage Weil
06:08 PM Revision bed3c472 (ceph): mon: handle rank change in bootstrap
The rank can change either because we probe and get a new monmap, or
because we get one via paxos. Move the checks t...
Sage Weil
05:53 PM Revision 8b464093 (ceph): mon: pick an address when joining and existing cluster
If we are joining an existing cluster, we can pick whatever address we
want (e.g., one specified by public_addr or pu...
Sage Weil
05:52 PM Revision 5ba356b3 (ceph): mon: remove unused myaddr
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
05:52 PM Revision 0c9724d6 (ceph): mon: simplify suicide when removed from map
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
03:02 PM rgw Feature #1697 (Resolved): s3-tests: test bucket headers
Fixed, added the following tests:
s3tests.functional.test_headers.test_bucket_put_bad_canned_acl
s3tests.function...
Yehuda Sadeh
10:33 AM rgw Bug #1719 (Resolved): rgw: crash in ObjectCache::touch_lru
should be fixed by commit:a859763b1cba844d0d56b861a372e5f63f87c607. Yehuda Sadeh
05:58 AM Revision 24ee09b0 (ceph): Revert "more logs (yuck) for #1682"
This reverts commit ea00114f08440563bce8e27ae2cd887bbc85aba5. Sage Weil
01:46 AM Revision eb8d91fe (ceph): PG: it's not necessary to call build_inc_scrub_map in build_scrub_map
Because we have called osr.flush(), it's safe to tag map.valid_through
as last_update. We will still have to catch ...
Samuel Just
12:17 AM Revision 0f4b59a4 (ceph): Merge remote branch 'gh/subnet'
Sage Weil
12:00 AM Revision c651c88e (ceph): Properly handle case where first error is inside a context manager __ex...
Closes: http://tracker.newdream.net/issues/1743 Tommi Virtanen
12:00 AM Revision fab1e55e (ceph): Merge remote branch 'gh/wip-mon'
Sage Weil

11/21/2011

10:27 PM Revision eec61b48 (ceph): common/ipaddr: Add utility function to parse ip/cidr style networks.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
10:27 PM Revision 0477f238 (ceph): common/pickaddr: Pick cluster_addr/public_addr based on *_network.
Tommi Virtanen
10:27 PM Revision c066e926 (ceph): mds, osd, synclient: Pick cluster_addr/public_addr based on *_network.
Instead of specifying an IP address in ceph.conf like
[global]
cluster_addr = 10.1.2.3
you can now avoid the node...
Tommi Virtanen
10:27 PM Revision 0f748d4c (ceph): common/ipaddr: Find a configured IP address in given subnet.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
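As a rough illustration of the idea behind the *_network commits above (choosing a locally configured address that falls inside a given subnet), here is a minimal Python sketch. It is not the Ceph implementation: pick_address is a hypothetical name, and the caller passes the candidate addresses in as a list instead of having them read from the host's network interfaces.

```python
import ipaddress


def pick_address(local_addrs, network):
    """Return the first address in `local_addrs` that lies inside `network`.

    `network` is an ip/cidr string such as "10.1.0.0/16". Returns None when
    no candidate matches.
    """
    net = ipaddress.ip_network(network, strict=False)
    for addr in local_addrs:
        ip = ipaddress.ip_address(addr)
        # Only compare addresses of the same family (IPv4 vs IPv6).
        if ip.version == net.version and ip in net:
            return addr
    return None


if __name__ == "__main__":
    candidates = ["192.168.0.5", "10.1.2.3"]
    print(pick_address(candidates, "10.1.0.0/16"))
```

With a helper like this, a single network setting can be shared across nodes while each node resolves its own address, which is the convenience the commit message describes.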
10:07 PM CephFS Bug #1549 (In Progress): mds: zeroed root CDir* vtable in scatter_writebehind_finish
Sage Weil
09:56 PM Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
happened again on /var/lib/teuthworker/archive/nightly_coverage_2011-11-21-b/2818
This may be the same root cause ...
Sage Weil
09:37 PM Revision 2bae3506 (ceph): osd: Remove unused variable.
Tommi Virtanen
09:37 PM Revision 0f9a0605 (ceph): common/str_list: Make unused return value void.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Tommi Virtanen
09:37 PM Revision 97464bca (ceph): msg: Move public_addr use outside ->bind()
Tommi Virtanen
09:28 PM Revision 3c8fec2d (ceph): osd: fix 'stop' command
Special case. We can't join the command_tp thread from itself.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil
09:23 PM Revision b47347bd (ceph): osd: protect handle_osd_map requeueing with queue lock
pending_ops was protected by osd_lock, but it tracks something in the
queue, which has its own lock. Messy. Also, ...
Sage Weil
07:15 PM Revision 70dfe8e9 (ceph): osd: lock pg when requeuing requests
The op queue is shut down, so this is mostly safe, unless someone comes
through and does requeue_ops() from a callbac...
Sage Weil
06:33 PM Revision 811145f7 (ceph): paxosservice: tolerate _active() call when not active
This can happen when multiple C_Active events are queued, and the first
does a propose_pending() (moving us into upda...
Sage Weil
05:19 PM Revision 88963a18 (ceph): objecter: simplify map request check
We should request a missing/intervening map if it appears to exist.
Otherwise, skip it.
Signed-off-by: Sage Weil <sa...
Sage Weil
05:19 PM Revision cd2e523f (ceph): objecter: cancel tick event on shutdown
Hopefully this is the root cause for
2011-11-20 23:57:41.555292 7f75dd743780 ceph version 0.38-205-g3b53b72
(commit:...
Sage Weil
05:01 PM rgw Bug #1719: rgw: crash in ObjectCache::touch_lru
I think what happens here is that the entry that we touch happens to be the one that we dispose of (at the tail of th... Yehuda Sadeh
04:02 PM Bug #1743 (Closed): teuthology: not exiting with error when ceph-fuse shutdown fails
commit c651c88eacf9c3bbf1f037be3a5dc0425308c730
Author: Tommi Virtanen <tv@eagain.net>
Date: 2011-11-21 16:00:19 ...
Anonymous
03:42 PM Bug #1743: teuthology: not exiting with error when ceph-fuse shutdown fails
This reproduced it nicely:
diff --git a/teuthology/task/internal.py b/teuthology/task/internal.py
index 58e7f14...
Anonymous
03:57 PM Bug #1744: teuthology: race with daemon shutdown?
Tommi Virtanen wrote:
> Was this using any one of the following?
>
> teuthology/task/lost_unfound.py
> teutholog...
Sage Weil
03:33 PM Bug #1744: teuthology: race with daemon shutdown?
Was this using any one of the following?
teuthology/task/lost_unfound.py
teuthology/task/mon_recovery.py
teuthol...
Anonymous
02:57 PM Bug #1741: teuthology: failed to untar
The path mentioned above is incorrect. Run nightly_coverage_2011-11-18-2/2663 failed because of network failure.
T...
Anonymous
02:52 PM Bug #1741: teuthology: failed to untar
This is exactly what would happen if someone nuked the machine, or locking failed and someone else ran a faster test ... Anonymous
01:29 PM Bug #1727: osd: failed assert(pending_ops > 0) in dequeue_op
hopefully fixed by commit:b47347bd7c377037f7fbc199f0c88b447c9626d1 Sage Weil
08:59 AM Bug #1727: osd: failed assert(pending_ops > 0) in dequeue_op
Happened again in the 11/21 nightlies - 2791, sepia33 Anonymous
09:53 AM Bug #1742 (Rejected): qa: s3-tests failed 100-continue test on sepia
This was due to an old entry in /etc/apt/sources.list - older versions of the apache packages were still used. The ch... Josh Durgin
09:43 AM rbd Feature #1713: teuthology: qemu tasks, tests
Sorry, comment #2 was meant for another bug.
Anonymous
09:42 AM rbd Feature #1713: teuthology: qemu tasks, tests
This is in the plans after the new sepia hardware is in place; current sepia re-install is too slow & painful to dare... Anonymous
09:23 AM CephFS Bug #1746: PerfCounters::set segfault
I think this is objecter event teardown. See commit:cd2e523fba1d6cf8d15e7a349ad700b744f24ecf Sage Weil
09:05 AM CephFS Bug #1746 (Resolved): PerfCounters::set segfault
In the 11/21 nightlies, while trying to run workunit/ffsb,
2779/remote/ubuntu@sepia57.ceph.dreamhost.com/log/mon.2.l...
Anonymous
08:57 AM Bug #1530: osd crash during build_inc_scrub_map
Both of the above described variants occurred in the 11/21 nightlies
(2775:sepia17, 2783:sepia81, 2805:sepia82)
Anonymous

11/20/2011

11:24 PM Revision ea00114f (ceph): more logs (yuck) for #1682
Sage Weil
10:26 PM Revision f6070282 (ceph): paxos: fix sharing of learned commits during collect/last
We can learn either an uncommitted or committed value during the
collect/last recovery phase. For the committed valu...
Sage Weil
09:18 PM Revision 3b53b722 (ceph): rgw: support alternative date formatting
being used by s3cmd Yehuda Sadeh
09:05 PM Feature #1745 (Closed): teuthology: make interactive-on-error stop further cleanup
It would be nice if a failure in cleanup would prevent further cleanup when interactive-on-error is true. For example... Sage Weil
09:03 PM Bug #1744 (Resolved): teuthology: race with daemon shutdown?
... Sage Weil
08:02 PM Bug #1743 (Closed): teuthology: not exiting with error when ceph-fuse shutdown fails
here's the log tail:... Sage Weil
03:23 PM CephFS Bug #1682: mds: segfault in CInode::authority
Hrm, this has me stumped.
The log leading up is...
Sage Weil
04:56 AM Revision 4b53288b (ceph): ceph_manager: %
Sage Weil
04:56 AM Revision 721c0e97 (ceph): nuke: don't specify full path
/tmp/cephtest/binary may have been removed; kill stray daemons by name
only. we really don't care about false positi...
Sage Weil
03:28 AM Revision dcab329b (ceph): fix conf thinko
'int' object has no attribute 'iteritems' Sage Weil

11/19/2011

10:30 PM Revision becfce35 (ceph): mon: share random osd map from update_from_paxos, not committed()
This will let us remove committed() entirely.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:30 PM Revision b521710f (ceph): mon: mdsmon: tick() from on_active() instead of committed()
Same effect, and avoids useless committed().
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
10:30 PM Revision 10fed791 (ceph): paxosservice: remove unused committed() callback
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
10:30 PM Revision 9aabd398 (ceph): paxosservice: consolidate _active and _commit
Use the same callback for when paxos goes active and for when it commits
something. The response in both cases is th...
Sage Weil
09:56 PM Revision 9920a168 (ceph): config: support --no-<foo> for bool options
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:56 PM Revision 1a468c7e (ceph): config: whitespace
Signed-off-by: Sage Weil <sage@newdream.net> Sage Weil
09:56 PM Revision a08e7f12 (ceph): regression/basic/tasks/kclient_workunit_misc: turn on mds log
Hopefully will catch #1682 Sage Weil
09:45 PM Revision 13c98df9 (ceph): regression/basic/tasks/cfuse_dbench: turn up client debugging
Hopefully we'll hit #1737... Sage Weil
02:28 PM Bug #1732 (Can't reproduce): osdmap assert fail during rados bench
Sage Weil
02:03 PM Bug #1742 (Rejected): qa: s3-tests failed 100-continue test on sepia
/var/lib/teuthworker/archive/nightly_coverage_2011-11-18-2/2683
and the chef task _did_ run...
Sage Weil
01:59 PM Bug #1741 (Can't reproduce): teuthology: failed to untar
teuthology:/var/lib/teuthworker/archive/nightly_coverage_2011-11-18-2/2662... Sage Weil
01:54 PM CephFS Bug #1573 (Duplicate): mds crash during multiple_rsync workunit
Sage Weil
12:13 AM Revision cc5b5e17 (ceph): osdmon: set the maps-to-keep floor to be at least epoch 0
Looks like this conditional was just set backwards by mistake. There
have been a number of issues with OSDMap version...
Greg Farnum

11/18/2011

11:57 PM Revision 45cf89c1 (ceph): Revert "osd: simplify finalizing scrub on replica"
This reverts commit dd5087fabb2a743741a96ee4610379afa8431f68.
Calling osr.flush() is not quite enough since the onre...
Samuel Just
11:56 PM Revision 57ad8b2e (ceph): FileStore.cc: onreadable callbacks in OpSequencer order is enough
Signed-off-by: Samuel Just <samuel.just@dreamhost.com> Samuel Just
10:19 PM rbd Bug #1740: krbd: don't return head data when reading from a non-existent snapshot
The requests are made for the head version, since the removed snapid is not found when looking up the snapshot name i... Josh Durgin
08:58 PM rbd Bug #1740: krbd: don't return head data when reading from a non-existent snapshot
Hmm, what should they return? -ENXIO or -EIO or something? What is the OSD returning in this case?
Sage Weil
05:11 PM rbd Bug #1740 (Resolved): krbd: don't return head data when reading from a non-existent snapshot
If you have an rbd image mapped at a snapshot, and then delete the snapshot, any subsequent reads succeed and give yo... Josh Durgin
09:53 PM Revision 508f4f83 (ceph): Save summary after nuking machines.
This way you can tell when tests are entirely finished running. Josh Durgin
08:22 PM Revision 91cfdfea (ceph): Add an example overrides file for running regression tests.
Josh Durgin
06:21 PM Revision 7c8a7a89 (ceph): Move multimds tests to a new suite, 'experimental'.
This suite is for testing features that aren't expected to be stable yet. Josh Durgin
05:49 PM Revision 09c20c51 (ceph): objecter: trigger oncommit acks if the request returns an error code.
Many users only set oncommit acks, so if they get an error code
(which comes only as a CEPH_OSD_OP_ACK right now) the...
Greg Farnum
05:49 PM Revision dedf2c4a (ceph): osd: error responses should trigger all requested notifications.
There's no good reason I can find to limit error code responses to
the ACK.
Signed-off-by: Greg Farnum <gregory.farn...
Greg Farnum
05:49 PM Revision 9800faeb (ceph): paxos: do not create_pending if !active
This avoids a scenario like:
- _active()
- proposes value
- _commit()
- creates new pending, even though in upda...
Sage Weil
05:43 PM Revision fa587687 (ceph): Revert "mon: don't propose new state from update_from_paxos"
This reverts commit 66c628acc8be71a92e801179431e4b938b857b3d. Sage Weil
05:15 PM rgw Feature #1482 (Resolved): qa: swift-tests
testswift was added to teuthology. Yehuda Sadeh
05:14 PM rgw Feature #1664 (Resolved): rgw: pass swift tests
We pass most of the tests, other than a few which we don't intend to fix at this point (different enforced limits) an... Yehuda Sadeh
05:00 PM rgw Feature #1739 (Resolved): rgw: multipart upload should use manifest object
Yehuda Sadeh
04:39 PM RADOS Bug #1738 (Duplicate): bad crushmap behavior
./osdmaptool --test-map-pg 1.21 <attached osdmap>
pg 1.21 ends up mapped only to osd3 despite there being two othe...
Samuel Just
02:40 PM Bug #1530: osd crash during build_inc_scrub_map
Got a couple more of these today: teuthworker/archive/nightly_coverage_2011-11-18-2/2649/remote/ubuntu@sepia56.ceph.d... Josh Durgin
02:37 PM CephFS Bug #1682: mds: segfault in CInode::authority
Another crash in CInode::authority happened today, although with a different backtrace.
From teuthology:~teuthworker/arc...
Josh Durgin
02:35 PM CephFS Bug #1737 (Resolved): ceph-fuse crash in xlist::remove
From teuthology:~teuthworker/archive/nightly_coverage_2011-11-18-2/2645/remote/ubuntu@sepia13.ceph.dreamhost.com/log/... Josh Durgin
10:11 AM Bug #1351 (Resolved): rados bench should report errors
Fixed by commit:dedf2c4a066876bdab9a0b0154196194cefc1340. Greg Farnum
04:45 AM Revision 66c628ac (ceph): mon: don't propose new state from update_from_paxos
Proposing a new state from within update_from_paxos() confuses some callers,
like PaxosService::_active(). Instead, ...
Sage Weil
04:28 AM phprados Tasks #869 (Resolved): Update to new librados API
Ok, it took some time, but it's done.
v0.9.3 is updated to the librados2 API and wraps all the C functions into PHP.
Wido den Hollander
01:57 AM Revision 94100ad0 (ceph): Move collections into separate suites
For now, there are just two suites:
* regression - tests that should always pass
* stress - tests that have p...
Josh Durgin
01:26 AM Revision 42cecb5e (ceph): suite: put common config before facets
This lets you add tasks to the beginning of a run, like the chef task. Josh Durgin
01:16 AM Revision 044a88ce (ceph): suite: schedule a list of collections for running instead of a single s...
Josh Durgin
01:00 AM Revision d8fc1513 (ceph): Clean up C++isms.
Tommi Virtanen
12:55 AM Revision 6ae0f81e (ceph): rgw: if swift url is not set up, just use whatever client used
Yehuda Sadeh
12:53 AM Revision 23aae67a (ceph): testswift: fix config
Yehuda Sadeh
12:53 AM Revision 6236e7db (ceph): testswift: fix config
Yehuda Sadeh
12:49 AM Revision c5450948 (ceph): Add a task for easily running chef-solo on all the nodes.
Tommi Virtanen

11/17/2011

11:01 PM Revision ef5ca293 (ceph): fuse: fix readdir return code
Ignore ENOSPC generated by our own callback, as it is only used to
terminate the loop.
Broken by commit cd90061239a5...
Sage Weil
10:11 PM Revision d61ba644 (ceph): paxos: fix trimming when we skip over incrementals
Remove open-coded trimming of old states and use our method (that also
removes additional per-state files). Fixes ol...
Sage Weil
10:10 PM Revision 367ab142 (ceph): paxos: store stashed state _and_ incrementals
Paxos::share_state() may share a stashed state and incrementals that
follow; we need to store the same.
Signed-off-b...
Sage Weil
09:53 PM Revision 6bc9a544 (ceph): mon: elector: always start election via monitor
Don't go from active -> electing without passing (monitor) go.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
09:46 PM Revision 89f80412 (ceph): ceph_manager: fix logging
Sage Weil
09:23 PM Bug #1708 (Resolved): mon/PGMonitor.cc: 218: FAILED assert(paxos->get_version() + 1 == pending_in...
This latest variation should be fixed by commit:66c628acc8be71a92e801179431e4b938b857b3d. Thanks for the log! Sage Weil
05:18 PM Bug #1708: mon/PGMonitor.cc: 218: FAILED assert(paxos->get_version() + 1 == pending_inc.version)
Yes, I still get the problem with an updated master 6bc9a544b62bb21f6ee7ef51bfbe9111f7add9cb
I had monitor debuggi...
Josh Pieper
09:07 PM Revision f85f5dd7 (ceph): ceph: deep merge overrides, so e.g. log whitelists can be overridden
Josh Durgin
09:06 PM Revision a7632976 (ceph): misc: move deep_merge out of the MergeConfig class - it's generic
Josh Durgin
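
A minimal sketch of what the two deep_merge revisions above describe: merging an overrides file into a job config recursively, so that a nested setting such as a log whitelist can be extended without replacing the whole section. This is an illustration only, not the code from those commits. The merge rules (dicts merged recursively, lists concatenated, scalars replaced) and the key names in the usage example are assumptions.

```python
def deep_merge(base, overrides):
    """Return a new value with `overrides` merged into `base`.

    Assumed rules (hypothetical, for illustration):
      - dict + dict: merge key by key, recursing into shared keys
      - list + list: concatenate, base entries first
      - anything else: the override wins
    """
    if isinstance(base, dict) and isinstance(overrides, dict):
        merged = dict(base)  # shallow copy so `base` is not mutated
        for key, value in overrides.items():
            if key in merged:
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = value
        return merged
    if isinstance(base, list) and isinstance(overrides, list):
        return base + overrides
    return overrides


# Hypothetical job config and overrides; key names are illustrative.
config = {"ceph": {"branch": "master", "log-whitelist": ["slow request"]}}
overrides = {"ceph": {"log-whitelist": ["wrongly marked me down"]}}
merged = deep_merge(config, overrides)
```

With a shallow merge, the `ceph` section from the overrides would replace the original section and drop `branch`. With the recursive version, `branch` is kept and the whitelist gains the extra entry.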
08:07 PM Revision 685450b7 (ceph): common: libraries should not log to stdout/stderr
Certainly not by default.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
07:57 PM Revision c6988a07 (ceph): Save config after locking nodes, so targets are included.
Josh Durgin
07:56 PM Revision f1dd56d9 (ceph): objecter: set skipped_map if we skip a map
This ensures that we resend _all_ requests, since we aren't sure which
may have mapped to a different primary and the...
Sage Weil
07:39 PM Revision 5afef020 (ceph): objecter: add is_locked() asserts
Sanity check.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
07:39 PM Revision bf91177e (ceph): objecter: send slow osd MPing via Connection*
This may address #1732 indirectly because we have a Connection* reference
here. However, it's still not clear how we...
Sage Weil
07:18 PM Revision 4e6cd55c (ceph): filestore_idempotent: remove unused import
Josh Durgin
07:16 PM Revision 7d51e3d3 (ceph): mon_recovery: remove unused code and import
Josh Durgin
07:11 PM Revision f4d527e7 (ceph): thrashosds: timeout for every clean check, not just the last one
Josh Durgin
07:05 PM Revision 9d12b720 (ceph): ceph_manager: add a default timeout of 5 minutes for mon quorum
Josh Durgin
06:45 PM Revision cb9ac089 (ceph): ceph_manager: log mon quorum status so the logs show progress (or lack ...
Josh Durgin
05:42 PM Bug #1351: rados bench should report errors
Quick skim analysis:
If there's an error, the OSD returns it as an ACK.
The objecter only sends back data on the re...
Greg Farnum
11:05 AM Bug #1351: rados bench should report errors
This is probably what caused #1734. Josh Durgin
05:03 PM Feature #1736 (Resolved): collectd: hacky script to generate types.db from perfcounter schema
... Sage Weil
04:48 PM rgw Bug #1729 (Resolved): test_object_create_bad_expect_empty
Sage Weil
03:22 PM rgw Bug #1729: test_object_create_bad_expect_empty
Yehuda thinks this was a problem with not having the right Apache package installed; I think he's right and I've seen... Greg Farnum
04:43 PM Feature #1387 (Closed): teuthology-nuke: don't fail on down nodes
Josh Durgin
04:36 PM Bug #1723 (Rejected): timeouts during ffsb
Sage Weil
04:36 PM Bug #1723: timeouts during ffsb
also didn't have the umount bug fix.
i think the osd timeouts are just sluggish server, not actual errors per se.....
Sage Weil
04:33 PM Bug #1724 (Resolved): timeout during tiobench test
this test ran commit:dfc3ddc8983fbc7c376394067335b360c68cd314, which did not include the root dentry fix in commit:77... Sage Weil
03:06 PM CephFS Bug #1728 (Resolved): multiple cfuse tests failing with non-empty directories
fixed by commit:ef5ca293a7eee6fd37c1ea8e8027a5f6d83b66da Sage Weil
02:13 PM CephFS Bug #1728: multiple cfuse tests failing with non-empty directories
My guess is the warning cleanup patch that added an error check in the readdir code, commit:cd90061239a598f6fca94326b... Sage Weil
02:41 PM Bug #1731 (Resolved): PAXOS assert(begin->last_committed == last_committed)
fixed by commit:367ab142d7bc938c5a8b40027acd2431a11c8022 Sage Weil
11:56 AM Bug #1732: osdmap assert fail during rados bench
with commit:bf91177e57a4fae54882d78aa6b2bcf1adccae5d this won't crash, but it's still not clear how we got an OSDSessi... Sage Weil
08:51 AM Bug #1732 (Can't reproduce): osdmap assert fail during rados bench
... Josh Durgin
11:39 AM Feature #1262 (Closed): teuthology: monitor health during run
Duplicate of #1240. Josh Durgin
11:06 AM Bug #1733 (Duplicate): rados bench duration can be ignored
Probably caused by #1351. Josh Durgin
09:05 AM Bug #1733: rados bench duration can be ignored
Is it generating new writes, or waiting for old writes to complete?
The time you give rados bench was never intend...
Greg Farnum
08:58 AM Bug #1733 (Duplicate): rados bench duration can be ignored
Sometimes a thrashing run with rados bench will continue indefinitely, with rados bench continuing to write after its... Josh Durgin
10:57 AM Bug #1730 (Rejected): mysterious compilation error
These were actually just warnings - the test passed. Josh Durgin
12:00 AM Revision f3c569ee (ceph): rgw: add swift task
still not completely working (for some reason it skips all the tests) Yehuda Sadeh
12:00 AM Revision 1dd607ca (ceph): rgw: add swift task
still not completely working (for some reason it skips all the tests) Yehuda Sadeh

11/16/2011

09:11 PM Revision fa4b0fb9 (ceph): osd: add pending_ops assert
Just a sanity check, hopefully helping us track down #1727.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil
07:01 PM Revision 17fa1e0d (ceph): mon: renamed get_latest* -> get_stashed*
This makes e.g. get_latest_version() vs get_last_committed() less
confusing.
Signed-off-by: Sage Weil <sage@newdream...
Sage Weil
06:57 PM Revision b9d5fbe4 (ceph): mon: fix ver tracking for auth database
Local variable keys_ver needs to be updated when we slurp up latest stashed
version.
Signed-off-by: Sage Weil <sage@...
Sage Weil
06:54 PM Revision b425f6d6 (ceph): mon: always load stashed version when version doesn't match
The slurp process can happen after the monitor has started and has some
in-memory version of the state, and that proc...
Sage Weil
06:30 PM Bug #1731 (Resolved): PAXOS assert(begin->last_committed == last_committed)
In the 11/16 nightlies, there were numerous coredumps in:
sepia72 mon.{f,l,o,r,u}.log
sepia74 mon.q.log
All ...
Anonymous
06:23 PM Bug #1730 (Rejected): mysterious compilation error
In the 11/16 nightlies, 2071 rbd_dbench a compile failed ... with some warnings.
Has this worked in the past?
20...
Anonymous
06:19 PM rgw Bug #1729 (Resolved): test_object_create_bad_expect_empty
in the 11/16 nightly, 2080 rgw_s3tests
2011-11-16T00:51:18.914 INFO:teuthology.orchestra.run.err:s3tests.functional....
Anonymous
05:59 PM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
This happened again on 11/16, 2056 kclient_workunit_kernel_untar_build
2011-11-16T00:36:30.996 INFO:teuthology.task....
Anonymous
05:51 PM CephFS Bug #1728 (Resolved): multiple cfuse tests failing with non-empty directories
All from the 11/16 nightlies:
2044 cfuse_workunit_snaps ...
2011-11-16T00:05:11.781 INFO:teuthology.task.workunit...
Anonymous
01:10 PM Bug #1727 (Resolved): osd: failed assert(pending_ops > 0) in dequeue_op
from ml:... Sage Weil
 
