Activity
From 11/21/2011 to 12/20/2011
12/20/2011
- 11:28 PM Revision 45b9659f (ceph): ReplicatedPG: don't manage waiting_on_backfill in start/finish_recovery_op
- Set waiting_on_backfill in recover_backfill and clear in do_scan.
Signed-off-by: Samuel Just <samuel.just@dreamhost....
- 07:39 PM Revision 3bea1ed4 (ceph): rgw: fix subuser key name when purging subuser keys
- 07:00 PM Revision 9ddb802c (ceph): radosgw-admin: add --purge-keys option
- 06:53 PM Revision 5e9d1019 (ceph): ReplicatedPG: apply_repop: apply local_t before op_t
- We create snap_collections in local_t and clone into them in op_t.
Signed-off-by: Samuel Just <samuel.just@dreamhost...
- 04:25 PM CephFS Bug #1737: ceph-fuse crash in xlist::remove
- Another occurrence today in teuthology:~teuthworker/archive/nightly_coverage_2011-12-20-b/4585/teuthology.log
- 11:05 AM rgw Bug #1801 (Resolved): rgw: radosgw-admin remove subuser and related swift key in a single command
- done, commit:9ddb802c72fc805ce400f9bf5cceffb88b0f3d47
radosgw-admin subuser rm --subuser=<name> --purge-keys
- 10:09 AM Bug #1848 (Resolved): osd got zeroed out fsid
- From teuthology:~teuthworker/archive/nightly_coverage_2011-12-20-a/4579/remote/ubuntu@sepia51.ceph.dreamhost.com/log/...
- 10:00 AM rgw Feature #1847 (Resolved): rgw: revisit the way we store large objects
- We should probably keep large objects in chunks, and not coalesce into a single large object. Chunks shouldn't use th...
- 08:38 AM Bug #1846: Mds crash immediately after start (segmentation fault)
- I got monmap from machine with working mds cause the other one does not have admin key. I hope that this is not a pro...
- 08:34 AM Bug #1846: Mds crash immediately after start (segmentation fault)
- can you 'ceph mon getmap -o /tmp/monmap' and attach that file to this bug?
- 03:25 AM Bug #1846: Mds crash immediately after start (segmentation fault)
- The osd on this machine crashes in the same way.
- 03:23 AM Bug #1846 (Resolved): Mds crash immediately after start (segmentation fault)
- I have two mds' in my configuration. One of them works fine and the other crashes immediately after reboot:
@2011-...
- 02:03 AM Revision 97dd28c0 (ceph): librados: return -EROFS when trying to write to a snapshot
- operate_read doesn't need this check because it does not write.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
- 02:00 AM Revision 68ba1862 (ceph): librados: make getxattrs ENOMEM return negative
- This is more consistent with the rest of librados.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
- 12:26 AM Revision a798a85f (ceph): PG: Do not update_snap_collections for log entries > last_backfill
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 12:26 AM Revision 2401176b (ceph): PG: Fix stat debug output
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 12:23 AM Revision 1362d3e1 (ceph): calc_acting: Prefer up[0] as primary if possible
- Previously, we could get into a state where although up[0] has been
fully backfilled, acting[0] could be selected as ...
- 12:01 AM Revision 01f3f6a6 (ceph): rgw: add timeout to init path
12/19/2011
- 10:57 PM Revision cc22f154 (ceph): MOSDRepScrub,ReplicatedPG: Add scrub_to to MOSDRepScrub
- When scrub_from is set, also set scrub_to to the primary's
last_update_applied (which will also be the official last_...
- 10:02 PM Revision 720bab94 (ceph): osd: EINVAL on truncate to huge object size
- Signed-off-by: Sage Weil <sage@newdream.net>
- 10:02 PM Revision ed780fdd (ceph): mds: misc assertions about truncation
- Signed-off-by: Sage Weil <sage@newdream.net>
- 10:02 PM Revision 2710bd85 (ceph): mon: update man page to document --mkfs stuff
- Signed-off-by: Sage Weil <sage@newdream.net>
- 10:00 PM Revision 29e6d6c8 (ceph): Merge pull request #6 from kylemarsh/wip-obsync-swift
- Wip obsync swift
- 09:57 PM Revision 33cb2796 (ceph): rgw: remove temp context in prepare_get_obj
- 09:57 PM Revision 5e739335 (ceph): rgw: fix xml parser internal structure leak
- 09:57 PM Revision a72348ea (ceph): rgw: fix a leak of acl structure (in req_state)
- 09:54 PM Revision 002eb581 (ceph): rgw: remove temp context in prepare_get_obj
- 09:38 PM Revision 27da89f4 (ceph): rgw: fix xml parser internal structure leak
- 09:38 PM Revision 3a8af0f7 (ceph): rgw: fix a leak of acl structure (in req_state)
- 09:25 PM Revision 42980922 (ceph): Merge branch 'wip-osd-maybe-created'
- 09:24 PM Revision 98a4809a (ceph): Merge branch 'wip-osd-fsid'
- 09:24 PM Revision 3af5fff5 (ceph): doc: fix typo
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:15 PM Revision dc977901 (ceph): osd: --get-journal-fsid
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:13 PM Revision c8c5e5d6 (ceph): filestore: make fsid uuid_d instead of uint64_t
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 04:13 PM Revision ae8fbb88 (ceph): filejournal: uuid for fsid
- Decode old header struct, but encode new class using more normal encoding
style. Embed in a bufferlist for later ext...
- 04:12 PM Revision dcceb8e8 (ceph): osd: include osd_fsid in OSDSuperblock
- Generated during mkfs.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 04:12 PM Revision a5822095 (ceph): osd: store osd_fsid as text in osd_data dir
- along with ceph_fsid (the cluster fsid) and a few other things.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 04:12 PM Revision c59eb8ca (ceph): osd: --get-osd-fsid and --get-cluster-fsid
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 04:12 PM Revision 237b19cd (ceph): osd: rename OSDSuperblock::fsid -> cluster_fsid
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 04:04 PM Revision cd909aca (ceph): doc: fix mon cluster expansion docs
- Signed-off-by: Sage Weil <sage@newdream.net>
- 04:03 PM Revision f2a95990 (ceph): mon: pull addr from ceph.conf, mon_host as needed when joining mon cluster
- Signed-off-by: Sage Weil <sage@newdream.net>
- 03:57 PM Revision d9593342 (ceph): mon: fix setting of mon addr when joining a cluster
- Signed-off-by: Sage Weil <sage@newdream.net>
- 02:02 PM rgw Bug #1844 (Resolved): radosgw memory leak
- Fixed a few leaks, as of commit:33cb27961e1b20f188d2a83a764ae3f2fabeb141. Current run with massif looks flat.
- 11:15 AM rgw Bug #1844 (Resolved): radosgw memory leak
- apparently radosgw is leaking.
- 01:38 PM Bug #1825 (Resolved): osd loses object deletes by some creates in the same transaction
- Merged to master in commit:42980922f253ed29718bfac64e17c85cdf9805a6. Still haven't written tests but I have a persona...
- 01:22 PM rgw Feature #1838 (Resolved): rgw: update man page
- 01:21 PM Bug #1845 (Rejected): "recovery_ops" performance counter isn't decreased
- Right, it should never decrement. Closing!
- 11:51 AM Bug #1845: "recovery_ops" performance counter isn't decreased
- I'm using munin, wrote plugin myself and I don't divide this value by anything. If it shouldn't decrement please clos...
- 11:47 AM Bug #1845: "recovery_ops" performance counter isn't decreased
- The counters are counting events and never decrement. Normally collectd will divide the change by time to give you s...
- 11:15 AM Bug #1845 (Rejected): "recovery_ops" performance counter isn't decreased
- I'm generating osd statistics based on performance sockets like described here - http://ceph.newdream.net/wiki/Perfom...
- 11:07 AM Bug #1758: OSD segfault in SimpleMessenger::send_message
- For the life of me I cannot seem to get useful symbols out of this, though I'm not sure why. I've been using LD_LIBRA...
- 10:49 AM Bug #1688: Benjamin: pg stuck in scrub
- Still happening, I'm looking into an instance on benjamin now.
- 10:07 AM Bug #1530: osd crash during build_inc_scrub_map
- This was the only failure in the run last night. Core at teuthology:~teuthworker/archive/nightly_coverage_2011-12-19-...
- 10:00 AM Bug #1490 (New): cfuse assert failure: assert(ob->last_commit_tid < tid)
- Happened again in teuthology:~teuthworker/archive/nightly_coverage_2011-12-15-b/4357/remote/ubuntu@sepia63.ceph.dream...
- 08:09 AM Bug #1839 (Resolved): osd: assert in send_incremental_map_msg
- this was the hobject_t::max initialization patch that wasn't in master.
- 08:08 AM Documentation #1840 (Resolved): doc: fix mon addition steps
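The Bug #1845 comments above note that these performance counters only count events and never decrement, and that a collector normally divides the change by the elapsed time to get a rate. A minimal sketch of that calculation follows, assuming the plugin keeps the previous sample itself. The function name `counter_rate`, the sample values, and the reset handling are hypothetical illustrations, not part of Ceph, munin, or collectd.

```python
def counter_rate(prev_value, prev_time, cur_value, cur_time):
    """Return events per second between two samples of an ever-increasing counter."""
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    delta = cur_value - prev_value
    if delta < 0:
        # Assumption: a counter that went backwards means the daemon
        # restarted from zero, so only the new value counts as the change.
        delta = cur_value
    return delta / elapsed


# Hypothetical (timestamp in seconds, recovery_ops value) samples.
samples = [(0.0, 100), (10.0, 150), (20.0, 150), (30.0, 180)]
rates = [
    counter_rate(v0, t0, v1, t1)
    for (t0, v0), (t1, v1) in zip(samples, samples[1:])
]
print(rates)  # [5.0, 0.0, 3.0]
```

Graphing the rate rather than the raw value makes an idle period show up as zero instead of as a flat line at the last total.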
12/17/2011
- 03:34 PM Feature #1655 (Resolved): gitbuilder aggregator page
- http://ceph.newdream.net/gitbuilder.cgi
- 06:21 AM Revision 37e7a521 (ceph): rgw: fix updating of object metadata
- being used in swift POST. We were updating wrong object
size and etag
- 06:21 AM Revision 44b4e029 (ceph): rgw: bucket cannot be recreated if already exists
- 06:15 AM Revision e5f49104 (ceph): man: Update the configuration example for radosgw
- Signed-off-by: Wido den Hollander <wido@widodh.nl>
- 06:15 AM Revision 83cf1b62 (ceph): man: It is capital -C instead of -c when for creating a new keyring
- Signed-off-by: Wido den Hollander <wido@widodh.nl>
- 06:04 AM Revision 3e323e6a (ceph): rgw: fix updating of object metadata
- being used in swift POST. We were updating wrong object
size and etag
- 02:09 AM Revision d0e90d71 (ceph): syslog checking: forgot a pipe
- 01:14 AM Revision 08f968f8 (ceph): rgw: bucket cannot be recreated if already exists
- 12:07 AM Revision f54f4aa0 (ceph): obsync: add authurl to CLI
- s3 connections require the hostname and swift connections require the
authurl. obsync treats these as equivalent int...
12/16/2011
- 10:42 PM Revision bfbde5b1 (ceph): object.h: initialize max in hobject_t(sobject_t) constructor
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 10:09 PM rgw Bug #1830 (Resolved): RGW Swift Metadata Bug
- Fixed, commit:3e323e6adbf87d794be39fd4f75c6626e8968ce1.
- 05:36 PM rgw Bug #1830: RGW Swift Metadata Bug
- Ok, was able to reproduce it. Problem is in the swift specific update metadata operation. Fix should be pretty easy.
- 08:41 PM Revision 061e7619 (ceph): ReplicatedPG: fix handle_watch_timeout ctx->at_version
- ctx->at_version should match the head of the new log entries
during issue_repop. This could cause the scrub hang bug...
- 07:43 PM Revision 5274e88d (ceph): ReplicatedPG: add asserts to catch scrub error
- If last_update_applied skipped over last_update, we would see
scrub hang.
Signed-off-by: Samuel Just <samuel.just@dr...
- 06:39 PM Revision 3f3913c9 (ceph): doc: fix filename in mon addition process
- Signed-off-by: Sage Weil <sage@newdream.net>
- 05:28 PM rgw Bug #1843 (Resolved): rgw: recreation of bucket overrides old one
- Fixed with commit 08f968f8cd74a2e782257eea91a97b52598ef6f1.
- 05:08 PM rgw Bug #1843 (Resolved): rgw: recreation of bucket overrides old one
- Instead of returning success and not doing anything, we actually create a new bucket and override the old one. This i...
- 05:19 PM Revision 7d81a3b5 (ceph): filejournal: preallocate journal bytes on create
- This should reduce fragmentation for large journals that are written
slowly the first time around.
Signed-off-by: Sa...
- 05:08 PM Revision 92cb2a20 (ceph): Merge pull request #5 from homac/master
- Minor fix for init files and cleaned up spec file. Please pull
- 03:20 PM Bug #1841 (In Progress): OSDs should disconnect from Monitor before their MOSDPGStat timeouts happen
- Yep; it is easy enough to add a check in tick based on how long it's been since we sent a PGStat without getting an a...
- 01:28 PM Bug #1841: OSDs should disconnect from Monitor before their MOSDPGStat timeouts happen
- My memory is a bit fuzzy, but I think they're waiting on acks for the MOSDPGStat messages they're sending.. checking ...
- 11:23 AM Bug #1841 (Resolved): OSDs should disconnect from Monitor before their MOSDPGStat timeouts happen
- Right now OSDs don't notice their monitor connection has dropped until after the (by default) 15 minute TCP connectio...
- 03:18 PM Bug #1842 (Can't reproduce): osd: failed authorizations leak memory somehow?
- I've got a log from todin showing lots of "fault initiating reconnect" that I suspect are on failed auths. Log in kai...
- 01:03 PM rgw Feature #1838: rgw: update man page
- I just submitted a patch on the ml with an updated version of the manpage. This works in my setup.
- 10:10 AM rgw Feature #1838 (Resolved): rgw: update man page
- use current alexandria as a model, probably. minus the now-unneeded setenv stuff.
- 10:51 AM Documentation #1840 (Resolved): doc: fix mon addition steps
- --public-addr
make sure port is correct, too
- 10:37 AM Bug #1839 (Resolved): osd: assert in send_incremental_map_msg
- ...
- 09:59 AM Linux kernel client Feature #1837 (New): krbd: freeze filesystem on snapshot
- The block device can ask for an fs freeze (dm currently does this). We can do this with rbd when we see that the rbd...
- 09:22 AM Feature #1836 (Resolved): filejournal: use async directio to write to the journal
- Currently we're doing a sync direct io write, which means we pay a full rotation between each io.
- 08:44 AM Bug #1833 (Resolved): mon: failed decode in LogMonitor::update_from_paxos
- Yeah, this is one of the things I hit (and fixed) in a few different ways when doing the mon thrashing on the new code.
- 06:33 AM Bug #1835: Monclient crash when keyring is not readable
- Btw, I know I can use the built-in 'secret' functions of libvirt, but I didn't modify my XML's yet.
- 06:32 AM Bug #1835 (Resolved): Monclient crash when keyring is not readable
- I had some issues with my Qemu-RBD VM's to get them online, I saw Qemu segfault and started tracing this back with GD...
- 05:18 AM Bug #1834 (Closed): 'High' memory usage of monitors
- Actually, I seem to be wrong here. My other monitor running on a 4GB box is using about 240MB of memory, I did a smal...
- 04:43 AM Bug #1834 (Closed): 'High' memory usage of monitors
- I'm still hunting this one, but I'm seeing high memory usage of my monitors (three in total).
My monitor configura...
- 04:34 AM phprados Feature #424: Stream wrappers
- It took some time to find docs about this, but I'm currently on track.
12/15/2011
- 10:03 PM Revision 739fd9fe (ceph): man: clarify mount.ceph auth options
- Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
- 09:49 PM Revision e5a5ae12 (ceph): man: update rule definition for ceph-rbdnamer
- This is the rule we install since 891025e539a92b5d75011e2e75c475fc0c272042.
Signed-off-by: Josh Durgin <josh.durgin@...
- 09:43 PM Revision 4eb83654 (ceph): authx -> cephx everywhere it's used
- The term authx was in the mount.ceph man page, and got accidentally
copied into rbd help.
Signed-off-by: Josh Durgin...
- 09:24 PM Revision 7eec3094 (ceph): rountrip: add task
- 09:15 PM Revision 41f64be0 (ceph): ReplicatedPG: calc_clone_subsets fix other clone_overlap case
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 09:15 PM Revision b5c32590 (ceph): ReplicatedPG: fix backfill mismatch error output
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 09:15 PM Revision 5b41c470 (ceph): OSD: use disk_tp.pause() without osd_lock
- Previously, we called disk_tp.pause_new(). This can cause a race
where snap_trimmer queues more transactions after w...
- 08:39 PM Revision 97cc6c29 (ceph): readwrite: fix task with default conf
- 04:51 PM Revision ec776f4b (ceph): ceph.spec: Clean up and fix spec file and build for a couple of distrib...
- Clean up and fix the spec file. This includes cleaning up of build and
installed system dependencies, LSB compliance ...
- 04:49 PM Revision 0e0583f8 (ceph): init-ceph/init-radosgw: Don't use unspecified runlevel 4
- Don't use runlevel 4 in init scripts. AFAIK, no distribution is using it
and at least the Open Build Service complain...
- 02:32 PM Bug #1833 (Resolved): mon: failed decode in LogMonitor::update_from_paxos
- Saw this on benjamin today. It was during catchup; mon.beta had been out for a day or more and was catching up. Perha...
- 03:08 AM Revision 0c547046 (ceph): osd: preserve write order when waiting on src_oids
- We need to preserve the order of write operations on each object. If we
have a write on X that needs to read from Y,...
- 03:08 AM Revision ca2e8e5a (ceph): osd: EINVAL on mismatched locator without waiting for degraded
- No reason to recover before returning an error.
Signed-off-by: Sage Weil <sage@newdream.net>
- 03:08 AM Revision 7a7aab25 (ceph): osd: wait for src_oid if it on other side of last_backfill from oid
- If the target object is before last_backfill, then the backfill_target
will be asked to apply the operation. If one ...
- 01:43 AM Revision da286059 (ceph): client: fix logger deregistration
- Only unregister logger if it is non-NULL (and thus registered) to avoid
running afoul of the cct assertions.
Signed-...
- 01:14 AM Revision 659e66aa (ceph): readwrite: fix conf, task runs
- 12:12 AM Revision 7d085ad9 (ceph): readwrite: add readwrite task
- still not really running, but at least getting configured
12/14/2011
- 11:51 PM Revision 62c830f0 (ceph): ReplicatedPG: add_object_context_to_pg_stat, obc->ssc may be null
- obc->ssc is not necessarily filled in by get_object_context.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 11:37 PM Revision 5a400935 (ceph): obsync: add vvprint back in
- Commit ebe5fc60d20f92a0037c53c1e7bd7ae512be3da4 removed the definition of
vvprint without removing all the places tha...
- 11:19 PM Revision cda5f0d3 (ceph): PG: clear waiting_on_backfill during clear_recovery_state
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 11:17 PM Revision d32fd8c5 (ceph): ReplicatedPG: list snapid 0 on collection_list_partial for backfill
- 0 will list all objects, CEPH_NO_SNAP will list only head objects.
Signed-off-by: Samuel Just <samuel.just@dreamhost...
- 10:10 PM Bug #1831: mon: should not accept (and should disconnect) session when not in quorum
- There's two things here, the second being the monitor changes you're focusing on. I need to investigate further why t...
- 07:03 PM Bug #1831: mon: should not accept (and should disconnect) session when not in quorum
- I think there are two parts here:
- the mon shouldn't let sessions start if it is not in the quorum. that may ac...
- 03:39 PM Bug #1831 (Resolved): mon: should not accept (and should disconnect) session when not in quorum
- This happened on Benjamin. The OSDs ought to be failing the connection and going to a new monitor, but they failed to...
- 07:40 PM Revision d9d05117 (ceph): Merge remote branch 'upstream/master' into wip_backfill_merged
- 07:39 PM Revision 07b3ba81 (ceph): ReplicatedPG: collection_list_partial also takes a snapid
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 07:38 PM Revision 1430c8ab (ceph): doc: Make overview.rst valid reStructuredText, so I can stop seeing war...
- It's still wrong, but now it won't clutter the output.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
- 07:33 PM Revision 53f7323c (ceph): doc: reStructuredText syntax fix.
- Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
- 07:33 PM Revision c1190740 (ceph): pybind: Add a description to docstring.
- This avoids a Sphinx warning like this:
.../src/pybind/rbd.py:docstring of rbd.RBD.version:2: WARNING: Field list en...
- 07:32 PM Revision 9d633a4f (ceph): PG: A backfill osd can have last_complete < log_tail
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 07:32 PM Revision 51deeef6 (ceph): ReplicatedPG: calc_*_subsets must consider last_backfill
- Objects yet to be backfilled do not show up in the missing set. Thus,
we cannot use an object past last_backfill to ...
- 07:32 PM Revision 7832e17e (ceph): PG: activate, backfill replica can have last_complete < log_tail
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 07:32 PM Revision b9eea709 (ceph): osd: object_stat_sum_t::clear()
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:32 PM Revision 940a55e0 (ceph): osd: track backfill target pg stats
- Maintain backfill target pg stats to be the summation over objects to
the left of last_backfill. Reflect this in the...
- 07:32 PM Revision 7213c457 (ceph): PG: Ask for digest at most once at a time
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 07:32 PM Revision 9bb77b49 (ceph): osd: observe last_backfill in merge_log() and helpers
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:32 PM Revision e1006d76 (ceph): osd: more backfill changes
- Always ship log for updates to backfill targets to preserve the repgather
ordering.
Fix up recover_backfill() bounds...
- 07:32 PM Revision af7536d0 (ceph): hobject_t: fix hobject(sobject_t) constructor
- Initialize max
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:32 PM Revision cd0c8fb3 (ceph): osd: add incomplete, backfill states; simplify calculation
- Set/clear states in peering state machine state ctor/dtors where possible.
Set degraded if the number of non-backfil...
- 07:32 PM Revision f83a787e (ceph): osd: some recover_backfill() comments
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:32 PM Revision f1caaa37 (ceph): osd: fix calc_acting()
- Look at usable, not want.size(), so we don't count backfill targets.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:32 PM Revision 57baf9ef (ceph): osd: fix signed/unsigned comp
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:32 PM Revision 71893b0e (ceph): osd: remove bad !is_incomplete() assert
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:32 PM Revision 999846f7 (ceph): PG: fix phantom entry in peer_info
- In GetLog, do not call pg->peer_info[newest_update_osd] if
newest_update_osd is osd->whoami.
Signed-off-by: Samuel J...
- 07:32 PM Revision f483df15 (ceph): PG: there may now be backfill entries in the acting set
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 07:32 PM Revision f1ae9ed5 (ceph): objectstore: make list by hash *next > instead of >=
- This means we should set it to a hash boundary or the last item of our
result set (not the next item we didn't includ...
- 07:31 PM Revision f7a0b9c5 (ceph): hobject_t: fix sorting by hash key
- Use get_effective_key() to return key (if explicit) or object name. Sort
by that within each hash value.
Clean up o...
- 07:31 PM Revision 9288f0e0 (ceph): osd: advance last_backfill by keys only
- This ensures that transactions are never split by last_backfill.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 88ee86d0 (ceph): osd: keep backfill targets in acting set
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision b99e1358 (ceph): osd: make backfill (basically) work again
- Still need to handle concurrent updates, log recovery vs backfill, etc.
Signed-off-by: Sage Weil <sage.weil@dreamhos...
- 07:31 PM Revision de19a6bb (ceph): Revert "osd: don't keep push state on replicas"
- This reverts commit 69c77e33f8530993dbc280525bd21218ea6f9ddb.
sub_op_pull() calls send_push_op directly, does not pa...
- 07:31 PM Revision baa21c9b (ceph): osd: implement PG::copy_range()
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision c03c49ca (ceph): osd: initialize repop gather set in issue_repop instead of new_repop
- Simpler. It will also make the last_backfill correction live in one
place.
Signed-off-by: Sage Weil <sage.weil@drea...
- 07:31 PM Revision 5b558dc4 (ceph): osd: strip out some backlog logic
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 82a23dbe (ceph): osd: strip backlog case out of merge_log
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 3f5ced69 (ceph): osd: kill backlog_requested
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 6d299552 (ceph): osd: strip backlog logic out of PG::activate()
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision e7514f75 (ceph): osd: state machine whitespace
- I feel better now
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 257b85d8 (ceph): osd: remove log_backlog from PG::Info
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 7521c51a (ceph): osd: remove backlog case from clean_up_local
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 9ceecc89 (ceph): osd: kill PG::Info::backlog
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision d7f7bbdc (ceph): osd: remove recovery-from-backlog kludge last_update
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 722ec7e5 (ceph): osd: kill unused PG_STATE_SCANNING
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision d84a9f6f (ceph): osd: cleanup
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 693950bf (ceph): osd: cleanup lingering backlog refs
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision e63c595a (ceph): osd: kill unused PG::Log::copy_after_unless_divergent
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision b5de19b5 (ceph): osd: kill unused PG::trim_write_ahead
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 0e7f4aff (ceph): osd: pg whitespace
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 400c27da (ceph): osd: track backfill with last_backfill, not interval_set<>
- We always fill from the bottom up anyway. Using an hobject_t also gives us
a precise bound. It also makes things co...
- 07:31 PM Revision 91ee3375 (ceph): osd: osd_kill_backfill_at
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 99c614fa (ceph): osd: don't keep push state on replicas
- Primaries need this, but replicas don't: the primary will explicitly pull
the pieces of the object that it wants.
Si...
- 07:31 PM Revision 2cdc6b4e (ceph): osd: rewrite choose_acting process
- Consolidate callers, eliminate obsolete backlog ones.
New process:
- pick best log, with preferences for those that...
- 07:31 PM Revision 9e51c639 (ceph): osd: MOSDPGScan
- Message to query hash ranges of a PG.
Signed-off-by: Sage Weil <sage@newdream.net>
- 07:31 PM Revision 8f14a358 (ceph): osd: add PG::BackfillInterval type
- Describe a range of objects for the purposes of backfilling a PG.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 55c24813 (ceph): osd: implement ReplicatedPG::_lookup_object_context
- Look up an existing ObjectContext without taking a reference.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 92d290d6 (ceph): osd: implement ReplicatedPG::scan_range
- Scan a range of the local collection.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 17b5d5c3 (ceph): osd: implement do_scan
- Handle MOSDPGScan messages to request or send a digest of a range of
objects in a collection, sorted in hobject_t (ha...
- 07:31 PM Revision 353195d6 (ceph): types: operator<< for multimaps
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision e4ab0e3b (ceph): osd: add MOSDPGBackfill message
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 910398fe (ceph): osd: recover discontiguous peers using backfill instead of backlog
- Instead of generating a huge list of objects to recover, and then pushing
them, iterate over the collection and copy ...
- 07:31 PM Revision 4509e619 (ceph): test_backfill.sh
- 07:31 PM Revision 004e7c92 (ceph): osd: add Incomplete peering state
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 73d15e01 (ceph): osd: do not read backlog off disk
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision b0664856 (ceph): osd: remove backlog generation code
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 6e9d135a (ceph): osd: simplify replica queries for finding divergent objects
- No need to request backlog here, clearly, since those don't exist anymore.
Signed-off-by: Sage Weil <sage.weil@dream...
- 07:31 PM Revision b8ee27a3 (ceph): osd: remove Query::BACKLOG processing
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 78b64473 (ceph): osd: kill PG::Log::copy_non_backlog
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:31 PM Revision 10e481d1 (ceph): osd: fix push_to_replica typo
- We are always pushing soid. If we are missing snapdir locally, that means
we can't do an informed efficient clone, a...
- 07:19 PM Revision b7a5a6a6 (ceph): doc: More consistency on formatting placeholder names.
- Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
- 07:19 PM Revision 196d4273 (ceph): doc: Link to manpage when command is mentioned.
- Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
- 07:19 PM Revision 75fd16a5 (ceph): doc: Use todo directive, rescue list of missing commands from wiki.
- Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
- 07:19 PM Revision 81feae12 (ceph): doc: Add misc explanations of Ceph internals from email.
- Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
- 07:19 PM Revision 034dd58f (ceph): doc: Add more missing commands to control.
- This is too unstructured, that will have to be fixed later.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost....
- 07:19 PM Revision f5cfdbb7 (ceph): doc: Split intro to talk about the DFS separately. Mention petabytes.
- Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
- 07:19 PM Revision bc16ac3b (ceph): doc: Fix sentence that ended too abruptly.
- Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
- 07:19 PM Revision d745ff8d (ceph): doc: "ceph -w" clarification.
- Stop saying "watch cluster state" so many times.
Don't say stdout, that's the assumption.
Don't call showing things...
- 07:14 PM Revision 18d99637 (ceph): Merge branch 'wip-messenger'
- 07:11 PM Revision 55639dcd (ceph): msgr: unset did_bind in stop().
- We use did_bind as a flag on whether or not to stop the Accepter thread
and we should clear it when we do the stoppin...
- 06:59 PM Revision 41049f30 (ceph): objecter: fix use-after-free
- messenger consumes the m reference. Yay valgrind.
Signed-off-by: Sage Weil <sage@newdream.net>
- 06:51 PM Revision 041d0456 (ceph): client: move PerfCounter into Client
- globals are evil.
Fixes: #1826
Signed-off-by: Sage Weil <sage@newdream.net>
- 06:50 PM Revision e8e1e5df (ceph): swift: auth response returns X-Auth-Token instead of X-Storage-Token
- 05:31 PM Revision c9d0e556 (ceph): osd: fix build_incremental_map_msg
- We keep both the inc and the full for our oldest osdmap.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 05:27 PM Revision 1a473b7a (ceph): osd: clean up _delete_head
- Might be fixing a subtle logic bug, but old flow was confusing, so not
sure. :)
Signed-off-by: Sage Weil <sage@newd...
- 05:26 PM Revision 6c8f60f6 (ceph): osd: simplify creation logic in do_osd_ops
- Drop the maybe_created variable, and track exists over the course of the
transaction.
Fixes: #1825
Signed-off-by: Sa...
- 05:16 PM Bug #1832 (Closed): osd: size tracking discrepancy (scrub stat mismatch)
- During fsstress on the kernel client, this occurred:...
- 01:53 PM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
- Hi,
I've run into this precise problem on a small testing cluster that I'm running -- down to the large 64-bit tru...
- 11:55 AM rgw Bug #1830 (Resolved): RGW Swift Metadata Bug
- I believe the rados gateway has a bug in the way it's talking swift. When I ask it to list the objects in a container...
- 11:44 AM Feature #1782 (Resolved): mon: dump key cluster stats via perfcounter
- 11:32 AM CephFS Bug #1788: msgr file descriptor leak
- Forgot to update this. Haven't run into it yet and wip-messenger seemed to have fixed things. Thanks Greg!
- 11:27 AM CephFS Bug #1788 (Resolved): msgr file descriptor leak
- Haven't heard any new issues from Noah; merged to master in commit:18d996370efc2fc32d4973e9e6934901558bcbaf.
- 11:26 AM Messengers Bug #1829 (Resolved): SimpleMessenger tries to shut down threads that aren't running
- Oh, even simpler than I expected. Fixed in commit:55639dcd87fe985059355afe5fab787e4d139b11 (compile tested).
- 11:12 AM Messengers Bug #1829 (Resolved): SimpleMessenger tries to shut down threads that aren't running
- Saw this on benjamin yesterday. Looks like the OSD repeatedly restarted its messengers and was eventually unable to r...
- 11:01 AM CephFS Cleanup #1826 (Resolved): client: kill static perfcounter
- commit:041d04563e7cfdb837a345787a1569b07a064307
- 10:54 AM rgw Bug #1780 (Resolved): swift: auth response should return X-Auth-Token instead of X-Storage-Token
- Fixed, commit:e8e1e5dffbd25e2124331e607264e1bc4120676c.
- 10:12 AM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
- This happened again on sepia70 during the kernel untar build workunit on rbd.
- 09:40 AM Bug #1804 (Need More Info): filestore: unexpected EINVAL
- 09:39 AM Bug #1828 (Resolved): osd: preserve write order when ops wait for recovery of src_oids
- This affects current code.
It will need a minor adjustment so that "recovery" includes both is_missing() and osd >... - 09:33 AM CephFS Bug #1549 (Need More Info): mds: zeroed root CDir* vtable in scatter_writebehind_finish
- 09:32 AM Bug #1530: osd crash during build_inc_scrub_map
- fixed that last thing with commit:c9d0e556c7ad294819c60ca4e3cd4d0191811f18, but i think it's unrelated to the rest of...
- 09:22 AM Bug #1825: osd loses object deletes by some creates in the same transaction
- Fix looks good; I'm working on tests to verify and check regressions.
- 02:08 AM Revision abecbc59 (ceph): OSDMonitor: remove useless check
- Session was already verified to exist before this.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> - 12:31 AM Revision 5804477b (ceph): qa: trivial_libceph test
- This currently fails... see #1827
Signed-off-by: Sage Weil <sage@newdream.net> - 12:29 AM Revision c87f31e0 (ceph): client: return errors from init
- Signed-off-by: Sage Weil <sage@newdream.net>
- 12:29 AM Revision 2f281d1f (ceph): libceph: catch errors from Client::init()
- And clean up error paths.
Signed-off-by: Sage Weil <sage@newdream.net> - 12:29 AM Revision 207c40b0 (ceph): libceph: add missing #includes
- Signed-off-by: Sage Weil <sage@newdream.net>
- 12:16 AM Revision 31b5ccbf (ceph): coverage: use locally stored build instead of downloading from a gitbui...
12/13/2011
- 05:31 PM CephFS Bug #1827: libceph: hang on creating a file
- see commit:5804477b20f89a2b02218b518a44e73073b393c9 for reproducer.
fwiw i ran with vstart and 'LD_PRELOAD=../../s... - 04:36 PM CephFS Bug #1827 (Resolved): libceph: hang on creating a file
- Using trivial thinger from Noah.
- 05:15 PM Revision 6b425676 (ceph): objectstore: implement Transaction::dump()
- Signed-off-by: Sage Weil <sage@newdream.net>
- 05:15 PM Revision 7133a2fa (ceph): filestore: dump transaction to log if we hit an error
- This will let us see which operation in the transaction failed.
Signed-off-by: Sage Weil <sage@newdream.net> - 05:05 PM Revision 3d13f003 (ceph): objectstore: create Transaction::iterator class
- Remove iterator state from Transaction itself.
Signed-off-by: Sage Weil <sage@newdream.net> - 04:32 PM CephFS Cleanup #1826 (Resolved): client: kill static perfcounter
- Make it a Client member. The CephContext stuff tracks "per-process" state now, so no need to be weird. Also, these ...
- 04:28 PM Revision 4da96ff3 (ceph): rados load-gen workunits
- 04:19 PM Revision 6ff95e9d (ceph): qa: rados load-gen workunits
- Signed-off-by: Sage Weil <sage@newdream.net>
- 03:10 PM Bug #1825: osd loses object deletes by some creates in the same transaction
- see wip-osd-maybe-created
- 02:11 PM Bug #1825 (Resolved): osd loses object deletes by some creates in the same transaction
- We found a missing object in alexandria, caused by the gateway trying to delete an object that seems to not actually ...
- 11:07 AM rgw Tasks #1823: radosgw should have internal timeouts
- I think I wasn't clear enough. RGW doesn't need to do that in the I/O path. Anyway, we need to think of the functiona...
- 10:55 AM rgw Tasks #1823: radosgw should have internal timeouts
- RGW ought to be able to grab information about IOs which are taking too long and figure out what OSD that IO resides ...
- 10:52 AM rgw Tasks #1823: radosgw should have internal timeouts
- We can have timeouts for the init process; for other operations I'm not sure it'll make sense doing it in the rgw laye...
- 10:44 AM rgw Tasks #1823 (Rejected): radosgw should have internal timeouts
- Letting Apache time out the rados gateway makes admins sad, since there's no visibility into what is actually timing ...
- 10:53 AM rgw Tasks #1824 (Resolved): ceph monitor status should be available and documented
- I saw last night that I think we can run "ceph quorum_status" to see which monitors are in the quorum, "ceph mon_stat...
- 10:49 AM Bug #1821: librados: rados_create_with_context is unusable
- Josh Durgin wrote:
> The C++ variant librados::Rados::init_with_context is used by librbd, radosgw, and some command... - 10:44 AM Bug #1821: librados: rados_create_with_context is unusable
- The C++ variant librados::Rados::init_with_context is used by librbd, radosgw, and some command line tools, but this ...
- 10:49 AM Bug #1820: deprecate "ceph stop"
- It's not being run because getting the parsing and isolating leaks is a pain, but there are teuthology tasks to run v...
- 10:28 AM Bug #1820: deprecate "ceph stop"
- none of this is tested anywhere.. it's for when you manually want to check for leaks, and need the osd to try to shut...
- 10:08 AM Bug #1820: deprecate "ceph stop"
- I don't see anything in teuthology sending stop commands to the OSDs; I believe the valgrind stuff just uses SIGTERM.
- 09:59 AM Bug #1820: deprecate "ceph stop"
- exit(0) on SIGTERM is perfectly valid.
If we do need more than SIGUSR1 & SIGUSR2, the communication mechanism shou... - 09:38 AM Bug #1820: deprecate "ceph stop"
- ...
- 09:31 AM Bug #1820: deprecate "ceph stop"
- gcov is already using SIGTERM.
- 10:33 AM Bug #1530: osd crash during build_inc_scrub_map
- I'm guessing this is the new incarnation of this issue?
From teuthology:~teuthworker/archive/nightly_coverage_2011-1... - 10:31 AM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
- Happened again in teuthology:~teuthworker/archive/nightly_coverage_2011-12-13-a/4183/remote/ubuntu@sepia74.ceph.dream...
- 10:12 AM rgw Bug #1822 (Closed): radosgw can be slow to respond to requests
- The DHO admins are having problems where sometimes requests take so long that Apache issues an ISE 500. It's often bu...
- 09:48 AM Bug #1789 (Need More Info): mon: failed assert(paxosv == pg_map.version)
- have core, but no matching binary. not clear from code inspection what happened.
- 09:30 AM Bug #1804: filestore: unexpected EINVAL
- as of commit:7133a2faf0ae0710b7cbd9801c64767172d48faf we dump the failed transaction to the log.
- 08:28 AM Feature #1799 (Resolved): qa: add 'rados --load-gen' test(s)
- 12:29 AM Revision c9e4504f (ceph): Ignore lockdep being turned off for now.
- Some machines are hitting this udev issue:
http://marc.info/?l=linux-kernel&m=132033587908426&w=2 and lockdep is
turn... - 12:00 AM Revision 6d5e5bdb (ceph): pybind/rados: add asynchronous write,append,read,write_full operations
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
12/12/2011
- 10:31 PM Revision 78b7a255 (ceph): doc: Import the list of ceph subcommands from wiki.
- This adds the content of the wiki page at
http://ceph.newdream.net/wiki/Monitor_commands
to doc/control.rst in orde... - 10:31 PM Revision 9aadd41b (ceph): doc: Add documentation of missing osd commands.
- The set of OSD commands which was added by the previous commit is
incomplete. This patch adds documentation for the follo... - 10:31 PM Revision 1867a745 (ceph): doc: Document pause and unpause osd commands.
- These two commands were undocumented so far. This patch adds a short
description.
Signed-Off-By: Andre Noll <maan@sy... - 10:31 PM Revision 7dce3e6f (ceph): doc: Update the list of fields for the pool set command.
- This list was lacking a few fields: crash_replay_interval, pg_num,
pgp_num and crush_ruleset. Include these fields an... - 10:31 PM Revision db30716b (ceph): doc: Add missing documentation for osd pool get.
- "osd pool set" was already documented, but the corresponding "get"
command was not. This patch adds the list of valid... - 10:31 PM Revision fb8fd186 (ceph): doc: Clarify documentation of reweight command.
- This caused some discussions on the mailing list, so let's try to be clear
about the meaning of an OSD weight.
Signe... - 09:35 PM Bug #1821: librados: rados_create_with_context is unusable
- i think radosgw uses it. it creates a CephContext by linking directly the ceph internals...
- 05:12 PM Bug #1821 (Resolved): librados: rados_create_with_context is unusable
- There's no way to get a CephContext using the C api, so you can't pass one to rados_create_with_context. Maybe a rado...
- 09:24 PM Revision 06046470 (ceph): SimpleMessenger: remove void send_keepalive.
- Nobody uses this; they all call the version that returns an int.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhos... - 09:24 PM Revision e6e66232 (ceph): mds: mark_disposable when closing a Client connection.
- This is causing issues since the Client's ack of the MClientSession
is somehow not getting back to the MDS. We should... - 09:24 PM Revision 1dd173a2 (ceph): messenger: fix up fault()'s "onconnect" parameter.
- We should be setting this true when calling fault() from connect().
And rename it in the header -- it does produce le... - 07:25 PM Bug #1820: deprecate "ceph stop"
- Iirc the real purpose is to make the daemon shut down cleanly. This is important for gprof, valgrind memcheck, etc. ...
- 02:38 PM Bug #1820 (Resolved): deprecate "ceph stop"
- A good daemon supervision system would try to restart any daemons that just exited. For "ceph stop" to work in the wo...
- 05:29 PM Revision 5e215c7e (ceph): Merge branch 'wip-mon-stats'
- 05:27 PM Revision 808a851d (ceph): mdsmap: rename get_num_*_mds() methods
- Signed-off-by: Sage Weil <sage@newdream.net>
- 05:27 PM Revision 711447d8 (ceph): mon: add mds, mon info to cluster_logger
- Signed-off-by: Sage Weil <sage@newdream.net>
- 05:24 PM Revision ac31d526 (ceph): mon: report basic cluster stats via perfcounters
- These are basic point-in-time cluster stats.
Signed-off-by: Sage Weil <sage@newdream.net> - 05:22 PM Revision 1f1b5fdf (ceph): crush: drop unused label
- Signed-off-by: Sage Weil <sage@newdream.net>
- 05:20 PM Revision 62b78de7 (ceph): Merge remote branch 'gh/stable'
- 05:18 PM Revision 495307a1 (ceph): crush: fix force to behave with non-root TAKE
- If the (first) TAKE in the crush rule is not the root, see if they picked
a point somewhere beneath the appropriate p... - 05:17 PM Revision 14f8f00e (ceph): crush: simplify force argument check
- force isn't used past this point, only force_pos. Collapse the if
conditions.
Signed-off-by: Sage Weil <sage@newdre... - 04:45 PM Messengers Bug #1803: msgr: behave better when ending TCP connections
- And I've flipped back and forth umpteen times today about what's going on. At this point I can conclude that nobody o...
- 10:49 AM Messengers Bug #1803 (In Progress): msgr: behave better when ending TCP connections
- From the little I'm reading in Unix Network Programming, it looks like we're just doing this wrong — we call shutdown...
- 11:21 AM Documentation #1819 (Resolved): document librados python api
- 11:21 AM rbd Documentation #1818 (Closed): document librbd C++ api
- 11:20 AM Documentation #1817 (Closed): document librados C++ api
- 11:20 AM rbd Documentation #1816 (Closed): document librbd C api
- Use similar examples to the python api docs.
- 11:19 AM Documentation #1815 (Resolved): document librados C api
- Document the librados C api with doxygen.
- 10:00 AM Documentation #1814 (Resolved): doc: openstack + ceph install howto
- 09:58 AM rgw Documentation #1813 (Resolved): doc: document radosgw api diffs with s3
- move from google docs or wherever. clean up. maintain going forward.
- 09:50 AM Bug #1683 (In Progress): librados: list objects should also return locator key
- 09:48 AM Bug #1744: teuthology: race with daemon shutdown?
- any additional teuthology logging we can add to sort out what is happening?
- 09:47 AM RADOS Bug #1794 (Resolved): crush: creating/destroying buckets of zero items
- fixed by commit:ca002a3389877f5e150659649e27e7ae59d7d402
- 09:45 AM Feature #1782: mon: dump key cluster stats via perfcounter
- 08:53 AM Bug #1758: OSD segfault in SimpleMessenger::send_message
- Verify that last failure was running a commit that included the fix?
- 08:38 AM Linux kernel client Bug #1812 (Resolved): iput scheduling while atomic
- iput can sleep, but is called with spinlocks held in some cases....
- 08:34 AM Bug #1750 (In Progress): xattr errors silently ignored, cause trouble later
- 08:31 AM Bug #1750: xattr errors silently ignored, cause trouble later
- Shouldn't the FileStore have asserted on the -28?
- 03:19 AM Linux kernel client Bug #1795: break d_lock > s_cap_lock ordering
- Seems fixed here now with git branch wip-d-lock.
- 03:18 AM Linux kernel client Bug #1762: i_lock vs s_cap_lock vs inodes_with_caps_lock lock ordering
- Seems to be fixed here now with git commits be655596b3de5873f994ddbe205751a5ffb4de39 (for-linus) and 1a2fe05d296a35da...
12/10/2011
- 12:31 AM Revision cf279a8b (ceph): workunits: print tests pjd runs
- This will tell us which ones actually failed within a test suite.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost....
12/09/2011
- 11:23 PM Revision 8064440d (ceph): Merge branch 'wip_pgls'
- 11:22 PM Revision 864847b2 (ceph): pybind: add object locator support to pybind pool listing
- list_objects returns Object(). Object therefore now has an optional
locator_key parameter which will set up the obje... - 09:44 PM Revision 111c12ce (ceph): ReplicatedPG: collection_list_handle_t is now an hobject_t
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 09:44 PM Revision 4ce7dd48 (ceph): rados.cc: add --object-locator and object locator output to ls
- --object-locator locator causes io to use the specified locator. For
objects with non-empty locators, rados pool ls ... - 09:44 PM Revision 798ef38b (ceph): osd: delay pg list on a snapid until missing is empty
- We cannot determine from the missing set whether an object existed
at a given snap.
Signed-off-by: Samuel Just <samu... - 04:53 PM CephFS Bug #1811 (Duplicate): 2 pjd chown tests failed on cfuse
- From teuthology:~teuthworker/archive/nightly_coverage_2011-12-09-a/4061/teuthology.log:...
- 04:32 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
- A disk error prevented me from getting logs before:...
- 03:42 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
- Got the same trace on sepia18 while running mkfs.ext3 on an rbd image.
- 03:18 PM Bug #1758 (New): OSD segfault in SimpleMessenger::send_message
- This happened again yesterday. Core is in teuthology:~teuthworker/archive/nightly_coverage_2011-12-08-a/3954/remote/u...
- 11:18 AM Messengers Bug #1803: msgr: behave better when ending TCP connections
- I'm going to see if I can handle this in userspace today — fixing it in the kernel client will be another ticket.
- 11:14 AM Feature #1810 (Resolved): monclient: timeouts?
- It's been suggested that maybe certain categories of clients which are used for gathering statistics rather than comm...
- 11:13 AM Messengers Feature #1809 (New): msgr: limit simultaneous connections
- Right now SimpleMessenger has no mechanism for limiting the number of simultaneous connections it holds open. This is...
- 11:10 AM Feature #1808 (Rejected): filestore: gracefully handle EMFILE
- If the FileStore gets an EMFILE error it asserts out without attempting to handle the problem. I don't know whether t...
- 09:34 AM Revision e2a94505 (ceph): obsync: add swift support to obsync
- A single "url" doesn't make sense for a swift object store the way it does
for an S3 store or local file, so this com... - 07:15 AM Bug #1797: configure doesn't link to pthread on Fedora 14 on linking librados-config
- I just found out it works when you call configure with
LIBS="-lpthread" ./configure
Still a bug, though, the c... - 02:01 AM Revision d21f4abc (ceph): msgr: turn up socket debug printouts
- These shouldn't be too common and will help in debugging
socket leaks.
Signed-off-by: Greg Farnum <gregory.farnum@dr... - 01:47 AM Revision a768ad73 (ceph): coverage: don't generate html reports for each test
- These can always be generated from the lcov files later; right now they just waste space.
- 01:17 AM Revision 7b52dd14 (ceph): syslog: ignore 'task blocked' warnings
- These will happen under heavy load (usually on the osd).
- 12:36 AM Revision 891025e5 (ceph): udev: drop device number from name
- The device number depends on how many rbd images have been
mapped. Removing it makes the name determined solely by th...
12/08/2011
- 11:35 PM Revision 6b8588b7 (ceph): Use btrfs for regression tests
- Some of the tests (particularly the s3 tests) use very long filenames
which trigger bugs related to ext4 xattr handli... - 09:10 PM Revision a5606ca4 (ceph): pybind: trivial fix of missing argument
- Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com>
- 06:40 PM Bug #1805 (Rejected): OSD: fd leak
- I was trying to figure out why the OSD was generating ~600 new sessions in the 4.5 seconds after starting up, when I ...
- 06:20 PM Bug #1805 (Need More Info): OSD: fd leak
- *sigh* It appears that I didn't manage to gather the correlated data that I thought I did. After an audit of who uses...
- 02:10 PM Bug #1805 (Rejected): OSD: fd leak
- There's an fd leak in the OSD. It looks like it's probably related to doing lots of OSDMap advancements at once, base...
- 06:35 PM Bug #1807 (Can't reproduce): CentOS compile error in perfglue/heap_profiler.cc
- on a CentOS system, I did a git fetch/merge followed by a make clean,
and got a compilation error in perf
CXX ... - 05:59 PM Bug #1741: teuthology: failed to untar
- Doesn't look like any other tests that day had the same machines locked while this was run. I think this might just b...
- 05:40 PM Bug #1741: teuthology: failed to untar
- It was 2662 that had this error.
- 05:21 PM Feature #1800 (Resolved): qa: run osd tests on btrfs
- 04:42 PM Revision e4db1297 (ceph): crush: whitespace
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 04:41 PM Revision 808763ea (ceph): osdmap: initialize cluster_snapshot_epoch
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 04:41 PM Revision c94590ab (ceph): crush: set max_devices=0 for map with empty buckets
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 04:06 PM Revision ca002a33 (ceph): crush: fix stepping on unallocated memory
- If size is 0 we can't write here.
Reported-by: pankaj singh <psingh.ait@gmail.com>
Signed-off-by: Sage Weil <sage.we... - 03:56 PM CephFS Bug #1806 (Can't reproduce): MDS won't start
- ceph-mds fails to enter replay on start even though mon appears to instruct it to do so, all 3 mds processes remain i...
- 03:34 PM Bug #1750 (Rejected): xattr errors silently ignored, cause trouble later
- I've updated the regression suite to use btrfs.
- 02:16 PM Bug #1750: xattr errors silently ignored, cause trouble later
- I was able to reproduce this once with logging. It appears to be the ext4 xattr limitation.
2011-12-08 12:45:41.2... - 11:31 AM CephFS Bug #1788: msgr file descriptor leak
- I guess this bug should be considered fixed by commit:8c4f4748e8b683f5b4ea939295793421c0ab7b61 in the wip-messenger b...
- 05:19 AM Revision d940d68d (ceph): client: trim lru after flushing dirty data
- Shouldn't matter, but it would be interesting to see if this affects
#1737.
Signed-off-by: Sage Weil <sage.weil@drea... - 05:19 AM Revision 1545d03c (ceph): client: unmount cleanup
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 05:19 AM Revision f3c90f8d (ceph): client: wait for sync writes even with cache enabled
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 05:19 AM Revision adbe3639 (ceph): client: send umount warnings to log, not stderr
- stderr isn't usually open anyway.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
12/07/2011
- 11:20 PM Revision e69057e4 (ceph): internal: check syslog for errors
- This should catch lockdep warnings and mark tests with them as failed.
- 07:40 PM Revision 9ab445a4 (ceph): ObjectStore: Add collection_list_partial for hash order
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 07:40 PM Revision 997265a2 (ceph): os/HashIndex: some minimal debug output
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:40 PM Revision 0807e7d5 (ceph): hobject_t: make filestore_hobject_key_t 64 bits
- So we can return 0x100000000 when max=true.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> - 07:40 PM Revision 322f93a2 (ceph): hobject_t: encode max properly
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:40 PM Revision 717621f6 (ceph): librados,Objecter,PG: list objects now includes the locator key
- Previously, there was no way to recover the locator key used to create
an object. Now, rados_objects_list_next and ... - 07:40 PM Revision 2d3721c6 (ceph): ObjectStore,ReplicatedPG: remove old collection_list_partial
- No need for the old collection_list_partial instance: it's cleaner to
just use an hobject_t as the collection list ha... - 07:40 PM Revision 2026450b (ceph): hobject_t: define max value
- Create a max value that is greater than all other values.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> - 07:40 PM Revision 348321a5 (ceph): hobject_t: sort by (max, hash, oid, snap)
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 07:40 PM Revision cada2f2e (ceph): object.h: Sort hobject_t by nibble reversed hash
- To match the HashIndex ordering, we need to sort hobject_t by the nibble
reversed hash. We store objects in the file... - 07:40 PM Revision 63e3d864 (ceph): hobject_t: define explicit hash, operator<<; drop implicit sobject_t()
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 05:56 PM Bug #1804 (Closed): filestore: unexpected EINVAL
- Core file and binary are on gitbuilder-gcov-amd64:~/bug_1804.
The data is still on sepia24 for inspection.... - 05:20 PM Messengers Bug #1803: msgr: behave better when ending TCP connections
- This actually caused a deadlock with ffsb on the kernel client - ffsb ended up with 1006 connections in the CLOSING s...
- 04:56 PM Messengers Bug #1803 (Won't Fix): msgr: behave better when ending TCP connections
- TV is telling me that if we're not confirming that each side of the connection calls ::shutdown() on the socket, we'r...
- 04:51 PM Bug #1791 (Resolved): osd: assert(0) in sub_op_modify
- This looks like the objecter bug, fixed by commit:2f5bd5f737e831a03beb93c3928c74b59a59052e
- 03:38 PM Bug #1763 (Resolved): qa: need to run qa tests on kernel with lockdep enabled
- Lockdep was already enabled, but we weren't marking runs as failed if errors appeared in syslog. Teuthology commit e6...
- 01:49 PM CephFS Bug #1737: ceph-fuse crash in xlist::remove
- This happened again from a different path in teuthology:~teuthworker/archive/nightly_coverage_2011-12-07-a/3843/remot...
- 11:46 AM Feature #1802 (Resolved): qa: test to exercise divergent osd logs
- - generate some write/overwrite workload with many concurrent writes
- extend ceph_manager to pause (kill -STOP) an ... - 11:18 AM rgw Bug #1801 (Resolved): rgw: radosgw-admin remove subuser and related swift key in a single command
- 11:15 AM Feature #1800 (Resolved): qa: run osd tests on btrfs
- I think all the code is there, but we need to make the nightly runs actually do it.
- 10:41 AM Feature #1799 (Resolved): qa: add 'rados --load-gen' test(s)
- maybe a few tests with a range of options, if appropriate
- 10:41 AM Feature #1798 (Rejected): qa: add rados/librados tests (RadosModel)
- 10:10 AM Feature #1784 (Duplicate): osd: redo pgls api
- 09:27 AM rbd Feature #1790: rbd: have a way of establishing configured mappings at boot time
- Single-file configuration is more annoying to handle with automated tools, file-per-device gives you good atomicity o...
- 09:01 AM Bug #1778 (Resolved): Error after installing an iso-image via qemu / rbd-image
- Hi Oliver,
You can use rbd to take live snapshots with the same consistency as with snapshotting images on nfs. Th... - 03:31 AM Bug #1778: Error after installing an iso-image via qemu / rbd-image
- Hi Josh,
well, the small fix does it, no more crashes.
But, of course I would love to have back my live-snapsho... - 08:57 AM Bug #1797 (Resolved): configure doesn't link to pthread on Fedora 14 on linking librados-config
- When building ceph 0.39 on Fedora 14, the build process fails with the
following messages:
CXXLD librados-con... - 08:43 AM CephFS Bug #1796 (Resolved): mds: exit cleanly on EBLACKLISTED
- ...
- 08:31 AM Linux kernel client Bug #1795 (Resolved): break d_lock > s_cap_lock ordering
- ...
- 08:01 AM RADOS Bug #1794 (Resolved): crush: creating/destroying buckets of zero items
- we still try to calloc the length zero array
and then try to free it later... - 07:32 AM CephFS Bug #1047: mds: crash on anchor table query
- Got it again with 0.39. Still there.
- 12:16 AM Revision 95e63247 (ceph): workunit: set client id and secretfile env vars
- These are used by the kernel rbd workunit to know how to map images.
Signed-off-by: Josh Durgin <josh.durgin@dreamho...
12/06/2011
- 11:56 PM rbd Feature #1790: rbd: have a way of establishing configured mappings at boot time
- What if your image is not in the pool "rbd"?
I was thinking about a 'rbdtab' file:... - 11:10 AM rbd Feature #1790 (Resolved): rbd: have a way of establishing configured mappings at boot time
- We need to be careful about the config format, to make automatic editing easy (think Chef).
First draft:
/etc/c... - 11:22 PM Revision 745be30f (ceph): gitignore: Ignore src/keyring, as created by vstart.sh
- Commit 86c34ba9ee8c883b71a8449c3c261154365c35ae changed
the filename but not .gitignore.
Signed-off-by: Tommi Virtan... - 10:44 PM Revision a1ebd725 (ceph): ReplicatedPG: don't crash on empty data_subset in sub_op_push
- If data_subset is empty (i.e., the data we pulled is no longer useful),
we should mark complete false and continue ra... - 10:24 PM Revision 03b03553 (ceph): ReplicatedPG: do not ->put() scrub messages when adding to a WorkQueue.
- This function is passing a reference from PG::active_rep_scrub to
the req_scrub_wq, not eliminating the reference (an... - 10:20 PM Revision 8afa5a5d (ceph): workunits: fix secret file and temp file removal for kernel rbd
- Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
- 09:36 PM Revision bcd26fca (ceph): workunits: make rbd kernel workunit executable
- Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
- 08:13 PM Revision 2bdf9078 (ceph): doc: Reorganize pip calls to use a requirements file.
- The conditional before running pip install was unnecessary,
"pip install" on already installed packages is fast (as l... - 08:07 PM Revision 200d7c89 (ceph): doc: Switch diagram tools from dia to ditaa.
- Now you can create diagrams easily with the ".. ditaa::"
directive in the Sphinx documents.
admin/build-doc now chec... - 06:50 PM Revision 20b7af79 (ceph): doc: fix typo
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:50 PM Revision 33753c82 (ceph): filestore: send back op error to log, not stderr
- Signed-off-by: Sage Weil <sage@newdream.net>
- 06:31 PM Revision 66b6b1bf (ceph): workunits: add some tests for kernel rbd
- This covers some snapshot and resize functions that aren't tested by fs benchmarks.
Signed-off-by: Josh Durgin <josh... - 06:26 PM Revision 575f717f (ceph): rbd: allow snapshots to be mapped
- unmap and showmapped already support snapshots. map should too.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> - 06:26 PM Revision 01d30e6a (ceph): secret: fix error check
- add_key will return -1 when an error occurs, which should be handled at a higher level and not printed here.
Signed-... - 06:26 PM Revision 0ad0fbfe (ceph): secret: add is_kernel_secret function
- This will let us know whether we can add a key mount option
if no secret is specified.
Signed-off-by: Josh Durgin <j... - 06:26 PM Revision 274f4890 (ceph): rbd, mount.ceph: use pre-stored secret if available
- If a secret is specified, store and use it, but otherwise
check for a pre-existing secret to use.
Signed-off-by: Jos... - 06:26 PM Revision 16a211bf (ceph): ceph-rbdnamer: include snapshot name if present
- Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
- 06:26 PM Revision fd9556f0 (ceph): rbd: the showmapped command shouldn't connect to the cluster
- Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
- 06:02 PM Linux kernel client Bug #1793 (Can't reproduce): NULL pointer dereference at try_write+0x627/0x1060
- Found in sepia50's console:...
- 04:44 PM Bug #1778: Error after installing an iso-image via qemu / rbd-image
- The bug is in the qemu driver - the fix is "in our qemu repo":https://github.com/NewDreamNetwork/qemu-kvm/commit/7ee2...
- 09:28 AM Bug #1778: Error after installing an iso-image via qemu / rbd-image
- Hi Oliver,
That gdb session is actually an entirely different crash - I'll take a closer look at both of these tod... - 02:14 AM Bug #1778: Error after installing an iso-image via qemu / rbd-image
- Well Josh,
being quite busy... and need to understand ( not a "real-coder" these days anymore ;-) ) how to configu... - 04:34 PM Revision ddc11a8f (ceph): test_rados.py: clean up after EEXIST test
- This extra pool caused subsequent pool tests to fail.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> - 02:35 PM Bug #1758 (Resolved): OSD segfault in SimpleMessenger::send_message
- I checked out a core dump, and the OSD is calling send_message with a null Connection* from PG::replica_scrub::2895. ...
- 11:53 AM Bug #1758: OSD segfault in SimpleMessenger::send_message
- And in teuthology:~teuthworker/archive/nightly_coverage_2011-12-06-a/3757/remote/ubuntu@sepia66.ceph.dreamhost.com/lo...
- 11:52 AM Bug #1758: OSD segfault in SimpleMessenger::send_message
- Happened again today in teuthology:~teuthworker/archive/nightly_coverage_2011-12-06-a/3772/remote/ubuntu@sepia66.ceph...
- 02:01 PM CephFS Bug #1702 (Can't reproduce): Ceph MDS crash + client mount problem
- 02:01 PM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
- I think the next step here is to run the mds under valgrind.
- 02:00 PM Bug #1490 (Resolved): cfuse assert failure: assert(ob->last_commit_tid < tid)
- 11:34 AM CephFS Bug #1792 (Can't reproduce): crash in ceph-mds
- This is the full log from teuthology:~teuthworker/archive/nightly_coverage_2011-12-01-b/3516/remote/ubuntu@sepia70.ce...
- 11:25 AM Bug #1791 (Resolved): osd: assert(0) in sub_op_modify
- From teuthology:~teuthworker/archive/nightly_coverage_2011-12-02-a/3569/remote/ubuntu@sepia6.ceph.dreamhost.com/log/o...
- 11:19 AM Bug #1750 (New): xattr errors silently ignored, cause trouble later
- Happened again after s3tests in teuthology:~teuthworker/archive/nightly_coverage_2011-12-02-b/3624/teuthology.log.
- 11:09 AM CephFS Bug #1675: mds: failed rstat assert
- Happened during fsstress in teuthology:~teuthworker/archive/nightly_coverage_2011-12-02-b/3593/remote/ubuntu@sepia92....
- 11:07 AM Bug #1789 (Resolved): mon: failed assert(paxosv == pg_map.version)
- From teuthology:~teuthworker/archive/nightly_coverage_2011-12-02-b/3603/remote/ubuntu@sepia44.ceph.dreamhost.com/log/...
- 10:54 AM Bug #1530: osd crash during build_inc_scrub_map
- Another one crashed in PG::replica_scrub yesterday. core is in teuthology:~teuthworker/archive/nightly_coverage_2011-...
- 06:01 AM CephFS Bug #1047: mds: crash on anchor table query
- Updated Ceph to 0.39 and the bug seems to be gone.
- 01:33 AM Revision 54758abc (ceph): Merge remote branch 'gh/stable'
- 12:16 AM Revision 9512aed5 (ceph): doc: fix rst syntax
- Signed-off-by: Sage Weil <sage@newdream.net>
12/05/2011
- 10:07 PM Revision 7178f1ca (ceph): doc: document monitor cluster expansion/contraction
- Pretty sure my rst syntax is wrong.
Signed-off-by: Sage Weil <sage@newdream.net> - 09:33 PM Revision 16f79282 (ceph): cephtool: fix shutdown
- Fix 'ceph -w' brokenness from commit ad13d0b7.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> - 07:21 PM Revision 019597e6 (ceph): filejournal: make FileJournal::open() arg slightly less weird
- Pass in fs_op_seq (last_committed_seq), not the next expected seq, so we
can avoid subtracting and adding 1 in odd pl... - 07:21 PM Revision bfbc4324 (ceph): Merge branch 'stable'
- 07:21 PM Revision 86c34ba9 (ceph): vstart.sh: .ceph_keyring -> keyring
- Signed-off-by: Sage Weil <sage@newdream.net>
- 07:15 PM CephFS Bug #1774: client: files become inaccessible in large directories (with snapshots?)
- Some interesting findings... It appears that the problem has nothing to do with the mds, but with the fuse client. ...
- 06:53 PM Revision 1e3da7ed (ceph): filejournal: remove bogus check in read_entry
- It is perfectly fine to read events that are older than the fs's seq from
the journal; open() will skip them when pos... - 06:08 PM Revision dbd7a3b4 (ceph): Rename "testrados" task to not begin with "test".
- See commit e80c32c44293e6453cce1bf89ad3cf5b1b4917ab in
teuthology.git - 06:07 PM Revision e80c32c4 (ceph): Rename "testrados" and "testswift" tasks to not begin with "test".
- Anything "test*" looks like a unit test, and shouldn't be used for
actual code. - 06:07 PM Revision 9598e479 (ceph): Rename "testrados" and "testswift" tasks to not begin with "test".
- Anything "test*" looks like a unit test, and shouldn't be used for
actual code. - 06:02 PM Revision 0dd4d69f (ceph): Fix unit tests for SSH keep-alive setting.
- Commit 6e3e0d7cdcb5ba70f938f0850a8828aca2753ab5 failed to pass
unit tests. - 05:37 PM Revision dc167bac (ceph): filejournal: set last_committed_seq based on fs, not journal
- last_committed_seq is the last seq committed to the fs, not the journal.
Set it when we begin replay with the fs prov... - 04:15 PM CephFS Bug #1788 (Resolved): msgr file descriptor leak
- With our Hadoop workload (lots of client connections), this problem occurs every couple hours -- although this is the...
- 02:18 PM Bug #1786 (Resolved): ceph -w goes dead after 5 minutes
- commit:16f79282cd0132c3633216f51fbbf0f93a0aec61
- 11:13 AM Bug #1786 (Resolved): ceph -w goes dead after 5 minutes
- 02:18 PM Bug #1785 (Resolved): osd: os/FileJournal.cc: 1011: FAILED assert(seq >= last_committed_seq)
- commit:1e3da7edcf8881b10f35879e4b5b6be93167c636
- 09:14 AM Bug #1785 (Resolved): osd: os/FileJournal.cc: 1011: FAILED assert(seq >= last_committed_seq)
- 11:22 AM CephFS Bug #1787 (Closed): mds: laggy oneshot replays pollute mdsmap
- ...
- 10:53 AM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
- I lost my setup over the weekend, so I'm not going to be able to try the wip-truncate branch on the deployment to see...
12/03/2011
- 03:11 PM Feature #1784 (Duplicate): osd: redo pgls api
- include locators
use hobject_t as iterator (and hopefully make the objecter split/merge coping logic less ugly in th... - 03:09 PM Feature #1783 (Resolved): osd: scrub incrementally across hash range using MOSDPGScan
- Current scrub will not scale to large PGs.
- 01:01 AM CephFS Bug #1047: mds: crash on anchor table query
- Attached a log of a full run up to the crash. MDS tries to recover from some problem, replays and crashes.
12/02/2011
- 11:35 PM Revision 4a0b00a0 (ceph): mon: stub perfcounters for monitor, cluster
- The 'mon' perfcounter is for the local daemon and is always registered.
The 'cluster' perfcounter is for cluster sta... - 11:27 PM Revision 6dd81485 (ceph): osd: rename {take -> requeue}_object_waiters
- It calls osd->requeue_ops(), so make naming more consistent and avoid
confusing people like me.
Signed-off-by: Sage ... - 11:27 PM Revision 8bbe576c (ceph): osd: safely requeue waiting_for_ondisk waiters on_role_change
- This could conceivably cause the reply ordering mismatch seen in bug
#1490. Not sure why we didn't also fix this cal... - 09:38 PM Revision c8831004 (ceph): rados.py: add list_pools method
- Signed-off-by: Eric Chen <Eric_YH_Chen@wistron.com>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> - 08:06 PM Revision 6b4b6595 (ceph): Merge branch 'stable'
- 07:28 PM Revision 06228716 (ceph): Doc: add a conceptual overview of the peering process
- Signed-off-by: Mark Kampe <mark.kampe@dreamhost.com>
- 07:19 PM Revision c45a8491 (ceph): mds: remove obsolete doc
- 06:52 PM Bug #1778: Error after installing an iso-image via qemu / rbd-image
- Hi Oliver,
With snapshot=on data is never saved to the backing device - the original file is not modified unless y... - 05:31 AM Bug #1778: Error after installing an iso-image via qemu / rbd-image
- Well Josh,
attached you will find a crash, qemu-system... started without "-daemonize" to see what's going on ;-)
... - 04:46 AM Bug #1778: Error after installing an iso-image via qemu / rbd-image
- Hi Josh,
I have just made a session with savevm/loadvm, once without/with the snapshot-option, now with qemu-1.0. ... - 05:58 PM Revision 0c183ec7 (ceph): crush: ignore forcefed input that doesn't exist
- This might happen if, e.g., the file_layout specifies an osd that later
is removed from the cluster entirely. Just i... - 05:47 PM Revision faf5ce62 (ceph): Revert "CrushWrapper: ignore forcefeed if it does not exist"
- This reverts commit 6fbab6da6942c238d40a6b4f1680a7e6da463289.
This fails a unit test.
And I change my mind.. I thin... - 05:01 PM Revision 321ecdab (ceph): v0.39
- 05:00 PM Revision 75aff023 (ceph): OSDMap: build_simple_from_conf pg_num should not be 0 with one osd
- Previously, pg_num would end up set to 0 if osd.0 is the only osd.
Signed-off-by: Samuel Just <samuel.just@dreamhost... - 03:51 PM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
- Sorry - haven't had a chance yet. I'll try it on Monday.
- 11:50 AM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
- Sam, did you get a chance to try this?
- 03:43 PM Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
- If we're lucky this was caused by taking waiters improperly, which Sage fixed in commit:8bbe576cab9ecdbfea939ad3d7866...
- 03:40 PM Feature #1782: mon: dump key cluster stats via perfcounter
- commit:4a0b00a0f29a87965925e0b44c997bece96b9936 stubs this out. just need to populate the perfcounter with the relev...
- 02:20 PM Feature #1782 (Resolved): mon: dump key cluster stats via perfcounter
- This may be a minor abuse of the perfcounter intent, but it lets us get cluster stats using a common mechanism (via c...
- 03:22 PM Feature #390 (In Progress): Implement bdrv_snapshot_goto (Rollback), bdrv_snapshot_delete
- Have some functions, trying to get a setup to test them with.
- 01:54 PM Feature #1082 (Rejected): obsync: swift support
- dho guys are doing this.
- 01:27 PM Feature #1781 (Resolved): qa: readwrite and roundtrip rgw tests in qa suite
- 01:01 PM rgw Bug #1780 (Resolved): swift: auth response should return X-Auth-Token instead of X-Storage-Token
- 11:56 AM Bug #1750 (Resolved): xattr errors silently ignored, cause trouble later
- 11:54 AM Bug #1757 (Closed): oi disagrees with stat, or error code on stat
- 11:52 AM Bug #1679 (Can't reproduce): assertion failure is_replica()
- and old code; pending new code.
- 11:52 AM Bug #1688 (Won't Fix): Benjamin: pg stuck in scrub
- old code.
- 11:50 AM Bug #1689 (Can't reproduce): osd: segfault in recover_primary
- going to ignore this and see how the new backfill code fares.
- 11:48 AM CephFS Bug #1775 (Need More Info): mds startup: _replay journaler got error -22, aborting, possible regr...
- Without logs, it's hard to say, but it looks like something caused the OSD to drop a write (or series of writes). No...
- 11:46 AM Bug #1617 (Won't Fix): pgs stuck down and peering with only one osd down and out
- the new code will have an explicit 'incomplete' state when peering fails, instead of being 'stuck'. let's ignore thi...
- 09:44 AM CephFS Bug #1047 (Need More Info): mds: crash on anchor table query
- Amon Ott just hit this one.
- 04:36 AM Revision 2f5bd5f7 (ceph): objecter: initialize global_op_flags to zero
- Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 12:13 AM Revision 813523a6 (ceph): Doc: delete gratuitous index.html
- It was not an index, and seems to contain recommendations
for system configuration. I have renamed it to confusing.t... - 12:12 AM Revision 48165af5 (ceph): Doc: complete reversion of architecture.rst
- (abandon in progress improvements until everything works)
Signed-off-by: Mark Kampe <mark.kampe@dreamhost.com> - 12:12 AM Revision 3c7a82a6 (ceph): Doc: deleted gratuitous PlanningImplementation.html,
- which was a copy of PlanningImplementation.txt
(and not html at all).
restored previous index.rst, which was overwri... - 12:11 AM Revision fdf3f7bd (ceph): Doc: Restore the previous version of architecture.rst
- it was accidentally overwritten with a version of the product that
had a somewhat different audience/focus and a few sphin... - 12:07 AM Revision 4cfe0815 (ceph): doc: change state model from .svg to .png
- Signed-off-by: Mark Kampe <mark.kampe@dreamhost.com>
12/01/2011
- 10:41 PM Revision 1bbf9ae6 (ceph): fixed ubuntu version typo
- 10:20 PM Revision 6fbab6da (ceph): CrushWrapper: ignore forcefeed if it does not exist
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 08:38 PM Revision 363ebb6c (ceph): librbd: report an error if rbd header does not match
- This will fail on future incompatible versions of the header format.
Signed-off-by: Josh Durgin <josh.durgin@dreamho... - 07:15 PM Revision cce67171 (ceph): Merge branch 'wip_local_reads'
- 07:15 PM Revision d4aef202 (ceph): hadoop: apache license.
- We haven't made explicit that the Hadoop Java code is under the Apache
License. Do so (with permission from the other... - 05:40 PM Messengers Bug #1747 (Need More Info): msgr: osd connection originates from wrong port
- The blank address isn't a problem; it's due to the in_hbmsgr not being bound (deliberately). Unfortunately I've been ...
- 05:17 PM Revision 348c71c4 (ceph): mds: fix blocking in standby replay thread
- We need to hold mylock before waiting on the cond or else we get
./common/Cond.h: In function 'int Cond::Wait(Mutex&... - 05:17 PM Revision f6ee3699 (ceph): global: make daemon banner print explicit
- This eliminates some flags and avoids annoying cases where the banner is
printed but we don't want to see it.
Signed... - 04:19 PM Revision 5828009e (ceph): mds: fix usage text
- Filename is not optional.
Signed-off-by: Sage Weil <sage@newdream.net> - 01:16 PM Bug #1778: Error after installing an iso-image via qemu / rbd-image
- There's certainly a difference with the snapshot parameter - it doesn't store anything in the rbd image unless you us...
- 12:09 PM Bug #1778: Error after installing an iso-image via qemu / rbd-image
- Hi Josh,
at least my experience showed a different behaviour: no reliable snapshots and even crashes of qemu-syste... - 10:54 AM Bug #1778: Error after installing an iso-image via qemu / rbd-image
- You don't need any special qemu options to use snapshots - the snapshot option is confusingly named. The qemu 'snapsh...
- 09:30 AM Bug #1778 (Resolved): Error after installing an iso-image via qemu / rbd-image
- Hi *,
we are currently running:
ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9) fro... - 12:10 PM CephFS Bug #1775: mds startup: _replay journaler got error -22, aborting, possible regression?
- stick a
continue;
after the set_read_pos() call to avoid the second crash. - 08:36 AM CephFS Bug #1775: mds startup: _replay journaler got error -22, aborting, possible regression?
- No, I didn't have osd logging enabled; I'll provide you with the journal in a few minutes.
- 08:26 AM CephFS Bug #1775: mds startup: _replay journaler got error -22, aborting, possible regression?
- Can you dump the mds journal so we can get a closer look at the corruption? Something like
ceph-mds -i foo --dum... - 12:24 AM CephFS Bug #1775 (Resolved): mds startup: _replay journaler got error -22, aborting, possible regression?
- ubuntu natty, kernel 3.2-rc2, ceph 0.38 (stable from git) with patch from #1756 and workaround for #1757
setup
s1... - 10:13 AM rgw Bug #1779 (Resolved): rgw: swift auth returns wrong error code when unexisting user is given
- returns 404 instead of 403
- 09:12 AM rgw Bug #1777 (Resolved): rgw: user info modification is not atomic
- e.g., adding keys, etc.
I think it's more important to identify cases where operations left system in an inconsist... - 09:05 AM rgw Feature #1776 (Resolved): rgw: swift auth prefix should be configurable (and optional)
- 01:07 AM Revision 50c4b312 (ceph): Handle interactive-on-error also when error is from contextmanager exit.
- Closes: http://tracker.newdream.net/issues/1745
11/30/2011
- 07:21 PM CephFS Bug #1774 (Resolved): client: files become inaccessible in large directories (with snapshots?)
- Taking snapshots of certain directories within ceph that hold backups of root filesystems of my openmoko phone causes...
- 05:57 PM Revision 353ee000 (ceph): mds: adjust flock lock state on export
- Looks like this was missed when flocklock was added. Did a quick grep and
it doesn't look like it is missing anywher... - 05:49 PM Feature #1773 (Resolved): rbd: class interface for header interaction
- This will include:
* create(size, order, features)
* get_info(image)
* get_snapc
* snap_add
* later snap_add... - 05:43 PM Feature #1772 (Resolved): rbd: define new on-disk header format
- This should include several new things:
* CompatSet
* read-only flag
* parent_{pool, image_id, snap_id}
* list<... - 05:28 PM Bug #1771 (Resolved): rbd: delete snapshots when image is deleted
- Currently the snapshots are left around with no way to access them.
- 05:23 PM CephFS Bug #1770 (Can't reproduce): directory nonexistent on kernel_untar_build.sh
- ...
- 05:18 PM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
- the tasks were in nightly_coverage_2011-11-30-a
3433: collection:basic clusters:fixed-3.yaml tasks:kclient_workuni... - 05:13 PM CephFS Bug #1549: mds: zeroed root CDir* vtable in scatter_writebehind_finish
- Happened twice today:...
- 05:08 PM Feature #1745 (Closed): teuthology: make interactive-on-error stop further cleanup
- ...
- 05:06 PM Bug #1690 (Can't reproduce): osd re-created from scratch will crash on start-up
- 03:19 PM CephFS Bug #1753 (Won't Fix): ceph copy raw images from qemu incorrectly
- Unfortunately, right now making Ceph report sparse files correctly would be prohibitively expensive. It can be done, ...
- 02:57 PM CephFS Bug #1753: ceph copy raw images from qemu incorrectly
- To create the sparse file qemu-img just calls ftruncate. It does nothing fs-specific, so this can be replicated with ...
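The point made in this comment (a sparse file is produced by nothing more than an ftruncate call) can be reproduced without qemu-img. The following is a minimal sketch, assuming a POSIX system; the file name disk.img and the 100 KiB size are made up for illustration, and how many blocks end up allocated depends on the underlying filesystem, so no particular value is claimed.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file whose logical length is `size` without writing data."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Extending a file with ftruncate sets its length; no data is written.
        os.ftruncate(fd, size)
    finally:
        os.close(fd)
    return os.stat(path)


with tempfile.TemporaryDirectory() as tmp:
    # "disk.img" is a hypothetical name used only for this sketch.
    st = make_sparse_file(os.path.join(tmp, "disk.img"), 100 * 1024)
    apparent_size = st.st_size        # logical length reported by stat
    allocated = st.st_blocks * 512    # st_blocks counts 512-byte units on Linux
    print(apparent_size, allocated)
```

Comparing the two printed numbers on a local filesystem and on the filesystem under test shows whether allocated space is reported separately from the logical length.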
- 11:10 AM CephFS Bug #1753: ceph copy raw images from qemu incorrectly
- The file copy took 3 minutes. That is OK for a 3Gb file but not for a 100Kb file.
- 09:43 AM CephFS Bug #1753: ceph copy raw images from qemu incorrectly
- I'm a little confused here. Ceph has never reported only the used space for a file; doing so is prohibitively expensi...
- 02:20 PM Messengers Bug #1747 (In Progress): msgr: osd connection originates from wrong port
- The problem here is somewhere on osd.2 — osd.1 is using the address that osd.2 is providing, and you can see that osd...
- 01:17 PM CephFS Bug #1756 (Resolved): mds crash right after successful recovery
- 11:28 AM Linux kernel client Bug #1769 (New): osd_client: susceptibility to low memory deadlocks
- We could be trying to flush the cache in order to free up memory, and find ourselves unable to allocate a ceph_osd or...
- 11:21 AM Linux kernel client Cleanup #1768 (Closed): osd_client: gratuitous ceph_monc_request_next_osdmap calls
- kick_requests() is called from within a loop that iterates through multiple OSD map updates ... which means that it m...
- 11:15 AM Linux kernel client Bug #1767 (Resolved): osd_client: send_request() cannot fail
- The static __send_request() routine is sure to succeed in queuing its request for the specified osd client, yet ceph_...
- 11:12 AM Linux kernel client Bug #1766 (New): mon_client: sends request before authentication
- The passed request is sent unconditionally, whether or not we have finished authenticating.
If we have not yet com... - 10:11 AM Bug #1765 (Resolved): osd: 'call' op can return data even if op is modifying
- Not sure if it'd actually return data, but in any case the api is ambiguous. If it does return data it breaks idempot...
- 10:07 AM Feature #1764 (Rejected): osd classes: add an optional source object
- This can be very useful. Source object should have the same locator as the target object. Similar to clone-range. An ...
- 10:03 AM Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
- This didn't turn out to have anything to do with #1727, did it?
- 09:36 AM Linux kernel client Bug #1762: i_lock vs s_cap_lock vs inodes_with_caps_lock lock ordering
- Argh, this is a real pain. igrab() requires i_lock, which we use extensively to protect complicated changes. In the...
- 09:19 AM Linux kernel client Bug #1762 (Resolved): i_lock vs s_cap_lock vs inodes_with_caps_lock lock ordering
- Reported by Amon Ott on ML....
- 09:25 AM Bug #1763 (Resolved): qa: need to run qa tests on kernel with lockdep enabled
- We need to catch lock ordering regressions like #1762 in our nightly runs.
- 02:14 AM Revision 2443878b (ceph): Objecter: loop the right direction when searching for local replicas
- Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
- 12:35 AM Revision 1c696b65 (ceph): doc: Add peering state diagram
- Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
- 12:20 AM Revision 2918b501 (ceph): Move kclient multiple_rsync workunit to stress collection.
- Bug #1760 keeps being triggered by this.
11/29/2011
- 11:36 PM Revision 30ede648 (ceph): Makefile: ipaddr.h, pick_address.h
- Signed-off-by: Sage Weil <sage@newdream.net>
- 10:05 PM rbd Cleanup #1761: krbd: make block/segment naming consistent
- Segment refers to a partial range, a part of an object, so I think we should keep it in this context. So object shoul...
- 09:15 PM rbd Cleanup #1761 (Resolved): krbd: make block/segment naming consistent
- pick consistent term for an object (segment or object, but not block) and use throughout.
- 09:31 PM Revision 77a62fdc (ceph): Makefile: add missing uuid.h to tarball
- Signed-off-by: Sage Weil <sage@newdream.net>
- 09:30 PM Revision ebb585d9 (ceph): Objecter: fix local reads in recalc_op_target
- We want to use the actual OSD, not the index into the array!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> - 05:27 PM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
- Actually, maybe you could run with the wip-truncate branch on the mds and see if you trigger a failed assertion on the MDS...
- 05:19 PM Bug #1759: mds/client: truncate size overflow, fails with EINVAL
- Do you by chance have the log preceding the first crash?
Working around this is probably a matter of patching wit... - 11:28 AM Bug #1759 (Resolved): mds/client: truncate size overflow, fails with EINVAL
- My version of ceph is a minor variant of 0.38, running with ext4, and ceph-fuse. It looks like my fs has gotten corr...
- 05:07 PM CephFS Cleanup #814: hadoop: refactor hadoop shim in terms of java libceph bindings
- http://www.debian.org/doc/packaging-manuals/java-policy/x105.html
- 04:28 PM Revision 8788a404 (ceph): osd: subscribe to next map if flagged FULL
- This ensures the osd finds out when we become un-full in a timely manner.
Fixes: #1755
Signed-off-by: Sage Weil <sag... - 04:26 PM CephFS Bug #1760 (Resolved): multiple_rsync workunit cannot remove non-empty directory intermittently
- This has occurred in half of the regression runs since 11/24: ...
- 10:52 AM Bug #1757: oi disagrees with stat, or error code on stat
- As we discussed on #ceph, I've updated the kernel to 3.2-rc2 and patched the osd with this workaround http://fpaste.org/PKwW/, n...
- 08:25 AM Bug #1757: oi disagrees with stat, or error code on stat
- The fix for #1612 is upstream kernel commit:ed3ee9f44ba55eb6acfbfc8caa881e0253710d2a. Does your kernel on the osds h...
- 01:52 AM Bug #1757 (Closed): oi disagrees with stat, or error code on stat
- I've similar bugs #1334, #1473 which should be solved by #1612, but it doesn't help.
Ubuntu natty, ceph 0.38 with ... - 09:05 AM Bug #1758 (Can't reproduce): OSD segfault in SimpleMessenger::send_message
- in the 11/29 nightlies, cfuse_workunit_misc (3335) the osd on sepia5 seg-faulted.
The end of the osd log is:
2011-1... - 08:59 AM Bug #1755 (Resolved): OSD: subscribe to map updates on FULL flag
- commit:8788a404ae4a10cd10ec8048f0b32d473640a607
- 08:25 AM Bug #1612: osd/PG.cc: 3839: FAILED assert(missing[oid].need <= v)
- upstream kernel commit:ed3ee9f44ba55eb6acfbfc8caa881e0253710d2a
- 05:39 AM Revision c2889fef (ceph): mds: encode truncate_pending in inode
- Otherwise we don't actually journal this value, and we get confused when
we replay a start_truncate and try to restar...
11/28/2011
- 10:11 PM CephFS Bug #1756: mds crash right after successful recovery
- This should let you restart your mds:...
- 09:28 AM CephFS Bug #1756 (Resolved): mds crash right after successful recovery
- Ubuntu Natty, ceph 0.38, kernel 2.6.38-12-server, 2x separate mds daemons crashed in the middle of the night
* sho... - 08:52 PM Revision 98e0a6fd (ceph): uclient: remove filer_flags and use Objecter::global_op_flags instead
- Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
- 08:52 PM Revision da2e0c3c (ceph): Objecter: add a new global_op_flags that is passed to every Op construc...
- We can use this for a global use of LOCALIZE_READS (and are about
to do so!).
Signed-off-by: Greg Farnum <gregory.fa... - 08:30 PM Revision 51385930 (ceph): Objecter: remove unused variable in op_submit
- These flags are probably relics from when the function got split;
they belong in send_op now.
Signed-off-by: Greg Fa... - 06:32 PM Revision 4974a9c2 (ceph): uclient: remove useless if-else based on snapid
- These are the same command anyway!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> - 05:01 PM Revision cef16732 (ceph): debian init: Do not stop or start daemons when installing or upgrading
- Signed-off-by: Wido den Hollander <wido@widodh.nl>
- 03:49 PM CephFS Bug #1753: ceph copy raw images from qemu incorrectly
- This is using the ceph filesystem, not rbd.
- 11:12 AM CephFS Bug #1749: nonexistent directory in kclient_workunit_kernel_untar_build
- This could have the same (unknown) root cause as #1741.
- 09:46 AM Feature #1736 (Resolved): collectd: hacky script to generate types.db from perfcounter schema
- 09:26 AM Bug #1755 (Resolved): OSD: subscribe to map updates on FULL flag
- When the OSDs get a full flag they stop most of their activity, which shuts down the usual map propagation methods. T...
- 09:14 AM Bug #1631: osd: failed assert(repop_queue.front() == repop)
- Ok, pretty sure this is related to the reconnect. We need to put together a test that artificially triggers messenge...
- 12:11 AM Revision ce657227 (ceph): mon: search for local ip during mkfs
- If an address isn't explicitly specified during mkfs, look for an unnamed
monitor in the (generated) monmap and see i... - 12:11 AM Revision 61b9db3a (ceph): pick_address: implement have_local_addr()
- Check for a local ip from within a list of addresses.
Signed-off-by: Sage Weil <sage@newdream.net> - 12:04 AM Revision 84b00597 (ceph): monclient: name nameless monitors noname-<foo>
- This makes them easy to pick out as unnamed.
Signed-off-by: Sage Weil <sage@newdream.net>
11/27/2011
- 10:50 PM Revision 7a453402 (ceph): pick_address: whitespace
- Signed-off-by: Sage Weil <sage@newdream.net>
- 09:44 PM Bug #1751: Copy in CEPH too slow
- rbd only. There is no plan yet for reflink(2) in the ceph filesystem.
- 02:48 PM Bug #1751: Copy in CEPH too slow
- Is clone for rbd only or for files too?
Copy of files is slow too. - 02:45 PM Bug #1751 (Duplicate): Copy in CEPH too slow
- A 'clone' operation that does copy-on-write is coming in the next couple weeks. See #988
- 05:39 PM Feature #1754 (Resolved): qa: run other suites nightly as well
- stick suite name in mail subject?
run all suites nightly (not just regression)
- 04:32 PM CephFS Bug #1746 (Resolved): PerfCounters::set segfault
- 04:32 PM Bug #1727 (Resolved): osd: failed assert(pending_ops > 0) in dequeue_op
- 04:30 PM Feature #1647 (Resolved): mon: robust bootstrap
11/25/2011
- 02:08 PM CephFS Bug #1753 (Won't Fix): ceph copy raw images from qemu incorrectly
- Hi,
Ceph cannot correctly handle raw images from qemu:
oneadmin@s2-8core:~/OpenNebula/var/images/tm...
11/24/2011
- 01:02 PM CephFS Bug #1752 (Can't reproduce): ceph-fuse isn't releasing caps without flushing data?
- Xiaofei Du reported on the mailing list that running an "ls" on a directory with multiple writers takes a while (much...
- 10:16 AM Bug #1751 (Duplicate): Copy in CEPH too slow
- Hi,
The copy operations for files and for rbd images are too slow. Ceph is a copy-on-write system; I think the c...
11/23/2011
- 11:56 PM Revision 30def38d (ceph): corrected variable (con) to be consistent with prior examples (cluster)
- Signed-off-by: Mark Kampe <mark.kampe@dreamhost.com>
- 10:07 PM Revision 934e1e52 (ceph): ReplicatedPG: Also count overlaps for snapsets on snapdirs
- Previously, the overlaps for snapdirs would not be included in
cstat causing the computed total to be incorrect.
Sig... - 10:07 PM Revision 97d82ed9 (ceph): ReplicatedPG: Account for clone space usage in make_writeable
- Previously, we accounted for clone space usage inconsistently in
write_update_size_and_usage etc when walking through... - 05:09 PM Bug #1631: osd: failed assert(repop_queue.front() == repop)
- This happened again with the same workload in /var/lib/teuthworker/archive/nightly_coverage_2011-11-23-b/3034/remote/...
- 05:06 PM Bug #1530: osd crash during build_inc_scrub_map
- A new crash during scrub from /var/lib/teuthworker/archive/nightly_coverage_2011-11-23-b/3051/remote/ubuntu@sepia71.c...
- 05:02 PM Bug #1676 (Resolved): stats mismatch during snaps workunit
- 97d82ed950b26cfaef4267ee44edd9ad927fb828 and 934e1e52514b6036c91c1c7db1c8b6727ac8c6d8 should take care of the size di...
- 09:41 AM Bug #1676: stats mismatch during snaps workunit
- I do not know if this is likely to be related, but in the 11/23a nightlies, 3027 (rgw_s3tests)
1 Aborts found in 3... - 05:00 PM Bug #1750 (Closed): xattr errors silently ignored, cause trouble later
- Comment
I do not know if this is likely to be related, but in the 11/23a nightlies, 3027 (rgw_s3tests)
1 Aborts f... - 02:45 PM Revision 32a68378 (ceph): Merge branch 'wip-mon'
- 02:44 PM Revision ad13d0b7 (ceph): ceph: fix shutdown race
- Shut down MonClient before messenger, to avoid race with MonClient::tick()
and MonClient::shutdown().
Fixes
#0 __l... - 01:33 PM Bug #1744: teuthology: race with daemon shutdown?
- Josh saw similar, it seems the ctx.daemons data structure loses entries / they never get added / something. So far, r...
- 09:27 AM CephFS Bug #1749 (Can't reproduce): nonexistent directory in kclient_workunit_kernel_untar_build
- In the 11/23a nightlies, 3003, there may have been
a transient directory access error:
... lots of stuff works
2... - 09:11 AM CephFS Bug #1748 (Can't reproduce): mds segfault CDir::project_fnode
- In the 11/23a nightlies, 2995/remote/ubuntu@sepia75.ceph.dreamhost.com/log/mds.0.log.gz
2011-11-22 23:59:14.857453 ... - 07:16 AM Feature #1487 (Resolved): config: {cluster,public}_subnets
- 04:52 AM Revision 414caa7d (ceph): common/pick_address: Fix IP address stringification.
- Different sockaddr_* have the actual address (sin_addr, sin6_addr)
at different offsets, and sockaddr->sa_data just i... - 12:28 AM Revision 9870e2f7 (ceph): mon: pick_addresses before common_init_finish
- We can't modify g_conf->public_addr after that.
Signed-off-by: Sage Weil <sage@newdream.net> - 12:22 AM Revision 036ad4c7 (ceph): mon: set default port if not specified...
- ...when looking for self in monmap during mkfs.
Signed-off-by: Sage Weil <sage@newdream.net> - 12:04 AM Revision 0045c901 (ceph): monmap: assign rank by sorting addr, not name
- This allows monitors to bootstrap knowing peer addrs but not their names,
as when we specify mon_host.
Signed-off-by... - 12:04 AM Revision 36978a63 (ceph): mon: calculate rank by addr, not name
- Signed-off-by: Sage Weil <sage@newdream.net>
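The two commits above switch monitor rank assignment from name order to address order, so that peers known only by address can still agree on ranks. A minimal sketch of that idea follows. It is not Ceph's implementation: the rank_by_addr helper, the sample monitor names, and the exact comparison key (IP version, then numeric address, then port) are all assumptions made for illustration.

```python
import ipaddress


def rank_by_addr(monitors):
    """Map monitor name -> rank, where rank is the position in address order.

    `monitors` maps a name to a (host, port) pair. Names play no part in
    the ordering, so a monitor that is still unnamed gets a stable rank.
    """
    def key(item):
        _name, (host, port) = item
        ip = ipaddress.ip_address(host)
        # Assumed ordering: IP version, numeric address, then port.
        return (ip.version, int(ip), port)

    ordered = sorted(monitors.items(), key=key)
    return {name: rank for rank, (name, _addr) in enumerate(ordered)}


# Hypothetical monitors; "noname-b" mimics a monitor without a real name.
mons = {
    "c": ("10.0.0.1", 6789),
    "a": ("10.0.0.3", 6789),
    "noname-b": ("10.0.0.2", 6789),
}
ranks = rank_by_addr(mons)
print(ranks)
```

Because the key ignores names, renaming a monitor later leaves every rank unchanged as long as the addresses stay the same.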
11/22/2011
- 11:06 PM Revision ebe5fc60 (ceph): obsync: tear out rgw
- 10:53 PM Revision 3a20b425 (ceph): mon: name self in monmap if --public-addr specified during mkfs
- Signed-off-by: Sage Weil <sage@newdream.net>
- 09:40 PM Messengers Bug #1747 (Resolved): msgr: osd connection originates from wrong port
- osd.2 sends a couple messages to osd.1:...
- 06:31 PM Revision a859763b (ceph): rgw: don't remove tail of lru if that's what we touch
- 06:09 PM Revision aeeeade6 (ceph): mon: mark down all connections when rank changes
- The election and some other stuff depend on msg->get_source().num() to get
the peer rank, and that is part of the con... - 06:08 PM Revision bed3c472 (ceph): mon: handle rank change in bootstrap
- The rank can change either because we probe and get a new monmap, or
because we get one via paxos. Move the checks t... - 05:53 PM Revision 8b464093 (ceph): mon: pick an address when joining an existing cluster
- If we are joining an existing cluster, we can pick whatever address we
want (e.g., one specified by public_addr or pu... - 05:52 PM Revision 5ba356b3 (ceph): mon: remove unused myaddr
- Signed-off-by: Sage Weil <sage@newdream.net>
- 05:52 PM Revision 0c9724d6 (ceph): mon: simplify suicide when removed from map
- Signed-off-by: Sage Weil <sage@newdream.net>
- 03:02 PM rgw Feature #1697 (Resolved): s3-tests: test bucket headers
- Fixed, added the following tests:
s3tests.functional.test_headers.test_bucket_put_bad_canned_acl
s3tests.function... - 10:33 AM rgw Bug #1719 (Resolved): rgw: crash in ObjectCache::touch_lru
- should be fixed by commit:a859763b1cba844d0d56b861a372e5f63f87c607.
- 05:58 AM Revision 24ee09b0 (ceph): Revert "more logs (yuck) for #1682"
- This reverts commit ea00114f08440563bce8e27ae2cd887bbc85aba5.
- 01:46 AM Revision eb8d91fe (ceph): PG: it's not necessary to call build_inc_scrub_map in build_scrub_map
- Because we have called osr.flush(), it's safe to tag map.valid_through
as last_update. We will still have to catch ... - 12:17 AM Revision 0f4b59a4 (ceph): Merge remote branch 'gh/subnet'
- 12:00 AM Revision c651c88e (ceph): Properly handle case where first error is inside a context manager __ex...
- Closes: http://tracker.newdream.net/issues/1743
- 12:00 AM Revision fab1e55e (ceph): Merge remote branch 'gh/wip-mon'
11/21/2011
- 10:27 PM Revision eec61b48 (ceph): common/ipaddr: Add utility function to parse ip/cidr style networks.
- Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
- 10:27 PM Revision 0477f238 (ceph): common/pickaddr: Pick cluster_addr/public_addr based on *_network.
- 10:27 PM Revision c066e926 (ceph): mds, osd, synclient: Pick cluster_addr/public_addr based on *_network.
- Instead of specifying an IP address in ceph.conf like
[global]
cluster_addr = 10.1.2.3
you can now avoid the node... - 10:27 PM Revision 0f748d4c (ceph): common/ipaddr: Find a configured IP address in given subnet.
- Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
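The ipaddr and pickaddr commits above describe parsing an ip/cidr network and then choosing a locally configured address that falls inside it. A rough sketch of that selection logic, using Python's standard ipaddress module, is given here. The pick_address function and the hard-coded local address list are hypothetical; real code would enumerate the host's interfaces instead, and this is not the code from those commits.

```python
import ipaddress


def pick_address(local_addrs, network):
    """Return the first local address inside `network`, or None.

    `network` is an ip/cidr string such as "10.1.0.0/16". strict=False
    also accepts a host address with a prefix length, e.g. "10.1.2.3/16".
    """
    net = ipaddress.ip_network(network, strict=False)
    for addr in local_addrs:
        ip = ipaddress.ip_address(addr)
        # Only compare addresses of the same family as the network.
        if ip.version == net.version and ip in net:
            return addr
    return None


# Hypothetical addresses configured on this host.
local = ["127.0.0.1", "192.168.1.20", "10.1.2.3"]
cluster_addr = pick_address(local, "10.1.0.0/16")
no_match = pick_address(local, "172.16.0.0/12")
print(cluster_addr, no_match)
```

With a helper like this, every node can share the same network setting, and each one resolves it to its own address at startup.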
- 10:07 PM CephFS Bug #1549 (In Progress): mds: zeroed root CDir* vtable in scatter_writebehind_finish
- 09:56 PM Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
- happened again on /var/lib/teuthworker/archive/nightly_coverage_2011-11-21-b/2818
This may be the same root cause ... - 09:37 PM Revision 2bae3506 (ceph): osd: Remove unused variable.
- 09:37 PM Revision 0f9a0605 (ceph): common/str_list: Make unused return value void.
- Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
- 09:37 PM Revision 97464bca (ceph): msg: Move public_addr use outside ->bind()
- 09:28 PM Revision 3c8fec2d (ceph): osd: fix 'stop' command
- Special case. We can't join the command_tp thread from itself.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- 09:23 PM Revision b47347bd (ceph): osd: protect handle_osd_map requeueing with queue lock
- pending_ops was protected by osd_lock, but it tracks something in the
queue, which has its own lock. Messy. Also, ...
- 07:15 PM Revision 70dfe8e9 (ceph): osd: lock pg when requeuing requests
- The op queue is shut down, so this is mostly safe, unless someone comes
through and does requeue_ops() from a callbac...
- 06:33 PM Revision 811145f7 (ceph): paxosservice: tolerate _active() call when not active
- This can happen when multiple C_Active events are queued, and the first
does a propose_pending() (moving us into upda...
- 05:19 PM Revision 88963a18 (ceph): objecter: simplify map request check
- We should request a missing/intervening map if it appears to exist.
Otherwise, skip it.
Signed-off-by: Sage Weil <sa...
- 05:19 PM Revision cd2e523f (ceph): objecter: cancel tick event on shutdown
- Hopefully this is the root cause for
2011-11-20 23:57:41.555292 7f75dd743780 ceph version 0.38-205-g3b53b72
(commit:...
- 05:01 PM rgw Bug #1719: rgw: crash in ObjectCache::touch_lru
- I think what happens here is that the entry that we touch happens to be the one that we dispose of (at the tail of th...
- 04:02 PM Bug #1743 (Closed): teuthology: not exiting with error when ceph-fuse shutdown fails
- commit c651c88eacf9c3bbf1f037be3a5dc0425308c730
Author: Tommi Virtanen <tv@eagain.net>
Date: 2011-11-21 16:00:19 ...
- 03:42 PM Bug #1743: teuthology: not exiting with error when ceph-fuse shutdown fails
- This reproduced it nicely:
diff --git a/teuthology/task/internal.py b/teuthology/task/internal.py
index 58e7f14...
- 03:57 PM Bug #1744: teuthology: race with daemon shutdown?
- Tommi Virtanen wrote:
> Was this using any one of the following?
>
> teuthology/task/lost_unfound.py
> teutholog...
- 03:33 PM Bug #1744: teuthology: race with daemon shutdown?
- Was this using any one of the following?
teuthology/task/lost_unfound.py
teuthology/task/mon_recovery.py
teuthol...
- 02:57 PM Bug #1741: teuthology: failed to untar
- The path mentioned above is incorrect. Run nightly_coverage_2011-11-18-2/2663 failed because of network failure.
T...
- 02:52 PM Bug #1741: teuthology: failed to untar
- This is exactly what would happen if someone nuked the machine, or locking failed and someone else ran a faster test ...
- 01:29 PM Bug #1727: osd: failed assert(pending_ops > 0) in dequeue_op
- hopefully fixed by commit:b47347bd7c377037f7fbc199f0c88b447c9626d1
- 08:59 AM Bug #1727: osd: failed assert(pending_ops > 0) in dequeue_op
- Happened again in the 11/21 nightlies - 2791, sepia33
- 09:53 AM Bug #1742 (Rejected): qa: s3-tests failed 100-continue test on sepia
- This was due to an old entry in /etc/apt/sources.list - older versions of the apache packages were still used. The ch...
- 09:43 AM rbd Feature #1713: teuthology: qemu tasks, tests
- Sorry, comment #2 was meant for another bug.
- 09:42 AM rbd Feature #1713: teuthology: qemu tasks, tests
- This is in the plans after the new sepia hardware is in place; current sepia re-install is too slow & painful to dare...
- 09:23 AM CephFS Bug #1746: PerfCounters::set segfault
- I think this is objecter event teardown. See commit:cd2e523fba1d6cf8d15e7a349ad700b744f24ecf
- 09:05 AM CephFS Bug #1746 (Resolved): PerfCounters::set segfault
- In the 11/21 nightlies, while trying to run workunit/ffsb,
2779/remote/ubuntu@sepia57.ceph.dreamhost.com/log/mon.2.l...
- 08:57 AM Bug #1530: osd crash during build_inc_scrub_map
- Both of the above described variants occurred in the 11/21 nightlies
(2775:sepia17, 2783:sepia81, 2805:sepia82)