Samuel Just's activity

From 01/26/2017 to 02/24/2017

02/24/2017

10:12 PM Ceph Bug #19076: osd/ReplicatedBackend.cc: 884: FAILED assert(j != bc->pulling.end())
Something like https://github.com/athanatos/ceph/tree/wip-17831-18583-18809-18927-19076 Samuel Just
07:31 PM Ceph Bug #19076: osd/ReplicatedBackend.cc: 884: FAILED assert(j != bc->pulling.end())
Samuel Just
10:12 PM RADOS Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
Something like: https://github.com/athanatos/ceph/tree/wip-19023 Samuel Just
10:10 PM Ceph Bug #18961 (Resolved): objecter continually resends ops which don't have a callback
Samuel Just
10:09 PM Ceph Bug #18937 (Resolved): cache/tiering flush bug with head delete
Samuel Just
10:09 PM Ceph Revision 44b26f6a (ceph): Merge pull request #13594 from athanatos/wip-snap-trim-sleep
osd: add snap trim reservation and re-implement osd_snap_trim_sleep
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Samuel Just
10:08 PM Ceph Revision 4f856fe9 (ceph): Merge pull request #13570 from athanatos/wip-18937
osd: don't use ORDERSNAP for flush; always request/send ondisk ack
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Rev...
Samuel Just
07:27 PM Ceph Revision 0c0feca3 (ceph): osd,osdc: eliminate FLAG_ONDISK and helpers
The objecter actually always needs to get a response in order to
be able to not continually resend ops (even if the c...
Samuel Just
07:26 PM Ceph Revision 48cc5d26 (ceph): PrimaryLogPG::start_flush: don't use ORDERSNAP, eliminate the second de...
I think that whole thing was a misguided attempt to avoid deleting head
if it exists in the base tier (in reality it ...
Samuel Just

02/22/2017

11:44 PM RADOS Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
Well, sort of. last_epoch_clean is really about when we can forget OSDMaps. Should we retain OSDMaps on the mon (an... Samuel Just
11:33 PM RADOS Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
2017-02-20 20:45:59.104093 7f75c93f8700 10 osd.3 pg_epoch: 284 pg[1.16( v 278'379 (0'0,278'379] local-les=277 n=1 ec=... Samuel Just
12:09 AM RADOS Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
2017-02-20 20:46:28.567065 7ffa3242c700 10 osd.4 pg_epoch: 255 pg[1.16( v 254'369 (0'0,254'369] local-les=164 n=3 ec=... Samuel Just
12:05 AM RADOS Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
2017-02-20 20:46:40.165108 7f9e2ffc3700 10 osd.0 pg_epoch: 300 pg[1.16( DNE empty local-les=0 n=0 ec=0 les/c/f 0/0/0 ... Samuel Just
12:03 AM RADOS Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
2017-02-20 20:46:41.743173 7f9e277b2700 10 osd.0 pg_epoch: 301 pg[1.16( empty local-les=0 n=0 ec=141 les/c/f 164/164/... Samuel Just
10:34 PM Ceph Feature #18052: Replace past_intervals with more compact structure
https://github.com/athanatos/ceph/tree/wip-past-intervals Samuel Just
10:34 PM Ceph Bug #17916 (Can't reproduce): osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first)
Samuel Just
10:33 PM Ceph Bug #18961 (Fix Under Review): objecter continually resends ops which don't have a callback
https://github.com/ceph/ceph/pull/13570 Samuel Just
10:33 PM Ceph Bug #18927 (Fix Under Review): on_flushed: object ... obc still alive
https://github.com/ceph/ceph/pull/13569 Samuel Just
10:32 PM Ceph Bug #18937 (Fix Under Review): cache/tiering flush bug with head delete
https://github.com/ceph/ceph/pull/13570 Samuel Just

02/21/2017

11:39 PM RADOS Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
Notably, when it goes active at the end there, it's missing the 10 commits which happened during the [3,1] interval. Samuel Just
11:38 PM RADOS Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
At epoch 255, 1.16 is on [4,3] and is active+clean
2017-02-20 20:45:10.962790 7fd9b7cba700 10 osd.4 pg_epoch: 255 ...
Samuel Just
01:27 AM RADOS Bug #19023 (Resolved): ceph_test_rados invalid read caused apparently by lost intervals due to mo...
samuelj@teuthology:/a/samuelj-2017-02-20_18:45:04-rados-wip-18937---basic-smithi/839771/remote
If you look back in...
Samuel Just
05:24 AM Ceph Revision 2ed7759c (ceph): PrimaryLogPG: reimplement osd_snap_trim_sleep within the state machine
Rather than blocking the main op queue, just pause for that amount of
time between state machine cycles.
Also, add o...
Samuel Just
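The commit above replaces a blocking sleep with a pause between state machine cycles. The sketch below models that idea with a simulated timer so it can be checked deterministically. `SimTimer` and `SnapTrimmer` are hypothetical names used only here and are not Ceph classes. Each cycle trims one batch and then schedules its own next cycle `sleep_s` later, instead of sleeping on the thread that serves client ops. A client op scheduled in the gap therefore runs between trim cycles.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical simulated timer (not a Ceph API): callbacks run in order of
// their scheduled time, and "now" jumps to each event's time as it fires.
class SimTimer {
public:
  void add_event_after(double delay, std::function<void()> cb) {
    events_.emplace(now_ + delay, std::move(cb));
  }
  void run() {
    while (!events_.empty()) {
      auto it = events_.begin();
      now_ = it->first;
      auto cb = std::move(it->second);
      events_.erase(it);  // erase before running, since cb may add events
      cb();
    }
  }
  double now() const { return now_; }

private:
  double now_ = 0.0;
  std::multimap<double, std::function<void()>> events_;
};

// Hypothetical trimmer: each cycle "trims" one batch, then asks the timer to
// wake it after sleep_s rather than blocking the op queue for that long.
struct SnapTrimmer {
  SimTimer &timer;
  double sleep_s;
  int remaining;                    // batches left to trim
  std::vector<double> cycle_times;  // when each cycle actually ran

  void cycle() {
    cycle_times.push_back(timer.now());
    --remaining;
    if (remaining > 0)
      timer.add_event_after(sleep_s, [this] { cycle(); });
  }
};
```

With a 0.5 s pause, three batches run at t = 0, 0.5 and 1.0, and an op queued for t = 0.25 is served in between rather than waiting behind a sleeping trimmer.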
01:42 AM Ceph Bug #19024 (Can't reproduce): ec pool stuck incomplete, active+remapped -- crush mapping anomaly?
samuelj@teuthology:/a/samuelj-2017-02-20_18:45:04-rados-wip-18937---basic-smithi/839838
I killed the osd.5 process...
Samuel Just

02/17/2017

05:48 PM Ceph Revision 51eee55c (ceph): ReplicatedBackend: don't queue Context outside of ObjectStore with obc
We only flush the ObjectStore callbacks, not everything else. Thus,
there isn't a guarantee that the obc held by pu...
Samuel Just

02/16/2017

09:29 PM Ceph Bug #18961 (Resolved): objecter continually resends ops which don't have a callback
This is triggered by the delete op sent during OSD flush. Samuel Just

02/15/2017

09:55 PM Ceph Bug #18929: "osd/PG.cc: 6896: FAILED assert(pg->is_acting(osd_with_shard) || pg->is_up(osd_with_s...
I don't understand why this is not popping up. Sage's patch is correct, but something else is going on. Why is the ... Samuel Just
09:54 PM Ceph Bug #18929: "osd/PG.cc: 6896: FAILED assert(pg->is_acting(osd_with_shard) || pg->is_up(osd_with_s...
samuelj@teuthology:/a/samuelj-2017-02-15_01:03:44-rados-wip-sam-testing---basic-smithi/816292 also Samuel Just
07:37 PM teuthology Bug #18946 (Rejected): apt-get dependency failures on rados run
Maybe already fixed? Samuel Just
07:36 PM teuthology Bug #18946 (Rejected): apt-get dependency failures on rados run
samuelj@teuthology:/a/samuelj-2017-02-15_01:03:44-rados-wip-sam-testing---basic-smithi$ for i in $(~/teuthology/virtu... Samuel Just
12:09 AM Ceph Bug #18937 (Resolved): cache/tiering flush bug with head delete
base: 77=[77,76,74,71,6f,6d,62,61]:[]+head
promoted at 77, then deleted in cache
cache: 7a=[7a,76,74,6f,6d,62,...
Samuel Just
12:08 AM Ceph Bug #18809 (Resolved): FAILED assert(object_contexts.empty()) (live on master only from Jan-Feb 2...
Samuel Just
12:07 AM Ceph Bug #18529 (Resolved): ERROR: test_rados.TestRados.test_ping_monitor
Samuel Just
12:07 AM Ceph Bug #18927: on_flushed: object ... obc still alive
Samuel Just

02/14/2017

08:00 PM Ceph Bug #17831 (Resolved): osd: ENOENT on clone
http://tracker.ceph.com/issues/18927 and http://tracker.ceph.com/issues/18809 were caused by this series, I don't thi... Samuel Just

02/13/2017

05:47 PM Ceph Revision c2eac34c (ceph): osd/: add PG_STATE_SNAPTRIM[_WAIT] to expose snap trim state to user
Signed-off-by: Samuel Just <sjust@redhat.com> Samuel Just
05:47 PM Ceph Revision 4aebf59d (ceph): rados: check that pool is done trimming before removing it
Signed-off-by: Samuel Just <sjust@redhat.com> Samuel Just
05:47 PM Ceph Revision 21cc515a (ceph): osd/PrimaryLogPG: limit the number of concurrently trimming pgs
This patch introduces an AsyncReserver for snap trimming to limit the
number of pgs on any single OSD which can be tr...
Samuel Just
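The commit above limits how many PGs on one OSD may be snap trimming at once. The sketch below shows the general shape of such a reservation scheme: a counting reserver with a FIFO wait queue. It is a simplified stand-in. `SimpleReserver` and its methods are hypothetical and do not reproduce the interface of Ceph's `AsyncReserver`. A PG asks for a slot and supplies a callback. If a slot is free the callback runs immediately. Otherwise the request waits until another PG releases its slot.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <set>
#include <utility>

// Hypothetical, simplified reserver (NOT Ceph's AsyncReserver API):
// at most max_allowed PGs hold a trim slot at once; further requests
// wait in FIFO order and are granted when a slot is released.
class SimpleReserver {
public:
  explicit SimpleReserver(std::size_t max_allowed) : max_allowed_(max_allowed) {}

  // Request a slot for pg; on_grant runs once the slot is granted.
  void request(int pg, std::function<void()> on_grant) {
    queue_.emplace_back(pg, std::move(on_grant));
    dispatch();
  }

  // Release the slot held by pg, or drop its pending request.
  void cancel(int pg) {
    if (in_progress_.erase(pg) == 0) {
      for (auto it = queue_.begin(); it != queue_.end(); ++it) {
        if (it->first == pg) { queue_.erase(it); break; }
      }
    }
    dispatch();
  }

  std::size_t active() const { return in_progress_.size(); }
  std::size_t waiting() const { return queue_.size(); }

private:
  // Grant queued requests while there are free slots.
  void dispatch() {
    while (in_progress_.size() < max_allowed_ && !queue_.empty()) {
      auto next = std::move(queue_.front());
      queue_.pop_front();
      in_progress_.insert(next.first);
      next.second();  // tell the PG it may start trimming
    }
  }

  std::size_t max_allowed_;
  std::deque<std::pair<int, std::function<void()>>> queue_;
  std::set<int> in_progress_;
};
```

With a limit of 2, a third PG's request waits until one of the first two calls `cancel()`. The queued request is granted at that point, so the OSD never has more than two PGs trimming.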

02/10/2017

07:20 PM Ceph Revision 6da3f9a5 (ceph): Merge pull request #13344 from gregsfortytwo/wip-osd-discussion-docs
Wip osd discussion docs
Reviewed-by: Samuel Just <sjust@redhat.com>
Samuel Just
07:18 PM Ceph Revision 534ae8fe (ceph): Merge pull request #13342 from athanatos/wip-17831-18583-18809
osd/: don't leak context for Blessed*Context or RecoveryQueueAsync
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewe...
Samuel Just

02/09/2017

06:51 PM Ceph Bug #18533: two instances of omap_digest mismatch
wip-18533 is now cleaned up and has two specific unit tests and a fuzzer which reproduce invalid iterator results. Samuel Just
01:50 AM Ceph Bug #18533: two instances of omap_digest mismatch
Never mind, the bug can produce a more general set of errors than I had realized. See the more recent updates to the ... Samuel Just
12:28 AM Ceph Bug #18533: two instances of omap_digest mismatch
If the entries David added a few days ago are the right ones, then the above bug doesn't explain what's happening in ... Samuel Just
12:13 AM Ceph Bug #18533: two instances of omap_digest mismatch
David: Can you add the list of keys which are present on that node but shouldn't be? Samuel Just

02/08/2017

11:33 PM Ceph Bug #18533: two instances of omap_digest mismatch
I'm pretty comfortable pinning the cluster trouble on that one, assuming the extra keys and the overlapping complete ... Samuel Just
11:30 PM Ceph Bug #18533: two instances of omap_digest mismatch
wip-18533 above now has a unit test which causes the iterator to return a deleted value. Samuel Just
08:28 PM Ceph Bug #18533: two instances of omap_digest mismatch
debugging: https://github.com/athanatos/ceph/tree/wip-18533 Samuel Just
08:28 PM Ceph Bug #18533: two instances of omap_digest mismatch
<davidzlap> sjust: 100011cf577.00000000
<davidzlap> sjust: I meant http://pastebin.com/19W78B6U
<sjust> davidzlap: ...
Samuel Just

02/07/2017

12:31 AM Ceph Revision a00efd8d (ceph): Merge pull request #13280 from athanatos/wip-revert-jewel-18581
Revert "Merge pull request #12978 from asheplyakov/jewel-18581"
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Samuel Just
12:28 AM Ceph Revision 0cf7a613 (ceph): Revert "Merge pull request #12978 from asheplyakov/jewel-18581"
See: http://tracker.ceph.com/issues/18809
This reverts commit 8e69580c97622abfcbda73f92d9b6b6780be031f, reversing
ch...
Samuel Just

02/06/2017

06:30 PM Ceph Backport #18724 (New): jewel: osd: calc_clone_subsets misuses try_read_lock vs missing
I have reverted this backport, it needs to be backported with http://tracker.ceph.com/issues/18809 as well. Samuel Just
06:29 PM Ceph Bug #18583: osd: calc_clone_subsets misuses try_read_lock vs missing
This needs to be backported with http://tracker.ceph.com/issues/18809 (not in master yet, wait on that) Samuel Just

02/03/2017

09:19 PM Ceph Backport #18610: kraken: osd: ENOENT on clone
See http://tracker.ceph.com/issues/18809 as well (will want to backport the branch there, it has the commits from the... Samuel Just
09:12 PM Ceph Revision 91b74235 (ceph): osd/: don't leak context for Blessed*Context or RecoveryQueueAsync
This has always been a bug, but until
68defc2b0561414711d4dd0a76bc5d0f46f8a3f8, nothing deleted those contexts
withou...
Samuel Just
09:11 PM Ceph Bug #18809 (Resolved): FAILED assert(object_contexts.empty()) (live on master only from Jan-Feb 2...
bless_context and bless_gencontext don't behave properly if the returned Context is deleted without calling complete(... Samuel Just
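The bug described above is a wrapper context that misbehaves when it is deleted without `complete()` being called. The sketch below illustrates that general failure mode and one common C++ remedy. It does not reproduce Ceph's `Context` or `bless_context` code or the actual fix. `Callback`, `WrappedCallback` and `Counting` are hypothetical names. The wrapper owns its inner callback through `std::unique_ptr`, so the inner object is freed on both paths: when the wrapper is completed, and when the wrapper is simply deleted.

```cpp
#include <cassert>
#include <memory>

// Hypothetical one-shot callback interface (a stand-in, not Ceph's Context).
struct Callback {
  virtual ~Callback() = default;
  virtual void finish(int r) = 0;
  // complete() runs the callback once and then destroys it.
  void complete(int r) { finish(r); delete this; }
};

// Hypothetical wrapper: owning the inner callback via unique_ptr means it is
// released whether the wrapper is completed or deleted without completion.
class WrappedCallback : public Callback {
public:
  explicit WrappedCallback(Callback *inner) : inner_(inner) {}
  void finish(int r) override {
    // Hand ownership to complete(), which deletes the inner callback.
    inner_.release()->complete(r);
  }

private:
  std::unique_ptr<Callback> inner_;
};

// Demonstration helper that counts live instances and finish() calls.
struct Counting : Callback {
  static int live;
  static int finished;
  Counting() { ++live; }
  ~Counting() override { --live; }
  void finish(int) override { ++finished; }
};
int Counting::live = 0;
int Counting::finished = 0;
```

If the wrapper held a raw pointer that it freed only inside `finish()`, deleting the wrapper without completing it would leak the inner object. Here `Counting::live` returns to zero on both paths.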

02/02/2017

12:45 AM Ceph Bug #18533: two instances of omap_digest mismatch
I have copied the omap dirs for osds 72 (mira019:~samuelj/omap-osd-72), 7 (mira049:~samuelj/omap-osd-7), and 60 (mira... Samuel Just
12:03 AM Ceph Bug #18533: two instances of omap_digest mismatch
ubuntu@mira049:~$ ( for i in {7..1}; do sudo zcat /var/log/ceph/ceph.log.$i.gz; done; sudo cat /var/log/ceph/ceph.log... Samuel Just
12:01 AM Ceph Bug #18533: two instances of omap_digest mismatch
I suggest grabbing a copy of the leveldb instances from primary and a replica and examining the actual keys in the st... Samuel Just
12:00 AM Ceph Bug #18533: two instances of omap_digest mismatch
samuelj@mira049:~$ ( for i in {7..1}; do sudo zcat /var/log/ceph/ceph.log.$i.gz; done; sudo cat /var/log/ceph/ceph.lo... Samuel Just

02/01/2017

11:56 PM Ceph Bug #18533: two instances of omap_digest mismatch
Whatever happened, happened in the last few days.
samuelj@mira049:~$ ( for i in {7..1}; do sudo zcat /var/log/ceph...
Samuel Just

01/29/2017

04:59 AM Ceph Revision 509de4d9 (ceph): PrimaryLogPG::try_lock_for_read: give up if missing
The only users, calc_*_subsets, might try to read_lock an object which is
missing on the primary. Returning false in t...
Samuel Just
04:59 AM Ceph Revision cedaecf8 (ceph): ReplicatedBackend: take read locks for clone sources during recovery
Otherwise, we run the risk of a clone source which hasn't actually
come into existence yet being used if we grab a cl...
Samuel Just

01/26/2017

07:46 PM Ceph Revision 43e677dd (ceph): test/pybind/test_rados.py: tolerate empty output from mon ping
Fixes: http://tracker.ceph.com/issues/18529
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just