Activity
From 09/15/2018 to 10/14/2018
10/14/2018
- 01:05 PM Bug #36300 (Resolved): Clients receive "wrong fsid" error when CephX is disabled
- 01:04 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- /a/sage-2018-10-13_00:36:33-rados-wip-sage-testing-2018-10-12-1741-distro-basic-smithi/3133276
- 01:10 AM Bug #35847 (Fix Under Review): wrong cluster_network doesn't cause any errors and ends up using m...
10/13/2018
- 02:19 AM Bug #22330: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- Note that there is a common PR to be backported for this issue and https://tracker.ceph.com/issues/21931
- 02:17 AM Bug #22330 (Pending Backport): ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 02:16 AM Bug #21931 (Pending Backport): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <...
10/12/2018
- 09:26 PM Bug #36186 (Resolved): failed to become clean before timeout expired - pg stuck in clean+premerge...
- this run predates fcb1679eab4240c046ba922060c20423fb35ce43, which fixed the problem!
- 09:14 PM Feature #24176 (Resolved): osd: add command to drop OSD cache
- 09:13 PM Bug #36358 (Pending Backport): Interactive mode CLI prints no output since Mimic
- 09:12 PM Backport #36419 (Resolved): luminous: osd: get loadavg per cpu for scrub load threshold check
- https://github.com/ceph/ceph/pull/24593
- 08:43 PM Bug #36418 (Resolved): qa/standalone/osd/osd-rep-recov-eio.sh fails to parse pg dump
- ...
- 08:39 PM Bug #26958: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objec...
- /a/sage-2018-10-12_13:16:07-rados-wip-sage-testing-2018-10-11-1437-distro-basic-smithi/3131789
- 08:25 PM Bug #22330 (Fix Under Review): ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- https://github.com/ceph/ceph/pull/24564
- 08:25 PM Bug #21931 (Fix Under Review): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <...
- https://github.com/ceph/ceph/pull/24564
- 05:39 PM Bug #36417 (Pending Backport): osd: get loadavg per cpu for scrub load threshold check
- 04:53 PM Bug #36417 (Resolved): osd: get loadavg per cpu for scrub load threshold check
- https://github.com/ceph/ceph/pull/17718
- 03:52 PM Bug #36406 (Fix Under Review): Cache-tier forward mode hang in luminous (again)
- 09:32 AM Bug #36406: Cache-tier forward mode hang in luminous (again)
- Patch https://github.com/ceph/ceph/pull/24548
- 11:16 AM Bug #36345: librados C API aio read empty buffer
- I tested both v13.2.2 and v12.2.8 with the provided source files, and still no luck: I am not able to reproduce this...
- 02:59 AM Bug #36412: ceph-objectstore-tool import after pg splits which will lost objects
- @David Zafman Do you have time to take a look?
- 02:57 AM Bug #36412 (Closed): ceph-objectstore-tool import after pg splits which will lost objects
- Hi, I have a test cluster and did the following steps; the pool is erasure-coded with k:m=3:1
step 1: export pg 2.f from osd.2, ori...
- 02:15 AM Bug #35969: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
- @Nathan, Understood, will open a new issue.
- 02:12 AM Bug #36250 (Need More Info): ceph-osd process crashing
- 02:11 AM Bug #24835: osd daemon spontaneous segfault
- Thanks Soenke,
These should help to isolate the problem.
10/11/2018
- 10:06 PM Bug #36411 (Closed): OSD crash starting recovery/backfill with EC pool
- We have one pg on a 4+2 EC pool in which the OSDs will crash with the following error, on reaching an active set of m...
- 07:25 PM Bug #36177 (Pending Backport): rados rm --force-full is blocked when cluster is in full status
- 07:19 PM Bug #23879: test_mon_osdmap_prune.sh fails
- /a/sage-2018-10-10_15:50:53-rados-wip-sage-testing-2018-10-10-0850-distro-basic-smithi/3125020
- 06:53 PM Bug #36306 (Pending Backport): monstore tool rebuild does not generate creating_pgs
- https://github.com/ceph/ceph/pull/24506
- 06:36 PM Bug #36408 (Resolved): [cache tier] failed guarded write + promotion results in "success" op result
- Simple reproducer: ...
- 06:25 PM Bug #35845 (Pending Backport): osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- 06:09 PM Bug #36300: Clients receive "wrong fsid" error when CephX is disabled
- 05:08 PM Bug #36406: Cache-tier forward mode hang in luminous (again)
- Iain Bucław wrote:
> Similar to https://tracker.ceph.com/issues/23296
>
Looking at the fix for the other issue....
- 04:46 PM Bug #36406: Cache-tier forward mode hang in luminous (again)
- Iain Bucław wrote:
> In the logs, it looks like the client/server enters an infinite loop.
>
> [...]
These are...
- 04:28 PM Bug #36406: Cache-tier forward mode hang in luminous (again)
- In the logs, it looks like the client/server enters an infinite loop....
- 04:19 PM Bug #36406 (Resolved): Cache-tier forward mode hang in luminous (again)
- Similar to https://tracker.ceph.com/issues/23296
Commands run to reproduce (in vstart.sh)...
- 03:14 PM Bug #36405 (Resolved): unittest_seastar_messenger failure on ARM
- We often ignore these failures, but when I looked at the log I realised it's actually a recently added test that's fa...
- 01:48 PM Bug #24835: osd daemon spontaneous segfault
- Coredump: 258b1ec0-ebc6-43df-b35e-f16a780148b5...
- 01:44 PM Bug #24835: osd daemon spontaneous segfault
- Coredump: bf9b2d5c-96f5-4d30-b852-3888dda66a6b...
- 01:33 PM Bug #24835: osd daemon spontaneous segfault
- We do have some more core dumps with different stack traces.
Coredump: ebb8eff9-b0d6-4321-b85b-d31be87ed7c2
...
- 02:53 AM Bug #24835 (New): osd daemon spontaneous segfault
- Looking into this. Will update when I have analysed these coredumps.
In the meantime, if you get any that have a d...
- 11:10 AM Bug #24956 (Fix Under Review): osd: parent process need to restart log service after fork, or cep...
- 09:17 AM Bug #35969 (Pending Backport): "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on cent...
- 09:16 AM Bug #35969: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
- @Brad: The backporting process for the original fix is already well along. If a follow-up fix is required, could you ...
- 03:00 AM Bug #35969: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
- Not resolved as per https://github.com/ceph/ceph/pull/24260#issuecomment-427144712. Looking into this further.
- 07:17 AM Bug #36345: librados C API aio read empty buffer
- I am personally not running into the issue, but the reporter is. The reporter contacted me to forward the fix which s...
- 06:40 AM Bug #36345 (Fix Under Review): librados C API aio read empty buffer
- PR posted by Wido: https://github.com/ceph/ceph/pull/24534
- 06:19 AM Bug #36345: librados C API aio read empty buffer
- Wido, i am not able to reproduce this issue on master:...
10/10/2018
- 11:51 PM Backport #36393 (Resolved): luminous: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- https://github.com/ceph/ceph/pull/24532
- 11:25 PM Bug #36300: Clients receive "wrong fsid" error when CephX is disabled
- https://github.com/ceph/ceph/pull/24535
- 09:53 PM Backport #36321: luminous: Add support for osd_delete_sleep configuration value
- Thank you, David.
I hope you will do new patches to mimic and master as this is very specific to luminous.
- 08:53 PM Backport #36321: luminous: Add support for osd_delete_sleep configuration value
- https://github.com/ceph/ceph/pull/24501
- 09:33 PM Support #36326: Huge traffic spike and assert(is_primary())
- Given what you've shown here, it's unlikely that the network issue was caused by this; more likely the other way aro...
- 09:28 PM Bug #36345: librados C API aio read empty buffer
- Yeah can you make a PR, Wido? Somebody will need to know or run through how the IoCtx works with these data members a...
- 02:14 PM Bug #36345: librados C API aio read empty buffer
- I was notified about this issue and a simple fix would be:...
- 09:24 PM Support #36351: mon: OSDMonitor.cc: 380: FAILED assert(err == 0)12.2.2
- 12.2.2 is pretty out-of-date for Luminous and you appear to be running a custom build, so I'm not sure my line number...
- 01:07 AM Support #36351: mon: OSDMonitor.cc: 380: FAILED assert(err == 0)12.2.2
- Maybe the same as this issue: http://tracker.ceph.com/issues/12941
- 01:02 AM Support #36351: mon: OSDMonitor.cc: 380: FAILED assert(err == 0)12.2.2
- By using the tool (ceph-monstore-tool) to start the abnormal mon directory, I can get the osdmap and monmap informat...
- 08:47 PM Bug #26890 (Resolved): scrub livelock
- 08:47 PM Backport #26932 (Resolved): luminous: scrub livelock
- 06:53 PM Backport #26932: luminous: scrub livelock
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24396
merged
- 08:22 PM Bug #35076 (Resolved): mon: mgr options not parse propertly
- 08:21 PM Backport #35836 (Resolved): mimic: mon: mgr options not parse propertly
- 02:06 AM Backport #35836: mimic: mon: mgr options not parse propertly
- merged
- 07:46 PM Bug #36388 (Resolved): osd: "out of order op"
- ...
- 03:20 PM Bug #36378 (New): upmap to same osd twice possible, crashes calc_pg_upmaps
- Somehow v12.2.8 let me define an upmap like this:
pg_upmap_items 75.1ef [643,1100,625,907,647,1100]
and this...
- 02:16 PM Bug #36358 (Fix Under Review): Interactive mode CLI prints no output since Mimic
- https://github.com/ceph/ceph/pull/24521
- 08:48 AM Support #36341 (Resolved): build: compile with dpdk failed in master branch
- 04:02 AM Bug #36250: ceph-osd process crashing
- Also...
In your original post you showed a message from the log showing an exception "buffer::malformed_input: ent...
- 01:55 AM Bug #36250: ceph-osd process crashing
- Hello Josh,
Sorry it took me a while to see this.
Could you attach the output of "ceph report" please?
10/09/2018
- 11:07 PM Bug #36306 (Fix Under Review): monstore tool rebuild does not generate creating_pgs
- https://github.com/ceph/ceph/pull/24506
- 09:01 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
- Saw this again in a luminous QA run:...
- 01:34 PM Bug #36358 (Resolved): Interactive mode CLI prints no output since Mimic
- The polling command stuff (for iostat) changed the path for printing output, and now you just don't get anything when...
- 09:02 AM Support #36341 (In Progress): build: compile with dpdk failed in master branch
- https://github.com/ceph/ceph/pull/24487 should resolve it.
- 02:48 AM Support #36351 (New): mon: OSDMonitor.cc: 380: FAILED assert(err == 0)12.2.2
- I have a Ceph cluster which contains 3 mons. Due to an abnormal power failure, one mon service starts abnormally. The ex...
- 01:03 AM Bug #36347 (Resolved): Upgrade test in jewel fails with "Unable to locate package python3-rados"
10/08/2018
- 11:54 PM Backport #36149 (In Progress): luminous: output format is invalid of the crush tree json dumper
- https://github.com/ceph/ceph/pull/24482
- 11:51 PM Backport #36150 (In Progress): mimic: output format is invalid of the crush tree json dumper
- https://github.com/ceph/ceph/pull/24481
- 10:56 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
- Another set:...
- 10:02 PM Bug #36347 (Fix Under Review): Upgrade test in jewel fails with "Unable to locate package python3...
- https://github.com/ceph/ceph/pull/24479
- 05:43 PM Bug #36347 (Resolved): Upgrade test in jewel fails with "Unable to locate package python3-rados"
- ...
- 04:11 PM Bug #36345: librados C API aio read empty buffer
- The 'same' in C++ seems to work, so I guess it's limited to the C API.
- 02:56 PM Bug #36345 (Resolved): librados C API aio read empty buffer
- When using the AIO functions, the read buffer remains empty. When using the normal rados_read, the buffer is filled wi...
- 02:27 PM Bug #24835: osd daemon spontaneous segfault
- Hi Brad,
thanks for investigating this issue and sorry for my late response, I was on holidays.
The file IDs as...
- 10:16 AM Bug #36239 (Resolved): osd/PrimaryLogPG: fix potential pg-log overtrimming
- 10:16 AM Backport #36275 (Resolved): mimic: osd/PrimaryLogPG: fix potential pg-log overtrimming
- 10:15 AM Bug #35924 (Resolved): choose_acting picked want > pool size
- 10:15 AM Backport #35963 (Resolved): mimic: choose_acting picked want > pool size
- 10:15 AM Bug #35546 (Resolved): RADOS: probably missing clone location for async_recovery_targets
- 10:07 AM Backport #35964 (Resolved): mimic: RADOS: probably missing clone location for async_recovery_targets
- 09:53 AM Backport #26840 (Resolved): luminous: librados application's symbol could conflict with the libce...
- 07:58 AM Bug #23387: Building Ceph on armhf fails due to out-of-memory
- I found a way (it is not directly a solution) to this problem, but using Clang/LLVM instead of the GCC toolchain, I m...
- 04:08 AM Support #36341 (Resolved): build: compile with dpdk failed in master branch
- sh do_cmake.sh -DWITH_DPDK=ON -DWITH_TESTS=OFF
make -j 12
[ 51%] Building CXX object src/os/CMakeFiles/os.dir/f...
10/07/2018
- 11:54 PM Bug #36337: OSDs crash with failed assertion in PGLog::merge_log as logs do not overlap
- Because I had to delete many broken PGs (46 to date), I've created a tool that finds the broken PGs and their OSDs an...
- 01:32 PM Bug #36337 (New): OSDs crash with failed assertion in PGLog::merge_log as logs do not overlap
- Hello!
Unfortunately, our single-node-"Cluster" with 11 OSDs is broken because some OSDs crash when they start pee...
10/06/2018
- 04:13 PM Backport #36275: mimic: osd/PrimaryLogPG: fix potential pg-log overtrimming
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24309
merged - 04:13 PM Backport #35963: mimic: choose_acting picked want > pool size
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24344
merged - 04:12 PM Backport #35964: mimic: RADOS: probably missing clone location for async_recovery_targets
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24345
merged
10/05/2018
- 08:12 PM Backport #26840: luminous: librados application's symbol could conflict with the libceph-common
- Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/23483
merged
- 02:35 PM Bug #35845: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- ...
- 10:12 AM Support #36326 (New): Huge traffic spike and assert(is_primary())
- Hello,
We use ceph version 12.2.8 now. It was upgraded from jewel.
We faced an OSD assert after weird networ...
- 02:35 AM Bug #24406 (Resolved): read object attrs failed at EC recovery
- 02:34 AM Backport #24478 (Resolved): luminous: read object attrs failed at EC recovery
- 02:34 AM Bug #24687 (Resolved): Automatically set expected_num_objects for new pools with >=100 PGs per OSD
- 02:33 AM Backport #25145 (Resolved): luminous: Automatically set expected_num_objects for new pools with >...
- 02:31 AM Bug #23769 (Resolved): osd/EC: slow/hung ops in multimds suite test
- 02:30 AM Backport #23998 (Resolved): luminous: osd/EC: slow/hung ops in multimds suite test
10/04/2018
- 09:49 PM Backport #24478: luminous: read object attrs failed at EC recovery
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24327
merged
- 09:48 PM Backport #25145: luminous: Automatically set expected_num_objects for new pools with >=100 PGs pe...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24395
merged
- 09:42 PM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
- /a/yuriw-2018-10-03_22:57:24-rados-wip-yuri4-testing-2018-10-03-2104-luminous-distro-basic-smithi/3099072/
- 09:42 PM Backport #36321 (Resolved): luminous: Add support for osd_delete_sleep configuration value
- https://github.com/ceph/ceph/pull/24501
- 09:40 PM Bug #20086: LibRadosLockECPP.LockSharedDurPP gets EEXIST
- /a/yuriw-2018-10-03_22:57:24-rados-wip-yuri4-testing-2018-10-03-2104-luminous-distro-basic-smithi/3099098/
- 09:16 PM Backport #23998: luminous: osd/EC: slow/hung ops in multimds suite test
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24393
merged
10/03/2018
- 11:25 PM Feature #36310 (New): Add norecover and nobackfill flags for per pool as we have for global cluster
- Add norecover and nobackfill flags for per pool as we have for global cluster
This was done for noscrub and nodeep-s...
- 09:35 PM Backport #24806 (Resolved): luminous: rgw workload makes osd memory explode
- 09:34 PM Bug #23871 (Resolved): luminous->mimic: missing primary copy of xxx, wil try copies on 3, then fu...
- 09:34 PM Backport #24908 (Resolved): luminous: luminous->mimic: missing primary copy of xxx, wil try copie...
- 09:32 PM Bug #24588 (Resolved): osd: may get empty info at recovery
- 09:32 PM Backport #24772 (Resolved): luminous: osd: may get empty info at recovery
- 09:32 PM Bug #24486 (Resolved): osd: segv in Session::have_backoff
- 09:32 PM Backport #24495 (Resolved): luminous: osd: segv in Session::have_backoff
- 09:31 PM Bug #24371 (Resolved): Ceph-osd crash when activate SPDK
- 09:31 PM Backport #24471 (Resolved): luminous: Ceph-osd crash when activate SPDK
- 09:20 PM Bug #23916 (Resolved): LibRadosAio.PoolQuotaPP failed
- 09:20 PM Backport #23924 (Resolved): luminous: LibRadosAio.PoolQuotaPP failed
- 09:19 PM Bug #23713 (Resolved): High MON cpu usage when cluster is changing
- 09:19 PM Backport #23912 (Resolved): luminous: mon: High MON cpu usage when cluster is changing
- 09:18 PM Bug #36285: qa/workunits/cephtool/test.sh test fails setting pg_num to 97
- Did all the OSDs actually come on? Note the last line there: specified pg_num 97 is too large (creating 87 new PGs on...
- 09:18 PM Bug #22095 (Resolved): ceph status shows wrong number of objects
- 09:18 PM Backport #23772 (Resolved): luminous: ceph status shows wrong number of objects
- 09:16 PM Bug #23940 (Resolved): recursive lock of objecter session::lock on cancel
- 09:16 PM Backport #23986 (Resolved): luminous: recursive lock of objecter session::lock on cancel
- 09:09 PM Bug #36300: Clients receive "wrong fsid" error when CephX is disabled
- I'll take a look.
- 02:10 PM Bug #36300 (Resolved): Clients receive "wrong fsid" error when CephX is disabled
- Related to the changes introduced here [1]. The following reproducer shows the issue hit by a client application:
...
- 08:52 PM Feature #22086 (Resolved): ceph-objectstore-tool: Add option "dump-import" to examine an export
- 08:52 PM Backport #22390 (Rejected): jewel: ceph-objectstore-tool: Add option "dump-import" to examine an ...
- Jewel is EOL
- 07:47 PM Bug #36306 (Resolved): monstore tool rebuild does not generate creating_pgs
- The rebuild function does not populate creating_pgs's created_pools. This leads to every (existing) pg being (re)crea...
- 05:25 PM Bug #36305 (New): test_mon_ping fails with [errno 2] error calling ping_monitor
- ...
- 05:08 PM Bug #36304 (Need More Info): FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wa...
- ...
- 02:37 PM Backport #26932 (In Progress): luminous: scrub livelock
- 01:00 PM Backport #26932 (Need More Info): luminous: scrub livelock
- 12:19 PM Backport #26932 (In Progress): luminous: scrub livelock
- 12:17 PM Backport #25145 (In Progress): luminous: Automatically set expected_num_objects for new pools wit...
- 12:10 PM Backport #23998 (In Progress): luminous: osd/EC: slow/hung ops in multimds suite test
- 08:24 AM Backport #23926 (Need More Info): luminous: disable bluestore cache caused a rocksdb error
- An attempt at this backport is here: https://github.com/ceph/ceph/pull/24325
@smithfarm I think you need to backpo...
- 08:10 AM Backport #36298 (Resolved): mimic: ceph pg ls creating: EINVAL
- https://github.com/ceph/ceph/pull/24601
- 08:09 AM Backport #36297 (Resolved): luminous: ceph pg ls creating: EINVAL
- https://github.com/ceph/ceph/pull/24602
- 08:09 AM Backport #36296 (Resolved): mimic: [objecter] client socket failure leads to hung connection
- https://github.com/ceph/ceph/pull/24600
- 08:09 AM Backport #36295 (Resolved): luminous: [objecter] client socket failure leads to hung connection
- https://github.com/ceph/ceph/pull/24574
10/02/2018
- 10:46 PM Bug #36183 (Pending Backport): [objecter] client socket failure leads to hung connection
- 10:45 PM Bug #36174 (Pending Backport): ceph pg ls creating: EINVAL
- 10:43 PM Bug #36174: ceph pg ls creating: EINVAL
- Dan van der Ster wrote:
> https://github.com/ceph/ceph/pull/24262
merged - 07:32 PM Backport #36292 (Resolved): mimic: pg dout log had backfill=[] and bft= which are the same thing
- https://github.com/ceph/ceph/pull/24573
- 02:55 PM Bug #20439: PG never finishes getting created
- Seen again:
/a/dzafman-2018-09-26_22:31:44-rados-wip-zafman-testing-distro-basic-smithi/3074605
- 02:37 PM Bug #36289 (New): Converting Filestore OSD from leveldb to rocksdb backend on CentOS
- This is a continuation of [1] this thread from the ML. The only difference we've found is that I'm using CentOS and t...
- 01:28 AM Bug #36260 (Resolved): qa/workunits/mon/test_mon_config_key.py fails
10/01/2018
- 10:51 PM Bug #36170 (Pending Backport): pg dout log had backfill=[] and bft= which are the same thing
- 10:39 PM Bug #36285 (New): qa/workunits/cephtool/test.sh test fails setting pg_num to 97
- /a/dzafman-2018-09-26_22:31:44-rados-wip-zafman-testing-distro-basic-smithi/3074562
The portion of test shown ...
- 09:08 PM Bug #36177 (Fix Under Review): rados rm --force-full is blocked when cluster is in full status
- 08:07 PM Backport #35962 (Resolved): luminous: choose_acting picked want > pool size
- 02:43 PM Backport #35962: luminous: choose_acting picked want > pool size
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24299
merged
- 08:06 PM Backport #36274 (Resolved): luminous: osd/PrimaryLogPG: fix potential pg-log overtrimming
- 08:04 PM Backport #36274 (Resolved): luminous: osd/PrimaryLogPG: fix potential pg-log overtrimming
- https://github.com/ceph/ceph/pull/24308
- 08:06 PM Backport #36275 (In Progress): mimic: osd/PrimaryLogPG: fix potential pg-log overtrimming
- 08:04 PM Backport #36275 (Resolved): mimic: osd/PrimaryLogPG: fix potential pg-log overtrimming
- https://github.com/ceph/ceph/pull/24309
- 05:36 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
- Another with full logs (no cores):
/ceph/teuthology-archive/pdonnell-2018-10-01_03:14:44-fs-wip-pdonnell-testing-2...
- 04:36 PM Bug #22330: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- Latest instance with logs/cores: /ceph/teuthology-archive/pdonnell-2018-10-01_03:19:12-multimds-wip-pdonnell-testing-...
- 04:35 PM Bug #36271 (Duplicate): src/common/interval_map.h: 161: FAILED ceph_assert(len > 0)
- 04:13 PM Bug #36271 (Duplicate): src/common/interval_map.h: 161: FAILED ceph_assert(len > 0)
- ...
- 04:08 PM Bug #36270 (New): The "Many more objects per PG than average" warning does not work well when obj...
- See https://bugzilla.redhat.com/show_bug.cgi?id=1633221
It's rare, but some clusters have pools with very differen...
- 02:42 PM Bug #36239: osd/PrimaryLogPG: fix potential pg-log overtrimming
- merged https://github.com/ceph/ceph/pull/24308
- 02:36 AM Backport #35964 (In Progress): mimic: RADOS: probably missing clone location for async_recovery_t...
- https://github.com/ceph/ceph/pull/24345
- 02:29 AM Backport #35963 (In Progress): mimic: choose_acting picked want > pool size
- https://github.com/ceph/ceph/pull/24344
09/30/2018
- 08:01 AM Bug #36260 (Fix Under Review): qa/workunits/mon/test_mon_config_key.py fails
- https://github.com/ceph/ceph/pull/24340
09/29/2018
09/28/2018
- 04:23 PM Bug #36250 (Can't reproduce): ceph-osd process crashing
- ceph-osd process crashes in thread msgr-worker. This happens with all OSDs in the cluster, roughly once per day at th...
- 08:29 AM Backport #24478 (In Progress): luminous: read object attrs failed at EC recovery
- 08:16 AM Backport #23926 (In Progress): luminous: disable bluestore cache caused a rocksdb error
- 06:57 AM Backport #35844 (Resolved): luminous: objecter cannot resend split-dropped op when racing with co...
- 03:58 AM Backport #36131 (Resolved): luminous: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" ...
09/27/2018
- 09:07 PM Backport #35844: luminous: objecter cannot resend split-dropped op when racing with con reset
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24188
merged
- 09:04 PM Backport #36131: luminous: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24259
merged
- 02:48 PM Bug #36239 (Resolved): osd/PrimaryLogPG: fix potential pg-log overtrimming
- https://github.com/ceph/ceph/pull/23317
- 02:31 PM Bug #35813 (Resolved): should remove mentioning of "scrubq" in ceph(8) manpage
- 02:31 PM Backport #35855 (Resolved): mimic: should remove mentioning of "scrubq" in ceph(8) manpage
- 02:29 PM Backport #35854 (Resolved): luminous: should remove mentioning of "scrubq" in ceph(8) manpage
- 10:46 AM Bug #35974: Apparent export-diff/import-diff corruption
- Cliff Pajaro wrote:
> [...]
> You mention cache tiering but that is not something we use.
Sorry, I mixed this on...
- 05:57 AM Bug #35974: Apparent export-diff/import-diff corruption
- Cliff Pajaro wrote:
> the issue is not seen with krbd (rbd map)
rbd_balance_snap_reads config option is only fo...
- 06:49 AM Backport #35962 (In Progress): luminous: choose_acting picked want > pool size
- https://github.com/ceph/ceph/pull/24299
- 02:50 AM Bug #36105: OSD hangs during shutdown
- https://github.com/ceph/ceph/pull/24296
- 12:57 AM Bug #35847 (In Progress): wrong cluster_network doesn't cause any errors and ends up using monito...
- Working on Greg's comments on my PR.
- 12:17 AM Bug #25146 (Fix Under Review): "rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x...
- ceph/rocksdb: https://github.com/ceph/rocksdb/pull/40
09/26/2018
- 10:04 PM Bug #36105 (In Progress): OSD hangs during shutdown
- 10:03 PM Bug #36105: OSD hangs during shutdown
- In all 3 OSDs I found that hung, the last thread to drain is the smallest thread_index (thus the thread that handle...
- 01:51 AM Bug #36105: OSD hangs during shutdown
- I'll look at this more tomorrow but I suspect that the following commit from https://github.com/ceph/ceph/pull/2273...
- 09:38 PM Bug #36096 (Need More Info): osd: crashing with errors: "write_log_and_missing with: dirty_to" an...
- Could you please attach logs to this tracker?
- 09:21 PM Bug #35810: FAILED assert(entries.begin()->version > info.last_update)
- It may be related to error-pg-log entry bugs in 12.2.2. The latest luminous releases have fixed a few of these.
- 09:20 PM Bug #35847 (Fix Under Review): wrong cluster_network doesn't cause any errors and ends up using m...
- 09:19 PM Bug #35974: Apparent export-diff/import-diff corruption
- Should disable balanced and localized reads when a cache tier is in use.
- 08:57 PM Bug #35974: Apparent export-diff/import-diff corruption
- ...
- 06:15 PM Bug #35974: Apparent export-diff/import-diff corruption
- Ultimately we have disabled the rbd_balance_snap_reads feature.
To help the developers troubleshoot the problem, h...
- 06:06 PM Bug #35974: Apparent export-diff/import-diff corruption
- Moving to OSD team in case they have any follow-up -- but it just sounds like your OSDs were inconsistent for some re...
- 06:02 PM Bug #35974: Apparent export-diff/import-diff corruption
- @Jason: If rbd_balance_snap_reads is enabled, deep-scrub fixes the issue.
- 01:03 PM Bug #35974: Apparent export-diff/import-diff corruption
- @Cliff: are you saying the deep-scrub fixed the corruption between snapshot0 and snapshot1 -- or are you saying that ...
- 02:45 PM Bug #22052 (Resolved): ceph-mon: possible Leak in OSDMap::build_simple_optioned
- 08:01 AM Bug #22052 (Fix Under Review): ceph-mon: possible Leak in OSDMap::build_simple_optioned
- https://github.com/ceph/teuthology/pull/1213
- 05:15 AM Bug #36172: osd: hit suicide timeout
- Most likely can't flush filestore output to the hardware. Can you thoroughly check the hardware is in perfect working...
- 04:18 AM Feature #36187 (New): Crush rule ssd-primary should take previous emit result into consideration
- http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/
The document entry "PLACING DIFFERENT POOLS ON DI...
- 03:20 AM Bug #36166: pg merge can collide with remapped, upmap pgs
- https://github.com/ceph/ceph/pull/24184
09/25/2018
- 11:17 PM Bug #36186: failed to become clean before timeout expired - pg stuck in clean+premerge+peered
- /a/nojha-2018-09-24_16:58:52-rados-master-distro-basic-smithi/3065624/
- 08:16 PM Bug #36186 (Resolved): failed to become clean before timeout expired - pg stuck in clean+premerge...
- ...
- 11:12 PM Bug #36105: OSD hangs during shutdown
- /a/nojha-2018-09-24_16:58:52-rados-master-distro-basic-smithi/3065653/
- 02:47 PM Bug #36105: OSD hangs during shutdown
- I've reproduced this running the test in my local tree. I'll work on generating a core dump to find out what is stuck.
- 10:29 PM Bug #36174 (In Progress): ceph pg ls creating: EINVAL
- 08:43 AM Bug #36174: ceph pg ls creating: EINVAL
- https://github.com/ceph/ceph/pull/24262
- 08:41 AM Bug #36174 (Resolved): ceph pg ls creating: EINVAL
- ...
- 10:17 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
- Logs in /a/pdonnell-2018-09-25_01:23:37-fs-wip-pdonnell-testing-20180924.230702-distro-basic-smithi/3066511/remote/log
- 05:11 PM Bug #36182 (Resolved): osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is ...
- From: http://pulpito.ceph.com/pdonnell-2018-09-25_01:23:37-fs-wip-pdonnell-testing-20180924.230702-distro-basic-smith...
- 06:19 PM Bug #36183 (Fix Under Review): [objecter] client socket failure leads to hung connection
- *PR*: https://github.com/ceph/ceph/pull/24276
- 05:50 PM Bug #36183 (Resolved): [objecter] client socket failure leads to hung connection
- During an rbd-mirror thrash test run, the process failed to shut down cleanly because it was stuck in a librados rea...
- 02:05 PM Feature #24176: osd: add command to drop OSD cache
- Patrick, sorry I completely missed your comment. I opened a PR for it: https://github.com/ceph/ceph/pull/24270
- 01:28 PM Bug #22837 (Resolved): discover_all_missing() not always called during activating
- 01:28 PM Backport #26992 (Resolved): luminous: discover_all_missing() not always called during activating
- 10:24 AM Bug #36177: rados rm --force-full is blocked when cluster is in full status
- https://github.com/ceph/ceph/pull/24264
- 09:57 AM Bug #36177 (Resolved): rados rm --force-full is blocked when cluster is in full status
- ...
- 09:54 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- FWIW, we hit this issue several times, it seems relate with our operational works that change `mon_osd_force_trim_to`...
- 07:33 AM Bug #36172 (New): osd: hit suicide timeout
- ceph version 0.94.9-9.el7cp
An OSD drive died some days ago, and after a restart today it failed again with the same error:
...
- 06:07 AM Backport #36132 (In Progress): mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" ...
- https://github.com/ceph/ceph/pull/24260
- 06:02 AM Backport #36131 (In Progress): luminous: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPv...
- https://github.com/ceph/ceph/pull/24259
- 02:35 AM Bug #35810: FAILED assert(entries.begin()->version > info.last_update)
- Neha Ojha wrote:
> Hi Chang. Can you reproduce this bug with higher level of debugging? It is hard to find out what'...
09/24/2018
- 09:43 PM Bug #36170: pg dout log had backfill=[] and bft= which are the same thing
- https://github.com/ceph/ceph/pull/24256
- 09:25 PM Bug #36170 (Resolved): pg dout log had backfill=[] and bft= which are the same thing
- This is confusing to log analysis. I would have preferred to leave bft= added in 2013, but backfill=[] is in mimic...
- 06:09 PM Bug #36166 (Resolved): pg merge can collide with remapped, upmap pgs
- If either source or target pg is remapped or has an upmap it may map to a different set of osds.
- 06:07 PM Bug #22329: mon: Valgrind: mon (Leak_DefinitelyLost, Leak_IndirectlyLost)
- /ceph/teuthology-archive/pdonnell-2018-09-23_19:17:54-fs-wip-pdonnell-testing-20180923.160923-distro-basic-smithi/306...
- 05:44 PM Bug #35847 (In Progress): wrong cluster_network doesn't cause any errors and ends up using monito...
- PR: https://github.com/ceph/ceph/pull/24236
- 03:56 PM Bug #36164 (New): cephtool/test fails 'ceph tell mon.a help' with EINTR
- ...
- 02:22 PM Bug #36163 (Fix Under Review): mon osdmap cash too small during upgrade to mimic
- https://github.com/ceph/ceph/pull/24247
- 02:15 PM Bug #36163 (Resolved): mon osdmap cash too small during upgrade to mimic
- At least one large cluster upgrading from luminous to mimic had its mons fall over due to heavy load that turned out...
- 11:01 AM Backport #36150 (Resolved): mimic: output format is invalid of the crush tree json dumper
- https://github.com/ceph/ceph/pull/24481
- 11:01 AM Backport #36149 (Resolved): luminous: output format is invalid of the crush tree json dumper
- https://github.com/ceph/ceph/pull/24482
- 11:00 AM Backport #36132 (Resolved): mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on ...
- https://github.com/ceph/ceph/pull/24260
- 11:00 AM Backport #36131 (Resolved): luminous: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" ...
- https://github.com/ceph/ceph/pull/24259
- 08:50 AM Bug #24373 (Resolved): osd: eternal stuck PG in 'unfound_recovery'
- 08:50 AM Backport #24501 (Resolved): luminous: osd: eternal stuck PG in 'unfound_recovery'
09/23/2018
- 03:27 PM Support #36115: After Mimic upgrade OSD's stuck at booting.
- My main kernel is Linux 4.14.70-1-lts. I also tried 4.18.8-arch1-1-ARCH; nothing changed.
I'm sure this problem re...
- 03:12 PM Support #36115: After Mimic upgrade OSD's stuck at booting.
- IPERF test between 2 nodes: https://paste.ubuntu.com/p/7rRYSSqtyh/
I don't think this is related to network or firew...
- 02:49 PM Support #36115 (New): After Mimic upgrade OSD's stuck at booting.
- After the Luminous to Mimic upgrade, when I try to start an OSD it gets stuck at "booting". (I edit the hostnames so do n...
09/22/2018
- 03:48 PM Bug #36113 (New): fusestore test umount failed?
- ...
- 03:46 PM Bug #21143: bad RESETSESSION between OSDs?
- /a/sage-2018-09-22_02:47:58-rados-master-distro-basic-smithi/3053124
Seeing more of this!
- 03:42 PM Bug #26972 (Fix Under Review): cluster [ERR] Error -2 reading object
- https://github.com/ceph/ceph/pull/24225
- 03:30 PM Bug #24866 (Resolved): FAILED assert(0 == "past_interval start interval mismatch") in check_past_...
- resolved by https://github.com/ceph/ceph/pull/24064
09/21/2018
- 11:28 PM Bug #35974: Apparent export-diff/import-diff corruption
- I did a deep analysis of the export-diff files created with read-balance set to true and false. When read-balance is...
- 11:24 PM Bug #22329 (Need More Info): mon: Valgrind: mon (Leak_DefinitelyLost, Leak_IndirectlyLost)
- Neha Ojha wrote:
> Patrick, which set of logs have the (Leak_DefinitelyLost, Leak_IndirectlyLost) errors?
Old log...
- 09:43 PM Bug #22329: mon: Valgrind: mon (Leak_DefinitelyLost, Leak_IndirectlyLost)
- Patrick, which set of logs have the (Leak_DefinitelyLost, Leak_IndirectlyLost) errors?
- 09:55 PM Bug #36073 (Resolved): failed to recover before timeout expired -- premerge+peered PGs?
- 09:54 PM Bug #35810 (Need More Info): FAILED assert(entries.begin()->version > info.last_update)
- Hi Chang. Can you reproduce this bug with higher level of debugging? It is hard to find out what's happening from the...
- 09:31 PM Bug #22330: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 09:27 PM Bug #24866 (Need More Info): FAILED assert(0 == "past_interval start interval mismatch") in check...
- 01:22 PM Bug #35955: ceph-objectstore-tool past_intervals broken
- This is fixed for nautilus since the behavior totally changed with https://github.com/ceph/ceph/pull/23985. The prob...
- 01:22 PM Bug #35955 (Resolved): ceph-objectstore-tool past_intervals broken
- 04:38 AM Bug #23828: ec gen object leaks into different filestore collection just after split
- ...
- 03:50 AM Backport #35854 (In Progress): luminous: should remove mentioning of "scrubq" in ceph(8) manpage
- https://github.com/ceph/ceph/pull/24211
- 03:47 AM Backport #35855 (In Progress): mimic: should remove mentioning of "scrubq" in ceph(8) manpage
- https://github.com/ceph/ceph/pull/24210
09/20/2018
- 10:56 PM Bug #36105: OSD hangs during shutdown
- Yes, the kill_daemons failed because after 6 minutes several terminated OSDs still hadn't finished shutting down. I ...
- 10:20 PM Bug #36105: OSD hangs during shutdown
- 10:19 PM Bug #36105 (Resolved): OSD hangs during shutdown
- ...
- 10:30 PM Bug #25153 (Pending Backport): output format is invalid of the crush tree json dumper
- 10:10 PM Backport #26992: luminous: discover_all_missing() not always called during activating
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/23817
Merged.
- 09:17 PM Bug #35974: Apparent export-diff/import-diff corruption
- @Jason
For the logs sent previously, I performed export-diff between snapshot1 and snapshot2. When I did an rbd exp...
- 07:23 PM Subtask #36091 (In Progress): [rbd top] collect client perf stats when query is enabled
- 01:29 PM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
- /a/sage-2018-09-19_21:52:06-rados-wip-sage2-testing-2018-09-19-1236-distro-basic-smithi/3044506
- 12:29 PM Bug #36096 (Need More Info): osd: crashing with errors: "write_log_and_missing with: dirty_to" an...
- Our Ceph cluster running Mimic is crashing OSD nodes with the following error in the log:...
- 02:20 AM Backport #35844 (In Progress): luminous: objecter cannot resend split-dropped op when racing with...
- https://github.com/ceph/ceph/pull/24188
09/19/2018
- 10:46 PM Subtask #36091 (Resolved): [rbd top] collect client perf stats when query is enabled
- The OSD's 'collect_perf_metrics' MgrClient callback should record whether or not the query is enabled/disabled and ma...
- 03:13 PM Bug #35974: Apparent export-diff/import-diff corruption
- @Patrick: Interesting find. If it truly is related to just that option, we will have to get a RADOS core team member ...
- 01:27 PM Bug #21143: bad RESETSESSION between OSDs?
- ...
- 12:35 PM Backport #35836 (In Progress): mimic: mon: mgr options not parse propertly
- https://github.com/ceph/ceph/pull/24176
- 09:56 AM Bug #35969 (Pending Backport): "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on cent...
09/18/2018
- 10:24 PM Bug #35682 (Resolved): 34164d55c839acd35bbb1be5279e3e23e3bec1fd broke the librados examples
- 07:56 PM Bug #36040: mon: Valgrind: mon (InvalidFree, InvalidWrite, InvalidRead)
- Also in Mimic: /ceph/teuthology-archive/yuriw-2018-09-13_19:40:54-fs-mimic-distro-basic-smithi/3018437/remote/smithi0...
- 05:39 PM Feature #24176: osd: add command to drop OSD cache
- Mohamad, any update on this?
- 04:20 PM Bug #36073 (In Progress): failed to recover before timeout expired -- premerge+peered PGs?
- https://github.com/ceph/ceph/pull/24064
https://github.com/ceph/ceph/pull/23985
- 03:24 PM Bug #36073 (Resolved): failed to recover before timeout expired -- premerge+peered PGs?
- Appeared between 93748a325cd8 ("Merge pull request #23944 from ceph/wip-s3a-update-mirror") and 5a3344f0e52c ("Merge ...
- 03:38 PM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
- /a/kchai-2018-09-18_07:16:16-rados-wip-kefu2-testing-2018-09-18-1224-distro-basic-smithi/3037527
- 03:11 PM Bug #22330: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- Running the multimds:basic suite with --filter 'clusters/9-mds.yaml conf/{client.yaml mds.yaml mon.yaml osd.yaml} inl...
- 03:09 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- Running the multimds:basic suite with --filter 'clusters/9-mds.yaml conf/{client.yaml mds.yaml mon.yaml osd.yaml} inl...
- 01:08 AM Bug #35849 (Closed): mimic: test_envlibrados_for_rocksdb.sh: build failed with error: #endif with...
- sure.
- 12:28 AM Bug #35849: mimic: test_envlibrados_for_rocksdb.sh: build failed with error: #endif without #if
- Ah, I see what happened. The github.com/facebook/rocksdb/ was broken on the day these tasks failed. See https://githu...
09/17/2018
- 08:55 PM Bug #22329: mon: Valgrind: mon (Leak_DefinitelyLost, Leak_IndirectlyLost)
- See also #36040
- 08:38 PM Bug #22329: mon: Valgrind: mon (Leak_DefinitelyLost, Leak_IndirectlyLost)
- Still not seeing anything in RADOS runs AFAIK, but I did notice there might be some disparity in coverage....
>13:...
- 07:35 PM Bug #22329: mon: Valgrind: mon (Leak_DefinitelyLost, Leak_IndirectlyLost)
- -/ceph/teuthology-archive/pdonnell-2018-09-13_04:59:57-multimds-wip-pdonnell-testing-20180913.024004-distro-basic-smi...
- 08:54 PM Bug #36040 (New): mon: Valgrind: mon (InvalidFree, InvalidWrite, InvalidRead)
- From: /ceph/teuthology-archive/pdonnell-2018-09-13_04:59:57-multimds-wip-pdonnell-testing-20180913.024004-distro-basi...
- 02:24 PM Bug #35849: mimic: test_envlibrados_for_rocksdb.sh: build failed with error: #endif without #if
- Hey Brad,
I had reproduced it here: http://pulpito.ceph.com/nojha-2018-09-07_17:42:05-rados:singleton-mimic-distro...
- 10:28 AM Bug #35849: mimic: test_envlibrados_for_rocksdb.sh: build failed with error: #endif without #if
- Hey Neha,
Can you reproduce this?
I tried mimicking the job in a Bionic container and it builds correctly. I al...
- 07:53 AM Bug #35923 (Resolved): "ceph_assert(values.size() == 2)" in PG::peek_map_epoch()
- 06:26 AM Bug #35969 (Fix Under Review): "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on cent...
- 06:25 AM Bug #35969 (In Progress): "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
- https://github.com/ceph/ceph/pull/24124
As suggested by Brad, we can just bump the BuildRequires of gperftools.
- 05:42 AM Bug #24835: osd daemon spontaneous segfault
- Soenke,
Could you upload a coredump for each of the different backtraces as well as details of your environment (t...