Bug #12811 (closed): ENXIO from kernel client (cephfs)

Added by Sage Weil over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Immediate
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

This is on the lab cluster. It has happened twice in 2 days.

In each case, the requests were sent to the second acting OSD instead of the first.

Teuthology kernel is v4.1-rc1-1-gb9ddddf.

The first time, we mounted the same thing on flab (3.16) and the same files were accessible.

The second time, flab was still mounted and it also couldn't access the same file. This suggests it may be triggered by the stream of osdmap incrementals?

Actions #1

Updated by Sage Weil over 8 years ago

During this time the apama OSDs were flapping and we were rebalancing lots of data... either of those may be the trigger?

Actions #2

Updated by Zheng Yan over 8 years ago

I compared the osdmaps of the good and bad mounts and found that the 'exists' flag is missing in the bad mount's osdmap.

root@mira049:~# ceph osd map data 1000b1dd68e.00000000
osdmap e600761 pool 'data' (0) object '1000b1dd68e.00000000' -> pg 0.5c0c9f2c (0.f2c) -> up ([4,24,127], p4) acting ([4,24,127], p4)
root@teuthology:/tmp# diff osdmap.good osdmap.bad -u
--- osdmap.good    2015-08-28 01:48:23.219372417 -0700
+++ osdmap.bad    2015-08-28 01:49:00.679826791 -0700
@@ -1,4 +1,4 @@
-epoch 600790
+epoch 600800
 flags
 pool 0 pg_num 4096 (4095) read_tier -1 write_tier -1
 pool 1 pg_num 64 (63) read_tier -1 write_tier -1
@@ -8,7 +8,7 @@
 osd1    10.214.133.104:6804    100%    (exists, up)    100%
 osd2    10.214.134.140:6807    100%    (exists, up)    100%
 osd3    10.214.134.140:6813    100%    (exists, up)    100%
-osd4    10.214.133.104:6808     90%    (exists, up)    100%
+osd4    10.214.133.104:6808     90%    (up)    100%
 osd5    10.214.133.104:6812    100%    (exists, up)    100%
 osd6    10.214.133.104:6801    100%    (exists, up)    100%
 osd7    10.214.133.114:6804    100%    (exists, up)    100%
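
For reference, the state column in these dumps is a bitmask rendered as flag names, so "(exists, up)" versus "(up)" is the difference between having both CEPH_OSD_EXISTS and CEPH_OSD_UP set and having only CEPH_OSD_UP. A minimal standalone C sketch of that rendering (not kernel code; it only assumes the usual flag values from rados.h):

#include <stdio.h>

/* Flag values as defined in include/linux/ceph/rados.h. */
#define CEPH_OSD_EXISTS  (1 << 0)
#define CEPH_OSD_UP      (1 << 1)

/* Render an osd_state bitmask roughly the way the osdmap dump does. */
static void print_state(const char *label, unsigned int state)
{
        printf("%s\t(%s%s%s)\n", label,
               (state & CEPH_OSD_EXISTS) ? "exists" : "",
               ((state & CEPH_OSD_EXISTS) && (state & CEPH_OSD_UP)) ? ", " : "",
               (state & CEPH_OSD_UP) ? "up" : "");
}

int main(void)
{
        print_state("osd4 good mount", CEPH_OSD_EXISTS | CEPH_OSD_UP);  /* prints (exists, up) */
        print_state("osd4 bad mount", CEPH_OSD_UP);                     /* prints (up) */
        return 0;
}

The dumps themselves look like the kernel client's debugfs osdmap output (/sys/kernel/debug/ceph/<fsid>.client<id>/osdmap), which is a handy way to compare a kernel client's view of the map against what the monitors are serving.
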
Actions #3

Updated by Zheng Yan over 8 years ago

  • Status changed from New to Fix Under Review

https://github.com/ceph/ceph-client/commit/4a18ede97ba5029587b362446360cfd42379f9d0

diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
index 4a31258..7d8f581 100644
--- a/net/ceph/osdmap.c
+++ b/net/ceph/osdmap.c
@@ -1300,7 +1300,7 @@ struct ceph_osdmap *osdmap_apply_incremental(void **p, void *end,
                ceph_decode_addr(&addr);
                pr_info("osd%d up\n", osd);
                BUG_ON(osd >= map->max_osd);
-               map->osd_state[osd] |= CEPH_OSD_UP;
+               map->osd_state[osd] |= CEPH_OSD_UP | CEPH_OSD_EXISTS;
                map->osd_addr[osd] = addr;
        }
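
To make the effect of the one-liner concrete, here is a simplified userspace model (not the kernel code): the incremental new_up decode used to set only the up bit, while a full-map decode carries the complete osd_state including the exists bit, and as far as I can tell the 4.x client only treats an OSD as usable for mapping when both bits are set.

#include <stdio.h>

#define CEPH_OSD_EXISTS  (1 << 0)
#define CEPH_OSD_UP      (1 << 1)

/* Incremental new_up handling before the patch: only the up bit. */
static unsigned int new_up_before(unsigned int state)
{
        return state | CEPH_OSD_UP;
}

/* After the patch: matches what a full-map decode would record. */
static unsigned int new_up_after(unsigned int state)
{
        return state | CEPH_OSD_UP | CEPH_OSD_EXISTS;
}

/* Hypothetical stand-in for the client's is-up check, assuming it
 * requires both exists and up before keeping an OSD in the up set. */
static int osd_usable(unsigned int state)
{
        return (state & CEPH_OSD_EXISTS) && (state & CEPH_OSD_UP);
}

int main(void)
{
        unsigned int fresh = 0;  /* no prior state bits for this osd on the client */

        printf("before patch: usable=%d\n", osd_usable(new_up_before(fresh)));  /* 0 */
        printf("after patch:  usable=%d\n", osd_usable(new_up_after(fresh)));   /* 1 */
        return 0;
}

If a PG's primary ends up in that "up but not exists" state, the client would drop it while computing the up/acting set and send requests to the next OSD instead, which would explain the requests going to the second acting OSD above.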

Actions #4

Updated by Sage Weil over 8 years ago

  • Status changed from Fix Under Review to Resolved

Fix deployed to teuthology, so far so good! Reviewed-by:
