Bug #5216
restarted or failed osd resulted in a lot of caller_ops.size error messages and stalled I/O
Status: Closed
Description
I'm running upstream/cuttlefish 85ad65e294f2b3d4bd1cfef6ae613e31d1cea635
I've seen the following today while just restarting a single osd.
/etc/init.d/ceph restart osd.3
ceph -w started to show the following immediately:
2013-05-31 13:42:44.342142 osd.54 [ERR] 4.97d caller_ops.size 3002 > log size 3001
2013-05-31 13:42:44.362058 osd.11 [ERR] 4.d15 caller_ops.size 3002 > log size 3001
2013-05-31 13:42:44.365393 osd.11 [ERR] 4.af4 caller_ops.size 3002 > log size 3001
2013-05-31 13:42:44.381021 osd.11 [ERR] 4.bc9 caller_ops.size 3002 > log size 3001
2013-05-31 13:42:44.432011 osd.54 [ERR] 4.651 caller_ops.size 3002 > log size 3001
2013-05-31 13:42:44.451333 osd.54 [ERR] 4.332 caller_ops.size 3002 > log size 3001
2013-05-31 13:42:44.458137 osd.11 [ERR] 4.b87 caller_ops.size 3002 > log size 3001
2013-05-31 13:42:44.552218 osd.54 [ERR] 4.9ea caller_ops.size 3002 > log size 3001
2013-05-31 13:42:44.573992 osd.54 [ERR] 4.d56 caller_ops.size 3002 > log size 3001
2013-05-31 13:42:44.576582 osd.11 [ERR] 4.8ce caller_ops.size 3002 > log size 3001
...
And then after a few seconds I saw a lot of messages like these:
2013-05-31 13:43:40.732750 osd.3 [WRN] slow request 45.152737 seconds old, received at 2013-05-31 13:42:55.579934: osd_op(client.9439559.0:3387070 rbd_data.5039256b8b4567.000000000000290f [write 2043904~8192] 4.f4a3cae1 RETRY=1 snapc 4da2=[] e90858) v4 currently reached pg
2013-05-31 13:43:40.732752 osd.3 [WRN] slow request 45.152716 seconds old, received at 2013-05-31 13:42:55.579955: osd_op(client.9439559.0:3387071 rbd_data.5039256b8b4567.000000000000290f [write 2568192~12288] 4.f4a3cae1 RETRY=1 snapc 4da2=[] e90858) v4 currently reached pg
Then my VMs (Qemu with rbd) started to hang.
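For context on the error message: caller_ops is the PG log's per-request index. It maps a client request id (reqid) to the log entry that request produced, so that replayed duplicate requests can be detected. Since each log entry contributes at most one index entry, caller_ops.size() should never exceed log.size(); the OSD reports exactly this error when that invariant is violated. A minimal sketch of the structure (simplified stand-in types, not Ceph's actual code):

#include <list>
#include <map>
#include <string>

// Simplified sketch of the PG log index (stand-in types, not Ceph's code).
struct Entry {
  std::string reqid;  // client.X.0:N in the real log; empty if not client-generated
  // ... object id, version, op type, etc.
};

struct IndexedLog {
  std::list<Entry> log;                      // the PG log itself
  std::map<std::string, Entry*> caller_ops;  // reqid -> log entry, for dup detection

  // The invariant behind the error message: every caller_ops entry points
  // at some log entry, so the index can never outgrow the log.
  bool consistent() const {
    return caller_ops.size() <= log.size();
  }
};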
Updated by Stefan Priebe almost 11 years ago
Then the whole ceph storage became unstable until the osd was up and running again and had recovered.
Updated by Stefan Priebe almost 11 years ago
To me it seems that the osd sets itself online / available before it is really ready, which then results in slow I/O.
Starting an OSD results in 2-3 minutes of 200% CPU and a lot of I/O, but it sets itself online / available right after starting.
Updated by Stefan Priebe almost 11 years ago
So I get the caller_ops.size 3002 > log size 3001 messages while the osd is offline, and the slow request messages when it comes back online.
Updated by Samuel Just almost 11 years ago
- Status changed from New to Pending Backport
- Assignee set to Samuel Just
Pushed a fix to master; it needs a backport to cuttlefish.
Note: this probably did not cause the I/O hang.
Updated by Stefan Priebe almost 11 years ago
This is the backport:
commit 2af3f1d40b9c64f58d1a05232c52b2a47426fef5
Author: Samuel Just <sam.just@inktank.com>
Date: Fri May 31 11:08:47 2013 -0700
PGLog: only add entry to caller_ops in add() if reqid_is_indexed()
Fixes: #5216
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e6ad9da03a17e9bfa12386ecaf0a2ab257327da6)
Conflicts:
src/osd/PGLog.h
diff --git a/src/osd/PG.h b/src/osd/PG.h
index fe89879..015fc87 100644
--- a/src/osd/PG.h
+++ b/src/osd/PG.h
@@ -279,7 +279,8 @@ public:
 
       // to our index
       objects[e.soid] = &(log.back());
-      caller_ops[e.reqid] = &(log.back());
+      if (e.reqid_is_indexed())
+        caller_ops[e.reqid] = &(log.back());
     }
 
     void trim(ObjectStore::Transaction &t, hobject_t& oid, eversion_t s);
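Applied to the sketch earlier in this ticket, the change amounts to guarding the index insert. Again a simplified illustration, not the real code: reqid_is_indexed() is the actual predicate named in the patch, mocked here as "the entry carries a client reqid at all".

// Continuing the earlier sketch: add() with the backported guard.
// Assumption (mock): internally generated entries carry a blank reqid
// and are not worth indexing for duplicate-request detection.
bool reqid_is_indexed(const Entry &e) {
  return !e.reqid.empty();
}

void add(IndexedLog &l, const Entry &e) {
  l.log.push_back(e);
  // std::list never invalidates pointers to existing elements, so it is
  // safe to index the address of log.back().
  if (reqid_is_indexed(l.log.back()))
    l.caller_ops[l.log.back().reqid] = &l.log.back();
}

With the guard in place, entries that do not represent indexable client requests stay out of caller_ops, which matches the commit's one-line description.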
Updated by Stefan Priebe almost 11 years ago
The stalled I/O seems to come from the freshly started OSDs. They seem to tell Ceph "hey, I can handle I/O" while they are still in their start phase, scanning their osd folder or flushing the journal or whatever...
Updated by Samuel Just almost 11 years ago
- Status changed from Pending Backport to Resolved