Project

General

Profile

Actions

Bug #2765

closed

Extremely Long Object Latencies

Added by Mark Nelson almost 12 years ago. Updated over 11 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Saw this during testing on one of the aging clusters.

example object tid: 312322

Full logs available here:

http://nhm.ceph.com/object_latency

seekwatcher movie:

http://nhm.ceph.com/object_latency/osd.19.mpg

Synopsis:

-- Client Objecter Log --

2012-07-09 17:08:52.801370 7ff5a284e780 10 client.4154.objecter recalc_op_target tid 312322 pgid 6.4cdeab85 acting [19]
2012-07-09 17:08:52.801380 7ff5a284e780 10 client.4154.objecter op_submit oid plana83_12703_object312321 @6 [write 0~131072] tid 312322 osd.19
2012-07-09 17:08:52.801387 7ff5a284e780 15 client.4154.objecter send_op 312322 to osd.19

-- OSD Log --

ubuntu@burnupi14:/var/log/ceph$ less osd.19.log | grep 312322
2012-07-09 20:09:04.587859 7f734e463700 1 -- 10.214.147.24:6803/22182 <== client.4154 10.214.146.33:0/1012703 7261 ==== osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4 ==== 165+0+131072 (851583332 0 2079310498) 0x4f73240 con 0x3f49dc0
2012-07-09 20:09:04.587866 7f734e463700 20 osd.19 14 _dispatch 0x4f73240 osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4
2012-07-09 20:09:04.587879 7f734e463700 15 osd.19 14 pg[6.b85( v 14'73 (0'0,14'73] n=73 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 mlcod 14'73 active+clean] enqueue_op 0x4f73240 osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4
2012-07-09 20:09:04.671631 7f734b25b700 10 osd.19 14 dequeue_op osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4 pg pg[6.b85( v 14'79 (0'0,14'79] n=79 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'73 lcod 14'73 mlcod 14'73 active+clean]
2012-07-09 20:09:04.671640 7f734b25b700 10 osd.19 14 pg[6.b85( v 14'79 (0'0,14'79] n=79 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'73 lcod 14'73 mlcod 14'73 active+clean] do_op osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4 may_write
2012-07-09 20:09:04.671843 7f734b25b700 10 osd.19 14 pg[6.b85( v 14'79 (0'0,14'79] n=80 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'73 lcod 14'73 mlcod 14'73 active+clean] append_log log(0'0,14'79] [14'80 (0'0) modify 4cdeab85/plana83_12703_object312321/head//6 by client.4154.0:312322 2012-07-09 20:08:52.801359]
2012-07-09 20:09:04.671861 7f734b25b700 10 osd.19 14 pg[6.b85( v 14'80 (0'0,14'80] n=80 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'73 lcod 14'73 mlcod 14'73 active+clean] add_log_entry 14'80 (0'0) modify 4cdeab85/plana83_12703_object312321/head//6 by client.4154.0:312322 2012-07-09 20:08:52.801359
2012-07-09 20:09:04.671911 7f734b25b700 10 osd.19 14 pg[6.b85( v 14'80 (0'0,14'80] n=80 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'73 lcod 14'73 mlcod 14'73 active+clean] new_repop rep_tid 7256 on osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4
2012-07-09 20:09:04.671939 7f734b25b700 10 osd.19 14 pg[6.b85( v 14'80 (0'0,14'80] n=80 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'73 lcod 14'73 mlcod 14'73 active+clean] eval_repop repgather(0x3c34480 14'80 rep_tid=7256 wfack=19 wfdisk=19 op=osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4) wants=ad
2012-07-09 20:09:04.671948 7f734b25b700 10 osd.19 14 pg[6.b85( v 14'80 (0'0,14'80] n=80 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'73 lcod 14'73 mlcod 14'73 active+clean] apply_repop applying update on repgather(0x3c34480 14'80 rep_tid=7256 wfack=19 wfdisk=19 op=osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4)
2012-07-09 20:09:29.723118 7f7351c6a700 10 osd.19 14 pg[6.b85( v 14'87 (0'0,14'87] n=87 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'79 lcod 14'79 mlcod 14'79 active+clean] op_commit repgather(0x3c34480 applying 14'80 rep_tid=7256 wfack=19 wfdisk=19 op=osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4)
2012-07-09 20:09:29.723143 7f7351c6a700 10 osd.19 14 pg[6.b85( v 14'87 (0'0,14'87] n=87 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'80 lcod 14'80 mlcod 14'79 active+clean] eval_repop repgather(0x3c34480 applying 14'80 rep_tid=7256 wfack= wfdisk= op=osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4) wants=ad
2012-07-09 20:09:29.723161 7f7351c6a700 15 osd.19 14 pg[6.b85( v 14'87 (0'0,14'87] n=87 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'80 lcod 14'80 mlcod 14'79 active+clean] log_op_stats osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4 inb 131209 outb 0 rlat 0.000000 lat 36.922872
2012-07-09 20:09:29.723186 7f7351c6a700 10 osd.19 14 pg[6.b85( v 14'87 (0'0,14'87] n=87 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'80 lcod 14'80 mlcod 14'79 active+clean] sending commit on repgather(0x3c34480 applying 14'80 rep_tid=7256 wfack= wfdisk= op=osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4) 0x3f08e00
2012-07-09 20:09:29.723198 7f7351c6a700 1 -- 10.214.147.24:6803/22182 --> 10.214.146.33:0/1012703 -- osd_op_reply(312322 plana83_12703_object312321 [write 0~131072] ondisk = 0) v4 -- ?+0 0x3f08e00 con 0x3f49dc0
2012-07-09 20:09:29.723213 7f7351c6a700 10 osd.19 14 pg[6.b85( v 14'87 (0'0,14'87] n=87 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'80 lcod 14'80 mlcod 14'80 active+clean] removing repgather(0x3c34480 applying 14'80 rep_tid=7256 wfack= wfdisk= op=osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4)
2012-07-09 20:09:29.723223 7f7351c6a700 20 osd.19 14 pg[6.b85( v 14'87 (0'0,14'87] n=87 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'80 lcod 14'80 mlcod 14'80 active+clean] q front is repgather(0x3c34480 applying 14'80 rep_tid=7256 wfack= wfdisk= op=osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4)
2012-07-09 20:09:29.794436 7f734fc66700 10 osd.19 14 pg[6.b85( v 14'89 (0'0,14'89] n=89 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'87 lcod 14'87 mlcod 14'87 active+clean] op_applied repgather(0x3c34480 applying 14'80 rep_tid=7256 wfack= wfdisk= op=osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4)
2012-07-09 20:09:29.794491 7f734fc66700 10 osd.19 14 pg[6.b85( v 14'89 (0'0,14'89] n=89 ec=5 les/c 8/8 7/7/5) [19] r=0 lpr=7 luod=14'87 lcod 14'87 mlcod 14'87 active+clean] eval_repop repgather(0x3c34480 applied 14'80 rep_tid=7256 wfack= wfdisk= op=osd_op(client.4154.0:312322 plana83_12703_object312321 [write 0~131072] 6.4cdeab85) v4) wants=ad DONE

-- blkparse snippet --

8,65   5        0   151.501289729     0  m   N cfq20379 dispatched a request
8,65 5 0 151.501290084 0 m N cfq20379 activate rq, drv=29
8,65 5 14679 151.501290188 20379 D WS 81477474 + 8 [btrfs-submit-1]
8,64 5 14680 151.501983044 20379 A WS 81525242 + 256 <- (8,65) 81525208
8,65 5 14681 151.501983232 20379 Q WS 81525242 + 256 [btrfs-submit-1]
8,65 5 14682 151.501984806 20379 G WS 81525242 + 256 [btrfs-submit-1]
8,65 5 14683 151.501985311 20379 P N [btrfs-submit-1]
8,65 5 14684 151.501986194 20379 I WS 81525242 + 256 [btrfs-submit-1]
8,65 5 0 151.501986772 0 m N cfq20379 insert_request
8,65 5 14685 151.501987147 20379 U N [btrfs-submit-1] 1
8,65 5 0 151.501987612 0 m N cfq20379 Not idling. st->count:1
8,65 5 0 151.501987901 0 m N cfq20379 dispatch_insert
8,65 5 0 151.501988319 0 m N cfq20379 dispatched a request
8,65 5 0 151.501988629 0 m N cfq20379 activate rq, drv=30
8,65 5 14686 151.501988749 20379 D WS 81525242 + 256 [btrfs-submit-1]
8,65 0 24853 166.084614182 0 C WS 81070322 + 8 [0]
8,65 0 0 166.084623814 0 m N cfq20379 complete rqnoidle 1
8,65 0 0 166.084624886 0 m N cfq20379 will busy wait
8,65 0 0 166.084625619 0 m N cfq20379 Not idling. st->count:1
8,65 0 24854 166.084631518 0 C WS 81521610 + 256 [0]
8,65 0 0 166.084633824 0 m N cfq20379 complete rqnoidle 1
8,65 0 0 166.084634138 0 m N cfq20379 will busy wait
8,65 0 0 166.084634513 0 m N cfq20379 Not idling. st->count:1
8,65 0 24855 166.084636948 0 C WS 80540626 + 16 [0]
8,65 0 0 166.084646311 0 m N cfq20379 complete rqnoidle 1
8,65 0 0 166.084646552 0 m N cfq20379 will busy wait
8,65 0 0 166.084646915 0 m N cfq20379 Not idling. st->count:1
Actions #1

Updated by Samuel Just almost 12 years ago

I noticed one 15 second lag on an OSD caused by the filestore requiring 15 seconds to drain open ops before a sync.

Actions #2

Updated by Sage Weil over 11 years ago

  • Status changed from New to Closed
Actions

Also available in: Atom PDF