Bug #17468

CephFS: IO pauses for more than 40 seconds while running write-intensive IOs

Added by Parikshith B over 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: High
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS): kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

CephFS environment:
1 MDS node
1 CephFS filesystem, mounted with the kernel client on Ubuntu 14.04 LTS (kernel 4.4.0-36-generic)

Test Steps:

1. Installed a cluster with 1 MDS and created one CephFS filesystem.
2. Extended the root CephFS across 3 clients by creating three different sub-directories and mounting one of them on each client (see the setup sketch after the test steps).
3. Created 1000 files of 5 GB each using vdbench, with 256 threads, xfersize=4k, fileio=random, fileselect=seq.
4. Ran a write-intensive workload over these files with the following profile:

fsd=fsd1,anchor=/home/ems/cephfs_mnt,depth=1,width=1,files=1000,size=5G,openflags=o_sync
fwd=fwd2,fsd=fsd*,rdpct=0,xfersize=4k,fileio=random,threads=256,fileselect=seq
rd=rd2,fwd=fwd2,elapsed=300,interval=1,fwdrate=max,format=no

5. Writes pause after a while for slightly less than a minute. This stall recurs many times over a long-duration run.

Timest Inter ReqstdOps mb/sec xfer
01:33.1 31 64192 250.7 4096
01:34.1 32 61601 240.6 4096
01:35.1 33 64745 252.9 4096
01:36.1 34 64670 252.6 4096
01:37.1 35 62773 245.2 4096
01:38.1 36 52801 206.2 4096
01:39.1 37 60966 238.1 4096
01:40.0 38 63795 249.2 4096
01:41.0 39 60914 237.9 4096
01:42.1 40 63908 249.6 4096
01:43.0 41 64171 250.6 4096
01:44.1 42 64085 250.3 4096
01:45.0 43 57086 222.9 4096
01:46.0 44 68964 269.3 4096
01:47.0 45 62038 242.3 4096
01:48.0 46 63627 248.5 4096
01:49.0 47 61936 241.9 4096
01:50.0 48 66454 259.5 4096
01:51.0 49 62026 242.2 4096
01:52.0 50 58361 227.9 4096
01:53.1 51 66363 259.2 4096
01:54.0 52 64689 252.7 4096
01:55.1 53 67096 262.1 4096
01:56.0 54 64224 250.8 4096
01:57.0 55 64018 250 4096
01:58.1 56 61333 239.5 4096
01:59.0 57 65359 255.3 4096
02:00.0 58 52828 206.3 4096
02:01.0 59 63845 249.3 4096
02:02.0 60 62656 244.7 4096

Timest Inter ReqstdOps mb/sec xfer
02:03.0 61 65246 254.8 4096
02:04.0 62 61839 241.5 4096
02:05.0 63 63338 247.4 4096
02:06.0 64 57838 225.9 4096
02:07.0 65 61039 238.4 4096
02:08.0 66 61246 239.2 4096
02:09.0 67 63624 248.5 4096
02:10.0 68 64887 253.4 4096
02:11.0 69 59726 233.3 4096
02:12.0 70 65632 256.3 4096
02:13.0 71 2714 10.6 4096
02:14.0 72 0 0 0
02:15.0 73 0 0 0
02:16.0 74 0 0 0
02:17.0 75 0 0 0
02:18.0 76 0 0 0
02:19.0 77 0 0 0
02:20.0 78 0 0 0
02:21.0 79 0 0 0
02:22.0 80 0 0 0
02:23.0 81 0 0 0
02:24.0 82 0 0 0
02:25.0 83 0 0 0
02:26.0 84 0 0 0
02:27.0 85 0 0 0
02:28.0 86 0 0 0
02:29.0 87 0 0 0
02:30.0 88 0 0 0
02:31.0 89 0 0 0
02:32.0 90 0 0 0

Timest Inter ReqstdOps mb/sec xfer
02:33.0 91 0 0 0
02:34.0 92 0 0 0
02:35.0 93 0 0 0
02:36.0 94 0 0 0
02:37.0 95 0 0 0
02:38.0 96 0 0 0
02:39.0 97 0 0 0
02:40.0 98 0 0 0
02:41.0 99 0 0 0
02:42.0 100 0 0 0
02:43.0 101 0 0 0
02:44.0 102 0 0 0
02:45.0 103 0 0 0
02:46.0 104 0 0 0
02:47.0 105 0 0 0
02:48.0 106 0 0 0
02:49.0 107 0 0 0
02:50.0 108 0 0 0
02:51.0 109 0 0 0
02:52.0 110 0 0 0
02:53.0 111 0 0 0
02:54.0 112 0 0 0
02:55.0 113 0 0 0
02:56.0 114 0 0 0
02:57.0 115 0 0 0
02:58.0 116 0 0 0
02:59.0 117 57906 226.2 4096
03:00.0 118 64699 252.7 4096
03:01.0 119 63075 246.3 4096
03:02.0 120 63674 248.7 4096

Timest Inter ReqstdOps mb/sec xfer
03:03.1 121 63221 246.9 4096
03:04.0 122 55609 217.2 4096
03:05.0 123 60163 235 4096
03:06.0 124 61651 240.8 4096
03:07.0 125 63223 246.9 4096
03:08.0 126 65191 254.6 4096
03:09.0 127 60244 235.3 4096
03:10.0 128 66336 259.1 4096
03:11.0 129 64951 253.7 4096
03:12.0 130 61045 238.4 4096
03:13.1 131 65566 256.1 4096
03:14.0 132 63936 249.7 4096

6. MDS logs were clean.
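
For reference, a minimal sketch of the per-client setup and vdbench invocation used in steps 2-4; the sub-directory name, secret file path, and profile file name are assumptions for illustration, with the monitor address taken from the ceph -s output later in this report.

  # Each client mounts its own sub-directory of the CephFS root with the
  # kernel client (hypothetical sub-directory and credentials).
  mkdir -p /home/ems/cephfs_mnt
  mount -t ceph 10.10.10.101:6789:/client1 /home/ems/cephfs_mnt \
      -o name=admin,secretfile=/etc/ceph/admin.secret

  # Run the write-intensive profile from step 4, saved here as cephfs_write.vdb.
  ./vdbench -f cephfs_write.vdb -o /tmp/vdbench_out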

Attachment: IO_logs.txt (63.8 KB), Parikshith B, 10/01/2016 06:02 AM

History

#1 Updated by Vishal Kanaujia over 7 years ago

I collected stack unwind traces of the MDS process while no I/O was being observed during write operations on CephFS, sampling the stack once per second (a sketch of the sampling loop follows at the end of this comment). No thread was waiting on a lock or hung. Other than beacon messages, the MDS log only emits periodic information about cap flushes.

2016-09-28 15:35:12.042635 7fb7a506a700 1 -- 10.242.50.40:6804/69787 <== client.4111 10.242.50.40:0/557006669 322 ==== client_caps(flush ino 1000000001e 32 seq 4 tid 9 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=pAsxXsxFsxcrwb follows 1 size 1073741824/0 mtime 2016-09-28 15:35:12.035619) ==== 176+0+0 (3366429683 0 0) 0x7fb7b66eb800 con 0x7fb7b66e7000
2016-09-28 15:35:12.042670 7fb7a506a700 7 mds.0.locker handle_client_caps on 1000000001e tid 9 follows 1 op flush

Although the vdbench client shows activity as paused, it is actually in CPU I/O wait, waiting for the cluster to acknowledge the issued I/Os. We observed pauses during the load phase, but did not observe any in the overwrite case.
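
A minimal sketch of the one-second stack sampling described above, assuming gdb is available on the MDS node and ceph-mds is the only MDS process running there:

  # Sample all ceph-mds thread stacks once per second into a log file.
  MDS_PID=$(pidof ceph-mds)
  while true; do
      date >> /tmp/mds_stacks.log
      gdb -batch -p "$MDS_PID" -ex 'thread apply all bt' >> /tmp/mds_stacks.log 2>&1
      sleep 1
  done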

#2 Updated by Vishal Kanaujia over 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Vishal Kanaujia

#3 Updated by Vishal Kanaujia over 7 years ago

1. The MDS runs with the default configuration.

2. On the MDS node, I ran the following commands (a polling sketch follows at the end of this comment):

  ceph --admin-daemon ceph-mds.iflab12.asok objecter_requests
  {
      "ops": [],
      "linger_ops": [],
      "pool_ops": [],
      "pool_stat_ops": [],
      "statfs_ops": [],
      "command_ops": []
  }

  ceph --admin-daemon ceph-mds.iflab12.asok dump_ops_in_flight
  {
      "ops": [],
      "num_ops": 0
  }

3. There is no disk activity on the MDS node, client node, or OSD nodes during the I/O drop.

4. MDS health is good:

  ceph -s
    cluster 355dbc43-a767-43c2-b0b4-9a702619153f
    health HEALTH_OK
    monmap e1: 1 mons at {iflab12=10.10.10.101:6789/0}
           election epoch 53, quorum 0 iflab12
    fsmap e19963: 1/1/1 up {0=iflab12=up:active}
    osdmap e200: 16 osds: 16 up, 16 in
           flags sortbitwise
    pgmap v480842: 2112 pgs, 3 pools, 34925 MB data, 8768 objects
           81605 MB used, 111 TB / 111 TB avail
           2112 active+clean
    client io 361 MB/s wr, 0 op/s rd, 2893 op/s wr

5. dmesg on the client and MDS nodes is clean.

6. The same vdbench test runs with consistent write I/O throughput on a local file system.
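
As mentioned in item 2, the admin-socket checks can be repeated during a stall window; a sketch, assuming the admin socket sits at its default path on the MDS node:

  # Poll the MDS admin socket once per second for 30 seconds during a stall.
  ASOK=/var/run/ceph/ceph-mds.iflab12.asok
  for i in $(seq 1 30); do
      echo "=== $(date) ==="
      ceph --admin-daemon "$ASOK" objecter_requests
      ceph --admin-daemon "$ASOK" dump_ops_in_flight
      sleep 1
  done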

#4 Updated by John Spray over 7 years ago

Vishal: are you still investigating this or do you need some input from others?

#5 Updated by Vishal Kanaujia over 7 years ago

John, there are two test cases:
1. Load files into the file system: we created 900k files of 1k size.
2. Run I/O tests such as random writes: we rewrite the 900k files of 1k size.

We were seeing I/O drop to zero in both tests.

During debugging, I found that our test network had packet drops on the OSD nodes. After fixing the packet drop problem, I/O stayed more consistent than before while loading files; there were still zero-I/O events, but only for a second or two.

For the rewrite tests, after creating the 900k files of 1k size, we rewrote these files in a random sequence (see the sketch below). We are still seeing I/O drop to zero, even after making sure there are no packet drops, and the zero-I/O period now lasts for more than 30 seconds.
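
For illustration, a minimal vdbench sketch of this rewrite case; only the file count (900k), file size (1k), and random file selection come from this comment, while the anchor path is reused from the original profile and xfersize and the thread count are placeholders:

  fsd=fsd1,anchor=/home/ems/cephfs_mnt,depth=1,width=1,files=900000,size=1k
  fwd=fwd1,fsd=fsd1,rdpct=0,xfersize=1k,fileio=random,threads=256,fileselect=random
  rd=rd1,fwd=fwd1,elapsed=300,interval=1,fwdrate=max,format=no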

#6 Updated by Vishal Kanaujia over 7 years ago

Further analysis of the I/O drop points to a correlation between the MDS config parameter mds_log_max_segments and the drop. With the default MDS configuration, the I/O drop occurs at a high frequency (every 5-10 seconds). As we increase mds_log_max_segments to higher values, the point at which I/O drops shifts later in the timeline. After setting mds_log_max_segments to -1, the I/O drop becomes less frequent; however, the MDS process size bloats because journal segments are never trimmed.

This suggests that the journal segment trimming frequency affects application I/O throughput in our test.
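
A sketch of how this parameter could be varied during the experiment; the value 128 is an arbitrary illustrative number, not a recommendation from this ticket:

  # Persistent setting in ceph.conf on the MDS node:
  #   [mds]
  #   mds log max segments = 128
  #
  # Runtime change without restarting the MDS (rank 0 assumed):
  ceph tell mds.0 injectargs '--mds_log_max_segments=128'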

#7 Updated by Vishal Kanaujia almost 7 years ago

  • Status changed from In Progress to Closed

Unable to consistently reproduce the bug.
