Project

General

Profile

Actions

Bug #12789

closed

Long tail latency due to journal aio io_submit takes long time to return?

Added by Zhi Zhang over 8 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

There are some (not much) long tail latency due to journal aio io_submit takes long time to return. We add some logs in the FileJouranl.cc and get following output as an example. AIO's io_submit function takes ~400ms to return, occasionally even takes up to 900ms.

------------------------------------------------------------
2015-08-25 15:24:39.535206 7f22699c2700 20 journal write_thread_entry woke up
2015-08-25 15:24:39.535212 7f22699c2700 10 journal room 10460594175 max_size 10484711424 pos 4786679808 header.start 4762566656 top 4096
2015-08-25 15:24:39.535215 7f22699c2700 10 journal check_for_full at 4786679808 : 532480 < 10460594175
2015-08-25 15:24:39.535216 7f22699c2700 15 journal prepare_single_write 1 will write 4786679808 : seq 13075224 len 526537 -> 532480 (head 40 pre_pad 4046 ebl 526537 post_pad 1817 tail 40) (ebl alignment 4086)
2015-08-25 15:24:39.535322 7f22699c2700 20 journal prepare_multi_write queue_pos now 4787212288
2015-08-25 15:24:39.535324 7f22699c2700 15 journal do_aio_write writing 4786679808~532480
2015-08-25 15:24:39.535434 7f22699c2700 10 journal align_bl total memcopy: 532480
2015-08-25 15:24:39.535439 7f22699c2700 20 journal write_aio_bl 4786679808~532480 seq 13075224
2015-08-25 15:24:39.535442 7f22699c2700 20 journal write_aio_bl .. 4786679808~532480 in 1
2015-08-25 15:24:39.535444 7f22699c2700 5 journal io_submit starting...

2015-08-25 15:24:39.990372 7f22699c2700 5 journal io_submit return value: 1
2015-08-25 15:24:39.990396 7f22699c2700 5 journal put_throttle finished 1 ops and 526537 bytes, now 12 ops and 6318444 bytes
2015-08-25 15:24:39.990401 7f22691c1700 20 journal write_finish_thread_entry waiting for aio(s)
2015-08-25 15:24:39.990406 7f22699c2700 20 journal write_thread_entry aio throttle: aio num 1 bytes 532480 ... exp 2 min_new 4 ... pending 6318444
2015-08-25 15:24:39.990410 7f22699c2700 10 journal room 10460061695 max_size 10484711424 pos 4787212288 header.start 4762566656 top 4096
2015-08-25 15:24:39.990413 7f22699c2700 10 journal check_for_full at 4787212288 : 532480 < 10460061695
2015-08-25 15:24:39.990415 7f22699c2700 15 journal prepare_single_write 1 will write 4787212288 : seq 13075225 len 526537 > 532480 (head 40 pre_pad 4046 ebl 526537 post_pad 1817 tail 40) (ebl alignment 4086)
-----------------------------------------------------------

And write_aio_bl will hold aio_lock until it returns, so write_finish_thread_entry can't process completed aio when io_submit is blocking. We reduced the aio_lock's locking scope in write_aio_bl, specifically not locking io_submit. It doesn't seem to need lock on io_submit in journal because there is only 1 write_thread_entry and 1 write_finish_thread_entry.

This might help those completed aio to get processed asap. But for the one blocked in io_submit and the ones in the queue, they still need to take more time.

Actions #2

Updated by Sage Weil over 8 years ago

  • Status changed from New to Fix Under Review
Actions #3

Updated by Sage Weil over 8 years ago

  • Status changed from Fix Under Review to Resolved
Actions

Also available in: Atom PDF