Bug #58027 (closed)

op slow from throttled to header_read

Added by yite gu over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph version 15.2.7 (Octopus)
An op spends ~500 ms between the throttled and header_read events:

            "description": "osd_repop(client.10960891.0:99657833 3.2d e4817/3582 3:b5e0c452:::rbd_data.a43b952dd64c1a.0000000000013581:head v 4817'36005738, mlcod=4817'36005736)",
            "initiated_at": "2022-11-15T06:39:34.428593+0000",
            "age": 456.631042654,
            "duration": 0.50463920799999995,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "events": [
                    {
                        "event": "initiated",
                        "time": "2022-11-15T06:39:34.428593+0000",
                        "duration": 0
                    },
                    {
                        "event": "throttled",
                        "time": "2022-11-15T06:39:34.428593+0000",
                        "duration": 0.50122236099999995
                    },
                    {
                        "event": "header_read",
                        "time": "2022-11-15T06:39:34.929815+0000",
                        "duration": 3.9079999999999997e-06
                    },
                    {
                        "event": "all_read",
                        "time": "2022-11-15T06:39:34.929819+0000",
                        "duration": 1.6670000000000001e-06
                    },
                    {
                        "event": "dispatched",
                        "time": "2022-11-15T06:39:34.929821+0000",
                        "duration": 2.6900000000000001e-06
                    },
                    {
                        "event": "queued_for_pg",
                        "time": "2022-11-15T06:39:34.929823+0000",
                        "duration": 1.9052999999999999e-05
                    },
                    {
                        "event": "reached_pg",
                        "time": "2022-11-15T06:39:34.929842+0000",
                        "duration": 9.4639999999999995e-06
                    },
                    {
                        "event": "started",
                        "time": "2022-11-15T06:39:34.929852+0000",
                        "duration": 0.0033629210000000001
                    },
                    {
                        "event": "commit_sent",
                        "time": "2022-11-15T06:39:34.933215+0000",
                        "duration": 1.7144000000000001e-05
                    },
                    {
                        "event": "done",
                        "time": "2022-11-15T06:39:34.933232+0000",
                        "duration": 0.50463920799999995
                    }
                ]
            }
        },
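
In this dump, nearly the whole 0.5046 s duration is accounted for by the 0.5012 s spent in the throttled stage. A minimal sketch of how such a dump can be pulled from the OSD admin socket (the OSD id is a placeholder, and the jq filter assumes the usual top-level "ops" array of dump_historic_ops):

    # Show description and total duration of recent slow ops on one OSD
    ceph daemon osd.<id> dump_historic_ops | jq '.ops[] | {description, duration}'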

#1

Updated by Radoslaw Zarzynski over 1 year ago

  • Status changed from New to Closed

Hello! The most important thing is that Octopus is EOL. Second, I'm also not sure whether this is really a bug: seeing a 0.5 s delay between these stages on an overloaded system wouldn't be a surprise. Unfortunately, the description is missing information about the system state.
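
For anyone hitting the same symptom, a minimal sketch of the kind of system-state data that would make such a report actionable (cluster health, per-OSD commit/apply latency, disk utilization, in-flight ops); the OSD id below is a placeholder:

    # Cluster health and per-OSD latency around the time of the slow op
    ceph -s
    ceph osd perf
    # Disk utilization on the node hosting the OSD
    iostat -x 1 10
    # Ops currently queued on the affected OSD (placeholder id)
    ceph daemon osd.<id> dump_ops_in_flight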

#2

Updated by yite gu over 1 year ago

Radoslaw Zarzynski wrote:

Hello! The most important thing is that Octopus is EOL. Second, I'm also not sure whether this is really a bug: seeing a 0.5 s delay between these stages on an overloaded system wouldn't be a surprise. Unfortunately, the description is missing information about the system state.

Thanks for your reply. I'm planning to move to version 16 (Pacific). Regarding this issue, I captured the I/O stack with perf on an msgr-worker thread:

-    5.59%     0.00%  msgr-worker-2    libc-2.28.so          [.] __GI___clone (inlined)
     __GI___clone (inlined)
     start_thread
   - 0x7fef46a23b72
      - 5.27% 0x56026850997b
         - EventCenter::process_events
            - 4.62% AsyncConnection::process
               - 2.55% AsyncConnection::read
                    AsyncConnection::read_until
                  - AsyncConnection::read_bulk
                     - PosixConnectedSocketImpl::read
                       __libc_read
                       entry_SYSCALL_64_after_hwframe
                       do_syscall_64
                       ksys_read
                       vfs_read
                       new_sync_read
                       sock_read_iter
                       inet_recvmsg
                     - tcp_recvmsg
                        - 2.36% skb_copy_datagram_iter
                           - __skb_datagram_iter
                              - 1.36% _copy_to_iter
                                 - copyout
                                 - copy_user_enhanced_fast_string
                                    - page_fault
                                       - do_page_fault
                                         do_user_addr_fault
                                       - handle_mm_fault
                                          - 0.55% __lru_cache_add
                                               pagevec_lru_move_fn
                                             - __pagevec_lru_add_fn
                                                  0.52% mem_cgroup_update_lru_size
                              - 0.99% __skb_datagram_iter
                                 - 0.68% simple_copy_to_iter
                                      __check_object_size
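
The call graph above was captured with perf; a minimal sketch of an equivalent capture, assuming OSD pid 39 (as in the pidstat output below) and an arbitrary 30 s sampling window:

    # Sample the OSD's call stacks with call-graph recording for ~30 s
    perf record -g -p 39 -- sleep 30
    # Render the sampled call graph as text
    perf report --stdio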

Could these page faults be the cause of the latency?

The OSD's page fault rate:

# pidstat -p 39 -r 1
Linux 5.4.56.bsk.9-amd64 (n6-011-034)     11/20/22     _x86_64_    (128 CPU)

13:39:23      UID       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
13:39:24      167        39  13604.00      0.00 5238560 3206228   0.61  ceph-osd
13:39:25      167        39   7502.00      0.00 5238560 3197992   0.61  ceph-osd
13:39:26      167        39    655.00      0.00 5238560 3180840   0.60  ceph-osd
13:39:27      167        39   1588.00      0.00 5238560 3185420   0.60  ceph-osd
13:39:28      167        39   7014.00      0.00 5238560 3181972   0.60  ceph-osd
13:39:29      167        39   6058.00      0.00 5238560 3182320   0.60  ceph-osd
13:39:30      167        39  11866.00      0.00 5238560 3185644   0.60  ceph-osd
13:39:31      167        39    168.00      0.00 5238560 3180964   0.60  ceph-osd
13:39:32      167        39   4155.00      0.00 5238560 3192220   0.60  ceph-osd
13:39:33      167        39  16519.00      0.00 5238560 3196880   0.61  ceph-osd
13:39:34      167        39    493.00      0.00 5238560 3181448   0.60  ceph-osd
13:39:35      167        39  12038.00      0.00 5238560 3204436   0.61  ceph-osd
13:39:36      167        39   5057.00      0.00 5238560 3181728   0.60  ceph-osd
13:39:37      167        39   4273.00      0.00 5238560 3197488   0.61  ceph-osd
13:39:38      167        39   2989.00      0.00 5238560 3183068   0.60  ceph-osd
13:39:39      167        39   2379.00      0.00 5238560 3183604   0.60  ceph-osd
13:39:40      167        39   3536.00      0.00 5238560 3187676   0.60  ceph-osd
13:39:41      167        39   8212.00      0.00 5238560 3185692   0.60  ceph-osd
13:39:42      167        39    803.00      0.00 5238560 3185140   0.60  ceph-osd
13:39:43      167        39   3699.00      0.00 5238560 3184828   0.60  ceph-osd
13:39:44      167        39   6535.00      0.00 5238560 3184524   0.60  ceph-osd
13:39:45      167        39    349.00      0.00 5238560 3185384   0.60  ceph-osd
13:39:46      167        39   7476.00      0.00 5238560 3192228   0.60  ceph-osd
13:39:47      167        39   1294.00      0.00 5238560 3184652   0.60  ceph-osd
13:39:48      167        39   9576.00      0.00 5238560 3184756   0.60  ceph-osd
13:39:49      167        39   2484.00      0.00 5238560 3188288   0.60  ceph-osd
13:39:50      167        39  12107.00      0.00 5238560 3216644   0.61  ceph-osd
13:39:51      167        39   6238.00      0.00 5238560 3197620   0.61  ceph-osd
13:39:52      167        39   1791.00      0.00 5238560 3186704   0.60  ceph-osd
13:39:53      167        39   5636.00      0.00 5238560 3196064   0.61  ceph-osd
13:39:54      167        39    175.00      0.00 5238560 3186432   0.60  ceph-osd
13:39:55      167        39    606.00      0.00 5238560 3186968   0.60  ceph-osd
13:39:56      167        39   4737.00      0.00 5238560 3187940   0.60  ceph-osd
13:39:57      167        39    214.00      0.00 5238560 3187364   0.60  ceph-osd
13:39:58      167        39   8543.00      0.00 5238560 3205596   0.61  ceph-osd
13:39:59      167        39   7011.00      0.00 5238560 3187972   0.60  ceph-osd
13:40:00      167        39   6673.00      0.00 5238560 3193500   0.61  ceph-osd
13:40:01      167        39  15690.00      0.00 5238560 3195520   0.61  ceph-osd
13:40:02      167        39   5459.00      0.00 5238560 3205512   0.61  ceph-osd
13:40:03      167        39   8159.00      0.00 5238560 3225232   0.61  ceph-osd
13:40:04      167        39   5601.00      0.00 5238560 3237804   0.61  ceph-osd
13:40:05      167        39   5106.00      0.00 5238560 3232152   0.61  ceph-osd
13:40:06      167        39   5089.00      0.00 5238560 3235572   0.61  ceph-osd
13:40:07      167        39   7844.00      0.00 5238560 3232300   0.61  ceph-osd
13:40:08      167        39   4734.00      0.00 5238560 3238236   0.61  ceph-osd
13:40:09      167        39    568.00      0.00 5238560 3229796   0.61  ceph-osd
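
Averaging the minflt/s column over this 46-second window gives roughly 5,500 minor faults per second, with no major faults at all. A minimal sketch of that computation, assuming the capture is saved to a file (field 4 is minflt/s in pidstat -r output; the file name and sample count are placeholders):

    # Capture 60 one-second samples of the OSD's fault counters
    pidstat -p 39 -r 1 60 > pidstat.log
    # Average the minflt/s column over the timestamped rows only
    awk '/^[0-9]/ && $NF == "ceph-osd" { sum += $4; n++ } END { if (n) printf "avg minflt/s: %.0f\n", sum / n }' pidstat.log

pidstat also prints its own Average: line when given a fixed sample count, so the awk step is just an explicit version of the same calculation.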
