Bug #64748

closed

reef: snaptest-git-ceph.sh failure

Added by Venky Shankar about 2 months ago. Updated about 2 months ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

- /a/vshankar-2024-03-05_07:34:10-fs-wip-vshankar-testing1-reef-2024-03-05-1017-testing-default-smithi/7582083
- /a/vshankar-2024-03-05_07:34:10-fs-wip-vshankar-testing1-reef-2024-03-05-1017-testing-default-smithi/7582140

The first job has the following failure:

2024-03-05T10:07:24.695 INFO:tasks.workunit.client.0.smithi100.stderr:error: RPC failed; curl 18 transfer closed with outstanding read data remaining
2024-03-05T10:07:24.695 INFO:tasks.workunit.client.0.smithi100.stderr:fetch-pack: unexpected disconnect while reading sideband packet
2024-03-05T10:07:24.697 INFO:tasks.workunit.client.0.smithi100.stderr:fatal: early EOF
2024-03-05T10:07:24.698 INFO:tasks.workunit.client.0.smithi100.stderr:fatal: fetch-pack: invalid index-pack output
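Transient "curl 18" / early-EOF fetch errors like the one above are sometimes ridden out by simply retrying the clone. A minimal sketch (the `retry` helper is hypothetical and not part of snaptest-git-ceph.sh):

```shell
# Hypothetical retry wrapper (not part of the test suite): run a command
# up to N times, pausing between attempts, to ride out transient network errors.
retry() {
    local attempts=$1; shift
    local i
    for i in $(seq 1 "$attempts"); do
        "$@" && return 0
        echo "attempt $i/$attempts failed: $*" >&2
        sleep 1
    done
    return 1
}

# Example: wrap the clone that died with "curl 18 transfer closed":
# retry 3 git clone https://github.com/ceph/ceph.git
```

This only papers over genuinely transient network issues; it would not help if the disconnects come from the filesystem under test, as discussed below.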

The second one has:

2024-03-05T11:35:55.301 INFO:tasks.workunit.client.0.smithi120.stderr:Updating files:   2% (279/12313)^M[... checkout progress output trimmed ...]^MUpdating files: 100% (12313/12313), done.
2024-03-05T11:35:55.302 DEBUG:teuthology.orchestra.run.smithi158:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.7.asok dump_ops_in_flight
2024-03-05T11:35:55.302 INFO:tasks.ceph.osd.6.smithi158.stderr:2024-03-05T11:35:55.299+0000 7f9c6b20e700 -1 received  signal: Hangup from /usr/bin/python3 /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 6  (PID: 89081) UID: 0
2024-03-05T11:35:55.322 DEBUG:teuthology.orchestra.run:got remote process result: 128
2024-03-05T11:35:55.322 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...

No coredumps, but the kernel ring buffer shows:

2024-03-05T11:27:44.304184+00:00 smithi120 kernel: ceph: sync getxattr 00000000df3c3b41 during filling trace
2024-03-05T11:27:44.304276+00:00 smithi120 kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 16 for dev=ceph ino=1099511628279
2024-03-05T11:27:44.329101+00:00 smithi120 kernel: libceph: osd5 primary-affinity 0xb9f0
2024-03-05T11:27:44.744662+00:00 smithi120 kernel: ceph: sync getxattr 0000000073eef473 during filling trace
2024-03-05T11:27:44.744736+00:00 smithi120 kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 16 for dev=ceph ino=1099511628281
2024-03-05T11:27:44.796822+00:00 smithi120 kernel: ceph: sync getxattr 00000000c3d9dfae during filling trace
2024-03-05T11:27:44.796895+00:00 smithi120 kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 16 for dev=ceph ino=1099511628281
2024-03-05T11:27:44.974044+00:00 smithi120 kernel: libceph: mds0 (1)172.21.15.158:6837 socket closed (con state OPEN)
2024-03-05T11:27:45.305045+00:00 smithi120 kernel: libceph: mds0 (1)172.21.15.158:6837 socket closed (con state OPEN)
2024-03-05T11:27:49.572057+00:00 smithi120 kernel: libceph: mds0 (1)172.21.15.158:6837 socket closed (con state OPEN)
2024-03-05T11:27:49.848044+00:00 smithi120 kernel: libceph: mds0 (1)172.21.15.158:6837 socket closed (con state OPEN)
2024-03-05T11:27:50.115068+00:00 smithi120 kernel: libceph: mds0 (1)172.21.15.158:6837 socket closed (con state OPEN)
2024-03-05T11:27:50.514082+00:00 smithi120 kernel: libceph: mds0 (1)172.21.15.158:6837 socket closed (con state OPEN)
2024-03-05T11:27:52.407051+00:00 smithi120 kernel: libceph: mds0 (1)172.21.15.158:6837 socket closed (con state OPEN)
2024-03-05T11:27:52.733746+00:00 smithi120 kernel: ceph: sync getxattr 000000009ce92dd3 during filling trace
2024-03-05T11:27:52.733854+00:00 smithi120 kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 16 for dev=ceph ino=1099511628286
2024-03-05T11:27:52.760047+00:00 smithi120 kernel: libceph: mds0 (1)172.21.15.158:6837 socket closed (con state OPEN)
2024-03-05T11:27:53.939103+00:00 smithi120 kernel: libceph: mds0 (1)172.21.15.158:6837 socket closed (con state OPEN)
2024-03-05T11:27:54.226045+00:00 smithi120 kernel: libceph: mds0 (1)172.21.15.158:6837 socket closed (con state OPEN)
2024-03-05T11:27:57.788133+00:00 smithi120 kernel: libceph: mds0 (1)172.21.15.158:6837 socket closed (con state OPEN)
2024-03-05T11:27:58.430047+00:00 smithi120 kernel: libceph: mds0 (1)172.21.15.158:6837 socket closed (con state OPEN)
2024-03-05T11:27:59.318047+00:00 smithi120 kernel: libceph: mds0 (1)172.21.15.158:6837 socket closed (con state OPEN)
2024-03-05T11:27:59.627193+00:00 smithi120 kernel: ceph: sync getxattr 000000001da0a228 during filling trace
2024-03-05T11:27:59.627250+00:00 smithi120 kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 16 for dev=ceph ino=1099511628287
2024-03-05T11:27:59.667451+00:00 smithi120 kernel: ceph: sync getxattr 000000008c47fcaa during filling trace
2024-03-05T11:27:59.667533+00:00 smithi120 kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 16 for dev=ceph ino=1099511628299
2024-03-05T11:27:59.708028+00:00 smithi120 kernel: ceph: sync getxattr 000000007d19aa90 during filling trace
2024-03-05T11:27:59.708116+00:00 smithi120 kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 16 for dev=ceph ino=1099511628311

Xiubo, are the messages in the kernel ring buffer telling us anything?
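One way to answer that is to filter the known-benign lines out of the ring buffer so anything unexpected stands out. A rough sketch (the `filter_benign` helper is hypothetical; the patterns are assumptions derived from the messages quoted above, not part of the qa suite):

```shell
# Hypothetical filter (not part of the test suite): drop the known-benign
# "sync getxattr", SELinux xattr, and "socket closed" lines so anything
# unexpected among the ceph-related kernel messages stands out.
filter_benign() {
    grep -E 'ceph|libceph|SELinux' \
        | grep -vE 'sync getxattr .* during filling trace|inode_doinit_use_xattr:.*dev=ceph|socket closed \(con state OPEN\)'
}

# Usage on a test node:
#   dmesg | filter_benign
```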


Related issues (1): 0 open, 1 closed

Has duplicate: CephFS - Bug #59413: cephfs: qa snaptest-git-ceph.sh failed with "got remote process result: 128" (status: Duplicate; assignee: Xiubo Li)

Actions #1

Updated by Xiubo Li about 2 months ago

Venky,

These two failures are both caused by the EOF issue, and there is an existing tracker for it; please see https://tracker.ceph.com/issues/59413. We have a fix, but the backport PR hasn't been merged to reef yet.

This kernel log is just a warning; the call will be retried and won't cause a failure:

2024-03-05T11:27:52.733746+00:00 smithi120 kernel: ceph: sync getxattr 000000009ce92dd3 during filling trace

The following one is also okay, because the caller doesn't do anything when it fails:

2024-03-05T11:27:52.733854+00:00 smithi120 kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 16 for dev=ceph ino=1099511628286

I checked the kernel logs and didn't see any other errors related to this.

So let's merge the reef backport first and then see whether it shows up again. Does that make sense?

Thanks
- Xiubo

Actions #2

Updated by Venky Shankar about 2 months ago

  • Has duplicate Bug #59413: cephfs: qa snaptest-git-ceph.sh failed with "got remote process result: 128" added
Actions #3

Updated by Venky Shankar about 2 months ago

  • Status changed from New to Duplicate
