Bug #57655

open

qa: fs:mixed-clients kernel_untar_build failure

Added by Patrick Donnelly over 1 year ago. Updated 5 months ago.

Status: Pending Backport
Priority: Immediate
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Q/A
Tags: backport_processed
Backport: reef,quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS): qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2022-09-12T12:12:00.425 INFO:tasks.workunit.client.1.smithi176.stderr:fs/compat.o: warning: objtool: missing symbol for section .text
2022-09-12T12:12:00.487 INFO:tasks.workunit.client.1.smithi176.stdout:  CC      fs/binfmt_misc.o
2022-09-12T12:12:00.842 INFO:tasks.workunit.client.1.smithi176.stdout:  CC      fs/binfmt_script.o
2022-09-12T12:12:00.980 INFO:tasks.workunit.client.1.smithi176.stdout:  CC      fs/binfmt_elf.o
2022-09-12T12:12:01.273 INFO:tasks.workunit.client.1.smithi176.stdout:  CC      fs/compat_binfmt_elf.o
2022-09-12T12:12:01.278 INFO:tasks.workunit.client.1.smithi176.stdout:  AR      kernel/built-in.a
2022-09-12T12:12:01.714 INFO:tasks.workunit.client.1.smithi176.stdout:  CC      fs/mbcache.o
2022-09-12T12:12:01.739 INFO:tasks.workunit.client.1.smithi176.stdout:  CC      fs/posix_acl.o
2022-09-12T12:12:01.742 INFO:tasks.workunit.client.1.smithi176.stdout:  CC      fs/coredump.o
2022-09-12T12:12:01.777 INFO:tasks.workunit.client.1.smithi176.stdout:  CC      fs/drop_caches.o
2022-09-12T12:12:01.795 INFO:tasks.workunit.client.1.smithi176.stdout:  CC      fs/fhandle.o
2022-09-12T12:12:02.186 INFO:tasks.workunit.client.1.smithi176.stdout:  CC      fs/dcookies.o
2022-09-12T12:12:02.982 INFO:tasks.workunit.client.1.smithi176.stderr:fs/dcookies.o: warning: objtool: missing symbol for section .text
2022-09-12T12:12:02.999 INFO:tasks.workunit.client.1.smithi176.stdout:  AR      fs/built-in.a
2022-09-12T12:12:03.195 DEBUG:teuthology.orchestra.run:got remote process result: 2
2022-09-12T12:12:03.196 INFO:tasks.workunit:Stopping ['kernel_untar_build.sh'] on client.1...

Seen: /ceph/teuthology-archive/dparmar-2022-09-12_11:38:14-fs:mixed-clients-main-distro-default-smithi/7029223/teuthology.log

and more recently: /ceph/teuthology-archive/pdonnell-2022-09-22_12:22:37-fs-wip-pdonnell-testing-20220920.234701-distro-default-smithi/7041086/teuthology.log


Related issues 3 (1 open, 2 closed)

Copied to CephFS - Backport #63588: pacific: qa: fs:mixed-clients kernel_untar_build failure (Resolved, Milind Changire)
Copied to CephFS - Backport #63589: quincy: qa: fs:mixed-clients kernel_untar_build failure (In Progress, Milind Changire)
Copied to CephFS - Backport #63590: reef: qa: fs:mixed-clients kernel_untar_build failure (Resolved, Milind Changire)
Actions #1

Updated by Patrick Donnelly over 1 year ago

  • Related to Bug #57280: qa: tasks/kernel_cfuse_workunits_untarbuild_blogbench fails - Failed to fetch package version from shaman added
Actions #2

Updated by Venky Shankar over 1 year ago

  • Related to deleted (Bug #57280: qa: tasks/kernel_cfuse_workunits_untarbuild_blogbench fails - Failed to fetch package version from shaman)
Actions #3

Updated by Venky Shankar over 1 year ago

  • Category set to Correctness/Safety
  • Status changed from New to Triaged
  • Assignee set to Milind Changire
  • Backport set to pacific,quincy
Actions #7

Updated by Venky Shankar 12 months ago

  • Target version changed from v18.0.0 to v19.0.0
  • Backport changed from pacific,quincy to reef,quincy,pacific

- https://pulpito.ceph.com/vshankar-2023-05-12_08:25:27-fs-wip-vshankar-testing-20230509.090020-1-testing-default-smithi/7272427/

Milind, do we know whether this kernel build failure is a compiler-related problem or a bug in the CephFS I/O path?

Actions #8

Updated by Kotresh Hiremath Ravishankar 11 months ago

Milind, PTAL at the discussion in https://tracker.ceph.com/issues/59342 in case it helps. I closed that one as a duplicate of this.

Actions #11

Updated by Milind Changire 11 months ago

The kernel_untar_build.sh test passes with the latest code (HEAD 17f4abe9c9c) in the main branch.
Xiubo pointed out that a revert PR helps with that,

so we just need to get this backported.

Actions #15

Updated by Venky Shankar 8 months ago

/a/https://pulpito.ceph.com/vshankar-2023-09-12_06:47:30-fs-wip-vshankar-testing-20230908.065909-testing-default-smithi/7394622/

This time with a slightly different failure (but along the same lines):

2023-09-12T08:12:53.754 INFO:tasks.workunit.client.0.smithi064.stderr:drivers/scsi/sr.o: In function `sr_block_check_events':
2023-09-12T08:12:53.755 INFO:tasks.workunit.client.0.smithi064.stderr:sr.c:(.text+0x1062): undefined reference to `cdrom_check_events'
2023-09-12T08:12:53.755 INFO:tasks.workunit.client.0.smithi064.stderr:drivers/scsi/sr.o: In function `sr_block_revalidate_disk':
2023-09-12T08:12:53.755 INFO:tasks.workunit.client.0.smithi064.stderr:sr.c:(.text+0x11f2): undefined reference to `cdrom_get_last_written'
2023-09-12T08:12:53.755 INFO:tasks.workunit.client.0.smithi064.stderr:drivers/scsi/sr_ioctl.o: In function `sr_drive_status':
2023-09-12T08:12:53.755 INFO:tasks.workunit.client.0.smithi064.stderr:sr_ioctl.c:(.text+0x6b0): undefined reference to `cdrom_get_media_event'
2023-09-12T08:12:54.083 INFO:tasks.workunit.client.0.smithi064.stderr:make: *** [Makefile:1077: vmlinux] Error 1
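
The failure signatures reported in this ticket (a non-zero remote process result, undefined references, multiple definitions, a make error) can be pulled out of a long teuthology.log with a small filter. This is only a sketch: the line layout is inferred from the excerpts pasted here, and the pattern names and the find_failures helper are made up for illustration, not part of teuthology.

```python
import re

# Line layout inferred from the excerpts in this ticket:
#   "<timestamp> <LEVEL>:<logger>:<message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+):(?P<logger>[^:]+):(?P<msg>.*)$"
)

# Hypothetical pattern set, based only on the messages quoted in this ticket.
FAILURE_PATTERNS = {
    "make_error": re.compile(r"\*\*\* \[.+\] Error \d+"),
    "undefined_reference": re.compile(r"undefined reference to"),
    "multiple_definition": re.compile(r"multiple definition of"),
    "nonzero_exit": re.compile(r"got remote process result: [1-9]\d*"),
}


def find_failures(lines):
    """Return (timestamp, kind, message) for every line matching a pattern."""
    hits = []
    for raw in lines:
        match = LINE_RE.match(raw.rstrip("\n"))
        if not match:
            continue  # not a teuthology-style log line
        for kind, pattern in FAILURE_PATTERNS.items():
            if pattern.search(match.group("msg")):
                hits.append((match.group("ts"), kind, match.group("msg")))
                break  # report each line once
    return hits


# Lines copied from the excerpts in this ticket.
SAMPLE = [
    "2022-09-12T12:12:02.999 INFO:tasks.workunit.client.1.smithi176.stdout:  AR      fs/built-in.a",
    "2022-09-12T12:12:03.195 DEBUG:teuthology.orchestra.run:got remote process result: 2",
    "2023-09-12T08:12:53.755 INFO:tasks.workunit.client.0.smithi064.stderr:sr.c:(.text+0x1062): undefined reference to `cdrom_check_events'",
    "2023-09-12T08:12:54.083 INFO:tasks.workunit.client.0.smithi064.stderr:make: *** [Makefile:1077: vmlinux] Error 1",
]

if __name__ == "__main__":
    for ts, kind, msg in find_failures(SAMPLE):
        print(ts, kind, msg)
```

To scan a real log, pass an open file handle instead of SAMPLE; the function only needs an iterable of lines.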
Actions #17

Updated by Patrick Donnelly 6 months ago

  • Priority changed from Normal to Immediate
2023-10-26T06:08:42.898 INFO:tasks.workunit.client.1.smithi155.stderr:ld: arch/x86/boot/compressed/pgtable_64.o:(.bss+0x0): multiple definition of `__force_order'; arch/x86/boot/compressed/kaslr_64.o:(.bss+0x0): first defined here
2023-10-26T06:08:42.902 INFO:tasks.workunit.client.1.smithi155.stderr:ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
2023-10-26T06:08:42.931 INFO:tasks.workunit.client.1.smithi155.stderr:ld: warning: creating DT_TEXTREL in a PIE
2023-10-26T06:08:42.936 INFO:tasks.workunit.client.1.smithi155.stderr:make[2]: *** [arch/x86/boot/compressed/Makefile:118: arch/x86/boot/compressed/vmlinux] Error 1
2023-10-26T06:08:42.937 INFO:tasks.workunit.client.1.smithi155.stderr:make[1]: *** [arch/x86/boot/Makefile:112: arch/x86/boot/compressed/vmlinux] Error 2

/teuthology/pdonnell-2023-10-26_05:21:22-fs-wip-batrick-testing-20231024.144545-distro-default-smithi/7438447/teuthology.log

This one is pretty scary. I'm raising priority.

Actions #18

Updated by Milind Changire 6 months ago

Can we replace the kernel source tarball with a newer one?
I ask because I did find multiple definitions of the variable unsigned long __force_order: one in arch/x86/boot/compressed/pgtable_64.c and the other in arch/x86/boot/compressed/kaslr_64.c, as per the error dumped during the kernel build.

For comparison, there is no duplicate definition of this variable in the testing kernel sources.
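
A check like the one described above can be repeated against any unpacked kernel tree with a rough search. The sketch below is a heuristic, not a C parser: find_definitions is a hypothetical helper, the regex only looks for file-scope lines of the form "type symbol;" that do not start with extern, and the demo tree contains one-line stand-ins rather than the real kernel files.

```python
import os
import re
import tempfile


def find_definitions(root, symbol):
    """List .c files under root with a non-extern 'type symbol;' line.

    Heuristic only: it can report false positives (e.g. 'return symbol;').
    """
    pattern = re.compile(
        r"^(?![ \t]*extern\b)[ \t]*(?:[\w*]+[ \t]+)+\*?"
        + re.escape(symbol)
        + r"[ \t]*(?:=[^;]*)?;",
        re.MULTILINE,
    )
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".c"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                if pattern.search(fh.read()):
                    hits.append(os.path.relpath(path, root))
    return sorted(hits)


def demo():
    """Build a tiny stand-in tree and search it (file contents are invented)."""
    with tempfile.TemporaryDirectory() as root:
        subdir = os.path.join(root, "arch", "x86", "boot", "compressed")
        os.makedirs(subdir)
        stand_ins = {
            "pgtable_64.c": "unsigned long __force_order;\n",
            "kaslr_64.c": "unsigned long __force_order;\n",
            "misc.c": "extern unsigned long __force_order;\n",  # declaration only
        }
        for name, body in stand_ins.items():
            with open(os.path.join(subdir, name), "w", encoding="utf-8") as fh:
                fh.write(body)
        return find_definitions(root, "__force_order")


if __name__ == "__main__":
    for path in demo():
        print(path)
```

Two or more hits for the same symbol in files that end up in the same link are what the 'multiple definition' error from ld is complaining about.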

Actions #19

Updated by Patrick Donnelly 6 months ago

You can try; maybe this is related to centos9, but I was worried there was some kind of corruption.

Just update the workunit to clone some recent linux tag and run 1 job to test. Also double-check this is 100% reproducible only for centos9.

Actions #20

Updated by Milind Changire 6 months ago

centos9 job failed with the exact 'multiple definition' error
ubuntu 22.04 job failed as well, but the error is not obvious from the teuthology logs

so this is 100% reproducible for centos9 and ubuntu 22.04 as well

we have a successful rhel_8 job
unfortunately, I couldn't find a filter to specifically launch a centos9 job
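
Nothing in this thread names the reason for the distro split, but one plausible explanation is the compiler default: GCC 10 and later build with -fno-common unless told otherwise, so an uninitialized global defined in two source files becomes a link error instead of being merged. The sketch below reproduces that behaviour outside the kernel. It assumes a gcc binary on PATH; the variable demo_order and both helper names are made up for illustration.

```python
import os
import shutil
import subprocess
import tempfile


def link_with(flag, workdir, compiler):
    """Compile and link a.c and b.c with the given flag; return (ok, stderr)."""
    out = os.path.join(workdir, "demo" + flag.replace("-", "_"))
    proc = subprocess.run(
        [compiler, flag, "a.c", "b.c", "-o", out],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0, proc.stderr


def compare_common_modes(compiler="gcc"):
    """Return {flag: link_succeeded}, or None if the compiler is missing."""
    if shutil.which(compiler) is None:
        return None
    with tempfile.TemporaryDirectory() as workdir:
        # Both files define the same uninitialized global, mirroring the
        # duplicated variable reported in this ticket.
        with open(os.path.join(workdir, "a.c"), "w") as fh:
            fh.write("unsigned long demo_order;\nint main(void) { return 0; }\n")
        with open(os.path.join(workdir, "b.c"), "w") as fh:
            fh.write("unsigned long demo_order;\n")
        return {
            flag: link_with(flag, workdir, compiler)[0]
            for flag in ("-fcommon", "-fno-common")
        }


if __name__ == "__main__":
    print(compare_common_modes())
```

On a typical GCC and binutils setup the -fcommon link should succeed and the -fno-common link should fail with a 'multiple definition' message. If that is what happens here, it would fit a job passing on a distro with an older compiler while failing on newer ones, though that remains an assumption until the compiler versions on the test nodes are checked.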

Actions #21

Updated by Venky Shankar 6 months ago

Are you saying it was an issue with the kernel tarball the test was fetching, and that using the latest tarball gives a clean build?

EDIT: Since the current tarball also fails to build on other distros, we could update our tests to fetch the latest tarball.

Actions #22

Updated by Milind Changire 6 months ago

  • Pull request ID set to 54414

Yes, using a newer tarball fixed the build issue.
BTW, the old tarball wasn't corrupted; it just had a 'multiple definition' of a symbol.
I've added the PR number as well.

Actions #25

Updated by Backport Bot 5 months ago

  • Copied to Backport #63588: pacific: qa: fs:mixed-clients kernel_untar_build failure added
Actions #26

Updated by Backport Bot 5 months ago

  • Copied to Backport #63589: quincy: qa: fs:mixed-clients kernel_untar_build failure added
Actions #27

Updated by Backport Bot 5 months ago

  • Copied to Backport #63590: reef: qa: fs:mixed-clients kernel_untar_build failure added
Actions #28

Updated by Backport Bot 5 months ago

  • Tags set to backport_processed