Bug #11139

closed

"Exception: ceph-objectstore-tool: import failure" in upgrade:firefly-x-hammer-distro-basic-multi run

Added by Yuri Weinstein about 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: firefly
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.ceph.com/teuthology-2015-03-16_17:13:01-upgrade:firefly-x-hammer-distro-basic-multi/
Job: 806778
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-16_17:13:01-upgrade:firefly-x-hammer-distro-basic-multi/806778/

2015-03-17T01:36:32.442 INFO:tasks.rados:joining rados
2015-03-17T01:36:32.442 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2015-03-17T01:36:32.442 DEBUG:teuthology.run_tasks:Unwinding manager thrashosds
2015-03-17T01:36:32.443 INFO:tasks.thrashosds:joining thrashosds
2015-03-17T01:36:32.443 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_hammer/tasks/thrashosds.py", line 174, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_hammer/tasks/ceph_manager.py", line 418, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: ceph-objectstore-tool: import failure with status 1

Related issues 3 (0 open, 3 closed)

Related to Ceph - Bug #11513: "ceph-objectstore-tool: import failure with status 139" in upgrade:firefly-x-hammer-distro-basic-vps run (Duplicate, 04/30/2015)
Blocks Ceph - Bug #11156: FAILED assert(soid < scrubber.start || soid >= scrubber.end) (Resolved, Samuel Just, 03/18/2015)
Copied to Ceph - Bug #11142: "Exception: ceph-objectstore-tool: import failure" in upgrade:firefly-x-hammer-distro-basic-multi run (TCMALLOC) (Can't reproduce, David Zafman, 03/17/2015)

Actions #1

Updated by Yuri Weinstein about 9 years ago

  • Subject changed from "xception: ceph-objectstore-tool: import failure" in upgrade:firefly-x-hammer-distro-basic-multi run to "Exception: ceph-objectstore-tool: import failure" in upgrade:firefly-x-hammer-distro-basic-multi run
Actions #2

Updated by Yuri Weinstein about 9 years ago

This run looks similar, and its log includes segmentation fault information.

Run: http://pulpito.ceph.com/teuthology-2015-03-16_17:00:02-upgrade:firefly-firefly-distro-basic-vps/
Job: 806521
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-16_17:00:02-upgrade:firefly-firefly-distro-basic-vps/806521/teuthology.log

2015-03-16T19:46:46.363 INFO:teuthology.orchestra.run.vpm037.stdout:remove_coll FORREMOVAL_0_2.15
2015-03-16T19:46:46.364 INFO:teuthology.orchestra.run.vpm037.stdout:Remove successful
2015-03-16T19:46:46.370 INFO:teuthology.orchestra.run.vpm037:Running: 'sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --journal-path /var/lib/ceph/osd/ceph-1/journal --log-file=/var/log/ceph/objectstore_tool.\\$pid.log --op import --file /home/ubuntu/cephtest/data/exp.2.15.1'
2015-03-16T19:46:46.412 INFO:teuthology.orchestra.run.vpm037.stdout:Importing pgid 2.15
2015-03-16T19:46:46.417 INFO:teuthology.orchestra.run.vpm037.stdout:Import successful
2015-03-16T19:46:46.421 INFO:teuthology.orchestra.run.vpm037.stderr:*** Caught signal (Segmentation fault) **
2015-03-16T19:46:46.422 INFO:teuthology.orchestra.run.vpm037.stderr: in thread 7fb4dedf3780
2015-03-16T19:46:46.447 INFO:teuthology.orchestra.run.vpm037.stderr: ceph version 0.80.9-183-g493d285 (493d285508914769cba3639b601ae6c20303af0d)
2015-03-16T19:46:46.447 INFO:teuthology.orchestra.run.vpm037.stderr: 1: ceph-objectstore-tool() [0x9c0e7a]
2015-03-16T19:46:46.447 INFO:teuthology.orchestra.run.vpm037.stderr: 2: (()+0xfcb0) [0x7fb4de320cb0]
2015-03-16T19:46:46.448 INFO:teuthology.orchestra.run.vpm037.stderr: 3: (tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+0x93) [0x7fb4de99ede3]
2015-03-16T19:46:46.448 INFO:teuthology.orchestra.run.vpm037.stderr: 4: (tcmalloc::ThreadCache::Scavenge()+0x44) [0x7fb4de99f0b4]
2015-03-16T19:46:46.448 INFO:teuthology.orchestra.run.vpm037.stderr: 5: (tc_free()+0x24b) [0x7fb4de9aae8b]
2015-03-16T19:46:46.449 INFO:teuthology.orchestra.run.vpm037.stderr: 6: (()+0x3b618) [0x7fb4dba9f618]
2015-03-16T19:46:46.449 INFO:teuthology.orchestra.run.vpm037.stderr: 7: (()+0x3b635) [0x7fb4dba9f635]
2015-03-16T19:46:46.449 INFO:teuthology.orchestra.run.vpm037.stderr: 8: (__libc_start_main()+0xf4) [0x7fb4dba85774]
2015-03-16T19:46:46.450 INFO:teuthology.orchestra.run.vpm037.stderr: 9: ceph-objectstore-tool() [0x616699]
2015-03-16T19:53:30.847 INFO:tasks.workunit.client.0.vpm071.stdout:stopping iogen
2015-03-16T19:53:31.293 INFO:tasks.workunit.client.0.vpm071.stdout:OK
2015-03-16T19:53:35.707 INFO:teuthology.orchestra.run.vpm071:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp'
2015-03-16T19:53:36.930 INFO:tasks.workunit:Stopping suites/iogen.sh on client.0...
2015-03-16T19:53:36.930 INFO:teuthology.orchestra.run.vpm071:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.0'
2015-03-16T19:53:37.028 DEBUG:teuthology.parallel:result is None
2015-03-16T19:53:37.028 INFO:tasks.ceph_fuse:Unmounting ceph-fuse clients...
2015-03-16T19:53:37.028 INFO:teuthology.orchestra.run.vpm071:Running: 'sudo fusermount -u /home/ubuntu/cephtest/mnt.0'
2015-03-16T19:53:37.197 INFO:tasks.ceph_fuse.ceph-fuse.0.vpm071.stderr:ceph-fuse[22581]: fuse finished with error 0
2015-03-16T19:53:46.142 INFO:teuthology.orchestra.run.vpm071:Running: 'rmdir -- /home/ubuntu/cephtest/mnt.0'
2015-03-16T19:53:46.156 INFO:tasks.thrashosds:joining thrashosds
2015-03-16T19:53:46.157 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 53, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/sequential.py", line 55, in task
    mgr.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/thrashosds.py", line 183, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph_manager.py", line 353, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: ceph-objectstore-tool: import failure with status 139
2015-03-16T19:53:46.252 ERROR:teuthology.run_tasks: Sentry event: http://sentry.ceph.com/sepia/teuthology/search?q=7a641f494d7046c6ab74780edbf191a2
Exception: ceph-objectstore-tool: import failure with status 139
2015-03-16T19:53:46.252 DEBUG:teuthology.run_tasks:Unwinding manager install.upgrade
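
A note on the status value above: 139 is consistent with the common shell convention of reporting "128 + signal number" for a process killed by a signal, and signal 11 is SIGSEGV on Linux, which matches the segmentation fault in the stderr output. The sketch below only illustrates that arithmetic. The helper name describe_exit_status is hypothetical and not part of teuthology, and it is written for Python 3, whereas the tracebacks here come from Python 2.7.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status (hypothetical helper).

    By convention, a status above 128 means the process was killed by
    signal number (status - 128).
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = "unknown signal"
        return "killed by signal {} ({})".format(signum, name)
    return "exited with code {}".format(status)


# The two statuses reported in this ticket.
print(describe_exit_status(139))
print(describe_exit_status(1))
```

Under this reading, status 1 in the original description is an ordinary error exit from the tool, while 139 is a crash, which is why the two cases are tracked separately.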
Actions #3

Updated by David Zafman about 9 years ago

The original problem description looks to be a testing issue. It was masked before the tool's firefly backport because the command name change meant we did not use the tool on the older release. The cause of the non-zero exit status from ceph-objectstore-tool --op import was the following:

2015-03-17T01:21:47.804 INFO:teuthology.orchestra.run.plana75.stderr:Export has incompatible features set compat={},rocompat={},incompat={12=transaction hints,13=pg meta object}

I didn't verify this, but I assume the export could have occurred while firefly was installed and the import occurred after upgrading to hammer where incompatible features are set. This is the proper operation of the tool. With further analysis it could be determined that the import would have gone just fine, but we'd have to special case something like that.
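
As a rough illustration of the check described above, the following sketch compares the incompat features recorded in an export with the set the importing side understands. It is not the tool's actual code: the function name, the dictionary layout, and the contents of locally_supported are assumptions made for the example. Only feature IDs 12 and 13 and their names come from the log line above.

```python
def unsupported_incompat(export_incompat, locally_supported):
    """Return the incompat features in the export that are unknown locally.

    export_incompat: dict mapping feature id -> feature name (from the export)
    locally_supported: set of feature ids the importing side understands
    """
    return {
        fid: name
        for fid, name in export_incompat.items()
        if fid not in locally_supported
    }


# Feature ids and names as printed in the log line above.
export_incompat = {12: "transaction hints", 13: "pg meta object"}

# Hypothetical: an importing side that only knows features 1 through 11.
locally_supported = set(range(1, 12))

missing = unsupported_incompat(export_incompat, locally_supported)
if missing:
    print("Export has incompatible features set: {}".format(missing))
```

An import is refused when the result is non-empty. Special-casing features that do not actually affect the import, as suggested above, would amount to filtering that result further before deciding.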

Actions #4

Updated by David Zafman about 9 years ago

Looking backwards in the log, I see that this happened because two machines were running different versions and we were moving a PG from one machine to the other using export/import. I could imagine a customer having a machine go south during an upgrade. In that case they might need to recover a PG from the non-upgraded machine. It would be nice to find a way to handle incompatible features better, allowing the import when they aren't actually a problem for it. For actual repairs we could have a --force flag that ignores the feature incompatibility. This would need to warn the user about possible corruption.

Actions #5

Updated by David Zafman about 9 years ago

Created #11142 to cover the TCMALLOC segmentation fault variant.

Actions #6

Updated by David Zafman about 9 years ago

  • Assignee set to David Zafman

I'm going to have a special exit code and fix the thrashosds to check for that.
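
A minimal sketch of that idea on the test side, assuming the tool reports an incompatible export with a dedicated exit status. The value 11 is an assumption taken from the "import failure with status 11" log later in this ticket. The constant and function names are hypothetical and this is not the actual ceph_manager.py change.

```python
# Assumption: exit status used by the tool for "export has incompatible
# features". The value 11 is inferred from a log elsewhere in this ticket.
IMPORT_INCOMPATIBLE_STATUS = 11


def check_import_status(exitstatus, log=print):
    """Decide how the thrasher should treat an import's exit status.

    Returns True if the PG was imported, False if the import was skipped
    because the export is incompatible, and raises for any other failure.
    """
    if exitstatus == 0:
        return True
    if exitstatus == IMPORT_INCOMPATIBLE_STATUS:
        log("Attempt to import an incompatible export, continuing")
        return False
    raise Exception(
        "ceph-objectstore-tool: import failure with status {ret}".format(
            ret=exitstatus))


# Example: an incompatible export is tolerated, anything else still fails.
print(check_import_status(0))
print(check_import_status(11))
```

The point of a dedicated status is that the thrasher can tolerate this one expected condition during mixed-version runs while still failing the job on any other non-zero status, including crashes such as 139.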

Actions #8

Updated by David Zafman about 9 years ago

  • Status changed from New to In Progress
Actions #10

Updated by David Zafman about 9 years ago

  • Status changed from In Progress to Resolved
Actions #11

Updated by Loïc Dachary about 9 years ago

  • Status changed from Resolved to In Progress
  • Backport set to firefly

David Zafman wrote: "I'm going to have a special exit code and fix the thrashosds to check for that."

Could you please add a URL to the pull request / tracker issue that deals with this (and re-close this one)? I suppose the following error I get is what it is about.

http://pulpito.ceph.com/loic-2015-03-27_08:42:59-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi/824326/

2015-03-27T01:26:43.755 INFO:teuthology.orchestra.run.burnupi35.stdout:finish_remove_pgs 4.0_TEMP clearing temp
2015-03-27T01:26:43.774 INFO:teuthology.orchestra.run.burnupi35.stderr:Export has incompatible features set compat={},rocompat={},incompat={12=transaction hints,13=pg meta object}
2015-03-27T01:26:43.810 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-ec-cache-agent-firefly-backports/tasks/ceph_manager.py", line 632, in wrapper
    return func(self)
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-ec-cache-agent-firefly-backports/tasks/ceph_manager.py", line 672, in do_thrash
    self.choose_action()()
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-ec-cache-agent-firefly-backports/tasks/ceph_manager.py", line 274, in kill_osd
    format(ret=proc.exitstatus))
Exception: ceph-objectstore-tool: import failure with status 11

Actions #13

Updated by David Zafman about 9 years ago

  • Status changed from In Progress to Pending Backport

Pull requests in hammer branch:
https://github.com/ceph/ceph/pull/4128
https://github.com/ceph/ceph-qa-suite/pull/374

Pull requests in firefly branch:
https://github.com/ceph/ceph/pull/4129

I found that we hadn't backported both parts for firefly:
https://github.com/ceph/ceph-qa-suite/pull/394

Actions #14

Updated by Loïc Dachary about 9 years ago

  • Status changed from Pending Backport to Resolved