Bug #11139

"Exception: ceph-objectstore-tool: import failure" in upgrade:firefly-x-hammer-distro-basic-multi run

Added by Yuri Weinstein over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
Start date: 03/17/2015
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport: firefly
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Run: http://pulpito.ceph.com/teuthology-2015-03-16_17:13:01-upgrade:firefly-x-hammer-distro-basic-multi/
Job: 806778
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-16_17:13:01-upgrade:firefly-x-hammer-distro-basic-multi/806778/

2015-03-17T01:36:32.442 INFO:tasks.rados:joining rados
2015-03-17T01:36:32.442 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2015-03-17T01:36:32.442 DEBUG:teuthology.run_tasks:Unwinding manager thrashosds
2015-03-17T01:36:32.443 INFO:tasks.thrashosds:joining thrashosds
2015-03-17T01:36:32.443 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_hammer/tasks/thrashosds.py", line 174, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_hammer/tasks/ceph_manager.py", line 418, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: ceph-objectstore-tool: import failure with status 1

Related issues

Related to Ceph - Bug #11513: "ceph-objectstore-tool: import failure with status 139" in upgrade:firefly-x-hammer-distro-basic-vps run (Duplicate, 04/30/2015)
Blocks Ceph - Bug #11156: FAILED assert(soid < scrubber.start || soid >= scrubber.end) (Resolved, 03/18/2015)
Copied to Ceph - Bug #11142: "Exception: ceph-objectstore-tool: import failure" in upgrade:firefly-x-hammer-distro-basic-multi run (TCMALLOC) (Can't reproduce, 03/17/2015)

Associated revisions

Revision 175aff8a (diff)
Added by David Zafman over 3 years ago

ceph-objectstore-tool: Use exit status 11 for incompatible import attempt

This is used so upgrade testing doesn't generate false failure.
Fixes: #11139

Signed-off-by: David Zafman <>

Revision 43053fcd (diff)
Added by David Zafman over 3 years ago

ceph-objectstore-tool: Use exit status 11 for incompatible import attempt

This is used so upgrade testing doesn't generate false failure.
Fixes: #11139

Signed-off-by: David Zafman <>
(cherry picked from commit 175aff8afe8215547ab57f8d8017ce8fdc0ff543)

Revision 6c530055 (diff)
Added by David Zafman over 3 years ago

ceph_manager: Check for exit status 11 from ceph-objectstore-tool import

Fixes: #11139

Signed-off-by: David Zafman <>

Revision 707a0b72 (diff)
Added by David Zafman over 3 years ago

ceph_manager: Check for exit status 11 from ceph-objectstore-tool import

Fixes: #11139

Signed-off-by: David Zafman <>
(cherry picked from commit 6c5300552d00232d6ecb2c1aa641d515c9d8cd34)

History

#1 Updated by Yuri Weinstein over 3 years ago

  • Subject changed from "xception: ceph-objectstore-tool: import failure" in upgrade:firefly-x-hammer-distro-basic-multi run to "Exception: ceph-objectstore-tool: import failure" in upgrade:firefly-x-hammer-distro-basic-multi run

#2 Updated by Yuri Weinstein over 3 years ago

This run looks similar, but it includes segmentation fault info.

Run: http://pulpito.ceph.com/teuthology-2015-03-16_17:00:02-upgrade:firefly-firefly-distro-basic-vps/
Job: 806521
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-16_17:00:02-upgrade:firefly-firefly-distro-basic-vps/806521/teuthology.log

2015-03-16T19:46:46.363 INFO:teuthology.orchestra.run.vpm037.stdout:remove_coll FORREMOVAL_0_2.15
2015-03-16T19:46:46.364 INFO:teuthology.orchestra.run.vpm037.stdout:Remove successful
2015-03-16T19:46:46.370 INFO:teuthology.orchestra.run.vpm037:Running: 'sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --journal-path /var/lib/ceph/osd/ceph-1/journal --log-file=/var/log/ceph/objectstore_tool.\\$pid.log --op import --file /home/ubuntu/cephtest/data/exp.2.15.1'
2015-03-16T19:46:46.412 INFO:teuthology.orchestra.run.vpm037.stdout:Importing pgid 2.15
2015-03-16T19:46:46.417 INFO:teuthology.orchestra.run.vpm037.stdout:Import successful
2015-03-16T19:46:46.421 INFO:teuthology.orchestra.run.vpm037.stderr:*** Caught signal (Segmentation fault) **
2015-03-16T19:46:46.422 INFO:teuthology.orchestra.run.vpm037.stderr: in thread 7fb4dedf3780
2015-03-16T19:46:46.447 INFO:teuthology.orchestra.run.vpm037.stderr: ceph version 0.80.9-183-g493d285 (493d285508914769cba3639b601ae6c20303af0d)
2015-03-16T19:46:46.447 INFO:teuthology.orchestra.run.vpm037.stderr: 1: ceph-objectstore-tool() [0x9c0e7a]
2015-03-16T19:46:46.447 INFO:teuthology.orchestra.run.vpm037.stderr: 2: (()+0xfcb0) [0x7fb4de320cb0]
2015-03-16T19:46:46.448 INFO:teuthology.orchestra.run.vpm037.stderr: 3: (tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+0x93) [0x7fb4de99ede3]
2015-03-16T19:46:46.448 INFO:teuthology.orchestra.run.vpm037.stderr: 4: (tcmalloc::ThreadCache::Scavenge()+0x44) [0x7fb4de99f0b4]
2015-03-16T19:46:46.448 INFO:teuthology.orchestra.run.vpm037.stderr: 5: (tc_free()+0x24b) [0x7fb4de9aae8b]
2015-03-16T19:46:46.449 INFO:teuthology.orchestra.run.vpm037.stderr: 6: (()+0x3b618) [0x7fb4dba9f618]
2015-03-16T19:46:46.449 INFO:teuthology.orchestra.run.vpm037.stderr: 7: (()+0x3b635) [0x7fb4dba9f635]
2015-03-16T19:46:46.449 INFO:teuthology.orchestra.run.vpm037.stderr: 8: (__libc_start_main()+0xf4) [0x7fb4dba85774]
2015-03-16T19:46:46.450 INFO:teuthology.orchestra.run.vpm037.stderr: 9: ceph-objectstore-tool() [0x616699]
2015-03-16T19:53:30.847 INFO:tasks.workunit.client.0.vpm071.stdout:stopping iogen
2015-03-16T19:53:31.293 INFO:tasks.workunit.client.0.vpm071.stdout:OK
2015-03-16T19:53:35.707 INFO:teuthology.orchestra.run.vpm071:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp'
2015-03-16T19:53:36.930 INFO:tasks.workunit:Stopping suites/iogen.sh on client.0...
2015-03-16T19:53:36.930 INFO:teuthology.orchestra.run.vpm071:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.0'
2015-03-16T19:53:37.028 DEBUG:teuthology.parallel:result is None
2015-03-16T19:53:37.028 INFO:tasks.ceph_fuse:Unmounting ceph-fuse clients...
2015-03-16T19:53:37.028 INFO:teuthology.orchestra.run.vpm071:Running: 'sudo fusermount -u /home/ubuntu/cephtest/mnt.0'
2015-03-16T19:53:37.197 INFO:tasks.ceph_fuse.ceph-fuse.0.vpm071.stderr:ceph-fuse[22581]: fuse finished with error 0
2015-03-16T19:53:46.142 INFO:teuthology.orchestra.run.vpm071:Running: 'rmdir -- /home/ubuntu/cephtest/mnt.0'
2015-03-16T19:53:46.156 INFO:tasks.thrashosds:joining thrashosds
2015-03-16T19:53:46.157 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 53, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/sequential.py", line 55, in task
    mgr.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/thrashosds.py", line 183, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph_manager.py", line 353, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: ceph-objectstore-tool: import failure with status 139
2015-03-16T19:53:46.252 ERROR:teuthology.run_tasks: Sentry event: http://sentry.ceph.com/sepia/teuthology/search?q=7a641f494d7046c6ab74780edbf191a2
Exception: ceph-objectstore-tool: import failure with status 139
2015-03-16T19:53:46.252 DEBUG:teuthology.run_tasks:Unwinding manager install.upgrade
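
The status 139 above is consistent with the segmentation fault in the log, assuming the reported status follows the common shell convention of 128 plus the signal number (139 - 128 = 11, which is SIGSEGV on Linux). A minimal sketch of that decoding follows; `describe_exit_status` is a hypothetical helper written for illustration in Python 3, not part of teuthology or ceph-qa-suite.

```python
import signal


def describe_exit_status(status):
    """Describe an exit status, assuming the shell's 128+signal convention."""
    if status == 0:
        return "success"
    if status > 128:
        # Statuses above 128 conventionally mean "terminated by a signal".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal {0} ({1})".format(signum, name)
    # Anything else is an ordinary exit code chosen by the program.
    return "exited with code {0}".format(status)


for status in (1, 11, 139):
    print(status, "->", describe_exit_status(status))
```

Read this way, status 139 (a crash) is a different kind of failure from status 1 or status 11, which are ordinary exit codes returned by the tool itself.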

#3 Updated by David Zafman over 3 years ago

The original problem description looks to be a testing issue. It was masked before the tool's firefly backport because the command name change caused us not to use the tool on the older release. The cause of the non-zero exit status from ceph-objectstore-tool --op import was the following:

2015-03-17T01:21:47.804 INFO:teuthology.orchestra.run.plana75.stderr:Export has incompatible features set compat={},rocompat={},incompat={12=transaction hints,13=pg meta object}

I didn't verify this, but I assume the export could have occurred while firefly was installed and the import occurred after upgrading to hammer where incompatible features are set. This is the proper operation of the tool. With further analysis it could be determined that the import would have gone just fine, but we'd have to special case something like that.
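
As a rough illustration of the check behind that message: an import is refused when the export carries "incompat" features that the importing tool does not know about. The sketch below only shows the general idea and is not the tool's actual code. The function name and the `supported` set are assumptions made up for the example; the feature ids and names are copied from the log line above.

```python
def unsupported_incompat(export_incompat, supported):
    """Return the export's incompat features that the importer does not support."""
    return {fid: name for fid, name in export_incompat.items()
            if fid not in supported}


# Feature ids and names taken from the log line in this comment.
export_incompat = {12: "transaction hints", 13: "pg meta object"}

# Hypothetical: feature ids understood by the (older) importing tool.
supported = set(range(1, 12))

missing = unsupported_incompat(export_incompat, supported)
if missing:
    # Any unknown incompat feature means the import must be rejected.
    print("Export has incompatible features set: {0}".format(sorted(missing.items())))
```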

#4 Updated by David Zafman over 3 years ago

Looking backwards in the log, I see that this happened because 2 machines were running different versions and we were moving a PG from one machine to the other using export/import. I could imagine a customer having a machine go south during an upgrade. In that case they might need to recover a PG from the non-upgraded machine. It would be nice if we figured out a way to handle incompatible features better, allowing the import when they aren't actually a problem for it. For actual repairs we could have a --force flag that ignores the feature incompatibility. This would need to warn the user about possible corruption.

#5 Updated by David Zafman over 3 years ago

Created #11142 to cover the TCMALLOC segmentation fault variant.

#6 Updated by David Zafman over 3 years ago

  • Assignee set to David Zafman

I'm going to have a special exit code and fix the thrashosds to check for that.
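
A minimal sketch of what such a check could look like on the test side, assuming exit status 11 as described in the associated revisions. `check_import_status` and the constant name are hypothetical and do not reproduce the actual ceph_manager.py change; the exception message mirrors the one in the tracebacks above.

```python
# Exit status used for an incompatible import attempt (per revision 175aff8a).
INCOMPATIBLE_IMPORT_STATUS = 11


def check_import_status(status):
    """Classify the exit status of a ceph-objectstore-tool import.

    Returns True if the import succeeded, False if it was refused because
    of incompatible features (expected during upgrade tests), and raises
    for any other non-zero status.
    """
    if status == 0:
        return True
    if status == INCOMPATIBLE_IMPORT_STATUS:
        # Not a test failure: the tool correctly refused the import.
        return False
    raise Exception("ceph-objectstore-tool: import failure with status {ret}"
                    .format(ret=status))
```

With this split, an upgrade run that mixes versions would no longer report a false failure, while statuses such as 1 or 139 would still fail the test.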

#8 Updated by David Zafman over 3 years ago

  • Status changed from New to In Progress

#10 Updated by David Zafman over 3 years ago

  • Status changed from In Progress to Resolved

#11 Updated by Loic Dachary over 3 years ago

  • Status changed from Resolved to In Progress
  • Backport set to firefly

"I'm going to have a special exit code and fix the thrashosds to check for that."

Could you please add a URL to the pull request / tracker issue that deals with this (and re-close this one)? I suppose the following error I get is what it is about.

http://pulpito.ceph.com/loic-2015-03-27_08:42:59-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi/824326/

2015-03-27T01:26:43.755 INFO:teuthology.orchestra.run.burnupi35.stdout:finish_remove_pgs 4.0_TEMP clearing temp
2015-03-27T01:26:43.774 INFO:teuthology.orchestra.run.burnupi35.stderr:Export has incompatible features set compat={},rocompat={},incompat={12=transaction hints,13=pg meta object}
2015-03-27T01:26:43.810 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-ec-cache-agent-firefly-backports/tasks/ceph_manager.py", line 632, in wrapper
    return func(self)
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-ec-cache-agent-firefly-backports/tasks/ceph_manager.py", line 672, in do_thrash
    self.choose_action()()
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-ec-cache-agent-firefly-backports/tasks/ceph_manager.py", line 274, in kill_osd
    format(ret=proc.exitstatus))
Exception: ceph-objectstore-tool: import failure with status 11

#13 Updated by David Zafman over 3 years ago

  • Status changed from In Progress to Pending Backport

Pull requests in hammer branch:
https://github.com/ceph/ceph/pull/4128
https://github.com/ceph/ceph-qa-suite/pull/374

Pull requests in firefly branch:
https://github.com/ceph/ceph/pull/4129

I found that we hadn't backported both parts for firefly:
https://github.com/ceph/ceph-qa-suite/pull/394

#14 Updated by Loic Dachary over 3 years ago

  • Status changed from Pending Backport to Resolved
