Bug #11139
"Exception: ceph-objectstore-tool: import failure" in upgrade:firefly-x-hammer-distro-basic-multi run
Status: Closed
Description
Run: http://pulpito.ceph.com/teuthology-2015-03-16_17:13:01-upgrade:firefly-x-hammer-distro-basic-multi/
Job: 806778
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-16_17:13:01-upgrade:firefly-x-hammer-distro-basic-multi/806778/
2015-03-17T01:36:32.442 INFO:tasks.rados:joining rados
2015-03-17T01:36:32.442 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2015-03-17T01:36:32.442 DEBUG:teuthology.run_tasks:Unwinding manager thrashosds
2015-03-17T01:36:32.443 INFO:tasks.thrashosds:joining thrashosds
2015-03-17T01:36:32.443 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_hammer/tasks/thrashosds.py", line 174, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_hammer/tasks/ceph_manager.py", line 418, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: ceph-objectstore-tool: import failure with status 1
Updated by Yuri Weinstein about 9 years ago
- Subject changed from "xception: ceph-objectstore-tool: import failure" in upgrade:firefly-x-hammer-distro-basic-multi run to "Exception: ceph-objectstore-tool: import failure" in upgrade:firefly-x-hammer-distro-basic-multi run
Updated by Yuri Weinstein about 9 years ago
This looks similar, but with segmentation fault info.
Run: http://pulpito.ceph.com/teuthology-2015-03-16_17:00:02-upgrade:firefly-firefly-distro-basic-vps/
Job: 806521
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-16_17:00:02-upgrade:firefly-firefly-distro-basic-vps/806521/teuthology.log
2015-03-16T19:46:46.363 INFO:teuthology.orchestra.run.vpm037.stdout:remove_coll FORREMOVAL_0_2.15
2015-03-16T19:46:46.364 INFO:teuthology.orchestra.run.vpm037.stdout:Remove successful
2015-03-16T19:46:46.370 INFO:teuthology.orchestra.run.vpm037:Running: 'sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --journal-path /var/lib/ceph/osd/ceph-1/journal --log-file=/var/log/ceph/objectstore_tool.\\$pid.log --op import --file /home/ubuntu/cephtest/data/exp.2.15.1'
2015-03-16T19:46:46.412 INFO:teuthology.orchestra.run.vpm037.stdout:Importing pgid 2.15
2015-03-16T19:46:46.417 INFO:teuthology.orchestra.run.vpm037.stdout:Import successful
2015-03-16T19:46:46.421 INFO:teuthology.orchestra.run.vpm037.stderr:*** Caught signal (Segmentation fault) **
2015-03-16T19:46:46.422 INFO:teuthology.orchestra.run.vpm037.stderr: in thread 7fb4dedf3780
2015-03-16T19:46:46.447 INFO:teuthology.orchestra.run.vpm037.stderr: ceph version 0.80.9-183-g493d285 (493d285508914769cba3639b601ae6c20303af0d)
2015-03-16T19:46:46.447 INFO:teuthology.orchestra.run.vpm037.stderr: 1: ceph-objectstore-tool() [0x9c0e7a]
2015-03-16T19:46:46.447 INFO:teuthology.orchestra.run.vpm037.stderr: 2: (()+0xfcb0) [0x7fb4de320cb0]
2015-03-16T19:46:46.448 INFO:teuthology.orchestra.run.vpm037.stderr: 3: (tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+0x93) [0x7fb4de99ede3]
2015-03-16T19:46:46.448 INFO:teuthology.orchestra.run.vpm037.stderr: 4: (tcmalloc::ThreadCache::Scavenge()+0x44) [0x7fb4de99f0b4]
2015-03-16T19:46:46.448 INFO:teuthology.orchestra.run.vpm037.stderr: 5: (tc_free()+0x24b) [0x7fb4de9aae8b]
2015-03-16T19:46:46.449 INFO:teuthology.orchestra.run.vpm037.stderr: 6: (()+0x3b618) [0x7fb4dba9f618]
2015-03-16T19:46:46.449 INFO:teuthology.orchestra.run.vpm037.stderr: 7: (()+0x3b635) [0x7fb4dba9f635]
2015-03-16T19:46:46.449 INFO:teuthology.orchestra.run.vpm037.stderr: 8: (__libc_start_main()+0xf4) [0x7fb4dba85774]
2015-03-16T19:46:46.450 INFO:teuthology.orchestra.run.vpm037.stderr: 9: ceph-objectstore-tool() [0x616699]
2015-03-16T19:53:30.847 INFO:tasks.workunit.client.0.vpm071.stdout:stopping iogen
2015-03-16T19:53:31.293 INFO:tasks.workunit.client.0.vpm071.stdout:OK
2015-03-16T19:53:35.707 INFO:teuthology.orchestra.run.vpm071:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp'
2015-03-16T19:53:36.930 INFO:tasks.workunit:Stopping suites/iogen.sh on client.0...
2015-03-16T19:53:36.930 INFO:teuthology.orchestra.run.vpm071:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.0'
2015-03-16T19:53:37.028 DEBUG:teuthology.parallel:result is None
2015-03-16T19:53:37.028 INFO:tasks.ceph_fuse:Unmounting ceph-fuse clients...
2015-03-16T19:53:37.028 INFO:teuthology.orchestra.run.vpm071:Running: 'sudo fusermount -u /home/ubuntu/cephtest/mnt.0'
2015-03-16T19:53:37.197 INFO:tasks.ceph_fuse.ceph-fuse.0.vpm071.stderr:ceph-fuse[22581]: fuse finished with error 0
2015-03-16T19:53:46.142 INFO:teuthology.orchestra.run.vpm071:Running: 'rmdir -- /home/ubuntu/cephtest/mnt.0'
2015-03-16T19:53:46.156 INFO:tasks.thrashosds:joining thrashosds
2015-03-16T19:53:46.157 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 53, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/sequential.py", line 55, in task
    mgr.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/thrashosds.py", line 183, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph_manager.py", line 353, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: ceph-objectstore-tool: import failure with status 139
2015-03-16T19:53:46.252 ERROR:teuthology.run_tasks: Sentry event: http://sentry.ceph.com/sepia/teuthology/search?q=7a641f494d7046c6ab74780edbf191a2 Exception: ceph-objectstore-tool: import failure with status 139
2015-03-16T19:53:46.252 DEBUG:teuthology.run_tasks:Unwinding manager install.upgrade
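The status here (139) differs from the one in the first report (1). Most likely it follows the common shell convention of reporting a process killed by signal N as 128 + N, which makes 139 signal 11 (SIGSEGV), consistent with the "Caught signal (Segmentation fault)" lines above. A small hypothetical helper (not part of teuthology) that decodes statuses under that convention:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Decode a shell-style exit status, assuming 128 + N means 'killed by signal N'."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    # Values up to 128 are treated as ordinary exit codes chosen by the program.
    return f"exited with code {status}"


for status in (1, 139):
    print(status, "->", describe_exit_status(status))
```

Under this convention the status 1 from the first report is an ordinary error exit from the tool, not a crash.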
Updated by David Zafman about 9 years ago
The original problem description looks like a testing issue. It was masked before the tool's firefly backport, because the command name change meant we didn't use the tool on the older release. The cause of the non-zero exit status from ceph-objectstore-tool --op import was the following:
2015-03-17T01:21:47.804 INFO:teuthology.orchestra.run.plana75.stderr:Export has incompatible features set compat={},rocompat={},incompat={12=transaction hints,13=pg meta object}
I didn't verify this, but I assume the export could have occurred while firefly was installed and the import occurred after upgrading to hammer, where the incompatible features are set. This is the proper operation of the tool. Further analysis might show that the import would have gone just fine, but we'd have to special-case something like that.
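The refusal comes from a feature-compatibility check on the export. Below is a minimal Python model of that kind of check, assuming the usual compat-set rule that a reader must recognize every incompat feature the writer recorded, so the reported set is whatever the export carries that the importing store does not know. This is not the tool's actual C++ code. The names for features 1-11 are placeholders; only 12 and 13 come from the log.

```python
from typing import Dict

# Feature sets modeled as {feature_id: name}; hypothetical, not Ceph's CompatSet class.
Features = Dict[int, str]


def unsupported_incompat(local: Features, export: Features) -> Features:
    """Incompat features recorded in the export that the local store does not know."""
    return {fid: name for fid, name in export.items() if fid not in local}


def format_features(features: Features) -> str:
    """Render features in the same {id=name,...} style as the tool's message."""
    return "{" + ",".join(f"{fid}={name}" for fid, name in sorted(features.items())) + "}"


# Placeholder names for features 1-11; 12 and 13 are the ones named in the log.
local_incompat = {fid: f"feature {fid}" for fid in range(1, 12)}
export_incompat = {**local_incompat, 12: "transaction hints", 13: "pg meta object"}

missing = unsupported_incompat(local_incompat, export_incompat)
if missing:
    print("Export has incompatible features set incompat=" + format_features(missing))
```

In this model, swapping the arguments returns an empty set: a store that already knows a superset of the export's features has nothing it cannot interpret.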
Updated by David Zafman about 9 years ago
Looking back through the log, I see that this was caused by 2 machines running different versions while we were moving a PG from one machine to another using export/import. I could imagine a customer having a machine go south during an upgrade. In that case they might need to recover a PG from the non-upgraded machine. It would be nice to find a way to handle incompatible features better, tolerating them only when they aren't actually a problem for the import. For actual repairs we could have a --force flag that ignores the feature incompatibility. This would need to warn the user about possible corruption.
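A rough sketch of the proposed --force behavior, written as a hypothetical Python stand-in for the import path. The --op and --file arguments mirror the real command line seen in the logs; --force here is the proposal from this comment, and the sketch does not reflect how the real tool handles it. The exit status 11 is an assumption, chosen because a later log on this issue shows status 11 together with the "Export has incompatible features set" message.

```python
import argparse
import sys
from typing import Dict, List, Optional

# Assumed exit status for "export uses unsupported incompat features".
EXIT_INCOMPATIBLE_EXPORT = 11


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="objectstore-import-sketch")
    parser.add_argument("--op", required=True, choices=["import"])
    parser.add_argument("--file", required=True)
    # Proposed override; hypothetical in this sketch.
    parser.add_argument("--force", action="store_true",
                        help="import despite unsupported incompat features")
    return parser.parse_args(argv)


def import_status(local: Dict[int, str], export: Dict[int, str], force: bool) -> int:
    """Return the exit status the import would use (0 means proceed)."""
    missing = sorted(fid for fid in export if fid not in local)
    if not missing:
        return 0
    listing = ",".join(f"{fid}={export[fid]}" for fid in missing)
    if force:
        # Proceed anyway, but warn about possible corruption as proposed.
        print(f"WARNING: importing despite unsupported features {{{listing}}}; "
              "the PG may be corrupted", file=sys.stderr)
        return 0
    print(f"Export has incompatible features set incompat={{{listing}}}; "
          "rerun with --force to override", file=sys.stderr)
    return EXIT_INCOMPATIBLE_EXPORT


args = parse_args(["--op", "import", "--file", "exp.2.15.1"])
status = import_status({}, {12: "transaction hints", 13: "pg meta object"}, args.force)
```

Without --force the sketch refuses and returns 11; with --force it returns 0 after printing the warning. Deciding when incompatible features "aren't a problem for import" would still need per-feature knowledge that this sketch does not attempt.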
Updated by David Zafman about 9 years ago
Created #11142 to cover the TCMALLOC segmentation fault variant.
Updated by David Zafman about 9 years ago
- Assignee set to David Zafman
I'm going to have a special exit code and fix the thrashosds to check for that.
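On the test-harness side, checking for a special exit code could look roughly like this hypothetical Python sketch (Python because thrashosds is). The exception text mirrors the existing "import failure with status {ret}" message from the tracebacks in this issue. The special value 11 is an assumption, based on a later log here showing status 11 alongside the "Export has incompatible features set" message; the actual fix is in the pull requests linked on this issue and may differ.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("thrashosds-sketch")

# Assumed special status meaning "export is incompatible with this OSD".
EXIT_INCOMPATIBLE_EXPORT = 11


def check_import_result(exitstatus: int) -> bool:
    """Return True on success, False for a tolerated incompatible export; raise otherwise."""
    if exitstatus == 0:
        return True
    if exitstatus == EXIT_INCOMPATIBLE_EXPORT:
        # Expected during mixed-version upgrades: skip this PG move instead of failing the run.
        log.info("ceph-objectstore-tool: incompatible export, import skipped")
        return False
    raise Exception(
        "ceph-objectstore-tool: import failure with status {ret}".format(ret=exitstatus))


for status in (0, 11):
    print(status, check_import_result(status))
```

Any other status, such as 1 or 139, still raises, so crashes like the segmentation fault tracked in #11142 would continue to fail the run.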
Updated by Yuri Weinstein about 9 years ago
Run: http://pulpito.ceph.com/teuthology-2015-03-15_17:18:02-upgrade:firefly-x-hammer-distro-basic-vps/
Jobs: ['805125', '805130']
Updated by David Zafman about 9 years ago
- Status changed from New to In Progress
Updated by Loïc Dachary about 9 years ago
- firefly backport https://github.com/ceph/ceph/pull/4129
Updated by David Zafman about 9 years ago
- Status changed from In Progress to Resolved
Updated by Loïc Dachary about 9 years ago
- Status changed from Resolved to In Progress
- Backport set to firefly
> I'm going to have a special exit code and fix the thrashosds to check for that.
Could you please add a URL to the pull request / tracker issue that deals with this (and re-close this one)? I suppose the following error I'm getting is what this is about.
2015-03-27T01:26:43.755 INFO:teuthology.orchestra.run.burnupi35.stdout:finish_remove_pgs 4.0_TEMP clearing temp
2015-03-27T01:26:43.774 INFO:teuthology.orchestra.run.burnupi35.stderr:Export has incompatible features set compat={},rocompat={},incompat={12=transaction hints,13=pg meta object}
2015-03-27T01:26:43.810 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-ec-cache-agent-firefly-backports/tasks/ceph_manager.py", line 632, in wrapper
    return func(self)
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-ec-cache-agent-firefly-backports/tasks/ceph_manager.py", line 672, in do_thrash
    self.choose_action()()
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-ec-cache-agent-firefly-backports/tasks/ceph_manager.py", line 274, in kill_osd
    format(ret=proc.exitstatus))
Exception: ceph-objectstore-tool: import failure with status 11
Updated by Loïc Dachary about 9 years ago
Same error as above, in case more logs help. Tell me when to stop ;-)
Updated by David Zafman about 9 years ago
- Status changed from In Progress to Pending Backport
Pull requests in the hammer branch:
https://github.com/ceph/ceph/pull/4128
https://github.com/ceph/ceph-qa-suite/pull/374
Pull requests in the firefly branch:
https://github.com/ceph/ceph/pull/4129
I found that we hadn't backported both parts to firefly:
https://github.com/ceph/ceph-qa-suite/pull/394
Updated by Loïc Dachary about 9 years ago
- Status changed from Pending Backport to Resolved