Project

General

Profile

Actions

Bug #36524

closed

hang on 'sudo tar cz -f - -C /var/log/ceph -- .'

Added by Patrick Donnelly over 5 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

2018-10-18T14:28:55.826 INFO:tasks.ceph:Shutting down logrotate
2018-10-18T14:28:55.826 INFO:teuthology.orchestra.run.smithi088:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-10-18T14:28:55.830 INFO:teuthology.orchestra.run.smithi178:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-10-18T14:28:55.938 INFO:teuthology.orchestra.run.smithi088:Running: 'sudo rm /etc/logrotate.d/ceph-test.conf'
2018-10-18T14:28:56.017 INFO:teuthology.orchestra.run.smithi178:Running: 'sudo rm /etc/logrotate.d/ceph-test.conf'
2018-10-18T14:28:56.116 INFO:tasks.ceph:Compressing logs...
2018-10-18T14:28:56.116 INFO:teuthology.orchestra.run.smithi088:Running: "sudo find /var/log/ceph -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --" 
2018-10-18T14:28:56.120 INFO:teuthology.orchestra.run.smithi178:Running: "sudo find /var/log/ceph -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --" 
2018-10-18T14:29:12.373 INFO:tasks.ceph:Archiving logs...
2018-10-18T14:29:12.375 DEBUG:teuthology.misc:Transferring archived files from smithi178:/var/log/ceph to /home/teuthworker/archive/pdonnell-2018-10-17_19:40:05-fs-wip-pdonnell-testing-20181017.175152-distro-basic-smithi/3152239/remote/smithi178/log
2018-10-18T14:29:12.376 INFO:teuthology.orchestra.run.smithi178:Running: 'sudo tar cz -f - -C /var/log/ceph -- .'

From: /ceph/teuthology-archive/pdonnell-2018-10-17_19:40:05-fs-wip-pdonnell-testing-20181017.175152-distro-basic-smithi/3152239/teuthology.log
Teuthology branch: https://github.com/ceph/teuthology/commit/432e976e6965ad8fa466cf2360a1cd14efed0844

In another job where I killed gzip on the remote, I saw this traceback:

2018-10-18T14:29:39.417 INFO:teuthology.orchestra.run.smithi082:Running: 'sudo tar cz -f - -C /var/log/ceph -- .'
2018-10-18T19:32:45.277 ERROR:paramiko.transport:Socket exception: Connection reset by peer (104)
2018-10-18T19:32:45.280 ERROR:teuthology.run_tasks:Manager failed: ceph
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/run_tasks.py", line 159, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20181017.175152/qa/tasks/ceph.py", line 1753, in task
    '--no-mon-health-to-clog',
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/contextutil.py", line 46, in nested
    if exit(*exc):
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20181017.175152/qa/tasks/ceph.py", line 243, in ceph_log
    os.path.join(sub, 'log'))
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/misc.py", line 771, in pull_directory
    tar.makefile(ti, targetpath=os.path.join(localdir, sub))
  File "/usr/lib/python2.7/tarfile.py", line 2233, in makefile
    copyfileobj(source, target)
  File "/usr/lib/python2.7/tarfile.py", line 266, in copyfileobj
    shutil.copyfileobj(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 49, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/lib/python2.7/tarfile.py", line 831, in read
    buf += self.fileobj.read(size - len(buf))
  File "/usr/lib/python2.7/tarfile.py", line 743, in read
    return self.readnormal(size)
  File "/usr/lib/python2.7/tarfile.py", line 758, in readnormal
    return self.__read(size)
  File "/usr/lib/python2.7/tarfile.py", line 750, in __read
    raise ReadError("unexpected end of data")
ReadError: unexpected end of data

From: /ceph/teuthology-archive/pdonnell-2018-10-17_19:40:05-fs-wip-pdonnell-testing-20181017.175152-distro-basic-smithi/3152202/teuthology.log

Edit: strace shows gzip is blocked on a write syscall:

[root@smithi173 108181]# strace -f -v -p 108181 -s 100 
strace: Process 108181 attached
write(1, "\31\32\317\214\2036\2371Ij\216\357\327b+\0317O\316,\f9 Dm\3z\204o\304\362\2348\0M-\r\303\272\206\322k\244\370\336\350\317`\34d\265\2\227*\271\273\370\305@S!(\332\326\267\357E>\302f\2732\246\35 \365\23\301B\343\1\253\312?\323\325\311Y\224Zf\370\350\301\36Y\217\370H\217"..., 16384^Cstrace: Process 108181 detached

Edit2: not completely blocked, just very slow:

[root@smithi173 108181]# strace -f -v -p 108181 -s 100 
strace: Process 108181 attached
write(1, "\31\32\317\214\2036\2371Ij\216\357\327b+\0317O\316,\f9 Dm\3z\204o\304\362\2348\0M-\r\303\272\206\322k\244\370\336\350\317`\34d\265\2\227*\271\273\370\305@S!(\332\326\267\357E>\302f\2732\246\35 \365\23\301B\343\1\253\312?\323\325\311Y\224Zf\370\350\301\36Y\217\370H\217"..., 16384^Cstrace: Process 108181 detached
 <detached ...>
[root@smithi173 108181]# who^C
[root@smithi173 108181]# strace -f -v -p 108181 -s 100 
strace: Process 108181 attached
write(1, "\207\3459\337\354/\265\17&\250\30\276'\244\277\35\232\33\237\307\224-:\306\233\263\346\265\362\233\333|a\\\301\340\353\203\225q\320\347\273\262\31\\Mq\335\356\344\4\344,\372S\305\242\200@iJ\341\34\255\266\305\360\16\31\275\r(f=\242\347\262;3y\257s\226\0B\301'\330\200kA\16\20n\367O\376|\366"..., 16384) = 16384
write(1, "\330\t\352\n\230\326\5_\207\305JF\325\236\200Eo\r\36\232F\247\253\263x\225\366\305\263\0\211\3^\225\340YF\301\276b\255\356\261;\275\0\357}\324\260\252Q\376\225\345X\223\373\225\345\2702\23\360c\305\240\37!\312\30\216\315\6mR\25\216\372\313\31\326\17U\204\"\317Q\221\243\310\325V\362S7\363\10O\210"..., 16384) = 16384
write(1, "*f\326\277\330\365\16\203\373\35n=r\300\372\25\276Kl\271\276s\225\235\217_\236\257U\311\367\255\370\2717\31:\267\357\317\204\344Q\273\340\257\205\324\261\257\345\25\205\344\372\242\201\316\336D_\345.\21\377\270\230\242\354\212\341?Jsoq\373\376[d\301\273\32\314\"\254dS\330v9\237\245\245\2\3151ih\341"..., 16384) = 16384
read(0, "d\235\207sg\n\336\205-\305\347\327\302\307\327EL\326I,1^\273\325\342\333\366\365\30\374U\226\5\302\277}\n\212\337\267\207\343\341s\20\5\207\303\352\252\fM\226a\21\226\35\357PH%E\337\373\f\235\26L\273\2157\333\327u\243\221m\215\372\177\224\377\356\266_\356\322\206\300\21\237M\271QN\303\314\366>\275"..., 32768) = 32768
write(1, "\203\350\344\t\364]E\214\362/\\\347\221}\374\365lW\277\372\256;a\375\211\377\34@7\31@\375\355\244\341\261l\276\216\6\364\334\344\344\335(\223Q\207\342\313z\313\217n^\345\335c\366\311\327\"\374I\307\304\204\335mN{1^\245\371\206yH\34\364\340\207\370\201\277\246\341\0\372wb\"H\374F=0\37\254"..., 16384) = 16384
write(1, "\363\f^=\370\332\356orm\30\345\231\317]\330>o_u\330#M\265N2\224\277\273\26\325*\374\333\211b\217\244\370z\304\213!:\377!\300\330\vN\327\325T\276z\233\223\212\204\274\\)\1_?f\25\235\277N\340\302\223m\22\245f\364s\323\23\306\246h\376\v\0\263\231E\331\321K\205-\0003\233!{"..., 16384) = 16384
read(0, "2\211x\220\2774\366_#z\345G\217\340_\nz/\314\276\226\363\321\371K[\277\330\312^\240\310R\340\276\221\247\207\252\210\356\24:\366\200<\335\251\312\310*\304o(\206g\313;\301\256\2267x\5\277h\224^BG\270/5E\267\261T9/\254%\342't\37\327\5\367 \351\261\3\273b97\roL\17"..., 32768) = 32768
write(1, "1\313G8\233y\230\317\327b\275\347`g\257 \3172\vF5\353\256 \257\367fg\264vI\251\322\21\v\177\316\tR\201P\320\245d\327px|\20\f;$W\372\316s\343tR\236[-f?\244\236\301s\357\214O\346JH\36\273\364\252+\252@\361\252h\214\302\207\364\324\323\272\337\203u}'`~\347\270"..., 16384) = 16384
write(1, "\223nXQ\223\202\354],[y*|K\311Sp\303G\"[\30\274\347\360\357se\303)H\316|#\r\217\326s\310\20:\350d\323\331M\301\0a \34\210\360\177]\312[\5m&W&\0cA\372\n\367\247\223$Tw\2077\344\277\tDb\321\347\235\260WI\332a5d\237\202Q?\"\204\22*V\354"..., 16384) = 16384
read(0, "\311A\21\3378\275d\2105w\344u\214W\r\375\335\214\327.\24\367i\274\246\317\341\261\25\256\327\16\7\326I\1\266\20L/=aY#\3752\216\2224\216\f\232\fJ:\34\330\23\352\275\303\301\5\211\346\240\231\202\225{\177\27_\233\303k\207\203!\304\300\252\303\201=\241\336;\34\310\304@S\335*Xy\17b\200"..., 32768) = 32768
write(1, "\331\30\20\t\274\346\352\314\nR \273\277\3\31\310\302\342w\215\210?\2\263\215\3102OtV\265\310oz\20B8`\214\360 \202\0\206\30\5\234\"\177V(\361`\376rH\232\177\35!HNX\252\360[7\233Z\335\r\177\305\237\3201cE`S5e\216\26\302\237\375\335D\256\344\2263\16\325\314\25\21\310\371"..., 16384) = 16384
write(1, "\205I[\361\231\265\342\"D\362\301\321Yy^\303B\352R\261^P\"l\177j\347\367\207s\363*{te\325=\254\241\250\373A\203f]s\341\372:\243\303R<6\273\244\205\374L^<\265\216*\272\260\244\245\302\365*\257<Y\273\256\266\252\302X\273\256\223\2526\37\16\370\242R\207TG\315\20\nn\3|#"..., 16384) = 16384
read(0, "\235\326`\371$\6\326<\347\320=\202\1\10wk)A\24\222G\357\240\4\326L\243\4\300\272\2\r\262\246\7\237\304\310\232Z%\260d3\0\302#(\201\374@\357$Lo\243\233\313\24\35E\303j0\231Q\262J\375\257(y\341\237\336\306\363I\235\336>\343\320=\263\t\20N\236\331t\27\16*\275\235B\tP\351"..., 32768) = 32768
write(1, "\20\317.\234'\367\247\221M\27\205V\32\342\352*\264~\375\30!\300\321\17\230Q\321\367\224o\310'z\373\237>\316\314\377\366qJ\313\257$\313\315=\267\\\345<w>)\272rr[\305\370>)Cq\360\342:o3\26\352\225\205\373|\262\356\326\343\333R>\345\212\241\305\275\34\0376\257\264\255\354S\247\312Y["..., 16384) = 16384
write(1, "\252\37\277h\237\217\251^\273\257\31\227\311>\177\20\254\367Me{]!\360\331\272k\265o9\355\270p\374(\200\2169^\30_\350x\21\264\371\232\204\234w\366q\360\\\224\3C\337\340g\306\357\326\367\341\272\325q\236\261\36zXC\24}\262/\370\356\304\366\25\330\203c\356\373w\17\216\207\264\234z\270\240\3633\225"..., 16384) = 16384
read(0, "O`Ry\304\241\307\244\362\31\207\356\223&#\312\221`\317\223/\304X\301S~!(\37\255\276\20\34\27\2_\210\274K\20/Kh^\375S\212\204Z`\231\204\1(\217 \35\301\t`\3\201\307\320\204^\320\203\177\243\320\313\203\275%\205^\36\354-)\364\362`oI\241\227\7{K\n\275<\330[R\350\345\301"..., 32768) = 32768
write(1, "\270\32W\251\367\201\267L\356\244\271\321s\341\264\371IbO\374Bt\r\304}y\354\23?\32p\372\340d.a>\303\344\2\235\272\252>\346\271\223Y\304\341I\275/Qz_\305\265\220\24E\17\1\230i\1\2750\343\311\34\322\257\25\4c2\362Hy\206\325Tqd~\1\343\331\215|\320\304\4d\251\310\315\200\316"..., 16384) = 16384
write(1, "\350\363\356\316F\35NEw\323\243\335*\353\365y\352\177\367\342\236\10\37\274\\\265\361F\23656\365\305\37\377\217\10\261!~\357\277\240\17\331Y\275\212\263\375\277\333\253W}|\267\0370y\374\357\366\3\240\26c\310\200\rzqu\346{0\334A\2447\337L\"\"\33\337\367\360\346\341\353%\216i\372R\373\241\355\337"..., 16384) = 16384
read(0, "K\347\250\305\275\271+kd\2~Cb_\206\370q_\262P \340\250\24\3503\354&\v\30\234\273\322\r\31\26\16}\304g`_\204\210\23\210\21qn(\v\376\251VF\204\226\23\207\331<i\242\345\4\306\255M\rA\335\322\324i\270\240\304\2209\1\207\314qh\335\25\22098\236\301 s\232\362\207@\3464\6"..., 32768) = 32768
write(1, "\200fX\f\363MJ\332,1\376\f\211\342\363\262\366\346\324\374\n\217\10qQ\v}^\nI<\"0 z\242\241\245\333\207D\242N\303\224?\37\230\233\207\330O,\207M\237\310A`\276\33\3610\374\331\267n\3014\322\16!\261=\367,\277_\17\341F\357]\334Y\0018E\265\357\16\310=\365K\24\363\227\270\261"..., 16384) = 16384
write(1, "\337\265$\345\10\342\273e\1I\212N\347I\31\301\210\320\335\214\241)\263;\2\2255\10\210]\234\314z}\320\245\321~q\322\256\347\262\273\231\352\326\302d\206\365v\320\323\31\362\242_\247Y\267\272\271\367\252\247\332\322\270\355\27\216\356+\275-<\261\34\v|\26\276\267\306D\364\310\221\353\232\307\337:\34\223\350\247\222\367"..., 16384
Actions #1

Updated by Neha Ojha over 5 years ago

I think I noticed something similar here:

2018-10-16T13:01:19.328 INFO:tasks.ceph:Archiving logs...
2018-10-16T13:01:19.330 DEBUG:teuthology.misc:Transferring archived files from smithi007:/var/log/ceph to /home/teuthworker/archive/yuriw-2018-10-16_00:00:14-rados-wip-yuri-testing-2018-10-15-2110-mimic-distro-basic-smithi/3145897/remote/smithi007/log
2018-10-16T13:01:19.331 INFO:teuthology.orchestra.run.smithi007:Running: 'sudo tar cz -f - -C /var/log/ceph -- .'
2018-10-17T00:27:34.616 DEBUG:teuthology.exit:Got signal 15; running 2 handlers...
2018-10-17T00:27:34.641 DEBUG:teuthology.task.console_log:Killing console logger for smithi007

Only some of the logs were archived:

nojha@senta02:/ceph/teuthology-archive/yuriw-2018-10-16_00:00:14-rados-wip-yuri-testing-2018-10-15-2110-mimic-distro-basic-smithi/3145897/remote/smithi007/log$ ls
ceph-client.admin.38050.log.gz ceph-osd.2.log.gz

/a/yuriw-2018-10-16_00:00:14-rados-wip-yuri-testing-2018-10-15-2110-mimic-distro-basic-smithi/3145897/

Actions #2

Updated by Patrick Donnelly over 5 years ago

  • Description updated (diff)
Actions #3

Updated by Patrick Donnelly over 5 years ago

  • Description updated (diff)
Actions #4

Updated by Deepika Upadhyay over 3 years ago

observed this with batch involving logrotate changes:

http://qa-proxy.ceph.com/teuthology/yuriw-2020-11-13_17:17:14-rados-wip-yuri8-testing-2020-11-12-1029-octopus-distro-basic-smithi/5619853/teuthology.log

2020-11-13T20:06:33.684 ERROR:teuthology.run_tasks:Manager failed: ceph
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 172, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri8-testing-2020-11-12-1029-octopus/qa/tasks/ceph.py", line 1870, in task
    check_status=False,
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 55, in nested
    raise exc[1]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 47, in nested
    if exit(*exc):
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri8-testing-2020-11-12-1029-octopus/qa/tasks/ceph.py", line 292, in ceph_log
    os.path.join(sub, 'log'))
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/misc.py", line 739, in pull_directory
    tar.makefile(ti, targetpath=os.path.join(localdir, sub))
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/lib/python3.6/tarfile.py", line 2173, in makefile
    copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/lib/python3.6/tarfile.py", line 251, in copyfileobj
    raise exception("unexpected end of data")
tarfile.ReadError: unexpected end of data
Actions #5

Updated by Josh Durgin about 3 years ago

This can occur when the test machine's disk fills up - it should be less likely now that we turned debug_ms back to 1.

Actions #6

Updated by Josh Durgin over 2 years ago

  • Status changed from New to Resolved

Closing due to lack of re-occurrence

Actions

Also available in: Atom PDF