Support #38995

open

Writing a parquet file to a bucket through rados-gw using multipart uploads fails with a read timeout

Added by dpetrov dpetrov about 5 years ago. Updated about 5 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

I am writing a Parquet file to an S3 bucket served by rados-gw, using Python's dask.dataframe:

import dask.dataframe as dd

tdf = dd.read_parquet('file.pq', engine='fastparquet')
tdf.to_parquet('s3://databuck/file.pq', engine='fastparquet', storage_options=storage_options)
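
The value of storage_options is not shown in this report. As a minimal sketch of what it could look like for an RGW endpoint, assuming s3fs is the backend behind the s3:// URL: s3fs forwards `client_kwargs` to the boto client and `config_kwargs` to `botocore.client.Config`, whose `read_timeout` defaults to 60 seconds. The helper name `build_storage_options`, the credentials, and the timeout and retry values are placeholders chosen for illustration; the endpoint is taken from the error message below. Whether a longer read timeout actually avoids the failure is an assumption, not something this ticket confirms.

```python
def build_storage_options(endpoint_url, key, secret,
                          read_timeout=300, max_attempts=5):
    """Build an s3fs-style storage_options dict (hypothetical helper).

    read_timeout and max_attempts are illustrative values, not settings
    taken from the ticket.
    """
    return {
        "key": key,
        "secret": secret,
        # Passed through to the boto client: point it at rados-gw.
        "client_kwargs": {"endpoint_url": endpoint_url},
        # Passed through to botocore.client.Config.
        "config_kwargs": {
            "connect_timeout": 10,
            "read_timeout": read_timeout,
            "retries": {"max_attempts": max_attempts},
        },
    }


# Placeholder credentials; the endpoint matches the URL in the traceback.
storage_options = build_storage_options(
    "http://ceph:5434", "ACCESS_KEY", "SECRET_KEY"
)
```

The resulting dict can be passed unchanged as `storage_options=storage_options` in the `to_parquet` call above.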
  1. API Exception
    ~/develop/python/osiris/.venv/lib/python3.6/site-packages/botocore/httpsession.py in send(self, request)
        282             raise ConnectTimeoutError(endpoint_url=request.url, error=e)
        283         except URLLib3ReadTimeoutError as e:
    --> 284             raise ReadTimeoutError(endpoint_url=request.url, error=e)
        285         except ProtocolError as e:
        286             raise ConnectionClosedError(
    
    ReadTimeoutError: Read timeout on endpoint URL: "http://ceph:5434/databuck/file.pq/part.6.parquet" 
    
  2. radosgw-log
    2019-03-28 06:28:24.454 7efcf9ed3700  1 ====== req done req=0x7efcf9eca850 op status=0 http_status=200 ======
    2019-03-28 06:28:24.454 7efcf9ed3700  1 civetweb: 0x55bc62b94000: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "POST /databuck/file.pq/part.2.parquet?uploads HTTP/1.1" 200 438 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
    2019-03-28 06:28:24.484 7efcf9ed3700  1 ====== starting new request req=0x7efcf9eca850 =====
    2019-03-28 06:28:24.514 7efcf96d2700  1 ====== req done req=0x7efcf96c9850 op status=0 http_status=200 ======
    2019-03-28 06:28:24.514 7efcf96d2700  1 civetweb: 0x55bc62b949d8: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "POST /databuck/file.pq/part.9.parquet?uploads HTTP/1.1" 200 438 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
    2019-03-28 06:28:24.551 7efcf96d2700  1 ====== starting new request req=0x7efcf96c9850 =====
    2019-03-28 06:28:24.671 7efcf8ed1700  1 ====== req done req=0x7efcf8ec8850 op status=0 http_status=200 ======
    2019-03-28 06:28:24.671 7efcf8ed1700  1 civetweb: 0x55bc62b953b0: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "POST /databuck/file.pq/part.4.parquet?uploads HTTP/1.1" 200 438 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
    2019-03-28 06:28:24.701 7efcf8ed1700  1 ====== starting new request req=0x7efcf8ec8850 =====
    2019-03-28 06:28:24.767 7efcf86d0700  1 ====== req done req=0x7efcf86c7850 op status=0 http_status=200 ======
    2019-03-28 06:28:24.767 7efcf86d0700  1 civetweb: 0x55bc62b95d88: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "POST /databuck/file.pq/part.1.parquet?uploads HTTP/1.1" 200 438 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
    2019-03-28 06:28:24.814 7efcf86d0700  1 ====== starting new request req=0x7efcf86c7850 =====
    2019-03-28 06:28:41.380 7efcf7ecf700  1 ====== starting new request req=0x7efcf7ec6850 =====
    2019-03-28 06:28:43.487 7efcf76ce700  1 ====== starting new request req=0x7efcf76c5850 =====
    2019-03-28 06:28:48.884 7efcf6ecd700  1 ====== starting new request req=0x7efcf6ec4850 =====
    2019-03-28 06:28:51.074 7efcf66cc700  1 ====== starting new request req=0x7efcf66c3850 =====
    2019-03-28 06:28:58.137 7efcf5ecb700  1 ====== starting new request req=0x7efcf5ec2850 =====
    2019-03-28 06:29:05.743 7efcf56ca700  1 ====== starting new request req=0x7efcf56c1850 =====
    2019-03-28 06:29:09.717 7efcf4ec9700  1 ====== starting new request req=0x7efcf4ec0850 =====
    2019-03-28 06:29:10.100 7efcf46c8700  1 ====== starting new request req=0x7efcf46bf850 =====
    2019-03-28 06:29:10.223 7efcf3ec7700  1 ====== starting new request req=0x7efcf3ebe850 =====
    2019-03-28 06:29:14.777 7efcf8ed1700  1 ====== req done req=0x7efcf8ec8850 op status=0 http_status=200 ======
    2019-03-28 06:29:14.777 7efcf8ed1700  1 civetweb: 0x55bc62b953b0: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "PUT /databuck/file.pq/part.4.parquet?partNumber=1&uploadId=2~vMhP76lLfKTxcgJ9T7hQ6JAYZX9u9H9 HTTP/1.1" 200 231 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
    2019-03-28 06:29:24.253 7efcf9ed3700  1 ====== req done req=0x7efcf9eca850 op status=0 http_status=200 ======
    2019-03-28 06:29:24.253 7efcf9ed3700  1 civetweb: 0x55bc62b94000: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "PUT /databuck/file.pq/part.2.parquet?partNumber=1&uploadId=2~8sP5FipUDZWZjTIhYxf7vsLUdd2lg0v HTTP/1.1" 200 231 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
    2019-03-28 06:29:27.377 7efcf9ed3700  1 ====== starting new request req=0x7efcf9eca850 =====
    2019-03-28 06:29:28.340 7efcf96d2700  1 ====== req done req=0x7efcf96c9850 op status=0 http_status=200 ======
    2019-03-28 06:29:28.340 7efcf96d2700  1 civetweb: 0x55bc62b949d8: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "PUT /databuck/file.pq/part.9.parquet?partNumber=1&uploadId=2~6gGhyU8lj0INLNDvsGQQpPA0mgTQaZ6 HTTP/1.1" 200 231 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
    2019-03-28 06:29:32.636 7efcf96d2700  1 ====== starting new request req=0x7efcf96c9850 =====
    2019-03-28 06:29:35.126 7efcf8ed1700  1 ====== starting new request req=0x7efcf8ec8850 =====
    2019-03-28 06:29:35.746 7efcf36c6700  1 ====== starting new request req=0x7efcf36bd850 =====
    2019-03-28 06:29:45.020 7efcf2ec5700  1 ====== starting new request req=0x7efcf2ebc850 =====
    2019-03-28 06:29:50.190 7efcf26c4700  1 ====== starting new request req=0x7efcf26bb850 =====
    2019-03-28 06:29:50.500 7efcf1ec3700  1 ====== starting new request req=0x7efcf1eba850 =====
    2019-03-28 06:29:53.170 7efcf16c2700  1 ====== starting new request req=0x7efcf16b9850 =====
    2019-03-28 06:29:55.693 7efcf0ec1700  1 ====== starting new request req=0x7efcf0eb8850 =====
    2019-03-28 06:29:56.480 7efcf06c0700  1 ====== starting new request req=0x7efcf06b7850 =====
    
    

The full API log is attached to the ticket.
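
One way to read the radosgw log excerpt above is to pair each "starting new request" line with the matching "req done" line by its `req=` pointer and compute how long each request took. The following is a rough sketch under the assumption that the log lines keep the exact shape shown in this ticket; `request_durations` and `LINE_RE` are hypothetical names, and the `req=` pointer is reused across requests, so pairing is done in log order.

```python
import re
from datetime import datetime

# Matches the "starting new request" / "req done" lines as they appear
# in the excerpt above; civetweb access-log lines are ignored.
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+\s+\d+ "
    r"====== (?P<event>starting new request|req done) req=(?P<req>0x[0-9a-f]+)"
)


def request_durations(lines):
    """Return (finished, still_open).

    finished   -- list of (req, seconds) for requests that completed
    still_open -- dict of req -> start time for requests with no "req done"
    """
    still_open = {}
    finished = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        req = m["req"]
        if m["event"] == "starting new request":
            still_open[req] = ts
        elif req in still_open:
            started = still_open.pop(req)
            finished.append((req, (ts - started).total_seconds()))
    return finished, still_open


# Two lines copied from the excerpt: the start and end of one PUT.
sample = [
    "2019-03-28 06:28:24.701 7efcf8ed1700  1 ====== starting new request req=0x7efcf8ec8850 =====",
    "2019-03-28 06:29:14.777 7efcf8ed1700  1 ====== req done req=0x7efcf8ec8850 op status=0 http_status=200 ======",
]
finished, still_open = request_durations(sample)
```

For the two sample lines, the paired request took about 50 seconds, which is close to botocore's default 60-second read timeout. Running the same function over the full log would also list the requests that started but never logged "req done".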

Thanks in advance for the assistance.


Files

api-ceph.log (34.2 KB) Log from dask.dataframe.to_parquet, dpetrov dpetrov, 03/28/2019 07:07 AM
Actions #1

Updated by Zheng Yan about 5 years ago

  • Project changed from CephFS to rgw
  • Category deleted (Performance/Resource Usage)