Support #38995
Writing a parquet file to a bucket through rados-gw using multipart uploads fails with a read timeout
Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Tags:
Reviewed:
Affected Versions:
Pull request ID:
Description
I am writing a parquet file using Python's dask.dataframe:
import dask.dataframe as dd
tdf = dd.read_parquet('file.pq', engine='fastparquet')
tdf.to_parquet('s3://databuck/file.pq', engine='fastparquet', storage_options=storage_options)
- API Exception
~/develop/python/osiris/.venv/lib/python3.6/site-packages/botocore/httpsession.py in send(self, request)
    282     raise ConnectTimeoutError(endpoint_url=request.url, error=e)
    283 except URLLib3ReadTimeoutError as e:
--> 284     raise ReadTimeoutError(endpoint_url=request.url, error=e)
    285 except ProtocolError as e:
    286     raise ConnectionClosedError(

ReadTimeoutError: Read timeout on endpoint URL: "http://ceph:5434/databuck/file.pq/part.6.parquet"
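The exception is raised by botocore on the client side, so one thing that may be worth checking is the client's read timeout. The sketch below is only an assumption about how `storage_options` could be built: it assumes s3fs forwards `client_kwargs` to the boto3 client and `config_kwargs` to `botocore.client.Config`. The helper name `build_storage_options` and the timeout and retry values are hypothetical; the endpoint is taken from the traceback above.

```python
def build_storage_options(endpoint_url, read_timeout=300, max_attempts=5):
    """Build an s3fs-style storage_options dict (hypothetical helper).

    Assumes s3fs passes client_kwargs to the boto3 client and
    config_kwargs to botocore.client.Config.
    """
    return {
        "client_kwargs": {"endpoint_url": endpoint_url},
        "config_kwargs": {
            # Seconds to wait for a response before ReadTimeoutError is raised.
            "read_timeout": read_timeout,
            # How many times botocore may attempt a failed request.
            "retries": {"max_attempts": max_attempts},
        },
    }


# Illustrative values only; tune them for the cluster in question.
storage_options = build_storage_options("http://ceph:5434")
print(storage_options)
```

A longer timeout would only hide slow part uploads, not explain them, so the server-side log is still the place to look for the cause.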
- radosgw-log
2019-03-28 06:28:24.454 7efcf9ed3700 1 ====== req done req=0x7efcf9eca850 op status=0 http_status=200 ======
2019-03-28 06:28:24.454 7efcf9ed3700 1 civetweb: 0x55bc62b94000: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "POST /databuck/file.pq/part.2.parquet?uploads HTTP/1.1" 200 438 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
2019-03-28 06:28:24.484 7efcf9ed3700 1 ====== starting new request req=0x7efcf9eca850 =====
2019-03-28 06:28:24.514 7efcf96d2700 1 ====== req done req=0x7efcf96c9850 op status=0 http_status=200 ======
2019-03-28 06:28:24.514 7efcf96d2700 1 civetweb: 0x55bc62b949d8: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "POST /databuck/file.pq/part.9.parquet?uploads HTTP/1.1" 200 438 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
2019-03-28 06:28:24.551 7efcf96d2700 1 ====== starting new request req=0x7efcf96c9850 =====
2019-03-28 06:28:24.671 7efcf8ed1700 1 ====== req done req=0x7efcf8ec8850 op status=0 http_status=200 ======
2019-03-28 06:28:24.671 7efcf8ed1700 1 civetweb: 0x55bc62b953b0: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "POST /databuck/file.pq/part.4.parquet?uploads HTTP/1.1" 200 438 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
2019-03-28 06:28:24.701 7efcf8ed1700 1 ====== starting new request req=0x7efcf8ec8850 =====
2019-03-28 06:28:24.767 7efcf86d0700 1 ====== req done req=0x7efcf86c7850 op status=0 http_status=200 ======
2019-03-28 06:28:24.767 7efcf86d0700 1 civetweb: 0x55bc62b95d88: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "POST /databuck/file.pq/part.1.parquet?uploads HTTP/1.1" 200 438 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
2019-03-28 06:28:24.814 7efcf86d0700 1 ====== starting new request req=0x7efcf86c7850 =====
2019-03-28 06:28:41.380 7efcf7ecf700 1 ====== starting new request req=0x7efcf7ec6850 =====
2019-03-28 06:28:43.487 7efcf76ce700 1 ====== starting new request req=0x7efcf76c5850 =====
2019-03-28 06:28:48.884 7efcf6ecd700 1 ====== starting new request req=0x7efcf6ec4850 =====
2019-03-28 06:28:51.074 7efcf66cc700 1 ====== starting new request req=0x7efcf66c3850 =====
2019-03-28 06:28:58.137 7efcf5ecb700 1 ====== starting new request req=0x7efcf5ec2850 =====
2019-03-28 06:29:05.743 7efcf56ca700 1 ====== starting new request req=0x7efcf56c1850 =====
2019-03-28 06:29:09.717 7efcf4ec9700 1 ====== starting new request req=0x7efcf4ec0850 =====
2019-03-28 06:29:10.100 7efcf46c8700 1 ====== starting new request req=0x7efcf46bf850 =====
2019-03-28 06:29:10.223 7efcf3ec7700 1 ====== starting new request req=0x7efcf3ebe850 =====
2019-03-28 06:29:14.777 7efcf8ed1700 1 ====== req done req=0x7efcf8ec8850 op status=0 http_status=200 ======
2019-03-28 06:29:14.777 7efcf8ed1700 1 civetweb: 0x55bc62b953b0: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "PUT /databuck/file.pq/part.4.parquet?partNumber=1&uploadId=2~vMhP76lLfKTxcgJ9T7hQ6JAYZX9u9H9 HTTP/1.1" 200 231 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
2019-03-28 06:29:24.253 7efcf9ed3700 1 ====== req done req=0x7efcf9eca850 op status=0 http_status=200 ======
2019-03-28 06:29:24.253 7efcf9ed3700 1 civetweb: 0x55bc62b94000: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "PUT /databuck/file.pq/part.2.parquet?partNumber=1&uploadId=2~8sP5FipUDZWZjTIhYxf7vsLUdd2lg0v HTTP/1.1" 200 231 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
2019-03-28 06:29:27.377 7efcf9ed3700 1 ====== starting new request req=0x7efcf9eca850 =====
2019-03-28 06:29:28.340 7efcf96d2700 1 ====== req done req=0x7efcf96c9850 op status=0 http_status=200 ======
2019-03-28 06:29:28.340 7efcf96d2700 1 civetweb: 0x55bc62b949d8: 10.10.0.101 - - [28/Mar/2019:06:28:24 +0000] "PUT /databuck/file.pq/part.9.parquet?partNumber=1&uploadId=2~6gGhyU8lj0INLNDvsGQQpPA0mgTQaZ6 HTTP/1.1" 200 231 - Boto3/1.9.122 Python/3.6.6 Linux/5.0.3-arch1-1-ARCH Botocore/1.12.122
2019-03-28 06:29:32.636 7efcf96d2700 1 ====== starting new request req=0x7efcf96c9850 =====
2019-03-28 06:29:35.126 7efcf8ed1700 1 ====== starting new request req=0x7efcf8ec8850 =====
2019-03-28 06:29:35.746 7efcf36c6700 1 ====== starting new request req=0x7efcf36bd850 =====
2019-03-28 06:29:45.020 7efcf2ec5700 1 ====== starting new request req=0x7efcf2ebc850 =====
2019-03-28 06:29:50.190 7efcf26c4700 1 ====== starting new request req=0x7efcf26bb850 =====
2019-03-28 06:29:50.500 7efcf1ec3700 1 ====== starting new request req=0x7efcf1eba850 =====
2019-03-28 06:29:53.170 7efcf16c2700 1 ====== starting new request req=0x7efcf16b9850 =====
2019-03-28 06:29:55.693 7efcf0ec1700 1 ====== starting new request req=0x7efcf0eb8850 =====
2019-03-28 06:29:56.480 7efcf06c0700 1 ====== starting new request req=0x7efcf06b7850 =====
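To see how long each request in the radosgw log took, the "starting new request" and "req done" entries can be paired by their `req=` value. The script below is a minimal sketch of that idea, not a Ceph tool: the function name `request_durations` and the regular expression are assumptions based only on the log format shown above, and the sample lines are copied from this ticket.

```python
import re
from datetime import datetime

# Matches the timestamp, the event kind and the req= id of an RGW log entry.
EVENT = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+\S+\s+\d+\s+"
    r"====== (starting new request|req done)\s+req=(0x[0-9a-f]+)"
)


def request_durations(log_text):
    """Return ([(req, seconds), ...], {req: start}) for finished and pending requests."""
    started = {}
    finished = []
    for match in EVENT.finditer(log_text):
        stamp, kind, req = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        if kind == "starting new request":
            started[req] = when
        elif req in started:
            # req ids are reused, so drop the start once it has been paired.
            elapsed = (when - started.pop(req)).total_seconds()
            finished.append((req, elapsed))
    return finished, started


SAMPLE = """\
2019-03-28 06:28:24.701 7efcf8ed1700 1 ====== starting new request req=0x7efcf8ec8850 =====
2019-03-28 06:29:10.223 7efcf3ec7700 1 ====== starting new request req=0x7efcf3ebe850 =====
2019-03-28 06:29:14.777 7efcf8ed1700 1 ====== req done req=0x7efcf8ec8850 op status=0 http_status=200 ======
"""

done, pending = request_durations(SAMPLE)
for req, seconds in done:
    print(f"{req} finished after {seconds:.3f} s")
for req, start in pending.items():
    print(f"{req} started at {start.time()} and has no 'req done' entry")
```

In the excerpt above, the request that ends as the PUT for part.4.parquet starts at 06:28:24.701 and completes at 06:29:14.777, roughly 50 seconds later, while many later requests never reach "req done" within the excerpt.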
Full api log is attached to the ticket.
Thanks in advance for the assistance.
Updated by Zheng Yan about 5 years ago
- Project changed from CephFS to rgw
- Category deleted (Performance/Resource Usage)