0 votes · 0 answers · 51 views
My overall goal is to set up a virtual dataset of ERA5 data using Icechunk. As a smaller test example, I'm trying to pull all the data located in the 194001 ERA5 folder. I've been mostly able to ...
asked by Kieran Bartels
0 votes · 0 answers · 39 views
When creating an instance of the S3FileSystem class, you can provide the config_kwargs dictionary to set further properties (like region or signature_version). The pyiceberg FileIO implementation is based ...
asked by Doel10 (rep 11)
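Setting the question's pyiceberg specifics aside, a minimal sketch of how config_kwargs reaches the botocore client Config in plain s3fs; the region and bucket below are hypothetical:

    import s3fs

    # config_kwargs are forwarded to the botocore client Config,
    # which is where region_name and signature_version live
    fs = s3fs.S3FileSystem(
        config_kwargs={
            "region_name": "eu-central-1",
            "signature_version": "s3v4",
        }
    )
    print(fs.ls("my-bucket"))  # hypothetical bucket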
0 votes · 0 answers · 41 views
I'm creating a subclass of s3fs.S3FileSystem that connects to either AWS S3 or MinIO based on environment variables. The connection should fail immediately during initialization if credentials are ...
asked by Jose Constenla
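One common way to get fail-fast behavior is a cheap listing call right after initialization; a sketch under that assumption (the class and environment variable name are hypothetical):

    import os
    import s3fs

    class StrictS3FileSystem(s3fs.S3FileSystem):
        """Hypothetical subclass that fails fast on bad credentials."""

        def __init__(self, **kwargs):
            endpoint = os.environ.get("S3_ENDPOINT_URL")  # set for MinIO, unset for AWS
            if endpoint:
                kwargs.setdefault("client_kwargs", {})["endpoint_url"] = endpoint
            super().__init__(**kwargs)
            # Force a round trip now so invalid credentials raise here,
            # not on the first read/write much later.
            self.ls("", refresh=True)  # lists buckets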
0 votes · 1 answer · 178 views
I am using jupyter-fs to connect JupyterHub's notebooks with a MinIO instance deployed in another namespace on Kubernetes. The MinIO endpoint is configured with HTTPS and uses a self-signed ...
asked by A_A (rep 36)
0 votes · 1 answer · 32 views
Given an S3 object storage, I want to know which directories in a base directory have changed since a given datetime. It would work similarly to get_changed_directories: bucket_directory = "...
asked by Joost Döbken
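A sketch of one possible get_changed_directories, using the per-key metadata (including LastModified) that s3fs's find(detail=True) returns; the prefix below is hypothetical:

    from datetime import datetime, timezone
    import s3fs

    def get_changed_directories(bucket_directory, since):
        fs = s3fs.S3FileSystem()
        infos = fs.find(bucket_directory, detail=True)  # key -> metadata dict
        # keep the parent "directory" of every key modified after the cutoff
        return sorted({
            key.rsplit("/", 1)[0]
            for key, info in infos.items()
            if info["LastModified"] >= since
        })

    # hypothetical usage:
    # get_changed_directories("my-bucket/base", datetime(2024, 1, 1, tzinfo=timezone.utc))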
0 votes · 1 answer · 387 views
With s3fs we can set fs = s3fs.S3FileSystem(profile=profile_name). However, passing the profile this way doesn't work for fsspec with caching: fs = fsspec.filesystem( "filecache", ...
asked by Roelant (rep 5,229)
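The usual workaround is to pass the profile through target_options, which the caching layer forwards to the underlying s3fs filesystem; a sketch with hypothetical names:

    import fsspec

    fs = fsspec.filesystem(
        "filecache",
        target_protocol="s3",
        target_options={"profile": "my-profile"},  # forwarded to s3fs.S3FileSystem
        cache_storage="/tmp/fsspec-cache",
    )
    with fs.open("my-bucket/some/key.csv") as f:  # hypothetical key
        print(f.readline())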
0 votes · 1 answer · 195 views
I installed s3fs-fuse on a CentOS server and I have a question about synchronization. The current situation is that Object Storage is synchronized with the CentOS server. I'm asking if every time a ...
asked by hyundai-autoever
0 votes · 1 answer · 369 views
I have a file named data_[2022-10-03:2022-10-23].csv.gzip in S3, inside a bucket and folder named s3://&lt;bucket_name&gt;/data/cache/. I am attempting to delete this file using S3FS. When I attempt to ...
asked by Edy Bourne (rep 6,306)
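The brackets are the likely culprit: fsspec treats [ as a glob character when expanding paths, so the literal key never matches. A sketch of the usual workaround, deleting the single key without glob expansion (bucket name hypothetical):

    import s3fs

    fs = s3fs.S3FileSystem()
    path = "my-bucket/data/cache/data_[2022-10-03:2022-10-23].csv.gzip"

    # rm() glob-expands the path; rm_file() deletes exactly this one key
    fs.rm_file(path)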
2 votes · 0 answers · 250 views
I've been struggling to make s3fs and ProcessPoolExecutor work together. Essentially, the issue is that s3fs, by default, holds some session information for connections. So, that doesn't play well ...
asked by David Moye
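The pattern that usually works is to construct the filesystem inside each worker process rather than in the parent, so no session or event loop crosses the fork; a sketch with hypothetical keys:

    from concurrent.futures import ProcessPoolExecutor
    import s3fs

    def process_key(key):
        # each worker builds its own filesystem (and session) locally
        fs = s3fs.S3FileSystem()
        with fs.open(key, "rb") as f:
            return len(f.read())

    if __name__ == "__main__":
        keys = ["my-bucket/a.bin", "my-bucket/b.bin"]  # hypothetical keys
        with ProcessPoolExecutor() as ex:
            print(list(ex.map(process_key, keys)))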
1 vote · 1 answer · 2k views
I am using the pyarrow fs.S3FileSystem library to write a csv to an s3 bucket. Although this code runs fine locally, when I deploy to a Linux VM it throws an error: OSError: When listing objects under key xx ...
asked by prashant
0 votes · 1 answer · 482 views
Using python s3fs, how do you copy an object from one s3 bucket to another? I have found answers using boto3, but could not find anything when looking through the s3fs docs.
asked by jjbskir (rep 11.3k)
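A sketch using fsspec's generic copy, which s3fs carries out server-side; the keys are hypothetical:

    import s3fs

    fs = s3fs.S3FileSystem()
    # cross-bucket copy; the data never passes through the client
    fs.copy("source-bucket/path/key.txt", "dest-bucket/path/key.txt")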
0 votes · 1 answer · 2k views
I have the following error on my notebook after setting up an EMR 6.3.0 cluster: An error was encountered: Install s3fs to access S3 Traceback (most recent call last): File "/usr/local/lib64/python3.7/...
asked by Airone (rep 1)
0 votes · 1 answer · 2k views
I have the following method in Python: def read_file(self, bucket, table_name, file_name, format="csv"): data = None read_from_path = f"s3://{bucket}/{table_name}/{file_name}" ...
asked by HuLu ViCa (rep 5,515)
-1 votes · 1 answer · 840 views
I am copying a folder to S3 with s3fs.put(..., recursive=True) and I experience weird behavior. The code is: import s3fs source_path = 'foo/bar' # there are some files and ...
asked by Pepacz (rep 969)
2 votes · 2 answers · 2k views
I'm trying to implement unit tests using pytest, Moto (4.1.6) and s3fs (0.4.2) for my functions that interact with S3. So far I am able to create a bucket and populate it with all the files that live ...
asked by A Campos (rep 833)
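With the sync s3fs 0.4.x mentioned here, a moto-backed fixture along these lines is a common setup; a sketch with hypothetical bucket and test names (moto 5 later renamed mock_s3 to mock_aws, and clear_instance_cache assumes an fsspec version that provides it):

    import boto3
    import pytest
    import s3fs
    from moto import mock_s3

    @pytest.fixture
    def s3_fs():
        with mock_s3():
            boto3.client("s3", region_name="us-east-1").create_bucket(Bucket="test-bucket")
            # s3fs caches filesystem instances; clear so this test
            # gets one that talks to the mocked endpoint
            s3fs.S3FileSystem.clear_instance_cache()
            yield s3fs.S3FileSystem()

    def test_roundtrip(s3_fs):
        with s3_fs.open("test-bucket/hello.txt", "w") as f:
            f.write("hi")
        assert s3_fs.cat("test-bucket/hello.txt") == b"hi"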
14 votes · 4 answers · 10k views
I can install boto3, s3fs and pandas using: pip install boto3 pandas s3fs. But it fails with Poetry: poetry add boto3 pandas s3fs. Here is the error: Because no versions of s3fs match >2023.3.0, ...
asked by jtobelem (rep 981)
0 votes · 2 answers · 2k views
I am only able to gain limited/top-level access to my AWS S3. I can see the buckets, but not their contents; neither subfolders nor files. I'm running everything from inside a conda environment. I've ...
asked by Dylan Bodkin
1 vote · 0 answers · 345 views
I'm exploring the S3FS framework, which I need for reading/writing from/to the S3 file system. From what I can see in the docs, we can pass the AWS credentials explicitly, but I don't see any information ...
asked by Monica (rep 1,070)
1 vote · 0 answers · 913 views
I'm trying to get an ML job to run on AWS Batch. The job runs in a docker container, using credentials generated for a Task IAM Role. I use DVC to manage the large data files needed for the task, ...
asked by Kevin Yancey
1 vote · 1 answer · 2k views
For test purposes, I'm trying to connect a module that introduces an abstraction layer over s3fs with custom business logic. It seems like I have trouble connecting the s3fs client to the MinIO container ...
asked by KaizenCat
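A minimal sketch of pointing s3fs at a MinIO container; the endpoint, key, and secret are hypothetical test values:

    import s3fs

    fs = s3fs.S3FileSystem(
        key="minioadmin",
        secret="minioadmin",
        client_kwargs={"endpoint_url": "http://localhost:9000"},
    )
    fs.ls("")  # lists buckets; fails fast if the endpoint or credentials are wrong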
0 votes · 0 answers · 299 views
I want to create an s3 file system for uploading files to an s3 bucket using pyarrow's write_to_dataset function: fs = s3fs.S3FileSystem() pa.parquet.write_to_dataset(table, root_path=output_folder, ...
asked by amber_coder
-1 votes · 1 answer · 348 views
I have multiple URLs like 'https://static.nseindia.com/s3fs-public/2022-09/ind_prs01092022.pdf' and I want to loop through an array of these and download them to a local folder. I saw that I may need ...
asked by wanderingtrader
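Despite the "s3fs-public" path segment (commonly a Drupal artifact), these are ordinary HTTPS URLs, so no S3 library is needed; a sketch of the download loop:

    import os
    import requests

    urls = [
        "https://static.nseindia.com/s3fs-public/2022-09/ind_prs01092022.pdf",
    ]
    os.makedirs("downloads", exist_ok=True)
    for url in urls:
        name = url.rsplit("/", 1)[-1]
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()  # surface HTTP errors instead of saving them
        with open(os.path.join("downloads", name), "wb") as f:
            f.write(resp.content)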
0 votes · 1 answer · 263 views
I have an S3 bucket where objects are generated from Salesforce on a daily basis. I want to copy those objects from the S3 bucket to a local Linux server. An application will run on that Linux server which ...
asked by jarral rajput
0 votes · 1 answer · 632 views
I'm pushing a dataframe to an s3 bucket using s3fs with the following code: s3fs = s3fs.S3FileSystem(anon=False) with s3fs.open(f"bucket-name/csv-name.csv",'w') as f: my_df.to_csv(f) ...
asked by ire (rep 591)
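One thing worth noting in this snippet: it rebinds the module name s3fs to the instance. A sketch of the same write without the shadowing:

    import s3fs

    fs = s3fs.S3FileSystem(anon=False)  # keep the module name intact
    with fs.open("bucket-name/csv-name.csv", "w") as f:
        my_df.to_csv(f)  # my_df assumed to exist, as in the question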
1 vote · 1 answer · 2k views
I would like to read an S3 directory with multiple parquet files with the same schema. The implemented code works outside the proxy, but the main problem is that when enabling the proxy, I'm facing the ...
asked by HouKaide (rep 363)
2 votes · 1 answer · 839 views
I am opening and using a netCDF file that is located on S3. I have the following code; however, it raises an exception: import s3fs import xarray as xr filepath = "s3://mybucket/myfile.nc" ...
asked by Scott (rep 175)
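A common cause of this exception is that the netCDF4 C engine cannot read Python file-like objects; h5netcdf can, assuming the file is netCDF-4/HDF5. A sketch, not a guaranteed fix:

    import s3fs
    import xarray as xr

    fs = s3fs.S3FileSystem(anon=False)
    with fs.open("s3://mybucket/myfile.nc", "rb") as f:
        ds = xr.open_dataset(f, engine="h5netcdf")  # file-like objects are supported here
        print(ds)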
3 votes · 1 answer · 2k views
I'm trying to simplify access to datasets in various file formats (csv, pickle, feather, partitioned parquet, ...) stored as S3 objects. Since some users I support have different environments with ...
asked by Wassadamo (rep 1,416)
3 votes · 1 answer · 1k views
I want to use s3fs, based on fsspec, to access files on S3, mainly because of 2 neat features: local caching of files to disk, with checking if files change, i.e. a file gets redownloaded if the local ...
asked by Darkdragon84
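A sketch covering both features named here, with hypothetical paths: filecache keeps local copies on disk, and check_files=True re-validates them against S3 so a changed remote file is re-downloaded:

    import fsspec

    fs = fsspec.filesystem(
        "filecache",
        target_protocol="s3",
        target_options={"anon": False},
        cache_storage="/tmp/s3-cache",
        check_files=True,  # compare cached copy against the remote before reuse
    )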
2 votes · 1 answer · 2k views
I am experiencing an issue with the latest pandas release (1.4.2) while reading a csv file from S3. I am using the AWS Lambda Python 3.8 runtime environment, which comes with the following boto3 and botocore ...
asked by Amit Kumar
0 votes · 1 answer · 614 views
I am looking to deploy a Python Flask app on an AWS EC2 (Ubuntu 20.04) instance. The app fetches data from an S3 bucket (in the same region as the EC2 instance) and performs some data processing. I ...
asked by mfcss (rep 1,611)
0 votes · 1 answer · 2k views
I get this error when trying to import s3fs in Python 3.10.2 on Windows: ImportError: cannot import name 'is_valid_ipv6_endpoint_url' from 'botocore.endpoint'. I found this question on GitHub that ...
asked by HuLu ViCa (rep 5,515)
1 vote · 1 answer · 529 views
I'm downloading a file (to be precise, a parquet set of files) from S3 and converting it to a Pandas DataFrame. I'm doing that with the Pandas function read_parquet and s3fs, as described here: df = ...
asked by gsmafra (rep 2,504)
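A sketch of the described call; pandas forwards storage_options to s3fs, so s3fs settings (profile, anon, and so on) can go there. The path and profile name are hypothetical:

    import pandas as pd

    df = pd.read_parquet(
        "s3://my-bucket/path/to/dataset/",
        storage_options={"profile": "my-profile"},
    )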
1 vote · 2 answers · 5k views
I am using the latest versions, s3fs 0.5.2 and fsspec 0.9.0; when importing s3fs, I encountered the following error: File "/User/.conda/envs/py376/lib/python3.7/site-packages/s3fs/__init__.py", ...
asked by xsqian (rep 299)
1 vote · 2 answers · 969 views
I have an S3 URL to a public file, similar to the following example: s3://test-public/new/solution/file.csv (this is not the actual link, just an example close to the one I'm using). I am able to ...
asked by Dror (rep 5,535)
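For a public object, anonymous access skips the credential lookup entirely; a sketch mirroring the example path above:

    import s3fs

    fs = s3fs.S3FileSystem(anon=True)  # no credentials needed for public objects
    with fs.open("test-public/new/solution/file.csv") as f:
        print(f.readline())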
0 votes · 0 answers · 2k views
My use case is that I am trying to write my dataframe to an S3 bucket, for which I installed s3fs==2015.5.0 using pip3. Now when I run the code import s3fs def my_func(): # my logic my_func() it ...
asked by muazfaiz (rep 5,079)
0 votes · 1 answer · 336 views
import contextlib import gzip import s3fs AWS_S3 = s3fs.S3FileSystem(anon=False) # AWS env must be set up correctly source_file_path = "/tmp/your_file.txt" s3_file_path = "my-bucket/...
asked by x89 (rep 3,532)
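A hedged completion of the truncated snippet: stream-compressing the local file straight into an S3 object. The destination key is hypothetical:

    import gzip
    import shutil
    import s3fs

    AWS_S3 = s3fs.S3FileSystem(anon=False)  # AWS env must be set up correctly
    source_file_path = "/tmp/your_file.txt"
    s3_gz_path = "my-bucket/your_file.txt.gz"  # hypothetical destination key

    with open(source_file_path, "rb") as src, AWS_S3.open(s3_gz_path, "wb") as dst:
        with gzip.GzipFile(fileobj=dst, mode="wb") as gz:
            shutil.copyfileobj(src, gz)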
14 votes · 1 answer · 14k views
Yesterday the following cell sequence in Google Colab would work. (I am using colab-env to import environment variables from Google Drive.) This morning, when I run the same code, I get the following ...
asked by Andrew Fogg
2 votes · 1 answer · 1k views
I am new to Glue jobs and followed the steps to configure a whl file as per the link below: Import failure of s3fs library in AWS Glue. I am getting the following error for the AWS Glue Python 3 job ...
asked by Phani (rep 863)
1 vote · 1 answer · 1k views
I have a numpy ndarray with 2 columns that looks like below: [[1.8238497e+03 5.2642276e-06] [2.7092224e+03 6.7980350e-06] [2.3406370e+03 6.6842499e-06] ... [1.7234612e+03 6.6842499e-06] [2....
asked by nad (rep 2,890)
2 votes · 2 answers · 4k views
What I'm trying to do is connect to an s3 bucket from my EC2 machine. This error comes up if I don't set the endpoint_url in s3fs.S3FileSystem(): Traceback (most recent call last): File "/usr/...
asked by Trey Yi (rep 89)
0 votes · 2 answers · 2k views
Using s3fs, I am uploading a file to an already created s3 bucket (not deleting the bucket). On execution, the following error is thrown: [Operation Aborted]: A conflicting conditional operation is ...
asked by Roxy (rep 1,043)
2 votes · 0 answers · 844 views
I'm trying to use Dask to get multiple files (JSON) from AWS S3 into memory in a SageMaker Jupyter Notebook. When I submit 10 or 20 workers, everything runs smoothly. However, when I submit 100 ...
asked by PHinchey
0 votes · 0 answers · 314 views
I'm building a TFX pipeline that takes images as input from an S3 bucket. At the TF Transform component step, I'm attempting to read in a series of images with their URLs stored in TFX's ...
asked by John Sukup
2 votes · 0 answers · 758 views
I am using the following combination of h5py and s3fs to read a couple of small datasets from larger HDF5 files on Amazon S3: s3 = s3fs.S3FileSystem() h5_file = h5py.File(s3.open(s3_path,'rb'), 'r') ...
asked by Emmanuel Wildiers
5 votes · 1 answer · 3k views
I'm testing this locally, where I have a ~/.aws/config file that looks something like: [profile a] ... [profile b] ... I also have an AWS_PROFILE environment variable set to "a" ...
asked by Ray Bell (rep 1,628)
4 votes · 2 answers · 6k views
When I try importing the s3fs library in PySpark using the following code: import s3fs I get the following error: An error was encountered: cannot import name 'maybe_sync' from 'fsspec.asyn' (/usr/...
asked by thentangler (rep 1,264)
4 votes · 1 answer · 6k views
I'm trying to use s3fs in Python to connect to an s3 bucket. The associated credentials are saved in a profile called 'pete' in ~/.aws/credentials: [default] aws_access_key_id=**** ...
asked by Pete M (rep 194)
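Named profiles can be selected directly on the filesystem; a minimal sketch (recent s3fs uses profile, while some older versions used profile_name):

    import s3fs

    fs = s3fs.S3FileSystem(profile="pete")  # reads the [pete] section of ~/.aws/credentials
    fs.ls("my-bucket")  # hypothetical bucket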
0 votes · 0 answers · 896 views
I am trying to write a csv to S3 using S3FileSystem in Python. Every time I write, I create a file with the current date-time ('%Y-%m-%d-%H-%M-%S') within a key of the current date ('%Y-%m-%d'), so ...
asked by nad (rep 2,890)
1 vote · 1 answer · 2k views
I've tried various ways to set the read timeout on an s3fs.S3FileSystem object, such as s3 = s3fs.S3FileSystem(s3_additional_kwargs={"read_timeout": 500}, config_kwargs={"read_timeout ...
asked by Andy (rep 23)
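A sketch of the distinction that usually resolves this: config_kwargs feed the botocore client Config, which owns read_timeout, whereas s3_additional_kwargs are per-request S3 API parameters and do not carry timeouts:

    import s3fs

    s3 = s3fs.S3FileSystem(
        config_kwargs={"read_timeout": 500, "connect_timeout": 60}
    )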
1 vote · 1 answer · 2k views
Writing xarray datasets to AWS S3 takes a surprisingly long time, even when no data is actually written with compute=False. Here's an example: import fsspec import xarray as xr x = xr....
asked by Val (rep 7,093)
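A sketch of the pattern in question, with a hypothetical bucket: even with compute=False, to_zarr writes all store metadata up front (one small PUT per array and group), which is typically where the time goes; only the chunk data is deferred:

    import fsspec
    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"x": ("t", np.arange(1000))}).chunk({"t": 100})
    store = fsspec.get_mapper("s3://my-bucket/example.zarr")  # hypothetical bucket
    delayed = ds.to_zarr(store, mode="w", compute=False)  # metadata written here
    delayed.compute()                                     # chunk data written here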