90 questions
0
votes
0
answers
51
views
Combining two .nc files with different dimensions using Icechunk, Virtualizarr, and Xarray
My overall goal is to set up a virtual dataset of ERA5 data using Icechunk. As a smaller test example, I'm trying to pull all the data located in the 194001 ERA5 folder. I've been mostly able to ...
0
votes
0
answers
39
views
pyiceberg-s3fs: can't set custom config_kwargs
When creating an instance of S3FileSystem class, you can provide the config_kwargs dictionary to set further properties (like region or signature_version).
The pyiceberg FileIO implementation is based ...
0
votes
0
answers
41
views
NotImplementedError when testing S3 connection during __init__ in s3fs.S3FileSystem subclass
I'm creating a subclass of s3fs.S3FileSystem that connects to either AWS S3 or MinIO based on environment variables. The connection should fail immediately during initialization if credentials are ...
0
votes
1
answer
178
views
How to configure jupyter-fs to connect to MinIO with self-signed SSL certificates?
I am using jupyter-fs to connect JupyterHub's notebooks with a MinIO instance deployed in another namespace on Kubernetes. The MinIO endpoint is configured with HTTPS and uses a self-signed ...
0
votes
1
answer
32
views
Identify changed directories in Object Storage since a specific datetime with Python
Given an S3 object store, I want to know which directories in a base directory have changed since a given datetime.
It would work similar to get_changed_directories:
bucket_directory = "...
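One way to approach this, sketched below: ask s3fs for a detailed listing (`fs.find(..., detail=True)` returns per-key info dicts including `LastModified`) and reduce it to the set of top-level directories with any object newer than the cutoff. The function name and its dict-based input shape are illustrative, not from the question.

```python
from datetime import datetime, timezone

def changed_directories(objects, base, since):
    """Return top-level directories under `base` that contain any object
    modified after `since`. `objects` maps key -> last-modified datetime,
    e.g. built from s3fs's fs.find(base, detail=True) info dicts."""
    changed = set()
    prefix = base.rstrip("/") + "/"
    for key, mtime in objects.items():
        if mtime > since and key.startswith(prefix):
            rest = key[len(prefix):]
            if "/" in rest:
                changed.add(rest.split("/", 1)[0])
    return changed

# real usage (requires s3fs and AWS credentials), roughly:
# import s3fs
# fs = s3fs.S3FileSystem()
# info = fs.find("my-bucket/base", detail=True)
# objs = {k: v["LastModified"] for k, v in info.items()}
# changed_directories(objs, "my-bucket/base", some_datetime)
```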
0
votes
1
answer
387
views
Using fsspec with aws profile_name
With S3fs we can set
fs = s3fs.S3FileSystem(profile=profile_name)
However this passing doesn't work for fsspec with caching:
fs = fsspec.filesystem(
"filecache",
...
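The likely wrinkle here is that `filecache` wraps an inner filesystem, so keyword arguments meant for s3fs have to travel through `target_options` rather than sit at the top level. A minimal sketch (the helper function is mine, not part of fsspec):

```python
def filecache_s3_options(profile_name, cache_dir="/tmp/s3cache"):
    # filecache wraps another filesystem; kwargs for the inner s3fs
    # filesystem go through `target_options`, not the top level
    return dict(
        target_protocol="s3",
        target_options={"profile": profile_name},
        cache_storage=cache_dir,
    )

# real usage (requires fsspec and s3fs installed, plus AWS credentials):
# import fsspec
# fs = fsspec.filesystem("filecache", **filecache_s3_options("my-profile"))
```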
0
votes
1
answer
195
views
Question about s3fs-fuse synchronization
I installed s3fs-fuse on a CentOS server and I have a question about synchronization.
The current situation is that Object Storage is synchronized with the CentOS server.
I'm asking whether every time a ...
0
votes
1
answer
369
views
How to use Python's S3FS to delete a file that contains dashes and brackets in the filename?
I have a file named data_[2022-10-03:2022-10-23].csv.gzip in S3, inside a bucket and folder named s3://<bucket_name>/data/cache/
I am attempting to delete this file using S3FS. When I attempt to ...
2
votes
0
answers
250
views
Making s3fs work alongside ProcessPoolExecutor
I've been struggling to make s3fs and ProcessPoolExecutor work together. Essentially, the issue is that s3fs, by default, holds some session information for connections. So, that doesn't play well ...
1
vote
1
answer
2k
views
Disable ssl validation while connecting to s3 using pyarrow fs library/ s3fs library
I am using the pyarrow fs.S3FileSystem library to write a csv to an s3 bucket. Although this code runs fine locally, when I deploy to a VM (Linux) it throws an error:
OSError: When listing objects under key xx ...
0
votes
1
answer
482
views
How to copy an S3 object from one bucket to another using python s3fs
Using python s3fs, how do you copy an object from one s3 bucket to another? I have found answers using boto3, but could not find anything when looking through the s3fs docs.
0
votes
1
answer
2k
views
ImportError: Install s3fs to access S3 on amazon EMR 6.3.0
I have the following error on my notebook after setting up an EMR 6.3.0 cluster:
An error was encountered:
Install s3fs to access S3
Traceback (most recent call last):
File "/usr/local/lib64/python3.7/...
0
votes
1
answer
2k
views
How to read a file from s3 using s3fs
I have the following method in Python:
def read_file(self, bucket, table_name, file_name, format="csv"):
data = None
read_from_path = f"s3://{bucket}/{table_name}/{file_name}&...
-1
votes
1
answer
840
views
s3fs.put into empty and non-empty S3 folder
I am copying folder to S3 with s3fs.put(..., recursive=True) and I experience weird behavior. The code is:
import s3fs
source_path = 'foo/bar' # there are some <files and ...
2
votes
2
answers
2k
views
How to access my own fake bucket with S3FileSystem, Pytest and Moto
I'm trying to implement Unit Tests using Pytest, Moto (4.1.6) and s3fs (0.4.2) for my functions that interact with S3.
So far I am able to create a bucket and populate it with all the files that live ...
14
votes
4
answers
10k
views
Resolving dependencies fails on boto3 and s3fs using poetry
I can install boto3, s3fs and pandas using :
pip install boto3 pandas s3fs
But it fails with poetry :
poetry add boto3 pandas s3fs
Here is the error :
Because no versions of s3fs match >2023.3.0,&...
0
votes
2
answers
2k
views
s3fs FileNotFoundError
I am only able to gain limited/top-level access to my aws s3. I can see the buckets, but not their contents; neither subfolders nor files. I'm running everything from inside a conda environment. I've ...
1
vote
0
answers
345
views
S3FS framework with SSO credentials
I'm exploring the S3FS framework which I need for reading/writing from/to the S3 file system.
From what I can see in docs, we can pass the AWS credentials explicitly, but I don't see any information ...
1
vote
0
answers
913
views
Using the `s3fs` python library with Task IAM role credentials on AWS Batch
I'm trying to get an ML job to run on AWS Batch. The job runs in a docker container, using credentials generated for a Task IAM Role.
I use DVC to manage the large data files needed for the task, ...
1
vote
1
answer
2k
views
How to connect python s3fs client to a running Minio docker container?
For test purposes, I'm trying to connect a module that introduces an abstraction layer over s3fs with custom business logic.
It seems like I have trouble connecting the s3fs client to the Minio container....
0
votes
0
answers
299
views
Want to create S3 Filesystem in python with stable library
I want to create an s3 filesystem for uploading files to an s3 bucket using pyarrow's write_to_dataset function
fs = s3fs.S3FileSystem()
pa.parquet.write_to_dataset(table, root_path=output_folder, ...
-1
votes
1
answer
348
views
How do I download a linked pdf file from its url using python?
I have multiple URL's like 'https://static.nseindia.com/s3fs-public/2022-09/ind_prs01092022.pdf' and I want to loop through an array of these and download them to a local folder.
I saw that I may need ...
0
votes
1
answer
263
views
Copy only new objects from S3 to on-premise server
I have a S3 bucket where objects are generated from salesforce on daily basis. I want to copy those objects from S3 bucket to a local Linux server. An application will run on that Linux server which ...
0
votes
1
answer
632
views
Trouble with formatting when writing a csv to s3 with s3fs
I'm pushing a dataframe to an s3 bucket using s3fs with the following code:
s3fs = s3fs.S3FileSystem(anon=False)
with s3fs.open(f"bucket-name/csv-name.csv",'w') as f:
my_df.to_csv(f)
...
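One way to sidestep such formatting surprises, sketched below: render the CSV fully in memory first, then upload the bytes in a single write. The stdlib `csv` module stands in for the question's `to_csv` call purely for illustration; note also that the question's code rebinds the module name `s3fs` to the filesystem instance, which is worth avoiding.

```python
import csv
import io

def rows_to_csv_bytes(rows):
    # render the CSV fully in memory, then return it as UTF-8 bytes;
    # a single binary write avoids streaming text-mode quirks
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode("utf-8")

# upload (requires s3fs and AWS credentials):
# import s3fs
# fs = s3fs.S3FileSystem(anon=False)  # keep the module name `s3fs` intact
# with fs.open("bucket-name/csv-name.csv", "wb") as f:
#     f.write(rows_to_csv_bytes([["a", "b"], [1, 2]]))
```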
1
vote
1
answer
2k
views
Read Parquet files with Pandas from S3 bucket directory with Proxy
I would like to read an S3 directory containing multiple parquet files with the same schema.
The implemented code works outside the proxy, but the main problem is when enabling the proxy, I'm facing the ...
2
votes
1
answer
839
views
xarray I/O operation on closed file
I am opening and using a netcdf file located on s3. I have the following code, but it raises an exception.
import s3fs
import xarray as xr
filepath = "s3://mybucket/myfile.nc"
...
3
votes
1
answer
2k
views
Read timeout in pd.read_parquet from S3, and understanding configs
I'm trying to simplify access to datasets in various file formats (csv, pickle, feather, partitioned parquet, ...) stored as S3 objects. Since some users I support have different environments with ...
3
votes
1
answer
1k
views
s3fs local filecache of versioned files
I want to use s3fs based on fsspec to access files on S3. Mainly because of 2 neat features:
local caching of files to disk with checking if files change, i.e. a file gets redownloaded if the local ...
2
votes
1
answer
2k
views
Pandas 1.4.2 gives errors for installing s3fs while reading csv from S3 bucket
I am experiencing an issue with the latest pandas release, 1.4.2, while reading a csv file from S3.
I am using the AWS Lambda python runtime with python 3.8, which comes with the following boto3 and botocore ...
0
votes
1
answer
614
views
Can I use s3fs to perform "free data transfer" between AWS EC2 and S3?
I am looking to deploy a Python Flask app on an AWS EC2 (Ubuntu 20.04) instance. The app fetches data from an S3 bucket (in the same region as the EC2 instance) and performs some data processing.
I ...
0
votes
1
answer
2k
views
s3fs library unable to be imported in python
I get this error when trying to import s3fs in Python 3.10.2 in Windows:
ImportError: cannot import name 'is_valid_ipv6_endpoint_url' from 'botocore.endpoint'
I found this question in Github that ...
1
vote
1
answer
529
views
S3 to Pandas with local variable authentication
I'm downloading a file (to be precise a parquet set of files) from S3 and converting that to a Pandas DataFrame. I'm doing that with the Pandas function read_parquet and s3fs, as described here:
df = ...
1
vote
2
answers
5k
views
What is the working combination of the s3fs and fsspec version? ImportError: cannot import name 'maybe_sync' from 'fsspec.asyn'
I am using the latest versions, s3fs-0.5.2 and fsspec-0.9.0; when importing s3fs, I encountered the following error:
File "/User/.conda/envs/py376/lib/python3.7/site-packages/s3fs/__init__.py", ...
1
vote
2
answers
969
views
Snowflake is not able to download file from S3 without access key, while s3fs is able to download that file from S3
I have an S3 URL to a public file similar to the following example: s3://test-public/new/solution/file.csv (this is not the actual link, just an example close to the one I'm using)
I am able to ...
0
votes
0
answers
2k
views
Python3: ImportError: cannot import name 'InvalidProxiesConfigError' from 'botocore.httpsession'
My use case is that I am trying to write my dataframe to S3 bucket for which I installed s3fs==2015.5.0 using pip3. Now when I run the code
import s3fs
def my_func():
# my logic
my_func()
It ...
0
votes
1
answer
336
views
use boto for gzipping files instead of s3fs
import contextlib
import gzip
import s3fs
AWS_S3 = s3fs.S3FileSystem(anon=False) # AWS env must be set up correctly
source_file_path = "/tmp/your_file.txt"
s3_file_path = "my-bucket/...
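A sketch of the boto-based alternative the title asks about: compress in memory with the stdlib `gzip` module, then upload the bytes with boto3's `put_object`. The bucket name, key, and helper function are illustrative; the boto3 call is shown commented since it needs credentials.

```python
import gzip
import io

def gzip_bytes(data):
    # compress in memory; mtime=0 keeps the output byte-for-byte
    # deterministic across runs
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()

# upload with boto3 instead of s3fs (requires boto3 and AWS credentials):
# import boto3
# with open("/tmp/your_file.txt", "rb") as f:
#     body = gzip_bytes(f.read())
# boto3.client("s3").put_object(
#     Bucket="my-bucket", Key="your_file.txt.gz",
#     Body=body, ContentEncoding="gzip",
# )
```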
14
votes
1
answer
14k
views
s3fs suddenly stopped working in Google Colab with error "AttributeError: module 'aiobotocore' has no attribute 'AioSession'" [closed]
Yesterday the following cell sequence in Google Colab would work.
(I am using colab-env to import environment variables from Google Drive.)
This morning, when I run the same code, I get the following ...
2
votes
1
answer
1k
views
glue python shell job failure to import s3fs
I am new to Glue jobs and followed the steps to configure the whl file per the link below
Import failure of s3fs library in AWS Glue
I am getting the following error for the AWS Glue Python - 3 job
...
1
vote
1
answer
1k
views
How to write a numpy array as a csv to S3
I have a numpy ndarray with 2 columns that looks like below
[[1.8238497e+03 5.2642276e-06]
[2.7092224e+03 6.7980350e-06]
[2.3406370e+03 6.6842499e-06]
...
[1.7234612e+03 6.6842499e-06]
[2....
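A sketch of one way to do this: format the rows as CSV text in memory, then write the bytes through an s3fs file handle. The stdlib `csv` module is used here so the sketch is self-contained; `numpy.savetxt(..., delimiter=",")` into the same buffer would do the same job for an ndarray.

```python
import csv
import io

def array_to_csv_bytes(rows, fmt="{:.7e}"):
    # format each numeric value in scientific notation, matching the
    # question's array, and join rows into CSV bytes
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(fmt.format(v) for v in row)
    return buf.getvalue().encode("utf-8")

# upload (requires s3fs and AWS credentials):
# import s3fs
# fs = s3fs.S3FileSystem()
# with fs.open("my-bucket/array.csv", "wb") as f:
#     f.write(array_to_csv_bytes([[1823.8497, 5.2642276e-06]]))
```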
2
votes
2
answers
4k
views
Does s3fs.S3FileSystem() always need a specific region setting?
What I'm trying to do is connect to an s3 bucket from my EC2 machine.
This error comes up if I don't set the endpoint_url in s3fs.S3FileSystem().
Traceback (most recent call last):
File "/usr/...
0
votes
2
answers
2k
views
A conflicting conditional operation is currently in progress against this resource. (bucket already created)
Using s3fs, I am uploading a file to an already created s3 bucket (not deleting the bucket). On execution, the following error is thrown:
[Operation Aborted]: A conflicting conditional operation is ...
2
votes
0
answers
844
views
AWS Sagemaker notebook intermittent 'Unable to locate credentials'
I'm trying to use Dask to get multiple files (JSON) from AWS S3 into memory in a Sagemaker Jupyter Notebook.
When I submit 10 or 20 workers, everything runs smoothly. However, when I submit 100 ...
0
votes
0
answers
314
views
Accessing S3 bucket object in TFX pipeline with S3FS
I'm building a TFX pipeline that contains images as input from an S3 bucket. At the TF Transform component step, I'm attempting to read in a series of images with their URLs stored in TFX's ...
2
votes
0
answers
758
views
h5py slow when reading through an s3fs file object
I am using the following combination of h5py and s3fs to read a couple of small datasets from larger HDF5 files on Amazon S3.
s3 = s3fs.S3FileSystem()
h5_file = h5py.File(s3.open(s3_path,'rb'), 'r')
...
5
votes
1
answer
3k
views
use AWS_PROFILE in pandas.read_parquet
I'm testing this locally where I have a ~/.aws/config file.
~/.aws/config looks some thing like:
[profile a]
...
[profile b]
...
I also have a AWS_PROFILE environmental variable set as "a"....
4
votes
2
answers
6k
views
cannot import s3fs in pyspark
When I try importing the s3fs library in pyspark using the following code:
import s3fs
I get the following error:
An error was encountered: cannot import name 'maybe_sync' from
'fsspec.asyn' (/usr/...
4
votes
1
answer
6k
views
Profile argument in python s3fs
I'm trying to use s3fs in python to connect to an s3 bucket. The associated credentials are saved in a profile called 'pete' in ~/.aws/credentials:
[default]
aws_access_key_id=****
...
0
votes
0
answers
896
views
Writing to an S3 key based on current date
I am trying to write a csv in S3 using S3FileSystem in python. Every time I write, I create a file with the current date-time ('%Y-%m-%d-%H-%M-%S') within a key of the current date ('%Y-%m-%d') so ...
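The key-naming scheme described above can be sketched as a small pure function, which keeps the timestamp logic testable separately from the upload; the function name and bucket path are illustrative.

```python
from datetime import datetime

def dated_key(prefix, now=None):
    # one object per write, named by timestamp, under a key for the day:
    # <prefix>/<%Y-%m-%d>/<%Y-%m-%d-%H-%M-%S>.csv
    now = now or datetime.utcnow()
    return f"{prefix}/{now:%Y-%m-%d}/{now:%Y-%m-%d-%H-%M-%S}.csv"

# write (requires s3fs and AWS credentials):
# import s3fs
# fs = s3fs.S3FileSystem()
# with fs.open(dated_key("my-bucket/data"), "w") as f:
#     f.write("col1,col2\n1,2\n")
```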
1
vote
1
answer
2k
views
What is the correct way to set timeouts in s3fs.S3FileSystem?
I've tried various ways to set the read timeout on an s3fs.S3FileSystem object, such as
s3 = s3fs.S3FileSystem(s3_additional_kwargs={"read_timeout": 500}, config_kwargs={"read_timeout&...
1
vote
1
answer
2k
views
Zarr: improve xarray writing performance to S3
Writing xarray datasets to AWS S3 takes surprisingly long, even when no data is actually written with compute=False.
Here's an example:
import fsspec
import xarray as xr
x = xr....