90 questions
0
votes
0
answers
51
views
Combining two .nc files with different dimensions using Icechunk, Virtualizarr, and Xarray
My overall goal is to set up a virtual dataset of ERA5 data using Icechunk. As a smaller test example, I'm trying to pull all the data located in the 194001 ERA5 folder. I've been mostly able to ...
0
votes
0
answers
39
views
pyiceberg-s3fs: can't set custom config_kwargs
When creating an instance of S3FileSystem class, you can provide the config_kwargs dictionary to set further properties (like region or signature_version).
The pyiceberg FileIO implementation is based ...
0
votes
0
answers
41
views
NotImplementedError when testing S3 connection during __init__ in s3fs.S3FileSystem subclass
I'm creating a subclass of s3fs.S3FileSystem that connects to either AWS S3 or MinIO based on environment variables. The connection should fail immediately during initialization if credentials are ...
0
votes
1
answer
178
views
How to configure jupyter-fs to connect to MinIO with self-signed SSL certificates?
I am using jupyter-fs to connect JupyterHub's notebooks with a MinIO instance deployed in another namespace on Kubernetes. The MinIO endpoint is configured with HTTPS and uses a self-signed ...
0
votes
1
answer
32
views
Identify changed directories in Object Storage since a specific datetime with Python
Given an S3 object store, I want to know which directories in a base directory have changed since a given datetime.
It would work similar to get_changed_directories:
bucket_directory = "...
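One way to approach this, sketched below: ask s3fs for a detailed listing (`fs.find(..., detail=True)` returns per-key info dicts including `LastModified`) and reduce it to the set of top-level directories with any object newer than the cutoff. The function name and its dict-based input shape are illustrative, not from the question.

```python
from datetime import datetime, timezone

def changed_directories(objects, base, since):
    """Return top-level directories under `base` that contain any object
    modified after `since`. `objects` maps key -> last-modified datetime,
    e.g. built from s3fs's fs.find(base, detail=True) info dicts."""
    changed = set()
    prefix = base.rstrip("/") + "/"
    for key, mtime in objects.items():
        if mtime > since and key.startswith(prefix):
            rest = key[len(prefix):]
            if "/" in rest:
                changed.add(rest.split("/", 1)[0])
    return changed

# real usage (requires s3fs and AWS credentials), roughly:
# import s3fs
# fs = s3fs.S3FileSystem()
# info = fs.find("my-bucket/base", detail=True)
# objs = {k: v["LastModified"] for k, v in info.items()}
# changed_directories(objs, "my-bucket/base", some_datetime)
```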
0
votes
1
answer
387
views
Using fsspec with aws profile_name
With S3fs we can set
fs = s3fs.S3FileSystem(profile=profile_name)
However this passing doesn't work for fsspec with caching:
fs = fsspec.filesystem(
"filecache",
...
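The likely wrinkle here is that `filecache` wraps an inner filesystem, so keyword arguments meant for s3fs have to travel through `target_options` rather than sit at the top level. A minimal sketch (the helper function is mine, not part of fsspec):

```python
def filecache_s3_options(profile_name, cache_dir="/tmp/s3cache"):
    # filecache wraps another filesystem; kwargs for the inner s3fs
    # filesystem go through `target_options`, not the top level
    return dict(
        target_protocol="s3",
        target_options={"profile": profile_name},
        cache_storage=cache_dir,
    )

# real usage (requires fsspec and s3fs installed, plus AWS credentials):
# import fsspec
# fs = fsspec.filesystem("filecache", **filecache_s3_options("my-profile"))
```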
0
votes
1
answer
195
views
Question about s3fs-fuse synchronization
I installed s3fs-fuse on a CentOS server and I have a question about synchronization.
The current situation is that Object Storage is synchronized with the CentOS server.
I'm asking whether every time a ...
0
votes
1
answer
369
views
How to use Python's S3FS to delete a file that contains dashes and brackets in the filename?
I have a file named data_[2022-10-03:2022-10-23].csv.gzip in S3, inside a bucket and folder named s3://<bucket_name>/data/cache/
I am attempting to delete this file using S3FS. When I attempt to ...
2
votes
0
answers
250
views
Making s3fs work alongside ProcessPoolExecutor
I've been struggling to make s3fs and ProcessPoolExecutor work together. Essentially, the issue is that s3fs, by default, holds some session information for connections. So, that doesn't play well ...
1
vote
1
answer
2k
views
Disable ssl validation while connecting to s3 using pyarrow fs library/ s3fs library
I am using the pyarrow fs.S3FileSystem library to write a csv to an s3 bucket. Although this code runs fine locally, when I deploy to a VM (Linux) it throws an error:
OSError: When listing objects under key xx ...
0
votes
1
answer
482
views
How to copy an S3 object from one bucket to another using python s3fs
Using python s3fs, how do you copy an object from one s3 bucket to another? I have found answers using boto3, but could not find anything when looking through the s3fs docs.
0
votes
1
answer
2k
views
ImportError: Install s3fs to access S3 on amazon EMR 6.3.0
I have the following error on my notebook after setting up an EMR 6.3.0 cluster:
An error was encountered:
Install s3fs to access S3
Traceback (most recent call last):
File "/usr/local/lib64/python3.7/...
0
votes
1
answer
2k
views
How to read a file from s3 using s3fs
I have the following method in Python:
def read_file(self, bucket, table_name, file_name, format="csv"):
data = None
read_from_path = f"s3://{bucket}/{table_name}/{file_name}&...
-1
votes
1
answer
840
views
s3fs.put into empty and non-empty S3 folder
I am copying folder to S3 with s3fs.put(..., recursive=True) and I experience weird behavior. The code is:
import s3fs
source_path = 'foo/bar' # there are some <files and ...
2
votes
2
answers
2k
views
How to access my own fake bucket with S3FileSystem, Pytest and Moto
I'm trying to implement Unit Tests using Pytest, Moto (4.1.6) and s3fs (0.4.2) for my functions that interact with S3.
So far I am able to create a bucket and populate it with all the files that live ...
14
votes
4
answers
10k
views
Resolving dependencies fails on boto3 and s3fs using poetry
I can install boto3, s3fs and pandas using :
pip install boto3 pandas s3fs
But it fails with poetry :
poetry add boto3 pandas s3fs
Here is the error :
Because no versions of s3fs match >2023.3.0,&...
0
votes
2
answers
2k
views
s3fs FileNotFoundError
I am only able to gain limited/top-level access to my aws s3. I can see the buckets, but not their contents; neither subfolders nor files. I'm running everything from inside a conda environment. I've ...
1
vote
0
answers
345
views
S3FS framework with SSO credentials
I'm exploring the S3FS framework which I need for reading/writing from/to the S3 file system.
From what I can see in docs, we can pass the AWS credentials explicitly, but I don't see any information ...
1
vote
0
answers
913
views
Using the `s3fs` python library with Task IAM role credentials on AWS Batch
I'm trying to get an ML job to run on AWS Batch. The job runs in a docker container, using credentials generated for a Task IAM Role.
I use DVC to manage the large data files needed for the task, ...
1
vote
1
answer
2k
views
How to connect python s3fs client to a running Minio docker container?
For test purposes, I'm trying to connect a module that introduces an abstraction layer over s3fs with custom business logic.
It seems like I have trouble connecting the s3fs client to the Minio container....
0
votes
0
answers
299
views
Want to create S3 Filesystem in python with stable library
I want to create an s3 filesystem for uploading files to an s3 bucket using pyarrow's write_to_dataset function
fs = s3fs.S3FileSystem()
pa.parquet.write_to_dataset(table, root_path=output_folder, ...
-1
votes
1
answer
348
views
How do I download a linked pdf file from its url using python?
I have multiple URL's like 'https://static.nseindia.com/s3fs-public/2022-09/ind_prs01092022.pdf' and I want to loop through an array of these and download them to a local folder.
I saw that I may need ...
0
votes
1
answer
263
views
Copy only new objects from S3 to on-premise server
I have a S3 bucket where objects are generated from salesforce on daily basis. I want to copy those objects from S3 bucket to a local Linux server. An application will run on that Linux server which ...
0
votes
1
answer
632
views
Trouble with formatting when writing a csv to s3 with s3fs
I'm pushing a dataframe to an s3 bucket using s3fs with the following code:
s3fs = s3fs.S3FileSystem(anon=False)
with s3fs.open(f"bucket-name/csv-name.csv",'w') as f:
my_df.to_csv(f)
...
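One way to sidestep such formatting surprises, sketched below: render the CSV fully in memory first, then upload the bytes in a single write. The stdlib `csv` module stands in for the question's `to_csv` call purely for illustration; note also that the question's code rebinds the module name `s3fs` to the filesystem instance, which is worth avoiding.

```python
import csv
import io

def rows_to_csv_bytes(rows):
    # render the CSV fully in memory, then return it as UTF-8 bytes;
    # a single binary write avoids streaming text-mode quirks
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode("utf-8")

# upload (requires s3fs and AWS credentials):
# import s3fs
# fs = s3fs.S3FileSystem(anon=False)  # keep the module name `s3fs` intact
# with fs.open("bucket-name/csv-name.csv", "wb") as f:
#     f.write(rows_to_csv_bytes([["a", "b"], [1, 2]]))
```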
1
vote
1
answer
2k
views
Read Parquet files with Pandas from S3 bucket directory with Proxy
I would like to read an S3 directory containing multiple parquet files with the same schema.
The implemented code works outside the proxy, but the main problem is when enabling the proxy, I'm facing the ...
2
votes
1
answer
839
views
xarray I/O operation on closed file
I am opening and using a netcdf file located on s3. I have the following code, but it raises an exception.
import s3fs
import xarray as xr
filepath = "s3://mybucket/myfile.nc"
...
3
votes
1
answer
2k
views
Read timeout in pd.read_parquet from S3, and understanding configs
I'm trying to simplify access to datasets in various file formats (csv, pickle, feather, partitioned parquet, ...) stored as S3 objects. Since some users I support have different environments with ...
3
votes
1
answer
1k
views
s3fs local filecache of versioned files
I want to use s3fs based on fsspec to access files on S3. Mainly because of 2 neat features:
local caching of files to disk with checking if files change, i.e. a file gets redownloaded if the local ...
2
votes
1
answer
2k
views
Pandas 1.4.2 gives errors for installing s3fs while reading csv from S3 bucket
I am experiencing an issue with the latest pandas release, 1.4.2, while reading a csv file from S3.
I am using the AWS Lambda python runtime with python 3.8, which comes with the following boto3 and botocore ...
0
votes
1
answer
614
views
Can I use s3fs to perform "free data transfer" between AWS EC2 and S3?
I am looking to deploy a Python Flask app on an AWS EC2 (Ubuntu 20.04) instance. The app fetches data from an S3 bucket (in the same region as the EC2 instance) and performs some data processing.
I ...
0
votes
1
answer
2k
views
s3fs library unable to be imported in python
I get this error when trying to import s3fs in Python 3.10.2 in Windows:
ImportError: cannot import name 'is_valid_ipv6_endpoint_url' from 'botocore.endpoint'
I found this question in Github that ...
1
vote
1
answer
529
views
S3 to Pandas with local variable authentication
I'm downloading a file (to be precise a parquet set of files) from S3 and converting that to a Pandas DataFrame. I'm doing that with the Pandas function read_parquet and s3fs, as described here:
df = ...
1
vote
2
answers
5k
views
What is the working combination of the s3fs and fsspec version? ImportError: cannot import name 'maybe_sync' from 'fsspec.asyn'
I am using the latest versions, s3fs-0.5.2 and fsspec-0.9.0; when importing s3fs, I encountered the following error:
File "/User/.conda/envs/py376/lib/python3.7/site-packages/s3fs/__init__.py", ...
1
vote
2
answers
969
views
Snowflake is not able to download file from S3 without access key, while s3fs is able to download that file from S3
I have an S3 URL to a public file similar to the following example: s3://test-public/new/solution/file.csv (this is not the actual link, just an example close to the one I'm using)
I am able to ...
0
votes
0
answers
2k
views
Python3: ImportError: cannot import name 'InvalidProxiesConfigError' from 'botocore.httpsession'
My use case is that I am trying to write my dataframe to S3 bucket for which I installed s3fs==2015.5.0 using pip3. Now when I run the code
import s3fs
def my_func():
# my logic
my_func()
It ...
0
votes
1
answer
336
views
use boto for gzipping files instead of s3fs
import contextlib
import gzip
import s3fs
AWS_S3 = s3fs.S3FileSystem(anon=False) # AWS env must be set up correctly
source_file_path = "/tmp/your_file.txt"
s3_file_path = "my-bucket/...
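A sketch of the boto-based alternative the title asks about: compress in memory with the stdlib `gzip` module, then upload the bytes with boto3's `put_object`. The bucket name, key, and helper function are illustrative; the boto3 call is shown commented since it needs credentials.

```python
import gzip
import io

def gzip_bytes(data):
    # compress in memory; mtime=0 keeps the output byte-for-byte
    # deterministic across runs
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()

# upload with boto3 instead of s3fs (requires boto3 and AWS credentials):
# import boto3
# with open("/tmp/your_file.txt", "rb") as f:
#     body = gzip_bytes(f.read())
# boto3.client("s3").put_object(
#     Bucket="my-bucket", Key="your_file.txt.gz",
#     Body=body, ContentEncoding="gzip",
# )
```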
14
votes
1
answer
14k
views
s3fs suddenly stopped working in Google Colab with error "AttributeError: module 'aiobotocore' has no attribute 'AioSession'" [closed]
Yesterday the following cell sequence in Google Colab would work.
(I am using colab-env to import environment variables from Google Drive.)
This morning, when I run the same code, I get the following ...
2
votes
1
answer
1k
views
glue python shell job failure to import s3fs
I am new to Glue jobs and followed the steps to configure the whl file per the link below
Import failure of s3fs library in AWS Glue
I am getting the following error for the AWS Glue Python - 3 job
...
1
vote
1
answer
1k
views
How to write a numpy array as a csv to S3
I have a numpy ndarray with 2 columns that looks like below
[[1.8238497e+03 5.2642276e-06]
[2.7092224e+03 6.7980350e-06]
[2.3406370e+03 6.6842499e-06]
...
[1.7234612e+03 6.6842499e-06]
[2....
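A sketch of one way to do this: format the rows as CSV text in memory, then write the bytes through an s3fs file handle. The stdlib `csv` module is used here so the sketch is self-contained; `numpy.savetxt(..., delimiter=",")` into the same buffer would do the same job for an ndarray.

```python
import csv
import io

def array_to_csv_bytes(rows, fmt="{:.7e}"):
    # format each numeric value in scientific notation, matching the
    # question's array, and join rows into CSV bytes
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(fmt.format(v) for v in row)
    return buf.getvalue().encode("utf-8")

# upload (requires s3fs and AWS credentials):
# import s3fs
# fs = s3fs.S3FileSystem()
# with fs.open("my-bucket/array.csv", "wb") as f:
#     f.write(array_to_csv_bytes([[1823.8497, 5.2642276e-06]]))
```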
2
votes
2
answers
4k
views
Does s3fs.S3FileSystem() always need a specific region setting?
What I'm trying to do is connect to an s3 bucket from my EC2 machine.
This error comes up if I don't set the endpoint_url in s3fs.S3FileSystem().
Traceback (most recent call last):
File "/usr/...
0
votes
2
answers
2k
views
A conflicting conditional operation is currently in progress against this resource. (bucket already created)
Using s3fs, I am uploading a file to an already created s3 bucket (not deleting the bucket). On execution, the following error is thrown:
[Operation Aborted]: A conflicting conditional operation is ...
2
votes
0
answers
844
views
AWS Sagemaker notebook intermittent 'Unable to locate credentials'
I'm trying to use Dask to get multiple files (JSON) from AWS S3 into memory in a Sagemaker Jupyter Notebook.
When I submit 10 or 20 workers, everything runs smoothly. However, when I submit 100 ...
0
votes
0
answers
314
views
Accessing S3 bucket object in TFX pipeline with S3FS
I'm building a TFX pipeline that contains images as input from an S3 bucket. At the TF Transform component step, I'm attempting to read in a series of images with their URLs stored in TFX's ...
2
votes
0
answers
758
views
h5py slow when reading through an s3fs file object
I am using the following combination of h5py and s3fs to read a couple of small datasets from larger HDF5 files on Amazon S3.
s3 = s3fs.S3FileSystem()
h5_file = h5py.File(s3.open(s3_path,'rb'), 'r')
...
5
votes
1
answer
3k
views
use AWS_PROFILE in pandas.read_parquet
I'm testing this locally where I have a ~/.aws/config file.
~/.aws/config looks some thing like:
[profile a]
...
[profile b]
...
I also have a AWS_PROFILE environmental variable set as "a"....
4
votes
2
answers
6k
views
cannot import s3fs in pyspark
When I try importing the s3fs library in pyspark using the following code:
import s3fs
I get the following error:
An error was encountered: cannot import name 'maybe_sync' from
'fsspec.asyn' (/usr/...
4
votes
1
answer
6k
views
Profile argument in python s3fs
I'm trying to use s3fs in python to connect to an s3 bucket. The associated credentials are saved in a profile called 'pete' in ~/.aws/credentials:
[default]
aws_access_key_id=****
...
0
votes
0
answers
896
views
Writing to an S3 key based on current date
I am trying to write a csv in S3 using S3FileSystem in python. Every time I write, I create a file with the current date-time ('%Y-%m-%d-%H-%M-%S') within a key of the current date ('%Y-%m-%d') so ...
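The key-naming scheme described above can be sketched as a small pure function, which keeps the timestamp logic testable separately from the upload; the function name and bucket path are illustrative.

```python
from datetime import datetime

def dated_key(prefix, now=None):
    # one object per write, named by timestamp, under a key for the day:
    # <prefix>/<%Y-%m-%d>/<%Y-%m-%d-%H-%M-%S>.csv
    now = now or datetime.utcnow()
    return f"{prefix}/{now:%Y-%m-%d}/{now:%Y-%m-%d-%H-%M-%S}.csv"

# write (requires s3fs and AWS credentials):
# import s3fs
# fs = s3fs.S3FileSystem()
# with fs.open(dated_key("my-bucket/data"), "w") as f:
#     f.write("col1,col2\n1,2\n")
```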
1
vote
1
answer
2k
views
What is the correct way to set timeouts in s3fs.S3FileSystem?
I've tried various ways to set the read timeout on an s3fs.S3FileSystem object, such as
s3 = s3fs.S3FileSystem(s3_additional_kwargs={"read_timeout": 500}, config_kwargs={"read_timeout&...
1
vote
1
answer
2k
views
Zarr: improve xarray writing performance to S3
Writing xarray datasets to AWS S3 takes surprisingly long, even when no data is actually written with compute=False.
Here's an example:
import fsspec
import xarray as xr
x = xr....