
I have an array of unix timestamps:

import numpy as np
import pandas as pd

d = {'timestamp': [1551675611, 1551676489, 1551676511, 1551676533, 1551676554]}
df = pd.DataFrame(data=d)
timestamps = df[['timestamp']].values

I would like to format them into a single semicolon-separated string, like so:

'1551675611;1551676489;1551676511;1551676533;1551676554'

So far I have prepared this:

def format_timestamps(timestamps: np.array) -> str:
    timestamps = ";".join([f"{timestamp:f}" for timestamp in timestamps])
    return timestamps

Running:

format_timestamps(timestamps)

Gives the following error:

TypeError: unsupported format string passed to numpy.ndarray.__format__

Since I'm new to Python, I'm having trouble understanding how to fix this error.

  • replace "{timestamp:f}" with "{timestamp[0]}", does it work? Commented Dec 16, 2020 at 11:30

4 Answers

2

It's because in your list comprehension, timestamp is a numpy.ndarray object. Just flatten first and convert to string:

>>> ";".join(timestamps.flatten().astype(str))
'1551675611;1551676489;1551676511;1551676533;1551676554'

2

Since you have pandas, why not consider a pandaic solution with str.cat:

df['timestamp'].astype(str).str.cat(sep=';')
# '1551675611;1551676489;1551676511;1551676533;1551676554'

If NaNs or invalid data are a possibility, you can handle them with pd.to_numeric:

(pd.to_numeric(df['timestamp'], errors='coerce')
   .dropna()
   .astype(int)
   .astype(str)
   .str.cat(sep=';'))
# '1551675611;1551676489;1551676511;1551676533;1551676554'
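The example above runs on clean data, so the coercion step has nothing to do. As a minimal sketch of what it does with bad input, the snippet below uses a made-up series (`raw` and its contents are assumptions for illustration, not part of the question) containing a missing value and a non-numeric string:

```python
import pandas as pd

# Hypothetical input: two valid timestamps, one missing value, one invalid string.
raw = pd.Series([1551675611, None, "not a number", 1551676511], name="timestamp")

joined = (pd.to_numeric(raw, errors="coerce")  # invalid entries become NaN
            .dropna()                           # drop the NaN rows
            .astype(int)                        # back to integers (coercion produced floats)
            .astype(str)
            .str.cat(sep=";"))

print(joined)  # 1551675611;1551676511
```

The cast back to int matters here: once NaN appears, the column becomes float, and without the cast the joined string would contain values like 1551675611.0.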

Another idea is to iterate over the list of timestamps and join:

';'.join([f'{t}' for t in df['timestamp'].tolist()])
# '1551675611;1551676489;1551676511;1551676533;1551676554'

1 Comment

.str.cat ah, always forget about that guy.
1

Why the error?

You're getting this error because of how you extract the 'timestamp' column values with the following line:

timestamps = df[['timestamp']].values

Selecting columns with a list of column names, as here, returns a two-dimensional ndarray: the outer array holds one inner ndarray per DataFrame row, and each inner array holds one value per listed column. This approach is generally only useful when selecting multiple columns by name.

The error is thrown by your function because each timestamp here:

";".join([f"{timestamp:f}" for timestamp in timestamps])

is an ndarray containing a single value when timestamps is defined as in your original post, whereas the f format specifier needs a scalar number.
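To make the shape difference concrete, here is a small sketch that reproduces the problem on a shortened copy of the question's data. The variable names (`two_d`, `one_d`, `first_row`, `raised`) are chosen for illustration only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"timestamp": [1551675611, 1551676489, 1551676511]})

two_d = df[["timestamp"]].values  # list of column names -> 2-D array
one_d = df["timestamp"].values    # single column name   -> 1-D array

print(two_d.shape)  # (3, 1)
print(one_d.shape)  # (3,)

# Iterating over the 2-D array yields one-element ndarrays, not scalars.
first_row = two_d[0]
print(type(first_row).__name__)  # ndarray

# An ndarray with one or more dimensions rejects any non-empty format spec.
try:
    f"{first_row:f}"
    raised = False
except TypeError:
    raised = True

print(raised)  # True
```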

Accounting for the error

To remedy this error in your code, simply replace:

timestamps = df[['timestamp']].values

With:

timestamps = df['timestamp'].values

Passing a single str selects a single column, so timestamps is now a one-dimensional ndarray holding the 'timestamp' value for each row, which passes through your original format_timestamps without error.

format_timestamps

Running format_timestamps(timestamps) using the above approach and your original implementation of format_timestamps will return:

'1551675611.000000;1551676489.000000;1551676511.000000;1551676533.000000;1551676554.000000'

This is better (no errors, at least) but still not quite what you want. The root of this issue is that you are passing f as the format specifier when joining the timestamp values. This formats each value as a float, whereas you actually want each value formatted as an int (format specifier d).
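As a quick illustration of the difference between the specifiers, the sketch below formats a single numpy.int64 value three ways (the variable names are illustrative only):

```python
import numpy as np

value = np.int64(1551675611)

as_float = f"{value:f}"    # fixed-point notation, six decimal places by default
as_int = f"{value:d}"      # decimal integer
as_default = f"{value}"    # no specifier: same text as str(value)

print(as_float)    # 1551675611.000000
print(as_int)      # 1551675611
print(as_default)  # 1551675611
```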

You can either change the format specifier from f to d in your function definition:

def format_timestamps(timestamps: np.array) -> str:
    timestamps = ";".join([f"{timestamp:d}" for timestamp in timestamps])
    return timestamps

Or simply omit the format specifier, since the values in timestamps are already of type numpy.int64:

def format_timestamps(timestamps: np.array) -> str:
    timestamps = ";".join([f"{timestamp}" for timestamp in timestamps])
    return timestamps

Running format_timestamps(timestamps) using either definition above will return what you're after:

'1551675611;1551676489;1551676511;1551676533;1551676554'


1

A quick fix to your code would be:

def format_timestamps(timestamps: np.array) -> str:
    timestamps = ";".join([f"{timestamp[0]}" for timestamp in timestamps])
    return timestamps

Here I only replaced timestamp:f with timestamp[0], so each timestamp is formatted as a scalar instead of an array.
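Note that indexing with [0] only works while the input stays two-dimensional. If the function might receive either shape, one possible variant is to flatten the input first. This is only a sketch: the name format_timestamps_any_shape is hypothetical, and it assumes the values are already integers:

```python
import numpy as np

def format_timestamps_any_shape(timestamps) -> str:
    # Hypothetical helper: flatten whatever shape comes in, then join.
    flat = np.asarray(timestamps).ravel()
    return ";".join(str(t) for t in flat)

column = np.array([[1551675611], [1551676489]])  # shape (2, 1), as in the question
vector = np.array([1551675611, 1551676489])      # shape (2,)

print(format_timestamps_any_shape(column))  # 1551675611;1551676489
print(format_timestamps_any_shape(vector))  # 1551675611;1551676489
```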
