
The following is the value of the StringIO object:

[Screenshot: debugger breakpoint showing the type of csv_log_stream]

csv_log_stream.getvalue()

Raw Output

'"2022-06-04 12:02:40,248",azure_functions_worker,INFO,"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60,function Name: ProcessWebSaleExportFilesInRSBlobStorage"\n"2022-06-04 12:02:40,252",azure_functions_worker,INFO,"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fcebfunction Name: ReadIntegrationInterfaceConfiguration"\n"2022-06-04 12:02:40,259",azure_functions_worker,INFO,"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fceb,function Name: ReadIntegrationInterfaceConfiguration"\n"2022-06-04 12:02:40,261",azure_functions_worker,INFO,"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225function Name: SaveLogsToRSBlobStorage"\n"2022-06-04 12:02:40,265",azure_functions_worker,INFO,"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225,function Name: SaveLogsToRSBlobStorage"\n"2022-06-04 12:02:43,000",azure_functions_worker,INFO,"Received FunctionInvocationRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60, function name: ProcessWebSaleExportFilesInRSBlobStorage, invocation ID: c42bf678-d155-4859-a71a-b0108645080d, function type: sync, sync threadpool max workers: 1000"\n"2022-06-04 12:02:43,007",root,INFO,Python HTTP trigger :: ProcessWebSaleExportFilesInRSBlobStorage function processed a request.\n"2022-06-04 12:02:43,008",root,INFO,Processing Request object started for the desired parameters.\n"2022-06-04 12:02:43,009",root,INFO,Processing Request object completed for the desired parameters.\n"2022-06-04 12:02:43,010",root,INFO,Processing Request object started for the desired 
parameters.\n"2022-06-04 12:02:43,011",root,INFO,Processing Request object completed for the desired parameters.\n"2022-06-04 12:02:43,041",azure.core.pipeline.policies.http_logging_policy,INFO,"Request URL: \'https://koxdsrssa.blob.core.windows.net/koxds-export?restype=REDACTED&comp=REDACTED&prefix=REDACTED&st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED\'\nRequest method: \'GET\'\nRequest headers:\n    \'x-ms-version\': \'REDACTED\'\n    \'Accept\': \'application/xml\'\n    \'User-Agent\': \'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)\'\n    \'x-ms-date\': \'REDACTED\'\n    \'x-ms-client-request-id\': \'79b647c5-e3ed-11ec-8c08-48a4728e3a8b\'\nNo body was attached to the request"\n"2022-06-04 12:02:43,564",azure.core.pipeline.policies.http_logging_policy,INFO,"Response status: 200\nResponse headers:\n    \'Transfer-Encoding\': \'chunked\'\n    \'Content-Type\': \'application/xml\'\n    \'Server\': \'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0\'\n    \'x-ms-request-id\': \'4a6cab6b-e01e-002d-5ffa-77769c000000\'\n    \'x-ms-client-request-id\': \'79b647c5-e3ed-11ec-8c08-48a4728e3a8b\'\n    \'x-ms-version\': \'REDACTED\'\n    \'Access-Control-Expose-Headers\': \'REDACTED\'\n    \'Access-Control-Allow-Origin\': \'REDACTED\'\n    \'Date\': \'Sat, 04 Jun 2022 10:02:43 GMT\'"\n"2022-06-04 12:02:44,070",azure.core.pipeline.policies.http_logging_policy,INFO,"Request URL: \'https://koxdsrssa.blob.core.windows.net/koxds-export/WebSale/Test/2022_06_03_20_13_23_782-0500_c841f873-9a12-4402-a164-5819cbcddc3e_Test_0.json?st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED\'\nRequest method: \'GET\'\nRequest headers:\n    \'x-ms-range\': \'REDACTED\'\n    \'x-ms-version\': \'REDACTED\'\n    \'Accept\': \'application/xml\'\n    \'User-Agent\': \'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)\'\n    \'x-ms-date\': \'REDACTED\'\n    \'x-ms-client-request-id\': 
\'7a5398fe-e3ed-11ec-a414-48a4728e3a8b\'\nNo body was attached to the request"\n"2022-06-04 12:02:44,226",azure.core.pipeline.policies.http_logging_policy,INFO,"Response status: 206\nResponse headers:\n    \'Content-Length\': \'8337358\'\n    \'Content-Type\': \'application/json\'\n    \'Content-Range\': \'REDACTED\'\n    \'Last-Modified\': \'Sat, 04 Jun 2022 01:14:56 GMT\'\n    \'Accept-Ranges\': \'REDACTED\'\n    \'ETag\': \'""0x8DA45C7A2F73E96""\'\n    \'Server\': \'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0\'\n    \'x-ms-request-id\': \'4a6cad87-e01e-002d-3cfa-77769c000000\'\n    \'x-ms-client-request-id\': \'7a5398fe-e3ed-11ec-a414-48a4728e3a8b\'\n    \'x-ms-version\': \'REDACTED\'\n    \'x-ms-creation-time\': \'REDACTED\'\n    \'x-ms-blob-content-md5\': \'REDACTED\'\n    \'x-ms-lease-status\': \'REDACTED\'\n    \'x-ms-lease-state\': \'REDACTED\'\n    \'x-ms-blob-type\': \'REDACTED\'\n    \'Content-Disposition\': \'REDACTED\'\n    \'x-ms-server-encrypted\': \'REDACTED\'\n    \'Access-Control-Expose-Headers\': \'REDACTED\'\n    \'Access-Control-Allow-Origin\': \'REDACTED\'\n    \'Date\': \'Sat, 04 Jun 2022 10:02:44 GMT\'"\n"2022-06-04 12:09:07,090",root,INFO,Total time taken: 6 minutes and 24 seconds\n'

Reading from StringIO to pandas.DataFrame:

df_logs = pd.read_csv(csv_log_stream, header=None)

Output:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "ProjectDir\\.venv\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 933, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 1235, in _make_engine
    return mapping[engine](f, **self.options)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 75, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 551, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

The above attempt to read a DataFrame from the StringIO object throws an error. So I passed explicit column names instead, and now I get an empty DataFrame:

df_logs = pd.read_csv(csv_log_stream, names=["Timestamp", "LogName", "LogLevel", "LogMessage"])
print(df_logs)

Output:

Empty DataFrame
Columns: [Timestamp, LogName, LogLevel, LogMessage]
Index: []
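A minimal reproduction of both symptoms, assuming the stream position is already at the end of the buffer when pd.read_csv is called. The question does not show how csv_log_stream is filled, so the two log lines below are made up. With header=None, pandas raises EmptyDataError; with names, it returns an empty DataFrame; after seek(0), the rows appear.

```python
import io

import pandas as pd
from pandas.errors import EmptyDataError

NAMES = ["Timestamp", "LogName", "LogLevel", "LogMessage"]

# Made-up two-line log in the question's four-column layout.
stream = io.StringIO()
stream.write('"2022-06-04 12:02:43,008",root,INFO,Processing Request object started\n')
stream.write('"2022-06-04 12:02:43,009",root,INFO,Processing Request object completed\n')
# Both writes advanced the position to the end of the buffer.

try:
    pd.read_csv(stream, header=None)
    raised = False
except EmptyDataError:
    raised = True  # "No columns to parse from file", as in the question

stream.seek(0, io.SEEK_END)  # stay at the end for the second attempt
empty_df = pd.read_csv(stream, names=NAMES)  # no error, but no rows either

stream.seek(0)  # rewind to the first character
full_df = pd.read_csv(stream, names=NAMES)

print(raised, empty_df.empty, full_df.shape)  # True True (2, 4)
```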

I don't understand what I am doing wrong. The value in my StringIO object looks correct. What am I missing?

  • What type of an object is csv_log_stream? What does type(csv_log_stream) return? Commented Jun 3, 2022 at 15:44
  • csv_log_stream is a StringIO object. Its getvalue() method returns the string value. Commented Jun 3, 2022 at 18:28

1 Answer


It might be that you are calling pd.read_csv on the string returned by StringIO.getvalue() instead of on the StringIO object itself:

import pandas as pd
from io import StringIO

file = StringIO(
    "\"2022-06-04 12:02:40,248\",azure_functions_worker,INFO,\"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60,function Name: ProcessWebSaleExportFilesInRSBlobStorage\"\n\"2022-06-04 12:02:40,252\",azure_functions_worker,INFO,\"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fcebfunction Name: ReadIntegrationInterfaceConfiguration\"\n\"2022-06-04 12:02:40,259\",azure_functions_worker,INFO,\"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fceb,function Name: ReadIntegrationInterfaceConfiguration\"\n\"2022-06-04 12:02:40,261\",azure_functions_worker,INFO,\"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225function Name: SaveLogsToRSBlobStorage\"\n\"2022-06-04 12:02:40,265\",azure_functions_worker,INFO,\"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225,function Name: SaveLogsToRSBlobStorage\"\n\"2022-06-04 12:02:43,000\",azure_functions_worker,INFO,\"Received FunctionInvocationRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60, function name: ProcessWebSaleExportFilesInRSBlobStorage, invocation ID: c42bf678-d155-4859-a71a-b0108645080d, function type: sync, sync threadpool max workers: 1000\"\n\"2022-06-04 12:02:43,007\",root,INFO,Python HTTP trigger :: ProcessWebSaleExportFilesInRSBlobStorage function processed a request.\n\"2022-06-04 12:02:43,008\",root,INFO,Processing Request object started for the desired parameters.\n\"2022-06-04 12:02:43,009\",root,INFO,Processing Request object completed for the desired parameters.\n\"2022-06-04 12:02:43,010\",root,INFO,Processing Request 
object started for the desired parameters.\n\"2022-06-04 12:02:43,011\",root,INFO,Processing Request object completed for the desired parameters.\n\"2022-06-04 12:02:43,041\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Request URL: 'https://koxdsrssa.blob.core.windows.net/koxds-export?restype=REDACTED&comp=REDACTED&prefix=REDACTED&st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED'\nRequest method: 'GET'\nRequest headers:\n    'x-ms-version': 'REDACTED'\n    'Accept': 'application/xml'\n    'User-Agent': 'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)'\n    'x-ms-date': 'REDACTED'\n    'x-ms-client-request-id': '79b647c5-e3ed-11ec-8c08-48a4728e3a8b'\nNo body was attached to the request\"\n\"2022-06-04 12:02:43,564\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Response status: 200\nResponse headers:\n    'Transfer-Encoding': 'chunked'\n    'Content-Type': 'application/xml'\n    'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'\n    'x-ms-request-id': '4a6cab6b-e01e-002d-5ffa-77769c000000'\n    'x-ms-client-request-id': '79b647c5-e3ed-11ec-8c08-48a4728e3a8b'\n    'x-ms-version': 'REDACTED'\n    'Access-Control-Expose-Headers': 'REDACTED'\n    'Access-Control-Allow-Origin': 'REDACTED'\n    'Date': 'Sat, 04 Jun 2022 10:02:43 GMT'\"\n\"2022-06-04 12:02:44,070\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Request URL: 'https://koxdsrssa.blob.core.windows.net/koxds-export/WebSale/Test/2022_06_03_20_13_23_782-0500_c841f873-9a12-4402-a164-5819cbcddc3e_Test_0.json?st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED'\nRequest method: 'GET'\nRequest headers:\n    'x-ms-range': 'REDACTED'\n    'x-ms-version': 'REDACTED'\n    'Accept': 'application/xml'\n    'User-Agent': 'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)'\n    'x-ms-date': 'REDACTED'\n    'x-ms-client-request-id': '7a5398fe-e3ed-11ec-a414-48a4728e3a8b'\nNo body was 
attached to the request\"\n\"2022-06-04 12:02:44,226\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Response status: 206\nResponse headers:\n    'Content-Length': '8337358'\n    'Content-Type': 'application/json'\n    'Content-Range': 'REDACTED'\n    'Last-Modified': 'Sat, 04 Jun 2022 01:14:56 GMT'\n    'Accept-Ranges': 'REDACTED'\n    'ETag': '\"\"0x8DA45C7A2F73E96\"\"'\n    'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'\n    'x-ms-request-id': '4a6cad87-e01e-002d-3cfa-77769c000000'\n    'x-ms-client-request-id': '7a5398fe-e3ed-11ec-a414-48a4728e3a8b'\n    'x-ms-version': 'REDACTED'\n    'x-ms-creation-time': 'REDACTED'\n    'x-ms-blob-content-md5': 'REDACTED'\n    'x-ms-lease-status': 'REDACTED'\n    'x-ms-lease-state': 'REDACTED'\n    'x-ms-blob-type': 'REDACTED'\n    'Content-Disposition': 'REDACTED'\n    'x-ms-server-encrypted': 'REDACTED'\n    'Access-Control-Expose-Headers': 'REDACTED'\n    'Access-Control-Allow-Origin': 'REDACTED'\n    'Date': 'Sat, 04 Jun 2022 10:02:44 GMT'\"\n\"2022-06-04 12:09:07,090\",root,INFO,Total time taken: 6 minutes and 24 seconds\n"
)


df = pd.read_csv(
    file,
    names=[
        "Timestamp",
        "LogName",
        "LogLevel",
        "LogMessage",
    ],
)
print(df)
# Output
                  Timestamp                                           LogName  \
0   2022-06-04 12:02:40,248                            azure_functions_worker
1   2022-06-04 12:02:40,252                            azure_functions_worker
2   2022-06-04 12:02:40,259                            azure_functions_worker
3   2022-06-04 12:02:40,261                            azure_functions_worker
4   2022-06-04 12:02:40,265                            azure_functions_worker
5   2022-06-04 12:02:43,000                            azure_functions_worker
6   2022-06-04 12:02:43,007                                              root
7   2022-06-04 12:02:43,008                                              root
8   2022-06-04 12:02:43,009                                              root
9   2022-06-04 12:02:43,010                                              root
10  2022-06-04 12:02:43,011                                              root
11  2022-06-04 12:02:43,041  azure.core.pipeline.policies.http_logging_policy
12  2022-06-04 12:02:43,564  azure.core.pipeline.policies.http_logging_policy
13  2022-06-04 12:02:44,070  azure.core.pipeline.policies.http_logging_policy
14  2022-06-04 12:02:44,226  azure.core.pipeline.policies.http_logging_policy
15  2022-06-04 12:09:07,090                                              root

   LogLevel                                         LogMessage
0      INFO  Successfully processed FunctionLoadRequest, re...
1      INFO  Received FunctionLoadRequest, request ID: 5bc6...
2      INFO  Successfully processed FunctionLoadRequest, re...
3      INFO  Received FunctionLoadRequest, request ID: 5bc6...
4      INFO  Successfully processed FunctionLoadRequest, re...
5      INFO  Received FunctionInvocationRequest, request ID...
6      INFO  Python HTTP trigger :: ProcessWebSaleExportFil...
7      INFO  Processing Request object started for the desi...
8      INFO  Processing Request object completed for the de...
9      INFO  Processing Request object started for the desi...
10     INFO  Processing Request object completed for the de...
11     INFO  Request URL: 'https://koxdsrssa.blob.core.wind...
12     INFO  Response status: 200\nResponse headers:\n    '...
13     INFO  Request URL: 'https://koxdsrssa.blob.core.wind...
14     INFO  Response status: 206\nResponse headers:\n    '...
15     INFO         Total time taken: 6 minutes and 24 seconds

Be careful with StringIO objects: unlike a raw string, a StringIO object has a current stream position, and pd.read_csv reads from that position onward, not from the beginning of the buffer.
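To make the position semantics concrete, here is a small standard-library sketch (the buffer contents are arbitrary). It shows that getvalue() returns the whole buffer without moving the position, while read(), which is what a parser consumes, starts at the current position:

```python
from io import StringIO

buf = StringIO()
buf.write("a,b\n1,2\n")         # write() leaves the position after the last character

end_pos = buf.tell()             # 8
whole = buf.getvalue()           # 'a,b\n1,2\n': the entire buffer, position ignored
pos_after_getvalue = buf.tell()  # still 8: getvalue() does not move the position
rest = buf.read()                # '': read() starts at the current position

buf.seek(0)                      # rewind to the beginning
after_rewind = buf.read()        # 'a,b\n1,2\n'

print(end_pos, pos_after_getvalue, repr(rest), repr(after_rewind))
```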

Here is an example with the same "file" object:

file.seek(20000)  # Move the stream position to offset 20000, past the end of this text.

df = pd.read_csv(
    file,
    names=[
        "Timestamp",
        "LogName",
        "LogLevel",
        "LogMessage",
    ],
)

print(df)
# Output
Empty DataFrame
Columns: [Timestamp, LogName, LogLevel, LogMessage]
Index: []

Whereas:

file.seek(0)  # Change the stream position to the beginning of the file

df = pd.read_csv(
    file,
    names=[
        "Timestamp",
        "LogName",
        "LogLevel",
        "LogMessage",
    ],
)

print(df)
# Output (same as the first print; only the LogLevel/LogMessage half of the wrapped display is shown)
   LogLevel                                         LogMessage  
0      INFO  Successfully processed FunctionLoadRequest, re...  
1      INFO  Received FunctionLoadRequest, request ID: 5bc6...  
2      INFO  Successfully processed FunctionLoadRequest, re...  
3      INFO  Received FunctionLoadRequest, request ID: 5bc6...  
4      INFO  Successfully processed FunctionLoadRequest, re...  
5      INFO  Received FunctionInvocationRequest, request ID...  
6      INFO  Python HTTP trigger :: ProcessWebSaleExportFil...  
7      INFO  Processing Request object started for the desi...  
8      INFO  Processing Request object completed for the de...  
9      INFO  Processing Request object started for the desi...  
10     INFO  Processing Request object completed for the de...  
11     INFO  Request URL: 'https://koxdsrssa.blob.core.wind...  
12     INFO  Response status: 200\nResponse headers:\n    '...  
13     INFO  Request URL: 'https://koxdsrssa.blob.core.wind...  
14     INFO  Response status: 206\nResponse headers:\n    '...  
15     INFO         Total time taken: 6 minutes and 24 seconds  
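The rewind can be wrapped in a small helper so it is not forgotten. This is a sketch; read_log_stream and LOG_COLUMNS are hypothetical names, not part of pandas:

```python
from io import StringIO

import pandas as pd

LOG_COLUMNS = ["Timestamp", "LogName", "LogLevel", "LogMessage"]  # as in the question


def read_log_stream(stream: StringIO) -> pd.DataFrame:
    """Hypothetical helper: parse every line written to the stream so far."""
    stream.seek(0)  # rewind so read_csv sees the buffer from its first character
    return pd.read_csv(stream, names=LOG_COLUMNS)


buf = StringIO()
buf.write('"2022-06-04 12:09:07,090",root,INFO,Total time taken: 6 minutes and 24 seconds\n')
df = read_log_stream(buf)
print(df.loc[0, "LogMessage"])  # Total time taken: 6 minutes and 24 seconds
```

Note that seek(0) changes the position of the shared stream, and StringIO writes happen at the current position. If something may keep writing to the stream, parsing a copy with pd.read_csv(StringIO(stream.getvalue()), names=LOG_COLUMNS) leaves the original position untouched.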

Comments

I am using the csv_log_stream StringIO object itself, not its string value. But in your implementation you added three extra columns when reading from the StringIO object, i.e. "requestID", "functionID", "functionName". Is there a particular reason for that, especially since you initialize the StringIO object with only four columns?
Unfortunately, your solution doesn't work for me. I have added the raw string value to my question. Could you please try your solution again by initializing with the raw string value?
It seems to work for you with the raw string. It also works for me with the raw string, but not when executed as part of the main program. To make sure I am actually working with a StringIO object, I checked its type during a debug session and posted the screenshot from my IDE. Is there anything else I could try?
You are assuming that the object (csv_log_stream) you pass to pd.read_csv in your main code is not empty because calling getvalue() on it returns a string. But getvalue() always returns the entire buffer regardless of the current stream position, whereas pd.read_csv, when given the StringIO object itself, reads only from the current position onward, and after the log records have been written that position is typically the end of the buffer. Before creating the DataFrame, you should call csv_log_stream.seek(0); see my updated answer.
Thank you for adding the detailed answer :)
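Putting the pieces together, here is a sketch of the full flow described in the comments. It assumes, without confirmation from the question, that csv_log_stream is filled by a logging.StreamHandler with a CSV-like format string; the logger name and messages are made up. This format string does not quote messages, so messages containing commas would need proper quoting (for example via the csv module).

```python
import logging
from io import StringIO

import pandas as pd

# Hypothetical setup: the question does not show how csv_log_stream is filled.
csv_log_stream = StringIO()
handler = logging.StreamHandler(csv_log_stream)
# asctime contains a comma ("2022-06-04 12:02:43,008"), so it is quoted.
handler.setFormatter(logging.Formatter('"%(asctime)s",%(name)s,%(levelname)s,%(message)s'))

logger = logging.getLogger("csv_log_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the records out of the root logger's handlers

logger.info("Processing Request object started for the desired parameters.")
logger.info("Processing Request object completed for the desired parameters.")

# The handler's writes left the position at the end: rewind before parsing.
csv_log_stream.seek(0)
df_logs = pd.read_csv(csv_log_stream, names=["Timestamp", "LogName", "LogLevel", "LogMessage"])
print(df_logs[["LogName", "LogLevel", "LogMessage"]])
```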
