I have a txt file (basically a log file) containing blocks of text. Each block, or paragraph, holds information about one event. What I need is to extract certain fields from each block and save them as an array or list.
Each paragraph has the following format:
id: [id] Name: [name] time: [timestamp] user: [username] ip: [ip_address of the user] processing_time: [processing time in seconds]
A sample paragraph can be:
id: 23455 Name: ymalsen time: 03:20:20 user: ymanlls ip: 230.33.45.32 processing_time: 05
What I need to extract from each block is:
id:[]
Name:[]
processing_time: []
So that my resulting array for each block's result would be:
array = [id, name, processing_time]
An issue is that my text files are fairly large and contain thousands of these records. What is the best way to do this in Python (2.7 to be precise)? Once I have an array for each record, I will collect all of them into a single N-dimensional numpy array, and that is it. Any help will be greatly appreciated.
Here is something I am using to plainly extract all the lines starting with Name:
log = 'log_1.txt'
f = open(log, 'r')
name_array = []
for a in f:                       # iterate line by line; readlines() loads the whole file into memory
    if a.startswith('Name: '):
        a = ' '.join(a.split())   # collapse repeated whitespace (the result must be kept)
        name_array.append(a)      # was host_array, which is undefined
f.close()
But it simply extracts every matching line into a single flat array, which is not very useful, since I need the id, name, and processing_time grouped together per record.
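For reference, here is a minimal regex-based sketch of what I am after. It assumes each record sits on one line, with fields in the order shown in the sample above; the function name parse_log is just my placeholder:

```python
import re

# One group per field I want: id, Name, processing_time.
# Assumes the field order from the sample record.
pattern = re.compile(r'id:\s*(\d+)\s+Name:\s*(\S+).*?processing_time:\s*(\d+)')

def parse_log(text):
    """Return a list of [id, name, processing_time] lists, one per record."""
    return [list(m.groups()) for m in pattern.finditer(text)]

sample = ('id: 23455 Name: ymalsen time: 03:20:20 '
          'user: ymanlls ip: 230.33.45.32 processing_time: 05')
records = parse_log(sample)   # [['23455', 'ymalsen', '05']]
```

For a large file I would presumably iterate line by line (for line in open(log)) and call pattern.search on each line rather than reading the whole file at once, and at the end numpy.array(records) would give the 2-D array I described.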