Split a variable table in python

Question

After calling lsof, I'm looking for a generic way to split every row so that each cell of the table ends up in its own string. The difficulty is that the width of every column can change each time the command is run.

COMMAND     PID       USER   FD      TYPE             DEVICE  SIZE/OFF       NODE NAME
init          1       root  cwd       DIR                8,1      4096          2 /
kthreadd      2       root  txt   unknown                                         /proc/2/exe
kjournald    42       root  txt   unknown                                         /proc/42/exe
udevd        77       root  cwd       DIR                8,1      4096          2 /
udevd        77       root  txt       REG                8,1    133176     139359 /sbin/udevd
flush-8:1 26221       root  cwd       DIR                8,1      4096          2 /
flush-8:1 26221       root  rtd       DIR                8,1      4096          2 /
flush-8:1 26221       root  txt   unknown                                         /proc/26221/exe
sudo      26228       root    5u     unix 0xfff999002579d3c0       0t0     515611 socket
python    30077       root    2u      CHR                1,3       0t0        700 /dev/null

Ahh... so this is the real problem that your previous question was trying to address?
– Jon Clements, Commented Dec 3, 2013 at 11:31
It's possible for the command to have a space in the name, so it's not safe to just .split it. Perhaps you can use the headings to discover the field widths.
– John La Rooy, Commented Dec 3, 2013 at 11:44
@gnibbler You are right. I updated my answer to deal with this issue.
– Andrei Boyanov, Commented Dec 4, 2013 at 8:32

Jon Clements · Accepted Answer · 2013-12-03 11:50:15Z

Instead of parsing the output of the lsof command, install the psutil module. It also has the advantage of being cross-platform.

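import psutil

def get_all_files():
    """Collect the regular files opened by every process we may inspect."""
    files = set()
    for proc in psutil.process_iter():
        try:
            # psutil >= 2.0 calls this open_files(); older releases used get_open_files()
            files.update(proc.open_files())
        except Exception: # probably don't have permission to get the files
            pass
    return files

print(get_all_files())
# Sample output (abbreviated; the exact repr varies with the Python and psutil versions):
# set([openfile(path='/opt/google/chrome/locales/en-GB.pak', fd=28), openfile(path='/home/jon/.config/google-chrome/Default/Session Storage/000789.log', fd=95), openfile(path='/proc/2414/mounts', fd=8) ... ]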

You can then adapt this to include the owning process and other information, e.g.:

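import psutil

for proc in psutil.process_iter():
    try:
        # name() and username() are methods in psutil >= 2.0 (attributes before that)
        name, user = proc.name(), proc.username()
        fids = proc.open_files()
    except Exception: # e.g. no permission, or the process has already exited
        continue
    for fid in fids:
        print(name, proc.pid, user, fid.path)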

#gnome-settings-daemon 2147 jon /proc/2147/mounts
#pulseaudio 2155 jon /home/jon/.config/pulse/2f6a9045c2bc8db6bf32b2d7517969bf-device-volumes.tdb
#pulseaudio 2155 jon /home/jon/.config/pulse/2f6a9045c2bc8db6bf32b2d7517969bf-stream-volumes.tdb

edited Dec 3, 2013 at 11:50

answered Dec 3, 2013 at 11:44

3 Comments

John Lapoya Over a year ago

As far as I can see, psutil returns only the regular files opened by a process; I want all files opened on the system.

Jon Clements Over a year ago

@JohnSnow okay... but running lsof on my machine returns 26,005 lines, of which, a load are all permission denied and other messages... at least the above filters it down to regular files (you can also retrieve network resources if wanted) from processes the program has rights to...

John Lapoya Over a year ago

My idea is to run it only as root, so there shouldn't be any problems with permissions.

tzelleke · Accepted Answer · 2013-12-03 15:48:30Z

You know that column labels are right aligned except for the first and last. Hence you can extract the column borders from the ending of the column labels (equivalent to: from the beginning of whitespace between adjacent column labels).

import re

# assuming input_file to be a file-like object
header = next(input_file)

# the start of every whitespace run in the header marks the end of a column label
borders = [match.start() for match in re.finditer(r'\s+', header)]
second_to_third_border = borders[1]
# drop the first border (located per line below) and the last match (the trailing newline)
borders = borders[1:-1]

for line in input_file:
    # the first column may contain spaces, so search backwards from the end of column two
    first_to_second_border = line[:second_to_third_border].rfind(' ')
    actual_borders = [0, first_to_second_border] + borders + [len(line)]
    dset = []
    for (s, e) in zip(actual_borders[:-1], actual_borders[1:]):
        dset.append(line[s:e].strip())
    print(dset)

Concerning the first column:
You can find the border between the first and second columns on each line by searching backwards for whitespace from the border between columns two and three. Search backwards because, as mentioned in the comments above, the command might contain spaces, whereas the PID certainly does not.

Concerning the last column:
The column stretches from the border between the second-to-last and last columns to the end of the given line.

Example:

from io import StringIO

input_file = StringIO('''\
COMMAND     PID       USER   FD      TYPE             DEVICE  SIZE/OFF       NODE NAME
init          1       root  cwd       DIR                8,1      4096          2 /
kthreadd      2       root  txt   unknown                                         /proc/2/exe
kjournald    42       root  txt   unknown                                         /proc/42/exe
''')

prints

['init', '1', 'root', 'cwd', 'DIR', '8,1', '4096', '2', '/']
['kthreadd', '2', 'root', 'txt', 'unknown', '', '', '', '/proc/2/exe']
['kjournald', '42', 'root', 'txt', 'unknown', '', '', '', '/proc/42/exe']
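
The same border-based idea can be wrapped into a function that returns one dictionary per row. This is only a minimal sketch, not part of the original answer: the names `parse_lsof_table` and `ROW` are hypothetical, the sample rows are built with a format string so that the alignment matches the table in the question, and the "my app" row is invented to show a command containing a space. It assumes every column except the first and last is right-aligned under its label.

```python
import re

def parse_lsof_table(text):
    """Split lsof-style text into one dict per row, keyed by column label."""
    lines = text.splitlines()
    header, rows = lines[0], lines[1:]
    names = header.split()
    # End offset of every column label. All labels except the first and
    # last are right-aligned, so these offsets are also column borders.
    ends = [m.end() for m in re.finditer(r"\S+", header)]
    fixed = ends[1:-1]
    records = []
    for line in rows:
        if not line.strip():
            continue
        # The first column may contain spaces: search backwards from the
        # right edge of the second column (PID) to find where it ends.
        first = line[:fixed[0]].rfind(" ")
        edges = [0, first] + fixed + [len(line)]
        cells = [line[s:e].strip() for s, e in zip(edges, edges[1:])]
        records.append(dict(zip(names, cells)))
    return records

# Hypothetical helper that reproduces the column alignment of the sample table.
ROW = "{:<9} {:>5}{:>11}{:>5}{:>10}{:>19}{:>10}{:>11} {}"
sample = "\n".join([
    ROW.format("COMMAND", "PID", "USER", "FD", "TYPE", "DEVICE", "SIZE/OFF", "NODE", "NAME"),
    ROW.format("init", "1", "root", "cwd", "DIR", "8,1", "4096", "2", "/"),
    ROW.format("my app", "4242", "root", "txt", "unknown", "", "", "", "/tmp/my file"),
])

records = parse_lsof_table(sample)
for record in records:
    print(record["COMMAND"], record["PID"], record["DEVICE"] or "-", record["NAME"])
```

Keying the cells by column label keeps empty cells as empty strings instead of shifting later values to the left.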

Andrei Boyanov · Accepted Answer · 2013-12-04 08:31:07Z

What about this:

import fileinput

for line in fileinput.input():
    print(line.split())

You can try it like this:

lsof | python your_script.py
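
One caveat with a plain split(), shown here as a small sketch using the header and two rows from the sample output in the question: split() collapses runs of whitespace, so a row with empty cells yields fewer tokens than there are columns, and pairing tokens with column names by position puts values under the wrong label.

```python
# Header and two rows taken from the sample lsof output in the question.
header = "COMMAND     PID       USER   FD      TYPE             DEVICE  SIZE/OFF       NODE NAME"
full_row = "init          1       root  cwd       DIR                8,1      4096          2 /"
sparse_row = "kthreadd      2       root  txt   unknown                                         /proc/2/exe"

columns = header.split()
full = full_row.split()
sparse = sparse_row.split()

# The sparse row produces fewer tokens than there are columns.
print(len(columns), len(full), len(sparse))

# Pairing by position mislabels the sparse row: the path lands
# under DEVICE and the NAME key is never filled.
mislabelled = dict(zip(columns, sparse))
print(mislabelled["DEVICE"])
print("NAME" in mislabelled)
```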

Addressing the 'spaces in NAME problem'

To address the issue of possible spaces in the NAME column mentioned in the comments, I can propose the following solution. It is based on my wish to keep things simple and on the assumption that only the last column can contain spaces.

The algorithm is simple:
1. Find the position where the last column starts. I use the starting position of the NAME header.
2. Cut the line at that position. What you just cut off is the value of the NAME column.
3. split() the rest of the line.

Here is the code:

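import fileinput

header_limits = dict()
records = list()

header_line = None
for line in fileinput.input():
    if not header_line:
        # first line: remember where every column label starts
        header_line = line
        col_names = header_line.split()
        for col_name in col_names:
            header_limits[col_name] = header_line.find(col_name)
        continue
    record = dict()
    # everything from the NAME label's position onwards is the NAME value
    record['NAME'] = line[header_limits['NAME']:].strip()
    line = line[:header_limits['NAME'] - 1]
    # positional pairing: assumes the other cells contain no spaces and that
    # empty cells only occur at the end of the row; NAME is excluded so that
    # extra tokens cannot overwrite it
    record.update(zip(col_names[:-1], line.split()))
    records.append(record)

for record in records:
    print("%s\n" % repr(record))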

The result is a list of dictionaries. Every dictionary corresponds to one line of the lsof output.
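
As a sketch of how such a list of dictionaries might be consumed, the snippet below groups the open file names by PID. The records are written out by hand here (with values from the question's sample and only a few of the keys) so that the snippet runs on its own; the name `files_by_pid` is hypothetical.

```python
from collections import defaultdict

# Records shaped like the parser's output; values come from the question's sample.
records = [
    {"COMMAND": "udevd", "PID": "77", "FD": "cwd", "NAME": "/"},
    {"COMMAND": "udevd", "PID": "77", "FD": "txt", "NAME": "/sbin/udevd"},
    {"COMMAND": "python", "PID": "30077", "FD": "2u", "NAME": "/dev/null"},
]

# Collect every NAME value under the numeric PID it belongs to.
files_by_pid = defaultdict(list)
for record in records:
    files_by_pid[int(record["PID"])].append(record["NAME"])

for pid, names in sorted(files_by_pid.items()):
    print(pid, names)
```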

This is an interesting task that shows the power of Python for everyday work.

Anyway, if possible I would prefer to use a Python library such as the proposed psutil.
