How to Parse the Output of /proc/net/dev into Key:Value Pairs Per Interface Using Python

How can I parse the output of /proc/net/dev into key:value pairs per interface using Python?

This is nicely formatted input: you can get the column names and the data by splitting each line, and then build a dict from them.

Here is a simple script without regex:

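import pprint

with open("/proc/net/dev", "r") as f:
    lines = f.readlines()

# The second header line names the receive and transmit columns.
columnLine = lines[1]
_, receiveCols, transmitCols = columnLine.split("|")
# List comprehensions (rather than map) so the lists can be concatenated in Python 3.
receiveCols = ["recv_" + a for a in receiveCols.split()]
transmitCols = ["trans_" + a for a in transmitCols.split()]

cols = receiveCols + transmitCols

faces = {}
for line in lines[2:]:
    if line.find(":") < 0:
        continue
    # face keeps the leading padding from the file; call face.strip() to drop it.
    face, data = line.split(":")
    faceData = dict(zip(cols, data.split()))
    faces[face] = faceData

pprint.pprint(faces)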

It outputs:

{'    lo': {'recv_bytes': '7056295',
            'recv_compressed': '0',
            'recv_drop': '0',
            'recv_errs': '0',
            'recv_fifo': '0',
            'recv_frame': '0',
            'recv_multicast': '0',
            'recv_packets': '12148',
            'trans_bytes': '7056295',
            'trans_carrier': '0',
            'trans_colls': '0',
            'trans_compressed': '0',
            'trans_drop': '0',
            'trans_errs': '0',
            'trans_fifo': '0',
            'trans_packets': '12148'},
 ' eth0': {'recv_bytes': '34084530',
           'recv_compressed': '0',
           'recv_drop': '0',
           'recv_errs': '0',
           'recv_fifo': '0',
           'recv_frame': '0',
           'recv_multicast': '0',
           'recv_packets': '30599',
           'trans_bytes': '6170441',
           'trans_carrier': '0',
           'trans_colls': '0',
           'trans_compressed': '0',
           'trans_drop': '0',
           'trans_errs': '0',
           'trans_fifo': '0',
           'trans_packets': '32377'}}
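
In that output the counters are strings and the interface names still carry their column padding. A minimal sketch of a variant that strips the names and converts the counters to integers follows. It assumes the usual two-line header layout; the function name parse_net_dev and the sample text are illustrative and not part of the original answer.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {interface: {counter: int}}."""
    lines = text.splitlines()
    # The second header line looks like "face |bytes packets ...|bytes packets ..."
    _, receive_cols, transmit_cols = lines[1].split("|")
    cols = ["recv_" + c for c in receive_cols.split()]
    cols += ["trans_" + c for c in transmit_cols.split()]

    faces = {}
    for line in lines[2:]:
        if ":" not in line:
            continue
        # Split on the first colon only and strip the alignment padding.
        face, data = line.split(":", 1)
        faces[face.strip()] = dict(zip(cols, map(int, data.split())))
    return faces


# Hypothetical sample in the /proc/net/dev layout.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 7056295   12148    0    0    0     0          0         0  7056295   12148    0    0    0     0       0          0
"""

stats = parse_net_dev(SAMPLE)
print(stats["lo"]["recv_bytes"])
```

On a Linux machine the same function could be fed the real file, for example parse_net_dev(open("/proc/net/dev").read()).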

Parsing the output of /proc/net/dev with awk and dismissing the first two lines

Tweaking your awk a little bit:

awk 'NR>2{print $1}' /proc/net/dev

Python writing to a linux /proc/mystats file

You can't. /proc/ is a special filesystem that exposes information the kernel knows about various processes, and it is managed by the kernel.

If you want to write running information to a file, use a different directory. For example, Firefox writes its data into $HOME/.mozilla
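
As a rough sketch of that suggestion, the snippet below writes a small stats dict as JSON into an ordinary directory. The function name write_stats, the file name mystats.json, and the demo directory are all hypothetical; in a real program a per-user directory such as Path.home() / ".myapp" might be a more natural choice.

```python
import json
import os
import tempfile
from pathlib import Path


def write_stats(stats, directory):
    """Write a stats dict as JSON to <directory>/mystats.json and return the path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / "mystats.json"
    # Write to a temporary file first, then rename it into place,
    # so a reader never sees a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=str(directory), suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(stats, tmp)
    os.replace(tmp_name, str(target))
    return target


# Hypothetical location; any directory the user can write to would do.
stats_dir = Path(tempfile.gettempdir()) / "myapp-demo"
path = write_stats({"requests": 42, "errors": 0}, stats_dir)
print(path.read_text())
```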

Utility for parsing /proc/net/route

ROUTE(8) does exactly that if you invoke it with the -n flag. Moreover, it can be used on systems without procfs support. For example:

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.2     0.0.0.0         UG    100    0        0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
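
If you would rather read /proc/net/route directly from Python, a minimal sketch might look like the following. It assumes the addresses in the file are 8-digit hex values in little-endian byte order (as on x86) and that the header row names the columns; the function names and the sample text are illustrative only.

```python
import socket
import struct


def hex_to_ip(hex_addr):
    """Convert an 8-digit hex field to dotted-quad notation.

    Assumes the address was printed in little-endian byte order, as on x86;
    a big-endian host would need ">L" instead.
    """
    return socket.inet_ntoa(struct.pack("<L", int(hex_addr, 16)))


def parse_route_table(text):
    """Parse /proc/net/route-style text into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    routes = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        for key in ("Destination", "Gateway", "Mask"):
            row[key] = hex_to_ip(row[key])
        routes.append(row)
    return routes


# Hypothetical sample in the /proc/net/route layout.
SAMPLE = (
    "Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT\n"
    "eth0\t00000000\t0201A8C0\t0003\t0\t0\t100\t00000000\t0\t0\t0\n"
    "eth0\t0001A8C0\t00000000\t0001\t0\t0\t0\t00FFFFFF\t0\t0\t0\n"
)

routes = parse_route_table(SAMPLE)
for route in routes:
    print(route["Destination"], route["Gateway"], route["Mask"], route["Iface"])
```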

Running shell command and capturing the output

In all officially maintained versions of Python, the simplest approach is to use the subprocess.check_output function:

>>> import subprocess
>>> subprocess.check_output(['ls', '-l'])
b'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

check_output runs a single program that takes only arguments as input (see note 1). It returns the result exactly as printed to stdout. If you need to write input to stdin, skip ahead to the run or Popen sections. If you want to execute complex shell commands, see the note on shell=True at the end of this answer.

The check_output function works in all officially maintained versions of Python. But for more recent versions, a more flexible approach is available.

Modern versions of Python (3.5 or higher): run

If you're using Python 3.5+, and do not need backwards compatibility, the new run function is recommended by the official documentation for most tasks. It provides a very general, high-level API for the subprocess module. To capture the output of a program, pass the subprocess.PIPE flag to the stdout keyword argument. Then access the stdout attribute of the returned CompletedProcess object:

>>> import subprocess
>>> result = subprocess.run(['ls', '-l'], stdout=subprocess.PIPE)
>>> result.stdout
b'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

The return value is a bytes object, so if you want a proper string, you'll need to decode it. Assuming the called process returns a UTF-8-encoded string:

>>> result.stdout.decode('utf-8')
'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

This can all be compressed to a one-liner if desired:

>>> subprocess.run(['ls', '-l'], stdout=subprocess.PIPE).stdout.decode('utf-8')
'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

If you want to pass input to the process's stdin, you can pass a bytes object to the input keyword argument:

>>> cmd = ['awk', 'length($0) > 5']
>>> ip = 'foo\nfoofoo\n'.encode('utf-8')
>>> result = subprocess.run(cmd, stdout=subprocess.PIPE, input=ip)
>>> result.stdout.decode('utf-8')
'foofoo\n'

You can capture errors by passing stderr=subprocess.PIPE (capture to result.stderr) or stderr=subprocess.STDOUT (capture to result.stdout along with regular output). If you want run to throw an exception when the process returns a nonzero exit code, you can pass check=True. (Or you can check the returncode attribute of result above.) When security is not a concern, you can also run more complex shell commands by passing shell=True as described at the end of this answer.
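
The options from the previous paragraph can be sketched together as follows. The child process here is just the Python interpreter itself (via sys.executable), chosen only so the example does not depend on any external tool.

```python
import subprocess
import sys

# A child process that writes to both streams and exits with status 3.
child = [sys.executable, "-c",
         "import sys; print('out'); print('err', file=sys.stderr); sys.exit(3)"]

# Capture the two streams separately and inspect the exit code by hand.
result = subprocess.run(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(result.returncode, result.stdout, result.stderr)

# With check=True a nonzero exit code raises CalledProcessError instead.
try:
    subprocess.run(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
    raised = False
except subprocess.CalledProcessError as exc:
    raised = True
    print("failed with", exc.returncode, exc.stderr)
```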

Later versions of Python streamline the above further. In Python 3.7+, the above one-liner can be spelled like this:

>>> subprocess.run(['ls', '-l'], capture_output=True, text=True).stdout
'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

Using run this way adds just a bit of complexity, compared to the old way of doing things. But now you can do almost anything you need to do with the run function alone.

Older versions of Python (3-3.4): more about check_output

If you are using an older version of Python, or need modest backwards compatibility, you can use the check_output function as briefly described above. It has been available since Python 2.7.

subprocess.check_output(*popenargs, **kwargs)

It takes the same arguments as Popen (see below) and returns the program's output: a str in Python 2, and a bytes object in Python 3 unless text mode is requested. The beginning of this answer has a more detailed usage example. In Python 3.5+, check_output is equivalent to executing run with check=True and stdout=PIPE, and returning just the stdout attribute.

You can pass stderr=subprocess.STDOUT to ensure that error messages are included in the returned output. When security is not a concern, you can also run more complex shell commands by passing shell=True as described at the end of this answer.

If you need to pipe from stderr or pass input to the process, check_output won't be up to the task. See the Popen examples below in that case.

Complex applications and legacy versions of Python (2.6 and below): Popen

If you need deep backwards compatibility, or if you need more sophisticated functionality than check_output or run provide, you'll have to work directly with Popen objects, which encapsulate the low-level API for subprocesses.

The Popen constructor accepts either a single command without arguments, or a list containing a command as its first item, followed by any number of arguments, each as a separate item in the list. shlex.split can help parse strings into appropriately formatted lists. Popen objects also accept a host of different arguments for process IO management and low-level configuration.
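
For instance, shlex.split turns a shell-style command line into the list form described above. The command line used here is a made-up example.

```python
import shlex

# A hypothetical command line with a quoted argument containing a space.
command_line = "grep -n 'hello world' notes.txt"
args = shlex.split(command_line)
print(args)  # ['grep', '-n', 'hello world', 'notes.txt']
```

The resulting list can be passed directly as the first argument to Popen, run, or check_output.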

To send input and capture output, communicate is almost always the preferred method. As in:

output = subprocess.Popen(["mycmd", "myarg"],
                          stdout=subprocess.PIPE).communicate()[0]

Or:

>>> import subprocess
>>> p = subprocess.Popen(['ls', '-a'], stdout=subprocess.PIPE,
...                      stderr=subprocess.PIPE)
>>> out, err = p.communicate()
>>> print(out.decode())
.
..
foo

If you set stdin=PIPE, communicate also allows you to pass data to the process via stdin. In Python 3 the data must be a bytes object unless text mode is enabled:

>>> cmd = ['awk', 'length($0) > 5']
>>> p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
...                      stderr=subprocess.PIPE,
...                      stdin=subprocess.PIPE)
>>> out, err = p.communicate(b'foo\nfoofoo\n')
>>> print(out.decode())
foofoo

Note Aaron Hall's answer, which indicates that on some systems, you may need to set stdout, stderr, and stdin all to PIPE (or DEVNULL) to get communicate to work at all.

In some rare cases, you may need complex, real-time output capturing. Vartec's answer suggests a way forward, but methods other than communicate are prone to deadlocks if not used carefully.
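
One common pattern for reading output as it arrives is to iterate over the child's stdout line by line. The sketch below is only an illustration of that idea and is not necessarily the approach from the answer mentioned above. It assumes Python 3.7+ (for text=True) and merges stderr into stdout so that there is only one pipe to drain.

```python
import subprocess
import sys

# Child that prints three lines; -u keeps its output unbuffered.
child = [sys.executable, "-u", "-c",
         "for i in range(3): print('line', i)"]

lines = []
# Only stdout is a pipe and stderr is merged into it, so there is a single
# stream to drain and the child cannot block on a second, unread pipe.
with subprocess.Popen(child, stdout=subprocess.PIPE,
                      stderr=subprocess.STDOUT, text=True) as proc:
    for line in proc.stdout:
        lines.append(line.rstrip("\n"))
        print("got:", lines[-1])

# Leaving the with block waits for the child, so returncode is set here.
print("exit code:", proc.returncode)
```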

As with all the above functions, when security is not a concern, you can run more complex shell commands by passing shell=True.

Notes

1. Running shell commands: the shell=True argument

Normally, each call to run, check_output, or the Popen constructor executes a single program. That means no fancy bash-style pipes. If you want to run complex shell commands, you can pass shell=True, which all three functions support. For example:

>>> subprocess.check_output('cat books/* | wc', shell=True, text=True)
' 1299377 17005208 101299376\n'

However, doing this raises security concerns. If you're doing anything more than light scripting, you might be better off calling each process separately, and passing the output from each as an input to the next, via

run(cmd, [stdout=etc...], input=other_output)

Or

Popen(cmd, [stdout=etc...]).communicate(other_output)

The temptation to directly connect pipes is strong; resist it. Otherwise, you'll likely see deadlocks or have to do hacky things like this.
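
A minimal sketch of the run-based approach, with two stand-in programs: the first prints some lines and the second sorts whatever arrives on its stdin. Both use sys.executable only to keep the example portable; in practice these would be whatever commands you wanted to chain.

```python
import subprocess
import sys

# Two stand-in programs for the stages of a pipeline.
producer = [sys.executable, "-c", "print('b'); print('a'); print('c')"]
consumer = [sys.executable, "-c",
            "import sys; print(''.join(sorted(sys.stdin)), end='')"]

# Run the first program to completion and keep its output in memory...
first = subprocess.run(producer, stdout=subprocess.PIPE, check=True)
# ...then hand that output to the second program as its stdin.
second = subprocess.run(consumer, input=first.stdout,
                        stdout=subprocess.PIPE, check=True)
print(second.stdout.decode("utf-8"))
```

Unlike a real shell pipe, this holds the whole intermediate output in memory, which is fine for modest amounts of data.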

Selecting specific columns from df -h output in python

You can use os.popen to run the command and retrieve its output, then splitlines and split to split the lines and fields. Run df -Ph rather than df -h so that lines are not split if a column is too long.

import os
df_output_lines = [s.split() for s in os.popen("df -Ph").read().splitlines()]

The result is a list of lines. To extract the first column, you can use [line[0] for line in df_output_lines] (note that columns are numbered from 0) and so on. You may want to use df_output_lines[1:] instead of df_output_lines to strip the title line.
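
To illustrate the column selection on a fixed input, here is a small sketch that works on a made-up df -Ph listing rather than running the command. The sample values are hypothetical.

```python
# Hypothetical `df -Ph` output; real output will differ per machine.
SAMPLE = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   20G   28G  42% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
"""

rows = [line.split() for line in SAMPLE.splitlines()]
body = rows[1:]  # drop the title line
# Columns are numbered from 0: 0 is the filesystem, 4 is Use%, 5 is the mount point.
filesystems = [row[0] for row in body]
usage = {row[5]: row[4] for row in body}
print(filesystems)
print(usage)
```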

If you already have the output of df -h stored in a file somewhere, you'll need to join the lines first.

import re
# raw_df_output is an open file object holding the saved df -h output.
fixed_df_output = re.sub(r'\n\s+', ' ', raw_df_output.read())
df_output_lines = [s.split() for s in fixed_df_output.splitlines()]

Note that this assumes that neither the filesystem name nor the mount point contain whitespace. If they do (which is possible with some setups on some unix variants), it's practically impossible to parse the output of df, even df -P. You can use os.statvfs to obtain information on a given filesystem (this is the Python interface to the C function that df calls internally for each filesystem), but there's no portable way of enumerating the filesystems.
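
A short sketch of the os.statvfs route, assuming a Unix-like system where "/" exists. The helper name disk_usage is illustrative.

```python
import os


def disk_usage(path):
    """Return (total, available) bytes for the filesystem containing path."""
    st = os.statvfs(path)
    total = st.f_frsize * st.f_blocks
    available = st.f_frsize * st.f_bavail  # space available to unprivileged users
    return total, available


total, available = disk_usage("/")
print("total: %.1f GiB, available: %.1f GiB" % (total / 2**30, available / 2**30))
```

Because it takes a path rather than parsing text, this sidesteps the whitespace problem entirely, at the cost of needing to know which paths to ask about.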


