How to get current CPU and RAM usage in Python?
The psutil library gives you information about CPU, RAM, etc., on a variety of platforms:
psutil is a module providing an interface for retrieving information on running processes and system utilization (CPU, memory) in a portable way by using Python, implementing many functionalities offered by tools like ps, top and Windows task manager.
It currently supports Linux, Windows, OSX, Sun Solaris, FreeBSD, OpenBSD and NetBSD, both 32-bit and 64-bit architectures, with Python versions from 2.6 to 3.5 (users of Python 2.4 and 2.5 may use 2.1.3 version).
Some examples:
#!/usr/bin/env python
import psutil
# gives a single float value
psutil.cpu_percent()
# gives an object with many fields
psutil.virtual_memory()
# you can convert that object to a dictionary
dict(psutil.virtual_memory()._asdict())
# you can get the percentage of used RAM
psutil.virtual_memory().percent  # e.g. 79.2
# you can calculate the percentage of available memory
psutil.virtual_memory().available * 100 / psutil.virtual_memory().total  # e.g. 20.8
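The memory fields above are raw byte counts. If you want them human-readable, a small helper along these lines works (the function name `bytes2human` is my own, not part of psutil):

```python
def bytes2human(n):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:.1f} {unit}"
        n /= 1024

# e.g. bytes2human(psutil.virtual_memory().total) for total RAM
print(bytes2human(1536))  # 1.5 KiB
```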
The documentation goes into these and many more concepts in depth:
- https://psutil.readthedocs.io/en/latest/
CPU usage of python script
You'll need the psutil module.
import psutil
# Without an interval, the first call returns a meaningless 0.0,
# since it needs a previous reading to compare against;
# interval=1 samples over a one-second window instead.
print(psutil.cpu_percent(interval=1))
How to measure execution stats of a Python script CPU usage, RAM usage, disk usage etc?
Using psutil, I made this helper metrics class, which you can see in this gist.
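For reference, a minimal sketch of what such a helper might look like (the gist itself isn't reproduced here, so the class name and the exact fields are illustrative, not the author's actual code):

```python
import psutil

class ProcessMetrics:
    """Snapshot CPU and memory usage of the current process via psutil."""

    def __init__(self):
        self._proc = psutil.Process()
        # Prime cpu_percent(): its first call always returns 0.0
        # because it needs a previous reading to compare against.
        self._proc.cpu_percent(interval=None)

    def snapshot(self):
        mem = self._proc.memory_info()
        return {
            "cpu_percent": self._proc.cpu_percent(interval=None),
            "rss_bytes": mem.rss,  # resident set size (physical RAM)
            "vms_bytes": mem.vms,  # virtual memory size
            "system_ram_percent": psutil.virtual_memory().percent,
        }

metrics = ProcessMetrics()
print(metrics.snapshot())
```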
How to distribute multiprocess CPU usage over multiple nodes?
To run your job as-is, you could simply request ncpu=32 and then in your Python script set num_cores = 2. Obviously this has you paying for 32 cores and then leaving 30 of them idle, which is wasteful.
The real problem here is that your current algorithm is memory-bound, not CPU-bound. You should be going to great lengths to read only chunks of your files into memory, operating on the chunks, and then writing the result chunks to disk to be organized later.
Fortunately, Dask is built to do exactly this kind of thing. As a first step, you can take out the parallelize_dataframe function and directly load and map your some_algorithm with a dask.dataframe and a dask.array:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import dask.dataframe as dd
import dask.array as da

def main():
    # Loading input data
    direc = '/home/dir1/dir2/'
    file = 'input_data.csv'
    a_file = 'array_a.npy'
    b_file = 'array_b.npy'
    df = dd.read_csv(direc + file, blocksize=25e6)
    a_and_b = da.from_npy_stack(direc)
    # row-wise apply; dask may also ask you for a `meta` hint here
    df['col3'] = df.apply(some_algorithm, args=(a_and_b,), axis=1)
    # Saving:
    # dask is lazy, this is the only line that does any work
    df.to_csv(
        direc + 'outfile.csv',
        index=False,
        compute_kwargs={"scheduler": "threads"},  # also "processes", but try threads first
    )

if __name__ == '__main__':
    main()
That will require some tweaks to some_algorithm, and to_csv and from_npy_stack work a bit differently, but you will be able to reasonably run this thing just on your own laptop, and it will scale to your cluster hardware. You can level up from here by using the distributed scheduler, or even deploy it directly to your cluster with dask-jobqueue.
How to get the CPU usage of the past 10 minutes in Python
Run the Python script in the background using a cron job:
1) Open a terminal and type crontab -e
2) Edit the file and add the following line to run the script once every minute:
*/1 * * * * python /yourpath/yourpythonfile.py
3) Create yourpythonfile.py with the following code, which appends one CPU sample per run:
import psutil

with open('/yourpath/yourfile.txt', "a") as myfile:
    myfile.write(str(psutil.cpu_percent(interval=1)) + "%\n")
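Since the cron job appends one sample per minute, the past 10 minutes are simply the last 10 lines of the log. A sketch of reading them back (average_recent_cpu is my own name; the "%"-suffixed line format comes from the script above):

```python
from collections import deque

def average_recent_cpu(path, samples=10):
    """Average the last `samples` readings from a log of lines like '12.3%'."""
    with open(path) as f:
        recent = deque(f, maxlen=samples)  # keeps only the last N lines
    values = [float(line.strip().rstrip("%")) for line in recent if line.strip()]
    return sum(values) / len(values) if values else 0.0
```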