How do I access data from a python thread
Below is a simple example that uses a queue and threads:
import threading
import queue
import timeit

q = queue.Queue()
number = 5

t1 = timeit.default_timer()
# Step 1: for comparison, call the function several times without threads
result = []

def fun(x):
    result.append(x)
    return x

for i in range(number):
    fun(i)
print(result, "# normal result")
print(timeit.default_timer() - t1)

t2 = timeit.default_timer()
# Step 2: use threads and a queue; each thread is joined before the next one starts
def fun_thrd(x, q):
    q.put(x)

for i in range(number):
    # named t (not t1) so the timer variable above is not overwritten
    t = threading.Thread(target=fun_thrd, args=(i, q))
    t.start()
    t.join()

thrd_result = []
while not q.empty():
    thrd_result.append(q.get())
print(thrd_result, "# result with threads involved")
print(timeit.default_timer() - t2)

t3 = timeit.default_timer()
# Step 3: start every thread first, so no thread waits for the previous one
threads = []

def fun_thrd_independent(x, q):
    q.put(x)

def thread_indep(number):
    for i in range(number):
        t = threading.Thread(target=fun_thrd_independent, args=(i, q))
        t.start()
        threads.append(t)

thread_indep(5)
for j in threads:
    j.join()

thread_indep_result = []
while not q.empty():
    thread_indep_result.append(q.get())
print(thread_indep_result)  # result when threads are independent of each other
print(timeit.default_timer() - t3)
output:
[0, 1, 2, 3, 4] # normal result
3.50475311279e-05
[0, 1, 2, 3, 4] # result with threads involved
0.000977039337158
[0, 1, 2, 3, 4] # result when threads are independent of each other
0.000933170318604
The timings will differ greatly depending on the scale of the data.
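One thing the example does not show: when threads run independently, items arrive in the queue in completion order, which is not guaranteed to match the order in which the threads were started. A minimal sketch of one way to handle that, assuming a hypothetical worker `square` and helper `run_all` (neither is from the original answer), is to tag each result with its index and sort afterwards:

```python
import queue
import threading

def square(index, value, out_q):
    # Hypothetical worker: tag the result with its index so the caller
    # can restore the original order later.
    out_q.put((index, value * value))

def run_all(values):
    out_q = queue.Queue()
    threads = [
        threading.Thread(target=square, args=(i, v, out_q))
        for i, v in enumerate(values)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Every thread has finished, so exactly len(values) items are queued.
    tagged = [out_q.get() for _ in values]
    return [result for _, result in sorted(tagged)]

print(run_all([0, 1, 2, 3, 4]))  # [0, 1, 4, 9, 16]
```

Sorting on the index keeps the output stable regardless of how the threads were scheduled.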
Hope this helps, Thanks
How to use threading to improve performance in python
This answer will address improving performance without using concurrency.
The way you structured your search, you are looking for 13 million unique things in each sentence. You said it takes 3-5 minutes for each sentence and that the word lengths in concepts range from one to ten.
I think you can improve the search time by making a set of concepts (either initially when constructed or from your list), then splitting each sentence into strings of one to ten (consecutive) words and testing for membership in the set.
Example of a sentence split into 4 word strings:
'data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning statistics and database systems'
# becomes
[('data', 'mining', 'is', 'the'),
('mining', 'is', 'the', 'process'),
('is', 'the', 'process', 'of'),
('the', 'process', 'of', 'discovering'),
('process', 'of', 'discovering', 'patterns'),
('of', 'discovering', 'patterns', 'in'),
('discovering', 'patterns', 'in', 'large'),
('patterns', 'in', 'large', 'data'),
('in', 'large', 'data', 'sets'),
('large', 'data', 'sets', 'involving'),
('data', 'sets', 'involving', 'methods'),
('sets', 'involving', 'methods', 'at'),
('involving', 'methods', 'at', 'the'),
('methods', 'at', 'the', 'intersection'),
('at', 'the', 'intersection', 'of'),
('the', 'intersection', 'of', 'machine'),
('intersection', 'of', 'machine', 'learning'),
('of', 'machine', 'learning', 'statistics'),
('machine', 'learning', 'statistics', 'and'),
('learning', 'statistics', 'and', 'database'),
('statistics', 'and', 'database', 'systems')]
Process:
concepts = set(concepts)
sentence = sentence.split()
# one word
for meme in sentence:
    if meme in concepts:
        pass  # keep it
# two words
for meme in zip(sentence, sentence[1:]):
    if ' '.join(meme) in concepts:
        pass  # keep it
# three words
for meme in zip(sentence, sentence[1:], sentence[2:]):
    if ' '.join(meme) in concepts:
        pass  # keep it
Adapting an itertools recipe (pairwise) you can automate that process of making n-word strings from a sentence:
from itertools import tee

def nwise(iterable, n=2):
    "s -> (s0,s1), (s1,s2), (s2, s3), ... for n=2"
    iterables = tee(iterable, n)
    # advance each iterable to the appropriate starting point
    for i, thing in enumerate(iterables[1:], 1):
        for _ in range(i):
            next(thing, None)
    return zip(*iterables)
Testing each sentence looks like this:
sentence = sentence.strip().split()
for n in [1,2,3,4,5,6,7,8,9,10]:
    for meme in nwise(sentence, n):
        if ' '.join(meme) in concepts:
            pass  # keep meme
I made a set of 13e6 random strings with 20 characters each to approximate concepts.
import random, string
data = set(''.join(random.choice(string.printable) for _ in range(20)) for _ in range(13000000))
Testing a four or forty character string for membership in data consistently takes about 60 nanoseconds. A one hundred word sentence has 955 one to ten word strings, so searching that sentence should take ~60 microseconds.
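The 955 figure can be checked with a few lines of arithmetic: a sentence of W words contains W - n + 1 consecutive n-word strings, summed over n = 1 to 10. A small sketch, where `count_ngrams` and `LOOKUP_NS` are names introduced here for illustration:

```python
def count_ngrams(num_words, max_n=10):
    """Count the consecutive 1..max_n word strings in a sentence of num_words words."""
    return sum(max(num_words - n + 1, 0) for n in range(1, max_n + 1))

LOOKUP_NS = 60  # approximate cost of one set lookup, taken from the timing above

total = count_ngrams(100)
print(total)                     # 955
print(total * LOOKUP_NS / 1000)  # estimated microseconds for the lookups alone
```

The estimate only covers the set lookups; building and joining the word strings adds to the measured time.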
The first sentence from your example, 'data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning statistics and database systems', has 195 possible concepts (one to ten word strings). Timing for the following two functions is about the same: about 140 microseconds for f and 150 microseconds for g:
def f(sentence, data=data, nwise=nwise):
    '''iterate over memes in sentence and see if they are in data'''
    sentence = sentence.strip().split()
    found = []
    for n in [1,2,3,4,5,6,7,8,9,10]:
        for meme in nwise(sentence, n):
            meme = ' '.join(meme)
            if meme in data:
                found.append(meme)
    return found

def g(sentence, data=data, nwise=nwise):
    '''make a set of the memes in sentence then find its intersection with data'''
    sentence = sentence.strip().split()
    test_strings = set(' '.join(meme) for n in range(1, 11) for meme in nwise(sentence, n))
    found = test_strings.intersection(data)
    return found
These are just approximations since I'm not using your actual data, but this should speed things up quite a bit.
After testing with your example data I found that g won't work if a concept appears twice in a sentence, because the set it builds keeps each string only once.
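A tiny illustration of that limitation, using made-up single-word data rather than the real concepts (the variable names are hypothetical):

```python
words = "extract information and transform the information".split()
concepts = {"information", "transform"}

# List-based search (the approach f takes): every occurrence is reported.
found_list = [w for w in words if w in concepts]

# Set-based search (the approach g takes): duplicates collapse into one entry.
found_set = set(words) & concepts

print(found_list)         # ['information', 'transform', 'information']
print(sorted(found_set))  # ['information', 'transform']
```

The set version also loses the order in which the concepts appear in the sentence.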
So here it is all together, with the concepts listed in the order they are found in each sentence. The new version of f will take longer, but the added time should be relatively small. If possible, would you post a comment letting me know how much longer it is than the original? (I'm curious.)
from itertools import tee

sentences = ['data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning statistics and database systems',
             'data mining is an interdisciplinary subfield of computer science and statistics with an overall goal to extract information from a data set and transform the information into a comprehensible structure for further use',
             'data mining is the analysis step of the knowledge discovery in databases process or kdd']
concepts = ['data mining', 'database systems', 'databases process',
            'interdisciplinary subfield', 'information', 'knowledge discovery',
            'methods', 'machine learning', 'patterns', 'process']
concepts = set(concepts)

def nwise(iterable, n=2):
    "s -> (s0,s1), (s1,s2), (s2, s3), ... for n=2"
    iterables = tee(iterable, n)
    # advance each iterable to the appropriate starting point
    for i, thing in enumerate(iterables[1:], 1):
        for _ in range(i):
            next(thing, None)
    return zip(*iterables)

def f(sentence, concepts=concepts, nwise=nwise):
    '''iterate over memes in sentence and see if they are in concepts'''
    indices = set()
    # print(sentence)
    words = sentence.strip().split()
    for n in [1,2,3,4,5,6,7,8,9,10]:
        for meme in nwise(words, n):
            meme = ' '.join(meme)
            if meme in concepts:
                start = sentence.find(meme)
                end = len(meme) + start
                while (start, end) in indices:
                    # print(f'{meme} already found at character:{start} - looking for another one...')
                    start = sentence.find(meme, end)
                    end = len(meme) + start
                indices.add((start, end))
    return [sentence[start:end] for (start, end) in sorted(indices)]

###########
results = []
for sentence in sentences:
    results.append(f(sentence))
    # print(f'{sentence}\n\t{results[-1]}')
In [20]: results
Out[20]:
[['data mining', 'process', 'patterns', 'methods', 'machine learning', 'database systems'],
['data mining', 'interdisciplinary subfield', 'information', 'information'],
['data mining', 'knowledge discovery', 'databases process', 'process']]
Creating Threads in python
You don't need to use a subclass of Thread to make this work. Take a look at the simple example below to see how:
from threading import Thread
from time import sleep

def threaded_function(arg):
    for i in range(arg):
        print("running")
        sleep(1)

if __name__ == "__main__":
    thread = Thread(target=threaded_function, args=(10,))
    thread.start()
    thread.join()
    print("thread finished...exiting")
Here I show how to use the threading module to create a thread which invokes a normal function as its target. You can see how I can pass whatever arguments I need to it in the thread constructor.
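As a further sketch of passing arguments, the Thread constructor also accepts a `kwargs` dictionary alongside `args`. The function `greet` and its parameters are hypothetical, chosen only to show both forms and a simple way to get a value back out of the thread:

```python
from threading import Thread

def greet(name, punctuation="!", results=None):
    # Hypothetical target: builds a message and stores it in a shared list,
    # because the return value of a thread target is discarded.
    results.append("hello " + name + punctuation)

results = []
thread = Thread(
    target=greet,
    args=("world",),                                   # positional arguments
    kwargs={"punctuation": "?", "results": results},   # keyword arguments
)
thread.start()
thread.join()
print(results)  # ['hello world?']
```

Note the trailing comma in `args=("world",)`: without it the parentheses do not make a tuple.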
How to use multi-threading in python with ROS services and serial communication?
You could build a default, simple node with two callbacks: one is your message callback for the subscribed ROS topic, and the other is a callback for a rospy.Timer, which is called repeatedly every 0.3 seconds. You need to create the timer before calling rospy.spin() in the main.
The code could look like this:
#!/usr/bin/env python
import threading
import rospy
from std_msgs.msg import String
# assumption: Empty is the std_srvs service type (the original did not import it)
from std_srvs.srv import Empty, EmptyResponse

mutex = threading.Lock()

def msg_callback(msg):
    # reentrant processing
    with mutex:
        # work serial port here, e.g. send msg to serial port
        pass
    # reentrant processing

def timer_callback(event):
    # reentrant processing
    with mutex:
        # work serial port here, e.g. check for incoming data
        pass
    # reentrant processing, e.g. publish the data from serial port

def service_callback(req):
    # read out temperature
    with mutex:
        # send temperature over serial port
        pass
    # a rospy service handler has to return a response
    return EmptyResponse()

if __name__ == '__main__':
    # initialize serial port here
    rospy.init_node('name')
    rospy.Subscriber('/test_in', String, msg_callback)
    rospy.Service('on_req', Empty, service_callback)
    rospy.Timer(rospy.Duration(0.3), timer_callback)
    rospy.spin()
In fact, rospy starts two different threads for the callbacks. However, because of the GIL these threads do not run in parallel on different cores; they get scheduled and switch execution at certain system calls, e.g. when performing I/O operations. Hence, you still need to synchronize your threads with a Lock, as you did in your question.
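The locking pattern can be tried without ROS or real hardware. The sketch below is an illustration only: `FakeSerialPort` is a hypothetical stand-in for the serial port, and two plain threads play the role of the message and timer callbacks.

```python
import threading

class FakeSerialPort:
    """Hypothetical stand-in for a serial port; records complete frames."""

    def __init__(self):
        self.frames = []
        self._partial = []

    def write_frame(self, source, payload):
        # A frame is written in several steps; without a lock, another
        # thread could interleave its own steps between these.
        self._partial.append(source)
        self._partial.append(payload)
        self.frames.append(tuple(self._partial))
        self._partial.clear()

port = FakeSerialPort()
mutex = threading.Lock()

def callback(source, count):
    for i in range(count):
        with mutex:  # released automatically, even if an exception is raised
            port.write_frame(source, i)

workers = [
    threading.Thread(target=callback, args=(name, 1000))
    for name in ("msg", "timer")
]
for w in workers:
    w.start()
for w in workers:
    w.join()

# With the lock held around each write, every frame has exactly two fields.
print(len(port.frames), all(len(frame) == 2 for frame in port.frames))
```

Using `with mutex:` instead of explicit acquire() and release() calls avoids leaving the lock held if the serial code raises.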
How to use threading to run the same process multiple times with specified data
You have to replace the last line with threading.Thread(target=request, args=(line,)).start(). In your code, request is executed before the Thread object is even created.
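The difference can be seen in a short sketch. Here `request` is a hypothetical stand-in for the function in the question; it records whether it ran in the main thread:

```python
import threading

def request(line, log):
    # Hypothetical stand-in: note the line and whether the main thread ran it.
    in_main = threading.current_thread() is threading.main_thread()
    log.append((line, in_main))

log = []

# Wrong: request(...) runs immediately in the main thread, and its return
# value (None) becomes the target, so the new thread does nothing.
threading.Thread(target=request("a", log)).start()

# Right: pass the function and its arguments separately.
t = threading.Thread(target=request, args=("b", log))
t.start()
t.join()

print(log)  # [('a', True), ('b', False)]
```

Only the second call actually runs in a separate thread.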
How to use Queue with threading properly
You can use a Semaphore for your purposes:
A semaphore manages an internal counter which is decremented by each acquire() call and incremented by each release() call. The counter can never go below zero; when acquire() finds that it is zero, it blocks, waiting until some other thread calls release().
The default value of a Semaphore is 1 (class threading.Semaphore(value=1)), so only one thread would be active at once:
import queue
import threading
import time

fifo_queue = queue.Queue()
semaphore = threading.Semaphore()

def hd():
    with semaphore:
        print("hi")
        time.sleep(1)
        print("done")

for i in range(3):
    cc = threading.Thread(target=hd)
    fifo_queue.put(cc)
    cc.start()
hi
done
hi
done
hi
done
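To see what the counter does, the same idea can be sketched with an initial value of 2. This is an illustration under assumptions: the `state` dictionary and `counter_lock` are bookkeeping added here to measure how many threads are inside the block at once.

```python
import threading
import time

semaphore = threading.Semaphore(2)  # at most two threads inside the block
counter_lock = threading.Lock()
state = {"active": 0, "max_active": 0}

def hd():
    with semaphore:
        with counter_lock:
            state["active"] += 1
            state["max_active"] = max(state["max_active"], state["active"])
        time.sleep(0.1)  # simulated work
        with counter_lock:
            state["active"] -= 1

threads = [threading.Thread(target=hd) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The semaphore guarantees this never exceeds 2.
print(state["max_active"])
```

With `Semaphore()` (value 1) the same measurement would never exceed 1, which is the behaviour shown in the output above.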
As @user2357112supportsMonica mentioned in the comments, RLock would be a safer option:
class threading.RLock
This class implements reentrant lock objects. A reentrant lock must be released by the thread that acquired it. Once a thread has acquired a reentrant lock, the same thread may acquire it again without blocking; the thread must release it once for each time it has acquired it.
import queue
import threading
import time

fifo_queue = queue.Queue()
lock = threading.RLock()

def hd():
    with lock:
        print("hi")
        time.sleep(1)
        print("done")

for i in range(3):
    cc = threading.Thread(target=hd)
    fifo_queue.put(cc)
    cc.start()
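What "reentrant" means in practice can be shown with a short sketch: the same thread takes the RLock twice through nested calls, while a plain Lock refuses a second acquisition. The functions `outer` and `inner` are hypothetical names used only for this illustration.

```python
import threading

rlock = threading.RLock()

def inner():
    with rlock:  # second acquisition by the same thread: does not block
        return "inner done"

def outer():
    with rlock:  # first acquisition
        return inner()

print(outer())  # inner done

# A plain Lock is not reentrant: a second acquire by the same thread fails
# (or would block forever if blocking were left at its default of True).
plain = threading.Lock()
plain.acquire()
second_try = plain.acquire(blocking=False)
print(second_try)  # False
plain.release()
```

This matters when a function that holds the lock calls another function that also needs it.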