How to Use Threading in Python

How do I access data from a Python thread

Please see the simple example below using a queue and threads:

import queue
import threading
import timeit

q = queue.Queue()
number = 5

t1 = timeit.default_timer()
# Step 1: for comparison, run the function normally, without threads
result = []
def fun(x):
    result.append(x)
    return x

for i in range(number):
    fun(i)
print(result, "# normal result")
print(timeit.default_timer() - t1)

t2 = timeit.default_timer()

# Step 2: using threads and a queue
def fun_thrd(x, q):
    q.put(x)
    return

for i in range(number):
    t = threading.Thread(target=fun_thrd, args=(i, q))
    t.start()
    t.join()  # joining inside the loop makes each thread wait for the previous one

thrd_result = []

while True:
    if not q.empty():
        thrd_result.append(q.get())
    else:
        break

print(thrd_result, "# result with threads involved")
print(timeit.default_timer() - t2)

t3 = timeit.default_timer()

# Step 3: if you want the threads to run without depending on the previous thread,
# start them all first and join them afterwards

threads = []

def fun_thrd_independent(x, q):
    q.put(x)
    return

def thread_indep(number):
    for i in range(number):
        t = threading.Thread(target=fun_thrd_independent, args=(i, q))
        t.start()
        threads.append(t)

thread_indep(5)

for j in threads:
    j.join()

thread_indep_result = []

while True:
    if not q.empty():
        thread_indep_result.append(q.get())
    else:
        break

print(thread_indep_result, "# result when threads are independent of each other")
print(timeit.default_timer() - t3)

output:

[0, 1, 2, 3, 4] # normal result
3.50475311279e-05
[0, 1, 2, 3, 4] # result with threads involved
0.000977039337158
[0, 1, 2, 3, 4] # result when threads are independent of each other
0.000933170318604

The timings will differ hugely depending on the scale of the data. Note that for a tiny CPU-bound task like this, threads mostly add overhead (as the timings above show); they pay off when the work is I/O-bound.

Hope this helps. Thanks!

How to use threading to improve performance in Python

This answer will address improving performance without using concurrency.


The way you structured your search, you are looking for 13 million unique things in each sentence. You said it takes 3-5 minutes per sentence and that the word lengths in concepts range from one to ten.

I think you can improve the search time by making a set of concepts (either initially when constructed or from your list), then splitting each sentence into strings of one to ten (consecutive) words and testing for membership in the set.

Example of a sentence split into four-word strings:

'data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning statistics and database systems'
# becomes
[('data', 'mining', 'is', 'the'),
('mining', 'is', 'the', 'process'),
('is', 'the', 'process', 'of'),
('the', 'process', 'of', 'discovering'),
('process', 'of', 'discovering', 'patterns'),
('of', 'discovering', 'patterns', 'in'),
('discovering', 'patterns', 'in', 'large'),
('patterns', 'in', 'large', 'data'),
('in', 'large', 'data', 'sets'),
('large', 'data', 'sets', 'involving'),
('data', 'sets', 'involving', 'methods'),
('sets', 'involving', 'methods', 'at'),
('involving', 'methods', 'at', 'the'),
('methods', 'at', 'the', 'intersection'),
('at', 'the', 'intersection', 'of'),
('the', 'intersection', 'of', 'machine'),
('intersection', 'of', 'machine', 'learning'),
('of', 'machine', 'learning', 'statistics'),
('machine', 'learning', 'statistics', 'and'),
('learning', 'statistics', 'and', 'database'),
('statistics', 'and', 'database', 'systems')]

Process:

concepts = set(concepts)
sentence = sentence.split()
found = []
# one word
for meme in sentence:
    if meme in concepts:
        found.append(meme)
# two words
for meme in zip(sentence, sentence[1:]):
    if ' '.join(meme) in concepts:
        found.append(' '.join(meme))
# three words
for meme in zip(sentence, sentence[1:], sentence[2:]):
    if ' '.join(meme) in concepts:
        found.append(' '.join(meme))

Adapting an itertools recipe (pairwise), you can automate the process of making n-word strings from a sentence:

from itertools import tee

def nwise(iterable, n=2):
    "s -> (s0,s1), (s1,s2), (s2, s3), ... for n=2"
    iterables = tee(iterable, n)
    # advance each iterable to the appropriate starting point
    for i, thing in enumerate(iterables[1:], 1):
        for _ in range(i):
            next(thing, None)
    return zip(*iterables)

Testing each sentence looks like this:

sentence = sentence.strip().split()
for n in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:
    for meme in nwise(sentence, n):
        if ' '.join(meme) in concepts:
            pass  # keep meme

I made a set of 13e6 random strings with 20 characters each to approximate concepts.

import random, string
data = set(''.join(random.choice(string.printable) for _ in range(20)) for _ in range(13000000))

Testing a four or forty character string for membership in data consistently takes about 60 nanoseconds. A one-hundred-word sentence has 955 one-to-ten-word strings, so searching that sentence should take ~60 microseconds.
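
If you want to reproduce those numbers, here is a minimal sketch with timeit (the 60 ns figure is machine-dependent, and a smaller stand-in set is used here instead of the 13e6-string one):

import timeit

# 955 = number of 1-to-10-word strings in a 100-word sentence
print(sum(100 - n + 1 for n in range(1, 11)))  # 955

# membership testing in a set is O(1) on average, so string length barely matters
setup = "data = {str(i) for i in range(1000000)}"
per_call = timeit.timeit("'500000' in data", setup=setup, number=1000000) / 1000000
print(per_call)  # roughly tens of nanoseconds per lookup, machine-dependent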

The first sentence from your example 'data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning statistics and database systems' has 195 possible concepts (one to ten word strings). Timing for the following two functions is about the same: about 140 microseconds for f and 150 microseconds for g:

def f(sentence, data=data, nwise=nwise):
    '''iterate over memes in sentence and see if they are in data'''
    sentence = sentence.strip().split()
    found = []
    for n in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:
        for meme in nwise(sentence, n):
            meme = ' '.join(meme)
            if meme in data:
                found.append(meme)
    return found

def g(sentence, data=data, nwise=nwise):
    '''make a set of the memes in sentence then find its intersection with data'''
    sentence = sentence.strip().split()
    test_strings = set(' '.join(meme) for n in range(1, 11) for meme in nwise(sentence, n))
    found = test_strings.intersection(data)
    return found

So these are just approximations since I'm not using your actual data, but it should speed things up quite a bit.

After testing with your example data I found that g won't work if a concept appears twice in a sentence.
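
To see why: g collects the candidate strings in a set, so a concept that occurs twice in a sentence collapses to a single entry before the intersection. A tiny demo:

# 'information' appears twice in the second example sentence;
# building a set from the memes keeps only one copy
memes = ['information', 'extract information', 'information']
print(set(memes) & {'information'})  # {'information'} - the second occurrence is lost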


So here it is all together with the concepts listed in the order they are found in each sentence. The new version of f will take longer, but the added time should be relatively small. If possible, would you post a comment letting me know how much longer it is than the original? (I'm curious.)

from itertools import tee

sentences = ['data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning statistics and database systems',
             'data mining is an interdisciplinary subfield of computer science and statistics with an overall goal to extract information from a data set and transform the information into a comprehensible structure for further use',
             'data mining is the analysis step of the knowledge discovery in databases process or kdd']

concepts = ['data mining', 'database systems', 'databases process',
            'interdisciplinary subfield', 'information', 'knowledge discovery',
            'methods', 'machine learning', 'patterns', 'process']

concepts = set(concepts)

def nwise(iterable, n=2):
    "s -> (s0,s1), (s1,s2), (s2, s3), ... for n=2"
    iterables = tee(iterable, n)
    # advance each iterable to the appropriate starting point
    for i, thing in enumerate(iterables[1:], 1):
        for _ in range(i):
            next(thing, None)
    return zip(*iterables)

def f(sentence, concepts=concepts, nwise=nwise):
    '''iterate over memes in sentence and see if they are in concepts'''
    indices = set()
    #print(sentence)
    words = sentence.strip().split()
    for n in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:
        for meme in nwise(words, n):
            meme = ' '.join(meme)
            if meme in concepts:
                start = sentence.find(meme)
                end = len(meme) + start
                while (start, end) in indices:
                    #print(f'{meme} already found at character:{start} - looking for another one...')
                    start = sentence.find(meme, end)
                    end = len(meme) + start
                indices.add((start, end))
    return [sentence[start:end] for (start, end) in sorted(indices)]


###########
results = []
for sentence in sentences:
    results.append(f(sentence))
    #print(f'{sentence}\n\t{results[-1]})')


In [20]: results
Out[20]:
[['data mining', 'process', 'patterns', 'methods', 'machine learning', 'database systems'],
['data mining', 'interdisciplinary subfield', 'information', 'information'],
['data mining', 'knowledge discovery', 'databases process', 'process']]

Creating Threads in Python

You don't need to use a subclass of Thread to make this work - take a look at the simple example I'm posting below to see how:

from threading import Thread
from time import sleep

def threaded_function(arg):
    for i in range(arg):
        print("running")
        sleep(1)


if __name__ == "__main__":
    thread = Thread(target=threaded_function, args=(10,))
    thread.start()
    thread.join()
    print("thread finished...exiting")

Here I show how to use the threading module to create a thread which invokes a normal function as its target. You can see how I can pass whatever arguments I need to it in the thread constructor.
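
As a variation on the same idea (a sketch, not part of the original answer), args covers positional arguments and kwargs covers keyword arguments:

from threading import Thread
from time import sleep

def worker(count, message, delay=1.0):
    # print `message` `count` times, pausing `delay` seconds between prints
    for _ in range(count):
        print(message)
        sleep(delay)

if __name__ == "__main__":
    thread = Thread(target=worker, args=(3, "working"), kwargs={"delay": 0.5})
    thread.start()
    thread.join()
    print("thread finished...exiting")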

How to use multi-threading in Python with ROS services and serial communication?

You could build a default, simple node with two callbacks: one is your message callback for the subscribed ROS topic, and the other is a callback for a TimerEvent that is invoked repeatedly every 0.3 seconds. You need to start the Timer before rospy.spin() in main.

The code could look like this:

#!/usr/bin/env python

import threading
import rospy
from std_msgs.msg import String
from std_srvs.srv import Empty  # service type used by the service below

mutex = threading.Lock()


def msg_callback(msg):
    # reentrant processing
    mutex.acquire(blocking=True)
    # work serial port here, e.g. send msg to serial port
    mutex.release()
    # reentrant processing

def timer_callback(event):
    # reentrant processing
    mutex.acquire(blocking=True)
    # work serial port here, e.g. check for incoming data
    mutex.release()
    # reentrant processing, e.g. publish the data from serial port

def service_callback(req):
    # read out temperature
    mutex.acquire(blocking=True)
    # send temperature over serial port
    mutex.release()


if __name__ == '__main__':
    # initialize serial port here
    rospy.init_node('name')
    rospy.Subscriber('/test_in', String, msg_callback)
    rospy.Service('on_req', Empty, service_callback)
    rospy.Timer(rospy.Duration(0.3), timer_callback)
    rospy.spin()

In fact, rospy starts two different threads for the callbacks. However, these threads do not run in parallel on different cores due to the GIL; instead, they are scheduled and switch execution at certain system calls, e.g., when performing IO operations. Hence, you still need to synchronize your threads with a Lock, as you did in your question.
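
As a side note, the acquire()/release() pairs above can also be written with a context manager, which releases the lock even if the serial code raises an exception. A minimal sketch:

import threading

mutex = threading.Lock()

def msg_callback(msg):
    with mutex:  # acquired here, released automatically at the end of the block
        pass     # work serial port here, e.g. send msg to serial port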

How to use threading to run the same process multiple times with specified data

You have to replace the last line with threading.Thread(target=request, args=(line,)).start(). In your code, request is called before the Thread object is even created.
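
A minimal before/after sketch of that fix (request and line stand in for the names in the question's code):

import threading

def request(line):
    print("processing", line)  # placeholder for the real request logic

for line in ["a", "b", "c"]:
    # Wrong: request(line) runs immediately in the main thread,
    # and its return value is what gets passed to Thread as target
    # threading.Thread(target=request(line)).start()

    # Right: pass the function itself plus its arguments;
    # the Thread calls request(line) in the new thread
    threading.Thread(target=request, args=(line,)).start()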

How to use Queue with threading properly

You can use a Semaphore for your purposes:

A semaphore manages an internal counter which is decremented by each acquire() call and incremented by each release() call. The counter can never go below zero; when acquire() finds that it is zero, it blocks, waiting until some other thread calls release().

The default value of Semaphore is 1,

class threading.Semaphore(value=1)

so only one thread would be active at once:

import queue
import threading
import time

fifo_queue = queue.Queue()

semaphore = threading.Semaphore()


def hd():
    with semaphore:
        print("hi")
        time.sleep(1)
        print("done")


for i in range(3):
    cc = threading.Thread(target=hd)
    fifo_queue.put(cc)
    cc.start()

Output:

hi
done
hi
done
hi
done

As @user2357112supportsMonica mentioned in the comments, an RLock would be a safer option:

class threading.RLock

This class implements reentrant lock objects. A reentrant lock must be released by the thread that acquired it. Once a thread has acquired a reentrant lock, the same thread may acquire it again without blocking; the thread must release it once for each time it has acquired it.

import queue
import threading
import time

fifo_queue = queue.Queue()

lock = threading.RLock()


def hd():
    with lock:
        print("hi")
        time.sleep(1)
        print("done")


for i in range(3):
    cc = threading.Thread(target=hd)
    fifo_queue.put(cc)
    cc.start()


