I want to use FuzzyWuzzy fuzzy matching in a Python script. I implemented it this way, but I don't get any results.
This is my Python script code:
import pandas as pd
from fuzzywuzzy import process

def azureml_main(dataframe1 = None):
    return dataframe1,

def get_matches(query, choice, limit = 6):
    result = process.extract(query, choice, limit = limit)
    return result,

get_matches("admissibility", dataframe1)
I'm trying to use multiprocessing to do some web scraping, add the results to a separate DataFrame for each process, and then merge all the DataFrames together at the end to avoid locks.
I tried the code below. It runs, but the DataFrame only holds the data while the process is running; once the process finishes, the DataFrame is empty again.
Am I missing something?
import pandas as pd
import multiprocessing

def add_row_to_db(database):
    database.loc[len(database.index)] = ['Sample', 'Testing']
    print('Added')
    print(f'{database}\n')

if __name__ == '__main__':
    columns_name = ['name', 'power']
    db = pd.DataFrame(columns=columns_name)
    process = multiprocessing.Process(target=add_row_to_db, args=(db,))
    process.start()
    process.join()
    print(db)
Output:
Added
     name    power
0  Sample  Testing

Empty DataFrame
Columns: [name, power]
Index: []

Process finished with exit code 0
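Each Process works on its own copy of db: the DataFrame is pickled into the child (or copied when the child is forked), so the row is added to the child's copy and discarded when the child exits, while the parent's db stays empty. One way to keep the "one DataFrame per process, merge at the end" design is to have each worker return its DataFrame and concatenate in the parent. A sketch using multiprocessing.Pool, where build_rows is a hypothetical stand-in for the scraping step:

```python
import multiprocessing

import pandas as pd

COLUMNS = ['name', 'power']


def build_rows(worker_id):
    # Runs in a child process. Build a local DataFrame and *return* it:
    # the return value is pickled and sent back to the parent.
    rows = [[f'Sample-{worker_id}', 'Testing']]  # hypothetical scraped data
    return pd.DataFrame(rows, columns=COLUMNS)


def scrape_all(n_workers=4):
    with multiprocessing.Pool(processes=n_workers) as pool:
        # map() collects the returned DataFrames in input order.
        frames = pool.map(build_rows, range(n_workers))
    # Merge once, in the parent, so no lock is needed.
    return pd.concat(frames, ignore_index=True)


if __name__ == '__main__':
    print(scrape_all())
```

Only the return values cross the process boundary, so nothing is shared and nothing needs locking.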
I have a Python list of about 200 ids that I pass to my function. What is the best way to process these ids in chunks, for example 10 or 20 ids per call, then the next 20 ids in the next call, and so on? I have used multithreading to make it faster, but it still takes a lot of time. Here is the code I have so far:
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import datetime as dt

df = pd.ExcelFile('ids.xlsx').parse('Sheet1')
x=[]
x.append(df['external_ids'].to_list())

def download():
    #client is my python sdk
    dtest_df = client.datapoints.retrieve_dataframe(external_id = x[0], start=0, end="now",granularity='1m')
    dtest_df = dtest_df.rename(columns = {'index':'timestamp'})
    client.datapoints.insert_dataframe(dtest_df,external_id_headers = True,dropna = True)
    print(dtest_df)

with ThreadPoolExecutor(max_workers=20) as executor:
    future = executor.submit(download)
    print(future.result())
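In the code above only one task is submitted, and it passes all ~200 ids at once (x[0] is the whole list, because x.append wraps the list inside another list), so the 20 worker threads are never used. A minimal sketch of splitting the ids into chunks and submitting one task per chunk; download here is a hypothetical stand-in whose real body would call client.datapoints.retrieve_dataframe(external_id=id_chunk, ...) as in the question, and the ids could come straight from df['external_ids'].to_list():

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    # Yield successive slices of `size` items; the last slice may be shorter.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def download(id_chunk):
    # Hypothetical stand-in for the SDK calls; returns the chunk size so the sketch runs.
    return len(id_chunk)


def download_all(ids, chunk_size=20, max_workers=10):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # One task per chunk; map() yields the results in chunk order.
        return list(executor.map(download, chunks(ids, chunk_size)))


if __name__ == '__main__':
    ids = [f'id-{n}' for n in range(200)]
    print(download_all(ids))
```

Whether the parallel calls actually finish faster depends on the service's rate limits; chunk_size and max_workers are worth tuning together.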
The Python yfinance module downloads the quotes of many financial securities into a pandas DataFrame and, meanwhile, displays a progress bar in the console, like this:
import yfinance as yf
Tickerlist = ["AAPL","GOOG","MSFT"]
quote = yf.download(tickers=Tickerlist,period='max',interval='1d',group_by='ticker')
I would like to capture the console progress bar in real time. I think the code should be something like this:
import sys
import subprocesss

process = subprocess.Popen(["yf.download","tickers=Tickerlist","period='max'","interval='1d'","group_by='ticker'"],stdout=quote)
while True:
    out = process.stdout.read(1)
    sys.stdout.write(out)
    sys.stdout.flush()
I've made a big mess with subprocess and need your help! Thanks.
I have already read all the related questions on this topic but haven't been able to solve my problem.
You need two Python files to do what you want: yf_download.py, which does the download, and run.py, which starts it as a subprocess and streams its output.
The code looks like this; run it through run.py:
python run.py
yf_download.py
import sys
import yfinance as yf

Tickerlist = ["AAPL","GOOG","MSFT"]

def run(period):
    yf.download(tickers=Tickerlist, period=period,interval='1d',group_by='ticker')

if __name__ == '__main__':
    period = sys.argv[1]
    run(period)
run.py
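import sys
import subprocess

# sys.executable runs the child with the same Python interpreter as run.py
process = subprocess.Popen([sys.executable, "yf_download.py", "max"], stdout=subprocess.PIPE)
while True:
    out = process.stdout.read(1)
    # read() returns bytes; b'' means the child closed its stdout (EOF)
    if not out:
        break
    sys.stdout.buffer.write(out)
    sys.stdout.flush()
process.wait()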
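The loop in run.py can be wrapped in a reusable function and tried without yfinance. This sketch stops on EOF (b'') rather than on poll(), so the last bytes the child wrote are not dropped. When stdout is a pipe, a Python child buffers its output, so updates may arrive in bursts unless the child flushes or is started with python -u. The child program below is a hypothetical progress bar standing in for yf_download.py:

```python
import subprocess
import sys


def stream_output(args, echo=True):
    """Run args as a child process, echo its stdout byte by byte, return all bytes read."""
    process = subprocess.Popen(args, stdout=subprocess.PIPE)
    captured = bytearray()
    while True:
        byte = process.stdout.read(1)
        if not byte:  # b'' means EOF: the child has closed its stdout
            break
        captured += byte
        if echo:
            sys.stdout.buffer.write(byte)
            sys.stdout.flush()
    process.stdout.close()
    process.wait()
    return bytes(captured)


if __name__ == "__main__":
    # Hypothetical child that redraws a '\r'-based progress bar (stand-in for yf_download.py);
    # -u makes the child's stdout unbuffered so each update arrives immediately.
    child_code = (
        "import sys, time\n"
        "for i in range(1, 4):\n"
        "    sys.stdout.write('\\r[' + '*' * i + ' ' * (3 - i) + ']')\n"
        "    sys.stdout.flush()\n"
        "    time.sleep(0.2)\n"
        "print()\n"
    )
    stream_output([sys.executable, "-u", "-c", child_code])
```

Reading one byte at a time is what lets carriage-return redraws show up as they happen; reading whole lines would wait for a newline that a progress bar only prints at the end.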
I am trying to update a variable of a class by calling a method of that class from a function that runs in multiple processes.
To get the desired result, process p1 needs to update the variable "transactions", which should then be modified by process p2.
I tried the code below. I know I should use multiprocessing.Value or a Manager to achieve this, but I'm not sure how to do it because the variable to be updated lives in another class.
Below is the code:
from multiprocessing import Process
from helper import Helper

camsource = ['a','b']
Pros = []

def sub(i):
    HC.trail_func(i)

def main():
    for i in camsource:
        print ("Camera Thread {} Started!".format(i))
        p = Process(target=sub, args=(i))
        Pros.append(p)
        p.start()

    # block until all the threads finish (i.e. block until all function_x calls finish)
    for t in Pros:
        t.join()

if __name__ == "__main__":
    HC = Helper()
    main()
Here is the helper code:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

class Helper():
    def __init__(self):
        self.transactions = []

    def trail_func(self,preview):
        if preview == 'a':
            self.transactions.append({"Apple":1})
        else:
            if self.transactions[0]['Apple'] == 1:
                self.transactions[0]['Apple'] = self.transactions[0]['Apple'] + 1
        print (self.transactions)
Desired Output:
p1:
transactions = {"Apple":1}
p2:
transactions = {"Apple":2}
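Each process gets its own copy of HC, so changes made in p1 never reach p2 (and with the spawn start method, the global HC created in the __main__ block does not exist in the children at all). A minimal sketch using a multiprocessing.Manager list instead, assuming it is acceptable to run the processes one after another so that 'a' always finishes before 'b'. Helper is a modified, hypothetical version that receives the shared list in its constructor. A Manager list does not notice in-place changes to a dict stored inside it, so the dict is copied, modified, and assigned back:

```python
from multiprocessing import Manager, Process


class Helper:
    # Modified version of the question's Helper: the list is injected so it can be a shared proxy.
    def __init__(self, transactions):
        self.transactions = transactions

    def trail_func(self, preview):
        if preview == 'a':
            self.transactions.append({"Apple": 1})
        elif self.transactions and self.transactions[0]['Apple'] == 1:
            item = self.transactions[0]   # returns a local copy of the stored dict
            item['Apple'] += 1
            self.transactions[0] = item   # write it back so the manager sees the change
        print(preview, list(self.transactions))


def sub(helper, preview):
    helper.trail_func(preview)


def run_demo(camsource=('a', 'b')):
    with Manager() as manager:
        helper = Helper(manager.list())
        for cam in camsource:
            p = Process(target=sub, args=(helper, cam))
            p.start()
            p.join()  # sequential on purpose: 'b' depends on what 'a' wrote
        return list(helper.transactions)


if __name__ == "__main__":
    print(run_demo())
```

If the two cameras must really run at the same time, the read-modify-write in trail_func would also need a lock (for example manager.Lock()) and some way for 'b' to wait until 'a' has appended.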
I've recently released a module that can help with your code: all of its data frames (data models that can hold any type of data) have locks on them to solve concurrency issues. Take a look at the README file and the examples.
I've also made an example here, if you'd like to check it.
I'm using multithreading in Python 3 with Flask, as shown below.
I would like to know whether there is any issue in the code below, and whether this is an efficient way of using threads.
import _thread

COUNT = 0

class Myfunction(Resource):
    #staticmethod
    def post():
        global GLOBAL_COUNT
        logger = logging.getLogger(__name__)
        request_json = request.get_json()
        logger.info(request_json)
        _thread.start_new_thread(Myfunction._handle_req, (COUNT, request_json))
        COUNT += 1
        return Response("Request Accepted", status=202, mimetype='application/json')

    #staticmethod
    def _handle_req(thread_id, request_json):
        with lock:
            empID = request_json.get("empId", "")
            myfunction2(thread_id,empID)

api.add_resource(Myfunction, '/Myfunction')
I think the newer threading module is better suited for Python 3; it is more powerful.
import threading

threading.Thread(target=some_callable_function).start()

Or, if you want to pass arguments:

threading.Thread(target=some_callable_function,
                 args=(tuple, of, args),
                 kwargs={'dict': 'of', 'keyword': 'args'},
                 ).start()
Use _thread only if you specifically need it for backwards compatibility. This isn't directly related to how efficient your code is, but it's good to know anyway.
See "What happened to thread.start_new_thread in python 3" and https://www.tutorialspoint.com/python3/python_multithreading.htm
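Applied to the question's code, the same pattern with threading might look like the sketch below. It also avoids two problems in the original post(): it declares global GLOBAL_COUNT but increments COUNT, so COUNT is treated as a local variable and the first read raises UnboundLocalError; and the counter is updated without a lock even though Flask can serve requests on several threads. accept, handle_req, and processed are hypothetical names; inside the Flask resource, post() would call accept(request.get_json()) and then return the 202 response.

```python
import itertools
import threading

_lock = threading.Lock()
_ids = itertools.count()  # hypothetical replacement for the global COUNT
processed = []            # hypothetical shared result list, guarded by _lock


def handle_req(thread_id, request_json):
    # Stand-in for myfunction2(thread_id, empID) from the question.
    emp_id = request_json.get("empId", "")
    with _lock:  # protect the shared list
        processed.append((thread_id, emp_id))


def accept(request_json):
    with _lock:  # hand out ids one at a time
        thread_id = next(_ids)
    worker = threading.Thread(target=handle_req, args=(thread_id, request_json), daemon=True)
    worker.start()
    return worker
```

daemon=True means unfinished work is dropped when the interpreter exits. Under heavy traffic, a concurrent.futures.ThreadPoolExecutor would cap the number of threads instead of starting one per request.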