What is the best way to run a background process in a Dash app? - python-3.x

I have a Dash application that queries an API based on a user search query, performs some calculations on the response, and then displays the final results to the user. In order to provide a quick response, I am trying to set up a quick-result callback and a full-result long_callback.
The quick callback will grab limited results from the API and display them to the user within 10-15 seconds, while the full search will run in the background, collecting all results (which can take up to 2 minutes), and then update the page with the full results when they are available.
I am curious what the best way to perform this action is, as I have run into forking issues with my current attempt.
My current attempt: using diskcache.Cache() as the backend for a DiskcacheLongCallbackManager, and a database txt file to store the availability of results.
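The manager is set up roughly like this (the cache directory here is just illustrative):
import diskcache
from dash import Dash
from dash.long_callback import DiskcacheLongCallbackManager

cache = diskcache.Cache("./cache")
long_callback_manager = DiskcacheLongCallbackManager(cache)
app = Dash(__name__, long_callback_manager=long_callback_manager)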
I have a database txt file that stores a dictionary, with the search query as each key and, for each entry, the fields quick_results: bool, full_results: bool, file_path: str, and timestamp: dt (stored as str).
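For example, an entry in that file looks something like this (the query, path, and timestamp values here are made up):
{
    "my search query": {
        "quick_results": True,
        "full_results": False,
        "file_path": "cache/my_search_query.feather",
        "timestamp": "2022-06-01 12:00:00",
    }
}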
When a search query is entered and submit is pressed, a callback loads the database file as a variable and then checks the dictionary keys for the presence of this search query.
If it finds the query in the keys of the database, it loads the saved feather file from the provided file_path and returns it to the dash app for generation of the page content.
If it does not find the query in the database keys, it requests limited data from the API, runs the calculations, saves the DataFrame as a feather file on disk, then creates an entry in the database with the search query (as the key), the file path of the saved feather file, and the current timestamp, and sets the quick_results value to True.
It then loads this feather file from the file_path created and returns it to the dash app for generation of the page content.
A long_callback is triggered at the same time as the above callback, with a 20 second sleep to prevent overlap with the quick search. This callback also loads the database file as a variable and checks if the query is present in the database keys.
If found, it then checks if the full results value is True and if the timestamp is more than 0 days old.
If the full results are unavailable or are more than 0 days old, the long_callback requests full results from the API, performs the same calculations, then updates the already existing search query in the database, making the full_results True and the timestamp the time of completion for the full search.
It then loads the feather file from the file_path and returns it to the dash app for generation of the page content.
If the results are available and less than 1 day old, the long_callback simply loads the feather file from the provided file_path and returns it to the dash app for generation of the page content.
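A rough skeleton of the long_callback, to make that flow concrete (the component ids and the load_database/is_stale/run_full_search/render_results helpers are placeholders, not my exact code):
import time
import pandas as pd
from dash import Input, Output, State

@app.long_callback(
    output=Output("results-div", "children"),
    inputs=[Input("submit-button", "n_clicks"), State("search-input", "value")],
)
def full_search(n_clicks, query):
    time.sleep(20)                # let the quick callback finish first
    db = load_database()          # placeholder: read the txt database into a dict
    entry = db.get(query)
    if entry is None or not entry["full_results"] or is_stale(entry["timestamp"]):
        run_full_search(query)    # placeholder: full API pull, calculations, feather save, db update
        entry = load_database()[query]
    return render_results(pd.read_feather(entry["file_path"]))  # placeholder page builder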
The problem I am currently facing is that I am getting a weird forking error in the long_callback under only one of the conditions for a full search. I currently have the long_callback set up to perform a full search only if the full_results flag is False or the results are more than 0 days old. When the full_results flag is False, the callback runs as expected, updates the database, and returns the full results. However, when the results are available but more than 0 days old, the callback hits a forking error and is unable to complete.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec(). Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
I am at a loss as to why the function would run without error on one of the conditions, but then have a forking error on the other condition. The process that runs after both conditions is exactly the same.
By using print statements, I have noticed that this forking error triggers when the function tries to call the requests.get() function on the API.
If this issue is related to how I have set up the background process functionality, I would greatly appreciate some suggestions or assistance on how to do this properly, so that I will not face this forking error.
If there is any information I have left out that will be helpful, please let me know and I will try to provide it.
Thank you for any help you can provide.

Related

First Result Only Using Google-Search-API

I am using abenassi/Google-Search-API https://github.com/abenassi/Google-Search-API to make multiple Google queries in a small python script. I typically only need the first result (link) but the program is built to collect whole pages of results. So far I have been limiting the result as such:
results = google.search(query)
for result in iter(results[0:1]):
    loc = result.link
The problem is that the script is slow, as a result (I think) of having to wade through the whole page before I get my one link. Does anyone see something obvious I'm missing, or, alternatively, a simple way to modify the standard_search module https://github.com/abenassi/Google-Search-API/blob/master/google/modules/standard_search.py to limit results to the first link only? Thanks!
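One way to cut the work down is to ask for a single page of results and take only the first hit. This assumes google.search accepts a page-count argument, as shown in the repo's README (drop it if your version doesn't):
from google import google

query = "example search"           # whatever you are looking up
results = google.search(query, 1)  # request just one page of results
loc = results[0].link if results else None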

Regular Expressions and SQL Server Error Logs - All false results

Ok, I have done my searching and I have tried many things. I think it is time to put my question here:
I have been working on taking in other users' SQL Server error logs, parsing the rows out into columns, then bulk inserting the data 1000 rows at a time. I troubleshoot SQL Server for other people, so sp_readerrorlog will only show me my local instance. Finding root cause involves 4 sets of logs (SQL Server, Application Event, System Event, and get-clusterlog outputs) and matching up timestamps. A fast load into SQL Server, along with the ability to pull the exact timeframe needed, will shorten my time spent staring at log files.
I am currently bottlenecked in testing the rows with a regular expression, which does work if I feed it data myself:
import re

def sqlrowmatch(row):
    pattern = re.compile(r'\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d\.\d\d')
    if pattern.search(row):
        return True
    else:
        return False
Any string that matches the pattern above (e.g. 1111-11-11 11:11:11.11) returns True. The idea is that, in a SQL Server error log, a line that matches marks a separate entry. This allows memory graphs, deadlock graphs, and dumps to all be grouped into one entry instead of being split over several lines.
However, if I point it at one of the SQL error logs, there seem to be extra characters. These are giving re.match and re.search a hard time finding a match. If I feed any line into this function, sqlrowmatch(), it reports back False for all rows.
ÿþ <-- these appear to be the first 2 characters on the first line. re.search just doesn't find a match anywhere in the different elements.
False is what is returned if I put the function inside the with open ... as statement:
with open(file, 'r') as sqllog:
    for line in sqllog:
        print(sqlrowmatch(line))
The first line should always return True if sqlrowmatch() is used:
2018-10-13 22:40:09.41 Server Microsoft SQL Server 2016 (SP2-CU2-GDR) (KB4458621) - 13.0.5201.2 (X64)
So I am lost and my current project is at a halt. Perhaps some seasoned insight from this group can get me going again.
TIA
Interestingly enough, I found my answer here: Opening huge text file, unicode issue
open should be done with encoding='utf-16'
It now matches appropriately
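So the read becomes:
with open(file, 'r', encoding='utf-16') as sqllog:
    for line in sqllog:
        print(sqlrowmatch(line))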

Create automated report from web data

I have a set of multiple APIs I need to source data from, and I need four different data categories. This data is then used for reporting purposes in Excel.
I initially created web queries in Excel, but my laptop just crashes because there are too many queries that have to be updated. Do you guys know a smart workaround?
This is an example of the API I will source data from (40 different ones in total):
https://api.similarweb.com/SimilarWebAddon/id.priceprice.com/all
The data points I need are:
EstimatedMonthlyVisits, TopOrganicKeywords, OrganicSearchShare, TrafficSources
Any ideas how I can create an automated report which queries the above data on request?
Thanks so much.
If Excel is crashing due to the demand, and that doesn't surprise me, you should consider using Python or R for this task.
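If you go the Python route, a minimal sketch for one endpoint could look like this (I am assuming the endpoint returns JSON and that the fields you listed are top-level keys; check the actual response first):
import requests

FIELDS = ["EstimatedMonthlyVisits", "TopOrganicKeywords",
          "OrganicSearchShare", "TrafficSources"]

url = "https://api.similarweb.com/SimilarWebAddon/id.priceprice.com/all"
data = requests.get(url, timeout=30).json()
report_row = {field: data.get(field) for field in FIELDS}  # None for any missing field
print(report_row)
Loop that over your 40 endpoints, collect the rows, and write them out with pandas (DataFrame.to_excel) to get the report without any live Excel web queries. The R walk-through below, which follows the InformIT article linked at the end, shows the same kind of workflow for an XML response.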
install.packages("XML")
install.packages("plyr")
install.packages("ggplot2")
install.packages("gridExtra")
require("XML")
require("plyr")
require("ggplot2")
require("gridExtra")
Next we need to set our working directory and parse the XML file as a matter of practice, so we're sure that R can access the data within the file. This is basically reading the file into R. Then, just to confirm that R knows our file is in XML, we check the class. Indeed, R is aware that it's XML.
setwd("C:/Users/Tobi/Documents/R/InformIT") #you will need to change the filepath on your machine
xmlfile=xmlParse("pubmed_sample.xml")
class(xmlfile) #"XMLInternalDocument" "XMLAbstractDocument"
Now we can begin to explore our XML. Perhaps we want to confirm that our HTTP query on Entrez pulled the correct results, just as when we query PubMed's website. We start by looking at the contents of the first node or root, PubmedArticleSet. We can also find out how many child nodes the root has and their names. This process corresponds to checking how many entries are in the XML file. The root's child nodes are all named PubmedArticle.
xmltop = xmlRoot(xmlfile) #gives content of root
class(xmltop)#"XMLInternalElementNode" "XMLInternalNode" "XMLAbstractNode"
xmlName(xmltop) #give name of node, PubmedArticleSet
xmlSize(xmltop) #how many children in node, 19
xmlName(xmltop[[1]]) #name of root's children
To see the first two entries, we can do the following.
# have a look at the content of the first child entry
xmltop[[1]]
# have a look at the content of the 2nd child entry
xmltop[[2]]
Our exploration continues by looking at subnodes of the root. As with the root node, we can list the name and size of the subnodes as well as their attributes. In this case, the subnodes are MedlineCitation and PubmedData.
#Root Node's children
xmlSize(xmltop[[1]]) #number of nodes in each child
xmlSApply(xmltop[[1]], xmlName) #name(s)
xmlSApply(xmltop[[1]], xmlAttrs) #attribute(s)
xmlSApply(xmltop[[1]], xmlSize) #size
We can also separate each of the 19 entries by these subnodes. Here we do so for the first and second entries:
#take a look at the MedlineCitation subnode of 1st child
xmltop[[1]][[1]]
#take a look at the PubmedData subnode of 1st child
xmltop[[1]][[2]]
#subnodes of 2nd child
xmltop[[2]][[1]]
xmltop[[2]][[2]]
The separation of entries is really just us, indexing into the tree structure of the XML. We can continue to do this until we exhaust a path—or, in XML terminology, reach the end of the branch. We can do this via the numbers of the child nodes or their actual names:
#we can keep going till we reach the end of a branch
xmltop[[1]][[1]][[5]][[2]] #title of first article
xmltop[['PubmedArticle']][['MedlineCitation']][['Article']][['ArticleTitle']] #same command, but more readable
Finally, we can transform the XML into a more familiar structure—a dataframe. Our command completes with errors due to non-uniform formatting of data and nodes. So we must check that all the data from the XML is properly inputted into our dataframe. Indeed, there are duplicate rows, due to the creation of separate rows for tag attributes. For instance, the ELocationID node has two attributes, ValidYN and EIDType. Take the time to note how the duplicates arise from this separation.
#Turning XML into a dataframe
Madhu2012=ldply(xmlToList("pubmed_sample.xml"), data.frame) #completes with errors: "row names were found from a short variable and have been discarded"
View(Madhu2012) #for easy checking that the data is properly formatted
Madhu2012.Clean=Madhu2012[Madhu2012[25]=='Y',] #gets rid of duplicated rows
Here is a link that should help you get started.
http://www.informit.com/articles/article.aspx?p=2215520
If you have never used R before, it will take a little getting used to, but it's worth it. I've been using it for a few years now, and compared to Excel I have seen R perform anywhere from a couple hundred percent to many thousands of percent faster. Good luck.

tailLines and sinceTime in logging API, both not working simultaneously

I am using Container Engine, and my pods are hosted there.
I am trying to fetch logs using the log API:
http://localhost:8000/api/v1/namespaces/app-test/pods/designer-0/log?tailLines=100&sinceTime=2017-09-17T10:47:58Z
If I use either of the query params separately, it works and shows the proper result, but if I use them simultaneously only the top 100 logs are returned and the sinceTime param gets ignored.
My scenario is: I need logs from a specific time, in chunks of 100 lines, then another 100 lines, and so on.
I am not sure whether it is a bug or simply not implemented.
I found this in the API reference manual
https://kubernetes.io/docs/api-reference/v1.6/
tailLines - If set, the number of lines from the end of the logs to
show. If not specified, logs are shown from the creation of the
container or sinceSeconds or sinceTime
So that means if you specify tailLines, it starts from the end. I don't see any option explicitly mentioned other than limitBytes. But you will have to play around with it, as it does not guarantee the number of lines.
tailLines=X tells the server to start that many lines from the end.
sinceTime tells the server to start from the specified time.
The options are mutually exclusive.
Thanks all,
I have since recognized that it is not ignoring sinceTime; tailLines' intended functionality is to return the lines from the end.
So, if I set sinceTime to 10 PM yesterday, it will return the records from that time onward, and if tailLines is also mentioned, it will return the most recent lines from within that window.
So, it was working as expected. I need to play with limitBytes to get the logs in chunks from that time, instead of the full logs.
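A rough sketch of that chunking approach (untested; it assumes the kubectl proxy on localhost:8000 as in the URL above, and uses timestamps=true so each returned line carries a timestamp that can be fed back into sinceTime):
import requests

base = "http://localhost:8000/api/v1/namespaces/app-test/pods/designer-0/log"
since = "2017-09-17T10:47:58Z"

while True:
    params = {"sinceTime": since, "limitBytes": 16384, "timestamps": "true"}
    lines = requests.get(base, params=params).text.splitlines()
    if not lines:
        break
    for line in lines:
        print(line)                       # handle this chunk
    last_ts = lines[-1].split(" ", 1)[0]  # timestamp prefix of the last returned line
    if last_ts == since:                  # no forward progress; stop
        break
    since = last_ts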

2 Sequential Transactions, setting Detail Number (Revit API / Python)

Currently, I have made a tool to rename view numbers (“Detail Number”) on a sheet based on their location on the sheet. Where this is breaking is the transactions. I'm trying to do two transactions sequentially in Revit Python Shell. I originally did this in Dynamo as well, and that failed in a similar way, so I know it's something to do with transactions.
Transaction #1: Add a suffix (“-x”) to each detail number to ensure the new numbers won’t conflict (1 will be 1-x, 4 will be 4-x, etc)
Transaction #2: Change detail numbers with calculated new number based on viewport location (1-x will be 3, 4-x will be 2, etc)
Better visual explanation here: https://www.docdroid.net/EP1K9Di/161115-viewport-diagram-.pdf.html
Py File here: http://pastebin.com/7PyWA0gV
Attached is the Python file, but essentially what I'm trying to do is:
# <---- Make unique numbers
t = Transaction(doc, 'Rename Detail Numbers')
t.Start()
for i, viewport in enumerate(viewports):
    setParam(viewport, "Detail Number", getParam(viewport, "Detail Number") + "x")
t.Commit()

# <---- Do the thang
t2 = Transaction(doc, 'Rename Detail Numbers')
t2.Start()
for i, viewport in enumerate(viewports):
    setParam(viewport, "Detail Number", detailViewNumberData[i])
t2.Commit()
As I explained in my answer to your comment in the Revit API discussion forum, the behaviour you describe may well be caused by a need to regenerate between the transactions. The first modification does something, and the model needs to be regenerated before the modifications take full effect and are reflected in the parameter values that you query in the second transaction. You are accessing stale data. The Building Coder provides all the nitty gritty details and numerous examples on the need to regenerate.
Summary of this entire thread including both problems addressed:
http://thebuildingcoder.typepad.com/blog/2016/12/need-for-regen-and-parameter-display-name-confusion.html
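In code, the suggestion boils down to something like this inside the first transaction (a sketch only; Document.Regenerate must be called while a transaction is open):
t = Transaction(doc, 'Rename Detail Numbers')
t.Start()
for viewport in viewports:
    setParam(viewport, "Detail Number", getParam(viewport, "Detail Number") + "x")
doc.Regenerate()  # flush the pending changes so later parameter queries see up-to-date values
t.Commit()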
So this issue actually had nothing to do with transactions or doc regeneration. I discovered (with some help :) ) that the problem lay in how I was setting/getting the parameter. "Detail Number", like a lot of parameters, has duplicate versions that share the same descriptive param Name in a viewport element.
Apparently the reason for this might be legacy issues, though I'm not sure. Thus, when I was trying to get/set the detail number, it was occasionally grabbing the incorrect, read-only parameter, the one whose BuiltIn enumeration is "VIEWER_DETAIL_NUMBER". The correct one is "VIEWPORT_DETAIL_NUMBER". This was happening because I was trying to get the param just by passing the descriptive param name "Detail Number". Revising how I get/set parameters via the BuiltIn enum resolved this issue. See images below.
Please see pdf for visual explanation: https://www.docdroid.net/WbAHBGj/161206-detail-number.pdf.html
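In code, the fix amounts to something like this (a sketch; new_number stands in for the value calculated from the viewport location):
from Autodesk.Revit.DB import BuiltInParameter

p = viewport.get_Parameter(BuiltInParameter.VIEWPORT_DETAIL_NUMBER)  # not VIEWER_DETAIL_NUMBER
if p is not None and not p.IsReadOnly:
    p.Set(new_number)  # new_number: the detail number calculated from the viewport location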
