I am trying to read data from a website every second using the following code and plot the result in real time. That is, every second I want to append the new last_price to the existing plot and update it.
import time
import requests
for i in range(2600):
    time.sleep(1)
    with requests.Session() as s:
        data = {'ContractCode' : 'SAFTR98'}
        r = s.post('http://cdn.ime.co.ir/Services/Fut_Live_Loc_Service.asmx/GetContractInfo', json = data ).json()
        for key, value in r.items():
            print(r[key]['ContractCode'])
            last_prices = (r[key]['LastTradedPrice'])
I used animation.FuncAnimation, but it didn't work: it only plots all the results after the 2600 iterations are done or after I stop the program. So I thought maybe I could do this with multi-threading, but I don't know exactly how to use it. I searched for examples but couldn't understand them or how to map them onto my problem.
This is not a duplicate question. In the linked question I tried to solve the problem using animations; here I am trying to find a new way using multi-threading.
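For context, the structure I have in mind is roughly the sketch below: one background thread keeps fetching LastTradedPrice into a shared list while the main loop redraws the plot. The URL and payload are from my code above; the lock, the redraw loop and plt.pause are just my guesses and I have not verified that this is the right way to do it.

import threading
import time
import requests
import matplotlib.pyplot as plt

prices = []              # shared between the fetcher thread and the main thread
lock = threading.Lock()

def fetch_prices():
    # background thread: poll the service once a second and store LastTradedPrice
    with requests.Session() as s:
        data = {'ContractCode': 'SAFTR98'}
        for _ in range(2600):
            time.sleep(1)
            r = s.post('http://cdn.ime.co.ir/Services/Fut_Live_Loc_Service.asmx/GetContractInfo',
                       json=data).json()
            for key in r:
                with lock:
                    prices.append(r[key]['LastTradedPrice'])

threading.Thread(target=fetch_prices, daemon=True).start()

plt.ion()                # interactive mode so the window keeps updating
fig, ax = plt.subplots()
line, = ax.plot([], [])
while plt.fignum_exists(fig.number):
    with lock:
        ydata = list(prices)
    line.set_data(range(len(ydata)), ydata)
    ax.relim()
    ax.autoscale_view()
    plt.pause(1)         # redraws the figure and lets the GUI event loop run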
Related
I am working with Python, especially the pandas module. I am slicing in this way: dfps7 = dfps5.iloc[:,[1,4,5,7,8,9,10]] and it works. But I want to know the smart way of representing the continuous part, 4,5,7,8,9,10, as something like 4:10. When I tried dfps7 = dfps5.iloc[:,[1,4:10]] it did not work. Looking forward to the smartest solution.
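For example, I wonder whether NumPy's np.r_ index helper can be combined with iloc to select the same columns, something like the sketch below; I have not confirmed this is the idiomatic way, and dfps5 here is just a dummy frame for illustration.

import numpy as np
import pandas as pd

# dummy frame standing in for dfps5, just to show the indexing
dfps5 = pd.DataFrame(np.arange(33).reshape(3, 11))

# np.r_ expands the range-style pieces into [1, 4, 5, 7, 8, 9, 10],
# which should match iloc[:, [1, 4, 5, 7, 8, 9, 10]]
dfps7 = dfps5.iloc[:, np.r_[1, 4:6, 7:11]]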
I am attempting to migrate some old Python code that uses the scikit-learn library.
While doing so I encountered the GaussianProcess class, which has now been reimplemented as GaussianProcessRegressor.
I was able to get a running script by replacing
self.f = GaussianProcess(corr='linear',theta0=1e-2,thetaL=1e-4,thetaU=1e-1)
with
self.f = GaussianProcessRegressor()
except now I have completely different results when calling predict()...
Any idea how to translate the autocorrelation method (corr) and different theta values with the new API?
I found this topic talking about pretty much the same problem, but apparently the author was fine with the old parameters not being taken into account, and this topic which states the problem precisely as well but does not provide a clear answer.
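For what it's worth, my current guess is along the lines of the sketch below, treating theta0/thetaL/thetaU as a length scale and its optimisation bounds for an RBF kernel; this is only an assumption, it does not reproduce my old results, and I don't know of an exact equivalent of corr='linear'.

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

# untested guess: map the old theta0 / thetaL / thetaU onto an RBF length scale
# and its bounds; this is an assumption, not a documented equivalence
kernel = ConstantKernel(1.0) * RBF(length_scale=1e-2,
                                   length_scale_bounds=(1e-4, 1e-1))
gpr = GaussianProcessRegressor(kernel=kernel)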
I am trying to visualize a graph using ipycytoscape, and this is how far I got:
import networkx as nx
import ipycytoscape
G2 = nx.Graph()
G2.add_nodes_from([*'ABCDE'])
G2.add_edges_from([('A','B'),('B','C'),('D','E'),('F','A'),('C','A')])
mynodes= G2.nodes
myedges= G2.edges
nodes = [{"data":{"id":n}} for n in mynodes]
edges = [{"data":{"source":n0,"target":n1}} for n0,n1 in myedges]
JSON_graph_data = {"nodes":nodes,"edges":edges}
cytoscapeobj = ipycytoscape.CytoscapeWidget()
cytoscapeobj.graph.add_graph_from_json(JSON_graph_data)
cytoscapeobj
It works so far; G2 is a very small graph for training purposes.
But what could be the reason that, when I pass a bigger one (40 nodes, several hundred edges), I don't get an error but I also don't see anything coming out?
This seems like a bug. Sorry I missed your question, I'll be more active on Stack Overflow from now on.
If you're still interested please open an issue on: https://github.com/QuantStack/ipycytoscape/
Or join the chat: https://gitter.im/QuantStack/Lobby
I'll be happy to help!
Maybe the error won't even be reproducible anymore since we're moving kind of fast with the API.
I am using the spatstat package in R to read my road network shapefile, which also has some additional attributes.
When I read my shapefiles and convert them with as.psp (before I make them an object of linnet), I get "n columns of data frame discarded". I do not understand why. The columns being discarded are my covariates for the linear network, so I am not able to bring them into my analysis.
Could someone give me an idea why this happens and how to correct it?
Why it happens:
I would guess that we (spatstat authors) need to spend a bit of time discussing with the maptools guys how to handle the additional info in the SpatialLinesDataFrame object, and we haven't done that yet.
How to correct it:
You have to write some code on your own at the moment. You can extract the data from the SpatialLinesDataFrame object by accessing its @data slot. Please provide specific data and say how you need to use the additional data (what format you need it in) if you need more help. You can find a few helpful commands here: https://cran.r-project.org/web/packages/spatstat/vignettes/shapefiles.pdf
I am currently writing this code to grab restaurants' official website links from their Yelp pages. The code mostly works, but it returns the first link twice instead of going through the list and returning each item once. I have tried to work it out but I'm stuck on what is causing this. Can you spot what I am doing wrong?
I also have another question about grabbing links from Yelp. I know Yelp may not like it, but I really cannot copy and paste links from 20,000 pages by hand, so I have to use this.
Would they block my IP? Will inserting 2-second delays between requests keep them from blocking me? Are there any other ways besides inserting delays?
import urllib
import urllib.request
from bs4 import BeautifulSoup
url=[
    "https://www.yelp.com/biz/buffalo-wild-wings-ann-arbor-3",
    "https://www.yelp.com/biz/good-burger-east-dearborn-dearborn?osq=mac+donalds"
]

def make_soup(url):
    for i in url:
        thepage=urllib.request.urlopen(i)
        soupdata=BeautifulSoup(thepage, "html.parser")
        return soupdata

compoundfinal=''
soup=make_soup(url)
for i in url:
    for thing1 in soup.findAll('div',{'class':'mapbox-text'}):
        for thing2 in thing1.findAll('a',{'rel':'nofollow'}):
            final='"http://www.'+thing2.text+'",\n'
            compoundfinal=compoundfinal+final
print(compoundfinal)
An answer for your secondary question:
Yes, putting a delay between scrapes would be a very good idea. I would say a static 2-second delay may not be enough; consider a random delay of between 2 and 5 seconds, perhaps. That will make the scrapes seem less deterministic, though you might still get caught based on scrapes per hour. It would also be worth writing your script so it can be restarted in case there are problems mid-scrape: you don't want to have to start again from the beginning.
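As a rough sketch of that pattern (the commented-out fetch_links call and the done_urls.txt file are placeholders, not working scraping code):

import random
import time

def scrape_all(urls, done_file="done_urls.txt"):
    # load the URLs already scraped on previous runs, if any
    try:
        with open(done_file) as f:
            done = set(line.strip() for line in f)
    except FileNotFoundError:
        done = set()

    with open(done_file, "a") as f:
        for url in urls:
            if url in done:
                continue                      # skip work finished before a restart
            # fetch_links(url)                # placeholder for the real scraping call
            f.write(url + "\n")
            f.flush()                         # record progress immediately
            time.sleep(random.uniform(2, 5))  # random 2-5 second pause between requests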
Please also download Yelp's robots exclusion file (robots.txt) and check your scraping list against their no-scrape list. I notice they request a 10-second delay for Bing, so consider increasing the delay I suggested above.
You might also want to consider the legal aspects of this. Most sites want to be scraped, so they can appear in search engines. However some data aggregators may not have the same enthusiasm: they probably want to be found by search engines, but they don't want to be replaced by competitors. Remember that it costs a lot of money to collect the data in the first place, and they may object to third parties getting a free ride. Thus, if you plan to do this regularly in order to update your own website, I think you might run into either technical or legal obstacles.
You may be tempted to use proxies to hide your scraping traffic, but this carries with it an implicit message that you believe you are doing something wrong. Your scrape target will probably make more efforts to block you in this case, and may be more likely to take legal action against you if they find which website you are republishing the data on.
You are trying to split your processing between two different loops, but you are not saving the data from the first loop and then re-iterating over it in the second. You also appear to have the wrong indentation on the return statement in the function definition, so the function returns after the first iteration regardless of the number of items in the list. The code below seems to work by placing all the processing into one function. It was the easiest way to get working code from your example, but it isn't the best way to tackle the problem: you would be better off defining your function to process one page, then looping over url to call the function.
import urllib
import urllib.request
from bs4 import BeautifulSoup
url=[
    "https://www.yelp.com/biz/buffalo-wild-wings-ann-arbor-3",
    "https://www.yelp.com/biz/good-burger-east-dearborn-dearborn?osq=mac+donalds"
]

def make_soup(url):
    # fetch every page and collect the website links in one pass
    compoundfinal = ""
    for i in url:
        thepage=urllib.request.urlopen(i)
        soupdata=BeautifulSoup(thepage, "html.parser")
        for thing1 in soupdata.findAll('div',{'class':'mapbox-text'}):
            for thing2 in thing1.findAll('a',{'rel':'nofollow'}):
                final='"http://www.'+thing2.text+'",\n'
                compoundfinal=compoundfinal+final
    return compoundfinal          # return only after all pages have been processed

final = make_soup(url)
print( final )
output
"http://www.buffalowildwings.com",
"http://www.goodburgerrestaurant.com"