I know that my authentication is working and I can make tweets just fine. I have no reason to believe that the tweepy library is causing this problem, though I suppose I have no reason to rule it out. My code looks something like this, attempting to tweet an emoji flag; no other emoji works either.
import tweepy
import keys  # my own module holding the API credentials

auth = tweepy.OAuthHandler(keys.twitter['consumer_key'], keys.twitter['consumer_secret'])
auth.set_access_token(keys.twitter['access_token'], keys.twitter['access_token_secret'])
api = tweepy.API(auth)
print('Connected to the Twitter API...')
api.update_status(r'testing \U0001F1EB\U0001F1F7')
I get an error code 400 with seemingly no additional information about the reason. I am trying to determine whether the string's encoding is somehow wrong, or whether it is simply some problem with sending it to Twitter's API.
You have Unicode in your string. Because you are using Python 2, use:
api.update_status(u"testing \U0001F1EB\U0001F1F7")
Notice that I changed r"x" to u"x".
If you are using Python 3, you can just use:
api.update_status("testing \U0001F1EB\U0001F1F7")
Update: This is the tweet that I sent using this code:
https://twitter.com/twistedhardware/status/686527722634383360
I am trying to build a program with a function that takes the URL of a post as input and outputs the links of the images and videos the post contains. It works really well for images. However, when it comes to getting the links of videos, it returns me a wrong URL. I have no idea how to handle this situation.
https://scontent-lax3-2.cdninstagram.com/v/t50.2886-16/86731551_2762014420555254_3542675879337307555_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjcyMC5jYXJvdXNlbF9pdGVtIiwicWVfZ3JvdXBzIjoiW1wiaWdfd2ViX2RlbGl2ZXJ5X3Z0c19vdGZcIl0ifQ&_nc_ht=scontent-lax3-2.cdninstagram.com&_nc_cat=106&_nc_ohc=WDuXskvIuLEAX9rj7MU&vs=17877888256532912_3147883953&_nc_vs=HBksFQAYJEdCOXJLd1gyMVdhWUNkQUpBS090UGo3eEhDb3hia1lMQUFBRhUAAsgBABUAGCRHTFBXTUFVTXNPaG5XcW9EQU5KUEE5bEZVdVZxYmtZTEFBQUYVAgLIAQAoABgAGwGIB3VzZV9vaWwBMBUAABgAFuD4nJGH9sE%2FFQIoAkMzLBdAEszMzMzMzRgSZGFzaF9iYXNlbGluZV8xX3YxEQB17gcA&_nc_rid=97e769e058&oe=5EDF10A5&oh=3713c35f89fca1aa9554a281aa3421ed
https://scontent-gmp1-1.cdninstagram.com/v/t50.2886-16/0_0_0_\x00.mp4?_nc_ht=scontent-gmp1-1.cdninstagram.com&_nc_cat=100&_nc_ohc=Wnu_-GvKHJoAX9F_ui1&oe=5EDE8214&oh=7920ac3339d5bf313e898c3cbec85fa2
Here are two URLs. The first one is copied from the source of the web page, while the second one is copied from the data scraped by pyquery. They come from the same Instagram post, same path, but they are totally different. The first one works well, but the second one doesn't. How can I solve this? How can I get the right video URL?
I have been searching the net for a long time, but with no luck. Please help, or try to give some ideas on how to achieve this.
Here is my code related to the question:
import json
from pyquery import PyQuery as pq

def getUrls(url):
    URL = str(url)
    html = get_html(URL)  # my helper that fetches the page source
    doc = pq(html)
    urls = []
    items = doc('script[type="text/javascript"]').items()
    for item in items:
        if item.text().strip().startswith('window._sharedData'):
            # the encoding argument to json.loads is unnecessary and was removed in Python 3.9
            js_data = json.loads(item.text()[21:-1])
            shortcode_media = js_data["entry_data"]["PostPage"][0]["graphql"]["shortcode_media"]
            edges = shortcode_media['edge_sidecar_to_children']['edges']
            for edge in edges:
                is_video = edge['node']['is_video']
                if is_video:
                    video_url = edge['node']['video_url']
                    # str.replace() returns a new string; assign the result
                    video_url = video_url.replace(r'\u0026', "&")
                    urls.append(video_url)
                else:
                    display_url = edge['node']['display_resources'][-1]['src']
                    display_url = display_url.replace(r'\u0026', "&")
                    urls.append(display_url)
    return urls
Thanks in advance.
There's nothing wrong with your code. This is a known intermittent issue with Instagram, and other people have encountered it too: https://github.com/arc298/instagram-scraper/issues/545
There doesn't appear to be a known fix yet.
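Since the corruption is intermittent, one possible workaround is to detect the bad URLs and simply re-fetch the post. This is only a sketch: getUrls is the function from the question, and the NUL-byte check is based on the corrupted URL shown above.

import time

def get_urls_with_retry(url, attempts=5, delay=2):
    # Re-run getUrls until no returned URL contains a NUL byte,
    # which is what marks the corrupted video URLs in the question.
    for _ in range(attempts):
        urls = getUrls(url)
        if not any('\x00' in u for u in urls):
            return urls
        time.sleep(delay)  # back off before retrying
    return urls  # give up and return the last attempt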
Also, while unrelated to your question, it's worth mentioning that you don't need to inspect the display_resources object to get the URL of the image:
display_url = edge['node']['display_resources'][-1]['src']
There is already a display_url property available (I'm guessing you saw this, based on the variable name?). So you can simply do:
display_url = edge['node']['display_url']
I've seen this sometimes when using this Python module instead of HTML-scraping. At least with that module, edge["node"]["videos"]["standard_resolution"]["url"] usually (but not always) gives a non-corrupted value.
I recently learned that Companies House has an API that allows access to companies' filing history, and I want to get data from the API and load it into a pandas dataframe.
I have set up an API account, but I am having difficulties with the python wrapper companies-house 0.1.2: https://pypi.org/project/companies-house/
from companies_house.api import CompaniesHouseAPI
ch = CompaniesHouseAPI('my_api_key')
This works, but when I try to get the data with get_company or get_company_filing_history, I seem to pass incorrect parameters. I tried passing CompaniesHouseAPI.get_company('02627406') but get KeyError: 'company_number'. I am quite puzzled, as there is no example provided in the documentation. Please help me figure out what I should pass as the parameter(s) to both functions.
# what errors
CompaniesHouseAPI.get_company('02627406')
I am not a python expert but want to learn by doing interesting projects. Please help. If you know how to get filing history from the Companies House API using another python wrapper, your solution is also welcome.
I recently wrote a blog post describing how to make your own wrapper and then use that to create an application that loads the data into a pandas dataframe as you described. You can find it here.
By creating your own wrapper class, you avoid the limitations of whichever library you have chosen. You may also learn a lot about calling an API from python and working with the response.
Here is a code example that does not need a Companies House-specific library.
import requests
import json

url = "https://api.companieshouse.gov.uk/search/companies?q={}"
query = "tesco"
api_key = "vLmk-4YxYS-QH8nMi8767zJSlcPlo3MKn41-d"  # Fake key - insert your key here

# Companies House uses HTTP basic auth: the key is the username, the password is blank
response = requests.get(url.format(query), auth=(api_key, ''))
json_search_result = response.text
search_result = json.JSONDecoder().decode(json_search_result)

for company in search_result['items']:
    print(company['title'])
Running this should give you the top 20 matches for the keyword "tesco" from the Companies House search function. Check out the blog post to see how you could adapt this to perform any function from the API.
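And to come back to the filing history you originally asked about: the same request pattern works against the filing-history endpoint, and the resulting list of filings drops straight into pandas. This is only a sketch, not tested against your account; the company number is the one from your question, and the column names follow the documented filing-history response.

import requests
import pandas as pd

api_key = "your_api_key_here"  # insert your key
company_number = "02627406"
url = "https://api.companieshouse.gov.uk/company/{}/filing-history".format(company_number)

response = requests.get(url, auth=(api_key, ''))
filing_history = response.json()

# Each element of 'items' is one filing; flatten the list into a dataframe
df = pd.DataFrame(filing_history['items'])
print(df[['date', 'category', 'description']].head())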
I ran into a very weird problem while trying to parse data from a JavaScript-rendered website. Maybe it is because I am not an expert in web development.
Here is what happened:
I am trying to get all the comments data from The Globe and Mail. If you check its source code, there is no way to use Python to parse the comments data out of it; everything is written in JavaScript.
However, there is a magic tool called the "Gigya" API that can return all the comments from a JS-rendered website: Gigya getComments method
When I used these lines of code in a Python Scrapy spider, it returned all the comments.
data = {"categoryID": self.categoryID,
"streamID": streamId,
"APIKey": self.apikey,
"callback": "foo",
"threadLimit": 1000 # assume all the articles have no more then 1000 comments
}
r = urlopen("http://comments.us1.gigya.com/comments.getComments", data=urlencode(data).encode("utf-8"))
comments_lst = loads(r.read().decode("utf-8"))["comments"]
However, The Globe and Mail is updating their website, and all the comments posted before Nov. 28 have been hidden from the web for now. That's why on the sample url I am showing here you can only see 2 comments: they were posted after Nov. 28, and these 2 new comments have the new feature - the "React" button.
The weird thing is, right now when I run my code, I can get all those hidden hundreds of comments published before Nov. 28, but I cannot get the new comments that are visible on the website now.
I have tried all the Gigya comment-related methods, and none of them worked; the other Gigya methods do not look like anything helpful...
Is there any way to solve this problem?
Or at least, do you know why I can get all the hidden comments but cannot get the visible new comments that have the new features?
Finally, I solved the problem with the Python selenium library; it's free and it's super cool.
So it seems that although we cannot see the content in the source code of a JS-rendered website, there is in fact an HTML page where we can parse the content.
First of all, I installed Firebug on Firefox. With this add-on, I can see the HTML page behind the url, and it makes it very easy to locate the content: just search for key words in Firebug.
Then I wrote the code like this:
from selenium import webdriver
import time

def main():
    comment_urls = [
        "http://www.theglobeandmail.com/opinion/a-fascists-win-americas-moral-loss/article32753320/comments/"
    ]
    for comment_url in comment_urls:
        driver = webdriver.Firefox()
        driver.get(comment_url)
        time.sleep(5)
        htmlSource = driver.page_source
        clk = driver.find_element_by_css_selector('div.c3qHyJD')
        clk.click()
        reaction_counts = driver.find_elements_by_class_name('c2oytXt')
        for rc in reaction_counts:
            print(rc.text)

if __name__ == "__main__":
    main()
The data I am parsing here is content that cannot be found in the HTML page until you click the reaction image on the website. What makes selenium super cool is the click() method: after you find the element you want to click, just call this method, and the generated elements will appear in the HTML and become parsable. Super cool!
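One refinement on the snippet above: the fixed time.sleep(5) can be flaky if the page renders slowly. Selenium's explicit waits block only until the element is actually clickable. A sketch using the same selector as above:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get(comment_url)  # one of the URLs from the list above

# Wait up to 15 seconds for the element instead of sleeping blindly
clk = WebDriverWait(driver, 15).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, 'div.c3qHyJD'))
)
clk.click()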
This script is giving me a 500 error, any ideas?
I am taking the script from a page of python samples and also using the path given to me by my hosting company (and I know it works because I have another script that does work).
The file has 755 permissions, as does its directory:
#!/home3/master/bin/python
import sys
sys.path.insert(1,'/home3/master/lib/python2.6/site-packages')
from twython import Twython
twitter = Twython()
trends = twitter.getCurrentTrends()
print trends
There are two problems with this code. The first is that you have not included any OAuth data, so the Twitter API will reject whatever you send. The second is that there is no getCurrentTrends() attribute in Twython. Did you mean get_available_trends or get_closest_trends?
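A minimal sketch of what the fixed script might look like; the four credential strings are placeholders you copy from your Twitter app settings, and get_available_trends is Twython's wrapper for the trends/available endpoint:

from twython import Twython

# Placeholder credentials - copy the real values from your Twitter app settings
APP_KEY = "your-consumer-key"
APP_SECRET = "your-consumer-secret"
OAUTH_TOKEN = "your-access-token"
OAUTH_TOKEN_SECRET = "your-access-token-secret"

twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

# Locations for which Twitter has trending-topic data
trends = twitter.get_available_trends()
print(trends)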
I have a python script that I cobbled together that checks my gmail via the rss feed and outputs the text to the screen. It worked on python 2.6.6, but I've been unsuccessful in getting it working under python 3.2.3.
I've used 2to3 to convert it. The code is here: https://www.dropbox.com/s/b1277mi7vc7hv3f/gmailcheckparseconky.py3
The problem occurs in the ObtainEmailFeed function.
I get the dreaded "TypeError("expected bytes, not %s" % s.__class__.__name__)" error when I use
b64auth = base64.encodestring("%s:%s" % (user, password))
however when I use
string = bytes("%s:%s" % (user, password), 'utf-8')
string2 = str(string, 'utf-8')
b64auth = base64.encodestring(string)
I get
auth = "Basic " + b64auth
TypeError: Can't convert 'bytes' object to str implicitly
which seems to bring the whole problem back full circle.
I've tried hard-coding my password as text, and that doesn't work. I get
urllib.error.HTTPError: HTTP Error 401: Unauthorized
when I set auth = "Basic user:password".
As I said, I cobbled the code together. I don't fully understand what gmail needs from me to authorize. Continuing along that vein, I'd prefer help in fixing this code so I can learn and get a better understanding of python, rather than being pointed to another script on the web that does the same or a similar thing to what I'm doing.
Thanks in advance,
Nick
Since Python 3 there is a distinction between bytes and strings. The error comes from the fact that base64.encodestring is now a deprecated alias for encodebytes (see also: http://docs.python.org/py3k/library/base64.html#module-base64), so it wants you to give it bytes, and it will give you bytes back.
You might want to do:
string = bytes("%s:%s" % (user, password), 'utf-8')
b64bytes = base64.encodebytes(string)
b64auth = b64bytes.decode('ascii')
Then b64auth will be a string, and you will be able to use it as a string in the rest of the code.
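One caveat worth adding: encodebytes inserts a newline every 76 characters and appends a trailing one, which you don't want inside an HTTP header. For Basic auth, base64.b64encode avoids that. A sketch with placeholder credentials:

import base64

user = "user@gmail.com"  # placeholder
password = "password"    # placeholder

raw = "%s:%s" % (user, password)
b64auth = base64.b64encode(raw.encode("utf-8")).decode("ascii")  # no embedded newlines
auth = "Basic " + b64auth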