Twython basic, help me please, can it get easier than this?

This script is giving me a 500 Error, any ideas?
I am taking the script from a page of Python samples and using the path given to me by my hosting company (I know the path works because I have another script that uses it and runs fine).
The file has 755 permissions, as does its directory:
#!/home3/master/bin/python
import sys
sys.path.insert(1,'/home3/master/lib/python2.6/site-packages')
from twython import Twython
twitter = Twython()
trends = twitter.getCurrentTrends()
print trends

There are two problems with this code. The first is you have not included any OAuth data, so the Twitter API will reject whatever you send. The second is there is no getCurrentTrends() attribute in Twython. Did you mean get_available_trends or get_closest_trends?

Related

Integrating python-decouple with PRAW?

I've been trying to see if I can use python-decouple to place my bot credentials on a separate .env file.
Auth method is basically right off the praw doc:
reddit = praw.Reddit(
client_id=config('CLIENT_ID'),
client_secret=config('CLIENT_SECRET'),
password=config('PASSWORD'),
user_agent=config('USER_AGENT'),
username=config('USERNAME')
)
However, whenever I try it, it seems to return a 403 auth error. I worked my way back, replacing the decouple configs with strings of the actual details, but that doesn't seem to help, and the errors that occur seem random depending on what I take out and when.
Is this a problem with how decouple functions?
Thanks.
"I've been trying to see if I can use python-decouple to place my bot credentials on a separate .env file."
Why not use a praw.ini file? This is documented in the PRAW documentation. It's a format for storing Reddit credentials in a separate file from your code. For example, a praw.ini file may look like:
[bot1]
client_id=Y4PJOclpDQy3xZ
client_secret=UkGLTe6oqsMk5nHCJTHLrwgvHpr
password=pni9ubeht4wd50gk
username=fakebot1
[bot2]
client_id=6abrJJdcIqbclb
client_secret=Kcn6Bj8CClyu4FjVO77MYlTynfj
password=mi1ky2qzpiq8s59j
username=fakebot2
You then use specific credentials in your code like so:
import praw
reddit = praw.Reddit('bot2', user_agent='myBot v0.1')
print('Logged in as', reddit.user.me())
I think this is the best solution for working with PRAW credentials.
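Since praw.ini uses ordinary INI syntax, one way to sanity-check a section before handing it to PRAW is to read it with the standard-library configparser. This is only a sketch: the helper names load_site and missing_keys are hypothetical, the values are obviously fake, and this is not a claim about how PRAW itself must be used.

```python
import configparser

# Same layout as the praw.ini example above, with obviously fake values.
PRAW_INI = """
[bot1]
client_id=FAKE_ID_1
client_secret=FAKE_SECRET_1
password=fake-password-1
username=fakebot1

[bot2]
client_id=FAKE_ID_2
client_secret=FAKE_SECRET_2
password=fake-password-2
username=fakebot2
"""

REQUIRED_KEYS = ("client_id", "client_secret", "password", "username")


def load_site(ini_text, site_name):
    """Return the settings of one [site] section as a plain dict (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return dict(parser[site_name])


def missing_keys(settings):
    """List the required keys that are absent or empty (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


bot2 = load_site(PRAW_INI, "bot2")
print(bot2["username"])
print(missing_keys(bot2))
```

If missing_keys returns a non-empty list, the section is incomplete and an auth failure would not be surprising.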
However, if you really want to do it with python-decouple, here's a working example:
Contents of file .env:
username=k8IA
password=REDACTED
client_id=REDACTED
client_secret=REDACTED
Contents of file connect.py:
import praw
from decouple import config
reddit = praw.Reddit(username=config('username'),
password=config('password'),
client_id=config('client_id'),
client_secret=config('client_secret'),
user_agent='myBot v0.1')
print('Logged in as', reddit.user.me())
Output when running python3 connect.py:
Logged in as k8IA

I want to extract the latitude, longitude and accuracy from https://mycurrentlocation.net/ using Python3 BeautifulSoup and requests modules

My basic idea is to make a simple script in order to get geographic coordinates using this site: https://mycurrentlocation.net/.
When I run my script, the attribute column is empty and the program doesn't return the list correctly.
Please help me! :)
import requests,time
from bs4 import BeautifulSoup as bs
site="https://mycurrentlocation.net/"
def track_the_location():
    page=requests.get(site)
    page=bs(page.text,"html.parser")
    latitude=page.find_all("td",{"id":"latitude"})
    longitude=page.find_all("td",{"id":"longitude"})
    accuracy=page.find_all("td",{"id":"accuracy"})
    List=[latitude.text,longitude.text,accuracy.text]
    return List
print(track_the_location())
I think the problem is that you are running it from a local script. A script behaves differently from a browser, as it doesn't provide any location information to the website. The needed information is filled in at runtime, so you actually need to simulate a browser session to get the data, unless the site offers an API where you can specify your information manually.
A possible solution would be Selenium, as it helps you simulate a browser session. See the Selenium documentation for further reading.
Hope I could help you. Have a nice day.
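To illustrate why a plain HTTP fetch can come back with empty cells, here is a small offline sketch using only the standard-library HTML parser. The markup in STATIC_HTML is hypothetical (it is an assumption about what a page looks like before JavaScript fills it in, not a copy of the real site), and CellReader and read_cells are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical static markup: the cells exist but stay empty until
# JavaScript running in a browser fills them in.
STATIC_HTML = """
<table>
  <tr><td id="latitude"></td></tr>
  <tr><td id="longitude"></td></tr>
  <tr><td id="accuracy"></td></tr>
</table>
"""


class CellReader(HTMLParser):
    """Collects the text inside <td> elements whose id is in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None   # id of the cell currently being read, if any
        self.cells = {}

    def handle_starttag(self, tag, attrs):
        cell_id = dict(attrs).get("id")
        if tag == "td" and cell_id in self.wanted:
            self.current = cell_id
            self.cells[cell_id] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.cells[self.current] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self.current = None


def read_cells(html, wanted=("latitude", "longitude", "accuracy")):
    """Return {id: text} for the wanted cells found in html."""
    reader = CellReader(wanted)
    reader.feed(html)
    return {key: value.strip() for key, value in reader.cells.items()}


cells = read_cells(STATIC_HTML)
print(cells)
```

Every value is an empty string here because nothing has executed the page's scripts, which is the situation the answer above describes.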

audio file isn't being parsed with Google Speech

This question is a followup to a previous question.
The snippet of code below almost works...it runs without error yet gives back a None value for results_list. This means it is accessing the file (I think) but just can't extract anything from it.
I have a file, sample.wav, living publicly here: https://storage.googleapis.com/speech_proj_files/sample.wav
I am trying to access it by specifying source_uri='gs://speech_proj_files/sample.wav'.
I don't understand why this isn't working. I don't think it's a permissions problem. My session is instantiated fine. The code chugs for a second, yet always comes up with no result. How can I debug this?? Any advice is much appreciated.
from google.cloud import speech
speech_client = speech.Client()
audio_sample = speech_client.sample(
content=None,
source_uri='gs://speech_proj_files/sample.wav',
encoding='LINEAR16',
sample_rate_hertz= 44100)
results_list = audio_sample.async_recognize(language_code='en-US')
Ah, that's my fault from the last question. That's the async_recognize command, not the sync_recognize command.
That library has three recognize commands. sync_recognize reads the whole file and returns the results. That's probably the one you want. Remove the letter "a" and try again.
Here's an example Python program that does this: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/cloud-client/transcribe.py
FYI, here's a summary of the other types:
async_recognize starts a long-running, server-side operation to translate the whole file. You can call the operation.poll() method to ask the server whether it has finished and, when it is complete, get the results via operation.results.
The third type is streaming_recognize, which sends you results continually as they are processed. This can be useful for long files where you want some results immediately, or if you're continuously uploading live audio.
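The start-then-poll pattern described for async_recognize can be sketched without touching the real API. FakeOperation below is a hypothetical stand-in that only mimics the complete, poll() and results names mentioned above; wait_for is an illustrative helper, not part of the library.

```python
import time


class FakeOperation:
    """Hypothetical stand-in for a long-running operation.

    It reports completion after `polls_needed` calls to poll().
    """

    def __init__(self, polls_needed, results):
        self._remaining = polls_needed
        self._results = results
        self.complete = False
        self.results = None

    def poll(self):
        self._remaining -= 1
        if self._remaining <= 0:
            self.complete = True
            self.results = self._results
        return self.complete


def wait_for(operation, max_polls=100, delay=10.0):
    """Poll until the operation completes or max_polls is used up.

    Returns operation.results on success, or None if it never completed.
    """
    for _ in range(max_polls):
        if operation.complete:
            break
        time.sleep(delay)
        operation.poll()
    return operation.results if operation.complete else None


op = FakeOperation(polls_needed=3, results=["hello world"])
transcripts = wait_for(op, max_polls=5, delay=0.01)
print(transcripts)
```

Returning None when the retries run out makes it easy to tell "still running" apart from "finished with an empty result list".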
I finally got something to work:
import time
from google.cloud import speech
speech_client = speech.Client()
sample = speech_client.sample(
    content=None,
    source_uri='gs://speech_proj_files/sample.wav',
    encoding='LINEAR16',
    sample_rate=44100)  # the language code is passed to async_recognize below
retry_count = 100
operation = sample.async_recognize(language_code='en-US')
# Poll until the server-side operation finishes or the retries run out
while retry_count > 0 and not operation.complete:
    retry_count -= 1
    time.sleep(10)
    operation.poll()  # API call
print(operation.complete)
print(operation.results[0].transcript)
print(operation.results[0].confidence)
Then something like
for op in operation.results:
    print(op.transcript)

python 3.3 basic error

I have python 3.3 installed.
I use the example they use on their site:
import urllib.request
response = urllib.request.urlopen('http://python.org/')
html = response.read()
the only thing that happens when I run it is I get this :
======RESTART=========
I know I am a rookie but I figured the example from python's own website should be able to work.
It doesn't. What am I doing wrong? Eventually I want to run this script from the website below. But I think urllib is not going to work as it is on that site. Can someone tell me if the code will work with Python 3.3?
http://flowingdata.com/2007/07/09/grabbing-weather-underground-data-with-beautifulsoup/
I think I see what's probably going on. You're likely using IDLE, and when it starts a new run of a program, it prints the
======RESTART=========
line to tell you that a fresh program is starting. That means that all the variables currently defined are reset and/or deleted, as appropriate.
Since your program didn't print any output, you didn't see anything.
The two lines I suggested adding were just tests to figure out what was going on, they're not needed in general. [Unless the window itself is automatically closing, which it shouldn't.] But as a rule, if you want to see output, you'll have to print what you're interested in.
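As a sketch of "print what you're interested in", the snippet below reads a response and prints its size and its first characters. To keep it runnable offline it uses a data: URL as a stand-in for 'http://python.org/'; that substitution is an assumption made for this example, and urlopen only accepts data: URLs from Python 3.4 onward.

```python
import urllib.request

# Offline stand-in for 'http://python.org/' (assumption for this sketch).
url = 'data:text/plain,Hello%20from%20urllib'

response = urllib.request.urlopen(url)
html = response.read()          # read() returns bytes, not str

# Decode before treating the body as text, then print something visible.
text = html.decode('utf-8')
print(len(html), 'bytes read')
print(text[:200])
```

With the real URL the same two print calls would show the page size and the start of its HTML instead of nothing at all.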
Your example works for me. However, I suggest using requests instead of urllib.request (urllib2 in Python 2).
To simplify the example you linked to, it would look like:
from bs4 import BeautifulSoup
import requests
resp = requests.get("http://www.wunderground.com/history/airport/KBUF/2007/12/16/DailyHistory.html")
soup = BeautifulSoup(resp.text, "html.parser")  # name the parser explicitly

how to verify links in a PDF file

I have a PDF file and want to verify that the links in it are proper. Proper in the sense that all URLs specified are linked to web pages and nothing is broken. I am looking for a simple utility or script that can do this easily.
Example:
$ testlinks my.pdf
There are 2348 links in this pdf.
2322 links are proper.
Remaining broken links and page numbers in which it appears are logged in brokenlinks.txt
I have no idea whether something like that exists, so I googled and also searched Stack Overflow, but have not found anything useful yet. So I would like to know whether anyone has any idea about it.
Updated: to make the question clear.
You can use pdf-link-checker
pdf-link-checker is a simple tool that parses a PDF document and checks for broken hyperlinks. It does this by sending simple HTTP requests to each link found in a given document.
To install it with pip:
pip install pdf-link-checker
Unfortunately, one dependency (pdfminer) is broken. To fix it:
pip uninstall pdfminer
pip install pdfminer==20110515
I suggest first using the Linux command-line utility 'pdftotext' (see the pdftotext man page).
The utility is part of the Xpdf collection of PDF processing tools, available on most linux distributions. See http://foolabs.com/xpdf/download.html.
Once installed, you could process the PDF file through pdftotext:
pdftotext file.pdf file.txt
Once processed, a simple Perl script could search the resulting text file for http URLs and retrieve them using LWP::Simple. The get('http://...') function exported by LWP::Simple will allow you to validate the URLs with a code snippet such as:
use LWP::Simple;
$content = get("http://www.sn.no/");
die "Couldn't get it!" unless defined $content;
That would accomplish what you want to do, I think. There are plenty of resources on how to write regular expressions to match http URLs, but a very simple one would look like this:
m/http[^\s]+/i
"http followed by one or more not-space characters" - assuming the URLs are properly URL-encoded.
There are two lines of enquiry with your question.
Are you looking for regex verification that the link contains key information such as http:// and valid TLD codes? If so, I'm sure a regex expert will drop by, or have a look at regexlib.com, which contains lots of existing regexes for dealing with URLs.
Or do you want to verify that a website exists? In that case I would recommend Python + Requests, as you could script checks to see whether the websites exist and don't return error codes.
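For the regex route, here is a Python counterpart to the very simple pattern m/http[^\s]+/i quoted earlier in this thread. It is a minimal sketch, not a validating URL parser; find_urls and the sample text are made up for illustration.

```python
import re

# "http" followed by one or more non-whitespace characters, case-insensitive.
URL_PATTERN = re.compile(r'http[^\s]+', re.IGNORECASE)


def find_urls(text):
    """Return every substring of text that matches the simple URL pattern."""
    return URL_PATTERN.findall(text)


sample = (
    "See http://www.sn.no/ and HTTPS://example.com/a?b=1 for details.\n"
    "No link on this line."
)
urls = find_urls(sample)
print(urls)
```

Because the pattern stops only at whitespace, punctuation that directly follows a URL (a closing parenthesis or a full stop, say) would be captured too, so text dumped from a PDF may need some trimming afterwards.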
It's a task which I'm currently undertaking for pretty much the same purpose at work. We have about 54k links to get processed automatically.
Collect links by:
enumerating links using an API, or dumping as text and linkifying the result, or saving as HTML with PDFMiner.
Make requests to check them:
there is a plethora of options depending on your needs.
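One of those options, sketched with only the standard library: classify each collected URL as proper or broken. The names default_fetch and check_links are hypothetical, and the demo swaps in a stub fetcher (fake_fetch) so that it runs without network access.

```python
import urllib.error
import urllib.request


def default_fetch(url, timeout=10):
    """Return the HTTP status code for url; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def check_links(urls, fetch=default_fetch):
    """Split urls into (good, broken); broken holds (url, reason) pairs."""
    good, broken = [], []
    for url in urls:
        try:
            status = fetch(url)
        except (urllib.error.URLError, ValueError, OSError) as exc:
            broken.append((url, str(exc)))
            continue
        if 200 <= status < 400:
            good.append(url)
        else:
            broken.append((url, "HTTP %d" % status))
    return good, broken


def fake_fetch(url):
    """Stub used for the demo only: pretends URLs containing 'missing' fail."""
    if "missing" in url:
        raise urllib.error.URLError("not found")
    return 200


links = ["http://example.com/", "http://example.com/missing"]
good, broken = check_links(links, fetch=fake_fetch)
print("There are %d links." % len(links))
print("%d links are proper." % len(good))
```

Passing the fetcher in as a parameter keeps the classification logic testable; calling check_links(links) without the fetch argument would use real HTTP requests instead.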
https://stackoverflow.com/a/42178474/1587329's advice was inspiration to write this simple tool (see gist):
'''loads pdf file in sys.argv[1], extracts URLs, tries to load each URL'''
import sys
import urllib.request  # Python 3; the original Python 2 version used urllib.urlopen
import PyPDF2
# credits to stackoverflow.com/questions/27744210
def extract_urls(filename):
    '''extracts all urls from filename'''
    with open(filename, 'rb') as PDFFile:
        # PdfFileReader, getNumPages, getPage and getObject are the pre-3.0 PyPDF2 names
        PDF = PyPDF2.PdfFileReader(PDFFile)
        pages = PDF.getNumPages()
        key = '/Annots'
        uri = '/URI'
        ank = '/A'
        for page in range(pages):
            pageSliced = PDF.getPage(page)
            pageObject = pageSliced.getObject()
            if key in pageObject:
                ann = pageObject[key]
                for a in ann:
                    u = a.getObject()
                    # only annotations that carry an action with a URI are links
                    if ank in u and uri in u[ank]:
                        yield u[ank][uri]
def check_http_url(url):
    # raises an exception if the URL cannot be loaded
    urllib.request.urlopen(url)
if __name__ == "__main__":
    for url in extract_urls(sys.argv[1]):
        check_http_url(url)
Save to filename.py, run as python filename.py pdfname.pdf.
