How can I retrieve values from a JavaScript variable - python-3.x

I don't know if this is possible. I have written a script that downloads some MP3 files using Python and Selenium, and it works fine, but I also need to grab some data that is stored in a JavaScript variable.
I am trying to get "dfn=20190611-154434-123425015190-123417133890". This value changes for each download; see below:
onclick="var response = AjaxServerSide.LogURLAccess('C','http://crc-c.myphones.net:1234/cgi-bin/get-record.cgi?action=audio&file=record/2019/06/11/3DEC60E9-8B8E11E9-9580B8FB-F6BE3F00#153.81.229.276.mp3&dfn=20190611-154434-123425015190-123417133890
I have tried this
download_button = driver.find_element_by_xpath(f'//*[@id="dnn_ctr545_Aggregator_ctr542_CallRecordingHistory_dgCallsRecorded"]/tbody/tr[{number}]/td[7]/input[3]')
download_button.click()
data = driver.execute_script('response = AjaxServerSide.LogURLAccess')
print(data)
I get None when I print the variable. Any suggestions? Is this even possible?

I seem to have found a way around it by using
val = download_button.get_attribute("onclick") to get the full text of the onclick attribute. What I need to do now is slice out the part that I need.
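The slicing step could be done with a regular expression instead of fixed string offsets. The following is only a sketch: extract_dfn is a hypothetical helper name, the sample string is a shortened stand-in for the real onclick text, and it assumes the dfn value always consists of digits and hyphens, as in the example above.

```python
import re

def extract_dfn(onclick_text):
    """Return the dfn value found in an onclick attribute string, or None."""
    # Assumption: dfn is made of digits and hyphens, e.g. 20190611-154434-...
    match = re.search(r"[?&]dfn=([0-9-]+)", onclick_text)
    return match.group(1) if match else None

# Shortened stand-in for the value returned by get_attribute("onclick")
sample = (
    "var response = AjaxServerSide.LogURLAccess('C','http://crc-c.myphones.net:1234/"
    "cgi-bin/get-record.cgi?action=audio&file=record/2019/06/11/example.mp3"
    "&dfn=20190611-154434-123425015190-123417133890')"
)

dfn = extract_dfn(sample)
print(dfn)
```

In the script, this would be called as extract_dfn(val) right after the get_attribute("onclick") line.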

Related

How to get the right video URL of an Instagram post using Python

I am trying to build a program with a function that takes the URL of a post as input and outputs the links of the images and videos the post contains. It works really well for images. However, when it comes to getting the links of videos, it returns a wrong URL. I have no idea how to handle this situation.
https://scontent-lax3-2.cdninstagram.com/v/t50.2886-16/86731551_2762014420555254_3542675879337307555_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjcyMC5jYXJvdXNlbF9pdGVtIiwicWVfZ3JvdXBzIjoiW1wiaWdfd2ViX2RlbGl2ZXJ5X3Z0c19vdGZcIl0ifQ&_nc_ht=scontent-lax3-2.cdninstagram.com&_nc_cat=106&_nc_ohc=WDuXskvIuLEAX9rj7MU&vs=17877888256532912_3147883953&_nc_vs=HBksFQAYJEdCOXJLd1gyMVdhWUNkQUpBS090UGo3eEhDb3hia1lMQUFBRhUAAsgBABUAGCRHTFBXTUFVTXNPaG5XcW9EQU5KUEE5bEZVdVZxYmtZTEFBQUYVAgLIAQAoABgAGwGIB3VzZV9vaWwBMBUAABgAFuD4nJGH9sE%2FFQIoAkMzLBdAEszMzMzMzRgSZGFzaF9iYXNlbGluZV8xX3YxEQB17gcA&_nc_rid=97e769e058&oe=5EDF10A5&oh=3713c35f89fca1aa9554a281aa3421ed
https://scontent-gmp1-1.cdninstagram.com/v/t50.2886-16/0_0_0_\x00.mp4?_nc_ht=scontent-gmp1-1.cdninstagram.com&_nc_cat=100&_nc_ohc=Wnu_-GvKHJoAX9F_ui1&oe=5EDE8214&oh=7920ac3339d5bf313e898c3cbec85fa2
Here are two URLs. The first one is copied from the source of the web page, while the second one is copied from the data scraped with pyquery. They come from the same Instagram post and the same path, but they are totally different. The first one works well, but the second one doesn't. How can I solve this? How can I get the right video URL?
I have searched the net for a long time without success. Please help or suggest some ideas on how to achieve this.
Here is my code related to the question
def getUrls(url):
    URL = str(url)
    html = get_html(URL)
    doc = pq(html)
    urls = []
    items = doc('script[type="text/javascript"]').items()
    for item in items:
        if item.text().strip().startswith('window._sharedData'):
            js_data = json.loads(item.text()[21:-1], encoding='utf-8')
            shortcode_media = js_data["entry_data"]["PostPage"][0]["graphql"]["shortcode_media"]
            edges = shortcode_media['edge_sidecar_to_children']['edges']
            for edge in edges:
                is_video = edge['node']['is_video']
                if is_video:
                    video_url = edge['node']['video_url']
                    video_url.replace(r'\u0026', "&")
                    urls.append(video_url)
                else:
                    display_url = edge['node']['display_resources'][-1]['src']
                    display_url.replace(r'\u0026', "&")
                    urls.append(display_url)
    return urls
Thanks in advance.
There's nothing wrong with your code. This is a known intermittent issue with Instagram, and other people have encountered it too: https://github.com/arc298/instagram-scraper/issues/545
There doesn't appear to be a known fix yet.
Also, while unrelated to your question, it's worth mentioning that you don't need to inspect the display_resources object to get the URL of the image:
display_url = edge['node']['display_resources'][-1]['src']
There is already a display_url property available (I'm guessing you saw this, based on the variable name?). So you can simply do:
display_url = edge['node']['display_url']
I've seen this sometimes when using this Python module instead of HTML-scraping. At least with that module, edge["node"]["videos"]["standard_resolution"]["url"] usually (but not always) gives a non-corrupted value.

Flask: How to read "file" from request.files multiple times?

In a Flask web app, I know I can't read a "file" from request.files multiple times because it's a stream: once I read it, it is exhausted. But I need to use the "file" multiple times without saving it locally, and I'm having trouble doing that.
For example, from this
image = request.files["image"]
I'd like to have something like
image2 = image.copy
and perform different operations on image and image2.
Can someone please help me with this?
image = request.files["image"]
# Seek the pointer to the beginning of the file to read again
request.files["image"].seek(0)
After reading a file, just call "f.stream.seek(0)". This moves the pointer back to the beginning of the file stream, so you can read the file from the beginning again. You can put the following snippet in a loop to see it in action.
import csv
import io

# f is the uploaded file object taken from request.files
f.stream.seek(0)
stream = io.StringIO(f.stream.read().decode("UTF8"), newline=None)
reader = csv.reader(stream)
for row in reader:
    print(row)
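If two independent objects are wanted (the image and image2 of the question), one option is to read the bytes once and wrap them in separate in-memory streams. This is a minimal sketch rather than a Flask-specific recipe: independent_copies is a hypothetical helper, and io.BytesIO stands in for the uploaded file's stream so the example runs without a web request.

```python
import io

def independent_copies(stream, count=2):
    """Read a binary stream once and return `count` separate in-memory copies."""
    stream.seek(0)        # start from the beginning in case it was already read
    data = stream.read()  # consume the stream once
    stream.seek(0)        # rewind so the original can still be read later
    return [io.BytesIO(data) for _ in range(count)]

# io.BytesIO stands in for the uploaded file's stream in this sketch
upload = io.BytesIO(b"fake image bytes")
image, image2 = independent_copies(upload)

first = image.read()    # reading one copy...
second = image2.read()  # ...does not move the position of the other
print(first == second)
```

The trade-off is memory: every copy holds the full file contents, so for large uploads rewinding a single stream with seek(0) is likely the cheaper choice.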

Transform a string to some code in python 3

I store some data in an Excel file that I extract in JSON format. I also fetch some data with GET requests from an API I created. With all this data, I run some tests (does the data in the Excel file equal the data returned by the API?).
In my case, I may need to store in the Excel file the way to select the data from the JSON returned by the GET request.
For example, the API returns:
{"countries":
[{"code":"AF","name":"Afghanistan"},
{"code":"AX","name":"Åland Islands"} ...
And in my Excel file, I store:
excelData['countries'][0]['name']
I can retrieve excelData['countries'][0]['name'] in my code just fine, as a string.
Is there a way to convert excelData['countries'][0]['name'] from a string into code that actually points to and gets the data I need from the API JSON?
Here's how I want to use it:
self.assertEqual(str(valueExcel), path)
#path is the string from the excel that tells where to fetch the data from the
# JSON api
I thought strings would be interpreted, but no:
AssertionError: 'AF' != "excelData['countries'][0]['code']"
- AF
+ excelData['countries'][0]['code']
You are looking for the built-in eval function. Try this:
self.assertEqual(str(valueExcel), eval(path))
Important: Keep in mind that eval can be dangerous, since malicious code could be executed. More warnings here: What does Python's eval() do?
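If the stored paths only ever consist of quoted keys and integer indexes, one way to avoid eval altogether is to parse the path and walk the JSON by hand. The following is a sketch under that assumption: resolve_path and api_json are hypothetical names, and the leading variable name in the path (excelData) is simply ignored.

```python
import re

# One bracketed step: a quoted key (['name'] or ["name"]) or an integer index ([0])
_STEP = re.compile(r"""\[\s*(?:'([^']*)'|"([^"]*)"|(-?\d+))\s*\]""")

def resolve_path(data, path):
    """Follow a path such as "excelData['countries'][0]['name']" through nested dicts and lists."""
    bracket_start = path.find("[")
    if bracket_start == -1:
        return data  # no bracketed steps: the path names the whole object
    steps = path[bracket_start:]
    pos = 0
    current = data
    while pos < len(steps):
        match = _STEP.match(steps, pos)
        if match is None:
            raise ValueError(f"Unsupported path syntax at: {steps[pos:]!r}")
        single, double, index = match.groups()
        if index is not None:
            current = current[int(index)]  # list index
        else:
            current = current[single if single is not None else double]  # dict key
        pos = match.end()
    return current

api_json = {"countries": [{"code": "AF", "name": "Afghanistan"},
                          {"code": "AX", "name": "Åland Islands"}]}
print(resolve_path(api_json, "excelData['countries'][0]['code']"))
```

In the test, the comparison would then read self.assertEqual(str(valueExcel), resolve_path(api_json, path)). Anything other than keys and indexes in the spreadsheet raises ValueError instead of being executed.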

Call multiple strings inside variable

Python scrub here.
I'm still learning Python, so sorry.
I'm trying to create a dict (I think) that then behaves as a variable called fileshare, and I want to use each entry in it as the FileShareARN. So basically, inside the AWS ARN I want each share to be used in turn: share-A, share-B, and so on. I'm guessing I need to set up a function or an if statement, but I'm not sure.
import boto3
client = boto3.client('storagegateway')
fileshare = [share-A, share-B, share-C, share-D]
response = client.refresh_cache(
    FileShareARN='arn:aws:storagegateway:us-west-1:AWS-ID:share/{Fileshare-variable, share-ID should go here}.format',
    FolderList=['/'],
    Recursive=True
)
You're very close! A few notes to preface the answer and help you on your Python journey:
Python does not allow hyphens in variable names, because the hyphen is the subtraction operator. You only had them listed as placeholders, but I figured it would be helpful to know.
Lists, arrays, and dictionaries are all different data structures in Python. You can read more about them here: https://docs.python.org/3/tutorial/datastructures.html , but for your particular use case, if you're simply trying to store a collection of values and iterate over them, a list works fine (although a dictionary is usable as well).
In Python, lists are iterables, which means a for loop can step through their values one at a time, in order.
Let's go over an example using the following list:
fruits = ['apples','bananas','oranges']
In other languages, you're probably used to having to define your own loop with the following syntax:
for (int i = 0; i < sizeOf(fruits); i++)
{
    print(fruits[i]);
}
Python enables this same functionality much more easily.
for item in fruits:
    print(item)
Here, on each pass through the loop, item holds the value at the current position in the list (fruits).
Now, for your example, we can use this same technique to loop over your list of shares:
import boto3
client = boto3.client('storagegateway')
# Placeholder share IDs; replace them with your own
fileshare = ['shareA', 'shareB', 'shareC', 'shareD']
for path in fileshare:
    response = client.refresh_cache(
        FileShareARN='arn:aws:storagegateway:us-west-1:AWS-ID:share/' + path,
        FolderList=['/'],
        Recursive=True
    )
After changing the placeholder variables you had in fileshare, I wrapped the existing response variable execution with a for loop, and made a slight change to the string appending at the end of your FileShareARN variable.
Hope this helps, and welcome to Python!
I did some more research and found f-string formatting, which seems to make Python life easy. Also, since I am deploying this in AWS Lambda, I added a handler.
#!/usr/bin/env python3
import boto3
def default_handler( event, context ):
    print(boto3.client('sts').get_caller_identity())
    client = boto3.client('storagegateway')
    fileshare = ['share-A', 'share-B', 'share-C', 'share-D']
    for path in fileshare:
        response = client.refresh_cache(
            FileShareARN=f"arn:aws:storagegateway:us-west-1:ARN-ID:share/{path}",
            FolderList=['/'],
            Recursive=True
        )
        print(response)
default_handler( None, None )
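The string-building part can also be checked on its own, without calling AWS. This is only a sketch: build_share_arns is a hypothetical helper, and the region and account ID defaults are the placeholders used in this thread, not real values.

```python
def build_share_arns(share_ids, region="us-west-1", account_id="AWS-ID"):
    """Return one file share ARN string per share ID (placeholder account ID)."""
    return [
        f"arn:aws:storagegateway:{region}:{account_id}:share/{share_id}"
        for share_id in share_ids
    ]

arns = build_share_arns(["share-A", "share-B"])
for arn in arns:
    print(arn)
```

Each resulting string could then be passed as FileShareARN inside the loop of the handler.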

How to save the output of text from selenium chrome (Python)

I'm using Selenium to extract YouTube comments.
Everything went well. But when I print comment.text, the output is only the last sentence.
I don't know how to save it for further analysis (cleaning and tokenization).
from selenium import webdriver
import time

# Path where I downloaded and saved chromedriver
path = "/mnt/c/Users/xxx/chromedriver.exe"
chrome = webdriver.Chrome(path)
url = "https://www.youtube.com/watch?v=WPni755-Krg"
chrome.get(url)
chrome.maximize_window()
# Scroll down
sleep = 5
chrome.execute_script('window.scrollTo(0, 500);')
time.sleep(sleep)
chrome.execute_script('window.scrollTo(0, 1080);')
time.sleep(sleep)
text_comment = chrome.find_element_by_xpath('//*[@id="contents"]')
comments = text_comment.find_elements_by_xpath('//*[@id="content-text"]')
comment_ids = []
Try this approach for getting the text of all comments. (The for loop part was edited; there was no indentation in the previous code.)
for comment in comments:
    comment_ids.append(comment.get_attribute('id'))
    print(comment.text)
When I print, I can see all the texts here. But how can I open them for further study? Should I always use a for loop? I want to tokenize the texts, but the output is only the last sentence. Is there a way to save a text file with all the texts inside it and open it again? I googled it a lot, but without success.
So it sounds like you're just trying to store these comments to reference later. Is your current solution to append them to a string and use a token to create substrings? I'm not familiar with Python's data structures, but this sounds like a great job for an array or a list, depending on how you plan to reference this data.
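In Python, the list suggestion might look like the sketch below: collect every comment's text in a list, write the list to a file, and load it again later for cleaning and tokenization. The helper names and the file name are hypothetical, and a hard-coded list stands in for [comment.text for comment in comments] so the example runs without a browser.

```python
import json
import os
import tempfile

def save_comments(texts, file_path):
    """Write the list of comment texts to a JSON file."""
    with open(file_path, "w", encoding="utf-8") as handle:
        json.dump(texts, handle, ensure_ascii=False, indent=2)

def load_comments(file_path):
    """Read the list of comment texts back from the JSON file."""
    with open(file_path, "r", encoding="utf-8") as handle:
        return json.load(handle)

# Stand-in for: comment_texts = [comment.text for comment in comments]
comment_texts = ["First comment", "Second comment\nwith a line break", "Third comment"]

file_path = os.path.join(tempfile.gettempdir(), "comments.json")
save_comments(comment_texts, file_path)

restored = load_comments(file_path)
print(len(restored))
```

JSON is used here instead of one comment per line because comments can contain line breaks, which would otherwise split a single comment into several entries when the file is read back.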
