How can I display a Vega spec (such as this force-directed layout) in JupyterLab?
If you don't mind installing Altair, you can do this:
from altair.vega import vega
import json
with open("bar-chart.vg.json") as f:
s = json.load(f)
vega(s)
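If you would rather not install Altair at all, recent JupyterLab versions also ship a built-in Vega mimetype renderer, so a minimal sketch along these lines may work as well (this assumes a Vega v5 spec and a JupyterLab new enough to bundle the renderer):
import json
from IPython.display import display

with open("bar-chart.vg.json") as f:
    spec = json.load(f)

# Hand the spec to JupyterLab's Vega renderer via its MIME type.
display({"application/vnd.vega.v5+json": spec}, raw=True)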
Alternatively, you can use Vega-Embed through the JavaScript extension:
Add the scripts:
%%javascript
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = '//cdn.jsdelivr.net/npm/vega@5';
document.head.appendChild(script);

var script = document.createElement('script');
script.type = 'text/javascript';
script.src = '//cdn.jsdelivr.net/npm/vega-embed@6';
document.head.appendChild(script);
then
from IPython.display import Javascript
script = '''
var spec = "https://raw.githubusercontent.com/vega/vega/master/docs/examples/bar-chart.vg.json";
vegaEmbed(element, spec).then(function(result) {
}).catch(console.error);
'''
Javascript(script)
Note: the example force-directed spec at https://raw.githubusercontent.com/vega/vega/master/docs/examples/force-directed-layout.vg.json doesn't display because it references its data at a relative URL (data/miserables.json). The bar chart works because the data is embedded directly in the spec.
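A minimal workaround sketch, assuming the spec keeps its data sources under the top-level "data" key with relative "url" entries: load the spec, rewrite those URLs against the spec's own location, and render the result.
import json
from urllib.parse import urljoin
from urllib.request import urlopen

from altair.vega import vega

SPEC_URL = "https://raw.githubusercontent.com/vega/vega/master/docs/examples/force-directed-layout.vg.json"

# Fetch the spec and make every relative data URL absolute.
with urlopen(SPEC_URL) as resp:
    spec = json.load(resp)

for entry in spec.get("data", []):
    if "url" in entry:
        entry["url"] = urljoin(SPEC_URL, entry["url"])

vega(spec)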
Related
I am trying to learn web scraping on asynchronous, JavaScript-heavy sites. I chose a real-estate website for that. As a first step, I did the search by hand and came up with the URL. Here it is:
CW_url = "https://www.cushmanwakefield.com/en/united-states/properties/invest/invest-property-search#q=Los%20angeles&sort=%40propertylastupdateddate%20descending&f:PropertyType=[Office,Warehouse%2FDistribution]&f:Country=[United%20States]&f:StateProvince=[CA]"
I then tried to write code to read the page using Beautiful Soup:
import time
from bs4 import BeautifulSoup as bs

# driver is an already-open Selenium WebDriver pointed at CW_url
iterations = 0
while iterations < 10:
    time.sleep(5)
    html = driver.execute_script("return document.documentElement.outerHTML")
    sel_soup = bs(html, 'html.parser')
    forsales = sel_soup.findAll("for sale")
    iterations += 1
    print(f'iteration {iterations} - forsales: {forsales}')
I also tried using requests-html:
from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()
r = await asession.get(CW_url)
await r.html.arender(wait=5, sleep=5)
r.text.find('for sale')
But this gives me -1, which means the text could not be found. r.text does give me a wall of HTML, and inside it there seems to be some JavaScript that has not been run yet:
<script type="text/javascript">
    var endpointConfiguration = {
        itemUri: "sitecore://web/{34F7EE0A-4405-44D6-BF43-13BC99AE8AEE}?lang=en&ver=4",
        siteName: "CushmanWakefield",
        restEndpointUri: "/coveo/rest"
    };
    if (typeof (CoveoForSitecore) !== "undefined") {
        CoveoForSitecore.SearchEndpoint.configureSitecoreEndpoint(endpointConfiguration);
        CoveoForSitecore.version = "5.0.788.5";
        var context = document.getElementById("coveo3a949f41");
        if (!!context) {
            CoveoForSitecore.Context.configureContext(context);
        }
    }
</script>
I thought the fact that the URL contains all the search criteria meant that the site would make the fetch request, return the data, and generate the HTML. Apparently not! So, what am I doing wrong, and how should I deal with this or similar sites? Ideally, one would replace the search criteria in CW_url and let the code retrieve and store the data.
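One approach that often helps with pages like this (a sketch only, not a tested solution for this site; the CSS selectors are placeholders guessed at and would need to be taken from the page's actual markup) is to have Selenium wait explicitly until the search widget has injected its results, instead of sleeping for a fixed number of seconds:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get(CW_url)

# Wait up to 30 s for at least one rendered search result to appear.
# ".coveo-result-list-container .coveo-result-frame" is a placeholder selector.
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located(
        (By.CSS_SELECTOR, ".coveo-result-list-container .coveo-result-frame")
    )
)

soup = BeautifulSoup(driver.page_source, "html.parser")
listings = soup.select(".coveo-result-frame")
print(len(listings))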
I am using this code in a Jupyter notebook to open a Google Map.
import gmaps
with open('apikey.txt') as f:
    apikey = f.readline()
gmaps.configure(api_key = apikey)
coordinates = (35.5, 140)
map = gmaps.figure(center=coordinates, zoom_level=10, layout={'width': '800px', 'height': '600px'} )
map
I want to find the limits of the map with Python 3.6.
It seems this can be done in JavaScript with the getBounds method, which gives the latitude and longitude of the SW and NE corners of the displayed map.
Also, JavaScript seems to allow changes to be tracked with the bounds_changed event.
This is exactly what I want to do, but I can't see how to do it in Python.
I have looked through both the gmaps 0.9.0 and googlemaps 4.4.0 packages with no success.
Has anyone done this?
You must use Flask for this solution to work.
pip install flask
Create a folder named templates in your project's root folder. This is specific Flask behavior: it always looks up HTML files in the templates folder.
Then create an app.py to start the Flask application.
Your project must contain at least this structure:
.
├── app.py
├── data
│   └── poi.csv
└── templates
    └── index.html
I just took the lat/lon values from this question and stuffed them with some extra columns to make it clearer how to fill the data.
data/poi.csv
dataa,SclId,Latitude,Longitude,datab
dataa1,HAT-0,44.968046,-94.420307,datab1
dataa2,HAT-1,44.33328,-89.132008,datab2
dataa3,HAT-2,33.755787,-116.359998,datab3
app.py
# python version 3.8.2
import csv
import os

from flask import Flask, render_template
from dotenv import load_dotenv

load_dotenv()


class Location:
    def __init__(self, latitude, longitude, produto):
        self.lat = latitude
        self.lon = longitude
        self.nome = produto


def read_data_from_file(caminho):
    lista = list()
    with open(caminho) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        next(csv_reader)  # skip headers
        for row in csv_reader:
            lista.append(Location(row[2], row[3], row[1]))
    return lista


def read_dataset(dataset):
    items = list()
    for i in dataset:
        items.append([i.lat, i.lon, i.nome])
    return items


app = Flask(__name__)
app.config['API_KEY'] = os.getenv("APIKEY")  # load the Google Maps API key

poidata = read_dataset(read_data_from_file('data/poi.csv'))


@app.route('/')
def index():
    context = {
        "key": app.config['API_KEY'],
        "poidata": poidata
    }
    return render_template('index.html', poidata=poidata, context=context)


if __name__ == '__main__':
    app.run(debug=False)
templates/index.html
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>Flask with gmaps</title>
    <style>
        #map-canvas {
            height: 500px;
            width: 100%;
        }
    </style>
</head>
<body>
    <div id="map-canvas"></div>
    <script>
        const poiarray = JSON.parse('{{ poidata | tojson | safe }}');

        function fillMapItems(elements, map) {
            const bounds = new google.maps.LatLngBounds();
            elements.forEach((item) => {
                const marker = new google.maps.Marker({
                    position: new google.maps.LatLng(item[0], item[1]),
                    title: item[2],
                    map: map
                });
                bounds.extend(marker.position);
            });
            map.fitBounds(bounds);
        }

        function showMap() {
            const map = new google.maps.Map(document.getElementById('map-canvas'));
            fillMapItems(poiarray, map);
        }
    </script>
    <script async defer src="https://maps.googleapis.com/maps/api/js?key={{ context.key }}&callback=showMap"></script>
</body>
</html>
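With both files in place, start the app with python app.py and open http://127.0.0.1:5000/ in the browser; the map should come up with all three sample markers in view.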
In this code I use python-dotenv, but it is completely optional; you can use your own method to load the environment variable holding the API key from the file system.
The line below creates a LatLngBounds object from the Google Maps API; each marker position from the iteration is added to these bounds, and once the iteration finishes, map.fitBounds(bounds) zooms the map out just far enough to fit all the points.
const bounds = new google.maps.LatLngBounds();
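If you also want the displayed limits back in Python, as the original question asks, one possibility (a sketch only, assuming the Flask setup above; the /bounds route and its payload shape are my own additions, not part of the answer's code) is to have the template listen for the map's bounds_changed event and POST map.getBounds().toJSON() to a small Flask route:
# Hypothetical /bounds route; on the JavaScript side the template would do:
#   map.addListener('bounds_changed', () => fetch('/bounds', {
#       method: 'POST',
#       headers: {'Content-Type': 'application/json'},
#       body: JSON.stringify(map.getBounds().toJSON())
#   }));
from flask import request, jsonify

@app.route('/bounds', methods=['POST'])
def bounds():
    b = request.get_json()  # {"north": ..., "south": ..., "east": ..., "west": ...}
    print('SW corner:', (b['south'], b['west']), 'NE corner:', (b['north'], b['east']))
    return jsonify(status='ok')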
If you need some clarifications, please let me know.
I'm trying to generate an array of [url_audioenci, url_caratula, titulo_cancion, nombre_artista] to download a list of music from http://los40.com.ar/lista40/. I know how to download media with the Requests library, but I can't extract the links from the page.
from bs4 import BeautifulSoup
import requests
# import re

url = 'http://los40.com.ar/m/lista40/'
videos = []
response = requests.get(url)
bs = BeautifulSoup(response.text, 'html.parser')
for i in range(1, 41):
    videos.append(bs.find_all('datos_cancion_' + str(i)))
# responses = bs.find_all('script', language="javascript", type="text/javascript")
print(videos)
<h3>LISTA DEL 08/06/2019</h3>
<script language="javascript" type="text/javascript">
var datos_cancion_1 = Array();
datos_cancion_1['url_audioenci'] = 'https://recursosweb.prisaradio.com/audios/dest/570005645440.mp4';
datos_cancion_1['url_muzu'] = '';
datos_cancion_1['url_youtube'] = 'https://www.youtube.com/watch?v=XsX3ATc3FbA';
datos_cancion_1['url_itunes'] = '';
datos_cancion_1['posicion'] = '1';
datos_cancion_1['url_caratula'] = 'https://recursosweb.prisaradio.com/fotos/dest/570005645461.jpg';
datos_cancion_1['titulo_cancion'] = 'Boy with luv';
datos_cancion_1['nombre_artista'] = 'BTS;Halsey';
datos_cancion_1['idYes'] = 'BTS';
datos_cancion_1['VidAu'] = 0;
</script>
I expect
videos=[['https://recursosweb.prisaradio.com/audios/dest/570005645440.mp4','https://recursosweb.prisaradio.com/fotos/dest/570005645461.jpg','Boy with luv','BTS;Halsey'].....]
My attempt at filtering the data:
from bs4 import BeautifulSoup
import requests
url = 'http://los40.com.ar/m/lista40/'
videos = []
response = requests.get(url)
bs = BeautifulSoup(response.text, features="html5lib")
scripts = bs.find_all('script', language='javascript', type='text/javascript')
end = len(scripts)
start = end - 40
data = []
for i in range(start, end):
    data.append(str(scripts[i]))
print( data[0] )
Output:
<script language="javascript" type="text/javascript">
var datos_cancion_1 = Array();
datos_cancion_1['url_audioenci'] = 'https://recursosweb.prisaradio.com/audios/dest/570005645440.mp4';
datos_cancion_1['url_muzu'] = '';
datos_cancion_1['url_youtube'] = 'https://www.youtube.com/watch?v=XsX3ATc3FbA';
datos_cancion_1['url_itunes'] = '';
datos_cancion_1['posicion'] = '1';
datos_cancion_1['url_caratula'] = 'https://recursosweb.prisaradio.com/fotos/dest/570005645461.jpg';
datos_cancion_1['titulo_cancion'] = 'Boy with luv';
datos_cancion_1['nombre_artista'] = 'BTS;Halsey';
datos_cancion_1['idYes'] = 'BTS';
datos_cancion_1['VidAu'] = 0;
</script>
data now contains the top 40 entries and all the relevant fields as strings, but I'm not sure how to extract the information from the strings.
There are some suggestions in this thread involving import json or import re that I tried fiddling with, but I couldn't get them to work.
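For what it's worth, here is a minimal re-based sketch that works on the strings in data from the attempt above, assuming every field follows the datos_cancion_N['field'] = 'value'; pattern shown in the output:
import re

# Pull the four wanted fields out of each <script> string in `data`.
fields = ['url_audioenci', 'url_caratula', 'titulo_cancion', 'nombre_artista']
videos = []
for script_text in data:
    entry = []
    for field in fields:
        # Matches e.g. datos_cancion_1['url_audioenci'] = 'https://...';
        m = re.search(r"\['%s'\]\s*=\s*'([^']*)'" % field, script_text)
        entry.append(m.group(1) if m else None)
    videos.append(entry)

print(videos[0])
# expected, per the sample above:
# ['https://recursosweb.prisaradio.com/audios/dest/570005645440.mp4',
#  'https://recursosweb.prisaradio.com/fotos/dest/570005645461.jpg',
#  'Boy with luv', 'BTS;Halsey']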
I want to download a PDF from an online magazine. In order to open it, one must log in first, then open the PDF and download it.
The following is my code. It can log in to the page, and the PDF can also be opened. But the PDF cannot be downloaded, since I am not sure how to simulate the click on Save. I use Firefox.
import os, time
from selenium import webdriver
from bs4 import BeautifulSoup
# Use the Firefox downloader to get the file
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", 'D:/eBooks/Stocks_andCommodities/2008/Jul/')
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
fp.set_preference("pdfjs.disabled", "true")
# disable Adobe Acrobat PDF preview plugin
fp.set_preference("plugin.scan.plid.all", "false")
fp.set_preference("plugin.scan.Acrobat", "99.0")
browser = webdriver.Firefox(firefox_profile=fp)
# Get the login web page
web_url = 'http://technical.traders.com/sub/sublogin2.asp'
browser.get(web_url)
# Simulate the authentication
user_name = browser.find_element_by_css_selector('#SubID > input[type="text"]')
user_name.send_keys("thomas2003@test.net")
password = browser.find_element_by_css_selector('#SubName > input[type="text"]')
password.send_keys("LastName")
time.sleep(2)
submit = browser.find_element_by_css_selector('#SubButton > input[type="submit"]')
submit.click()
time.sleep(2)
# Open the PDF for downloading
url = 'http://technical.traders.com/archive/articlefinal.asp?file=\V26\C07\\131INTR.pdf'
browser.get(url)
time.sleep(10)
# How to simulate the Clicking to Save/Download the PDF here?
You should not open the file in the browser. Once you have the file URL, get a requests session with all the browser's cookies:
def get_request_session(driver):
    import requests
    session = requests.Session()
    for cookie in driver.get_cookies():
        session.cookies.set(cookie['name'], cookie['value'])
    return session
Once you have the session, you can download the file with it:
url = 'http://technical.traders.com/archive/articlefinal.asp?file=\V26\C07\\131INTR.pdf'
session = get_request_session(driver)

r = session.get(url, stream=True)
chunk_size = 2000
with open('/tmp/mypdf.pdf', 'wb') as file:
    for chunk in r.iter_content(chunk_size):
        file.write(chunk)
Apart from Tarun's solution, you can also download the file through JavaScript and store it as a blob. Then you can extract the data into Python via Selenium's execute_script, as shown in this answer.
In your case:
import json

url = 'http://technical.traders.com/archive/articlefinal.asp?file=\V26\C07\\131INTR.pdf'
browser.execute_script("""
    window.file_contents = null;
    var xhr = new XMLHttpRequest();
    xhr.responseType = 'blob';
    xhr.onload = function() {
        var reader = new FileReader();
        reader.onloadend = function() {
            window.file_contents = reader.result;
        };
        reader.readAsDataURL(xhr.response);
    };
    xhr.open('GET', %(download_url)s);
    xhr.send();
""".replace('\r\n', ' ').replace('\r', ' ').replace('\n', ' ') % {
    'download_url': json.dumps(url),
})
Now your data exists as a blob on the window object, so you can easily extract it into Python:
import base64

time.sleep(3)
downloaded_file = browser.execute_script("return (window.file_contents !== null ? window.file_contents.split(',')[1] : null);")
with open('/Users/Chetan/Desktop/dummy.pdf', 'wb') as f:
    f.write(base64.b64decode(downloaded_file))
Try
from urllib.request import urlretrieve

file_path = "<FILE PATH TO SAVE>"
urlretrieve(<pdf_link>, file_path)
What is the command to take a full-page screenshot in Selenium WebDriver for Node.js?
Here is my code:
var webdriver = require('selenium-webdriver');
var By = require('selenium-webdriver').By;
var until = require('selenium-webdriver').until;
var fs = require('fs');
var chromedriver = require('chromedriver');
var firefox = require('selenium-webdriver/firefox');
var Capabilities = require('selenium-webdriver/lib/capabilities').Capabilities;

var capabilities = Capabilities.firefox();
capabilities.set('marionette', true);

//driver = new FirefoxDriver();
var driver = new webdriver.Builder().withCapabilities(capabilities).build();
driver.manage().window().maximize();
driver.manage().deleteAllCookies();
driver.get('http://iolearn.com');

driver.takeScreenshot().then(function(data){
    fs.writeFileSync('img.png', data, 'base64');
});
driver.quit();