how to control formatting for google doc html upload - google-docs

I am uploading an HTML document that has an embedded <style> block for the p element. The style is applied to p elements at the top level, but not to p elements inside a list.
Here's an excerpt of the HTML being uploaded, with deleted content shown as ":"
<!DOCTYPE html>
<head>
<style>
p, li {
margin-bottom: 0.75em;
}
</style>
</head>
<body>
<h2>Status Report Summary: Board Meeting Demo - 2020-10-12</h2>
:
<h3>Market Street Mile - Lou King</h3>
<p>Here's my status for Market Street Mile</p>
<p><b>For discussion:</b></p>
<ul class="statusreport">
<li><p>Market Street Mile registrations are low</p><p>the market street mile has been suffering</p><ul><li>this is one reason</li><li>this is another reason</li></ul></li>
<li><p>sponsorships are up</p><p>sponsorship details</p></li>
</ul>
:
</body>
This is formatted in the file as follows
As you can see, the p element formatting works for "Here's my status for Market Street Mile", but not after "Market Street Mile registrations are low". (Also note that the li element formatting is not working.)
Is there some documentation somewhere on what can and cannot be formatted with css for google doc html upload, or description of how to get the desired formatting?
I'm using google doc api, but I expect the same would be true on native upload of html with conversion.
UPDATE 10/5/20:
For reference, here is the code which creates the file. Not sure this is relevant, though.
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload
class GoogleAuthService():
"""
methods to interact with google api using service based credentials (e.g., for server authentication)
* see https://developers.google.com/identity/protocols/oauth2/service-account
:param service_account_file: service_account_file.json path
:param scopes: list of google scopes. see https://developers.google.com/identity/protocols/googlescopes
:param loginfo: (optional) info logger function
:param logdebug: (optional) debug logger function
:param logerror: (optional) error logger function
"""
def __init__(self, service_account_file, scopes, loginfo=None, logdebug=None, logerror=None):
self.service_account_file = service_account_file
self.scopes = scopes
self.loginfo = loginfo
self.logdebug = logdebug
self.logerror = logerror
self.credentials = service_account.Credentials.from_service_account_file(self.service_account_file, scopes=self.scopes)
self.drive = build('drive', 'v3', credentials=self.credentials)
def create_file(self, folderid, filename, contents, doctype='html'):
"""
create file in drive folder
..note::
folderid must be shared read/write with services account email address
:param folderid: drive id for folder file needs to reside in
:param filename: name for file on drive
:param contents: path for file contents (html or docx formatted, matching doctype)
:param doctype: 'html' or 'docx', default 'html'
:return: google drive id for created file
"""
## upload (adapted from https://developers.google.com/drive/api/v3/manage-uploads)
file_metadata = {
'name': filename,
# see https://developers.google.com/drive/api/v3/mime-types
'mimeType': 'application/vnd.google-apps.document',
# see https://developers.google.com/drive/api/v3/folder
'parents': [folderid],
}
# mimetype depends on doctype
mimetype = _getmimetype(doctype)
# create file
media = MediaFileUpload(
contents,
mimetype=mimetype,
resumable=True
)
file = self.drive.files().create(
body=file_metadata,
media_body=media,
fields='id'
).execute()
fileid = file.get('id')
return fileid
def _getmimetype(doctype):
"""
determine mimetype for specified doctype
:param doctype: 'html' or 'docx'
:return: mimetype
"""
if doctype not in ['docx', 'html']:
raise parameterError('_getmimetype(): doctype must be "docx" or "html", found {}'.format(doctype))
if doctype == 'docx':
mimetype = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
else: # html
mimetype = 'text/html'
return mimetype


Can I render a variable in real time in Flask using Python? [duplicate]

I have a view that generates data and streams it in real time. I can't figure out how to send this data to a variable that I can use in my HTML template. My current solution just outputs the data to a blank page as it arrives, which works, but I want to include it in a larger page with formatting. How do I update, format, and display the data as it is streamed to the page?
import flask
import time, math
app = flask.Flask(__name__)
@app.route('/')
def index():
    def inner():
        # simulate a long process to watch
        for i in range(500):
            j = math.sqrt(i)
            time.sleep(1)
            # this value should be inserted into an HTML template
            yield str(i) + '<br/>\n'
    return flask.Response(inner(), mimetype='text/html')
app.run(debug=True)
You can stream data in a response, but you can't dynamically update a template the way you describe. The template is rendered once on the server side, then sent to the client.
One solution is to use JavaScript to read the streamed response and output the data on the client side. Use XMLHttpRequest to make a request to the endpoint that will stream the data. Then periodically read from the stream until it's done.
This introduces complexity, but allows updating the page directly and gives complete control over what the output looks like. The following example demonstrates that by displaying both the current value and the log of all values.
This example assumes a very simple message format: a single line of data, followed by a newline. This can be as complex as needed, as long as there's a way to identify each message. For example, each loop could return a JSON object which the client decodes.
from math import sqrt
from time import sleep
from flask import Flask, render_template
app = Flask(__name__)
@app.route("/")
def index():
    return render_template("index.html")
@app.route("/stream")
def stream():
    def generate():
        for i in range(500):
            yield "{}\n".format(sqrt(i))
            sleep(1)
    return app.response_class(generate(), mimetype="text/plain")
<p>This is the latest output: <span id="latest"></span></p>
<p>This is all the output:</p>
<ul id="output"></ul>
<script>
var latest = document.getElementById('latest');
var output = document.getElementById('output');
var xhr = new XMLHttpRequest();
xhr.open('GET', '{{ url_for('stream') }}');
xhr.send();
var position = 0;
function handleNewData() {
// the response text includes the entire response so far
// split the messages, then take the messages that haven't been handled yet
// position tracks how many messages have been handled
// messages end with a newline, so split will always show one extra empty message at the end
var messages = xhr.responseText.split('\n');
messages.slice(position, -1).forEach(function(value) {
latest.textContent = value; // update the latest value in place
// build and append a new item to a list to log all output
var item = document.createElement('li');
item.textContent = value;
output.appendChild(item);
});
position = messages.length - 1;
}
var timer;
timer = setInterval(function() {
// check the response for new data
handleNewData();
// stop checking once the response has ended
if (xhr.readyState == XMLHttpRequest.DONE) {
clearInterval(timer);
latest.textContent = 'Done';
}
}, 1000);
</script>
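The JSON variant of the message format mentioned above could look like this on the Python side. This is a minimal sketch, not part of the original answer: the field names index and value and the helper parse_messages are hypothetical, and the parser only mirrors what the JavaScript client does with split('\n') and slice(position, -1).

```python
import json
from math import sqrt


def generate(n):
    """Yield one JSON object per line (newline-delimited JSON)."""
    for i in range(n):
        # "index" and "value" are illustrative field names
        yield json.dumps({"index": i, "value": sqrt(i)}) + "\n"


def parse_messages(text):
    """Decode every complete line and ignore a trailing partial message."""
    # every message ends with a newline, so the last split element is
    # either empty or a message that has not fully arrived yet
    complete = text.split("\n")[:-1]
    return [json.loads(line) for line in complete]


# three complete messages plus one that is still in flight
received = "".join(generate(3)) + '{"index": 3, "va'
messages = parse_messages(received)
print(len(messages))
```

In the Flask view, generate would replace the plain-text generator; on the client, JSON.parse(value) would decode each line before it is displayed.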
An <iframe> can be used to display streamed HTML output, but it has some downsides. The frame is a separate document, which increases resource usage. Since it's only displaying the streamed data, it might not be easy to style it like the rest of the page. It can only append data, so long output will render below the visible scroll area. It can't modify other parts of the page in response to each event.
index.html renders the page with a frame pointed at the stream endpoint. The frame has fairly small default dimensions, so you may want to style it further. Use render_template_string, which knows to escape variables, to render the HTML for each item (or use render_template with a more complex template file). An initial line can be yielded to load CSS in the frame first.
from flask import render_template_string, stream_with_context
@app.route("/stream")
def stream():
    @stream_with_context
    def generate():
        yield render_template_string('<link rel=stylesheet href="{{ url_for("static", filename="stream.css") }}">')
        for i in range(500):
            yield render_template_string("<p>{{ i }}: {{ s }}</p>\n", i=i, s=sqrt(i))
            sleep(1)
    return app.response_class(generate())
<p>This is all the output:</p>
<iframe src="{{ url_for("stream") }}"></iframe>
5 years late, but this can actually be done the way you were initially trying to do it; JavaScript is totally unnecessary (Edit: the author of the accepted answer added the iframe section after I wrote this). You just have to embed the output as an <iframe>:
from flask import Flask, render_template, Response
import time, math
app = Flask(__name__)
@app.route('/content')
def content():
    """
    Render the content at a URL different from index
    """
    def inner():
        # simulate a long process to watch
        for i in range(500):
            j = math.sqrt(i)
            time.sleep(1)
            # this value should be inserted into an HTML template
            yield str(i) + '<br/>\n'
    return Response(inner(), mimetype='text/html')
@app.route('/')
def index():
    """
    Render a template at the index. The content will be embedded in this template
    """
    return render_template('index.html.jinja')
app.run(debug=True)
Then the 'index.html.jinja' file will include an <iframe> with the content URL as the src, which would look something like:
<!doctype html>
<head>
<title>Title</title>
</head>
<body>
<div>
<iframe frameborder="0"
onresize="noresize"
style='background: transparent; width: 100%; height:100%;'
src="{{ url_for('content')}}">
</iframe>
</div>
</body>
When rendering user-provided data render_template_string() should be used to render the content to avoid injection attacks. However, I left this out of the example because it adds additional complexity, is outside the scope of the question, isn't relevant to the OP since he isn't streaming user-provided data, and won't be relevant for the vast majority of people seeing this post since streaming user-provided data is a far edge case that few if any people will ever have to do.
Originally I had a problem similar to the one posted here: a model is being trained, and the progress update should stay in one place on the page and be formatted in HTML. The following answer is for future reference, or for people trying to solve the same problem who need inspiration.
A good solution to achieve this is to use an EventSource in JavaScript, as described here. This listener can be started using a context variable, such as from a form or other source. The listener is stopped by sending a stop command. A sleep command is used for visualization without doing any real work in this example. Lastly, HTML formatting can be achieved using JavaScript DOM manipulation.
Flask Application
import flask
import time
app = flask.Flask(__name__)
@app.route('/learn')
def learn():
    def update():
        yield 'data: Prepare for learning\n\n'
        # Prepare model
        time.sleep(1.0)
        for i in range(1, 101):
            # Perform update
            time.sleep(0.1)
            yield f'data: {i}%\n\n'
        yield 'data: close\n\n'
    return flask.Response(update(), mimetype='text/event-stream')
@app.route('/', methods=['GET', 'POST'])
def index():
    train_model = False
    if flask.request.method == 'POST':
        if 'train_model' in list(flask.request.form):
            train_model = True
    return flask.render_template('index.html', train_model=train_model)
app.run(threaded=True)
HTML Template
<form action="/" method="post">
<input name="train_model" type="submit" value="Train Model" />
</form>
<p id="learn_output"></p>
{% if train_model %}
<script>
var target_output = document.getElementById("learn_output");
var learn_update = new EventSource("/learn");
learn_update.onmessage = function (e) {
if (e.data == "close") {
learn_update.close();
} else {
target_output.innerHTML = "Status: " + e.data;
}
};
</script>
{% endif %}
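The strings yielded by the /learn view follow the server-sent events wire format: one or more "data:" fields, terminated by a blank line. A small helper can make that explicit. This is only a sketch under stated assumptions: format_sse and progress_events are hypothetical names that do not appear in the answer above, and the optional event field is shown for completeness only.

```python
def format_sse(data, event=None):
    """Serialize one server-sent event as text."""
    lines = []
    if event is not None:
        # named events are not delivered to onmessage on the client;
        # they need addEventListener(<name>, ...) instead
        lines.append(f"event: {event}")
    # multi-line payloads need one "data:" field per line
    for part in str(data).split("\n"):
        lines.append(f"data: {part}")
    # a blank line terminates the event
    return "\n".join(lines) + "\n\n"


def progress_events(steps):
    """Yield the same sequence of messages as the update() generator."""
    yield format_sse("Prepare for learning")
    for i in range(1, steps + 1):
        yield format_sse(f"{i}%")
    yield format_sse("close")


events = list(progress_events(2))
print(events)
```

Passing progress_events(100) to flask.Response with mimetype='text/event-stream' would send the same messages as the view above, minus the sleep calls.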

How do I add an attachment to a daily scheduled email in AWS, SES, Lambda, Cloudwatch

1). I have found a piece of code (below) in Node.js which allows me to send an email alert at 9am which works fine. I need to further develop this so that it will attach the latest file from an S3 bucket.
2). I have found other code in Python which will send an email with the file attached as soon as it is uploaded to the S3 bucket.
I can't get the extra functionality from 2 to work in 1.
Below is the code for the function which successfully schedules an email but I need to be able to attach the latest file in an S3 bucket. I also don't know the name of the file as it will be uploaded by someone else.
All I need to do is to send the latest file in the S3 bucket as an attachment at 9am daily.
I have the Cloudwatch schedule feature already setup.
const AWS = require('aws-sdk');
AWS.config.update({
    region: 'eu-west-2'
})
const ses = new AWS.SES();
const s3 = new AWS.S3();
exports.handler = async (event) => {
    const params = {
        Destination: {
            ToAddresses: ['abc@abc.com']
        },
        Message: {
            Subject: {Data: 'Daily Email'},
            Body: {
                Text: {Data: 'Hello: \n\n Good Morning: Here is your 9am Alert!!! \n\n'}
            }
        },
        Source: 'xyz@xyz.com'
    };
    await ses.sendEmail(params).promise().then(response => {
        console.log('Successfully sent email!!!');
    }, error => {
        console.error('An error occurred while attempting to send email: ', error);
    })
};
I managed to get a Python version working that sends the attachment, but for some reason the attachment does not carry the file name. The attachment shows up with a name that is a string of random characters.
Does anyone know how to set the file name on the attachment, and which part of my code is causing the issue? I know the attachment name is "TEST.pdf"; can I just add that name to the successfully attached file?
import boto3
import os.path
import email
from botocore.exceptions import ClientError
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.application import MIMEApplication
def lambda_handler(event, context):
ses = boto3.client("ses")
s3 = boto3.client("s3")
AWS_REGION = "aws_region"
bucket_name = 'testing'
object_name = 'TEST.pdf'
fileObj = s3.get_object( Key=object_name, Bucket='testing')
key = str(fileObj)
sender = "abc@abc.com"
to = "xyz@xyz.com"
subject = 'Email from abc'
body = """Hello, xyz Please see the attached file related to recent submission.
<br>
This email is from blahblahblah:
"""
file_name = os.path.basename(key)
tmp_file_name = '/tmp/' +file_name
s3.download_file(bucket_name, object_name, tmp_file_name)
ATTACHMENT= tmp_file_name
# att= tmp_file_name
# The email body for recipients with non-HTML email clients.
BODY_TEXT = "Hello, xyz Please see the attached file related to recent submission."
# The HTML body of the email.
BODY_HTML = """\
<html>
<head></head>
<body>
<h1>Hello!!!</h1>
<p>Please see the attached file related to TEST.pdf latest upload to S3 Bucket.</p>
</body>
</html>
"""
# The character encoding for the email.
CHARSET = "utf-8"
# Create a new SES resource and specify a region.
client = boto3.client('ses',region_name=AWS_REGION)
# Create a multipart/mixed parent container.
msg = MIMEMultipart('mixed')
# Add subject, from and to lines.
msg['Subject'] = subject
msg['From'] = sender
msg['To'] = to
# Create a multipart/alternative child container.
msg_body = MIMEMultipart('alternative')
# Encode the text and HTML content and set the character encoding. This step is
# necessary if you're sending a message with characters outside the ASCII range.
textpart = MIMEText(BODY_TEXT.encode(CHARSET), 'plain', CHARSET)
htmlpart = MIMEText(BODY_HTML.encode(CHARSET), 'html', CHARSET)
# Add the text and HTML parts to the child container.
msg_body.attach(textpart)
msg_body.attach(htmlpart)
# Define the attachment part and encode it using MIMEApplication.
att = MIMEApplication(open(ATTACHMENT, 'rb').read())
# Add a header to tell the email client to treat this part as an attachment,
# and to give the attachment a name.
att.add_header('Content-Disposition','attachment',filename=os.path.basename(ATTACHMENT))
# Attach the multipart/alternative child container to the multipart/mixed
# parent container.
msg.attach(msg_body)
# Add the attachment to the parent container.
msg.attach(att)
print(msg)
try:
#Provide the contents of the email.
response = client.send_raw_email(
Source=sender,
Destinations=[
to
],
RawMessage={
'Data':msg.as_string(),
},
# ConfigurationSetName=CONFIGURATION_SET
)
# Display an error if something goes wrong.
except ClientError as e:
print(e.response['Error']['Message'])
else:
print("Email sent! Message ID:"),
print(response['MessageId'])
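One likely cause, offered as a reading of the code above rather than a confirmed diagnosis: key = str(fileObj) turns the whole get_object response dictionary into a string, so os.path.basename(key) yields a fragment of that dictionary text instead of "TEST.pdf", and that fragment ends up as the attachment name. The sketch below builds the message with the name taken from the object key itself. It uses only the standard library; build_raw_email and its parameters are hypothetical names, and the attachment bytes are a placeholder.

```python
import os.path
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_raw_email(sender, to, subject, body_text, attachment_bytes, object_key):
    """Build a multipart/mixed message whose attachment is named after the S3 key."""
    msg = MIMEMultipart("mixed")
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = to
    msg.attach(MIMEText(body_text, "plain", "utf-8"))

    # take the name from the key string, e.g. "reports/TEST.pdf" -> "TEST.pdf"
    filename = os.path.basename(object_key)
    att = MIMEApplication(attachment_bytes)
    att.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(att)
    return msg.as_string()


raw = build_raw_email(
    "abc@abc.com", "xyz@xyz.com", "Email from abc",
    "Please see the attached file.", b"placeholder bytes", "TEST.pdf",
)
# parse the message back to check the name a mail client would see
attachment = message_from_string(raw).get_payload()[1]
print(attachment.get_filename())
```

In the Lambda function, the bytes could come from s3.get_object(Bucket=bucket_name, Key=object_name)["Body"].read(), and the returned string would be passed as RawMessage={'Data': raw} to send_raw_email.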

How to use Pyramid to host static files at an absolute path on my computer?

I created a simple Pyramid app from the quick tutorial page here that has the following files relevant to the question:
tutorial/__init__.py:
from pyramid.config import Configurator
def main(global_config, **settings):
config = Configurator(settings=settings)
config.include('pyramid_chameleon')
config.add_route('home', '/')
config.add_route('hello', '/howdy')
config.add_static_view(name='static', path='tutorial:static')
config.scan('.views')
return config.make_wsgi_app()
tutorial/views/views.py:
from pyramid.view import (
    view_config,
    view_defaults
    )
@view_defaults(renderer='../templates/home.pt')
class TutorialViews:
    def __init__(self, request):
        self.request = request
    @view_config(route_name='home')
    def home(self):
        return {'name': 'Home View'}
    @view_config(route_name='hello')
    def hello(self):
        return {'name': 'Hello View'}
tutorial/templates/image.pt:
<!DOCTYPE html>
<html lang="en">
<head>
<title>Quick Tutorial: ${name}</title>
<link rel="stylesheet"
href="${request.static_url('tutorial:static/app.css') }"/>
</head>
<body>
<h1>Hi ${name}</h1>
<img src="${filepath}">
</body>
</html>
I want to add a URL route to my app such that when a user goes to www.foobar.com/{filename} (for example: www.foobar.com/test.jpeg), the app shows the image from the absolute path /Users/s/Downloads/${filename} on the server, as shown in the file tutorial/templates/image.pt above.
This is what I tried:
In the file tutorial/__init__.py, added config.add_route('image', '/foo/{filename}').
In the file development.ini, added the setting image_dir = /Users/s/Downloads.
In the file tutorial/views/views.py, added the following view image:
@view_config(route_name='image', renderer='../templates/image.pt')
def image(request):
    import os
    filename = request.matchdict.get('filename')
    filepath = os.path.join(request.registry.settings['image_dir'], filename)
    return {'name': 'Hello View', 'filepath': filepath}
However this does not work. How can I get absolute paths for assets working with Pyramid?
filepath needs to be a url if you're expecting to put it as the src attribute in an img tag in your HTML. The browser needs to request the asset. You should be doing something like request.static_url(...) or request.route_url(...) in that image tag. This is the URL generation part of the problem.
The second part is actually serving file contents when the client/browser requests that URL. You can use a static view to support mapping the folder on disk to a URL structure in your application in the same way as add_static_view is doing. There is a chapter on this in the Pyramid docs at https://docs.pylonsproject.org/projects/pyramid/en/latest/narr/assets.html. There is also a section specifically on using absolute paths on disk. You can register multiple static views in your app if you feel it's necessary.

Using Python to find the bounds of a Google Map

I am using this code in a Jupyter notebook to open a Google Map.
import gmaps
with open('apikey.txt') as f:
    # strip the trailing newline; the with block closes the file on exit
    apikey = f.readline().strip()
gmaps.configure(api_key = apikey)
coordinates = (35.5, 140)
map = gmaps.figure(center=coordinates, zoom_level=10, layout={'width': '800px', 'height': '600px'} )
map
I want to find the limits of the map with Python 3.6.
It seems this can be done in JavaScript with the getBounds method which give latitude and longitude for the SW and NE corners of the displayed map.
Also, JavaScript seems to allow changes to be tracked with the bounds_changed event.
This is exactly what I want to do but I can't see how in Python.
I have looked through both the gmaps 0.9.0 and googlemaps 4.4.0 plugins with no success.
Anyone done this?
You must use Flask for this solution to work.
pip install flask
With Flask, create a folder named templates in your project root. This is standard Flask behavior: it always looks up HTML files in the templates folder.
Then create an app.py to start the Flask application.
Your project must contain at least the following layout.
.
├── app.py
├── data
│   └── poi.csv
└── templates
    └── index.html
I took the latitude and longitude values from this question and padded them with some extra columns to make it clearer how to fill in the data.
data/poi.csv
dataa,SclId,Latitude,Longitude,datab
dataa1,HAT-0,44.968046,-94.420307,datab1
dataa2,HAT-1,44.33328,-89.132008,datab2
dataa3,HAT-2,33.755787,-116.359998,datab3
app.py
# python version 3.8.2
import csv
import os
from flask import Flask, render_template
from dotenv import load_dotenv
load_dotenv()
class Location:
    def __init__(self, latitude, longitude, produto):
        self.lat = latitude
        self.lon = longitude
        self.nome = produto
def read_data_from_file(caminho):
    lista = list()
    with open(caminho) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        next(csv_reader)  # skip headers
        for row in csv_reader:
            lista.append(Location(row[2], row[3], row[1]))
    return lista
app = Flask(__name__)
app.config['API_KEY'] = os.getenv("APIKEY")  # load google maps api key
def read_dataset(dataset):
    items = list()
    for i in dataset:
        items.append([i.lat, i.lon, i.nome])
    return items
# read the CSV into Location objects first, then flatten them for the template
poidata = read_dataset(read_data_from_file('data/poi.csv'))
@app.route('/')
def index():
    context = {
        "key": app.config['API_KEY'],
        "poidata": poidata
    }
    return render_template('./index.html', poidata=poidata, context=context)
if __name__ == '__main__':
    app.run(debug=False)
templates/index.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Flask with gmaps</title>
<style>
#map-canvas {
height: 500px;
width: 100%;
}
</style>
</head>
<body>
<div id="map-canvas">
</div>
<script>
const poiarray = JSON.parse('{{ poidata | tojson | safe}}');
fillMapItems = (elements, map) => {
const bounds = new google.maps.LatLngBounds();
elements.forEach((item, index) => {
const marker = new google.maps.Marker({ position: new google.maps.LatLng(item[0], item[1]), title: item[2], map: map });
bounds.extend(marker.position);
})
map.fitBounds(bounds);
}
showMap = () => {
const map = new google.maps.Map(document.getElementById('map-canvas'));
fillMapItems(poiarray, map)
}
</script>
<script async defer src="https://maps.googleapis.com/maps/api/js?key={{ context.key }}&callback=showMap">
</script>
</body>
</html>
This code uses python-dotenv, but that is entirely optional; you can load the API key into the environment variable any other way you prefer, for example from a file.
The following line creates a LatLngBounds object from the Google Maps API. Each marker position is added to it during the iteration, and once the iteration finishes the map is fitted to those bounds so that all points are visible.
const bounds = new google.maps.LatLngBounds();
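For the Python side of the question, the same south-west and north-east corners can be computed directly from the data points. This is a plain-Python analogue of repeatedly calling bounds.extend, not a call into the gmaps or googlemaps packages; bounding_box is a hypothetical helper. It assumes the points do not straddle the antimeridian, and it returns the bounds of the data, not of the viewport that fitBounds ends up displaying.

```python
from typing import Iterable, Tuple

LatLng = Tuple[float, float]


def bounding_box(points: Iterable[LatLng]) -> Tuple[LatLng, LatLng]:
    """Return ((south, west), (north, east)) enclosing all points."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    lats = [lat for lat, _ in pts]
    lons = [lon for _, lon in pts]
    # smallest latitude/longitude is the SW corner, largest is the NE corner
    return (min(lats), min(lons)), (max(lats), max(lons))


# the three rows from data/poi.csv
poi = [
    (44.968046, -94.420307),
    (44.33328, -89.132008),
    (33.755787, -116.359998),
]
south_west, north_east = bounding_box(poi)
print(south_west, north_east)
```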
If you need some clarifications, please let me know.

How to crawl dynamically generated data on google's webstore search results

I want to crawl a web page which shows the results of a search in google's webstore and the link is static for that particular keyword.
I want to find the ranking of an extension periodically.
Here is the URL
Problem is that I can't render the dynamic data generated by Javascript code in response from server.
I tried using Scrapy and Scrapy-Splash to render the desired page but I was still getting the same response. I used Docker to run an instance of scrapinghub/splash container on port 8050. I even visited the webpage http://localhost:8050 and entered my URL manually but it couldn't render the data although the message showed success.
Here's the code I wrote for the crawler. It actually does nothing and its only job is to fetch the HTML contents of the desired page.
import scrapy
from scrapy_splash import SplashRequest
class WebstoreSpider(scrapy.Spider):
name = 'webstore'
def start_requests(self):
yield SplashRequest(
url='https://chrome.google.com/webstore/search/netflix%20vpn?utm_source=chrome-ntp-icon&_category=extensions',
callback=self.parse,
args={
"wait": 3,
},
)
def parse(self, response):
print(response.text)
and the contents of the settings.py of my Scrapy project:
BOT_NAME = 'webstore_cralwer'
SPIDER_MODULES = ['webstore_cralwer.spiders']
NEWSPIDER_MODULE = 'webstore_cralwer.spiders'
ROBOTSTXT_OBEY = False
SPLASH_URL = 'http://localhost:8050'
DOWNLOADER_MIDDLEWARES = {
'scrapy_splash.SplashCookiesMiddleware': 723,
'scrapy_splash.SplashMiddleware': 725,
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
And for the result I always get nothing.
Any help is appreciated.
Works for me with a small custom lua script:
lua_source = """
function main(splash, args)
assert(splash:go(args.url))
assert(splash:wait(5.0))
return {
html = splash:html(),
}
end
"""
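You can then change your start_requests as follows. This assumes lua_source is defined as an attribute of the spider class; the execute endpoint is the one that runs a Lua script, so it is set explicitly here:
def start_requests(self):
    yield SplashRequest(
        url='https://chrome.google.com/webstore/search/netflix%20vpn?utm_source=chrome-ntp-icon&_category=extensions',
        callback=self.parse,
        endpoint='execute',
        args={'lua_source': self.lua_source},
    )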
