AWS CloudFront + Lambda@Edge modify HTML content (making all links absolute -> relative) - node.js

I (maybe falsely) assumed Lambda@Edge can modify the origin response content,
so I wrote a Lambda function like this:
/* this does not work. response.Body is not defined */
'use strict';
exports.handler = (event, context, callback) => {
  var response = event.Records[0].cf.response;
  var data = response.Body.replace(/OLDTEXT/g, 'NEWTEXT');
  response.Body = data;
  callback(null, response);
};
This fails because you cannot reference the origin response body with this syntax.
Can I modify this script to make it work as I intended, or should I consider using another AWS service?
My background:
We are trying to set up an AWS CloudFront distribution that consolidates access to several websites, like this:
ttp://foo.com/ -> https:/newsite.com/foo/
ttp://bar.com/ -> https:/newsite.com/bar/
ttp://boo.com/ -> https:/newsite.com/boo/
The sites are currently managed by external parties. We want to disable direct public access to foo/bar/boo and have newsite.com as the only site visible on the internet.
Mapping the origins into a single CloudFront distribution is relatively simple.
However, doing so will break HTML content that references files with an absolute URL
once their current domain names are removed from the web:
ttp://foo.com/images/1.jpg
-> (disable foo.com dns)
-> image not found
To benefit from CloudFront caching and its other advantages,
I want to rewrite all absolute file references in HTML files as relative URLs,
so
<img src="ttp://foo.com/images/1.jpg">
becomes
<img src="/foo/images/1.jpg">
//(accessed by a user as https:/newsite.com/foo/images/1.jpg)
//(maybe I should make it an absolute URL for SEO purposes)
(http is changed to ttp because of a restriction on using the banned domain name foo.com.)
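Whichever service ends up doing it, the rewrite itself is a plain string transform. A minimal sketch, assuming the three legacy hosts map to /foo, /bar and /boo as in the mapping above (HOST_PREFIXES and rewriteAbsoluteLinks are hypothetical names). It is regex-based rather than an HTML parser, so it rewrites every matching URL in the text, including ones inside scripts.

```javascript
// Hypothetical mapping from legacy hostnames to path prefixes on newsite.com.
const HOST_PREFIXES = {
  'foo.com': '/foo',
  'bar.com': '/bar',
  'boo.com': '/boo',
};

// Escape a string so it can be used literally inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Rewrite absolute http(s) links that point at a legacy host into
// root-relative links under that host's prefix, e.g.
// "http://foo.com/images/1.jpg" -> "/foo/images/1.jpg".
function rewriteAbsoluteLinks(html, hostPrefixes = HOST_PREFIXES) {
  let out = html;
  for (const [host, prefix] of Object.entries(hostPrefixes)) {
    // http:// or https://, optional "www.", the host, then "/" or a quote
    // (the lookahead keeps hosts like "foo.com.example" untouched).
    const re = new RegExp(`https?://(?:www\\.)?${escapeRegExp(host)}(?=[/"'])`, 'gi');
    out = out.replace(re, prefix);
  }
  return out;
}
```

Keeping the output root-relative means the same HTML works whether it is served from newsite.com or from the CloudFront domain; switching to absolute https URLs for SEO would only change the prefix values.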
(edit)
I found this AWS blog post, which may be a great hint but feels a little more convoluted than I expected (set up a Linux container so I can just use sed to process HTML files, maybe using S3 as temporary storage).
I hope I can find a simpler way:
https://aws.amazon.com/blogs/networking-and-content-delivery/resizing-images-with-amazon-cloudfront-lambdaedge-aws-cdn-blog/

From what I have just learned, you unfortunately cannot modify the response body within Lambda@Edge. You can only remove or completely replace the body content. I was hoping to clean up all responses from a legacy site, but a CloudFront Lambda@Edge function will not allow this.
As the AWS documentation states:
When you’re working with the HTTP response, Lambda@Edge does not expose the body that is returned by the origin server to the origin-response trigger. You can generate a static content body by setting it to the desired value, or remove the body inside the function by setting the value to be empty. If you don’t update the body field in your function, the original body returned by the origin server is returned back to viewer.

I ran into the same issue and was able to pull some information out of the request headers to piece together a URL from which I can fetch the original body.
Beware: I haven't yet been able to confirm that this is a "safe" method (it may be relying on undocumented behaviour, etc.), but for now it DOES fetch the original body properly for me. Of course, it also takes another request/round trip, possibly incurring extra transfer costs, execution time, etc.
const https = require('https');

// Rebuild the URL of the requested object from the Host header and the URI
const fetchOriginalBody = (request) => {
  const host = request['headers']['host'][0]['value']; // xxxx.yyy.com
  const uri = request['uri'];
  const fetchOriginalBodyUrl = 'https://' + host + uri;
  return httpsRequest(fetchOriginalBodyUrl);
};

// Helper that turns https.request into a promise
function httpsRequest(options) {
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      if (res.statusCode < 200 || res.statusCode >= 300) {
        res.resume(); // discard the body so the socket is released
        return reject(new Error('statusCode=' + res.statusCode));
      }
      const body = [];
      res.on('data', (chunk) => {
        body.push(chunk);
      });
      res.on('end', () => {
        try {
          resolve(Buffer.concat(body).toString());
          // resolve(JSON.parse(Buffer.concat(body).toString()));
        } catch (e) {
          reject(e);
        }
      });
    });
    req.on('error', (e) => {
      reject(e); // reject with the Error itself, not just its message
    });
    req.end();
  });
}

exports.handler = async (event, context, callback) => {
  const records = event.Records;
  if (records && records.length > 0) {
    const request = records[0].cf.request;
    const body = await fetchOriginalBody(request);
  }
  // ...
};
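To finish the idea, the fetched copy could then be transformed and returned as a brand-new body, which is the one thing the origin-response trigger does allow. A sketch under assumptions: rewriteResponse is a hypothetical helper, fetchBody would be the answer's fetchOriginalBody, transform would be the link rewrite (for example the question's OLDTEXT -> NEWTEXT replace), and only HTML responses are touched. Generated bodies are still subject to Lambda@Edge's size limits.

```javascript
'use strict';

// Hypothetical helper for an origin-response trigger: fetch a copy of the
// original body (fetchBody), transform it, and return it as a new body.
async function rewriteResponse(event, fetchBody, transform) {
  const { request, response } = event.Records[0].cf;

  // Only rewrite HTML; images, CSS, etc. pass through unchanged.
  const typeHeader = response.headers['content-type'];
  const contentType = typeHeader ? typeHeader[0].value : '';
  if (!contentType.startsWith('text/html')) {
    return response;
  }

  const original = await fetchBody(request);
  response.body = transform(original);
  return response;
}

// Possible wiring (assumes fetchOriginalBody from the answer above):
// exports.handler = (event) =>
//   rewriteResponse(event, fetchOriginalBody, (html) => html.replace(/OLDTEXT/g, 'NEWTEXT'));
```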

Related

How to use the full request URL in AWS Lambda to execute logic only on certain pages

I have a website running on www.mywebsite.com. The files are hosted in an S3 bucket in combination with CloudFront. Recently, I added a new part to the site that is supposed to be private, so I wanted to put some form of protection on it. The rest of the site, however, should remain public. My goal is for the site to be accessible to everyone, but as soon as someone reaches the new part, they should not see any source files and should instead be prompted for a username/password combination.
The URL of the new part would be, for example, www.mywebsite.com/private/index.html, ...
I found that an AWS Lambda function (with Node.js) is good for this, and it kind of works. I have managed to put authentication on the entire website, but I can't figure out how to make it apply only to pages that contain, for example, '/private/*' in the full URL. The Lambda function I wrote looks like this:
'use strict';
exports.handler = (event, context, callback) => {
  // Get request and request headers
  const request = event.Records[0].cf.request;
  const headers = request.headers;
  if (!request.uri.toLowerCase().indexOf("/private/") > -1) {
    // Continue request processing if authentication passed
    callback(null, request);
    return;
  }
  // Configure authentication
  const authUser = 'USER';
  const authPass = 'PASS';
  // Construct the Basic Auth string
  const authString = 'Basic ' + new Buffer(authUser + ':' + authPass).toString('base64');
  // Require Basic authentication
  if (typeof headers.authorization == 'undefined' || headers.authorization[0].value != authString) {
    const body = 'Unauthorized';
    const response = {
      status: '401',
      statusDescription: 'Unauthorized',
      body: body,
      headers: {
        'www-authenticate': [{key: 'WWW-Authenticate', value:'Basic'}]
      },
    };
    callback(null, response);
  }
  // Continue request processing if authentication passed
  callback(null, request);
};
The part that doesn't work is this one:
if (!request.uri.toLowerCase().indexOf("/private/") > -1) {
  // Continue request processing if authentication passed
  callback(null, request);
  return;
}
My guess is that the request.uri does not contain what I expected it to contain, but I can't seem to figure out what does contain what I need.

If you're using a Lambda@Edge function (it appears you are), you can view the request event structure here: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-event-structure.html#lambda-event-structure-request
You can see the actual value of the request URI field by using console.log and checking the respective logs in CloudWatch.
The problem is this line:
if (!request.uri.toLowerCase().indexOf("/private/") > -1) {
Because ! binds more tightly than >, this is evaluated as (!index) > -1: a boolean compared with -1, which is always true, so every request skips authentication. If you're strictly looking to check whether a JavaScript string does not contain another string, compare the index directly:
if (request.uri.toLowerCase().indexOf("/private/") === -1) {
Or, better yet, using more modern JS:
if (!request.uri.toLowerCase().includes("/private/")) {

Making remote request to Google Places API via Express fetches duplicate results every time

I have been trying to fetch search results via a text query using the Google Places API.
My URL string is:
https://maps.googleapis.com/maps/api/place/textsearch/json?query=${textQuery}&&location=${lat},${lng}&radius=10000&key=${key}
A GET request from the browser works perfectly:
https://maps.googleapis.com/maps/api/place/textsearch/json?query=saravana stores&&location=13.038063,80.159607&radius=10000&key=${key}
The above search fetches results relevant to query.
https://maps.googleapis.com/maps/api/place/textsearch/json?query=dlf&&location=13.038063,80.159607&radius=10000&key=${key}
This search also fetches results related to dlf.
But when I try to do the same via an Express server, it gives me the same search results for different queries.
app.get('/findPlaces', (req, res) => {
  SEARCH_PLACES = SEARCH_PLACES.replace("lat", req.query.lat);
  SEARCH_PLACES = SEARCH_PLACES.replace("lng", req.query.lng);
  SEARCH_PLACES = SEARCH_PLACES.replace("searchQuery", req.query.search);
  https.get(SEARCH_PLACES, (response) => {
    let body = '';
    response.on('data', (chunk) => {
      body += chunk;
    });
    response.on('end', () => {
      let places = JSON.parse(body);
      const locations = places.results;
      console.log(locations);
      res.json(locations);
    });
  }).on('error', () => {
    console.log('error occured');
  });
});
From the client side, if I make my first request to /findPlaces?lat=13.038063&lng=80.159607&search=saravana stores, I get proper results. When I try a different search like [search=dlf], it gives me the same results I got back for [search=saravana stores]. I have even tried searching with different lat, lng and query values.
However, proper results are fetched if I restart my Node server. In practice, I cannot restart the server for every new request.
Am I missing something? Please help.
Thanks.

The problem is that on the first request you replace the placeholders in the global variable SEARCH_PLACES itself. After that, you cannot replace the placeholders again, since they have already been replaced in that string.
For example, when the app starts, SEARCH_PLACES has this value:
https://maps.googleapis.com/maps/api/place/textsearch/json?query=searchQuery&location=lat,lng&radius=10000
After the first request, the global variable will have changed to:
https://maps.googleapis.com/maps/api/place/textsearch/json?query=foo&location=13,37&radius=10000
When the second request comes in, there is no longer any placeholder to replace in the string, so the URL built for the first request is sent again.
You want to construct the URL for every request without modifying the global one:
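const https = require('https');
const SEARCH_PLACES = 'https://maps.googleapis.com/maps/api/place/textsearch/json';
app.get('/findPlaces', (req, res) => {
  const { lat, lng, search } = req.query;
  // Encode the search text (e.g. the space in "saravana stores"); key is your API key, as in the question
  const url = `${SEARCH_PLACES}?query=${encodeURIComponent(search)}&location=${lat},${lng}&radius=10000&key=${key}`;
  // Name the API response apiRes so it doesn't shadow Express's res
  https.get(url, (apiRes) => {
    // ...
  });
});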
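A variation on the same fix lets the built-in URLSearchParams do the encoding, so each request gets a freshly built URL and a correctly escaped query. buildSearchUrl is a hypothetical helper name, and PLACES_API_KEY in the usage comment is a hypothetical environment variable; the sketch assumes the Places endpoint accepts standard form-style encoding (a space sent as +, a comma as %2C), which is how URLSearchParams serializes.

```javascript
// Base endpoint, kept constant and never modified.
const SEARCH_PLACES = 'https://maps.googleapis.com/maps/api/place/textsearch/json';

// Hypothetical helper: builds a brand-new URL on every call, so one request
// can never leak its values into the next one.
function buildSearchUrl({ search, lat, lng, radius = 10000, key }) {
  // URLSearchParams percent-encodes each value (a space becomes '+').
  const params = new URLSearchParams({
    query: search,
    location: `${lat},${lng}`,
    radius: String(radius),
    key: key,
  });
  return `${SEARCH_PLACES}?${params.toString()}`;
}

// Inside the route (sketch):
// app.get('/findPlaces', (req, res) => {
//   const url = buildSearchUrl({ ...req.query, key: process.env.PLACES_API_KEY });
//   https.get(url, (apiRes) => { /* ... */ });
// });
```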

Node.js - Why does my HTTP GET Request return a 404 when I know the data is there @ the URL I am using

I'm still new enough to Node that HTTP requests trip me up. I have checked all the answers to similar questions, but none seem to address my issue.
I have been dealt the task of going after JSON files in an API. I then parse those JSON files to split them into rows that populate a SQL database. The API has one JSON file with an ID of 'keys.json' that looks like this:
{
  "keys": ["5sM5YLnnNMN_1540338527220.json", "5sM5YLnnNMN_1540389571029.json", "6tN6ZMooONO_1540389269289.json"]
}
Each element of the keys array holds the file name of one of the JSON data files in the API.
I am having problems getting either type of file returned to me, but I figure if I can learn what is wrong with the way I am trying to get 'keys.json', I can leverage that knowledge to get the individual JSON data files represented in the keys array.
I am using the npm modules 'request' and 'request-promise-native' as follows:
const request = require('request');
const rp = require('request-promise-native');
My URL is constructed from the following elements (I have used ... to keep my client anonymous, but other than that it is a direct copy):
let baseURL = 'http://localhost:3000/Users/doug5solas/sandbox/.../server/.quizzes/'; // this is the development value only
let keysID = 'keys.json';
Clearly the localhost aspect will have to go away when we deploy but I am just testing now.
Here is my HTTP call:
let options = {
  method: 'GET',
  uri: baseURL + keysID,
  headers: {
    'User-Agent': 'Request-Promise'
  },
  json: true // Automatically parses the JSON string in the response
};
rp(options)
  .then(function (res) {
    jsonKeysList = res.keys;
    console.log('Fetched', jsonKeysList);
  })
  .catch(function (err) {
    // API call failed
    let errMessage = err.options.uri + ' ' + err.statusCode + ' Not Found';
    console.log(errMessage);
    return errMessage;
  });
Here is my console output:
http://localhost:3000/Users/doug5solas/sandbox/.../server/.quizzes/keys.json 404 Not Found
It is clear to me that the .catch() clause is being taken, not the .then() clause. But I do not know why, because the data is at that location. I know it is, because I placed it there manually.

Thanks to @Kevin B for the tip about serving static files. I revamped the logic to serve the file with express.static, and everything worked as expected.
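The underlying reason for the 404 is most likely that a path on http://localhost:3000 is not a filesystem path: the server only answers the routes and static directories it has been told to serve, so the /Users/... path simply did not exist as far as it was concerned. The answer's fix was express.static (roughly app.use(express.static(dir)) in Express). The sketch below is not the answerer's code but a stand-in that uses only Node's built-in http, fs and path modules to show the same idea of mapping a URL path onto a file under one root directory; createStaticServer is a hypothetical name.

```javascript
const http = require('http');
const fs = require('fs');
const path = require('path');

// Stand-in for express.static: map the URL path onto a file inside rootDir
// and send it back, or answer 404 if there is no such file.
function createStaticServer(rootDir) {
  const root = path.resolve(rootDir);

  return http.createServer((req, res) => {
    let urlPath;
    try {
      urlPath = decodeURIComponent(new URL(req.url, 'http://localhost').pathname);
    } catch (e) {
      res.writeHead(400);
      return res.end('Bad Request');
    }

    // Resolve inside root and refuse anything that escapes it (e.g. "..").
    const filePath = path.join(root, urlPath);
    if (filePath !== root && !filePath.startsWith(root + path.sep)) {
      res.writeHead(403);
      return res.end('Forbidden');
    }

    fs.readFile(filePath, (err, data) => {
      if (err) {
        // Missing file (or a directory): the same 404 the question saw.
        res.writeHead(404);
        return res.end('Not Found');
      }
      const type = filePath.endsWith('.json') ? 'application/json' : 'application/octet-stream';
      res.writeHead(200, { 'Content-Type': type });
      res.end(data);
    });
  });
}
```

With the .quizzes directory as rootDir and the server listening on port 3000, keys.json would be requested as http://localhost:3000/keys.json rather than under the /Users/... path.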

how to restrict making http calls from aws lambda

I am creating an application that takes Node.js code from the user and creates a Lambda function on the fly from that code.
For example, the code could be:
var http = require('http');
exports.handler = function(event, context) {
  console.log('start request to ' + event.url)
  http.get('http://##someapi', function(res) {
    console.log("Any Response : " + res.statusCode);
  }).on('error', function(e) {
    console.log("Error from API : " + e.message);
  });
  console.log('end request to ' + event.url)
  context.done(null);
}
But I want to somehow prevent that code from making HTTP/HTTPS calls, as I have no control over what code the user will pass in.
Is there any way to restrict that, such as some sort of role, policy, or other configuration?
I am able to restrict DynamoDB access by specifying a policy in the role, so I have control over DB access but not over HTTP calls.

Simply prepend the user's code with the following:
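(function () {
  function onlyAWS(module) {
    // Escaped dot, anchored to a label boundary so "evilamazonaws.com" is rejected
    var isAWS = /(^|\.)amazonaws\.com$/i;
    var origRequest = module.request;
    function restrictedRequest(opts) {
      // Accept a URL string or URL object as well as an options object
      var target = (typeof opts === 'string' || opts instanceof URL) ? new URL(String(opts)) : opts;
      var host = String(target.hostname || target.host || '').replace(/:\d+$/, '');
      if (!isAWS.test(host)) {
        throw new Error('No HTTP requests allowed');
      }
      return origRequest.apply(module, arguments);
    }
    module.request = restrictedRequest;
    // http.get/https.get call the internal request function, not module.request,
    // so they must be wrapped as well or they bypass the check
    module.get = function restrictedGet() {
      var req = restrictedRequest.apply(null, arguments);
      req.end();
      return req;
    };
  }
  onlyAWS(require('http'));
  onlyAWS(require('https'));
  // Note: this only covers http/https; net, tls and child_process are untouched
})();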

One alternative would be to put these Lambda functions in a VPC with restricted outbound access.

It sounds like a funny solution, but I found a simple solution to my problem: I add the code below along with the code entered by the user.
var require = function() {
  return "You are not allowed to do this operation";
}
Now if the user tries to include any module, such as require('http'), the call just returns that string, so the http module is never loaded in the Node code.
Using this solution, I am able to block loading any library that I don't want the user to use in the AWS Lambda function.
I am still searching for a proper solution instead of using this hack in the code.

check on server side if youtube video exist

How to check whether a YouTube video exists, on the server side of a Node.js app:
var youtubeId = "adase268_";
// pseudo code
youtubeVideoExist = function (youtubeId) {
  return true; // if youtube video exists
}

You don't need to use the YouTube API per se; you can look for the thumbnail image:
Valid video = 200 - OK:
http://img.youtube.com/vi/gC4j-V585Ug/0.jpg
Invalid video = 404 - Not found:
http://img.youtube.com/vi/gC4j-V58xxx/0.jpg
I thought I could make this work from the browser, since you can load images from a third-party site without security problems. But in testing, it fails to report the 404 as an error, probably because the response body is still a valid image. Since you're using Node, you should be able to look at the HTTP response code directly.

I can't think of an approach that doesn't involve making a separate HTTP request to the video link to see whether it exists, unless you know beforehand a set of video IDs that are inactive, dead, or wrong.
Here's an example of something that might work for you. I can't readily tell whether you're using this as a standalone script or as part of a web server. The example below assumes the latter: you call a web server on /video?id=123videoId and have it respond or do something depending on whether the video with that ID exists. It uses the request library from npm, which you can install with npm install request:
var request = require('request');
// Your route here. Example of what the route may look like if called on /video?id=123videoId
app.get('/video', function (req, res) {
  var videoId = 'adase268_'; // Could change to something like req.query.id
  request.get('https://www.youtube.com/watch?v=' + encodeURIComponent(videoId), function (error, response, body) {
    if (error) {
      // Network/DNS failure: response is undefined here, so handle this first.
      return;
    }
    if (response.statusCode === 404) {
      // Video doesn't exist. Do what you need to do here.
    } else {
      // Video exists.
      // Can handle other HTTP response codes here if you like.
    }
  });
});
// You could refactor the above to take out the 'request.get()', wrap it in a function
// that takes a callback and re-use it in multiple routes, depending on your problem.

@rodrigomartell is on the right track, in that your check function will need to make an HTTP call; however, just checking the youtube.com URL won't work in most cases. You'll get back a 404 if the video ID is malformed (i.e. fewer than 11 characters or using characters not valid in their scheme), but if it's a properly formed video ID that just happens not to correspond to a video, you'll still get back a 200. It would be better to use an API request, like this (note that it might be easier to use the request-json library instead of just the request library):
var request = require('request-json');
var client = request.newClient('https://www.googleapis.com/youtube/v3/');
// The lookup is asynchronous, so the result is passed to a callback;
// returning true/false from inside client.get's callback would be lost.
var youtubeVideoExist = function (youtubeId, callback) {
  var apikey = 'YOUR_API_KEY'; // register for a JavaScript API key at the Google Developers Console ... https://console.developers.google.com/
  client.get('videos/?part=id&id=' + encodeURIComponent(youtubeId) + '&key=' + apikey, function (err, res, body) {
    if (err) {
      return callback(err);
    }
    callback(null, Boolean(body.items && body.items.length)); // true if the video exists
  });
};

Using the youtube-feeds module. It works fast (~200ms) and doesn't need an API_KEY:
var youtube = require("youtube-feeds");
var existsFunc = function (youtubeId, callback) {
  youtube.video(youtubeId, function (err, result) {
    // Treat an error, a missing result or a different id as "does not exist"
    var exists = Boolean(!err && result && result.id === youtubeId);
    console.log("youtubeId");
    console.log(youtubeId);
    console.log("exists");
    console.log(exists);
    callback(exists);
  });
};
var notExistentYoutubeId = "y0srjasdkfjcKC4eY";
existsFunc(notExistentYoutubeId, console.log);
var existentYoutubeId = "y0srjcKC4eY";
existsFunc(existentYoutubeId, console.log);
output:
❯ node /pathToFileWithCodeAbove/FileWithCodeAbove.js
youtubeId
y0srjcKC4eY
exists
true
true
youtubeId
y0srjasdkfjcKC4eY
exists
false
false

All you need to do is look for the thumbnail image. In Node.js it would be something like this:
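var http = require('http');
// Calls back with true if the thumbnail exists (HTTP 200), false otherwise
function isValidYoutubeID(youtubeID, callback) {
  var options = {
    method: 'HEAD',
    host: 'img.youtube.com',
    path: '/vi/' + encodeURIComponent(youtubeID) + '/0.jpg'
  };
  var req = http.request(options, function (res) {
    res.resume();
    if (res.statusCode == 200) {
      console.log("Valid Youtube ID");
      callback(true);
    } else {
      console.log("Invalid Youtube ID");
      callback(false);
    }
  });
  // Network errors are reported as "invalid" here; adjust if you need to tell them apart
  req.on('error', function () {
    callback(false);
  });
  req.end();
}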
No API_KEY is needed. It is quite fast because only the headers are checked for status code 200/404; the image itself is not downloaded.
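Since a malformed ID can be rejected without any request (as noted in the request-json answer), a cheap format check can skip the network call entirely for IDs that cannot be valid. A minimal sketch; the 11-character [A-Za-z0-9_-] format is an assumption based on commonly observed IDs, not a documented YouTube contract, and looksLikeYoutubeId/thumbnailUrl are hypothetical names. A well-formed ID can still be missing, so this only complements the HEAD check above.

```javascript
// Assumption: video IDs are 11 characters drawn from A-Z, a-z, 0-9, '_' and '-'.
const YOUTUBE_ID_RE = /^[A-Za-z0-9_-]{11}$/;

function looksLikeYoutubeId(id) {
  return typeof id === 'string' && YOUTUBE_ID_RE.test(id);
}

// Build the thumbnail URL checked by the HEAD request, refusing malformed IDs
// up front so no request is made for them.
function thumbnailUrl(id) {
  if (!looksLikeYoutubeId(id)) {
    throw new Error('Malformed YouTube ID: ' + id);
  }
  return 'http://img.youtube.com/vi/' + id + '/0.jpg';
}
```

For example, the question's "adase268_" has only 9 characters and would be rejected here, while "gC4j-V58xxx" passes the format check and still needs the HTTP request to find out that it does not exist.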
