How to check if two URLs lead to the same path? - node.js

I'm building a URL shortener, and I've decided to recycle the short IDs where possible to save space in my database. How can I check if two URLs lead to the same path?
For example, let's say a user generates a short URL for https://google.com/.
My app generates the following short id: jkU3
So if this user visits https://tiny.url/jkU3 my express server will redirect the visitor to https://google.com/.
This works like a charm, but now let's imagine another person visits https://tiny.url/ and generates a short URL for https://google.com. Then another one generates a short URL for https://www.google.com/, and yet another generates one for https://www.google.com. You get the point.
So far my app would have wasted 4 short IDs.
How can I prevent this from happening? Is there a regex for this?
This is the current code I have for generating short URLs:
app.post("/", (req: Request, res: Response) => {
  const shortUrl: string = nanoid(4);
  const destination: string = req.body.destination;
  UrlSchema.create({
    _id: mongoose.Types.ObjectId(),
    origin: shortUrl,
    destination: destination,
  }).then(() => {
    // Unique Id
    res.json(shortUrl);
  });
});

Before creating a new entry, you can check whether the destination already exists:
const existing = await UrlSchema.findOne({ destination: req.body.destination });
if (!existing) {
  // create new
} else {
  // return same
}
This way you only create a new entry when the destination does not already exist. You can also remove a trailing slash (/), if present, so that URLs match more reliably.
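The lookup-before-create idea can be sketched without a database. In the sketch below, an in-memory Map stands in for the UrlSchema collection, and byDestination, nextId, stripTrailingSlash and findOrCreate are hypothetical names used only for illustration (the question generates IDs with nanoid(4)).

```javascript
// Hypothetical in-memory stand-in for the UrlSchema collection:
// maps a destination URL to its short ID.
const byDestination = new Map();

// Hypothetical ID generator; the question uses nanoid(4) instead.
let counter = 0;
function nextId() {
  counter += 1;
  return "id" + counter;
}

// Remove a single trailing slash so "https://a.com/" and "https://a.com" match.
function stripTrailingSlash(url) {
  return url.replace(/\/$/, "");
}

// Return the existing short ID for a destination, or create a new one.
function findOrCreate(destination) {
  const key = stripTrailingSlash(destination);
  const existing = byDestination.get(key);
  if (existing !== undefined) {
    return existing; // reuse the short ID
  }
  const shortId = nextId();
  byDestination.set(key, shortId);
  return shortId;
}
```

With a real database, two concurrent requests could both miss the lookup and insert duplicates, so a unique index on the destination field is a common extra safeguard.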

You've listed four slightly different URLs:
https://www.google.com
https://google.com
https://www.google.com/
https://google.com/
None of these are technically the same HTTPS request, though it sounds like you want to treat the / at the end as optional, so that it does not make a different target URL.
The www. and non-www. variants are not guaranteed to be the same host. For google.com and www.google.com, they are the same host, but this is not guaranteed to be the case for all possible hosts.
If you want to assume that these four are all to the same host no matter what the domain is, then you just have to normalize the URL before you put it in your database and then before assigning a new shortened ID, you search the database for the normalized version of the URL.
In this case, you would remove the www. and remove any trailing slash to create the normalized version of the URL.
function normalizeUrl(url) {
  // remove "www." if at first part of hostname
  // remove trailing slash
  return url.replace(/\/\/www\./, "//").replace(/\/$/, "");
}
Once you've normalized the URL, you search for the normalized URL in your database. If you find it, you use the existing shortener for it. If you don't find it, you add the normalized version to your database with a newly generated shortId.
Here's a demo:
function normalizeUrl(url) {
  // remove "www." if at first part of hostname
  // remove trailing slash
  return url.replace(/\/\/www\./i, "//").replace(/\/$/, "");
}
const testUrls = [
  "https://www.google.com",
  "https://www.google.com/",
  "https://google.com",
  "https://google.com/",
];
for (const url of testUrls) {
  console.log(normalizeUrl(url));
}
FYI, since hostnames in DNS are not case sensitive, you may also want to force the hostname to lower case to normalize it. Path names or query parameters could be case sensitive (sometimes they are and sometimes they are not).
To include the host case sensitivity normalization, you could use this:
function normalizeUrl(url) {
  // remove "www." if at first part of hostname
  // remove trailing slash
  // lowercase host name
  return url.replace(/\/\/www\./i, "//").replace(/\/$/, "").replace(/\/\/([^/]+)/, function(match, p1) {
    // console.log(match, p1);
    return "//" + p1.toLowerCase();
  });
}
const testUrls = [
  "https://www.google.com",
  "https://www.google.com/",
  "https://google.com",
  "https://google.com/",
  "https://WWW.google.com",
  "https://www.Google.com/",
  "https://GOOGLE.com",
  "https://google.COM/",
  "https://www.Google.com/xxx", // this should be unique
  "https://google.COM/XXX", // this should be unique
];
for (const url of testUrls) {
  console.log(normalizeUrl(url));
}
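As an alternative to regex replacement on the raw string, the same normalization can be sketched with the built-in URL class (available as a global in Node.js and browsers), which parses the URL and lowercases the hostname for http and https URLs. The function name normalizeWithUrlClass is hypothetical, and the sketch assumes you also want to drop the fragment (#...), which the original answer does not discuss.

```javascript
// A sketch using the WHATWG URL parser instead of string regexes.
// Assumptions: "www." is treated as optional, trailing slashes are
// dropped, and the fragment is discarded.
function normalizeWithUrlClass(input) {
  const u = new URL(input); // throws a TypeError on invalid input
  // u.host is already lowercased by the parser for http/https URLs
  const host = u.host.replace(/^www\./, "");
  // The path and query keep their case, since they may be case sensitive
  const path = u.pathname.replace(/\/+$/, "");
  return `${u.protocol}//${host}${path}${u.search}`;
}

console.log(normalizeWithUrlClass("https://WWW.Google.com/"));
console.log(normalizeWithUrlClass("https://google.COM/XXX"));
```

Because the parser throws on malformed input, this also gives you a cheap validity check before anything is written to the database.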

Related

Serve different origin as DNS CNAME value, depending on the IP Geolocation

I have a Next.js application that is deployed with Lambda@Edge and CloudFront, with S3 static files as the origin. The AWS CloudFront URL is the value of the example-domain.com CNAME.
Now I have a lot of changes to the application and I want to deploy them.
I don't want these changes to override the current deployment. Instead, I would like to create a new environment that is available in only a few countries.
So in this case I would have 2 different CloudFront URLs:
oldfeatures.cloudfront.net
newfeatures.cloudfront.net
A different origin URL should be served depending on the geolocation.
I am not sure that this is the right approach, but I am open to suggestions.
I am managing the domain settings in Cloudflare; all the rest is in the AWS environment.
How can I achieve this without making changes to the application's code?
The easiest (and most cost-effective) option would be to create a Cloudflare (CF) Worker and attach it to your subdomains. It is probably better to attach it to both subdomains, so that users can't access the newfeatures.cloudfront.net URL manually if they are 'not allowed' to. The second option would be to use CF Traffic Steering, which comes with the CF Load Balancers feature. This option comes at a small cost and might not really fit your use case: it redirects based on regions rather than countries, and I believe you'd need to fork out more money and get an enterprise account for country-based steering.
That said, the following option, which uses a CF Worker script, requires you to have CF Proxy (the orange cloud) turned on.
Here are 3 options to satisfy your requirements. I recommend option 1, based on my understanding of your question.
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})
async function handleRequest(request) {
  return setRedirect(request) //Change to appropriate function
}
// Option 1
// Return same newfeatures URL for specific countries based on a country list
// Useful if single URL applies for many countries - I guess that's your case
async function setRedirect(request) {
  const country = request.cf.country
  var cSet = new Set(["US",
    "ES",
    "FR",
    "CA",
    "CZ"]);
  if (cSet.has(country)) {
    return Response.redirect('https://newfeatures.cloudfront.net', 302);
  }
  // Else return oldfeatures URL for all other countries
  else {
    return await fetch(request)
  }
}
// Option 2
const countryMap = {
  US: "https://newfeatures1.cloudfront.net", //URL1
  CA: "https://newfeatures2.cloudfront.net", //URL2
  FR: "https://newfeatures3.cloudfront.net" //URL3
}
// Return different newfeatures URL for specific countries based on a map
// Useful if more than two URLs
async function mapRedirect(request) {
  const country = request.cf.country
  if (country != null && country in countryMap) {
    const url = countryMap[country]
    return Response.redirect(url, 302)
  }
  // Else return old features for all other countries
  else {
    return await fetch(request)
  }
}
// Option 3
// Return single newfeatures URL for a single country.
// Much simpler one if you only have one specific country to redirect.
async function singleRedirect(request) {
  const country = request.cf.country
  if (country == 'US'){
    return Response.redirect('https://newfeatures.cloudfront.net', 302)
  }
  // Else return oldfeatures URL for all other countries
  else {
    return await fetch(request)
  }
}
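The country decision in option 1 can also be kept in a small pure function, which makes it possible to test the routing table outside the Worker runtime. This is only a sketch: pickRedirect, NEW_ORIGIN and NEW_FEATURE_COUNTRIES are hypothetical names, and the country list is the one from option 1.

```javascript
// Hypothetical constants mirroring option 1 above.
const NEW_ORIGIN = "https://newfeatures.cloudfront.net";
const NEW_FEATURE_COUNTRIES = new Set(["US", "ES", "FR", "CA", "CZ"]);

// Return the redirect target for a country code, or null to fall
// through to the old deployment. Tolerates a missing country value.
function pickRedirect(country) {
  if (typeof country === "string" &&
      NEW_FEATURE_COUNTRIES.has(country.toUpperCase())) {
    return NEW_ORIGIN;
  }
  return null;
}

// Inside the Worker, the handler would then reduce to something like:
//   const target = pickRedirect(request.cf && request.cf.country);
//   return target ? Response.redirect(target, 302) : fetch(request);
```

Keeping the lookup separate from the fetch handling means the country list can be changed and checked without deploying the Worker first.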

Case-insensitive router.base with nuxt.js and IIS

If I generate a site with nuxt, specifying router.base in nuxt.config.js then host it in an IIS server, all works as expected if the requested URL path starts with the specified router.base exactly.
Say, for example router.base = "/Foo/" and the page is named "Bar" then http://example.com/Foo/Bar loads just fine. So does http://example.com/Foo/bar, so it does not appear that page names are case sensitive.
However, if I use a different case for the router.base portion of the URL, then the page loads, but it appears that page lifecycle methods (such as data(), created(), head(), etc.) do not run, although the layout mounted() method does load. This would happen if I were to use a URL like http://example.com/foo/Bar in the previous example.
So my question is... is there a way to run page lifecycle methods when the case of the URL path differs from the router.base value?
This doesn't really fix the issue, but it seems one can redirect the page early in the loading process to a page with a path that starts with the router.base exactly. So in /layouts/default.vue, I now have:
export default {
  mounted () {
    let escapeRegExp = function(string) {
      return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    }
    if (this.$route.fullPath.indexOf(this.$router.options.base) == -1 &&
        this.$route.fullPath.toLocaleLowerCase().indexOf(this.$router.options.base.toLowerCase()) == 0) {
      let base = this.$router.options.base
      let regex = new RegExp("(//[^/]+)"+escapeRegExp(base),"i")
      let newLoc = document.location.href.replace(regex,"$1"+base)
      if (newLoc != document.location.href) {
        document.location.replace(newLoc)
      }
    }
  }
}
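The URL rewrite itself can be isolated into a pure function and checked outside the browser. The sketch below follows the same regex idea as the workaround above; fixBaseCase is a hypothetical name, and the example URL and base are the ones from the question.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Rewrite href so that the path segment matching `base` (ignoring case)
// uses the exact case of `base`. Returns href unchanged if the path
// does not start with base at all.
function fixBaseCase(href, base) {
  const regex = new RegExp(
    "^([a-z][a-z0-9+.-]*://[^/]+)" + escapeRegExp(base),
    "i"
  );
  // A replacer function avoids "$" in base being read as a replacement pattern.
  return href.replace(regex, (match, origin) => origin + base);
}

console.log(fixBaseCase("http://example.com/foo/Bar", "/Foo/"));
```

In the layout, the result would be compared with document.location.href, and document.location.replace would only be called when the two differ.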

MVC5 Rewriting Routing Attribute - Default page

I am attempting to switch from RouteConfig to Routing Attributes.
I am following along the Pro ASP.NET MVC 5 book from Adam Freeman and I'm trying to convert the following code that handles the paging of clients.
routes.MapRoute(
    name: null,
    url: "{controller}/Page{page}",
    defaults: new { action = "Index", status = (string)null },
    constraints: new { page = @"\d+" }
);
This works great! As I go to different URLs, the links look very nice
http://localhost:65534/Client - Default page
http://localhost:65534/Client/Page2 - Second page
Now I've decided to try out routing attributes, and I'm having some problems with how 'pretty' the links are. All of the links are working fine, but it's the 'routing rewriting' that I am trying to fix.
Here are the important parts of my controller.
[RoutePrefix("Client")]
[Route("{action=index}/{id:int?}")]
public class ClientController : Controller {
[Route("Page{page:int?}")]
public ActionResult Index(string sortOrder, string search = null, int page = 1) {
With the attribute above the Index, going to /Client or to /Client/Page gives me a 404.
Adding a blank route to catch the default page
[Route("Page{page:int?}")]
[Route]
Works for /Client and /Client/Page3, but now the rewriting of the URL is messed up. Clicking on page 3 of the pager gives me a URL of
http://localhost:65534/Client?page=3
which is not what I want. Changing the routing to
[Route("Page{page:int?}")]
[Route("{page=1:int?}")]
Works almost 100%, but the default link for /Client is now
http://localhost:65534/Client/Page
So, I am now asking for help. How can I correctly convert the original MapRoute to the attributes?
Just use:
[Route("", Order = 1)]
[Route("Page{page:int}", Order = 2)]
UPDATE
Plainly and simply, the routing framework is dumb. It doesn't make decisions about which route is the most appropriate, it merely finds a matching route and returns. If you do something like:
Url.Action("Index", "Client", new { page = 1 })
You're expecting the generated URL to be /Client/Page1, but since you have a route where page is essentially optional, it always will choose that route and append anything it can't stuff into the URL as a querystring, i.e. /Client?page=1. The only way to get around this is to actually name the route you want and use that named route to generate the URL. For example:
[Route("", Order = 1)]
[Route("Page{page:int}", Name = "ClientWithPage", Order = 2)]
And then:
Url.RouteUrl("ClientWithPage", new { page = 1 })
Then, you'll get the route you expect because you're directly referencing it.
UPDATE #2
I'm not sure what you mean by "go into PagedList.MVC and add a name property to it". It doesn't require any core changes to the code because PagedList already has support for custom page links. Just change your pager code to something like:
@Html.PagedListPager((IPagedList)ViewBag.OnePageOfItems, page => Url.RouteUrl("ClientWithPage", new { page = page }))
And you'll get the URL style you want. Attribute routing can be a bit more finicky than traditional routing, but I'd hardly call it useless. It's far more flexible than traditional routing, but that flexibility has some costs.

Check URL exists in ApplicationContentUriRules

I have an app that is loading URLs into a WebView (x-ms-webview). When the user makes the request, I would like to compare the URL they are trying to load with the "whitelisted" URLs in the ApplicationContentUriRules and warn them if it is not there. Any ideas how I might accomplish this?
There isn't a direct API for pulling information out of the manifest, but there are options.
First, you can just maintain an array of those same URIs in your code, because to change them you'd have to change the manifest and update your package anyway, so you would update the array to match. This would make it easy to check, but increase code maintenance.
Such an array would also let you create a UI in which the user can enter only URIs that will work, e.g. you can offer possibilities from a drop-down list instead of letting the user enter anything.
Second, you can read the manifest XML into a document directly and parse through it to get to the rules. Here's some code that will do that:
var uri = new Windows.Foundation.Uri("ms-appx:///appxmanifest.xml");
var doc;
Windows.Storage.StorageFile.getFileFromApplicationUriAsync(uri).then(function (file) {
    return Windows.Storage.FileIO.readTextAsync(file);
}).done(function (text) {
    doc = new Windows.Data.Xml.Dom.XmlDocument();
    doc.loadXml(text);
    var acur = doc.getElementsByTagName("ApplicationContentUriRules");
    // Guard against an empty result as well as null before indexing
    if (acur !== null && acur.length > 0) {
        var rules = acur[0].getElementsByTagName("Rule");
        for (var i = 0; i < rules.length; i++) {
            console.log(rules[i].getAttribute("Match") + " - " + rules[i].getAttribute("Type"));
        }
    }
});
You could probably just get "Rule" tags directly from the root doc, because I don't think anything else in the manifest uses that kind of node, but to future-proof it's better to get the ApplicationContentUriRules first. As you can see, the "Match" attribute is what holds the URI for that rule, but you also need to make sure that "Type" is "include" and not "exclude".
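Once the rules are in an array, checking a requested URL against them could look like the sketch below. It makes several assumptions that are not confirmed by the manifest documentation quoted here: that a `*` wildcard stands for part of a single host or path segment, that matching is by prefix and case-insensitive, and that an "exclude" rule wins over an "include" rule. The names patternToRegExp and isAllowed, and the sample rules, are hypothetical.

```javascript
// Turn a rule pattern into a RegExp. Assumption: "*" matches any run of
// characters that does not cross a "/" boundary.
function patternToRegExp(pattern) {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape everything except "*"
    .replace(/\*/g, "[^/]*");
  return new RegExp("^" + escaped, "i");
}

// rules: array of { match, type } objects, as read from the "Match" and
// "Type" attributes. Assumption: any matching exclude rule rejects the URL.
function isAllowed(url, rules) {
  let included = false;
  for (const rule of rules) {
    if (patternToRegExp(rule.match).test(url)) {
      if (rule.type === "exclude") {
        return false;
      }
      if (rule.type === "include") {
        included = true;
      }
    }
  }
  return included;
}

// Hypothetical sample rules for illustration only.
const sampleRules = [
  { match: "https://*.example.com/", type: "include" },
  { match: "https://private.example.com/", type: "exclude" },
];
console.log(isAllowed("https://www.example.com/page", sampleRules));
```

The actual matching performed by the WebView may differ from this approximation, so treat the result as a hint for warning the user rather than as a guarantee.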

Why doesn't this URL scheme match my URL?

I was filtering URLs for pageAction with hostSuffix according to this SO answer: How to show Chrome Extension on certain domains?
var onWebNav = function(details){ if (details.frameId === 0) chrome.pageAction.show(details.tabId); };
var filter = { url: [{ hostSuffix: "reddit.com" }] };
chrome.webNavigation.onCommitted.addListener(onWebNav, filter);
chrome.webNavigation.onHistoryStateUpdated.addListener(onWebNav, filter);
But if I want a pattern, like http://*.reddit.com/r/*/comments/*, I should use schemes instead of hostSuffix, right? According to https://developer.chrome.com/extensions/match_patterns this should match everything:
var filter = { url: [{ schemes: ["http://*/*"] }] };
but it doesn't match even this URL http://www.reddit.com/r/IAmA/comments/z1c9z/.
Am I using the schemes filter in the wrong way?
I don't think that is a valid scheme. A scheme is what identifies the protocol used; it shouldn't have any wildcards.
The match_patterns link describes a fully built-up pattern, consisting of 3 parts, including a scheme. This is a different system from the one you're trying to use, which is based on matching different parts of a URL separately. The patterns link does give you info on valid schemes, in the basic syntax table:
<scheme> := '*' | 'http' | 'https' | 'file' | 'ftp'
Try var filter = { url: [{ schemes: ["http"] }] }; instead. This should match any URL that goes over plain old HTTP.
To achieve your desired matching, the best thing would be to use either the originAndPathMatches or urlMatches filters, since you need the regular expression for matching the host and the specific syntax of the path.
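As a rough sketch of the urlMatches route, the pattern http://*.reddit.com/r/*/comments/* from the question could be expressed as a regular expression string. The exact expression below is an assumption about what you want to match (it also accepts the bare reddit.com host), and since urlMatches uses RE2 syntax, the local RegExp check is only an approximation of how Chrome will evaluate it.

```javascript
// Regex equivalent of http://*.reddit.com/r/*/comments/* (an assumption
// about the intended matches, including the bare reddit.com host).
const commentsPattern =
  "^http://([^/]+\\.)?reddit\\.com/r/[^/]+/comments/.*";

// Filter object in the shape used by the question's code.
const commentsFilter = { url: [{ urlMatches: commentsPattern }] };

// In the extension this would be passed as the second argument, e.g.:
//   chrome.webNavigation.onCommitted.addListener(onWebNav, commentsFilter);

// Quick local check of the expression against the URL from the question.
const commentsRegExp = new RegExp(commentsPattern);
console.log(commentsRegExp.test("http://www.reddit.com/r/IAmA/comments/z1c9z/"));
```

To cover https as well, the scheme part of the expression would need to become https? instead of http.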

Resources