Instagram Link Sticker redirect does not decode URL parameters and leads to 404

I created a Link Sticker for an Instagram story and used UTM parameters to track the sources. I customized the Link Sticker text to contain only a human-readable word.
I got some messages from users that the link is not working and redirects to a 404 page on my website.
After analyzing the access logs, I noticed that some visitors arrived at the website with encoded URL parameters.
50 out of 58 visitors came correctly to:
/de/example.html?utm_source=instagram&utm_medium=Social&utm_campaign=story&utm_content=example
8 out of 58 (more than 10%!) visitors came incorrectly to:
/de/example.html%3Futm_source%3Dinstagram%26utm_medium%3DSocial%26utm_campaign%3Dstory%26utm_content%3Dexample
The latter leads to the 404 page.
What differs between those groups is the referrer of the request.
For the correct first group, it is always: https://l.instagram.com/
For the incorrect second group, it is always: https://www.instagram.com/
I don't see any obvious differences in the user agents; both groups include iPhones and Android phones of different types and versions.
Has anyone else noticed this issue, or is it documented somewhere?
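For reference, a quick check (a minimal Python sketch using only the two paths above) confirms that the failing path is simply the percent-encoded form of the working URL, i.e. the whole query string was encoded into the path:

# Percent-decoding the broken path yields the working URL, so the query
# string was encoded into the path instead of being passed through as a
# real query string.
from urllib.parse import unquote

good = "/de/example.html?utm_source=instagram&utm_medium=Social&utm_campaign=story&utm_content=example"
bad = "/de/example.html%3Futm_source%3Dinstagram%26utm_medium%3DSocial%26utm_campaign%3Dstory%26utm_content%3Dexample"

print(unquote(bad) == good)  # True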

Related

IIS site returns a 403 response on every page for certain browsers (by user agent string)

I have a WordPress site running on IIS which returns a hard 403 response for certain user agent strings (IE 6–10, and Firefox 33 on Mac and Windows). After some poking around by manually changing the user agent string, I determined that the strings Trident/4, Trident/5, and Firefox/3 all cause this site to exhibit this behavior. There might be other combinations, but clearly something is going on at either the code or the IIS level.
I've scanned the code at a high level and found some mentions of both Firefox and Trident, all related to user agent sniffing, but they appear to be in core WordPress files and not app-specific code. I've been searching all afternoon and the only advice I can find tells the user to adjust the "directory browsing" settings in web.config. However, I can replicate the behavior by directly accessing a static asset such as a CSS file. That tells me it is not only unrelated to directory browsing, but probably also has nothing to do with application code.
Can anyone offer insight into what might be happening here? To head off questions:
We just noticed this behavior a few weeks ago; we're unsure how long it's been going on.
I'll be checking access/error logs as soon as I can get them.
EDIT
Turns out that the previous developers had added some very specific URL rewriting rules for the site. They were explicitly returning a 403 for any user agent matching the patterns I listed above, along with a few other generic patterns and some specific bot names. I knew it had to be something at the web server level... we just had to poke around in IIS long enough to find the rules.
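For anyone debugging something similar, here is a rough sketch of the kind of check that narrows this down (Python with the requests package; the asset URL and user agent strings are placeholders, not taken from the actual site): request the same static asset with different User-Agent headers and compare the status codes.

# If a UA-based rewrite rule is in place, the first two requests should get
# a 403 while the last one gets a 200, even for a plain static file.
import requests

URL = "https://example.com/wp-content/themes/site/style.css"  # placeholder static asset
USER_AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)",       # contains Trident/4
    "Mozilla/5.0 (Windows NT 6.1; rv:33.0) Gecko/20100101 Firefox/33.0",     # contains Firefox/3
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0 Safari/537.36",
]

for ua in USER_AGENTS:
    resp = requests.get(URL, headers={"User-Agent": ua}, allow_redirects=False)
    print(resp.status_code, ua)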

Url shortener which displays shortlink in address bar?

I'm wondering if there is a URL shortener which, when clicked on, keeps displaying the shortened link in the address bar, as opposed to showing the original URL.
Thanks.
You could use a full-height iframe, but many sites use X-Frame-Options to forbid this (including major ones like Facebook and Google). Users would see an error message for these sites.
Ow.ly and others used to use this technique, but most have since ditched it for this reason. In general, it's now considered user-hostile and a Bad Idea™.
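If you did go the iframe route anyway, you would at least want to check whether the target forbids framing before showing it. A minimal sketch (Python with the requests package; note that modern sites may also use the CSP frame-ancestors directive, which this does not check):

import requests

def allows_framing(url):
    # DENY and SAMEORIGIN both mean a third-party shortener cannot frame the page.
    resp = requests.head(url, allow_redirects=True, timeout=10)
    xfo = resp.headers.get("X-Frame-Options", "").upper()
    return xfo not in ("DENY", "SAMEORIGIN")

print(allows_framing("https://www.facebook.com/"))  # typically False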

Block or redirect website page URLs using .htaccess

I am having issues with visitors following spam links to my site and hitting 404 errors.
My site was hacked: a hidden spam links folder on public_html redirected users to pornographic sites, and those links were plastered across the internet. I have since remedied the malware issue, but several hundred visitors are still hitting a 404 page because these links no longer exist, which is messing up my analytics accounts, using bandwidth, etc.
I have searched for a way to block anyone who tries to access these URL paths (so that they never hit my website), but I cannot possibly redirect every single link individually (there were over 2,000); I need a wildcard or something similar. My search led me here: Block Spam Referrer Traffic, and it is not quite the solution I need.
The searches go to pages like this: www.mywebsite.com/spampage/morespam/ (which have been deleted and now return 404 errors).
There are several iterations of the /spampage/ and /morespam/ URLs.
The referrer is generally a Google search, so I can't block the referrer using .htaccess. I'd like to somehow block www.mywebsite.com/spampage/*/ and all its iterations.
Apologies, I am by no means a programmer. I do appreciate any help that can be offered.
Update #1:
It seems that the best way may be to block these links/directories using the robots.txt file; I have done so and will report back if I have success!
Update #2:
Reporting back. I am new to all of this, so I was going about the solution the wrong way in my original question. Essentially, I found that I needed all of the links de-indexed, as they were generating all the traffic by being indexed by Google. I was able to request de-indexing of the directories in question manually through the Google Webmaster Tools account. One requirement for de-indexing was to have the site's robots.txt block the directories in question from being crawled. Once I did that, I submitted the request to remove the directories from the Google index. Google took those pages off in about 3 hours (thanks, Google!), so it was pretty quick once I found out the proper way to go about it. No .htaccess editing needed. Once the pages were no longer indexed, traffic went back down to normal levels, and my keywords, etc. should be back to normal.

Hacker (multiple IPs) attacking one page (lib.php) with a variable attached, what to do?

I have in my main website root the file...
lib.php
Hackers keep hitting my website with different IP addresses, different OSes, different everything. The page redirects to our 404 error page, and this 404 error page tracks visitors using standard visitor-tracking analytics to allow us to see problems as they arise.
Below is an example of the hackers' landing pages as shown in analytics, except that I get about 200 hits per hour. Each link is slightly different, as they attach a variable to the page URL.
mysite.com/lib.php?id=zh%2F78jQrm3qLoE53KZd2vBHtPFaYHTOvBijvL2NNWYE%3D
mysite.com/lib.php?id=WY%2FfNHaB2OBcAH0TcsAEPrmFy1uGMHgxmiWVqT2M6Wk%VD
mysite.com/lib.php?id=WY%2FfNHaB2OBcAH0TcsAEPrmFy1uGMHgxmiWVqJHGEWk%T%
mysite.com/lib.php?id=JY%2FfNHaB2OBcAH0TcsAEPrmFy1uGMHgxmiWVqT2MFGk%BD
I do not think I even need the file http://www.mysite.com/lib.php
Do I even need it? When I visit mysite.com/lib.php it redirects to my custom 404 page.
How can I best stop this? I am thinking of using .htaccess, but I'm not sure of the best setup.
This is most probably part of the Asprox botnet.
http://rebsnippets.blogspot.cz/asprox
The key thing is to change your passwords and stop using the FTP protocol to access your privileged accounts.

How to best normalize URLs

I'm creating a site that allows users to add Keyword --> URL links. I want multiple users to be able to link to the same url (exactly the same, same object instance).
So if user 1 types in "http://www.facebook.com/index.php" and user 2 types in "http://facebook.com" and user 3 types in "www.facebook.com" how do I best "convert" them to what these all resolve to: "http://www.facebook.com/"
The back end is in Python...
How does a search engine keep track of URLs? Does it keep a URL and then take whatever it resolves to, or does it toss URLs that differ from what they resolve to and only care about the resolved version?
Thanks!!!
So if user 1 types in "http://www.facebook.com/index.php" and user 2 types in "http://facebook.com" and user 3 types in "www.facebook.com" how do I best "convert" them to what these all resolve to: "http://www.facebook.com/"
You'd resolve user 3 by fixing up invalid URLs. www.facebook.com isn't a URL, but you can guess that http:// should go on the start. An empty path part is the same as the / path, so you can be sure that needs to go on the end too. A good URL parser should be able to do this bit.
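A minimal sketch of that fix-up step (Python's urllib.parse; defaulting to http:// and a "/" path is a guess on the parser's part, not something the URL itself tells you):

from urllib.parse import urlsplit, urlunsplit

def fix_up(raw):
    if "://" not in raw:
        raw = "http://" + raw              # bare hostnames get a guessed scheme
    scheme, netloc, path, query, fragment = urlsplit(raw)
    if not path:
        path = "/"                         # an empty path is equivalent to "/"
    return urlunsplit((scheme, netloc.lower(), path, query, fragment))

print(fix_up("www.facebook.com"))          # http://www.facebook.com/
print(fix_up("http://facebook.com"))       # http://facebook.com/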
You could resolve user 2 by making an HTTP HEAD request to the URL. If it comes back with a status code of 301, you've got a permanent redirect to the real URL in the Location response header. Facebook does this to send facebook.com traffic to www.facebook.com, and it's definitely something sites should be doing (even though in the real world many aren't). You might also consider allowing other redirect status codes in the 3xx family to do the same; it's not really the right thing to do, but some sites use 302 instead of 301 for the redirect because they're a bit thick.
If you have the time and network resources (plus more code to prevent the feature being abused to DoS you or others), you could also consider GETting the target web page and parsing it (assuming it turns out to be HTML). If there is a <link rel="canonical" href="..." /> element in the page, you should also treat that URL as being the proper one. (View Source: Stack Overflow does this.)
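A rough sketch of the redirect-following and canonical-link checks described above (Python with the third-party requests package; the regex is a naive stand-in for a proper HTML parser, and rate limiting/abuse protection is left out):

import re
import requests
from urllib.parse import urljoin

def resolve_canonical(url):
    # Follow permanent redirects: a 301 Location header is the site telling us
    # the preferred URL (e.g. facebook.com -> www.facebook.com).
    resp = requests.head(url, allow_redirects=False, timeout=10)
    while resp.status_code == 301 and "Location" in resp.headers:
        url = urljoin(url, resp.headers["Location"])
        resp = requests.head(url, allow_redirects=False, timeout=10)

    # Optionally fetch the page and honour a <link rel="canonical" href="..."> hint.
    page = requests.get(url, timeout=10)
    if "html" in page.headers.get("Content-Type", ""):
        match = re.search(
            r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)["\']',
            page.text, re.IGNORECASE)
        if match:
            url = urljoin(url, match.group(1))
    return url

print(resolve_canonical("http://facebook.com/"))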
However, unfortunately, user 1's case cannot be resolved. Facebook is serving a page at / and a page at /index.php, and though we can look at them and say they're the same, there is no technical method to describe that relationship. In an ideal world Facebook would include either a 301 redirect response or a <link rel="canonical" /> to tell people that / was the proper format URL to access a particular resource rather than /index.php (or vice versa). But they don't, and in fact most database-driven web sites don't do this yet either.
To get around this, some search engines(*) compare the content at different [sub]domains, and to a limited extent also different paths on the same host, and guess that they're the same if the content is sufficiently similar. Of course this is a lot of work, requires a lot of storage and processing, and is ultimately not terribly reliable.
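A toy illustration of that content-comparison idea (Python's difflib; real engines use far more robust similarity measures, and the 0.9 threshold here is arbitrary):

import difflib
import requests

def probably_same_page(url_a, url_b, threshold=0.9):
    # Fetch both pages and compare their raw HTML; a high similarity ratio
    # suggests (but does not prove) that they serve the same content.
    a = requests.get(url_a, timeout=10).text
    b = requests.get(url_b, timeout=10).text
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold

print(probably_same_page("http://www.facebook.com/", "http://www.facebook.com/index.php"))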
I wouldn't really bother with much of this, beyond fixing up URLs like in the user 3 case. From your description it doesn't seem that essential that pages that “are the same” have to share actual identity, unless there's a particular use-case you haven't mentioned.
(*: well, Google anyway; more traditional ones traditionally didn't and would happily serve up multiple links for the same page, but I'd assume the other majors are doing something similar now.)
There's no way to know, other than "magic" knowledge about the particular website, that "/index.php" is the same as fetching "/".
So, your problem, as stated, is impossible.
I'd save the 3 links as separate entries, since you can never reliably tell that they resolve to the same page. It all depends on how the server (out of our control) resolves the URL.
