Htaccess and url for multilingual site [closed] - .htaccess

I'm ready to use the subdirectory format for my multilingual website.
My first question is:
For SEO, do I have to translate the page name in the URL, or is it useless?
Example:
- Same filename
site.com/fr/login
site.com/en/login
OR
- Different filename
site.com/fr/connexion
site.com/en/login
Then, when the user is on site.com: should I redirect him to site.com/en or site.com/fr depending on the user's IP? Or should I set a default locale and have my URLs like site.com/page and site.com/fr/page?
Finally, what is the best way to get the locale from the user's current URL?
Parsing the URL to get /fr or /en, or adding a GET parameter like lang=fr (hidden with .htaccess)?
Thanks :)

As a precondition, I assume that you are not using frameworks / libraries. Furthermore, I have never solved similar problems using only .htaccess (as the title of your question requests) and thus don't know if it is possible to do so. Nevertheless, the following guidelines may help you.
Your first question
In general, a web page's file name and path have influence on its ranking. Furthermore, having page names and paths in native languages might help your users memorize the most important of your URLs even without bookmarking them.
Nevertheless, I would never translate the page names or directories for pages which are part of a web application (as opposed to informational or promotional pages).
The login page you mentioned is a good example. I am fairly sure that you do not want your site to be found because of the content of its login page. Actually, there are many websites which exclude login pages and other application pages from being indexed at all.
Instead, in SEO terms, put your effort into your promotional and informational pages. Provide valuable content, explain what is special about you or your site, and do everything you can to get those pages properly indexed. IMHO, static HTML pages are the best choice for doing so.
Furthermore, if you translate the names of pages which belong to your actual application, you will run into massive trouble. For example, after a successful login, your application will probably transfer the user to his personal dashboard, which will probably be based on another HTML template / page. If you have translated that page name into different languages, then your application will have to take care to send the user to the right version. Basically, that means that you need as many versions of your application as languages you want to support. Of course, there are tricks to make life easier, but this will be a constant pain and definitely in no way worth the effort.
To summarize: Create static pages which show your USP (unique selling proposition) and provide valuable content to users (for example, sophisticated tutorials and so on). Translate those pages, including names and paths, and SEO them in every way you can. But regarding the actual application, optimizing its pages is kind of pointless and even counterproductive.
Your second question
I would never use IP based redirecting for several reasons.
First, there are many customers in countries which are not their home country. For example, do you really want to redirect all native English speakers to your Hungarian pages because they are currently in Hungary for a business trip?
Second, more and more users today are using VPNs for different reasons, thereby often hiding the country where they currently are.
Third, which IP address belongs to which provider or country is highly volatile; you would have to constantly update your databases to keep up.
There are more reasons, but I think you already have got the idea.
Fortunately, there is a solution to your problem (but see "Final remark" below): Every browser, when fetching a page from a server, tells the server the preferred and accepted languages. For example, Apache can directly use that information in RewriteRule statements and redirect the user to the correct page.
If you can't alter your server's configuration, then you can evaluate the respective header in your CGI program.
When doing your research, look for the Accept-Language HTTP 1.1 header. A good starting point probably is here.
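For illustration, here is a minimal .htaccess sketch of that idea (assuming Apache with mod_rewrite enabled and that /en/ and /fr/ directories exist; both are assumptions on my part):

# Send visitors of the bare site root to /fr/ when French is the
# preferred language in Accept-Language, and to /en/ otherwise.
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{HTTP:Accept-Language} ^fr [NC]
RewriteRule ^$ /fr/ [R=302,L]
RewriteCond %{REQUEST_URI} ^/$
RewriteRule ^$ /en/ [R=302,L]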
Your third question
You may be mixing up two different things in your third question. A locale is not the same as a language. On one hand, you are asking "...to get the locale from...", and on the other hand, you say "...lang=fr...", giving the impression that you want to get the language.
If you want to get the language: See my answer to your second question, or parse the language from the current path (as you already have suggested).
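If you go the path-parsing route, the "hidden GET parameter" you mentioned in the question can be produced with a rewrite. A minimal sketch (assuming Apache with mod_rewrite and a single hypothetical front controller index.php):

# Internally map /fr/login to /index.php?lang=fr&page=login;
# the URL shown in the browser stays /fr/login.
RewriteEngine On
RewriteRule ^(en|fr)/(.*)$ index.php?lang=$1&page=$2 [L,QSA]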
If you want to get the locale, things are more complicated. The only reasonable automatic method is to derive the locale from the language, but this will often fail. For example, I generally prefer the English language when doing research, but on the other hand, I am located in Germany and thus would like dates and times in the format I am used to, so deriving my desired locale from my preferred language will fail.
Unfortunately, there is no HTTP header which could tell the server which locale the user prefers. As a starting point, this article may help you.
See the final remark (next section) on how to solve this problem.
Final remark
As the article linked above already states: The only reliable way to satisfy the user is to let him choose his language, his locale and his time zone within your application. You could store the user's choices either in cookies or in your back-end database; each has its own advantages and disadvantages.
I usually use a combination of all methods (HTTP headers, cookies, database) in my projects.

Think about humans first. Is URL translation important for users in France? Some people may think it's fine to have translated words in the URL; users from other locales may think otherwise. Search engines take user behavioral factors into account, so your SEO will benefit if your solution is more convenient for users.
It would be nice if users got the language version they expect. A site can help them by suggesting a language version based on IP, HTTP headers, cookies, and so on. But some people may prefer another language, and some may be on a trip, so it is still important to let them choose a language version manually.
When in doubt, read the documentation and analyze competitors' sites.

Most websites I have seen use URLs like site.com/en and site.com/fr, as you mention, but it is up to you how you want to present the website to the user. I prefer making site.com/en the default and giving the user the option to select his language.
If you are still unsure, the link referred to below may be useful.
See Referral Link Here

Should you translate paths?
If possible, by all means - as this will help users of that language to feel like "first class citizens". Login routes probably won't have much impact on SEO, but translating URLs on content pages may well help them to be more discoverable.
You can read Google's recommendations on multi-regional and multilingual sites, which state that it's "fine to translate words in the URL".
Should you redirect based on IP?
This can help first time users, but there are a few things to bear in mind:
How accurate will this be? E.g. if I speak English but I visit France and then view your site - I will get French. If your target market is mobile-toting globe-trotters, then it may not be the best choice. Is looking at Accept-Language, for example, any better?
Will geolocating the IP address on every request introduce any performance problems for your servers? Or make too many calls to an external geocoding service? Make sure to carry out capacity planning, and reduce calls where you already know the locale (e.g. from a cookie, or the user explicitly stating their preference).
Even if you guess a preferred locale, always allow an explicit user preference to override that. There's nothing more frustrating than moving between pages, and having the site decide that it knows better what language you understand :-)
As long as you make it easy to switch between sites, you shouldn't need a specific landing page. It doesn't hurt to pop up a banner if you're unsure whether you should redirect a user (for example, amazon.com will show a banner to a UK user, giving them the option of switching sites - but not deciding for them)
How should you structure your URLs?
Having the language somewhere in the URL (either as a subdomain or a folder) is probably best for SEO. Don't forget to update your sitemap as well, to indicate to crawlers that there are alternate content pages in different languages.


What can you do with server-side includes & htaccess

This issue just did not make sense to me; perhaps someone smart can help?
Why is it that many hosts do not allow you to set htaccess or server-side includes?
Specifically, a lot of the free hosting plans do not allow this. What situations are they thinking of that make them disable these features?
Income - You're not paying anything; the only money they get is from advertisements.
Resources - they take up additional resources. Offering only the essentials reduces the amount of risk and maintenance.
Agreement violations / unethical practices - It may reduce the amount of people signing up for free accounts to redirect websites, rename, hog resources, etc.
That's just what comes to mind.
I am new to this as well, but one thing you can do with SSI is to make your web page design a lot easier.
You can separate headers, footers, and nav divisions (or any HTML elements that are on every page of your site) into their own HTML files. This means that when you add a page, or change one of those elements, you only have to change it once, and then ensure it is included with an SSI include command in the correct spot on every page, nearly eliminating the chance of mistakes when putting the same element on every page.
SSI basically writes the referenced html file directly into the page wherever the include statement sits.
I know a lot less about .htaccess because I haven't played with it yet, but you can change which pages the server looks for SSI code in using different configuration commands. It also looks like you can enable SSI.
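For example, where the host permits these overrides (an assumption on my part), SSI can usually be switched on from .htaccess like this:

# Allow SSI processing in this directory and parse .shtml files.
Options +Includes
AddType text/html .shtml
AddOutputFilter INCLUDES .shtml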
I know there is a lot more to all this than I wrote here, but hopefully this gives you an idea.
Here is a website that probably better explains this: http://www.perlscriptsjavascripts.com/tutorials/howto/ssi.html

I want to use security through obscurity for the admin interface of a simple website. Can it be a problem?

For the sake of simplicity I want to use admin links like this for a site:
http://sitename.com/somegibberish.php?othergibberish=...
So the actual URL and the parameter would be some completely random string which only I would know.
I know security through obscurity is generally a bad idea, but is it a realistic threat someone can find out the URL? Don't take the employees of the hosting company and eavesdroppers on the line into account, because it is a toy site, not something important and the hosting company doesn't give me secure FTP anyway, so I'm only concerned about normal visitors.
Is there a way of someone finding this URL? It wouldn't be anywhere on the web, so Google won't know about it either. I hope, at least. :)
Any other hole in my scheme which I don't see?
Well, if you could guarantee only you would ever know it, it would work. Unfortunately, even ignoring malicious men in the middle, there are many ways it can leak out...
It will appear in the access logs of your provider, which might end up on Google (and are certainly read by the hosting admins)
It's in your browsing history. Plugins, extensions etc. have access to this, and often upload it elsewhere (e.g. StumbleUpon).
Any proxy servers along the line see it clearly
It could turn up as a Referer to another site
some completely random string
which only I would know.
Sounds like a password to me. :-)
If you're going to have to remember a secret string I would suggest doing usernames and passwords "properly" as HTTP servers will have been written to not leak password information; the same is not true of URLs.
This may only be a toy site, but why not practice setting up security properly, since it won't matter if you get it wrong? Then hopefully, if you do have a site which you need to secure in the future, you'll have already made all your mistakes.
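For reference, a minimal sketch of that "proper" route using .htaccess-based HTTP Basic authentication (the file path and realm name are made up for the example):

# Protect this directory with a username and password.
# Create the password file separately, e.g.: htpasswd -c /home/example/.htpasswd admin
AuthType Basic
AuthName "Admin area"
AuthUserFile /home/example/.htpasswd
Require valid-user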
I know security through obscurity is generally a very bad idea,
Fixed it for you.
The danger here is that you might get in the habit of "oh, it worked for Toy such-and-such site, so I won't bother implementing real security on this other site."
You would do a disservice to yourself (and any clients/users of your system) if you ignore Kerckhoffs's principle.
That being said, rolling your own security system is a bad idea. Smarter people have already created security libraries in the other major languages, and even smarter people have reviewed and tweaked those libraries. Use them.
It could appear on the web via a "Referer leak". Say your page links to my page at http://entrian.com/, and I publish my web server referer logs on the web. There'll be an entry saying that http://entrian.com/ was accessed from http://sitename.com/somegibberish.php?othergibberish=...
As long as the "login-URL" never posted anywhere, there shouldn't be any way for search engines to find it. And if it's just a small, personal toy-site with no personal or really important content, I see this as a fast and decent-working solution regarding security compared to implementing some form of proper login/authorization system.
If the site is getting a big number of users and lots of content, or simply becomes more than a "toy site", I'd advice you to do it the proper way
I don't know what your toy admin page would display, but keep in mind that when loading external images or linking to somewhere else, your referrer is going to publicize your URL.
If you change http into https, then at least the url will not be visible to anyone sniffing on the network.
(the caveat here is that you also need to consider that very obscure login system can leave interesting traces to be found in the network traces (MITM), somewhere on the site/target for enabling priv.elevation, or on the system you use to log in if that one is no longer secure and some prefer admin login looking no different from a standard user login to avoid that)
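If you do switch to HTTPS, a common .htaccess sketch for forcing it (assuming mod_rewrite and a certificate that is already set up) looks like this:

# Redirect any plain-HTTP request to its HTTPS equivalent.
RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]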
You could require that some action be taken # of times and with some number of seconds of delays between the times. After this action,delay,action,delay,action pattern was noticed, the admin interface would become available for login. And the urls used in the interface could be randomized each time with a single use url generated after that pattern. Further, you could only expose this interface through some tunnel and only for a minute on a port encoded by the delays.
If you could do all that in a manner that didn't stand out in the logs, that'd be "clever" but you could also open up new holes by writing all that code and it goes against "keep it simple stupid".

How would you attack a domain to look for "unknown" resources? [closed]

Given a domain, is it possible for an attacker to discover one or many of the pages/resources that exist under that domain? And what could an attacker do/use to discover resources in a domain?
I have never seen the issue addressed in any security material (because it's a solved problem?) so I'm interested in ideas, theories, best guesses, in addition to practices; anything an attacker could use in a "black box" manner to discover resources.
Some of the things that I've come up with are:
Google -- if google can find it, an attacker can.
A brute force dictionary attack -- Iterate common words and word combinations (Login, Error, Index, Default, etc.). As well, the dictionary could be narrowed if the resource extension is known (xml, asp, html, php), which is fairly discoverable.
Monitor traffic via a Sniffer -- Watch for a listing of pages that users go to. This assumes some type of network access, in which case URL discovery is likely small peanuts given the fact the attacker has network access.
Edit: Obviously directory listings permissions are turned off.
The list on this is pretty long; there are a lot of techniques that can be used to do this; note that some of these are highly illegal:
See what Google, archive.org, and other web crawlers have indexed for the site.
Crawl through public documents on the site (including PDF, JavaScript, and Word documents) looking for private links.
Scan the site from different IP addresses to see if any location-based filtering is being done.
Compromise a computer on the site owner's network and scan from there.
Exploit a vulnerability in the site's web server software and look at the data directly.
Go dumpster diving for auth credentials and log into the website using a password on a post-it (this happens way more often than you might think).
Look at common files (like robots.txt) to see if they 'protect' sensitive information.
Try common URLs (/secret, /corp, etc.) to see if they give a 302 (e.g. a redirect to a login page) or a 404 (page not found).
Get a low-level job at the company in question and attack from the inside; or, use that as an opportunity to steal credentials from legitimate users via keyboard sniffers, etc.
Steal a salesperson's or executive's laptop -- many don't use filesystem encryption.
Set up a coffee/hot dog stand offering a free WiFi hotspot near the company, proxy the traffic, and use that to get credentials.
Look at the company's public wiki for passwords.
And so on... you're much better off attacking the human side of the security problem than trying to come in over the network, unless you find some obvious exploits right off the bat. Office workers are much less likely to report a vulnerability, and are often incredibly sloppy in their security habits -- passwords get put into wikis and written down on post-it notes stuck to the monitor, road warriors don't encrypt their laptop hard drives, and so on.
The most typical attack vector would be trying to find well-known applications, for example /webstats/ or /phpMyAdmin/, or looking for typical files that an inexperienced user might leave in a production environment (e.g. phpinfo.php). And most dangerous: text editor backup files. Many text editors leave a copy of the original file with '~' appended or prepended. So imagine you have whatever.php~ or whatever.aspx~. As these are not executed, an attacker might get access to the source code.
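One cheap mitigation for the backup-file problem is to refuse to serve typical editor leftovers at all; a hedged .htaccess sketch (Apache 2.4 syntax is assumed):

# Deny access to editor backup/temporary files such as whatever.php~.
<FilesMatch "(~|\.bak|\.swp)$">
    Require all denied
</FilesMatch>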
Brute forcing (use something like OWASP DirBuster, which ships with a great dictionary; it also parses responses, so it can map the application quite quickly and then find resources even in quite deeply structured apps)
Yahoo, Google and other search engines as you stated
Robots.txt
sitemap.xml (quite common nowadays, and it often has lots of stuff in it)
Web stats applications (if any are installed on the server and publicly accessible, such as /webstats/)
Brute forcing for files and directories is generally referred to as "forced browsing"; that term might help your Google searches.
The path to resource files like CSS, JavaScript, images, video, audio, etc can also reveal directories if they are used in public pages. CSS and JavaScript could contain telling URLs in their code as well.
If you use a CMS, some CMS's put a meta tag into the head of each page that indicates the page was generated by the CMS. If your CMS is insecure, it could be an attack vector.
It is usually a good idea to set your defenses up in a way that assumes an attacker can list all the files served unless protected by HTTP AUTH (aspx auth isn't strong enough for this purpose).
EDIT: more generally, you are supposed to assume the attacker can identify all publicly accessible persistent resources. If the resource doesn't have an auth check, assume an attacker can read it.
The "robots.txt" file can give you (if it exists, of course) some information about what files\directories are there (Exmaple).
Can you get the whole machine? Use common / well-known scanners & exploits.
Try social engineering. You'll be surprised how effective it is.
Brute-force sessions (JSESSIONID etc.), maybe with a fuzzer.
Try commonly used path signatures (/admin/, /adm/, ... on the domain).
Look for data inputs that are processed further and test them for XSS / SQL injection vulnerabilities.
Exploit known weak applications within the domain.
Use phishing tricks (XSS / XSRF / HTML meta refresh >> iframe) to forward the user to your fake page (while the domain name stays the same).
Black-box reengineering - What programming language is used? Are there bugs in the VM/interpreter version? Try service fingerprinting. How would you write a page like the page you want to attack? What are the security issues the developer of the page may have missed?
a) Try to think like a dumb developer ;)
b) Hope that the developer of the domain is dumb.
Are you talking about ethical hacking?
You can download the site with SurfOffline tools, and get a pretty good idea of the folders, architecture, etc.
Best Regards!
When attaching a new box onto "teh interwebs", I always run (ze)nmap. (I know the site looks sinister - that's a sign of quality in this context I guess...)
It's pretty much push-button and gives you a detailed explanation of how vulnerable the target (read:"your server") is.
If you use mod_rewrite on your server, you could do something like this:
All requests that do not fit the expected patterns can be redirected to a special page, where the IP (or whatever identifier you use) is tracked. Once you have seen a certain number of "attacks", you can ban this user / IP. The most efficient way would be to automatically add a special rewrite condition to your mod_rewrite rules.
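A rough sketch of that idea in .htaccess (the probe.php logging script and the whitelisted paths are hypothetical):

RewriteEngine On
# Let known, legitimate paths (and the logging script itself) through.
RewriteCond %{REQUEST_URI} !^/(index\.php|probe\.php|css/|js/|images/)
# Everything else is rewritten to a script that records the client IP.
RewriteRule ^ /probe.php [L]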
A really good first step is to try a DNS zone transfer against their DNS servers. Many are misconfigured and will give you the complete list of hosts.
The fierce domain scanner does just that:
http://ha.ckers.org/fierce/
It also guesses common host names from a dictionary, as well as, upon finding a live host, checking numerically close IP addresses.
To protect a site against attacks, call the upper management in for a security meeting and tell them to never use their work password anywhere else. Most suits will carelessly use the same password everywhere: work, home, pr0n sites, gambling, public forums, Wikipedia. They are simply unaware of the fact that not all sites take care not to look at their users' passwords (especially when the sites offer "free" stuff).

Is it possible for a 3rd party to reliably discern your CMS?

I don't know much about poking at servers, etc, but in light of the (relatively) recent Wordpress security issues, I'm wondering if it's possible to obscure which CMS you might be using to the outside world.
Obviously you can rename the default login page, error messages, and the favicon (I see the Joomla one everywhere) and use a non-default template, but the sorts of things I'm wondering about are watching redirects somehow and things like that. Do most CMSs leave traces?
This is not to replace other forms of security, but more of a curious question.
Thanks for any insight!
Yes, many CMSs leave traces, like the naming of identifiers and the hierarchy of elements, that are a plain giveaway.
That is, however, not the point. The point is that there are only a few very popular CMSs. It is not necessary to determine which one you use; it will suffice to methodically try attack techniques for the 5 to 10 biggest CMSs against your site to get a pretty good probability of success.
In the general case, security by obscurity doesn't work. If you rely on the fact that someone doesn't know something, this means you're vulnerable to certain attacks since you blind yourself to them.
Therefore, it is dangerous to follow this path. Choose a successful CMS and then install all the available security patches right away. By using a well-known CMS, you make sure that you'll get security fixes quickly. Your biggest enemy is time; attackers can find thousands of vulnerable sites with Google and attack them simultaneously using botnets. This is a completely automated process today. Trying to hide what software you're using won't stop the bots from hacking your site, since they don't check which vulnerability they might expect; they just try the top 10 of the currently most successful exploits.
[EDIT] Botnets with 10,000 bots are not uncommon today. As long as installing security patches is so hard, people won't protect their computers, and that means criminals will have lots of resources to attack. On top of that, there are sites which sell exploits as ready-to-use plugins for bots (or sell bots, or rent out whole botnets).
So as long as the vulnerability is still there, camouflaging your site won't help.
A lot of CMSs have IDs, class names, and structure patterns that can identify them (WordPress, for example). URLs have specific patterns too. Someone experienced with the platform, or with just some browsing, can identify which CMS a site is using.
IMHO, you can try to change all this structure in your CMS, but if you are willing to go to all this effort, I think you should just create your own CMS.
It's more important to keep everything up to date on your platform and follow some security measures than to try to change everything that could reveal the CMS you're using.
Since this question is tagged "wordpress:" you can hide your wordpress version by putting this in your theme's functions.php file:
// Remove the WordPress version meta generator tag from the page head.
add_action('init', 'removeWPVersionInfo');
function removeWPVersionInfo() {
    remove_action('wp_head', 'wp_generator');
}
But, you're still going to have the usual paths, i.e., wp-content/themes/ etc... and wp-content/plugins/ etc... in page source, unless you figure out a way to rewrite those with .htaccess.
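If you do want to experiment with that, a hedged .htaccess sketch of such a rewrite (the /assets/ prefix is made up for the example; note that WordPress itself will keep printing wp-content URLs into the page source unless you also filter its output, so this alone is mostly cosmetic):

# Serve /assets/... from /wp-content/... so shorter paths can be used
# in hand-written templates.
RewriteEngine On
RewriteRule ^assets/(.*)$ /wp-content/$1 [L]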

How to collect customer feedback? [closed]

What's the best way to close the loop and have a desktop app "call home" with customer feedback? Right now our code will log in to our SMTP server and send me some email.
The site GetSatisfaction has been an increasingly popular way to get customer feedback.
http://getsatisfaction.com/
GetSatisfaction is a community-based site that builds a community around your application. Users can post questions, comments, and feedback about an application and get answers to their questions either from other members or from members of the development team themselves.
They also have an API so you can incorporate GetSatisfaction into your app, and/or your site.
I've been playing with it for a couple of weeks and it is pretty cool. Kind of like stackoverflow, but for customer feedback.
Feedback from users and programmers is simply one of the most important parts of development, in my opinion. The whole web 2.0 / beta concept is more or less built around it, and therefore there should be absolutely no pain involved whatsoever for the user. What does this have to do with your question? Quite a bit, I think.
If you provide a feedback option, make it visible in your application, but don't annoy the user (like MS sometimes does with their feedback thingy on their website, above all other elements!). Place it somewhere directly visible, but discreet. What about a separate menu entry? Some leftover space in the status bar? Put it there so it is accessible all the time. Why? People who really like your product, or who are REALLY annoyed about something, will probably find your feedback option in any case, but you will miss the small things. Imagine a user unsure about the value of his input: "should I really write to him?" That user will probably not make the effort of searching, and in the end these small things make a really outstanding product, don't they?
OK, the user found your feedback form, but how should it look and what's next? Keep it simple: don't ask him dozens of questions and provoke him with checkboxes and radio buttons. Give him two input fields, one for a title and one for a long description. Not more and not less. Maybe a small text briefly telling him what might be useful (OS, program version, etc., maybe his email), but leave all this up to him.
How to get the message to you, and how to show the user that his input counts? In most cases this is simple. As levand suggested, use HTTP, post the comment to a private area on your site, and provide a link to his input. After reviewing his input, make it public and accessible to all (if possible). There he can see your response and that you really care.
Why not use the mail approach? What about a firewall preventing him from reaching your SMTP server? Due to spam, in quite a few modern routers those ports are closed by default, and you certainly will not get any response from workers in bigger companies; however, port 80 or 443 is often open (maybe you should check whether the current browser has a proxy configured and use that one).
Although I haven't used GetSatisfaction yet, I somewhat disagree with Nick Hadded, because you don't want third parties to have access to possibly private and confidential data. Additionally, you want "one face to the customer" and don't want to open up your customer base to someone else.
There is SOO much more to tell, but I don't want to get banned for tattling... haha! THX for caring about the user! :)
You might be interested in UseResponse, an open-source (yet not free) hosted customer feedback / idea gathering solution that will be released in December 2011.
It should run on the majority of PHP hosting environments (including shared ones), and according to its authors it has absorbed only the best features of its competitors (mentioned in other answers) while having few of their flaws.
You could also have the application send a POST http request directly to a URL on your server.
What we are forgetting here, my friend, is whether having a mere form on your website is enough to convince users how much effort a company puts into acting on that precious feedback.
A user's note to a company is a true reflection of the product or service it offers. In web 2.0 culture, people feel proud of being part of the continuous development strategy preached by almost all companies nowadays.
A community engagement platform is the need of the hour, and an entry point on your website that gains enough traction from visitors to get them talking about what they feel will leave no stone unturned in collecting that precious feedback. That's where products like GetSatisfaction, UserRules, or Zendesk come in.
A company's active community, with its unimagined ideas, unresolved issues, and of course testimonials, says a lot about the development strategy behind the product or service they offer.
Personally, I would also POST the information. However, I would send it to a PHP script that would then insert it into a MySQL database. This way, your data can be pre-sorted and pre-categorized for analysis later. It also gives you the potential to track multiple entries by single users.
There are quite a few options. This site makes the following suggestions:
http://www.suggestionbox.com/
http://www.kampyle.com/
http://getsatisfaction.com/
http://www.feedbackify.com/
http://uservoice.com/
http://userecho.com/
http://www.opinionlab.com/content/
http://ideascale.com/
http://sparkbin.net/
http://www.gri.pe/
http://www.dialogcentral.com/
http://websitechat.net/en/
http://www.anymeeting.com/
http://www.facebook.com/
I would recommend just using pre built systems. Saves you the hassle.
Get an Insight is good: http://getaninsight.com/
