How to fetch data from a website that requires permission using curl? - linux

I am familiar with the curl command in Linux. However, I want to know whether there is a way to access a URL that, when visited, asks for user interaction or permission before proceeding, for example a license agreement that the user has to accept or decline.
Is there a way I can either skip this permission check or pass an "I agree" kind of argument to the curl command so that it takes me to the actual website data?
Edit 1: Some more information on how the user interaction/permission appears on the site:
When using a browser to visit the URL, the webpage asks the user to confirm whether they agree with the terms and license conditions and provides two options, "I Agree" and "I Disagree". If the user clicks "I Agree", they proceed to the actual webpage.
I want to know whether the same can be done from the command line in a shell script using curl or an equivalent tool.
Edit 2:
When using a browser (I used Firefox) to visit the URL, the site asks for the user's permission only the first time. On subsequent visits it skips this step and proceeds to the main site. I reckon a cookie for the session is saved by the browser and reused from then on. With this understanding, I tried to generate cookie files and use them with curl in the following way:
To generate cookie:
curl --cookie-jar cookies.txt http://url
To use the cookie:
curl --cookie cookies.txt http://url
But I could not succeed. I located the cookies saved by Firefox and tried to use them in the same way, but failed again. I think I am close, but I am unable to take the next step forward.

Using the information given by Géza Török and wick above, together with my own understanding, I was able to achieve this. I used Firefox to visit the URL and then located the cookies stored on disk. After reading and understanding the content and format of the cookies, I created my own cookie text file with the appropriate response and passed it to curl in the following way to proceed to the main website:
curl --cookie cookies.txt http://url
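For reference, here is a rough sketch of how the same cookie could be captured automatically instead of writing the file by hand. It assumes the agreement page sets its flag through a form submission; the /agree path and the accept field name are hypothetical and have to be read from the real page source:
# Hypothetical form target and field name; inspect the actual agreement page to find them.
# 1. Submit the "I Agree" form and let curl store whatever cookies the site sets.
curl --cookie-jar cookies.txt --data "accept=I+Agree" http://url/agree
# 2. Replay those cookies to reach the actual content.
curl --cookie cookies.txt http://url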
Thanks for help!

As the comments rightly suggest, there are many ways to implement this. To start with, I would install Firebug (if using Firefox) or press Ctrl+Shift+I in Chrome to check what is going on after you hit the URL in question. If there are separate objects associated with the initial checks and the main content, you will see them there and can decide whether curl will help.
curl is just a tool to pull HTTP objects. Probably the only case where it can replace the browser's interaction is basic authentication, where you can supply the credentials in the request header. For the case you describe, an HTTP debugger would be the place to start.
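To illustrate that basic-authentication case (the user name and password are placeholders):
# curl builds the Authorization header for you.
curl -u myuser:mypassword http://url/protected/
# The same request with the header written out by hand.
curl -H "Authorization: Basic $(printf 'myuser:mypassword' | base64)" http://url/protected/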

Related

I set my Telegram bot webhook, but Telegram does not send anything. Why?

I set the webhook for my Telegram bot with the setWebhook method, but when I send a message to the bot, my URL does not receive it.
https://api.telegram.org/bot<token>/setwebhook?url=https://www.example.com/bot/temp.php
On my host I use PHP to receive, analyse and answer the messages, and I use this command to get the updates from the bot.
$update = file_get_contents("php://input");
But after running this line, $update is empty.
I have no problem getting updates when I don't set a webhook, but with the webhook I don't receive any data.
The result of running getWebhookInfo is:
{"ok":true,"result":{"url":"https://example.com/bot/temp.php","has_custom_certificate":false,"pending_update_count":0,"max_connections":40}}
Does "has_custom_certificate" have to be true?
Thanks for helping me.
Mohammad, I think a PHP error occurs when Telegram sends the request to you. The following code should work fine for reading the Telegram request.
$json = file_get_contents('php://input');
$request = json_decode($json);
I think your script stops before this code. I suggest you enable PHP debug mode and check the error log.
You can find the problem via the following methods:
Check the getWebhookInfo method, make sure your webhook URL is correct and that there is no last_error_message field.
POST similar data to your server with curl -d and a JSON body, and run it against your own server; a sample command is sketched below.
Finally, check your CDN configuration (if you have one in front of that server) and temporarily disable flood protection or any other checks.
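As an example of that second point, a command along these lines can be used. The JSON is a minimal, hand-written Update object (the ids, date and text are made up), and the URL should be replaced with your own webhook endpoint:
# Simulate what Telegram would POST to the webhook (sample payload, not a real Telegram update).
curl -X POST "https://www.example.com/bot/temp.php" \
  -H "Content-Type: application/json" \
  -d '{"update_id":10000,"message":{"message_id":1,"date":1441645532,"from":{"id":1111111,"is_bot":false,"first_name":"Test"},"chat":{"id":1111111,"type":"private","first_name":"Test"},"text":"/start"}}'
If your script handles this request correctly but real messages still never arrive, the problem is on the Telegram or webhook side rather than in your PHP.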
What is the outcome of the getWebhookInfo method, and could you post it here?
Please check whether your SSL certificate is valid (that happened to me once) and what shows up if you open the website in your browser (any PHP errors?).
I would have commented that, but I don't have enough rep... Sorry :/
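For the certificate check, something like this works from the command line (using the webhook URL from the question):
# curl fails and prints the reason if the certificate is invalid or the chain is incomplete.
curl -vI https://www.example.com/bot/temp.php
# Or inspect the certificate chain directly.
openssl s_client -connect www.example.com:443 -servername www.example.com </dev/null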
Please follow these steps, and if you still don't get an answer, your PHP code has a problem:
1- Revoke your Auth Key from BotFather by typing /revoke
2- Reset your webhook with the new auth key (a curl version of these calls is sketched after step 3):
https://api.telegram.org/bot<token>/setWebhook?url=https://www.website.com/bot_path
Notice #1: setWebhook is case sensitive and the W must be capital.
Notice #2: If you're using a framework such as CodeIgniter, Laravel, Zend, etc., you don't need to put .php after your bot_path; otherwise, use it.
Notice #3: If you want to delete the last webhook you set, just call the above URL without the ?url= part.
3- After all this, you should receive this message from Telegram:
{"ok":true,"result":true,"description":"Webhook was set"}
Now you can test your bot. Be aware that this procedure checks your auth key; if there is no problem with it, then your PHP code definitely has the problem. If you still have a problem, please contact me on my website, Graphap, to solve it, and then put the right answer here.
I suggest rechecking things this way because I frequently had problems caused by an auth key that was not working, but now it works fine.

Retrieve text (Twitter) from command line in Linux

I have the following strange requirement: a given piece of free proprietary software can be downloaded via a temporary link, which is valid for one day. This URL is used in a script (TravisCI / Docker) to download the product and install it. However, after one day, if the script is used again, the URL is invalid and the script fails.
I would like to have a way to obtain a valid URL and use the new one. It is not possible to get the new URL directly from the website.
I was thinking about publishing a valid link in a public place, such as Twitter with a specific hashtag, and retrieving the link by querying that hashtag on Twitter. Any user who wants to use the scripts (TravisCI / Docker) would just publish a new valid link, and the scripts would use it to download the software.
However, it does not seem to be possible to search Twitter that way.
Is there any way to query Twitter without authentication from the command line in Linux?
What other method do you think I can use to retrieve a piece of text (the valid URL) from the command line in Linux? The important thing is that one person shares the valid link, and other people using the script can use it.
This is the project where I need that feature: https://registry.hub.docker.com/u/angoca/db2-install/dockerfile/
You can try http://tieba.baidu.com, where you can post anything in a specific topic. Find a topic that nobody cares about and post anything you like there. Then you can just retrieve the latest post.
What I did was create a page in the GitHub wiki and put the link there.
From the script, I fetch the page via curl and retrieve the last line, which contains the valid link.
https://github.com/angoca/db2-docker/wiki/db2-link
https://github.com/angoca/db2-docker/blob/master/install/10.5/expc/download
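A rough sketch of that retrieval step. It assumes the wiki page can be fetched as raw Markdown (the raw.githubusercontent.com form of the URL is an assumption), that the last URL on the page is the current link, and that the output file name is just an example:
# Grab the last http(s) link published on the wiki page.
LINK=$(curl -s https://raw.githubusercontent.com/wiki/angoca/db2-docker/db2-link.md | grep -oE 'https?://[^ )"]+' | tail -n 1)
# Use it to download the product.
curl -L -o product-installer.bin "$LINK"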

How to stop a site from scraping my site

I have this songs site, and whatever data it has is also being displayed on another site.
Even if I echo "hello", the same is shown on the other site. Does anybody know how I can prevent that?
Digging a bit deeper, I found out that the other site is using file_get_contents(). How can I prevent them from doing that?
Well, you can try to determine their IP address and block it.
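A hedged sketch of that idea, assuming an Apache-style access log at the usual Debian path and that the offending address stands out by request volume; 1.2.3.4 is a placeholder for whatever you actually find:
# List the busiest client addresses (the log path varies by distribution).
awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head
# Block the address you identified (placeholder IP).
iptables -A INPUT -s 1.2.3.4 -j DROP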
You said file_get_contents was being used.
A URL can be used as a filename with this function if the fopen wrappers have been enabled. See fopen() for more details on how to specify the filename. See the Supported Protocols and Wrappers for links to information about what abilities the various wrappers have, notes on their usage, and information on any predefined variables they may provide.
To disable them, more information is at http://www.php.net/manual/en/filesystem.configuration.php#ini.allow-url-fopen
Edit: If they go and use CURL or an equivalent after this, try and mess with their script by changing the HTML layout, etc. If that doesn't help, try and locate the IP of the script host, and make it return nonsense ;)
Edit 2: If they use an iframe, use JavaScript to redirect on iframe detection.
Or you can even generate rubbish information just for that crawler, just to mess up the "clone" site.
The first question to be answered is: Have you identified the crawler getting the information from your site?
If so, then you can give anything you want to this process: Nothing (ignore / block), a message telling the owners to stop getting your information, give them back rubbish contents, ...
Anyway, the first step is doing things properly. Be sure that your site has a robots.txt with the accepted policy for crawlers.

How to logout of an HTTP authentication (htaccess) that works in Google Chrome?

I got a solution for Firefox and IE but I didn't find any solution for Google Chrome.
Is there a way to do it in Google Chrome?
I know it's a really old post... I mean like friggin 5 years now, but I just found a somewhat good solution.
Inside your protected folder, create another folder, let's call it "logout". Place the same .htaccess file in here as you have in your protected folder, except with a small modification.
instead of:
Require valid-user
now write:
Require user EXIT
And make sure, you don't have a user named exit! :D
In your protected area, your logout link or button or whatever, should redirect the user to this address: example.com/protectedFolder/logout
Browsers are usually able to keep only one user logged in per site or realm name... the sign-in attempt for the user EXIT will overwrite everything, so the originally logged-in user would have to log in again to reach the protected area.
But as always, I might be wrong, and you should still close all your browser windows and restart the computer if you want to be sure! :)
Also, it wouldn't hurt to tell your users what is going to happen when they hit logout!
I have tested this in Chrome and in Internet Explorer 11. (It will not work in Edge, and maybe not in others either.)
The solution was found here:
https://www.mavensecurity.com/media/BasicAuthLogOut.pdf
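A quick way to sanity-check the setup from the command line, with placeholder credentials and paths: the protected folder should answer 200 for a real user, while the logout folder answers 401 for everyone, because no account matches "Require user EXIT".
# Prints only the HTTP status code of each request.
curl -u realuser:realpass -o /dev/null -s -w "%{http_code}\n" https://example.com/protectedFolder/
curl -u realuser:realpass -o /dev/null -s -w "%{http_code}\n" https://example.com/protectedFolder/logout/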
You can't log out of an HTTP-authenticated session other than by closing the browser window. Also see the accepted answer on this question for an extensive explanation.
Try redirecting to:
wrong_user:wrong_password@yourdomain.com
I have put together the following article which explains how I have managed to achieve this in Chrome. I hope this helps. https://www.hattonwebsolutions.co.uk/articles/how_to_logout_of_http_sessions
In short: you create a subfolder (as per Gyula's answer) but then send an AJAX request to the page (which fails) and then trigger a timeout redirect to the logged-out page. This avoids having a secondary popup in the logout folder requesting another username (which would confuse users). My article uses jQuery, but it should be possible to avoid this.

Post Username & Password To Protected Folder/Site

I'm trying to post a username and password from an HTML form to a protected folder on a website. Is this possible? I thought I could just pass the credentials in the URL as below, but I'm not having any success:
http://username:password@theurlofthesite.co.uk
I'm still getting the alert popup asking for the username and password. I need to be able to log the person in automatically.
Hope someone can help? Thanks
If you log in via an HTML form, then this won't work. That syntax is only for HTTP authentication, which is something completely different.
I don't think many (any?) browsers support being opened with POST data, which leaves you hoping that the site accepts GET-based logins (and they should be shot if they do).
The address part of the URL is parsed by your web server, so the code which handles the HTML form never sees it.
If you want to pass parameters to a form, you must use url?field=value&field2=value2. This only works with forms that use the GET action. For POST, you need a program to generate an encoded document and upload that.
In both cases, your user name and password are sent as plain text over the Internet, so the account will be hacked within a few hours. To put it more clearly: there is no way to "protect" the data in this folder this way. This is like adding a door with four locks to your house and keeping the keys on a nail on a post in the street next to the door.
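If the goal is to log a user in programmatically rather than from the address bar, the form can be submitted directly. A sketch with curl, where the /login path and the field names are guesses that have to be read from the form's HTML:
# Submit the login form the way the browser would and keep the session cookie.
curl --cookie-jar session.txt --data "username=myuser&password=mypassword" http://theurlofthesite.co.uk/login
# If the folder uses HTTP Basic authentication instead, pass the credentials directly.
curl -u myuser:mypassword http://theurlofthesite.co.uk/protected/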
I did exactly what I described in the question and it works in all browsers except Safari on a Mac.
