Linux Bash: cURL - how to pass variables to the URL

I want to do a cURL GET request. The following URL should be used:
https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi' -H 'Host: iant.toulouse.inra.fr' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3' --compressed -H 'Referer: https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB84Qfsf&__wb_main_menu=Genome&__wb_function=$parent' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' --data '__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand
At the end of the URL I have some values that I want to turn into variables, so that depending on the input the URL is different and I request another resource.
The end of the URL: $ab, $start, $end and $strand are the variables; all of them are strings.
...2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand
I came across "urlencode" and thought of storing my URL as one big string in a variable and passing it to urlencode, but I am not sure how to do that.
I tried this / I am searching for something like this:
#!/bin/bash
[...]
cURL="https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi' -H 'Host: iant.toulouse.inra.fr' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3' --compressed -H 'Referer: https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB84Qfsf&__wb_main_menu=Genome&__wb_function=$parent' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' --data '__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand"
# storing HTTP response code in variable response. Only if the
# response code is OK (200), we move on
response=$(curl -sI --header 'Accept: text/html' "https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB8jqwTM&__wb_main_menu=Genome&__wb_function=$location" | head -n1 | awk '{print $2}')
echo "$response"
# getting information via curl request
if [ "$response" = 200 ] ; then
info=$(curl -G "$(urlencode "$cURL")")
fi
echo "$info"
For my response-code check, directly passing $location seems to work, but with more variables I get an error (response code 100, whereas the response-code check itself returns 200).
Do I have a general error in understanding curl/urlencode? What did I miss?
Thanks for your time and effort in advance :)
UPDATE
#!/bin/bash
# handling command-line input
file=$1
ecf=$2
# iterating through file and pulling out
# information for the GET- and POST-request
while read -r line
do
parent=$(echo "$line" | awk '{print substr($1,2,3)}')
start=$(echo "$line" | awk '{print substr($2,2,6)}')
end=$(echo "$line" | awk '{print substr($3,2,6)}')
strand=$(echo "$line" | awk '{print substr($4,2,1)}')
locus=$(echo "$line" | awk '{print substr($6,2,8)}')
# depending on $parent, the right insertion for the URL is generated
if [ "$parent" = "SMc" ] ; then
location="Genome"
ab="SMc"
elif [ "$parent" = "SMa" ] ; then
location="PrintPsyma"
ab="pSymA"
elif [ "$parent" = "SMb" ] ; then
location="PrintPsymb"
ab="pSymB"
fi
# building variables for curl content request
options=( --compressed)
headers=(
-H 'Host: iant.toulouse.inra.fr'
-H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0'
-H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
-H 'Accept-Language: de,en-US;q=0.7,en;q=0.3'
-H "Referer: https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB84Qfsf&__wb_main_menu=Genome&__wb_function=$parent"
-H 'Content-Type: application/x-www-form-urlencoded'
-H 'Connection: keep-alive'
-H 'Upgrade-Insecure-Requests: 1'
-H 'Pragma: no-cache'
-H 'Cache-Control: no-cache'
)
url='https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi'
ab=$(urlencode "${ab}")
start=$(urlencode "${start}")
end=$(urlencode "${end}")
strand=$(urlencode "${strand}")
data="__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand"
# storing HTTP response code in variable response. Only if the
# response code is OK (200), we move on
response=$(curl -sI --header 'Accept: text/html' "https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB8jqwTM&__wb_main_menu=Genome&__wb_function=$location" | head -n1 | awk '{print $2}')
echo "$response"
# getting information via curl request
if [ "$response" = 200 ] ; then
info=$(curl -G "${options[@]}" "${headers[@]}" --data "${data}" "${url}")
fi
echo "$info"
done < "$file"

You need to separate concepts. The string you put in the cURL variable is not a URL; it is a URL plus a set of headers, plus parameters, plus one option for compression. They are all different things.
Define them separately, like this:
url='https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi'
headers=(
-H 'Host: iant.toulouse.inra.fr'
-H 'User-Agent: ...'
-H 'Accept: ...'
-H 'Accept-Language: ...'
... other headers from your example ...
)
options=(
--compressed
)
data="__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand"
And then run curl in this fashion:
curl -G "${options[@]}" "${headers[@]}" --data "${data}" "${url}"
This will expand to the correct curl command.
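Note that with -G, curl appends the --data payload to the URL as a query string instead of sending it as a POST body. A minimal illustration (example.com stands in for the real host):
curl -G 'https://example.com/cgi' --data 'begin=100&end=200'
# sends the same request as:
curl 'https://example.com/cgi?begin=100&end=200'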
About the urlencode part: you need to encode each of $ab, $start, $end and $strand separately. If you insert them into the string and then encode the whole thing, all special characters in that string, like & and =, will be encoded too, and the already-encoded ones like %2F in your example will be encoded twice (they will become %252F).
To keep the code tidy, you can encode them beforehand:
ab=$(urlencode "${ab}")
start=$(urlencode "${start}")
end=$(urlencode "${end}")
strand=$(urlencode "${strand}")
data="__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand"
... or do it in a cumbersome way:
data="__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$(urlencode "${ab}").genomic&begin=$(urlencode "${start}")&end=$(urlencode "${end}")&strand=$(urlencode "${strand}")"
I hope this helps.

Related

How to delay curl request

I'd like to add sleep to this request so as not to stress the server with too many requests at a go. I've tried adding sleep but I don't get the expected behaviour. Any help is appreciated.
xargs -I{} curl --location --request POST 'https://g.com' \
--header 'Authorization: Bearer cc' \
--header 'Content-Type: application/json' \
--data-raw '{
"c_ids": [
"{}"
]
}' '; sleep 5m' < ~/recent.txt
Escaping arbitrary strings into valid JSON is a job for jq.
If you don't have a particular reason to define the curl args outside your loop:
while IFS= read -r json; do
curl \
--location --request POST 'https://g.com' \
--header 'Authorization: Bearer cc' \
--header 'Content-Type: application/json' \
--data-raw "$json"
sleep 5m
done < <(jq -Rc '{"c_ids": [ . ]}' recent.txt)
...or if you do:
curl_args=(
--location --request POST 'https://g.com' \
--header 'Authorization: Bearer cc' \
--header 'Content-Type: application/json' \
)
while IFS= read -r json; do
curl "${curl_args[@]}" --data-raw "$json"
sleep 5m
done < <(jq -Rc '{"c_ids": [ . ]}' recent.txt)
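In both variants, the jq filter turns each raw input line into a complete, correctly escaped JSON document before curl ever sees it. For illustration (abc-123 is a made-up ID):
printf 'abc-123\n' | jq -Rc '{"c_ids": [ . ]}'
# prints: {"c_ids":["abc-123"]}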
Putting the sleep inside the xargs invocation is a bit wonky. I advise the following approach, which is more likely to produce the desired result.
#!/bin/sh
siteCommon="--location --request POST 'https://g.com' \
--header 'Authorization: Bearer cc' \
--header 'Content-Type: application/json' "
while read -r line
do
eval curl ${siteCommon} --data-raw \'{ \"c_ids\": [ \"${line}\" ] }\'
sleep 5m
done < ~/recent.txt
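If you do want to keep xargs, the usual trick is to have it spawn a shell that runs both commands for each line. A sketch (note it does not JSON-escape the input the way the jq approach above does):
xargs -I{} sh -c 'curl --location --request POST "https://g.com" \
    --header "Authorization: Bearer cc" \
    --header "Content-Type: application/json" \
    --data-raw "{\"c_ids\": [\"$1\"]}"; sleep 5m' _ {} < ~/recent.txt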

How to send curl request with post data imported from a file

I have a curl command like the one below, which works fine: I am posting JSON data to an endpoint and get a response back.
curl -v 'url' -H 'Accept-Encoding: gzip, deflate, br' -H 'Content-Type: application/json' -H 'Accept: application/json' -H 'Connection: keep-alive' -H 'DNT: 1' -H 'Origin: url' --data-binary '{"query":"\n{\n data(clientId: 1234, filters: [{key: \"o\", value: 100}], key: \"world\") {\n title\n type\n pottery {\n text\n pid\n href\n count\n resource\n }\n }\n}"}' --compressed
Now I am trying to read the binary data from a temp.json file instead, but somehow it doesn't work and I get an error:
curl -v 'url' -H 'Accept-Encoding: gzip, deflate, br' -H 'Content-Type: application/json' -H 'Accept: application/json' -H 'Connection: keep-alive' -H 'DNT: 1' -H 'Origin: url' --data-binary "#/Users/david/Downloads/temp.json" --compressed
I have stored the JSON in the temp.json file below:
{
data(clientId: 1234, filters: [{key: "o", value: 100}], key: "world") {
title
type
pottery {
text
pid
href
count
resource
}
}
}
This is the error I am getting -
.......
* upload completely sent off: 211 out of 211 bytes
< HTTP/1.1 500 Internal Server Error
< date: Fri, 28 May 2021 23:38:12 GMT
< server: envoy
< content-length: 0
< x-envoy-upstream-service-time: 1
<
* Connection #0 to host url left intact
* Closing connection 0
Is there anything wrong in my above curl command?
Update
If I copy into temp.json the exact same content that I have in my original curl, with the \n escapes, then it works fine. So it looks like that is the issue.
It means I need to find a way to convert the newlines in temp.json to \n before sending the curl request, or is there another way?
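One way to do that conversion on the fly is to let jq build the JSON wrapper and escape the newlines for you. A sketch, assuming jq is installed and temp.json holds the raw query text rather than the {"query": ...} wrapper:
jq -Rs '{query: .}' /Users/david/Downloads/temp.json |
  curl -v 'url' -H 'Content-Type: application/json' \
       -H 'Accept: application/json' --data-binary @- --compressed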

How to give Form-param, both values and files in a http request in python

I want to make an HTTP request as shown in the curl command below:
curl -X PUT \
https://anypoint.mulesoft.com/cloudhub/api/v2/applications/highfiles \
-H 'authorization: Bearer XXX' \
-H 'cache-control: no-cache' \
-H 'content-length: 0' \
-H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' \
-H 'host: anypoint.mulesoft.com' \
-H 'postman-token: XXX' \
-H 'x-anypnt-env-id: XXX' \
-H 'x-anypnt-org-id: XXX' \
-F 'appInfoJson={
"muleVersion": {
"version": "3.8.5"
},
"properties":{"env":"dev"}
}'
I have tried the request below, but all in vain:
files = {'file': open(r'C:\Users\highfiles.zip', 'rb')}
appInfoJson1 = {
"muleVersion": {
"version": "3.8.5"
},
"properties": {"env":"dev1"}
}
print dict(appInfoJson=appInfoJson1)
headers = {"X-ANYPNT-ORG-ID": "XXXX",
"X-ANYPNT-ENV-ID": "XXXX",
"Authorization": "Bearer " + access_token,
}
response = requests.put("https://anypoint.mulesoft.com/cloudhub/api/v2/applications/highfiles",
data=dict(appInfoJson=appInfoJson1) , files=files, headers = headers)
How do I give form-param values and a file in a Python HTTP request?
I was doing it wrong.
The change was only with respect to handling the dict values, shown below:
response = requests.put("https://anypoint.mulesoft.com/cloudhub/api/v2/applications/highfiles",
data=dict(appInfoJson=appInfoJson1.values()) , files=files, headers = headers)
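For comparison, a sketch of the same upload in plain curl, sending both the file and the JSON form field; the -F 'file=@highfiles.zip' part is an assumption mirroring the files= argument above, and paths and tokens are placeholders:
curl -X PUT 'https://anypoint.mulesoft.com/cloudhub/api/v2/applications/highfiles' \
  -H 'authorization: Bearer XXX' \
  -H 'x-anypnt-env-id: XXX' \
  -H 'x-anypnt-org-id: XXX' \
  -F 'file=@highfiles.zip' \
  -F 'appInfoJson={"muleVersion":{"version":"3.8.5"},"properties":{"env":"dev1"}}'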

GET request returns different JSON contents

I am crawling some data using Scrapy. Every time I open the product detail page in a browser and check, this request, as issued by the browser, always returns the same correct content, without the '?????' characters.
But if I open the request above directly in the browser, it returns the correct content about 10 times. Then it returns wrong content, with '?????' characters inserted.
Can you explain why this problem happens, and how to make Scrapy act like a real browser?
This is correct content
{"itemid": 43369300, "liked": false, "offer_count": 6, "videos": [], "image": "41dabd8fe9b7cbc2ab30501592f65a80", "image_list": ["41dabd8fe9b7cbc2ab30501592f65a80", "91bf75885fffd2b1fbcc55099457bc22", "f4516bb9667f8329f031ff75896a71fd", "d2639a1ffe75912873de6d8e011dc0dd", "38d00637b021e1701542a6afa7ae58f3", "10ab99e3bd211bd4dd63993555d6454b"].....
And this is wrong content
{"itemid": 43369300, "liked": false, "offer_count": 10, "videos": [], "rating_star": 4.069458216402549, "image": "41dabd8fe9?????????????????????", "image_list": ["41dabd8fe9?????????????????????", "91bf75885f?????????????????????", "f4516bb966?????????????????????", "d2639a1ffe?????????????????????", "38d00637b0?????????????????????", "10ab99e3bd?????????????????????"].....
You can test with other requests request1, request2,...
The issue may be that you are hitting the API directly and they are preventing scraping. If I hit the URL below using curl with extra headers 10-15 times, it works fine:
curl 'https://xxxx.vn/api/v0/shop/6088300/item/43369300/shipping_info_to_address/?state=H%C3%A0%20N%E1%BB%99i&city=Huy%E1%BB%87n%20Ba%20V%C3%AC&district=' \
-H 'Pragma: no-cache' \
-H 'DNT: 1' \
-H 'Accept-Encoding: gzip, deflate, br' \
-H 'Accept-Language: en-US,en;q=0.8' \
-H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36' \
-H 'X-API-SOURCE: pc' \
-H 'Accept: */*' \
-H 'Cache-Control: no-cache' \
-H 'X-Requested-With: XMLHttpRequest' \
-H 'Referer: https://xxx.vn/H%E1%BB%99p-%C4%91%E1%BB%B1ng-gi%C3%A0y-trong-su%E1%BB%91t-theo-d%C3%B5i-c%C3%B3-gi%C3%A1-t%E1%BB%91t-i.6088300.43369300' \
--compressed
So I think the 4 important headers that you should send are these:
'X-Requested-With: XMLHttpRequest'
'X-API-SOURCE: pc'
'Referer: https://xxx.vn/H%E1%BB%99p-%C4%91%E1%BB%B1ng-gi%C3%A0y-trong-su%E1%BB%91t-theo-d%C3%B5i-c%C3%B3-gi%C3%A1-t%E1%BB%91t-i.6088300.43369300'
'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36'
Send these headers when creating the Request in Scrapy.
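A quick way to confirm that this minimal header set is enough is to repeat the request with only those four headers (a sketch reusing the URL from above):
curl 'https://xxxx.vn/api/v0/shop/6088300/item/43369300/shipping_info_to_address/?state=H%C3%A0%20N%E1%BB%99i&city=Huy%E1%BB%87n%20Ba%20V%C3%AC&district=' \
  -H 'X-Requested-With: XMLHttpRequest' \
  -H 'X-API-SOURCE: pc' \
  -H 'Referer: https://xxx.vn/H%E1%BB%99p-%C4%91%E1%BB%B1ng-gi%C3%A0y-trong-su%E1%BB%91t-theo-d%C3%B5i-c%C3%B3-gi%C3%A1-t%E1%BB%91t-i.6088300.43369300' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36' \
  --compressed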

Not Authorized to create vertexes in IBM Graph

Tried to create the vertices:
curl -X 'POST' -d '{"vertexLables": [{"name": "event"},{"name": "category"}]}' -H 'content-type: application/json' -H 'authorization: gds-token yyyy' https://ibmgraph-alpha.ng.bluemix.net/zzzz/g/schema
{"code":"NotAuthorized","message":""}
