File downloaded by curl but not by node.js - node.js

So I'm trying to download a file through nodejs that opens fine in the browser, and even downloads fine in tools like curl.
But nodejs just fails to download the file for some reason. I tried downloading the file through the request module in node and through a node cli module called download-cli. Both of them fail with either a 400 or 404 response, yet the file downloads fine through regular tools like curl.
What could be the issue? I have tried setting the user-agent to that of Firefox (where it opens just fine), but that doesn't do the trick. I'm assuming the problem isn't the user-agent anyway, since curl succeeds while sending its own default user-agent.
The url in question can be any url from alicdn, but let's take this one as an example:
https://ae01.alicdn.com/kf/HTB1ftVmPVXXXXXUXVXXq6xXFXXXG/Langtek-smart-watch-gt12-часы-поддержка-синхронизации-notifier-sim-карты-подключение-bluetooth-для-android-apple-iphone.jpg_640x640.jpg
Here's the response from running the above url through the node download-cli tool and through Invoke-WebRequest in PowerShell (note that curl in PowerShell is an alias for Invoke-WebRequest, which is why the second transcript below shows Invoke-WebRequest-style output).
PS C:\code> download https://ae01.alicdn.com/kf/HTB1ftVmPVXXXXXUXVXXq6xXFXXXG/Langtek-smart-watch-gt12-часы-поддержка-синхронизации-notifier-sim-карты-подключение-bluetooth-для-android-apple-iphone.jpg_640x640.jpg
Couldn't connect to https://ae01.alicdn.com/kf/HTB1ftVmPVXXXXXUXVXXq6xXFXXXG/Langtek-smart-watch-gt12-часы-поддержка-синхронизации-notifier-sim-карты-подключение-bluetooth-для-android-apple-iphone.jpg_640x640.jpg (404)
PS C:\code> curl https://ae01.alicdn.com/kf/HTB1ftVmPVXXXXXUXVXXq6xXFXXXG/Langtek-smart-watch-gt12-часы-поддержка-синхронизации-notifier-sim-карты-подключение-bluetooth-для-android-apple-iphone.jpg_640x640.jpg
StatusCode : 200
StatusDescription : OK
Content : {255, 216, 255, 224...}
RawContent : HTTP/1.1 200 OK
X-Application-Context: fileserver2-download:prod:7001
From-Req-Dns-Type: NA,NA
SERVED-FROM: 72.247.178.95
Connection: keep-alive
Network_Info: DE_FRANKFURT_16509
Timing-Allow-Ori...
Headers : {[X-Application-Context, fileserver2-download:prod:7001], [From-Req-Dns-Type, NA,NA], [SERVED-FROM, 72.247.178.95],
[Connection, keep-alive]...}
RawContentLength : 114927

Okay, so I tried downloading the file through node's native http module, through the popular request module, AND through a node-based cli tool called download-cli. Every one of them got the same response.
So I fired up Wireshark to see exactly where the requests differ, and it turns out that tools like curl and Invoke-WebRequest percent-encode the path before making the GET request, but node's native module doesn't do that. That was the only difference. Using the escaped url works fine.
Invoke-WebRequest's GET path:
GET /kf/HTB1ftVmPVXXXXXUXVXXq6xXFXXXG/Langtek-smart-watch-gt12-%D1%87%D0%B0%D1%81%D1%8B-%D0%BF%D0%BE%D0%B4%D0%B4%D0%B5%D1%80%D0%B6%D0%BA%D0%B0-%D1%81%D0%B8%D0%BD%D1%85%D1%80%D0%BE%D0%BD%D0%B8%D0%B7%D0%B0%D1%86%D0%B8%D0%B8-notifier-sim-%D0%BA%D0%B0%D1%80%D1%82%D1%8B-%D0%BF%D0%BE%D0%B4%D0%BA%D0%BB%D1%8E%D1%87%D0%B5%D0%BD%D0%B8%D0%B5-bluetooth-%D0%B4%D0%BB%D1%8F-android-apple-iphone.jpg_640x640.jpg HTTP/1.1
Node's GET path:
GET /kf/HTB1ftVmPVXXXXXUXVXXq6xXFXXXG/Langtek-smart-watch-gt12-G0AK-?>445#6:0-A8=E#>=870F88-notifier-sim-:0#BK-?>4:;NG5=85-bluetooth-4;O-android-apple-iphone.jpg_640x640.jpg HTTP/1.1

Why didn't you just do it like this:
$url='https://ae01.alicdn.com/kf/HTB1ftVmPVXXXXXUXVXXq6xXFXXXG/Langtek-smart-watch-gt12-%D1%87%D0%B0%D1%81%D1%8B-%D0%BF%D0%BE%D0%B4%D0%B4%D0%B5%D1%80%D0%B6%D0%BA%D0%B0-%D1%81%D0%B8%D0%BD%D1%85%D1%80%D0%BE%D0%BD%D0%B8%D0%B7%D0%B0%D1%86%D0%B8%D0%B8-notifier-sim-%D0%BA%D0%B0%D1%80%D1%82%D1%8B-%D0%BF%D0%BE%D0%B4%D0%BA%D0%BB%D1%8E%D1%87%D0%B5%D0%BD%D0%B8%D0%B5-bluetooth-%D0%B4%D0%BB%D1%8F-android-apple-iphone.jpg_640x640.jpg'
Invoke-WebRequest -Uri $url -OutFile C:\temp\android-apple-iphone.jpg_640x640.jpg

Related

Node.js childprocess.execSync for curl request not returning error if curl request is 404 or 403

This might not be an issue per se, but here is my problem: I'm using childprocess.execSync to run curl requests against pages whose html content I need. It works perfectly fine for valid pages, but if the curl request to a page hits pretty much any error code, such as 404 or 403, then the result of execSync is empty and I have no way to know what error code curl encountered.
Is there any way to know the curl error code that happens during the childprocess.execSync ?
node.js version : 8.16.2
Well, I found a way: it looks like I just had to pass "-i" with my curl request to return the response headers too and not just the body; now I can parse those and see the error code I got.

Why does calling curl from execSync in Node.js fail when directly running the exact same command works?

I've run into a problem where execSync in node.js doesn't behave the same as typing the command directly in the shell.
Here is my issue:
I use a curl to request for some data from a server, and I need to do that with a cookie because there is a login requirement.
It's easy to handle the login process and get the cookie, but it's weird that using the cookie with curl from node.js causes an "internal error" on the server. Since I don't have permission to change the server code, I'm looking for help with the difference between calling curl from Node.js and using curl directly.
Here is the code:
var command = 'curl --cookie cookie.txt ' + getURL();
console.log(command);
// output: curl --cookie cookie.txt http://example.com/getdata
var result = child_process.execSync(command).toString();
// will cause an internal error and the "result" is an error-reporting page.
Directly calling this in the shell:
curl --cookie cookie.txt http://example.com/getdata
Everything works, I got the data I need.
I tried to find some clues, for instance, by changing the code to:
var command = 'curl --cookie cookie-bad.txt ' + getURL();
With some wrong cookie in cookie-bad.txt, I get a "you are not logged in" result.
So there must be something wrong with sending a cookie to the server to request data with curl running inside a nodejs script via execSync.
Is there any way I can improve the code?
What is your Node.js version? I don't have any problem with 10.16.0.

Python script which accesses GitLab works on Windows but returns 'Project Not Found' on Windows Subsystem for Linux (WSL) - using python requests

I have a python script which does a GET request to GitLab and stores the data from the response in an excel file using tablib library.
This script works fine on Windows when I execute it using python3.
I have tried to execute the same script in the Windows Subsystem for Linux (WSL) I have enabled, and the script fails.
The output when I execute with python3 script.py in WSL is the following:
RESPONSE {"message":"404 Project Not Found"}
When I execute from Windows using python .\gitlab.py where python is python3:
RESPONSE [{"id":567,"iid":22}, {"id":10,"iid":3}]
I think the problem could be related to the GET api call I am doing because in WSL it returns Project Not Found.
I executed that request using curl in WSL to see if the unix in general has this issue, but I get back the expected response instead of the not found response. This was the request:
curl -X GET 'https://URL/api/v4/projects/server%2Fproducts%2FPROJECT/issues?per_page=100' -H 'Content-Type: application/json' -H 'PRIVATE-TOKEN: TOKEN' --insecure
Why does the python script fail under WSL if WSL can execute the same GET request using curl? Should I enable/disable something in the request, perhaps?
This is the request I am doing in my python script:
import json
import os

import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning

def get_items():
    url = "https://URL/api/v4/projects/server%2Fproducts%2FPROJECT/issues"
    payload = {}
    querystring = {"state": "closed", "per_page": "100"}
    headers = {
        'Content-Type': "application/json",
        'PRIVATE-TOKEN': os.environ.get("GITLAB_KEY")  # environment variable added in Windows
    }
    requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
    response = requests.request(
        "GET", url, headers=headers, data=payload, params=querystring, verify=False)
    print("RESPONSE " + response.text)
    return json.loads(response.text)
UPDATE:
I have tried using the project id as well instead of the path but it didn't work
REF: https://docs.gitlab.com/ee/api/projects.html#get-single-project
GET /projects/:id
Change this:
url = "https://URL/api/v4/projects/server%2Fproducts%2FPROJECT/issues"
To
projectId = "1234"  # or whatever your project id is ... Project Page, Settings -> General
url = "https://URL/api/v4/projects/" + projectId + "/issues"
Based on an answer I got in the post I did in Reddit, I found the problem.
In the python script, I am using an environment variable which is not accessible that way ( os.environ.get("GITLAB_KEY") ) from WSL, since Windows environment variables are not visible inside WSL by default (they can be shared via WSLENV).
For now, I have replaced it with the hard-coded value just to check that this was really the issue. The script now works as expected.
I will find a way to access the env var again now that I know what the problem was.
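A small guard makes this failure mode obvious next time (the function is my own sketch; the variable name comes from the question). GitLab answers requests it cannot authorize with 404 "Project Not Found" rather than 401, which is why a missing token looks like a missing project:

```python
import os

def require_token(name="GITLAB_KEY"):
    """Fail fast with a clear message instead of sending a request
    that GitLab will answer with a misleading 404."""
    token = os.environ.get(name)
    if token is None:
        raise RuntimeError(
            name + " is not set in this environment; note that Windows "
            "environment variables are not visible inside WSL by default"
        )
    return token
```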

GitLab API - Unable to access file which is within a directory

I have a GitLab project which is set up as follows:
myserver.com/SuperGroup/SubGroup/SubGroupProject
The project's tree consists of a top-level txt file and a txt file inside a directory. I get the tree from the GitLab API with:
myserver.com/api/v4/projects/1/repository/tree?recursive=true
[{"id":"aba61143388f605d3fe9de9033ecb4575e4d9b69","name":"myDirectory","type":"tree","path":"myDirectory","mode":"040000"},{"id":"0e3a2b246ab92abac101d0eb2e96b57e2d24915d","name":"1stLevelFile.txt","type":"blob","path":"myDirectory/1stLevelFile.txt","mode":"100644"},{"id":"3501682ba833c3e50addab55e42488e98200b323","name":"top_level.txt","type":"blob","path":"top_level.txt","mode":"100644"}]
If I request the contents for top_level.txt they are returned without any issue via:
myserver.com/api/v4/projects/1/repository/files/top_level.txt?ref=master
However I am unable to access myDirectory/1stLevelFile.txt with any API call I try. E.g.:
myserver.com/api/v4/projects/1/repository/files/"myDirectory%2F1stLevelFile.txt"?ref=master
and,
myserver.com/api/v4/projects/1/repository/files/"myDirectory%2F1stLevelFile%2Etxt"?ref=master
Results in:
Not Found The requested URL /api/v4/projects/1/repository/files/myDirectory/1stLevelFile.txt was not found on this server.
Apache/2.4.25 (Debian) Server at myserver.com Port 443
myserver.com/api/v4/projects/1/repository/files/"myDirectory/1stLevelFile.txt"?ref=master
and,
myserver.com/api/v4/projects/1/repository/files?ref=master&path=myDirectory%2F1stLevelFile.txt
Results in:
error "404 Not Found"
The versions of the components are:
GitLab 10.6.3-ee
GitLab Shell 6.0.4
GitLab Workhorse v4.0.0
GitLab API v4
Ruby 2.3.6p384
Rails 4.2.10
postgresql 9.6.8
According to my research there was a similar bug which was fixed with the 10.0.0 update.
I also added my ssh-key although I doubt it has any effect, following this advice with the same issue in php.
Solution:
I eventually solved it by adjusting the apache installed on the server.
Just follow these instructions: https://gitlab.com/gitlab-org/gitlab-ce/issues/35079#note_76374269
Judging from your code, I assume you are using curl.
If that's the case, why are you adding double quotes around your file path?
The docs don't include them.
Can you test it like this, please?
curl --request GET --header 'PRIVATE-TOKEN: XXXXXXXXX' myserver.com/api/v4/projects/1/repository/files/myDirectory%2F1stLevelFile%2Etxt?ref=master
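The key point is that the path separator must stay percent-encoded while the surrounding quotes go away. As a quick sanity check in Node (my own sketch; the project id and path are from the question), encodeURIComponent produces exactly the %2F form the repository-files API expects:

```javascript
// encodeURIComponent escapes the directory separator ('/' -> %2F),
// which the GitLab repository-files API requires for nested paths.
const filePath = 'myDirectory/1stLevelFile.txt';
const encoded = encodeURIComponent(filePath);
const apiUrl = 'myserver.com/api/v4/projects/1/repository/files/' + encoded + '?ref=master';
```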

wget and curl somehow modifying bencode file when downloading

Okay, so I have a bit of a weird problem going on that I'm not entirely sure how to explain... Basically, I am trying to decode a bencode file (.torrent file). I have tried 4 or 5 different scripts found via google and S.O. with no luck (they fail with errors like "is not a dictionary", or similar output errors).
Now I am downloading the .torrent file like so
wget http://link_to.torrent file
//and have also tried with curl like so
curl -C - -O http://link_to.torrent
and am concluding that something is happening to the file when I download it this way.
The reason for this is that I found a site which will decode a .torrent file you upload online and display the info contained in the file. However, when I download a .torrent file not by clicking on the link in a browser but instead using one of the methods described above, the upload does not work either.
So, has anyone experienced a similar problem using one of these methods and found a solution, or can anyone explain why this is happening?
I can't find much online about it, nor do I know of a workaround I can use for my server.
Update:
Okay, as was suggested by @coder543, I compared the file size of the browser download vs. wget. They are not the same size: using wget results in a smaller file size, so clearly the problem is with wget & curl and not something else... ideas?
Update 2:
Okay, so I have tried this a few times now and I am narrowing down the problem a little bit: it only seems to occur on torcache and torrage links. Links from other sites seem to work as expected... so here are some links and my results from the three different methods:
*** different sizes ***
http://torrage.com/torrent/6760F0232086AFE6880C974645DE8105FF032706.torrent
wget -> 7345 , curl -> 7345 , browser download -> 7376
*** same size ***
http://isohunt.com/torrent_details/224634397/south+park?tab=summary
wget -> 7491 , curl -> 7491 , browser download -> 7491
*** different sizes ***
http://torcache.net/torrent/B00BA420568DA54A90456AEE90CAE7A28535FACE.torrent?title=[kickass.to]the.simpsons.s24e12.hdtv.x264.lol.eztv
wget -> 4890 , curl-> 4890 , browser download -> 4985
*** same size ***
http://h33t.com/download.php?id=cc1ad62bbe7b68401fe6ca0fbaa76c4ed022b221&f=Game%20of%20Thrones%20S03E10%20576p%20HDTV%20x264-DGN%20%7B1337x%7D.torrent
wget-> 30632 , curl -> 30632 , browser download -> 30632
*** same size ***
http://dl7.torrentreactor.net/download.php?id=9499345&name=ubuntu-13.04-desktop-i386.iso
wget-> 32324, curl -> 32324, browser download -> 32324
*** different sizes ***
http://torrage.com/torrent/D7497C2215C9448D9EB421A969453537621E0962.torrent
wget -> 7856 , curl -> 7556 ,browser download -> 7888
So it seems to work well on some sites, but not on sites which rely on torcache.net and torrage.com to supply their files. It would be nice if I could just use other sites that don't depend directly on the caches; however, I am working with the bitsnoop api (which pulls all its data from torrage.com, so that's not really an option). Anyway, if anyone has any idea how to solve this problem, or steps to take toward finding a solution, it would be greatly appreciated!
Even if anyone can just reproduce the results, it would be appreciated!
... My server is 12.04 LTS on 64-bit architecture, and the laptop I tried the actual download comparison on is the same.
For the file retrieved using the command line tools I get:
$ file 6760F0232086AFE6880C974645DE8105FF032706.torrent
6760F0232086AFE6880C974645DE8105FF032706.torrent: gzip compressed data, from Unix
And sure enough, decompressing using gunzip will produce the correct output.
Looking into what the server sends, gives interesting clue:
$ wget -S http://torrage.com/torrent/6760F0232086AFE6880C974645DE8105FF032706.torrent
--2013-06-14 00:53:37-- http://torrage.com/torrent/6760F0232086AFE6880C974645DE8105FF032706.torrent
Resolving torrage.com... 192.121.86.94
Connecting to torrage.com|192.121.86.94|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 200 OK
Connection: keep-alive
Content-Encoding: gzip
So the server does report it's sending gzip compressed data, but wget and curl ignore this.
curl has a --compressed switch which will correctly uncompress the data for you. This should be safe to use even for uncompressed files, it just tells the http server that the client supports compression, but in this case curl does look at the received header to see if it actually needs decompression or not.
