Downloading most recent file using curl - linux

I need to download the most recent file, based on the file creation time, from a remote site using curl. How can I achieve it?
These are the files on the remote site:
user-producer-info-etl-2.0.0-20221213.111513-53-exec.jar
user-producer-info-etl-2.0.0-20221212.111513-53-exec.jar
user-producer-info-etl-2.0.0-20221214.111513-53-exec.jar
user-producer-info-etl-2.0.0-20221215.111513-53-exec.jar
Of the files above, user-producer-info-etl-2.0.0-20221215.111513-53-exec.jar is the most recent one, and that is the file I want to download. How can I achieve it?

Luckily for you, the file names contain dates that are alphabetically sortable!
I don't know what environment you are in, so I'm guessing you have at least a shell, and I propose this bash answer:
First, get the last file name:
readonly endpoint="https://your-gitlab.local"
# Get the last filename
readonly most_recent_file="$(curl -s "${endpoint}/get_list_uri" | sort | tail -n 1)"
# Download it
curl -LOs "${endpoint}/get/${most_recent_file}"
You will obviously need to replace the URLs accordingly, but I'm sure you get the idea.
-L : follow HTTP 3xx redirects
-O : download the file to the local dir, keeping the remote name as is
-s : silent; don't show progress or network stats
You can also specify another local name with -o <the_most_recent_file>.
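For example, to save it under a fixed local name instead (a sketch; the endpoint is the same placeholder as above and latest-exec.jar is just an illustration):
curl -Ls -o "latest-exec.jar" "${endpoint}/get/${most_recent_file}"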
for more info:
man curl
hth

Related

Problem running bitbucket rest api command using python

I am building a script to update files on Bitbucket using the REST API.
My problems are:
Running the command using the subprocess lib and running the command directly on the command line give two different results.
If I run the command from the command line, when I inspect my commits in the Bitbucket app I can see a commit message and an issue.
If I run the command via the subprocess lib, I don't get a commit message or an issue at the end. The commit message defaults to "edited by bitbucket" and the issue is null.
This is the command:
curl -X PUT -u user:pass -F content=#conanfile_3_f62hu.py -F 'message= test 4' -F branch=develop -F sourceCommitId={} bitbucket_URL".format(latest_commit)
The other problem is that I need to pass a file to the content in order to update it.
If I pass it like above it works. The problem is that I am generating the file content as raw string and creating a temporary file with that content.
And when I pass the file as a variable, it does not get the content of the file.
My code:
import os
import subprocess
import tempfile
content = b'some content'
current_dir = os.getcwd()
temp_file=tempfile.NamedTemporaryFile(suffix=".py",prefix="conanfile", dir=current_dir)
temp_file.name = temp_file.name.split("\\")
temp_file.name = [x for x in temp_file.name if x.startswith("conanfile")][0]
temp_file.name = "#" + temp_file.name
temp_file.write(content)
temp_file.seek(0)
update_file_url = "curl -X PUT -u user:pass -F content={} -F 'message=test 4' -F branch=develop -F sourceCommitId={} bitbucket_url".format(temp_file.name, latest_commit)
subprocess.run(update_file_url)
Basically I'm passing the file like before, just passing the name to the content, but it does not work.
If I print the command, everything looks good, so I don't know why the commit message and the file content do not get set.
Update:
I was able to pass the file; my mistake was that I was not passing it as temp_file.name.
But I could not solve the problem with the message.
What I found is that the message only takes the first word; if there is a space and another word after it, the rest is ignored.
The space is causing the problem.
I found the solution: if anyone else runs into this problem, you need to put a \ before the quote that follows message= .
Example: '-F message=\" Updated with latest dependencies"'
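For reference, when the same command is typed directly in a shell, quoting the whole -F argument is what keeps the space inside the message (the credentials, file name, commit id and URL below are placeholders taken from the question):
curl -X PUT -u user:pass -F content=#conanfile_3_f62hu.py -F "message=test 4" -F branch=develop -F sourceCommitId=<latest_commit> <bitbucket_URL>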

How to get the filename downloaded via wget after a redirect

I'm currently retrieving a file served after a redirection with the --content-disposition parameter, so the filename is the right one (the name after redirection).
Now how can I retrieve the filename for future use in my shell script?
The only direct way in the HTTP Spec for getting the filename is the Content-Disposition header. In the absence of that header, the client will usually deduce the name of the file based on the request URI.
In the case of Wget (assuming no Content-Disposition header exists), it will save the file with the name as mentioned in the URI of the original request. For example, if you invoke Wget with http://example.com/afile which redirects you to http://example.com/bfile, then the saved file will be called afile. This is a security measure to prevent a malicious server from overwriting other important files in your current directory, e.g. your .bashrc.
You can disable this behaviour with the --trust-server-names option, in which case it will save the file with the name bfile.
And then there is --content-disposition. If that option is enabled and the header exists, the header will be used to name the file.
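For example, to opt in to both behaviours at once (same hypothetical URLs as the afile/bfile scenario above):
wget --trust-server-names --content-disposition http://example.com/afile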
All this to say that the final name of the file is a little difficult to gauge. The easiest way is to save the file with -O filename, so you know the exact name of the file. If you don't want to do that, then the simplest option would be to invoke wget with the -nv option which outputs a line like this:
% wget -nv example.com
2019-04-20 10:43:48 URL:http://example.com/ [1270/1270] -> "index.html" [1]
You can parse this output in order to get the name of the downloaded file.
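A minimal sketch of that parsing, assuming GNU Wget and the -nv output format shown above (note that wget writes this log line to stderr, so it has to be redirected):
filename=$(wget -nv --content-disposition "http://example.com/afile" 2>&1 | cut -d '"' -f 2)
echo "downloaded: ${filename}"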

Downloading a json file linked from another json file with curl

I have a json file with the structure seen below:
{
url: "https://mysite.com/myjsonfile",
version_number: 69,
}
This json file is accessed from mysite.com/myrootjsonfile
I want to run a load-data script that accesses mysite.com/myrootjsonfile, reads the url field from the json content using curl, and saves the content at that url to local storage.
This is my attempt so far.
curl -o assets/content.json 'https://mysite.com/myrootjsonfile' | grep -Po '(?<="url": ")[^"]*'
Unfortunately, instead of saving the content from mysite.com/myjsonfile, it's saving the content from the file above: mysite.com/myrootjsonfile. Can anyone point out what I might be doing wrong? Bear in mind I'm completely new to curl. Thanks!
It is saving the content from myrootjsonfile because that is what you are telling curl to do: save that file to the location assets/content.json. The grep then reads curl's stdout, which is empty because -o sent the body to the file. You need to use two curl commands, one to download the root file (and process it to find the URL of the second), and a second one to download the actual content you want. You can use command substitution for this:
my_url=$(curl -s https://mysite.com/myrootjsonfile | grep -Po '(?<=url: ")[^"]*')
curl -o assets/content.json "$my_url"
I also changed the grep regex: this one matches the run of non-quote characters that follows url: " in the root file (which, as shown, does not quote its key names).
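As an aside, if the root file is actually valid JSON (quoted keys, no trailing comma) and jq is available, extracting the field with a real JSON parser is more robust than grep:
my_url=$(curl -s https://mysite.com/myrootjsonfile | jq -r '.url')
curl -o assets/content.json "$my_url"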
Assuming you wished to save the file to assets/content.json, note that flags are case sensitive.
Use -o instead of -O to redirect the output to assets/content.json.

I get a scheme missing error with cron

When I use this to download a file from an FTP server:
wget ftp://blah:blah@ftp.haha.com/"$(date +%Y%m%d -d yesterday)-blah.gz" /myFolder/Documents/"$(date +%Y%m%d -d yesterday)-blah.gz"
It says "20131022-blah.gz saved" (it downloads fine), however I get this:
/myFolder/Documents/20131022-blah.gz: Scheme missing (I believe this error prevents it from saving the file in /myFolder/Documents/).
I have no idea why this is not working.
Save the filename in a variable first:
OUT=$(date +%Y%m%d -d yesterday)-blah.gz
and then use -O switch for output file:
wget ftp://blah:blah@ftp.haha.com/"$OUT" -O /myFolder/Documents/"$OUT"
Without the -O, the output file name looks like a second file/URL to fetch, but it's missing http:// or ftp:// or some other scheme to tell wget how to access it. (Thanks @chepner)
Also note that the two $(date ...) substitutions are evaluated separately, so the requested filename and the saved filename could end up different if the two calls happen to straddle a date change; computing the name once in a variable avoids that as well.
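A minimal sketch of a wrapper script for the cron job, putting both points together (host, credentials and paths are the placeholders from the question):
#!/bin/sh
# compute the name once so the requested file and the saved file always match
OUT="$(date +%Y%m%d -d yesterday)-blah.gz"
wget "ftp://blah:blah@ftp.haha.com/${OUT}" -O "/myFolder/Documents/${OUT}"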
In my case I had it working with the npm module http-server, and discovered that I simply had a leading space before http://.
So this was wrong: " http://localhost:8080/archive.zip".
Changing it to "http://localhost:8080/archive.zip" fixed it.
In my case, in cPanel, I used:
wget https://www.blah.com.br/path/to/cron/whatever

How do I pull image links from a website and download them using wget?

I really want to download images from a website, but I don't know enough about wget to do so. The images are hosted on a separate website; how do I pull the image links from the page using cat or something, so that I can use wget to download them all? All I know is the wget part. An example would be Reddit.com.
wget -i download-file-list.txt
Try this:
wget -r -l 1 -A jpg,jpeg,png,gif,bmp -nd -H http://reddit.com/some/path
It will recurse one level deep starting from the page http://reddit.com/some/path, it will not create a directory structure (if you want directories, remove the -nd), and it will only download files ending in "jpg", "jpeg", "png", "gif", or "bmp". It will also span hosts (-H), which matters because the images are served from a different domain.
I would use the Perl module WWW::Mechanize. The following dumps all the links to stdout:
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
$mech->get("URL");
$mech->dump_links(undef, 'absolute' => 1);
Replace URL with the actual URL you want.
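One way to feed that output back into wget, assuming the snippet above is saved as dump_links.pl with the URL filled in, and that the image links end in common image extensions:
perl dump_links.pl | grep -Ei '\.(jpe?g|png|gif|bmp)$' | wget -nd -P images -i -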
