Downloading json file from json file curl - linux

I have a json file with the structure seen below:
{
url: "https://mysite.com/myjsonfile",
version_number: 69,
}
This json file is accessed from mysite.com/myrootjsonfile
I want to run a load data script to access mysite.com/myrootjsonfile and load the json content from the url field using curl and save the resulting content to local storage.
This is my attempt so far.
curl -o assets/content.json 'https://mysite.com/myrootjsonfile' | grep -Po '(?<="url": ")[^"]*'
Unfortunately, instead of saving the content from mysite.com/myjsonfile, it's saving the content from the URL above: mysite.com/myrootjsonfile. Can anyone point out what I might be doing wrong? Bear in mind I'm completely new to curl. Thanks!

It is saving the content from myrootjsonfile because that is what you are telling curl to do: save that file to the location assets/content.json. The grep then runs on curl's stdout, which is empty because -o sent everything to the file. You need two curl commands: one to download the root file (and process it to find the URL of the second), and a second to download the actual content you want. You can use command substitution for this:
my_url=$(curl -s 'https://mysite.com/myrootjsonfile' | grep -Po '(?<=url: ")[^"]*')
curl -o assets/content.json "$my_url"
I also changed the grep regex: this one matches the run of non-quote characters that follows url: ", so the surrounding quotes and the trailing comma are not captured. I also added -s so curl's progress meter doesn't clutter the output.
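If you want to sanity-check the extraction step without hitting the network, you can run a quote-stripping variant of the grep against the sample structure from the question (the URL below is just the question's placeholder):

```shell
# Sample of the root file's contents, copied from the question
root='{
url: "https://mysite.com/myjsonfile",
version_number: 69,
}'

# Extract the value of the url field (the non-quote characters after: url: ")
my_url=$(printf '%s\n' "$root" | grep -Po '(?<=url: ")[^"]*')
echo "$my_url"   # https://mysite.com/myjsonfile
```

Note that grep -P (Perl regexes) is a GNU extension; for anything beyond a quick script, a real JSON parser such as jq is more robust than grep.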

Assuming you wish to save the file as assets/content.json, note that flags are case-sensitive: use -o (lower-case) rather than -O to redirect the output to assets/content.json.

Downloading most recent file using curl

I need to download the most recent file, based on file creation time, from a remote site using curl. How can I achieve it?
These are the files on the remote site:
user-producer-info-etl-2.0.0-20221213.111513-53-exec.jar
user-producer-info-etl-2.0.0-20221212.111513-53-exec.jar
user-producer-info-etl-2.0.0-20221214.111513-53-exec.jar
user-producer-info-etl-2.0.0-20221215.111513-53-exec.jar
Above, user-producer-info-etl-2.0.0-20221215.111513-53-exec.jar is the most recent file, and it is the one I want to download. How can I achieve this?
Luckily for you, the file names contain dates that are alphabetically sortable!
I don't know your environment, so I'm assuming you have at least a shell, and I propose this bash answer:
First get the last file name
readonly endpoint="https://your-gitlab.local"
# Get the last filename
readonly most_recent_file="$(curl -s "${endpoint}/get_list_uri" | sort | tail -n 1)"
# Download it
curl -LOs "${endpoint}/get/${most_recent_file}"
You will obviously need to replace the URLs accordingly, but I'm sure you get the idea.
-L : follow HTTP redirects
-O : save the file in the current directory, keeping the remote name as is
-s : silent mode; don't show progress or timing output
You can also specify a different local name with -o <the_most_recent_file>.
For more info:
man curl
HTH
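To see why plain sort is enough here, you can run the selection step on the literal file names from the question; the embedded YYYYMMDD.HHMMSS timestamps make lexicographic order match chronological order:

```shell
# File names copied from the question, deliberately out of order
files='user-producer-info-etl-2.0.0-20221213.111513-53-exec.jar
user-producer-info-etl-2.0.0-20221212.111513-53-exec.jar
user-producer-info-etl-2.0.0-20221214.111513-53-exec.jar
user-producer-info-etl-2.0.0-20221215.111513-53-exec.jar'

# Lexicographic sort; the last line carries the newest timestamp
most_recent=$(printf '%s\n' "$files" | sort | tail -n 1)
echo "$most_recent"   # user-producer-info-etl-2.0.0-20221215.111513-53-exec.jar
```

This only works because every name shares the same prefix and the date field is zero-padded; if the version numbers varied, you would need a smarter comparison such as sort -V.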

Is it possible to partially unzip a .vcf file?

I have a ~300 GB zipped vcf file (.vcf.gz) which contains the genomes of about 700 dogs. I am only interested in a few of these dogs and I do not have enough space to unzip the whole file at this time, although I am in the process of getting a computer to do this. Is it possible to unzip only parts of the file to begin testing my scripts?
I am trying to extract a specific SNP at a position in a subset of the samples. I have tried using bcftools to no avail. (If anyone can identify what went wrong there, I would also really appreciate it. I created an empty file for the output (722g.990.SNP.INDEL.chrAll.vcf.bgz), but it returns the following error.)
bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 -o 722g.990.SNP.INDEL.chrAll.vcf.gz -O z 722g.990.SNP.INDEL.chrAll.vcf.bgz
The output type "722g.990.SNP.INDEL.chrAll.vcf.bgz" not recognised
I am planning on trying awk, but need to unzip the file first. Is it possible to partially unzip it so I can try this?
Double-check your command line for bcftools view.
The error message 'The output type "something" is not recognised' is printed by bcftools when you specify an invalid value for the -O (upper-case O) command line option, like this: -O something. Based on the error message you are getting, it seems that you might have put the file name there.
Check that you don't have your input and output file names the wrong way around in your command. Note that the -o (lower-case o) command line option specifies the output file name, and the file name at the end of the command line is the input file name.
Also, you write that you created an empty file for the output. You don't need to do that, bcftools will create the output file.
I don't have that much experience with bcftools, but generically: if you want to use awk to manipulate a gzipped file, you can pipe into awk so the file is only decompressed as needed, and you can pipe the result directly through gzip so the output is compressed too, e.g.
gzip -cd largeFile.vcf.gz | awk '{ <some awk> }' | gzip -c > newfile.txt.gz
Also, zcat is equivalent to gzip -cd: -c writes to standard output, and -d decompresses.
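Here is a self-contained sketch of that pipeline on a tiny made-up file (the two tab-separated lines below are stand-ins for VCF records, not real data):

```shell
# Create a small gzipped sample to stand in for the large .vcf.gz
printf 'chr9\t55252802\tA\nchr1\t100\tG\n' | gzip -c > sample.txt.gz

# Filter for chr9 records and recompress, never writing the
# uncompressed data to disk
gzip -cd sample.txt.gz | awk '$1 == "chr9"' | gzip -c > chr9.txt.gz

# Inspect the result
gzip -cd chr9.txt.gz
```

The peak disk usage is just the input plus the (filtered, compressed) output, which is the point of streaming through the pipe.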
As a side note, if you are trying to perform operations on just part of a large file, you may also find the excellent tool less useful: it can view your large file while loading only the parts it needs. The -S option is particularly useful for wide formats with many columns, as it stops line wrapping, and -N shows line numbers.
less -S largefile.vcf.gz
Quit the view with q; g takes you to the top of the file.

How to get the filename downloaded via wget after a redirect

I'm currently retrieving a file, served after a redirection, with the --content-disposition option, so the filename is the right one (the name after redirection).
Now how can I retrieve the filename for future use in my shell script?
The only direct way in the HTTP Spec for getting the filename is the Content-Disposition header. In the absence of that header, the client will usually deduce the name of the file based on the request URI.
In the case of Wget (assuming no Content-Disposition header exists), it will save the file with the name as mentioned in the URI of the original request. For example, if you invoke Wget with http://example.com/afile which redirects you to http://example.com/bfile, then the saved file will be called afile. This is a security measure to prevent a malicious server from overwriting other important files in your current directory, e.g. your .bashrc.
You can disable this behaviour with the --trust-server-names option, in which case it will save the file with the name bfile.
And then there is --content-disposition. If it is enabled and the header exists, it will be used to name the file.
All this to say that the final name of the file is a little difficult to gauge. The easiest way is to save the file with -O filename, so you know the exact name of the file. If you don't want to do that, then the simplest option would be to invoke wget with the -nv option which outputs a line like this:
% wget -nv example.com
2019-04-20 10:43:48 URL:http://example.com/ [1270/1270] -> "index.html" [1]
You can parse this output in order to get the name of the downloaded file.
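One way to do that parsing (a sketch; it assumes the -> "name" part of the -nv line always looks like the example above) is to capture wget's output and pull the quoted name out with sed:

```shell
# Sample -nv line, copied from the example above. In a real script this
# would be:  log=$(wget -nv "$url" 2>&1)   (wget -nv writes to stderr)
log='2019-04-20 10:43:48 URL:http://example.com/ [1270/1270] -> "index.html" [1]'

# Extract the text between the quotes following "->"
fname=$(printf '%s\n' "$log" | sed -E 's/.*-> "([^"]+)".*/\1/')
echo "$fname"   # index.html
```

If the server can put quotes inside the filename this pattern would need tightening, but for typical downloads it is enough.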

How can I send a file's contents as a POST parameter using cURL?

I'm trying to use cURL to POST the contents of a file, as if I'd pasted that content into an HTML textarea. That is to say, I don't want to upload the file; I just want a POST parameter called foo to be filled with the text from a file called bar.txt. bar.txt's contents may include newlines, quotes, and so on.
Is this possible?
Thanks.
Edit: I found out how to do it in the end:
curl --data-urlencode "foo@bar.txt" http://example.com/index.php
This will take the contents of the file bar.txt, URL-encode them, and place the resulting string in a parameter called foo in a POST request to http://example.com/index.php.
I can't speak to whether the solutions others have suggested will work or not, but the one above seems like the best way.
You can by doing something like:
$ curl --data "foo=$(cat foo.txt)" http://localhost/yourfile.php
Note that you'll probably want to encode the file, as cacheguard said. To encode it in Base64, just modify the previous command like this:
$ curl --data "foo=$(base64 foo.txt)" http://localhost/yourfile.php
You should encode/decode the content of your file (for instance by using the base64 command under Linux).
file foo.txt:
8<----------------------------
Hello World
I am a Secure Web Gateway
8<----------------------------
base64 foo.txt | base64 -d   # encode, then decode back to the original contents

curl with wildchars in url

I have a file whose name ends in -comps.xml and has the following form:
http://some/url/<sha256sum>-<2 chars>-x86_64-comps.xml
where sha256sum is a 64-character hex string.
For example:
http://some/url/0dae8d32824acd9dbdf7ed72f628152dd00b85e4bd802e6b46e4d7b78c1042a3-c6-x86_64-comps.xml
How can I download this file using curl?
I've found solution using wget:
wget --recursive --level=1 --no-parent --no-directories --accept '*-comps.xml' --directory-prefix=. http://some/url
Assuming that you already know the filename, to download the contents of the file simply use
curl -O http://some/url/0dae8d32824acd9dbdf7ed72f628152dd00b85e4bd802e6b46e4d7b78c1042a3-c6-x86_64-comps.xml
If you are looking to somehow predetermine the file name based on an SHA256 hash of the file's contents, then you will need either access to those contents already (to compute the SHA256 part of the URL) or an alternative source for this information.
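If the server exposes an index page (which is what the wget --recursive approach above relies on), a curl-based equivalent is to fetch the listing and grep the matching name out of it. The listing line below is a made-up stand-in for what the server might return; in a real script it would come from curl -s http://some/url/:

```shell
# Hypothetical fragment of a directory-listing page
listing='<a href="0dae8d32824acd9dbdf7ed72f628152dd00b85e4bd802e6b46e4d7b78c1042a3-c6-x86_64-comps.xml">comps</a>'

# Match: 64 hex chars, a dash-separated infix, then -comps.xml
file=$(printf '%s\n' "$listing" | grep -oE '[0-9a-f]{64}-[^"/]*-comps\.xml')
echo "$file"

# The download itself would then be:
#   curl -O "http://some/url/${file}"
```

This depends entirely on the server actually serving a browsable listing; if it doesn't, there is no way for curl to expand a wildcard, because globbing happens on the server side only for protocols like FTP.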