How to grab different sorts of images from their src using wget? - linux

This is an example image src. I want to save this image using wget. How do I do that?
http://lp.hm.com/hmprod?set=key[source],value[/environment/2013/2BV_0002_007R.jpg]&set=key[rotate],value[-0.1]&set=key[width],value[3694]&set=key[height],value[4319]&set=key[x],value[248]&set=key[y],value[354]&set=key[type],value[FASHION_FRONT]&hmver=0&call=url[file:/product/large]

wget -L "http://lp.hm.com/hmprod?set=key[source],value[/environment/2013/2BV_0002_007R.jpg]&set=key[rotate],value[-0.1]&set=key[width],value[3694]&set=key[height],value[4319]&set=key[x],value[248]&set=key[y],value[354]&set=key[type],value[FASHION_FRONT]&hmver=0&call=url[file:/product/large]" -O zz.jpg
Quoting the link you want to download is essential. This particular link has many special characters capable of breaking the command.
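To see why quoting matters, here is a quick sketch with a shortened, made-up variant of the URL: without quotes the shell treats each & as a command separator (and the square brackets as glob characters), so wget only ever sees the part of the URL before the first &.
# Unquoted: the shell backgrounds wget at the first & and drops the rest of the query string
wget http://lp.hm.com/hmprod?set=key[width],value[3694]&set=key[height],value[4319]
# Quoted: the entire URL reaches wget as one argument
wget 'http://lp.hm.com/hmprod?set=key[width],value[3694]&set=key[height],value[4319]' -O zz.jpg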

Related

How to convert Markdown to PDF on the command line

I need to convert a GitHub README.md file to PDF. I have tried many modules, but they don't work well. Is there a tool that produces an accurate PDF? This website provides a good PDF conversion: http://www.markdowntopdf.com/
I need a command-line tool that works like that.
Try this software:
https://github.com/BlueHatbRit/mdpdf
Or explain which tools you've tried and why they aren't working well.
Also check this question on Super User:
https://superuser.com/questions/689056/how-can-i-convert-github-flavored-markdown-to-a-pdf
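If you go the mdpdf route, a minimal sketch (assuming a working Node.js/npm setup; check the project README for the exact command-line options) would be:
npm install -g mdpdf
mdpdf README.md README.pdf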
Pandoc
I've personally liked using pandoc as it supports a wide range of input and output formats.
Installation
Pandoc is available in most repositories: sudo apt install pandoc
Usage
Sometimes pandoc can infer which formats to use, which makes converting easy. However, I find that it often interprets the input as plain text, which might not be what you want:
pandoc README.md -o README.pdf
Instead, you might want to be explicit about the input/output formats to ensure a better conversion. In the case below, I'm specifically declaring that README.md is GitHub-Flavored Markdown:
pandoc --from=gfm --to=pdf -o README.pdf README.md
Again, there are quite a few different formats and options to choose from but to be honest, the basics suffice for the majority of my needs.
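One caveat: pandoc produces PDF through an external engine (LaTeX by default), so if the commands above complain about a missing engine you may need to install one and name it explicitly. A sketch, assuming you want xelatex on a Debian/Ubuntu system:
sudo apt install texlive-xetex
pandoc --from=gfm --pdf-engine=xelatex -o README.pdf README.md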
I found md-to-pdf very useful.
Examples:
– Convert ./file.md and save to ./file.pdf
$ md-to-pdf file.md
– Convert all markdown files in current directory
$ md-to-pdf ./*.md
– Convert all markdown files in current directory recursively
$ md-to-pdf ./**/*.md
– Convert and enable watch mode
$ md-to-pdf ./*.md -w
And many more options.
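For completeness, md-to-pdf is distributed as an npm package, so a typical setup (assuming Node.js is already installed) looks roughly like:
npm install -g md-to-pdf
md-to-pdf file.md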

How to print a text/plain document on a CUPS printer without using the raw option

I am using a CUPS command to print pages of a document, but it prints all the pages, ignoring the pages option. After some investigation I found that the raw option overrides the pages option. How can I print selected pages without using raw? If I drop that option, I get an error saying the text file type is not supported. Here is my code:
system("lpr -P AFSCMSRPRNT3 -o pages=1,2,6 -o raw -T test_womargin abc.txt");
Plain text files don't really specify how things should be printed, and thus aren't allowed.
Try converting the text to a usable format first. There's a popular tool, a2ps, which should be available in every Linux distribution. Try that!
EDIT: You seem to be confused by the word "convert". What I meant is that instead of printing the text file, you print a PostScript file generated from it; something you can get by running something like
a2ps -o temporaryoutput.ps input.txt
and then
lpr -P AFSCMSRPRNT3 -o pages=1,2,6 -T test_womargin temporaryoutput.ps
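If you want to keep everything inside the original program, a minimal sketch of the same two steps chained in one system() call (the temporary path is just a placeholder) could look like:
system("a2ps -o /tmp/abc.ps abc.txt && lpr -P AFSCMSRPRNT3 -o pages=1,2,6 -T test_womargin /tmp/abc.ps");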

How to handle special characters in a wget download link?

I have a link like this:
wget --user=user_nm --http-password=pass123 https://site.domain.com/Folder/Folder/page.php?link=/Folder/Folder/Csv.Stock.php\&namefile=STOCK.Stock.csv
But while the password authorization is fine, wget still cannot process the link. Why?
The safest way to handle a link copied from e.g. a browser is to wrap the whole link string in single quotes ('). That way the shell will not try to break it up, and you do not have to escape each special character manually:
wget --user=user_nm --http-password=pass123 'https://site.domain.com/Folder/Folder/page.php?link=/Folder/Folder/Csv.Stock.php&namefile=STOCK.Stock.csv'
Or, for a real example:
wget --user-agent=firefox 'https://www.google.com/search?q=bash+shell+singl+quote&ie=utf-8&oe=utf-8&aq=t&rls=org.mageia:en-US:official&client=firefox-a#q=bash+single+quote&rls=org.mageia:en-US:official'
Keep in mind that server-side restrictions might make using wget like this quite hard. Google, for example, forbids certain user agent strings, hence the --user-agent option above. Other servers use cookies to maintain session information and simply feeding a link to wget will not work. YMMV.
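For the cookie case specifically, one workaround is to export the browser's cookies to a Netscape-format file and hand it to wget with --load-cookies; the cookies.txt file name below is just a placeholder:
wget --user=user_nm --http-password=pass123 --load-cookies cookies.txt 'https://site.domain.com/Folder/Folder/page.php?link=/Folder/Folder/Csv.Stock.php&namefile=STOCK.Stock.csv'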

How do I move files by looking at part of a file name

My web application creates multiple image thumbnail files when users upload images.
I want to separate original images and thumbnail images. Thumbnail images contain 'crop-smart' in their file name.
For example, original image is watermelon.jpg, then thumbnail's name is watermelon_jpg_120x120_crop-smart.jpg.
How do I find them by, say, 'crop-smart' and either move them to a different folder or delete them?
Standard file globbing will do this. The exact details may vary depending on which shell you are running, but for your exact problem it should be the same:
mv -- *_crop-smart.jpg /path/to/new/folder/
(This will also work if you have spaces in the filename)
Note the -- signals to mv that no more option switches will follow, so even if filenames look like options, mv won't get confused.
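If the thumbnails are spread across subdirectories, or you want to delete them instead of moving them, a find-based variant (the paths below are placeholders) covers both cases:
# Move thumbnails from the whole tree into one folder
find /path/to/uploads -type f -name '*crop-smart*.jpg' -exec mv -- {} /path/to/new/folder/ \;
# Or delete them outright
find /path/to/uploads -type f -name '*crop-smart*.jpg' -delete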

How to download images from "Wikimedia search results" using wget?

I need to mirror every image that appears on this page:
http://commons.wikimedia.org/w/index.php?title=Special:Search&ns0=1&ns6=1&ns12=1&ns14=1&ns100=1&ns106=1&redirs=0&search=buitenzorg&limit=900&offset=0
The mirror should give us the full-size images, not the thumbnails.
What is the best way to do this with wget?
UPDATE: I've posted my solution below.
Regex is your friend, my friend!
Using cat, grep and wget you'll get this task done pretty fast.
Download the search results page with wget, then run:
cat DownloadedSearchResults.html | grep -oP 'class="searchResultImage".+?href="\K[^"]+\.jpg'
That should give you http://commons.wikimedia.org/-based links to each image's web page. Now download each of those pages and run:
cat DownloadedImagePage.html | grep -oP 'class="fullImageLink".*?href="\K[^"]+\.jpg'
That should give you a direct link to the highest resolution available for that image.
I'm hoping your bash knowledge will do the rest. Good luck.
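Pulling the two grep passes together, a rough end-to-end sketch could look like the following; it assumes the class names above still match Wikimedia's markup, and that the image links may come back protocol-relative:
# Fetch the search results page (the search URL from the question, shortened here)
wget -O SearchResults.html 'http://commons.wikimedia.org/w/index.php?title=Special:Search&search=buitenzorg&limit=900&offset=0'
# For each result, fetch its page, extract the full-resolution link and download it
grep -oP 'class="searchResultImage".+?href="\K[^"]+\.jpg' SearchResults.html |
while read -r page_path; do
    wget -q -O ImagePage.html "http://commons.wikimedia.org$page_path"
    img_url=$(grep -oP 'class="fullImageLink".*?href="\K[^"]+\.jpg' ImagePage.html)
    case "$img_url" in //*) img_url="https:$img_url" ;; esac
    [ -n "$img_url" ] && wget "$img_url"
done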
I came here with the same problem and found this: http://meta.wikimedia.org/wiki/Wikix
I don't have access to a Linux machine right now, so I haven't tried it yet.
It is quite difficult to write the whole script in the Stack Overflow editor, so you can find it at the address below. The script only downloads the images on the first page; you can modify it to automate downloading from the other pages as well.
http://pastebin.com/xuPaqxKW
