Curl the results of a grep? - linux

Here's how I'm grepping for lines:
grep "random text" /
I want to curl based on the text found. This is what I've tried:
grep "random text" / | curl http://example.com/test.php?text=[TEXT HERE]
What I don't understand is how to use the results of grep while curling. How can I replace [TEXT HERE] with the results of my grep so that it requests the correct URL?

Passing all results from grep in one request:
curl --data-urlencode "text=$(grep PATTERN file)" "http://example.com/test.php"
One request per grep result:
Use a while loop in combination with read:
grep PATTERN file | while read -r value ; do
curl --data-urlencode "text=${value}" "http://example.com/test.php"
done

grep 'random text' file | xargs -I {} curl 'http://example.com/test.php?text={}'

Put the output of grep in a variable, and put that variable in place of [TEXT HERE]
output=$(grep "random text" filename)
curl "http://example.com/test.php?text=$output"
The quotes are important, since ? has special meaning to the shell and $output might contain whitespace or wildcard characters that would otherwise be processed.
If $output can contain special URL characters, you need to URL-encode it. See How to urlencode data for curl command? for various ways to do this.
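If the GET-style URL from the question is needed, a minimal sketch (same hypothetical test.php endpoint) lets curl do the encoding itself:
output=$(grep "random text" filename)
curl --get --data-urlencode "text=${output}" "http://example.com/test.php"
With --get, curl appends the --data-urlencode value to the URL as an encoded query string instead of sending it as a POST body.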

Related

Curl/sed command not processing inputs correctly

I'm having some trouble with getting this to do what I want it to do.
read -p "URL to read: " U
read -p "Word to fin: " O
read -p "Filename: " F
curl -O $U | sed "s/\<$O\>/\*$O\*/g" > $F.txt
So basically what I want is to use curl to get a .txt file from a url, then sort through it to find the word specified by the user input. Then mark all those words with a * and put them in a file specified by the user.
Almost the exact same code works in Linux, but this doesn't work on my Mac. Anyone got an idea?
Two issues:
-O makes curl store the downloaded file, not output it on stdout.
The word-boundary metacharacters \< and \> are a GNU extension. On BSD sed, you can use [[:<:]] and [[:>:]] instead.
This should work on OSX:
curl "$U" | sed "s/[[:<:]]$O[[:>:]]/\*$O\*/g" > $F.txt

Trimming string up to certain characters in Bash

I'm trying to make a bash script that will tell me the latest stable version of the Linux kernel.
The problem is that, while I can remove everything after certain characters, I don't seem to be able to delete everything prior to certain characters.
#!/bin/bash
wget=$(wget --output-document - --quiet www.kernel.org | \grep -A 1 "latest_link")
wget=${wget##.tar.xz\">}
wget=${wget%</a>}
echo "${wget}"
Somehow the output "ignores" the wget=${wget##.tar.xz\">} line.
You're trying to remove the longest match of the pattern .tar.xz\"> from the beginning of the string, but your string doesn't start with .tar.xz, so there is no match.
You have to use
wget=${wget##*.tar.xz\">}
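For illustration, a minimal sketch with a made-up sample string:
s='<a href="linux-4.10.8.tar.xz">4.10.8</a>'
t=${s##.tar.xz\">}     # pattern does not match at the start, so $t is the whole string
t=${s##*.tar.xz\">}    # $t is now: 4.10.8</a>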
Also, because you're in a script and not an interactive shell, there shouldn't be any need to write \grep (presumably done to prevent alias expansion), as aliases are disabled in non-interactive shells.
And, as pointed out, naming a variable the same as an existing command (often found: test) is bound to lead to confusion.
If you want to use command line tools designed to deal with HTML, you could have a look at the W3C HTML-XML-utils (Ubuntu: apt install html-xml-utils). Using them, you could get the info you want as follows:
$ curl -sL www.kernel.org | hxselect 'td#latest_link' | hxextract a -
4.10.8
Or, in detail:
curl -sL www.kernel.org | # Fetch page
hxselect 'td#latest_link' | # Select td element with ID "latest_link"
hxextract a - # Extract link text ("-" for standard input)
Whenever I need to extract a substring in bash I always see if I can brute force it in a couple of cut(1) commands. In your case, the following appears to work:
wget=$(wget --output-document - --quiet www.kernel.org | \grep -A 1 "latest_link")
echo $wget | cut -d'>' -f3 | cut -d'<' -f1
I'm certain there's a more elegant way, but this has simple syntax that I never forget. Note that it will break if 'wget' gets extra ">" or "<" characters in the future.
It is not recommended to use shell tools like grep, awk, sed, etc. to parse HTML files.
However, if you want a quick one-liner, then this awk should do the job:
wget --output-document - --quiet www.kernel.org |
awk '/"latest_link"/ { getline; n=split($0, a, /[<>]/); print a[n-2] }'
4.10.8
sed method:
wget --output-document - --quiet www.kernel.org | \
sed -n '/latest_link/{n;s/^.*">//;s/<.*//p}'
Output:
4.10.8

Search Text with Linebreaks recursively in a directory?

I have many large logfiles which look like this:
DATETIME ["2015-03-03 21:52"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","BBB","TEST1"]
DATETIME ["2015-03-03 21:53"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","CCC"]
POST ["POST_JSON","DDD","TEST2"]
DATETIME ["2015-03-03 21:54"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","BBB","TEST3"]
DATETIME ["2015-03-03 21:55"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","EEE","TEST4"]
I want to search for 2 keywords (with linebreaks between them): one specific word in the GET line and one specific word in the POST line.
I need something like:
grep "GET(.*)AAA(.*)POST(.*)BBB"
What I'm searching for: AAA (in the GET line) && BBB (in the POST line)
the expected result:
POST ["POST_JSON","BBB","TEST1"]
POST ["POST_JSON","BBB","TEST3"]
With which simple methods is this doable?
Using GNU awk for the 3rd arg to match():
$ find . -type f |
xargs gawk -v RS= 'match($0,/\nGET.*AAA.*\n(POST.*BBB.*)/,a){print a[1]}'
POST ["POST_JSON","BBB","TEST1"]
POST ["POST_JSON","BBB","TEST3"]
Add -v ORS='\n\n' if you really want a blank line between output lines.
grep is the command you are searching for
grep -rHn "GET.*KEYWORD_A" -A1 /path/to/files | grep "POST.*KEYWORD_B"
I would first grep for lines containing KEYWORD_A and append one line after the match, since the POST comes after the GET in your logfiles. Then search for KEYWORD_B.
-r greps recursively in a directory
-H prints the file name
-n prints the line number
I solved this with grep -P for regular expressions as I know them from PHP, and particularly with -A to get the next n lines. Then I filtered the result with a pipe and grep -P again, as sketched below.
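A sketch of that approach (AAA, BBB and the log directory are placeholders):
grep -rP -A1 'GET \[.*AAA' /path/to/logfiles | grep -P 'POST \[.*BBB'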

convert bash `ls` output to json array

Is it possible to use a bash script to format the output of ls as a json array? To be valid json, all names of the dirs and files need to be wrapped in double quotes, separated by a comma, and the entire thing needs to be wrapped in square brackets. I.e. convert:
jeroen@jeroen-ubuntu:~/Desktop$ ls
foo.txt bar baz
to
[ "foo.txt", "bar", "baz" ]
edit: I strongly prefer something that works across all my Linux servers; hence I'd rather not depend on python but have a pure bash solution.
If you know that no filename contains newlines, use jq:
ls | jq -R -s -c 'split("\n")[:-1]'
Short explanation of the flags to jq:
-R treats the input as string instead of JSON
-s joins all lines into an array
-c creates a compact output
[:-1] removes the last empty string in the output array
This requires version 1.4 or later of jq. Try this if it doesn't work for you:
ls | jq -R '[.]' | jq -s -c 'add'
Yes, but the corner cases and Unicode handling will drive you up the wall. Better to delegate to a scripting language that supports it natively.
$ ls
あ a "a" à a b 私
$ python -c 'import os, json; print json.dumps(os.listdir("."))'
["\u00e0", "\"a\"", "\u79c1", "a b", "\u3042", "a"]
Hello you can do that with sed and awk:
ls | awk ' BEGIN { ORS = ""; print "["; } { print "\/\#"$0"\/\#"; } END { print "]"; }' | sed "s^\"^\\\\\"^g;s^\/\#\/\#^\", \"^g;s^\/\#^\"^g"
EDIT: updated to solve the problem with " and spaces. I use /# as a replacement pattern for ", since / is not a valid character in a filename.
Use perl as the encoder; it's guaranteed to be non-buggy, is everywhere, and with pipes, it's still reasonably clean:
ls | perl -e 'use JSON; @in=grep(s/\n$//, <>); print encode_json(\@in)."\n";'
Most Linux machines already have python. All you have to do is:
python -c 'import os, json; print json.dumps(os.listdir("/yourdirectory"))'
Replace /yourdirectory with any path, e.g. "." for the current directory.
Here's a bash line
echo '[' ; ls --format=commas|sed -e 's/^/\"/'|sed -e 's/,$/\",/'|sed -e 's/\([^,]\)$/\1\"\]/'|sed -e 's/, /\", \"/g'
Won't properly deal with ", \ or some commas in the name of the file. Also, if ls puts newlines between filenames, so will this.
I was also searching for a way to output a Linux folder / file tree to some JSON or XML file. Why not use this simple terminal command:
$ tree --dirsfirst --noreport -n -X -i -s -D -f -o my.xml
So, just the Linux tree command, configured with your own parameters. Here -X gives XML output! For me that's OK, and I guess there's some script to convert XML to JSON.
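As a side note, newer versions of tree can emit JSON directly with -J (assuming your tree build supports it), for example:
tree -J --dirsfirst --noreport -L 1 .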
NOTE: I think this covers the same question.
Personally, I would write a script that runs the ls command and sends the output to a file of your choice, parsing the output along the way to format it as valid JSON.
I'm sure that a simple Bash file will do the work.
Bash output
Can't you use a python script like this?
import subprocess
myOutput = subprocess.check_output(["ls"]).splitlines()
output = ['"' + str(e) + '"' for e in myOutput]
return output  # assuming this runs inside a function
I didn't check if it works, but you can find the specification here
Should be pretty easy.
$ cat ls2json.bash
#!/bin/bash
OUT='['
for FILE in *
do
ESCAPED=${FILE//\"/\\\"}   # escape embedded double quotes
OUT="${OUT}\"${ESCAPED}\","
done
echo "${OUT%,}]"   # strip the trailing comma and close the array
then run:
$ ./ls2json.bash > json.out
but python would be even easier
import os
directory = '/some/dir'
ls = os.listdir(directory)
dirstring = str(ls)
print dirstring.replace("'",'"')
Here's an elegant one-liner solution that doesn't rely on jq:
echo '[ "'"$(echo "$list" | sed ':a;N;$!ba;s/\n/", "/g')"'" ]'
$list here is a newline-separated string.
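A usage sketch (assuming the filenames contain no double quotes or embedded newlines):
list=$(ls)
echo '[ "'"$(echo "$list" | sed ':a;N;$!ba;s/\n/", "/g')"'" ]'
# prints something like: [ "bar", "baz", "foo.txt" ]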
Using GNU column (i.e. this doesn't work on OSX):
ls -ldG * --time-style=long-iso | column -t -n "$PWD" -N mod,links,user,size,date,time,name -J
Output :
{
"/home/pouet": [
{"mod":"-rwxr-xr-x", "links":"1", "user":"pouet", "size":"21978", "date":"2022-08-12", "time":"11:47", "name":"file1"},
{"mod":"-rw-r--r--", "links":"1", "user":"pouet", "size":"2634", "date":"2022-06-20", "time":"11:14", "name":"file2"}
]
}
Don't use bash, use a scripting language. Untested perl example:
use JSON;
my @ls_output = `ls`;  # probably better to use a perl module to do this, like DirHandle
chomp @ls_output;      # strip the trailing newline from each entry
print encode_json( \@ls_output );

How to grep curl -I header information

I'm trying to get the redirect link from a site by using curl -I, then grepping for "location", and then using sed to strip the location label so that I am left with just the URL.
But this doesn't work. It outputs the URL to the screen and doesn't seem to put it into the variable properly.
test=$(curl -I "http://www.redirectURL.com/" 2> /dev/null | grep "location" | sed -E 's/location:[ ]+//g')
echo "1..$test..2"
Which then outputs:
..2http://www.newURLfromRedirect.com/bla
What's going on?
As @user353852 points out, you have a carriage return character in your output from curl that is only apparent when you try to echo any character after it. The less pager shows this as ^M.
You can use sed to remove "control characters", like in this example:
% test=$(curl -I "http://www.redirectURL.com/" 2>|/dev/null | awk '/^Location:/ { print $2 }' | sed -e 's/[[:cntrl:]]//') && echo "1..${test}..2"
1..http://www.redirecturl.com..2
Notes:
I used awk rather than your grep [...] | sed approach, saving one process.
For me, curl returns the location in a line starting with 'Location:' (with a capital 'L'), if your version is really reporting it with a lowercase 'l', then you may need to change the regular expression accordingly.
the "Location" http header starts with a capital L, try replacing that in your command.
UPDATE
OK, I have run both lines separately and each runs fine, except that it looks like the output from the curl command includes some control characters which are being captured in the variable. When this is later printed in the echo command, the $test variable is printed, followed by a carriage return that sets the cursor to the start of the line, and then ..2 is printed over the top of 1..
Check out the $test variable in less:
echo 1..$test..2 | less
less shows:
1..http://www.redirectURL.com/^M..2
where ^M is the carriage return character.
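To complete the fix, a sketch that strips just that carriage return with tr (same example URL; adjust the Location capitalization to whatever your curl reports):
test=$(curl -sI "http://www.redirectURL.com/" | awk '/^[Ll]ocation:/ { print $2 }' | tr -d '\r')
echo "1..${test}..2"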
