How do I substring piped output from grep in Linux?

I'm trying to write a script to login to a Drupal website automagically to put it into maintenance mode. Here's what I have so far, and the grep gives me back the line I want.
curl http://www.drupalwebsite.org/?q=user | grep '<input type="hidden" name="form_build_id" id="form-[a-zA-Z0-9]*" value="form-[a-zA-Z0-9]*" />'
Now I'm kind of a Linux newbie, and I'm using Cygwin with BASH. How would I then pipe the output and use a command to get the value of the id attribute from the output that grep generated? I'll be using this substring later to do another curl request to actually submit the login.
I was looking at using expr, but I don't really understand how I would tell expr "oh hey, this stdin data, I want you to manipulate it in this way". It seems like the only way I could do this would be to save the grep output in a variable and then feed the variable to expr.

Use sed to trim the results you get from your grep, i.e.:
Edit: added the myID variable; use any name you like.
myID=$(
curl http://www.drupalwebsite.org/?q=user \
| grep '<input type="hidden" name="form_build_id" id="form-[a-zA-Z0-9]*" value="form-[a-zA-Z0-9]*" />' \
| sed 's/^.* id="//;s/" value=.*$//'
)
#use ${myID} later in script
printf 'myID=%s\n' "${myID}"
The first substitution removes the front of the line, everything up to and including id=", while the second removes everything from " value= to the end of the line.
Note that you can chain multiple substitutions in one sed script by separating them with ';'.
Edit 2:
Also, once you're using sed, there's no need for grep. Try this:
myID=$(
curl http://www.drupalwebsite.org/?q=user \
| sed -n '\#<input type="hidden" name="form_build_id" id="form-[a-zA-Z0-9]*" value="form-[a-zA-Z0-9]*" />#{
s#^.* id="##
s#" value=.*$##p
}'
)
(It's a good habit to remove unnecessary processes. It may not matter in this case, but if you end up writing code that is executed thousands of times an hour, an extra grep that you don't need creates thousands of processes that never had to exist.)
You should not need to escape the '<' and '>' characters: they are literal in basic regular expressions, and GNU sed treats '\<' and '\>' as word-boundary anchors. If your sed objects to them, use '[<]' and '[>]'.
I'm using '#' as the regex delimiter now to avoid having to escape any '/' characters in the search string, and I use it throughout the example to be consistent. When an address (the pattern that selects lines) uses a delimiter other than '/', sed needs a leading backslash to announce it, hence the \# at the start of the script. The s command accepts a different delimiter directly, so it is written s#...#...# without a backslash.
The -n means "don't default print each line of input", and because of that, we have to add the 'p' at the end, which means print the current buffer.
Finally, I'm not sure about your regular expression, particularly the -[a-zA-Z0-9]*: the * means zero or more of the previous character (or character class, in this case). People who want at least one alphanumeric character typically use -[a-zA-Z0-9][a-zA-Z0-9]* or [[:alnum:]][[:alnum:]]*, but I don't know your data well enough to say for sure.
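To see the difference between "zero or more" and "at least one", here is a small sketch that counts matching lines with grep -c. The input line id="form-" is a made-up sample, not real Drupal output.

```shell
# Hypothetical input: an id that has nothing after "form-".
sample='id="form-"'

# "*" allows zero characters after the hyphen, so this line still matches.
zero_or_more=$(printf '%s\n' "$sample" | grep -c 'form-[a-zA-Z0-9]*' || true)

# Repeating the class once before "*" demands at least one character.
one_or_more=$(printf '%s\n' "$sample" | grep -c 'form-[a-zA-Z0-9][a-zA-Z0-9]*' || true)

printf 'zero-or-more: %s, at-least-one: %s\n' "$zero_or_more" "$one_or_more"
```

The "|| true" is only there because grep exits with a non-zero status when nothing matches.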
I hope this helps.

You could use grep again with the -o option. Possibly two consecutive greps to also filter out the surrounding id="..." part.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
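A minimal sketch of the two-grep idea, assuming the line looks like the one in the question. The html variable holds a made-up sample line (the id form-abc123 is hypothetical) so the pipeline can be tried without the curl call.

```shell
# Hypothetical sample of the line that the original grep selects.
html='<input type="hidden" name="form_build_id" id="form-abc123" value="form-abc123" />'

# First grep -o keeps only the id="..." attribute,
# the second keeps only the value inside the quotes.
form_id=$(printf '%s\n' "$html" \
  | grep -o 'id="form-[a-zA-Z0-9]*"' \
  | grep -o 'form-[a-zA-Z0-9]*')

printf '%s\n' "$form_id"
```

In the real script, the printf would be replaced by the curl command from the question.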

Related

How to pass multiple variables in grep

I have a json file that is download using curl. It has some information of a confluence page. I want to extract only 3 parts that downloaded information - that is page: id, status and title.
I have written a bash script for this and my constraint is that I am not sure how to pass multiple variables in grep command
id=id #hardcoded
status=status #hardcoded
echo Enter title you are looking for: #taking input from user here
read title_name
echo
echo
echo Here are details
curl -u username:password -sX GET "http://X.X.X.X:8090/rest/api/content?type=page&start=0&limit=200" | python -mjson.tool | grep -Eai "$title_name"|$id|$status"
Aside from a typo (you have an unbalanced quote; please always check the syntax before posting), the basic idea of your approach would work, in that
grep -Eai "$title_name|$id|$status"
would select those lines that contain the content of at least one of the variables title_name, id or status.
However, it is a pretty fragile solution. I don't know what the actual content of those variables can be, but for instance, if title_name were set to X.Z, it would also match lines containing the string XYZ, since the dot matches any character. Similarly, if title_name contained, say, a lone [ or (, grep would complain about an unmatched bracket or parenthesis.
If you want the strings to be matched literally rather than treated as regular expressions, it is better to write the patterns into a file (one pattern per line) and use
grep -F -f patternfile
for searching. Of course, since you are using bash, you can also use process substitution if you prefer not using an explicit temporary file.
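As a rough sketch of the pattern-file approach: the variable values and the four input lines below are made up for illustration, and mktemp supplies the temporary file name. In bash, the temporary file could be replaced by process substitution.

```shell
# Hypothetical values; title_name deliberately contains a regex metacharacter.
title_name='X.Z'
id=id
status=status

# Write one literal pattern per line into a temporary file.
patternfile=$(mktemp)
printf '%s\n' "$title_name" "$id" "$status" > "$patternfile"

# Made-up stand-in for the pretty-printed JSON coming from curl.
matches=$(printf '%s\n' '"title": "XYZ"' '"title": "X.Z"' '"id": "123"' '"type": "page"' \
  | grep -F -f "$patternfile")

rm -f "$patternfile"
printf '%s\n' "$matches"
```

With -F the dot in X.Z is an ordinary character, so the XYZ line is not selected.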

How to capitalize and replace characters in shell script in one echo

I am trying to find a way to capitalize and replace dashes of a string in one echo. I do not have the ability to use multiple lines for reassigning the string value.
For example:
string='test-e2e-uber' needs to echo $string as TEST_E2E_UBER
I currently can do one or the other by utilizing
${string^^} for capitalization
${string//-/_} for replacement
However, when I try to combine them it does not appear to work (bad substitution error).
Is there a correct syntax to achieve this?
echo ${string^^//-/_}
This does not directly answer your question, but the following script achieves what you want:
declare -u string='test-e2e-uber'
echo ${string//-/_}
You can do that directly with the 'tr' command, in just one 'echo':
echo "$string" | tr "-" "_" | tr "[:lower:]" "[:upper:]"
TEST_E2E_UBER
A single 'tr' call can handle both conversions if each set lists the hyphen or underscore alongside the letter range (for example tr 'a-z-' 'A-Z_'); I used two calls joined by a pipe to keep each step obvious.
Or you could do something similar with 'awk':
echo "$string" | awk '{gsub("-","_",$0)} {print toupper($0)}'
TEST_E2E_UBER
In this case, I replace the hyphens with 'gsub', then print the whole record in uppercase.
Why do you dislike having two successive assignment statements so much? If you really hate it, you will have to resort to an external program to do the task for you, such as
string=$(tr a-z- A-Z_ <<<$string)
but I would consider it a waste of resources to create a child process for such a simple operation.
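For comparison, the two-assignment version that avoids a child process could look like the sketch below. It assumes bash 4 or later, because ${var^^} is not available in older versions or in plain sh; the names upper and result are arbitrary.

```shell
string='test-e2e-uber'

# Step 1: uppercase the whole string (bash 4+ parameter expansion).
upper=${string^^}

# Step 2: replace every hyphen with an underscore.
result=${upper//-/_}

printf '%s\n' "$result"
```

Parameter expansions cannot be nested, which is why ${string^^//-/_} is rejected as a bad substitution.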

expr bash for sed a line in log does not work

My goal is to extract the 100th line with sed, store it as a string, then split the sentence into words.
#!/bin/bash
fid=log.txt;
sentence=`expr sed -n '100p' ${fid}`;
for word in $sentence
do
echo $word
done
but apparently this has failed.
expr: syntax error
Would somebody please let me know what I have done wrong? Previously it worked for numbers.
The expr does not seem to serve a useful purpose here, and if it did, a sed command would certainly not be a valid or useful thing to pass to it, under most circumstances. You should probably just take it out.
However, the following loop is also problematic. Unquoted variables in shell script are very frequently an error. In this case, you can't quote the thing you pass to the for loop (that would cause the loop to only run once, with the loop variable set to the quoted string) but you also cannot prevent the shell from performing wildcard expansion on the unquoted string. So if the string happened to contain *, the shell will expand that to a list of files in the current directory, for example.
Fortunately, this can all be done in an only slightly more complicated sed script.
sed '100!d;s/[ \t]\+/\n/g;q' "$fid"
That is, if the line number is not 100, delete this line and start over with the next line. Otherwise, we are at line 100; replace runs of whitespace with newlines, (print) and quit.
(The backslash escape codes \t and \n are not universally portable; and \+ for repetition is also an optional extension. I believe there are also sed variants which dislike semicolon as a command separator. Consult your sed manual page, experiment, and if everything else fails, maybe switch to Awk or Perl. Just in case, here is a version which works even on Mac OSX:
sed '100!d
s/[ ][ ]*/\
/g;q' log.txt
The square brackets contain a space and a literal tab; in Bash, with default keybindings, type ctrl-V, tab to produce a literal tab.)
Incidentally, this also gets rid of the variable capture antipattern. There are good reasons to capture output to a variable, but if it can be avoided, you often end up with a simpler, more robust and efficient, as well as more idiomatic and elegant script. (I see no reason to put the log file name in a variable, either, in this isolated case; but in a larger script, it might make sense.)
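If the word loop has to stay, one option is to switch off wildcard expansion with set -f while the unquoted variable is split. This is only a sketch: the sentence below is a made-up log line that contains a *, and the subshell keeps set -f from affecting the rest of the script.

```shell
# Hypothetical content of line 100; the * would normally expand to file names.
sentence='error in * at line 100'

words=$(
  set -f                      # disable pathname expansion inside this subshell
  for word in $sentence; do   # unquoted on purpose: we want word splitting
    printf '%s\n' "$word"
  done
)

printf '%s\n' "$words"
```

Each word, including the literal *, ends up on its own line.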
I don't think you need the expr command in this case.
expr is used to do calculations, something like:
expr 1 + 1
Just this one is fine:
sentence=`sed -n '100p' ${fid}`;
#!/bin/bash
fid=log.txt;
sentence=$(sed -n '100p' ${fid});
for word in $sentence
do
echo $word
done
Dropping expr and using a dollar sign with parentheses, $( ... ), for the command substitution solves the problem.

How can I get substring from a string in linux?

I am trying to extract a specific string from a string in linux.
For example, I want to extract 'android.content.pm.PackageParser.parseBaseApplication' from the below string.
The String has a regular format and only the string within parenthesis is changeable.
Join point 'method-execution(boolean android.content.pm.PackageParser.parseBaseApplication(android.content.pm.PackageParser$Package, android.content.res.Resources, org.xmlpull.v1.XmlPullParser, android.util.AttributeSet, int, java.lang.String[]))' in Type
However, I have a trouble in finding a proper approach to do this.
At first I tried the sed command, but it was too complicated and I couldn't complete the work.
Could you recommend any other approach to do this?
Thanks a lot.
If the string of interest is always the second word after the first (, then
echo "..." | awk -F '[()]' '{split($2,a," "); print a[2]}'
extracts it.
It splits the line using ( and ) as delimiters, so $2 is the text between the first ( and the next ( or ). split then breaks $2 on spaces, and the second piece is
android.content.pm.PackageParser.parseBaseApplication
for your example.
This looks like AOP syntax. So, with certain assumptions, this can be done as:
echo "Join point...." | cut -d'(' -f2 | cut -d' ' -f2
Explanation: cut on ( and take the second field, which is the method signature without its parameters. Since we are not interested in the return type either, cut that field on a space and take the second field, which is the method name.
Based on your stated invariant, that the substring you're capturing is the only part that varies from file to file, here is a perl solution:
Extract=$(perl -ne 'print $1 if /\s*Join point \x27method-execution\(boolean\s+([^(]*)/' file_to_search)
echo $Extract
android.content.pm.PackageParser.parseBaseApplication
I used the full lead-in because it reduces the chance of false positives, but if you find that other things change and want to use a shorter substring of it (e.g., "method-execution(boolean "), that's your choice to make.
The regex matches up to where the variant substring starts. That substring runs to the next invariant, the open parenthesis, so we can just capture everything that is not an open parenthesis. Since the variant is probably changed by hand, I allowed for extra spaces with \s+ (one or more whitespace characters).
You could use almost the same regex with sed, but it would need to consume the entire line so that the rest does not become part of the output, e.g., in shorthand:
sed -r 's/.*LEAD_IN(CAPTURE_TEXT).*/\1/'
where LEAD_IN is the constant leader, "Join point...", and CAPTURE_TEXT is the same capture group as in the perl solution. The main difference is the leading and trailing ".*" that consume the rest of the line.
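Filled in for the sample line from the question, the shorthand might look like the sketch below. It uses a shortened lead-in (method-execution(boolean) rather than the full one, and -E, which both GNU and BSD sed accept for extended regular expressions; the variable names line and method are arbitrary.

```shell
# The sample line from the question; \$ keeps $Package literal inside double quotes.
line="Join point 'method-execution(boolean android.content.pm.PackageParser.parseBaseApplication(android.content.pm.PackageParser\$Package, android.content.res.Resources, org.xmlpull.v1.XmlPullParser, android.util.AttributeSet, int, java.lang.String[]))' in Type"

# Consume everything before the capture and everything from the next "(" onward.
method=$(printf '%s\n' "$line" \
  | sed -E 's/.*method-execution\(boolean +([^(]*)\(.*/\1/')

printf '%s\n' "$method"
```

Lines that do not contain the lead-in pass through unchanged, so a real script may want sed -n with the p flag to print only the lines that matched.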

How can I remove a doubled section of a string?

I'm having trouble with data manipulation in a txt file. My file currently looks like this:
HG02239 -23.42333333
NA06985NA06985 -20.125
NA06991NA06991 -20.92
This shows some of my tab-delimited data. Half the entries are in the correct seven-characters (letterletternumbernumbernumbernumbernumber) format, but some are doubled up. I want to go into the second column (first column is empty for a reason!) and remove the repeats in the string so it would read
HG02239 -23.42333333
NA06985 -20.125
NA06991 -20.92
I can't work out how to do this with sed/awk on a per column basis. I feel like I should be able to write a regex, but because the data is a repeat, I don't want to lose the first half of the string; and I can't work out how to cut on a specific column, or I would just delete the 7th character. Any help much appreciated!
Solution
You can solve this with a backreference. For example, using GNU sed:
$ cat << EOF | sed --regexp-extended 's/(.{7})\1/\1/'
HG02239 -23.42333333
NA06985NA06985 -20.125
NA06991NA06991 -20.92
EOF
HG02239 -23.42333333
NA06985 -20.125
NA06991 -20.92
If you aren't using GNU sed, replace --regexp-extended with -E, or use a basic regular expression with escaped parentheses and braces: 's/\(.\{7\}\)\1/\1/'. In addition, you can tune the regular expression if you need a more accurate character match.
Explanation
The cat pipeline is just a here-document to make it easy to display and test the code. You can call sed directly on your file, or use the -i flag to perform an in-place edit when you're comfortable with the results.
The sed script does the following:
It stores any group of 7 consecutive characters in a capture group using an "interval expression" (the number in the curly braces).
The \1 is a backreference that matches the first capture group.
The match looks for "a capture group followed by a copy of the capture group."
The substitution replaces the match with a single copy of the capture group.
One way, using awk:
awk '{ print substr($1, 1, 7), $2 }' file.txt
Output:
HG02239 -23.42333333
NA06985 -20.125
NA06991 -20.92
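If the tab-delimited layout, including the empty first column, has to survive, a variant of the awk approach can set both the input and output field separators to a tab. This is a sketch that assumes the ID sits in the second tab-separated field, as the question describes; the two sample rows are fed in with printf only for illustration.

```shell
# Two sample rows with an empty first column, tab-separated.
out=$(printf '\tHG02239\t-23.42333333\n\tNA06985NA06985\t-20.125\n' \
  | awk 'BEGIN { FS = OFS = "\t" }      # split and rejoin on tabs
         { $2 = substr($2, 1, 7) }      # keep the first 7 characters of the ID
         1')                            # print every (rebuilt) record

printf '%s\n' "$out"
```

Because the fields are rejoined with OFS, the leading tab and the tab before the number are kept.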
You could use something like this:
sed -i 's|\([A-Z]\{2\}[0-9]\{5\}\)[A-Z0-9]*\s*\(.*\)|\1 \2|g' <your-file>
