-bash: syntax error near unexpected token `done' in script - linux

I am sorry for posting this, but it is driving me crazy. I am very new to bash scripting and am really struggling. I have files named in the format 8_S58_L001.sorted.bam, and I would like to take the leading number (8 in this case) from many such files and generate a csv file. This will give me the order in which samples were processed by a downstream function.
The script is as follows. It works, but I get an error (-bash: syntax error near unexpected token `done') every time I run it, and I am struggling to understand why. So far I have spent 2 days trying to get to the bottom of it and have searched extensively through various forums.
do
test=$(ls -LR | grep .bam$| sed 's/_.*//'| awk '{print}' ORS=',' | sed 's/*$//')
echo $test>../SampleOrder/fileOrder2.csv
done
If I just run
test=$(ls -LR | grep .bam$| sed 's/_.*//'| awk '{print}' ORS=',' | sed 's/*$//')
echo $test>../SampleOrder/fileOrder2.csv
Then I get the desired output and no errors, but if it is wrapped in a do statement I get the above error. I am hoping to incorporate this into a larger script, so I want to deal with this error first.
I should say that this is being run on a Linux-based cluster.
Can someone with more experience tell me where I am going wrong?
Thanks in advance
Sam

bash doesn't have a do statement, and done is a reserved word when it is the first word in a command.
So in
do
something
something
done
do is a syntax error. do is only useful in the context of for and while loops, where it serves to separate the condition from the body of the loop.
Since you're reporting a syntax error on the done as opposed to the do, my guess is that you've let Windows line-endings creep into your file. Bash doesn't regard the \r (CR) character as special, so if your file actually contains do\r, then that will be considered to be the name of an external command.
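A minimal sketch of how one might check for stray carriage returns and strip them, assuming a throwaway demo script created with mktemp (the file names and loop contents here are hypothetical, not the asker's script):

```shell
# Build a small demo script that deliberately uses CRLF line endings.
script=$(mktemp)
printf 'for f in a b\r\ndo\r\necho "$f"\r\ndone\r\n' > "$script"

# Hold a literal carriage return in a variable (portable across shells).
cr=$(printf '\r')

# Count the lines that contain a CR; anything above 0 suggests CRLF endings.
crlf_lines=$(grep -c "$cr" "$script")
echo "lines with CR: $crlf_lines"

# Write a cleaned copy without the carriage returns and run it.
fixed=$(mktemp)
tr -d '\r' < "$script" > "$fixed"
output=$(bash "$fixed")
echo "$output"

rm -f "$script" "$fixed"
```

The cleaned copy also shows do in its proper place, after the for line that introduces the loop.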

You should be aware that grep .bam$ doesn't do what you are expecting it to do. The dot is a grep wildcard which matches any single character, so the pattern .bam$ will match any string of 4 or more characters that ends in "bam". If you are trying to match all strings that end in ".bam", you should escape the dot and write grep "\.bam$"
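The difference between the two patterns can be seen on a few made-up names (the names are hypothetical and chosen only to show the effect of the unescaped dot):

```shell
# Three sample names: only the first really has a ".bam" extension.
names=$(printf '%s\n' sample.bam xbam.txt foobam)

# Unescaped dot: matches any character followed by "bam" at the end of the line.
loose=$(printf '%s\n' "$names" | grep -c '.bam$')

# Escaped dot: matches only a literal ".bam" at the end of the line.
strict=$(printf '%s\n' "$names" | grep -c '\.bam$')

echo "unescaped: $loose, escaped: $strict"
```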
But as a previous commenter correctly noted, you should be using shell wildcards (ls *.bam) instead of grep (ls | grep .bam$)
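As a rough sketch of the glob-based approach, the loop below builds the comma-separated list with a shell wildcard and parameter expansion instead of ls, grep, sed and awk. It assumes all the .bam files sit in one directory (the original ls -LR is recursive, this is not), and the file names are invented for the demo:

```shell
# Demo directory with hypothetical file names in the question's format.
dir=$(mktemp -d)
touch "$dir/8_S58_L001.sorted.bam" "$dir/12_S3_L001.sorted.bam" "$dir/notes.txt"

csv=""
for f in "$dir"/*.bam; do
    name=${f##*/}            # strip the directory part
    id=${name%%_*}           # keep everything before the first underscore
    csv="${csv:+$csv,}$id"   # add a comma only between entries
done
echo "$csv"

rm -rf "$dir"
```

The order is whatever the glob expansion produces (sorted by name), so it may differ from the order ls -LR reports across subdirectories.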

Related

Simple grep command to search for a string not working on my ubuntu

I have to search for a string in a file like below using grep which is not working as expected.
It's just a simple search of the string, but not sure why it is not working
echo "Naizhu NZ1020 Lady Necklace Sexy Tcollarbone Chain Alloy PlatingSilver" | grep "Lady Necklace"
Can somebody help me here why it's not working, want to know the reason
The command grep will print the whole line matching the pattern.
So
echo "Naizhu NZ1020 Lady Necklace Sexy Tcollarbone Chain Alloy PlatingSilver" | grep "Lady Necklace"
will give you
Naizhu NZ1020 Lady Necklace Sexy Tcollarbone Chain Alloy PlatingSilver
To print only the matched text, you can use grep -o (or --only-matching), which the man page describes as:
Print only the matched (non-empty) parts of a matching line, with each
such part on a separate output line.
That gives you only
Lady Necklace
In the comments on the question it was mentioned that a file is used as input. Since the encoding of that file is currently unknown, you may also try using character classes
grep -o "Lady[[:space:]]Necklace"
Please see man grep for more options.
You should also inspect your input file and check that the words you are looking for are on the same line and are separated by a plain space rather than by some other, non-printable character.
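A short sketch of the three behaviours described above, using the line from the question plus one made-up input in which the two words are separated by a tab instead of a space:

```shell
line="Naizhu NZ1020 Lady Necklace Sexy Tcollarbone Chain Alloy PlatingSilver"

# Plain grep prints the whole matching line.
whole=$(echo "$line" | grep "Lady Necklace")

# -o prints only the part that matched.
only=$(echo "$line" | grep -o "Lady Necklace")

# Hypothetical input with a tab between the words: a literal space does not
# match it, but the [[:space:]] class does.
tab_plain=$(printf 'Lady\tNecklace\n' | grep -c "Lady Necklace" || true)
tab_class=$(printf 'Lady\tNecklace\n' | grep -c "Lady[[:space:]]Necklace" || true)

echo "$only"
echo "tab input: plain=$tab_plain class=$tab_class"
```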

convert this linux statement into a statement which is supported by windows command prompt

This is my statement supported by unix environment
"cat document.xml | grep \'<w:t\' | sed \'s/<[^<]*>//g\' | grep -v \'^[[:space:]]*$\'"
But I want to execute that statement in the Windows command prompt.
How do I do that, and which commands are similar to cat, grep and sed?
Please tell me the exact code for Windows that is equivalent to the command above.
The double quotes around the pipeline in your question are a syntax error, and the backslashed single quotes should apparently really not have backslashes, but I assume it's just an artefact of a slightly imprecise presentation.
Here's what the code does.
cat document.xml |
This is a useless use of cat but its purpose is to feed the contents of this file into the pipeline.
grep '<w:t' |
This looks for lines containing the literal string <w:t (probably the start of a tag in the XML format in the file). The single quotes quote the string so that it is not interpreted by the shell (otherwise the < would be interpreted as a redirection operator); they are consumed by the shell, and not passed through to grep.
sed 's/<[^<]*>//g' |
This replaces every pair of opening and closing angle brackets, together with whatever is between them, with an empty string. The regular expression [^<]* matches zero or more occurrences of a character which can be anything except <. If the XML is well-formed, the brackets should always occur in pairs, and so we effectively remove all XML tags.
grep -v '^[[:space:]]*$'
This removes any line which is empty or consists entirely of whitespace.
Because sed is a superset of grep, the program could easily be rephrased as a single sed script. Perhaps the easiest solution for your immediate problem would be to obtain a copy of sed for your platform.
sed -e '/<w:t/!d' -e 's/<[^<]*>//g' -e '/[^[:space:]]/!d' document.xml
I understand quoting rules on Windows may be different; try with double quotes instead of single, or put the script in a file and use sed -f file document.xml where file contains the script itself, like this:
/<w:t/!d
s/<[^<]*>//g
/[^[:space:]]/!d
This is a rather crude way to extract the CDATA from an XML document, anyway; perhaps some XML processor would be the proper way forward. E.g. xmlstarlet appears to be available for Windows. It works even if the XML input doesn't have the beginning and ending <w:t> tags on the same line, with nothing else on it. (In fact, parsing XML with line-oriented tools is a massive antipattern.)
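To check that the pipeline and the single sed script behave the same way, both can be run against a small sample. The XML below is invented for the demo and only loosely resembles a real document.xml:

```shell
# Hypothetical sample: two text runs, one tag-only line, one whitespace-only run.
doc=$(mktemp)
cat > "$doc" <<'EOF'
<w:p><w:r><w:t>Hello</w:t></w:r></w:p>
<w:p><w:pPr/></w:p>
<w:p><w:r><w:t>   </w:t></w:r></w:p>
<w:p><w:r><w:t>World</w:t></w:r></w:p>
EOF

# Original pipeline, without the unnecessary cat.
piped=$(grep '<w:t' "$doc" | sed 's/<[^<]*>//g' | grep -v '^[[:space:]]*$')

# Same three steps as one sed script: drop lines without <w:t, strip the
# tags, then drop lines that contain no non-whitespace character.
single=$(sed -e '/<w:t/!d' -e 's/<[^<]*>//g' -e '/[^[:space:]]/!d' "$doc")

echo "$single"
rm -f "$doc"
```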
You could try PowerShell.
I think it has been included since Windows 8;
it is certainly included in Windows 10.
I've just tested a cat command there and it works.
grep does not, but it can probably be adapted along these lines:
PowerShell equivalent to grep -f
and
https://communary.wordpress.com/2014/11/10/grep-the-powershell-way/
The equivalent of grep on windows would be findstr and the equivalent of cat would be type.

using a literal $ in a grep search

I am trying to use a literal $ as an end anchor for a search using grep. The entire problem is search for line(s) in a file that start with At and ends with a literal $. I have tried several variations of the code I think will work and get no results even though there should be.
grep '\<At[a-zA-z]\{1,\}\$\>' test.txt
Any suggestions would be appreciated and I am a first year student of Linux so forgive me if I am missing something simple. Thank you
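grep '\<At[a-zA-Z]\{1,\}[$]' test.txt
To avoid playing shell escaping games, put the $ inside a character class. Note that \> cannot follow the $: since $ is not a word character, an end-of-word boundary never matches directly after it. The range should also be A-Z, because A-z covers several punctuation characters as well.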
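If the goal is literally "lines that start with At and end with a $", anchoring to the line rather than to word boundaries is one way to express it. This is a sketch on invented sample lines, not the asker's test.txt:

```shell
# Hypothetical sample lines; only the first starts with At and ends with $.
sample=$(mktemp)
printf '%s\n' 'Atlas$' 'Atlas' 'Batman$' 'At home$ now' > "$sample"

# ^At anchors the start of the line, [$] is a literal dollar sign,
# and the final unbracketed $ anchors the end of the line.
matches=$(grep '^At.*[$]$' "$sample")

echo "$matches"
rm -f "$sample"
```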

How to replace a whole word without changing any other matching strings?

I want to change all instances of the variable Status to status in my code.
However there are some lines where Status is on the same line as strings like Current_Status and Check_Status_After_Write, etc.
I want to replace the variable name only, and not change definitions of other variables or matching comment strings if possible.
I tried to use:
grep -nwrs Status ./Status.txt | xargs sed -i 's/i\<Status\>/status/g'
This returns:
sed: no input files
I tried simplifying it to use:
grep -rl Status ./Status.txt | xargs sed -i 's/i\<Status\>/status/g'
But this fails to work as I wanted.
Am I over complicating this? Can anyone offer a solution?
As an aside, I've had a few failed attempts to do this and am a bit paranoid about the string replacement, is there a way to ask for verification before the replace happens without writing a script?
Can you not use the pattern
^Status$
to match Status only when there is nothing before or after it on the line?
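For occurrences that share a line with other code, word boundaries may be a better fit than line anchors. The sketch below assumes GNU sed (for \< and \>) and uses an invented sample file. Because the underscore counts as a word character, Current_Status and Check_Status_After_Write are left alone. Running sed without -i first gives a preview, which is one way to verify the change before it is written:

```shell
# Hypothetical source file containing the variable and two look-alikes.
src=$(mktemp)
printf '%s\n' 'Status = 1;' 'Current_Status = Status;' 'Check_Status_After_Write();' > "$src"

# Preview only: print the result without touching the file.
preview=$(sed 's/\<Status\>/status/g' "$src")
echo "$preview"

# Apply in place once the preview looks right, keeping a .bak backup.
sed -i.bak 's/\<Status\>/status/g' "$src"
applied=$(cat "$src")

rm -f "$src" "$src.bak"
```

When several files are involved, grep -rlw Status . prints only file names, which is what xargs sed needs; the -n option in the question adds line numbers and matched text to the output, so sed does not receive usable file names.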

Extract Directory from Log File with sed

I'm trying to parse through an application.log that has many lines that follow the same syntax below.
"Error","jrpp-237","10/13/11","02:55:04",,"File not found: /indexUsa~.cfm The specific sequence of files included or processed is: c:\websites\pj7fe4\indexUsa~.cfm '' "
I need to use some type of command to pull out what is listed between c:\websites\ and the next \
e.g. in this case it would be pj7fe4
I thought that the following command would work..
bin/sed -n '/c:\\websites\\/,/\\/p' upload/test.log
Unfortunately from reading further I now understand that this will return the entire line containing c:\websites through the \ and I need to know the in between, not the whole line.
To be more difficult I need to match all of the directory sub paths, not just one particular line as this is for multiple sites.
You're using range patterns incorrectly. A range can't limit the command (print in this case) to a part of a line, only to a range of lines. Literal backslashes in the pattern also have to be escaped.
Try this: sed 's/.*c:\\websites\\\([0-9a-zA-Z]*\)\\.*/\1/'
There's a good sed tutorial here: Sed - An Introduction and Tutorial by Bruce Barnett
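A sketch of that substitution applied to a whole log, assuming a demo file whose first line is the one from the question and whose second line is made up. Adding -n and the p flag prints only the lines where the substitution succeeded, and sort -u collapses repeated sites:

```shell
# Demo log: the first line is from the question, the second is hypothetical.
log=$(mktemp)
cat > "$log" <<'EOF'
"Error","jrpp-237","10/13/11","02:55:04",,"File not found: /indexUsa~.cfm The specific sequence of files included or processed is: c:\websites\pj7fe4\indexUsa~.cfm '' "
"Information","jrpp-1","10/13/11","02:55:05",,"Hypothetical line with no path"
EOF

# Replace each matching line with the captured directory name and print it;
# lines without c:\websites\ are suppressed by -n.
sites=$(sed -n 's/.*c:\\websites\\\([0-9a-zA-Z]*\)\\.*/\1/p' "$log" | sort -u)

echo "$sites"
rm -f "$log"
```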
grep way:
grep -Po "(?<=c:\\\websites\\\)[^\\\]+(?=\\\)" yourFile
test:
kent$ echo '"Error","jrpp-237","10/13/11","02:55:04",,"File not found: /indexUsa~.cfm The specific sequence of files included or processed is: c:\websites\pj7fe4\indexUsa~.cfm '' "'|grep -Po "(?<=c:\\\websites\\\)[^\\\]+(?=\\\)"
pj7fe4
