Strange symbol in output filename from bash script - linux

I have a bash script that takes a single text file 'power_coords.txt' containing 115 rows of data; within each row there are four (space-separated) columns containing x, y, z coordinates (first 3 cols) and a name (4th col). Example:
36 54 19 cotc1
45 13 -27 cotc2
1 -6 14 cotc3
....
My script uses the following lines of code to run an operation on each line of the text file:
#!/bin/bash
input="power_coords.txt"
while IFS=" " read x y z name
do
fslmaths avg152T1.nii.gz -mul 0 -add 1 -roi $x 1 $y 1 $z 1 0 1 $name -odt float
done < "$input"
This appears to work just fine; however, when I run the code I get a strange symbol in each filename created:
Does anyone know why this is happening and how to fix it? Or is there a simple way to clean up the filenames (i.e., remove the bit that looks like lego) after the script has run?
Cheers

The small square in your screenshot says "000D", so it's just a carriage return (CR) character left over from a Windows-style (CRLF) line ending.
First of all, I would recommend checking what's in the variable $name, by just printing it:
while IFS=" " read x y z name
do
echo "aaa${name}bbb"
done < "$input"
aaa and bbb are there to more easily see the output.
It would also help to check what line terminator is used in power_coords.txt. You can check it by passing its path to the file utility; it should print out which "line terminator" is used in the file.
After figuring out what's being read into $name, you can either convert the line breaks in power_coords.txt or tweak IFS so that the CR is treated as a delimiter as well.
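For example, either of these should work (a sketch based on the script above; dos2unix is only an option if it is installed):
# Option 1: strip the carriage returns once, before running the script,
# then point $input at the cleaned file
tr -d '\r' < power_coords.txt > power_coords_unix.txt   # or: dos2unix power_coords.txt

# Option 2: add CR to IFS so read treats it as a delimiter and drops it from $name
while IFS=$' \r' read -r x y z name
do
    fslmaths avg152T1.nii.gz -mul 0 -add 1 -roi $x 1 $y 1 $z 1 0 1 $name -odt float
done < "$input"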

Related

How do I get AWK to rearrange and manipulate text in a file to two output files depending on conditions?

I tried to find an efficient way to split then recombine text in one file into two separate files. It's got a lot going on, like removing the decimal point, reversing the sign (+ becomes - and - becomes +) in the amount fields, and padding. For example:
INPUT file input.txt:
(this first line is only there to make it easier to find character positions without counting; it's not present in the input file, and the "|" is just there to illustrate positions)
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345
| | | | | | | ("|" shows position)
123456789XXPPPPPPPPPP NNNNNN#1404.58 #0.00 0 1
987654321YYQQQQQQQQQQ NNNNNN#-97.73 #-97.73 1 1
777777777XXGGGGGGGGGG NNNNNN#115.92 #115.92 0 0
888888888YYHHHHHHHHHH NNNNNN#3.24 #3.24 1 0
Any line that contains a "1" as the 85th character above goes to one file say OutputA.txt rearranged like this:
PPPPPPPPPP~~NNNNNN123456789XX~~~-0000140458-0000000000
QQQQQQQQQQ~~NNNNNN987654321YY~~~+0000009773+0000009773
Likewise, any line that contains a "0" as the 85th character above goes to another file, OutputB.txt, rearranged like this:
GGGGGGGGGG~~NNNNNN777777777XX~~~-0000011592-0000011592
HHHHHHHHHH~~NNNNNN888888888YY~~~-0000000324-0000000324
It seems so complicated, but what I really want is to grab each portion of the input lines as different variables, write them out in a different order with the amounts right-aligned and padded with 0s, and split the lines into different files depending on the last column. I'm not sure how to put all these things together in one go.
I tried printing each line into a different file depending on whether the 85th character is a 1 or a 0, and then creating variables, say, the first character to the 11th character is varA and the next 10 are varB, etc., but it gets complex quickly because I need to change + to - and - to +, pad with zeros, and change the spacing. It gets a bit mad. This should be possible with one script but I just can't put all the pieces together.
I've looked for tutorials but nothing seems to cover grabbing based on condition whilst at the same time padding, rearranging, splitting etc.
Many thanks in advance
split
Use GNU AWK's ability to print to a file; consider the following simple example
seq 20 | awk '$1%2==1{print $0 > "fileodd.txt"}$1%2==0{print $0 > "fileeven.txt"}'
which reads the output of seq 20 (the numbers from 1 to 20, inclusive, each on a separate line) and puts the odd numbers into fileodd.txt and the even numbers into fileeven.txt.
recombine text
Use substr and string concatenation for that task; consider the following simple example. Say you have file.txt with MM-DD-YYYY dates like so
01-29-2022
01-30-2022
01-31-2022
but you want YYYY-MM-DD; you could do that with
awk '{print substr($0,7,4) "-" substr($0,1,2) "-" substr($0,4,2)}' file.txt
which gives output
2022-01-29
2022-01-30
2022-01-31
substr's arguments are the string ($0 is the whole line), the start position, and the length; a space is the concatenation operator.
removing the decimal point
Use gsub with the second argument set to an empty string to delete unwanted characters, but keep in mind that . has a special meaning in regular expressions. Consider the following simple example; let the content of file.txt be
100.15
200.30
300.45
then
awk '{gsub(/[.]/,"");print}' file.txt
gives output
10015
20030
30045
Observe that /[.]/, not /./, is used, and that gsub changes $0 in place.
reversing the sign(...)padding
Multiply by -1, then use sprintf with a suitable format; consider the following example. Let the content of file.txt be
1
-10
100
then
awk '{print "Reversed value is " sprintf("%+05d",-1*$1)}' file.txt
gives output
Reversed value is -0001
Reversed value is +0010
Reversed value is -0100
Explanation: % marks the place where the value will be inserted, + prefixes the value with - or +, 05 pads with leading zeros to a width of 5 characters, and d means the value is treated as an integer. sprintf returns a formatted string which can be concatenated with other strings as shown above.
(tested in GNU Awk 5.0.1)
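Putting those pieces together for the input shown above, one possible combined script could look like this (a sketch only: it assumes the fields are whitespace-separated exactly as shown, that the amounts always carry two decimal places, and it uses the input.txt / OutputA.txt / OutputB.txt names from the question):
awk '
{
    id   = substr($1, 1, 11)                 # e.g. 123456789XX
    name = substr($1, 12, 10)                # e.g. PPPPPPPPPP
    split($2, a, "#")                        # a[1] = NNNNNN, a[2] = first amount
    amt1 = fmt(a[2])
    amt2 = fmt(substr($3, 2))                # second amount, after the leading #
    out  = ($NF == "1") ? "OutputA.txt" : "OutputB.txt"
    print name "~~" a[1] id "~~~" amt1 amt2 > out
}
# reverse the sign, drop the decimal point, pad to 10 digits
function fmt(v,    s, n) {
    s = (v + 0 < 0) ? "+" : "-"
    n = v
    gsub(/[-.]/, "", n)
    return s sprintf("%010d", n)
}' input.txt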
You can use jq for this task:
#!/bin/bash
INPUT='
123456789XXPPPPPPPPPP NNNNNN#1404.58 #0.00 0 1
987654321YYQQQQQQQQQQ NNNNNN#-97.73 #-97.73 1 1
777777777XXGGGGGGGGGG NNNNNN#115.92 #115.92 0 0
888888888YYHHHHHHHHHH NNNNNN#3.24 #3.24 1 0
'
convert() {
jq -rR --arg lineSelector "$1" '
def transformNumber($len):
tonumber | # convert string to number
(if . < 0 then "+" else "-" end) as $sign | # store inverted sign
if . < 0 then 0 - . else . end | # abs(number)
. * 100 | # number * 100
tostring | # convert number back to string
$sign + "0" * ($len - length) + .; # indent with leading zeros
# Main program
split(" ") | # split each line by space
map(select(length > 0)) | # remove empty entries
select(.[4] == $lineSelector) | # keep only lines with the selected value in last column
# generate output # example for first line
.[0][11:21] + # PPPPPPPPPP
"~~" + # ~~
(.[1] | split("#")[0]) + # NNNNNN
.[0][0:11] + # 123456789XX
"~~~" + # ~~~
(.[1] | split("#")[1] | transformNumber(10)) + # -0000140458
(.[2] | split("#")[1] | transformNumber(10)) # -0000000000
' <<< "$2"
}
convert 0 "$INPUT" # or convert 1 "$INPUT"
Output for 0
GGGGGGGGGG~~NNNNNN777777777XX~~~-0000011592-0000011592
HHHHHHHHHH~~NNNNNN888888888YY~~~-0000000324-0000000324
Output for 1
PPPPPPPPPP~~NNNNNN123456789XX~~~-0000140458-0000000000
QQQQQQQQQQ~~NNNNNN987654321YY~~~+0000009773+0000009773
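To produce the two files the question asks for, you could simply redirect each call (OutputA.txt and OutputB.txt are the names from the question):
convert 1 "$INPUT" > OutputA.txt
convert 0 "$INPUT" > OutputB.txt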

Convert carriage return (\r) to actual overwrite

Questions
Is there a way to convert the carriage returns to actual overwrite in a string so that 000000000000\r1010 is transformed to 101000000000?
Context
1. Initial objective:
Having a number x (between 0 and 255) in base 10, I want to convert this number to base 2, add trailing zeros to get a 12-digit-long binary representation, generate 12 different numbers (each of them made of the first n digits in base 2, with n between 1 and 12) and print the base 10 representation of these 12 numbers.
2. Example:
With x = 10
Base 2 is 1010
With trailing zeros 101000000000
Extract the 12 "leading" numbers: 1, 10, 101, 1010, 10100, 101000, ...
Convert to base 10: 1, 2, 5, 10, 20, 40, ...
3. What I have done (it does not work):
x=10
x_base2="$(echo "obase=2;ibase=10;${x}" | bc)"
x_base2_padded="$(printf '%012d\r%s' 0 "${x_base2}")"
for i in {1..12}
do
t=$(echo ${x_base2_padded:0:${i}})
echo "obase=10;ibase=2;${t}" | bc
done
4. Why it does not work
Because the variable x_base2_padded contains the whole sequence 000000000000\r1010. This can be confirmed using hexdump for instance. In the for loop, when I extract the first 12 characters, I only get zeros.
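For reference, the confirmation looks roughly like this (your hexdump's column spacing may differ slightly):
$ printf '%012d\r%s' 0 1010 | hexdump -c
0000000   0   0   0   0   0   0   0   0   0   0   0   0  \r   1   0   1
0000010   0
0000011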
5. Alternatives
I know I can find an alternative by literally adding zeros to the variable, as follows:
x_base2=1010
x_base2_padded="$(printf '%s%0.*d' "${x_base2}" $((12-${#x_base2})) 0)"
Or by padding with zeros using printf and rev
x_base2=1010
x_base2_padded="$(printf '%012s' "$(printf "${x_base2}" | rev)" | rev)"
Although these alternatives solve my problem for now and let me continue my work, they do not really answer my question.
Related issue
The same problem may be observed in different contexts. For instance if one tries to concatenate multiple strings containing carriage returns. The result may be hard to predict.
str=$'bar\rfoo'
echo "${str}"
echo "${str}${str}"
echo "${str}${str}${str}"
echo "${str}${str}${str}${str}"
echo "${str}${str}${str}${str}${str}"
The first echo will output foo. Although you might expect the other echo to output foofoofoo..., they all output foobar.
The following function overwrite transforms its argument such that after each carriage return \r the beginning of the string is actually overwritten:
overwrite() {
local segment result=
while IFS= read -rd $'\r' segment; do
result="$segment${result:${#segment}}"
done < <(printf '%s\r' "$@")
printf %s "$result"
}
Example
$ overwrite $'abcdef\r0123\rxy'
xy23ef
Note that the printed string is actually xy23ef, unlike echo $'abcdef\r0123\rxy', which only seems to print the same string: it still prints the \r characters, which are then interpreted by your terminal so that the result looks the same. You can confirm this with hexdump:
$ echo $'abcdef\r0123\rxy' | hexdump -c
0000000 a b c d e f \r 0 1 2 3 \r x y \n
000000f
$ overwrite $'abcdef\r0123\rxy' | hexdump -c
0000000 x y 2 3 e f
0000006
The function overwrite also supports overwriting by arguments instead of \r-delimited segments:
$ overwrite abcdef 0123 xy
xy23ef
To convert a variable in place, use command substitution: myvar=$(overwrite "$myvar")
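Applied to the padding code from the question, that would be (a usage sketch reusing the question's variables):
x_base2_padded=$(overwrite "$(printf '%012d\r%s' 0 "${x_base2}")")
echo "${x_base2_padded}"    # 101000000000 for x_base2=1010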
With awk, you'd set the field delimiter to \r and iterate through fields printing only the visible portions of them.
awk -F'\r' '{
offset = 1
for (i=NF; i>0; i--) {
if (offset <= length($i)) {
printf "%s", substr($i, offset)
offset = length($i) + 1
}
}
print ""
}'
This is admittedly too long to put into a command substitution, so you'd better wrap it in a function and pipe the lines to be resolved into that.
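A sketch of that (the function name resolve_cr is just an illustration, applied to the value from the question):
resolve_cr() {
    awk -F'\r' '{
        offset = 1
        for (i = NF; i > 0; i--) {
            if (offset <= length($i)) {
                printf "%s", substr($i, offset)
                offset = length($i) + 1
            }
        }
        print ""
    }'
}
x_base2_padded=$(printf '%012d\r%s\n' 0 "${x_base2}" | resolve_cr)    # 101000000000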
To answer the specific question, how to convert 000000000000\r1010 to 101000000000, refer to Socowi's answer.
However, I wouldn't introduce the carriage return in the first place and solve the problem like this:
#!/usr/bin/env bash
x=$1
# Start with 12 zeroes
var='000000000000'
# Convert input to binary
binary=$(bc <<< "obase = 2; $x")
# Rightpad with zeroes: ${#binary} is the number of characters in $binary,
# and ${var:x} removes the first x characters from $var
var=$binary${var:${#binary}}
# Print 12 substrings, convert to decimal: ${var:0:i} extracts the first
# i characters from $var, and $((x#$var)) interprets $var in base x
for ((i = 1; i <= ${#var}; ++i)); do
echo "$((2#${var:0:i}))"
done
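Saved as, say, leading.sh (the file name is just an example) and run with the question's x = 10, it prints:
$ bash leading.sh 10
1
2
5
10
20
40
80
160
320
640
1280
2560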

Isolate product names from strings by matching string after (including) first letter in a variable

I have a bunch of strings of the following pattern in a text file:
201194_2012110634 Appliance 130 AB i Some optional (Notes )
300723_2017050006(2016111550) Device 16 AB i Note
The first part is the serial, the second is the date. The Device/Appliance name and model (about 10 possible different names) is the string after the date number and before (and including) AB i.
I was able to isolate dates and serials using
SERIAL=${line:0:6}
YEAR=${line:7:4}
I'm trying to isolate Device name and note after that:
#!/bin/bash
while IFS= read line || [[ -n $line ]]; do
NAME=${line#*[a-zA-Z]}
STRINGAP='Appliance '"${line/#*Appliance/}"
The first approach is to take everything after the first letter appearing in the line, which gives me
NAME = ppliance 130 AB i Some optional (Notes )
The second approach is to write tests for each of the ~10 possible appliance/device names and then append the appliance name after the stripped text. Then test which variable actually matched Appliance / Device (or another name) and use that as the input to the database.
Is it possible to write a line that would select everything, including the first letter, in a line of the text file? Then I would strip everything after AB i to get the notes, and everything before AB i would become the appliance name.
Remove the ${line#*[a-zA-Z]} line (which, as you see, removes the first character of the name), and instead use
STRINGAP=$(echo "$line" | sed 's/^[0-9_]* \(.*\) AB i.*/\1/')
This drops the leading digits and underscore, and everything from " AB i" to the end.
Edit: The details are unclear - do you want to keep the "AB i", and will it always be "AB i"? If you want it, change the line to
STRINGAP=$(echo "$line" | sed 's/^[0-9_]* \(.* AB i\).*/\1/')
I also forgot the double quotes round the text line.
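If you also want the notes part, a similar sketch works (assuming the notes are simply everything after " AB i "):
NOTES=$(echo "$line" | sed 's/^.* AB i *//')
# e.g. "Some optional (Notes )" for the first sample line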
You can use sed and read to give you more control of parsing.
tmp> line2="300723_2017050006(2016111550) Device 16 AB i Note"
tmp> read serial date type val <<<$(echo $line2 | \
sed 's/\([0-9]*\)_\([0-9]*\)[^A-Z]*\(Device\|Appliance\) \([0-9]*\).*/\1 \2 \3 \4/')
tmp> echo "$serial|$date|$type|$val"
300723|2017050006|Device|16
Basically, read allows you to assign multiple variables in one line. The sed statement parses the line and gives you space-delimited output of its results. You can also read each variable separately if you don't mind running sed a few extra times:
device="$(echo $line2 | sed -e 's/^.*Device \([0-9]*\).*/\1/;t;d')"
appliance="$(echo $line2 | sed -e 's/^.*Appliance \([0-9]*\).*/\1/;t;d')"
This way $device is populated with device if present, and is blank otherwise (note the -e and ;t;d at the end of the regex to prevent it from dumping the line if it doesn't match.)
Your question isn't clear but it seems like you might be trying to parse strings into substrings. Try this with GNU awk for the 3rd arg to match() and let us know if there's something else you were looking for:
$ awk 'match($0,/^([0-9]+)_([0-9]+)(\([0-9]+\))?\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)/,a) {
for (i=1; i<=8; i++) {
print i, a[i]
}
print "---"
}' file
1 201194
2 2012110634
3
4 Appliance
5 130
6 AB
7 i
8 Some optional (Notes )
---
1 300723
2 2017050006
3 (2016111550)
4 Device
5 16
6 AB
7 i
8 Note
---
If you wanted a CSV output, for example, then it'd just be:
$ awk -v OFS=',' 'match($0,/^([0-9]+)_([0-9]+)(\([0-9]+\))?\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)/,a) {
for (i=1; i<=8; i++) {
printf "%s%s", a[i], (i<8?OFS:ORS)
}
}' file
201194,2012110634,,Appliance,130,AB,i,Some optional (Notes )
300723,2017050006,(2016111550),Device,16,AB,i,Note
Massage to suit...

remove end of line characters with a bash script?

I'm trying to make a script to remove these characters (\r\n) that Windows puts in, BUT ONLY if they are between quotes ( " ). Why?
Because the dump file puts these characters in; I don't know why.
And why only between quotes? Because they only affect me when they chop my result.
For Example. "this","is","a","result","from","database"
The problem:
"this","is","a","result","from","da
tabase"
[EDIT]
Thanks to the answer of @Cyrus I got something like this, but it gives "bad flag in substitute command: '}'". I'm on Mac OS X.
Can you help me?
Thanks
OS X uses a different sed than the one that's typically installed in Linux.
The big differences are that sequences like \r and \n don't get expanded or used as part of the expression as you might expect, and you tend to need to separate commands with semicolons a little more.
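For example, GNU sed usually accepts \r inside the expression, whereas with the OS X sed you typically have to let the shell produce the literal character, e.g. with bash's $'...' quoting. A minimal sketch (it only strips a trailing CR from each line; it does not join the lines):
sed $'s/\r$//' input.txt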
If you can get by with a sed one-liner that implements a rule like "Remove any \r\n on lines containing quotes", it will certainly simplify your task...
For my experiments, I used what I infer is your sample input data:
$ od -c input.txt
0000000 F o r E x a m p l e . " t h
0000020 i s " , " i s " , " a " , " r e
0000040 s u l t " , " f r o m " , " d a
0000060 t a \r \n b a s e " \n
0000072
First off, a shell-only solution might be to use smaller tools that are built in to the operating system. For example, here's a one-liner:
od -A n -t o1 -v input.txt | rs 0 1 | while read n; do [ $n -eq 015 ] && read n && continue; printf "\\$n"; done
Broken out for easier reading, here's what this looks like:
od -A n -t o1 -v input.txt | rs 0 1 - convert the file into a stream of octal numbers
| while read n; do - step through the numbers...
[ $n -eq 015 ] && - if the current number is 015 (the octal code for a carriage return)
read n - read a line (thus skipping it),
&& continue - and continue to the next octal number (thus skipping the newline after a CR)
printf "\\$n"; done - print the current octal number.
This kind of data conversion and stream logic works nicely in a pipeline, but is a bit harder to implement in sed, which only knows how to deal with the original input rather than its converted form.
Another bash option might be to use conditional expressions matching the original lines of input:
while read line; do
if [[ $line =~ .*\".*$'\r'$ ]]; then
echo -n "${line:0:$((${#line}-1))}"
else
echo "$line"
fi
done < input.txt
This walks through text, and if it sees a CR, it prints everything up to and not including it, with no trailing newline. For all other lines, it just prints them as usual. The result is that lines that had a carriage return are joined, other lines are not.
From sed's perspective, we're dealing with two input lines, the first of which ends in a carriage return. The strategy for this would be to search for carriage returns, remove them and join the lines. I struggled for a while trying to come up with something that would do this, then gave up. Not to say it's impossible, but I suspect a generally useful script will be lengthy (by sed standards).
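If you're not wedded to sed, a short awk sketch can do the joining instead, by buffering a line whenever its quotes are unbalanced (using the same input.txt; this assumes the only stray CR/LF pairs are the ones splitting a quoted field):
awk '{
    sub(/\r$/, "")                          # drop a trailing carriage return, if any
    buf = buf $0
    if (gsub(/"/, "&", buf) % 2 == 0) {     # quotes balanced: record is complete
        print buf
        buf = ""
    }
}' input.txt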

batch script that runs with parameters read from a tab delimited text file (Linux)

I have a tab delimited txt file that looks like this:
1. C1 34 98
2. C3 2 45
How can I make a batch script (Linux) that will
extract the second field of the first line into variable 1
extract the third field of the first line into variable 2
extract the fourth field of the first line into variable 3
then
run a series of scripts with parameters set to variables 1, 2 and 3
e.g.
script $1 $2 $3 > path/file$1-$2-$3
The script should use the values of the variables as parameters and then write out the results to a file named according to the values of the variables; thus each cycle would result in a new file.
Finish the loop when all lines are used up from the tab-delimited txt file.
I am not a programmer...
This can be done in the shell alone (see below, assuming the default value for IFS):
while read -r _ x y z ;
do
echo "$x" "$y" "$z";
done < input.txt
Assuming you're using sh/bash/ksh, the shell gives you what you need:
while read dummy v1 v2 v3 dummy
do
echo $v1 $v2 $v3
./dostuff $v1 $v2 $v3
done < inputFile
How a line is tokenised depends on the IFS variable, which by default consists of a tab, a space and a newline. You can change this, but you must manage its contents carefully as it's easy to break a script by not restoring IFS back to its default values.
So what we're doing here is reading the file inputFile and splitting into five fields, dummy, v1, v2, v3, and dummy. These could have just as easily been called a, b, c, d, e but calling the fields we want to junk dummy it's obvious what the intention is.
If you know that the file will only ever have four fields, then the final dummy isn't needed (i.e., first line can be while read dummy v1 v2 v3); in essence the last field in the while read [...] sucks up the rest of the line, so if the input was 1. 34 45 12 67 65 then without the final dummy variable v3 would contain 12 67 65. With it, v3 becomes 12 with the rest of the line being read into dummy. This'll make sense if you experiment with it :-)
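Putting it together with the command line from the question (a sketch; script and path/file are the placeholders from the question, and inputFile is the name used above):
while read dummy v1 v2 v3 dummy
do
    ./script "$v1" "$v2" "$v3" > "path/file$v1-$v2-$v3"
done < inputFile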
while read -r line
do
var1=$(echo $line | awk '{print $2}')
var2=$(echo $line | awk '{print $3}')
var3=$(echo $line | awk '{print $4}')
echo $var1 $var2 $var3
./yourscript.sh $var1 $var2 $var3
# Do other stuff with these variables.
done < file
