remove end of line characters with a bash script?

I'm trying to write a script that removes the end-of-line characters (\r\n) that Windows inserts, BUT ONLY when they appear between double quotes ( " ). Why?
Because the dump file contains these characters; I don't know why.
And why only between quotes? Because they only affect me when they chop one of my values in two.
For Example. "this","is","a","result","from","database"
the problem :
"this","is","a","result","from","da
tabase"
[EDIT]
Thanks to the answer of @Cyrus I got something like this
, but it fails with the error bad flag in substitute command '}'. I'm on Mac OS X.
Can you help me?
Thanks

OS X uses a different sed than the one that's typically installed in Linux.
The big differences are that sequences like \r and \n don't get expanded or used as part of the expression as you might expect, and you tend to need to separate commands with semicolons a little more.
If you can get by with a sed one-liner that implements a rule like "Remove any \r\n on lines containing quotes", it will certainly simplify your task...
For my experiments, I used what I infer is your sample input data:
$ od -c input.txt
0000000 F o r E x a m p l e . " t h
0000020 i s " , " i s " , " a " , " r e
0000040 s u l t " , " f r o m " , " d a
0000060 t a \r \n b a s e " \n
0000072
First off, a shell-only solution might be to use smaller tools that are built in to the operating system. For example, here's a one-liner:
od -A n -t o1 -v input.txt | rs 0 1 | while read n; do [ $n -eq 015 ] && read n && continue; printf "\\$n"; done
Broken out for easier reading, here's what this looks like:
od -A n -t o1 -v input.txt | rs 0 1 - convert the file into a stream of octal numbers, one per line
| while read n; do - step through the numbers...
[ $n -eq 015 ] && - if the current number is 015 (i.e. octal for a Carriage Return)
read n - read the next number (the newline that follows the CR),
&& continue - and continue to the next octal number without printing (thus dropping both the CR and the newline)
printf "\\$n"; done - print the current octal number.
This kind of data conversion and stream logic works nicely in a pipeline, but is a bit harder to implement in sed, which only knows how to deal with the original input rather than its converted form.
Another bash option might be to use conditional expressions matching the original lines of input:
while IFS= read -r line; do
if [[ $line =~ .*\".*$'\r'$ ]]; then
echo -n "${line:0:$((${#line}-1))}"
else
echo "$line"
fi
done < input.txt
This walks through the text line by line. If a line contains a double quote and ends in a CR, it prints everything up to but not including the CR, with no trailing newline. All other lines are printed as usual. The result is that lines ending in a carriage return are joined with the line that follows; other lines are left alone.
From sed's perspective, we're dealing with two input lines, the first of which ends in a carriage return. The strategy for this would be to search for carriage returns, remove them and join the lines. I struggled for a while trying to come up with something that would do this, then gave up. Not to say it's impossible, but I suspect a generally useful script will be lengthy (by sed standards).
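For comparison, the rule used by the bash loop above (a line that contains a double quote and ends in a carriage return is joined to the next line) can also be sketched in awk, which ships with both OS X and Linux. This is only a sketch under that stated rule; the function name join_crlf_in_quotes is made up for illustration, and like the loop it does not track whether the CR really falls inside an open quote.

```shell
# Hypothetical helper: join lines that contain a double quote and end in CR.
# Reads the named files, or standard input when none are given.
join_crlf_in_quotes() {
  awk '{
    # sub() returns 1 only if a trailing CR was actually removed
    if (/"/ && sub(/\r$/, "")) {
      printf "%s", $0   # CR dropped and newline withheld, so the next line is appended
    } else {
      print             # every other line is printed unchanged
    }
  }' "$@"
}

# Example: printf '"from","da\r\ntabase"\n' | join_crlf_in_quotes
```

Because the function only reads its input and writes to standard output, the original dump file stays untouched until you decide to replace it.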

Related

Use bash to find line in java files which include a pattern, and then replace another part of the line

I have a directory that includes a lot of java files, and in each file I have a class variable:
String system = "x";
I want to be able to create a bash script which I execute in the same directory, which will go to only the java files in the directory, and replace this instance of x, with y. Here x and y are a word. Now this may not be the only instance of the word x in the java script, however it will definitely be the first.
I want to be able to execute my script in the command line similar to:
changesystem.sh -x -y
This way I can specify what the x should be, and the y I wish to replace it with. I found a way to find and print the line number at which the first instance of a pattern is found:
awk '$0 ~ /String system/ {print NR}' file
I then found how to replace a substring on a given line using:
awk 'NR==line_number { sub("x", "y") }'
However, I have not found a way to combine them. Maybe there is also an easier way? Or even, a better and more efficient way?
Any help/advice will be greatly appreciated

You may create a changesystem.sh file with the following GNU awk script:
#!/bin/bash
for f in *.java; do
awk -i inplace -v repl="$1" '
!x && /^\s*String\s+system\s*=\s*".*";\s*$/{
lwsp=gensub(/\S.*/, "", 1);
print lwsp"String system = \""repl"\";";
x=1;next;
}1' "$f";
done;
Or, with any awk:
#!/bin/bash
for f in *.java; do
awk -v repl="$1" '
!x && /^[[:space:]]*String[[:space:]]+system[[:space:]]*=[[:space:]]*".*";[[:space:]]*$/{
lwsp=$0; sub(/[^[:space:]].*/, "", lwsp);
print lwsp"String system = \""repl"\";";
x=1;next
}1' "$f" > tmp && mv tmp "$f";
done;
Then, make the file executable:
chmod +x changesystem.sh
Then, run it like
./changesystem.sh 'new_value'
Notes:
for f in *.java; do ... done iterates over all *.java files in the current directory
-i inplace - GNU awk feature to perform replacement inline (not available in a non-GNU awk)
-v repl="$1" passes the first argument of the script to the awk command
!x && /^\s*String\s+system\s*=\s*".*";\s*$/ - if x is false and the record starts with any amount of whitespace (\s* or [[:space:]]*), then String, one or more whitespace characters, system, an = surrounded by zero or more whitespace characters, then a " char, any text, and finally "; followed by zero or more whitespace characters, then
lwsp=gensub(/\S.*/, "", 1); puts the leading whitespace in the lwsp variable (it removes all text starting with the first non-whitespace char from the line matched)
lwsp=$0; sub(/[^[:space:]].*/, "", lwsp); - same as above, just in a different way since gensub is not supported in non-GNU awk and sub modifies the given input string (here, lwsp)
{print lwsp"String system = \""repl"\";";x=1;next}1 - prints the leading whitespace, then String system = " + the replacement string + ";, assigns 1 to x, and moves to the next line; for every other line, the trailing 1 just prints the line as is.

You don't need to pre-compute the line number. The whole job can be done by one not-too-complicated sed command. You probably do want to script it, though. For example:
#!/bin/bash
[[ $# -eq 3 ]] || {
echo "usage: $0 <context regex> <target regex> <replacement text>" 1>&2
exit 1
}
sed -si -e "/$1/ { s/\\<$2\\>/$3/; t1; p; d; :1; n; b1; }" ./*.java
That assumes that the files to modify are java source files in the current working directory, and I'm sure you understand the (loose) argument check and usage message.
As for the sed command itself,
the -s option instructs sed to treat each argument as a separate stream, instead of operating as if by concatenating all the inputs into one long stream.
the -i option instructs sed to modify the designated files in-place.
the sed expression takes the default action for each line (printing it verbatim) unless the line matches the "context" pattern given by the first script argument.
for lines that do match the context pattern,
s/\\<$2\\>/$3/ - attempt to perform the wanted substitution
the \< and \> match word start and end boundaries, respectively, so that the specified pattern will not match a partial word (though it can match multiple complete words if the target pattern allows)
t1 - if a substitution was made, then branch to label 1, otherwise
p; d - print the current line and immediately start the next cycle
:1; n; b1 - label 1 (reachable only by branching): print the current line and read the next one, then loop back to label 1. This prints the remainder of the file without any more tests or substitutions.
Example usage:
/path/to/replace_first.sh 'String system' x y
It is worth noting that this exposes the user to some details of sed's interpretation of regular expressions and replacement text, though that does not manifest for the example usage.
Note that the script could be simplified by removing the context pattern if you are sure you want to modify the overall first appearance of the target in each file. You could also hard-code the context, the target pattern, and/or the replacement text. If you hard-code all three, the script no longer needs any argument handling or checking.
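As one way to sidestep sed's interpretation of regular expressions and replacement text, here is a minimal sketch that treats both the target and the replacement as literal strings and changes only the first occurrence in a file. The function name replace_first_literal and its argument order are assumptions for illustration. It writes to standard output rather than editing in place, and it does not enforce word boundaries the way \< and \> do.

```shell
# Hypothetical helper: replace_first_literal TARGET REPLACEMENT FILE
# Caveat: awk -v still processes backslash escapes in the two values.
replace_first_literal() {
  awk -v t="$1" -v r="$2" '
    # Only the first line containing the literal target is edited
    !done && (i = index($0, t)) {
      $0 = substr($0, 1, i - 1) r substr($0, i + length(t))
      done = 1
    }
    { print }
  ' "$3"
}

# Example: replace_first_literal x y Foo.java > Foo.java.new
```

Using index() and substr() instead of sub() means characters such as . or & in the arguments carry no special meaning.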

Convert carriage return (\r) to actual overwrite

Questions
Is there a way to convert the carriage returns to actual overwrite in a string so that 000000000000\r1010 is transformed to 101000000000?
Context
1. Initial objective:
Having a number x (between 0 and 255) in base 10, I want to convert this number in base 2, add trailing zeros to get a 12-digits long binary representation, generate 12 different numbers (each of them made of the last n digits in base 2, with n between 1 and 12) and print the base 10 representation of these 12 numbers.
2. Example:
With x = 10
Base 2 is 1010
With trailing zeros 101000000000
Extract the 12 "leading" numbers: 1, 10, 101, 1010, 10100, 101000, ...
Convert to base 10: 1, 2, 5, 10, 20, 40, ...
3. What I have done (it does not work):
x=10
x_base2="$(echo "obase=2;ibase=10;${x}" | bc)"
x_base2_padded="$(printf '%012d\r%s' 0 "${x_base2}")"
for i in {1..12}
do
t=$(echo ${x_base2_padded:0:${i}})
echo "obase=10;ibase=2;${t}" | bc
done
4. Why it does not work
Because the variable x_base2_padded contains the whole sequence 000000000000\r1010. This can be confirmed using hexdump for instance. In the for loop, when I extract the first 12 characters, I only get zeros.
5. Alternatives
I know I can work around this by literally appending zeros to the variable as follows:
x_base2=1010
x_base2_padded="$(printf '%s%0.*d' "${x_base2}" $((12-${#x_base2})) 0)"
Or by padding with zeros using printf and rev
x_base2=1010
x_base2_padded="$(printf '%012s' "$(printf "${x_base2}" | rev)" | rev)"
Although these alternatives solve my problem for now and let me continue my work, they do not really answer my question.
Related issue
The same problem may be observed in different contexts. For instance if one tries to concatenate multiple strings containing carriage returns. The result may be hard to predict.
str=$'bar\rfoo'
echo "${str}"
echo "${str}${str}"
echo "${str}${str}${str}"
echo "${str}${str}${str}${str}"
echo "${str}${str}${str}${str}${str}"
The first echo displays foo. Although you might expect the other echo commands to display foofoo, foofoofoo, and so on, they all display foobar.

The following function overwrite transforms its argument such that after each carriage return \r the beginning of the string is actually overwritten:
overwrite() {
local segment result=
while IFS= read -rd $'\r' segment; do
result="$segment${result:${#segment}}"
done < <(printf '%s\r' "$@")
printf %s "$result"
}
Example
$ overwrite $'abcdef\r0123\rxy'
xy23ef
Note that the printed string is actually xy23ef, unlike echo $'abcdef\r0123\rxy' which only seems to print the same string, but still prints \r which is then interpreted by your terminal such that the result looks the same. You can confirm this with hexdump:
$ echo $'abcdef\r0123\rxy' | hexdump -c
0000000 a b c d e f \r 0 1 2 3 \r x y \n
000000f
$ overwrite $'abcdef\r0123\rxy' | hexdump -c
0000000 x y 2 3 e f
0000006
The function overwrite also supports overwriting by arguments instead of \r-delimited segments:
$ overwrite abcdef 0123 xy
xy23ef
To convert variables in-place, use command substitution: myvar=$(overwrite "$myvar")

With awk, you'd set the field delimiter to \r and iterate through fields printing only the visible portions of them.
awk -F'\r' '{
offset = 1
for (i=NF; i>0; i--) {
if (offset <= length($i)) {
printf "%s", substr($i, offset)
offset = length($i) + 1
}
}
print ""
}'
This is admittedly too long to put into a command substitution directly, so it is better to wrap it in a function and pipe the lines to be resolved into it.
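A minimal sketch of that wrapping, assuming a function name (resolve_cr) that is not part of the original answer. The awk program is the one shown above; the last two lines apply it to the string from the question.

```shell
# Hypothetical wrapper around the awk program above.
# Reads the named files, or standard input when none are given.
resolve_cr() {
  awk -F'\r' '{
    offset = 1
    # Later fields overwrite earlier ones, so walk from the last field back
    for (i = NF; i > 0; i--) {
      if (offset <= length($i)) {
        printf "%s", substr($i, offset)
        offset = length($i) + 1
      }
    }
    print ""
  }' "$@"
}

x_base2_padded=$(printf '%012d\r%s\n' 0 1010 | resolve_cr)
echo "$x_base2_padded"   # prints 101000000000
```

The function works line by line, so it can also be placed at the end of a longer pipeline.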

To answer the specific question, how to convert 000000000000\r1010 to 101000000000, refer to Socowi's answer.
However, I wouldn't introduce the carriage return in the first place and solve the problem like this:
#!/usr/bin/env bash
x=$1
# Start with 12 zeroes
var='000000000000'
# Convert input to binary
binary=$(bc <<< "obase = 2; $x")
# Rightpad with zeroes: ${#binary} is the number of characters in $binary,
# and ${var:x} removes the first x characters from $var
var=$binary${var:${#binary}}
# Print 12 substrings, convert to decimal: ${var:0:i} extracts the first
# i characters from $var, and $((x#$var)) interprets $var in base x
for ((i = 1; i <= ${#var}; ++i)); do
echo "$((2#${var:0:i}))"
done

Remove a sequence of chars at the end of file including LF (linefeed)

I have a file that contains some PCL sequences. I have this sequence at the end of the file (hex):
461b 2670 3158 0a F.&p1X.
I want to remove the sequence: <Esc>&p1X including the character that follows. In 99% of cases, LF follows the sequence.
I tried this command:
sed -b 's/\o33&p[0-9]X$//Mg' ~/test.txt >test2.txt
However, it appends LF at the end of test2.txt. Also, if, instead of $ I specify . it doesn't match the line anymore.
If you want to play with this, generate the input file using this command:
echo -e "SomeString\033&p1X" > ~/test.txt
The redirect appends an LF char at the end.
Thanks

If I have understood correctly, you know for sure that your file contains that sequence of characters at the end. If that is the case, I would simply truncate the last six bytes. It will work regardless of whether the very last character is a newline or anything else.
Example:
$ echo -e "SomeString\033&p1X" > test.txt
$ od -c test.txt
0000000 S o m e S t r i n g 033 & p 1 X \n
0000020
$ truncate -s -6 test.txt
$ od -c test.txt
0000000 S o m e S t r i n g
0000012
This is also very efficient as it will use the system call truncate().
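If you are not completely sure the sequence is there, a guarded variant is possible. The following is only a sketch: the function name strip_pcl_tail is hypothetical, and it assumes the truncate utility from the example above is available. It inspects the last six bytes and truncates only when they are ESC & p, a digit, X, and one more byte of any value.

```shell
# Hypothetical helper: strip_pcl_tail FILE
# Truncates only when the file really ends in ESC & p <digit> X + one byte.
strip_pcl_tail() {
  # Hex-dump the last six bytes as a single string, e.g. 1b267031580a
  tail_hex=$(tail -c 6 "$1" | od -A n -t x1 | tr -d '[:space:]')
  case $tail_hex in
    1b26703[0-9]58??) truncate -s -6 "$1" ;;   # sequence present: cut it off
    *) return 1 ;;                             # anything else: leave the file alone
  esac
}

# Example: strip_pcl_tail ~/test.txt || echo "sequence not found" >&2
```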

This seems to do the trick based on this thread:
perl -pi -e 's/\x1b&p[0-9]X\n//g' ~/test.txt
(I am a perl beginner as well - any comments would be appreciated).

Return value of sed for no match

I'm using sed for updating my JSON configuration file in the runtime.
Sometimes, when the pattern doesn't match in the JSON file, sed still exits with return code 0.
Returning 0 means successful completion, but why does sed return 0 if it doesn't find the proper pattern and update the file? Is there a workaround for that?

As @cnicutar commented, the return code of a command tells you whether the command itself executed successfully. It has nothing to do with the logic you implemented in your code or script.
So if you have:
echo "foo"|sed '/bar/ s/a/b/'
sed will return 0. But if you write a syntax or expression error, or the input file doesn't exist, sed cannot execute your request and will return a non-zero status.
Workaround
This is actually not a workaround. GNU sed's q command accepts an exit code (from the man page):
q [exit-code]
Here you can define the exit code as you want. For example '/foo/!{q100}; {s/f/b/}' will exit with code 100 if foo isn't present, and otherwise perform the substitution f->b and exit with code 0.
Matched case:
kent$ echo "foo" | sed '/foo/!{q100}; {s/f/b/}'
boo
kent$ echo $?
0
Unmatched case:
kent$ echo "trash" | sed '/foo/!{q100}; {s/f/b/}'
trash
kent$ echo $?
100
I hope this answers your question.
edit
I must add that the above example only covers one-line input. I don't know your exact requirement, i.e. whether you want the non-zero exit status when a single line fails to match or only when the whole file does. For the whole-file case, you may consider awk, or even run grep before your text processing.
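For the whole-file case, the awk route could look roughly like the sketch below. The function name sub_or_fail and its arguments are assumptions for illustration. It prints every line, substitutes wherever the pattern matches, and exits with status 1 only if no line in the entire input was changed. Note that & and backslashes in the replacement are special to awk's gsub.

```shell
# Hypothetical helper: sub_or_fail REGEX REPLACEMENT [FILE...]
sub_or_fail() {
  re=$1
  repl=$2
  shift 2
  awk -v re="$re" -v repl="$repl" '
    # gsub returns the number of replacements made on this line
    { if (gsub(re, repl)) changed = 1; print }
    # exit status 0 if anything changed, 1 otherwise
    END { exit !changed }
  ' "$@"
}

# Example: sub_or_fail foo bar config.json > config.json.new || echo "no match" >&2
```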

This might work for you (GNU sed):
sed '/search-string/{s//replacement-string/;h};${x;/./{x;q0};x;q1}' file
If the search-string is found it will be replaced with replacement-string and at end-of-file sed will exit with 0 return code. If no substitution takes place the return code will be 1.
A more detailed explanation:
In sed the user has two registers at their disposal: the pattern space (PS), into which the current line is loaded (minus the linefeed), and a spare register called the hold space (HS), which is initially empty.
The general idea is to use the HS as a flag to indicate if a substitution has taken place. If the HS is still empty at the end of the file, then no changes have been made, otherwise changes have occurred.
The command /search-string/ matches search-string with whatever is in the PS and if it is found to contain the search-string the commands between the following curly braces are executed.
First comes the substitution s//replacement-string/ (sed reuses the last regexp, i.e. the search-string, if the left-hand side is empty, so s//replacement-string/ is the same as s/search-string/replacement-string/), and following this the h command makes a copy of the PS and puts it in the HS.
The sed command $ is used to recognise the last line of a file and the following then occurs.
First the x command swaps the two registers, so the HS becomes the PS and the PS becomes the HS.
Then the PS is searched for any character /./ (. means match any character) remember the HS (now the PS) was initially empty until a substitution took place. If the condition is true the x is again executed followed by q0 command which ends all sed processing and sets the return code to 0. Otherwise the x command is executed and the return code is set to 1.
N.B. although the q quits sed processing it does not prevent the PS from being reassembled by sed and printed as per normal.
Another alternative:
sed '/search-string/!ba;s//replacement-string/;h;:a;$!b;p;x;/./Q;Q1' file
or:
sed '/search-string/,${s//replacement-string/;b};$q1' file

These answers are all too complicated. What is wrong with writing a bit of shell script that uses grep to figure out if the thing you want to replace is there then using sed to replace it?
if grep -q -- "$TARGET_STRING" "$file"
then
echo "$file contains the old site"
sed -e "s|${TARGET_STRING}|${NEW_STRING}|g" ....
fi

For 1 line of input. To avoid repeating the /pattern/:
When s succeeds to substitute, use t to jump conditionally to a label, e.g. x. Otherwise use q to quit with an exit code, e.g. 100:
's/pattern/replacement/;tx;q100;:x'
Example:
$ echo 1 > one
$ < one sed 's/1/replaced-it/;tx;q1;:x'
replaced-it
$ echo $?
0
$ < one sed 's/999/replaced-it/;tx;q100;:x'
1
$ echo $?
100
https://www.gnu.org/software/sed/manual/html_node/Branching-and-flow-control.html

We have the answer above, but it took me some time to work out what is happening. I am trying to provide a simple explanation for basic sed users like me.
Let's consider the example:
echo "foo" | sed '/foo/!{q100}; {s/f/b/}'
Here we have two sed commands. The first one is '/foo/!{q100}'. This command checks for a pattern match and returns exit code 100 if there is no match. Consider the following examples, where -n silences the output so we only see the exit code.
In this example foo matches, so the exit code returned is 0:
echo "foo" | sed -n '/foo/!{q100}'; echo $?
0
In this example the input is foo and we try to match boo, so there is no match and exit code 100 is returned:
echo "foo" | sed -n '/boo/!{q100}'; echo $?
100
So if my requirement is only to check whether a pattern matches, I can use
echo "<input string>" | sed -n '/<pattern to match>/!{q<exit-code>}'
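More examples (sed uses basic regular expressions by default, so the unescaped {2} in the second one is taken literally, which is why it reports no match; \{2\}, or the -r/-E option, would make it an interval):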
echo "20200206" | sed -n '/[0-9]*/!{q100}' && echo "Matched" || echo "No Match"
Matched
echo "20200206" | sed -n '/[0-9]{2}/!{q100}' && echo "Matched" || echo "No Match"
No Match
The second command, '{s/f/b/}', replaces the f in foo with b, which is something I have used many times.

Below is the pattern we use with sed -rn or sed -r.
The entire search and replace command ("s/.../.../...") is optional. If the search and replace is used, for speed and having already matched $matchRe, we use as fast a $searchRe value as possible, using . where the character does not need to be re-verified and .{$len} for fixed length sections of the pattern.
The return value for none found is $notFoundExit.
/$matchRe/{s/$searchRe/$replacement/$options; Q}; q$notFoundExit
For the following reasons:
No time wasted testing for both matched and unmatched case
No time wasted copying to or from buffers
No superfluous branches
Reasonable flexibility
Varying the case of Q commands will vary the behavior depending on when the exit should occur. Behaviors involving the application of Boolean logic to a multiple line input requires more complexity in the solution.

For any number of input lines:
sed --quiet 's/hello/HELLO/;t1;b2;:1;h;:2;p;${g;s/..*//;tok;q1;:ok}'
Fills hold space on match, and checks it after the last line.
Returns status 1 if no match in file.
s/hello/HELLO/ - substitution to check for
t1 - jump to label 1 if substitution succeeded
b2 - jump to label 2 unconditionally
:1 - label 1
h - copy pattern to hold space (when substitution succeeded)
:2 - label 2
p - print pattern space, unconditionally
${ ... } - match last line, evaluate block inside
g - copy hold space into pattern space (non-empty if the first substitution succeeded earlier)
s/..*// - dummy substitution, to set branch-flag
tok - jump to label ok (if dummy substitution succeeded on non-empty hold space)
q1 - exit with error status 1
:ok - label ok

As we already know, when sed fails to match it simply returns its input string; no error has occurred. It is true that a difference between the input and output strings implies a match, but a match does not imply a difference in the strings; after all, sed could have simply matched all of the input characters.
The flaw shows up in the following example
h=$(echo "$g" | sed 's/.*\(abc[[:digit:]]\).*/\1/g')
if [ ! "$h" = "$g" ]; then
echo "1"
else
echo "2"
fi
where g=Xabc1 gives 1, while setting g=abc1 gives 2; yet both of these input strings are matched by sed! So, it can be hard to determine whether sed has matched or not. A solution:
h=$(echo "fix${g}ed" | sed 's/.*\(abc[[:digit:]]\).*/\1/g')
if [ ! "$h" = "fix${g}ed" ]; then
echo "1"
else
echo "2"
fi
in which case the 1 is printed if-and-only-if sed has matched.
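The wrapping trick can be packaged as a small test function. This is a sketch only: the name sed_matched and its two arguments are hypothetical, and it relies on the same assumption as the example above, namely that the captured pattern can never match the whole wrapped string fix...ed by itself.

```shell
# Hypothetical helper: sed_matched INPUT BRE
# Succeeds (status 0) when the basic regular expression matches INPUT.
sed_matched() {
  wrapped="fix${1}ed"
  # On a match, sed reduces the line to the captured group, which can no
  # longer equal the wrapped string; with no match the line is unchanged.
  out=$(printf '%s\n' "$wrapped" | sed "s/.*\($2\).*/\1/")
  [ "$out" != "$wrapped" ]
}

# Example: sed_matched "$g" 'abc[[:digit:]]' && echo 1 || echo 2
```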

I had wanted to truncate a file by quitting when the match was found (and exclude the matching line). This is handy when a process that adds lines at the end of the file may be re-run. "Q;Q1" didn't work but simply "Q1" did, as follows:
if sed -i '/text I wanted to find/Q1' file.txt
then
insert blank line at end of file + new lines
fi
insert just the new lines without the blank line

Can you explain this sed one-liner?

The following one liner prints out the content of the file in reverse
$ sed -n '1!G;h;$p' test.txt
How is it possible when sed reads the file line by line? Can you explain the meaning of
n flag
1!
G
h
and $p
in this command?

This does the same job as tac, i.e. it reverses the order of the lines.
Rewriting the sed script to pseudocode, it means:
$line_number = 1;
foreach ($input in $input_lines) {
// current input line is in $input
if ($line_number != 1) // 1!
$input = $input + '\n' + $hold; // G
$hold = $input; // h
$line_number++
}
print $input; // $p
As you can see, the sed language is very expressive :-) The 1! and $ are so-called addresses, which put conditions on when a command should be run. 1! means not on the first line, $ means on the last line. Sed has one auxiliary memory register, which is called the hold space.
For more information type info sed on a Linux console (this is the best documentation).
-n disables the default print $input command in the loop itself.
The terms pattern space and hold space are equivalents of the variables $input and $hold (respectively) in this example.

n flag -> Disable auto-printing.
1! -> Any line except the first one.
G -> Append a newline and content of 'hold space' to 'pattern space'
h -> Replace content of 'hold space' with content of 'pattern space'
$ -> Last line.
p -> print
So, it means: Reverse the content of your file, as I understand it.
EDIT to add some explanation (thanks to potong, see his comment for the original one):
Addresses such as 1 and $ are bound to the command that follows them, which may be a single command or a group enclosed in {...}. So in this case 1! applies to G and $ applies to p, whereas h has no address and runs on every line. In other words, $!G and $!{G} are the same.
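To watch the two buffers at work, here is a small sketch that wraps the one-liner in a function (the name reverse_lines is made up for illustration) and traces it by hand on a three-line input.

```shell
# Hypothetical wrapper around the one-liner from the question.
reverse_lines() {
  sed -n '1!G;h;$p' "$@"
}

# Trace for the input a, b, c (PS = pattern space, HS = hold space):
#   line 1: PS=a                        h -> HS=a
#   line 2: PS=b   G -> PS=b\na         h -> HS=b\na
#   line 3: PS=c   G -> PS=c\nb\na      h copies it, then $p prints it
printf 'a\nb\nc\n' | reverse_lines   # prints c, b, a on separate lines
```

Since the whole file accumulates in the hold space, memory use grows with the input, which is worth keeping in mind for very large files.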
