Sed String manipulation - linux

I need help.
Given a list of numbers, I would like to add 9000 to each one, as in the example below:
Start:
1
1
13
5
14
4
1
5
12
7
8
9
4
18
3
20
11
17
13
===============================================
Final results :
9001
9001
9013
9005
9014
9004
9001
9005
9012
9007
9008
9009
9004
9018
9003
9020
9011
9017
9013
This command does not work:
sed "s/^/9000/g" file.txt

This might work for you (GNU sed):
sed -r 's/^/0000/;s/^0*(.{3})$/9\1/' file
The first substitution prepends zeroes to the number. The second strips the surplus zeroes so that exactly three characters remain, and puts a 9 in front of them.
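To see why the attempt in the question fails and how the two substitutions combine, it can help to run each step on its own. This is a minimal sketch, assuming a sed that accepts -E (GNU and BSD sed both do) and inputs of at most three digits; the function name to_90xx is made up for illustration.

```shell
# Hypothetical helper: reads numbers on stdin, prints them in 90XX form.
to_90xx() {
  sed -E 's/^/0000/; s/^0*(.{3})$/9\1/'
}

# The attempt from the question only prepends text, it does not add:
printf '1\n13\n' | sed 's/^/9000/'     # prints 90001 and 900013

# Step 1 alone pads every line with zeroes:
printf '1\n13\n' | sed 's/^/0000/'     # prints 00001 and 000013

# Both steps: drop surplus zeroes, keep three characters, prefix a 9:
printf '1\n13\n' | to_90xx             # prints 9001 and 9013
```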

This should work for you.
while read -r num; do if [ "$num" -le 9000 ]; then echo "$((num + 9000))"; else echo "$num"; fi; done < file.txt

You can do it like this:
for num in 1 1 13 5 14 4 1 5 12 7 8 9 4 18 3 20 11 17; do echo "$(($num + 9000))"; done
If the list also contains numbers that you don't want to process because they are already in the 90XX format, you can add an if statement:
for num in 1 1 13 5 14 4 1 5 12 7 8 9 4 18 3 20 11 17 9005; do if [ $(($num)) -le 9000 ]; then echo "$(($num + 9000))"; else echo $num; fi; done
This uses a bash for loop (for ...; do ...; done)
and a bash arithmetic expansion, $((EXPR)), to add the numbers.

You can try (GNU sed):
sed 's/.*/echo $((9000+&))/e' infile
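Note that the e flag makes GNU sed run one shell command per input line, and it executes whatever text the file contains. As an alternative sketch (not taken from the answers above), awk can do the arithmetic itself. The function name add_9000 is made up, and values of 9000 or more are assumed to be already in the target format.

```shell
# Hypothetical helper: add 9000 to every value below 9000, pass others through.
add_9000() {
  awk '{ print ($1 < 9000 ? $1 + 9000 : $1) }'
}

printf '1\n18\n9005\n' | add_9000      # prints 9001, 9018 and 9005
```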

Linux execute php file with arguments

I have a .php script that takes three parameters. For example: ./execute.php 11 111 111
I have a list of data in a text file, with the values separated by spaces. For example:
22 222 222
33 333 333
44 444 444
I was thinking of using xargs to pass in the arguments, but it's not working.
Here is my attempt:
cat raw.txt | xargs -I % ./execute.php %0 %1 %2
It doesn't work. Any idea?
Thanks for the help.

As per the following transcript, you are not handling the data correctly:
pax> printf '2 22 222\n3 33 333\n4 44 444\n' | xargs -I % echo %0 %1 %2
2 22 2220 2 22 2221 2 22 2222
3 33 3330 3 33 3331 3 33 3332
4 44 4440 4 44 4441 4 44 4442
Each % is giving you the entire line, and the digit following the % is just tacked on to the end.
To investigate, let's first create a fake processing script proc.sh (and chmod 700 it so we can run it easily):
#!/usr/bin/env bash
echo "$# '$1' '$2' '$3'"
Even if you switch to xargs -I % ./proc.sh %, you'll find you get one argument with embedded spaces, not three individual arguments:
pax> vi proc.sh ; printf '2 22 222\n3 33 333\n4 44 444\n' | xargs -I % ./proc.sh %
1 '2 22 222' '' ''
1 '3 33 333' '' ''
1 '4 44 444' '' ''
The easiest solution is probably to switch to a while read loop, something like:
pax> printf '2 22 222\n3 33 333\n4 44 444\n' | while read -r p1 p2 p3 ; do ./proc.sh "${p1}" "${p2}" "${p3}" ; done
3 '2' '22' '222'
3 '3' '33' '333'
3 '4' '44' '444'
You can see that the script is now called with three arguments. You just have to adapt the loop to your own program and input file:
while read -r p1 p2 p3 ; do ./proc.sh "${p1}" "${p2}" "${p3}" ; done < raw.txt
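If xargs is still preferred, a variant that should also work is to drop -I and use -n 3, so that xargs splits its input on whitespace and passes three arguments per invocation. This is a sketch under that assumption; the function show_args is made up and uses an inline sh -c as a stand-in for proc.sh so that the example is self-contained.

```shell
# show_args stands in for ./proc.sh: for every group of three input words it
# prints the argument count followed by the three arguments in quotes.
show_args() {
  xargs -n 3 sh -c 'echo "$# '\''$1'\'' '\''$2'\'' '\''$3'\''"' _
}

printf '2 22 222\n3 33 333\n4 44 444\n' | show_args
```

Unlike the while read loop, this groups words purely by count, so a line with a missing or extra field shifts every group after it. xargs also treats quotes and backslashes in its input specially.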

How to sort a group of data in a columnwise manner?

I have a group of data like the raw data attached below. When I sort it with sort -n, the data are sorted line by line, and the output looks like this:
3 6 9 22
2 3 4 5
1 7 16 20
I want to sort the data column by column, so that the output looks like this:
1 2 4 3
3 6 9 16
5 7 20 22
OK, I did try something.
My initial idea was to extract the data column by column, sort each column, and then paste the columns back together, but I can't get it to work. Here is my script:
for ((i=1; i<=4; i=i+1))
do
awk '{print $i}' file | sort -n >>output
done
The output:
1 7 20 16
3 6 9 22
5 2 4 3
1 7 20 16
3 6 9 22
5 2 4 3
1 7 20 16
3 6 9 22
5 2 4 3
1 7 20 16
3 6 9 22
5 2 4 3
It seems that $i never changes and is always equal to $0.
Thanks a lot.
raw data1
3 6 9 22
5 2 4 3
1 7 20 16
raw data2
488.000000 1236.000000 984.000000 2388.000000 788.000000 704.000000
600.000000 1348.000000 872.000000 2500.000000 900.000000 816.000000
232.000000 516.000000 1704.000000 1668.000000 68.000000 16.000000
244.000000 504.000000 1716.000000 1656.000000 56.000000 28.000000
2340.000000 3088.000000 868.000000 4240.000000 2640.000000 2556.000000
2588.000000 3336.000000 1116.000000 4488.000000 2888.000000 2804.000000

Here is a flexible solution using cut and sort that works on a tab-delimited input matrix of any size M x N.
$ cat -vTE data_to_sort.in
3^I6^I9^I22$
5^I2^I4^I3$
1^I7^I20^I16$
$ col=4; line=3;
$ for i in $(seq ${col}); do cut -f$i data_to_sort.in |\
> sort -n; done | paste $(for i in $(seq ${line}); do echo -n "- "; done) |\
> datamash transpose
1 2 4 3
3 6 9 16
5 7 20 22
If the input file is not tab-delimited, pass the proper delimiter to cut with -d"$DELIM_CHAR" so that it splits the columns correctly.
How it works:
for i in $(seq ${col}); do cut -f$i data_to_sort.in | sort -n; done extracts each column of the file and sorts it.
paste $(for i in $(seq ${line}); do echo -n "- "; done) then rebuilds a matrix structure from the sorted values.
datamash transpose transposes that intermediate matrix.
Thanks to feedback from Sundeep, here is a better solution that uses pr instead of paste to generate the columns:
$ col=4; line=3
$ for i in $(seq ${col}); do cut -f$i data_to_sort.in |\
> sort -n; done | pr -${line}ats | datamash transpose
Last but not least, the following solution does not need datamash at all (many thanks to Sundeep):
$ col=4; for i in $(seq ${col}); do cut -f$i data_to_sort.in |\
> sort -n; done | pr -${col}ts
1 2 4 3
3 6 9 16
5 7 20 22
Proof that it works, for the skeptics and the downvoters. Second run, with 6 columns:
$ col=6; for i in $(seq ${col}); do cut -f$i <(sed 's/^ \+//g;s/ \+/\t/g' data2) | sort -n; done | pr -${col}ts | tr '\t' ' '
232.000000 504.000000 868.000000 1656.000000 56.000000 16.000000
244.000000 516.000000 872.000000 1668.000000 68.000000 28.000000
488.000000 1236.000000 984.000000 2388.000000 788.000000 704.000000
600.000000 1348.000000 1116.000000 2500.000000 900.000000 816.000000
2340.000000 3088.000000 1704.000000 4240.000000 2640.000000 2556.000000
2588.000000 3336.000000 1716.000000 4488.000000 2888.000000 2804.000000

awk to the rescue! (asort is a GNU awk extension, so this needs gawk.)
awk '{f1[NR]=$1; f2[NR]=$2; f3[NR]=$3; f4[NR]=$4}
END{asort(f1); asort(f2); asort(f3); asort(f4);
for(i=1;i<=NR;i++) print f1[i],f2[i],f3[i],f4[i]}' file
1 2 4 3
3 6 9 16
5 7 20 22
There may be a smarter way of doing this as well...
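For completeness, the loop in the question fails because $i inside single quotes is evaluated by awk, not by the shell. awk's own variable i is unset, so $i means $0, the whole line. Passing the column number in with -v fixes that. This is a sketch of the question's extract, sort and paste idea along those lines; the function name sort_columns and the temporary file layout are assumptions, and the output is joined with single spaces.

```shell
# Hypothetical helper: sort_columns FILE NCOL
# Sorts each of the first NCOL whitespace-separated columns numerically and
# pastes the sorted columns back together.
sort_columns() (
  file=$1 ncol=$2
  tmp=$(mktemp -d) || exit 1
  i=1
  while [ "$i" -le "$ncol" ]; do
    # -v c="$i" hands the shell loop counter to awk as the awk variable c.
    awk -v c="$i" '{ print $c }' "$file" | sort -n > "$tmp/$(printf '%05d' "$i")"
    i=$((i + 1))
  done
  # Zero-padded names keep the glob in column order.
  paste -d ' ' "$tmp"/*
  rm -rf "$tmp"
)

# Usage (file name assumed): sort_columns file 4
```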

Shellscript: Is it possible to format seq to display n numbers per line?

Is it possible to format seq in a way that it will display the range desired but with N numbers per line?
Let say that I want seq 20 but with the following output:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
My next guess would be a nested loop but I'm not sure how...
Any help would be appreciated :)

You can use awk to format it as per your needs.
$ seq 20 | awk '{ORS=NR%5?FS:RS}1'
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
ORS is awk's built-in Output Record Separator variable; its default value is \n. NR holds the current line number. FS is the Field Separator, which defaults to a space. RS is the Record Separator, which defaults to \n.
The action uses the ternary operator to test NR%5. When NR%5 is not 0 (hence true), FS is used as the Output Record Separator. When it is 0 (false), RS, which is a newline, is used instead.
The 1 at the end triggers awk's default action, which is to print the line.
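One detail worth knowing: when the count is not a multiple of 5, the one-liner appears to leave the last line ending in a space with no newline. A variant sketch that collects each row before printing avoids that and takes the row length as a parameter; the function name per_line is made up.

```shell
# Hypothetical helper: per_line N prints the input values N per line.
per_line() {
  awk -v n="$1" '
    { row = (row == "" ? $0 : row " " $0) }   # append the value to the current row
    NR % n == 0 { print row; row = "" }       # row is full: print it and reset
    END { if (row != "") print row }          # print a final partial row, if any
  '
}

seq 7 | per_line 3      # prints "1 2 3", "4 5 6" and "7" on separate lines
```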

You can use xargs to limit the sequence displayed per line.
$ seq 20 | xargs -n 5
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
The parameter -n 5 tells xargs to pass at most 5 arguments to each command invocation (echo by default), so each output line holds 5 numbers.
If you have bash, you can use brace expansion instead of seq:
echo {1..20} | xargs -n 5

Using Bash:
while read -r num; do
    ((num % 5)) && printf '%s ' "$num" || echo "$num"
done < <(seq 20)
Or:
for i in {1..20}; do
s+="$i "
if ! ((i % 5)); then
echo $s
s=""
fi
done

How can I separate some repeated patterns in a row into multiple rows using bash script?

I have a problem with a bash script.
I've got a string which contains a repeated pattern, like this:
1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 ...
The fields are separated by tabs.
I want it to look like this:
1 2 3 4
1 2 3 4
1 2 3 4
…
How can I solve this problem in a bash script, using tools like cut, sed, awk ...?
I've tried a command like cut -f 'seq 4, 4, 40' example.txt
It doesn't work.
It looks very easy, but it is difficult for me.

You can use sed like this:
s='1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4'
p='1 2 3 4'
echo "$s"|sed "s/$p\s*/&\n/g"
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
Live Demo: http://ideone.com/P59OCJ

Here's a pure bash solution:
IFS=$'\t' set -- $(<input_file)
seen=()
while [[ $1 ]]; do
if (( ${seen[$1]} )); then # If we've seen the value before, start a new line.
echo
unset seen
fi
printf '%s ' "$1"
seen[$1]=1
shift
done

If you know the ending number of your sequence beforehand, you can do something like:
LAST_NUMBER=4
sed -e "s/$LAST_NUMBER\t*/&\n/g" < example.txt
Just replace 4 with the last number of the sequence.
If you don't know the number, you have to find it first, for example with the following script:
#!/bin/bash
declare -A CHECKED_NUMBERS
LAST_NUMBER=
while read LINE; do
SPLIT_LINE=$(cut -d" " -f1- <<< "$LINE")
for number in $SPLIT_LINE; do
if [ "${CHECKED_NUMBERS[$number]}" == "1" ]; then
LAST_NUMBER=$number
else
CHECKED_NUMBERS[$number]=1
fi
done
done < example.txt
# do the replacement
sed -e "s/$LAST_NUMBER\t*/&\n/g" < example.txt

An awk version
awk '{for (i=1;i<=NF;i++) {printf "%s"(i%4?" ":"\n"),$i}}' file
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
A GNU awk version
awk -v RS="\t" '{printf "%s"(NR%4?" ":"\n"),$0}' file
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4

xargs may help:
kent$ echo "1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4"|xargs -n4
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4

This might work for you:
printf "%s\t%s\t%s\t%s\n" $string
or, if you want the fields space-separated:
printf "%s %s %s %s\n" $string
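This works because printf reuses its format string until all arguments are consumed, and the unquoted $string is split by the shell into separate arguments on whitespace (tabs included). A small sketch with made-up sample data:

```shell
# Sample data (assumed): four repetitions of a tab-separated 1..4 pattern.
string=$(printf '1\t2\t3\t4\t1\t2\t3\t4\t1\t2\t3\t4\t1\t2\t3\t4')

# $string is intentionally unquoted so the shell splits it into 16 arguments;
# printf then applies the four-field format four times.
printf '%s %s %s %s\n' $string
```

If the number of fields is not a multiple of four, printf pads the last line with empty strings.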

How to extract every N columns and write into new files?

I've been struggling to write code that extracts every N columns from an input file and writes them into output files in the order they were extracted.
(My real world case is to extract every 800 columns from a total 24005 columns file starting at column 6, so I need a loop)
In the simpler case below, I extract every 3 columns (fields) from an input file, starting at the 2nd column.
For example, if the input file looks like:
aa 1 2 3 4 5 6 7 8 9
bb 1 2 3 4 5 6 7 8 9
cc 1 2 3 4 5 6 7 8 9
dd 1 2 3 4 5 6 7 8 9
and I want the output to look like this:
output_file_1:
1 2 3
1 2 3
1 2 3
1 2 3
output_file_2:
4 5 6
4 5 6
4 5 6
4 5 6
output_file_3:
7 8 9
7 8 9
7 8 9
7 8 9
I tried this, but it doesn't work:
awk 'for(i=2;i<=10;i+a) {{printf "%s ",$i};a=3}' <inputfile>
It gave me a syntax error, and the more I fix, the more problems come up.
I also tried the linux command cut but while I was dealing with large files this seems effortless. And I wonder if cut would do a loop cut of every 3 fields just like the awk.
Can someone please help me with this and give a quick explanation? Thanks in advance.

Actions to be performed by awk on the input data must be enclosed in curly braces, so the reason the awk one-liner you tried results in a syntax error is that the for loop does not respect this rule. A syntactically correct version would be:
awk '{for(i=2;i<=10;i+a) {printf "%s ",$i};a=3}' <inputfile>
This is syntactically correct (almost; see the end of this post), but it does not do what you think.
To split the output by columns into different files, the best thing is to use awk's redirection operator >. This will give you the desired output, provided that your input file always has 10 columns:
awk '{ print $2,$3,$4 > "file_1"; print $5,$6,$7 > "file_2"; print $8,$9,$10 > "file_3"}' <inputfile>
Mind the double quotes around the file names.
EDITED: REAL WORLD CASE
If you have to loop along the columns because you have too many of them, you can still use awk (gawk), with two loops: one on the output files and one on the columns per file. This is a possible way:
#!/usr/bin/gawk -f
BEGIN {
    CTOT = 24005    # total number of columns, you can use NF as well
    DELTA = 800     # columns per file
    START = 6       # first useful column
    d = (CTOT - START + 1) / DELTA    # number of output files
}
{
    for (i = 0; i < d; i++) {
        out = "file_out_" i
        for (j = 0; j < DELTA; j++)
            printf("%f\t", $(START + j + i * DELTA)) > out
        printf("\n") > out
    }
}
I have tried this on the simple input file in your example. It works when the number of useful columns (CTOT - START + 1) is a multiple of DELTA. I assumed you had floats (%f); just change the format to whatever you need.
Let me know.
P.S. Going back to your original one-liner, note that the loop is an infinite one, as i is never incremented: i+a must be replaced by i+=a, and a=3 must be inside the inner braces:
awk '{for(i=2;i<=10;i+=a) {printf "%s ",$i;a=3}}' <inputfile>
This evaluates a=3 on every iteration, which is a bit pointless. A better version would thus be:
awk '{for(i=2;i<=10;i+=3) {printf "%s ",$i}}' <inputfile>
Still, this will just print the 2nd, 5th and 8th column of your file, which is not what you wanted.

awk '{ print $2, $3, $4 >"output_file_1";
print $5, $6, $7 >"output_file_2";
print $8, $9, $10 >"output_file_3";
}' input_file
This makes one pass through the input file, which is preferable to multiple passes. Clearly, the code shown only deals with the fixed number of columns (and therefore a fixed number of output files). It can be modified, if necessary, to deal with variable numbers of columns and generating variable file names, etc.
Quoting the question: "My real world case is to extract every 800 columns from a total 24005 columns file starting at column 6, so I need a loop."
In that case, you're correct; you need a loop. In fact, you need two loops:
awk 'BEGIN { gap = 800; start = 6; filebase = "output_file_"; }
{
    for (i = start; i < start + gap; i++)
    {
        file = sprintf("%s%d", filebase, i);
        for (j = i; j <= NF; j += gap)
            printf("%s ", $j) > file;
        printf "\n" > file;
    }
}' input_file
I demonstrated this to my satisfaction with an input file with 25 columns (numbers 1-25 in the corresponding columns) and gap set to 8 and start set to 2. The output below is the resulting 8 files pasted horizontally.
2 10 18 3 11 19 4 12 20 5 13 21 6 14 22 7 15 23 8 16 24 9 17 25
2 10 18 3 11 19 4 12 20 5 13 21 6 14 22 7 15 23 8 16 24 9 17 25
2 10 18 3 11 19 4 12 20 5 13 21 6 14 22 7 15 23 8 16 24 9 17 25
2 10 18 3 11 19 4 12 20 5 13 21 6 14 22 7 15 23 8 16 24 9 17 25

With GNU awk:
$ awk -v d=3 '{for(i=2;i<NF;i+=d) print gensub("(([^ ]+ +){" i-1 "})(([^ ]+( +|$)){" d "}).*","\\3",""); print "----"}' file
1 2 3
4 5 6
7 8 9
----
1 2 3
4 5 6
7 8 9
----
1 2 3
4 5 6
7 8 9
----
1 2 3
4 5 6
7 8 9
----
Just redirect the output to files if desired:
$ awk -v d=3 '{sfx=0; for(i=2;i<NF;i+=d) print gensub("(([^ ]+ +){" i-1 "})(([^ ]+( +|$)){" d "}).*","\\3","") > ("output_file_" ++sfx)}' file
The idea is just to tell gensub() to skip the first few (i-1) fields then print the number of fields you want (d = 3) and ignore the rest (.*). If you're not printing exact multiples of the number of fields you'll need to massage how many fields get printed on the last loop iteration. Do the math...
Here's a version that'd work in any awk. It requires 2 loops and modifies the spaces between fields but it's probably easier to understand:
$ awk -v d=3 '{sfx=0; for(i=2;i<=NF;i+=d) {str=fs=""; for(j=i;j<i+d;j++) {str = str fs $j; fs=" "}; print str > ("output_file_" ++sfx)} }' file
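The remainder handling mentioned above can be folded into the inner loop condition, so that a last block with fewer than d fields is written without padding. This is a sketch in portable awk wrapped in a shell function; the function name split_columns, its argument order and the output prefix are all assumptions.

```shell
# Hypothetical helper: split_columns FILE START DELTA PREFIX
# Writes columns START..START+DELTA-1 to PREFIX1, the next DELTA columns to
# PREFIX2, and so on. A final shorter block is written as is.
split_columns() {
  awk -v start="$2" -v delta="$3" -v prefix="$4" '
  {
    n = 0
    for (i = start; i <= NF; i += delta) {
      out = prefix (++n)
      line = ""
      # Stop at the end of the block or at the last field, whichever is first.
      for (j = i; j < i + delta && j <= NF; j++)
        line = line (j > i ? " " : "") $j
      print line > out
    }
  }' "$1"
}

# Usage with the layout from the question (file names assumed):
# split_columns input_file 2 3 output_file_
```

Every output file stays open until awk exits, which is fine for the 30 files of the real-world case, but some awk implementations limit the number of simultaneously open files.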

I was successful using the following command line. :) It uses a for loop and pipes the awk program into awk's stdin using -f -. The awk program itself is created using bash arithmetic expansion.
for i in 0 1 2; do
echo "{print \$$((i*3+2)) \" \" \$$((i*3+3)) \" \" \$$((i*3+4))}" \
| awk -f - t.file > "file$((i+1))"
done
Update: After the question was updated, I tried to hack together a script that creates the requested 800-column awk script dynamically (a version along the lines of Jonathan Leffler's answer) and pipes it to awk. Although the script looks good (to me), it produces an awk syntax error. The question is: is this too much for awk, or am I missing something? I would really appreciate feedback!
Update: I investigated this and found documentation that says awk has a lot of restrictions. It recommends using gawk (GNU's awk implementation) in these situations. I've done that, but I still get a syntax error. Feedback is still appreciated!
#!/bin/bash
# Note! Although the script's output looks ok (for me)
# it produces an awk syntax error. is this just too much for awk?
# open pipe to stdin of awk
exec 3> >(gawk -f - test.file)
# verify output using cat
#exec 3> >(cat)
echo '{' >&3
# write dynamic script to awk
for i in {0..24005..800} ; do
echo -n " print " >&3
for (( j=$i; j <= $((i+800)); j++ )) ; do
echo -n "\$$j " >&3
if [ $j = 24005 ] ; then
break
fi
done
echo "> \"file$((i/800+1))\";" >&3
done
echo "}"
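One likely cause of the syntax error (an observation from reading the script, not something confirmed by its author): the final echo "}" writes to standard output instead of file descriptor 3, so gawk never receives the closing brace. The loop bounds also overlap, since each block runs from i to i+800 inclusive, and the first block starts at $0. This scaled-down sketch sidesteps the extra file descriptor by generating the program in a function and passing it to awk as an argument; gen_awk_prog and its parameters are made up for illustration.

```shell
# Hypothetical generator: gen_awk_prog START DELTA NFILES
# Prints an awk program that writes NFILES blocks of DELTA columns each,
# starting at column START, to file1, file2, ...
gen_awk_prog() (
  start=$1 delta=$2 nfiles=$3
  echo '{'
  i=0
  while [ "$i" -lt "$nfiles" ]; do
    printf '  print '
    j=0
    while [ "$j" -lt "$delta" ]; do
      if [ "$j" -gt 0 ]; then printf ' " " '; fi
      printf '$%d' "$((start + i * delta + j))"
      j=$((j + 1))
    done
    printf ' > "file%d"\n' "$((i + 1))"
    i=$((i + 1))
  done
  echo '}'    # the closing brace goes to the same stream as everything else
)

gen_awk_prog 2 3 3                     # inspect the generated program
# awk "$(gen_awk_prog 2 3 3)" t.file   # run it (t.file as in the answer above)
```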
