$ paste num let
1 a
2 b
3 c
4 d
So when I do
$ cat num | paste - -
1 2
3 4
My question is why doesn't "cat num | paste - -" generate the output as:
1 1
2 2
3 3
4 4
Clearly, paste reads a line from the first 'file' (which is standard input), and then a line from the second 'file' (which is also standard input) and pastes them to create the first line of output. Then it repeats.
The POSIX specification for paste covers the point explicitly:
If '-' is specified for one or more of the files, the standard input shall be used; the standard input shall be read one line at a time, circularly, for each instance of '-'.
You could use "paste num num" to generate
1 1
2 2
3 3
4 4
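To see the circular reading on its own, here is a small sketch. The input numbers and the variable name three_cols are made up for illustration: with three - operands, each output row takes three consecutive lines of standard input.

```shell
# Every '-' refers to the same standard input stream, so paste hands out
# the incoming lines to the three columns in turn.
three_cols=$(printf '%s\n' 1 2 3 4 5 6 | paste -d ' ' - - -)
printf '%s\n' "$three_cols"
```

The first row holds lines 1 to 3 and the second row lines 4 to 6, which is the same round-robin behaviour that turns four lines into "1 2" and "3 4" with two - operands.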
t1=$(cat 1.txt)
t2=$(cat 2.txt)
echo "$t1" | paste - <(echo "$t2")
GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
My text file should have two columns separated by a tab (shown as \t). However, a few rows are corrupted: column 1 holds two values separated by a space (shown as \s).
My objective is to create a table as follows:
That is, discard the second value that follows the space in column 1. For example, in C\sx\t3 I can drop the x after the space and store the row as C\t3.
I have tried a couple of things but with no luck.
I tried to cut the cols based on \t into independent columns and then cut the first column based on \s and join them again. However, it did not work.
Here is the snippet:
col1=(cut -d$'\t' -f1 $file | cut -d' ' -f1)
col2=(cut -d$'\t' -f1 $file)
echo "#{col1[$idx]} #{col2[$idx]}"
# I will append to myArr here
The output appends the whole of col2 after col1, as A B C D E 1 2 3 4 5. On top of this, my file is huge (5,300,000 rows), so I would like to avoid looping over all the records and appending them one by one.
Any advice is very much appreciated.
Thank you. :)
And another sed solution:
Replace a literal space followed by one or more non-tab characters with nothing.
sed -E 's/ [^\t]+//' file
A 1
B 2
C 3
D 4
E 5
If there could be more than one actual space in there just make it 's/ +[^\t]+//' ...
Assuming that when you say a space you mean a blank character then using any awk:
awk 'BEGIN{FS=OFS="\t"} {sub(/ .*/,"",$1)} 1' file
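A quick way to check this on a reproducible input is sketched below. The sample rows mirror the question's description; the temporary file and the variable names sample and cleaned are only illustrative.

```shell
# Hypothetical sample file: printf writes real tab characters, and two rows
# carry a stray "space + value" inside column 1.
sample=$(mktemp)
printf 'A\t1\nB\t2\nC x\t3\nD\t4\nE y\t5\n' > "$sample"

# Split on tabs only, then strip everything from the first space onward
# in column 1. Modifying $1 rebuilds the record with OFS (a tab).
cleaned=$(awk 'BEGIN{FS=OFS="\t"} {sub(/ .*/,"",$1)} 1' "$sample")
printf '%s\n' "$cleaned"

rm -f "$sample"
```

Because awk streams the file line by line, this avoids building shell arrays for the 5,300,000 rows.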
A solution using Perl regular expressions (for me they are easier than sed's, and more portable, as there are several versions of sed):
$ cat ls
A 1
B 2
C x 3
D 4
E y 5
$ cat ls | perl -pe 's/^(\S+).*\t(\S+)/$1\t$2/'
A 1
B 2
C 3
D 4
E 5
This captures the leading run of non-whitespace characters and the run of non-whitespace characters after the \t, and joins the two with a tab.
sed $'s/^\\([^ \t]*\\) [^\t]*/\\1/' file
The ANSI-C Quoting ($'...') feature of Bash is used to make tab characters visible as \t.
Take advantage of FS and OFS and let them do all the hard work for you:
{m,g}awk NF=NF FS='[ \t].*[ \t]' OFS='\t'
A 1
B 2
C 3
D 4
E 5
If there's a chance of leading or trailing spaces and tabs, then perhaps:
mawk 'NF=gsub("^[ \t]+|[ \t]+$",_)^_+!_' OFS='\t' RS='[\r]?\n'
Linux's sys filesystem represents sets of CPU ids with the syntax:
0,2,8: Set of CPUs containing 0, 2 and 8.
4-6: Set of CPUs containing 4, 5 and 6.
Both syntaxes can be mixed and matched, for example: 0,2,4-6,8
For example, running cat /sys/devices/system/cpu/online prints 0-3 on my machine which means CPUs 0, 1, 2 and 3 are online.
The problem is the above syntax is difficult to iterate over using a for loop in a shell script. How can the above syntax be converted to one more conventional such as 0 2 4 5 6 8?
$ echo 0,2,4-6,8 | awk '/-/{for (i=$1; i<=$2; i++)printf "%s%s",i,ORS;next} 1' ORS=' ' RS=, FS=-
0 2 4 5 6 8
This can be used in a loop as follows:
for n in $(echo 0,2,4-6,8 | awk '/-/{for (i=$1; i<=$2; i++)printf "%s%s",i,ORS;next} 1' RS=, FS=-)
do
    echo cpu="$n"
done
Which produces the output:
cpu=0
cpu=2
cpu=4
cpu=5
cpu=6
cpu=8
Or like:
printf "%s" 0,2,4-6,8 | awk '/-/{for (i=$1; i<=$2; i++)printf "%s%s",i,ORS;next} 1' RS=, FS=- | while read n
do
    echo cpu="$n"
done
Which also produces:
cpu=0
cpu=2
cpu=4
cpu=5
cpu=6
cpu=8
How it works
The awk command works as follows:
RS=,
This tells awk to use , as the record separator.
If, for example, the input is 0,2,4-6,8, then awk will see four records: 0 and 2 and 4-6 and 8.
FS=-
This tells awk to use - as the field separator.
With FS set this way, if the input record is, for example, 2-4, then awk will see 2 as the first field and 4 as the second field.
/-/{for (i=$1; i<=$2; i++)printf "%s%s",i,ORS;next}
For any record that contains -, we print out each number starting with the value of the first field, $1, and ending with the value of the second field, $2. Each such number is followed by the Output Record Separator, ORS. By default, ORS is a newline character. For some of the examples above, we set ORS to a blank.
After we have printed these numbers, we skip the rest of the commands and jump to the next record.
1
If we get here, then the record did not contain - and we print it out as is. 1 is awk's shorthand for print-the-line.
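For comparison, the same expansion can be sketched in plain POSIX shell, using IFS splitting and parameter expansion instead of awk. This is an alternative, not the method explained above; the function name expand_cpu_list and the variable cpus are hypothetical.

```shell
# Expand a sysfs-style CPU list such as 0,2,4-6,8 into 0 2 4 5 6 8.
# The body runs in a subshell ( ... ), so the IFS change stays local.
expand_cpu_list() (
    IFS=,
    out=
    for part in $1; do                 # split the argument on commas
        case $part in
            *-*)                       # a range such as 4-6
                i=${part%-*}           # text before the dash
                last=${part#*-}        # text after the dash
                while [ "$i" -le "$last" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *)                         # a single CPU id
                out="$out $part"
                ;;
        esac
    done
    printf '%s\n' "${out# }"           # drop the leading space
)

cpus=$(expand_cpu_list 0,2,4-6,8)
for n in $cpus; do
    echo "cpu=$n"
done
```

The sketch assumes well-formed input (digits, commas and single dashes only) and does no validation.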
A Perl one:
echo "0,2,4-6,8" | perl -lpe 's/(\d+)-(\d+)/{$1..$2}/g; $_="echo {$_}"' | bash
This just converts the original string into echo {0,2,{4..6},8} and lets bash's brace expansion expand it.
eval echo $(cat /sys/devices/system/cpu/online | sed 's/\([[:digit:]]\+\)-\([[:digit:]]\+\)/$(seq \1 \2)/g' | tr , ' ')
cat /sys/devices/system/cpu/online reads the file from sysfs. This can be changed to any other file such as offline.
The output is piped through the substitution s/\([[:digit:]]\+\)-\([[:digit:]]\+\)/$(seq \1 \2)/g. This matches something like 4-6 and replaces it with $(seq 4 6).
tr , ' ' replaces all commas with spaces.
At this point, the input 0,2,4-6,8 is transformed to 0 2 $(seq 4 6) 8. The final step is to eval this sequence to get 0 2 4 5 6 8.
The example echoes the output. Alternatively, the result can be assigned to a variable or used in a for loop.
I have two files -
File 1:
2 923000026531
1 923000031178
2 923000050000
1 923000050278
1 923000051178
1 923000060000
File 2:
2 923000050000
3 923000050278
1 923000051178
1 923000060000
4 923000026531
1 923335980059
I want to achieve the following using awk:
1- If 2nd field is same, sum the 1st field and print it.
2- If 2nd field is not same, print the line as it is. This will have two cases.
2(a) If 2nd field is not same & record belongs to first file
2(b) If 2nd field is not same & record belongs to second file
I have achieved the following using this command:
Command: gawk 'FNR==NR{f1[$2]=$1;next}$2 in f1{print f1[$2]+$1,$2}!($2 in f1){print $0}' f1 f2
4 923000050000
4 923000050278
2 923000051178
2 923000060000
6 923000026531
1 923335980059
However, this doesn't contain the records that were in the first file and whose second field didn't match any record in the second file, i.e. case 2(a). To be more specific, the following record is missing from the final output:
1 923000031178
I know there are multiple workarounds using extra commands, but I am interested in whether this can be done in the same command.
Give this one-liner a try:
$ awk '{a[$2]+=$1}END{for(x in a)print a[x], x}' f1 f2
2 923000060000
2 923000051178
1 923000031178
6 923000026531
4 923000050278
4 923000050000
1 923335980059
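The order in which for (x in a) visits the keys is unspecified in awk, so the rows can come out in a different order from run to run or between awk implementations. A minimal sketch that recreates the two files from the question in a temporary directory and sorts the result by the second field to make it stable (the directory and the variable name merged are illustrative):

```shell
dir=$(mktemp -d)
printf '%s\n' '2 923000026531' '1 923000031178' '2 923000050000' \
              '1 923000050278' '1 923000051178' '1 923000060000' > "$dir/f1"
printf '%s\n' '2 923000050000' '3 923000050278' '1 923000051178' \
              '1 923000060000' '4 923000026531' '1 923335980059' > "$dir/f2"

# Sum column 1 per key (column 2) across both files, then sort by key.
merged=$(awk '{a[$2]+=$1} END{for (x in a) print a[x], x}' "$dir/f1" "$dir/f2" |
         sort -k2,2)
printf '%s\n' "$merged"

rm -rf "$dir"
```

Keys that appear in only one file, such as 923000031178 (case 2(a)) and 923335980059 (case 2(b)), pass through with their original count.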
Can someone help me write a command that inserts some text at multiple places (given row and column) in a file that already contains data? For example, old_data is a file that contains:
And I wish to get new_data that will contain:
A 1
I read something about the awk and sed commands, but I don't think I understand how to combine them to get what I want.
I would also like to add that I want to use this command as part of a script:
for b in ./*/ ; do (cd "$b" && command); done
If we think of the content of old_data as a matrix of elements {An*m}, where n is the row number and m is the column number, I want to manipulate the matrix so that I can add new elements. A in old_data has coordinates (1,1). In new_data, therefore, I want to assign 1 to the matrix element with coordinates (1,3).
If we compare the contents of old_data and new_data, we see that the (1,2) element is a space (it is empty).
It's not at all clear to me what you are asking for, but I suspect you are saying that you would like a way to insert some given text into a particular row and column. Perhaps:
$ cat input
$ row=2 column=2 text="This is some new data"
$ awk 'NR==row {$column = new_data " " $column}1' row=$row column=$column new_data="$text" input
B This is some new data
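If the columns are tab-separated (an assumption; the question does not say), assigning to a field beyond the last one makes awk create the empty fields in between, which matches the empty (1,2) element described in the question. A minimal sketch with hypothetical file contents and variable names:

```shell
# Hypothetical old_data: one value per row, a single column.
old_data=$(mktemp)
printf 'A\nB\nC\n' > "$old_data"

# Put the text "1" at row 1, column 3. Assigning to $col extends the record;
# the skipped column 2 becomes an empty field between two tabs.
new_data=$(awk -v row=1 -v col=3 -v text=1 \
    'BEGIN{FS=OFS="\t"} NR==row {$col = text} 1' "$old_data")
printf '%s\n' "$new_data"

rm -f "$old_data"
```

To fill several cells, the command could be run once per (row, column, text) triple, or the NR==row rule repeated; either choice is a design decision the question leaves open.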
This bash & unix tools code works:
# make the input files.
echo {A..D} | tr ' ' '\n' > abc ; echo {1..4} | tr ' ' '\n' > 123
# print as per previous OP spec
head -1q abc 123 ; paste abc 123 123 | tail -n +2
B 2 2
C 3 3
D 4 4
Version #3, (using commas as more visible separators), as per newest OP spec:
# for the `sed` code change the `2` to whatever column needs deleting.
paste -d, abc 123 123 | sed 's/[^,]*//2'
The same, with tab delimiters (less visually obvious):
paste abc 123 123 | sed 's/[^\t]*//2'
A 1
B 2
C 3
D 4
I have two files of one column each
I want to write a unique file with both elements as
1 4
2 5
3 6
It should be really simple I think with awk.
You could try paste -d ' ' <file1> <file2>. (Without -d ' ' the delimiter would be tab.)
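A short sketch of that suggestion, with the two one-column files recreated in a temporary directory (the contents 1 2 3 and 4 5 6 are inferred from the desired output; the names dir and joined are illustrative):

```shell
dir=$(mktemp -d)
printf '%s\n' 1 2 3 > "$dir/file1"
printf '%s\n' 4 5 6 > "$dir/file2"

# -d ' ' joins corresponding lines with a single space instead of a tab.
joined=$(paste -d ' ' "$dir/file1" "$dir/file2")
printf '%s\n' "$joined"

rm -rf "$dir"
```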
paste works okay for the example given, but it doesn't handle variable-length lines very well. A nice, little-known core utility, pr, provides a more flexible solution:
$ pr -mtw 4 file1 file2
1 4
2 5
3 6
A variable length example:
$ pr -mtw 22 file1 file2
10 4
200 5
300,000,00 6
And since you asked about awk here is one way:
$ awk '{a[FNR]=a[FNR]$0" "}END{for(i=1;i<=length(a);i++)print a[i]}' file1 file2
1 4
2 5
3 6
Using awk
awk 'NR==FNR { a[FNR]=$0;next } { print a[FNR],$0 }' file{1,2}
NR==FNR ensures that the first action runs for the first file only.
a[FNR]=$0 stores each line of the first file in array a, indexed by its line number.
Once the first file is complete, we move on to the second action.
Here we print each line of the first file along with the matching line of the second file.
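One thing worth knowing about this idiom is what happens when the files differ in length. The sketch below uses made-up inputs where the second file is one line longer; the names dir and out are illustrative.

```shell
dir=$(mktemp -d)
printf '%s\n' 1 2   > "$dir/file1"   # two lines
printf '%s\n' 4 5 6 > "$dir/file2"   # three lines

# For the third line of file2 there is no a[3], so awk prints an empty
# string, then the output separator (a space), then the line itself.
out=$(awk 'NR==FNR { a[FNR]=$0; next } { print a[FNR], $0 }' "$dir/file1" "$dir/file2")
printf '%s\n' "$out"

rm -rf "$dir"
```

The unmatched row starts with a space, and if the first file were the longer one, its extra lines would be dropped without a warning, so the idiom is safest when both files have the same number of lines.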