Swap columns of a file - Linux (exact position, not word) - linux

I would like to know how to swap columns (the exact character) of a file with Linux (using cut, awk, sed or whatever you can help me with).
I have seen how to swap a whole expression (using delimiters) and whole words.
Example:
128934
38 2008
Swapping column 3 with 5:
123984
3802 08
Another way to ask this would be: swap the 3rd character of each row with the 5th.

You can do it with gawk, mawk, nawk and busybox awk with this non-POSIX-compliant example (an empty FS makes every character its own field):
awk -v FS='' -v OFS='' '{ t=$3; $3=$5; $5=t } 1' infile
Output:
123984
3802 08

A bit unwieldy, with sed:
$ sed -E 's/^(..)(.)(.)(.)/\1\4\3\2/' infile
123984
3802 08
This captures the first five characters of each line in four groups and then rearranges them. -E is just there for convenience; without it, we have to escape the parentheses as in \(.\).
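If you would rather not depend on the empty-FS extension, the same swap can be sketched with substr() alone, which any POSIX awk provides. The function name swap_chars and its argument order are made up for this illustration, and leaving too-short lines unchanged is an assumption about the desired behavior:

```shell
# swap_chars I J [FILE...]: swap the I-th and J-th character of every line.
# Hypothetical helper for illustration; relies only on POSIX awk's substr().
swap_chars() {
    i=$1 j=$2
    shift 2
    awk -v i="$i" -v j="$j" '
    BEGIN {
        # Order the two positions so that lo <= hi (the + 0 forces numbers).
        lo = (i + 0 < j + 0) ? i + 0 : j + 0
        hi = (i + 0 < j + 0) ? j + 0 : i + 0
    }
    # Assumption: lines too short to contain position hi are left as they are.
    lo < 1 || lo == hi || length($0) < hi { print; next }
    {
        head = substr($0, 1, lo - 1)            # everything before lo
        mid  = substr($0, lo + 1, hi - lo - 1)  # characters strictly between
        tail = substr($0, hi + 1)               # everything after hi
        print head substr($0, hi, 1) mid substr($0, lo, 1) tail
    }' "$@"
}
```

With the sample from the question, swap_chars 3 5 infile prints 123984 and 3802 08. Without file arguments it reads standard input.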

Related

How to multiply a number by 2 (double) present in a particular line number of a file, in Linux?

File_A
Name: John Smith
Grade: 8
Institute: Baldwin
Number of entries: 125
State: Texas
File_B
Name: David Buck
Grade: 9
Institute: High Prime
Number of entries: 123
State: California
There are many such similar files in which the Number of entries (present at line number 4 in all files) has to be doubled.
For File_A it should be 250 and for File_B 246.
How to do this for all files in Linux? (Using sed or awk or any other commands.)
Tried commands:
sed -i '4s/$2/$2*2/g' *.txt (nothing happening from this)
awk "FNR==4 {sub($2,$2*2)}" *.txt (getting syntax error)
With your shown samples, please try the following awk code. Simple explanation: look for the string Number of entries:, multiply the value of the last field by 2 and store it back into that field, then print every line by mentioning 1.
awk '/Number of entries:/{$NF = ($NF * 2)} 1' File_A File_B
Running the above command prints the output on screen. Once you are happy with the output and want to save it into the input file itself, you can try awk's -i inplace option (available in GNU awk 4.1+).
Also, if your files have the .txt extension, pass *.txt to the above awk program; awk can read multiple files itself.
This might work for you (GNU sed and shell):
sed -Ei '4s/(.* )(.*)/echo "\1$((\2*2))"/e' file1 file2 filen
For line four of each file input, split the values into two back references and echo back those values using shell arithmetic to double the second value.
N.B. The -i option allows for address of line four to be found in all input files and those files to be amended in situ.
Using sed
$ sed '/^Number of entries/s/[[:digit:]]\+/$((&*2))/;s/^/echo /e' input_file
I want to explain why what you tried failed. First:
sed -i '4s/$2/$2*2/g' *.txt
$ has no special meaning for GNU sed here; it is a literal dollar sign ($ acts as an end-of-line anchor only at the end of a pattern). GNU sed also does not support arithmetic. So the command above means: on the 4th line, replace a dollar sign followed by 2 with a dollar sign followed by 2, an asterisk and 2, and do so globally. There is no literal $2 on the 4th line of the file, so nothing happens.
awk "FNR==4 {sub($2,$2*2)}" *.txt
You should not use " to enclose an awk program unless you want to summon mind-boggling bugs: inside double quotes the shell expands $2 before awk ever sees it. Use ' instead, in which case the syntax error is gone; however, the behavior is still not what you want. To get that, your code might be reworked to
awk 'BEGIN{FS=OFS=": "}FNR==4{$2*=2}{print}' *.txt
Observe that I set FS and OFS to tell GNU AWK that fields are separated on input, and should be joined on output, by a colon followed by a space rather than by one or more whitespace characters (the default). I do not use the sub function (which works with regular expressions) but simply multiply by 2 with the *= operator, and I also print each line, since without that the output would be empty. If you want to know more about FS or OFS, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
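The effect of the quotes can be seen without running awk at all. A small sketch, assuming no positional parameters are set (as in a fresh interactive shell); the variable names dq and sq exist only for this illustration:

```shell
# Clear the positional parameters so that $2 is unset, as in a fresh shell.
set --
dq="FNR==4 {sub($2,$2*2)}"    # double quotes: the shell expands $2 (to nothing)
sq='FNR==4 {sub($2,$2*2)}'    # single quotes: the text reaches awk untouched
printf '%s\n' "$dq"           # prints: FNR==4 {sub(,*2)}
printf '%s\n' "$sq"           # prints: FNR==4 {sub($2,$2*2)}
```

The first string is what awk actually received in the failing command, which is why it reported a syntax error.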
mawk 'BEGIN{ _+=_^=FS=OFS="Number of entries: " } NF<_ || $NF *=_'
Name: John Smith
Grade: 8
Institute: Baldwin
Number of entries: 250
State: Texas
Name: David Buck
Grade: 9
Institute: High Prime
Number of entries: 246
State: California
Thank you for all the answers.
With your help I was able to figure out a simple solution by understanding and combining your answers.
Here it is (which worked in my environment):
To display on terminal:
awk 'FNR==4 {sub($4,$4*2)} 1' File_A
To move to some file:
awk 'FNR==4 {sub($4,$4*2)} 1' File_A > temp_A
To perform changes inside file using inplace:
awk -i inplace 'FNR==4 {sub($4,$4*2)} 1' *.txt
$4 being the 4th field in the line;
FNR==4 being the line number 4;
1 at the end helps in printing everything
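For awk implementations without -i inplace, the same edit can be sketched with a temporary file per input file. The function name double_entries is made up for this example, and it assumes mktemp is available (it is on typical Linux systems):

```shell
# double_entries FILE...: double the value on line 4 of each file, in place.
# Hypothetical helper; works with any POSIX awk (no "-i inplace" needed).
double_entries() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # FS and OFS are both ": ", so $2 is the value after the label.
        if awk 'BEGIN { FS = OFS = ": " } FNR == 4 { $2 *= 2 } { print }' "$f" > "$tmp"; then
            cat "$tmp" > "$f"    # overwrite the contents, keep the file's permissions
        fi
        rm -f "$tmp"
    done
}
```

It would be called as double_entries *.txt. The original file is only overwritten when awk exits successfully.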

Replacing characters in each line on a file in linux

I have a file with a different word on each line.
My goal is to replace the first character with a capital letter and replace the 3rd character with "#".
For example: football will be changed to Foo#ball.
I tried thinking about using awk and sed. It didn't help me since (to my knowledge) sed needs an exact character as input and awk can print the desired character but not change it.
With GNU sed and two s commands:
echo 'football' | sed -E 's/(.)/\U\1/; s/(...)./\1#/'
Output:
Foo#ball
See: 3.3 The s Command, 5.7 Back-references and Subexpressions and 5.9.2 Upper/Lower case conversion
This might work for you (GNU sed):
sed 's/\(...\)./\u\1#/' file
With bash you can use parameter expansions alone to accomplish the task. For example, if you read each line into the variable line, you can do:
line="${line^}" # change football to Football (capitalize 1st char)
line="${line:0:3}#${line:4}" # make 4th character '#'
Example Input File
$ cat file
football
soccer
baseball
Example Use/Output
$ while read -r line; do line="${line^}"; echo "${line:0:3}#${line:4}"; done < file
Foo#ball
Soc#er
Bas#ball
While shell is typically slower, when use is limited to builtins, it doesn't fall too far behind.
(note: your question says 3rd character, but your example replaces the 4th character with '#')
With GNU awk for the 3rd arg to match():
$ echo 'football' | awk 'match($0,/(.)(..).(.*)/,a){$0=toupper(a[1]) a[2] "#" a[3]} 1'
Foo#ball
Cyrus' or Potong's answers are the preferred ones. (For Linux or systems with GNU sed because of \U or \u.)
This is just an additional solution with awk because you mentioned it and used also awk tag:
$ echo 'football'|awk '{a=substr($0,1,1);b=substr($0,2,2);c=substr($0,5);print toupper(a)b"#"c}'
Foo#ball
This is a very simple solution without regular expressions. It will also work on non-GNU awk.
This should work with any version of awk:
awk '{
  for(i=1;i<=NF;i++){
    # Note that string indexes start at 1 in awk !
    # Keep chars 1-3 (first one uppercased), put "#" in place of char 4, append the rest
    $i=toupper(substr($i,1,1)) substr($i,2,2) "#" substr($i,5)
  }
  print
}' file
Note: If a word is shorter than 4 characters, like it, the # is simply appended and it will be printed as It#
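The short-word caveat above can be avoided by checking the length first. A minimal sketch, assuming the goal is the Foo#ball form from the question (4th character replaced) and one word per line; the function name cap_hash is hypothetical:

```shell
# cap_hash [FILE...]: capitalize the first character of each line and replace
# the 4th character with "#". Lines shorter than 4 characters are only
# capitalized (an assumption; the question does not say what should happen).
cap_hash() {
    awk '{
        if (length($0) >= 4)
            $0 = toupper(substr($0, 1, 1)) substr($0, 2, 2) "#" substr($0, 5)
        else
            $0 = toupper(substr($0, 1, 1)) substr($0, 2)
        print
    }' "$@"
}
```

For the input lines football, soccer and it, this prints Foo#ball, Soc#er and It.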
If your data is in a file named d, this was tried with GNU sed:
sed -E 's/^(\w)(\w\w)\w/\U\1\E\2#/' d

How To Delete All Words Before X Characters

I'm using code from the question How To Delete All Words After X Characters, and I'm having trouble keeping (not deleting) all the words after 30 characters.
Original code:
awk 'BEGIN{FS=OFS="" } length>30{i=30; while($i~/\w/) i++; NF=i-1; }1'
My attempt:
awk 'BEGIN{FS=OFS="" } length>30{i=30; while($i~/\w/) i++; NF=i+1; }1'
Basically, I understand I need to change the NF which was NF=i-1 so I tried changing it to NF=i+1 but obviously I'm only getting one field. How can I specify NF to print the rest of the line?
Sample data:
StackOverflow Users Are Brilliant And Hard Working
#character 30 ---------------^
Desired output:
And Hard Working
If you could please help me keep the rest of the line by using NF, I would really appreciate your positive input and support.
It is much easier using gnu grep:
grep -oP '^.{30}\w*\W*\K.*' file
And Hard Working
Where \K resets the start of the reported match, so everything matched before it is left out of the output.
RegEx Breakup:
^: Start
.{30}: Match first 30 characters
\w*: followed by 0 or more word characters
\W*: followed by 0 or more non-word characters
\K: reset matched information so far
.*: Match anything after this position
Using awk you can use this solution:
awk '{sub(/^.{30}[_[:alnum:]]*[[:blank:]]*/, "")} 1' file
And Hard Working
Finally a sed solution:
sed -E 's/^.{30}[_[:alnum:]]*[[:blank:]]*//' file
And Hard Working
another awk
awk '{print substr($0, index(substr($0,30),FS)+30)}'
find the delimiter index after the 30th char, take a substring from that index on.
I can't imagine why you're considering anything related to NF for this, since you're not doing anything with fields; you're just splitting each line at a blank char. It sounds like this is all you need for both questions, using GNU awk for gensub():
$ awk '{print gensub(/(.{30}\S*)\s+(.*)/,"\\1",1)}' file
StackOverflow Users Are Brilliant
$ awk '{print gensub(/(.{30}\S*)\s+(.*)/,"\\2",1)}' file
And Hard Working
or it's briefer using GNU sed:
$ sed -E 's/(.{30}\S*)\s+(.*)/\1/' file
StackOverflow Users Are Brilliant
$ sed -E 's/(.{30}\S*)\s+(.*)/\2/' file
And Hard Working
With the use of NF, you can try
awk '{a=0; b=""; for(i=1;i<=NF;i++){a+=length($i)+1; if(a>30){for(j=i+1;j<=NF;j++) b=b (b==""?"":" ") $j; print b; break}}}' file
cut -c30- file | cut -d' ' -f2-
This keeps only the words that start after the 30th character (index >= 31).
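To reuse the sed approach for cut-off points other than 30, the number can be passed in as a parameter. A sketch only: the function name drop_through is invented here, and it assumes a sed that accepts -E (GNU and BSD sed both do) and that N is a plain positive integer:

```shell
# drop_through N [FILE...]: delete the first N characters of each line, plus
# the rest of the word that follows character N and the blanks after it.
# Hypothetical helper; lines shorter than N characters are left unchanged.
drop_through() {
    n=$1
    shift
    sed -E "s/^.{$n}[_[:alnum:]]*[[:blank:]]*//" "$@"
}
```

With the sample line, drop_through 30 file prints And Hard Working.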

Cut number from string

I want to cut several numbers from a .txt file to add them up later. Here is an excerpt from the .txt file:
anonuser pts/25 127.0.0.1 Mon Nov 16 17:24 - crash (10+23:07)
I want to get the "10" before the "+" and I only want the number, nothing else. This number should be written to another .txt file. I used this code, but it only works if the number has one digit:
awk ' /^'anonuser' / {split($NF,k,"[(+0:)][0-9][0-9]");print k[1]} ' log2.txt > log3.txt
With GNU grep:
grep -Po '\(\K[^+]*' file > new_file
Output to new_file:
10
See: PCRE Regex Spotlight: \K
What if you use the match() function in awk?
$ awk '/^anonuser/ && match($NF,/^\(([0-9]*)/,a) {print a[1]}' file
10
How does this work?
/^anonuser/ && match() {print a[1]} if the line starts with anonuser and the pattern is found, print it.
match($NF,/^\(([0-9]*)/,a) in the last field ((10+23:07)), look for the string ( + digits and capture these in the array a[].
Note also that this approach allows you to store the values you capture, so that you can then sum them as you indicate in the question.
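As a follow-up to the summing remark, here is a rough sketch that adds up the numbers directly. It avoids the three-argument match(), so it should also run on non-GNU awks; the function name sum_days and its arguments are assumptions made for this example:

```shell
# sum_days USER [FILE...]: add up the number before the "+" in the last field
# of every line whose first field is USER. Hypothetical helper, POSIX awk only.
sum_days() {
    user=$1
    shift
    awk -v user="$user" '
    $1 == user {
        d = $NF                      # e.g. "(10+23:07)"
        if (d ~ /^\([0-9]+\+/) {     # only fields of the form "(digits+..."
            sub(/^\(/, "", d)        # drop the opening parenthesis
            sub(/\+.*/, "", d)       # drop everything from the "+" on
            total += d
        }
    }
    END { print total + 0 }' "$@"
}
```

Lines whose last field has no number before a + (or no + at all) are skipped, so they contribute nothing to the total.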
The following uses the same approach as the OP, and has a couple of advantages, e.g. it does not require anything special, and it is quite robust (with respect to assumptions about the input) and maintainable:
awk '/^anonuser/ {split($NF,k,/[+]/); gsub(/[^0-9]/,"",k[1]); print k[1]}'
For anything more complex use awk, but for a simple task sed is easy enough:
sed -rn '/^anonuser/s/.*\(([0-9]+)\+.*/\1/p'
This finds the number between a ( and a + sign; -n together with the p flag prints only the lines where the substitution succeeded.
I am not sure about the format in the file.
Can you use simple cut commands?
cut -d"(" -f2 log2.txt| cut -d"+" -f1 > log3.txt

How to do something like grep -B to select only one line?

Everything is in the title. Basically, let's say I have this pattern
some text lalala
another line
much funny wow grep
I grep funny and I want my output to be "lalala"
Thank you
One possible answer is to use either ed or ex to do this (it is trivial in them):
ed - yourfile <<< 'g/funny/.-2p'
(Or replace ed with ex. You might have red, the restricted editor, too; it can't modify files.) This looks for the pattern /funny/ globally, and whenever it is found, prints the line 2 before the matching line (that's the .-2p part). Or, if you want the most recent line containing 'lalala' before the line matching 'funny':
ed - yourfile <<< 'g/funny/?lalala?p'
The only problem is if you're trying to process standard input rather than a file; then you have to save the standard input to a file and process that file, which spoils the concurrency.
You can't do negative offsets in sed (though GNU sed allows you to do positive offsets, so you could use sed -n '/lalala/,+2p' file to get the 'lalala' to 'funny' lines (which isn't quite what you want) based on finding 'lalala', but you cannot find the 'lalala' lines based on finding 'funny'). Standard sed does not allow offsets at all.
If you need to print just the IP address found on a line 8 lines before the pattern-matching line, you need a slightly more involved ed script, but it is still doable:
ed - yourfile <<< 'g/funny/.-8s/.* //p'
This uses the same basic mechanism to find the right line, then runs a substitute command to remove everything up to the last space on the line and print the modified version. Since there isn't a w command, it doesn't actually modify the file.
Since grep -B always prints whole lines of context before the match, you'll have to pipe the output into something like grep or Awk to pick out the part you want.
grep -B 2 "funny" file|awk 'NR==1{print $NF; exit}'
You could also just use Awk.
awk -v s="funny" '/[[:space:]]lalala$/{n=NR+2; o=$NF}NR==n && $0~s{print o}' file
For the specific example of an IP address 8 lines before the match as mentioned in your comment:
awk -v s="funny" '
/[[:space:]][0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/ {
n=NR+8
ip=$NF
}
NR==n && $0~s {
print ip
}' file
These Awk solutions first find the output field you might want, then print the output only if the word you want exists in the nth following line.
Here's an attempt at a slightly generalized Awk solution. It maintains a circular queue of the last q lines and prints the line at the head of the queue when it sees a match.
#!/bin/sh
: ${q=8}
e=$1
shift
awk -v q="$q" -v e="$e" '{ m[(NR%q)+1] = $0 }
$0 ~ e { print m[((NR+1)%q)+1] }' "${@--}"
Adapting to a different default (I set it to 8) or proper option handling (currently, you'd run it like q=3 ./qgrep regex file) as well as remembering (and hence printing) the entire line should be easy enough.
(I also didn't bother to make it work correctly if you see a match in the first q-1 lines. It will just print an empty line then.)
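The early-match gap can be closed with one extra condition. A minimal sketch of the same ring-buffer idea, written as a shell function with a hypothetical name and argument order (line_before N REGEX [FILE...]); it prints the line exactly N lines before each match and skips matches that have no such line:

```shell
# line_before N REGEX [FILE...]: for every line matching REGEX, print the line
# that came exactly N lines earlier. Hypothetical helper, POSIX awk only.
# Note: awk processes backslash escapes in -v values, so keep REGEX simple.
line_before() {
    n=$1 re=$2
    shift 2
    awk -v n="$n" -v re="$re" '
    {
        # A buffer of n+1 slots always still holds line NR-n at this point.
        if ($0 ~ re && NR > n) print buf[(NR - n) % (n + 1)]
        buf[NR % (n + 1)] = $0
    }' "$@"
}
```

For the three-line sample in the question, line_before 2 funny file prints some text lalala, and piping that through awk '{print $NF}' leaves just lalala.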
