Unix command to remove everything after first column - linux

I have a text file in which I have something like this-
10.2.57.44 56538154 3028
120.149.20.197 28909678 3166
10.90.158.161 869126135 6025
The text file has around 1,000,000 rows exactly like the ones above. I am working in a SunOS environment. I need a way to remove everything from that text file except the IP address (the first column). So after running some Unix command, the file should look like this:
10.2.57.44
120.149.20.197
10.90.158.161
Can anyone please help me with a Unix command that removes everything except the IP address (the first column) and saves the result back to a file?

If the delimiter is a space character, use
cut -d " " -f 1 filename
If the delimiter is a tab character, there is no need for the -d option, since tab is the default delimiter for cut:
cut -f 1 filename
-d
Delimiter; the character immediately following the -d option is the field delimiter.
-f
Specifies a field list, separated by the delimiter.
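For example, with a colon as the delimiter, the first field of /etc/passwd (the login names) can be extracted like this:
cut -d ':' -f 1 /etc/passwd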

nawk '{print $1}' file > newFile && mv newFile file
OR
cut -f1 file > newFile && mv newFile file
As you're using SunOS, you'll want to get familiar with nawk (not awk, which on SunOS is the old, cranky version; nawk = new awk ;-).
In either case, you're printing the first field in the file to newFile.
(n)awk is a complete programming language designed for easy manipulation of text files. The $1 means the first field on each line, $9 would mean the ninth field, etc., while $0 means the whole line. You can tell (n)awk what to separate the fields by; it might be a tab character, a '|' character, or multiple spaces. By default, all versions of awk use whitespace, i.e. multiple spaces or one tab, to delimit the columns/fields on each line of a file.
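For instance, if the columns were separated by a '|' character instead of whitespace, a minimal sketch (data.txt is an illustrative file name) would be:
nawk -F'|' '{print $1}' data.txt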
For a very good intro to awk, see Grymoire's Awk page
The && means: execute the next command only if the previous command finished without a problem. This way you don't accidentally erase your good data file because of some error.
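A tiny demonstration of &&:
false && echo "this does not run"
true && echo "this runs"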
IHTH

If you have vim, open the file with it. Then, in command mode, run the substitution (where <delimiter> is a tab, a space, or whatever the delimiter is): %s:<delimiter>.*$::g. Now save the file with :wq.
Using sed, the command looks like this: sed -e 's/<delimiter>.*$//' file.txt > newfile.txt

How about a perl script ;)
#!/usr/bin/perl -w
use strict;

my $file = shift;
die "Missing file or can't read it" unless $file and -r $file;

sub edit_in_place
{
    my $file = shift;
    my $code = shift;

    {
        local @ARGV = ($file);
        local $^I = '';
        while (<>) {
            &$code;
        }
    }
}

edit_in_place $file, sub {
    my @columns = split /\s+/;
    print "$columns[0]\n";
};
This will edit the file in place since you say it is a large one. You can also create a backup by modifying local $^I = ''; to local $^I = '.bak';

Try this
awk '{$1=$1; print $1}' temp.txt
Output
10.2.57.44
120.149.20.197
10.90.158.161

awk '{ print $1 }' file_name.txt > tmp_file_name.txt
mv tmp_file_name.txt file_name.txt
'> tmp_file_name.txt' means redirecting STDOUT of awk '{ print $1 }' file_name.txt to a file named tmp_file_name.txt
FYI :
$1 means first column based on delimiter. The default delimiter is whitespace
$2 means second column based on delimiter. The default delimiter is whitespace
..
..
$NF means the last column based on the delimiter (NF is the number of fields in the record). The default delimiter is whitespace
If you want to change the delimiter, use awk with -F.
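For example, to take the first comma-separated column (file.csv is an illustrative name):
awk -F',' '{print $1}' file.csv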

Related

How to get first word of every line and pipe it into dmenu script

I have a text file like this:
first state
second state
third state
Getting the first word from every line isn't difficult, but the problem comes when adding the extra \n required to separate every word (selection) in dmenu, per its syntax:
echo -e "first\nsecond\nthird" | dmenu
I haven't been able to figure out how to add the separating \n. I've tried this:
state=$(awk '{for(i=1;i<=NF;i+=2)print $(i)'\n'}' text.txt)
But it doesn't work. I also tried this:
lol=$(grep -o "^\S*" states.txt | perl -ne 'print "$_"')
But same deal. Not sure what I'm doing wrong.
Your problem is in the AWK script. You need to identify each input line as a record. This way, you can control how each record in the output is separated via the ORS variable (output record separator). By default this separator is the newline, which should be good enough for your purpose.
Now to print the first word of every input record (each line in the input stream in this case), you just need to print the first field:
awk '{print $1}' textfile | dmenu
If you need the output to include the explicit \n string (not the control character), then you can just overwrite the ORS variable to fit your needs:
awk 'BEGIN{ORS="\\n"}{print $1}' textfile | dmenu
This could be done more easily with a while loop; could you please try the following? It is simple: while reads the file and, for each line, creates two variables, first and rest; first holds the first field, which we print and pipe to dmenu.
while read -r first rest
do
    printf '%s\n' "$first"
done < "Input_file" | dmenu
Based on the text file example, the following should achieve what you require:
awk '{ printf "%s\\n",$1 }' textfile | dmenu
This prints the first space-separated field of each line followed by a literal \n (the backslash needs to be escaped to stop it being interpreted by awk).
In your code
state=$(awk '{for(i=1;i<=NF;i+=2)print $(i)'\n'}' text.txt)
you attempted to use ' inside your awk code. However, the shell treats everything between the opening ' and the next ' as the awk program, so the program actually passed is {for(i=1;i<=NF;i+=2)print $(i), and this does not work. Use double quotes for strings inside awk code.
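A minimal corrected sketch along those lines, reusing your state variable and text.txt:
state=$(awk '{printf "%s\\n", $1}' text.txt)
echo -e "$state" | dmenu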
If you merely want to get the nth column, cut will be enough in most cases. Let states.txt contain:
first state
second state
third state
then you can do:
cut -d ' ' -f 1 states.txt | dmenu
Explanation: treat space as delimiter (-d ' ') and get 1st column (-f 1)
(tested in cut (GNU coreutils) 8.30)

Bash: Read in file, edit line, output to new file

I am new to Linux and new to scripting. I am working in a Linux environment using bash. I need to do the following things:
1. read a txt file line by line
2. delete the first line
3. remove the middle part of each line after the first
4. copy the changes to a new txt file
Each line after the first has three sections: the first always ends in .pdf, and the third always begins with R0, but the middle section has no consistency.
Example of 2 lines in the file:
R01234567_High Transcript_01234567.pdf High School Transcript R01234567
R01891023_Application_01891023127.pdf Application R01891023
Here is what I have so far. I'm just reading the file, printing it to screen and copying it to another file.
#! /bin/bash
cd /usr/local/bin;
#echo "list of files:";
#ls;
for index in *.txt;
do echo "file: ${index}";
echo "reading..."
exec<${index}
value=0
while read line
do
#value='expr ${value} +1';
echo ${line};
done
echo "read done for ${index}";
cp ${index} /usr/local/bin/test2;
echo "file ${index} moved to test2";
done
So my question is, how can I delete the middle bit of each line, after .pdf but before the R0...?
Using sed:
sed 's/^\(.*\.pdf\).*\(R0.*\)$/\1 \2/g' file.txt
This will remove everything between .pdf and R0 and replace it with a single space.
Result for your example:
R01234567_High Transcript_01234567.pdf R01234567
R01891023_Application_01891023127.pdf R01891023
The Hard, Unreliable Way
It's a bit verbose, and less terse and efficient than what would make sense if we knew that the fields were separated by tab literals, but the following loop does this processing in pure native bash with no external tools:
shopt -s extglob
while IFS= read -r line; do
[[ $line = *".pdf"*R0* ]] || continue # ignore lines that don't fit our format
filename=${line%%.pdf*}.pdf
id=R0${line##*R0}
printf '%s\t%s\n' "$filename" "$id"
done < file.txt > newfile.txt   # illustrative names; redirect from your actual file
${line%%.pdf*} returns everything before the first .pdf in the line; ${line%%.pdf*}.pdf then appends .pdf to that content.
Similarly, ${line##*R0} expands to everything after the last R0; R0${line##*R0} thus expands to the final field starting with R0 (presuming that that's the only instance of that string in the file).
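For example, with the first sample line from the question:
line='R01234567_High Transcript_01234567.pdf High School Transcript R01234567'
echo "${line%%.pdf*}.pdf"   # R01234567_High Transcript_01234567.pdf
echo "R0${line##*R0}"       # R01234567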
The Easy Way (Using Tab Delimiters)
If cat -t file (on MacOS) or cat -A file (on Linux) shows ^I sequences between the fields (but not within the fields), use the following instead:
while IFS=$'\t' read -r filename title id; do
printf '%s\t%s\n' "$filename" "$id"
done < file > newfile   # "file" as checked above; "newfile" is illustrative
This reads the three tab separated fields into variables named filename, title and id, and emits the filename and id fields.
Updated answer assuming tab delim
Since there is a tab delimiter, this is a cinch for awk. Borrowing from my originally deleted answer and @geek1011's deleted answer:
awk -F"\t" '{print $1, $NF}' infile.txt
Here awk splits each record in your file by tab, then prints the first field $1 and the last field $NF where NF is the built in awk variable for the record's Number of Fields; by prepending a dollar sign, it says "The value of the last field in the record".
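A quick illustration of NF and $NF on a made-up line:
echo "alpha beta gamma" | awk '{print NF, $NF}'   # prints: 3 gamma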
Original answer assuming space delimiter
Leaving this here in case someone has space delimited nonsense like I originally assumed.
You can use awk instead of using bash to read through the file:
awk 'NR>1{firstRec=""; for(i=1; $i!~/pdf/; ++i) firstRec=firstRec" "$i} NR>1{print firstRec,$i,$NF}' yourfile.txt
awk reads files line by line and processes each record it comes across. Fields are delimited automatically by white space. The first field is $1, the second is $2 and so on. awk has built in variables; here we use NF which is the Number of Fields contained in the record, and NR which is the record number currently being processed.
This script does the following:
If the record number is greater than 1 (not the header) then
Loop through each field (separated by white space here) until we find a field that has "pdf" in it ($i!~/pdf/). Store everything we find up until that field in a variable called firstRec, separated by spaces (firstRec=firstRec" "$i); firstRec is reset at the start of each record so text from the previous line doesn't carry over.
print out the firstRec, then print out whatever field we stopped iterating on (the one that contains "pdf") which is $i, and finally print out the last field in the record, which is $NF (print firstRec,$i,$NF)
You can direct this to another file:
awk 'NR>1{firstRec=""; for(i=1; $i!~/pdf/; ++i) firstRec=firstRec" "$i} NR>1{print firstRec,$i,$NF}' yourfile.txt > outfile.txt
sed may be a cleaner way to go here, since if the pdf file name contains runs of more than one space, the awk approach above will collapse them to single spaces.
You can use sed on each line like that:
line="R01234567_High Transcript_01234567.pdf High School Transcript R01234567"
echo "$line" | sed 's/\.pdf.*R0/\.pdf R0/'
# output
R01234567_High Transcript_01234567.pdf R01234567
This replaces anything between .pdf and R0 with a single space.
It doesn't deal with some edge cases, but it's simple and clear.

How can I remove the last character of a file in unix?

Say I have some arbitrary multi-line text file:
sometext
moretext
lastline
How can I remove only the last character (the e, not the newline or null) of the file without making the text file invalid?
A simpler approach (outputs to stdout, doesn't update the input file):
sed '$ s/.$//' somefile
$ is a Sed address that matches the last input line only, thus causing the following function call (s/.$//) to be executed on the last line only.
s/.$// replaces the last character on the (in this case last) line with an empty string; i.e., effectively removes the last char. (before the newline) on the line.
. matches any character on the line, and following it with $ anchors the match to the end of the line; note how the use of $ in this regular expression is conceptually related, but technically distinct from the previous use of $ as a Sed address.
Example with stdin input (assumes Bash, Ksh, or Zsh):
$ sed '$ s/.$//' <<< $'line one\nline two'
line one
line tw
To update the input file too (do not use if the input file is a symlink):
sed -i '$ s/.$//' somefile
Note:
On macOS, you'd have to use -i '' instead of just -i; for an overview of the pitfalls associated with -i, see the bottom half of this answer.
If you need to process very large input files and/or performance / disk usage are a concern and you're using GNU utilities (Linux), see ImHere's helpful answer.
truncate
truncate -s-1 file
Removes one byte (-1) from the end of the file, in place, much as >> appends to the same file.
The problem with this approach is that if the file ends with a newline, truncate -s-1 removes that newline rather than the last character.
The solution is:
if [ -n "$(tail -c1 file)" ] # if the file has not a trailing new line.
then
truncate -s-1 file # remove one char as the question request.
else
truncate -s-2 file # remove the last two characters
echo "" >> file # add the trailing new line back
fi
This works because tail takes the last byte (not char).
It takes almost no time even with big files.
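A quick way to see the test in action (f is an illustrative file name):
printf 'abc' > f;   [ -n "$(tail -c1 f)" ] && echo "no trailing newline"
printf 'abc\n' > f; [ -n "$(tail -c1 f)" ] || echo "trailing newline present"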
Why not sed
The problem with a sed solution like sed '$ s/.$//' file is that it reads the whole file first (taking a long time with large files), then you need a temporary file (of the same size as the original):
sed '$ s/.$//' file > tempfile
rm file; mv tempfile file
And then move the tempfile to replace the file.
Here's another using ex, which I find not as cryptic as the sed solution:
printf '%s\n' '$' 's/.$//' wq | ex somefile
The $ goes to the last line, the s deletes the last character, and wq is the well known (to vi users) write+quit.
After a whole bunch of playing around with different strategies (and avoiding sed -i or perl), the best way I found to do this was with:
sed '$! { P; D; }; s/.$//' somefile
If the goal is to remove the last character in the last line, this awk should do:
awk '{a[NR]=$0} END {for (i=1;i<NR;i++) print a[i];sub(/.$/,"",a[NR]);print a[NR]}' file
sometext
moretext
lastlin
It stores all the data in an array, then prints it out, changing the last line.
Just a remark: sed -i replaces the file with a newly written copy.
So if you are tailing the file, you'll get a "No such file or directory" warning until you reissue the tail command.
EDITED ANSWER
I created a script and put your text inside a file on my Desktop. This test file is saved as "old_file.txt":
sometext
moretext
lastline
Afterwards I wrote a small script to take the old file and eliminate the last character of the last line:
#!/bin/bash
no_of_new_line_characters=`wc -l < '/root/Desktop/old_file.txt'`
let "no_of_lines=no_of_new_line_characters+1"
sed -n 1,"$no_of_new_line_characters"p '/root/Desktop/old_file.txt' > '/root/Desktop/my_new_file'
sed -n "$no_of_lines","$no_of_lines"p '/root/Desktop/old_file.txt'|sed 's/.$//g' >> '/root/Desktop/my_new_file'
Opening the new file I created showed the following output:
sometext
moretext
lastlin
I apologize for my previous answer (wasn't reading carefully)
sed '$ s/.$//' filename | tee newFilename
This should do your job.
A couple perl solutions, for comparison/reference:
(echo 1a; echo 2b) | perl -e '$_=join("",<>); s/.$//; print'
(echo 1a; echo 2b) | perl -e 'while(<>){ if(eof) {s/.$//}; print }'
I find the first read-whole-file-into-memory approach can be generally quite useful (less so for this particular problem). You can then write regexes which span multiple lines, for example to combine every 3 lines of a certain format into 1 summary line.
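As a hedged sketch of that idea (input.txt and the " | " joiner are illustrative), slurp the whole file and join each group of three lines into one:
perl -e '$_ = join("", <>); s/(.*)\n(.*)\n(.*)\n/$1 | $2 | $3\n/g; print' input.txt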
For this problem, truncate would be faster and the sed version is shorter to type. Note that truncate requires a file to operate on, not a stream. Normally I find sed to lack the power of perl and I much prefer the extended-regex / perl-regex syntax. But this problem has a nice sed solution.

passing awk variable to bash script

I am writing a bash/awk script to process hundreds of files under one directory. They all have the name suffix "localprefs". The purpose is to extract two values from each file (they are enclosed in double quotes). I also want to use the same file name, but without the suffix.
Here is what I did so far:
#!/bin/bash
for file in * # Traverse all the files in current directory.
read -r name < <(awk ' $name=substr(FILENAME,1,length(FILENAME)-10a) END {print name}' $file) #get the file name without suffix and pass to bash. PROBLEM TO SOLVE!
echo $name # verify if passing works.
do
awk 'BEGIN { FS = "\""} {print $2}' $file #this one works fine to extract two values I want.
done
exit 0
I could use
awk '{print substr(FILENAME,1,length(FILENAME)-10)}' to extract the file name without the suffix, but I am stuck on how to pass that to bash as a variable which I will use as the output file name (I read through all the posts on this subject here, but maybe I am dumb, because none of them works for me).
If anyone can shed some light on this, especially the line that starts with "read", it would be really appreciated.
Many thanks.
Try this one:
#!/bin/bash
dir="/path/to/directory"
for file in "$dir"/*localprefs; do
name=${file%localprefs} ## Or if it has a .: name=${file%.localprefs}
name=${name##*/} ## To exclude the dir part.
echo "$name"
awk 'BEGIN { FS = "\""} {print $2}' "$file" ## I think you could also use cut: cut -f 2 -d '"' "$file"
done
exit 0
To just take the base name, you don't even need awk:
for file in * ; do
name="${file%.*}"
etc
done
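For instance, with a hypothetical file name (assuming the suffix is preceded by a dot):
file="settings.localprefs"
echo "${file%.*}"   # prints: settings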

Replace whitespace with a comma in a text file in Linux

I need to edit a few text files (an output from sar) and convert them into CSV files.
I need to change every whitespace (maybe it's a tab between the numbers in the output) to a comma, using sed or awk (an easy shell script in Linux).
Can anyone help me? Every command I used didn't change the file at all; I tried gsub.
tr ' ' ',' <input >output
This substitutes each space with a comma. If you need to, you can first make a pass with the -s flag (squeeze repeats), which replaces each input sequence of a repeated character listed in SET1 (the blank space) with a single occurrence of that character.
Using squeeze repeats to collapse runs of tabs first, then substituting tabs with commas:
tr -s '\t' <input | tr '\t' ',' >output
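A minimal illustration of translate-plus-squeeze in one pass (the sample text is made up):
printf 'a   b\tc\n' | tr -s ' \t' ','   # prints: a,b,c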
Try something like:
sed -E 's/[[:space:]]+/,/g' orig.txt > modified.txt
The character class [[:space:]] will match all whitespace (spaces, tabs, etc.). If you just want to replace a single character, e.g. just the space, use that alone.
EDIT: Actually [[:space:]] includes carriage return, so this may not do what you want. The following will replace only tabs and spaces:
sed -E 's/[[:blank:]]+/,/g' orig.txt > modified.txt
as will
sed -E 's/[\t ]+/,/g' orig.txt > modified.txt
In all of this, you need to be careful that the items in your file that are separated by whitespace don't contain their own whitespace that you want to keep, eg. two words.
Without looking at your input file, this is only a guess:
awk '{$1=$1}1' OFS="," file
Redirect the output to another file and rename as needed.
What about something like this :
cat texte.txt | sed -e 's/\s/,/g' > texte-new.txt
(Yes, with some useless catting and piping; I could also use < to read from the file directly, I suppose -- I used cat first to output the content of the file, and only afterwards added sed to my command line.)
EDIT: as @ghostdog74 pointed out in a comment, there's definitely no need for that cat/pipe; you can give the name of the file to sed:
sed -e 's/\s/,/g' texte.txt > texte-new.txt
If "texte.txt" is this way :
$ cat texte.txt
this is a text
in which I want to replace
spaces by commas
You'll get a "texte-new.txt" that'll look like this :
$ cat texte-new.txt
this,is,a,text
in,which,I,want,to,replace
spaces,by,commas
I wouldn't just replace the old file with the new one (that could be done with sed -i, if I remember correctly; and as @ghostdog74 said, that option can create the backup on the fly): keeping the original might be wise, as a safety measure (even if it means having to rename it to something like "texte-backup.txt").
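For reference, a sketch of that in-place variant with an on-the-fly backup (GNU sed syntax):
sed -i.bak -e 's/\s/,/g' texte.txt   # edits texte.txt in place and keeps texte.txt.bak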
This command should work:
sed "s/\s/,/g" < infile.txt > outfile.txt
Note that you have to redirect the output to a new file. The input file is not changed in place.
sed can do this:
sed 's/[\t ]/,/g' input.file
That will send the output to the console,
sed -i 's/[\t ]/,/g' input.file
will edit the file in-place
Here's a Perl script which will edit the files in-place:
perl -i.bak -lpe 's/\s+/,/g' files*
Consecutive whitespace is converted to a single comma.
Each input file is moved to .bak
These command-line options are used:
-i.bak edit in-place and make .bak copies
-p loop around every line of the input file, automatically print the line
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
If you want to replace an arbitrary sequence of blank characters (tab, space) with one comma, use the following:
sed -E 's/[\t ]+/,/g' input_file > output_file
or
sed -r 's/[[:blank:]]+/,/g' input_file > output_file
If some of your input lines include leading space characters which are redundant and don't need to be converted to commas, then first you need to get rid of them, and then convert the remaining blank characters to commas. For such a case, use the following:
sed -E 's/^ +//' input_file | sed -E 's/[\t ]+/,/g' > output_file
This worked for me.
sed -e 's/\s\+/,/g' input.txt >> output.csv
