Extracting a column using AWK - Linux

I am trying to extract a column using AWK.
The source file is a .csv file, and below is the command I am using:
awk -F ',' '{print $1}' abc.csv > test1
The data in abc.csv looks like this:
xyz@yahoo.com,160,1,2,3
abc@ymail.com,1,2,3,160
But when test1 is downloaded from the server and opened in Notepad, the data looks like this:
xyz@yahoo.comabc@ymail.com

Notepad doesn't show newlines created on Unix; it expects Windows CRLF line endings. If you want to add the carriage returns, try
awk -F ',' '{print $1"\r"}' abc.csv > test1

Since you're using a Windows tool to read the output, you just need to tell awk to use Windows line endings as the Output Record Separator:
awk -v ORS='\r\n' -F',' '{print $1}' file
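To check what was actually written, a quick byte-level look with od helps (a sketch, reusing the filenames from the question):
awk -v ORS='\r\n' -F',' '{print $1}' abc.csv > test1
od -c test1 | head    # each record should now end in \r \n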

Related

Changing file type for multiple files at once using awk command

I am trying to change .dat files to .csv files using the awk command. An example file has 3 columns of numbers separated by spaces:
23.00005 320.0054 0.0039734
xx.xxxxx xxx.xxxx x.xxxxxxx
The filenames follow the pattern filenameX.project.dat, where X is any number from 1 to a couple of hundred. The folder has many other files that I do not want changed. I want to change all of these files at once instead of doing them one by one.
Here is my example command:
awk '{print $1","$2","$3}' filenameX.project.dat > filenameX.project.csv
How can I automate this to run one command that will make every file a csv file?
I have tried the command below, and similar ones, but none work.
awk '{print $1","$2","$3}' filename*.project.dat > filename*.project.csv
Something like this:
$ for i in filename*.project.dat; do awk '{print $1","$2","$3}' "$i" > "$(echo "$i" | sed 's/\.dat$/.csv/')"; done
This loops over every filename*.project.dat file in the directory, runs the awk command on it, and redirects the output to a file whose name ends in .csv instead of .dat.
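A shorter variant avoids the sed subshell by using shell parameter expansion (same assumption: every matching file ends in .dat):
for i in filename*.project.dat; do
    awk '{print $1","$2","$3}' "$i" > "${i%.dat}.csv"
done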
You can do this all in awk like so:
awk 'BEGIN {OFS=","}
FNR==1 {fn=FILENAME; sub(/\.dat$/,".csv",fn)
        printf "Copying %s to %s\n", FILENAME, fn}
{for (i=1;i<=NF;i++) printf "%s%s", $i, (i<NF ? OFS : RS) > fn}' filename*.project.dat
Please make a backup first, as I am still not certain what you mean, but I suspect it is:
rename -n -S .dat .csv filename*.project.dat
If it looks good, remove the -n and run again for real.
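Note that rename only renames the files; it does not convert the space-separated contents to commas. The -S (substitute-all) flag also belongs to one particular rename implementation; with the Perl rename shipped on many Linux systems, an equivalent dry run would be something like:
rename -n 's/\.dat$/.csv/' filename*.project.dat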

zcat file not working for gzip file

I have a .gz file which I need to merge and manipulate in other ways (without decompressing it to disk), but I am having trouble just using zcat, gzip -dc, or awk; for example, when I pass their output to less -S like this:
awk '{print $1}' <(gzip -dc file.gz) | less -S
I get the wrong column printed. When I use just less -S to view the file, only the last few columns are shown. So I thought it was a problem with the delimiter, but I have tried importing some lines into R (the file is too big to import whole), and it seems to be space-delimited, since all the columns show up when I do this:
x=read.table("file.gz", header=T, nrows=100)
But how do I read the lines correctly so I can use this file with zcat?
Thank you so much for your help!
If you want the whole line to be printed, use $0:
awk '{print $0}' <(gzip -dc file.gz) | less -S
If you want specific columns printed, use -F to specify the field separator. For example, if you want the first field of ':'-separated lines (as in /etc/passwd), try this command:
awk -F':' '{print $1}' <(gzip -dc passwd.gz) | less -S
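Before picking a field separator, it can also help to look at the raw bytes of a few lines to see what the separators really are (a sketch; cat -A is GNU-specific, od -c is the portable alternative):
gzip -dc file.gz | head -n 3 | cat -A    # tabs show as ^I, line ends as $
gzip -dc file.gz | head -n 3 | od -c     # portable byte-by-byte view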

Printing columns in the output file

I got the output below from the last command using this pipeline:
last -w -F | awk '{print $1","$3","$5$6$7$8","$11$12$13$14","$15}' | tac | tr ',' '\t'
Now, for the same output, I want to add the column names below and then copy the result to a CSV or XLS file.
Can someone help me out here?
Column Names
USERNAME
HOSTNAME
LOGIN_TIME
LOGOUT_TIME
DURATION
Output looks like this
oracle localhost 2015 2.30
root localhost 2014 2.30
Appreciate your help on this.
Try this:
last -w -F | awk '{print $1,$3,$5$6$7$8,$11$12$13$14,$15} END{print "USERNAME\tHOSTNAME\tLOGIN_TIME\tLOGOUT_TIME\tDURATION"}' OFS='\t' | tac
I added the headings in the END block. This way, after tac runs, the headings end up at the beginning.
I also set awk's OFS to a tab so that the tr step is no longer needed.
I couldn't thoroughly test this because my last command apparently produces a different format than yours.
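A minimal illustration of the END-plus-tac trick, independent of last's output format:
printf 'a\nb\n' | awk '{print} END{print "HEADER"}' | tac
HEADER
b
a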
Writing to a file
To write the above output to a file, we use redirection to send stdout to a file:
last -w -F | awk '{print $1,$3,$5$6$7$8,$11$12$13$14,$15} END{print "USERNAME\tHOSTNAME\tLOGIN_TIME\tLOGOUT_TIME\tDURATION"}' OFS='\t' | tac >new.tsv
The above produces a tab-separated file. After selecting the tab-separated format in the import dialog, Excel should be able to read it.
If one wants a comma-separated file, then all we need to do is replace each \t with a comma:
last -w -F | awk '{print $1,$3,$5$6$7$8,$11$12$13$14,$15} END{print "USERNAME,HOSTNAME,LOGIN_TIME,LOGOUT_TIME,DURATION"}' OFS=',' | tac >new.csv
If I recall correctly, one can open this in Excel with File -> Open -> Text File.

Cut and awk commands in Linux

How can I extract a word between two words in a file, using the cut and awk commands?
Let's say I have a file with the content below:
This is my file and it has lots of content along with a password and want to extract PASSWORD=MYPASSWORDISHERE==and file is ending here.
Expected output, 1) using the awk command and 2) using the cut command:
MYPASSWORDISHERE==
Using awk (actually gawk, since the three-argument match() is a gawk extension):
awk '{match($0,/PASSWORD=(.*==)/,a); print a[1];}' input.txt
Using cut you can try the following; I'm not sure whether it works with your actual file:
cut -d"=" -s -f2,3 --output-delimiter="==" input.txt

Awk split file gives incomplete lines

My file is a CSV file with comma-delimited fields.
I tried to split the file into multiple files by the first field. I did the following:
cat myfile.csv | awk -F',' '{print $0 > "Mydata"$1".csv"}'
It does split the file, but the output files are corrupted: the last line of each file is incomplete, and the breaking position seems random. Has anyone had the same problem?
These types of problems are invariably because you created your input file on Windows, so it has spurious control-M (carriage-return) characters at the ends of the lines. Run dos2unix on your input file to clean it up, then re-run your awk command, rewritten as:
awk -F',' '{print > ("Mydata" $1 ".csv") }' myfile.csv
to solve a couple of unrelated problems (the useless cat, and the unparenthesized expression on the right of >).
You can also ignore the \r before each \n by making it part of the record separator (note that a multi-character RS requires gawk):
awk -F ',' -v RS='\r\n' '{print > ("Mydata" $1 ".csv") }' myfile.csv
Just don't forget to close your files. Note that reopening a file with > after close() truncates it again, so append with >> instead (and remove any old Mydata*.csv files before re-running):
awk -F ',' '{ f="Mydata"$1".csv"; print $0 >> f; close(f) }' myfile.csv
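If closing after every line is too slow, one compromise (a sketch, assuming the rows may be reordered) is to sort on the key first and switch files only when the key changes:
sort -t',' -k1,1 myfile.csv |
awk -F',' '$1 != prev { if (prev != "") close(f); f = "Mydata" $1 ".csv"; prev = $1 }
           { print > f }'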
Use a real CSV parser/generator instead. It is safe for unusual inputs, including those with multi-line quoted values. Here's a Ruby one-liner (opening in append mode, with a block so the file is closed after each row):
ruby -e 'require "csv"; CSV.foreach(ARGV.shift){|r| File.open("Mydata#{r[0]}.csv","a"){|f| f.write CSV.generate_line(r)}}' file.csv
