UNIX command to search blank word [closed] - linux

In a file (tab-delimited text, CSV, or database file) you have first name, last name, and address. In some rows the last name is missing, but the first name and address are present. How can you list the rows where the last name is blank using a UNIX command?
FirstName LastName Street City
Dan, God, 1st Street, Chicago
Sam, , 2nd Street, Chicago
Adam, Smith, 3rd Street, Chicago
It could be CSV or a delimited text file (with ;, , or : as the separator). The answer should be the 2nd row above.

Assuming the input file is CSV, you can use awk:
awk -F, '$2 == ""' file
to print all the rows where the 2nd column (last name) is blank.
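Note that in the sample above a space follows each comma, so with -F, the second field of the Sam row is " " rather than "". A minimal sketch of a space-tolerant variant (works in any POSIX awk, where a multi-character FS is treated as a regular expression):

```shell
# Recreate the sample from the question (a space after each comma).
cat > names.csv <<'EOF'
Dan, God, 1st Street, Chicago
Sam, , 2nd Street, Chicago
Adam, Smith, 3rd Street, Chicago
EOF

# Treat "comma plus any following spaces" as the separator, so the
# blank last name in the Sam row becomes a truly empty field.
awk -F', *' '$2 == ""' names.csv
# prints: Sam, , 2nd Street, Chicago
```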

Try this:
awk 'NF!=3' file
It prints all lines where the number of fields is not 3. This assumes whitespace-separated input in which a missing last name removes a field entirely; for the comma-separated sample above, the awk -F, approach is more reliable.

Since you didn't provide sample text, I've had to take some guesses about what you're after.
Here's the sample text I'm using:
06:33:20 0 1 james@brindle:/tmp$ cat sample.csv
first,last,address,otherstuff
first,,address,otherstuff
first,last,,
A simple grep ,, doesn't work as it also finds the last line:
06:33:22 0 0 james@brindle:/tmp$ grep ,, sample.csv
first,,address,otherstuff
first,last,,
Since the first name field is first on the line, we can simplify the problem a little bit: we want to find places where the first comma on the line is immediately followed by a second comma.
06:35:07 0 0 james@brindle:/tmp$ grep "^[^,]*,," sample.csv
first,,address,otherstuff
In that regex, the first ^ anchors the regex to the start of the line; [^,]* matches 0 or more occurrences of any character except the comma (yes, the ^ is doing something very different in this context), and finally ,, matches the two commas.
If you wanted to look for the 3rd field being empty you'd need to repeat yourself a little bit.
06:35:28 0 0 james@brindle:/tmp$ grep "^[^,]*,[^,]*,," sample.csv
first,last,,
Here you're looking for 0 or more non-comma characters, followed by a comma, followed by 0 or more non-commas, followed by two commas.
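The pattern generalizes: to test whether field N is empty, repeat the "non-commas followed by a comma" group N-1 times using an ERE interval count instead of writing it out by hand. A sketch, assuming grep -E (extended regular expression) support:

```shell
cat > sample.csv <<'EOF'
first,last,address,otherstuff
first,,address,otherstuff
first,last,,
EOF

# Field 2 empty: one "field plus comma" group, then an immediate comma.
grep -E '^([^,]*,){1},' sample.csv
# prints: first,,address,otherstuff

# Field 3 empty: two groups, then an immediate comma.
grep -E '^([^,]*,){2},' sample.csv
# prints: first,last,,
```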

Related

How to add NULL values in empty cells of csv while concatenating 2 csv files in linux [closed]

While appending 2 CSV files with an unequal number of rows, we need a 'NULL' value in place of the missing lines in the CSV.
In the images mentioned below, the first file has 5 rows while the second has 4, so we need NULL in the last line of the second CSV for all of its columns. The 3rd image is the expected final file.
We need a command to add NULL in the missing cells.
paste should help here, along with awk, to complete the file merge.
First, determine the number of fields. Then, using paste, merge both files line by line separated by ','. That output is piped to awk: if a line has fewer than the expected number of fields, awk appends the string "NULL" for each missing field; otherwise the line is printed as is. Assuming the files are file1.csv and file2.csv, here is the full code:
NUM_FIELDS=$(paste -d"," file1.csv file2.csv | head -1 | awk -F, '{print NF}')
paste -d"," file1.csv file2.csv | awk -F, -v cnt="$NUM_FIELDS" 'NF < cnt {
printf "%s",$0;
for (i=NF;i<cnt;i++)
printf "NULL,";
print "NULL";
next}
{print}'
Here is the output:
ProductCc,SITE,BatchID,PROCESS_CATEGORY,DO actual,Runtime,Tank Volume,Open pipe
MK3,Biberach,15300289,BiologicsUpstream,60.62,14.396,12460.1,0
MK3,Biberach,15300289,BiologicsUpstream,59.33,14.403,12462.7,0
MK3,Biberach,15300289,BiologicsUpstream,60.68,14.41,12457.3,0
MK3,Biberach,15300289,BiologicsUpstream,59.99,14.417,12453.3,0
MK3,Biberach,15300289,BiologicsUpstream,NULL,NULL,NULL,NULL
P.S.: In future, please include any sample data as text instead of images to simplify testing.
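The approach can be checked on a small pair of files (hypothetical two-column sample data, not taken from the question):

```shell
# file1.csv has three rows, file2.csv only two.
printf 'a,b\nc,d\ne,f\n' > file1.csv
printf '1,2\n3,4\n'      > file2.csv

# Field count of a complete merged row, taken from the first line.
NUM_FIELDS=$(paste -d"," file1.csv file2.csv | head -1 | awk -F, '{print NF}')

# Pad short rows with NULL. paste leaves a trailing comma on the short
# rows (the empty contribution from file2), so the first NULL needs no
# extra separator and the trailing empty field is already counted in NF.
paste -d"," file1.csv file2.csv | awk -F, -v cnt="$NUM_FIELDS" '
NF < cnt { printf "%s", $0
           for (i = NF; i < cnt; i++) printf "NULL,"
           print "NULL"; next }
{ print }'
# last line printed: e,f,NULL,NULL
```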

how to print two strings in a line one with space delimiter and another between two strings in Linux

I have a file with more than 100 lines.
But only some lines have specific pattern like abc.
My question is that I want two things to print
5th word of line which has pattern abc.
words between 2 distinct strings (xxx, yyy).
Say for example my file has the content below:
This is first line.
Second line has abc pattern with xxx as first separator and yyy as second separator.
This is third line.
Again fourth line has same pattern abc with separators xxx and yyy.
And so on.
The required output is like below:
pattern as first separator and
same and
I tried many ways in Linux, but whenever I was able to print the 5th word I could not print the content between xxx and yyy, and vice versa.
Can any one help me please?
Let me answer your question:
My question is that I want two things to print
5th word of line which has pattern abc.
words between 2 distinct strings (xxx, yyy).
You can use awk for both parts of your question:
awk '/abc/{print $5}' input_file.txt
awk '/xxx.*yyy/{if(match($0,"xxx.*yyy")){print substr($0,RSTART,RLENGTH)}}' input_file.txt
If you need to combine both requirements in one command:
awk '/abc/{print $5} /xxx.*yyy/{if(match($0,"xxx.*yyy")){print substr($0,RSTART,RLENGTH)}}' input_file.txt
OUTPUT:
pattern
xxx as first separator and yyy
same
xxx and yyy
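The required output in the question actually joins both pieces on one line and leaves out the xxx/yyy markers themselves. A sketch of that variant, assuming the markers are always surrounded by single spaces:

```shell
cat > input_file.txt <<'EOF'
This is first line.
Second line has abc pattern with xxx as first separator and yyy as second separator.
This is third line.
Again fourth line has same pattern abc with separators xxx and yyy.
And so on.
EOF

awk '/abc/ {
    # match "xxx ... yyy", then strip the 4-character markers
    # ("xxx " and " yyy") from both ends of the matched region
    if (match($0, /xxx .* yyy/))
        print $5, substr($0, RSTART + 4, RLENGTH - 8)
}' input_file.txt
# prints:
# pattern as first separator and
# same and
```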

Extracting two columns and searching for specific words in the first column, without cutting the columns that remain

I have a .csv file filled with names of people, their group, the city they live in, and the day they are able to work; these 4 pieces of information are separated by ":".
For e.g
Dennis:GR1:Thursday:Paris
Charles:GR3:Monday:Levallois
Hugues:GR2:Friday:Ivry
Michel:GR2:Tuesday:Paris
Yann:GR1:Monday:Pantin
I'd like to cut out the 2nd and the 3rd columns, and print all the lines containing names ending with "s", without cutting the columns that remain.
For e.g, I would like to have something like that :
Dennis:Paris
Charles:Levallois
Hugues:Ivry
I tried to do this with grep and cut, but using cut I end up with just the 1st column remaining.
I hope that I've been able to make myself understood !
It sounds like all you need is:
$ awk 'BEGIN{FS=OFS=":"} $1~/s$/{print $1, $4}' file
Dennis:Paris
Charles:Levallois
Hugues:Ivry
To address your comment requesting a grep+cut solution:
$ grep -E '^[^:]+s:' file | cut -d':' -f1,4
Dennis:Paris
Charles:Levallois
Hugues:Ivry
but awk is the right way to do this.

Merge two unsorted text files based on the first column of the first one and preserving the order [closed]

On Linux, how can I merge two unsorted text files based on the first column of the first file and preserving the order (from the first file).
The first one:
DAC
CAD
ADC
BAC
The second one:
CAD:word
DAC:dog
BAC:house
Merged files:
DAC:dog
CAD:word
ADC
BAC:house
As I said, the lines of the merged file must be in the same order as in the first file.
Thank you in advance.
Try awk:
awk -F: 'FNR==NR{a[$1]=$0;next}{if($1 in a){print a[$1];} else {print;}}' file2 file1
The "-F:" sets the field separator to a colon. The block after "FNR==NR" applies only while processing file2: it saves the whole line in an associative array "a", indexed by whatever is in field 1, to the left of the colon. The second block applies to file1: as each line is read, we check whether its "word" is in the array "a" built while reading file2; if it is, we print the whole line found in file2, otherwise we print the current line from file1.
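A quick end-to-end check of this awk approach with the files from the question:

```shell
printf 'DAC\nCAD\nADC\nBAC\n' > file1
printf 'CAD:word\nDAC:dog\nBAC:house\n' > file2

# Pass file2 first so its lines are cached in the array,
# then walk file1 in its original order.
awk -F: 'FNR==NR { a[$1] = $0; next }
         { if ($1 in a) print a[$1]; else print }' file2 file1
# prints:
# DAC:dog
# CAD:word
# ADC
# BAC:house
```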
Another possible solution with join:
$ join -t":" -a1 -a2 -11 -21 <(sort file1) <(sort file2)
ADC
BAC:house
CAD:word
DAC:dog
Note, however, that this produces sorted output, so it does not preserve the order of the first file as the question requires.

How to delete double lines in bash

Given a long text file like this one (that we will call file.txt):
EDITED
1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA
How to delete the lines that appear at least twice in the same file in bash? What I mean is that I want to have this result:
1 AA
2 ab
3 azd
6 aslmdkfj
I do not want the same line to appear twice in this file. Could you show me the command, please?
Assuming whitespace is significant, the typical solution is:
awk '!x[$0]++' file.txt
(e.g., the line "ab " is not considered the same as "ab". It is probably simplest to pre-process the data if you want to treat whitespace differently.)
--EDIT--
Given the modified question, which I'll interpret as only wanting to check uniqueness after a given column, try something like:
awk '!x[ substr( $0, 2 )]++' file.txt
This will only compare everything from the 2nd character through the end of the line, ignoring the first character. This is a typical awk idiom: we build an array named x (one-letter variable names are a terrible idea in a script, but are reasonable for a one-liner on the command line) which holds the number of times a given string has been seen, and a line is printed only the first time its key is seen. In the first case the key is the entire input line, $0. In the second case it is the substring consisting of everything from the 2nd character onward.
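Against the sample file from the question, the second one-liner drops lines 4, 5 and 7 because their text after the line-number prefix repeats earlier lines:

```shell
cat > file.txt <<'EOF'
1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA
EOF

# Keep a line only the first time its tail (everything after
# the first character) has been seen.
awk '!x[substr($0, 2)]++' file.txt
# prints:
# 1 AA
# 2 ab
# 3 azd
# 6 aslmdkfj
```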
Try this simple script:
cat file.txt | sort | uniq
cat will output the contents of the file,
sort will put duplicate entries adjacent to each other,
uniq will remove adjacent duplicate entries.
Note that uniq only removes lines that are identical in full (with the numbered sample above, every line is unique), and sorting loses the original order. Hope this helps!
The uniq command will do what you want.
But make sure the file is sorted first; uniq only checks consecutive lines.
Like this:
sort file.txt | uniq
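With no uniq options in play, sort -u gives the same result in a single command:

```shell
# Hypothetical sample with one exact duplicate line.
printf '2 ab\n1 AA\n2 ab\n' > file.txt

sort file.txt | uniq   # two-step version
sort -u file.txt       # equivalent single command
# both print:
# 1 AA
# 2 ab
```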
