I have a document (.txt) composed like this:
info1: info2: info3: info4
And I want to show some information by column.
For example, I have different pieces of information in the "info3" field, and I want to see only the lines that have "test" in the "info3" column.
I think I have to use sort but I'm not sure.
Any idea?
The previous answers assume that the third column is exactly equal to test. It looks like you were looking for lines where the column value contains test. For that we need awk's match function:
awk -F: 'match($3, "test")' file
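For example, with a few made-up lines (not from the question) you can see that the substring match also picks up values that merely contain test:
$ printf 'a:b:unittest:d\na:b:test:d\na:b:other:d\n' | awk -F: 'match($3, "test")'
a:b:unittest:d
a:b:test:d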
You can use awk for this. Assuming your columns are delimited by : and you want the lines whose column 3 is exactly test, the command below lists only those lines:
awk -F':' '$3=="test"' input-file
Assuming that the spacing is consistent, and you're looking for only test in the third column, use
grep ".*:.*: test:.*" file.txt
Or to take care of any spacing that might occur
grep ".*:.*: *test *:.*" file.txt
How to change delimiter from current comma (,) to semicolon (;) inside .txt file using linux command?
Here is my ME_1384_DataWarehouse_*.txt file:
Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08
Data Warehouse,ME_1384,Budget for HW/SVC,09/05/2022,10,9999,09/05/2022,45,58,45,58
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
It is very important that the values of the last two columns are numbers with 2 decimal places, so the value of the last 2 columns in the first row, for example, is "27,08".
That is probably the main reason why the delimiter couldn't be changed properly.
I tried with:
sed 's/,/;/g' ME_1384_DataWarehouse_*.txt
and every comma was changed, including the ones inside the values of the last 2 columns mentioned above.
Is there anyone who can help me out with this issue?
With sed you can replace the nth occurrence of a certain lookup string. Example:
$ sed 's/,/;/4' file
will replace the 4th comma with a semicolon.
So, if you know you have 11 fields (10 commas), you can do
$ sed 's/,/;/g;s/;/,/10;s/;/,/8' file
Example:
$ seq 1 11 | paste -sd, | sed 's/,/;/g;s/;/,/10;s/;/,/8'
1;2;3;4;5;6;7;8,9;10,11
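Note that the order of the two back-substitutions matters: they run from right to left because, once the 8th semicolon has been turned back into a comma, the remaining semicolons are renumbered and a later s/;/,/10 no longer finds a 10th one:
$ seq 1 11 | paste -sd, | sed 's/,/;/g;s/;/,/8;s/;/,/10'
1;2;3;4;5;6;7;8,9;10;11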
Your question is somewhat unclear, but if you are trying to say "don't change the last comma, or the third-to-last one", a solution to that might be
perl -pi~ -e 's/,(?![^,]+(?:,[^,]+,[^,]+)?$)/;/g' ME_1384_DataWarehouse_*.txt
Perl in isolation does not perform any loop over the input lines, but the -p option says to loop over the input one line at a time, like sed, and print every line (there is also -n to simulate the behavior of sed -n). The -i~ option says to modify the file in place, but save the original with a tilde added to its file name as a backup. The regex uses a negative lookahead (?!...) to protect the two fields you want to exempt from the replacement; lookaheads are a modern regex feature that isn't supported by older tools like sed.
Once you are satisfied with the solution, you can remove the ~ after -i to disable the generation of backups.
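As a quick check (running the same substitution without -i, so it only prints), the first data row comes out as:
$ echo 'Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08' | perl -pe 's/,(?![^,]+(?:,[^,]+,[^,]+)?$)/;/g'
Data Warehouse;ME_1384;Budget for HW/SVC;13/05/2022;10;9999;13/05/2022;27,08;27,08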
You can do this with awk:
awk -F, 'BEGIN {OFS=";"} {a=$NF;NF-=1; printf "%s,%s\n",$0,a} ' input_file
This should work with most awk versions (but do not count on the Solaris standard awk).
The idea is to store the last field of the row in a variable, decrease the number of fields, and then print the remaining record using the new delimiter, followed by a comma and the stored last field.
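For illustration, here is the same idea on a made-up line (this assumes an awk, such as gawk, that rebuilds $0 with the new OFS when NF is assigned):
$ echo 'a,b,c,12,34' | awk -F, 'BEGIN {OFS=";"} {a=$NF; NF-=1; printf "%s,%s\n",$0,a}'
a;b;c;12,34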
I have a .csv file filled with the names of people, their group, the day they are able to work, and the city they live in; these 4 pieces of information are separated by ":".
For example:
Dennis:GR1:Thursday:Paris
Charles:GR3:Monday:Levallois
Hugues:GR2:Friday:Ivry
Michel:GR2:Tuesday:Paris
Yann:GR1:Monday:Pantin
I'd like to cut out the 2nd and 3rd columns and print all the lines whose names end with "s", without losing the remaining column.
For example, I would like to have something like this:
Dennis:Paris
Charles:Levallois
Hugues:Ivry
I tried to do this with grep and cut, but using cut I end up with just the 1st column remaining.
I hope that I've been able to make myself understood!
It sounds like all you need is:
$ awk 'BEGIN{FS=OFS=":"} $1~/s$/{print $1, $4}' file
Dennis:Paris
Charles:Levallois
Hugues:Ivry
To address your comment requesting a grep+cut solution:
$ grep -E '^[^:]+s:' file | cut -d':' -f1,4
Dennis:Paris
Charles:Levallois
Hugues:Ivry
but awk is the right way to do this.
I have a text file like this:
john,3
albert,4
tom,3
junior,5
max,6
tony,5
I'm trying to fetch the records where the column 2 value is the same (i.e. duplicated). My desired output:
john,3
tom,3
junior,5
tony,5
I'm wondering if we can use uniq -d on the second column?
Here's one way using awk. It reads the input file twice, but avoids the need to sort:
awk -F, 'FNR==NR { a[$2]++; next } a[$2] > 1' file file
Results:
john,3
tom,3
junior,5
tony,5
Brief explanation:
FNR==NR is a common awk idiom that is true only while the first file in the argument list is being read. Here, column two is used as an array key and its count is incremented. On the second read of the file, we simply check whether the count for column two is greater than one (the next keyword skips the rest of the code during the first read).
You can use uniq on fields (columns), but not easily in your case.
uniq's -f and -s options skip leading fields and characters respectively when comparing lines. However, neither of these quite does what you want:
-f counts fields divided by whitespace, but yours are separated by commas.
-s skips a fixed number of characters, but your names are of variable length.
Overall though, uniq is used to compress input by consolidating duplicates into unique lines. You are actually wishing to retain duplicates and eliminate singletons, which is the opposite of what uniq is used to do. It would appear you need a different approach.
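If you still want to build something around uniq -d, one rough sketch (not from the answer above, and assuming exactly two comma-separated columns as in your sample, plus a grep that accepts -f - for patterns on stdin) is to collect the duplicated second-column values first and then pull the matching lines back out:
$ cut -d, -f2 file | sort | uniq -d | sed 's/.*/,&$/' | grep -f - file
john,3
tom,3
junior,5
tony,5
The sed step turns each duplicated value such as 3 into an anchored pattern ,3$, so grep only matches it at the end of a line rather than inside a name.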
I have a CSV that I need to search using a provided key that would be in the first column of said CSV; then I need to run awk again, search via column 2, and return all matching data.
So: I'd run awk with the first key, and it would return just the value of the second column [so just that cell]. Then I'd run awk using that cell's contents and have it return all matching rows.
I have almost no bash/awk scripting experience so please bear with me. :)
Input:
KEY1,TRACKINGKEY1,TRACKINGNUMBER1-1,PACKAGENUM1-1
,TRACKINGKEY1,TRACKINGNUMBER1-2,PACKAGENUM1-2
,TRACKINGKEY1,TRACKINGNUMBER1-3,PACKAGENUM1-3
,TRACKINGKEY1,TRACKINGNUMBER1-4,PACKAGENUM1-4
,TRACKINGKEY1,TRACKINGNUMBER1-5,PACKAGENUM1-5
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1
,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2
Command:
awk -v key=KEY1 -F' *,' '$1==key{f=1} $1 && $1!=key{f=0} f{print $3}' file
Output:
TRACKINGNUMBER1-1
TRACKINGNUMBER1-2
TRACKINGNUMBER1-3
TRACKINGNUMBER1-4
TRACKINGNUMBER1-5
That's what I've tried. I'd like to use awk so that if I search for KEY1, TRACKINGKEY1 is returned; then awk with TRACKINGKEY1 and output each full matching row.
Sorry, I should have been clearer. For example, if I searched for KEY3 I'd like the output to be:
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1
,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2
So what I want is I'd search for KEY3 initially, and it would return TRACKINGKEY3. I'd then search for TRACKINGKEY3 and it would return each full row with said TRACKINGKEY3 in it.
Does this do what you want?
awk -v key=KEY3 -F ',' '{if($1==key)tkey=$2;if($2==tkey)print}' file
It only makes a single pass through the file, not the multiple passes you described, but the output matches what you requested. When it finds the specified key in the first column, it grabs the tracking key from the second column. It then prints every line that matches this tracking key.
A shorter way to achieve the same thing is by using awk's implicit printing:
awk -v key=KEY3 -F ',' '$1==key{tkey=$2}$2==tkey' file
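With the sample input shown above, that one-liner should produce:
$ awk -v key=KEY3 -F ',' '$1==key{tkey=$2}$2==tkey' file
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1
,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2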
Basically, after I sort, I want my columns to be separated by tabs; right now they are separated by two spaces. The man pages did not have anything related to output formatting (at least I didn't notice it).
If it's not possible, I guess I have to use awk to sort and print. Any better alternative?
EDIT:
To clarify the question, the location of the double spaces is not consistent. I actually have data like this:
<date>\t<user>\t<message>.
I sort by the date field (year, month, day, and time), which looks like
Wed Jan 11 23:44:30 CST 2012
and then I want the output of the sorted data in the same format as the original file, that is
<date>\t<user>\t<message>.
EDIT 2: It seems my test for tabs was wrong. I was copy-pasting a raw line from bash to my Windows box; that's why it wasn't recognized as a tab and instead showed up as spaces. I downloaded the whole file to Windows and now I can see that the fields are tab-separated.
Also, I figured out that the field separation (\t, \n, ",", ":", ";", etc.) is the same in the new file after sorting. That means that if a field is tab-separated in the original file, my sorted file is also going to be tab-separated.
One last thing, the "correct" answer was not exactly the correct solution to the problem. I don't know if I can comment on my own thread and mark it as correct. If it is OK to do that, please let me know.
Thanks for the comments guys. Really appreciate your help!
Pipe your output to column:
sort <whatever> | column -t -s$'\t'
You can use sed:
sort data.txt | sed 's/  /\t/g'
(the pattern between the first two slashes is 2 blank spaces)
This will take the output of your sort operation and substitute a single tab for 2 consecutive blanks.
From what I understood, the file is already sorted and what you want is to replace the two separating spaces with a TAB character. In that case, use the following:
sed 's/  /\t/g' < sorted_file > new_formatted_file
(Be careful to copy/paste the two spaces in the regular expression correctly.)
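If you need to double-check that the result really contains tabs (the problem mentioned in EDIT 2), one way to make them visible, assuming GNU sed and coreutils, is cat -A, which prints tabs as ^I and line ends as $:
$ printf 'b  2\na  1\n' | sort | sed 's/  /\t/g' | cat -A
a^I1$
b^I2$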