Printing specific parts from a file in shell

Printing specific parts from a file in shell - string

I'm trying to print some specific information from a file with a specific format (The file is as following : id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed
)
I want to print out just the firstName sorted out and unique.
I specifically want to use these arguments when calling the script(let's call it script.sh) :
./script.sh --firstnames -f <file>
My code so far is the following :
--firstnames )
OlIFS=$IFS
content=$(cat "$3" | grep -v "#")
content=$(cat "$3" | tr -d " ") #cut -d " " -f6 )
for i in $content
do
IFS="|"
first=( $i )
echo ${first[2]}
IFS=$OlIFS
done | sort | uniq
;;
esac
For example for the following file :
#id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed
933|Perera|Mahinda|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.12|Firefox
1129|Lepland|Carmen|female|1984-02-18|2010-02-28T04:39:58:781+0000|81.25.252.111|Internet Explorer
is supposed to have the output :
Carmen
Mahinda
One problem I've noticed is that the script prints the comments too. The above will print :
Carmen
firstnames
Mahinda
even though I've used grep to get rid of the lines starting with "#".
This is only part of the code (it's where I believe is the problem). It's supposed to recognize the "--firstnames". Since some of the fields from the file will have spaces in between, specifically in the last section(the browser section) , I wanted to remove just that section.
This is for a school project, and according to the program that grades this section, it's all wrong. The script works as far as I can tell though(I tested it). I don't know what's wrong with this therefore I don't know what to correct. Please help !

awk would be best for your case
$ awk -F "|" 'FNR>1 && !a[$3]++{print $3}' file | sort
Carmen
Mahinda
-F "|" : To set | as field delimiter while reading fields in file
FNR>1 : To skip first header line
a[$3]++ : creates an associative array with keys as the string in 3rd field/column i.e in firstName and incrementing it's value by 1 each time the key is found. However the value of $3 is printed only when !a[$3]++ is true i.e when the key doesn't exist in the array or I should say the key is being read the first time.

grep -vE '^#' "$3" | cut -d'|' -f3 should be enough :
$ echo '#id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed
> 933|Perera|Mahinda|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.12|Firefox
> 1129|Lepland|Carmen|female|1984-02-18|2010-02-28T04:39:58:781+0000|81.25.252.111|Internet Explorer
>' | grep -vE '^#' | cut -d'|' -f3
Mahinda
Carmen
the grep command removes lines starting with # (it uses regular expressions to do so hence the -E flag ; if you want to keep removing any line containing a #, your current grep -v # is correct), the cut -d'|' -f3 command splits the string around a | delimiter and returns its third field.

Related

String split and extract the last field in bash

I have a text file FILENAME. I want to split the string at - of the first column field and extract the last element from each line. Here "$(echo $line | cut -d, -f1 | cut -d- -f4)"; alone is not giving me the right result.
FILENAME:
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
code I tried:
while read line; do \
DNA="$(echo $line | cut -d, -f1 | cut -d- -f4)";
echo $DNA
done < ${FILENAME}
Result I want
1195060301
1195060302
1195060311

Would you please try the following:
while IFS=, read -r f1 _; do # set field separator to ",", assigns f1 to the 1st field and _ to the rest
dna=${f1##*-} # removes everything before the rightmost "-" from "$f1"
echo "$dna"
done < "$FILENAME"

Well, I had to do with the two lines of codes. May be someone has a better approach.
while read line; do \
DNA="$(echo $line| cut -d, -f1| rev)"
DNA="$(echo $DNA| cut -d- -f1 | rev)"
echo $DNA
done < ${FILENAME}

I do not know the constraints on your input file, but if what you are looking for is a 10-digit number, and there is only ever one 10-digit number per line... This should do niceley
grep -Eo '[0-9]{10,}' input.txt
1195060301
1195060302
1195060311
This essentially says: Show me all 10 digit numbers in this file
input.txt
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram

A sed approach:
sed -nE 's/.*-([[:digit:]]+)\,.*/\1/p' input_file
sed options:
-n: Do not print the whole file back, but only explicit /p.
-E: Use Extend Regex without need to escape its grammar.
sed Extended REgex:
's/.*-([[:digit:]]+)\,.*/\1/p': Search, capture one or more digit in group 1, preceded by anything and a dash, followed by a comma and anything, and print only the captured group.

Using awk:
awk -F[,] '{ split($1,arr,"-");print arr[length(arr)] }' FILENAME
Using , as a separator, take the first delimited "piece" of data and further split it into an arr using - as the delimiter and awk's split function. We then print the last index of arr.

Command 'cut' doesn't show last column CSV

I've created a CSV from shell. Then I need to filter the information by column. I used this command:
$cut -d ';' -f 12,22 big_file.csv
The input looks like:
ACT;XXXXXX;MCD;881XXXX;881017XXXXXX;ABCD;BMORRR;GEN;88XXXXXXXXXX;00000;01;2;000008608008602;AAAAAAAAAAA;0051;;;;;;093505;
ACT;XXXXXX;MCD;881XXXX;881017XXXXXX;ABCD;BMORRR;GEN;88XXXXXXXXXX;00000;01;3;000008608008602;AAAAAAAAAAA;0051;;;;;;085000;anl#mail.com
The output is:
ID CLIENT;email
00000xxxxxxxxx
00000000xxxxxx;anl#mail.com
As you can see, the last column does not appear (note, that the semicolon is missing in the first line). I want this:
ID CLIENT;email
00000xxxxxxxxx;
00000000xxxxxx;anl#mail.com
I have another CSV file with information and it works. I've reviewed the csv and the columns exist.

There doesn't seem to be a way to make cut do this. The next step up in expressivity is awk, which does it easily:
$ cat testfile
one;two;three;four
1;2;3
first;second
only
$ awk -F';' '{ OFS=FS; print $1, $3 }' < testfile
one;three
1;3
first;
only;
$

You don't get the semicolon in the output of your second line, because your second line contains just 21 fields (the first contains 23 fields).
You can check that using:
(cat bigfile.csv | tr -d -c ";\n" ; echo "1234567890123456789012") | cat -n | grep -v -E ";{22}"
This will output all lines from bigfile.txt with less than 22 semicolons along with the corresponding line numbers.
To fix that, you can add a bunch of empty fields at the end of each line and pipe the result to cut like this:
sed -e's|^\(.*\)|\1;;;;;;;;;;;;;;;;;;;;;;;;|g' bigfile.csv | cut -d ';' -f 12,22 | cut -d ';' -f 12,22
The result is:
XXXXXXXXYYY;XXXNNN
XXXXYYYYXXXXX;

Piping grep to cut

This line:
echo $(grep Uid /proc/1/status) | cut -d ' ' -f 2
Produces output:
0
This line:
grep Uid /proc/1/status | cut -d ' ' -f 2
Produces output:
Uid: 0 0 0 0
My goal was the first output. My question is, why the second command does not produce the output I expected. Why am I required to echo it?

One way to do this is to change the Output Field Separator or OFS variable in the bash shell
IFSOLD="$IFS" # IFS for Internal field separator
IFS=$'\t'
grep 'Uid' /proc/1/status | cut -f 2
0 # Your result
IFS="$IFSOLD"
or the easy way
grep 'Uid' /proc/1/status | cut -d $'\t' -f 2
Note : By the way tab is the default delim for cut as pointed out [ here ]

Use awk
awk '/Uid/ { print $2; }' /proc/1/status

You should almost never need to write something like echo $(...) - it's almost equivalent to calling ... directly. Try echo "$(...)" (which you should always use) instead, and you'll see it behaves like ....
The reason is because when the $() command substitution is invoked without quotes the resulting string is split by Bash into separate arguments before being passed to echo, and echo outputs each argument separated by a single space, regardless of the whitespace generated by the command substitution (in your case tabs).
As sjsam suggested, if you want to cut tab-delimited output, just specify tabs as the delimiter instead of spaces:
cut -d $'\t' -f 2

grep Uid /proc/1/status |sed -r “s/\s+/ /g” | awk ‘{print $3}’
Output
0

Split string at special character in bash

I'm reading filenames from a textfile line by line in a bash script. However the the lines look like this:
/path/to/myfile1.txt 1
/path/to/myfile2.txt 2
/path/to/myfile3.txt 3
...
/path/to/myfile20.txt 20
So there is a second column containing an integer number speparated by space. I only need the part of the string before the space.
I found only solutions using a "for-loop". But I need a function that explicitly looks for the " "-character (space) in my string and splits it at that point.
In principle I need the equivalent to Matlabs "strsplit(str,delimiter)"

If you are already reading the file with something like
while read -r line; do
(and you should be), then pass two arguments to read instead:
while read -r filename somenumber; do
read will split the line on whitespace and assign the first field to filename and any remaining field(s) to somenumber.

Three (of many) solutions:
# Using awk
echo "$string" | awk '{ print $1 }'
# Using cut
echo "$string" | cut -d' ' -f1
# Using sed
echo "$string" | sed 's/\s.*$//g'

If you need to iterate trough each line of the file anyways, you can cut off everything behind the space with bash:
while read -r line ; do
# bash string manipulation removes the space at the end
# and everything which follows it
echo ${line// *}
done < file

This should work too:
line="${line% *}"
This cuts the string at it's last occurrence (from left) of a space. So it will work even if the path contains spaces (as long as it follows by a space at end).

while read -r line
do
{ rev | cut -d' ' -f2- | rev >> result.txt; } <<< $line
done < input.txt
This solution will work even if you have spaces in your filenames.

concatenate the result of echo and a command output

I have the following code:
names=$(ls *$1*.txt)
head -q -n 1 $names | cut -d "_" -f 2
where the first line finds and stores all names matching the command line input into a variable called names, and the second grabs the first line in each file (element of the variable names) and outputs the second part of the line based on the "_" delim.
This is all good, however I would like to prepend the filename (stored as lines in the variable names) to the output of cut. I have tried:
names=$(ls *$1*.txt)
head -q -n 1 $names | echo -n "$names" cut -d "_" -f 2
however this only prints out the filenames
I have tried
names=$(ls *$1*.txt
head -q -n 1 $names | echo -n "$names"; cut -d "_" -f 2
and again I only print out the filenames.
The desired output is:
$
filename1.txt <second character>
where there is a single whitespace between the filename and the result of cut.
Thank you.

Best approach, using awk
You can do this all in one invocation of awk:
awk -F_ 'NR==1{print FILENAME, $2; exit}' *"$1"*.txt
On the first line of the first file, this prints the filename and the value of the second column, then exits.
Pure bash solution
I would always recommend against parsing ls - instead I would use a loop:
You can avoid the use of awk to read the first line of the file by using bash built-in functionality:
for i in *"$1"*.txt; do
IFS=_ read -ra arr <"$i"
echo "$i ${arr[1]}"
break
done
Here we read the first line of the file into an array, splitting it into pieces on the _.

Maybe something like that will satisfy your need BUT THIS IS BAD CODING (see comments):
#!/bin/bash
names=$(ls *$1*.txt)
for f in $names
do
pattern=`head -q -n 1 $f | cut -d "_" -f 2`
echo "$f $pattern"
done

If I didn't misunderstand your goal, this also works.
I've always done it this way, I just found out that this is a deprecated way to do it.
#!/bin/bash
names=$(ls *"$1"*.txt)
for e in $names;
do echo $e `echo "$e" | cut -c2-2`;
done

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Printing specific parts from a file in shell - string

Related

String split and extract the last field in bash

Command 'cut' doesn't show last column CSV

Piping grep to cut

Split string at special character in bash

concatenate the result of echo and a command output

Categories

Resources