Command 'cut' doesn't show last column CSV - linux

I've created a CSV file from the shell. Then I need to filter the information by column. I used this command:
$ cut -d ';' -f 12,22 big_file.csv
The input looks like:
ACT;XXXXXX;MCD;881XXXX;881017XXXXXX;ABCD;BMORRR;GEN;88XXXXXXXXXX;00000;01;2;000008608008602;AAAAAAAAAAA;0051;;;;;;093505;
ACT;XXXXXX;MCD;881XXXX;881017XXXXXX;ABCD;BMORRR;GEN;88XXXXXXXXXX;00000;01;3;000008608008602;AAAAAAAAAAA;0051;;;;;;085000;anl#mail.com
The output is:
ID CLIENT;email
00000xxxxxxxxx
00000000xxxxxx;anl#mail.com
As you can see, the last column does not appear (note that the semicolon is missing in the first line). I want this:
ID CLIENT;email
00000xxxxxxxxx;
00000000xxxxxx;anl#mail.com
I have another CSV file with information and it works. I've reviewed the csv and the columns exist.

There doesn't seem to be a way to make cut do this. The next step up in expressivity is awk, which does it easily:
$ cat testfile
one;two;three;four
1;2;3
first;second
only
$ awk -F';' '{ OFS=FS; print $1, $3 }' < testfile
one;three
1;3
first;
only;
$

You don't get the semicolon in the output for the line without an email address because the corresponding input line contains fewer than 22 fields: when field 22 doesn't exist, cut prints neither the field nor the delimiter before it.
You can check that using:
(cat bigfile.csv | tr -d -c ";\n" ; echo "1234567890123456789012") | cat -n | grep -v -E ";{22}"
This will output all lines from bigfile.csv with fewer than 22 semicolons, along with the corresponding line numbers.
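An equivalent check can be done with awk, which simply counts the fields on each line (a sketch; 22 is the expected field count taken from the question):
awk -F';' 'NF < 22 { print NR ": " NF " fields" }' bigfile.csv
Any line it reports is one where field 22 does not exist at all, so cut will drop the trailing delimiter there.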
To fix that, you can add a bunch of empty fields at the end of each line and pipe the result to cut like this:
sed -e 's|^\(.*\)|\1;;;;;;;;;;;;;;;;;;;;;;;;|' bigfile.csv | cut -d ';' -f 12,22
The result is:
XXXXXXXXYYY;XXXNNN
XXXXYYYYXXXXX;
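Alternatively, since awk (unlike cut) still prints the output separator when a requested field is missing, it can be applied to the original file without any padding. A sketch using the question's field numbers:
awk -F';' 'BEGIN { OFS = FS } { print $12, $22 }' big_file.csv
Here $22 expands to an empty string on short lines, so the semicolon is always present in the output.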

Related

Using sed to fetch date

I have a file which contains two entries for the abc... keyword. I want to grab the latest date from the lines matching the abc... string. After getting the date I also need to format it by replacing / with -.
---other data
2018/01/15 01:56:14.944+0000 INFO newagent.bridge BridgeTLSAssetector::setupACBContext() - abc...
2018/02/14 01:56:14.944+0000 INFO newagent.bridge BridgeTLSAssetector::setupACBContext() - abc...
---other data
In the above example, my output should be 2018-02-14. Here, I am fetching the lines which contain the abc... value and keeping only the one with the latest date. Then I need to strip out the rest of the string and keep only the date value.
I am using the following sed, but it is not working:
grep -iF "abc..." file.txt | tail -n 1 | sed -e 's/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/' -e 's%/%-%g'
With awk:
$ awk '/abc\.\.\./{d=$1} END{gsub("/", "-", d); print d}' file.txt
2018-02-14
Something with sed:
tac file.txt | grep -Fi 'abc...' | sed 's/ .*//;s~/~-~g;q'
This does what you want:
grep -iF "abc..." file.txt | tail -n 1 | awk '{print $1}' | sed 's#/#-#g'
Outputs this:
2018-02-14
Since you asked for sed -
$: sed -nE ' / abc[.]{3}/x; $ { x; s! .*!!; s!/([0-9])/!/0\1/!g; s!/([0-9])$!/0\1!g; s!/!-!g; p; }' in
2018-02-14
arguments
-n says don't print by default
-E says use extended regexes
the script
/ abc[.]{3}/x; says: on each line containing abc..., exchange the line with the hold buffer, so the buffer always holds the most recent matching line.
$ { x; s! .*!!; s!/([0-9])/!/0\1/!g; s!/([0-9])$!/0\1!g; s!/!-!g; p; } says on the LAST line($) do the set of commands inside the {}.
x swaps the buffer to get the last saved record back.
s! .*!!; deletes everything from the first space (after the date)
s!/([0-9])/!/0\1/!g; adds a zero to the month if needed
s!/([0-9])$!/0\1!g; adds a zero to the day if needed
s!/!-!g; converts the /'s to dashes
p prints the resulting record.
When you use sed to match part of the date, you can have it capture the year, month and day and match abc... all in one command:
sed -rn 's#([0-9]{4})/([0-9]{2})/([0-9]{2}).*abc[.]{3}.*#\1-\2-\3#p' file.txt | tail -1
An easier and simpler option:
cat filename.txt | grep 'abc' | awk -F' ' '{print $1}'
Since the pattern abc is always fixed in the given logs, this is an easier way to get the desired output.

Fetch latest matching string value

I have a file which contains two entries for the initial... keyword. I want to grab the latest date from the lines matching the initial... string. After getting the date I also need to format it by replacing / with -.
---other data
INFO | abc 1 | 2018/01/04 20:04:35 | initial...
INFO | abc 1 | 2018/02/05 17:01:42 | INFO | new| InitialLauncher | c.t.s.s.setup.launch | initial...
---other data
In the above example, my output should be 2018-02-05. Here, I am fetching the lines which contain the initial... value and keeping only the one with the latest date. Then I need to strip out the rest of the string and keep only the date value.
I am using the following grep, but it does not yet meet the requirement:
grep -q -iF "initial..." /tmp/file.log
Using the knowledge that later dates appear later in the file, it's only necessary to print the date from the last line containing initial....
First step (drop the -q from grep — you don't want it to be quiet):
grep -iF 'initial...' /tmp/file.log |
tail -n 1 |
sed -e 's/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/' -e 's%/%-%g'
The (first) s/// command matches a series of non-pipes followed by a pipe, another series of non-pipes followed by a pipe, a blank, then captures a series of non-blanks, and finally matches a blank and anything; it replaces all that with just the captured string, which is the date field after the second pipe on the input line. The (second) s%%% command replaces slashes with dashes, using % to avoid the confusion that the equivalent s/\//-/g might engender, thereby reformatting the date in ISO 8601-style format.
But we can lose the tail with:
grep -iF 'initial...' /tmp/file.log |
sed -n -e '$ { s/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/; s%/%-%gp; }'
The -n suppresses normal output; the $ matches only the last line; the p after the second s/// operation prints the result.
The case-insensitive fixed-pattern search is more conveniently written in grep than in sed. Although it could be done in a single sed command, you have to work fairly hard, saving matching rows in the hold space, then swapping the hold and pattern space at the end, and doing the substitution and printing:
sed -n \
-e '/[Ii][Nn][Ii][Tt][Ii][Aa][Ll]\.\.\./h' \
-e '$ { x; s/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/; s%/%-%gp; }' /tmp/file.log
Each of these produces the output 2018-02-05 on the sample data. If fed an input with no initial... in it, they output nothing.
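If you have GNU sed, the bracket-expression workaround above can be avoided: GNU sed's I address modifier makes the match case-insensitive. This is a GNU extension, so the following is only a sketch that assumes GNU sed rather than POSIX sed:
sed -n \
-e '/initial\.\.\./I h' \
-e '$ { x; s/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/; s%/%-%gp; }' /tmp/file.log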
Grep for only (-o) the string you want, sort it, and cut for the first word:
grep -o '2[0-9]\{3\}/[0-9][0-9]/[0-9][0-9] [0-2][0-9]:[0-5][0-9]:[0-9][0-9] .* | initial' file.txt | sort | cut -d' ' -f1 | tail -1
something like this...
$ awk -F'|' '$NF~/initial\.\.\./ {if(max<$3) max=$3}
END {gsub("/","-",max);
split(max,dt," "); print dt[1]}' file

Replace string in a file from a file [duplicate]

This question already has answers here:
Difference between single and double quotes in Bash
(7 answers)
Closed 5 years ago.
I need help with replacing strings in a file, where the "from"/"to" pairs come from a given file.
fromto.txt:
"TRAVEL","TRAVEL_CHANNEL"
"TRAVEL HD","TRAVEL_HD_CHANNEL"
"FROM","TO"
The first column is what I'm searching for, which is to be replaced with the second column.
So far I wrote this small script:
while read p; do
var1=`echo "$p" | awk -F',' '{print $1}'`
var2=`echo "$p" | awk -F',' '{print $2}'`
echo "$var1" "AND" "$var2"
sed -i -e 's/$var1/$var2/g' test.txt
done <fromto.txt
Output looks good (x AND y), but for some reason it does not replace the first column ($var1) with the second ($var2).
test.txt:
"TRAVEL"
Output:
"TRAVEL" AND "TRAVEL_CHANNEL"
sed -i -e 's/"TRAVEL"/"TRAVEL_CHANNEL"/g' test.txt
"TRAVEL HD" AND "TRAVEL_HD_CHANNEL"
sed -i -e 's/"TRAVEL HD"/"TRAVEL_HD_CHANNEL"/g' test.txt
"FROM" AND "TO"
sed -i -e 's/"FROM"/"TO"/g' test.txt
$ cat test.txt
"TRAVEL"
input:
➜ cat fromto
TRAVEL TRAVEL_CHANNEL
TRAVELHD TRAVEL_HD
➜ cat inputFile
TRAVEL
TRAVELHD
The work:
➜ awk 'BEGIN{while(getline < "fromto") {from[$1] = $2}} {for (key in from) {gsub(key,from[key])} print}' inputFile > output
and output:
➜ cat output
TRAVEL_CHANNEL
TRAVEL_CHANNEL_HD
➜
The first part (BEGIN{}) loads your fromto file into an associative array, e.g. from["TRAVEL"] = "TRAVEL_CHANNEL", then rather inefficiently performs the search and replace line by line for each array element in the input file, outputting the results, which I piped to a separate output file.
The caveat, you'll notice, is that the replacements can interfere with each other; the 2nd line of output is a perfect example: TRAVELHD is first replaced with TRAVEL_HD, and then the TRAVEL replacement turns that into TRAVEL_CHANNEL_HD. You can try ordering your replacements differently, or use a more specific regex in the gsub. I'm not certain that awk arrays are guaranteed to iterate in a particular order, though. Something to get you started, anyway.
2nd caveat. There's a way to do the gsub for the whole file as the 2nd step of your BEGIN and probably make this much faster, but I'm not sure what it is.
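One way to do that whole-file gsub (a sketch, assuming GNU awk so that the entire input can be slurped into a single record by setting RS):
awk 'BEGIN{ while ((getline line < "fromto") > 0) { split(line, f, " "); from[f[1]] = f[2] }
            RS = "^$"     # gawk idiom: read the whole input as one record
          }
     { for (key in from) gsub(key, from[key]); printf "%s", $0 }' inputFile > output
Each gsub now runs once over the whole file instead of once per line, though the replacement-order caveat above still applies.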
You can't do this in one shot; you have to use variables within a script.
Maybe something like the sed command below for a full replacement:
-bash-4.4$ cat > toto.txt
1
2
3
-bash-4.4$ cat > titi.txt
a
b
c
-bash-4.4$ sed 's|^\s*\(\S*\)\s*\(.*\)$|/^\2\\>/s//\1/|' toto.txt | sed -f - titi.txt > toto.txt
-bash-4.4$ cat toto.txt
a
b
c
-bash-4.4$

Printing specific parts from a file in shell

I'm trying to print some specific information from a file with a specific format. (The file format is: id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed)
I want to print out just the firstName column, sorted and unique.
I specifically want to use these arguments when calling the script (let's call it script.sh):
./script.sh --firstnames -f <file>
My code so far is the following:
--firstnames )
OlIFS=$IFS
content=$(cat "$3" | grep -v "#")
content=$(cat "$3" | tr -d " ") #cut -d " " -f6 )
for i in $content
do
IFS="|"
first=( $i )
echo ${first[2]}
IFS=$OlIFS
done | sort | uniq
;;
esac
For example for the following file :
#id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed
933|Perera|Mahinda|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.12|Firefox
1129|Lepland|Carmen|female|1984-02-18|2010-02-28T04:39:58:781+0000|81.25.252.111|Internet Explorer
is supposed to have the output :
Carmen
Mahinda
One problem I've noticed is that the script prints the comments too. The above will print :
Carmen
firstnames
Mahinda
even though I've used grep to get rid of the lines starting with "#".
This is only part of the code (it's where I believe the problem is). It's supposed to recognize the "--firstnames" option. Since some of the fields in the file contain spaces, specifically the last field (the browser field), I wanted to remove just that part.
This is for a school project, and according to the program that grades this section, it's all wrong. The script works as far as I can tell (I tested it). I don't know what's wrong with it, therefore I don't know what to correct. Please help!
awk would be best for your case
$ awk -F "|" 'FNR>1 && !a[$3]++{print $3}' file | sort
Carmen
Mahinda
-F "|" : To set | as field delimiter while reading fields in file
FNR>1 : To skip first header line
a[$3]++ : creates an associative array whose keys are the strings in the 3rd field/column, i.e. firstName, and increments each key's value by 1 every time the key is found. However, the value of $3 is printed only when !a[$3]++ is true, i.e. when the key doesn't yet exist in the array, or in other words when it is seen for the first time.
grep -vE '^#' "$3" | cut -d'|' -f3 should be enough:
$ echo '#id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed
> 933|Perera|Mahinda|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.12|Firefox
> 1129|Lepland|Carmen|female|1984-02-18|2010-02-28T04:39:58:781+0000|81.25.252.111|Internet Explorer
>' | grep -vE '^#' | cut -d'|' -f3
Mahinda
Carmen
The grep command removes lines starting with # (it uses a regular expression to do so, hence the -E flag; if you want to keep removing any line containing a #, your current grep -v "#" is correct), and the cut -d'|' -f3 command splits each line on the | delimiter and returns its third field.
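Putting that together inside the question's case branch (a sketch; --firstnames and the "$3" file argument follow the question's own calling convention):
--firstnames )
    # drop comment lines, take the 3rd |-separated field, then sort and de-duplicate
    grep -v '^#' "$3" | cut -d'|' -f3 | sort -u
    ;;
This also gives the sorted, unique output the question asks for, without the IFS loop.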

return all lines that match String1 in a file after the last matching String2 in the same file

I figured out how to get the line number of the last matching word in the file :
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1
It gave me the value 1787. So I passed it manually to the sed command to search, after that line number, for the lines that contain the phrase "blades are down", and it returned all the lines successfully:
sed -n '1787,$s/blades are down/&/p' myfile.txt
Is there a way to pass the line number from the first command to the second one, through a variable or a file, so I can put them in a script to be executed automatically?
Thank you.
You can do this by just connecting your two commands with xargs. 'xargs -I %' allows you to take the stdin from the previous command and place it wherever you want in the next command. The '%' is where your '1787' will be written:
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1 | xargs -I % sed -n %',$s/blades are down/&/p' myfile.txt
You can use:
command substitution to capture the result of the first command in a variable, and
simple string concatenation to use the variable in your sed command:
startLine=$(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)
sed -n ${startLine}',$s/blades are down/&/p' myfile.txt
You don't strictly need the intermediate variable - you could simply use:
sed -n $(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)',$s/blades are down/&/p' myfile.txt
but it may make sense to do error checking on the result of the command substitution first.
Note that I've streamlined the first command by using grep's -n option, which puts the line number separated with : before each match.
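A sketch of the error checking mentioned above, for the case where ' b ' never matches:
startLine=$(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)
if [ -z "$startLine" ]; then
    echo "no line matching ' b ' found in textfile.txt" >&2
    exit 1
fi
sed -n ${startLine}',$s/blades are down/&/p' myfile.txt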
First we can get the part of the file after the last match of string2, then use grep to match all the lines containing string1:
tac your_file | awk '{ if (match($0, "string2")) { exit ; } else {print;} }' | \
grep "string1"
Note that the output order is reversed; if you don't care about the order, that's fine. If you do care, just add another tac at the end of the pipe, as shown below.
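For example, keeping the placeholder names from the snippet above and restoring the original order with a final tac:
tac your_file | awk '{ if (match($0, "string2")) { exit } else { print } }' | grep "string1" | tac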
This might work for you (GNU sed):
sed -n '/\n/ba;/ b /h;//!H;$!d;x;//!d;s/$/\n/;:a;/\`.*blades are down.*$/MP;D' file
This reads through the file storing all lines following the last match of the first string (" b ") in the hold space.
At the end of file, it swaps to the hold space, checks that it does indeed have at least one match, then prints out those lines that match the second string ("blades are down").
N.B. it makes the end case (/\n/) possible by adding a new line to the end of the hold space, which will eventually be thrown away. This also caters for the last line edge condition.
