Newbie to unix/shell/bash. I have a file named CellSite whose 6th line is as below:
btsName = "RV74XC038",
I want to extract the string on the 6th line that is between the double quotes (i.e. RV74XC038) and save it to a variable. Please note that the 6th line starts with 4 blank spaces, and the string varies from file to file. So I am looking for a solution that extracts the string between the double quotes on the 6th line.
I tried the below, but it does not work:
str2 = sed '6{ s/^btsName = \([^ ]*\) *$/\1/;q } ;d' CellSite;
Any help is much appreciated. TIA.
sed is a stream editor.
For just parsing files, you want to look into awk. Something like this:
awk -F \" '/btsName/ { print $2 }' CellSite
Where:
-F defines the "field separator", in your case the double quote "
the entire script consists of:
/btsName/ acts only on lines that contain the regex btsName
from such a line, print the second field; the first field is everything before the first quote, the second field is everything between the first and the second quote, and the third field is everything after the second quote
CellSite is the name of the file to parse
There are possibly better alternatives, but you would have to show the rest of your file.
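For example, capturing the value into a variable (the one-line test file here is created only for illustration):
$ printf '    btsName = "RV74XC038",\n' > CellSite
$ str2=$(awk -F \" '/btsName/ { print $2 }' CellSite)
$ echo "$str2"
RV74XC038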
Using sed
$ str2=$(sed '6s/[^"]*"\([^"]*\).*/\1/' CellSite)
$ echo "$str2"
RV74XC038
You can use the following awk solution:
btsName=$(awk -F\" 'NR==6{print $2; exit}' CellSite)
Basically, get to the sixth line (NR==6), print the second field value (the double quote " is used to split records (lines) into fields), and then exit.
See the online demo:
#!/bin/bash
CellSite='Line 1
Line 2
Line 3
btsName = "NO74NO038",
Line 5
btsName = "RV74XC038","
Line 7
btsName = "no11no000",
'
btsName=$(awk -F\" 'NR==6{print $2; exit}' <<< "$CellSite")
echo "$btsName" # => RV74XC038
This might work for you (GNU sed):
var=$(sed -En '6s/.*"(.*)".*/\1/p;6q' file)
The -E option enables extended regexps (simpler syntax) and -n turns off implicit printing.
Focus on the 6th line only: print the value between the double quotes, then quit.
Bash interpolates the sed invocation by means of the $(...) and the value extracted defines the variable var.
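For instance, with a six-line file whose sixth line is the one from the question (the first five lines below are just placeholders):
$ printf 'line1\nline2\nline3\nline4\nline5\n    btsName = "RV74XC038",\n' > file
$ var=$(sed -En '6s/.*"(.*)".*/\1/p;6q' file)
$ echo "$var"
RV74XC038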
I have strings like the following, which should be parsed with only unix commands (bash):
49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
I want to trim strings like the above up to the 4th underscore from the end/right side, so the output should be:
49_sftp_mac_myfile_simul_test
The number of underscores can vary in the overall string. For example, the string could be:
49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
The output should be (after trimming up to the 4th occurrence of the underscore from the right):
49_sftp_simul_test
Easily done using awk by decrementing NF (the number of fields) by 4, after setting the input and output field separators to underscore:
s='49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed'
awk 'BEGIN{FS=OFS="_"} {NF -= 4; $1=$1} 1' <<< "$s"
49_sftp_mac_myfile_simul_test
You can use bash's parameter expansion for that:
string="..."
echo "${string%_*_*_*_*}"
With GNU sed:
$ sed -E 's/(_[^_]*){4}$//' <<< "49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed"
49_sftp_mac_myfile_simul_test
From the end of line, removes 4 occurrences of _ followed by non _ characters.
Perl one-liner
echo "$your_string" | perl -lne '$n++ while /_/g; print join "_", ((split /_/)[-$n-1..-5])'
Input:
49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
Output:
49_sftp_mac_myfile_simul_test
Input:
49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
Output:
49_sftp_simul_test
Not the fastest, but maybe the easiest to remember and the funniest:
echo "49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed"|
rev | cut -d"_" -f5- | rev
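Output:
49_sftp_mac_myfile_simul_test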
I have a *.csv file with values as below:
"ASDP02","8801942183589"
"ASDP06","8801939151023"
"CSDP04","8801963981740"
"ASDP09","8801946305047"
"ASDP12","8801941195677"
"ASDP05","8801922826186"
"CSDP08","8801983008938"
"ASDP04","8801944346555"
"CSDP11","8801910831518"
or sometimes the values are as below:
"8801989353984","KSDP05"
"8801957608165","ASDP11"
"8801991455848","CSDP10"
"8801981363116","CSDP07"
"8801921247870","KSDP07"
"8801965386240","CSDP06"
"8801956293036","KSDP10"
"8801984383904","KSDP11"
"8801944211742","ASDP09"
I just want to put the numeric value (e.g. 8801989353984) always in the 1st column. Is it possible using a BASH script?
Sed is also your friend here
Input
cat 41189347
"ASDP02","8801942183589"
"ASDP06","8801939151023"
"CSDP04","8801963981740"
"ASDP09","8801946305047"
"ASDP12","8801941195677"
"ASDP05","8801922826186"
"CSDP08","8801983008938"
"ASDP04","8801944346555"
"CSDP11","8801910831518"
Script
sed -E 's/^("[[:alpha:]]+.*"),("[[:digit:]]+")$/\2,\1/' 41189347
Output
"8801942183589","ASDP02"
"8801939151023","ASDP06"
"8801963981740","CSDP04"
"8801946305047","ASDP09"
"8801941195677","ASDP12"
"8801922826186","ASDP05"
"8801983008938","CSDP08"
"8801944346555","ASDP04"
"8801910831518","CSDP11"
awk to the rescue!
$ awk -F, -v OFS=, '$1~/[A-Z]/{t=$2;$2=$1;$1=t}1' file
If the first field has alpha chars, swap the first and second columns and print.
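A quick check with one line of each format (the file contents are assumed for illustration):
$ cat file
"ASDP02","8801942183589"
"8801989353984","KSDP05"
$ awk -F, -v OFS=, '$1~/[A-Z]/{t=$2;$2=$1;$1=t}1' file
"8801942183589","ASDP02"
"8801989353984","KSDP05"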
Bash can do the work but awk might be a better choice for rearranging your file:
sample.csv:
"ASDP02","8801942183589"
"8801944211742","ASDP09"
command:
awk -F, 'BEGIN{OFS=","}{$1=$1;if(substr($1, 2, length($1) - 2) + 0 == substr($1, 2, length($1) - 2)){print $1,$2}else{print $2,$1}}' sample.csv
substr($1, 2, length($1) - 2) + 0 == substr($1, 2, length($1) - 2) strips the surrounding quotes and checks whether the first column is numeric (adding 0 forces a numeric conversion, so the comparison only holds when nothing is lost). If it is numeric, print the original order; otherwise swap column 1 and column 2.
Output:
"8801942183589","ASDP02"
"8801944211742","ASDP09"
You can create a pure bash script that generates another file with the structure you need:
#!/bin/bash
csv_file="/path/to/your/csvfile"
output_file="/path/to/output_file"
#Optional
rm -rf "${output_file}"
readarray -t LINES < <(cat < "${csv_file}" 2> /dev/null)
for item in "${LINES[@]}"; do
    if [[ $item =~ ^\"([0-9A-Z]+)\",\"([0-9]+)\" ]]; then
        echo "\"${BASH_REMATCH[2]}\",\"${BASH_REMATCH[1]}\"" >> "${output_file}"
    else
        echo "$item" >> "${output_file}"
    fi
done
This works even if your file is "mixed", i.e. with some lines in the right format and others in the wrong format.
The following commands assume that the cells in the CSV files do not contain newlines or commas. Otherwise, you should write a more complicated script in Perl, PHP, or another programming language capable of parsing CSV files properly; Bash is definitely not appropriate for this task.
Perl
perl -F, -nle '@F = reverse @F if $F[0] !~ /^"\d+"$/;
               print join(",", @F)' file
Beware: if the cells contain newlines or commas, use Perl's Text::CSV module, for instance. Although it is a simple task in Perl, it goes beyond the scope of the current question.
The command splits the input lines by commas (-F,) and stores the result in the @F array, for each line. The items in the array are reversed if the first field $F[0] does not match the regular expression, i.e. if it is not the quoted numeric column. You can also swap the items this way: ($F[0], $F[1]) = ($F[1], $F[0]).
Finally, the command joins the array items with commas and prints the result to the standard output.
If you want to edit the file in-place, use -i option: perl -i.backup -F, ....
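A quick check on a mixed file (the file contents are assumed for illustration):
$ cat file
"ASDP02","8801942183589"
"8801989353984","KSDP05"
$ perl -F, -nle '@F = reverse @F if $F[0] !~ /^"\d+"$/; print join(",", @F)' file
"8801942183589","ASDP02"
"8801989353984","KSDP05"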
AWK
awk -F, -vOFS=, '/^"[0-9]+",/ {print; next}
{ t = $1; $1 = $2; $2 = t; print }' file
The input and output field separators are set to , with -F, and -vOFS=,.
If the line matches the pattern /^"[0-9]+",/ (the line begins with a "numeric" CSV column), the script prints the record and advances to the next record. Otherwise the next block is executed.
In the next block, it swaps the first two columns and prints the result to the standard output.
If you want to edit the file in-place, see answers to this question.
I would like to replace a handful of strings with others (e.g. "GG" with "GGX", "GG " with "GGX", "FG" with "FGX", etc.) in the first column of a big csv file using a shell command.
I know I need something like
big.csv shell_commands big.csv
but I don't know awk or sed
Using sed, replacing "GG" at the start of each line of big.csv with "GGX" would look like:
sed 's/^GG/GGX/g' big.csv >big_translated.csv
If you need to replace multiple patterns, you can use multiple replace commands in sed, separated by semicolons:
sed 's/^GG/GGX/g; s/^FG/FGX/g' big.csv >big_translated.csv
The ^ character means beginning of line and ensures that we only edit the first field of the csv.
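For example, on hypothetical rows (note that the anchored pattern leaves XGG,3 alone):
$ printf 'GG,1\nFG,2\nXGG,3\n' > big.csv
$ sed 's/^GG/GGX/g; s/^FG/FGX/g' big.csv
GGX,1
FGX,2
XGG,3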
awk 'BEGIN{ r["GG"] = "GGX"; r["FG"] = "FGX" }
{ for( k in r ) if( gsub( k, r[k], $1 ) ) break } 1' input-file
The break is there to prevent multiple substitutions.
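A quick run on hypothetical data (note that with the default field separator, $1 is the whole line when it contains no spaces):
$ printf 'GG,1\nFG,2\n' > input-file
$ awk 'BEGIN{ r["GG"] = "GGX"; r["FG"] = "FGX" }
{ for( k in r ) if( gsub( k, r[k], $1 ) ) break } 1' input-file
GGX,1
FGX,2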
Try this (provided each string occurs only once):
awk '{sub("GG","GGX",$0); sub("FG","FGX",$0); print}' temp.txt
How about this?
sed -i "s/^\(..\),/\1X,/" big.csv
Or, if there may be some spaces, this:
sed -i "s/^\([^ ][^ ][ ]*\),/\1X,/" big.csv
Need to grep only the word between the 2nd and 3rd to last /.
This is shown in the extract below; note that its position in the filename is not always the same counting from the front. Any ideas would be helpful.
/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt
Here is a Perl script that does the job:
my $str = q!/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt!;
my $res = (split('/',$str))[-3];
print $res;
output:
usernameneedtogrep
I'd use awk:
awk -F/ '{print $(NF-2)}'
splits on /
NF is the index of the last column, $NF the last column itself and $(NF-2) the 3rd-to-last column.
You might of course first need to filter out lines in your input that are not paths (e.g. using grep and then piping to awk)
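For example, with the path from the question:
$ echo '/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt' | awk -F/ '{print $(NF-2)}'
usernameneedtogrep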
a regular expression something like this should do the trick:
/.*\/(.+?)\/.*?\/.*$/
(the leading greedy .* makes the match start at the last slash that still leaves two more path components, and the lazy quantifiers (+? and *?) keep the capture from including slashes where we don't want it to)
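A quick check of that regex, applied here with Perl (one of several ways to use it), on the path from the question:
$ echo '/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt' | perl -nle 'print $1 if /.*\/(.+?)\/.*?\/.*$/'
usernameneedtogrep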