How to extract a word from a sentence in Linux Shell?

I have a shell file with the below SQL statements in it:
SELECT distinct vpi.pin_id_e
FROM MSSINT.V_DSLAMS vd,
MSSINT.v_pin_inventory_old vpi
where vd.dslam like '%#%'
and vd.dslam_id = vpi.dslam_id ;
select pa.circuit_design_id,pa.node_address,c.exchange_carrier_circuit_id,c.type,c.rate_code,c.status
from ASAP.port_address pa,
asap.circuit c
where pa.equipment_id = 4561233 and pa.circuit_design_id is not null
and pa.circuit_design_id = c.circuit_design_id;
In the above content of my shell file, I have to extract the table or view names alone (those between from and where keywords).
I have seen a lot of suggestions for getting words based on their position, but those will not work here, because I need everything between two keywords rather than a word at a fixed position.

awk 'toupper($0) ~ /^FROM/ { getline;flag=1 } toupper($0) ~ /^WHERE/ { flag=0 }flag' filename
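With awk, convert each line to upper case and match it against FROM at the beginning of the line. When it matches, getline reads in the next line and flag is set to 1. When WHERE is encountered at the beginning of a line, flag is reset to 0. The bare flag pattern at the end prints the current line only while flag is 1, i.e. the lines between the FROM and WHERE lines. Note that getline discards the FROM line itself, so a table name written on the same line as FROM (such as MSSINT.V_DSLAMS vd in the first statement) is not printed.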
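For comparison, here is a minimal Python sketch of the same "between FROM and WHERE" idea. It is not part of the original answer; the function name tables_between_from_and_where is made up for illustration, and it assumes simple comma-separated table lists with optional aliases (no subqueries or JOIN syntax). Unlike a line-based approach, it also keeps a table that sits on the same line as FROM.

```python
import re

def tables_between_from_and_where(sql_text):
    """Return the table/view names listed between each FROM and the next WHERE."""
    names = []
    # Non-greedy match so every statement's FROM pairs with its own WHERE.
    pattern = re.compile(r"\bfrom\b(.*?)\bwhere\b", re.IGNORECASE | re.DOTALL)
    for block in pattern.findall(sql_text):
        for item in block.split(","):
            words = item.split()
            if words:
                names.append(words[0])  # first word is the name, the rest is the alias
    return names

sample = """SELECT distinct vpi.pin_id_e
FROM MSSINT.V_DSLAMS vd,
MSSINT.v_pin_inventory_old vpi
where vd.dslam like '%#%'
and vd.dslam_id = vpi.dslam_id ;"""

print("\n".join(tables_between_from_and_where(sample)))
```

To run it on a real file, read the file's contents into a string and pass that string to the function.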

Related

how do I get rid of leading/trailing spaces in SAS search terms?

I have had to look up hundreds (if not thousands) of free-text answers on google, making notes in Excel along the way and inserting SAS-code around the answers as a last step.
The output looks like this:
This output contains an unnecessary number of blank spaces, which seems to confuse SAS's search to the point where the observations can't be properly located.
It works if I manually erase the superfluous spaces, but that will probably take hours. Is there an automated fix for this, either in SAS or in Excel?
I tried using the STRIP-function, to no avail:
else if R_res_ort_txt=strip(" arild ") and R_kom_lan=strip(" skåne ") then R_kommun=strip(" Höganäs " );
If you want to generate a string like:
if R_res_ort_txt="arild" and R_kom_lan="skåne" then R_kommun="Höganäs";
from three variables, let's call them A B C, then just use code like:
string=catx(' ','if R_res_ort_txt=',quote(trim(A))
,'and R_kom_lan=',quote(trim(B))
,'then R_kommun=',quote(trim(C)),';') ;
Or if you are just writing that string to a file just use this PUT statement syntax.
put 'if R_res_ort_txt=' A :$quote. 'and R_kom_lan=' B :$quote.
'then R_kommun=' C :$quote. ';' ;
A saner solution would be to continue using the free-text answers as data and perform your matching criteria for transformations with a left join.
proc import out=answers datafile='my-free-text-answers.xlsx';
data have;
attrib R_res_ort_txt R_kom_lan length=$100;
input R_res_ort_txt ...;
datalines4;
... whatever all those transforms will be performed on...
;;;;
proc sql;
create table want as
select
have.* ,
answers.R_kommun_answer as R_kommun
from
have
left join
answers
on
have.R_res_ort_txt = answers.res_ort_answer
and have.R_kom_lan = answers.kom_lan_answer
;
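The lookup-table idea can also be sketched outside SAS. The Python below is only an illustration under stated assumptions: the helper names (normalize, build_lookup, left_join) and the second sample row are invented, and the only cleaning applied is stripping leading and trailing whitespace before matching.

```python
def normalize(text):
    """Drop leading/trailing blanks so ' arild ' matches 'arild'."""
    return text.strip()

def build_lookup(answers):
    """answers: (res_ort_answer, kom_lan_answer, R_kommun_answer) rows from the free-text sheet."""
    return {(normalize(ort), normalize(lan)): normalize(kommun)
            for ort, lan, kommun in answers}

def left_join(rows, lookup):
    """Attach R_kommun to each (R_res_ort_txt, R_kom_lan) row; unmatched rows get None."""
    return [(ort, lan, lookup.get((normalize(ort), normalize(lan))))
            for ort, lan in rows]

answers = [(" arild ", " skåne ", " Höganäs ")]      # values taken from the question
rows = [("arild", "skåne"), ("okänd", "skåne")]      # second row is a made-up non-match

print(left_join(rows, build_lookup(answers)))
```

Keeping the answers as data means the stray spaces are cleaned once, in one place, instead of inside hundreds of generated if/else statements.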
I solved this by adding the quotes in Excel using the Flash Fill feature:
https://www.youtube.com/watch?v=nE65QeDoepc

How to use grep to find a specific string of numbers and move that to a new test file

I am new to using linux and grep and I am looking for some direction in how to use grep. I am trying to get two specific numbers from a text file. I will need to do this for thousands of files so I believe using grep or some equivalent to be best for my mental health.
The text file I am working with looks as follows:
*Average spectrum energy: 0.00100 MeV
Average sampled energy : 0.00100 MeV [ -0.0000%]
K/phi = <E*mu_tr/rho> = 6.529719E+02 10^-12 Gy cm^2 [ 0.0008%]
Kcol/phi = <E*mu_tr/rho>*(1-<g>) = 6.529719E+02 10^-12 Gy cm^2 [ 0.0008%]
<g> = 1.0000E-15 [ 0.4264%]
1-<g> = 1.000000 [ 0.0000%]
<mu_tr/rho> = <E*mu_tr/rho>/Eave = 4.075530E+03 cm^2/g [ 0.0008%]
<mu_en/rho> = <E*mu_tr/rho>*(1-<g>)/Eave = 4.075530E+03 cm^2/g [ 0.0008%]
<E*mu_en/rho> = 4.075530E+00 MeV cm^2/g
The values I am looking to extract from this are "0.00100" and "4.075530E+00".
At the moment I am using grep -iE "Average spectrum energy|<E*mu_en/rho>" * which is allowing me to see the full lines, but I am not quite sure how to refine the search to only show me the numbers instead of just the whole line. Is this possible using grep?
As for moving the numbers into a new file, I believe the command is > newdata.txt. My question is when using this with grep can you change how it writes the data to the new text file? I am looking for the format of the numbers to be like this:
0.00100001 3.4877754595352117
0.00100367 3.4665273232204363
0.00100735 3.4453747056004884
0.00101104 3.4243696230289187
0.00101474 3.4035147003587718
Again, is that possible using grep > newdata.txt?
I really appreciate any help or direction people can give me. Thank you.
I'm not quite sure why it was giving the 4.075530E+03 value.
That's because * has the special meaning of repeating the previous item any number of times (including zero), so the pattern <E*mu_en/rho> does not match the literal text <E*mu_en/rho>. Instead it matches <, followed by any number of E, followed by mu_en/rho>, which in particular matches <mu_en/rho>. To remove this special meaning and match a literal *, prepend a backslash, i.e. <E\*mu_en/rho>.
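The effect is easy to reproduce. The sketch below uses Python's re module rather than grep, on the assumption that the * quantifier behaves the same way for this pattern; the variable names are only for illustration.

```python
import re

unescaped = r"<E*mu_en/rho>"   # * means "zero or more of the preceding E"
escaped = r"<E\*mu_en/rho>"    # \* matches a literal asterisk

# The unescaped pattern hits the wrong line and misses the intended one.
print(bool(re.search(unescaped, "<mu_en/rho> = <E*mu_tr/rho>*(1-<g>)/Eave")))  # True
print(bool(re.search(unescaped, "<E*mu_en/rho> = 4.075530E+00 MeV cm^2/g")))   # False

# The escaped pattern matches only the intended line.
print(bool(re.search(escaped, "<E*mu_en/rho> = 4.075530E+00 MeV cm^2/g")))     # True
print(bool(re.search(escaped, "<mu_en/rho> = <E*mu_tr/rho>*(1-<g>)/Eave")))    # False
```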
I am not quite sure how to refine the search to only show me the numbers instead of just the whole line. Is this possible using grep?
It is if PCRE (grep -P) is available in the system. To only (-o) show the numbers, we can use the feature of Resetting the match start with \K. Your modified grep command is then:
grep -hioP "(Average spectrum energy: *|<E\*mu_en/rho> *= )\K\S*" *
(option -h drops the file names, pattern item \S means not a white space).
when using this with grep can you change how it writes the data to the new text file?
grep by itself cannot change the format of numbers (except maybe cutting digits off). If you want this, we need another tool. Now, since we need another tool, I'd consider using a tool which is capable of doing the whole job, e. g. awk:
awk '
/Average spectrum energy/ { printf "%.8f ", $4 }
/<E\*mu_en\/rho>/ { printf "%.16f\n", $3 }
' * >newdata.txt
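If awk is not at hand, the same extraction can be sketched in Python. This is an illustration, not part of the original answer: extract_pair and write_table are hypothetical helper names, and the sketch assumes every input file contains both lines exactly once.

```python
import re
from pathlib import Path

ENERGY = re.compile(r"Average spectrum energy:\s*(\S+)")
MU_EN = re.compile(r"<E\*mu_en/rho>\s*=\s*(\S+)")   # \* matches the literal asterisk

def extract_pair(text):
    """Return 'energy value' formatted as two columns, or None if either is missing."""
    energy = ENERGY.search(text)
    mu_en = MU_EN.search(text)
    if energy is None or mu_en is None:
        return None
    return f"{float(energy.group(1)):.8f} {float(mu_en.group(1)):.16f}"

def write_table(paths, out_path):
    """Write one 'energy value' line per input file to out_path."""
    with open(out_path, "w") as out:
        for path in paths:
            line = extract_pair(Path(path).read_text())
            if line is not None:
                out.write(line + "\n")
```

A call such as write_table(sorted(Path(".").glob("*.txt")), "newdata.txt") would process every .txt file in the current directory; the glob pattern and output name here are placeholders.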

how to modify textfile using U-SQL

I have a large file of around 130 MB in which each line contains 10 A characters followed by a \t after the 10th A. I want to extract this text file and then change all the A's to B's. Can anyone help with a code snippet?
This is what I have written so far:
USE DATABASE imodelanalytics;
@searchlog =
EXTRACT characters string
FROM "/iModelAnalytics/Samples/Data/dummy.txt"
USING Extractors.Text(delimiter: '\t', skipFirstNRows: 1);
@modify =
SELECT characters AS line
FROM @searchlog;
OUTPUT @modify
TO "/iModelAnalytics/Samples/Data/B.txt"
USING Outputters.Text();
I'm new to this, so any suggestions will be helpful ! Thanks
Assuming all of the field would be AAAAAAAAAA then you could write:
@modify = SELECT "BBBBBBBBBB" AS characters FROM @searchlog;
If only some are all As, then you would do it in the SELECT clause:
@modify =
SELECT (characters == "AAAAAAAAAA" ? "BBBBBBBBBB" : characters) AS characters
FROM @searchlog;
If there are other characters around the AAAAAAAAAA then you would use more of the C# string functions to find them and replace them in a similar pattern.

How to get ordered, defined or all columns except or after or before a given column

In BASH
I run the following one liner to get an individual column/field after splitting on a given character (one can use AWK as well if they want to split on more than one char i.e. on a word in any order, ok).
#This will give me first column i.e. 'lori' i.e. first column/field/value after splitting the line / string on a character '-' here
echo "lori-chuck-shenzi" | cut -d'-' -f1
# This will give me 'chuck'
echo "lori-chuck-shenzi" | cut -d'-' -f2
# This will give me 'shenzi'
echo "lori-chuck-shenzi" | cut -d'-' -f3
# This will give me 'chuck-shenzi' i.e. all columns after 2nd and onwards.
echo "lori-chuck-shenzi" | cut -d'-' -f2-
Notice the last command above. How can I do the same thing as that last cut command in Groovy?
For ex: if the contents are in a file and they look like:
1 - a
2 - b
3 - c
4 - d
5 - e
6 - lori-chuck shenzi
7 - columnValue1-columnValue2-columnValue3-ColumnValue4
I tried the following Groovy code, but it's not giving me lori-chuck shenzi. For the 6th bullet I want everything after the first occurrence of -, i.e. lori-chuck shenzi, but the script returns just lori (which is the correct output for index [1], so I understand why).
def file = "/path/to/my/file.txt"
File textfile= new File(file)
//now read each line from the file (using the file handle we created above)
textfile.eachLine { line ->
//list.add(line.split('-')[1])
println "Bullet entry full value is: " + line.split('-')[1]
}
// return list
Also, for the last line in the file above, is there an easy way in Groovy to change the order of the columns after they are split, e.g. to reverse or slice them the way Python does with [1:], [:1], [:-1] and so on?
I don't like this solution, but I did this to get it working. After getting the index values [1..-1] (i.e. from the 1st index onwards, excluding the 0th index, which is the left-hand side of the first occurrence of the - character), I had to get rid of the [ and ] of the list by using join(',') and then replacing every , with a - to get the result I was looking for.
list.add(line.split('-')[1..-1].join(',').replaceAll(',','-'))
I would still like to know what's a better solution and how can this work when we talk about cherry picking individual columns + in a given order (instead of me writing various Groovy statements to pick individual elements from the string/list per statement).
If I'm understanding your question correctly, what you want is:
line.split('-')[1..-1]
This will give you from position 1 to the last. You can do -2 (next to last) and so on, but just be aware that you can get an ArrayIndexOutOfBoundsException moving backwards too, if you go past the beginning of your array!
-- Original answer is above this line --
Adding to my answer, since comments don't allow code formatting. If all you want is to pick specific columns, and you want a string in the end, you could do something like:
def resultList = line.split('-')
def resultString = "${resultList[1]}-${resultList[2]} ${resultList[3]}"
and pick whatever columns you want that way. I thought you were looking for a more generic solution, but if not, specific columns are easy!
If you want the first value, a dash, then the rest joined by spaces, just use:
"${resultList[1]}-${resultList[2..-1].join(" ")}"
I don't know how to give you specific answers for every combination you might want, but basically once you have your values in a list, you can manipulate that however you want, and turn the results back into a string with GStrings or with .join(...).
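Since the question compares this to Python slicing, here is a small Python sketch (not Groovy) of the general pattern: split once, select or reorder fields, then join with the same delimiter. The helper names fields_from and pick are invented for illustration, and positions are 1-based to mirror cut -f.

```python
def fields_from(line, start, delimiter="-"):
    """Like cut -d'-' -f<start>- : every field from 1-based position start onwards."""
    return delimiter.join(line.split(delimiter)[start - 1:])

def pick(line, positions, delimiter="-"):
    """Return the fields at the given 1-based positions, in the order requested."""
    parts = line.split(delimiter)
    return delimiter.join(parts[p - 1] for p in positions)

print(fields_from("lori-chuck-shenzi", 2))               # chuck-shenzi
print(fields_from("6 - lori-chuck shenzi", 2).strip())   # lori-chuck shenzi
print(pick("columnValue1-columnValue2-columnValue3-ColumnValue4", [4, 1, 2]))
```

Joining with the delimiter directly avoids the intermediate join(',') and replace step. Groovy's join accepts a separator in the same way, as the .join(" ") call above shows.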

Substituting everything from = to end of the line in VIM

Let's say I have several lines like:
$repeat_on = $_REQUEST['repeat_on'];
$opt_days = $_REQUEST['opt_day'];
$opt_days = explode(",", $opt_days);
... and so on.
Let's say I use visual mode to select all the lines: how can I replace everything from = to the end of the line so it looks like:
$repeat_on = NULL;
$opt_days = NULL;
$opt_days = NULL;
With the block selected, use this substitute:
s/=.*$/= NULL;
The substitution regex changes each line by replacing anything between = and the end of the line, including the =, with = NULL;.
The first part of the command is the regex matching what is to be replaced: =.*$.
The = is taken literally.
The dot . means any character.
So .* means: 0 or more of any character.
This is terminated by $ for end of line, but this actually isn't necessary here: try it also without the $.
So the regex will match the region after the first = in each line, and replace that region with the replacement, which is = NULL;. We need to include the = in the replacement to add it back, since it's part of the match to be replaced.
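To see the pattern at work outside Vim, here is a minimal Python sketch that applies the same regex to the example lines. It assumes Python's re module treats =.*$ the same way for this input; re.MULTILINE makes $ match at the end of every line, which corresponds to Vim running the substitute once per line.

```python
import re

lines = """$repeat_on = $_REQUEST['repeat_on'];
$opt_days = $_REQUEST['opt_day'];
$opt_days = explode(",", $opt_days);"""

# Replace everything from the first = to the end of each line.
result = re.sub(r"=.*$", "= NULL;", lines, flags=re.MULTILINE)
print(result)
```

Dropping the $ from the pattern gives the same result, because .* already runs to the end of the line.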
When you have a block selected, and you hit : to enter a command, the command line will be automatically prefixed with a range for the visual selection that looks like this:
:'<,'>
Continue typing the command above, and your command-line will be:
:'<,'>s/=.*$/= NULL;
Which will apply the replacement to the selected visual block.
If you need multiple replacements on a single line, add the g flag (with this particular pattern it makes no difference, because .* already consumes the rest of the line):
:'<,'>s/=.*$/= NULL;/g
Some alternatives:
Visual Block (fast)
On the first line/character do... Wl<C-v>jjCNULL;<Esc>bi<Space><Esc>
Macro (faster)
On the first line/character do... qqWllCNULL;<esc>+q2@q
:norm (fastest)
On the first line do... 3:no<S-tab> WllCNULL;<Enter>
Or if you've visually selected the lines leave the 3 off the beginning.
