Write a file using AWK on Linux

I have a file that has several lines of which one line is
-xxxxxxxx()xxxxxxxx
I want to add the contents of this line to a new file
I did this :
awk ' /^-/ {system("echo" $0 ">" "newline.txt")} '
but this does not work; it returns an error that says:
Unexpected token '('
I believe this is due to the () present in the line. How can I overcome this issue?

You need to add proper spaces!
With your erroneous awk ' /^-/ {system("echo" $0 ">" "newline.txt")} ', the shell command is essentially echo-xxxxxxxx()xxxxxxxx>newline.txt, which certainly doesn't work. You need to construct a proper shell command inside the awk string and obey awk's string-concatenation rules, i.e. your intended script should look like this (which is still broken, because $0 is not properly quoted in the resulting shell command):
awk '/^-/ { system("echo " $0 " > newline.txt") }'
However, if you really just need to echo $0 into a file, you can simply do:
awk '/^-/ { print $0 > "newline.txt" }'
Or even more simply
awk '/^-/' > newline.txt
This applies the default action to all records matching /^-/; the default action is print, which prints the current record. In other words, this script simply filters out the desired records, and the > newline.txt redirection outside awk puts them into a file.
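A quick sanity check of the two working variants (the input line is taken from the question; the file names are illustrative):

```shell
# Build a small sample file containing the problematic line
printf '%s\n' 'abc' '-xxxxxxxx()xxxxxxxx' 'def' > input.txt

# Redirection inside awk: the () is harmless because no shell is involved
awk '/^-/ { print $0 > "newline.txt" }' input.txt
cat newline.txt    # -xxxxxxxx()xxxxxxxx

# Default action plus shell redirection
awk '/^-/' input.txt > newline2.txt
cat newline2.txt   # -xxxxxxxx()xxxxxxxx
```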

You don't need the system or echo commands; simply:
awk '/^-/ {print $1}' file > newfile
This will capture lines starting with - but truncate everything after the first space.
awk '/^-/ {print $0}' file > newfile
Would capture the entire line including spaces.
You could use grep also:
grep -o '^-.*' file > newfile
Captures any lines starting with -
grep -o '^-.*().*' file > newfile
Would be more specific, capturing lines that start with - and also contain ()
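For illustration, against a small made-up sample file:

```shell
printf '%s\n' '-abc()def' '-plain' 'other' > sample.txt

grep -o '^-.*' sample.txt      # prints -abc()def and -plain
grep -o '^-.*().*' sample.txt  # prints only -abc()def
```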

First of all, for simple extraction of patterns from a file you do not need awk; it is overkill. grep is more than enough for the task:
INPUT:
$ more file
123
-xxxxxxxx()xxxxxxxx
abc
-xyxyxxux()xxuxxuxx
123
abc
123
command:
$ grep -oE '^-[^(]+\(\).*' file
-xxxxxxxx()xxxxxxxx
-xyxyxxux()xxuxxuxx
explanations:
Options: -o prints only the matched part instead of the whole line (can be removed), and -E enables extended regular expressions
Regex: ^-[^(]+\(\).* selects lines that start with - and contain ()
You can redirect your output to a new_file by adding > new_file at the end of your command.

Related

How to run a file with an awk script on another text file

I am trying to use an awk script in a file (awk.007) on file.txt: if a line begins with the letter "J", print it.
In the awk file I have this:
^J* {print $0}
file.txt contains:
Name Surname
Maths 2 5 6
I run it with cat file.txt | awk -f awk.007 but each time it shows:
Syntax error: ^
If I run the awk program from the command line, everything works fine.
The regular expression needs to be in //'s:
awk '/^J/' file.txt
^J* will match every line, btw, since * is 0 or more repetitions. And the default action for a pattern is print $0, so you don't really need to include that.
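A minimal check of the corrected script (sample lines invented for the demo):

```shell
printf '%s\n' 'Jane 5 7' 'Maths 2 5 6' 'John 3 4' > file.txt
awk '/^J/' file.txt
# Jane 5 7
# John 3 4
```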

Print all columns except first using AWK

I have a file which contains file list. The file looks like this
$ cat filelist
D src/layouts/PersonAccount-Person Account Layout.layout
D src/objects/Case Account-Record List.object
I want to cut the first two columns and print only the file names along with the directory path. The list is dynamic, and file names contain spaces, so I can't use space as the delimiter. How can I get this using an awk command?
The output should be like this
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object
Can you try this once:
bash-4.4$ cat filelist |awk '{$1="";print $0}'
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object
else if you want to remove 2 columns it would be:
awk '{$1=$2="";print $0}'
This will produce the below output:
bash-4.4$ cat filelist |awk '{$1=$2="";print $0}'
Account Layout.layout
Account-Record List.object
Try this out:
awk -F" " '{$1=""; print $0}' filelist | sed 's/^ //'
Here sed is used to remove the leading space of the output line.
print only file names with along directory path
awk approach:
awk '{ sub(/^[[:space:]]*[^[:space:]][[:space:]]+/,"",$0) }1' filelist
The output:
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object
----------
To extract only basename of the file:
awk -F'/' '{print $NF}' filelist
The output:
PersonAccount-Person Account Layout.layout
Case Account-Record List.object
This will do exactly what you want for your example :
sed -E 's/(.*)([ ][a-zA-Z0-9]+\/[a-zA-Z0-9]+\/[a-zA-Z0-9. -]+)/\2/g' filelist
Explanation :
It matches your path (including spaces, if any) and then replaces the whole line with that one match. Easy peasy lemon squeezy :)
Regards!
A simple grep
grep -o '[^[:blank:]]*/.*' filelist
That's zero or more non-blank characters followed by a slash followed by the rest of the string.
This will not match any lines that don't have a slash.
Here is a portable POSIX shell solution:
#!/bin/sh
cat "$@" | while IFS= read -r line; do
  echo "${line#* * }"
done
This loops over each line of the given input file(s) (or standard input if none are given) and prints each line with everything up to and including the second space removed; the match is not greedy.
Unlike some of the other answers here, this preserves any spacing in the rest of the line.
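To see what `${line#* * }` does in isolation: the `#` strips the shortest leading match of the pattern `* * `, i.e. everything up to and including the second space (the two-column prefix below is made up for the demo):

```shell
line='D 644 src/layouts/PersonAccount-Person Account Layout.layout'
echo "${line#* * }"
# src/layouts/PersonAccount-Person Account Layout.layout
```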
If you want that as a one-liner:
while IFS= read -r L; do echo "${L#* * }"; done < filelist
This will fail if the uppermost directory's name starts with a space. To work around that, you need to peel away the leading ten characters (which I assume are static):
#!/bin/sh
cat "$@" | while IFS= read -r line; do
  echo "${line#??????????}"
done
As a one-liner, in bash, this can be simplified by using substrings:
while IFS= read -r L; do echo "${L:10}"; done < filelist

awk system does not take hyphens

I want to pipe the output of some command to awk and use a system call inside awk, but awk does not seem to accept flags with a hyphen. For example, let's say I have a bunch of files and I want to "cat" them. I would use ls -1 | awk '{ system(" cat " $1)}'
Now, if I also want to print the line numbers with -n, it does not work: ls -1 | awk '{ system(" cat -n" $1)}'
You need a space between -n and the file name:
ls -1 | awk '{ system(" cat -n " $1)}'
Notes
-1 is not needed. ls implicitly prints 1 file per line when its output goes to a pipe.
Any file name with whitespace in it will cause this code to fail.
Parsing the output of ls is generally a bad idea. Both find and the shell offer superior handling of difficult file names.
John1024's helpful answer fixes your problem and contains helpful advice, but let me focus on the syntax aspects:
As a command string, cat -n <file> requires at least 1 space (or tab) between -n, which is an option, and <file>, which is an operand.
String concatenation works differently in awk than in the shell:
" cat -n" $1, despite the presence of a space between " cat -n" and $1, does not insert that space in the resulting string, because awk's string concatenation works by directly joining strings placed next to one another irrespective of intervening whitespace.
For instance, the following commands all yield string literal ab, irrespective of any whitespace between the operands of the string concatenation:
awk 'BEGIN { print "a""b" }'
awk 'BEGIN { print "a" "b" }'
awk 'BEGIN { s = "b"; print "a"s }'
awk 'BEGIN { s = "b"; print "a" s }'
This is not a proper use case for awk; you're better off with something like this:
find . -maxdepth 1 -type f -exec cat -n {} \;
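If there are many files, the `-exec … {} +` form passes them to cat in batches instead of starting one process per file, and it remains safe with whitespace in names:

```shell
find . -maxdepth 1 -type f -exec cat -n {} +
```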

Finding contents of one file in another file

I'm using the following shell script to find the contents of one file into another:
#!/bin/ksh
file="/home/nimish/contents.txt"
while read -r line; do
grep $line /home/nimish/another_file.csv
done < "$file"
I'm executing the script, but it is not displaying the contents from the CSV file. My contents.txt file contains numbers such as "08915673" or "123223", which are present in the CSV file as well. Is there anything wrong with what I'm doing?
grep itself is able to do so. Simply use the flag -f:
grep -f <patterns> <file>
<patterns> is a file containing one pattern in each line; and <file> is the file in which you want to search things.
Note that, to force grep to consider each line a pattern, even if the contents of each line look like a regular expression, you should use the flag -F, --fixed-strings.
grep -F -f <patterns> <file>
If your file is a CSV, as you said, you may do:
grep -f <(tr ',' '\n' < data.csv) <file>
As an example, consider the file "a.txt", with the following lines:
alpha
0891234
beta
Now, the file "b.txt", with the lines:
Alpha
0808080
0891234
bEtA
The output of the following command is:
grep -f "a.txt" "b.txt"
0891234
You don't need a loop here at all; grep itself offers this feature.
Now using your file names:
#!/bin/bash
patterns="/home/nimish/contents.txt"
search="/home/nimish/another_file.csv"
grep -f <(tr ',' '\n' < "${patterns}") "${search}"
You may change ',' to the separator you have in your file.
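A quick end-to-end sketch, with made-up contents for the two files:

```shell
printf '%s\n' '08915673' '123223' > contents.txt
printf '%s\n' 'id,value' '08915673,foo' '999,bar' '123223,baz' > another_file.csv

grep -F -f contents.txt another_file.csv
# 08915673,foo
# 123223,baz
```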
Another solution:
Use awk and build your own hash, fully under your control.
Replace $0 with $i to match on whichever field you want.
awk -F"," '
{
    if (nowfile == "") { nowfile = FILENAME }
    if (FILENAME == nowfile) {
        hash[$0] = $0          # first file: remember every line
    } else {
        if ($0 in hash) {      # later files: print lines present in the first
            print $0
        }
    }
}' xx yy
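The same two-file membership test is more commonly written with the `NR==FNR` idiom, which is true only while awk is reading the first file (sample contents invented; `xx`/`yy` as above):

```shell
printf '%s\n' 'alpha' '0891234' > xx
printf '%s\n' 'Alpha' '0808080' '0891234' > yy

# Load every line of xx into a hash, then print lines of yy found in it
awk 'NR==FNR { seen[$0]; next } $0 in seen' xx yy
# 0891234
```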
I don't think you really need a script to perform what you're trying to do.
One command is enough. In my case, I needed an identification number in column 11 in a CSV file (with ";" as separator):
grep -f <(awk -F";" '{print $11}' FILE_TO_EXTRACT_PATTERNS_FROM.csv) TARGET_FILE.csv

grep match only lines in a specified range

Is it possible to use grep to match only lines with numbers in a pre-specified range?
For instance I want to list all lines with numbers in the range [1024, 2048] of a log that contain the word 'error'.
I would like to keep the '-n' functionality i.e. have the number of the matched line in the file.
Use sed first:
sed -ne '1024,2048p' file | grep ...
-n says don't print lines by default; the address range with p ('x,yp') prints lines x through y inclusive, overriding the -n. Note that grep -n after this pipe numbers lines relative to the extracted slice, not the original file.
sed -n '1024,2048{/error/{=;p}}' file | paste - -
Here /error/ is a pattern to match and = prints the line number.
Awk is a good tool for the job:
$ awk 'NR>=1024 && NR<=2048 && /error/ {print NR,$0}' file
In awk the variable NR contains the current line number and $0 contains the line itself.
The benefit of using awk is that you can easily change the output to display however you want. For instance, to separate the line number from the line with a colon followed by a TAB:
$ awk 'NR>=1024 && NR<=2048 && /error/ {print NR,$0}' OFS=':\t' file
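For example, against a generated log (contents invented for the demo):

```shell
# 3000 numbered lines, with 'error' planted inside and outside the range
seq 3000 | sed -e '1500s/.*/error inside range/' -e '2500s/.*/error outside range/' > app.log

awk 'NR>=1024 && NR<=2048 && /error/ {print NR,$0}' app.log
# 1500 error inside range
```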
