Search for a string in a file, then print the lines that start with that string - linux

I have an assignment and I have no idea when it comes to managing files, reading and writing. Here's my main problem:
I have a script that manages an address book; at the moment the menu is finished and functions are being used, but I don't know how to search or write to a file.
The first "option" gives the user the option (duh!) to search the address book by the contact name. The pattern I want to use is something along the lines of "name:address:email:phone", letting the user put spaces in the name and address but not in the email or phone, and only numbers in the last one. I believe I could achieve this with regular expressions, which I understand a bit from Java lessons.
How can I do this, then? I know grep may be useful, but I don't know the right parameters even after reading the man pages. Parsing line by line could be done with for line in $(cat file), but I'm still not sure.

If you're allowed to use grep, then you can probably also use awk, and that's what I would prefer for most parts of your assignment.
Looking up a contact by name:
awk -v name="Anton Kovalenko" -F: '$1==name' "$file"
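If you also want to enforce the field format you described (no spaces in the email or phone, digits only in the phone), awk can test individual fields against a regex too. A minimal sketch, assuming name:address:email:phone records:
awk -F: '$3 ~ /^[^ ]+$/ && $4 ~ /^[0-9]+$/' "$file"   # keep only well-formed records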

Here's one way to do it:
grep "^something" $file | while read line
do
echo $line; #do whatever you want with your $line here
done
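If grep turns out to be off-limits for the assignment, a pure-shell variant of the same idea works too; a sketch, assuming the search string is in $name:
while IFS= read -r line
do
    case $line in
        "$name"*) printf '%s\n' "$line" ;;   # line starts with the search string
    esac
done < "$file"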

Related

Script to extract strings between two strings in linux

I am trying to write a little script that will let me "org-capture" articles from my rss-reader (newsboat). So my scenario is this: I will pipe the article to a script; however, the article gets piped in one line, like this:
Title: ABC boss quits over Australian political interference claims Author: Date: Thu, 27 Sep 2018 09:39:16 +0200 Link: https://www.bbc.co.uk/news/world-australia-45661871 The broadcaster's chair quits amid allegations the government leaned on him to dismiss two journalists.
So what I need to do is reliably store the link and the title in variables and then call a command with these variables (emacsclient org-protocol:/ ...)
So basically I need this:
TITLE="ABC boss quits over Australian political interference claims"
URL="https://www.bbc.co.uk/news/world-australia-45661871"
I considered using awk or sed, but they work best for separate lines. So, I thought maybe split the single line at 'Title:', 'Author:', 'Date:' and 'Link:' and then extract with awk/sed.
I found similar use cases and questions here, but not quite the same. I want a pretty minimal script without necessarily using python.
Am I on the right track?
Thanks for helping out.
With GNU awk for the 3rd arg to match():
$ cat tst.awk
match($0,/^Title:\s*(.*)\s+Author:\s*(.*)\s+Date:\s*(.*)\s+Link:\s*(\S+)\s+(.*)/,a) {
    printf "TITLE=\"%s\"\n", a[1]
    printf "URL=\"%s\"\n", a[4]
}
$ awk -f tst.awk file
TITLE="ABC boss quits over Australian political interference claims"
URL="https://www.bbc.co.uk/news/world-australia-45661871"
I showed how to capture all the other fields too, so you can do anything else you need with your input.
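If you want those assignments to take effect in your current shell rather than just being printed, one option is command substitution plus eval (only safe if you trust the feed's content, since eval executes whatever the extraction produces):
eval "$(awk -f tst.awk file)"
echo "$TITLE"   # now a real shell variable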
This might work for you (GNU sed):
sed -r 's/^Title: (.*) Author:.* Link: (\S+).*/TITLE="\1"\nURL="\2"/' file
Use pattern matching to extract the fields required. The first may contain spaces so match on the key Author:. The second is a string of non-space characters following the key Link:.
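Alternatively, to capture the two values into shell variables directly, here is a minimal sketch assuming the article arrives on standard input (the variable name article is just illustrative):
article=$(cat)   # the one-line article piped in by newsboat
TITLE=$(printf '%s\n' "$article" | sed -E 's/^Title: (.*) Author:.*/\1/')
URL=$(printf '%s\n' "$article" | sed -E 's/.* Link: ([^[:space:]]+).*/\1/')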

concatenate two strings and one variable using bash

I need to generate a filename from three parts: two strings and one variable.
for f in `cat files.csv`; do echo fastq/$f\_1.fastq.gze; done
files.csv has the following lines:
Sample_11
Sample_12
I need to generate the following:
fastq/Sample_11_1.fastq.gze
fastq/Sample_12_1.fastq.gze
My problem is that I got the following files:
_1.fastq.gze_11
_1.fastq.gze_12
It looks as if the string after the variable overwrites the string before it.
I appreciate any help
Regards
By the way, your idiom for f in `cat files.csv` should be avoided. See: Dangerous Backticks.
while IFS= read -r f
do
    echo "fastq/${f}_1.fastq.gze"
done < files.csv
You can make it a one-liner with xargs and printf.
xargs printf 'fastq/%s_1.fastq.gze\n' <files.csv
The function of printf is to apply the first argument (the format string) to each argument in turn.
xargs runs this command with as many arguments as it can fit onto the command line (splitting the work into multiple invocations if the input file is too large to fit all the arguments onto a single command line, subject to the ARG_MAX limit of your kernel).
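For the two sample lines, the net effect is a single printf call with both names as arguments:
$ printf 'fastq/%s_1.fastq.gze\n' Sample_11 Sample_12
fastq/Sample_11_1.fastq.gze
fastq/Sample_12_1.fastq.gze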
Your best bet, generally, is to wrap the variable name in braces. So, in this case:
echo "fastq/${f}_1.fastq.gze"
See this answer for some details about the general concept, as well.
Edit: An additional thought: looking at the now-provided output makes me think that this isn't a coding problem at all, but rather a conflict between line endings and the terminal/console program.
Specifically, if the CSV file ends its lines with just a carriage return (ASCII/Unicode 13), the carriage return at the end of Sample_11 would "rewind" the cursor to the start of the line, and the rest of the output would overwrite what was already there.
In that case, based loosely on this article, I'd recommend replacing cat (if you understandably don't want to re-architect the actual script with something like while) with something that will strip the carriage returns, such as:
for f in $(tr -cd '\011\012\040-\176' < files.csv)
do
echo fastq/${f}_1.fastq.gze
done
As the cited article explains, Octal 11 is a tab, 12 a line feed, and 40-176 are typeable characters (Unicode will require more thinking). If there aren't any line feeds in the file, for some reason, you probably want to replace that with tr '\015' '\012', which will convert the carriage returns to line feeds.
Of course, at that point, the better fix is to find whatever produces the file and ask them to put reasonable line endings into their file...
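If you want to confirm this diagnosis before changing anything, a couple of standard commands will show the line endings (a quick check, not part of the original suggestion):
file files.csv           # reports e.g. "with CR line terminators" if carriage returns are the culprit
od -c files.csv | head   # prints \r characters explicitly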

how to use do loop to read several files with similar names in shell script

I have several files named scale1.dat, scale2.dat, scale3.dat, ... up to scale9.dat.
I want to read these files in a do loop one by one, and with each file I want to do some manipulation (I want to write the 1st column of each scale*.dat file to the corresponding scale*.txt).
So my question is: is there a way to read files with similar names? Thanks.
The regular syntax for this is
for file in scale*.dat; do
    awk '{print $1}' "$file" > "${file%.dat}.txt"
done
The asterisk * matches any text or no text; if you want to constrain to just single non-zero digits, you could say for file in scale[1-9].dat instead.
In Bash, there is a non-standard additional glob syntax scale{1..9}.dat, but this is Bash-only and so will not work in #!/bin/sh scripts. (Your question has both sh and bash, so it's not clear which you require. Your comment that the Bash syntax is not working for you suggests that you may need a POSIX-portable solution.) Furthermore, Bash has something called extended globbing, which allows for quite elaborate pattern matching. See also http://mywiki.wooledge.org/glob
For a simple task like this, you don't really need the shell at all, though.
awk 'FNR==1 { if (f) close (f); f=FILENAME; sub(/\.dat/, ".txt", f); }
{ print $1 >f }' scale[1-9]*.dat
(Okay, maybe that's slightly intimidating for a first-timer. But the basic point is that you will often find that the commands you want to use will happily work on multiple files, and so you don't need shell loops at all in those cases.)
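For the curious, here is the same program spread out with comments showing what each part does:
awk '
    FNR == 1 {                    # first line of each new input file
        if (f) close(f)           # close the previous output file, if any
        f = FILENAME              # e.g. scale3.dat
        sub(/\.dat/, ".txt", f)   # turn it into scale3.txt
    }
    { print $1 > f }              # write column 1 to the matching .txt file
' scale[1-9]*.dat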
I don't think so. Similar names or not, you will have to iterate through all your files (perhaps with a for loop) and use a nested loop to iterate through lines or words or whatever you plan to read from those files.
Alternatively, you can copy your files into one (say, scale-all.dat) and read that single file.

Scripting-Search Multiple Strings

I have a script (./lookup) that will search a file ($c). The file contains a list of cities. What I would like is to be able to search the file for what the user enters as an argument (./lookup Miami). For example, I can make the script return what I want if it is a single-word city (Miami), but I can't figure out a way to make it work for two or more words (Los Angeles). I can get the single strings to return what I want with the following:
grep $1 $c
I was thinking about a loop, but I am not sure how to do that, as I am new to scripting and Linux. Thanks for any help.
Whenever arguments could possibly contain spaces, proper quoting is essential in Bash:
grep "$1" "$c"
The user will need to say ./lookup "Los Angeles". If you don't like that, you can try:
grep "$*" "$c"
Then all arguments to the script will be passed together as one string to grep.
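Putting it together, a minimal version of ./lookup might look like this (cities.txt is only a stand-in for whatever $c points at in your script):
#!/bin/bash
c="cities.txt"     # hypothetical path to the city list
grep -- "$*" "$c"  # all arguments joined into one search string; -- guards against leading dashes
Invoked as ./lookup Los Angeles, "$*" expands to the single string Los Angeles, so multi-word cities match without the user having to quote them.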

How do I grep for entire, possibly wrapped, lines of code?

When searching code for strings, I constantly run into the problem that I get meaningless, context-less results. For example, if a function call is split across 3 lines, and I search for the name of a parameter, I get the parameter on a line by itself and not the name of the function.
For example, in a file containing
...
someFunctionCall ("test",
MY_CONSTANT,
(some *really) - long / expression);
grepping for MY_CONSTANT would return a line that looked like this:
MY_CONSTANT,
Likewise, in a comment block:
/////////////////////////////////////////
// FIXMESOON, do..while is the wrong choice here, because
// it makes the wrong thing happen
/////////////////////////////////////////
Grepping for FIXMESOON gives the very frustrating answer:
// FIXMESOON, do..while is the wrong choice here, because
When there are thousands of hits, single line results are a little meaningless. What I would like to do is have grep be aware of the start and stop points of source code lines, something as simple as having it consider ";" as the line separator would be a good start.
Bonus points if you can make it return the entire comment block if the hit is inside a comment.
I know you can't do this with grep alone. I am also aware of the option to have grep return a certain number of lines of context. Any suggestions on how to accomplish this under Linux? FYI, my preferred languages are C and Perl.
I'm sure I could write something, but I know that somebody must have already done this.
Thanks!
You can use pcregrep with the -M option (multiline matching; pcregrep is grep with Perl-compatible regular expressions). Something like:
pcregrep -M ";*\R*.*thingtosearchfor*\R*.*;.*"
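Applied to the MY_CONSTANT example above (assuming the snippet is saved as file.c), the match runs from just after the previous semicolon through the next one, so the whole statement is printed:
pcregrep -M '[^;]*MY_CONSTANT[^;]*;' file.c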
Here's an example using awk.
$ cat file
blah1
blah2
function1 ("test",
MY_CONSTANT,
(some *really) - long / expression);
function2( one , two )
blah3
blah4
$ awk -vRS=")" '/function1/{gsub(".*function1","function1");print $0RT}' file
function1 ("test",
MY_CONSTANT,
(some *really)
The concept behind it: RS is the record separator. By setting it to ")", every record in your file is separated by ")" instead of by newlines. This makes it easy to find "function1", since you can then "grep" for it within a record. (RT, which holds the text that matched RS, is a GNU awk feature.) If you don't use awk, the same concept can be applied by "splitting" on ")".
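For instance, the same splitting idea without awk, using tr (you lose the ")" that RT re-appends, but the one-record-per-line effect is the same):
tr ')' '\n' < file | grep function1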
You could also write a command line using grep with the options that give you the line number and the filename, pipe those results through xargs into awk to parse the columns, and then use a little script of your own to display the N lines surrounding each hit. :)
If this isn't an academic endeavour, you could just use cscope (for C code only, though). If you are willing to drop the requirement to search in comments, ctags should be enough (and it also supports Perl).
I had a situation in which I had an XML file full of the names of zip files, that is, with angle brackets ("carrots") bracketing the names of the files, say example.zip</stuff>.
I used awk to change all the angle brackets into newlines and then used grep :)
