How to use sed to extract a field from a delimited file - text

I am using CentOS 7 Linux.
I have a text file with a lot of lines in the same format, which is email,password
example:
test#test.com,test
I would like to use sed to keep only test#test.com and remove ,test, which means it will remove everything from the ',' onwards on every line.

Setop's answer is good - in general, using cut or awk is the usual practice when dealing with delimited files.
We can use sed as well, as per your question:
sed -i 's/,.*//' file # changes the file in-place
or, using two steps:
sed 's/,.*//' file > file.modified && mv file.modified file
s/,.*// replaces , and all characters after it with nothing
This can get trickier if you have multiple fields and want a small subset of them, as in the sketch below.
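For example, to keep only the first and third comma-separated fields (assuming a hypothetical data.csv with at least three fields per line), cut stays simple while sed needs capture groups:
cut -d, -f1,3 data.csv                                # fields 1 and 3
sed 's/^\([^,]*\),[^,]*,\([^,]*\).*/\1,\2/' data.csv  # same result with BRE capture groups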

cut -d, -f1 yourfile
or
awk -F, '{print $1}' yourfile

Related

Eliminate multiple space from a file and modify the original file

Specify the command that removes multiple spaces from a text file, leaving a single space in their place. Extra requirement: the original file is to be modified.
I managed to come up with these 3 commands:
awk '{$2=$2};1' filename.txt
tr -s '[:space:]' < filename.txt > filename.new && mv filename.new filename.txt
sed -i 's/\s\+/ /g' filename.txt
I'm not sure if using a temporary file is the best way to do the trick. Is there a more efficient way to solve the problem? It doesn't matter if it is tr / sed / awk or anything else; you can post all of them.
Example input:
I'm    just    giving     spaces
Output:
I'm just giving spaces
Edit: Still looking for more answers
I'd use ed over the non-standard sed -i (and the non-portable RE in your example) if you want to alter the original file:
printf "%s\n" '1,$s/[[:space:]]\{2,\}/ /g' w | ed -s filename.txt
or with perl:
perl -pi -e 's/\s{2,}/ /g' filename.txt
The {2,} regular expression construct (\{2,\} for POSIX Basic Regular Expressions like sed and ed use) matches 2 or more of the previous token.
Both of these match any whitespace characters, not just space, because that's how your examples work. If the goal is to only compress multiple spaces, not spaces + tabs, switch out the [[:space:]] and \s for just a single space.
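For example, a space-only variant of the perl command above might look like this (just a sketch; tabs and other whitespace are left alone):
perl -pi -e 's/ {2,}/ /g' filename.txt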
(Anything that modifies a file "in place", be it ed, sed -i, perl -i, or a regular editor, has a good chance that it's going to be using a temporary file under the hood, by the way. They just handle it for you so you don't have to do it manually like with your tr example.)

Use regex in grep while using two files

I know that you can use regex in grep and use patterns from a file to search another file. But, can you combine these two options?
For example, from the file where the patterns come from (with the -f option to use patterns from a file), I only want to use the first column to search the second file.
I tried this:
grep -E '^(*)\b' -f file_1 file_2 > file_3
To grep the first column from file_1 with the * wildcard, but it is not working. Any ideas?
Grep doesn't use wildcards for patterns, it uses regular expressions, so (*) makes little sense.
If you want to extract the first column from a file, use cut -f1 or awk '{print $1}' (or sed or perl or whatever to extract it), then redirect to grep, using the special file name - (i.e. standard input) as the pattern file:
cut -f1 file_1 | grep -f- file_2 > file_3
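If the first column contains literal strings rather than regular expressions, adding -F avoids surprises with metacharacters; with a shell that supports process substitution (bash/ksh/zsh), the same idea could be written without an explicit pipe:
grep -F -f <(cut -f1 file_1) file_2 > file_3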

Linux command to replace set of lines for a group of files under a directory

I need to replace the first 4 header lines of only 250 selected Erlang files (with extension .erl), but there are 400 Erlang files in total in the directory and its subdirectories; I need to avoid modifying the files which don't need the change.
I have the list of file names that are to be modified, but I don't know how to make my Linux commands use it.
sed -i '1s#.*#%% This Source Code Form is subject to the terms of the Mozilla Public#' *.erl
sed -i '2s#.*#%% License, v. 2.0. If a copy of the MPL was not distributed with this file,#' *.erl
sed -i '3s#.*#%% You can obtain one at http://mozilla.org/MPL/2.0/.#' *.erl
sed -i '4s#.*##' *.erl
In the above commands, instead of passing *.erl I want to pass the list of file names which I need to modify; doing that one by one would take me more than 3 days to complete.
Is there any way to do this?
Iterate over the shortlisted file names using awk and use xargs to execute the sed. You can pass multiple sed commands for a file using the -e option.
awk '{print $1}' your_shortlisted_file_lists | xargs sed -i -e first_sed -e second_sed
xargs appends each file name produced by awk as an argument to the sed command.
Try this:
< file_list.txt xargs -n1 sed -i -e 'first_cmd' -e 'second_cmd' ...
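Filled in with the four substitutions from the question, that might look like the following (a sketch assuming file_list.txt holds one file name per line, with no whitespace in the names):
< file_list.txt xargs -n1 sed -i \
    -e '1s#.*#%% This Source Code Form is subject to the terms of the Mozilla Public#' \
    -e '2s#.*#%% License, v. 2.0. If a copy of the MPL was not distributed with this file,#' \
    -e '3s#.*#%% You can obtain one at http://mozilla.org/MPL/2.0/.#' \
    -e '4s#.*##'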
Not answering your question, but a suggestion for improvement: four sed commands for replacing the header are inefficient. I would instead write the new header into a file, say header, and do the following:
sed -i -e '1,3d' -e '4{r header' -e 'd}' file
This will replace the first four lines of the file with the contents of header.
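Combined with your file list, that could look like this (a sketch assuming header contains the replacement header lines and file_list.txt has one name per line):
while IFS= read -r f; do
    sed -i -e '1,3d' -e '4{r header' -e 'd}' "$f"
done < file_list.txt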
Another concern with your current s### approach is that you have to watch for the special characters \, & and your delimiter # in the replacement text.
You can apply the sed c (for change) command to each file of your list :
while read file; do
sed -i '1,4 c\
%% This Source Code Form is subject to the terms of the Mozilla Public\
%% License, v. 2.0. If a copy of the MPL was not distributed with this file,\
%% You can obtain one at http://mozilla.org/MPL/2.0/.\
' "$file"
done < filelist
Let's say you have a file called file_list.txt with all file names as content:
file1.txt
file2.txt
file3.txt
file4.txt
You can simply read all lines into a variable (here: files) and then iterate through each one:
files=`cat file_list.txt`
for file in $files; do
echo "do something with $file"
done
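Note that the unquoted $files is split on whitespace, so this breaks if any file name contains spaces. A safer sketch reads the list line by line:
while IFS= read -r file; do
    echo "do something with $file"
done < file_list.txt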

grep for a line in a file then remove the line

$ cat example.txt
Yields:
example
test
example
I want to remove the 'test' string from this file.
$ grep -v test example.txt > example.txt
$ cat example.txt
$
The below works, but I have a feeling there is a better way!
$ grep -v test example.txt > example.txt.tmp;mv example.txt.tmp example.txt
$ cat example.txt
example
example
Worth noting that this is going to be on a file with over 10,000 lines.
Cheers
You could use sed,
sed -i '/test/d' example.txt
-i saves the changes to the file itself, so you don't need to use a redirection operator.
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
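Note that /test/d deletes every line containing the string test anywhere in the line; if you only want to drop lines that are exactly test, you could anchor the pattern:
sed -i '/^test$/d' example.txt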
You're doing it the right way but use an && before the mv to make sure the grep succeeded or you'll zap your original file:
grep -F -v test example.txt > example.txt.tmp && mv example.txt.tmp example.txt
I also added the -F options since you said you want to remove a string, not a regexp.
You COULD use sed -i, but then you need to worry about figuring out and/or escaping sed delimiters. sed does not support searching for literal strings, so you'd have to escape every regexp metacharacter in your search string to make sed treat it as a literal character (a process you CANNOT automate, due to the position-sensitive nature of regexp chars). And all it would save you is manually naming your tmp file, since sed uses one internally anyway.
Oh, one other option - you could use GNU awk 4.* with "inplace editing". It also uses a tmp file internally like sed does but it does support string operations so you don't need to try to escape RE metacharacters and it doesn't have delimiters as part of the syntax to worry about:
awk -i inplace -v rmv="test" '!index($0,rmv)' example.txt
Any grep/sed/awk solution will run in the blink of an eye on a 10,000-line file.

Cut and Awk command in linux

How can I extract a word between 2 words in a file using the cut and awk commands?
Let's say I have a file with the below content.
This is my file and it has lots of content along wiht password and want to extract PASSWORD=MYPASSWORDISHERE==and file is ending here.
Expected output, 1) using the awk command in Linux and 2) using the cut command in Linux:
MYPASSWORDISHERE==
Using awk (actually gawk):
awk '{match($0,/PASSWORD=(.*==)/,a); print a[1];}' input.txt
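If gawk's three-argument match() isn't available, a portable awk sketch (assuming the password ends at the first ==, as in the sample line) could be:
awk -F'PASSWORD=' 'NF>1 { sub(/==.*/, "==", $2); print $2 }' input.txt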
Using cut you can try the following; I'm not sure if it works with your file:
cut -d"=" -s -f2,3 --output-delimiter="==" input.txt