Parsing data in a file - Linux

I have a text file with the following type of data in it below:
Example:
10212012115655_113L_-247R_247LRdiff_0;
10212012115657_114L_-246R_246LRdiff_0;
10212012115659_115L_-245R_245LRdiff_0;
10212012113951_319L_-41R_41LRdiff_2;
10212012115701_116L_-244R_244LRdiff_0;
10212012115703_117L_-243R_243LRdiff_0;
10212012115705_118L_-242R_242LRdiff_0;
10212012113947_317L_-43R_43LRdiff_0;
10212012114707_178L_-182R_182LRdiff_3;
10212012115027_278L_-82R_82LRdiff_1;
I would like to:
1) copy all the data lines that end in _1, _2 or _3 into another file, and
2) strip the semicolon from the end of each of those lines.
So at the end the data in the file will be
Example:
10212012113951_319L_-41R_41LRdiff_2
10212012114707_178L_-182R_182LRdiff_3
10212012115027_278L_-82R_82LRdiff_1
How can I go about doing this?
I'm using Linux, Ubuntu 10.04 64-bit.
Thanks

Here's one way using sed:
sed -n 's/\(.*_[123]\);$/\1/p' file.txt > newfile.txt
Here's one way using grep:
grep -oP '.*_(1|2|3)(?=;$)' file.txt > newfile.txt
Contents of newfile.txt:
10212012113951_319L_-41R_41LRdiff_2
10212012114707_178L_-182R_182LRdiff_3
10212012115027_278L_-82R_82LRdiff_1

If the format is always the same and there is only a semicolon at the very end of each line, you can use grep to find the lines and then sed to remove the ;:
grep -P "_(1|2|3);$" your_file | sed 's/\(.*\);$/\1/' > your_new_file
The -P in the grep command tells it to use the Perl-compatible regex engine for parsing. Alternatively, you could use egrep or grep -E (if available).
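For instance, since egrep is equivalent to grep -E (extended regular expressions), the filter-and-strip step can also be sketched like this, on a small hypothetical sample file:

```shell
# Hypothetical sample mirroring the question's data
printf '%s\n' '10212012115655_113L_-247R_247LRdiff_0;' \
              '10212012113951_319L_-41R_41LRdiff_2;' > sample38.txt

# grep -E (the engine egrep uses) selects the _1/_2/_3 lines,
# then sed removes the trailing semicolon
grep -E '_(1|2|3);$' sample38.txt | sed 's/;$//' > new38.txt
```

After this, new38.txt contains only the _2 line, without the semicolon.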

Here is an awk solution, if you are interested:
awk '/_[321];$/{gsub(/;/,"");print}' your_file
tested below:
> awk '/_[321];$/{gsub(/;/,"");print}' temp
10212012113951_319L_-41R_41LRdiff_2
10212012114707_178L_-182R_182LRdiff_3
10212012115027_278L_-82R_82LRdiff_1

tr ";" "\n" < your_file > newfile
grep '_[123]$' newfile > your_new_file
This should work. First tr translates every ; into a newline and saves the result to an intermediate file; since each ; sits at the end of a line, this strips the semicolons (leaving blank lines that the next step discards). Then grep keeps only the lines ending in _[123] and saves them to a new file. Note that the grep output must go to a different file: redirecting back to newfile itself would truncate it before grep reads it. To anchor the match at the end of the line I used $.
Some examples using tr and grep, in case you are not familiar with them.
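For instance, two minimal demonstrations of what each tool does:

```shell
# tr reads stdin and translates characters: here every ';' becomes a newline
printf 'a;b;c' | tr ';' '\n'

# grep prints only the lines that match; $ anchors the pattern to line end
printf 'x_1\nx_0\n' | grep '_[123]$'
```

The first command prints a, b and c on separate lines; the second prints only x_1.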

Related

How to generate a UUID for each line in a file using AWK or SED?

I need to append a UUID (newly generated, unique for each line) to each line of a file. I would prefer to use SED or AWK for this activity and take advantage of the uuidgen executable on my Linux box. I cannot figure out how to generate the UUID for each line and append it.
I have tried:
awk '{print system(uuidgen) $1} myfile.csv
sed -i -- 's/^/$(uuidgen)/g' myfile.csv
And many other variations that didn't work. Can this be done with SED or AWK, or should I be investigating another solution that is not shell script based?
Sincerely,
Stephen.
Using bash, this will create a file outfile.txt with a UUID concatenated to each line:
NOTE: Please run which bash to verify the location of your copy of bash on your system. It may not be located in the same location used in the script below.
#!/usr/local/bin/bash
while IFS= read -r line
do
    uuid=$(uuidgen)
    echo "$line $uuid" >> outfile.txt
done < myfile.txt
myfile.txt:
john,doe
mary,jane
albert,ellis
bob,glob
fig,newton
outfile.txt
john,doe 46fb31a2-6bc5-4303-9783-85844a4a6583
mary,jane a14bb565-eea0-47cd-a999-90f84cc8e1e5
albert,ellis cfab6e8b-00e7-420b-8fe9-f7655801c91c
bob,glob 63a32fd1-3092-4a72-8c24-7b01c400820c
fig,newton 63d38ad9-5553-46a4-9f24-2e19035cc40d
Just tweaking the syntax on your attempt, something like this should work:
awk '("uuidgen" | getline uuid) > 0 {print uuid, $0} {close("uuidgen")}' myfile.csv
For example:
$ cat file
a
b
c
$ awk '("uuidgen" | getline uuid) > 0 {print uuid, $0} {close("uuidgen")}' file
52a75bc9-e632-4258-bbc6-c944ff51727a a
24c97c41-d0f4-4cc6-b0c9-81b6d89c5b77 b
76de9987-a60f-4e3b-ba5e-ae976ab53c7b c
You might be tempted to replace the awk with plain shell commands, but beware: the $(uuidgen) below is expanded once by the shell before xargs ever runs, so every line gets the same UUID:
$ xargs -n 1 printf "%s %s\n" $(uuidgen) < file
763ed28c-453f-47f4-9b1b-b2f972b2cc7d a
763ed28c-453f-47f4-9b1b-b2f972b2cc7d b
763ed28c-453f-47f4-9b1b-b2f972b2cc7d c
Try this (the |& coprocess operator requires GNU awk):
awk '{ "uuidgen" |& getline u; print u, $1}' myfile.csv
If you want to append instead of prepend, change the order of the arguments to print.
Using xargs is simpler here:
paste -d " " myfile.csv <(xargs -I {} uuidgen < myfile.csv)
The -I {} makes xargs run uuidgen once per line of myfile.csv; uuidgen itself takes no file arguments, so each input line simply drives one invocation.
You can use paste and GNU sed:
paste <(sed 's/.*/uuidgen/e' file) file
This uses the GNU execute extension e to generate a UUID per line, then paste pastes the text back together. Use the -d paste flag to change the delimiter from the default tab, to whatever you want.
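The -d flag itself can be sketched on two tiny files (hypothetical names and contents):

```shell
printf 'a\nb\n' > left108.txt
printf '1\n2\n' > right108.txt

# -d switches paste's delimiter from the default tab to a comma
paste -d "," left108.txt right108.txt
```

This prints a,1 and b,2, one pair per line.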

How to remove lines from text file not starting with certain characters (sed or grep)

How do I delete all lines in a text file which do not start with the characters #, & or *? I'm looking for a solution using sed or grep.
Deleting lines:
With grep
From http://lowfatlinux.com/linux-grep.html :
The grep command selects and prints lines from a file (or a bunch of files) that match a pattern.
I think you can do something like this:
grep -v '^[#&*]' yourFile.txt > output.txt
(Inside a bracket expression the characters don't need escaping; a backslash there is matched literally, so [\#\&\*] would also match \.)
You can also use sed to do the same thing (check http://lowfatlinux.com/linux-sed.html ):
sed '/^[#&*]/d' yourFile.txt > output.txt
It's up to you to decide
Filtering lines:
My mistake, I understood you wanted to delete the lines. But if you want to "delete" all other lines (or filter the lines starting with the specified characters), then grep is the way to go:
grep '^[#&*]' yourFile.txt > output.txt
sed -n '/^[#&*].*/p' input.txt > output.txt
This should work.
sed -ni '/^[#&*].*/p' input.txt
This one will edit the input file directly, so be careful.
egrep '^(&|#|\*)' input.txt > output.txt
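Any of these keep-filters can be checked on a small sample file (hypothetical contents):

```shell
# Three lines that should be kept, one that should be dropped
printf '%s\n' '# comment' '& ampersand' '* star' 'plain text' > input129.txt

# Keep only the lines starting with #, & or *
grep '^[#&*]' input129.txt > output129.txt
```

output129.txt ends up with three lines; 'plain text' is gone.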

How to remove a special character in a string in a file using linux commands

I need to remove the character : from a file. Ex: I have numbers in the following format:
b3:07:4d
I want them to be like:
b3074d
I am using the following command:
grep ':' source.txt | sed -e 's/://' > des.txt
I am new to Linux. The file is quite big and I want to make sure I'm using the right command.
You can do without the grep:
sed -e 's/://g' source.txt > des.txt
The -i option edits the file in place.
sed -i 's/://' source.txt
The first part isn't right, as it will completely omit lines which don't contain a : at all.
Below is untested but should be right. The g at the end of the regex means global, i.e. replace every occurrence on the line:
sed -e 's/://g' source.txt > out.txt
Updated to the better syntax from Jon Lin's answer, but you still want the /g, I would think.
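The difference the /g makes can be seen on one of the question's strings: without it, sed replaces only the first match on each line.

```shell
# Without /g only the first ':' goes away; with /g they all do
echo 'b3:07:4d' | sed 's/://'     # b307:4d
echo 'b3:07:4d' | sed 's/://g'    # b3074d
```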

Bash - How to remove all white spaces from a given text file?

I want to remove all the white spaces from a given text file.
Is there any shell command available for this ?
Or, how to use sed for this purpose?
I want something like below:
$ cat hello.txt | sed ....
I tried this: cat hello.txt | sed 's/ //g'.
But it removes only spaces, not tabs.
Thanks.
$ man tr
NAME
       tr - translate or delete characters
SYNOPSIS
       tr [OPTION]... SET1 [SET2]
DESCRIPTION
       Translate, squeeze, and/or delete characters from standard input, writing to standard output.
In order to wipe all whitespace including newlines you can try:
cat file.txt | tr -d " \t\n\r"
You can also use the character classes defined by tr (credits to htompkins comment):
cat file.txt | tr -d "[:space:]"
For example, in order to wipe just horizontal white space:
cat file.txt | tr -d "[:blank:]"
Much simpler, in my opinion:
sed -r 's/\s+//g' filename
I think you can use sed to wipe out the whitespace without losing other information, such as the line breaks:
cat hello.txt | sed '/^$/d;s/[[:blank:]]//g'
To apply into existing file, use following:
sed -i '/^$/d;s/[[:blank:]]//g' hello.txt
Try this:
sed -e 's/[\t ]//g;/^$/d'
(found here)
The first part removes all tabs (\t) and spaces, and the second part removes all empty lines
If you want to remove ALL whitespace, even newlines:
perl -pe 's/\s+//g' file
This answer is similar to the others; however, as some people have been complaining that the output goes to STDOUT, I am just going to suggest writing the result back over the original file. I would never normally suggest this, but sometimes quick and dirty works. Note that cat file.txt | tr -d " \t\n\r" > file.txt would not work: the shell truncates file.txt before cat gets to read it. Go through a temporary file instead:
tr -d " \t\n\r" < file.txt > file.txt.tmp && mv file.txt.tmp file.txt
Easiest way for me (note that this removes only spaces, not tabs):
echo "Hello my name is Donald" | sed 's/ //g'
This is probably the simplest way of doing it:
sed -r 's/\s+//g' filename > output
mv ouput filename
Dude, just run python test.py in your terminal.
f = open('/home/hduser/Desktop/data.csv', 'r')
x = f.read().split()   # split() breaks on any whitespace: spaces, tabs and newlines
f.close()
y = ''.join(x)         # join with an empty separator so no whitespace is reintroduced
f = open('/home/hduser/Desktop/data.csv', 'w')
f.write(y)
f.close()
Try this:
tr -d " \t" <filename
See the manpage for tr(1) for more details.
hmm...seems like something on the order of sed -e "s/[ \t\n\r\v]//g" < hello.txt should be in the right ballpark (seems to work under cygwin in any case).

Print the last line of a file, from the CLI

How to print just the last line of a file?
$ cat file | awk 'END{print}'
Originally answered by Ventero
Use the right tool for the job. Since you want to get the last line of a file, tail is the appropriate tool for the job, especially if you have a large file. Tail's file processing algorithm is more efficient in this case.
tail -n 1 file
If you really want to use awk,
awk 'END{print}' file
EDIT: the tail -1 file form is deprecated; POSIX specifies tail -n 1 file.
Is it a must to use awk for this? Why not just use tail -n 1 myFile?
To find the last line of a file:
Using sed (the stream editor): sed -n '$p' fileName
Using tail: tail -n 1 fileName
Using awk: awk 'END { print }' fileName
You can achieve this using sed as well. However, I personally recommend using tail or awk.
Anyway, if you wish to do it with sed, here are two ways:
Method 1:
sed '$!d' filename
Method 2:
sed -n '$p' filename
Here, filename is the name of the file that has data to be analysed.
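All of these approaches can be checked against a small file; a sketch:

```shell
printf 'first\nmiddle\nlast\n' > data226.txt

sed '$!d' data226.txt    # delete ($!d) every line that is not the last ($)
sed -n '$p' data226.txt  # suppress default output (-n), print (p) only the last line
tail -n 1 data226.txt
```

All three commands print just the line "last".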
