Delete empty lines in csv file - linux

I have a file with 4 million lines. Every line ends with the char $, but I mistakenly added an extra newline after the line delimiter while scraping a website, so right now it looks like this:
first name, last name, phone, address, postal code, city, region,$
$
first name, last name, phone, address, postal code, city, region,$
$
The newline (shown as '$') of course only appears when I use :set list, but I'm trying to use this file for a bulk insert into MySQL and I'm having problems with it now.
I would like to change the file to:
first name, last name, phone, address, postal code, city, region,$
first name, last name, phone, address, postal code, city, region,$
How can I do this with sed or awk, or even vi? I've looked around and what I found doesn't really apply to this case.
Please disregard the extra empty line shown above.
Thanks in advance

To remove blank lines with sed:
sed -i '/^$/d' yourfile.csv
To remove lines consisting of a single literal $:
sed -i '/^\$$/d' yourfile.csv
Most versions of sed support the -i switch; if yours does not you will need e.g. sed '/^$$/d' yourfile.csv > newfile.csv.
Removing blank lines that contain whitespace is more complicated. This usually works:
sed '/^ *$/d' yourfile.csv
If this is not sufficient, check for tabs as well. For older versions of sed, this will work:
sed '/^[ X]*$/d' yourfile.csv
where X here is a literal tab, entered via Control-V Tab.
Newer versions of sed accept [ \t\r]*, \s*, or [[:space:]]*, sometimes requiring the -E switch.
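Before deleting anything in place, it can be worth counting how many blank or whitespace-only lines the file actually contains; a quick check, assuming your grep supports POSIX character classes (most do):
grep -c '^[[:space:]]*$' yourfile.csv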

grep can filter lines by match (or negative match) against a regex. To exclude empty lines:
grep -v '^$' yourfile.csv > yourfile_fixed.csv

Here are your options:
With awk:
awk 'NF' file > tmp && mv tmp file
With sed (in-place changes so make sure to backup your file using -i.bak):
sed -i '/^$/d' file
With vi:
:g/^$/d
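If some of those blank lines contain stray spaces or tabs, a slightly broader pattern in vi removes them as well:
:g/^\s*$/d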


Bash deleting a specific row in .dat file

So, I have this assignment which requires me to delete a certain line from a .dat file. Now my file is basically a phone book. I have a Bash script that adds the ID, name, last name, phone number, address, etc., to the .dat file. Now one of the flags is supposed to be -delete and it takes the parameter id. So, basically I need to implement the function where I'd put ./phonebook.sh -delete -id 7 and it would delete the row where the id is 7.
I tried using sed and awk, but nothing is working and it's frustrating. My current code for that short script (delete.sh) is:
id=$1
sed "/$id/d" phonebook.dat
Try this:
On Mac:
sed -i '' -e "/$id/d" phonebook.dat
Otherwise:
sed -i -e "/$id/d" phonebook.dat
By default, sed writes its results to stdout. So your command was working, but the output wasn't going back into the file. The -i flag says that the file should be replaced with the results. -i can also take a suffix so that the original file is backed up. For example:
sed -i .bk -e "/$id/d" phonebook.dat
The above will create a copy of the original called phonebook.dat.bk. However, to do an in-place replacement without a backup, you can specify no value for -i. On the Mac, sed really wants a value, so you can pass it an empty string (making sure there is a space between the -i and the empty quotes). GNU sed, by contrast, requires the suffix to be attached directly, as in -i.bk.
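If the script has to run unchanged on both macOS and Linux, writing to a temporary file and moving it back sidesteps the -i incompatibility entirely (a sketch; phonebook.tmp is just a scratch name):
id="$1"
# delete matching lines, then replace the original file with the result
sed "/$id/d" phonebook.dat > phonebook.tmp && mv phonebook.tmp phonebook.dat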
I'm making some assumptions because I don't know what the format of your dat file is. I'll assume that the id field is the first field and the file is comma delimited. If I'm wrong, you should be able to modify the below to fit your needs.
I personally like to use grep -v for this problem. From the --help:
-v, --invert-match select non-matching lines
Running this will output every line of a file that does not match your pattern.
id="$1"
grep -v "^${id}," phonebook.dat > phonebook.temp
mv phonebook.temp phonebook.dat
The pattern consists of
^: Beginning of the line
${id}: Your variable
,: Our assumed delimiter
The reason for matching from the beginning of the line up to the first delimiter is to avoid deleting entries where the entered id ($1) is a substring of other ids. You wouldn't want to enter 22 and delete id 22 as well as id 122.
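If you would rather compare the id field exactly instead of relying on the regex anchor, an awk version under the same comma-delimited, id-first assumption is a reasonable alternative (a sketch, not tested against your real file):
id="$1"
# print every line whose first comma-separated field is not the given id
awk -F, -v id="$id" '$1 != id' phonebook.dat > phonebook.temp
mv phonebook.temp phonebook.dat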

How to remove blank space between some words using sed?

I want to replace the characters between specific words in a line (across multiple lines). For example:
first second third | first line
first second third | second line
first second third | third line
first second third | forth line
....
I want to replace the characters between third and first/second/third/fourth etc. using sed or vi in Linux.
If this question is already answered, can you please provide me the link?
Thanks!
You can use the following:
sed 's/ |.[^a-z]*//g' text.txt
or if you want to have a space after 'third':
sed 's/ |.[^a-z]*/ /g' text.txt
Remember the -i flag if you want to make the changes permanent.
sed -i 's/\ /whatever/g' ej.txt
-i: in place, meaning the changes are made directly in the file
s: the substitute command
'\ ': an escaped blank space (the character being matched)
g: apply to all matches on each line
Try this
sed 's/second third[^a-zA-Z]*/second third/g' file
It will replace everything between third and the next letter. If it works, add -i if you want to modify the original file.
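If you would rather keep a single space between third and the word that follows, the same pattern with a trailing space in the replacement should do it (a sketch, run against the sample lines above):
sed 's/second third[^a-zA-Z]*/second third /g' file
On the first sample line this yields first second third first line.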

Filter out only matched values from a text file in each line

I have a file "test.txt" with the lines below and also lot bunch of extra stuff after the "version"
soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0
soainfra_metrics{metric_group="sca_composite",partition="gello",is_active="true",state="on",is_default="true",composite="test234"} map:stats version:1.8
soainfra_metrics{metric_group="sca_composite",partition="bolo",is_active="true",state="on",is_default="true",composite="3415"} map:stats version:3.1
soainfra_metrics{metric_group="sca_composite",partition="solo",is_active="true",state="on",is_default="true",composite="hji"} map:stats version:1.1
I tried:
egrep -r 'partition|is_active|state|is_default|composite' test.txt
It's displaying every full line, but I need only the specific fields mentioned below, ignoring the rest of the data and lines.
In a nutshell, I want to display only these fields from each line, not the rest:
partition="test",is_active="true",state="on",is_default="true",composite="test123"
partition="gello",is_active="true",state="on",is_default="true",composite="test234"
partition="bolo",is_active="true",state="on",is_default="true",composite="3415"
partition="solo",is_active="true",state="on",is_default="true",composite="hji"
If your version of grep supports Perl-style regular expressions, then I'd use this:
grep -oP '.*?,\K[^}]+' file
It skips everything up to the first comma (\K discards what was matched so far from the output) and prints everything up to the }.
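If your grep lacks -P, and assuming the fields you want always start at partition= and run up to the closing brace, a plain -o match is usually enough:
grep -o 'partition=[^}]*' test.txt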
Alternatively, using awk:
awk -F'}' '{ sub(/[^,]+,/, ""); print $1 }' file
This sets the field separator to } so the part you're interested in is the first field. It then uses sub to remove the part up to the first comma.
For completeness, you could also use sed:
sed 's/[^,]*,\([^}]*\).*/\1/' file
This captures the part after the first , up to the } and replaces the content of the line with it.
After the grep to pick out the lines you want, use sed to edit the lines:
sed 's/.*\(partition[^}]*\)} map.*/\1/'
This means: "whenever you see anything .*, followed by partition and
any number of non-}, then } map and anything else, grab the part
from partition up to but not including the brace \(...\) as group 1.
The replacement text is just group 1 \1.
Use a pipe | to connect the output of egrep to the input of sed:
egrep ... | sed ...
As far as I understood, your file might have more lines you don't want to see, so I would use:
sed -n 's/.*\(partition.*\)}.*/\1/p' file
We use -n together with the p flag to show only the lines where a substitution was made. The substitution itself captures the part of the line you need and replaces the whole line with it.
This might work for you (GNU sed):
sed -r 's/(partition|is_active|state|is_default|composite)="[^"]*"/\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1,/g;s/,$//' file
Treat the problem as if it were a "decomposed club sandwich". Identify the fillings, remove the bread and tidy up.

Linux/Unix Replacing a pattern in a string and saving to a new file with sed

I have a task: replace a specific pattern in a string.
So far I have tried commands like sed -e 's/text_to_find/text_to_replace/g' file,
but I don't know why it changed the whole string, not just the part I wanted to change.
What I want to do is add Tomas_proxy.lt to every string that contains the word china.
To make it very clear, what I am looking for, there is file I am using:
987173,businesswirechina.com
988254,chinacfa.com
988808,1012china.com
989146,chinawise.ru
989561,chinaretailnews.com
989817,mobileinchina.cn
990894,cmt-china.com.cn
990965,chinajoy.net
992753,octaviachina.com
993238,chinadftzalex.com
993447,china-kena.com
And this is what I would like to see in the new file:
987173,Tomas_proxy.lt/businesswirechina.com
988254,Tomas_proxy.lt/chinacfa.com
988808,Tomas_proxy.lt/1012china.com
989146,Tomas_proxy.lt/chinawise.ru
989561,Tomas_proxy.lt/chinaretailnews.com
989817,Tomas_proxy.lt/mobileinchina.cn
990894,Tomas_proxy.lt/cmt-china.com.cn
990965,Tomas_proxy.lt/chinajoy.net
992753,Tomas_proxy.lt/octaviachina.com
993238,Tomas_proxy.lt/chinadftzalex.com
993447,Tomas_proxy.lt/china-kena.com
P.S. This is just an example file. In the real file I am using, not every line has the word china; there are 100000 lines, and let's say about 500 contain china.
You can try this sed command
sed 's/,\(.*china\)/,Tomas_proxy.lt\/\1/' FileName
or
sed 's/,\(.*china\)/,Tomas_proxy.lt\/\1/' FileName > NewFile
or
sed -i.bak 's/,\(.*china\)/,Tomas_proxy.lt\/\1/' FileName
sed '/[Cc]hina/s/,/,Tomas_proxy.lt\//' File > New_File
In all the lines matching china / China (change if you don't want case check), replace the first , with ,Tomas_proxy.lt/. Output redirected to New_File.
If you want the changes to be made in the same file, use -i (the in-place option):
sed -i '/[Cc]hina/s/,/,Tomas_proxy.lt\//' File
Here is an awk version:
awk '/china/ {sub(/,/,"&Tomas_proxy.lt/")} 1' file
987173,Tomas_proxy.lt/businesswirechina.com
988254,Tomas_proxy.lt/chinacfa.com
988808,Tomas_proxy.lt/1012china.com
989146,Tomas_proxy.lt/chinawise.ru
989561,Tomas_proxy.lt/chinaretailnews.com
989817,Tomas_proxy.lt/mobileinchina.cn
990894,Tomas_proxy.lt/cmt-china.com.cn
990965,Tomas_proxy.lt/chinajoy.net
992753,Tomas_proxy.lt/octaviachina.com
993238,Tomas_proxy.lt/chinadftzalex.com
993447,Tomas_proxy.lt/china-kena.com
Search for china; if found, replace the first , with ,Tomas_proxy.lt/, then print all lines.
sed '/china/ s#,#,Tomas_proxy.lt/#' YourFile
Based on your sample, and assuming the first , is the place to insert your text in each line.
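Before overwriting anything with -i, a quick count of the lines that would be touched is a cheap sanity check (this uses the same case-sensitive china match as above):
grep -c china YourFile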

Replace whitespace with a comma in a text file in Linux

I need to edit a few text files (an output from sar) and convert them into CSV files.
I need to change every whitespace character (maybe it's a tab between the numbers in the output) to a comma, using sed or awk (an easy shell script in Linux).
Can anyone help me? Every command I used didn't change the file at all; I tried gsub.
tr ' ' ',' <input >output
This substitutes each space with a comma. If you need to, you can add a pass with the -s flag (squeeze repeats), which replaces each input sequence of a repeated character listed in SET1 (the blank space) with a single occurrence of that character.
Squeeze repeats can also be used to collapse runs of tabs before substituting them:
tr -s '\t' <input | tr '\t' ',' >output
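Since sar output usually separates its columns with runs of spaces, a single tr call that both translates and squeezes may be all that is needed (a sketch; sar.txt and sar.csv are placeholder names):
tr -s ' ' ',' < sar.txt > sar.csv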
Try something like:
sed -E 's/[[:space:]]+/,/g' orig.txt > modified.txt
The character class [[:space:]] will match all whitespace (spaces, tabs, etc.). If you just want to replace a single character, e.g. just the space, use that only. Note that + requires extended regular expressions (-E); in basic sed syntax it would be \+ or \{1,\}.
EDIT: Actually [[:space:]] includes carriage return, so this may not do what you want. The following will replace only tabs and spaces:
sed -E 's/[[:blank:]]+/,/g' orig.txt > modified.txt
as will (with GNU sed, which understands \t inside a bracket expression)
sed -E 's/[\t ]+/,/g' orig.txt > modified.txt
In all of this, you need to be careful that the items in your file that are separated by whitespace don't contain their own whitespace that you want to keep, e.g. two words.
Without looking at your input file, this is only a guess:
awk '{$1=$1}1' OFS=","
Redirect the output to another file and rename as needed.
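For instance, piping a sar-style sample line through it (hypothetical values) collapses the whitespace-separated fields into a comma-separated record:
echo '12:00:01 AM  all  1.23  0.00' | awk '{$1=$1}1' OFS=','
# prints: 12:00:01,AM,all,1.23,0.00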
What about something like this :
cat texte.txt | sed -e 's/\s/,/g' > texte-new.txt
(Yes, with some useless catting and piping; you could also use < to read from the file directly, I suppose -- I used cat first to output the content of the file, and only afterwards added sed to my command line.)
EDIT: as @ghostdog74 pointed out in a comment, there's definitely no need for the cat/pipe; you can give the name of the file to sed:
sed -e 's/\s/,/g' texte.txt > texte-new.txt
If "texte.txt" is this way :
$ cat texte.txt
this is a text
in which I want to replace
spaces by commas
You'll get a "texte-new.txt" that looks like this:
$ cat texte-new.txt
this,is,a,text
in,which,I,want,to,replace
spaces,by,commas
I wouldn't just replace the old file with the new one (that could be done with sed -i, if I remember correctly; and as @ghostdog74 said, it can create the backup on the fly): keeping the original might be wise, as a safety measure (even if it means having to rename it to something like "texte-backup.txt").
This command should work:
sed "s/\s/,/g" < infile.txt > outfile.txt
Note that you have to redirect the output to a new file. The input file is not changed in place.
sed can do this:
sed 's/[\t ]/,/g' input.file
That will send the output to the console, whereas
sed -i 's/[\t ]/,/g' input.file
will edit the file in place.
Here's a Perl script which will edit the files in-place:
perl -i.bak -lpe 's/\s+/,/g' files*
Consecutive whitespace is converted to a single comma.
Each original input file is saved as a .bak copy.
These command-line options are used:
-i.bak edit in-place and make .bak copies
-p loop around every line of the input file, automatically print the line
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
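If the conversion turns out wrong, the .bak copies make it easy to roll back; a small restore loop, assuming a POSIX shell:
for f in *.bak; do mv "$f" "${f%.bak}"; done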
If you want to replace an arbitrary sequence of blank characters (tab, space) with one comma, use the following:
sed -E 's/[\t ]+/,/g' input_file > output_file
or
sed -r 's/[[:blank:]]+/,/g' input_file > output_file
If some of your input lines include leading space characters which are redundant and don't need to be converted to commas, you first need to get rid of them and then convert the remaining blank characters to commas. For such a case, use the following:
sed -E 's/^ +//' input_file | sed -E 's/[\t ]+/,/g' > output_file
This worked for me.
sed -e 's/\s\+/,/g' input.txt >> output.csv
