delete ';' at the end of each line - linux

I have a huge (10+ GB) .csv file on a Linux server. The lines look something like this:
6;20000327;20000425;990099,0;20000327;LL;UBXO;7;-1;62;F;30;001;NO;NO;wgB;0;99;0002;5530;001;708;196;1;AA;N;N;100;53,81;0;0;0;1;1;;1;
6;20000327;20000425;990099,0;20000425;LL;OLD*;62;62;92;F;30;001;NO;NO;ueB;0;99;0002;XXXX;001;;;1;AA;N;N;;;0;0;1;0;0;;30;
I am searching for a fast script to do the following:
change any occurrence of <number>,<number> to <number>.<number>
delete the last semicolon of each line
I especially have problems with the second one, because the script should work whether the file has Linux or Windows line endings.
I tried to do it with sed but have failed so far.
[edit]
I finally used a mix of the solutions from Dennis Williams and SiegeX:
sed 's/;\([0-9]*\),\([0-9]*\);/;\1.\2;/g;s/;\(\r\?\)$/\1/' inputfile
(the part with s/;[[:blank:]]*$// didn't work on my file...)

sed 's/;\([0-9]*\),\([0-9]*\);/;\1.\2;/g;s/;[[:blank:]]*$//' ./infile

$ cat file
6;20000327;20000425;990099,0;20000327;LL;UBXO;7;-1;62;F;30;001;NO;NO;wgB;0;99;0002;5530;001;708;196;1;AA;N;N;100;53,81;0;0;0;1;1;;1;
6;20000327;20000425;990099,0;20000425;LL;OLD*;62;62;92;F;30;001;NO;NO;ueB;0;99;0002;XXXX;001;;;1;AA;N;N;;;0;0;1;0;0;;30;
$ perl -p -e 's/(\d+),(\d+)/$1.$2/g; s/;$//' file
6;20000327;20000425;990099.0;20000327;LL;UBXO;7;-1;62;F;30;001;NO;NO;wgB;0;99;0002;5530;001;708;196;1;AA;N;N;100;53.81;0;0;0;1;1;;1
6;20000327;20000425;990099.0;20000425;LL;OLD*;62;62;92;F;30;001;NO;NO;ueB;0;99;0002;XXXX;001;;;1;AA;N;N;;;0;0;1;0;0;;30
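Note: $ matches just before the trailing newline, so s/;$// works on files with Unix line endings. For files with Windows (CRLF) line endings, use s/;(\r?)$/$1/ instead, which keeps the carriage return.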

Give this a try:
sed 's/,/./g;s/;\r\?$//' inputfile
To preserve the carriage return if it's there:
sed 's/,/./g;s/;\(\r\?\)$/\1/' inputfile
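To check how the trailing-semicolon removal behaves on both kinds of line endings, here is a small sketch. It assumes GNU sed (for \r and \?), uses two made-up one-line samples, and the helper name strip_last_semicolon is purely illustrative:

```shell
# Two throwaway samples: one with Unix (LF) and one with Windows (CRLF) line endings.
lf_file=$(mktemp)
crlf_file=$(mktemp)
printf '1;53,81;;30;\n'   > "$lf_file"
printf '1;53,81;;30;\r\n' > "$crlf_file"

# Hypothetical helper: turn ;<number>,<number>; into ;<number>.<number>; and
# drop the last semicolon, keeping a carriage return if one is present.
# \r and \? are GNU sed extensions.
strip_last_semicolon() {
  sed 's/;\([0-9]*\),\([0-9]*\);/;\1.\2;/g;s/;\(\r\?\)$/\1/' "$1"
}

lf_out=$(strip_last_semicolon "$lf_file")
crlf_out=$(strip_last_semicolon "$crlf_file")

# od -c makes the surviving carriage return visible in the CRLF result.
printf '%s\n' "$lf_out"
printf '%s\n' "$crlf_out" | od -c

rm -f "$lf_file" "$crlf_file"
```

On a 10+ GB file it is probably worth trying this on the first few thousand lines (head -n 5000 inputfile) before running it on the whole thing.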

If you are handy with perl, you can use a perl one-liner to do these things. Here's an example of how you might do the number change:
perl -i -pe 's/(\d),(\d)/$1.$2/g' yourfile
Be very careful with the -i option, as it causes perl to modify the existing file in place. Use -i.bak instead to keep a backup copy of the original.

Related

remove \n and keep space in linux

I have a file that contains a \n hidden at the end of each line:
input:
s3741206\n
s2561284\n
s4411364\n
s2516482\n
s2071534\n
s2074633\n
s7856856\n
s11957134\n
s682333\n
s9378200\n
s1862626\n
I want to remove the trailing \n.
desired output:
s3741206
s2561284
s4411364
s2516482
s2071534
s2074633
s7856856
s11957134
s682333
s9378200
s1862626
However, I tried this:
tr -d '\n' < file1 > file2
but the output ends up on a single line, without spaces or newlines:
s3741206s2561284s4411364s2516482s2071534s2074633s7856856s11957134s682333s9378200s1862626
I also tried sed $'s/\n//g' -i file1, and it doesn't work on macOS.
Thank you.
This is a possible solution using sed:
sed 's/\\n/ /g'
With awk:
awk '{sub(/\\n/,"")} 1' < file1 > file2
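As a quick check of the idea, here is a minimal sketch. It assumes the file really contains the two characters backslash and n at the end of each line (not a newline character), and the sample data is made up from the question:

```shell
# Sample file whose lines end in a literal backslash followed by "n"
# (two ordinary characters), then a real newline.
sample=$(mktemp)
printf '%s\n' 's3741206\n' 's2561284\n' 's4411364\n' > "$sample"

# Remove the literal "\n" only where it ends a line; real newlines stay,
# so each ID remains on its own line.
cleaned=$(sed 's/\\n$//' "$sample")
printf '%s\n' "$cleaned"

rm -f "$sample"
```

Anchoring the pattern with $ avoids touching a backslash-n sequence that happens to appear in the middle of a line.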
What you are describing so far in your question+comments doesn't make sense. How can you have a multi-line file with a hidden newline character at the end of each line? What you show as your input file:
s3741206\n
s2561284\n
s4411364\n
etc.
where, according to your comment, each "\n" above is a single newline character, is impossible. If those "\n"s were newline characters then your file would simply look like:
s3741206
s2561284
s4411364
etc.
There are really only 2 possibilities I can think of:
You are wrongly interpreting what you are seeing in your input file and/or using the wrong terminology, and you actually DO have \r\n at the end of every line. Run cat -v file to see the \rs as ^Ms and run dos2unix or similar (e.g. sed 's/\r$//' file) to remove the \rs. You do not want to remove the \ns, or you will no longer have a POSIX text file, and POSIX tools will exhibit undefined behavior when run on it. If that doesn't work for you, then copy/paste the output of cat -v file into your question so we can see for sure what is in your file.
Or:
It's also entirely possible that your file is a perfectly fine POSIX text file as-is and you are incorrectly assuming you will have a problem for some reason. So also include in your question a description of the actual problem you are having, an example of the command you are executing on that input file, the output you are getting, and the output you expected to get.
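The first possibility can be checked without dos2unix. The following is a rough sketch on a made-up two-line sample; it builds a literal carriage return with printf so the patterns do not depend on \r support in grep or sed:

```shell
# Sample file with Windows (CRLF) line endings.
crlf_file=$(mktemp)
unix_file=$(mktemp)
printf 's3741206\r\ns2561284\r\n' > "$crlf_file"

# A literal carriage return, so the patterns below do not rely on \r support.
cr=$(printf '\r')

# Count the lines that end in a carriage return.
# (grep -c exits non-zero when the count is 0, hence the "|| true".)
before=$(grep -c "${cr}\$" "$crlf_file" || true)

# Strip the carriage return from the end of each line; newlines are untouched.
sed "s/${cr}\$//" "$crlf_file" > "$unix_file"
after=$(grep -c "${cr}\$" "$unix_file" || true)

echo "lines ending in CR: before=$before after=$after"
rm -f "$crlf_file" "$unix_file"
```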
You could use bash-native string substitution:
$ cat /tmp/newline
s3741206\n
s2561284\n
s4411364\n
s2516482\n
s2071534\n
s2074633\n
s7856856\n
s11957134\n
s682333\n
s9378200\n
s1862626\n
$ for LINE in $(cat /tmp/newline); do echo "${LINE%\\n}"; done
s3741206
s2561284
s4411364
s2516482
s2071534
s2074633
s7856856
s11957134
s682333
s9378200
s1862626

Why do SED, GREP or AWK fail to remove blank lines from text files?

I am trying to remove blank lines from large text files. For some reason it seems that neither
sed "/^$/d" file.txt > trimmed.txt
nor
grep -v "^$" file.txt > trimmed.txt
nor
awk /./ file.txt > trimmed.txt
do anything. Any thoughts?
UPDATE
Thanks to the great comments by @fedorqui and @Sebastian Stigler, the problem was quickly identified as DOS/Windows carriage returns (^M$) at the end of each line.
While I appreciate Sebastian's suggestion to reformat the files using dos2unix, I would rather have a solution using the tools generally available in most Linux distributions.
The solution that worked for me was an answer given by @Jeremy Stein to the question "Can't remove empty lines with sed regex":
sed -n '/[!-~]/p' file.txt > trimmed.txt
I just tried the commands with a toy example, and they work fine as long as file.txt has Unix newlines. If the file contains Windows newlines, then none of the commands are able to remove the blank lines.
You can use the dos2unix linux tool to convert the newlines in file.txt to unix newlines. If you need the output on a windows system then you can use unix2dos to convert trimmed.txt into a file with windows newlines.
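If converting the file is not an option, one possible alternative is to match on the [[:space:]] character class, which includes the carriage return. This is only a sketch on a made-up sample; note that it also drops lines containing nothing but spaces or tabs, which may or may not be what you want:

```shell
# Sample with Windows line endings: the "blank" second line is really a lone \r.
sample=$(mktemp)
printf 'first\r\n\r\nsecond\r\n' > "$sample"

# '^$' misses the blank line because it still contains a carriage return.
naive_count=$(grep -v '^$' "$sample" | wc -l)

# [[:space:]] includes the carriage return, so whitespace-only lines are dropped
# whichever line-ending style the file uses. The \r on kept lines is preserved.
robust_count=$(grep -v '^[[:space:]]*$' "$sample" | wc -l)

echo "lines kept: naive=$naive_count robust=$robust_count"
rm -f "$sample"
```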

find matching text and replace next line

I'm trying to find a line in a file and replace the next line with a specific value. I tried sed, but it seems to not like the \n. How else can this be done?
The file looks like this:
<key>ConnectionString</key>
<string>anything_could_be_here</string>
And I'd like to change it to this
<key>ConnectionString</key>
<string>changed_value</string>
Here's what I tried:
sed -i '' "s/<key>ConnectionString<\/key>\n<string><\/string>/<key>ConnectionString<\/key>\n<string>replaced_text<\/string>/g" /path/to/file
One way:
Sample file
$ cat file
Cygwin
Unix
Linux
Solaris
AIX
Using sed, replacing the next line after the pattern 'Unix' with 'hi':
$ sed '/Unix/{n;s/.*/hi/}' file
Cygwin
Unix
hi
Solaris
AIX
For your specific question:
$ sed '/<key>ConnectionString<\/key>/{n;s/<string>.*<\/string>/<string>NEW STRING<\/string>/}' your_file
<key>ConnectionString</key>
<string>NEW STRING</string>
This might work for you (GNU sed):
sed '/<key>ConnectionString<\/key>/!b;n;c<string>changed_value</string>' file
!b negates the preceding address (regexp) and branches to the end of the script for every line that does not match, so no further commands run on it. n prints the current line and then reads the next one into the pattern space. c changes the current line to the string following the command.
It works. Additionally, it is worth mentioning that if you write
sed '/<key>ConnectionString<\/key>/!b;n;n;c<string>changed_value</string>' file
with two n's, it replaces the second line after the match, and so forth.
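To see the n-then-substitute approach on something closer to the question's file, here is a small sketch. The extra <key>Other</key> pair is invented to show that only the line following the ConnectionString key is changed; the commands are put on separate lines so the script should also be accepted by non-GNU sed:

```shell
# Made-up sample: one unrelated key/value pair, then the pair we want to change.
sample=$(mktemp)
printf '%s\n' \
  '<key>Other</key>' \
  '<string>untouched</string>' \
  '<key>ConnectionString</key>' \
  '<string>anything_could_be_here</string>' > "$sample"

# On the line matching the key: print it, load the next line (n),
# then rewrite whatever sits between <string> and </string>.
result=$(sed '/<key>ConnectionString<\/key>/{
n
s/<string>.*<\/string>/<string>changed_value<\/string>/
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```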

file edit- commandline unix

I want to edit a file from the command line, because opening it in vim or other editors takes forever (a large file). I want to add a string ('chr') to the beginning of every line that is not commented out with a #. The command I am using is this:
cat '/home/me/37.vcf' | sed s/^/chr/>'sp.vcf'
But it adds a chr to the beginning of EVERY line and a > to the END of every line. I don't want either of those things to occur.
Can anyone offer any suggestions to improve my results?
To apply the substitution to only the lines that don't start with a #:
sed '/^[^#]/s/^/chr/' file > output
Note: the command cat is for concatenating files, it is useless here.
You have a syntax error in your sed command. Use this syntactically correct sed command (the \1 puts back the first character that the group matched):
sed -E 's/^([^#]|$)/chr\1/' /home/me/37.vcf > sp.vcf
OR on Linux:
sed -r 's/^([^#]|$)/chr\1/' /home/me/37.vcf > sp.vcf
This might work for you (GNU sed):
sed '/^\s*#/!s/^/chr/' file > new_file
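A quick way to convince yourself before running this on the large file is to try it on a tiny sample. The sketch below uses invented VCF-like lines and the plain /^#/ address, which assumes comment lines start with # in the first column:

```shell
# Made-up sample: two comment/header lines and two data lines.
sample=$(mktemp)
printf '%s\n' '##fileformat=VCFv4.1' '#CHROM POS' '1 12345' 'X 67890' > "$sample"

# For every line that does NOT start with '#', insert "chr" at the beginning.
prefixed=$(sed '/^#/!s/^/chr/' "$sample")

printf '%s\n' "$prefixed"
rm -f "$sample"
```

Because sed streams the input line by line, this should behave the same on a very large file, without loading it into memory.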

Replacing a line in a csv file?

I have a set of 10 CSV files, which normally have entries of this kind:
a,b,c,d
d,e,f,g
Now, due to some error, entries in these files have become like this:
a,b,c,d
d,e,f,g
,,,
h,i,j,k
Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.
Can you recommend a command that replaces the erroneous lines in all the files?
It depends on what you mean by replace. If you mean 'remove', then a trivial variant on @wnoise's solution is:
grep -v '^,,,$' old-file.csv > new-file.csv
Note that this deletes just those lines with exactly three commas. If you want to delete mal-formed lines with any number of commas (including zero) - and no other characters on the line, then:
grep -v '^,*$' ...
There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.
Check out Mastering Regular Expressions.
sed 's/,,,/replacement/' < old-file.csv > new-file.csv
optionally followed by
mv new-file.csv old-file.csv
Replace or remove, your post is not clear... For replacement see wnoise's answer. For removing, you could use
awk '$0 !~ /,,,/ {print}' <old-file.csv > new-file.csv
What about keeping only the lines that match the desired format, instead of handling one exception?
If the provided input is what you really want to match:
grep -E '[a-z],[a-z],[a-z],[a-z]' < oldfile.csv > newfile.csv
If the input is different, provide it, the regular expression should not be too hard to write.
Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:
sed -i -e '/^,\+$/ D' yourfile1.csv yourfile2.csv ...
To replace: well, see wnoise's answer, or if you don't want to create new files with the output,
sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...
or
sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...
(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:
egrep -v '^,+$' < oldfile.csv > newfile.csv
I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.
EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile ) and the \+ modifier (which can be replaced with \{1,\} ).
Most simply:
$ grep -v ,,, oldfile > newfile
$ mv newfile oldfile
Yes, awk and grep are very good options if you are working on a Linux platform. On other platforms you can use Perl regular expressions, with join and split.
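Since the question is about a set of files, the removal can be wrapped in a loop. The following is only a sketch: the directory and file names are invented for the demonstration, and it uses the portable \{1,\} form instead of the GNU-only \+:

```shell
# Throwaway directory with two made-up CSV files containing comma-only lines.
workdir=$(mktemp -d)
printf 'a,b,c,d\nd,e,f,g\n,,,\nh,i,j,k\n' > "$workdir/one.csv"
printf 'a,b,c,d\n,,,\n' > "$workdir/two.csv"

for f in "$workdir"/*.csv; do
  # Drop lines made up only of commas. Write to a temporary file first and
  # replace the original only if grep succeeded (grep exits non-zero when
  # nothing is left to print, in which case the file is left unchanged).
  grep -v '^,\{1,\}$' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

one=$(cat "$workdir/one.csv")
two=$(cat "$workdir/two.csv")
printf '%s\n' "$one"
rm -rf "$workdir"
```

If the files have Windows line endings, the pattern would need to allow for a trailing carriage return as well.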

Resources