Entering text in a file at specific locations by identifying whether a number is integer or real in Linux

I have an input like below
46742 1 48276 48343 48199 48198
46744 1 48343 48344 48200 48199
46746 1 48344 48332 48201 48200
48283 3.58077402e+01 -2.97697746e+00 1.50878647e+02
48282 3.67231688e+01 -2.97771595e+00 1.50419488e+02
48285 3.58558188e+01 -1.98122787e+00 1.50894850e+02
Each segment where the 2nd entry is an integer (like 1) runs for thousands of lines, and then the segment where the 2nd entry is a real number (like 3.58077402e+01) starts.
Before anything begins I have to insert text like
*Revolved
*Gripped
*Crippled
46742 1 48276 48343 48199 48198
46744 1 48343 48344 48200 48199
46746 1 48344 48332 48201 48200
*Cracked
*Crippled
48283 3.58077402e+01 -2.97697746e+00 1.50878647e+02
48282 3.67231688e+01 -2.97771595e+00 1.50419488e+02
48285 3.58558188e+01 -1.98122787e+00 1.50894850e+02
so I need to enter specific texts at those locations. It is worth mentioning that the file is space delimited, not tab delimited, and that the text starting with * has to be at the very left of the line with no leading spaces. The format of the rest of the file should be kept too.
Any suggestions with sed or awk would be highly appreciated!
The text at the beginning can be entered directly, so that is not the main problem since it is the start of the file; the problematic part is the second bunch of lines, i.e. detecting that the second entry has turned into a real number.

An awk with fixed strings:
awk 'BEGIN{print "*Revolved\n*Gripped\n*Crippled"}
match($2,/\+/) && !pr{print "*Cracked\n*Crippled"; pr=1}1' yourfile
match($2,"\+")&&!pr : When + char is found at $2 field(real number) and pr flag is null.

Related

Split a big text file into multiple smaller ones based on a regex parameter

I have a large text file looking like:
....
sdsdsd
..........
asdfhjgjksdfk dfkaskk sdkfk skddkf skdf sdk ssaaa akskdf sdksdfsdf ksdf sd kkkkallwow.
sdsdllla lsldlsd lsldlalllLlsdd asdd. sdlsllall asdsdlallOEFOOASllsdl lsdlla.
slldlllasdlsd.ss;sdsdasdas.
......
ddss
................
asdfhjgjksdfk ddjafjijjjj.dfsdfsdfsdfsi dfodoof ooosdfow oaosofoodf aosolflldlfl , dskdkkfkdsa asddf;akkdfkdkk . sdlsllall asdsdlallOEFOOASllsdl lsdlla.
slldlllasdlsd.ss;sdsdasdas.
.....
xxxx
.......
asdfghjkl
I want to split the text file into multiple small text files and save them as .txt on my system at occurrences of ..... [multiple period markers], saved like
group1_sdsdsd.txt
....
sdsdsd
..........
asdfhjgjksdfk dfkaskk sdkfk skddkf skdf sdk ssaaa akskdf sdksdfsdf ksdf sd kkkkallwow.
sdsdllla lsldlsd lsldlalllLlsdd asdd. sdlsllall asdsdlallOEFOOASllsdl lsdlla.
slldlllasdlsd.ss;sdsdasdas.
group1_ddss.txt
ddss
................
asdfhjgjksdfk ddjafjijjjj.dfsdfsdfsdfsi dfodoof ooosdfow oaosofoodf aosolflldlfl , dskdkkfkdsa asddf;akkdfkdkk . sdlsllall asdsdlallOEFOOASllsdl lsdlla.
slldlllasdlsd.ss;sdsdasdas.
and
group1_xxxx.txt
.....
xxxx
.......
asdfghjkl
I have figured that by using a regex something like the following can be done
txt = re.sub(r'(([^\w\s])\2+)', r' ', txt).strip()  # collapse runs of a repeated non-word character
but I am not able to figure it out completely.
The saved text files should be named group1_sdsdsd.txt, group1_ddss.txt and group1_xxxx.txt [group1 being the identifier for the specific big text file, as I have multiple bigger text files and need to do the same on all of them to know which big text file I am splitting].
If you want to get the parts where the multiple dots are on a line by themselves and capture the separate parts, you might use a pattern like:
^\.{3,}\n(\S+)\n\.{3,}(?:\n(?!\.{3,}\n\S+\n\.{3,}).*)*
Explanation
^ Start of string
\.{3,}\n Match 3 or more dots and a newline
(\S+)\n Capture 1+ non-whitespace chars in group 1 for the filename and match a newline
\.{3,} Match 3 or more dots
(?: Non-capturing group to repeat as a whole part
\n Match a newline
(?!\.{3,}\n\S+\n\.{3,}) Negative lookahead, assert that from the current position we are not looking at a pattern that matches the dots with a filename in between
.* Match the whole line
)* Close the non-capturing group and optionally repeat it
Then you can use re.finditer to loop the matches, and use the group 1 value as part of the filename.
See a regex demo and a Python demo with the separate parts.
Example code
import re

pattern = r"^\.{3,}\n(\S+)\n\.{3,}(?:\n(?!\.{3,}\n\S+\n\.{3,}).*)*"
s = "....your data here"
your_path = "/your/path/"

# Loop over the matches and use the group 1 value as part of the filename
for match in re.finditer(pattern, s, re.MULTILINE):
    with open(your_path + "group1_{}.txt".format(match.group(1)), "w") as f:
        f.write(match.group())

Sort a list in Python 3

I would like to order this list.
From:
01104D-BB'42
01104D-BB42
01104D-BB43
01104D-CC'42
01104D-CC'72
01104D-CC32
01104D-CC42
01104D-CC62
01104D-CC72
01104D-DD'74
01104D-DD'75
01104D-DD'76
01104D-DD'77
01104D-DD'78
01104D-DD75
01104D-DD76
01104D-DD77
01104D-DD78
01104D-EE'102
01104D-EE'12
01104D-EE'2
01104D-EE'32
01104D-EE'42
01104D-EE'52
01104D-EE'53
01104D-EE'72
01104D-EE'82
01104D-EE'92
01104D-EE102
01104D-EE12
01104D-EE2
01104D-EE3
01104D-EE32
01104D-EE42
01104D-EE52
01104D-EE62
01104D-EE72
01104D-EE82
01104D-EE83
01104D-EE92
01104D-EE93
To:
01104D-BB42
01104D-BB43
01104D-BB'42
01104D-CC32
01104D-CC42
01104D-CC62
01104D-CC72
01104D-CC'42
01104D-CC'72
01104D-DD75
01104D-DD76
01104D-DD77
01104D-DD78
01104D-DD'74
01104D-DD'75
01104D-DD'76
01104D-DD'77
01104D-DD'78
01104D-EE102
01104D-EE12
01104D-EE2
01104D-EE3
01104D-EE32
01104D-EE42
01104D-EE52
01104D-EE62
01104D-EE72
01104D-EE82
01104D-EE83
01104D-EE92
01104D-EE93
01104D-EE'102
01104D-EE'12
01104D-EE'2
01104D-EE'32
01104D-EE'42
01104D-EE'52
01104D-EE'53
01104D-EE'72
01104D-EE'82
01104D-EE'92
Can you help me?
thanks
I'm guessing here, because you haven't explained how you want the sort to be done. But it looks like you want the character ' to sort after the digits 0-9, and the ASCII sort order puts it before the digits. If that is correct, then you need to substitute a different character for '. A good choice might be ~ because it is the last printable ASCII character.
If your data is in mylist, then
mylist.sort(key=lambda a: a.replace("'","~"))
will sort it in the order I'm guessing you want.
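A minimal sketch with a few of the values from the question:
mylist = ["01104D-BB'42", "01104D-BB42", "01104D-BB43", "01104D-CC'42", "01104D-CC32"]
mylist.sort(key=lambda a: a.replace("'", "~"))
print(mylist)
# ['01104D-BB42', '01104D-BB43', "01104D-BB'42", '01104D-CC32', "01104D-CC'42"]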

Ignore the comment sign (%) in m-file within a string

In my code I have the following line:
fprintf(logfile,'Parameters: Size: %d\tH: %.4f\tF: %.1f\tI: %.3f\tR: %d\tSigma: %d\tDisp: %.1f\r\n',parameter_sets(ps,:));
which is too long, so I want to break it into:
fprintf(logfile,'Parameters: Size: %d\tH: %.4f\tF: %.1f\tI: %.3f\tR: ...
%d\tSigma: %d\tDisp: %.1f\r\n',parameter_sets(ps,:));
However, since the break is within a string, MATLAB sees the formatting %d sign in the second line as the start of a comment and ignores this line (and produces an error...).
So I tried to make it clearer with a [] that wraps the string:
fprintf(logfile,['Parameters: Size: %d\tH: %.4f\tF: %.1f\tI: %.3f\tR: ...
%d\tSigma: %d\tDisp: %.1f\r\n'],parameter_sets(ps,:));
but it did not help; it still interprets the second line as a comment. I also tried with and without the ellipsis (...) in different places, with no success.
So how can I write a line in a formatted way (i.e. a reasonable length) if it has a % sign in it?
Divide it into two lines like this:
fprintf(logfile,['Parameters: Size: %d\tH: %.4f\tF: %.1f\tI: %.3f\tR: ', ...
    '%d\tSigma: %d\tDisp: %.1f\r\n'],parameter_sets(ps,:));
% note the closing apostrophe and comma (',) before the ellipsis (...) at the end of the first line
% and the opening apostrophe (') at the start of the second line

awk - insert a row with specific text at a specific position

I have a file where the first couple of rows start with a # mark, and then the classical netlist follows, in which there can also be rows beginning with a # mark. I need to insert one row with the text protect between the block of initial rows beginning with # and the first row of the classical netlist. At the end of the file I need to insert a row with the word unprotect. It would be good to save this modified text to a new file with a specific name, because the original file is protected.
Sample file:
// Generated for: spectre
// Design library name: Kovi
// Design cell name: T_Line
// Design view name: schematic
simulator lang=spectre
global 0
parameters frequency=3.8G Zo=250
// Library name: Kovi
// Cell name: T_Line
// View name: schematic
T8 (7 0 6 0) tline z0=Zo f=3.8G nl=0.5 vel=1
T7 (net034 0 net062 0) tline z0=Zo f=3.8G nl=0.5 vel=1
T5 (net021 0 4 0) tline z0=Zo f=3.8G nl=0.5 vel=1
T4 (net019 0 2 0) tline z0=Zo f=3.8G nl=0.5 vel=1
How about sed
sed -e '/^#/,/^#/!iprotect'$'\n''$aunprotect'$'\n' input_file > new_file
Inserts 'protect' on a line by itself after the first block of commented lines, then adds 'unprotect' at the end.
Note: Because I use $'\n' in place of a literal newline, bash is assumed as the shell.
Since you awk'd the post
awk 'BEGIN{ protected=""} { if($0 !~ /#/ && !protected){ protected="1"; print "protect";} print $0}END{print "unprotect";}' input_file > output_file
As soon as a row is detected without # as the first non-whitespace character, it will output a line with protect. At the end it will output a line with unprotect.
Test file
#
#
#
#Preceded by a tab
begin protect
#
before unprotect
Result
#
#
#
#Preceded by tab
protect
begin protect
#
before unprotect
unprotect
Edit:
Removed the [:space:]* as it seems that is already handled by default.
Support //
If you wanted to support both # and // in the same script, the regex portion would change to /#|\//. The special character / has to be escaped by using \.
This would check for at least one /.
Adding a quantifier {2} will match // exactly: /#|\/{2}/
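For example, the awk line from above with that change applied looks like this (a sketch, assuming an awk where the {2} interval expression is supported, e.g. gawk):
awk 'BEGIN{ protected=""} { if($0 !~ /#|\/{2}/ && !protected){ protected="1"; print "protect";} print $0}END{print "unprotect";}' input_file > output_file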

Add a number to each line of a file in bash

I have some files with some lines in Linux like:
2013/08/16,name1,,5000,8761,09:00,09:30
2013/08/16,name1,,5000,9763,10:00,10:30
2013/08/16,name1,,5000,8866,11:00,11:30
2013/08/16,name1,,5000,5768,12:00,12:30
2013/08/16,name1,,5000,11764,13:00,13:30
2013/08/16,name2,,5000,2765,14:00,14:30
2013/08/16,name2,,5000,4765,15:00,15:30
2013/08/16,name2,,5000,6765,16:00,16:30
2013/08/16,name2,,5000,12765,17:00,17:30
2013/08/16,name2,,5000,25665,18:00,18:30
2013/08/16,name2,,5000,45765,09:00,10:30
2013/08/17,name1,,5000,33765,10:00,11:30
2013/08/17,name1,,5000,1765,11:00,12:30
2013/08/17,name1,,5000,34765,12:00,13:30
2013/08/17,name1,,5000,12765,13:00,14:30
2013/08/17,name2,,5000,1765,14:00,15:30
2013/08/17,name2,,5000,3765,15:00,16:30
2013/08/17,name2,,5000,7765,16:00,17:30
My column separator is "," and in the third column (currently empty, hence ,,) I need the entry number within the same day. For example, with date
2013/08/16 I have 11 lines and with date 2013/08/17 I have 7 lines, so I need to add the numbers, for example:
2013/08/16,name1,1,5000,8761,09:00,09:30
2013/08/16,name1,2,5000,9763,10:00,10:30
2013/08/16,name1,3,5000,8866,11:00,11:30
2013/08/16,name1,4,5000,5768,12:00,12:30
2013/08/16,name1,5,5000,11764,13:00,13:30
2013/08/16,name2,6,5000,2765,14:00,14:30
2013/08/16,name2,7,5000,4765,15:00,15:30
2013/08/16,name2,8,5000,6765,16:00,16:30
2013/08/16,name2,9,5000,12765,17:00,17:30
2013/08/16,name2,10,5000,25665,18:00,18:30
2013/08/16,name2,11,5000,45765,09:00,10:30
2013/08/17,name1,1,5000,33765,10:00,11:30
2013/08/17,name1,2,5000,1765,11:00,12:30
2013/08/17,name1,3,5000,34765,12:00,13:30
2013/08/17,name1,4,5000,12765,13:00,14:30
2013/08/17,name2,5,5000,1765,14:00,15:30
2013/08/17,name2,6,5000,3765,15:00,16:30
2013/08/17,name2,7,5000,7765,16:00,17:30
I need to do it in bash. How can I do it?
This one's good too; it keeps a counter per date (the first field) and substitutes the empty third field (,,) with it:
awk -F, 'sub(/,,/, ","++a[$1]",")1' file
Output:
2013/08/16,name1,1,5000,8761,09:00,09:30
2013/08/16,name1,2,5000,9763,10:00,10:30
2013/08/16,name1,3,5000,8866,11:00,11:30
2013/08/16,name1,4,5000,5768,12:00,12:30
2013/08/16,name1,5,5000,11764,13:00,13:30
2013/08/16,name2,6,5000,2765,14:00,14:30
2013/08/16,name2,7,5000,4765,15:00,15:30
2013/08/16,name2,8,5000,6765,16:00,16:30
2013/08/16,name2,9,5000,12765,17:00,17:30
2013/08/16,name2,10,5000,25665,18:00,18:30
2013/08/16,name2,11,5000,45765,09:00,10:30
2013/08/17,name1,1,5000,33765,10:00,11:30
2013/08/17,name1,2,5000,1765,11:00,12:30
2013/08/17,name1,3,5000,34765,12:00,13:30
2013/08/17,name1,4,5000,12765,13:00,14:30
2013/08/17,name2,5,5000,1765,14:00,15:30
2013/08/17,name2,6,5000,3765,15:00,16:30
2013/08/17,name2,7,5000,7765,16:00,17:30
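A variant that does the same thing by assigning the counter directly to the third field (assuming the empty field should simply be overwritten) is:
awk -F, -v OFS=, '{$3=++cnt[$1]}1' file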
