I have an input file and would like to do a search/replace that multiplies the numbers in it, then write the result to an output file. How do I do that in Tcl?
All the digits need to be multiplied by a multiplier of 10.
It needs to look for SECTION and END SECTION, find the word 'shape', and multiply all the digits by 10.
Input File
heading
size 9 XY 9
section1
shape name 1 2 3 4
end section1
section 2
shape name 1 2 3 4
end section2
Output file:
heading
size 90 XY 90
section1
shape name 5 10 15 20
end section1
section 2
shape name 100 200 300 400
end section2
tcl
set multiplier1 10
set multiplier2 5
set multiplier3 100
while {[gets $infile1 value] >= 0} {
    if {[regexp "size" $value]} {
    }
}
Firstly, you'd be much better off defining the multipliers as an array. Using variable-named variables is usually a bad idea (unless you're about to upvar them). Also, remember that they're associative arrays, so you can use any string as an index and not just numbers; that's sometimes useful.
set multiplier(1) 10
set multiplier(2) 5
set multiplier(3) 100
Secondly, doing the multiplications for a line of numbers is best with a helper procedure:
proc ApplyMultiplies {line multiplier} {
    set NUMBER_RE {-?\d+}
    # For all locations of numbers, in *reverse* order
    foreach location [lreverse [regexp -all -indices -inline -- $NUMBER_RE $line]] {
        # Get the number
        set value [string range $line {*}$location]
        # Multiply it
        set value [expr {$value * $multiplier}]
        # Write it back into the string
        set line [string replace $line {*}$location $value]
    }
    return $line
}
Testing that interactively:
% ApplyMultiplies {shape name 1 2 3 4} 5
shape name 5 10 15 20
% ApplyMultiplies "tricky_case\"123 yo" 17
tricky_case"2091 yo
In Tcl 8.7, you'll instead be able to do this as a one-liner because of the new -command option to regsub:
proc ApplyMultiplies {line multiplier} {
    regsub -all -command -- {-?\d+} $line [list ::tcl::mathop::* $multiplier]
}
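For readers more at home in Python, the same idea (let the regex engine call a function for every match) can be sketched with re.sub. This is only an illustration for comparison; the name apply_multiplier is made up here and is not part of the Tcl answer.

```python
import re

def apply_multiplier(line, multiplier):
    """Multiply every (optionally negative) integer found in line."""
    # re.sub calls the lambda once per match and splices in its return value.
    return re.sub(r"-?\d+", lambda m: str(int(m.group()) * multiplier), line)

print(apply_multiplier("shape name 1 2 3 4", 5))
```

Because the replacement is computed per match, there is no need to walk the match positions in reverse as the index-based Tcl version does.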
I do not understand the conditions under which you are deciding whether to apply the operation. Are the indices to multiplier meant to be section names, but are somehow a bit off? Why are we multiplying values on the size line? Without understanding that, writing the outer control code is impossible for me.
I want to print out a sort of pyramid. User inputs an integer value 'i', and that is displayed i-times.
Like if input=5
1
22
333
4444
55555
I have tried this:
input=5
for i in range(input+1):
    print("i"*i)
    i=i+1
The result of which is
i
ii
iii
iiii
iiiii
The problem is that (as far as I know), only a string can be printed out 'n' times, but if I take out the inverted commas around "i", it becomes (i*i) and gives out squares:
0
1
4
9
16
25
Is there a simple way around this?
Thanks!
Just convert your int loop variable to str before building the output string by multiplying:
input = 5
for i in range(1, input+1):
    print(str(i) * i)
Try this:
a = 5
for i in range(a): # <-- this causes i to go from 0,1,2,3,...,a-1
    print("{}".format(i+1)*(i+1)) # <-- this creates a new string in each iteration; an alternative would be str(i+1)*(i+1)
    i=i+1 # <-- this is unnecessary, i already goes from 0 to a-1 and will be re-created in the next iteration of the loop.
This creates a new string in each iteration of the loop.
Note that for i in range(a) will go through the range by itself. There is no need to additionally increment i at the end. In general it is considered bad practice to change the indices you loop over.
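As a small sketch of the point about range, starting the range at 1 removes both the manual increment and the +1 offsets. The function name pyramid is just an illustrative choice.

```python
def pyramid(n):
    """Return the rows '1', '22', '333', ... up to n as a list of strings."""
    # range(1, n + 1) yields 1..n, so the loop variable is never modified by hand.
    return [str(i) * i for i in range(1, n + 1)]

print("\n".join(pyramid(5)))
```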
I am a fresh learner of Tcl and I faced an issue of understanding this whole concept:
<name of variable> set [split "[string repeat "-,-," [columns]]-",]
columns is a variable with value 6;
How the split will be and which is my whole string?
Thank you all
<name of variable> set [split "[string repeat "-,-," [columns]]-",]
You have to unpack Tcl commands from the inside out because the inner-most nested brackets are executed first.
columns is a proc that, hopefully, returns an integer.
then string repeat repeats "-,-," that many times.
then the double quoted string adds a trailing -
then split should split that "-,-,-,...-" string on commas, resulting in a list of "2 * columns + 1" hyphens.
Except:
there is a missing space before the last comma in the split command
the set command looks like: set varname value (unless you're dealing with an object)
set <name of variable> [split "[string repeat "-,-," [columns]]-" ,]
# ...............................................................^
Demonstrating:
set columns 6
proc columns {} {return $::columns}
set result [split "[string repeat "-,-," [columns]]-" ,]
puts $result
puts [llength $result] ;# should be 13
- - - - - - - - - - - - -
13
You could achieve the same result with:
set result [lrepeat [expr {2 * [columns] + 1}] "-"]
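If it helps to see the inside-out evaluation as separate steps, here is a rough Python rendering of the same three operations. It is a sketch for illustration only; columns is hard-coded to 6, as stated in the question.

```python
columns = 6  # assumed value, taken from the question

repeated = "-,-," * columns   # corresponds to [string repeat "-,-," [columns]]
full = repeated + "-"         # the double-quoted string appends a trailing hyphen
result = full.split(",")      # corresponds to [split ... ,]

print(full)
print(len(result))
```

Each intermediate name holds what the corresponding nested bracket returns before the next command sees it.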
Tcl is actually a very simple language. The entire syntax only has 12 rules: https://www.tcl.tk/man/tcl8.6/TclCmd/Tcl.htm
I have a file with the line: 1 2 3 4 5 10. When I add this line to a set in Python, I get {1,2,3,4,5,0} instead of {1,2,3,4,5,10}. How do I code this so that I get the 10 inside the set, instead of it being recognized as a 1 and a 0?
EDIT: This was the code I wrote:
states = set()
line = open("filepath", "r").readlines()[0]
states.add(line)
print (states)
Input file content:
1 2 3 4 5 10
A set cannot contain the same element twice. If the set is built from the characters of the line (for example with set(line)), every character becomes an element, so the 1 that belongs to 10 collapses into the existing 1 and only the 0 shows up as a new element.
Do something like this to fix it (strip removes the trailing newline character, if there is one):
line = open("filepath", "r").readlines()[0]
line = line.strip().split(' ') # Split by space; line is now a list of strings
number_set = set(line) # One element per token, so '10' stays together
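A self-contained sketch of the difference, using a literal string in place of the file so it can be run as-is. Converting the tokens to int is an extra step assumed here, on the guess that numbers rather than strings are wanted in the set.

```python
line = "1 2 3 4 5 10\n"  # stand-in for the first line of the file

# Iterating over a string yields single characters, including ' ' and '\n'.
chars = set(line)

# split() with no argument splits on any whitespace and drops the newline.
states = {int(token) for token in line.split()}

print(states)
```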
I have a data frame with character strings in column1 and ID in column2. The string contains A,T,G or C.
I would like to print the lines that have an A at position 1.
Then I would like to print the lines that have A at position 2 and so on and save them in separate files.
So far I have used biostrings in R for similar analysis, but it won't work for this problem exactly. I would like to use perl.
Sequence ID
TATACAAGGGCAAGCTCTCTGT mmu-miR-381-3p
TCGGATCCGTCTGAGCT mmu-miR-127-3p
ATAGTAGACCGTATAGCGTACG mmu-miR-411-5p
......
600 more lines
Biostrings will work perfectly, and will be pretty fast. Let's call your DNA stringset mydata
HasA <- sapply(mydata,function(x) as.character(x[2]) == "A")
Now you have a vector of TRUE or FALSE indicating which sequence has an A at position 2. You can make that into a nice data frame like this
HasA.df <- data.frame("SeqName" = names(mydata), "A_at_2" = HasA)
Not sure about the expected result,
mydata <- read.table(text="Sequence ID
TATACAAGGGCAAGCTCTCTGT mmu-miR-381-3p
TCGGATCCGTCTGAGCT mmu-miR-127-3p
ATAGTAGACCGTATAGCGTACG mmu-miR-411-5p",sep="",header=T,stringsAsFactors=F)
mCh <- max(nchar(mydata[,1])) #gives the maximum number of characters in the first column
sapply(seq(mCh), function(i) substr(mydata[,1],i,i)=="A") #gives the index
You can use which to get the index of the row that satisfies the condition for each position
res <- stack(setNames(sapply(seq(mCh),
function(i) which(substr(mydata[,1],i,i)=="A")),1:mCh))[,2:1]
tail(res, 5) #for the 13th position, 1st and 3rd row of the sequence are TRUE
ind values
#11 13 1
#12 13 3
#13 14 2
#14 15 3
#15 20 3
use the index values to extract the rows. For the 1st position
mydata[res$values[res$ind==1],]
# Sequence ID
# 3 ATAGTAGACCGTATAGCGTACG mmu-miR-411-5p
Using a perl one-liner
perl -Mautodie -lane '
BEGIN {($f) = @ARGV}
next if $. == 1;
my @c = split //, $F[0];
for my $i (grep {$c[$_] eq "A"} (0..$#c)) {
open my $fh, ">>", "$f.$i";
print $fh $_;
}
' file
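The grouping logic can also be sketched in Python. This is a hedged illustration, not a translation of the one-liner: it uses the three sample rows from the question inline, collects IDs in a dictionary instead of writing files, and counts positions from 1 (the Perl file suffixes above count from 0). The name group_by_position is made up for this example.

```python
from collections import defaultdict

# Sample rows copied from the question: (sequence, ID).
rows = [
    ("TATACAAGGGCAAGCTCTCTGT", "mmu-miR-381-3p"),
    ("TCGGATCCGTCTGAGCT", "mmu-miR-127-3p"),
    ("ATAGTAGACCGTATAGCGTACG", "mmu-miR-411-5p"),
]

def group_by_position(rows, base="A"):
    """Map each 1-based position to the IDs whose sequence has base there."""
    groups = defaultdict(list)
    for seq, seq_id in rows:
        for pos, ch in enumerate(seq, start=1):
            if ch == base:
                groups[pos].append(seq_id)
    return dict(groups)

groups = group_by_position(rows)
for pos in sorted(groups):
    print(pos, groups[pos])
```

Writing each group to its own file would then be a loop over groups.items().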
I have a table with 2 columns. In column 1, I have a string information, in column 2, I have a logical index
%% Tables and their use
T={'A2P3';'A2P3';'A2P3';'A2P3 with (extra1)';'A2P3 with (extra1) and (extra 2)';'A2P3 with (extra1)';'B2P3';'B2P3';'B2P3';'B2P3 with (extra 1)';'A2P3'};
a={1 1 0 1 1 0 1 1 0 1 1 }
T(:,2)=num2cell(1);
T(3,2)=num2cell(0);
T(6,2)=num2cell(0);
T(9,2)=num2cell(0);
T=table(T(:,1),T(:,2));
class(T.Var1);
class(T.Var2);
T.Var1=categorical(T.Var1)
T.Var2=cell2mat(T.Var2)
class(T.Var1);
class(T.Var2);
if T.Var1=='A2P3' & T.Var2==1
disp 'go on'
else
disp 'change something'
end
UPDATES:
I will update this section as soon as I know how to copy my workspace into a code format
** still don't know how to do that but here it goes
*** why working with tables is a double edged sword (but still cool): I have to be very aware of the class inside the table to refer to it in an if else construct, here I had to convert two columns to categorical and to double from cell to make it work...
Here is what my data looks like:
I want to have this:
if T.Var1=='A2P3*************************' & T.Var2==1
disp 'go on'
else
disp 'change something'
end
I managed to get MATLAB to do what I want, but the whole point of this post is: how do I tell MATLAB to ignore whatever comes after A2P3 in the string, given that the string length is variable? Otherwise it would be very tiring to look up every single variant that follows A2P3 (and B2P3 etc.) just to say that.
How do I do that?
Assuming you are working with T (cell array) as listed in your code, you may use this code to detect the successful matches -
%%// Slightly different than yours
T={'A2P3';'NotA2P3';'A2P3';'A2P3 with (extra1)';'A2P3 with (extra1) and (extra 2)';'A2P3 with (extra1)';'B2P3';'B2P3';'NotA2P3';'B2P3 with (extra 1)';'A2P3'};
a={1 1 0 1 1 0 1 1 0 1 1 }
T(:,2)=num2cell(1);
T(3,2)=num2cell(0);
T(6,2)=num2cell(0);
T(9,2)=num2cell(0);
%%// Get the comparison results
col1_comps = ismember(char(T(:,1)),'A2P3') | ismember(char(T(:,1)),'B2P3');
comparisons = ismember(col1_comps(:,1:4),[1 1 1 1],'rows').*cell2mat(T(:,2))
One quick solution would be to make a function that takes 2 strings and checks whether the first one starts with the second one.
Later Edit:
The function will look like this:
for i = 0; i < second string's length; i = i + 1
    if the first string's character at index i doesn't equal the second string's character at index i
        return false
after the loop, return true
This assumes the second string is never longer than the first. Otherwise, call the function with the arguments swapped.
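The pseudocode above could look roughly like this in Python (a sketch, not MATLAB code). Two deviations are assumptions of this example: it returns False when the prefix is longer than the text rather than swapping the arguments, and the sample data is copied from the question. In practice Python's built-in str.startswith does the same job.

```python
def starts_with(text, prefix):
    """Return True if text begins with prefix, comparing character by character."""
    if len(prefix) > len(text):
        return False  # assumption: a longer prefix can never match
    for i in range(len(prefix)):
        if text[i] != prefix[i]:
            return False
    return True

# Sample data from the question: names and their logical flags.
names = ["A2P3", "A2P3", "A2P3", "A2P3 with (extra1)",
         "A2P3 with (extra1) and (extra 2)", "A2P3 with (extra1)",
         "B2P3", "B2P3", "B2P3", "B2P3 with (extra 1)", "A2P3"]
flags = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# True where the name starts with A2P3 and the flag is set.
mask = [starts_with(n, "A2P3") and f == 1 for n, f in zip(names, flags)]
for name, ok in zip(names, mask):
    print(name, "->", "go on" if ok else "change something")
```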