Merge 2 lists in Python - python-3.x

List 1 = ['_','_','_','a','_']
List 2 = ['d','_','_','_','_']
I'd like to merge two lists of identical length where:
Alphabets in List 2 must replace special characters in List 1 but
Special characters in List 2 must not replace alphabets in List 1.
The merged list would look like this:
Merged List = ['d','_','_','a','_']
Any tips on the fastest way to accomplish this would be much appreciated!

You can use enumerate and a list comprehension:
mergedlist = [c if c.isalnum() else list2[i] for i, c in enumerate(list1)]
NB: I used isalnum to distinguish letters from "special" characters. Depending on your definition of what makes a character special, you may need a different function for that test.
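As a minimal sketch of the note above, the test for "special" can be passed in as a parameter so it is easy to swap. The names merge_lists and is_special are hypothetical, chosen only for this illustration, and the default test assumes that anything non-alphanumeric counts as special.

```python
def merge_lists(list1, list2, is_special=lambda ch: not ch.isalnum()):
    """Keep list1's item unless it is special; in that case take list2's item."""
    if len(list1) != len(list2):
        raise ValueError("lists must have the same length")
    # zip pairs the items positionally, so no explicit index is needed
    return [b if is_special(a) else a for a, b in zip(list1, list2)]


list1 = ['_', '_', '_', 'a', '_']
list2 = ['d', '_', '_', '_', '_']
merged = merge_lists(list1, list2)
print(merged)  # ['d', '_', '_', 'a', '_']
```

A different definition of "special" only needs a different predicate, for example is_special=lambda ch: ch == '_'.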

Related

TCL, extract 2 integers from string into list?

I have 2 strings formatted as such:
(1234, 4567)
And I have a list
points {0 1 2 4}
I would like to extract the 2 integers from the first string and use them to replace the first two integers in the list, then extract the two integers from the 2nd string and use them to replace the 3rd and 4th integers in the list, so that at the end I have a list of 4 integers taken from the two strings.
So far I have tried all kinds of things but always end up with errors or with brackets in the list, which I do not want. I feel I am missing the easy way to do this.
With the first set of values, you can parse with scan or regexp; in this case, I think scan looks better:
set input "(1234, 5678)"
scan $input "(%d,%d)" a b
To update a Tcl list (formally, one in a variable), you use lset; you can give a sequence of (zero-based) indices to it to navigate into the exact place in the list where you want to update:
set workingArea "points {0 1 2 4}"
lset workingArea 1 2 $a
lset workingArea 1 3 $b
puts $workingArea
# prints: points {0 1 1234 5678}
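For readers comparing languages, the same parse-then-update pattern can be sketched in Python. This is only an illustration, not part of the Tcl answer: parse_pair and working_area are hypothetical names, and the nested Python list is an assumed stand-in for the Tcl value "points {0 1 2 4}".

```python
import re


def parse_pair(text):
    """Extract two integers from a string shaped like "(1234, 5678)"."""
    match = re.fullmatch(r"\s*\(\s*(-?\d+)\s*,\s*(-?\d+)\s*\)\s*", text)
    if match is None:
        raise ValueError(f"not an integer pair: {text!r}")
    return int(match.group(1)), int(match.group(2))


# Assumed stand-in for the Tcl value "points {0 1 2 4}"
working_area = ["points", [0, 1, 2, 4]]

a, b = parse_pair("(1234, 5678)")
working_area[1][2] = a  # same position as: lset workingArea 1 2 $a
working_area[1][3] = b  # same position as: lset workingArea 1 3 $b
print(working_area)  # ['points', [0, 1, 1234, 5678]]
```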

How to find common elements in string cells?

I want to find the common elements in multiple (>=2) cell arrays of strings.
A related question is here, and the answer proposes to use the function intersect(), however it works for only 2 inputs.
In my case, I have more than two cells, and I want to obtain a single common subset. Here is an example of what I want to achieve:
c1 = {'a','b','c','d'}
c2 = {'b','c','d'}
c3 = {'c','d'}
c_common = my_fun({c1,c2,c3});
In the end, I want c_common={'c','d'}, since only these two strings occur in all the inputs.
How can I do this with MATLAB?
Thanks in advance,
P.S. I also need the indices from each input, but I can probably do that myself using the output c_common, so not necessary in the answer. But if anyone wants to tackle that too, my actual output will be like this:
[c_common, indices] = my_fun({c1,c2,c3});
where indices = {[3,4], [2,3], [1,2]} for this case.
Thanks,
Listed in this post is a vectorized approach to give us the common strings and indices using unique and accumarray. This would work even when the strings are not sorted within each cell array to give us indices corresponding to their positions within it, but they have to be unique. Please have a look at the sample input, output section* to see such a case run. Here's the implementation -
C = {c1,c2,c3}; % Add more cell arrays here
% Get unique strings and ID each of the strings based on their uniqueness
[unqC,~,unqID] = unique([C{:}]);
% Get count of each ID and the IDs that have counts equal to the number of
% cells arrays in C indicate that they are present in all cell arrays and
% thus are the ones to be finally selected
match_ID = find(accumarray(unqID(:),1)==numel(C));
common_str = unqC(match_ID)
% ------------ Additional work to get indices ----------------
N_str = numel(common_str);
% Store matches as a logical array to be used at later stages
matches = ismember(unqID,match_ID);
% Use ismember to find all those indices in unqID and subtract group
% lengths from them to give us the indices within each cell array
clens = [0 cumsum(cellfun('length',C(1:end-1)))];
match_index = reshape(find(matches),N_str,[]);
% Sort match_index along each column based on the respective unqID elements
[m,n] = size(match_index);
[~,sidx] = sort(reshape(unqID(matches),N_str,[]),1);
sorted_match_index = match_index(bsxfun(@plus,sidx,(0:n-1)*m));
% Subtract cumulative group lens to give us indices corres. to each cell array
common_idx = bsxfun(@minus,sorted_match_index,clens).'
Please note that in the step that calculates match_ID, accumarray(unqID(:),1) could be replaced by histc(unqID,1:max(unqID)). histcounts could be another alternative there.
*Sample input, output -
c1 =
'a' 'b' 'c' 'd'
c2 =
'b' 'c' 'a' 'd'
c3 =
'c' 'd' 'a'
common_str =
'a' 'c' 'd'
common_idx =
1 3 4
3 2 4
3 1 2
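The counting idea behind this approach (a string is common when its count equals the number of cell arrays) can be sketched in Python. This is not a translation of the MATLAB code: common_with_indices is a hypothetical name, the positions are reported 1-based to match the sample output, and, like the MATLAB version, it assumes the strings within each list are unique.

```python
from collections import Counter


def common_with_indices(cells):
    """Return the strings present in every list, plus their 1-based positions.

    Assumes each list holds unique strings.
    """
    counts = Counter(s for cell in cells for s in cell)
    # A string is common when it was counted once per list
    common = sorted(s for s, n in counts.items() if n == len(cells))
    indices = [[cell.index(s) + 1 for s in common] for cell in cells]
    return common, indices


c1 = ['a', 'b', 'c', 'd']
c2 = ['b', 'c', 'a', 'd']
c3 = ['c', 'd', 'a']
common_str, common_idx = common_with_indices([c1, c2, c3])
print(common_str)  # ['a', 'c', 'd']
print(common_idx)  # [[1, 3, 4], [3, 2, 4], [3, 1, 2]]
```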
As noted in the comments to this question, there is a file in File Exchange called "MINTERSECT -- Multiple set intersection." at http://www.mathworks.com/matlabcentral/fileexchange/6144-mintersect-multiple-set-intersection that contains simple code to generalize intersect to multiple sets. In a nutshell, the code takes the output of intersect on the first pair of cells and then performs intersect on this output with the next cell. This process continues until all cells have been compared. Note that the author points out that the code is not particularly efficient, but it may be sufficient for your use case.
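The pairwise strategy described here (intersect the first two inputs, then intersect the result with each remaining input) can be sketched as a fold. This is an illustration of the idea in Python, not the File Exchange code; multi_intersect is a hypothetical name, and the result is sorted to mirror the default ordering of intersect.

```python
from functools import reduce


def multi_intersect(cells):
    """Fold each list into a running intersection, starting from the first list."""
    if not cells:
        return []
    common = reduce(lambda acc, cell: acc & set(cell), cells[1:], set(cells[0]))
    return sorted(common)


c_common = multi_intersect([['a', 'b', 'c', 'd'], ['b', 'c', 'd'], ['c', 'd']])
print(c_common)  # ['c', 'd']
```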

How to count a specific word separated by paragraphs?

So I want to be able to count the number of times a certain sequence such as "AGCT" appears in a document full of letters. However I don't just want the total amount in the document, I want how many times it shows up separated by ">".
So for example if the document contained: asdflkdafagctalkjsdjagctlkdjf>asdlfkjaagct>adjkhfhAGCTlksdjfagct>...
It would tell me:
2
1
1
since the sequence "AGCT" appears twice before the first ">" and once after the next one and once more after the third one and so on.
I do not know how to do this and any help would be appreciated.
You can use a combination of string methods and Python's list comprehension like this:
Split your text into paragraphs, and for each paragraph count the occurrences of the wanted substring. It is actually more concise in Python than in English:
>>> mytext = "asdflkdafagctalkjsdjagctlkdjf>asdlfkjaagct>adjkhfhAGCTlksdjfagct>"
>>> count = [para.count("agct") for para in mytext.split(">") ]
>>> count
[2, 1, 1, 0]
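Note that str.count is case-sensitive, so the uppercase "AGCT" in the third section is not counted above, and the trailing ">" produces a final empty section. If upper and lower case should be treated alike, a minimal sketch could lowercase both sides first. The name count_per_section is hypothetical, and the counts are of non-overlapping matches.

```python
def count_per_section(text, pattern, sep=">"):
    """Count non-overlapping, case-insensitive matches of pattern in each section."""
    needle = pattern.lower()
    return [section.lower().count(needle) for section in text.split(sep)]


mytext = "asdflkdafagctalkjsdjagctlkdjf>asdlfkjaagct>adjkhfhAGCTlksdjfagct>"
counts = count_per_section(mytext, "AGCT")
print(counts)  # [2, 1, 2, 0]
```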

How do you find the sum of all the numbers in an array in Livecode?

I'm having a problem finding the sum of all of the integers in an array in Livecode
The "sum" function is what you need. Check out the dictionary; arrays are supported, though the function is generally used with a comma-delimited list.
You can also extract the keys of the array, if they are purely numeric, and apply the function. Otherwise, you may have to extract the nested array of elements of interest into the clear with the "combine" command, and sum that.
Craig Newman
This should work:
put 13 into tArray[1]
put 4 into tArray[2]
put -9 into tArray[3]
put 21 into tArray[4]
put sum(tArray) into tSumVar
Or alternatively, to sum the keys rather than the values:
put the keys of tArray into tKeyList
replace return with comma in tKeyList
put sum(tKeyList) into tSumVar

Matlab, alphabetic order of the intersect result

I use the intersect function to find common strings in two cell arrays A and B ([~,idx]=intersect(A,B)) and save the indices in idx. Then I extract the common strings with A(idx). I see that the results are sorted in alphabetical order. I want them in the order in which they appear in A. Why are these strings sorted alphabetically?
Thanks.
As explained in the documentation, you can add the option setOrder='stable' to preserve the order of the elements:
[C,ia,ib] = intersect(A,B,'stable');
You don't even have to capture the indices (unless used elsewhere), as the example shows:
C = intersect([7 0 5],[7 1 5],'stable')
returns
C = [7 5]
and
A='hgfedcba';
B='hac';
[~,ia]=intersect(A,B,'stable');
ia'
> 1 6 8
A(ia)
> hca
For Matlab R2011b and older:
If your matlab version doesn't support the 'stable' option, you can just use sort on the indices:
[~,ia]=intersect(A,B);
ia=sort(ia)
> 1 6 8
A(ia)
> hca
Duplicates
If there are duplicates in A, intersect will only find them once. ismember might be better suited if you want to find all of the duplicates:
A='hhggffeeddccbbaa';
B='hac';
[~,ia]=intersect(A,B);
ia=sort(ia);
A(ia)
> hca
[~,loc] = ismember(A,B);
ia=find(loc~=0); % because you want the indices (logical indexing is also an option of course)
A(ia)
> hhccaa
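The two behaviours discussed above (an order-preserving intersection that reports each match once, and an ismember-style lookup that reports every match) can be sketched in Python. Both function names are hypothetical, indices are 1-based to match the MATLAB output, and keeping the first occurrence of a duplicate is an assumption of this sketch.

```python
def stable_intersect(a, b):
    """Items of a that also occur in b, in a's order, first occurrence only."""
    wanted = set(b)
    seen = set()
    values, indices = [], []
    for i, item in enumerate(a, start=1):
        if item in wanted and item not in seen:
            seen.add(item)
            values.append(item)
            indices.append(i)
    return values, indices


def all_members(a, b):
    """1-based indices of every item of a found in b, duplicates included."""
    wanted = set(b)
    return [i for i, item in enumerate(a, start=1) if item in wanted]


values, ia = stable_intersect("hgfedcba", "hac")
print("".join(values), ia)  # hca [1, 6, 8]

A = "hhggffeeddccbbaa"
dups = "".join(A[i - 1] for i in all_members(A, "hac"))
print(dups)  # hhccaa
```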
