Loop through two matrices in the same for loop - python-3.x

I need to loop through two matrices, but I can't use NumPy; I'm asked not to use it.
altura_mar = [[1, 5, 7],
              [3, 2, 31]]
precipitacion_anual = [[10, 5, 70],
                       [5, 7, 8]]
for marIndex, anualIndex in zip(altura_mar, precipitacion_anual):
    # some code
My problem here is that zip works with flat lists, not 2D lists; if I try to use it with a matrix I get this error: "ValueError: too many values to unpack (expected 2)".
Now I'm thinking of two possible solutions:
1. Flatten each multidimensional list into a single list, like altura_mar = [1, 5, 7, 3, 2, 31], but how could I do that?
2. Work with the two matrices as they are, in a nested for loop (I don't know about this, but maybe you can help here).

Since it's a 2D list, you can apply zip twice:
>>> for a, b in zip(altura_mar, precipitacion_anual):
...     for c, d in zip(a, b):
...         print(c, d)
...
1 10
5 5
7 70
3 5
2 7
31 8
This solution does not scale well when the nest depth is variable, though.
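If the depth did vary, one workaround is a recursive zip. The sketch below is an editorial addition (not from the original answers) that descends until it reaches non-list values:

def zip_nested(a, b):
    # Recursively pair up elements of two identically shaped nested lists.
    if isinstance(a, list):
        for sub_a, sub_b in zip(a, b):
            yield from zip_nested(sub_a, sub_b)
    else:
        yield a, b

for mar, anual in zip_nested(altura_mar, precipitacion_anual):
    print(mar, anual)  # prints the same six pairs as above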

I think the most basic solution is best here: just iterate over the indices rather than nesting zip calls.
>>> for i in range(len(altura_mar)):
...     for j in range(len(altura_mar[i])):
...         print(altura_mar[i][j], precipitacion_anual[i][j])
...
1 10
5 5
7 70
3 5
2 7
31 8

Assuming you want to pair 1 with 10, not [1, 5, 7] with [10, 5, 70], then in addition to the other perfectly fine answers you could also flatten each matrix with a generator expression:
altura_mar = [[1, 5, 7], [3, 2, 31]]
altura_mar_flat = (cell for row in altura_mar for cell in row)
precipitacion_anual = [[10, 5, 70], [5, 7, 8]]
precipitacion_anual_flat = (cell for row in precipitacion_anual for cell in row)
for marIndex, anualIndex in zip(altura_mar_flat, precipitacion_anual_flat):
    print(marIndex, anualIndex)
This also gives you:
1 10
5 5
7 70
3 5
2 7
31 8
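As an editorial aside, the standard library can do the same flattening: itertools.chain.from_iterable is equivalent to the generator expressions above.

from itertools import chain

for marIndex, anualIndex in zip(chain.from_iterable(altura_mar),
                                chain.from_iterable(precipitacion_anual)):
    print(marIndex, anualIndex)  # same six pairs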

Related

Use a split function in every row of one column of a data frame

I have a rather big pandas data frame (more than 1 million rows) with columns containing either strings or numbers. Now I would like to split the strings in one column at the expression "is applied" and keep only what comes before it.
An example to explain what I mean:
What I have:
 a  b  description
 2  4  method A is applied
10  5  titration is applied
 3  1  computation is applied
What I am looking for:
 a  b  description
 2  4  method A
10  5  titration
 3  1  computation
I tried the following:
df.description = df.description.str.split('is applied')[0]
But this didn't bring the desired result.
Any ideas how to do it? :-)
You are close, need str[0]:
df.description = df.description.str.split(' is applied').str[0]
Alternative solution:
df.description = df.description.str.extract(r'(.*)\s+is applied')
print (df)
    a  b  description
0   2  4     method A
1  10  5    titration
2   3  1  computation
But for better performance use list comprehension:
df.description = [x.split(' is applied')[0] for x in df.description]
You can use replace:
df.description = df.description.str.replace(' is applied','')
df
    a  b  description
0   2  4     method A
1  10  5    titration
2   3  1  computation
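To make this concrete, here is a self-contained sketch (an editorial reconstruction of the example above, not code from the thread) that builds the sample frame and applies the split-based answer:

import pandas as pd

df = pd.DataFrame({'a': [2, 10, 3],
                   'b': [4, 5, 1],
                   'description': ['method A is applied',
                                   'titration is applied',
                                   'computation is applied']})
# Keep only the part of each string before ' is applied'.
df.description = df.description.str.split(' is applied').str[0]
print(df)
#     a  b  description
# 0   2  4     method A
# 1  10  5    titration
# 2   3  1  computation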

Use of Vector Cross in Ms Excel - 3D arrays and Redim

I'm facing an issue with the use of ReDim on a 3D array in Excel VBA. How can I append arrays of different dimensions to a first array, one after another, and then read back a specific group of data (for example, the 2nd group) from the newly created matrix, which has a different number of columns in each row? Here is an example to clarify the matter.
Arr1 is 3x3
Arr2 is 5x5
Arr3 is 7x7
Arr4 is 9x9
Arr5 is 11x11
Now if I append them together the resulting matrix should be in this format - for example:
1 2 3
4 5 6
7 8 9
1 2 3 4 5
6 7 8 9 1
2 3 4 5 6
7 8 9 1 2
3 4 5 6 7
...
...
1 2 3 ... 11
1 2 3 ... 11
(another 8 rows)
1 2 3 ... 11
(total of 11 rows)
Next I want to be able to read a group of data (a sub-matrix) from the resulting matrix, for example the 2nd group, which is 5x5.
If I use ReDim between appends to adjust the dimensions of the first matrix, and keep doing so for each new group of arrays, the final dimensions will be 11x11, which is not the dimension I need; in this case 5x5.
I was wondering if there is any recommendation you could kindly provide. Thank you.
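No answer was posted in the thread, but one usual fix for this pattern is to keep each block as its own array inside a container (a jagged array) rather than ReDim-ing a single rectangular array; in VBA that would be a Variant array of arrays or a Collection. A minimal sketch of the idea, in Python for brevity (an editorial illustration with placeholder contents, not VBA from the thread):

# Keep each sub-matrix as its own element instead of merging into one array.
blocks = []
for n in (3, 5, 7, 9, 11):
    # Build an n x n block; the cell values here are just placeholders.
    blocks.append([[(r * n + c) % 9 + 1 for c in range(n)] for r in range(n)])

second = blocks[1]                  # the 2nd group of data, still 5x5
print(len(second), len(second[0]))  # -> 5 5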

How can the AVERAGEIFS function be translated into MATLAB?

I am working on moving my data over from Excel to MATLAB. I have some data that I want to average based on multiple criteria. I can accomplish this by looping, but I want to do it with matrix operations only, if possible.
Thus far, I have managed to do so with a single criterion, using accumarray as follows:
data=[
1 3
1 3
1 3
2 3
2 6
2 9];
accumarray(data(:,1), data(:,2)) ./ accumarray(data(:,1), 1)
Which returns:
3
6
Corresponding to the averages of items 1 and 2, respectively. I have at least three other columns that I need to include in this averaging but don't know how I can add that in. Any help is much appreciated.
For your single column, you don't need to call accumarray twice; you can provide a function handle to mean as the fourth input:
mu = accumarray(data(:,1), data(:,2), [], @mean);
For multiple columns, you can use the row indices as the second input to accumarray and then use those from within the anonymous function to access the rows of your data to operate on.
data = [1 3 5
1 3 10
1 3 8
2 3 7
2 6 9
2 9 12];
tmp = accumarray(data(:,1), 1:size(data, 1), [], @(rows){mean(data(rows,2:end), 1)});
means = cat(1, tmp{:});
% 3.0000 7.6667
% 6.0000 9.3333

Fetching data from Excel in MATLAB

I am trying to fetch a column with more than 17500 rows from Excel. The problem is that when I read it in MATLAB, it does not give me the whole matrix with all the data; it fetches data from somewhere in the middle.
The real problem is that I have to add up each set of 4 consecutive numbers in the column, take the average, save it in another column, and then proceed to the next set of 4, repeating until the end. How could I do that in MATLAB? Please help me solve this problem, as I am just a rookie. Thank you.
What I have done so far is this:
clc
g=xlsread('Data.xlsx',1,'E1:E17500');
x=1;
for i = 1:(17500/4) % step in groups of 4 since we need the average of every 4 values
    y{i} = ((g{x} + g{x+1} + g{x+2} + g{x+3}) / 4);
    x = x + 4;
end
xlswrite('Data.xlsx', y, 1, 'F1:F4375');
I see several things here: xlsread with one output gives you a numeric matrix of doubles (not a cell array), so you should address entries with () and not with {}. The for loop can be omitted if we use reshape to create a matrix with dimensions 4x4375. Then we calculate the average of the 4 values in each column directly with mean (evaluated over the first dimension). To get a column vector again, we transpose the result of mean using '.
Here is the code:
g = xlsread('Data.xlsx',1,'E1:E17500');
y = mean(reshape(g,4,[]),1)';
xlswrite('Data.xlsx',y,1,'F1:F4375');
To see in detail what happens within the code, let's see the results of each step using random data for g:
Code:
rng(4);
g = randi(10,12,1)
a = reshape(g,4,[])
b = mean(a,1)
y = b'
Result:
g =
10
6
10
8
7
3
10
1
3
5
8
2
a =
10 7 3
6 3 5
10 10 8
8 1 2
b =
8.5000 5.2500 4.5000
y =
8.5000
5.2500
4.5000

Efficiently concatenate a large number of columns

I am trying to concatenate a large number of columns containing integers into one string.
Basically, starting from:
df = pd.DataFrame({'id':[1,2,3,4],'a':[0,1,2,3], 'b':[4,5,6,7], 'c':[8,9,0,1]})
To obtain:
   id  join
0   1   481
1   2   592
2   3   603
3   4   714
I found several methods to do this:
Method 1:
conc = df[['id']].copy()  # `conc` is not defined in the post; presumably a frame like this
conc['glued'] = ''
i = 1
while i < len(df.columns):
    conc['glued'] = conc['glued'] + df[df.columns[i]].values.astype(str)
    i = i + 1
This method works, but it is a bit slow (45 min on my "test" case of 18,000 rows x 40,000 columns). I am concerned by the loop over the columns, as this program will eventually be applied to tables with 600,000 columns, and I am afraid it will be too long.
Method 2a
conc['join']=[''.join(row) for row in df[df.columns[1:]].values.astype(str)]
Method 2b
conc['apply'] = df[df.columns[1:]].apply(lambda x: ''.join(x.astype(str)), axis=1)
Both of these methods are 10 times more efficient than the previous one; they iterate over rows, which is good, and they work perfectly on my "debug" table df. But when I apply either of them to my "test" table of 18k x 40k, it leads to a MemoryError (I already have 60% of my 32 GB of RAM occupied after reading the corresponding csv file).
I can copy my DataFrame without exceeding memory, but curiously, applying this method makes the code crash.
Do you see how I can fix and improve this code to get an efficient row-based iteration? Thank you!
Appendix:
Here is the code I use on my test case:
geno_reader = pd.read_csv(genotype_file,header=0,compression='gzip', usecols=geno_columns_names)
fimpute_geno = pd.DataFrame({'SampID': geno_reader['SampID']})
I should use the chunksize option to read this file, but I haven't yet really understood how to use the result after reading.
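For what it's worth, here is a minimal sketch of how chunksize could be used here; this is an editorial illustration, not code from the thread (genotype_file and geno_columns_names come from the snippet above, the chunk size of 1000 is arbitrary, and the per-chunk join mirrors Method 2a):

import pandas as pd

pieces = []
reader = pd.read_csv(genotype_file, header=0, compression='gzip',
                     usecols=geno_columns_names, chunksize=1000)
for chunk in reader:  # each chunk is a DataFrame of up to 1000 rows
    calls = [''.join(row) for row in
             chunk[chunk.columns[1:]].values.astype(int).astype(str)]
    pieces.append(pd.DataFrame({'SampID': chunk['SampID'].values, 'Calls': calls}))
fimpute_geno = pd.concat(pieces, ignore_index=True)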
Method 1:
fimpute_geno['Calls'] = ''
for i in range(1, len(geno_reader.columns)):
    fimpute_geno['Calls'] = fimpute_geno['Calls'] \
        + geno_reader[geno_reader.columns[i]].values.astype(int).astype(str)
This works, in 45 min.
There are some quite ugly pieces of code, like the .astype(int).astype(str); I don't know why pandas doesn't recognize my integers and treats them as floats.
Method 2:
fimpute_geno['Calls'] = geno_reader[geno_reader.columns[1:]]\
.apply(lambda x: ''.join(x.astype(int).astype(str)), axis=1)
This leads to a MemoryError.
Here's something to try. It does require that you convert your columns to strings first. Your sample frame:
   b  c  id
0  4  8   1
1  5  9   2
2  6  0   3
3  7  1   4
Then:
# .ix is deprecated; .loc does the same label-based slice here
# (you could also write conc[['b', 'c', 'id']] in the next two lines)
conc.loc[:, 'b':'id'] = conc.loc[:, 'b':'id'].astype('str')
conc['join'] = conc.loc[:, 'b':'id'].sum(axis=1)  # summing strings concatenates row-wise
Would give:
   a  b  c  id join
0  0  4  8   1  481
1  1  5  9   2  592
2  2  6  0   3  603
3  3  7  1   4  714
