How to create a confusion matrix from an incomplete dataframe in python - python-3.x

I have a dataframe which looks like this:
I1 I2 V
0 1 1 300
1 1 5 7
2 1 9 3
3 2 2 280
4 2 3 4
5 5 1 5
6 5 5 400
I1 and I2 represent indexes while V represent values.
The indexes with values equal to 0 have been omitted, but I'd like to get a confusion matrix showing all the values, i.e. something like this:
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
How can I do it?
Thanks in advance!

Use set_index with unstack for reshape, for append missing values add reindex and for data cleaning rename_axis :
r = range(1, 10)
df = (df.set_index(['I1','I2'])['V']
.unstack(fill_value=0)
.reindex(index=r, columns=r, fill_value=0)
.rename_axis(None)
.rename_axis(None, axis=1))
print (df)
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
Detail:
print (df.set_index(['I1','I2'])['V']
.unstack(fill_value=0))
I2 1 2 3 5 9
I1
1 300 0 0 7 3
2 0 280 4 0 0
5 5 0 0 400 0
Alternative solution with pivot, if all values are integers:
r = range(1, 10)
df = (df.pivot('I1','I2', 'V')
.fillna(0)
.astype(int)
.reindex(index=r, columns=r, fill_value=0)
.rename_axis(None)
.rename_axis(None, axis=1))
print (df)
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0

Option 1: Using numpy you can
In [150]: size = df[['I1', 'I2']].values.max()
In [151]: arr = np.zeros((size, size))
In [152]: arr[df.I1-1, df.I2-1] = df.V
In [153]: idx = np.arange(1, size+1)
In [154]: pd.DataFrame(arr, index=idx, columns=idx).astype(int)
Out[154]:
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
Option 2: Using scipy.sparse.csr_matrix
In [178]: from scipy.sparse import csr_matrix
In [179]: size = df[['I1', 'I2']].values.max()
In [180]: idx = np.arange(1, size+1)
In [181]: pd.DataFrame(csr_matrix((df['V'], (df['I1']-1, df['I2']-1)), shape=(size, si
...: ze)).toarray(), index=idx, columns=idx)
Out[181]:
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0

Related

Parse .log file to collect data after keyword then create a nested dictionary using predefined column names

I am trying to parse a .log file using python for the Status information about processes of a linux system. The .log file has a lot of different sections of information, the sections of interest start with "##START-ALLPROCESSES-xxxxxxxx" where x is the epoch date and end with '##END-ALLPROCESSES-xxxxxxx". After this line each process is listed with 52 columns each, the number of processes may change depending on the info recorded at the time, and there may be multiple sections with this information at different times.
The idea is to open the .log file, find the sections and then use the XXXXXXX as the key for a nested dictionary where the keys are the predefined column dates filled in with the values from the section, and do this for all different sections that would be found on the .log fie. The nested dictionary would look something like below
[date1-XXXXXX:
[ columnName1: process1,
.
.
.
columnName52: info1
],
.
.
.
[ columnName1: process52,
.
.
.
columName52: info52
]
],
[date2-XXXXXX:
[ columnName1: process1,
.
.
.
columnName52: info1
],
.
.
.
[ columnName1: process52,
.
.
.
columName52: info52
]
]
The data in the .log file looks as follow and would have multiple sections as this but with a different date each line starts with the process id and (process name)
##START-ALLPROCESSES-1676652419
1 (systemd) S 0 1 1 0 -1 4210944 2070278 9743969773 2070 2703 8811 11984 7638026 9190549 20 0 1 0 0 160043008 745 18446744073709551615 187650352414720 187650353516788 281474853505456 0 0 0 671173123 4096 1260 1 0 0 17 0 0 0 2706 0 0 187650353585800 187650353845340 187651263758336 281474853506734 281474853506745 281474853506745 281474853507053 0
10 (rcu_bh) I 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 2 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10251 (kworker/1:2) I 2 0 0 0 -1 69238880 0 0 0 0 0 914 0 0 20 0 1 0 617684776 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 1 0 0 0 0 0 0 0 0 0 0 0 0 0
10299 (loop2) S 2 0 0 0 -1 3178560 0 0 0 0 0 24 0 0 0 -20 1 0 10871 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 169 0 0 0 0 0 0 0 0 0 0
10648 (kworker/2:0) I 2 0 0 0 -1 69238880 0 0 0 0 0 567 0 0 20 0 1 0 663634994 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 2 0 0 0 0 0 0 0 0 0 0 0 0 0
1082 (nvme-wq) I 2 0 0 0 -1 69238880 0 0 0 0 0 0 0 0 0 -20 1 0 109 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 3 0 0 0 0 0 0 0 0 0 0 0 0 0
1095 (scsi_eh_0) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 110 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 3 0 0 0 0 0 0 0 0 0 0 0 0 0
1096 (scsi_tmf_0) I 2 0 0 0 -1 69238880 0 0 0 0 0 0 0 0 0 -20 1 0 110 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1099 (scsi_eh_1) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 110 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 (migration/0) S 2 0 0 0 -1 69238848 0 0 0 0 0 4961 0 0 -100 0 1 0 2 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 99 1 0 0 0 0 0 0 0 0 0 0 0
1100 (scsi_tmf_1) I 2 0 0 0 -1 69238880 0 0 0 0 0 0 0 0 0 -20 1 0 110 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##END-ALLPROCESSES-1676652419
I have tried it multiple ways but I cannot seem to get it to go correctly, my last attempt
columns = ['pid', 'comm', 'state', 'ppid', 'pgrp', 'session', 'tty_nr', 'tpgid', 'flags', 'minflt', 'cminflt', 'majflt', 'cmajflt', 'utime', 'stime',
'cutime', 'cstime', 'priority', 'nice', 'num_threads', 'itrealvalue', 'starttime', 'vsize', 'rss', 'rsslim', 'startcode', 'endcode', 'startstack', 'kstkesp',
'kstkeip', 'signal', 'blocked', 'sigignore', 'sigcatch', 'wchan', 'nswap', 'cnswap', 'exit_signal', 'processor', 'rt_priority', 'policy', 'delayacct_blkio_ticks',
'guest_time', 'cguest_time', 'start_data', 'end_data', 'start_brk', 'arg_start', 'arg_end', 'env_start', 'env_end', 'exit_code' ]
for file in os.listdir(dir):
if file.endswith('.log'):
with open(file, 'r') as f:
data = f.read()
data = data.split('##START-ALLPROCESSES-')
data = data[1:]
for i in range(len(data)):
data[i] = data[i].split('##END-ALLPROCESSES-')
data[i] = data[i][0]
data[i] = re.split('\r', data[i])
data[i] = data[i][0]
data[i] = re.split('\n', data[i])
for j in range(len(data[i])):
data[i][j] = re.split('\s+', data[i][j])
#print(data[i])
data[i][0] = str(data[i][0])
data_dict = {}
for i in range(len(data)):
data_dict[data[i][0]] = {}
for j in range(len(columns)):
data_dict[data[i][0]][columns[j]] = data[i][j+1]
print(data_dict)
I converted the epoch date into a str as I was getting unhashable list errors, however that made it so the epoch date shows as a key but each column now has the entire list for the 52 columms of information as a single one, so definitely I am missing something
To solve this problem, you could follow the following steps:
Open the .log file and read the contents
Search for all the sections of interest by finding lines that start with "##START-ALLPROCESSES-" and end with "##END-ALLPROCESSES-"
For each section found, extract the epoch date and create a dictionary with an empty list for each of the 52 columns
Iterate over the lines within the section and split the line into the 52 columns using space as a separator. Add the values to the corresponding list in the dictionary created in step 3
Repeat steps 3 and 4 for all the sections found in the .log file
Return the final nested dictionary
Here is some sample code that implements these steps:
import re
def parse_log_file(log_file_path):
with open(log_file_path, 'r') as log_file:
log_contents = log_file.read()
sections = re.findall(r'##START-ALLPROCESSES-(.*?)##END-ALLPROCESSES-', log_contents, re.DOTALL)
nested_dict = {}
for section in sections:
lines = section.strip().split('\n')
epoch_date = lines[0].split('-')[-1]
column_names = ['column{}'.format(i) for i in range(1, 53)]
section_dict = {column_name: [] for column_name in column_names}
for line in lines[1:]:
values = line.strip().split()
for i, value in enumerate(values):
section_dict[column_names[i]].append(value)
nested_dict['date{}-{}'.format(epoch_date, len(section_dict['column1']))] = section_dict
return nested_dict
You can call this function by passing the path to the .log file as an argument. The function returns the nested dictionary described in the problem statement.

Python 3.x: Pandas Dataframe How do we change the column names for a specific range?

I have a large csv file that I am reading using pandas. The following is a fraction of how my data looks like. The column names are 0,4,6,8,10,12,14,16,18.
0 4 6 8 10 12 14 16 18
-2 4500 4500 4500 4500 4500 4500 4500 4500
-1 4650 4650 4650 4650 4650 4650 4650 4650
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0
If I use Data.columns, I can change the column names. However, I only want to change a part of the column name. For instance I want to change the column 6,8,10 to bird,dog,strawberry,kiwi,tree,chocolate, and snow respectively.
0 4 bird dog strawberry kiwi tree chocolate snow
-2 4500 4500 4500 4500 4500 4500 4500 4500
-1 4650 4650 4650 4650 4650 4650 4650 4650
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0
How would you write a code? Remember that I have a massive file and want to do a large scale change for many numbers of columns. So I will need efficient line of code for this...
Thanks!
Edit: I meant to express that I desire to change column names starting from the third column.
Since you want to rename all the columns from 3 onwards you can use zip to create a dict then rename:
# sample data
df = pd.DataFrame(np.random.randn(5,9), columns=[0,4,6,8,10,12,14,16,18])
# create a dict using zip from df.columns[2:]
d = dict(zip(df.columns[2:].values, ['bird','dog','strawberry','kiwi','tree','chocolate','snow']))
# rename you columns
df = df.rename(columns=d)
0 4 bird dog strawberry kiwi tree \
0 -0.121085 1.263364 -0.008604 -0.240872 1.433633 0.092023 -0.903776
1 0.570377 0.565611 -1.107842 1.498852 -0.655996 -1.215298 0.639862
2 0.367796 -1.357311 -0.106241 -0.824072 1.055168 0.862952 0.475000
3 0.945560 0.359249 -0.282965 0.230909 -2.278477 1.656094 -0.031756
4 -0.611121 -0.159064 -0.711482 2.342169 0.044782 -0.955120 1.481766
chocolate snow
0 0.607185 0.694980
1 -0.666239 0.208806
2 0.018151 -0.656670
3 -0.438527 0.678592
4 1.035624 0.537486
import pandas as pd
example_list = [
{'name' : 'a',
'age' : 2,
'gender' : 'm'},
{'name' : 'b',
'age' : 5,
'gender' : 'm'
}]
df = pd.DataFrame(example_list)
print(df)
df.rename(columns = {'name':'First Name'}, inplace = True)
print(df)
Output
age gender name
0 2 m a
1 5 m b
age gender First Name
0 2 m a
1 5 m b
EDIT:
import pandas as pd
example_list = [
{
0 : 'a',
4 : 2,
6 : 'm'},
{
0 : 'b',
4 : 5,
6 : 'm'
}]
df = pd.DataFrame(example_list)
print(df)
df.rename(columns = {0:'apple', 4 : 'bannana', 6 : 'pear'}, inplace = True)
print(df)
OUTPUT:
0 4 6
0 a 2 m
1 b 5 m
apple bannana pear
0 a 2 m
1 b 5 m

Sumproduct ranges excluding rows with sum=0 or row containing blank cells

Here is a sample data set:
1 2 3 4 5 6 7 8 9 10 11 12
a 20 0 9 0 0 0 0 0 0 0 5 9
a 0 10 0 0 0 0 0 0 0 0 0 10
a 20 10 0 0 0 0 0 0 0 0 0 18
a 0 10 7 0 0 0 0 0 0 0 18
a 0 0 2 0 0 0 0 5 4.5 0 0 18
a 0 10 8 0 0 0 0 0 0 0 5 8
b 0 10 6 0 0 0 0 0 0 0 0 0
b 10 0 9 0 0 0 0 0 0 0 0 5
b 0 10 9 3.5 0 0 0 0 0 0 0
b 0 10 5 0 0 0 0 0 0 0 5 8
b 10 8 6 0 0 0 0 0 0 0 5 10
b 0 15 24 0 5 0 0 0 0 0 5 9
c 0 0 8 0 4.5 0 0 5 0 0 0 0
c 0 0 0 0 0 0 0 0 0 0 0 0
c 10 10 27 0 0 0 0 0 0 0 0
c 5 20 5 0 10 0 0 0 0 0 0 0
c 10 10 10 0 0 0 0 0 0 0 0 10
d 0 0 0 0 0 0 0 0 0 0 0 0
d 10 5 5 0 0 0 0 0 0 5 10
I have to calculate the circular vector length (r) of each type: a, b, c and d.
The individuals of each set containing blank or like the second c containing all 0's give error for the formula I am using which forces me to calculate r for each individual first using formula-
For first a (in O column):
=IF(OR(COUNTBLANK(B2:M2)>0,SUM(B2:M2)=0),"",SQRT(SUMPRODUCT(B2:M2,B$1:M$1)^2+SUMPRODUCT(B2:M2,B$1:M$1)^2)/SUM(B2:M2))
Then for average over all a with:
=IF(SUMIFS(O$2:O$20,A$2:A$20,Q2)=0,"",AVERAGEIFS(O$2:O$20,A$2:A$20,Q2))
What I need is something to combine both formula so that excel:
First checks for matching type: a, b, c, d
Excludes rows with blanks or sum=0
Sumproduct each remaining row
Average over all a's for example
Any help is greatly appreciated

Rare error while computing dataframe by using pandas

I am dealing a rare error while making some machine learning with a dataset loaded using pandas.
This is the error I am getting:
I have been reading something releated to it and it seems to be due to the columns and how pandas interpret them but I have no clue what can be wrong.
This is the code I am using for this:
import pandas as pd
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/pima-
indians-diabetes/pima-indians-diabetes.data'
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi',
'pedigree', 'age', 'label']
pima = pd.read_csv(url, header=None, names=col_names)
# define X and y
feature_cols = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi',
'pedigree', 'age']
X = pima[feature_cols]
y = pima.label
#k fold cv
from sklearn.model_selection import KFold, cross_val_score
kf = KFold(n_splits=10) #define number of splits
kf.get_n_splits(X) #to check how many splits will be done.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
clf = LinearDiscriminantAnalysis() #select the model for train, test in kf.split(X, y):
for train, test in kf.split(X, y):
y_pred_prob = clf.fit(X[train], y[train]).predict_proba(X[test])
y_pred_class = clf.predict(X[test])
Thanks in advance
In clf.fit method, according to the document parameters expected are array:
Parameters
----------
X : array-like, shape (n_samples, n_features)
Training data.
y : array, shape (n_samples,)
If you look at the example in link X and y are numpy array:
Try using as_matrix() for both X and y instead of just X = pima[feature_cols] and y = pima.label
X = pima[feature_cols].as_matrix()
y = pima.label.as_matrix()
Testing by print:
import pandas as pd
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data'
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
pima = pd.read_csv(url, header=None, names=col_names)
# define X and y
feature_cols = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age']
X = pima[feature_cols].as_matrix()
y = pima.label.as_matrix()
#k fold cv
from sklearn.model_selection import KFold, cross_val_score
kf = KFold(n_splits=10) #define number of splits
kf.get_n_splits(X) #to check how many splits will be done.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
clf = LinearDiscriminantAnalysis() #select the model for train, test in kf.split(X, y):
for train, test in kf.split(X, y):
y_pred_prob = clf.fit(X[train], y[train]).predict_proba(X[test])
y_pred_class = clf.predict(X[test])
print(y_pred_class)
Result:
[1 0 1 0 1 0 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 0 0 1
0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0]
[0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0
1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 1 1]
[1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0
0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 1 0 1 0 0 0 0 1 1 0 1 0 0 0 1
1 0 1]
[1 0 0 0 1 1 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 1 1
0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 1 0
0 1 0]
[0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 1 0 1 1 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0
0 0 0]
[0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0
0 0 1 1 0 1 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 1
1 0 0]
[0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 1
1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0]
[0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 1 1 0 1 0 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0 0 0 1 1
0 1 0]
[0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0
0 0 1 0 0 1 0 1 1 1 1 0 0 1 0 0 1 1 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1
0 1]
[0 1 0 0 1 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 1 1 1 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0
0 0]

J Language rank of power function

t=:1
test=: monad define
t=.y
t=. t, 0
)
testloop=: monad def'test^:y t'
testloop 1
1 0
testloop 2
1 0 0
testloop 10
1 0 0 0 0 0 0 0 0 0 0
In order to simplify this
(testloop 0),(testloop 1), (testloop 2), ...
110100100010000...
I tried
, testloop"0 (i.10)
but it gives
1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0...
It seems like I have a problem with a rank, I can't figure out which one to use.
I would be grateful if you could help me on this issue.
Thank you!
This is not so much a rank problem as the fact that the results are padded with zeros so that the row lengths match.
testloop 1
1 0
testloop 2
1 0 0
testloop"0 [ 1 2
1 0 0
1 0 0
testloop"0 [ 1 2 3
1 0 0 0
1 0 0 0
1 0 0 0
If I redefine your test and testloop to add a different appending digit, we can see how the padding is working.
test2 =: 3 : 0
​t=. y
​t=. t,2
​)
test2loop=: monad def'test2^:y t'
test2loop"0 [1
1 2
test2loop"0 [2
1 2 2
test2loop"0 [ 1 2 NB. 0 padded in first row
1 2 0
1 2 2
test2loop"0 [ 1 2 3 NB. 0's padded in first two rows
1 2 0 0
1 2 2 0
1 2 2 2
To get around the padding issue I will use each=: &.> so that the results are boxed before combining to avoid the padding.
testloop each 1 2 3
+---+-----+-------+
|1 0|1 0 0|1 0 0 0|
+---+-----+-------+
testloop each i. 10
+-+---+-----+-------+---------+-----------+-------------+---------------+-----------------+-------------------+
|1|1 0|1 0 0|1 0 0 0|1 0 0 0 0|1 0 0 0 0 0|1 0 0 0 0 0 0|1 0 0 0 0 0 0 0|1 0 0 0 0 0 0 0 0|1 0 0 0 0 0 0 0 0 0|
+-+---+-----+-------+---------+-----------+-------------+---------------+-----------------+-------------------+
using ; to unbox and ravel the results
; testloop each i. 10
1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
To be honest I would be more inclined to use the fact that complex numbers used as the left argument of # introduce 0's for padding. The number of 0's depends on the imaginary value of the complex number.
1j0 # 1
1
1j1 # 1
1 0
1j2 # 1
1 0 0
test3=: monad def '(1 j. y)#1'
test3 1
1 0
test3 2
1 0 0
test3 1 2
1 0 1 0 0
test3 i. 10
1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

Resources