In the following code, I am producing print-outs of randomly generated multiplication tables. I would like to turn each generated table into a DataFrame. How would I do this? (I am new to Python 3.x.)
This exercise was to generate a multiplication table. It expanded into a project to generate a set number of multiplication tables with randomly generated one- or two-digit column and row numbers. Currently it is set to run five tables, each with 8 columns and rows; however, these numbers can be changed. Jupyter Notebook can only print up to 12 columns nicely. While the program will generate as many columns and rows as we want (of equal size, e.g. 6x6, 3x3, 9x9), limiting it to a 12x12 matrix or smaller is best for viewing.
import random
import pandas as pd
import numpy as np
%matplotlib notebook
# This sets up how many tables we will generate
for t in range(0,5):
    # Make variable place holders for our columns and rows list
    a=[]
    b=[]
    # To use randomly generated numbers, this sets up the random column numbers 'a' and random row numbers 'b'
    for x in range(12):
        a.append(random.randint(41,99)) # We can adjust the range of the random selection of numbers here
        b.append(random.randint(1,35)) # We can adjust the range of the random selection of number here
    # Add the column titles for each table - these are the random numbers 'a'
    print("C/R: ", end="\t ")
    for number in a:
        print(number,end = '\t ')
    print()
    # The double for-loop to generate the table
    for row in b:
        print(row, end="\t") # First column
        for number in a:
            print(round(row*number,1),end='\t' )# Next columns
        print( )
    # Add two blank cosmetic lines between tables for readability
    print('\n\n')
Thanks.
Based upon Josewails' assistance, this is the code for which I was looking. Josewails, thank you.
import random
import pandas as pd
import numpy as np
%matplotlib notebook
# This sets up how many tables we will generate
for t in range(0,5):
    # Make variable place holders for our columns and rows list
    a = []
    b = []
    dataframes = []
    # To use randomly generated numbers, this sets up the random column numbers 'a' and random row numbers 'b'
    for x in range(12):
        a.append(random.randint(41,99)) # We can adjust the range of the random selection of numbers here
        b.append(random.randint(1,35)) # We can adjust the range of the random selection of number here
    data = []
    for row in b:
        temp = []
        for number in a:
            temp.append(round(row*number,1))
        data.append(temp)
    dataframe = pd.DataFrame(data=data, columns=a)
    dataframe.index = b
    dataframes.append(dataframe)
    display(dataframes[0])
Refactoring your code a bit should give the desired output.
import random
import pandas as pd
import numpy as np
%matplotlib notebook
# This sets up how many tables we will generate
dataframes = []
for t in range(0,5):
    # Make variable place holders for our columns and rows list
    a = []
    b = []
    # To use randomly generated numbers, this sets up the random column numbers 'a' and random row numbers 'b'
    for x in range(12):
        a.append(random.randint(41,99)) # We can adjust the range of the random selection of numbers here
        b.append(random.randint(1,35)) # We can adjust the range of the random selection of number here
    data = []
    for row in b:
        temp = []
        for number in a:
            temp.append(round(row*number,1))
        data.append(temp)
    dataframe = pd.DataFrame(data=data, columns=a)
    dataframe.index = b
    dataframes.append(dataframe)
dataframes[0]
This is the output:
[image: pandas dataframe]
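The nested loops that fill each table can also be written as an outer product. The following is only a sketch of that alternative, not the code from the answer: the helper name make_table and its size and rng parameters are made up for illustration, and it draws the random numbers with NumPy's Generator instead of the random module.

```python
import numpy as np
import pandas as pd

def make_table(size=12, rng=None):
    """Build one random multiplication table as a DataFrame (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # The upper bound of Generator.integers is exclusive, hence 100 and 36
    cols = rng.integers(41, 100, size=size)  # column headers in 41..99
    rows = rng.integers(1, 36, size=size)    # row labels in 1..35
    # np.outer gives products[i, j] = rows[i] * cols[j]
    return pd.DataFrame(np.outer(rows, cols), index=rows, columns=cols)

# Five tables, one DataFrame each, as in the question
dataframes = [make_table() for _ in range(5)]
```

In a notebook, dataframes[0] on the last line of a cell displays the first table; passing a seeded generator, for example make_table(rng=np.random.default_rng(0)), makes a table reproducible.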
Related
I have a DataFrame that has 5 columns, including User and MP.
I need to extract a sample of n rows for each User, where n is a percentage of that User's rows (if a User has 1000 entries and n is 5, select the first 50 rows and go on to the next User). After that I have to add all the samples to a new DataFrame. Also, if a User has multiple values in the MP column, the percentage is split between them: for example, with 2 values in MP, select 2.5% for one value and 2.5% for the other.
Somehow my logic isn't that good (I started with the first step, without adding the logic for multiple MPs):
df = pd.read_excel("Results/fullData.xlsx")
dfSample = pd.DataFrame()
uniqueValues = df['User'].unique()
print(uniqueValues)
n = 5
for u in uniqueValues:
    sm = df["User"].str.count(u).sum()
    print(sm)
    for u in df['User']:
        sample = df.head(int(sm*(n/100)))
        #print(sample)
        dfSample.append(sample)
print(dfSample)
dfSample.to_excel('testFinal.xlsx')
Check the example below. It is intentionally verbose to make it easier to follow. The column that solves the problem is "ROW_PERC". You can filter on it according to the share of rows (50% or 25%, for example) required for each USR/MP.
import pandas as pd
df = pd.DataFrame({'USR':[1,1,1,1,2,2,2,2],'MP':['A','A','A','A','B','B','A','A'],"COL1":[1,2,3,4,5,6,7,8]})
df['USR_MP_RANK'] = df.groupby(['USR','MP']).rank()
df['USR_MP_RANK_MAX'] = df.groupby(['USR','MP'])['USR_MP_RANK'].transform('max')
df['ROW_PERC'] = df['USR_MP_RANK']/df['USR_MP_RANK_MAX']
df
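Output:
   USR MP  COL1  USR_MP_RANK  USR_MP_RANK_MAX  ROW_PERC
0    1  A     1          1.0              4.0      0.25
1    1  A     2          2.0              4.0      0.50
2    1  A     3          3.0              4.0      0.75
3    1  A     4          4.0              4.0      1.00
4    2  B     5          1.0              2.0      0.50
5    2  B     6          2.0              2.0      1.00
6    2  A     7          1.0              2.0      0.50
7    2  A     8          2.0              2.0      1.00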
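One way to turn the idea behind ROW_PERC into the sample the question asks for is sketched below. It goes back to the question's column names (User, MP). The function name sample_first_rows, the demo data, and the reading that the percentage applies to a user's total row count and is split evenly across that user's MP values are all assumptions, not something the original posts confirm.

```python
import pandas as pd

def sample_first_rows(df, pct):
    """Keep the first pct% of each User's rows, split evenly over that User's MP values."""
    # Total number of rows per User, repeated on every row of that User
    user_total = df.groupby("User")["User"].transform("size")
    # Number of distinct MP values per User, repeated on every row
    n_mp = df.groupby("User")["MP"].transform("nunique")
    # 0-based position of each row inside its (User, MP) group
    pos = df.groupby(["User", "MP"]).cumcount()
    # Rows to keep from each (User, MP) group, rounded down
    quota = (user_total * pct / 100 / n_mp).astype(int)
    return df[pos < quota]

# Hypothetical demo data: user "a" has two MP values, user "b" has one
demo = pd.DataFrame({
    "User": ["a"] * 40 + ["b"] * 20,
    "MP": ["x"] * 20 + ["y"] * 20 + ["x"] * 20,
    "val": range(60),
})
sample = sample_first_rows(demo, 10)
```

Because the result is a single filtered DataFrame, there is no need to append per-user pieces in a loop; it can be written out directly with to_excel.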
I have a dataframe as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame({'text':['she is good', 'she is bad'], 'label':['she is good', 'she is good']})
I would like to compare row wise and if two same-indexed rows have the same values, replace the duplicate in the 'label' column with the word 'same'.
Desired output:
          text        label
0  she is good         same
1   she is bad  she is good
So far I have tried the following:
df['label'] = np.where(df.query("text == label"), df['label'] == ' ', df['label'] == df['label'])
but it returns an error:
ValueError: Length of values (1) does not match length of index (2)
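Your syntax is not correct; have a look at the documentation of numpy.where, which takes a condition followed by the value to use where it is true and the value to use where it is false.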
Check for equality between your two columns, and replace the values in your label column:
import numpy as np
df['label'] = np.where(df['text'].eq(df['label']),'same',df['label'])
prints:
          text        label
0  she is good         same
1   she is bad  she is good
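The same replacement can also be done without NumPy. This is a sketch of an alternative using pandas' own Series.mask, which replaces values wherever the condition is True; it is not what the answer above uses, just a different route to the same result.

```python
import pandas as pd

df = pd.DataFrame({'text': ['she is good', 'she is bad'],
                   'label': ['she is good', 'she is good']})

# Boolean Series: True where both columns hold the same value in that row
same = df['text'].eq(df['label'])
# mask() keeps the original label where the condition is False
df['label'] = df['label'].mask(same, 'same')
```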
I have the following column in a dataframe:
COLUMN_NAME
1
0
1
1
65280
65376
65280
I want to convert the 5-digit values in the column to their corresponding binary values. I know how to convert a value with the bin() function, but I don't know how to apply it only to the rows that have 5 digits.
Note that the column contains only values with either 1 or 5 digits. The 1-digit values are only 1 or 0.
import pandas as pd
import numpy as np
data = {'c': [1,0,1,1,65280,65376,65280] }
df = pd.DataFrame (data, columns = ['c'])
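# create a helper column 'clen' holding the number of digits in 'c'
df['clen'] = df['c'].astype(str).map(len)
# cast to object so the column can hold both ints and binary strings
df['c'] = df['c'].astype(object)
# apply bin() only to the rows whose value has five digits
df.loc[df['clen']==5,'c'] = df.loc[df['clen']==5,'c'].apply(bin)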
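If the helper column is not needed, the digit check can be folded into the conversion itself. This is a minimal sketch of that variant, assuming "five digits" means a value between 10000 and 99999; it rebuilds the question's data so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 0, 1, 1, 65280, 65376, 65280]})

# Convert only five-digit values (10000..99999); 0 and 1 are left as they are.
# The result is an object column mixing ints and binary strings.
df['c'] = df['c'].map(lambda v: bin(v) if 10000 <= v <= 99999 else v)
```

bin() returns a string with a '0b' prefix; format(v, 'b') gives the digits without it.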
I have the following simple code to test an array for normality:
import numpy as np
import pandas as pd
import scipy.stats as stats
# raw string so the backslash in the Windows path is not read as an escape
df = pd.read_excel(r"directory\file.xlsx")
x = df.iloc[:,1:].values.flatten()
stats.normaltest(x, axis=None)
This nicely gives me a p-value and a statistic.
What I want now is to add two columns to the file holding the p-value and the statistic and, if there are multiple rows, to do this for every row (calculate the p-value and statistic for each row and store them in the two new columns).
Can someone help?
If you want to calculate normaltest row-wise, do not flatten your data into x; use axis=1 instead, such as:
import numpy as np
import pandas as pd
from scipy import stats
df = pd.DataFrame(np.random.random(105).reshape(5,21)) # to generate data
# calculate normaltest row-wise without the first column, as in your code
df['stat'], df['p'] = stats.normaltest(df.iloc[:,1:], axis=1)
Then df contains two columns, 'stat' and 'p', with the values you are looking for, if I understand correctly.
Note: to be able to perform normaltest you need at least 8 values per row (according to what I experienced), so df.iloc[:,1:] needs at least 8 columns, otherwise it will raise an error. Even then, it would be better to have more than 20 values in each row.
I am new to Python. I have to create a 6x6 matrix of unique random numbers within the range 1-100. I have created the matrix of random numbers, but they are not unique. Moreover, I have to allow the user to select a number from the grid shown on the screen, but I don't know how to do it.
Here is my code:
# creating a 6 x 6 matrix (36 values); the values in the grid should be unique
import random
Random_matrix = [[random.randint(1,100) for row in range(6)] for column in range(6)]
# loops to print the grid
for i in range(6):
    for j in range(6):
        print(Random_matrix[i][j], end="\t")
    print()
# allow the user to input a number so that it can be matched with the number selected by the system
user_no = int(input("hey buddy! guess the number : "))
I think you are looking for the user to input the column and row number of the matrix. It can be done with:
a = [int(x) for x in input("Enter Column and then Row with a space: ").split()]
Then you just need to index the matrix with those values. Random_matrix is a list of rows, so the row index comes first:
Random_matrix[a[1]][a[0]]
I hope this helps.
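The uniqueness part of the question is not covered above. A minimal sketch of one way to handle it uses random.sample, which draws without replacement; the names SIZE, matrix and find are made up for illustration, and looking up the position of a guessed number is only one possible reading of "select a number from the grid".

```python
import random

SIZE = 6
# random.sample draws without replacement, so all 36 values are distinct
values = random.sample(range(1, 101), SIZE * SIZE)
# Cut the flat list into SIZE rows of SIZE values each
matrix = [values[r * SIZE:(r + 1) * SIZE] for r in range(SIZE)]

def find(matrix, number):
    """Return (row, column) of number in matrix, or None if it is not in the grid."""
    for r, row in enumerate(matrix):
        if number in row:
            return r, row.index(number)
    return None

# Print the grid, one row per line
for row in matrix:
    print(*row, sep="\t")
```

A guess read with int(input("hey buddy! guess the number : ")) can then be passed to find(matrix, guess) to check whether it is in the grid and where.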