Compare each element of CSV file to every element of a different CSV file, and find the most similar elements - python-3.x

I have two CSV files which I need to compare. The first one is called SAP.csv, and the second is SAPH.csv.
SAP.csv has these cells:
Notification Description
5000000001 Detailed Inspection of Masts (2100mm) (3
5000000002 Ceremonial Awnings-Survey and Load Test
5000000003 HPA-Carry out 4000 hour service routine
5000000004 UxE 8 in Number Temperature Probs for C
5000000005 Overhaul valves
...while, SAPH.csv has these cells:
Notification Description
4000000015 Detailed Inspection of Masts (2100mm) (3
4000000016 Ceremonial Awnings-Survey and Load Test
4000000017 HPA-Carry out 8000 hour service routine
4000000018 UxE 8 in Number Temperature Probs for C
4000000019 Represerve valves
4000000020 STW System
They are similar, but some lines, like the fourth, (HPA-Carry out 4000 hour service routine vs. HPA-Carry out 8000 hour service routine), are slightly different.
I want to compare each value of SAP.csv against every value of SAPH.csv, and, using cosine similarity, find the most similar lines, so that the output would look something like this (the similarity percentages here are just examples, not what they would actually be):
Description
Detailed Inspection of Masts (2100mm) (3 - 100%
Ceremonial Awnings-Survey and Load Test - 100%
HPA-Carry out 4000 hour service routine - 85%
UxE 8 in Number Temperature Probs for C - 90%
Overhaul valves - 0%
Post answer edit
runfile('C:/Users/andrew.stillwell2/.spyder-py3/Estimating Test.py', wdir='C:/Users/andrew.stillwell2/.spyder-py3')
Traceback (most recent call last):
File "", line 1, in
runfile('C:/Users/andrew.stillwell2/.spyder-py3/Estimating Test.py', wdir='C:/Users/andrew.stillwell2/.spyder-py3')
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/andrew.stillwell2/.spyder-py3/Estimating Test.py", line 31, in
similarity_score = similar(job, description) # Get their similarity
File "C:/Users/andrew.stillwell2/.spyder-py3/Estimating Test.py", line 14, in similar
similarity = 1-textdistance.Cosine(qval=2).distance(a, b)
File "C:\ProgramData\Anaconda3\lib\site-packages\textdistance\algorithms\base.py", line 173, in distance
return self.maximum(*sequences) - self.similarity(*sequences)
File "C:\ProgramData\Anaconda3\lib\site-packages\textdistance\algorithms\base.py", line 176, in similarity
return self(*sequences)
File "C:\ProgramData\Anaconda3\lib\site-packages\textdistance\algorithms\token_based.py", line 175, in call
return intersection / pow(prod, 1.0 / len(sequences))
ZeroDivisionError: float division by zero
2nd Edit because of solution to the above
So the original request had just two outputs: Description and Similarity score.
Description comes from SAP
Similarity comes from the textdistance calc
Can the solution be amended to output the following?
Notification (this is a 10 digit number which is in the SAP file)
Description (as it currently is)
Similarity (as it currently is)
Notification (this number comes from the SAPH file and would be the one which provides the similarity score)
So an example output row would look like this
80000115360 Additional Materials FWD Rope Guard 86.24% 7123456789
This would be along columns A, B, C, D
A, B comes from SAP
C is calculated
D comes from SAPH
Edit 3
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/andrew.stillwell2/.spyder-py3/Est Test 2.py", line 16, in
SAP = pd.read_csv('H:\Documents/Python/Import into Python/SAP/SAP.csv', dtype={'Notification':'string'})
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 429, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 895, in init
self._make_engine(self.engine)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1122, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1853, in init
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 490, in pandas._libs.parsers.TextReader.cinit
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\common.py", line 2017, in pandas_dtype
dtype))
TypeError: data type 'string' not understood
Post edit 4 - 25/10/20
Hi, so getting the same error as before I think
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/andrew.stillwell2/.spyder-py3/Est Test 2.py", line 16, in
SAP = pd.read_csv('H:\Documents/Python/Import into Python/SAP/SAP.csv', dtype={'Notification':'string'}, delimiter=",", engine="python")
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 435, in _read
data = parser.read(nrows)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1139, in read
ret = self._engine.read(nrows)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 2421, in read
data = self._convert_data(data)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 2487, in _convert_data
clean_conv, clean_dtypes)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1705, in _convert_to_ndarrays
cvals = self._cast_types(cvals, cast_type, c)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1808, in _cast_types
copy=True, skipna=True)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 623, in astype_nansafe
dtype = pandas_dtype(dtype)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\common.py", line 2017, in pandas_dtype
dtype))
TypeError: data type 'string' not understood
I picked up on your bit about the delimiter so I uploaded a csv file to repl.it and it looks as though "," is the delimiter.
Therefore have altered the code to suit. When I did that on repl.it it worked.
This is the code I am using
import textdistance
import pandas as pd
def similar(a, b): # adapted from here: https://stackoverflow.com/a/63838615/8402369
    similarity = 1-textdistance.Cosine(qval=2).distance(a, b)
    return similarity * 100
# Read the CSVs
SAP = pd.read_csv('H:\Documents/Python/Import into Python/SAP/SAP.csv', dtype={'Notification':'string'}, delimiter=",", engine="python")
SAPH = pd.read_csv('H:\Documents/Python/Import into Python/SAP/SAP_History.csv', dtype={'Notification':'string'}, delimiter=",", engine="python")
# Create a pandas dataframe to store the output. The column 'Description' is populated with the values of SAP['Description']
scores = pd.DataFrame(SAP['Description'], columns = ['Notification (SAP)','Description', 'Similarity', 'Notification (SAPH)'])
# Temporary variable to store the highest similarity score
highest_score = 0
desc = 0
# Iterate through SAP['Description']
for job in SAP['Description']:
    highest_score = 0 # Reset highest_score in each iteration
    for description in SAPH['Description']: # Iterate through SAPH['Description']
        similarity_score = similar(job, description) # Get their similarity
        if(similarity_score > highest_score): # Check if the similarity is higher than the already saved similarity. If so, update highest_score with the new values
            highest_score = similarity_score
            desc = str(description)
        if(similarity_score == 100): # If it's a perfect match, don't bother continuing to search.
            break
    # Update the dataframe 'scores' with highest_score and other values
    print(SAPH['Description'][SAPH['Description'] == desc])
    scores['Notification (SAP)'][scores['Description'] == job] = SAP['Notification'][SAP['Description'] == job]
    scores['Similarity'][scores['Description'] == job] = f'{highest_score}%'
    scores['Notification (SAPH)'][scores['Description'] == job] = SAPH['Notification'][SAPH['Description'] == desc]
print(scores)
# Output it to Scores.csv without the index column
with open('./Scores.csv', 'w') as file:
    file.write(scores.__repr__())
Which is being run on Spyder (Python 3.7)

@George_Pipas's answer to this question demonstrates an example using the library textdistance (I'm paraphrasing part of his answer here):
A solution is to work with the textdistance library. I will provide an example of Cosine Similarity
import textdistance
1-textdistance.Cosine(qval=2).distance('Apple', 'Appel')
and we get:
0.5
So, we can create a similarity finding function:
def similar(a, b):
    similarity = 1-textdistance.Cosine(qval=2).distance(a, b)
    return similarity
This outputs a number between 0 and 1: closer to 1 if a and b are more similar, and closer to 0 if they aren't. So if a == b, the output will be 1; if a != b, it will generally be less than 1.
To get percentages, you just need to multiply the output by 100. Like this:
def similar(a, b): # adapted from here: https://stackoverflow.com/a/63838615/8402369
    similarity = 1-textdistance.Cosine(qval=2).distance(a, b)
    return similarity * 100
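For intuition about what this score measures, here is a minimal pure-Python sketch of the same idea: both strings are split into overlapping 2-character chunks (bigrams), and the number of shared chunks is divided by the geometric mean of the two chunk counts, which matches the formula visible in the traceback in the question. The names bigrams and bigram_cosine are made up for this illustration, and this is not textdistance's actual implementation.

```python
from collections import Counter
from math import sqrt

def bigrams(text):
    """Count the overlapping 2-character chunks of text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def bigram_cosine(a, b):
    """Similarity of a and b as a percentage (0 to 100)."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    shared = sum((grams_a & grams_b).values())  # chunks present in both
    total = sqrt(sum(grams_a.values()) * sum(grams_b.values()))
    if total == 0:  # a string shorter than 2 characters has no bigrams
        return 0.0
    return shared / total * 100

print(bigram_cosine('Apple', 'Appel'))  # prints 50.0
```

'Apple' and 'Appel' each have four bigrams and share two of them ('Ap' and 'pp'), hence 2 / 4 = 50%.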
CSV files can be read pretty easily with pandas:
import pandas as pd
# Read the CSVs
SAP = pd.read_csv('SAP.csv')
SAPH = pd.read_csv('SAPH.csv')
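If the files also carry an identifier column that must stay text (such as the Notification numbers in the question), a dtype can be passed when reading. The sketch below uses an in-memory, made-up stand-in for SAP.csv. It passes the built-in str rather than 'string', because the 'string' dtype name is only understood by pandas 1.0 and later, which is probably why the question's traceback ends in "data type 'string' not understood".

```python
import io
import pandas as pd

# Hypothetical stand-in for SAP.csv so the example is self-contained
csv_text = "Notification,Description\n0500000005,Overhaul valves\n"

# dtype=str keeps the identifiers as text, so leading zeros survive
SAP = pd.read_csv(io.StringIO(csv_text), dtype={'Notification': str})
print(SAP['Notification'][0])  # prints 0500000005
```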
We create another pandas dataframe to store the results we'll compute in:
# Create a pandas dataframe to store the output. The column 'SAP' is populated with the values of SAP['Description']
scores = pd.DataFrame({'SAP': SAP['Description']}, columns = ['SAP', 'SAPH', 'Similarity'])
Now, we iterate through SAP['Description'] and SAPH['Description'], compare each element against each other element, compute their similarity, and save the highest to scores.
# Temporary variable to store both the highest similarity score, and the 'SAPH' value the score was computed with
highest_score = {"score": 0, "description": ""}
# Iterate through SAP['Description']
for job in SAP['Description']:
    highest_score = {"score": 0, "description": ""} # Reset highest_score at each iteration
    for description in SAPH['Description']: # Iterate through SAPH['Description']
        similarity_score = similar(job, description) # Get their similarity
        if(similarity_score > highest_score['score']): # Check if the similarity is higher than the already saved similarity. If so, update highest_score with the new values
            highest_score['score'] = similarity_score
            highest_score['description'] = description
        if(similarity_score == 100): # If it's a perfect match, don't bother continuing to search.
            break
    # Update the dataframe 'scores' with highest_score (.loc avoids chained assignment, which can silently fail to update)
    scores.loc[scores['SAP'] == job, 'SAPH'] = highest_score['description']
    scores.loc[scores['SAP'] == job, 'Similarity'] = highest_score['score']
Here's a breakdown:
A temporary variable, highest_score, is created to store the highest computed score and the description it was computed with.
Now we iterate through SAP['Description'], and within that loop, iterate through SAPH['Description']. This allows us to compare each value of SAP['Description'] (job) to every value of SAPH['Description'] (description).
While iterating through SAPH['Description'], we:
Compute the similarity score of both job and description
If it's higher than the saved score in highest_score, we update highest_score accordingly; otherwise we continue
If similarity_score is equal to 100, we know that it's a perfect match, and don't have to keep looking. We break the loop in this case.
Outside of the SAPH['Description'] loop, now that we've compared job to each element of SAPH['Description'], (or found a perfect match), we save the values to scores.
This repeats for every element of SAP['Description'].
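The inner loop described above can also be sketched as a small standalone function that works on plain lists. This is only an illustration: best_match and ratio_percent are made-up names, and difflib's SequenceMatcher is used as a stand-in scorer so the example runs without textdistance (its scores will differ from the cosine scores shown in this answer).

```python
from difflib import SequenceMatcher

def ratio_percent(a, b):
    """Stand-in scorer (an assumption): difflib's ratio as a percentage."""
    return SequenceMatcher(None, a, b).ratio() * 100

def best_match(job, candidates, scorer=ratio_percent):
    """Return (best_candidate, best_score) for job among candidates."""
    best, best_score = None, 0
    for candidate in candidates:
        score = scorer(job, candidate)
        if score > best_score:  # keep the highest score seen so far
            best, best_score = candidate, score
        if score == 100:  # perfect match, stop searching
            break
    return best, best_score

saph = ['HPA-Carry out 8000 hour service routine',
        'Represerve valves',
        'STW System']
print(best_match('Overhaul valves', saph))
```

Passing the similar function from this answer as scorer should reproduce the scores shown here.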
Here's what scores looks like when it's finished:
   SAP                                       SAPH                                      Similarity
0  Detailed Inspection of Masts (2100mm) (3  Detailed Inspection of Masts (2100mm) (3  100
1  Ceremonial Awnings-Survey and Load Test   Ceremonial Awnings-Survey and Load Test   100
2  HPA-Carry out 4000 hour service routine   HPA-Carry out 8000 hour service routine   94.7368
3  UxE 8 in Number Temperature Probs for C   UxE 8 in Number Temperature Probs for C   100
4  Overhaul valves                           Represerve valves                         53.4522
And after outputting it to a CSV file with this:
# Output it to Scores.csv without the index column (0, 1, 2, 3... far left in scores above). Remove index=False if you want to keep the index column.
scores.to_csv('Scores.csv', index=False)
...Scores.csv looks like this:
SAP,SAPH,Similarity
Detailed Inspection of Masts (2100mm) (3,Detailed Inspection of Masts (2100mm) (3,100
Ceremonial Awnings-Survey and Load Test,Ceremonial Awnings-Survey and Load Test,100
HPA-Carry out 4000 hour service routine,HPA-Carry out 8000 hour service routine,94.73684210526315
UxE 8 in Number Temperature Probs for C,UxE 8 in Number Temperature Probs for C,100
Overhaul valves,Represerve valves,53.45224838248488
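The four-column layout asked for in the question's second edit (SAP notification, description, similarity, SAPH notification) could be written out in a similar way. Below is a minimal sketch using only the standard csv module. It assumes the matches have already been computed and collected as tuples; write_scores and matches are hypothetical names, and the sample row is the one given in the question.

```python
import csv
import io

def write_scores(matches, out):
    """Write (SAP notification, description, score, SAPH notification) rows."""
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['Notification (SAP)', 'Description',
                     'Similarity', 'Notification (SAPH)'])
    for sap_id, description, score, saph_id in matches:
        # Notification numbers are kept as text; the score is rounded to 2 places
        writer.writerow([sap_id, description, f'{score:.2f}%', saph_id])

# Example row taken from the question
matches = [('80000115360', 'Additional Materials FWD Rope Guard',
            86.24, '7123456789')]
buffer = io.StringIO()  # stands in for open('Scores.csv', 'w', newline='')
write_scores(matches, buffer)
print(buffer.getvalue())
```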
Note that textdistance and pandas are required libraries for this. Install them, if you don't have them already, with:
pip install textdistance pandas
Notes:
You can round the percent by replacing f'{highest_score}%' with this: f'{round(highest_score, NUMBER_OF_PLACES_TO_ROUND_TO)}%'
EDIT: (for the problems encountered that are mentioned in the comments)
Here is an error-catching version of the similarity function:
def similar(a, b): # adapted from here: https://stackoverflow.com/a/63838615/8402369
    try:
        similarity = 1-textdistance.Cosine(qval=2).distance(a, b)
        return similarity * 100
    except ZeroDivisionError:
        print('There was an error. Here are the values of a and b that were passed')
        print(f'a: {repr(a)}')
        print(f'b: {repr(b)}')
        exit()
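Once the offending values are known, another option is to treat such pairs as a zero match so that the loop can carry on instead of exiting. The sketch below rests on some assumptions: guarded and fragile are illustrative names, fragile merely stands in for similar() so the example runs without textdistance, and whether 0 is the right score for unusable descriptions depends on your data. The ZeroDivisionError appears to occur when a description is too short to contain any 2-character chunk.

```python
def guarded(scorer, default=0.0):
    """Wrap scorer so unusable inputs score `default` instead of raising."""
    def wrapper(a, b):
        if not isinstance(a, str) or not isinstance(b, str):
            return default  # e.g. NaN read from an empty CSV cell
        try:
            return scorer(a, b)
        except ZeroDivisionError:
            return default  # e.g. an empty or one-character description
    return wrapper

# Hypothetical stand-in for similar(), used only to exercise the wrapper
def fragile(a, b):
    return 100 / min(len(a), len(b))

safe = guarded(fragile)
print(safe('', 'STW System'))  # prints 0.0
```

With the real function, this would be applied as similar = guarded(similar) before running the loop.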

Related

Error downloading historical stock data using pandas_datareader Anaconda3, Spyder 5.3.3

I have a watch list of 30 stocks. The list is in a text file called "WatchList". I initialize the list as:
stock = []
and read the symbols line by line. I specify a location to store the data in csv format for each symbol.
I have the latest version of pandas_datareader and it is 0.10.0. I have used a while loop and pandas_datareader before. However, now I am experiencing problems. I receive the following error message:
runfile('E:/Stock_Price_Forecasting/NewStockPriceFetcher.py', wdir='E:/Stock_Price_Forecasting')
Enter the name of file to access WatchList
WatchList.txt
0 AAPL <class 'str'>
Traceback (most recent call last):
File "C:\Users\Om\anaconda3\lib\site-packages\spyder_kernels\py3compat.py", line 356, in compat_exec
exec(code, globals, locals)
File "e:\stock_price_forecasting\newstockpricefetcher.py", line 60, in
df = web.DataReader(stock[i], data_source='yahoo', start=start_date, end=end_date)
File "C:\Users\Om\anaconda3\lib\site-packages\pandas\util_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "C:\Users\Om\anaconda3\lib\site-packages\pandas_datareader\data.py", line 370, in DataReader
return YahooDailyReader(
File "C:\Users\Om\anaconda3\lib\site-packages\pandas_datareader\base.py", line 253, in read
df = self._read_one_data(self.url, params=self._get_params(self.symbols))
File "C:\Users\Om\anaconda3\lib\site-packages\pandas_datareader\yahoo\daily.py", line 153, in _read_one_data
data = j["context"]["dispatcher"]["stores"]["HistoricalPriceStore"]
TypeError: string indices must be integers
The portion of my code that shows the while loop is shown below:
i = 0
while i < len(stock):
    print(i, stock[i], type(stock[i]))
    # Format the filename for each security to use in full path
    stock_data_file = stock[i] + '.csv'
    # Complete the path definition for stock data storage including filename
    full_file_path = (file_path/stock_data_file)
    # Specify the order for the columns
    columnTitles = ('Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close')
    # Pull the data for the stock from the Web
    df = web.DataReader(stock[i], data_source='yahoo', start=start_date,
                        end=end_date)  # ** error in this line!!
    # Reorder the columns for plotting Candlesticks
    df=df.reindex(columns=columnTitles)
    if i == 0:
        df.to_csv(full_file_path)
        print(i, stock[i], 'has data stored to csv file')
    else:
        df.to_csv(full_file_path, header=True)
        print(i, stock[i], 'has data stored to csv file')
    i += 1
I have looked at the parameter requirements for DataReader and Yahoo. I believe the first parameter is the ticker, as a string value. I have been unable to find out where I am making a mistake. Any suggestions for solving this issue would be greatly appreciated. Thank you.

What is causing this issue when trying to get yahoo_fin to return prices for a list of tickers?

I have a list of tickers that I want to retrieve the prices for by running the following:
from yahoo_fin import stock_info as si
for x in watchlist:
    print(si.get_live_price(x))
When I run this I get the following error:
File "", line 1, in
runfile('C:/Users/User/OneDrive/Documents/Stuff/fluff 2.py', wdir='C:/Users/User/OneDrive/Documents/Stuff')
File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/User/OneDrive/Documents/Stuff/fluff 2.py", line 46, in
print(si.get_live_price(x))
File "D:\Anaconda3\lib\site-packages\yahoo_fin\stock_info.py", line 338, in get_live_price
df = get_data(ticker, end_date = pd.Timestamp.today() + pd.DateOffset(10))
File "D:\Anaconda3\lib\site-packages\yahoo_fin\stock_info.py", line 68, in get_data
temp = loads(needed)
ValueError: Expected object or value
However, when I refer to a ticker directly, it runs normally:
print(si.get_live_price('tsla'))
348.8399963378906
What could be causing this issue? Is it due to me using a different html parser than that used with yahoo_fin in an earlier part of the code?
Try this out. It gives you a complete dataframe with the last 6 months of data:
import yfinance as yf
for x in ['TSLA','AAPL']:
    data = yf.download(tickers=x, period='6mo')
    print(data['Close'].iloc[-1])
Output :
348.8399963378906
268.4800109863281
If you want the last 6 months of data, you can store each individual dataframe. In the case above I have printed only the last row, since you wanted the LTP (last traded price).
This issue should be fixed now in the latest version of yahoo_fin (0.8.4). It was due to a change in Yahoo Finance's structure. See here for news about recent updates: http://theautomatic.net/2019/12/16/updates-to-yahoo_fin-package/

python error in implementing csv file

I am getting this error when I try to run my feature engineering script on the Quora duplicate questions file.
The part of the code I am running is below:
data = pd.read_csv('train.csv', sep='\t')
data = data.drop(['id', 'qid1', 'qid2'], axis=1)
and the output is
runfile('/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master/feature_engineering.py', wdir='/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master')
Traceback (most recent call last):
File "<ipython-input-31-e29a1095cc40>", line 1, in <module>
runfile('/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master/feature_engineering.py', wdir='/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master')
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master/feature_engineering.py", line 55, in <module>
data = data.drop(['id','qid1','qid2'], axis=1)
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2530, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2562, in _drop_axis
new_axis = axis.drop(labels, errors=errors)
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3744, in drop
labels[mask])
ValueError: labels ['id' 'qid1' 'qid2'] not contained in axis
my csv file is like this
"id","qid1","qid2","question1","question2","is_duplicate"
"0","1","2","What is the step by step guide to invest in share market in india?","What is the step by step guide to invest in share market?","0"
"1","3","4","What is the story of Kohinoor (Koh-i-Noor) Diamond?","What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?","0"
please help me in trying to figure out the problem
You need to remove the sep='\t' argument, because the values in the CSV are already separated by commas:
# sample.csv file contains the following data
"id","qid1","qid2","question1","question2","is_duplicate"
"0","1","2","What is the step by step guide to invest in share market in india?","What is the step by step guide to invest in share market?","0"
"1","3","4","What is the story of Kohinoor (Koh-i-Noor) Diamond?","What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?","0"
import pandas as pd
df = pd.read_csv('sample.csv')
data = df.drop(['id', 'qid1', 'qid2'], axis=1)
print(data)
# the output will contain only these columns:
"question1","question2","is_duplicate"
"What is the step by step guide to invest in share market in india?","What is the step by step guide to invest in share market?","0"
"What is the story of Kohinoor (Koh-i-Noor) Diamond?","What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?","0"
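To see why the separator matters, the two calls can be compared on a small in-memory sample. This is only a sketch: the CSV text below is a shortened, made-up stand-in for train.csv, read through io.StringIO so that no file is needed.

```python
import io
import pandas as pd

# Shortened, hypothetical rows shaped like train.csv in the question
csv_text = (
    '"id","qid1","qid2","question1","question2","is_duplicate"\n'
    '"0","1","2","First question?","Second question?","0"\n'
)

# With sep='\t' there is no tab to split on, so each line becomes one column
wrong = pd.read_csv(io.StringIO(csv_text), sep='\t')
print(len(wrong.columns))

# With the default comma separator the six columns are parsed separately
right = pd.read_csv(io.StringIO(csv_text))
data = right.drop(['id', 'qid1', 'qid2'], axis=1)
print(list(data.columns))
```

In the first case there is no column called 'id', 'qid1' or 'qid2' to drop, which is consistent with the "not contained in axis" error in the question.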

ValueError in scipy t test_ind

I have following csv file:
SRA ID ERR169499 ERR169498 ERR169497
Label 1 0 1
TaxID PRJEB3251_ERR169499 PRJEB3251_ERR169499 PRJEB3251_ERR169499
333046 0.05 0.99 99.61
1049 0.03 2.34 34.33
337090 0.01 9.78 23.22
99007 22.33 2.90 0.00
I have 92 columns for case, for which the label is 0, and 95 columns for control, for which the label is 1. I have to perform a two-sample independent t-test and a rank-sum test. So far I have:
df = pd.read_csv('final_out_transposed.csv', header=[1,2], index_col=[0])
case = df.xs('0', axis=1, level=0).dropna()
ctrl = df.xs('1', axis=1, level=0).dropna()
(tt_val, p_ttest) = ttest_ind(case, ctrl, equal_var=False)
For which I am getting the error: ValueError: operands could not be broadcast together with shapes (92,) (95,).
The traceback is:
File "<ipython-input-152-d58634e75106>", line 1, in <module>
runfile('C:/IBD Bioproject/New folder/temp_3251.py', wdir='C:/IBD Bioproject/New folder')
File "C:\Users\ksingh1\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
execfile(filename, namespace)
File "C:\Users\ksingh1\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/IBD Bioproject/New folder/temp_3251.py", line 106, in <module>
tt_val, p_ttest = ttest_ind(case, ctrl, equal_var=False)
File "C:\Users\ksingh1\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\stats\stats.py", line 4068, in ttest_ind
df, denom = _unequal_var_ttest_denom(v1, n1, v2, n2)
File "C:\Users\ksingh1\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\stats\stats.py", line 3872, in _unequal_var_ttest_denom
df = (vn1 + vn2)**2 / (vn1**2 / (n1 - 1) + vn2**2 / (n2 - 1))
ValueError: operands could not be broadcast together with shapes (92,) (95,)
I have read a few posts and gone through the numpy broadcasting documentation, but it is still unclear to me.
Thanks in advance
Apparently the objects created by the xs method of the Pandas DataFrame look like two-dimensional arrays. These must be flattened to look like one-dimensional arrays when passed to ttest_ind.
Try this:
ttest_ind(case.values.ravel(), ctrl.values.ravel(), equal_var=False)
The values attribute of the Pandas objects gives a numpy array, and the ravel() method flattens the array to one-dimension.
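Note that flattening pools every value from every row into a single test. If the intent is instead one test per TaxID row (an assumption about the question, not something it states), ttest_ind can be run along the column axis, where group sizes of 92 and 95 are allowed to differ. A sketch with synthetic data standing in for the real table:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Synthetic stand-ins: 4 taxa (rows), 92 case and 95 control samples (columns)
case = rng.normal(size=(4, 92))
ctrl = rng.normal(size=(4, 95))

# Flattened, as in the answer: one test over all values pooled together
t_all, p_all = ttest_ind(case.ravel(), ctrl.ravel(), equal_var=False)

# Per row: one Welch t-test per taxon, comparing along axis=1
t_rows, p_rows = ttest_ind(case, ctrl, axis=1, equal_var=False)
print(p_rows.shape)  # prints (4,)
```

With the DataFrames from the question, case.values and ctrl.values would take the place of the synthetic arrays.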

Trouble using exponents in python 3.6 [duplicate]

I'm trying to create a script that lets users input two numbers, which are then used to compute an exponent. Here's what I'm writing. I'm on Python 3.6 and using Anaconda/Spyder.
print ("Enter number x: ")
x = input()
print ("Enter number y: ")
y = input()
import numpy
numpy.exp2([x,y])
What I want is for the user to enter a value for x (for example 2), then enter a value for y (for example 3). Then I'd like, with the help of numpy, to compute 2**3, which equals 8. Instead I get this:
Enter number x:
2
Enter number y:
3
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/anaconda/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 880, in runfile
execfile(filename, namespace)
File "/anaconda/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/Users/Ameen/Desktop/assignment1.py", line 16, in <module>
numpy.exp2([x,y])
TypeError: ufunc 'exp2' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I tried print('xy'), but that just printed xy. Then I came across https://docs.scipy.org/doc/numpy/reference/generated/numpy.exp2.html#numpy.exp2, which suggested np.exp2. When I tried np.exp2([x**y]), it said np is not defined, with an error message saying the numpy library is not imported. So now I'm trying numpy.exp2 and get the error above.
input() returns strings, so convert them to integers first:
import numpy
print('Enter number x:')
x = int(input())
print('Enter number y:')
y = int(input())
print(numpy.exp2([x, y])) #=> [ 4. 8.]
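Note that numpy.exp2 computes 2**value for each element, which is why the result above is [4., 8.] rather than a single 8. If the goal is x raised to the power y, as described in the question, the ** operator (or numpy.power) is a closer fit. A small sketch: the helper name power_from_text is made up, and it takes the typed text as arguments so that it can be run without prompting.

```python
def power_from_text(x_text, y_text):
    """Convert two typed strings to integers and return x ** y."""
    x = int(x_text)  # input() would hand these over as strings
    y = int(y_text)
    return x ** y

print(power_from_text('2', '3'))  # prints 8
```

In the interactive script this would be called as power_from_text(input(), input()).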
