I'm new to Python programming and trying to implement a script that uses sys.argv. Please find the code below for your reference. I want to apply a filter where Offer_ID = 'O456' with the help of sys.argv.
Code:
-----
import pandas as pd
import numpy as np
import string
import sys
data = pd.DataFrame({'Offer_ID':["O123","O456","O789"],
'Offer_Name':["Prem New Ste","Prem Exit STE","Online Acquisiton Offer"],
'Rule_List':["R1,R2,R4","R6,R2,R3","R10,R11,R12"]})
data.loc[data[sys.argv[1]] == sys.argv[2]] # The problem is here
print(data)
Hardcoding the filter as "print(data.loc[data['Offer_ID'] == 'O456'])" gives me the output I want,
but I want to get the same result with "data.loc[data[sys.argv[1]] == sys.argv[2]]" instead.
Below is the command line I'm using:
python argv_demo2.py Offer_ID O456
Kindly assist me with this.
I'm a little confused as to what the issue is, but is this what you're trying to do?
import pandas as pd
import numpy as np
import string
import sys
data = pd.DataFrame({'Offer_ID':["O123","O456","O789"],
'Offer_Name':["Prem New Ste","Prem Exit STE","Online Acquisiton Offer"],
'Rule_List':["R1,R2,R4","R6,R2,R3","R10,R11,R12"]})
select = data.loc[data[sys.argv[1]] == sys.argv[2]] # filter on the column name and value passed on the command line
print(select)
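With the command line shown in the question (python argv_demo2.py Offer_ID O456), print(select) should then show only the matching row, roughly like this (exact column spacing and order can vary by pandas version):
  Offer_ID     Offer_Name Rule_List
1     O456  Prem Exit STE  R6,R2,R3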
Related
I'm having problems reading CSV files with pandas through the proxy in my student dorm:
drinks=pd.read_csv('https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/drinks.csv')
type(drinks)
I've tried this, but it didn't help:
import pandas as pd
import io
import requests
proxy_dict = "http://proxy.rcub.bg.ac.rs:8080"
s = requests.get('https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/drinks.csv', proxies=proxy_dict).text
df = pd.read_csv(io.StringIO(s))
but I still get errors.
Any help with this?
Your proxy_dict is a string, not a dict. Use
proxy_dict = {"https": "http://proxy.rcub.bg.ac.rs:8080"}
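For completeness, a minimal sketch of the whole flow, assuming the same proxy host works for both schemes (adjust the proxy URL for your network):

import io
import requests
import pandas as pd

# requests expects a mapping of scheme -> proxy URL
proxies = {
    "http": "http://proxy.rcub.bg.ac.rs:8080",
    "https": "http://proxy.rcub.bg.ac.rs:8080",
}

url = 'https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/drinks.csv'
s = requests.get(url, proxies=proxies).text
drinks = pd.read_csv(io.StringIO(s))
print(drinks.head())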
I am working on this problem and am unsure how to proceed.
Load the R data set mtcars as a pandas dataframe.
Build a linear regression model by considering the log of independent variable wt, and log of dependent variable mpg.
Fit the model with data.
Perform ANOVA on the linear model obtained in the previous step. (Hint: use anova.anova_lm)
Display the F-statistic value.
I saw the solution below provided in another post, but it doesn't seem to work.
import statsmodels.api as sm
import numpy as np
mtcars = sm.datasets.get_rdataset('mtcars')
mtcars_data = mtcars.data
liner_model = sm.formula.ols('np.log(wt) ~ np.log(mpg)',mtcars_data)
liner_result = liner_model.fit()
print(liner_result.rsquared)
Fixed it:
import statsmodels.api as sm
import numpy as np
import statsmodels.formula.api as smf
from statsmodels.stats import anova
# load the R mtcars data set; .data is already a pandas DataFrame
mtcars = sm.datasets.get_rdataset("mtcars", "datasets", cache=True).data
# regress the log of the dependent variable (mpg) on the log of the independent variable (wt)
model = smf.ols(formula='np.log(mpg) ~ np.log(wt)', data=mtcars).fit()
# print the ANOVA table, then the F-statistic for the np.log(wt) term
print(anova.anova_lm(model))
print(anova.anova_lm(model).F["np.log(wt)"])
"Code was developed in pandas=0.24.2, and I need to make the code work in pandas=0.20.1. What is the alternative for pd.notna as it is not working in pandas version 0.20.1.
df.loc[pd.notna(df["column_name"])].query(....).drop(....)
I need an alternative to pd.notna to fit in this line of code to work in pandas=0.20.1
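One likely replacement: pd.notna was only added in pandas 0.21, but pd.notnull and the Series method .notnull() behave the same way and exist in 0.20.1. A small sketch with toy data, just to illustrate the substitution:

import numpy as np
import pandas as pd

df = pd.DataFrame({"column_name": [1.0, np.nan, 3.0]})

# .notnull() is the long-standing equivalent of pd.notna / .notna()
filtered = df.loc[df["column_name"].notnull()]
print(filtered)

So the original line would become df.loc[df["column_name"].notnull()].query(....).drop(....).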
import subprocess
import pandas as pd
from io import StringIO
# run NSLOOKUP and capture its standard output
cmd = 'NSLOOKUP email.fullcontact.com'
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)
# decode the captured bytes and wrap them in a file-like object for pandas
b = StringIO(a.communicate()[0].decode('utf-8'))
df = pd.read_csv(b, sep=",")
column = list(df.columns)
# take the first field of the second row and strip the 'Name:' characters and whitespace
name = list(df.iloc[1])[0].strip('Name:').strip()
name
New coder here, trying to run some t-tests in Python 3.6. Right now, to run my t-tests between my 2 data sets, I have been doing the following:
import plotly.plotly as py
import plotly.graph_objs as go
from plotly.tools import FigureFactory as FF
import numpy as np
import pandas as pd
import scipy
from scipy import stats
long_term_survivor_GENE1 = [-0.38,-0.99,-1.04,0.1, etc..]
short_term_survivor_GENE1 = [0.32, 0.33,0.96, etc...]
stats.ttest_ind(long_term_survivor_GENE1,short_term_survivor_GENE1)
Which requires me to manually enter the values for each column of both data sets for each specific gene (GENE1 in this case). Is there any way to be able to call for the values from the data set so that Python can just read the values without me typing them out myself? For example, some way that I can just say:
long_term_survivor_GENE1 = ##call values from GENE1 column from dataset 1##
short_term_survivor_GENE1 = ## call values from GENE1 column from dataset 2##
Thanks for any help, and sorry that I'm not very well-versed in this stuff. Appreciate any feedback/tips. If you have any other questions, please let me know!
If you've shoved your data into the columns of a pandas dataframe then it might be as easy as this.
>>> import pandas as pd
>>> long_term_survivor_GENE1 = [-0.38,-0.99,-1.04,0.1]
>>> short_term_survivor_GENE1 = [0.32, 0.33,0.96, 0.56]
>>> df = pd.DataFrame({'long_term_survivor_GENE1': long_term_survivor_GENE1, 'short_term_survivor_GENE1': short_term_survivor_GENE1})
>>> from scipy import stats
>>> stats.ttest_ind(df['long_term_survivor_GENE1'], df['short_term_survivor_GENE1'])
Ttest_indResult(statistic=-3.615804684179662, pvalue=0.011153077626049458)
It might be a good idea to review the statistics behind this, though. If you haven't already got the data in a dataframe, have a look at some of the many answers here on SO about using read_csv.
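For example, a minimal sketch assuming each cohort lives in its own CSV file with a GENE1 column (the file names here are placeholders):

import pandas as pd
from scipy import stats

# read each data set from its own file; replace the names with your actual paths
long_df = pd.read_csv('long_term_survivors.csv')
short_df = pd.read_csv('short_term_survivors.csv')

# pull the GENE1 column from each data set and run the two-sample t-test
print(stats.ttest_ind(long_df['GENE1'], short_df['GENE1']))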
I am getting an error while using pandas get_dummies command. Can someone point out why?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
s = pd.read_csv('kddcup.txt')
t = s.apply(pd.Series.nunique)
print(t.index)
r = s.columns
print(r)
for n in range(0, len(t.index)):
    if t[r[n]] == 1:
        del s[r[n]
s = pd.get_dummies(s)  # getting a syntax error here
(screenshot of the error shown in Spyder)
You need to change del s[r[n] to del s[r[n]]. Each opening bracket must have a closing one.
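With that bracket added, the loop from the question runs as intended (same logic, only the del statement changed):

for n in range(0, len(t.index)):
    if t[r[n]] == 1:
        del s[r[n]]
s = pd.get_dummies(s)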