How to read_csv file from url with pandas on proxy? - python-3.x

I'm getting problems reading csv files using pandas on proxy in my student dorm:
drinks=pd.read_csv('https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/drinks.csv')
type(drinks)
I've try this, but it didn't help me:
import pandas as pd
import io
import requests
proxy_dict = "http://proxy.rcub.bg.ac.rs:8080"
s = requests.get('https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/drinks.csv', proxies=proxy_dict).text
df = pd.read_csv(io.StringIO(s))
but I get these errors:
enter image description here
Any help with this?

Your proxy_dict is a string, not a dict. Use
proxy_dict = {"https": "http://proxy.rcub.bg.ac.rs:8080"}

Related

Having issues with sys.argv()

I'm new to python programming and trying to implement a code using argv(). Please find the below code for your reference. I want to apply filter where Offer_ID = 'O456' with the help of argv().
Code:
-----
import pandas as pd
import numpy as np
import string
import sys
data = pd.DataFrame({'Offer_ID':["O123","O456","O789"],
'Offer_Name':["Prem New Ste","Prem Exit STE","Online Acquisiton Offer"],
'Rule_List':["R1,R2,R4","R6,R2,R3","R10,R11,R12"]})
data.loc[data[sys.argv[1]] == sys.argv[2]] # The problem is here
print(data)
With this statement I'm getting the output -> "print(data.loc[data['Offer_ID'] =='O456'])"
but I want to accomplish it as shown here "data.loc[data[sys.argv[1]] == sys.argv[2]]" .
Below is the command line argument which I'm using.
python argv_demo2.py Offer_ID O456
Kindly assist me with this.
I'm a little confused as to what the issue is, but is this what you're trying to do?
import pandas as pd
import numpy as np
import string
import sys
data = pd.DataFrame({'Offer_ID':["O123","O456","O789"],
'Offer_Name':["Prem New Ste","Prem Exit STE","Online Acquisiton Offer"],
'Rule_List':["R1,R2,R4","R6,R2,R3","R10,R11,R12"]})
select = data.loc[data[sys.argv[1]] == sys.argv[2]] # The problem is here
print(select)

Reading CSV file with proper encoding in pandas

I can not read the csv file in my jupiternotebook, the following is the link github link of the csv file
https://github.com/roshanthokchom/new-assignment/blob/master/spam.csv
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
import urllib
pd.read_csv('spam.csv',encoding='latin-1')
ParserError: Error tokenizing data. C error: Expected 2 fields in line 13, saw 4
#Roshan here is the solution to your problem:
import pandas as pd
import csv
with open('spam.csv', newline='') as f:
csvread = csv.reader(f)
raw_data = list(csvread)
data = []
for i in batch_data:
i = i[0].split("\t")
data.append(i)
final_data = pd.DataFrame(data)
You can specify encoding as you have done but your file consists of commas in between text so if you read normally pandas will separate data based on ",". Thats why you are getting an error

Import dataset from url and convert text to csv in python3

I am pretty new to Python (using Python3) and read Pandas to import dataset.
I need to import dataset from url - https://newonlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission/index.txt
and convert it to csv file, I am getting some special character in converted csv -> ��
I am download txt file and converting it to csv, is is the right approach?
and converted csv is putting entire text into one column
from urllib.request import urlretrieve
import pandas as pd
from pandas import DataFrame
url = 'https://newonlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission/index.txt'
urlretrieve(url, 'index.txt')
df = pd.read_csv('index.txt', sep='/t', engine='python', lineterminator='\r\n')
csv_file = df.to_csv('index.csv', sep='\t', index=False, header=True)
print(csv_file)
after successful import, I have to Extract X as all columns except the first column and Y as first column also.
I'll appreciate your all help.
from urllib.request import urlretrieve
import pandas as pd
url = 'https://newonlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission/index.txt'
urlretrieve(url, 'index.txt')
df = pd.read_csv('index.txt', sep='\t',encoding='utf-16')
Y = df[['REMISS']]
X = df.drop(['REMISS'],axis=1)

why I get KeyError when I extract data with specific keywords from CSV file using python?

I am trying to use below code to get posts with specific keywords from my csv file but I keep getting KeyErro "Tag1"
import re
import string
import pandas as pd
import openpyxl
import glob
import csv
import os
import xlsxwriter
import numpy as np
keywords = {"agile","backlog"}
# all your keywords
df = pd.read_csv(r"C:\Users\ferr1982\Desktop\split1_out.csv",
error_bad_lines=False)#, sep="," ,
encoding="utf-8")
output = pd.DataFrame(columns=df.columns)
for i in range(len(df.index)):
#if (df.loc[df['Tags'].isin(keywords)]):
if any(x in ((df['Tags1'][i]),(df['Tags2'][i]), (df['Tags3'][i] ),
(df['Tags4'][i]) , (df['Tags5'][i])) for x in keywords):
output.loc[len(output)] = [df[j][i] for j in df.columns]
output.to_csv("new_data5.csv", incdex=False)
Okay, it turned to be that there is a little space before "Tags" column in my CSV file !
it is working now after I added the space to the name in the code above.

how to convert column titles from int to str in python

I imported a file from spss, (sav file), however, the titles of my columns
appear as integers instead of strings. Is there a way to fix it? Below is the code I used....I would apreciate any help!
import fnmatch
import sys # import sys
import os
import pandas as pd #pandas importer
import savReaderWriter as spss # to import file from SPSS
import io #importing io
import codecs #to resolve the UTF-8 unicode
with spss.SavReader('file_name.sav') as reader: #Should I add "Np"
records = reader.all()
with codecs.open('file_name.sav', "r",encoding='utf-8', errors='strict')
as fdata: # Not sure if the problems resides on this line
df = pd.DataFrame(records)
df.head()
Wondering whether there is a way to actually convert the titles from numbers to strings. It has happened as if it were excel, but excel has an easy fix for that.
Thanks in advance!
After you have created the DataFrame, you can use df.columns = df.columns.map(str) to change the column headers to strings.

Resources