Error importing a .csv file in an IPython notebook - python-3.x

I was trying to load a .csv file from my desktop in an IPython notebook, but it shows an error: invalid syntax.
Here is my code and the file I used:
data = np.loadtxt("C:/Users/rj/Desktop/data.csv",dtype={ 'formats':('S10', 'f8','f8','f8','f8', 'f8','f8','f8','f8')},delimiter=',')
The data.csv file contains:
24-Dec-15,378.45,380.9,384.75,377.6,382.35,382.4,382.39,4568751
28-Dec-15,382.4,384.9,395,383.75,394.85,394,391.54,7166351
29-Dec-15,394,392.9,397.5,388.75,390.7,391.85,392.95,7359611
30-Dec-15,391.85,392,395,390.5,394,393.45,393.11,4866177
31-Dec-15,393.45,394,395.75,389.15,391.6,391.3,391.85,6410622
01-Jan-16,391.3,392.5,403,373,401.8,401.9,398.24,4377363
04-Jan-16,401.9,400,400.1,375.05,376.15,377.05,383.74,7822660
05-Jan-16,377.05,381.05,382.45,372.1,373,374.45,377.36,6901068
06-Jan-16,374.45,374.25,375.5,364.6,365,365.9,370.04,7211230
07-Jan-16,365.9,356.25,358,338.1,344.8,343.55,347.83,11782307
08-Jan-16,343.55,345.6,355.85,345.6,353.9,353.35,351.97,8770370
The error is:
File "<ipython-input-13-177939f245ba>", line 21
... 'formats':('S10', 'f8','f8','f8','f8', 'f8','f8','f8','f8')},delimiter=',')
^
SyntaxError: invalid syntax
How do I correct the syntax?

How about:
import numpy as np
names = ['date', 'a', 'b', 'c', 'e', 'f', 'g', 'h', 'i']
formats = ['S10', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8']
data = np.loadtxt('C:/Users/rj/Desktop/data.csv',
                  dtype=list(zip(names, formats)), delimiter=',')
A dtype dict needs both a 'names' and a 'formats' key; list(zip(names, formats)) supplies the (name, format) pairs directly instead. Of course you'd probably prefer more meaningful names. (Python 2 doesn't need the list(...) wrapper, since its zip already returns a list.)
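To sanity-check the zip-based dtype without the original file on disk, here's a minimal self-contained sketch that parses two of the sample rows from an in-memory string; the col1..col8 names are placeholders, since the real column meanings aren't given in the post:

```python
import io
import numpy as np

# Two sample rows from the question, fed in as if they were the file.
sample = io.StringIO(
    "24-Dec-15,378.45,380.9,384.75,377.6,382.35,382.4,382.39,4568751\n"
    "28-Dec-15,382.4,384.9,395,383.75,394.85,394,391.54,7166351\n"
)

# Placeholder names: one date column plus eight numeric columns.
names = ['date'] + ['col%d' % i for i in range(1, 9)]
formats = ['S10'] + ['f8'] * 8

data = np.loadtxt(sample, dtype=list(zip(names, formats)), delimiter=',')

print(data['date'])
print(data['col1'])
```

The result is a structured array, so each column is addressable by name; data['date'] comes back as bytes because 'S10' is a bytes format (use 'U10' for str).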

Related

Reading CSV column values and append to List in Python

I'd like to read a column from a CSV file and store those values in a list
The CSV file is currently as below
Names
Tom
Ryan
John
The result that I'm looking for is
['Tom', 'Ryan', 'John']
Below is the code that I've written.
import csv
import pandas as pd
import time
# Declarations
UserNames = []
# Open a csv file using pandas
data_frame = pd.read_csv("analysts.csv", header=1, index_col=False)
names = data_frame.to_string(index=False)
# print(names)
# Iteration
for name in names:
    UserNames.append(name)
print(UserNames)
So far the result is as follows
['T', 'o', 'm', ' ', '\n', 'R', 'y', 'a', 'n', '\n', 'J', 'o', 'h', 'n']
Any help would be appreciated.
Thanks in advance
Hi, instead of converting your DataFrame to a string, you could just convert the column to a list like this:
import pandas as pd

df = pd.read_csv("analyst.csv", header=0)
names = df["Name"].to_list()
print(names)
Output: ['tom', 'tim', 'bob']
Csv File:
Name,
tom,
tim,
bob,
I wasn't sure exactly what your CSV looks like, so you may have to adjust the arguments of the read_csv function.
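As an aside, the question imports the csv module but never uses it; the same result needs no pandas at all. A stdlib-only sketch, assuming the file is just a Names header followed by one name per line (io.StringIO stands in for the real file here):

```python
import csv
import io

# Stand-in for open("analysts.csv", newline="") with the layout from the question.
csv_text = "Names\nTom\nRyan\nJohn\n"

with io.StringIO(csv_text) as f:
    reader = csv.reader(f)
    next(reader)  # skip the "Names" header row
    user_names = [row[0] for row in reader if row]

print(user_names)  # → ['Tom', 'Ryan', 'John']
```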

pandas to_parquet to s3 url leaves a trail of empty directories interpreted from the s3 url

Below is the code that I ran:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10, 5))
df.columns = ['a', 'b', 'c', 'd', 'e']
df['p'] = 2
df.to_parquet('s3://my_bucket/test01/boo.parquet', engine='fastparquet', compression='gzip', partition_cols=['p'])
The parquet file is saved to S3. But in my working directory, I now have a directory called "s3:" containing the full directory structure interpreted from the S3 URL.
OK, I realize this is a fastparquet quirk.
It only happens if partition_cols is provided and engine='fastparquet'.
If no partition_cols is provided, or if I use the default engine (engine='pyarrow'), this empty directory artifact does not appear. It just looks like a weird quirk of fastparquet.

gspread worksheet.update error - Worksheet has no attribute 'update'

I am trying to write a dataframe to an open Google Sheet in Google Colab, but am getting the error:
AttributeError: 'Worksheet' object has no attribute 'update'
I documented and tested the parts up to the error.
# General Imports
# Example at https://colab.research.google.com/notebooks/io.ipynb
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
#Import the library, authenticate, and create the interface to Sheets.
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
import numpy as np
import pandas as pd
# Load the DataFrame
dataframe = [['A', 'B', 'C'], ['1', '2' ,'3'], ['Mary', 'Mai', 'Kat']]
print(dataframe)
# Open the Google Sheet
# This assumes that you have a workbook called "RS Output" with a sheet "Data" on your Google Drive.
gc = gspread.authorize(GoogleCredentials.get_application_default())
my_workbook = 'RS Output'
my_sheet = "Data"
worksheet = gc.open(my_workbook).worksheet(my_sheet)
list_of_lists = worksheet.get_all_values()
print(list_of_lists)
# update the Google Sheet with the values from the Dataframe
# per gspread documentation at
# https://gspread.readthedocs.io/en/latest/user-guide.html
worksheet.update([dataframe.columns.values.tolist()] + worksheet.values.tolist())
This is the output:
[['A', 'B', 'C'], ['1', '2', '3'], ['Mary', 'Mai', 'Kat']]
[['Testing'], ['This']]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-132-e085af26b2ed> in <module>()
21 # https://gspread.readthedocs.io/en/latest/user-guide.html
22
---> 23 worksheet.update([dataframe.columns.values.tolist()] + worksheet.values.tolist())
AttributeError: 'Worksheet' object has no attribute 'update'
I can't seem to find a clear example of how to write the dataframe to a Google Sheet.
Thanks
I had the same issue; this was my first time using a colab.research.google.com notebook.
It turned out the default gspread module was version 3.0:
!pip install --upgrade gspread
Upgrading to version 3.7 fixed the missing .update() problem.
Found existing installation: gspread 3.0.1
Uninstalling gspread-3.0.1:
Successfully uninstalled gspread-3.0.1
Successfully installed gspread-3.7.0
Big thanks to: Shashank Rautela
AttributeError: 'Worksheet' object has no attribute 'update' means that the worksheet variable has no update attribute; you can verify that by adding print(dir(worksheet)) to your code.
If the variable has an update attribute, 'update' will appear in the printed list.
Also, I tried to replicate your code and found some issues:
dataframe = [['A', 'B', 'C'], ['1', '2', '3'], ['Mary', 'Mai', 'Kat']] is already a list. dataframe.columns.values.tolist() will raise an error, since the variable dataframe is a plain list and has no columns attribute. Passing the dataframe variable directly to the update method is enough, since it is already a list of lists. (An actual pandas DataFrame would be built from something like {'col1': [1, 2], 'col2': [3, 4]}.)
Incorrect worksheet.update() usage. According to the documentation, the parameters of update() are a range and values (a list of lists if the range contains multiple cells). Your call should look like worksheet.update("<range>", <data as a list of lists>).
Here is an example of how to use the update() method.
Using a list:
Code:
data = [["It" , "works!"]]
worksheet.update("A1:B1", data)
Using a pandas DataFrame.
Code:
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [20, 19, 23]})
values = df.columns.values.tolist()
worksheet.update("A1:B1", [values])
Based on how you used the update() method, you want to insert the column names above the current data in the worksheet. Instead of update(), you can use insert_rows().
Code:
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [20, 19, 23]})
values = df.columns.values.tolist()
worksheet.insert_rows([values], row=1, value_input_option='RAW')
References:
insert_rows
update
I ran into the same issue in a Jupyter notebook running on a server (Ubuntu 18.04), while it works fine using PyCharm on my local machine (Ubuntu 20.04).
Meanwhile, here's how I push my pandas dataframe to a google spreadsheet:
import string
import numpy as np

# Create a spreadsheet range that matches the size of the df (including 1 row for the column names). It looks like: 'A1:AA3'
letters = list(string.ascii_uppercase)
col_names_spreadsheet = letters+list(np.array([[X+x for x in letters] for X in letters]).flat)
range_for_df = col_names_spreadsheet[0]+"1"+":"+col_names_spreadsheet[df.shape[1]-1]+str(df.shape[0]+1)
# retrieve the matching cells
cell_list = worksheet.range(range_for_df)
# flatten the df, add the column names at the beginning
cell_values = list(df.columns)+list(df.values.flat)
# set the value of each cell
for i, val in enumerate(cell_values):  # gives us a tuple of an index and value
    cell_list[i].value = val  # use the index on cell_list and the val from cell_values
# update the cells in bulk
worksheet.update_cells(cell_list)
If the df has dates, it may raise this error:
Object of type date is not JSON serializable
In this case I use this:
# turn all datetime columns into strings
import datetime
dt_cols = list(df.columns[[type(df[col].iloc[0]) is datetime.date for col in df.columns]])
for c in dt_cols:
    df[c] = df[c].apply(lambda x: x.isoformat())
Credit for the trick: Python/gspread - how can I update multiple cells with DIFFERENT VALUES at once?
!pip install --upgrade gspread
Upgrade the gspread library with the command above; you will then be able to call the update() method.
If you are getting any kind of AttributeError (assuming that you have used correct syntax and valid gspread attributes), it is because you are using gspread's old version 3.0.1. If you haven't used gspread in Google Colab before, this is the version that comes pre-installed. Just do
!pip install --upgrade gspread
At the time of writing, this command upgrades gspread to version 3.7.1.
Happy coding!

Read multiple text files to 2D numpy array in Python

I have 10 txt files. Each of them with strings.
A.txt: "This is a cat"
B.txt: "This is a dog"
.
.
J.txt: "This is an ant"
I want to read these multiple files and put it in 2D array.
[['This', 'is', 'a', 'cat'],['This', 'is', 'a', 'dog']....['This', 'is', 'an', 'ant']]
from glob import glob
import numpy as np
for filename in glob('*.txt'):
    with open(filename) as f:
        data = np.genfromtxt(filename, dtype=str)
It's not working the way I want. Any help will be greatly appreciated.
You are just generating a different numpy array for each text file and not saving any of them. How about appending each file's words to a list like so, and converting to numpy afterwards?
data = []
for filename in glob('*.txt'):
    with open(filename) as f:
        data.append(f.read().split())
data = np.array(data)
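To illustrate why the append-then-convert approach yields the desired 2D array, here's a runnable sketch with in-memory strings standing in for the file contents (the sentences are the question's own examples):

```python
import numpy as np

# Stand-ins for the contents of A.txt, B.txt, ..., J.txt.
file_texts = ["This is a cat", "This is a dog", "This is an ant"]

data = []
for text in file_texts:
    data.append(text.split())  # one row of words per "file"
data = np.array(data)

print(data.shape)  # → (3, 4)
```

This produces a proper 2D string array because every sentence here has the same word count; if the rows had different lengths, pass dtype=object to np.array to keep a ragged object array instead.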

How to load Only column names from csv file (Pandas)?

I have a large CSV file and don't want to load it fully into memory; I only need the column names from this file. How can I load just those?
try this:
pd.read_csv(file_name, nrows=1).columns.tolist()
If you pass nrows=0 to read_csv then it will only load the column row:
In[8]:
import pandas as pd
import io
t="""a,b,c,d
0,1,2,3"""
pd.read_csv(io.StringIO(t), nrows=0)
Out[8]:
Empty DataFrame
Columns: [a, b, c, d]
Index: []
After which accessing attribute .columns will give you the columns:
In[10]:
pd.read_csv(io.StringIO(t), nrows=0).columns
Out[10]: Index(['a', 'b', 'c', 'd'], dtype='object')
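If pandas isn't a requirement, the stdlib csv module can also pull just the header, reading only the first line of the file; a minimal sketch, with io.StringIO standing in for the large file:

```python
import csv
import io

# Stand-in for open("large.csv", newline=""); only the first line is ever read.
csv_text = "a,b,c,d\n0,1,2,3\n"

with io.StringIO(csv_text) as f:
    header = next(csv.reader(f))

print(header)  # → ['a', 'b', 'c', 'd']
```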
