My Code:
import csv
import requests
url = 'https://xxxxxxxxxxxxxxxxxxx.csv'
r = requests.get(url)
text = r.iter_lines()
reader = csv.reader(text, delimiter=',')
for row in reader:
    print(row)
I am getting the following error:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
Just save the csv separately and then read it in with pandas:
import pandas as pd
pd.read_csv('SO/AN_LATEST_ANNOUNCED.csv')
(To save the file separately, just open the link in a browser.)
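Alternatively, you can fix the original code without a manual download: the error just means csv.reader needs strings, not bytes, so decode the response first. A minimal sketch (the placeholder URL is from the question):

import csv
import requests

url = 'https://xxxxxxxxxxxxxxxxxxx.csv'
r = requests.get(url)
# r.text is the decoded response body, so the reader sees strings, not bytes
reader = csv.reader(r.text.splitlines(), delimiter=',')
for row in reader:
    print(row)

pandas can also read straight from a URL (pd.read_csv(url)), which skips the manual download entirely.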
I am reading data from a .csv file and want to write parts of that data into an output file.
When I execute the program and print the results, I get the complete data set of the input file.
However, when I hit print() again, only the last line of the input file is shown.
When I write the print result into another csv file, again only the last line is transferred.
Basically I am new at this and struggle to understand how data is stored in cache and passed on.
import csv
import os

with open("path to the input file") as file:
    reader = csv.reader(file)
    for line in file:
        input_data = line.strip().split(";")
        print(input_data)

with open(os.path.join("path to the output file"), "w") as file1:
    toFile = input_data
    file1.write(str(toFile))
There are no error messages, just not the expected result. I expect 10 lines to be transferred, but only the last makes it to the output .csv.
Thank you for your help!
When you loop over the lines in the csv, each iteration assigns the value of that line to input_data, overwriting the value previously stored there.
I would recommend something like the following:
import csv

with open('path to input file', 'r') as infile, open('path to output file', 'w') as outfile:
    reader = csv.reader(infile, delimiter=';')
    for line in reader:
        # each line is already a list of fields, so join it back together to write it
        outfile.write(';'.join(line) + '\n')
You can open multiple files in a single with statement like I showed in the example. Then for each line in the file you join the fields back together and write them to the output file.
This should do it. You created the reader object correctly but didn't use it. I hope my example will better your understanding of the reader class.
#!/usr/bin/env python
import csv

def write_csv_to_file(csv_file_name, out_file_name):
    # Open the csv-file
    with open(csv_file_name, newline='') as csvfile:
        # Create a reader object, pass it the csv-file and tell it how to
        # split up values
        reader = csv.reader(csvfile, delimiter=',', quotechar='|')
        # Open the output file
        with open(out_file_name, "w") as out_file:
            # Loop through the rows that the reader found
            for row in reader:
                # Join the row values using a comma as separator
                r = ', '.join(row)
                # Print row and write to output file
                print(r)
                # "\n" is translated to the platform's line separator in
                # text mode, so os.linesep would produce doubled line endings
                out_file.write(r + "\n")

if __name__ == '__main__':
    csv_file = "example.csv"
    out_file = "out.txt"
    write_csv_to_file(csv_file, out_file)
I have a CSV file with some links stored in one of the columns. I want to read only the links and print them out. I tried the following code, but the output is none.
import csv

filename = 'abc.csv'
with open(filename, 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        for item in row:
            if item.startswith('http'):
                print(item)
import csv

with open('abc.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        if line[0].startswith('http'):
            print(line)
If you want to make sure that the line starts with, for example, "http", you should write line[0].startswith("http"), because the first element of the line list will be a string.
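Note that opening the file in text mode ('r') rather than binary ('rb') is what fixes the original problem: in Python 3, csv.reader expects strings, not bytes. If the links are not in the first column, you can scan every cell, as the question's own loop does; a minimal sketch (abc.csv as in the question):

import csv

with open('abc.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)
    for row in csv_reader:
        # a link could be in any column, so check every cell
        for item in row:
            if item.startswith('http'):
                print(item)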
I'm trying to generate records from a csv retrieved from a given url in Python 3.
Given urlopen returns a bytes-mode file object but csv.DictReader expects text-mode file objects, I've had to wrap the urlopen file object in a TextIOWrapper (with a utf-8 decoding). Now, unfortunately I'm stuck between two undesirable options:
1) TextIOWrapper doesn't support seek, so I can't reset the csv_file generator after checking for a header with Sniffer.
2) If I don't seek back to 0, I truncate the first 18 records.
How do I modify the below code so that it can both check for headers and yield all records off of one urlopen call?
What have I tried?
Reading the URL twice: once to check for headers, a second time to generate the csv records. This seems suboptimal.
Code that skips the first 18 records is below. Uncomment the csv_file.seek(0) line to generate the seek error.
import csv
import io
import urllib.request

def read_url_csv(url, fieldnames=None, transform=None):
    if transform is None:
        transform = lambda x, filename, fieldnames: x
    with urllib.request.urlopen(url) as csv_binary:
        csv_file = io.TextIOWrapper(csv_binary, "utf-8")
        has_header = csv.Sniffer().has_header(csv_file.read(1024))
        #csv_file.seek(0)
        reader = csv.DictReader(csv_file, fieldnames=fieldnames)
        if has_header and fieldnames is not None:  # Overwriting
            next(reader)
        for record in reader:
            yield transform(record, url, fieldnames)
StringIO supports seeking.
csv_file = io.StringIO(io.TextIOWrapper(csv_binary, "utf-8").read())
has_header = csv.Sniffer().has_header(csv_file.read(1024))
csv_file.seek(0)
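Putting it together, a sketch of the full generator with the seekable in-memory buffer (same signature and names as the question's function):

import csv
import io
import urllib.request

def read_url_csv(url, fieldnames=None, transform=None):
    if transform is None:
        transform = lambda x, filename, fieldnames: x
    with urllib.request.urlopen(url) as csv_binary:
        # read the whole payload into a seekable in-memory text buffer
        csv_file = io.StringIO(io.TextIOWrapper(csv_binary, "utf-8").read())
        has_header = csv.Sniffer().has_header(csv_file.read(1024))
        csv_file.seek(0)  # StringIO supports seek, so no records are lost
        reader = csv.DictReader(csv_file, fieldnames=fieldnames)
        if has_header and fieldnames is not None:  # skip the header row
            next(reader)
        for record in reader:
            yield transform(record, url, fieldnames)

The trade-off is that the whole response is held in memory at once, which is fine for modestly sized files but not for very large ones.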
I'm new to Python and I'm trying to import some URLs I scraped into a csv file, but it is parsing every character in the web addresses into a different cell. Here's my code:
import csv

with open('test.csv', 'w') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(['Web Address'])
    csv_writer.writerows(filter_records)
If I put brackets around the filter_records variable, it just returns the entire list of URLs in a single cell.
Any guidance would be great.
Thanks
Garrett
You can do something like this:
import csv

filter_records = ['www.google.com', 'www.stackoverflow.com', 'www.facebook.com']
with open('test.csv', 'w') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(['Web Address'])
    [csv_writer.writerow([record]) for record in filter_records]
or
import csv

filter_records = ['www.google.com', 'www.stackoverflow.com', 'www.facebook.com']
with open('test.csv', 'w') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(['Web Address'])
    csv_writer.writerows([record] for record in filter_records)
This happens because in Python a string is a sequence of characters. The writerow() method receives a list (one row of fields) as its parameter, and writerows() receives a list of lists. So when you pass a plain list of strings to writerows(), each string is treated as a row and split into one character per cell.
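A quick illustration of the difference (values are hypothetical):

import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(['abc'])    # a bare string is iterated character by character
writer.writerows([['abc']])  # wrapping each string in a list keeps it whole
print(buf.getvalue())        # first line: a,b,c -- second line: abc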
#bug: this program should output url contents but instead outputs the urls themselves
import csv
import requests
import json

with open('PluginIndex.csv', newline='') as csvfile:  # opens the plugin index file and stores it in var "csvfile"
    reader = csv.DictReader(csvfile)  # reads the contents of csvfile and stores it
    for row in reader:  # for each row of text in the contents of the csv
        url = row["repo"]
        print(url)  # outputs the repo url of each csv row
        resp = requests.get(url)
        data = json.loads(resp.text)
        data.keys()
I need the content of each URL in the csv file to be output to the command prompt, but the requests and json calls above aren't producing it. Does anyone know a way to achieve this?
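A minimal sketch of one possible fix, assuming each repo URL returns JSON: print the parsed response instead of only the URL. The file name and the "repo" field are from the question; everything else is an assumption.

import csv
import json
import requests

with open('PluginIndex.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        url = row["repo"]
        resp = requests.get(url)
        # print the fetched content rather than the url itself;
        # json.dumps is only for readable formatting (assumes JSON responses)
        print(json.dumps(resp.json(), indent=2))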