Python 3.7 Start reading from a specific point within a csv - python-3.x

Hey I could really use help here. I've tried for 1 hour to find a solution for python but was unable to find it.
I am using Python 3.7
My input is a file provided by a customer - I cannot change it. It is structured in the following way:
It starts with random text not in CSV format and from line 3 on the rest of the file is in csv format.
text line
text line
text line or nothing
Enter
[Start of csv file] "column Namee 1","column Namee 2" .. until 6
"value1","value2" ... until 6 - continuing for many lines.
I wanted to strip the first 3 lines to create a pure CSV file, but I could not find code that does this for a specific line range. It also seems like the wrong approach, since starting to read from a certain point should be possible.
Then I thought split() was the solution, but it did not work for this format. The values are sometimes numbers, dates or strings. I cannot use the seek() method because the files do not all start the same way.
Right now my DictReader takes the first line as the field names, and consequently the rest is rendered in chaos.
import csv
import pandas as pd
from prettytable import PrettyTable
with open(r'C:\Users\Hans\Downloads\file.csv') as csvfile:
    csv_reader = csv.DictReader (r'C:\Users\Hans\Downloads\file.csv', delimiter=',')
    for lines in csvfile:
        print (lines)
If some answer for python has been found please link it, I was not able to find it.
Thank you so much for your help. I really appreciate it.

I would still recommend the pandas option, given that the documentation clearly states that the skiprows parameter lets you skip a given number of lines at the start of the file. I tried it with the example provided by @Chris Doyle (saving it to a file named line_file.csv) and it works as expected.
import pandas as pd
f = pd.read_csv('line_file.csv', skiprows=3)
print(f)
Output
name num symbol
0 chris 4 $
1 adam 7 &
2 david 5 %

If you know the number of lines you want to skip, just open the file, read that many lines, and then pass the file handle to DictReader; it will read the remaining lines.
import csv
skip_n_lines = 3
with open('test.dat') as my_file:
    for _ in range(skip_n_lines):
        print("skipping line:", my_file.readline(), end='')
    print("###CSV DATA###")
    csv_reader = csv.DictReader(my_file)
    for row in csv_reader:
        print(row)
FILE
this is junk
this is more junk
last junk
name,num,symbol
chris,4,$
adam,7,&
david,5,%
OUTPUT
skipping line: this is junk
skipping line: this is more junk
skipping line: last junk
###CSV DATA###
OrderedDict([('name', 'chris'), ('num', '4'), ('symbol', '$')])
OrderedDict([('name', 'adam'), ('num', '7'), ('symbol', '&')])
OrderedDict([('name', 'david'), ('num', '5'), ('symbol', '%')])
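If the number of junk lines varies from file to file, one option is to scan for the header line instead of counting. The following is only a minimal sketch, assuming you know how the header line starts; the function name rows_after_preamble and the in-memory sample data are made up for illustration.

```python
import csv
import io
import itertools


def rows_after_preamble(handle, header_start):
    """Skip lines until one starts with header_start, then parse the rest as CSV."""
    for line in handle:
        if line.startswith(header_start):
            # Put the header line back in front of the lines not yet consumed.
            return list(csv.DictReader(itertools.chain([line], handle)))
    return []  # no header line found


# Hypothetical sample standing in for the customer file.
sample = io.StringIO(
    'text line\n'
    'text line\n'
    '\n'
    '"name","num","symbol"\n'
    '"chris","4","$"\n'
    '"adam","7","&"\n'
)
rows = rows_after_preamble(sample, '"name"')
for row in rows:
    print(dict(row))
```

The same function should work with a real file object from open(path, newline=''), since csv.DictReader accepts any iterable of lines.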

Related

How to get specific column value from .csv Python3?

I have a .csv file with Bitcoin price and market data, and I want to get the 5th and 7th columns from the last row in the file. I have worked out how to get the last row, but I'm not sure how to extract columns (values) 5 and 7 from it. Code:
with open('BTCAUD_data.csv', mode='r') as BTCAUD_data:
    writer = csv.reader(BTCAUD_data, delimiter=',')
    data = list(BTCAUD_data)[-1]
    print(data)
Edit: How would I also add column names, and would adding them help me? (I have already manually put the names into individual columns in the first line of the file itself)
Edit #2: Forget about the column names, they are unimportant. I still don't have a working solution. I have a vague idea that I'm not actually reading the file as a list, but rather as a string. (This means when I subscript the data variable, I get a single digit, rather than an item in a list) Any hints to how I read the line as a list?
Edit #3: I have got everything working to expectations now, thanks for everyone's help :)
Your code never uses the csv-reader. You can do so like this:
import csv
# This creates a file with demo data
with open('BTCAUD_data.csv', 'w') as f:
    f.write(','.join( f"header{u}" for u in range(10))+"\n")
    for l in range(20):
        f.write(','.join( f"line{l}_{c}" for c in range(10))+"\n")
# this reads and processes the demo data
with open('BTCAUD_data.csv', 'r', newline="") as BTCAUD_data:
    reader = csv.reader(BTCAUD_data, delimiter=',')
    # 1st line is header
    header = next(reader)
    # skip through the file, row will be the last line read
    for row in reader:
        pass
print(header)
print(row)
# each row is a list and you can index into it
# (indices are zero-based: [4] is the 5th column, [7] the 8th; use [6] for the 7th)
print(header[4], header[7])
print(row[4], row[7])
Output:
['header0', 'header1', 'header2', 'header3', 'header4', 'header5', 'header6', 'header7', 'header8', 'header9']
['line19_0', 'line19_1', 'line19_2', 'line19_3', 'line19_4', 'line19_5', 'line19_6', 'line19_7', 'line19_8', 'line19_9']
header4 header7
line19_4 line19_7
It is better to use pandas for handling CSV files.
import pandas as pd
df=pd.read_csv('filename')
df.column_name then gives the corresponding column.
For example, if you read this CSV file into df, df.Year gives you the Year column.
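For the original goal (the 5th and 7th values of the last row), a standard-library sketch could look like the following. It assumes the file has a header row and at least one data row; the in-memory demo data is invented, and collections.deque with maxlen=1 is just one way to keep only the final row without storing the whole file.

```python
import csv
import io
from collections import deque

# Invented demo data: 5 lines of 10 columns each, the first line acting as header.
text = "\n".join(
    ",".join(f"r{r}c{c}" for c in range(10)) for r in range(5)
) + "\n"

reader = csv.reader(io.StringIO(text))
header = next(reader)                   # consume the header line
last_row = deque(reader, maxlen=1)[0]   # keeps only the last row read

# Zero-based indexing: the 5th column is index 4, the 7th is index 6.
fifth, seventh = last_row[4], last_row[6]
print(fifth, seventh)  # r4c4 r4c6
```

With a real file, io.StringIO(text) would be replaced by open('BTCAUD_data.csv', newline='') inside a with block.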

Python3 Extracting only emails from csv file

I have written a working script that extracts information from a .csv file. However, it prints every whole line instead of just the emails, even though I wrote the code to look specifically for @ symbols.
#!/bin/python3
import re
def print_csv():
    in_file = open('sample._data.csv', 'rt')
    for line in in_file:
        if re.findall(r'(.*)@(.*).(.*)', line):
            print(line)
print_csv()
Here's a sample of the output:
"Carlee","Boulter","Tippett, Troy M Ii","8284 Hart St","Abilene","Dickinson","KS",67410,"785-347-1805","785-253-7049","carlee.boulter@hotmail.com","http://www.tippetttroymii.com"
"Thaddeus","Ankeny","Atc Contracting","5 Washington St #1","Roseville","Placer","CA",95678,"916-920-3571","916-459-2433","tankeny@ankeny.org","http://www.atccontracting.com"
"Jovita","Oles","Pagano, Philip G Esq","8 S Haven St","Daytona Beach","Volusia","FL",32114,"386-248-4118","386-208-6976","joles@gmail.com","http://www.paganophilipgesq.com"
"Alesia","Hixenbaugh","Kwikprint","9 Front St","Washington","District of Columbia","DC",20001,"202-646-7516","202-276-6826","alesia_hixenbaugh@hixenbaugh.org","http://www.kwikprint.com"
"Lai","Harabedian","Buergi & Madden Scale","1933 Packer Ave #2","Novato","Marin","CA",94945,"415-423-3294","415-926-6089","lai@gmail.com","http://www.buergimaddenscale.com"
"Brittni","Gillaspie","Inner Label","67 Rv Cent","Boise","Ada","ID",83709,"208-709-1235","208-206-9848","bgillaspie@gillaspie.com","http://www.innerlabel.com"
"Raylene","Kampa","Hermar Inc","2 Sw Nyberg Rd","Elkhart","Elkhart","IN",46514,"574-499-1454","574-330-1884","rkampa@kampa.org","http://www.hermarinc.com"
"Flo","Bookamer","Simonton Howe & Schneider Pc","89992 E 15th St","Alliance","Box Butte","NE",69301,"308-726-2182","308-250-6987","flo.bookamer@cox.net","http://www.simontonhoweschneiderpc.com"
"Jani","Biddy","Warehouse Office & Paper Prod","61556 W 20th Ave","Seattle","King","WA",98104,"206-711-6498","206-395-6284","jbiddy@yahoo.com","http://www.warehouseofficepaperprod.com"
"Chauncey","Motley","Affiliated With Travelodge","63 E Aurora Dr","Orlando","Orange","FL",32804,"407-413-4842","407-557-8857","chauncey_motley@aol.com","http://www.affiliatedwithtravelodge.com"
What I'm trying to do is get the output to look like a list of emails. I have trouble with filtering out the other content from the csv file.
As mentioned above, you should be able to use the built-in csv library. If the file is CSV, it should have a structured format, and even if it doesn't have column names you should be able to pull a field by its column position. Per your sample data, the email is at index 10 of each row. Please check out the official Python docs.
>>> import csv
>>> with open('sample._data.csv', newline='') as csvfile:
...     reader = csv.reader(csvfile, delimiter=',', quotechar='"')
...     for row in reader:
...         print(row[10])
...
# output:
carlee.boulter@hotmail.com
tankeny@ankeny.org
joles@gmail.com
alesia_hixenbaugh@hixenbaugh.org
lai@gmail.com
bgillaspie@gillaspie.com
rkampa@kampa.org
flo.bookamer@cox.net
jbiddy@yahoo.com
chauncey_motley@aol.com
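If the position of the email column is not known in advance, one alternative is to test every field against a pattern. This is a rough sketch, assuming the addresses use the usual @ separator; the deliberately loose pattern EMAIL_RE and the shortened sample rows are illustrative only and are not a full email validator.

```python
import csv
import io
import re

# Loose pattern: something, "@", something, ".", something, with no whitespace.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Shortened, made-up rows in the same quoted style as the sample data.
sample = io.StringIO(
    '"Carlee","Boulter","8284 Hart St","carlee.boulter@hotmail.com","http://www.tippetttroymii.com"\n'
    '"Thaddeus","Ankeny","5 Washington St #1","tankeny@ankeny.org","http://www.atccontracting.com"\n'
)

# Parse with csv first, then keep only the fields that look like an email.
emails = [
    field
    for row in csv.reader(sample)
    for field in row
    if EMAIL_RE.fullmatch(field)
]
print(emails)  # ['carlee.boulter@hotmail.com', 'tankeny@ankeny.org']
```

Matching each parsed field, rather than the raw line, is what keeps the rest of the row out of the output.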

pandas.read_clipboard only reads whole lines not columns

I transferred all my Python 3 code from macOS to Ubuntu 18.04, and in one program I need to use pandas.read_clipboard(). At that point the clipboard holds a list with multiple lines and columns, separated by tabs, with each element in quotation marks.
After just trying
import pandas as pd
df = pd.read_clipboard()
I'm getting this error: pandas.errors.ParserError: Expected 8 fields in line 3, saw 11. Error could possibly be due to quotes being ignored when a multi-char delimiter is used. Line 3 looks like "word1" "word2 and another" "word3" .... If you ignore the quotation marks and split on whitespace you count 11 elements; if you respect the quotation marks you count 8.
In the next step I tried
import pandas as pd
df = pd.read_clipboard(sep='\t')
and I get no errors, but the result is only a Series with each line of the clipboard source as one element.
Yes, I could write code to separate the elements of each line after this step, but because it works very well under macOS (with just pd.read_clipboard()) I hope that there's a better solution.
Thank you for helping.
I wrote a workaround for my question. It's not an exact solution, but because I just need the elements of one column in an array, I solved it like this:
import pyperclip
# read clipboard
cb = pyperclip.paste()
# lines in array
cb_arr = cb.splitlines()
column = []
for cb_line in cb_arr:
    # words in array
    cb_words = cb_line.split("\"")
    # pick element of column 1
    word = cb_words[1]
    column.append(word)
# delete column name
column.pop(0)
print(column)
Maybe it helps someone else, too.
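A variant of this workaround that keeps all columns is to let the csv module split the text on tabs while respecting the quotation marks. This is only a sketch: the string clipboard_text stands in for whatever pyperclip.paste() would return, and its contents are invented.

```python
import csv

# Stand-in for the clipboard contents: tab-separated, every element quoted.
clipboard_text = (
    '"name"\t"note"\t"tag"\n'
    '"word1"\t"word2 and another"\t"word3"\n'
    '"a"\t"b c"\t"d"\n'
)

# csv.reader accepts any iterable of lines, so splitlines() is enough.
rows = list(csv.reader(clipboard_text.splitlines(), delimiter="\t", quotechar='"'))
header, *data = rows

# Pick the first column, without the column name.
first_column = [row[0] for row in data]
print(first_column)  # ['word1', 'a']
```

Because the quoting is handled by the parser, a value such as "word2 and another" stays a single element.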

How to select a specific part in a .csv file on multiple lines? (Python3)

I'm making this example up for the sake of explaining the problem.
If I have a .csv file as follows:
alfa, bravo, Charlie, 1.31
Dragonball, manga, anime, 3.11
delta, Omega, cookie, 3.13
Dragonball, stan, lee, 1.13
How can I pick up the fourth part of each line that has "Dragonball" as its first part? Assume the list goes on further and I do not know in advance which lines have "Dragonball" as the first part.
I have tried:
list = []
for line in file:
    line = line.rstrip()
    part = line.split(",")
    if part[0] == Dragonball:
        list.append(part[3])
Expected output:
list = [3.11, 1.13]
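You can do it easily using pandas:
import pandas as pd
# header=None: the file has no header row, so the columns are labelled 0, 1, 2, 3
# skipinitialspace=True: drop the space that follows each comma
df = pd.read_csv("path to your csv file", header=None, skipinitialspace=True)
print(list(df[df[0]=='Dragonball'][3]))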
Output:
[3.11, 1.13]
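The same filter can also be written with only the standard library. A minimal sketch, using the example rows from the question as in-memory data; skipinitialspace=True is there because each comma in the sample is followed by a space.

```python
import csv
import io

# The example rows from the question, held in memory instead of a file.
sample = io.StringIO(
    "alfa, bravo, Charlie, 1.31\n"
    "Dragonball, manga, anime, 3.11\n"
    "delta, Omega, cookie, 3.13\n"
    "Dragonball, stan, lee, 1.13\n"
)

# Keep the fourth field (index 3) of every row whose first field is "Dragonball".
values = [
    float(row[3])
    for row in csv.reader(sample, skipinitialspace=True)
    if row and row[0] == "Dragonball"
]
print(values)  # [3.11, 1.13]
```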

Iterate appending Python List output to rows in excel

My Python code outputs the marks of Randy and Shaw every time I run it. I run this program a couple of times every month, and will do so for many years.
I am storing their marks in a list in Python, but how do I save them in the following format, with the output in rows for the two different persons?
import pandas
from openpyxl import load_workbook
#These lists I am getting from a very complicated code so just creating new lists here
L1=('7/6/2016', 24,24,13)
L2=('5/8/2016', 25,24,16)
L3=('7/6/2016', 21,16,19)
L4=('5/8/2016', 23,24,21)
L5=('4/11/2016', 13, 12,17)
print("Randy's grades")
print(L1)
print(L2)
print("Shaw's grades")
print(L3)
print(L4)
print(L5)
book = load_workbook('C:/Users/Desktop/Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl')
Output at run no 1:
For Randy
7/6/2016, 24,24,13
5/8/2016, 25,24,16
For Shaw
7/6/2016, 21,16,19
5/8/2016, 23,24,21
4/11/2016, 13, 12,17
Output at run no 2:
For Randy
7/8/2016, 24,24,13
5/9/2016, 25,24,16
For Shaw
7/8/2016, 21,16,19
5/9/2016, 23,24,21
I will have many such output runs for couple of years so I want to save the data by appending in the same document.
I am using openpyxl to open the document, and I know I need to use the append() operation, but I am having a hard time saving my list as a row. I am new here. Please help me with the syntax! I understand the logic but have difficulty with the syntax!
Thank you!
Since you said that you are willing to use csv format, I will show a csv solution.
with open('FileToWriteTo.csv', 'a') as outFile: # 'a' appends, so rows from earlier runs are kept
    outFile.write(','.join([str(item) for item in L1])) # Take everything in L1 and put commas between them then write to file
    outFile.write('\n') # Write newline
    outFile.write(','.join([str(item) for item in L2]))
    outFile.write('\n')
    outFile.write(','.join([str(item) for item in L3]))
    outFile.write('\n')
    outFile.write(','.join([str(item) for item in L4]))
    outFile.write('\n')
    outFile.write(','.join([str(item) for item in L5]))
    outFile.write('\n')
If you keep a list of lists instead of separate lists, this becomes easier with a for loop:
with open('FileToWriteTo.csv', 'a') as outFile:
    for row in listOfLists:
        outFile.write(','.join([str(item) for item in row]))
        outFile.write('\n')
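Since the program runs many times and each run should add to the same file, the file has to be opened in append mode. Here is a small sketch using csv.writer, which also takes care of quoting; the helper name append_rows, the extra name column and the temporary path are assumptions made for the demo.

```python
import csv
import os
import tempfile


def append_rows(path, rows):
    """Append rows to a CSV file, creating the file on the first call."""
    # Mode "a" adds to the end of the file instead of overwriting it.
    with open(path, "a", newline="") as out_file:
        csv.writer(out_file).writerows(rows)


# Temporary location so the demo does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "grades.csv")

# Run no. 1
append_rows(path, [("Randy", "7/6/2016", 24, 24, 13), ("Shaw", "7/6/2016", 21, 16, 19)])
# Run no. 2
append_rows(path, [("Randy", "7/8/2016", 24, 24, 13)])

with open(path, newline="") as in_file:
    saved = list(csv.reader(in_file))
print(len(saved))  # 3
```

Putting the person's name in the first column is one possible way to keep both people's rows in one file; a separate file per person would work as well.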

Resources