Python3 Extracting only emails from csv file

I have written a working script that extracts information from a .csv file. However, it prints every field of each matching line instead of only the emails, even though I wrote the code to look specifically for # symbols.
#!/bin/python3
import re
def print_csv():
    in_file = open('sample._data.csv', 'rt')
    for line in in_file:
        if re.findall(r'(.*)#(.*).(.*)', line):
            print(line)
print_csv()
Here's a sample of the output:
"Carlee","Boulter","Tippett, Troy M Ii","8284 Hart St","Abilene","Dickinson","KS",67410,"785-347-1805","785-253-7049","carlee.boulter#hotmail.com","http://www.tippetttroymii.com"
"Thaddeus","Ankeny","Atc Contracting","5 Washington St #1","Roseville","Placer","CA",95678,"916-920-3571","916-459-2433","tankeny#ankeny.org","http://www.atccontracting.com"
"Jovita","Oles","Pagano, Philip G Esq","8 S Haven St","Daytona Beach","Volusia","FL",32114,"386-248-4118","386-208-6976","joles#gmail.com","http://www.paganophilipgesq.com"
"Alesia","Hixenbaugh","Kwikprint","9 Front St","Washington","District of Columbia","DC",20001,"202-646-7516","202-276-6826","alesia_hixenbaugh#hixenbaugh.org","http://www.kwikprint.com"
"Lai","Harabedian","Buergi & Madden Scale","1933 Packer Ave #2","Novato","Marin","CA",94945,"415-423-3294","415-926-6089","lai#gmail.com","http://www.buergimaddenscale.com"
"Brittni","Gillaspie","Inner Label","67 Rv Cent","Boise","Ada","ID",83709,"208-709-1235","208-206-9848","bgillaspie#gillaspie.com","http://www.innerlabel.com"
"Raylene","Kampa","Hermar Inc","2 Sw Nyberg Rd","Elkhart","Elkhart","IN",46514,"574-499-1454","574-330-1884","rkampa#kampa.org","http://www.hermarinc.com"
"Flo","Bookamer","Simonton Howe & Schneider Pc","89992 E 15th St","Alliance","Box Butte","NE",69301,"308-726-2182","308-250-6987","flo.bookamer#cox.net","http://www.simontonhoweschneiderpc.com"
"Jani","Biddy","Warehouse Office & Paper Prod","61556 W 20th Ave","Seattle","King","WA",98104,"206-711-6498","206-395-6284","jbiddy#yahoo.com","http://www.warehouseofficepaperprod.com"
"Chauncey","Motley","Affiliated With Travelodge","63 E Aurora Dr","Orlando","Orange","FL",32804,"407-413-4842","407-557-8857","chauncey_motley#aol.com","http://www.affiliatedwithtravelodge.com"
What I'm trying to do is get the output to look like a list of emails. I have trouble with filtering out the other content from the csv file.

As mentioned above, you should be able to use the built-in csv library. If the file is CSV, it should have a structured format, and even if it has no column names you can pull a field by column position. In your sample data the email is the eleventh field, which is index 10. Please check out the official Python docs.
>>> import os
>>> import csv
>>> with open('sample._data.csv', newline='') as csvfile:
...     reader = csv.reader(csvfile, delimiter=',', quotechar='"')
...     for row in reader:
...         print(row[10])
...
# output:
carlee.boulter#hotmail.com
tankeny#ankeny.org
joles#gmail.com
alesia_hixenbaugh#hixenbaugh.org
lai#gmail.com
bgillaspie#gillaspie.com
rkampa#kampa.org
flo.bookamer#cox.net
jbiddy#yahoo.com
chauncey_motley#aol.com
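
If the column position is not known in advance, each parsed field can be tested against a pattern instead. The following is a minimal sketch, assuming (as in the sample above) that the email fields use `#` where `@` would normally appear; the helper name `extract_emails` and the in-memory sample are illustrative only. The pattern rejects whitespace so that street addresses such as "5 Washington St #1" are not picked up.

```python
import csv
import io
import re

# Assumption: emails in this data use '#' in place of '@', as in the sample.
EMAIL_RE = re.compile(r'[^#\s]+#[^#\s]+\.[^#\s]+')

def extract_emails(lines):
    """Return every field that looks like an email, in file order."""
    emails = []
    for row in csv.reader(lines):
        for field in row:
            if EMAIL_RE.fullmatch(field):
                emails.append(field)
    return emails

sample = io.StringIO(
    '"Thaddeus","Ankeny","Atc Contracting","5 Washington St #1","Roseville",'
    '"Placer","CA",95678,"916-920-3571","916-459-2433",'
    '"tankeny#ankeny.org","http://www.atccontracting.com"\n'
    '"Lai","Harabedian","Buergi & Madden Scale","1933 Packer Ave #2","Novato",'
    '"Marin","CA",94945,"415-423-3294","415-926-6089",'
    '"lai#gmail.com","http://www.buergimaddenscale.com"\n'
)
emails = extract_emails(sample)
print(emails)
```

Parsing with `csv.reader` first matters here: the regex is applied to one field at a time, so it cannot swallow the rest of the line the way the original line-level pattern did.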

Related

How to get specific column value from .csv Python3?

I have a .csv file with Bitcoin price and market data, and I want to get the 5th and 7th columns from the last row in the file. I have worked out how to get the last row, but I'm not sure how to extract columns (values) 5 and 7 from it. Code:
with open('BTCAUD_data.csv', mode='r') as BTCAUD_data:
    writer = csv.reader(BTCAUD_data, delimiter=',')
    data = list(BTCAUD_data)[-1]
    print(data)
Edit: How would I also add column names, and would adding them help me? (I have already manually put the names into individual columns in the first line of the file itself)
Edit #2: Forget about the column names, they are unimportant. I still don't have a working solution. I have a vague idea that I'm not actually reading the file as a list, but rather as a string. (This means when I subscript the data variable, I get a single digit, rather than an item in a list) Any hints to how I read the line as a list?
Edit #3: I have got everything working to expectations now, thanks for everyone's help :)
Your code never uses the csv-reader. You can do so like this:
import csv
# This creates a file with demo data
with open('BTCAUD_data.csv', 'w') as f:
    f.write(','.join( f"header{u}" for u in range(10))+"\n")
    for l in range(20):
        f.write(','.join( f"line{l}_{c}" for c in range(10))+"\n")
# this reads and processes the demo data
with open('BTCAUD_data.csv', 'r', newline="") as BTCAUD_data:
    reader = csv.reader(BTCAUD_data, delimiter=',')
    # 1st line is header
    header = next(reader)
    # skip through the file, row will be the last line read
    for row in reader:
        pass
print(header)
print(row)
# each row is a list and you can index into it
print(header[4], header[7])
print(row[4], row[7])
Output:
['header0', 'header1', 'header2', 'header3', 'header4', 'header5', 'header6', 'header7', 'header8', 'header9']
['line19_0', 'line19_1', 'line19_2', 'line19_3', 'line19_4', 'line19_5', 'line19_6', 'line19_7', 'line19_8', 'line19_9']
header4 header7
line19_4 line19_7
Better to use pandas for handling CSV files.
import pandas as pd
df = pd.read_csv('filename')
df.column_name then gives the corresponding column. For example, if you read this CSV file into df, df.Year gives you the Year column.
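
Since list positions count from zero, the 5th and 7th columns the question asks about are indices 4 and 6. The sketch below uses only the standard library and keeps just the final row while scanning; `last_row_columns` is a hypothetical helper and the demo data is made up for illustration.

```python
import csv
import io
from collections import deque

def last_row_columns(lines, indices):
    """Return the requested columns from the final row of CSV input."""
    reader = csv.reader(lines)
    # A deque with maxlen=1 keeps only the most recently read row.
    last = deque(reader, maxlen=1)
    if not last:
        raise ValueError("no rows in input")
    row = last[0]
    return [row[i] for i in indices]

demo = io.StringIO(
    "h0,h1,h2,h3,h4,h5,h6\n"
    "a0,a1,a2,a3,a4,a5,a6\n"
    "b0,b1,b2,b3,b4,b5,b6\n"
)
fifth, seventh = last_row_columns(demo, [4, 6])
print(fifth, seventh)
```

With a real file, the same function can be called as `last_row_columns(open('BTCAUD_data.csv', newline=''), [4, 6])`, ideally inside a `with` block.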

How to extract the contents of the mth column of the nth row from a csv file using python

I created a CSV file and was able to add headers for it. I tried using loc to extract the contents but to no avail.
I want to get e as an output or use it in code for something.
The code I've used is as follows:
import pandas as pd
import csv
with open("boo.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(('a','b', 'c'))
df = pd.read_csv("boo.csv", header=None)
df.to_csv("boo.csv", header=["alpha", "beta", "gamma"], index=False)
with open('boo.csv','a') as f:
    writer=csv.writer(f)
    writer.writerow(('c','d','e'))
    writer.writerow(('f','g','h'))
print(df.loc[(df["alpha"]=='c')]["gamma"])
Upon running this code, I'm getting a KeyError for alpha. Please help with this. I'm pretty new to handling CSV files and pandas.
Thank you. :)
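
A likely cause of the KeyError: `df` was created with `header=None` before the header row existed, so its columns are labelled 0, 1, 2, and it never sees the rows appended to the file afterwards. Re-reading the file after the last write would give a frame that has an `alpha` column. The sketch below sidesteps pandas and uses only the `csv` module to fetch a cell by position or by header name; the helper name `cell` and the in-memory data are illustrative assumptions.

```python
import csv
import io

def cell(lines, n, m):
    """Return the value in row n, column m (both zero-based, header excluded)."""
    reader = csv.reader(lines)
    next(reader)              # skip the header row
    rows = list(reader)
    return rows[n][m]

# Same content the question's script ends up writing to boo.csv
data = io.StringIO("alpha,beta,gamma\na,b,c\nc,d,e\nf,g,h\n")
value = cell(data, 1, 2)
print(value)

# Lookup by header name instead: gamma where alpha == 'c'
data.seek(0)
matches = [r["gamma"] for r in csv.DictReader(data) if r["alpha"] == "c"]
print(matches)
```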

Python 3.7 Start reading from a specific point within a csv

Hey I could really use help here. I've tried for 1 hour to find a solution for python but was unable to find it.
I am using Python 3.7
My input is a file provided by a customer - I cannot change it. It is structured in the following way:
It starts with random text not in CSV format and from line 3 on the rest of the file is in csv format.
text line
text line
text line or nothing
Enter
[Start of csv file] "column Namee 1","column Namee 2" .. until 6
"value1","value2" ... until 6 - continuing for many lines.
I wanted to extract the first 3 lines to create a pure CSV file but was unable to find code to only do it for a specific line range. It also seems the wrong solution as I think starting to read from a certain point should be possible.
Then I thought split () is the solution but it did not work for this format. The values are sometimes numbers, dates or strings. You cannot use the seek() method as they start differently.
Right now my dictreader takes the first line as an index and consequently the rest is rendered in chaos.
import csv
import pandas as pd
from prettytable import PrettyTable
with open(r'C:\Users\Hans\Downloads\file.csv') as csvfile:
    csv_reader = csv.DictReader (r'C:\Users\Hans\Downloads\file.csv', delimiter=',')
    for lines in csvfile:
        print (lines)
If some answer for python has been found please link it, I was not able to find it.
Thank you so much for your help. I really appreciate it.
I will insist on the pandas option, given that the documentation clearly states that the skiprows parameter lets you skip the first n lines of the file. I tried it with the example provided by @Chris Doyle (saving it to a file named line_file.csv) and it works as expected.
import pandas as pd
f = pd.read_csv('line_file.csv', skiprows=3)
Output
    name  num symbol
0  chris    4      $
1   adam    7      &
2  david    5      %
If you know the number of lines you want to skip, open the file, read that many lines, and then pass the file handle to DictReader; it will read the remaining lines.
import csv
skip_n_lines = 3
with open('test.dat') as my_file:
    for _ in range(skip_n_lines):
        print("skiping line:", my_file.readline(), end='')
    print("###CSV DATA###")
    csv_reader = csv.DictReader(my_file)
    for row in csv_reader:
        print(row)
FILE
this is junk
this is more junk
last junk
name,num,symbol
chris,4,$
adam,7,&
david,5,%
OUTPUT
skiping line: this is junk
skiping line: this is more junk
skiping line: last junk
###CSV DATA###
OrderedDict([('name', 'chris'), ('num', '4'), ('symbol', '$')])
OrderedDict([('name', 'adam'), ('num', '7'), ('symbol', '&')])
OrderedDict([('name', 'david'), ('num', '5'), ('symbol', '%')])
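
When the number of leading junk lines varies, as the question describes, a fixed skip count is not enough. One option, sketched here under the assumption that the first CSV line can be recognised by a known header name (`name` in the sample file above), is to scan forward to that line and hand the rest to `csv.DictReader`. The helper `read_after_preamble` is hypothetical.

```python
import csv
import io
import itertools

def read_after_preamble(lines, first_header):
    """Skip lines until one starts with first_header, then parse the rest as CSV."""
    it = iter(lines)
    for line in it:
        # Strip an optional opening quote before comparing.
        if line.lstrip('"').startswith(first_header):
            # Put the header line back in front of the remaining lines.
            return list(csv.DictReader(itertools.chain([line], it)))
    return []

text = io.StringIO(
    "this is junk\n"
    "this is more junk\n"
    "\n"
    "name,num,symbol\n"
    "chris,4,$\n"
    "adam,7,&\n"
)
rows = read_after_preamble(text, "name")
for row in rows:
    print(dict(row))
```

This would misfire if a junk line happened to begin with the same text as the header, so the marker should be chosen to be as specific as the data allows.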

How to delete a previous row while reading a row of a csv file by python3

I want to clean up 12,000 wiki pages from this wiki category. For that, I have all 12,000 wiki pages listed in a CSV file. When my code runs, it modifies the pages one by one. How can I delete the previous row while reading the next row of a CSV file in Python 3? If that is possible, it will be easy to share the remaining rows of the CSV file with another wiki contributor. Otherwise, I have to open the CSV file manually and delete the completed rows.
My code, simplified:
import csv
import pywikibot
with open('0.csv', 'r') as csvfile:
    reader = csv.reader(csvfile,delimiter="~")
    for row in reader:
        #if len(row) == 8:
        wikiPage1 = row[0]
        indexPages = row[5]
        print (wikiPage1)
        site = pywikibot.Site('ta', 'wiktionary')
        page1 = pywikibot.Page(site, wikiPage1)
        page1.text = page1.text.replace('Number','எண்')
        page1.save(summary='Number --> எண்')
I learned from two pages on this site (page-1, page-2).
The code below achieves this:
#-*- coding: utf-8 -*-
#bringing the needed library modules
import csv, time, subprocess
#subprocess.call("sed -i `` 1d 0-123.csv",shell=True)
WAIT_TIME = 10
with open('0-123.csv', 'r') as csvfile:
    reader = csv.reader(csvfile,delimiter="~")
    for row in reader:
        #removing the first line of the csv
        subprocess.call("sed -i `` 1d 0-123.csv",shell=True)
        wiktHeader1 = row[0]#.decode('utf-8')
        print ()
        print (wiktHeader1 + ' = படி-1: விக்சனரியின் தலைப்புச்சொல்.')
        time.sleep(WAIT_TIME)
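
Calling `sed` once per row ties the script to a Unix shell. A portable alternative, sketched below with hypothetical names (`process_and_trim`, `handle`), is to load the rows, process them one at a time, and rewrite the file with only the unprocessed rows after each step, so that an interrupted run leaves exactly the remaining work on disk. It assumes the whole file fits in memory.

```python
import csv
import os
import tempfile

def process_and_trim(path, handle, delimiter="~"):
    """Process rows in order; after each one, rewrite path without it."""
    with open(path, newline="", encoding="utf-8") as f:
        pending = list(csv.reader(f, delimiter=delimiter))
    while pending:
        handle(pending[0])          # e.g. edit the wiki page here
        pending = pending[1:]
        # Write to a temporary file first, then swap it into place.
        tmp = path + ".tmp"
        with open(tmp, "w", newline="", encoding="utf-8") as f:
            csv.writer(f, delimiter=delimiter).writerows(pending)
        os.replace(tmp, path)

# Demo with a throwaway file; the third row fails to simulate an interruption.
demo_path = os.path.join(tempfile.mkdtemp(), "pages.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    f.write("page1~x\npage2~y\npage3~z\n")

seen = []
def handle(row):
    if row[0] == "page3":
        raise RuntimeError("simulated failure")
    seen.append(row[0])

try:
    process_and_trim(demo_path, handle, delimiter="~")
except RuntimeError:
    pass

with open(demo_path, newline="", encoding="utf-8") as f:
    remaining = f.read()
print(seen, repr(remaining))
```

Rewriting the file after every row costs more I/O than a single pass, but with a pause between page edits that cost is unlikely to matter, and a row is only removed once its handler has returned without error.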

Extract numbers and text from csv file with Python3.X

I am trying to extract data from a csv file with python 3.6.
The data contain both numbers and text (URLs):
file_name = [-0.47, 39.63, http://example.com]
On multiple forums I found this kind of code:
data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines,)
But this works for numbers only, the url addresses are read as NaN.
If I add dtype:
data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines, dtype=None)
The URLs are read correctly, but they have a "b" at the beginning of the address, such as:
b'http://example.com'
How can I remove that? How can I just have the simple string of text?
I also found this option:
file = open(file_path, "r")
csvReader = csv.reader(file)
for row in csvReader:
    variable = row[i]
    coordList.append(variable)
but it seems it has some issues with python3.
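
The `b` prefix means the value is a `bytes` object rather than a `str`; calling `.decode()` on it yields plain text. The `csv` module also works in Python 3 as long as the file is opened in text mode. A minimal sketch, assuming two numeric columns followed by a URL as in the example row (the variable names and sample data are illustrative):

```python
import csv
import io

# bytes -> str: decoding is what drops the b'' prefix
raw = b'http://example.com'
url_text = raw.decode('utf-8')
print(url_text)

# Reading mixed numeric and text columns with the csv module
sample = io.StringIO(
    "-0.47,39.63,http://example.com\n"
    "1.5,2.25,http://example.org\n"
)
coords = []
urls = []
for row in csv.reader(sample):
    coords.append((float(row[0]), float(row[1])))  # numeric columns
    urls.append(row[2])                            # text column stays str
print(coords, urls)
```

With a real file, `io.StringIO(...)` would be replaced by `open(file_path, newline='')`.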
