I'm having some trouble processing some input.
I am reading data from a log file and storing the different values according to their field names.
Each input line consists of an IP, a name, a time, and a data value.
A log line looks like this, with \t (tab) spacing:
134.51.239.54 Steven 2015-01-01 06:09:01 5423
I'm reading in the values using this code:
loglines = file.splitlines()
data_fields = loglines[0]  # IP NAME DATE DATA
for logline in loglines[1:]:
    items = logline.split("\t")
    ip = items[0]
    name = items[1]
    date = items[2]
    data = items[3]
This works quite well, but I need to extract all the names into a list, and I haven't found a working solution.
When I use print name I get:
Steven
Max
Paul
What I need is a list of the names, like this:
['Steven', 'Max', 'Paul',...]
There is probably a simple solution that I haven't figured out yet; can anybody help?
Thanks
Just create an empty list and add the names as you loop through the file.
Also note that if the file is very large, file.splitlines() is probably not the best idea, as it reads the entire file into memory -- and then you basically copy all of that by doing loglines[1:]. Better to use the file object itself as an iterator. And don't use file as a variable name, as it shadows the built-in type.
with open("some_file.log") as the_file:
data_fields = next(the_file) # consumes first line
all_the_names = [] # this will hold the names
for line in the_file: # loops over the rest
items = line.split("\t")
ip, name, date, data = items # you can put all this in one line
all_the_names.append(name) # add the name to the list of names
Alternatively, you could use zip and map to put it all into one expression (using that loglines data), but you probably shouldn't do that... zip(*map(lambda s: s.split('\t'), loglines[1:]))[1] (note that in Python 3 you would have to wrap the zip in list() before indexing it).
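A plainer one-liner is a list comprehension over that same loglines data (just a sketch, assuming the tab-separated layout shown above):

# the name is the second tab-separated field on every data line
all_the_names = [line.split("\t")[1] for line in loglines[1:]]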
This is my first post, so if I miss something, let me know.
I'm doing a CS50 beginner python course, and I'm stuck with a problem.
Long story short, the problem is to open a CSV file that looks like this:
name,house
"Abbott, Hannah",Hufflepuff
"Bell, Katie",Gryffindor
.....
So I want to put it into a dictionary (which I did), but the problem now is that I'm supposed to split the "name" key in two.
Here is my code, but it doesn't work:
before = []
....
with open(sys.argv[1]) as file:
    reader = csv.reader(file)
    for name, house in reader:
        before.append({"name": name, "house": house})

# here I would love to split the key "name" into "last", "first"
for row in before[1:]:
    last, first = name.split(", ")
Any advice?
Thank you in advance.
After you have the dictionary with the complete name, you can split the name as below:
before = [{"name": "Abbott, Hannah", "house": "Hufflepuff"}]
# Before split
print(before)
for item in before:
# Go through each item in before dict and split the name
last, first = item["name"].split(', ')
# Add new keys for last and first name
item["last"] = last
item["first"] = first
# Remove the full name entry
item.pop("name")
# After split
print(before)
You can also do the split on the first pass, i.e. store the last and first names directly instead of the full name, as in the sketch below.
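A minimal sketch of that first-pass approach, assuming the same CSV layout and using csv.DictReader so the header row is handled automatically (variable names are illustrative):

import csv
import sys

students = []
with open(sys.argv[1]) as file:
    reader = csv.DictReader(file)  # uses the header row ("name,house") for the keys
    for row in reader:
        last, first = row["name"].split(", ")
        students.append({"first": first, "last": last, "house": row["house"]})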
I am reading in a csv file and then trying to separate the header from the rest of the file.
The hn variable is the read-in file without the first line.
hn_header is supposed to be the first row in the dataset.
If I define just one of these two variables, the code works. If I define both of them, then the one written later does not contain any data. How is that possible?
from csv import reader
opened_file = open("hacker_news.csv")
read_file = reader(opened_file)
hn = list(read_file)[1:] #this should contain all rows except the header
hn_header = list(read_file)[0] # this should be the header
print(hn[:5]) #works
print(len(hn_header)) #empty list, does not contain the header
The CSV reader can only iterate through the file once, which it does the first time you convert it to a list. To avoid needing to iterate through multiple times, you can save the list to a variable.
hn_list = list(read_file)
hn = hn_list[1:]
hn_header = hn_list[0]
Or you can split up the file using extended iterable unpacking:
hn_header, *hn = list(read_file)
Just change the line below in your code; no additional steps are needed: read_file = list(reader(opened_file)). Your code should then run as expected.
The reader object is an iterator, and by definition iterator objects can only be used once. When they're done iterating you don't get any more out of them.
You can read more in the question Why can I only use a reader object once?; the block quote above is taken from that question.
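A quick sketch of that exhaustion behaviour with a plain Python iterator (no CSV involved):

rows = iter(["header", "row1", "row2"])
print(list(rows))  # ['header', 'row1', 'row2'] -- this consumes the iterator
print(list(rows))  # [] -- nothing is left on the second pass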
I'm having trouble figuring this out. Basically I have a .csv file that has 7 employees with their first and last names, employee ID, dept #, and job title. My goal is for def readFile(employees) to accept an empty List (called employees), open the file for reading, and load all the employees from the file into a List of employee objects (employees). I already have my class built as:
class Employee:
    def __init__(self, fname, lname, eid, dept, title):
        self.__firstName = fname
        self.__lastName = lname
        self.__employeeID = int(eid)
        self.__department = int(dept)
        self.__title = title
I have a couple other class methods, but basically I don't quite understand how to properly load the file into a list of objects.
I was able to figure this out. I opened the file and then read a line from it, stripping the \n and splitting my data. I used a while loop to keep reading lines, as long as it wasn't an empty line, and appended it to my empty list. I also had to split the first indexed item as it was first and last name together in the same string and I needed them separate.
def readFile(employees):
    with open("employees.csv", "r") as f:
        line = f.readline().strip().split(",")
        while line != ['']:
            line = line[0].split(" ") + line[1:]
            employees.append(Employee(line[0], line[1], line[2], line[3], line[4]))
            line = f.readline().strip().split(",")
It could most likely be written better and more pythonically, but it does what I need it to do.
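For reference, a slightly more idiomatic sketch using the csv module, assuming the same employees.csv layout (first and last name together in the first column, then ID, department and title):

import csv

def readFile(employees):
    with open("employees.csv", newline="") as f:
        for row in csv.reader(f):
            if not row:  # skip blank lines
                continue
            first, last = row[0].split(" ", 1)
            employees.append(Employee(first, last, row[1], row[2], row[3]))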
Why not use pandas? You could load the employees into a pandas DataFrame and use the index to select each employee and the column names to select a specific employee attribute.
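A minimal sketch of that pandas idea, assuming the file has no header row and the columns are in the order described (the column names here are made up):

import pandas as pd

df = pd.read_csv("employees.csv", header=None,
                 names=["name", "employee_id", "dept", "title"])
print(df.loc[0])           # first employee, all attributes
print(df.loc[0, "title"])  # first employee's job title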
I have a list with product orders and I need to pick out the number that follows order_id:TheNumberIwant. I have tried some things, but none do the trick. I can't access the number by position because its location in the list can change, but it always comes after order_id:. I have tried using the split method, but it only picks up one of the order_id values and I need to pick up all of them.
Here is what I'm doing. I have this string:
{"data":[{"order_id":744152,"pedido_venda":"Z921211","supplier_id":11042,.....
with open("items.txt","r") as file:
data = file.readlines()
for line in data:
word = line.split("order_id:" )
abre_arquivo1 = open("items2.txt","w")
abre_arquivo1.write("%s\n" % word)
abre_arquivo1.close()
This removes the order_id, but I want to save the number that comes after it in the string to "items2.txt".
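One hedged sketch of a way to do this: since the quoted string looks like JSON, a regular expression over each line can capture every number that follows "order_id": (file names taken from the question; adjust the pattern if the real data differs):

import re

with open("items.txt", "r") as infile, open("items2.txt", "w") as outfile:
    for line in infile:
        # capture every run of digits that follows "order_id":
        for order_id in re.findall(r'"order_id":(\d+)', line):
            outfile.write(order_id + "\n")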
I am having a little trouble finding an efficient way to compare two files in order to create a third file.
I'm using Python 3.6
The first file is a list of IP addresses that I want to delete. The second file contains all of the DNS records associated with that IP address targeted for deletion.
If I find a DNS record in the second file that matches one of those IPs, I want to add the entire line to a third file.
This is sample of file 1:
IP
10.10.10.234
10.34.76.4
This is sample of file 2:
DNS Record Type,DNS Record,DNS Response,View
PTR,10.10.10.234,testing.example.com,internal
A,testing.example.com,10.10.10.234,internal
A,dns.google.com,8.8.8.8,external
This is what I'm trying. It is accurate, but it is taking forever: there are ~2 million lines in file 2 and 150K lines in file 1.
def create_final_stale_ip_file():
    PD = set()
    with open(stale_file) as f1:
        reader1 = csv.DictReader(f1)
        for row1 in reader1:
            with open(prod_dns) as f2:
                reader2 = csv.DictReader(f2)
                for row2 in reader2:
                    if row2['DNS Record Type'] == 'A':
                        if row1['IP'] == row2['DNS Response']:
                            PD.update([row2['View']+'del,'+row2['DNS Record Type']+','+row2['DNS Record']+','+row2['DNS Response']])
                    if row2['DNS Record Type'] == 'PTR':
                        if row1['IP'] == row2['DNS Record']:
                            PD.update([row2['View']+'del,'+row2['DNS Record Type']+','+row2['DNS Response']+','+row2['DNS Record']])
    o1 = open(delete_file, 'a')
    for i in PD:
        o1.write(i + '\n')
    o1.close()
Thanks in advance!
You should read the whole IP file into a set first, and then check whether the IPs in the second file are found in that set; membership checks on a set are very fast:
def create_final_stale_ip_file():
    PD = set()
    # It's much prettier and easier to manage the strings in one place
    # and without using the + operator. Read about `str.format()`
    # to understand how these work. They will be used later in the code
    A_string = '{View}del,{DNS Record Type},{DNS Record},{DNS Response}'
    PTR_string = '{View}del,{DNS Record Type},{DNS Response},{DNS Record}'
    # We can open and create readers for both files at once
    with open(stale_file) as f1, open(prod_dns) as f2:
        reader1, reader2 = csv.DictReader(f1), csv.DictReader(f2)
        # Read all IPs into a python set, they're fast!
        ips = {row['IP'] for row in reader1}
        # Now go through every line and simply check if the IP
        # exists in the `ips` set we created above
        for row in reader2:
            if (row['DNS Record Type'] == 'A'
                    and row['DNS Response'] in ips):
                PD.add(A_string.format(**row))
            elif (row['DNS Record Type'] == 'PTR'
                    and row['DNS Record'] in ips):
                PD.add(PTR_string.format(**row))
    # Finally, write all the lines to the file using `writelines()`,
    # adding the newlines ourselves.
    # Also, it's always better to use `with open()`
    with open(delete_file, 'a') as f:
        f.writelines(line + '\n' for line in PD)
As you see, I also changed some minor stuff, like:
write to a file using writelines()
open the last file using with open() for safety
we're only adding one element to our set, so use PD.add() instead of PD.update()
use Python's awesome str.format() to create much cleaner string formatting
Last but not least, I would actually split this into multiple functions: one for reading the files, one for going through the parsed rows, etc., with each function taking proper arguments instead of relying on global variable names like stale_file and prod_dns as you seem to be doing. But that's up to you; a rough sketch follows below.
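A rough sketch of that split, with illustrative function and parameter names (not tested against your data):

import csv

def load_stale_ips(stale_path):
    # Return the set of IPs slated for deletion
    with open(stale_path) as f:
        return {row['IP'] for row in csv.DictReader(f)}

def find_stale_records(prod_dns_path, ips):
    # Yield one formatted deletion line per matching DNS record
    a_fmt = '{View}del,{DNS Record Type},{DNS Record},{DNS Response}'
    ptr_fmt = '{View}del,{DNS Record Type},{DNS Response},{DNS Record}'
    with open(prod_dns_path) as f:
        for row in csv.DictReader(f):
            if row['DNS Record Type'] == 'A' and row['DNS Response'] in ips:
                yield a_fmt.format(**row)
            elif row['DNS Record Type'] == 'PTR' and row['DNS Record'] in ips:
                yield ptr_fmt.format(**row)

def create_final_stale_ip_file(stale_path, prod_dns_path, delete_path):
    lines = set(find_stale_records(prod_dns_path, load_stale_ips(stale_path)))
    with open(delete_path, 'a') as f:
        f.writelines(line + '\n' for line in lines)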
You can do it using grep very easily:
grep -Ff file1 file2
This will print the lines of file2 that contain any of the lines of file1, matched as fixed strings. From there it should be much easier to manipulate the text into the final form you need.