Python: iterate over a specific column in a CSV and replace values - python-3.x

First, sorry for my English ;)
I have a problem with a CSV file. The file contains many columns with many different features. I want to iterate over the column host_location and look at the entry in each row. Every string that contains "London" or "london" should be turned into a binary value: 1 if the string contains "London" or "london", 0 if not.
I'm familiar with Java, but Python is new to me.
What I know so far about this problem:
I can't change the CSV file in place; I have to read it, change the values, and write them to a new file.
My method so far:
listings = io.read_csv('../data/playground/neu.csv')
def Change_into_Binaryy():
    listings.loc[listings["host_location"] == ("London" or "london"), "host_location"] = 1
    listings.to_csv("../data/playground/neu1.csv", index=False)
The code is from another Stack Overflow question, and I'm not really familiar with Python yet. The problem is that I can only use the equality operator and not something like contains in Java.
As a result, only the entries that are exactly "London" or "london" are changed to 1. But there are also entries like "London, Uk" that I want to change.
In addition, I don't know how to change the remaining entries to 0, because I don't know how to combine .loc with something like an if/else construct.
I also tried another solution:
def Change_into_Binary():
    for x in listings['host_location']:
        if "London" or "london" in x:
            x = 1
        else:
            x = 0
    listings.to_csv("../data/playground/neu1.csv", index=False)
But this does not work either: the entries are not changed.
Thanks for your answers.

from csv import DictReader, DictWriter

# newline='' lets the csv module handle line endings itself
with open('infile.csv', 'r', newline='') as infile, open('outfile.csv', 'w', newline='') as outfile:
    reader = DictReader(infile)
    writer = DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Substring test on the lower-cased text, so "London, Uk" also matches
        if 'london' in row['host_location'].lower():
            row['host_location'] = 1
        else:
            row['host_location'] = 0
        writer.writerow(row)
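On why the loop in the question leaves the data unchanged: `"London" or "london" in x` is parsed as `"London" or ("london" in x)`, which is always truthy, and assigning to the loop variable `x` only rebinds a local name. Here is a minimal sketch of a substring test that avoids both problems. The helper name `is_london` and the sample values are made up for illustration.

```python
def is_london(location):
    """Return 1 if the text mentions London in any casing, else 0."""
    # Lower-casing first lets one substring test cover "London" and "london".
    return 1 if "london" in location.lower() else 0


# Hypothetical sample values standing in for the host_location column.
locations = ["London", "london, UK", "Paris, France", "Greater London", ""]

# Build a new list; assigning to the loop variable would not modify the data.
flags = [is_london(loc) for loc in locations]
print(flags)
```

With pandas, the same idea could probably be written as `listings["host_location"].str.contains("london", case=False, na=False).astype(int)`, assuming the column holds strings.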

Related

python 3 - how to split a key in a dictionary in 2

This is my first post, so if I miss something, let me know.
I'm doing a CS50 beginner python course, and I'm stuck with a problem.
Long story short, the problem is to open a csv file, and it looks like this:
name,house
"Abbott, Hannah",Hufflepuff
"Bell, Katie",Gryffindor
.....
I would like to put this into a dictionary (which I did), but the problem now is that I am supposed to split the "name" key in two.
Here is my code, but it doesn't work:
before = []
....
with open(sys.argv[1]) as file:
    reader = csv.reader(file)
    for name, house in reader:
        before.append({"name": name, "house": house})
# here i would love to split the key "name" in "last", "first"
for row in before[1:]:
    last, first = name.split(", ")
Any advice?
Thank you in advance.
Once you have the dictionaries with the complete name, you can split the name as below:
before = [{"name": "Abbott, Hannah", "house": "Hufflepuff"}]
# Before split
print(before)
for item in before:
    # Go through each dict in the before list and split the name
    last, first = item["name"].split(', ')
    # Add new keys for last and first name
    item["last"] = last
    item["first"] = first
    # Remove the full name entry
    item.pop("name")
# After split
print(before)
You can also do the split in the first pass, i.e. store the last and first name directly instead of the full name.
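A sketch of that first-pass variant, assuming the file layout shown in the question. The in-memory `sample` buffer stands in for the file opened from `sys.argv[1]`, and the list name `students` is made up for illustration.

```python
import csv
import io

# Sample data standing in for the CSV file from the question.
sample = io.StringIO(
    'name,house\n"Abbott, Hannah",Hufflepuff\n"Bell, Katie",Gryffindor\n'
)

students = []
# DictReader consumes the header row itself, so no [1:] slicing is needed.
for row in csv.DictReader(sample):
    # maxsplit=1 keeps any further ", " inside the first-name part.
    last, first = row["name"].split(", ", 1)
    students.append({"first": first, "last": last, "house": row["house"]})

print(students)
```

Using `row["name"]` rather than a leftover loop variable is also what fixes the loop in the question, where `name` still held the value from the last row read.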

How to drop some parts of a text in a column

I know that this question must have been addressed already, but I can't seem to find the answer.
I have a column in my dataframe and I want to drop part of each string, starting from a specified character. The string is 'WD-2020-04-115R:WD-2020-03-111'. I want everything gone starting from R, so that I am left with WD-2020-04-115. Any string in my column without an R in it should be kept as it is.
Try:
data_array = ['WD-2020-04-115R:WD-2020-03-111', 'WD-2020-05-10582', 'WD-2020-05-10575', 'WD-2020-05-10576','WD-2020-05-10574', 'WD-2020-05-10571R:WD-2020-03-10563', 'WD-2020-05-10577', 'WD-2020-04-10571R:WD-2020-03-10562']
for data in data_array:
    t = data.find('R')
    if t < 0:
        dropped = data
    else:
        dropped = data[:t]
    print(dropped)
    # You can either print, append to an array or write to a file
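The same find-and-slice logic can be written more compactly with `str.partition`, which returns the whole string as its first element when the separator is absent. This is only a sketch; the helper name `drop_from_r` is made up for illustration.

```python
def drop_from_r(text):
    """Return the part of text before the first 'R', or all of it if there is none."""
    # partition returns (head, separator, tail); head is the full string
    # when 'R' does not occur.
    return text.partition("R")[0]


samples = ["WD-2020-04-115R:WD-2020-03-111", "WD-2020-05-10582"]
cleaned = [drop_from_r(s) for s in samples]
print(cleaned)
```

For a pandas column, something like `df["col"].str.split("R").str[0]` should give the same result, where `df` and `"col"` are placeholder names for the dataframe and column in the question.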

How to assign number to each value in python

I am comparatively new to python and data science and I was working with a CSV file which looks something like:
value1, value2
value3
value4...
Thing is, I want to assign a unique number to each of these values in the csv file such that the unique number acts as the key and the item in the CSV acts as the value like in a dictionary.
I tried using pandas but if possible, I wanted to know how I can solve this without using any libraries.
The desired output should be something like this:
{
"value1": 1,
"value2": 2,
"value3": 3,
.
.
.
and so on..
}
Was just about to talk about pandas before I saw that you wanted to do it in vanilla Python. I'd do it with pandas personally, but here you go:
You can read in lines from a file, split them by delimiter (','), and then get your word tokens.
master_dict = {}
counter = 1
with open("your_csv.csv", "r") as f:
    for line in f:
        # strip() drops the trailing newline before splitting on the delimiter
        words = line.strip().split(',')
        for word in words:
            # word.strip() removes the space that follows a comma in the sample data
            master_dict[counter] = word.strip()
            counter += 1
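The desired output in the question maps each value to its number rather than the other way round. A sketch of that direction follows, using only the standard library `csv` module (an assumption that "no libraries" means no third-party packages). The function name `number_values` and the in-memory sample are made up for illustration.

```python
import csv
import io


def number_values(lines):
    """Map each distinct value to a number, in order of first appearance."""
    numbering = {}
    # skipinitialspace=True ignores the blank that follows each comma.
    for row in csv.reader(lines, skipinitialspace=True):
        for value in row:
            value = value.strip()
            # Skip empty cells and values that already have a number.
            if value and value not in numbering:
                numbering[value] = len(numbering) + 1
    return numbering


sample = io.StringIO("value1, value2\nvalue3\nvalue4\n")
print(number_values(sample))
```

Because repeated values are skipped, every key keeps the number it was given the first time it appeared.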

Using regex to find and delete data

Need to search through data and delete customer Social Security Numbers.
with open('customerdata.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        data.append(row)
for row in customerdata.csv:
    results = re.search(r'\d{3}-\d{2}-\d{4}', row)
    re.replace(results, "", row)
    print(results)
New to scripting and not sure what it is I need to do to fix this.
This is not a job for a regex.
You are using a csv.DictReader, which is awesome. This means you have access to the column names in your csv file. What you should do is make a note of the column that contains the SSN, then write out the row without it. Something like this (not tested):
import csv

with open('customerdata.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # Remove the SSN column from this row before printing it
        del row['SSN']
        print(row)
If you need to keep the data but blank it out, then something like:
with open('customerdata.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        row['SSN'] = ''
        print(row)
Hopefully you can take things from here; for example, rather than printing, you might want to use a csv dict writer. Depends on your use case. Though, do stick with csv operations and definitely avoid regexes here. Your data is in csv format. Think about the data as rows and columns, not as individual strings to be regexed upon. :)
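The suggestion to use a csv dict writer instead of printing might look like the sketch below. The column name `SSN` is the same assumption made in the answer above. The function name `drop_column` and the in-memory buffers are made up for illustration and stand in for real files.

```python
import csv
import io


def drop_column(infile, outfile, column):
    """Copy CSV data from infile to outfile, leaving out one column."""
    reader = csv.DictReader(infile)
    kept = [name for name in reader.fieldnames if name != column]
    # extrasaction="ignore" makes the writer skip keys not listed in kept.
    writer = csv.DictWriter(outfile, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


source = io.StringIO("name,SSN,city\nAnn,123-45-6789,Leeds\n")
target = io.StringIO()
drop_column(source, target, "SSN")
print(target.getvalue())
```

With real files, both would be opened with `newline=''`, as the csv module documentation recommends.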
I'm not seeing a replace function for re in the Python 3.6.5 docs.
I believe the function you would want to use is re.sub:
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged.
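This means that all you need in your second for loop is the following, iterating over the file's lines as plain strings:
with open('customerdata.csv') as csvfile:
    for row in csvfile:
        # re.sub takes (pattern, replacement, string), in that order
        results = re.sub(r'\d{3}-\d{2}-\d{4}', '', row)
        print(results, end='')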
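To make the argument order of `re.sub` concrete, here is a small self-contained sketch on made-up sample lines. The names `SSN_PATTERN` and `redact_ssn` are hypothetical, and the added `\b` word boundaries are an extra assumption to avoid matching inside longer digit runs.

```python
import re

# Three digits, two digits, four digits, separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssn(text):
    """Return text with every SSN-shaped substring removed."""
    # A compiled pattern's sub takes (replacement, string).
    return SSN_PATTERN.sub("", text)


lines = ["Ann,123-45-6789,Leeds", "Bob,no number,York"]
redacted = [redact_ssn(line) for line in lines]
print(redacted)
```

Lines without a match come back unchanged, as the quoted documentation states.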

How do i get my code to append to the end of a specific csv row

Here is my code:
import csv
with open("Grades.txt", "r") as file:
    reader = csv.reader(file)
    for row in reader:
        if name == row[0]:
            with open("Grades.txt", "a") as file:
                writer = csv.writer(file)
                writer.writerow(grade)
The variables name and grade have already been defined in an earlier function. I have a text file with a list of names, so the code checks whether the name (John) is in the text file and is then supposed to write the grade (A) next to the name, separated by a comma. The problem is that my code writes the grade a line or two below the entire list of names. Even if I could get it to write at the end of the name, it would just be shown like (JohnA) with no separation. I'm clueless about how to fix this. I would appreciate it if you could correct my code to do what I need. The variable name is an input from a login in a different function, so the input is different every time. Also, new names may be added through my sign-up function, so the similar question doesn't help.
for example say my text file looked like this:
John
Sam
Bob
And the grade Sam got was an A. How would I append the A grade to the end of Sam's name, with a comma separating the name and the grade?
I don't see how this code example should do the job you describe. Sorry.
import csv
students = [["Anne", "A"], ["Emily", "B"]]
with open("grades.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in students:
        writer.writerow(row)
You must pass a tuple or a list as a row to csv.writer. What you describe sounds as if you write to that file twice, but I don't see how the code you posted does that.
I hope this helps you a little. Sorry, at the moment I can't comment...
New:
What I mean is that you should put the names and grades together in your main program and then write them to the file. This is how I would solve your task.
names = ["John", "Sam", "Bob"]
grades = ["A", "B", "C"]
names_grades = zip(names, grades)
for row in names_grades:
    print(row)
The new row can be written easily to your file.
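Since append mode can only add text at the end of a file, one way to attach a grade to an existing name is to read all rows, change the matching one, and write everything back. This is a sketch under that assumption. The helper name `add_grade` is made up, and the in-memory buffers stand in for Grades.txt.

```python
import csv
import io


def add_grade(rows, name, grade):
    """Return a copy of rows with grade appended to the row whose first field is name."""
    updated = []
    for row in rows:
        # Guard against blank lines, which csv.reader yields as empty lists.
        if row and row[0] == name:
            row = row + [grade]
        updated.append(row)
    return updated


original = io.StringIO("John\nSam\nBob\n")
rows = add_grade(list(csv.reader(original)), "Sam", "A")

# Write the full, updated list back out; with a real file, reopen it in "w" mode.
result = io.StringIO()
csv.writer(result).writerows(rows)
print(result.getvalue())
```

Because `csv.writer` receives a list per row, the name and grade come out separated by a comma rather than run together.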
