I am trying to extract data from a CSV file with Python 3.6.
The data contain both numbers and text (URL addresses):
file_name = [-0.47, 39.63, http://example.com]
On multiple forums I found this kind of code:
data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines)
But this works for numbers only; the URL addresses are read as NaN.
If I add dtype:
data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines, dtype=None)
The URL addresses are read correctly, but they get a "b" prefix at the beginning of the address, such as:
b'http://example.com'
How can I remove that? How can I just have the simple string of text?
I also found this option:
file = open(file_path, "r")
csvReader = csv.reader(file)
for row in csvReader:
    variable = row[i]
    coordList.append(variable)
but it seems to have some issues with Python 3.
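For what it's worth, the "b" prefix marks a bytes object rather than a str, and bytes can be turned into text with .decode(). Here is a minimal sketch, assuming the file has three comma-separated columns (two numbers and a URL) and is UTF-8 encoded. The sample data and the names lon, lat and url are made up for illustration; the sketch uses the standard csv module instead of numpy.

```python
import csv
import io

# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "-0.47,39.63,http://example.com\n"
    "1.25,40.10,http://example.org\n"
)

rows = []
for lon, lat, url in csv.reader(sample):
    # In Python 3, csv.reader yields str fields, so numbers are converted explicitly.
    rows.append((float(lon), float(lat), url))

# A bytes value like the one in the question can be decoded to a plain str.
raw = b'http://example.com'
text = raw.decode('utf-8')

print(rows[0])
print(text)
```

With a real file, the io.StringIO object would be replaced by open(file_path, "r", newline="").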
I am trying to write a simple program that should give the following output when it reads a CSV file containing several email ids.
email_id = ['emailid1#xyz.com','emailid2#xyz.com','emailid3#xyz.com'] #required format
but the problem is that the output I get looks like this:
[['emailid1#xyz.com']]
[['emailid1#xyz.com'], ['emailid2#xyz.com']]
[['emailid1#xyz.com'], ['emailid2#xyz.com'], ['emailid3#xyz.com']] #getting this wrong format
Here is the code I have written. Kindly suggest a correction that would give me the required format. Thanks in advance.
import csv
email_id = []
with open('contacts1.csv', 'r') as file:
    reader = csv.reader(file, delimiter = ',')
    for row in reader:
        email_id.append(row)
        print(email_id)
NB: my CSV contains only one column, which holds the email ids, and has no header. I also tried email_id.extend(row), but that did not work either.
You need to move your print outside the loop:
with open('contacts1.csv', 'r') as file:
    reader = csv.reader(file, delimiter = ',')
    for row in reader:
        email_id.append(row)
print(sum(email_id, []))
The loop can also be like this (if you only need one column from the csv):
for row in reader:
    email_id.append(row[0])
print(email_id)
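The same idea can be tried without a file on disk. This is only a sketch: the io.StringIO object is a made-up stand-in for contacts1.csv, assuming one column and no header as described in the question.

```python
import csv
import io

# Hypothetical in-memory stand-in for contacts1.csv (one column, no header).
sample = io.StringIO("emailid1#xyz.com\nemailid2#xyz.com\nemailid3#xyz.com\n")

# Each row is a list of fields; row[0] picks the single column.
# The "if row" guard skips blank lines, which csv.reader returns as [].
email_id = [row[0] for row in csv.reader(sample) if row]

# Printing once, after the list is complete, gives the flat format.
print(email_id)
```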
I'm able to import a text file as a string, and I also understand read_csv.
with open('text.txt', 'r') as file:
    text = file.read().replace('\n', '')
My question: if I have a data frame with many records, each with the location of a text file, how can I bulk import the text as strings into a new column?
Example data frame:
Filename,Text Path
File1,C:\Text\File1.txt
File2,C:\Text\File2.txt
File3,C:\Text\File3.txt
Example Result:
Filename,Text Path,Text
File1,C:\Text\File1.txt,This is some text.
File2,C:\Text\File2.txt,Other kinds of text.
File3,C:\Text\File3.txt,Even more text.
I'm not aware of any library that can do this directly. I think you need to step through each row of the dataframe and add the text to a new column. Assuming you are using pandas and your example dataframe is "df":
for i in range(len(df['Text Path'])):
    with open(df.loc[i,'Text Path'], 'r') as file:
        df.loc[i,'Text'] = file.read()
EDIT:
this could be a bit faster (apply a function to generate the new column):
def readtxt(f):
    with open(f, 'r') as file:
        return file.read()
df['Text'] = df['Text Path'].apply(readtxt)
I have a csv file which contains data in two columns, as follows:
40500 38921
43782 32768
55136 49651
63451 60669
50550 36700
61651 34321
and so on...
I want to convert each value into its hex equivalent, then concatenate the two values in each row, and write the results into a column in another csv file.
For example: hex(40500) = 9E34, and hex(38921) = 9809.
So, in output csv file, element A1 would be 9E349809
So, I am expecting column A in the output csv file to be:
9E349809
AB068000
D760C1F3
F7DBECFD
C5768F5C
F0D38611
I found some sample code that concatenates two columns, but I am struggling with converting the values to hex before concatenating them. Here is the code:
import csv
inputFile = 'input.csv'
outputFile = 'output.csv'
with open(inputFile) as f:
    reader = csv.reader(f)
    with open(outputFile, 'w') as g:
        writer = csv.writer(g)
        for row in reader:
            new_row = [''.join([row[0], row[1]])] + row[2:]
            writer.writerow(new_row)
How can I convert the data in each column to its hex equivalent, then concatenate the values and write them to another file?
You could do this in 4 steps:
1. Read the lines from the input csv file
2. Use formatting options to get the hex values of each number
3. Perform string concatenation to get your result
4. Write to new csv file.
Sample Code:
with open(outputFile, 'w') as outfile:
    with open(inputFile, 'r') as infile:
        for line in infile:  # iterate through each line
            left, right = int(line.split()[0]), int(line.split()[1])  # split left and right blocks
            newstr = '{:x}'.format(left) + '{:x}'.format(right)  # hex values without the '0x' prefix
            outfile.write(newstr + '\n')  # write one result per line
        print('Conversion completed')
    print('Closing outputfile')
Sample Output:
In[44] line = '40500 38921'
Out[50]: '9e349809'
ParvBanks' solution is good (clear and functional); I would simplify it a little, like this:
with open(inputFile, 'r') as infile, open(outputFile, 'w') as outfile:
    for line in infile:
        outfile.write("".join("{:x}".format(int(v)) for v in line.split()) + "\n")
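One more point: the expected output in the question is uppercase, and values below 4096 would produce fewer than four hex digits. The sketch below handles both. It assumes whitespace-separated integers and a fixed width of four hex digits per value; the function name rows_to_hex and the width parameter are made up for illustration.

```python
def rows_to_hex(lines, width=4):
    """Turn the whitespace-separated integers on each line into one hex string."""
    result = []
    for line in lines:
        values = [int(v) for v in line.split()]
        if not values:
            continue  # skip blank lines
        # '{:0{w}X}' zero-pads to `width` digits and uses uppercase letters.
        result.append("".join("{:0{w}X}".format(v, w=width) for v in values))
    return result


sample = ["40500 38921", "43782 32768", "55136 49651"]
print(rows_to_hex(sample))
```

Each returned string can then be written to the output file followed by a newline, as in the answers above.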
I am trying to read a CSV file into Python 3 using the unicodecsv library. The code follows:
with open('filename.csv', 'rb') as f:
    reader = unicodecsv.DictReader(f)
    Student_Data = list(reader)
But the order of the columns in the CSV file is not retained when I output any element of Student_Data; the columns appear in a random order. Is there anything wrong with the code? How do I fix this?
As stated in the csv.DictReader documentation, each row that DictReader returns behaves like a dict, so the column order is not guaranteed (plain dicts are unordered before Python 3.7).
You can obtain the list of the fieldnames with:
reader.fieldnames
But if you only want to obtain a list of the field values, in original order, you can just use a normal reader:
with open('filename.csv', 'rb') as f:
    reader = unicodecsv.reader(f)
    # One list of field values per row, in file order (the first row is the header)
    Student_Data = list(reader)
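If you want to keep DictReader, the header order from fieldnames can be used to rebuild each row in file order. This is a minimal sketch using the standard library csv module rather than unicodecsv; the sample data and column names are made up for illustration.

```python
import csv
import io

# Hypothetical sample data; the column names are invented for this example.
sample = io.StringIO("name,age,city\nAda,36,London\nAlan,41,Wilmslow\n")

reader = csv.DictReader(sample)
fieldnames = reader.fieldnames  # header names in file order

# Rebuild each row as a list whose values follow the header order.
Student_Data = [[row[name] for name in fieldnames] for row in reader]

print(fieldnames)
print(Student_Data)
```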
I have the following problem:
I want to convert a tab delimited text file to a csv file. The text file is the SentiWS dictionary which I want to use for a sentiment analysis ( https://github.com/MechLabEngineering/Tatort-Analyzer-ME/tree/master/SentiWS_v1.8c ).
The code I used to do this is the following:
txt_file = r"SentiWS_v1.8c_Positive.txt"
csv_file = r"NewProcessedDoc.csv"
in_txt = csv.reader(open(txt_file, "r"), delimiter = '\t')
out_csv = csv.writer(open(csv_file, 'w'))
out_csv.writerows(in_txt)
This code writes everything in one row but I need the data to be in three rows as normally intended from the file itself. There is also a blank line under each data and I don't know why.
I want the data to be in this form:
Row1 Row2 Row3
Word Data Words
Word Data Words
instead of
Row1
Word,Data,Words
Word,Data,Words
Can anyone help me?
import pandas
# Read the tab-delimited text file into a DataFrame
dataframe = pandas.read_csv("SentiWS_v1.8c_Positive.txt", delimiter="\t")
# Write the DataFrame to a CSV file
dataframe.to_csv("NewProcessedDoc.csv", encoding='utf-8', index=False)
Try this:
import csv
txt_file = r"SentiWS_v1.8c_Positive.txt"
csv_file = r"NewProcessedDoc.csv"
with open(txt_file, "r") as in_text:
    in_reader = csv.reader(in_text, delimiter = '\t')
    # newline='' is an argument of open(), not of csv.writer()
    with open(csv_file, "w", newline='') as out_csv:
        out_writer = csv.writer(out_csv)
        for row in in_reader:
            out_writer.writerow(row)
There is also a blank line under each data and I don't know why.
You're probably using a file created or edited in a Windows-based text editor. According to the Python 3 csv module docs:
If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n linendings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.
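To see the effect of newline='' in practice, here is a small self-contained sketch. It assumes UTF-8 input; the function name tsv_to_csv and the temporary demo files are made up for illustration and are not the real SentiWS data.

```python
import csv
import os
import tempfile


def tsv_to_csv(src_path, dst_path):
    """Copy a tab-delimited file to a comma-delimited one, row by row."""
    # newline='' is passed to open() so the csv module handles line endings itself.
    with open(src_path, "r", newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(csv.reader(src, delimiter="\t"))


# Hypothetical demo data written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.txt")
    dst = os.path.join(tmp, "out.csv")
    with open(src, "w", newline="", encoding="utf-8") as f:
        f.write("Word\tData\tWords\nWord\tData\tWords\n")
    tsv_to_csv(src, dst)
    with open(dst, "rb") as f:
        raw = f.read()

# Each row ends with a single \r\n, so no blank lines appear between rows.
print(raw)
```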