I can read a text file into a string, and I also know how to use read_csv.
with open('text.txt', 'r') as file:
    text = file.read().replace('\n', '')
My question: if I have a data frame with many records, each holding the location of a text file, how can I bulk-import the text of those files as strings into a new column?
Example data frame:
Filename,Text Path
File1,C:\Text\File1.txt
File2,C:\Text\File2.txt
File3,C:\Text\File3.txt
Example Result:
Filename,Text Path,Text
File1,C:\Text\File1.txt,This is some text.
File2,C:\Text\File2.txt,Other kinds of text.
File3,C:\Text\File3.txt,Even more text.
I'm not aware of any library that can do this directly. I think you need to step through each row of the dataframe and add the text to a new column. Assuming you are using pandas and your example dataframe is "df":
# Assumes df uses the default 0..n-1 integer index
for i in range(len(df['Text Path'])):
    with open(df.loc[i, 'Text Path'], 'r') as file:
        df.loc[i, 'Text'] = file.read()
EDIT:
this could be a bit faster (apply a function to generate the new column):
def readtxt(f):
    with open(f, 'r') as file:
        return file.read()

df['Text'] = df['Text Path'].apply(readtxt)
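If some of the paths in the column might not exist, the function passed to apply can be made a little more forgiving. This is only a sketch: the helper name read_text_or_none, the UTF-8 encoding and the temporary files are assumptions for illustration, not part of the original answer.

```python
import os
import tempfile


def read_text_or_none(path, encoding="utf-8"):
    """Return the file's contents, or None if the file cannot be opened."""
    try:
        with open(path, "r", encoding=encoding) as handle:
            return handle.read()
    except OSError:
        # Covers missing files and permission problems
        return None


# Hypothetical demo files standing in for C:\Text\File1.txt and so on
workdir = tempfile.mkdtemp()
good = os.path.join(workdir, "File1.txt")
with open(good, "w", encoding="utf-8") as handle:
    handle.write("This is some text.")
missing = os.path.join(workdir, "File2.txt")  # never created

texts = [read_text_or_none(p) for p in (good, missing)]
print(texts)  # ['This is some text.', None]
```

With a dataframe it would be used the same way as readtxt above: df['Text'] = df['Text Path'].apply(read_text_or_none).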
Related
I wrote the following code to split my data matrix into a csv file:
import csv

f = open('midi_data.csv', 'w', newline="")
writer = csv.writer(f, delimiter= ',',quotechar =',',quoting=csv.QUOTE_MINIMAL)
for item in data:
    writer.writerow(item)
    print(item)
f.close()
But the csv file ends up looking like this:
tuples not separated by columns but by commas in one column only
What am I doing wrong?
The data seems to be written correctly inside the tuples, judging by what the code prints when it runs (the output was posted as an image).
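For comparison, here is a minimal sketch of writing tuples with csv.writer while leaving quotechar at its default. The sample tuples are made up, and an in-memory buffer stands in for midi_data.csv so the raw output is visible. It does not diagnose the screenshot above; it only shows what the writer produces when the quote character and the delimiter are different.

```python
import csv
import io

# Hypothetical data: one tuple per row
data = [(60, 0.5, "on"), (62, 1.0, "off")]

buffer = io.StringIO(newline="")  # stands in for open('midi_data.csv', 'w', newline='')
writer = csv.writer(buffer, delimiter=",", quoting=csv.QUOTE_MINIMAL)
for item in data:
    writer.writerow(item)  # each tuple element becomes one field

output = buffer.getvalue()
print(repr(output))  # '60,0.5,on\r\n62,1.0,off\r\n'
```

If the raw file looks like this but a spreadsheet program still shows everything in one column, the program's own separator settings would be the next thing to check.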
I'm trying to convert a CSV file into a Python list. The file contains strings organized in columns, and I need an automated way to turn them into a list.
My code works with pandas, but the values still show up as plain text.
import pandas as pd
data = pd.read_csv("Random.csv", low_memory=False)
dicts = data.to_dict().values()
print(data)
The final result should look something like this: ('Dan', 'Zac', 'David')
You can simply do this by using csv module in python
import csv
with open('random.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    your_list = list(reader)  # one inner list per CSV row
print(your_list)
If you really want a list, try this:
import pandas as pd
data = pd.read_csv('Random.csv', low_memory=False, header=None).iloc[:,0].tolist()
This produces
['Dan', 'Zac', 'David']
If you want a tuple instead, just cast the list:
data = tuple(pd.read_csv('Random.csv', low_memory=False, header=None).iloc[:,0].tolist())
And this produces
('Dan', 'Zac', 'David')
I assumed that you use commas as separators in your csv and your file has no header. If this is not the case, just change the params of read_csv accordingly.
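The same result can be sketched without pandas. The snippet below assumes, like the answer above, a comma-separated file with no header and the names in the first column; the in-memory sample stands in for Random.csv.

```python
import csv
import io

# Stand-in for open('Random.csv', newline=''); contents are made up
sample = io.StringIO("Dan\nZac\nDavid\n")

# Take the first field of every non-empty row
names = tuple(row[0] for row in csv.reader(sample) if row)
print(names)  # ('Dan', 'Zac', 'David')
```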
I am trying to extract data from a CSV file with Python 3.6.
The data contain both numbers and text (URL addresses):
file_name = [-0.47, 39.63, http://example.com]
On multiple forums I found this kind of code:
data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines,)
But this works for numbers only, the url addresses are read as NaN.
If I add dtype:
data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines, dtype=None)
The url addresses are read correctly, but they got a "b" at the beginning of the address, such as:
b'http://example.com'
How can I remove that? How can I just have the simple string of text?
I also found this option:
file = open(file_path, "r")
csvReader = csv.reader(file)
for row in csvReader:
    variable = row[i]
    coordList.append(variable)
but it seems it has some issues with python3.
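The b prefix means the value is a bytes object rather than a str, and calling .decode() on it gives plain text. Recent NumPy versions also accept an encoding argument to genfromtxt, which is worth checking in the documentation for your version. Below is a sketch of the csv-module route in Python 3; the helper name read_mixed_rows and the sample row are assumptions for illustration.

```python
import csv
import io


def read_mixed_rows(handle):
    """Parse CSV rows, converting numeric fields to float and keeping the rest as str."""
    rows = []
    for row in csv.reader(handle):
        parsed = []
        for field in row:
            field = field.strip()
            try:
                parsed.append(float(field))
            except ValueError:
                parsed.append(field)  # not a number, keep the text (e.g. a URL)
        rows.append(parsed)
    return rows


# Stand-in for open(file_path, newline=''); the row mirrors the example above
sample = io.StringIO("-0.47, 39.63, http://example.com\n")
rows = read_mixed_rows(sample)
print(rows)  # [[-0.47, 39.63, 'http://example.com']]

# Turning a bytes value (as returned with dtype=None) into a plain string
raw = b'http://example.com'
url = raw.decode('utf-8')
print(url)  # http://example.com
```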
I am trying to read a CSV file in Python 3 using the unicodecsv library. The code follows:
with open('filename.csv', 'rb') as f:
    reader = unicodecsv.DictReader(f)
    Student_Data = list(reader)
But the order of the columns in the CSV file is not retained when I output any element of Student_Data. The columns come out in a seemingly random order. Is there anything wrong with the code? How do I fix this?
As stated in the csv.DictReader documentation, each row produced by a DictReader behaves like a dict, and plain dicts are not guaranteed to keep insertion order on Python versions before 3.7.
You can obtain the list of the fieldnames with:
reader.fieldnames
But if you only want to obtain a list of the field values, in original order, you can just use a normal reader:
Student_Data = []
with open('filename.csv', 'rb') as f:
    reader = unicodecsv.reader(f)
    for row in reader:
        Student_Data.append(row)  # keep every row, not just the last one
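On Python 3.6 and later, the standard library's csv.DictReader returns rows that keep the column order (OrderedDict in 3.6 and 3.7, regular dict from 3.8), so it may be enough to use it in place of unicodecsv. A minimal sketch, with made-up column names and an in-memory sample standing in for filename.csv:

```python
import csv
import io

# Stand-in for open('filename.csv', newline='', encoding='utf-8'); contents are made up
sample = io.StringIO("name,age,city\nDan,30,Oslo\nZac,25,Rome\n")

reader = csv.DictReader(sample)
student_data = list(reader)

print(reader.fieldnames)             # ['name', 'age', 'city']
print(list(student_data[0].keys()))  # ['name', 'age', 'city']
```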
I have the following problem:
I want to convert a tab delimited text file to a csv file. The text file is the SentiWS dictionary which I want to use for a sentiment analysis ( https://github.com/MechLabEngineering/Tatort-Analyzer-ME/tree/master/SentiWS_v1.8c ).
The code I used to do this is the following:
txt_file = r"SentiWS_v1.8c_Positive.txt"
csv_file = r"NewProcessedDoc.csv"
in_txt = csv.reader(open(txt_file, "r"), delimiter = '\t')
out_csv = csv.writer(open(csv_file, 'w'))
out_csv.writerows(in_txt)
This code writes everything into one column, but I need the data in three columns (labelled Row1 to Row3 below), as laid out in the file itself. There is also a blank line under each line of data, and I don't know why.
I want the data to be in this form:
Row1 Row2 Row3
Word Data Words
Word Data Words
instead of
Row1
Word,Data,Words
Word,Data,Words
Can anyone help me?
import pandas
# Read the tab-delimited text file into a dataframe
dataframe = pandas.read_csv("SentiWS_v1.8c_Positive.txt",delimiter="\t")
# Write the dataframe to CSV
dataframe.to_csv("NewProcessedDoc.csv", encoding='utf-8', index=False)
Try this:
import csv
txt_file = r"SentiWS_v1.8c_Positive.txt"
csv_file = r"NewProcessedDoc.csv"
with open(txt_file, "r") as in_text:
    in_reader = csv.reader(in_text, delimiter = '\t')
    # newline='' belongs to open(), not to csv.writer()
    with open(csv_file, "w", newline='') as out_csv:
        out_writer = csv.writer(out_csv)
        for row in in_reader:
            out_writer.writerow(row)
There is also a blank line under each data and I don't know why.
You're probably using a file created or edited in a Windows-based text editor. According to the Python 3 csv module docs:
If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n linendings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.
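To make the effect of newline='' concrete, here is a small self-contained sketch that converts a tab-delimited file and then inspects the raw bytes that were written. The function name tsv_to_csv, the temporary paths and the sample lines are assumptions for illustration, not the real SentiWS data.

```python
import csv
import os
import tempfile


def tsv_to_csv(txt_path, csv_path):
    """Copy a tab-delimited file to a comma-delimited one."""
    with open(txt_path, "r", newline="", encoding="utf-8") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)  # default delimiter is a comma
        writer.writerows(reader)


# Hypothetical input file with three tab-separated columns
workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, "sample.txt")
csv_path = os.path.join(workdir, "sample.csv")
with open(txt_path, "w", newline="", encoding="utf-8") as f:
    f.write("Word\tData\tWords\nGut\t0.5\tgute,guter\n")

tsv_to_csv(txt_path, csv_path)

with open(csv_path, "rb") as f:
    raw = f.read()
print(raw)  # b'Word,Data,Words\r\nGut,0.5,"gute,guter"\r\n'
```

Each row ends with a single \r\n and there is no \r\r\n sequence, which is what shows up as a blank line on Windows when newline='' is left out. The third field is quoted because it contains a comma itself.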