How to convert a tab delimited text file to a csv file in Python - python-3.x

I have the following problem:
I want to convert a tab delimited text file to a csv file. The text file is the SentiWS dictionary which I want to use for a sentiment analysis ( https://github.com/MechLabEngineering/Tatort-Analyzer-ME/tree/master/SentiWS_v1.8c ).
The code I used to do this is the following:
txt_file = r"SentiWS_v1.8c_Positive.txt"
csv_file = r"NewProcessedDoc.csv"
in_txt = csv.reader(open(txt_file, "r"), delimiter = '\t')
out_csv = csv.writer(open(csv_file, 'w'))
out_csv.writerows(in_txt)
This code writes everything into one column, but I need the data in three columns, as in the original file. There is also a blank line under each row of data and I don't know why.
I want the data to be in this form:
Column1 Column2 Column3
Word Data Words
Word Data Words
instead of
Column1
Word,Data,Words
Word,Data,Words
Can anyone help me?

import pandas
# Read the tab-delimited text file into a DataFrame
dataframe = pandas.read_csv("SentiWS_v1.8c_Positive.txt", delimiter="\t")
# Write the DataFrame to a CSV file
dataframe.to_csv("NewProcessedDoc.csv", encoding='utf-8', index=False)

Try this:
import csv
txt_file = r"SentiWS_v1.8c_Positive.txt"
csv_file = r"NewProcessedDoc.csv"
with open(txt_file, "r", newline='') as in_text:
    in_reader = csv.reader(in_text, delimiter='\t')
    # newline='' is an argument of open(), not of csv.writer()
    with open(csv_file, "w", newline='') as out_csv:
        out_writer = csv.writer(out_csv)
        for row in in_reader:
            out_writer.writerow(row)
There is also a blank line under each row of data and I don't know why.
You're probably running this on Windows, where a file opened for writing without newline='' gets an extra \r on every row. According to the Python 3 csv module docs:
If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n linendings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.
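As a rough sketch of the point above, the conversion can be written so that newline='' is passed to open() on both sides. The helper names tsv_to_csv and convert_file are made up for illustration, the UTF-8 encoding is an assumption, and the sample rows are invented rather than taken from SentiWS:

```python
import csv
import io


def tsv_to_csv(src, dst):
    """Copy tab-delimited rows from src to dst as comma-separated rows."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow(row)


def convert_file(txt_path, csv_path):
    """Convert a tab-delimited file on disk to a CSV file."""
    # newline="" goes to open(); the csv module then handles line endings itself
    with open(txt_path, "r", newline="", encoding="utf-8") as src:
        with open(csv_path, "w", newline="", encoding="utf-8") as dst:
            tsv_to_csv(src, dst)


# In-memory demonstration with made-up rows (not real SentiWS data)
sample = io.StringIO("word1\t0.5\tform1,form2\nword2\t0.25\tform3\n", newline="")
result = io.StringIO(newline="")
tsv_to_csv(sample, result)
print(repr(result.getvalue()))
```

Note that a field which itself contains a comma (the third field of the first row here) is quoted by the writer, so it still ends up in a single column.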

Related

How to Append List in Python by reading csv file

I am trying to write a simple program that should give the following output when it reads csv file which contains several email ids.
email_id = ['emailid1#xyz.com','emailid2#xyz.com','emailid3#xyz.com'] #required format
but the problem is the output I got is like this following:
[['emailid1#xyz.com']]
[['emailid1#xyz.com'], ['emailid2#xyz.com']]
[['emailid1#xyz.com'], ['emailid2#xyz.com'], ['emailid3#xyz.com']] #getting this wrong format
Here is the code I have written. Please suggest a correction that would give me the required format. Thanks in advance.
import csv
email_id = []
with open('contacts1.csv', 'r') as file:
    reader = csv.reader(file, delimiter = ',')
    for row in reader:
        email_id.append(row)
        print(email_id)
Note: my csv contains only one column of email ids and has no header. I also tried email_id.extend(row), but that did not work either.
You need to move your print outside the loop:
import csv
email_id = []
with open('contacts1.csv', 'r') as file:
    reader = csv.reader(file, delimiter = ',')
    for row in reader:
        email_id.append(row)
# sum(..., []) flattens the list of one-item rows into a single list
print(sum(email_id, []))
The loop can also be like this (if you only need one column from the csv):
for row in reader:
    email_id.append(row[0])
print(email_id)
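For reference, here is a self-contained sketch of the second variant. It uses an in-memory stand-in for contacts1.csv with made-up addresses, and skipping blank lines is an extra precaution that the original answer does not include:

```python
import csv
import io

# Stand-in for contacts1.csv: one column, no header (addresses are made up)
sample = io.StringIO(
    "a@example.com\nb@example.com\n\nc@example.com\n", newline=""
)

reader = csv.reader(sample)
# Each row is a list of fields; keep the first field and skip blank lines,
# which csv.reader yields as empty lists
email_id = [row[0] for row in reader if row]

print(email_id)
```

Taking row[0] is what turns the list of one-item lists into a flat list of strings.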

Jupyter Notebooks Python - Import text from text path as string

I'm able to import text as a string. I understand also read_csv.
with open('text.txt', 'r') as file:
    text = file.read().replace('\n', '')
My question is: if I have a data frame with many records, each containing the location of a text file, how can I bulk-import the text as strings into a new column?
Example data frame:
Filename,Text Path
File1,C:\Text\File1.txt
File2,C:\Text\File2.txt
File3,C:\Text\File3.txt
Example Result:
Filename,Text Path,Text
File1,C:\Text\File1.txt,This is some text.
File2,C:\Text\File2.txt,Other kinds of text.
File3,C:\Text\File3.txt,Even more text.
I'm not aware of any library that can do this directly. I think you need to step through each row of the dataframe and add the text to a new column. Assuming you are using pandas and your example dataframe is "df":
for i in range(len(df['Text Path'])):
    with open(df.loc[i,'Text Path'], 'r') as file:
        df.loc[i,'Text'] = file.read()
EDIT:
This could be a bit faster (apply a function to generate the new column):
def readtxt(f):
    with open(f, 'r') as file:
        return file.read()
df['Text'] = df['Text Path'].apply(readtxt)
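To try the apply approach without real files, the following sketch first writes two throwaway text files to a temporary directory. The file contents, the read_text helper name, and the UTF-8 encoding are assumptions made for illustration:

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_text(path):
    """Return the full contents of one text file as a string."""
    return Path(path).read_text(encoding="utf-8")


# Build two throwaway files so the sketch runs anywhere
tmp_dir = Path(tempfile.mkdtemp())
texts = {"File1": "This is some text.", "File2": "Other kinds of text."}
for name, body in texts.items():
    (tmp_dir / f"{name}.txt").write_text(body, encoding="utf-8")

df = pd.DataFrame({
    "Filename": list(texts),
    "Text Path": [str(tmp_dir / f"{name}.txt") for name in texts],
})

# Apply the reader to every path and store the result in a new column
df["Text"] = df["Text Path"].apply(read_text)
print(df[["Filename", "Text"]])
```

If some paths may be missing, read_text would need a try/except around the read; that case is not handled here.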

How to remove strings like los30_9_ from the second column of data file?

I have data like the following in a file.
seq
AB los30_9_AAACCTGAGATGTGGC
CGD los28_6_AAACCTGCAGCTTCGG
CGD los28_3_AAACCTGCATAGTAAG
CRG mgj28_3_AAACCTGCATATACGC
CGD lkgd28_11_AAACCTGGTCTTCTCG
CRG lkgd28_3_AAACCTGTCAGTTGAC
AB lkgd35_5_AAACCTGTCTGGTATG
CD los30_9_AAACGGGCAACCGCCA
CD lkgd_8_AAACGGGGTTACCAGT
How can I remove prefixes such as los30_9_, los28_6_, los28_3_, mgj28_3_, lkgd28_11_, lkgd28_3_, lkgd35_5_ and lkgd_8_ from the second column of a CSV file?
This Python 3 solution will do it, and it also respects multiline fields in the CSV file.
#!/usr/local/bin/python3
import csv
import re
# newline='' lets the csv module handle line endings itself
with open('input.csv', newline='') as f:
    csvr = csv.reader(f, delimiter = "\t")
    next(csvr, None)  # skip the header line
    for row in csvr:
        # strip a prefix such as los30_9_ from the second column
        row[1] = re.sub(r'[a-z0-9]+_[a-z0-9]+_', '', row[1])
        print("{}\t{}".format(row[0], row[1]))
Note: Please correct your CSV so it contains tabs instead of spaces for this to work. You can open it in Excel and "Save as...".
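The substitution can be checked against a few of the rows from the question with an in-memory, tab-separated sample. Anchoring the pattern with ^ is an added assumption here, so that only a leading prefix is removed:

```python
import csv
import io
import re

# Two lowercase-alphanumeric groups, each followed by "_", at the start
PREFIX = re.compile(r"^[a-z0-9]+_[a-z0-9]+_")

sample = io.StringIO(
    "seq\n"
    "AB\tlos30_9_AAACCTGAGATGTGGC\n"
    "CGD\tlkgd28_11_AAACCTGGTCTTCTCG\n"
    "CD\tlkgd_8_AAACGGGGTTACCAGT\n",
    newline="",
)

reader = csv.reader(sample, delimiter="\t")
next(reader, None)  # skip the "seq" header line
cleaned = [[row[0], PREFIX.sub("", row[1])] for row in reader]

for label, seq in cleaned:
    print(f"{label}\t{seq}")
```

Because the sequences themselves are uppercase, the lowercase-only character class stops the match at the second underscore.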

Convert and concatenate data from two columns of a csv file

I have a csv file which contains data in two columns, as follows:
40500 38921
43782 32768
55136 49651
63451 60669
50550 36700
61651 34321
and so on...
I want to convert each value into its hex equivalent, then concatenate them, and write them into a column in another csv file.
For example: hex(40500) = 9E34, and hex(38921) = 9809.
So, in the output csv file, element A1 would be 9E349809.
So, I am expecting column A in the output csv file to be:
9E349809
AB068000
D760C1F3
F7DBECFD
C5768F5C
F0D38611
I found sample code that concatenates two columns, but I am struggling with converting the values to hex before concatenating them. The code is as follows:
import csv
inputFile = 'input.csv'
outputFile = 'output.csv'
with open(inputFile) as f:
    reader = csv.reader(f)
    with open(outputFile, 'w') as g:
        writer = csv.writer(g)
        for row in reader:
            new_row = [''.join([row[0], row[1]])] + row[2:]
            writer.writerow(new_row)
How can I convert the data in each column to its hex equivalent, then concatenate the results and write them to another file?
You could do this in 4 steps:
1. Read the lines from the input csv file.
2. Use formatting options to get the hex values of each number.
3. Perform string concatenation to get your result.
4. Write to the new csv file.
Sample Code:
inputFile = 'input.csv'
outputFile = 'output.csv'
with open(outputFile, 'w') as outfile:
    with open(inputFile, 'r') as infile:
        for line in infile:  # Iterate through each line
            left, right = int(line.split()[0]), int(line.split()[1])  # split left and right blocks
            newstr = '{:x}'.format(left) + '{:x}'.format(right)  # create new string using hex values excluding '0x'
            outfile.write(newstr + '\n')  # write one result per line to the output file
    print('Conversion completed')
print('Closing outputfile')
Sample Output:
In[44] line = '40500 38921'
Out[50]: '9e349809'
ParvBanks's solution is good (clear and functional); I would simplify it a little, like this:
with open(inputFile, 'r') as infile, open(outputFile, 'w+') as outfile:
    for line in infile:
        # write one concatenated hex string per input line
        outfile.write("".join("{:x}".format(int(v)) for v in line.split()) + "\n")
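A variant that matches the uppercase output asked for in the question might look like the sketch below. The to_hex_pair name is hypothetical, and padding each value to four hex digits is an assumption that only holds if every value fits in 16 bits:

```python
import csv
import io


def to_hex_pair(left, right):
    """Format two integers as 4-digit uppercase hex and join them."""
    # :04X pads to four digits; this assumes every value fits in 16 bits
    return f"{int(left):04X}{int(right):04X}"


# Stand-in for input.csv, using the first two rows from the question
sample = io.StringIO("40500 38921\n43782 32768\n", newline="")
output = io.StringIO(newline="")

writer = csv.writer(output)
for line in sample:
    fields = line.split()
    if len(fields) >= 2:  # skip blank or incomplete lines
        writer.writerow([to_hex_pair(fields[0], fields[1])])

print(output.getvalue())
```

Without the zero padding, a small value such as 255 would become FF rather than 00FF, and the two halves of the joined string could no longer be told apart.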

How to remove a line break from CSV output

I am writing this code to separate information which will be uploaded to a database using the resulting CSV file from the code I wrote. I have it so that if I receive a spreadsheet with First, Middle, and Last name all in the same column they can be split into three separated columns. However my output file has some extra line breaks or returns or something which I just went through in the CSV and deleted manually to get the data uploaded for now. How can I remove these within my code? I have some ideas but none seem to work. I tried using line.replace but I do not fully understand how that is supposed to work so it failed.
My code:
import csv
with open('c:\\users\\cmobley\\desktop\\split for crm check.csv', "r") as readfile:
    name_split = []
    for line in readfile:
        whitespace_split = line.split(" ")
        remove_returns = (line.replace('/n', "") for line in whitespace_split)
        name_split.append(remove_returns)
    print (name_split)
with open ('c:\\users\cmobley\\desktop\\testblank.csv', 'w', newline = '\n') as csvfile:
    writer = csv.writer(csvfile, delimiter = ',',
                        quotechar = '"', quoting = csv.QUOTE_MINIMAL)
    writer.writerows(name_split)
Thanks for any help that can be provided! I am still trying to learn Python.
You used a forward slash ('/n') where the newline escape sequence needs a backslash ('\n').
Change to:
remove_returns = (line.replace('\n', "") for line in whitespace_split)
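An alternative sketch, not part of the original answer: calling split() with no argument splits on any run of whitespace and discards the trailing newline, so no separate replace step is needed. The names below are made up and the input is an in-memory stand-in for the spreadsheet export:

```python
import csv
import io

# Stand-in for the input file: one full name per line (made-up names)
source = io.StringIO("Ada Marie Lovelace\nAlan Mathison Turing\n", newline="")
output = io.StringIO(newline="")

writer = csv.writer(output)
for line in source:
    # split() with no argument splits on whitespace and drops the trailing "\n"
    parts = line.split()
    if parts:  # ignore empty lines
        writer.writerow(parts)

print(output.getvalue())
```

When writing to a real file instead of StringIO, opening it with newline='' keeps the csv module from producing blank rows on Windows.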
