Writing lists of values to a csv file in different columns using Python - excel

I need help writing values to a CSV file.
I have 4 lists of values that I would like to write to a CSV file, but not in the usual way. Normally the csv module writes the values across a row, but this time I would like to write the values of each list in its own column: one column, with a row per value, for every list. That way, all of list 1's data would be in column A of Excel, all of list 2's data in column B, and so on. I have tried a lot of commands and only half managed it.
My lists' names are: It_5minute, Iiso_5min, IHDKR_5min and Iperez_5min.
My current code:
with open('Test.csv', 'w') as f:
    w = csv.writer(f)
    for row in zip(It_5minute, Iiso_5min, IHDKR_5min, Iperez_5min):
        w.writerow(row)
With this code I get all the list values in the same column (instead of each list in a different column), with the values separated by commas. I have attached an Excel image to clarify the problem. I want each list in a separate column, to be able to operate on the data easily. Can anybody help me? Thank you very much.
PS: It would be nice to write the name of each list at the top of every column, too.

In Python 2, just change with open('Test.csv', 'w') as f: to with open('Test.csv', 'wb') as f:, since the csv module there expects files opened in binary mode. (In Python 3, use open('Test.csv', 'w', newline='') instead.)
State the delimiter to use clearly (in this case a comma) and, just in case, whether or not to use quoting (optional):
with open('Test.csv', 'wb') as f:
    w = csv.writer(f, delimiter=',', quoting=csv.QUOTE_ALL)  # replace the delimiter with whatever suits you
    for row in zip(It_5minute, Iiso_5min, IHDKR_5min, Iperez_5min):
        w.writerow(row)
If this doesn't work, you have to state the delimiter manually in Excel's text import wizard.
Common delimiters:
tab = '\t'
semicolon = ';'
comma = ','
space = ' '
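For Python 3, a complete sketch that also writes the list names as a header row (as asked in the PS); the numbers here are made-up stand-ins for the real 5-minute data:

```python
import csv

# Example data standing in for the real 5-minute irradiance lists
It_5minute = [1, 2, 3]
Iiso_5min = [4, 5, 6]
IHDKR_5min = [7, 8, 9]
Iperez_5min = [10, 11, 12]

# newline='' prevents blank rows on Windows in Python 3
with open('Test.csv', 'w', newline='') as f:
    w = csv.writer(f, delimiter=',')
    # header row with the list names
    w.writerow(['It_5minute', 'Iiso_5min', 'IHDKR_5min', 'Iperez_5min'])
    # zip() pairs up the i-th element of every list, so each list becomes a column
    for row in zip(It_5minute, Iiso_5min, IHDKR_5min, Iperez_5min):
        w.writerow(row)
```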

Related

Processing TSV Files in Lua

I have a very, very large TSV file. The first line is headers. The following lines contain data with fields separated by tabs, or double tabs where a field is blank; the fields themselves can contain alphanumerics, or alphanumerics plus punctuation.
for example:
Field1<tab>Field2<tab>FieldN<newline>
The fields may contain spaces, punctuation or alphanumerics. The only things that remain true are:
each field is followed by a tab, except the last one
the last field is followed by a newline
blank fields are empty, but like all other fields they are followed by a tab, which makes a double tab
I've tried many combinations of pattern matching in Lua and never got it quite right. Typically the fields with punctuation (time and date fields) are the ones that trip me up.
I need the blank fields (the ones with double-tab) preserved so that the rest of the fields are always at the same index value.
Thanks in Advance!
Try the code below:
function test(s)
    local n = 0
    s = s .. '\t'
    for w in s:gmatch("(.-)\t") do
        n = n + 1
        print(n, "[" .. w .. "]")
    end
end

test("10\t20\t30\t\t50")
test("100\t200\t300\t\t500\t")
It adds a tab to the end of the string so that all fields are followed by a tab, even the last one.
Here is a version where rows and columns are separated into a nested table:
local filename = "big_tables.tsv" -- tab-separated values
-- local filename = "big_tables.csv" -- comma-separated values
local lines = io.lines(filename) -- open file as lines
local tables = {} -- table with columns and rows as tables[n_column][n_row] = value
for line in lines do -- row iterator
    local i = 1 -- first column
    for value in string.gmatch(line, "[^%s]+") do -- tab-separated values
    -- for value in string.gmatch(line, '%d[%d.]*') do -- comma-separated values
        tables[i] = tables[i] or {} -- if the column doesn't exist yet, create it
        tables[i][#tables[i] + 1] = tonumber(value) -- add the row value
        i = i + 1 -- column iterator
    end
end
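As a side note, the blank-field-preserving split that the Lua snippet implements by appending a trailing tab is what Python's str.split('\t') does out of the box; a minimal sketch for comparison:

```python
def split_tsv_line(line):
    """Split one TSV line into fields, preserving blanks (double tabs)."""
    # str.split('\t') keeps empty strings between consecutive tabs,
    # so field indices stay stable even when fields are blank.
    return line.rstrip('\n').split('\t')

print(split_tsv_line("10\t20\t30\t\t50"))  # ['10', '20', '30', '', '50']
```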

Converting Excel file to csv using to_csv, removes leading zeros even when cells are formatted to be string

When I try to convert my Excel file to CSV using the to_csv function, every item number that has a leading 0 loses it, except for the very first row.
I have a simple for loop that iterates through all cells and converts the cell values to strings, so I have no idea why only the first row gets converted to CSV correctly with the leading 0.
for row in ws.iter_rows():
    for cell in row:
        cell.value = str(cell.value)
pd.read_excel('example.xlsx').to_csv('result.csv', index=False, line_terminator=',\n')
e.g.
https://i.stack.imgur.com/Njb3n.png (won't let me directly add image but it shows the following in excel)
0100,03/21/2019,4:00,6:00
0101,03/21/2019,4:00,6:00
0102,03/21/2019,4:00,8:00
turns into:
0100,03/21/2019,4:00,6:00,
101,03/21/2019,4:00,6:00,
102,03/21/2019,4:00,8:00,
What can I do to keep the 0 in front of the first item in every row of the CSV?
Any insight would be appreciated.
If the Excel file has no header, the default column names are 0, 1, ... and so on.
If you want to keep the zeros in column 0, for example, just do:
pd.read_excel('example.xlsx', header=None, dtype={0: str})\
    .to_csv('result.csv', index=False, line_terminator=',\n')
If the file has no header and you don't pass header=None, the first row is used as the header. dtype={0: str} indicates that column 0 will be read as str.
Be careful when you save the Excel file to CSV: the header is saved (here, with your options), so the first row will be 0, 1, ... (the column names). If you don't want a header in the CSV file, use:
pd.read_excel('e:/test.xlsx', header=None, dtype={0: str})\
    .to_csv('e:/result.csv', index=False, header=False, line_terminator=',\n')
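A self-contained sketch of the same dtype idea using read_csv and an in-memory file (the dtype argument works the same way in read_excel); the data here is made up:

```python
import io
import pandas as pd

data = "0100,03/21/2019\n0101,03/21/2019\n"

# Without dtype, pandas parses the first column as integers and drops the zeros
plain = pd.read_csv(io.StringIO(data), header=None)
print(plain[0].tolist())  # [100, 101]

# dtype={0: str} keeps column 0 as text, so leading zeros survive
keep = pd.read_csv(io.StringIO(data), header=None, dtype={0: str})
print(keep[0].tolist())   # ['0100', '0101']
```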

How to select a column from a text file which has no header using python

I have a text file which is tabulated. When I open the file in Python using pandas, it shows that the file contains only one column, although there are many columns in it. I've tried pd.DataFrame with sep='\s*' and sep='\t', but I can't select a column since pandas sees only one. I've even tried specifying the header, but the header moves to the far right side and the whole file is still treated as one column. I've also tried the .loc method with a specific column number, but it always returns rows. I want to select the first column (A, A), the third column (HIS, PRO) and the fourth column (0, 0).
I want to get the above mentioned specific columns and print it in a CSV file.
Here is the code I have used along with some file components.
1) After opening the file using pd:
[599 rows x 1 columns]
2) The file format:
pdb_id: 1IHV
0 radii_filename: MD_threshold: 4
1 A 20 HIS 0 MaximumDistance
2 A 21 PRO 0 MaximumDistance
3 A 22 THR 0 MaximumDistance
3) code:
import pandas as pd
df = pd.read_table("file_path.txt", sep='\t')
U = df.loc[:][2:4]
Any help will be highly appreciated.
If anybody gets a file like this, it can be opened and the columns selected using the following code:
f = open('file.txt', "r")
lines = f.readlines()
result = []
for x in lines:
    result.append(x.split()[2:4])  # the slice of columns to select
for w in result:
    s = '\t'.join(w)
    print(s)
Here the slice (2:4 in this example) selects the columns you want; split() with no argument splits on any run of whitespace, including tabs.
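Alternatively, pandas can do this directly; a sketch under the assumption that the file is whitespace-delimited with no header (the in-memory text stands in for file_path.txt):

```python
import io
import pandas as pd

# In-memory stand-in for the tabulated text file shown in the question
text = "A 20 HIS 0 MaximumDistance\nA 21 PRO 0 MaximumDistance\n"

# sep=r'\s+' splits on runs of whitespace; header=None keeps row 0 as data
df = pd.read_csv(io.StringIO(text), sep=r'\s+', header=None)

# Select the first, third and fourth columns by position
subset = df.iloc[:, [0, 2, 3]]
subset.to_csv('selected_columns.csv', index=False, header=False)
print(subset.values.tolist())  # [['A', 'HIS', 0], ['A', 'PRO', 0]]
```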

Python Pandas check cells for a range of numbers copy or skip if not there

I would use the pandas isin or iloc functions, but the Excel format is complex: there are sometimes data followed by columns with no info, and the main pool of entries is columns with up to 3 pieces of data in a cell, separated only by a '|'. Some of the cells are missing a number, and I want to skip those but copy the ones that have one.
Below is my current code. I have a giant Excel file with thousands of entries and, worse, the columns/rows are not neat: there are several pieces of data in each column cell per row. What I've noticed is that a number called 'tail #' is missing in some of them. What I want to do is search for that number; if a cell has it, copy that cell, and if not, move on to the next column in the row, then repeat that for all cells. There is a giant header, but when I transformed the file into CSV I removed that with formatting. This is also why I am looking for a number: there are several headers, for example a year like 2010 followed by several empty columns until the next one, maybe 10 columns later. Also, please note that under this header of years are several columns of data per row, separated by two columns with no info. The info in a column looks like '13|something something|some more words'; if it has a number, as you see, I want to copy it. The numbers seem to range from 0 to no greater than 30. Lastly, I'm trying to write this using pandas, but I may need a more manual way, because isin and iloc were not working.
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import os.path as op
from openpyxl import workbook
import re

def extract_export_columns(df, list_of_columns, file_path):
    column_df = df[list_of_columns]
    column_df.to_csv(file_path, index=False, sep="|")

# Original file
input_base_path = 'C:/Users/somedoc input'
main_df_data_file = pd.read_csv(op.join(input_base_path, 'som_excel_doc.csv'))

# Filter for tail numbers
tail_numbers = main_df_data_file['abcde'] <= 30
main_df_data_file[tail_numbers]

# iterate over list
#number_filter = main_df_data_file.Updated.isin(["15"])
#main_df_data_file[number_filter]
#print(number_filter)
#for row in main_df_data_file.values:
#    for value in row:
#        print(value)
#    print(row)
# to check the condition

# Product of code
output_base_path = r'C:\Users\some_doc output'
extract_export_columns(main_df_data_file,
                       ['Updated 28 Feb 18 Tail #'],
                       op.join(output_base_path, 'UBC_example3.txt'))
The code I have loads the CSV and successfully creates a text file. I want to build the body of the function to scan an Excel/CSV file and copy to a text file the cells that contain a number.
https://drive.google.com/file/d/1stXxgqBeo_sGksVYL9HHdn2IflFL_bb8/view?usp=sharing
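A minimal sketch of the cell check described above, assuming cells look like '13|something|more words' and valid numbers run from 0 to 30 (the sample rows and the has_tail_number helper are made up for illustration):

```python
import re

# Stand-in rows mimicking cells like '13|something something|some more words'
rows = [
    ['13|repair log|done', '', 'no number here|text'],
    ['2|check|ok', '31|out of range|skip', '7|fix|later'],
]

def has_tail_number(cell):
    """True if the cell starts with a number in the 0-30 range before the first '|'."""
    m = re.match(r'(\d+)\|', cell)
    return m is not None and 0 <= int(m.group(1)) <= 30

# Keep only the cells that carry a valid leading number
kept = [[c for c in row if has_tail_number(c)] for row in rows]
print(kept)  # [['13|repair log|done'], ['2|check|ok', '7|fix|later']]
```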

Openpyxl to check for keywords, then modify next to cells to contain those keywords and total found

I'm using python 3.x and openpyxl to parse an excel .xlsx file.
For each row, I check a column (C) to see if any of those keywords match.
If so, I add them to a separate list variable and also determine how many keywords were matched.
I then want to add the actual keywords into the next cell, and the total of keywords into the cell after. This is where I am having trouble, actually writing the results.
The contents of the keywords.txt and results.xlsx files were linked here.
import openpyxl

# Here I read a keywords.txt file into a keywords variable.
# I throw away the first line to prevent a mismatch due to the Unicode BOM.
with open("keywords.txt") as f:
    f.readline()
    keywords = [line.rstrip("\n") for line in f]

# Load the workbook
wb = openpyxl.load_workbook("results.xlsx")
ws = wb.get_sheet_by_name("Sheet")

# Iterate through every row, only looking in column C for the keyword match.
for row in ws.iter_rows("C{}:E{}".format(ws.min_row, ws.max_row)):
    # if there's a match, add to the keywords_found list
    keywords_found = [key for key in keywords if key in row[0].value]
    # if any keywords were found, enter the keywords in column D
    # and the count in column E
    if len(keywords_found):
        row[1].value = keywords_found
        row[2].value = len(keywords_found)
Now, I understand where I'm going wrong, in that ws.iter_rows(..) returns tuples, which can't be modified. I figure I could use two for loops, one for each row and another for the cells in each row, but this test is a small example of a real-world scenario where the number of rows is in the tens of thousands.
I'm not quite sure of the best way to go about this. Thank you in advance for any help you can provide.
Use the ws['C'] and then the offset() method of the relevant cell.
Thanks Charlie for the offset() tip. I modified the code slightly and now it works a treat.
for row in ws.iter_rows("C{}:C{}".format(ws.min_row, ws.max_row)):
    for cell in row:
        keywords_found = [key for key in keywords if key in cell.value]
        if len(keywords_found):
            cell.offset(0, 1).value = str(keywords_found)
            cell.offset(0, 2).value = str(len(keywords_found))
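For reference, current openpyxl versions take keyword arguments instead of a range string; a self-contained sketch of the same offset() pattern, with made-up keywords and an in-memory workbook standing in for results.xlsx:

```python
import openpyxl

keywords = ["error", "timeout"]

# Build a small in-memory workbook standing in for results.xlsx
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["id", "date", "connection timeout after error"])
ws.append(["id", "date", "all good"])

# Column C only; each row yields a one-cell tuple we can unpack
for (cell,) in ws.iter_rows(min_col=3, max_col=3):
    found = [k for k in keywords if k in cell.value]
    if found:
        cell.offset(0, 1).value = ", ".join(found)  # column D: the keywords
        cell.offset(0, 2).value = len(found)        # column E: the count

print(ws["D1"].value, ws["E1"].value)  # prints: error, timeout 2
```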
