Fix csv files through Excel/Numbers [automatically through Python]? - excel

I'm working with some CSV files which have been created incorrectly. Quotation marks and commas are intermixed inside the fields, and I keep getting parsing errors from pd.read_csv, even after replacing all column-separating commas with tabs.
Nevertheless, Numbers (Apple's Excel) can read the file perfectly, and, after re-saving it as CSV, Pandas can generate data frames seamlessly. Thus, I wanted to know if there is a way, preferably through Python, to automate this import-export in Numbers/Excel (maybe an API?) to fix my CSVs, or to find out what they do to correct them.
EDIT: The CSV rows look like the following:
"id","lastVisitTimeLocal","lastVisitTimeUTC","title","url","typedCount","visitCount",""[]"_id","_id"
8986,"06/03/2018, 20:00:48","3/6/2018 2:30:48 PM","","https://chrome.google.com",0,1,3000001,2000001
However, some titles contain commas and some links contain quotation marks, so I keep getting parsing errors, even though Numbers/Excel parses them seamlessly.
EDIT2: I'm looking for a pipeline that does the following:
file.csv --excel_engine--> file.xlsx --excel_engine--> file2.csv

Have you tried setting quoting and doublequote in pd.read_csv()? It's odd to me that Pandas can't read a CSV that Excel can (I usually have problems with Excel instead; the only issue I've had with Pandas is NUL characters).
Alternatively, you can run this in VBA:
Sub openCsvAndSave()
    Dim csv_paths, path
    csv_paths = Array(path1, path2, ...) ' Set your csv paths here '
    For Each path In csv_paths
        ' Let Excel parse the file, then re-save it as a fresh CSV
        Dim NewWb As Workbook: Set NewWb = Workbooks.Open(path)
        NewWb.SaveAs Left(path, Len(path) - 4) & "_2.csv", xlCSV
        NewWb.Close SaveChanges:=False
    Next path
End Sub
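If the goal is only to re-serialize the file the way a spreadsheet would, Python's standard csv module may already be enough. The following is a minimal sketch and not a description of what Numbers or Excel do internally. The function name normalize_csv is hypothetical, and the sketch assumes fields are wrapped in standard double quotes, as in the sample row from the question.

```python
import csv
import io


def normalize_csv(text):
    """Parse CSV text and write it back with every field quoted."""
    # newline="" lets the csv module handle line endings itself
    reader = csv.reader(io.StringIO(text, newline=""), quotechar='"', doublequote=True)
    out = io.StringIO(newline="")
    # QUOTE_ALL makes embedded commas and quotes unambiguous for later readers
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

Rows that are genuinely malformed (for example, an unescaped quote in the middle of a quoted field) may still need manual inspection; this only rewrites what the csv module is able to parse.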

Related

Using Excel files to create formatted Word files

I need to create a custom software that would convert Excel files to a formatted Word file.
From the Excel file below [image not included] to this Word file with the given formatting [image not included].
I have not done any work like this before. I have a few ideas about using Python with CSV files, but I am not sure. How can I write software that fully automates this process, for example, taking the Excel file as input and generating a formatted Word file?
You could use the csv module that comes with Python. The function csv.reader() will allow you to iterate through the lines of your CSV file. Additionally, pandas, another Python module (which you have to download), is also a good option when it comes to working with data. However, I'm fairly certain that it won't be automatic, and that you'll have to deal with converting that to a Word file yourself.
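As a rough illustration of the csv.reader() suggestion, the sketch below turns each data row into one line of text that could later be placed into a document. The function name rows_to_paragraphs and the "column: value" layout are assumptions made for the example; writing the actual Word file is not covered here.

```python
import csv
import io


def rows_to_paragraphs(csv_text):
    """Return one 'column: value; ...' string per data row."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader)  # first row holds the column names
    paragraphs = []
    for row in reader:
        pairs = [f"{name}: {value}" for name, value in zip(header, row)]
        paragraphs.append("; ".join(pairs))
    return paragraphs
```

For a file on disk, the same loop works with open(path, newline="") in place of io.StringIO.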
Another possibility is:
First, rearrange the data into tabular format, e.g., Name in column A, Gender in column B, etc. You could write a simple VBA routine to create a tabular file based on your raw file.
Use Word's mail merge feature to write the paragraphs by filling data fields from the Excel file.
Hope that helps.
You could use Excel formulas such as =CONCATENATE to gather the completed sentences onto a separate tab, named for example "Summary", in a named range such as "StudentInfo". Then close the spreadsheet.
In Word, use your skills to get the cursor to the correct spot, then try this:
Sub CopyNamedRangeFromExcelWorkbook()
    ' For this to work, go to the VB Editor and pick Tools > References > Microsoft Excel 15.0 Object Library
    Dim xlApp As Excel.Application
    Set xlApp = CreateObject("Excel.Application")
    xlApp.Visible = False
    Dim xlWorkbook As Excel.Workbook
    Set xlWorkbook = _
        xlApp.Workbooks.Open("C:\Users\Desdemona\Desktop\TheSpreadsheet.xlsx")
    xlWorkbook.Worksheets("Summary").Range("StudentInfo").Copy
    Selection.Paste ' pastes the Excel data as a Word table
    xlWorkbook.Close SaveChanges:=False ' nothing was modified, so skip the save prompt
    xlApp.Quit
    ' now apply a format to the Word table, or revert it to text, or whatever.
End Sub

Export to txt from VBA

I'm having some trouble exporting Excel files to txt files through VBA. The program runs fine and generates a bunch of txt files with the information I want. The problem is that, when exported, the txt file shows dates in the American format, while I want the European dd/mm/yyyy. This doesn't happen when I save the txt manually. Here is the code I'm using to save the txt:
tmpFile = "C:\Users\z864451\Desktop\Prueba\AIMS\AIMS_" & Filename
ActiveWorkbook.SaveAs Filename:=tmpFile _
, FileFormat:=xlText, CreateBackup:=False
I have also tried exporting to CSV and then converting to txt, but the same problem with the date happens again.
Any idea how I can solve this?
Thanks
I'm guessing you want to use current date.
Below should do it:
tmpFile = "C:\Users\z864451\Desktop\Prueba\AIMS\AIMS_" & Format(Now, "dd/MM/yyyy")
Actually, I realized the answer was just to change the date format of the cells. When selecting the format there are two similar entries: one is *14/03/2011, which is the one that was causing the problem. Changing it to 14/03/2011 (without the asterisk) solves the whole thing.
Thanks
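If changing the cell format is not an option, the exported text could also be post-processed. The following is only a sketch under the assumption that the dates appear in the file as m/d/yyyy; the function names are hypothetical, and any d/m/yyyy text already in the file would be misread by it.

```python
import re
from datetime import datetime

# Matches dates such as 3/14/2011 or 03/14/2011
US_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def us_to_european(date_text):
    """Convert one m/d/yyyy date string to dd/mm/yyyy."""
    return datetime.strptime(date_text, "%m/%d/%Y").strftime("%d/%m/%Y")


def convert_line(line):
    """Rewrite every m/d/yyyy date found in a line of exported text."""
    return US_DATE.sub(lambda match: us_to_european(match.group(0)), line)
```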

Save as CSV with semicolon separator

I'm currently using this function for saving, but I have a problem with it:
Private Sub spara()
    ActiveWorkbook.SaveAs Filename:="T:\filepath\" & ActiveWorkbook.Name, _
        FileFormat:=xlCSV, CreateBackup:=False
End Sub
It automatically saves with "," as the separator, but I need it to save with ";" in the file. Is it possible to change how it saves?
I've tried googling this issue, but all the macros I found about .csv saving only cover how to save as .csv and how to split multiple sheets into .csv files.
Which language does your Excel use? If your native language uses ";" as default, you can pass the parameter "local:=True"
ActiveWorkbook.SaveAs Filename:="C:\Temp\Fredi.csv", FileFormat:=xlCSV, CreateBackup:=False, local:=True
If not, your only choice is searching and replacing afterwards...
I had been looking this up to help resolve a similar issue.
I had an Excel sheet that I export to a CSV file, which is then uploaded elsewhere and requires semicolons rather than commas as the separator. This worked fine when I was exporting the file manually, as I had already changed the following:
Control Panel >> Region and Language >> Additional settings >> List separator
from comma to semicolon.
But when I tried to automate it via VBA, it defaulted back to comma.
To fix this I added the Local parameter as suggested by Christian Sauer above, which then picks up the fact that I have changed my regional settings.
ActiveWorkbook.SaveAs Filename:="Filename.txt", FileFormat:=xlCSV, CreateBackup:=False, Local:=True
Thanks to Christian for the pointer.
Change Excel Options (Advanced, Editing options) to set Decimal separator to , (obviously (!))
I solved it adding "SaveChanges:=False" when closing workbook
With ActiveWorkbook
    .SaveAs Filename:="T:\filepath\" & ActiveWorkbook.Name, FileFormat:=xlCSV, Local:=True
    .Close SaveChanges:=False
End With
It's quite possible this problem is not solvable using Excel VBA alone. The problem is that, while Excel's Save As... uses the list separator defined by the machine locale, Excel VBA always uses the en-US locale and therefore always uses , as the list separator.
I would recommend saving a CSV and then using a custom console app or script for post-processing. There are plenty of CSV parsers available that can read a ,-separated CSV and then save it as a ;-separated CSV.
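Such a post-processing script could be sketched with Python's standard csv module as follows. The function name comma_to_semicolon is hypothetical, and the sketch works on text in memory; reading and writing real files would use open(path, newline="") instead.

```python
import csv
import io


def comma_to_semicolon(text):
    """Re-write comma-separated CSV text with ';' as the delimiter."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    # The writer quotes only fields that need it for the new delimiter
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()
```

Parsing and re-writing, rather than a plain search and replace, keeps commas that sit inside quoted fields intact.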
For people having issues with this code
Dim wa As Workbook
Workbooks.OpenText FileName:=path2file, _
    DataType:=xlDelimited, Semicolon:=True, Local:=True
Set wa = ActiveWorkbook
wa.SaveAs FileName:=path2file, FileFormat:=xlCSV, _
    ConflictResolution:=xlLocalSessionChanges, Local:=True
wa.Close False
The last line is really important: without wa.Close False, Excel asks for approval to save, and with wa.Close True it replaces the ";" with ",".
After going into the local settings and making ";" the list delimiter, VBA was still separating with a ",". I modified the code to the above and it worked.
Hope this sheds some light for someone.
Great answers here, but they didn't work for me, because I'm trying to create a tool that will work for any user on their own computer, and I don't want to have to set up this locale stuff on everybody's PC. I even tried rolling my own CSV exporter using FileSystemObject, but the destination path name is on SharePoint, so that failed too.
Then I stumbled upon a very simple workaround: create a new worksheet that concatenates all your information into one column, separated by semicolons, e.g.:
CONCAT('Old WS'!A1,";",'Old WS'!B1,";",'Old WS'!C1,";",'Old WS'!D1,";",'Old WS'!E1)
Then add some code like this to export it:
Call Worksheets("Export WS").Copy
ActiveWorkbook.SaveAs Filename:=CSVname, FileFormat:=xlCSV, CreateBackup:=False
ActiveWorkbook.Close
Because it's all in one column, Excel won't add its own delimiters!
Expanding on James Fingas's answer.
SOLUTION:
Concatenate all your information from multiple columns into one column and separate the data in the concatenated string by your separator of choice - i.e. the semicolon. Then copy the one column to a new file and save it.
LIMITATION:
However, beware that this solution is flawed when a comma is present somewhere in the concatenated data, because the concatenated string then gets wrapped in double quotes. See the explanation below.
(1) When you save the file, VBA assumes that the comma is the delimiter for the file. VBA also sees that a comma is present in your string. Thus, in order to protect against future splitting of the string, VBA wraps the concatenated string in double quotes.
(2) You can try to bypass this problem by setting the system separator to semicolon and saving the file with the "Local:=True" parameter. However, now VBA knows that the semicolon is the delimiter for the file. VBA also sees that semicolons are present in your string. Thus, in order to protect against future splitting of the string (VBA doesn't know that you actually want to split on the semicolons), VBA wraps the concatenated string in double quotes.
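The same quoting rule can be observed with Python's csv writer, which by default also quotes any field containing the active delimiter. This is only an analogy for the behavior described above, not Excel's own code, and the helper name write_single_column is hypothetical.

```python
import csv
import io


def write_single_column(value, delimiter):
    """Write one single-field row and return the resulting line."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow([value])
    return out.getvalue()


# No comma in the data and a comma delimiter: the field is left bare
print(write_single_column("a;b", ","), end="")    # a;b
# A comma in the data and a comma delimiter: the field gets quoted
print(write_single_column("a;b,c", ","), end="")  # "a;b,c"
# A semicolon delimiter: the semicolons themselves trigger quoting
print(write_single_column("a;b", ";"), end="")    # "a;b"
```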

In vba How to change extension when saving file, don't know exact filename

My issue is this.
I read in a series of filenames using a wildcard, so the end of the filename is unknown and the extension is either .xls or .xlsx. The wildcard is something like:
beginningOfFilename_*.xls*
After I have manipulated each file, I want to save it with the same name, but as a .csv (comma-separated values file). In VBA for Excel, can I just specify the format and let it take care of the extension, or do I have to somehow strip off the (unknown) extension and append .csv?
If the second case is necessary, how would you approach this problem? I don't know where to start, since part of the filename is unknown and I am not sure how to manipulate strings in VBA.
I'm a VBA beginner.
Any help will be appreciated, thanks.
The line you want is :
Mid(sFile, 1, InStrRev(sFile, ".")) & "csv"
Where sFile is the file name with any extension.
Split(sFile, ".")(0) & ".csv"
where sFile is the filename
To get the path and name of your file without the extension, use:
Dim StrFileName As String
StrFileName = Split(ThisWorkbook.FullName, ".xls")(0)
Now save your CSV using the content of StrFileName.
Regards
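For comparison, the same idea (match the wildcard, drop whatever extension is present, append .csv) can be sketched in Python. The function names are hypothetical, and the pattern is the one from the question.

```python
from fnmatch import fnmatchcase
from pathlib import PurePath


def csv_name(filename):
    """Replace the last extension (.xls, .xlsx, ...) with .csv."""
    return str(PurePath(filename).with_suffix(".csv"))


def matching_csv_names(filenames, pattern="beginningOfFilename_*.xls*"):
    """Return the .csv names for every filename that matches the wildcard."""
    return [csv_name(name) for name in filenames if fnmatchcase(name, pattern)]
```

Because only the last suffix is replaced, a name with extra dots such as report.v2.xls becomes report.v2.csv, which differs from splitting on the first dot.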

Save custom data in Excel workbook

Is it possible to save a large amount of data (about 1-2 MB) in an Excel workbook?
Ideally, this data should be tied with a worksheet in the workbook.
CustomProperties are unlikely to support large data.
My data can be presented in following forms: binary, xml, string.
Thanks...
Yes, you could store strings and XML in Excel cells. For binary data you'd be better off not saving it inside Excel, but if you had to, then OLE (Object Linking and Embedding) could be an option. You could do so by saving the binary as a file outside of Excel and then inserting it as an OLE object:
Dim mySht As Worksheet
Dim oleFile As String
oleFile = "MyXmlDoc.xml"
Set mySht = ActiveWorkbook.ActiveSheet
mySht.Shapes.AddOLEObject Filename:=Environ$("Appdata") & _
    "\MyFolder\" & oleFile, _
    Link:=False, DisplayAsIcon:=True
It worked fine for us for certain common file types, but we never did it for raw binary data. Usually we put a Word document or PDF in the spreadsheet. You also run the risk of corrupting the workbook or losing the binary data. In our case the OLE object would be clicked on by a user who had WordPerfect instead of Word, or who ran Linux or Mac, and the embedded document wouldn't open.
It also makes the Excel file rather large with every embedded object you add. It's an old technology with its quirks.
You could add a VBA module to the workbook for your data and encode your data in normal ASCII strings (for example, using Base64 encoding). Resulting code would look like this:
Dim x(1000) As String
Sub InitData()
x(0) = "abcdefghijk...."
x(1) = "123456789......"
'...'
End Sub
You can also store these strings in a sheet instead of a VBA module, line-by-line, if you prefer this.
To accomplish the encoding / decoding, look here:
How do I base64 encode a string efficiently using Excel VBA?
Base64 Encode String in VBScript
http://www.source-code.biz/snippets/vbasic/12.htm
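To show the shape of the data this approach produces, here is a small Python sketch of the encoding step: binary data is Base64-encoded and cut into fixed-width ASCII strings, one per array element or sheet row. The function names and the width of 76 characters are assumptions for the example; the VBA side would still need one of the encoders linked above.

```python
import base64


def encode_chunks(data, width=76):
    """Base64-encode bytes and split the result into fixed-width strings."""
    text = base64.b64encode(data).decode("ascii")
    return [text[i:i + width] for i in range(0, len(text), width)]


def decode_chunks(chunks):
    """Join the stored strings and decode them back to the original bytes."""
    return base64.b64decode("".join(chunks))
```

Base64 grows the data by roughly one third, so 1-2 MB of binary becomes about 1.3-2.7 MB of text.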
If your information comes in files, you can read the file in binary mode and put it in a cell. Then in the end or in the beginning you would save your filename so you could easily reconstruct the file.
Here is some code for you to begin with:
Sub readBin()
    Dim iFreeFile As Integer, bTemporary As Byte
    Dim lRow As Long
    iFreeFile = FreeFile
    Open "\temp.bin" For Binary Access Read As #iFreeFile
    ' Loop until every byte of the file has been read
    Do While Loc(iFreeFile) < LOF(iFreeFile)
        Get #iFreeFile, , bTemporary ' read one byte from the file
        lRow = lRow + 1
        Cells(lRow, 1) = bTemporary ' save each byte in its own row of column A
    Loop
    Close #iFreeFile
End Sub
