batch remove linebreaks from within .xlsx cells - excel

I am currently trying to push content from a set of Excel tables into a database. To do that, I first convert the tables to .csv for further processing in Python. However, the tables contain cells with multiple linebreaks, which causes the resulting .csv file to be garbled (newlines that do not contain any separators). Manually removing all linebreaks before exporting to CSV solves the problem, but since there are 100+ files to process, I would like to automate this with some kind of VBA/VBS script (which I have failed miserably to write for several hours).
How can I batch-remove all linebreaks from a set of Excel files?
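Not the VBA/VBS route that was asked for, but since the data is headed into Python anyway, a minimal sketch using openpyxl (assuming it is installed; the input/*.xlsx glob and the output naming are placeholders to adjust) that strips linebreaks from every cell and writes one CSV per sheet could look like this:

import csv
import glob
import openpyxl

for path in glob.glob("input/*.xlsx"):  # placeholder folder - adjust to your files
    wb = openpyxl.load_workbook(path, data_only=True)
    for sheet in wb.worksheets:
        out_name = path[:-5] + "_" + sheet.title + ".csv"
        with open(out_name, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            for row in sheet.iter_rows(values_only=True):
                # replace linebreaks inside each cell with a space
                cleaned = ["" if cell is None else str(cell).replace("\r", " ").replace("\n", " ")
                           for cell in row]
                writer.writerow(cleaned)

The csv module would also quote embedded newlines correctly on its own, but stripping them keeps the downstream processing simple.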

Related

Downloading CSV data into Excel from a Browser

So I have a script in PHP that creates tab separated CSV output.
I have a button in my HTML that works like so:
Export Data
Ideally I want the user to open this CSV file in Excel.
The issue I have here is with tab separated CSVs, the file extension, and how Excel handles all of this. For example:
download="export.csv"
Results in the Browser asking me to open this in Excel (wanted behaviour), but then once in Excel none of the columns are respected as they are tab separated (not comma separated, which Excel is obviously expecting).
download="export.xls"
Results in the Browser asking me to open this in Excel (again, wanted behaviour), but then Excel complains that the file extension and the contents do not match and gives the user a warning. If the user goes past this warning the data displays as expected, but I could do without the warning.
download="export.txt"
Results in the Browser downloading the file as a text file. Once imported into Excel, the columns are respected, but I could do with this being thought of as an Excel file like CSV files are.
download="export.tsv"
Results in the Browser downloading the file, but as this extension isn't recognized, it will need to be imported into Excel manually, which isn't what I am after. In fact, even though TSV is the most correct file extension for tab separated values, the TXT extension seems to work more smoothly.
I am unable to set file associations on the end user's machine, and I would like to avoid going down the "export your data as an actual XLSX file" route if at all possible. I would prefer to use tab separated CSVs over comma separated CSVs because the exported data naturally contains lots of commas.
EDIT:
So, as Ron Rosenfeld suggested, I tried outputting a comma separated CSV file with quotes around the data - and the file loads into Excel with the columns preserved - however the quotes appear around every piece of data in every quoted column.
Is it possible to not have the quotes appear?
Ideally I would prefer to have the content tab separated, but at this stage anything that allows me to open a CSV file from a browser into Excel would be great.
I want a way to download a tab separated CSV file from a browser to Excel with as little fuss as possible. How can this be achieved?
The difference between CSV and TSV files is that - as long as the creator followed some rules - a CSV file will have comma separated values and a TSV file will have tab separated values.
For TXT files, there is no formatting specified.
CSV files are comma-delimited, so you have to use this:
sep=,
And TSV files are tab-delimited, so you have to use this:
sep=\t
If you have MS Excel installed on your computer, CSV files are closely associated with Excel.
Please look at this post to find out what using sep=; leads to for UTF-8 and UTF-16LE files.
It's very important to properly output UTF-8 and UTF-16LE CSV files in PHP, so this post will be informative and useful for you.
CSV means "comma separated values", so the default separator is a ,.
To change that separator to a tab, put
sep=\t
as the first line in your .csv file (yes, you can still name it .csv). That tells Excel what the delimiter character should be.
Note that if you open the .csv in an actual text editor, it should read like
sep= followed by an actual tab character (it's just not visible...)
This feature is not officially defined in the .csv RFC 4180, so whether it works with any software other than Excel depends on that software's implementation.
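The original script is PHP, but as a language-neutral illustration of the file layout being described, here is a small Python sketch (the file name and fields are made up) that writes the sep= hint as the first line and lets the csv module quote only the fields that actually need it:

import csv

rows = [
    ["Name", "City", "Notes"],
    ["Alice", "Paris", "likes commas, lots of them"],
]

with open("export.csv", "w", newline="", encoding="utf-8") as f:
    f.write("sep=\t\n")  # hint for Excel: use tab as the delimiter
    writer = csv.writer(f, delimiter="\t", quoting=csv.QUOTE_MINIMAL)
    # fields are quoted only when they contain the delimiter, a quote character, or a newline
    writer.writerows(rows)

With a tab delimiter, values that merely contain commas are left unquoted, which also avoids the quotes-around-every-value problem mentioned in the edit above.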
I have done this before. A painful experience which I would rather not relive, but since you asked (and bountied):
Make sure your HTTP headers read: Content-Type: application/x-www-form-urlencoded
Make ; your separator
Don't enclose fields in " (this is a magic I have yet to understand)
Fingers crossed

Upload Microsoft Excel Workbook with Many Sheets into Azure ML Studio

I want to upload my Excel Workbook into Azure Machine Learning Studio. The reason is that I have some data I would like to join with my other .csv files to create a training data set.
When I upload my Excel file, the available formats do not include .xlsx or .xls, only other extensions such as .csv, .txt, etc.
This is how it looks,
I uploaded anyway, and now I am getting weird characters. How can I get my Excel workbook uploaded with its sheets, so I can join the data and do data preparation? Any suggestions?
You could save the workbook as a (set of) CSV file(s) and upload them separately.
A CSV file, a 'Comma Separated Values' file, is exactly that: a flat file with values separated by commas. If you load an Excel file it will mess up, since there's way more information in an Excel file than just values separated by commas. Have a look at File -> Save as -> Save as type, where you can select 'CSV (comma delimited) (*.csv)'.
Disclaimer: no, it's not always a comma...
In addition, the term "CSV" also denotes some closely related delimiter-separated formats that use different field delimiters. These include tab-separated values and space-separated values. A delimiter that is not present in the field data (such as tab) keeps the format parsing simple. These alternate delimiter-separated files are often even given a .csv extension despite the use of a non-comma field separator.
Edit
So apparently Excel files are supported: Supported data sources for Azure Machine Learning data preparation
Excel (.xls/.xlsx)
Read an Excel file one sheet at a time by specifying sheet name or number.
But also, only UTF-8 is supported: Import Data - Technical notes
Azure Machine Learning requires UTF-8 encoding. If the data you are importing uses a different encoding, or was exported from a data source that uses a different default encoding, various problems might appear in the text.
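As a rough sketch of the save-as-CSV route (assuming pandas and openpyxl are available locally; the workbook name is a placeholder), each sheet can be exported as its own UTF-8 CSV before uploading:

import pandas as pd

# sheet_name=None reads every sheet into a dict of DataFrames keyed by sheet name
sheets = pd.read_excel("workbook.xlsx", sheet_name=None)

for name, df in sheets.items():
    # one UTF-8 encoded CSV per sheet, ready to upload as a separate dataset
    df.to_csv(name + ".csv", index=False, encoding="utf-8")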

Removing Special Characters in CSV using PostgreSQL

I am trying to remove multiple special characters in a CSV file that I am copying into a created table in PostgreSQL. I have about 4 CSV files like this, with 100,000 rows and 10 columns each. I am getting errors every 50-100 rows and I don't know what all the special characters are, as this is a large data set. Is there any way I can just delete these, or create something in Excel/CSV to delete these? I am afraid that I will be deleting important data.
What would be the best code for this?
Thanks!
Brook
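One common cause of this is an encoding mismatch rather than genuinely bad data, so, purely as a hedged sketch with made-up file names and a guessed source encoding, re-writing the files as clean UTF-8 before running COPY might be enough:

# cp1252 is a guess at the source encoding; errors="replace" swaps anything
# undecodable for the U+FFFD replacement character instead of failing
with open("input.csv", "r", encoding="cp1252", errors="replace") as src, \
     open("clean.csv", "w", encoding="utf-8", newline="") as dst:
    for line in src:
        dst.write(line)

If the offending characters need to be dropped rather than replaced, errors="ignore" does that instead, at the cost of silently losing those bytes.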

CSV to Excel [xlsx] script

In Excel I can open a CSV file using external data sources, and then choose to get data from text. This takes me through a set of steps to import the file. This works great, but I have a need to automate this process, as many of these documents will need to be converted over time.
Is there a way to run a similar process as a script? I'm a complete newbie in this space.
You can run this command in a script:
csv2odf yourdata.csv yourtemplate.xlsx output.xlsx
You would need to get csv2odf and Python and create a template like this:
Insert column titles with the same number of columns as the csv.
Add one sample row of data. You can add formatting if you want.
Save the template as xlsx.
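Since csv2odf already requires Python, an alternative sketch using pandas (assuming pandas and openpyxl are installed; the glob pattern is illustrative) converts each CSV straight to .xlsx without a template:

import glob
import pandas as pd

for path in glob.glob("*.csv"):
    df = pd.read_csv(path)
    # writing .xlsx requires an engine such as openpyxl
    df.to_excel(path[:-4] + ".xlsx", index=False)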

Totaling figures in .csv files using Excel

I have 12 .csv files produced by another program. The .csv files contain numeric data, separated by commas.
I need an easy way of totaling the values in certain columns in each of the files and comparing the totals across the various files e.g. compare the total from file 1 to the total from file 5.
The format of each file is the same, i.e. 5 values in each record, separated by commas. Each of the 12 .csv files is about 50 MB in size. Each file has a different number of records.
The environment I work in is 'secure' and I can't run any programs other than what I have installed on the PC I use. I have Excel installed and assume I can write VBA code/macros, and I have access to the command line. I can't (for example) load anything from a USB key and cannot install any scripting language, e.g. Python.
I have thought of doing this manually, e.g. opening each .csv file in Excel and totaling the columns using Excel functions, i.e. SUM().
My challenge: I need to do this many times over the next few weeks as new versions of the .csv files are produced, i.e. I now have the first version, and there will be many versions of the 12 files produced as I conduct testing on the other system. For each new version I need to sum the data and compare across files.
Last thing to say is, I can't change the system that produces the .csv files, e.g. to create a set of totals.
I'm looking for a programming solution that I can use, given my limited resources (no ability to use any tools other than what is already on the PC).
You should be able to do this easily using an Excel VBA macro, but it might take quite some time if it needs to load and convert a 50 MB CSV file.
JScript (a Microsoft form of JavaScript) is generally available on all machines and runs under the Windows Script Host. Just create a file with a .js extension and try to run it with a double click. Or you can use VBScript with a .vbs extension.
I think your easiest solution would be to write an Excel macro (as you will have the IDE for Excel VBA, limited as it is).
Powershell or a batch script? A CSV is nothing more than a text file split with commas. It should be fairly easy to knock something up.
ADO can work on CSV files, and you could then use SQL statements to sum the appropriate values - see this MSDN article for full details.
If you go to the Visual Basic Editor in Excel and then try to add a reference via the Tools menu, you should see several entries for Microsoft ActiveX Data Objects (2.8 being the most recent). Adding that reference lets you use ADO.