Upload Microsoft Excel Workbook with Many Sheets into Azure ML Studio

I want to upload my Excel workbook into Azure Machine Learning Studio because it contains data that I would like to join with my other .csv files to create a training dataset.
When I upload it, the list of file types does not offer .xlsx or .xls, only other extensions such as .csv and .txt.
I uploaded it anyway, and now I am getting weird characters. How can I upload the Excel workbook and get at its sheets, so that I can join the data and do data preparation? Any suggestions?

You could save the workbook as a (set of) CSV file(s) and upload them separately.
A CSV ('Comma Separated Values') file is exactly that: a flat file of values separated by commas. If you upload an Excel file as-is, the result is garbled, because an Excel file holds far more than values separated by commas (an .xlsx file is actually a zipped package of XML files). Have a look at File -> Save As -> Save as type, where you can select 'CSV (comma delimited) (*.csv)'. Note that this saves only the active sheet, so repeat it for each sheet you need.
Disclaimer: no, it's not always a comma...
In addition, the term "CSV" also denotes some closely related delimiter-separated formats that use different field delimiters. These include tab-separated values and space-separated values. A delimiter that is not present in the field data (such as tab) keeps the format parsing simple. These alternate delimiter-separated files are often even given a .csv extension despite the use of a non-comma field separator.
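Because the delimiter varies, it can help to check what a file actually uses before uploading it. A minimal sketch using Python's standard `csv.Sniffer`, assuming the only candidates are comma, semicolon, tab, and pipe (that candidate list is an assumption, not something Azure ML prescribes; the function name is hypothetical):

```python
import csv


def detect_delimiter(sample: str) -> str:
    """Guess the field delimiter of a delimiter-separated text sample."""
    # Restrict the guess to common candidates so letters or dots in the data
    # are never mistaken for a separator.
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    return dialect.delimiter


# Pass the first few kilobytes of the file as the sample.
print(repr(detect_delimiter("id\tname\tprice\n1\tWidget\t9.99\n2\tGadget\t4.50\n")))
```

`sniff` raises `csv.Error` when it cannot settle on a delimiter, so a file with inconsistent rows will fail loudly rather than return a wrong guess.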
Edit
So apparently Excel files are supported: Supported data sources for Azure Machine Learning data preparation
Excel (.xls/.xlsx)
Read an Excel file one sheet at a time by specifying sheet name or number.
But also, only UTF-8 is supported: Import Data - Technical notes
Azure Machine Learning requires UTF-8 encoding. If the data you are importing uses a different encoding, or was exported from a data source that uses a different default encoding, various problems might appear in the text.
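If weird characters persist after converting to CSV, an encoding mismatch is a likely cause. A minimal sketch that re-encodes a CSV file to UTF-8 with Python's standard library, assuming the source is Windows-1252 (`cp1252`), which is often the default for Excel's CSV export on Western-European Windows systems; the function name and the assumed source encoding are illustrative, so check your file's real encoding first:

```python
def reencode_to_utf8(src, dst, source_encoding: str = "cp1252") -> None:
    """Rewrite a text/CSV file as UTF-8 for a UTF-8-only importer.

    source_encoding is an assumption; pass the encoding the file really uses.
    """
    # newline="" disables newline translation, so \r\n line endings are kept as-is.
    with open(src, encoding=source_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)
```

Decoding with the wrong `source_encoding` either raises `UnicodeDecodeError` or silently produces the same kind of garbled characters, so it is worth spot-checking a few accented values after conversion.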

Related

Can the Excel PowerPivot data model process string lengths of >32,767 characters in a field?

I'm working with a csv file that contains 1 field (out of 10 total fields) with very large strings (50,000+ characters). The other 9 fields contain strings of normal length (<100).
I imported this file into the PowerPivot data model and then copied the data directly from the table view in PowerPivot into Notepad++.
All of the strings in Notepad++ were cut off at 32,767 characters, which suggests that PowerPivot has the same limitation as standard Excel in this respect.
Is there something I can do in PowerPivot to enable a field to hold more than 32,767 characters, or am I going to have to find another solution?
FYI, the objective is to extract this long string (a Base64-encoded JPEG) from the CSV and save it as a separate text file, which would then be converted back to a JPEG with a PowerShell script I've already developed.
The remainder of the data in the original csv would be saved as a table and combined with some other data from a few other sources to create one table to upload into our Salsify PIM.
I've asked the provider of this csv if it's possible to export the very long strings as individual text files with names that I could relate back to the original dataset (which would solve my problem instantly), but there is resistance. They are insisting on putting everything in one csv.
Note that I do have some experience in Python (and of course PowerShell) and am open to learning tools like PowerAutomate or any other tool that you'd recommend for something like this.
edit: Note that the jpeg files I'm working with range in size from 10KB all the way up to ~16MB, so the base64 string can get very long (in the range of 3.5M characters).
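Since the asker knows Python, one option is to skip PowerPivot for the extraction step altogether. Base64 produces 4 characters for every 3 input bytes, so a 16 MB JPEG becomes roughly 21 to 22 million characters, well past both Excel's 32,767-character cell limit and the 131,072-character default of Python's `csv.field_size_limit`. A minimal sketch, assuming the file is UTF-8, has a header row, and uses the hypothetical column names `id` (a unique key) and `image_b64` (the Base64 string); it decodes each image straight to a .jpg, which would replace the PowerShell step:

```python
import base64
import csv
from pathlib import Path

# Python's csv module rejects fields longer than 131,072 characters by default.
# Raise the limit so multi-megabyte Base64 strings can be read. 2**31 - 1 is used
# instead of sys.maxsize, which can overflow a C long on Windows.
csv.field_size_limit(2**31 - 1)


def split_images(src: Path, out_dir: Path,
                 key_col: str = "id", image_col: str = "image_b64") -> Path:
    """Decode each Base64 image to <out_dir>/<key>.jpg and write the other columns to <out_dir>/data.csv.

    key_col and image_col are hypothetical column names; replace them with the real headers.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    table_path = out_dir / "data.csv"
    with open(src, newline="", encoding="utf-8") as fin, \
            open(table_path, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        other_cols = [c for c in reader.fieldnames if c != image_col]
        writer = csv.DictWriter(fout, fieldnames=other_cols)
        writer.writeheader()
        for row in reader:
            # Remove the long string so only the small columns go into data.csv.
            b64 = row.pop(image_col)
            # The key column names the image file, linking it back to its row.
            (out_dir / f"{row[key_col]}.jpg").write_bytes(base64.b64decode(b64))
            writer.writerow(row)
    return table_path
```

For example, `split_images(Path("export.csv"), Path("out"))` would leave one .jpg per row plus a slim `out/data.csv` that can be combined with the other sources. Note that `csv.field_size_limit` is a process-wide setting.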
You will have to split each image into multiple rows of 30,000 characters and concatenate them back together in DAX. Images up to about 2.1 MB should be supported this way.
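The splitting side of this approach can be prepared before the data reaches PowerPivot. A minimal sketch, with hypothetical function names, that cuts one string into `(key, part_number, chunk)` rows of at most 30,000 characters, plus the inverse that a DAX measure would have to mirror by concatenating the chunks in `part_number` order:

```python
def chunk_rows(key: str, text: str, size: int = 30_000) -> list[tuple[str, int, str]]:
    """Split one long string into (key, part_number, chunk) rows of at most `size` characters."""
    return [(key, i // size, text[i:i + size]) for i in range(0, len(text), size)]


def reassemble(rows: list[tuple[str, int, str]]) -> str:
    """Inverse of chunk_rows: order the parts by part_number and concatenate them."""
    return "".join(chunk for _, _, chunk in sorted(rows, key=lambda r: r[1]))
```

Keeping an explicit part number matters because table rows have no guaranteed order once loaded into the data model.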

Generating a CSV file

I am having problems converting an MS Access table that contains a 12-digit barcode field to a CSV file.
The barcode field is defined as text!
I tried exporting to Excel and then saving the Excel file as CSV, and I also tried exporting directly to CSV, but neither worked (even when the field is defined as text).
The problem is that some barcodes start with zero, which gets stripped, and the values are displayed in scientific notation instead of as the barcode string.
My question is: how can I generate a CSV file that keeps these values intact when it is opened as an Excel spreadsheet?
Any help is appreciated.
Dory
Nick McDermaid, thanks for your comment. In a text editor everything looks perfect. You mean the people requesting it on my website actually use it as a text file and do not care how it looks in a spreadsheet? If so, then I am just chasing a wild goose! Is that what you mean?
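Dory's reply points at the key distinction: a CSV file is plain text, so leading zeros survive in the file itself and are only lost when Excel parses the column as numbers. A minimal Python sketch (hypothetical function names) that writes barcodes as strings and reads them back unchanged:

```python
import csv
import io


def write_barcodes(barcodes: list[str]) -> str:
    """Write barcodes to CSV text, keeping them as strings (never converting to int)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["barcode"])
    for code in barcodes:
        writer.writerow([code])
    return buf.getvalue()


def read_barcodes(csv_text: str) -> list[str]:
    """Read the barcode column back as strings, skipping the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return [row[0] for row in rows[1:]]
```

If the round trip preserves the values, the file is correct and any remaining problem is in how the consumer (here, Excel) interprets it.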

Downloading CSV data into Excel from a Browser

So I have a PHP script that creates tab-separated CSV output.
I have a button in my HTML that works like so:
Export Data
Ideally I want the user to open this CSV file in Excel.
The issue I have here is with tab separated CSVs, the file extension, and how Excel handles all of this. For example:
download="export.csv"
Results in the browser asking me to open this in Excel (wanted behaviour), but once in Excel the columns are not split, because the values are tab-separated rather than comma-separated, which is what Excel expects.
download="export.xls"
Results in the browser asking me to open this in Excel (again, wanted behaviour), but then Excel complains that the file extension and the contents do not match and shows the user a warning. If the user clicks past the warning, the data displays as expected, but I could do without the warning.
download="export.txt"
Results in the browser downloading the file as a text file. Once imported into Excel, the columns are respected, but I would prefer it to be treated as an Excel file, the way CSV files are.
download="export.tsv"
Results in the browser downloading the file, but as this extension isn't recognized, it has to be imported into Excel manually, which isn't what I am after. In fact, even though .tsv is the most correct file extension for tab-separated values, the .txt extension seems to work more smoothly.
I am unable to set file associations on the end users' machines, and I would like to avoid going down the "export your data as an actual XLSX file" route if at all possible. I would prefer tab-separated CSVs over comma-separated CSVs because the exported data naturally contains lots of commas.
EDIT:
As Ron Rosenfeld suggested, I tried outputting a comma-separated CSV file with quotes around the data, and the file loads into Excel with the columns preserved. However, the quotes appear around every value in every column that uses them.
Is it possible to not have the quotes appear?
Ideally I would prefer to have the content tab separated, but at this stage anything that allows me to open a CSV file from a browser into Excel would be great.
I want a way to download a tab separated CSV file from a browser to Excel with as little fuss as possible. How can this be achieved?
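On the quoting question: under RFC 4180, only fields containing the delimiter, a double quote, or a line break need to be enclosed in double quotes, an embedded double quote is doubled, and a compliant reader strips the enclosing quotes. A minimal sketch with Python's `csv` module (`QUOTE_MINIMAL`) showing what such output looks like; it is written in Python rather than PHP purely for illustration, and the function name is hypothetical:

```python
import csv
import io


def to_csv_line(fields: list[str], delimiter: str = ",") -> str:
    """Serialize one row, quoting only fields that contain the delimiter, a quote, or a line break."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL).writerow(fields)
    return buf.getvalue()


print(to_csv_line(["Smith", "London, UK", 'say "hi"']))
```

Comparing the PHP output byte for byte with this shape is one way to check whether the quotes that show up in Excel come from the generated file.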
The difference between CSV and TSV files is that, as long as the creator followed the conventions, a CSV file has comma-separated values and a TSV file has tab-separated values.
For TXT files, there is no formatting specified.
CSV files are comma-delimited, so you have to use this:
sep=,
And TSV files are tab-delimited, so you have to use this:
sep=\t
If you have MS Excel installed on your computer, CSV files are closely associated with Excel.
Look at this post to see what happens when you use sep=; with UTF-8 and UTF-16LE files.
It's very important to output UTF-8 and UTF-16LE CSV files from PHP correctly.
That post should be informative and useful for you.
CSV means "comma separated values", so the default separator is a ,.
To change that separator to a tab, put
sep=\t
as the first line of your .csv file (yes, you can still name it .csv). That tells Excel what the delimiter character is.
Note that if you open the .csv in a plain text editor, the line should read
sep= (an actual tabulator character here, it's just not visible...)
This line is not part of the CSV specification (RFC 4180), so whether it works in software other than Excel depends on that software's implementation.
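A minimal sketch of the file layout described above, written in Python rather than PHP for illustration: a `sep=` line with a real tab, followed by tab-separated rows. The function name is hypothetical, and the text still has to be encoded (for example as UTF-8, with the caveats from the previous answer) before it is sent to the browser as export.csv:

```python
import csv
import io


def excel_tsv(rows: list[list[str]]) -> str:
    """Build tab-separated text with Excel's non-standard "sep=" hint as the first line."""
    buf = io.StringIO()
    # Excel-specific hint; other CSV readers will treat this line as a data row.
    buf.write("sep=\t\r\n")
    writer = csv.writer(buf, delimiter="\t")
    writer.writerows(rows)
    return buf.getvalue()
```

Because the delimiter is a tab, values such as "London, UK" need no quoting, which addresses the concern about commas in the exported data.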
I have done this before. It was a painful experience which I'd rather not relive, but since you asked (and offered a bounty):
Make sure your HTTP headers read: Content-Type: application/x-www-form-urlencoded
Make ; your separator
Don't enclose by " (This is a magic I have yet to understand).
Fingers crossed

CSV Exporting: Preserving leading zeros

I'm working on a .NET application which exports CSV files to open in Excel and I'm having a problem with preserving leading zeros when the file is opened in Excel. I've used the method mentioned at http://creativyst.com/Doc/Articles/CSV/CSV01.htm#CSVAndExcel
This works great until the user decides to save the CSV file within Excel. If the file is opened again in Excel then the leading zeros are lost.
Is there anything I can do when generating the CSV file to prevent this from happening?
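One widely used trick for making Excel keep leading zeros on open is to write such fields as formulas of the form ="01202123456", which Excel evaluates to a text value; whether this is the exact method at the linked article is an assumption. Because a CSV saved from Excel stores the resulting value rather than the formula, the zeros are lost on the next open, which is consistent with the behaviour described above. A minimal sketch with hypothetical function names, restricted to digits-only values so no further CSV quoting is needed:

```python
import re


def excel_text_field(value: str) -> str:
    """Wrap a digits-only value as ="<value>" so Excel evaluates it as text on open."""
    if re.fullmatch(r"[0-9]+", value) is None:
        raise ValueError(f"expected digits only, got {value!r}")
    return f'="{value}"'


def excel_csv_line(values: list[str]) -> str:
    """Join digits-only fields into one CSV line; digits need no extra quoting."""
    return ",".join(excel_text_field(v) for v in values) + "\r\n"
```

Other programs that read the file will see the literal ="..." text, so this only suits files meant for Excel.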
This is not a CSV issue.
This is Excel loving to play with CSV files.
Change the extension to something else.
As @GSerg mentions, this is not a CSV issue.
If your users must edit/save in Excel they need to select the entire worksheet, right-click and choose "Format Cells" and from the Category list select "Text" after opening the csv file. This will preserve the leading zeros since the numbers will be treated as simple text.
Alternatively, you could use Open XML SDK 2.0, or some other Excel library, to create an xlsx file from your CSV data and programmatically set the cell type to Text, taking the end users out of the equation.
I found a nice workaround: if you add a space anywhere in the phone number, the cell is no longer treated as a number, and both Excel and Apple's iWork Numbers treat it as a text cell.
It's the only solution I've found so far that plays nice with Numbers.
Yes I realise the number then has a space, but this is easy to process out of large chunks of data, you just have to select a column and remove all spaces.
Also, if this is web-related, most web-based things are fine with users entering a space in the number field, e.g. you can still tap-to-call on mobiles.
The challenge is to get the space in there in the first place.
In use:
01202123456 displays as 1202123456
but
01202 123456 displays as 01202 123456
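A minimal sketch of the space workaround and its reversal, assuming the space goes after the first five digits as in the example above (the split position and the function names are hypothetical):

```python
def add_display_space(number: str, split_at: int = 5) -> str:
    """Insert one space so spreadsheet apps treat the value as text (e.g. '01202 123456')."""
    return number[:split_at] + " " + number[split_at:]


def strip_spaces(value: str) -> str:
    """Remove all spaces again when processing the data later."""
    return value.replace(" ", "")
```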
Ok, new discovery.
Using Quick Preview on Mac to view a CSV file the telephone column will display perfectly, but opening the file fully with Numbers or Excel will ruin that column.
On some level Mac OS X is capable of handling that column correctly with no user meddling.
I am now working on the best/easiest way to make a website output a universally accepted CSV with telephone numbers preserved.
But maybe with that info someone else has an idea on how to make Numbers handle the file in the same way that Quick Preview does?

Totaling figures in .csv files using Excel

I have 12 .csv files produced by another program. The .csv files contain numeric data, separated by commas.
I need an easy way of totaling the values in certain columns in each of the files and comparing the totals across the various files e.g. compare the total from file 1 to the total from file 5.
The format of each file is the same, i.e. 5 values in each record, separated by commas. Each of the 12 .csv files is about 50 MB in size, and each file has a different number of records.
The environment I work in is 'secure', and I can't run any programs other than what is already installed on the PC I use. I have Excel installed and assume I can write VBA code/macros, and I have access to the command line. I can't (for example) load anything from a USB key and cannot install any scripting language, e.g. Python.
I have thought of doing this manually, e.g. opening each .csv file in Excel and totaling the columns using Excel functions, i.e. SUM().
My challenge is that I need to do this many times over the next few weeks as new versions of the .csv files are produced: I now have the first version, and there will be many versions of the 12 files as I conduct testing on the other system. For each new version I need to sum the data and compare across files.
Lastly, I can't change the system that produces the .csv files (e.g. to create a set of totals).
I'm looking for a programming solution that works within my limited resources (I can't use any tools other than what is already on the PC).
You should be able to do this easily with an Excel VBA macro, but it might take quite some time to load and convert a 50 MB CSV file.
JScript (Microsoft's dialect of JavaScript) is generally available on Windows machines and runs under Windows Script Host. Just create a file with a .js extension and try running it with a double-click. Alternatively, you can use VBScript with a .vbs extension.
I think your easiest solution would be to write an Excel macro (you will have the Excel VBA IDE, limited as it is).
PowerShell or a batch script? A CSV is nothing more than a text file split by commas, so it should be fairly easy to knock something up.
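The line-by-line approach suggested here is simple because each file holds only five plain numeric values per record. Python is not available in the asker's environment, so the sketch below only illustrates the logic, which maps directly onto a PowerShell or VBA loop. It assumes no header row, no quoted fields, and UTF-8 text; columns are 0-based, the function names are hypothetical, and `Decimal` is used to avoid binary floating-point rounding when totals are compared across files:

```python
from decimal import Decimal
from pathlib import Path


def column_totals(path: Path, columns: list[int]) -> dict[int, Decimal]:
    """Stream a comma-separated file line by line and sum the given 0-based columns."""
    totals = {c: Decimal(0) for c in columns}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\r\n").split(",")
            for c in columns:
                totals[c] += Decimal(fields[c].strip())
    return totals


def compare_files(paths: list[Path], columns: list[int]) -> dict[str, dict[int, Decimal]]:
    """Compute per-file column totals so, e.g., file 1 and file 5 can be compared."""
    return {p.name: column_totals(p, columns) for p in paths}
```

Streaming one line at a time keeps memory use flat regardless of file size, which matters more than speed for 50 MB inputs.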
ADO can work on CSV files and you could then use SQL statements to sum the appropriate values - see this MSDN article for full details.
If you open the Visual Basic Editor in Excel and add a reference via the Tools menu, you should see several entries for Microsoft ActiveX Data Objects (2.8 being the most recent one). Adding that reference lets you use ADO.
