How to determine file encoding type with Excel VBA

I have built an Excel/VBA tool to validate csv files to ensure the data they contain is valid. The csv can originate from anywhere (from a full-blown unix system or from a desktop user saving data out of Excel). The Excel tool is sent out to businesses so they can validate their csv files in their own environment, without taking the risk of their data leaving their systems. Thus, the solution needs to be in native VBA and not link to external libraries.
So, using VBA, I need to be able to automatically detect UTF-8 (with or without BOM) or ANSI file encodings and warn the user if the csv uses anything else.
I think this would perhaps involve reading in a few bytes from the start of the file and determining the encoding based on the existence of the byte order mark.
Could you help me get started on the right track?

Assuming you have the freedom to ask the user to choose the correct file type, you make them responsible for what they choose as a file ;)
That means you can create a form where users choose the filename and the encoding type, much like a file-open wizard does.
Otherwise, I suggest you use the FileSystemObject. It returns a TextStream, which can be used to work out the encoding. I doubt VBA supports other types of encoding; please correct me if it does :) I'd be happy to hear it.
For further reading, see the MSDN documentation on how to detect the encoding type, the FileSystemObject object library model, and how to change the encoding type.
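To get started on the BOM check mentioned in the question, here is a minimal native-VBA sketch, under the assumption that only UTF-8 with a BOM needs to be positively identified; a UTF-8 file without a BOM cannot be told apart from ANSI this way and would need a scan for valid multi-byte sequences:

Function GetEncoding(filePath As String) As String
    ' Read the first 3 bytes and look for the UTF-8 byte order mark (EF BB BF)
    Dim b(0 To 2) As Byte
    Dim f As Integer
    f = FreeFile
    Open filePath For Binary Access Read As #f
    If LOF(f) >= 3 Then Get #f, 1, b
    Close #f

    If b(0) = &HEF And b(1) = &HBB And b(2) = &HBF Then
        GetEncoding = "UTF-8 with BOM"
    ElseIf b(0) = &HFF And b(1) = &HFE Then
        GetEncoding = "UTF-16 LE (not expected for these csv files)"
    Else
        ' No BOM: could be ANSI or UTF-8 without BOM; a full scan of the bytes
        ' for valid UTF-8 multi-byte sequences would be needed to tell them apart.
        GetEncoding = "ANSI or UTF-8 without BOM"
    End If
End Function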

Related

How can I convert a text file to UCS-2 LE, from whatever the default is?

I am looking for a way to convert or save a text file in the UCS-2 LE format; specifically without a BOM, I guess.
I have zero knowledge of what any of that actually means, but I know I need it because of this wiki page on what I am trying to accomplish: https://developer.valvesoftware.com/wiki/Closed_Captions
In other words:
this is for a specific game engine, "Source Engine," which requires the format in order to compile in-game closed captions for sounds.
I have tried saving the file in Notepad++ using the "UCS-2 LE BOM" option under the Encoding menu. There is no option for just "UCS-2 LE", however, and because of this the captions cannot be compiled for the game engine. I need to save without a BOM, I guess (again, I don't really know what I'm talking about; I am assuming, based on logical conclusions, that I need to not have a BOM, whatever that actually means).
I would like to know a way to either save a txt file in that encoding format, or a way to convert one.
In my specific case, it appears that my problem boils down to "the program is weird."
What I mean by this is that Notepad++ actually does save in the correct format, but I failed to realize that because of a quirk in the caption compiler: it only works if you drag the file onto it, not via the command line as I previously thought.
I will accept this as the answer when I am allowed to in 2 days.
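For completeness, if you ever do need to produce UCS-2 LE (UTF-16 LE) without a BOM programmatically rather than from Notepad++, here is a rough VBA sketch using ADODB.Stream; the file paths are placeholders and the source file is assumed to be plain ANSI text:

Sub SaveUcs2LeNoBom(srcPath As String, dstPath As String)
    ' Read the source text (assumed to be plain ANSI)
    Dim txt As String
    Dim f As Integer: f = FreeFile
    Open srcPath For Input As #f
    txt = Input$(LOF(f), #f)
    Close #f

    Dim stmText As Object, stmBin As Object
    Set stmText = CreateObject("ADODB.Stream")
    stmText.Type = 2               ' adTypeText
    stmText.Charset = "unicode"    ' UTF-16 LE, written with an FF FE BOM
    stmText.Open
    stmText.WriteText txt

    ' Re-open the same stream as binary and skip the 2-byte BOM
    stmText.Position = 0
    stmText.Type = 1               ' adTypeBinary
    stmText.Position = 2           ' skip FF FE

    Set stmBin = CreateObject("ADODB.Stream")
    stmBin.Type = 1
    stmBin.Open
    stmText.CopyTo stmBin
    stmBin.SaveToFile dstPath, 2   ' adSaveCreateOverWrite
    stmBin.Close: stmText.Close
End Sub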

CSV to Excel Doc?

I was curious what the community thinks is the easiest way to take a CSV file and 'save as' an Excel document with only a couple of formulas pasted in?
I am trying to do this behind the scenes, not by physically navigating (opening the file, selecting Save As, etc.). Even though doing it by hand is already VERY simple, I need to do this in code (think automation).
Background: I have a C++ command line program generating the .csv, and a C# GUI starting this process. Either program could hold the code, but I figure this is easiest in C# (Interop?). The reasons I don't write the formulas directly into the csv are that the commas they contain would mess up the csv, and that other Excel documents need to reference the sheets, so they need to be in .xls format.
=AVERAGE(C2:C999)
=COUNTIFS(C:C,">0",C:C,"<31")
=COUNTIFS(C:C,">31",C:C,"<55")
=COUNTIF(C:C,">55")
Have a look and see whether command-line scripting of OpenOffice will do the job. It can do quite a lot of conversions very easily. Otherwise, there are a lot of Excel-producing libraries, for example PHPExcel, but you'd need to wrap some programming around them.
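If you end up driving Excel itself instead, whether from a VBA macro or through the equivalent Interop objects in C#, the conversion might look roughly like the sketch below; the target cells (E1:E4), the paths, and the .xls format choice are assumptions:

Sub ConvertCsvToXls(csvPath As String, xlsPath As String)
    Dim wb As Workbook
    Set wb = Workbooks.Open(csvPath)          ' Excel parses the CSV into columns
    With wb.Worksheets(1)
        .Range("E1").Formula = "=AVERAGE(C2:C999)"
        .Range("E2").Formula = "=COUNTIFS(C:C,"">0"",C:C,""<31"")"
        .Range("E3").Formula = "=COUNTIFS(C:C,"">31"",C:C,""<55"")"
        .Range("E4").Formula = "=COUNTIF(C:C,"">55"")"
    End With
    ' 56 = xlExcel8 (.xls); use 51 (xlOpenXMLWorkbook) for .xlsx instead
    wb.SaveAs Filename:=xlsPath, FileFormat:=56
    wb.Close SaveChanges:=False
End Sub

The same calls (Workbooks.Open, Range.Formula, Workbook.SaveAs) map one-to-one onto the Excel Interop objects if you prefer to keep the code in the C# GUI.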

How to extract (import) data from a mainframe dataset to an Excel table

I want to build a little application that calculates the critical batch of a batch flow.
As input I need to use a mainframe dataset. If possible, it should be dynamic, that is, I can choose the fields that apply at the time.
I've searched the internet about that but found nothing that suited what I wanted to do.
Is there a way to do that?
I have a dataset in a mainframe library and I want to ftp that file to Excel.
Convert the file to CSV on the mainframe (for example, via a REXX exec, a z/OS UNIX shell script, or a Lua4z program),
and then insert that CSV file into Excel via FTP.
You do not need to transfer the CSV file to your PC's file system and then, as a separate step, open it in Excel.
Instead, you define the FTP (or HTTP) URL for the CSV as a data source in Excel. One advantage of this technique is that you can refresh the data from that URL
without having to reapply formatting in Excel.
There are various tutorials on the web for doing this.
In brief:
Create a new blank workbook (I'm using Excel 2010).
Select the first cell in the empty worksheet (this step is unnecessary - the cell is already selected - if you've only just created the workbook).
On the Data tab, click From Text
In the File name text box of the Import Text File dialog, enter the FTP URL of the CSV file. For example:
ftp://zos1//u/me/data.csv
(This assumes that your mainframe is configured to allow FTP using this path.)
The two consecutive slash (/) characters following the host name (zos1) indicate that the path refers to a z/OS UNIX file (/u/me/data.csv).
The CSV file must be in a z/OS UNIX path. The FTP client does not accept MVS-style (dsname) paths such as 'me.csv(data)' (even when URL-encoded; that is, with the single quotes escaped as %27); by contrast, cURL accepts such paths just fine.
The CSV file on the mainframe must be ASCII encoded, not EBCDIC. (Here, I'm using the term ASCII imprecisely: the precise character encoding you want depends on your PC's settings. You probably want Windows-1252.) This is because the FTP client sets the default transfer type to binary.
Enter your user name and password (your z/OS TSO user ID and password).
Wait for the data to load.
Format the cells. For example, set the format of any columns containing date/time values.
On the Data tab, click Connections, select the connection (that Excel created when you specified a URL for the file name), and clear the check box Prompt for file name on refresh.
To refresh the data, replacing the current data with the results of a new FTP request: on the Data tab, click Refresh All. The data is replaced; the cell formatting remains intact.
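If you would rather set the import up programmatically than through the ribbon, the same connection can, as far as I know, be created from VBA with a QueryTable; I have not verified the TEXT connector against an FTP URL in every Excel version, so treat this as a sketch:

Sub ImportMainframeCsv()
    Dim qt As QueryTable
    ' "TEXT;" tells Excel to use the text-file import engine for this source
    Set qt = ActiveSheet.QueryTables.Add( _
        Connection:="TEXT;ftp://zos1//u/me/data.csv", _
        Destination:=ActiveSheet.Range("A1"))
    With qt
        .TextFileParseType = xlDelimited
        .TextFileCommaDelimiter = True
        .RefreshOnFileOpen = False
        .Refresh BackgroundQuery:=False   ' should prompt for the FTP credentials
    End With
End Sub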
Converting an EBCDIC-encoded CSV file to ASCII
(Strictly speaking, I mean ISO-8859, not ASCII.)
Suppose you have JCL that generates a CSV file encoded in EBCDIC. You want to make that CSV file available to Excel via FTP as an ASCII-encoded z/OS UNIX (zFS) file.
Replace your existing DD statement for the output CSV file with the following DD statement:
//OUTCSV DD PATH='/u/me/data-ebcdic.csv',
// PATHOPTS=(OWRONLY,OCREAT,OTRUNC),
// PATHDISP=(KEEP,DELETE),
// PATHMODE=(SIRUSR,SIWUSR,SIRGRP),
// FILEDATA=TEXT
Replace the ddname OUTCSV with your ddname, and the zFS file path /u/me/data-ebcdic.csv with the path that you want to use.
Thanks to the FILEDATA=TEXT parameter, the resulting CSV file will have a X'15' byte at the end of each line.
Append the following step to your JCL:
//ICONV EXEC PGM=IKJEFT01
//SYSTSIN DD *
BPXBATCH sh iconv -f IBM-037 -t iso8859-1 +
/u/me/data-ebcdic.csv +
> /u/me/data-ascii.csv
/*
//SYSPRINT DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
In case you're wondering why I'm calling iconv as a shell command via BPXBATCH, the following:
//ICONV EXEC PGM=EDCICONV
// PARM=('FROMCODE(IBM-037),TOCODE(iso8859-1)')
didn't quite work: it left the X'15' bytes as is, whereas running iconv as a shell command correctly converted them to X'0A'. (z/OS 2.2.)
You've got some good information in the comments; the consensus appears to be that conversion to CSV (or TSV, to avoid commas embedded in your data) is the easiest route. Here is a bit more information, copied from another answer...
I would strongly suggest you get the files into a text format before
transferring them to another box with a different code page. Trying to
deal with mixed text (which must have its code page translated) and
binary (which must not have its code page translated but which likely
must be converted from big endian to little endian) is harder than
doing the conversion up front.
The conversion can likely be done via the SORT utility on the
mainframe. Mainframe SORT utilities tend to have extensive data
manipulation functions. There are other mechanisms you could use
(other utilities, custom code written in the language of your choice,
purchased packages) but this is what we tend to do in these
circumstances.
Once you have your flat files converted such that all data is text,
you can transfer them via FTP or SFTP or FTPS.
...and thanks for coming back and adding more information. Hopefully the people here have provided enough information to help you solve your problem.
XML would be another possible text-oriented solution. It would take more effort to create, but you could design your spreadsheet in Excel and save it as an XML document, then write a program to generate the XML text using the data from your mainframe dataset. While this would be more difficult to implement than a simple CSV or TSV file, it has the advantage of supporting the spreadsheet formulas and attributes that a CSV file cannot carry. Another advantage: you can attach the XML document to an SMTP email note and deliver the document in "spreadsheet format" to your client.
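If you go the XML route, an easy way to see the markup Excel expects is to lay out a sample sheet by hand and save it in the XML Spreadsheet 2003 format, for example from VBA (the path is a placeholder):

' 46 = xlXMLSpreadsheet (XML Spreadsheet 2003); open the result in a text
' editor to see the markup your mainframe program would need to generate.
ActiveWorkbook.SaveAs Filename:="C:\Temp\template.xml", FileFormat:=46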

Using a different program from an Office extension

I have a program that can access a database with a whole bunch of articles.
Due to copyright, I can't access the database straight from my program, but I have a different program that can access it, and it's legitimate to copy small bits from the articles.
Because my friends and I quote a lot from these articles, I thought it would be useful if we could find an add-in for Word that will copy the requested part from an article.
Is there any add-in for Word that would let me use the program that I mentioned above so that I can access the database from within Word?
I would like to program this add-in myself, if possible.
Without further information about which operating system and version of Word you are using, I can offer only a general outline.
1) It seems to me that you want to make a Word macro using WordBasic or Visual Basic for Applications.
2) When you want to call your program, which is external to Word, you need to use the Shell command, as outlined on Microsoft's webpage.
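As a very rough sketch, the macro could shell out to the program and pull its output back into the document; the program name, its switches, and the temp-file handshake below are invented purely for illustration:

Sub FetchArticleSnippet()
    ' Hypothetical external tool and arguments - adjust to the real program
    Dim cmd As String
    cmd = """C:\Tools\ArticleDb.exe"" /query ""search terms"" /out ""C:\Temp\snippet.txt"""
    ' VBA's Shell returns immediately; WScript.Shell.Run can wait for completion
    Dim wsh As Object
    Set wsh = CreateObject("WScript.Shell")
    wsh.Run cmd, 0, True          ' 0 = hidden window, True = wait until done
    ' Read the tool's output and insert it at the cursor
    Dim f As Integer: f = FreeFile
    Open "C:\Temp\snippet.txt" For Input As #f
    Selection.InsertAfter Input$(LOF(f), #f)
    Close #f
End Sub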
I hope that helps you get started writing your macro!
CHEERS
Well, it's a workaround, but you can use an automation tool that can run a sequence of actions on a given GUI, like WinRunner or TestQuest, to simulate the usage of the program. I assume these tools can take input from a given XML or text file and log outputs to a log text file.
If you have the output in a text file, you will be able to parse the file using any programming language, get the information you need, and write it to Word or whatever format using OLE objects.

I want to change the way text is represented internally in ANY Text Editor

I want to use an algorithm to reduce the memory used to save a particular text file. I don't really know how text is stored, but I have an idea in mind.
Would it be better to extend an open source text editor (if yes, then which one) or to write a text editor myself?
It would also be nice if someone could give me a link or tutorial covering the basics of how text editors work and the way data is stored.
Edited to add
To clarify, what I wanted to do is, instead of saving duplicate copies of a word, make a hash table and store the address where each word needs to be placed. That way I wouldn't be storing the duplicates.
Update
Thanks everyone, I get what you are all trying to say. Yes, this would have become specific to a particular text editor; I never realized that.
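Just to make the idea concrete, here is a toy VBA sketch of the word-table scheme (splitting on spaces only, which is a big simplification); the saved result would be a word table plus a list of indexes rather than plain text:

Sub BuildWordTable(docText As String)
    ' Toy illustration of the "store each word once" idea:
    ' a dictionary maps each distinct word to an index, and the
    ' document body becomes a list of indexes into that table.
    Dim words() As String, dict As Object, body As Collection
    Dim i As Long
    Set dict = CreateObject("Scripting.Dictionary")
    Set body = New Collection
    words = Split(docText, " ")
    For i = LBound(words) To UBound(words)
        If Not dict.Exists(words(i)) Then dict.Add words(i), dict.Count
        body.Add dict(words(i))          ' store the index, not the word
    Next i
    Debug.Print dict.Count & " distinct words, " & body.Count & " tokens"
End Sub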
I want to use an algorithm to reduce the memory used to save a particular text file
If you did this you would no longer have a text editor, but instead you would have created some sort of binary file editor.
The whole point of the text file format is that it is universal, meaning any text file can be opened in any other text editor.
Emacs handles compression transparently. Just create a text file with .gz extension. Emacs will automatically compress contents of the file during save operation, and decompress when you open the file next time.
Text is basically stored as-is, i.e., every character takes up a byte or two (wide chars), and there is no conversion done on it when it's saved. It might add an end-of-file character or something, though. Don't try coming up with your own algorithm to compress these files. That's why zip files and other archives were created; they're really good at compressing text. If you wanted to add this feature to your text editor, you'd have to add some sort of post-save hook to zip it, and then put a hook on the open command to unzip it. Unless you wanted to do it by hand every time. Don't try writing the text editor yourself from scratch, unless (maybe) you're writing Notepad. Text editors with syntax highlighting aren't very easy to make, even with the proper libraries. I'd say write a plugin for something like Visual Studio or what have you. Or find an open-source text editor.
