What is the accepted file extension to use for pipe delimited files? - file-extension

I am parsing a file of data whose fields are separated by pipes. The records are separated by newlines. This is similar to a CSV file or even a TSV file (http://www.cs.tut.fi/~jkorpela/TSV.html), but I wonder what the accepted file extension for pipe delimited data is?
I do not see one specifically mentioned in http://en.wikipedia.org/wiki/Delimiter-separated_values and I have not found mention of one in the 5 or so StackOverflow questions I referenced.
The WP article suggests to me that, by way of "extension" from comma to CSV and tab to TSV, the extension should be PSV. Not everyone calls a pipe a pipe though.
Maybe there is a popular software package that uses pipe delimited data and has an extension for it, thereby setting the de facto standard?

I had this same question because I wanted to follow the standard if there was any. The obvious choice here is .psv considering the naming system of .csv and .tsv - however, I couldn't find this in use anywhere.
The most common extension that I could find associated with a pipe-delimited file is simply .txt. Exports from census.gov and most other government entities use .txt for pipe-delimited files.
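Whatever extension is chosen, a parser only needs to know the delimiter. As a minimal sketch, assuming Python's standard csv module and made-up sample data (the function name read_pipe_delimited is hypothetical), pipe-delimited records could be read like this:

```python
import csv
import io


def read_pipe_delimited(stream):
    """Return a list of rows (lists of strings) from a pipe-delimited text stream."""
    # csv.reader accepts any single-character delimiter, so the file
    # extension (.txt, .psv, ...) makes no difference to the parser.
    return list(csv.reader(stream, delimiter="|"))


# Hypothetical sample data standing in for a real export.
sample = io.StringIO("id|name|state\n1|Alice|OH\n2|Bob|TX\n")
rows = read_pipe_delimited(sample)
print(rows)
```

For a file on disk, the same function can be given a handle opened with open(path, newline="").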

Related

Pentaho - CSV Input not understanding special character [Windows to Linux]

I have a transformation on Pentaho Data Integration where the first thing I do is I use the "CSV Input" to map my flat file.
I've never had a problem with it on Windows, but now I'm moving Spoon to a Linux server, and I'm having problems with special characters.
The first thing I noticed was that my tables were being updated, because the system was treating the names as different strings from the ones in my database.
Checking for the problem, I also noticed that if I go to my "CSV Input" -> Preview, it will show me the preview of my data with the problem above:
Special characters are not showing.
Where it should be:
Diretoria de Suporte à Decisão e Aplicação
I used a command to check my file's charset/encoding and it showed:
$ file -bi foo.csv
text/plain; charset=iso-8859-1
If I open foo.csv in vi, the special characters display correctly.
Any idea on what could be the problem or what should I try?
I don't have any data files with this encoding, so you'll have to do some experimenting, but there are some steps designed to deal with these issues.
First, the CSV Input step has a field that allows you to select the encoding of the source file. The Text File Input step has both a "Format" (meaning line terminator) and "Encoding" selector under the "Content" tab.
In Transforms, you have the Change file encoding step under the Utility tab. This step is designed to copy many files while changing their encoding; that's why it's in a transform.
In Jobs, there's the Convert file between Windows and Unix step under the File Management tab, but this appears to only deal with line terminators.
Either way, if the CSV/Text file input steps don't suit your needs, you'll have to copy the file to a new encoding before reading it in. It will probably be easiest to try the encoding setting on the file input steps first.
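If the file does have to be copied to a new encoding outside Pentaho, the conversion itself is small. The following is only a sketch, assuming the source really is ISO-8859-1 (as the file command reported) and that UTF-8 is the desired target; the function name reencode and the file name foo_utf8.csv are hypothetical:

```python
import os
import tempfile


def reencode(src_path, dst_path, src_encoding="iso-8859-1", dst_encoding="utf-8"):
    """Copy a text file, decoding with src_encoding and encoding with dst_encoding."""
    # newline="" leaves the original line terminators untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)


# Demo on a throwaway file containing the string from the question.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "foo.csv")
dst = os.path.join(workdir, "foo_utf8.csv")
text = "Diretoria de Suporte à Decisão e Aplicação\n"
with open(src, "wb") as f:
    f.write(text.encode("iso-8859-1"))

reencode(src, dst)

with open(dst, "rb") as f:
    converted = f.read()
print(converted.decode("utf-8"))
```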

Is there any difference between having a comma at the end of a line or not in a CSV file?

csvFile1.csv
abc,def
csvFile2.csv
abc,def,
Is there any difference between them?
Usually this causes no problems, except that a parser may read the trailing comma as an extra, empty field at the end of the record. Besides that, I believe there is no other difference.
EDIT: It also depends where you will be using your CSV file. I do know some places where these CSV files must have a specific format but as I said this depends on the use of the file itself.
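To see the extra empty field concretely, here is a small sketch using Python's standard csv module (other parsers may behave differently; the helper name parse is hypothetical):

```python
import csv
import io


def parse(text):
    """Parse CSV text into a list of rows."""
    return list(csv.reader(io.StringIO(text)))


without_trailing = parse("abc,def\n")
with_trailing = parse("abc,def,\n")

print(without_trailing)  # [['abc', 'def']]
print(with_trailing)     # [['abc', 'def', '']]
```

With this parser, the second file yields three fields per record, the last one being an empty string.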

How to determine file encoding type with Excel VBA

I have built an Excel/VBA tool to validate CSV files and ensure the data they contain is valid. The CSV files can originate from anywhere (from a full-blown Unix system or a desktop user saving data out of Excel). The Excel tool is sent out to businesses so they can validate their CSV files in their own environment, without the risk of their data leaving their systems. Thus, the solution needs to be in native VBA and must not link to external libraries.
So using VBA, I need to be able to automatically detect UTF-8 (with or without BOM) or ANSI file encodings and warn the user if these are not the file encodings used for the csv.
I think this would involve reading a few bytes from the start of the file and determining the encoding based on the existence of the byte order mark.
Could you help me get started on the right track?
Assuming you are free to ask the user to choose the correct file type, you can make them responsible for what they choose ;)
That means you can create a form where users choose the filename and the encoding type, as in a file-open wizard.
Otherwise,
I suggest using the FileSystemObject. It returns a TextStream, which can be used to determine the encoding. I doubt VBA supports other types of encoding; please correct me if it does, I'd be happy to hear. :)
how to detect encoding type
msdn object library model
Here is a link for further considerations:-
change encode type
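The byte-level check described in the question can be sketched as follows. This is written in Python purely to illustrate the logic and would need porting to VBA (for example, by reading the file in binary mode into a byte array); the function name detect_encoding and the category labels are hypothetical. Note that a BOM check only needs the first three bytes, but recognising UTF-8 without a BOM means validating the whole content, and pure ASCII data is valid under both UTF-8 and ANSI code pages:

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Classify raw bytes as 'utf-8-bom', 'ascii', 'utf-8', or 'ansi-or-other'."""
    if data.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        return "utf-8-bom"
    try:
        data.decode("ascii")
        return "ascii"  # valid as both UTF-8 and ANSI
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"  # well-formed multi-byte sequences, no BOM
    except UnicodeDecodeError:
        # Bytes >= 0x80 that are not valid UTF-8: likely an ANSI code page
        # such as Windows-1252, but this check cannot tell which one.
        return "ansi-or-other"


print(detect_encoding(b"\xef\xbb\xbfa,b\n"))
print(detect_encoding("Decisão".encode("utf-8")))
print(detect_encoding("Decisão".encode("cp1252")))
```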

.dat file how to create one based on excel document

I have a .csv file in my MATLAB folder with 38 columns and about 48 thousand entries. I was hoping to use the findcluster GUI, but it only accepts .dat files.
How do I create a .dat file in MATLAB, or specifically, how do I convert the .csv file into a .dat file that can be used by the MATLAB fcm clustering tool?
example of csv:
how would I go about creating a data file for this kind of information?
The only documentation I could find about the file format was
The data set must have the extension .dat. For example, to load the data set,
clusterdemo.dat, type findcluster('clusterdemo.dat').
I checked clusterdemo.dat and found that the data is stored in ASCII format. Therefore, try
a = csvread('data.csv');          % read the numeric CSV into matrix a
save('data.dat', 'a', '-ascii');  % write a to data.dat as plain ASCII text
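If MATLAB is not at hand for the conversion, the same idea (numeric rows written out as whitespace-separated ASCII) can be sketched in Python. This assumes the CSV holds only numeric values and has no header row; the function name csv_to_dat and the number format are choices made for this illustration, not something the MATLAB documentation prescribes:

```python
import csv
import io


def csv_to_dat(csv_stream, dat_stream):
    """Write each numeric CSV row as one line of space-separated values."""
    for row in csv.reader(csv_stream):
        if not row:
            continue  # skip blank lines
        values = [float(cell) for cell in row]  # raises ValueError on non-numeric cells
        dat_stream.write(" ".join(f"{v:.7e}" for v in values) + "\n")


# In-memory demo; real use would pass open('data.csv', newline='') and open('data.dat', 'w').
source = io.StringIO("1,2.5\n3,4\n")
target = io.StringIO()
csv_to_dat(source, target)
print(target.getvalue())
```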
Just rename xxx.csv to xxx.dat. This worked for me.
You should try changing the extension. To do so, go to the folder settings and, in the View options where hidden files are shown, uncheck "hide extensions for known file types". You can then change the extension of any file by renaming it.
Because
There really isn't such a thing as a 'dat' format; a 'dat' file is just a text file, and it could theoretically have any extension you want. It could also be delimited however you want or need; it all depends on what you are trying to achieve,
i.e. what are you going to use this file for?
If it's for use with another application then the requirements of that application will probably dictate how it's delimited/structured etc.
Or you can simply save the file from Excel as .csv and then change the extension afterwards.
It worked for me.

What is the correct MIME type for a CSV File created by Excel with semicolon as separator

I work in Switzerland, where the semicolon ; is the "official" list separator in Windows' regional settings.
Excel uses this separator when creating CSV files.
In RFC 4180 the CSV file is defined as comma separated and has the mime type of text/csv associated. My file does not conform to this definition.
I'm using application/vnd.ms-excel instead, but I'm not satisfied by declaring it an "Excel" file, since it's an application independent semicolon-separated file.
What would the correct MIME type be?
Thanks!
AFAIK there is no official mime-type for a semicolon-separated file. This is not surprising - the mime type also does not specify the character-encoding, for example.
You could simply use text/plain (since the file is after all simply a text file), but I presume that you want to use a particular mime type because you want the browser/OS to open it in an "appropriate" application. (This format is only really suitable for processing by an application of some sort).
I would think that in 90% of cases that appropriate application will be Excel. For the few users who don't have Excel but do have an Excel-a-like, you may find that the Excel-a-like application registers itself as able to consume Excel files, and so it may all just work as you wanted?
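Since the MIME type does not tell a consumer which separator was used, a program receiving such a file may have to work it out from the content. One possible approach, sketched here with Python's standard csv.Sniffer on made-up sample data (the function name and the candidate list are assumptions of this example), is to guess the delimiter before parsing:

```python
import csv
import io


def read_with_detected_delimiter(text, candidates=";,\t|"):
    """Guess the delimiter among candidates, then parse the text with it."""
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, rows


# Hypothetical semicolon-separated export, as Excel would produce it
# under Swiss regional settings.
sample = "name;city;amount\nAnna;Bern;10\nMarc;Basel;20\nLea;Genf;30\n"
delimiter, rows = read_with_detected_delimiter(sample)
print(repr(delimiter), rows[1])
```

Sniffing is a heuristic and can fail on small or irregular samples, so an explicit, documented delimiter remains the more robust choice when the producer and consumer can agree on one.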

Resources