Reading csv.gz file in torchtext

Pandas’s read_csv works for csv.gz files as well.
Is there a way to achieve something similar with PyTorch? https://torchtext.readthedocs.io/en/latest/data.html#torchtext.data.Dataset doesn’t seem to have such an option.

TL;DR: No, this is not supported by TabularDataset.
torchtext.data.TabularDataset uses csv.reader.
The question "Using csvreader against a gzipped file in Python" suggests that csv.reader can read the file if you open it with gzip.open.
However, TabularDataset asks for a file path, not a file object. Digging into the source code, it uses
io.open(os.path.expanduser(path), encoding="utf8")
to open the file path. Since a .gz file is compressed binary data rather than UTF-8 text, this won't read the file correctly.
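The gzip.open route mentioned above can be sketched with the standard library alone. This is only an illustration: the file name and sample rows are made up, and the sketch stops at plain Python rows rather than building a torchtext dataset. Since TabularDataset only takes a path, one possible workaround (an untested assumption) is to decompress to a temporary .csv file first and pass that path instead.

```python
import csv
import gzip
import os
import tempfile


def read_csv_gz(path, encoding="utf8"):
    """Return all rows of a gzip-compressed CSV file as lists of strings."""
    # Mode "rt" makes gzip decompress and decode to text, which csv.reader needs.
    with gzip.open(path, mode="rt", encoding=encoding, newline="") as handle:
        return list(csv.reader(handle))


# Build a small sample file (hypothetical data) so the sketch is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")
with gzip.open(sample_path, mode="wt", encoding="utf8", newline="") as handle:
    csv.writer(handle).writerows([["text", "label"], ["hello world", "1"]])

rows = read_csv_gz(sample_path)
print(rows)
```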

Related

Open with Pandas in Python a .xls file that is corrupted

So here is the problem: I'm trying to import a DataFrame from a file downloaded from Cognos. In Cognos I select the .CSV format, but the file is always downloaded as .xls.
It would be easy to open the .xls file and save it as CSV, but the file has more rows than Excel supports, so I would lose a lot of data in the process. Also, when I open the file in Excel, I get a warning that the file could be corrupted.
When I try to open the data with df = pd.read_excel("Time Series 2018-1.xls"), I get the following error:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xff\xfeP\x00r\x00o\x00'
Please help.

You can try the following:
Change the file name (remove the spaces and the dash), then try again.
Follow along with the official pandas documentation.

I already resolved it. I just opened the file in Sublime and saved it with UTF-8 encoding. Then I opened it with df = pd.read_csv("Prueba1.xls", sep = "\t", encoding = 'utf8') because, as @dougp said, it is just a CSV file saved with the .xls extension.
I guess there is a way to change the encoding in Python, but that's for another question.
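The bytes b'\xff\xfe' at the start of the error message are the UTF-16 little-endian byte order mark, so the re-encoding done in Sublime can probably be done in Python as well. Below is a minimal sketch using only the standard library; the file names and sample content are hypothetical stand-ins for the Cognos export.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Rewrite a text file from src_encoding to UTF-8."""
    # The "utf-16" codec reads the byte order mark to pick the byte order.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Hypothetical stand-in for the export: tab-separated text saved as UTF-16.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "export.xls")
dst_path = os.path.join(workdir, "export_utf8.tsv")
with open(src_path, "w", encoding="utf-16", newline="") as handle:
    handle.write("Product\tSales\nA\t10\n")

reencode_to_utf8(src_path, dst_path)
with open(dst_path, "rb") as handle:
    converted = handle.read()
print(converted)
```

pandas.read_csv also takes an encoding argument, so passing encoding="utf-16" together with sep="\t" may make the conversion step unnecessary; that has not been checked against the file from the question.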

How to open a text file from my desktop while using python 3.7.1 in Terminal

I saved a text file named "test.txt" to my desktop; in the file I wrote only my name, David. Then I opened Terminal, started Python 3.7.1, and wrote the following code in an attempt to see my name, David, printed:
open("/Users/David/Desktop/test.txt,"r")
However, I receive the following error message:
SyntaxError: EOL while scanning string literal
Does anyone know how I can avoid this error and have my name, David, read from the test.txt file on my desktop? Or am I going about this completely wrong?

As @Matt explained, you are missing a quotation mark.
You can follow the approach below to open the file and read from it.
myfile = open("/Users/David/Desktop/test.txt","r")  # returns a file handle
myfile.read()  # reads the contents of the file
myfile.close()  # closes the file handle to release its resources
For more information, see the documentation on reading and writing files.

You are missing a quotation mark after your file path. It should look like this:
open("/Users/David/Desktop/test.txt","r")
                                   ^ This quotation mark
This will open the file correctly, however you will still need to actually read from it.

You are missing the closing quotation mark, as the others have mentioned. Try using the with open statement: it manages the file for you, so you don't need to call .close().
with open("/Users/David/Desktop/test.txt", "r") as file:
    file.read()

You can use with, which closes the file automatically when you leave the block. Put your file path in place of Directory_link:
with open(r"Directory_link", "r") as file1:
    FileContent = file1.read()
print(FileContent)

python 3.5 appending .txt file not formatting correctly when opened in notepad

I am trying to append to a text file, writing on a new line each time, for readability in Notepad. I believe this should be simple and I have researched it thoroughly, but I am still having an issue. Here is the snippet of code that writes to a .txt file:
appending_Text = data2
with open(file_Name, 'a+') as file:
    file.write(appending_Text)
    file.write('\n')
When I run this code and then check the text file, I get my appended data on the same line. When I open the .txt file using notepad, I want it to look like:
data1
data2
When I open the .txt file using notepad in windows, it looks like:
data1data2
What am I missing?

I figured out the answer: it's not Python related but rather a limitation of Notepad on Windows. Notepad uses a different line ending than Linux systems do. Linux uses '\n' and Notepad uses '\r\n'.
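If the file has to display correctly in Notepad, one option is to ask Python for Windows line endings explicitly through the newline parameter of open(). The sketch below assumes that approach; the file name and data are placeholders. On Windows, Python's text mode normally translates '\n' to '\r\n' already, so this mainly matters when the file is written on Linux or macOS.

```python
import os
import tempfile


def append_line(path, text):
    """Append text followed by a Windows-style line ending (CR LF)."""
    # With newline="\r\n", every "\n" written is translated to "\r\n".
    with open(path, "a", newline="\r\n") as handle:
        handle.write(text)
        handle.write("\n")


log_path = os.path.join(tempfile.mkdtemp(), "log.txt")
append_line(log_path, "data1")
append_line(log_path, "data2")

# Read back in binary mode to see the exact bytes on disk.
with open(log_path, "rb") as handle:
    raw = handle.read()
print(raw)
```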

.mht file looking for local resources

I'm auto-generating .doc files (.mht really) according to this tutorial.
Generated files work great locally but not on other computers as the file is requesting the header to be loaded from my local path instead of using the base64 version that is embedded. The main content is loading fine. Here's the generated file.
I can't find the reason for this behaviour. Any suggestions appreciated.
EDIT
Turned out that the problem was caused by unnecessary carriage return and line feed characters at the beginning of my base64 strings.

.dat file how to create one based on excel document

I have a .csv file in my MATLAB folder with 38 columns and about 48 thousand entries. I was hoping to use the findcluster GUI, but it only accepts .dat files.
How do I create a .dat file in MATLAB, or more specifically, how do I convert the .csv file into a .dat file that can be used by the MATLAB fcm clustering tool?
example of csv:
how would I go about creating a data file for this kind of information?

The only documentation I could find about the file format was:
"The data set must have the extension .dat. For example, to load the data set, clusterdemo.dat, type findcluster('clusterdemo.dat')."
I checked clusterdemo.dat and found that the data is stored in ASCII format. Therefore, try
a = csvread('data.csv');
save 'data.dat' a -ASCII

Just rename xxx.csv to xxx.dat. This worked for me.

You should try changing the extension. To do so, go to the folder settings and, under View, uncheck the option that hides extensions for known file types; you can then change the extension of any file by renaming it.
This works because there really isn't such a thing as a 'dat' format. A 'dat' file is just a text file; it could theoretically have any extension you want. It could also be delimited however you want or need; it all depends on what you are trying to achieve, i.e. what are you going to use this file for?
If it's for use with another application, then the requirements of that application will probably dictate how it is delimited and structured.
Alternatively, you can simply save the file from Excel as .csv and change the extension afterwards.
It worked for me.
