I'm working on a word cloud and was thinking of a way to open whatever file the user chooses as input. For example, take a docx file as input, extract all the text in it, and build the word cloud from that. Or use a csv file or a txt file, something like that.
I've tried some libraries like pandas, but for each type of document I ended up with several lines of code plus 'if' statements to test the file type.
Maybe:
But I'm not very sure about that.
I think you need a different library for each extension, because you can't simply skip the headers and then read the content.
At least not with just some variant of "with open("path") as a"
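One way to keep the per-format code from turning into a chain of 'if' tests is a lookup table from extension to reader function. The following is a minimal sketch using only the standard library, not a definitive implementation. The names READERS, read_txt, read_csv, read_docx and extract_text are hypothetical, the files are assumed to be UTF-8, and the docx reader only pulls plain paragraph text out of word/document.xml (no headers, footers or tables).

```python
import csv
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside a .docx archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_txt(path):
    # Assumption: the text file is UTF-8 encoded.
    return Path(path).read_text(encoding="utf-8")


def read_csv(path):
    # Join every cell with spaces; assumes all columns are wanted.
    with open(path, newline="", encoding="utf-8") as fh:
        return " ".join(cell for row in csv.reader(fh) for cell in row)


def read_docx(path):
    # A .docx file is a zip archive; the body text lives in word/document.xml.
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)


# Hypothetical dispatch table: one reader per supported extension.
READERS = {".txt": read_txt, ".csv": read_csv, ".docx": read_docx}


def extract_text(path):
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix!r}") from None
    return reader(path)
```

Supporting another format then means writing one more reader and adding one entry to READERS; the string returned by extract_text can be passed to whatever builds the word cloud.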
I have done quite a bit of searching before posting this question so let me outline what I am trying to do.
1.) I do not want to use applications that I have to download from a website, or custom-built commands (please, no "start Xls2Csv.exe, here's a link to a website where you can download the program"). I do not want to download a program to do this.
2.) I want to keep it in the batch file if possible. I have tried vbc/vbs/vb files, and that is not what I am looking for.
3.) I found this, and it is close to what I need, but if I can stay within a batch file that would be best: Can a Batch File Tell a program to save a file as? (If so how)
Background
I have a bunch of test records stored in excel sheets within folders. Each test record has autoformatted name so the only real difference between any of the filenames is a serial number, otherwise each file name is formatted the exact same way.
I have written a batch file to search and find the files I need but I am stuck on obtaining a tiny bit of information in a .xls file.
What I am trying to do - I have Excel files (.xls), and there is a word in a cell on one of many sheets that I would like to copy into a text file. However, I am unable to use findstr to search an Excel file, because the command searches the file as if you had opened it in Notepad, and the data I need is not present there.
I am not concerned about data loss as long as I can get this tiny bit of information into a text file.
Otherwise what I have found to be the best solution is to convert an XLS to a CSV. I have manually done it by opening the file and saving as type .csv that worked.
What hasn't worked is:
example1.xls >> example2.csv
ren example1.xls example3.csv - this gives the file a .csv extension, but it still opens with the same formatting as the xls file in both Excel and Notepad.
I was hoping that there was a command to recreate the manual process of opening the file and saving it as csv.
If there are any other suggested solutions - maybe a command where I can search for a string within an excel file? That would be the simplest option.
I'm trying to write a context-sensitive text parser in python. To do that, I need to be able to open ppt files and extract both the text and information about how it was formatted. I need to be able to tell if a sentence was in a header or if it's bolded, for instance.
It's supposed to run on large batches of files, so manually converting all the ppts to pptxs isn't practical.
I tried tika, but it doesn't give formatting information.
I tried python-pptx, but it doesn't seem like it can open ppts.
And I'm hoping to make the parser OS agnostic, so the command-line converters I've seen proposed on other variants of this question won't work for me, unless they'll somehow work on linux, mac, and windows.
I am trying to change a number in an Excel file (and eventually in multiple Excel files, by putting it in a loop). I want to edit the file and save it as a new file, which I have done successfully. The problem is that the new file I save loses all of the formatting that the old Excel file had. Correct me if I'm wrong, but I can't use openpyxl because it only works with .xlsx files (all of the files I'm working with are .xls).
I've looked into pandas but was unsuccessful in finding a solution. I'm most familiar with xlrd and xlwt, but am willing to try any other libraries if it solves the problem.
import xlrd
from xlutils.copy import copy
# Open the source workbook and read the manufacturer cell
loc = r"X:\Projects\test.xls"
wb = xlrd.open_workbook(loc)
dose = wb.sheet_by_index(13)
manu = dose.cell_value(4, 3)
# Write 675 into a writable copy of the workbook
w = copy(xlrd.open_workbook(loc))
if manu == "Hologic":
    w.get_sheet(13).write(5, 3, 675)
w.save('book2.xls')
Again, the code works without any errors. But the new .xls file has no formatting. The formatting is crucial for this project, so I can't lose any of it.
You probably can't.
Microsoft created xlsx files for a reason: the classic xls format is a legacy binary file piling up hundreds, maybe thousands, of features, each represented in differing ways (and the file format was not even openly documented back then; I don't know if it is now). So there is one app that can open an xls file and guarantee to present what is there with all the features intended by the file creator: Excel. And the same Excel version that created the file, at that.
So any open library that can write xls will create the most basic files, with no formatting, and you are lucky if it can parse out the content parts.
xlsx files on the other hand use conforming xml files internally, and even a program that does not care to know about the full specs can change information in the file and preserve formatting and other things simply by not touching anything it does not know about, and assembling a valid xml again.
That said, if you can't convert to xlsx, maybe the easier thing to do is use Python to drive Excel itself to make the changes for you, in an automated way.
The documentation for that is scarce and scattered, but it is possible by using pywin32 and the "COM" API - take a look here for a start: https://pbpython.com/windows-com.html
Another option is using LibreOffice - it can read and write xls files with formatting (though surely with losses), and it is scriptable in Python. Unfortunately, information on how to script LibreOffice with Python to do that is also hard to find, and the legacy option of using their "UNO" bridge to interact with Python makes it complicated to use.
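A lighter-weight route than UNO scripting is LibreOffice's headless command-line converter, which can be called from Python to turn the xls file into xlsx before editing. This is a minimal sketch of that different technique, not the UNO approach mentioned above. It assumes LibreOffice is installed with soffice (or libreoffice) on the PATH; the function names build_convert_command and convert_with_libreoffice are hypothetical, and how much formatting survives the conversion has to be checked on your own files.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src, out_dir, target="xlsx", soffice="soffice"):
    # Command line for LibreOffice's headless document converter.
    return [soffice, "--headless", "--convert-to", target,
            "--outdir", str(out_dir), str(src)]


def convert_with_libreoffice(src, out_dir, target="xlsx"):
    # Assumption: the LibreOffice launcher is reachable on PATH.
    exe = shutil.which("soffice") or shutil.which("libreoffice")
    if exe is None:
        raise RuntimeError("LibreOffice executable not found on PATH")
    subprocess.run(build_convert_command(src, out_dir, target, exe), check=True)
    # LibreOffice keeps the base name and swaps the extension.
    return Path(out_dir) / (Path(src).stem + "." + target)
```

For example, convert_with_libreoffice(r"X:\Projects\test.xls", "converted") would be expected to leave converted/test.xlsx, which a library that handles .xlsx could then open and modify.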
I'm making a file monitor for a folder where I download subtitles. So far, it works like this:
Look for new .rar files in the folder.
If found, extract the subtitles and delete the .rar file
If a single .srt file was extracted, save the file name to a variable.
Now, I'm clueless about how to achieve the next (and final) part of the script:
I want to find a pattern based on the way subtitles are named.
Let's say, the subtitles file can be something like this:
SomeShow.1x03.stuff.srt
some_show s01e03-stuff.srt
some show 1-03 stuff.srt
etc.
I want to get something like: SomeShow 1 3 and based on that, start the video with the name that matches that pattern, which I guess would be a matter of reversing the process that was used to get the Show, season and episode based on the name of the .srt file.
Is this possible at all? It'd be really simple stuff in most languages, but I really need this to be a .bat and I'm clueless about how to approach this... so far all I've managed to do is to remove the extension from the variable.
Thanks in advance.
Batch files are Turing complete - you can do anything in them, but it is usually not wise to go to extremes. You might be able to package sed, grep, or your own binary alongside your .bat file as a good compromise between batchiness and function. If you can assume a suitable operating system, you will have PowerShell installed and can go that route.
You should recognize that the task is not exactly defined, and that the "solution" may need some tweaking and may never be robust enough.
For this reason, the richer the language you pick, the further you will get.
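To illustrate what a richer language buys here, this is a minimal Python sketch (not a batch solution) that normalizes the three example names from the question to the same show/season/episode key and uses that key to pick a matching video. The pattern only covers the sXXeYY, NxYY and N-YY styles shown in the question; parse_episode and find_video are hypothetical names, and real-world file names will likely need more cases.

```python
import re

EPISODE_RE = re.compile(
    r"""^(?P<show>.+?)                          # show name, as short as possible
        [\s._-]+                                # separator before the numbering
        (?:s(?P<s1>\d{1,2})e(?P<e1>\d{1,2})     # s01e03
          |(?P<s2>\d{1,2})[x-](?P<e2>\d{1,2}))  # 1x03 or 1-03
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_episode(filename):
    """Return (show, season, episode), or None if the name is not recognized."""
    m = EPISODE_RE.match(filename)
    if m is None:
        return None
    # Lowercase and drop separators so "SomeShow" and "some_show" compare equal.
    show = re.sub(r"[^a-z0-9]", "", m.group("show").lower())
    season = int(m.group("s1") or m.group("s2"))
    episode = int(m.group("e1") or m.group("e2"))
    return show, season, episode


def find_video(subtitle_name, video_names):
    """Return the first video whose parsed key equals the subtitle's key."""
    key = parse_episode(subtitle_name)
    if key is None:
        return None
    for name in video_names:
        if parse_episode(name) == key:
            return name
    return None
```

All three subtitle names from the question parse to ("someshow", 1, 3), so the reverse lookup is just a comparison of parsed keys rather than a reconstruction of the original file name.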
I have a .csv file in my MATLAB folder with 38 columns and about 48 thousand entries. I was hoping to use the findcluster GUI, but it only accepts .dat files.
How do I create a .dat file in MATLAB, or more specifically, how do I convert the .csv file into a .dat file that can be used by the MATLAB fcm clustering tool?
example of csv:
how would I go about creating a data file for this kind of information?
The only documentation I could find about the file format was
The data set must have the extension .dat. For example, to load the data set,
clusterdemo.dat, type findcluster('clusterdemo.dat').
I checked clusterdemo.dat and found that the data is stored in ASCII format. Therefore, try
a = csvread('data.csv');
save 'data.dat' a -ASCII
Just rename xxx.csv to xxx.dat. This worked for me.
You should try changing the extension. To do that, go to the folder settings and, in the View tab where hidden files are shown, uncheck "Hide extensions for known file types"; you can then change the extension of any file by renaming it.
Because
There really isn't such a thing as a 'dat' format. A 'dat' file is just a text file; it could theoretically have any extension you want. It could also be delimited however you want or need; it all depends on what you are trying to achieve.
i.e. what are you going to use this file for?
If it's for use with another application then the requirements of that application will probably dictate how it's delimited/structured etc.
Or you can simply save the file from Excel as .csv and then change the extension afterwards.
It worked for me.