Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 6 years ago.
The community reviewed whether to reopen this question 8 months ago and left it closed:
Original close reason(s) were not resolved
I need to write a script that determines whether a given document is in .doc format.
I am using an Amazon Linux machine. I tried the Linux file command.
For a given .doc file, the file command outputs the following:
sample_file.doc: Composite Document File V2 Document, No summary info
I found that the file command reports the same file type for Excel 2003 files (.xls).
I want to know which file types (such as .doc and .xls) are reported as Composite Document File V2 Document, and how I can check whether a given file is a .doc file on an Amazon Linux 2012 machine.
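It is Microsoft's Compound File Binary format (also known as OLE2 or structured storage). It is a container shared by the legacy Office formats (.doc, .xls, .ppt) and by Outlook .msg files, which is why the file command can report the same generic type for all of them. I used the guide here to convert my files without issues.
Essentially, you can use the unoconv tool to convert the files to a friendlier format.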
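To answer the second half of the question, one rough way to tell these files apart is to look for the stream names each Office application writes inside the container (Word 97-2003 writes a WordDocument stream, Excel 97-2003 writes a Workbook stream). The Python sketch below is a heuristic, not a real CFB parser: it checks the 8-byte signature and then simply searches the raw bytes for those names encoded as UTF-16LE. The STREAM_HINTS table, its ordering and the function names are hypothetical, and files with embedded objects (for example a spreadsheet embedded in a Word document) can be misclassified.

```python
import sys

# Signature at the start of every Compound File Binary (OLE2) file.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Hypothetical mapping of characteristic stream names to a label, checked in
# this order. Not an exhaustive list; .msg is checked first because a message
# can carry an attached .doc or .xls inside it.
STREAM_HINTS = [
    ("__substg1.0_", "msg"),         # Outlook message property streams
    ("WordDocument", "doc"),         # Word 97-2003
    ("PowerPoint Document", "ppt"),  # PowerPoint 97-2003
    ("Workbook", "xls"),             # Excel 97-2003
    ("Book", "xls"),                 # Excel 5.0/95
]


def guess_cfb_kind(data: bytes) -> str:
    """Heuristically classify a CFB file by the stream names it contains."""
    if not data.startswith(CFB_MAGIC):
        return "not-cfb"
    for name, kind in STREAM_HINTS:
        # Directory entries store names as UTF-16LE, so search for that encoding.
        if name.encode("utf-16-le") in data:
            return kind
    return "cfb-unknown"


def main(paths):
    for path in paths:
        with open(path, "rb") as fh:
            print(f"{path}: {guess_cfb_kind(fh.read())}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as, say, python3 cfb_kind.py sample_file.doc (the script name is arbitrary). For a robust answer, a real CFB parser that lists the streams (for example the third-party olefile package) is a better choice than searching raw bytes.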
Related
Closed 8 months ago.
I have downloaded the following file on my Linux computer:
wget https://github.com/tomwhite/hadoop-book/blob/master/input/ncdc/all/1901.gz
I tried to decompress the file with gunzip 1901.gz, but it did not work. I checked the file format with the 'file' command and it says:
1901.gz: HTML document, UTF-8 Unicode text, with very long lines
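I am quite new to Linux. How can I successfully extract the data so I can use it?
You have downloaded a regular HTML file, not a gzip archive. The /blob/ URL points to the GitHub web page that displays the file, so wget saved that HTML page under the name 1901.gz. Naming a file something.gz does not turn it into a compressed file, so there is no point trying to unzip it. Download the raw file instead, for example by replacing /blob/ with /raw/ in the URL: wget https://github.com/tomwhite/hadoop-book/raw/master/input/ncdc/all/1901.gz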
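The two checks in this answer can be scripted. The Python sketch below is illustrative only: looks_like_gzip tests for the two-byte gzip signature (1f 8b) before you try to decompress, and github_blob_to_raw rewrites a github.com .../blob/... URL into the .../raw/... form. Both function names are hypothetical, and the URL rewrite only handles the plain github.com/owner/repo/blob/ref/path layout.

```python
import gzip
import sys
from urllib.parse import urlsplit, urlunsplit

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def looks_like_gzip(data: bytes) -> bool:
    """True if the bytes start with the gzip signature."""
    return data[:2] == GZIP_MAGIC


def github_blob_to_raw(url: str) -> str:
    """Rewrite github.com/<owner>/<repo>/blob/<ref>/<path> to the /raw/ form."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # For a blob URL, segments looks like ['', owner, repo, 'blob', ref, ...].
    if parts.netloc == "github.com" and len(segments) > 4 and segments[3] == "blob":
        segments[3] = "raw"
        return urlunsplit(parts._replace(path="/".join(segments)))
    return url  # leave anything else untouched


if __name__ == "__main__":
    path = sys.argv[1]
    with open(path, "rb") as fh:
        head = fh.read(2)
    if looks_like_gzip(head):
        # Show the first few hundred decompressed bytes.
        with gzip.open(path, "rb") as fh:
            print(fh.read(200))
    else:
        print(f"{path} is not gzip data; re-download the raw file")
```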
Closed 4 years ago.
I have a user who was working on an Excel 2007 file from a thumb drive. All of a sudden the file will not open, and Excel shows the following error:
"Excel cannot open the file 'filename.xlsx' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file. (OK)"
I pressed Ctrl+Shift+I to get the code for that error (101590).
Any ideas on how to repair it?
I have tried the following to no avail:
Open and Repair tool
Opening with OpenOffice
http://office.microsoft.com/en-us/excel-help/repairing-a-corrupted-workbook-HA010097017.aspx
http://support.microsoft.com/kb/928979
First, try renaming the file. In Windows Explorer, go to Tools > Folder Options > File Types and check whether the .xlsx extension is registered. If it is not, rename the file from .xlsx to .xls.
Second, look at this TechNet thread: https://social.technet.microsoft.com/Forums/en-US/4994c2f4-ce6e-467d-a06c-d9ab7d67b706/i-have-a-case-that-the-client-work-on-an-exists-excel-file?forum=excel
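Before renaming anything, it can help to see what the bytes on the thumb drive actually are, since the error message says the format may not match the extension. An .xlsx file is a ZIP package containing [Content_Types].xml, while a legacy .xls (or a password-protected .xlsx) is a CFB container. The Python sketch below is a hypothetical diagnostic, not an Excel repair tool: it only reports which of those cases the file looks like and whether the ZIP members pass their CRC checks. The function name and the returned labels are made up for illustration.

```python
import io
import sys
import zipfile
import zlib

ZIP_MAGIC = b"PK\x03\x04"                      # .xlsx is a ZIP package
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls or encrypted .xlsx


def diagnose_workbook(data: bytes) -> str:
    """Return a rough label describing what the bytes of a workbook look like."""
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                bad_member = zf.testzip()  # CRC-checks every member
                names = set(zf.namelist())
        except (zipfile.BadZipFile, zlib.error, EOFError):
            return "zip-damaged"           # e.g. truncated central directory
        if bad_member is not None:
            return f"zip-bad-member:{bad_member}"
        if "[Content_Types].xml" not in names:
            return "zip-but-not-ooxml"
        return "ooxml-package-ok"
    if data.startswith(CFB_MAGIC):
        return "cfb-container"             # extension .xls may fit better
    if not data.strip(b"\x00"):
        return "empty-or-all-zero"         # the contents were lost
    return "unknown-format"


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as fh:
        print(diagnose_workbook(fh.read()))
```

A result like empty-or-all-zero usually means there is nothing left to repair in the file itself, whereas zip-bad-member points at a partially damaged package that recovery tools may still be able to read.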
Closed 7 years ago.
I am preparing a .doc file using LibreOffice on Ubuntu.
The name of the file is
WebApplicationRequirements.doc
When I save it to a drive, an extra hidden file appears.
The name is
.~lock.WebApplicationRequirements.doc#
When I push to a remote repository, that hidden file is included. If I delete that file, will it harm the original file? And why is this happening?
As the name suggests, that hidden file is a lock file used internally by LibreOffice to prevent multiple LibreOffice instances from writing to the same file at the same time. It is generally not harmful to delete it; it should be re-created the next time you open the document in LibreOffice.
You haven't said which remote repository system you are using, but note that most version control systems (e.g. git) support ignore files, which let you configure which files to leave out of commits. If yours does, you probably want to add a rule that ignores the lock files so that they are not committed or pushed.
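For git, a rule such as .~lock.*# in .gitignore matches these lock files. The Python sketch below, with hypothetical function names, checks a file name against that pattern and appends the rule to the contents of a .gitignore only if it is not already listed. Python's fnmatch is used here as an approximation: it is close to, but not identical to, gitignore's own pattern rules.

```python
import fnmatch
from pathlib import Path

# LibreOffice names its lock file ".~lock.<document name>#" next to the document.
LOCK_PATTERN = ".~lock.*#"


def is_libreoffice_lock(filename: str) -> bool:
    """True if a file name looks like a LibreOffice lock file."""
    return fnmatch.fnmatchcase(filename, LOCK_PATTERN)


def with_ignore_rule(gitignore_text: str, rule: str = LOCK_PATTERN) -> str:
    """Return .gitignore contents with `rule` appended if it is not already there."""
    if rule in gitignore_text.splitlines():
        return gitignore_text
    if gitignore_text and not gitignore_text.endswith("\n"):
        gitignore_text += "\n"  # start the new rule on its own line
    return gitignore_text + rule + "\n"


if __name__ == "__main__":
    path = Path(".gitignore")
    old = path.read_text() if path.exists() else ""
    new = with_ignore_rule(old)
    if new != old:
        path.write_text(new)
```

If a lock file has already been committed, the ignore rule alone does not remove it from the repository; it also has to be dropped from the index, for example with git rm --cached.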
Closed 3 years ago.
I'm interested in saving a pcap that has network layer name resolution. While it works great within Wireshark, how can I save the capture with the resolved names intact? Having this information would be extremely helpful and would save me a lot of time, if it is possible. I understand from the documentation that the names can't be saved within the pcap file (http://www.wireshark.org/docs/wsug_html_chunked/ChAdvNameResolutionSection.html#idp390072124), but is there an alternative way to do so? Does anyone have a solution to this?
Thanks in advance!
I haven't tried it myself, but in theory the name resolution information can be stored in the pcapng file format, which has been Wireshark's default file format since version 1.8. The old pcap format you cite cannot hold it, but pcapng defines a specific block type, the Name Resolution Block, for IP-to-name resolution information.
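If your capture was saved as pcapng and Wireshark did write a Name Resolution Block into it, the names can be read back out without Wireshark. The Python sketch below is a minimal reader, assuming a well-formed file: it walks the blocks, takes the byte order from each Section Header Block, and decodes the IPv4 (type 1) and IPv6 (type 2) records of every Name Resolution Block (type 4). It ignores NRB options and every other block type, and the function names are hypothetical.

```python
import ipaddress
import struct
import sys

SHB_TYPE_BYTES = b"\x0a\x0d\x0d\x0a"       # Section Header Block (same in both byte orders)
NRB_TYPE = 0x00000004                      # Name Resolution Block
LE_BYTE_ORDER_MAGIC = b"\x4d\x3c\x2b\x1a"  # 0x1A2B3C4D written little-endian


def _nrb_records(body: bytes, endian: str):
    """Yield (address, names) from the record list at the start of an NRB body."""
    off = 0
    while off + 4 <= len(body):
        rtype, rlen = struct.unpack_from(endian + "HH", body, off)
        off += 4
        if rtype == 0:                        # nrb_record_end
            break
        value = body[off:off + rlen]
        off += (rlen + 3) & ~3                # values are padded to 32 bits
        if rtype == 1 and len(value) >= 4:    # nrb_record_ipv4
            addr, raw = ipaddress.IPv4Address(value[:4]), value[4:]
        elif rtype == 2 and len(value) >= 16: # nrb_record_ipv6
            addr, raw = ipaddress.IPv6Address(value[:16]), value[16:]
        else:
            continue                          # unknown or malformed record: skip it
        # One address may carry several zero-terminated names.
        names = [n.decode("utf-8", "replace") for n in raw.split(b"\x00") if n]
        yield str(addr), names


def read_name_resolution(data: bytes):
    """Yield (address, names) pairs from every NRB in a pcapng capture."""
    endian = "<"
    pos = 0
    while pos + 12 <= len(data):
        if data[pos:pos + 4] == SHB_TYPE_BYTES:
            # Each section declares its own byte order via the magic number.
            endian = "<" if data[pos + 8:pos + 12] == LE_BYTE_ORDER_MAGIC else ">"
        btype, blen = struct.unpack_from(endian + "II", data, pos)
        if blen < 12 or pos + blen > len(data):
            raise ValueError(f"corrupt or truncated block at offset {pos}")
        if btype == NRB_TYPE:
            yield from _nrb_records(data[pos + 8:pos + blen - 4], endian)
        pos += blen


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as fh:
        for address, names in read_name_resolution(fh.read()):
            print(address, " ".join(names))
```

An empty result would suggest that no Name Resolution Block was written for that capture, in which case the names really were not saved.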
Closed 9 years ago.
I have a large dump of data from an Outlook email account that consists entirely of .msg files. A quick check with Ubuntu's file command revealed that they were Composite Document File V2 Documents (whatever that means). I would really like to be able to read these files as plain text. Is that possible at all?
Update: It turned out not to be entirely possible to do what I wanted (large-scale data mining on these files), which was a bummer. In case you face the same issue, I made a library to address it: https://github.com/Slater-Victoroff/msgReader
The documentation isn't great, but it's a pretty small library, so it should be self-explanatory.
I faced the same problem this morning. I didn't find any information on the file format, but I was able to extract the required information from the files using strings and grep:
strings -e l *.msg | grep pattern
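The -e l option (that's a lowercase L) tells strings to look for 16-bit little-endian (UTF-16LE) characters, which is how the .msg files store their text.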
This will only work if you can grep the data you need from the file (i.e. all required lines contain a standard string or pattern).
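If strings is not available, or you want to do this from a script, the same idea can be approximated in Python. The sketch below (function names hypothetical) yields runs of at least four printable ASCII characters encoded as UTF-16LE, which is roughly what strings -e l prints, and then filters them with a regular expression the way grep would. Like the shell version, it misses text outside the ASCII range and does not understand the .msg structure.

```python
import re
import sys


def utf16le_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII encoded as UTF-16LE (roughly `strings -e l`)."""
    # Each character is one byte in 0x20..0x7E (or a tab) followed by a zero byte.
    pattern = re.compile(rb"(?:[\x20-\x7e\t]\x00){%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("utf-16-le")


def grep_msg_files(regex: str, paths):
    """Print every UTF-16LE string in `paths` that matches `regex`."""
    wanted = re.compile(regex)
    for path in paths:
        with open(path, "rb") as fh:
            for text in utf16le_strings(fh.read()):
                if wanted.search(text):
                    print(f"{path}: {text}")


if __name__ == "__main__":
    # Usage: python3 msg_grep.py PATTERN file1.msg file2.msg ...
    grep_msg_files(sys.argv[1], sys.argv[2:])
```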