When I extract spectra from e2ds_A.fits, I find that I need the ESO DRS BLAZE FILE and the ESO DRS FLAT FILE, but I can't download them from ESO. How can I get them? An example is HARPS.2003-12-16T22:28:53.145_blaze_A.fits
To list all the files in a spreadsheet (xlsx) file, I can do:
$ unzip -l /Users/david/Desktop/myspreadsheet.xlsx
698 01-01-1980 00:00 xl/_rels/workbook.xml.rels
1415625191 01-01-1980 00:00 xl/worksheets/sheet1.xml
6798 01-01-1980 00:00 xl/theme/theme1.xml
2315 01-01-1980 00:00 xl/styles.xml
779218 01-01-1980 00:00 xl/sharedStrings.xml
322 01-01-1980 00:00 xl/worksheets/_rels/sheet1.xml.rels
9840 01-01-1980 00:00 xl/printerSettings/printerSettings1.bin
640 01-01-1980 00:00 docProps/core.xml
797 01-01-1980 00:00 docProps/app.xml
From here we can see that the spreadsheet has one sheet -- xl/worksheets/sheet1.xml. Is there a way to only see the zip contents of the xl/worksheets/ folder? For example, doing something like:
$ unzip -l /Users/david/Desktop/myspreadsheet.xlsx xl/worksheets
The most I could find from man was:
-l list archive files (short format). The names, uncompressed file sizes and modification dates and
times of the specified files are printed, along with totals for all files specified. If UnZip was
compiled with OS2_EAS defined, the -l option also lists columns for the sizes of stored OS/2
extended attributes (EAs) and OS/2 access control lists (ACLs). In addition, the zipfile comment
and individual file comments (if any) are displayed. If a file was archived from a single-case
file system (for example, the old MS-DOS FAT file system) and the -L option was given, the file-
name is converted to lowercase and is prefixed with a caret (^).
But it seems like there are tons of other options. Is there a way to do the above?
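As far as I know, unzip accepts file patterns after the archive name, so something like unzip -l myspreadsheet.xlsx 'xl/worksheets/*' should restrict the listing. If you would rather not depend on the unzip options, here is a minimal Python sketch of the same idea using the standard zipfile module. The function name list_members and the prefix argument are my own names for illustration, not part of any tool.

```python
import zipfile


def list_members(archive_path, prefix):
    """Return (name, uncompressed size) for every entry whose name starts with prefix."""
    with zipfile.ZipFile(archive_path) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if info.filename.startswith(prefix)
        ]


# Example call (the path is the one from the question):
# for name, size in list_members("/Users/david/Desktop/myspreadsheet.xlsx", "xl/worksheets/"):
#     print(size, name)
```

An xlsx file is an ordinary zip archive, so no spreadsheet library is needed just to look at the entry names.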
I want to use OpenNLP to tokenize Thai words. I downloaded OpenNLP and the Thai tokenizer model and ran the following:
./bin/opennlp POSTagger -lang th -model thai.tok.bin < sentence.txt > output.txt
I put the thai.tok.bin that I downloaded in the directory I ran the command from. sentence.txt contains the text กินอะไรยังนาย. However, the only output I got was:
Usage: opennlp POSTagger model < sentences
Execution time: 0.000 seconds
I'm pretty new to OpenNLP. Please let me know if anyone knows how to get output from it.
The models from your link are outdated. First you need some manual steps to convert the model.
Download the file thai.tok.bin.gz and extract it to an empty folder. Rename the extracted file thai.tok.bin to token.model.
In the same folder, create a file named manifest.properties with the following contents:
Manifest-Version=1.0
Language=th
OpenNLP-Version=1.5.0
Component-Name=TokenizerME
useAlphaNumericOptimization=false
Now you can zip the files. On Linux you can use this command: zip thai.tok.bin token.model manifest.properties
Try your model:
sh bin/opennlp TokenizerME ~/Downloads/thai-token.bin/thai.tok.bin < thai_sentence.txt
Loading Tokenizer model ... done (0,097s)
กินอะไร ยังนาย
Average: 333,3 sent/s
Total: 1 sent
Runtime: 0.003s
Execution time: 0,108 seconds
Now that you have the updated tokenizer, you can do the same with the POS tagger model.
Download the file thai.tag.bin.gz and extract it to an empty folder. Rename the extracted file thai.tag.bin to pos.model.
In the same folder, create a file named manifest.properties with the following contents:
Manifest-Version=1.0
Language=th
OpenNLP-Version=1.5.0
Component-Name=POSTaggerME
Now you can zip the files. On Linux you can use this command: zip thai.pos.bin pos.model manifest.properties
Finally, we can try the two models combined:
sh bin/opennlp TokenizerME ~/Downloads/thai-token.bin/thai.tok.bin < thai_sentence.txt > thai_tokens.txt
sh bin/opennlp POSTagger ~/Downloads/pt-pos-maxent/thai.pos.bin < thai_tokens.txt
The result is:
กินอะไร_VACT ยังนาย_NCMN
Please, let me know if this is the expected result.
I want to create a PDF with selectable/searchable text.
I have source.png, which has gone through some pre-processing before OCR, and view.jpg, which is a compressed version of source.png meant to reduce the size of the output PDF.
How do I specify view.jpg in the command?
tesseract -l eng source.png out pdf
I'm not sure whether you can specify view.jpg in the command. The out.pdf already contains some sort of a compressed source.png.
I'm trying to validate whether two zip packages are equivalent. I cannot rely on md5sum. When I extract the two packages and compare the md5sums of all the files in them, there is no difference: every file has the same md5sum. But the zip packages themselves have different md5sum values. My question is: how can I validate that two zip packages are equivalent?
When you list the archive's content with
unzip -v archive.zip
you get a list of files with these column headings
Length Method Size Cmpr Date Time CRC-32 Name
Depending on what you consider equivalent (e.g. Size, CRC, Name), you can extract the relevant columns for both archives, sort them and do a diff over the output.
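If you prefer to skip the column extraction and sorting in the shell, here is a rough Python sketch of the same comparison with the standard zipfile module. It assumes "equivalent" means the same set of entry names, uncompressed sizes and CRC-32 values, and it deliberately ignores timestamps, entry order and compression method. The function names zip_signature and zips_equivalent are made up for this example.

```python
import zipfile


def zip_signature(archive_path):
    """Return a sorted list of (name, uncompressed size, CRC-32) for every entry."""
    with zipfile.ZipFile(archive_path) as zf:
        return sorted((i.filename, i.file_size, i.CRC) for i in zf.infolist())


def zips_equivalent(path_a, path_b):
    """True if both archives hold the same names with the same sizes and CRCs."""
    return zip_signature(path_a) == zip_signature(path_b)
```

Two archives with identical contents can still have different md5sums because the stored timestamps, the order of the entries, or the compression settings differ, and this check ignores all three. If you also care about any of those, add the corresponding ZipInfo field to the tuple.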
Without unzipping the file, you can use zipinfo.
e.g.:
zipinfo 5.zip
Archive: 5.zip 158 bytes 1 file
drwxr-xr-x 3.0 unx 0 bx stor 18-Nov-13 07:23 501/
1 file, 0 bytes uncompressed, 0 bytes compressed: 0.0%
I was sent a zip file containing 40 files with the same name.
I wanted to extract each of these files to a separate folder OR extract each file with a different name (file1, file2, etc).
Is there a way to do this automatically with standard Linux tools? A check of man unzip revealed nothing that could help me. zipsplit also does not seem to allow an arbitrary splitting of zip files (I was trying to split the zip into 40 archives, each containing one file).
At the moment I am (r)enaming my files individually. This is not so much of a problem with a 40 file archive, but is obviously unscalable.
Anyone have a nice, simple way of doing this? More curious than anything else.
Thanks.
Assuming that no such tool currently exists, it should be quite easy to write one in Python. Python has a zipfile module that should be sufficient.
Something like this (maybe, untested):
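#!/usr/bin/env python3
import os
import sys
import zipfile

# Extract each entry into its own numbered directory (0, 1, 2, ...),
# so entries that share a name do not overwrite each other.
with zipfile.ZipFile(sys.argv[1], "r") as z:
    for count, info in enumerate(z.infolist()):
        directory = str(count)
        os.makedirs(directory, exist_ok=True)
        z.extract(info, directory)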
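For the other option in the question (one folder, a different name per file), a variation along these lines might work. This is only a sketch: extract_numbered and the name_N.ext naming scheme are my own choices, and it drops any directory part of the entry name.

```python
import os
import shutil
import zipfile


def extract_numbered(archive_path, out_dir):
    """Write every file entry to out_dir as <stem>_<n><ext>, n being its position in the archive."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(archive_path) as zf:
        for index, info in enumerate(zf.infolist(), start=1):
            if info.is_dir():
                continue  # directory entries carry no data
            stem, ext = os.path.splitext(os.path.basename(info.filename))
            target = os.path.join(out_dir, f"{stem}_{index}{ext}")
            # Passing the ZipInfo object (not the name) selects this exact entry,
            # which matters when several entries share one name.
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Calling extract_numbered("file.zip", "out") on an archive with three entries named same.txt should leave out/same_1.txt, out/same_2.txt and out/same_3.txt.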
I know this is a couple years old, but the answers above did not solve my particular problem here so I thought I should go ahead and post a solution that worked for me.
Without scripting, you can just use command line input to interact with the unzip tool's text interface. That is, when you type this at the command line:
unzip file.zip
and it contains files of the same name, it will prompt you with:
replace sameName.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename:
If you wanted to do this by hand, you would type "r", and then at the next prompt:
new name:
you would just type the new file name.
To automate this, simply create a text file with the responses to these prompts and use it as the input to unzip, as follows.
r
sameName_1.txt
r
sameName_2.txt
...
That is generated pretty easily using your favorite scripting language. Save it as unzip_input.txt and then use it as input to unzip like this:
unzip file.zip < unzip_input.txt
For me, this was less of a headache than trying to get the Perl or Python extraction modules working the way I needed. Hope this helps someone...
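In case it helps, here is a small Python sketch for generating that response file. The helper name build_unzip_responses and its parameters are invented for this example. As far as I can tell, unzip only prompts when a file of that name already exists, so for N same-named entries extracted into an empty directory I would expect N-1 prompts, but treat that as an assumption and check it against your archive.

```python
def build_unzip_responses(base_name, extension, prompts):
    """Return the text to feed unzip: an 'r' line, then a new file name, once per prompt."""
    lines = []
    for i in range(1, prompts + 1):
        lines.append("r")  # choose [r]ename at the replace prompt
        lines.append(f"{base_name}_{i}{extension}")  # answer to "new name:"
    return "\n".join(lines) + "\n"


# Example: write the answers for 39 prompts to unzip_input.txt
# with open("unzip_input.txt", "w") as fh:
#     fh.write(build_unzip_responses("sameName", ".txt", 39))
```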
Here is a Linux script version.
In this case, 834733991_T_ONTIME.csv is the name of the file that is the same inside every zip file, and the .csv after "$count" simply has to be swapped with the file type you want.
#!/bin/bash
# Extract each archive, then rename the common file using a running counter.
count=0
for a in *.zip
do
    unzip -q "$a"
    mv 834733991_T_ONTIME.csv "$count".csv
    count=$((count+1))
done
This thread is old but there is still room for improvement. Personally I prefer the following one-liner in bash
unzipd ()
{
    unzip -d "${1%.*}" "$1"
}
Nice, clean, and simple way to remove the extension and use the remaining name as the target directory.
Using unzip -B file.zip did the trick for me. It creates a backup file suffixed with ~<number> in case the file already exists.
For example:
$ rm *.xml
$ unzip -B bogus.zip
Archive: bogus.zip
inflating: foo.xml
inflating: foo.xml
inflating: foo.xml
inflating: foo.xml
inflating: foo.xml
$ ls -l
-rw-rw-r-- 1 user user 1161 Dec 20 20:03 bogus.zip
-rw-rw-r-- 1 user user 1501 Dec 16 14:34 foo.xml
-rw-rw-r-- 1 user user 1520 Dec 16 14:45 foo.xml~
-rw-rw-r-- 1 user user 1501 Dec 16 14:47 foo.xml~1
-rw-rw-r-- 1 user user 1520 Dec 16 14:53 foo.xml~2
-rw-rw-r-- 1 user user 1520 Dec 16 14:54 foo.xml~3
Note: the -B option does not show up in unzip --help, but is mentioned in the man pages: https://manpages.org/unzip#options