Corrupted Excel File & 7zip

I have a problem with a corrupted Excel file. So far I have used 7-Zip to open it as an archive and extract most of the data, but some important sheets cannot be extracted.
Using the l command of 7-Zip I get the following output:
7z.exe l -slt "C:\Users\corrupted1.xlsm" xl/worksheets/sheet3.xml
Output:
Listing archive: C:\Users\corrupted1.xlsm
--
Path = C:\Users\corrupted1.xlsm
Type = zip
Physical Size = 11931916
----------
Path = xl\worksheets\sheet3.xml
Folder = -
Size = 57217
Packed Size = 12375
Modified = 1980-01-01 00:00:00
Created =
Accessed =
Attributes = .....
Encrypted = -
Comment =
CRC = 553C3C52
Method = Deflate
Host OS = FAT
Version = 20
However, when trying to extract it (or test it, for that matter) I get:
7z.exe t -slt "C:\Users\corrupted1.xlsm" xl/worksheets/sheet3.xml
Output:
Processing archive: C:\Users\corrupted1.xlsm
Testing xl\worksheets\sheet3.xml Unsupported Method
Sub items Errors: 1
The method listed above says Deflate, which is the same for all the worksheets.
Is there anything I can do? What kind of corruption is this? Is it the CRC? Can I ignore it somehow or something?
Please help!
Edit:
The following is the error when trying to extract or edit the XML file through 7-Zip: [error screenshot not reproduced]
Edit 2:
Tried with WinZip as well, getting:
Extracting to "C:\Users\axpavl\AppData\Local\Temp\wzf0b9\"
Use Path: yes Overlay Files: yes
Extracting xl\worksheets\sheet2.xml
Unable to find the local header for xl\worksheets\sheet2.xml.
Severe Error: Cannot find a local header.

This might help:
https://superuser.com/questions/145479/excel-edit-the-xml-inside-an-xlsx-file
and this one too: http://www.techrepublic.com/blog/tr-dojo/recover-data-from-a-damaged-office-file-with-the-help-of-7-zip/
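Since 7-Zip can still list the entry from the central directory while WinZip cannot find its local header, the compressed bytes may well still be sitting intact in the file. Below is a rough Python recovery sketch, not a guaranteed fix: it assumes the local header's length fields are still readable, and the offset arithmetic is best-effort.
import zipfile
import zlib

path = r"C:\Users\corrupted1.xlsm"
member = "xl/worksheets/sheet3.xml"

with open(path, "rb") as f:
    # 7-Zip lists the entry via the central directory, so zipfile should
    # be able to read the same metadata even if the local header is bad.
    info = zipfile.ZipFile(f).getinfo(member)
    # Local header: 30 fixed bytes, then file name and extra field.
    f.seek(info.header_offset)
    fixed = f.read(30)
    name_len = int.from_bytes(fixed[26:28], "little")
    extra_len = int.from_bytes(fixed[28:30], "little")
    f.seek(info.header_offset + 30 + name_len + extra_len)
    raw = f.read(info.compress_size)

# -15 = raw deflate data (no zlib wrapper). Feeding small chunks lets us
# keep whatever inflates cleanly before the stream breaks.
out = bytearray()
d = zlib.decompressobj(-15)
for i in range(0, len(raw), 4096):
    try:
        out += d.decompress(raw[i:i + 4096])
    except zlib.error:
        break  # salvage what we got up to the corruption

with open("sheet3_recovered.xml", "wb") as g:
    g.write(out)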

Related

How to download a sentinel images from google earth engine using python API in tfrecord

While trying to download a Sentinel image for a specific location, the TIFF file is generated in Drive by default, but it is not readable by OpenCV or PIL.Image(). Below is the code. If I use TFRecord as the file format, no images are downloaded to Drive at all.
import datetime

import ee

ee.Initialize()  # assumes Earth Engine authentication is already set up

starting_time = '2018-12-15'
delta = 15
L = -96.98
B = 28.78
R = -97.02
T = 28.74
cordinates = [L, B, R, T]
my_scale = 30
fname = 'sinton_texas_30'
llx = cordinates[0]
lly = cordinates[1]
urx = cordinates[2]
ury = cordinates[3]
geometry = [[llx, lly], [llx, ury], [urx, ury], [urx, lly]]
tstart = datetime.datetime.strptime(starting_time, '%Y-%m-%d')
tend = tstart + datetime.timedelta(days=delta)
# mask2clouds is a user-defined cloud-masking function
collSent = ee.ImageCollection('COPERNICUS/S2').filterDate(str(tstart).split(' ')[0], str(tend).split(' ')[0]).filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)).map(mask2clouds)
medianSent = ee.Image(collSent.reduce(ee.Reducer.median()))
cropLand = ee.ImageCollection('USDA/NASS/CDL').filterDate('2017-01-01', '2017-12-31').first()
task_config = {
    'scale': my_scale,
    'region': geometry,
    'fileFormat': 'TFRecord'
}
f1 = medianSent.select(['B1_median', 'B2_median', 'B3_median'])
taskSent = ee.batch.Export.image(f1, fname + "_Sent", task_config)
taskSent.start()
I expect the output to be readable in python so I can covert into numpy. In case of file format 'tfrecord', I expect the file to be downloaded in my drive.
I think you should consider the following things:
File format
If you want to open your file with PIL or OpenCV, rather than with TensorFlow, you should use GeoTIFF instead. Try this format and see if things improve.
Saving to drive
Normally saving to your Drive is the default behavior. However, you can try to force writing to your drive:
ee.batch.Export.image.toDrive(image=f1, ...)
You can also try setting a folder that the images should be sent to:
ee.batch.Export.image.toDrive(image=f1, folder='foo', ...)
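Putting both suggestions together, a minimal sketch (the folder name is just a placeholder; f1, fname, geometry and my_scale come from your code):
task = ee.batch.Export.image.toDrive(
    image=f1,                     # median composite from the question
    description=fname + "_Sent",
    folder='gee_exports',         # placeholder folder name
    region=geometry,
    scale=my_scale,
    fileFormat='GeoTIFF'
)
task.start()
print(task.status())  # poll this until the task completes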
In addition, the Export data help page and this tutorial are good starting points for further research.

MafftCommandline and io.StringIO

I've been trying to use the Mafft alignment tool from Bio.Align.Applications. Currently, I've had success writing my sequence information out to temporary text files that are then read by MafftCommandline(). However, I'd like to avoid redundant steps as much as possible, so I've been trying to write to a memory file instead using io.StringIO(). This is where I've been having problems. I can't get MafftCommandline() to read internal files made by io.StringIO(). I've confirmed that the internal files are compatible with functions such as AlignIO.read(). The following is my test code:
from Bio.Align.Applications import MafftCommandline
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import io
from Bio import AlignIO
sequences1 = ["AGGGGC",
              "AGGGC",
              "AGGGGGC",
              "AGGAGC",
              "AGGGGG"]
longest_length = max(len(s) for s in sequences1)
padded_sequences = [s.ljust(longest_length, '-') for s in sequences1]  # padded sequences used to test compatibility with AlignIO
ioSeq = ''
for items in padded_sequences:
    ioSeq += '>unknown\n'
    ioSeq += items + '\n'
newC = io.StringIO(ioSeq)
cLoc = str(newC).strip()
cLocEdit = cLoc[:len(cLoc)] #create string to remove < and >
test1Handle = AlignIO.read(newC, "fasta")
#test1HandleString = AlignIO.read(cLocEdit, "fasta") #fails to interpret cLocEdit string
records = (SeqRecord(Seq(s)) for s in padded_sequences)
SeqIO.write(records, "msa_example.fasta", "fasta")
test1Handle1 = AlignIO.read("msa_example.fasta", "fasta") #alignIO same for both #demonstrates working AlignIO
in_file = '.../msa_example.fasta'
mafft_exe = '/usr/local/bin/mafft'
mafft_cline = MafftCommandline(mafft_exe, input=in_file) #have to change file path
mafft_cline1 = MafftCommandline(mafft_exe, input=cLocEdit) #fails to read string (same as AlignIO)
mafft_cline2 = MafftCommandline(mafft_exe, input=newC)
stdout, stderr = mafft_cline()
print(stdout) #corresponds to MafftCommandline with input file
stdout1, stderr1 = mafft_cline1()
print(stdout1) #corresponds to MafftCommandline with internal file
I get the following error messages:
ApplicationError: Non-zero return code 2 from '/usr/local/bin/mafft <_io.StringIO object at 0x10f439798>', message "/bin/sh: -c: line 0: syntax error near unexpected token `newline'"
I believe this is due to the angle brackets ('<' and '>') present in the file path.
ApplicationError: Non-zero return code 1 from '/usr/local/bin/mafft "_io.StringIO object at 0x10f439af8"', message '/usr/local/bin/mafft: Cannot open _io.StringIO object at 0x10f439af8.'
Attempting to remove the arrows by converting the file path to a string and indexing resulted in the above error.
Ultimately my goal is to reduce computation time. I hope to accomplish this by calling internal memory instead of writing out to a separate text file. Any advice or feedback regarding my goal is much appreciated. Thanks in advance.
I can't get MafftCommandline() to read internal files made by io.StringIO().
This is not surprising, for a couple of reasons:
As you're aware, Biopython doesn't implement Mafft; it simply provides a convenient interface for setting up a call to mafft in /usr/local/bin. The mafft executable runs as a separate process that does not have access to your Python program's internal memory, including your StringIO file.
The mafft program only works with an input file; it doesn't even allow stdin as a data source. (Though it does allow stdout as a data sink.) So ultimately, there must be a file in the file system for mafft to open. Thus the need for your temporary file.
Perhaps tempfile.NamedTemporaryFile() or tempfile.mkstemp() might be a reasonable compromise.
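For example, a minimal sketch (assuming the records generator from your code is recreated first, since generators are single-use, and that mafft lives at /usr/local/bin/mafft). delete=False keeps the file on disk so mafft can open it by path, and we remove it ourselves afterwards:
import os
import tempfile
from Bio import SeqIO
from Bio.Align.Applications import MafftCommandline

# Write the records to a named temporary file that mafft can open by path.
with tempfile.NamedTemporaryFile(mode='w', suffix='.fasta', delete=False) as tmp:
    SeqIO.write(records, tmp, 'fasta')
    tmp_path = tmp.name

try:
    mafft_cline = MafftCommandline('/usr/local/bin/mafft', input=tmp_path)
    stdout, stderr = mafft_cline()  # runs mafft on the temporary file
finally:
    os.remove(tmp_path)  # clean up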

How to use dcmtk/dcmprscp in Windows

How can I use dcmprscp to receive a DICOM file from an SCU printer and save it? I'm using DCMTK 3.6 and I'm having some trouble using it with the default help. This is what I'm doing in CMD:
dcmprscp.exe --config dcmpstat.cfg --printer PRINT2FILE
Each time I receive this message, but database\index.dat doesn't exist in Windows:
W: $dcmtk: dcmprscp v3.6.0 2011-01-06 $
W: 2016-02-21 00:08:09
W: started
E: database\index.dat: No such file or directory
F: Unable to access database 'database'
I tried to follow some tips, but got the same result:
http://www.programmershare.com/2468333/
http://www.programmershare.com/3020601/
and this is my PRINT2FILE printer config:
[PRINT2FILE]
hostname = localhost
type = LOCALPRINTER
description = PRINT2FILE
port = 20006
aetitle = PRINT2FILE
DisableNewVRs = true
FilmDestination = MAGAZINE\PROCESSOR\BIN_1\BIN_2
SupportsPresentationLUT = true
PresentationLUTinFilmSession = true
PresentationLUTMatchRequired = true
PresentationLUTPreferSCPRendering = false
SupportsImageSize = true
SmoothingType = 0\1\2\3\4\5\6\7\8\9\10\11\12\13\14\15
BorderDensity = BLACK\WHITE\150
EmptyImageDensity = BLACK\WHITE\150
MaxDensity = 320\310\300\290\280\270
MinDensity = 20\25\30\35\40\45\50
Annotation = 2\ANNOTATION
Configuration_1 = PERCEPTION_LUT=OEM001
Configuration_2 = PERCEPTION_LUT=KANAMORI
Configuration_3 = ANNOTATION1=FILE1
Configuration_4 = ANNOTATION1=PATID
Configuration_5 = WINDOW_WIDTH=256\WINDOW_CENTER=128
Supports12Bit = true
SupportsDecimateCrop = false
SupportsTrim = true
DisplayFormat=1,1\2,1\1,2\2,2\3,2\2,3\3,3\4,3\5,3\3,4\4,4\5,4\6,4\3,5\4,5\5,5\6,5\4,6\5,6
FilmSizeID = 8INX10IN\11INX14IN\14INX14IN\14INX17IN
MediumType = PAPER\CLEAR FILM\BLUE FILM
MagnificationType = REPLICATE\BILINEAR\CUBIC
The documentation of the "dcmprscp" tool says:
The dcmprscp utility implements the DICOM Basic Grayscale Print
Management Service Class as SCP. It also supports the optional
Presentation LUT SOP Class. The utility is intended for use within the
DICOMscope viewer.
That means it is usually not run from the command line (as most of the other DCMTK tools are), but is started automatically in the background by DICOMscope.
Anyway, I think the error message is clear:
E: database\index.dat: No such file or directory
F: Unable to access database 'database'
Did you check whether there is a subdirectory "database" and whether the "index.dat" file exists in this directory? If you wonder why there is a need for a "database" at all, please read the next paragraph of the documentation:
The dcmprscp utility accepts print jobs from a remote Print SCU.
It does not create real hardcopies but stores print jobs in the local
DICOMscope database as a set of Stored Print objects (one per page)
and Hardcopy Grayscale images (one per film box N-SET)
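If the directory is simply missing, a first thing to try (an assumption on my part, not verified against DCMTK: I would expect dcmprscp to create index.dat itself once the directory exists) is to create an empty database directory next to your config before starting the SCP:
mkdir database
dcmprscp.exe --config dcmpstat.cfg --printer PRINT2FILE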

PyQt QFileDialog default suffix

I have looked through a bunch of code, but this piece of code doesn't work as expected for me:
export_dialog = QtGui.QFileDialog()
export_dialog.setWindowTitle('Export')
export_dialog.setDirectory(EXPORT_DIR)
export_dialog.setAcceptMode(QtGui.QFileDialog.AcceptSave)
export_dialog.setNameFilter('INI files (*.ini)')
export_dialog.setDefaultSuffix('ini')
export_file, _ = export_dialog.getSaveFileName()
print(export_file)
I'm saving my file without an extension, counting on my configuration above to set it properly, but it doesn't work: no extension is added.
Any suggestions?
Thanks
export_dialog = QtGui.QFileDialog()
export_dialog.setWindowTitle('Export')
export_dialog.setDirectory(EXPORT_DIR)
export_dialog.setAcceptMode(QtGui.QFileDialog.AcceptSave)
export_dialog.setNameFilter('INI files (*.ini)')
export_dialog.setDefaultSuffix('ini')
if export_dialog.exec_() == QtGui.QFileDialog.Accepted:
print(export_dialog.selectedFiles()[0])
This code will return the full file path with the default suffix applied. The reason the original code does not work is that getSaveFileName() is a static convenience function that creates its own dialog, so it ignores everything configured on your export_dialog instance, including setDefaultSuffix(); calling exec_() and reading selectedFiles() uses the dialog you actually configured.
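If you'd rather keep the static getSaveFileName() function, a workaround sketch is to append the suffix yourself. This assumes PySide-style bindings, where getSaveFileName() returns a (path, filter) tuple as in your code, and reuses EXPORT_DIR:
import os
from PySide import QtGui

export_file, _ = QtGui.QFileDialog.getSaveFileName(
    None, 'Export', EXPORT_DIR, 'INI files (*.ini)')
if export_file and not os.path.splitext(export_file)[1]:
    export_file += '.ini'  # apply the default suffix manually
print(export_file)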

JAudioTagger Deleting First Few Seconds of Track

I've written a simple Groovy script (below) to set the values of four of the ID3v1 and ID3v2 tag fields in mp3 files using the JAudioTagger library. The script successfully makes the changes but it also deletes the first 5 to 10 seconds of some of the files, other files are unaffected. It's not a big problem, but if anyone knows a simple fix, I would be grateful. All the files are from the same source, all have v1 and v2 tags, I can find no obvious difference in the source files to explain it.
import org.jaudiotagger.*

java.util.logging.Logger.getLogger("org.jaudiotagger").setLevel(java.util.logging.Level.OFF)

Integer trackNum = 0
Integer totalFiles = 0
Integer invalidFiles = 0

def dir = new File(/D:\Users\Jeremy\Music\Speech Radio\Unlistened\Z Temp Files to MP3 Tagged/)
dir.eachFile({ curFile ->
    totalFiles++
    validMP3File = true  // reset the validity flag for each file
    try {
        mp3File = org.jaudiotagger.audio.AudioFileIO.read(curFile)
    } catch (org.jaudiotagger.audio.exceptions.CannotReadException e) {
        validMP3File = false
        invalidFiles++
    }
    // Get the file name excluding the extension
    baseFilename = org.jaudiotagger.audio.AudioFile.getBaseFilename(curFile)
    // Check that it is an MP3 file
    if (validMP3File) {
        if (mp3File.getAudioHeader().getEncodingType() != 'mp3') {
            validMP3File = false
            invalidFiles++
        }
    }
    if (validMP3File) {
        trackNum++
        // Reuse existing tags where present, otherwise create new ones
        if (mp3File.hasID3v1Tag()) {
            curTagv1 = mp3File.getID3v1Tag()
        } else {
            curTagv1 = new org.jaudiotagger.tag.id3.ID3v1Tag()
        }
        if (mp3File.hasID3v2Tag()) {
            curTagv2 = mp3File.getID3v2TagAsv24()
        } else {
            curTagv2 = new org.jaudiotagger.tag.id3.ID3v23Tag()
        }
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.TITLE, baseFilename)
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.TITLE, baseFilename)
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.ARTIST, "BBC Radio")
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.ARTIST, "BBC Radio")
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.ALBUM, "BBC Radio - 20130205")
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.ALBUM, "BBC Radio - 20130205")
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.TRACK, trackNum.toString())
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.TRACK, trackNum.toString())
        mp3File.setID3v1Tag(curTagv1)
        mp3File.setID3v2Tag(curTagv2)
        mp3File.save()
    }
})
println """$trackNum tracks created from $totalFiles files with $invalidFiles invalid files"""
I'm still investigating and it appears that there is no problem with JAudioTagger. Before setting the tags, I use Total Recorder to reduce the quality of the download from 128kbps, 44,100Hz to 56kbps, 22,050Hz. This reduces the file size to less than half and the quality is fine for speech radio.
If I run my script on the original files, none of the audio track is deleted. The deletion of the first part of the audio track only occurs with the files that have been processed by Total Recorder.
Looking at the JAudioTagger logging for these files, there does appear to be a problem with the header:
Checking further because the ID3 Tag ends at 0x23f9 but the mp3 audio doesnt start until 0x7a77
Confirmed audio starts at 0x7a77 whether searching from start or from end of ID3 tag
This check is not performed for files that have not been processed by Total Recorder.
The log of the header read operation also shows (for a 27 minute track):
trackLength:06:52
It looks as though I shall have to find a new MP3 file editor!
Instead of
mp3File.save()
could you try:
mp3File.commit()
No idea if it will help, but that seems to be the documented method?
