As a kind of follow-up to this question, I would like to ask about binary files (such as Excel files) and versioning.
Let's say I want to use GitHub to store a programming project. No problem there, since the majority of files are text (no matter the language).
But I also have documentation. What if I put it into a folder of the GitHub project? (I have seen projects that do this.)
I have read that Git is no good for this, so how can I handle versioning for, say, Excel files?
You could save your Excel file as .fods, which is a regular .ods file saved as flat XML. This format is probably not supported by MS Office, so you may need to install LibreOffice for this (it is free).
Since .fods is regular XML, it can be versioned as a regular text file, with diffs and (with some luck) even support for merges between branches.
You could also save other Open Document formats as flat XML:
.fodt for word processing (text) documents
.fods for spreadsheets
.fodp for presentations
.fodg for graphics
So if migration to LibreOffice is not a problem, this is probably the best solution.
If this is not an option, you may consider using Git LFS for storing binaries. But if the files are small and you don't change them often, you can just ignore the whole problem: a few small binary files will not hurt your repository. Just estimate: if you start versioning a 1 MB binary file and save 100 versions of it, your repository will grow by about 100 MB (less if the file compresses well). You need a really large codebase to reach 100 MB in a repository with text source files only, so in that case your repository would be filled mainly by binary files.
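If you go the LFS route, the setup is small. A minimal sketch, assuming the git-lfs extension is installed (report.xlsx is just a placeholder name):

# One-time LFS setup for the repository, then track the binary pattern
git lfs install
git lfs track "*.xlsx"

# git lfs track writes the rule into .gitattributes; commit it with the file
git add .gitattributes report.xlsx
git commit -m "Store Excel files via Git LFS"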
BTW: GitHub released a tool for measuring the size of a Git repository: git-sizer. It may give you some hints about potential problems with your repository.
// First, run this command: npm install xlsx jsonfile
// Change the input file name to sample.xlsx; the output file is data.json
var XLSX = require('xlsx');
var fs = require('fs');
var jsonfile = require('jsonfile');

var file = 'data.json';

// Read the workbook from disk and parse it
var buf = fs.readFileSync("sample.xlsx");
var wb = XLSX.read(buf, {type: 'buffer'});
console.log(wb.Sheets);

// Dump all sheets to the JSON output file
jsonfile.writeFile(file, wb.Sheets, function (err) {
    if (err) console.error(err);
});
Interesting question. The simple answer is: write some code to convert your Excel file (.xls or .xlsx) to a JSON file and commit that content to Git.
This idea is only valid for a simple Excel sheet, not for complex ones involving a lot of math and charts.
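If the raw sheet objects give noisy diffs, one possible refinement (my own suggestion, not part of the answer above) is to flatten each sheet to plain row objects with XLSX.utils.sheet_to_json and pretty-print the JSON:

var XLSX = require('xlsx');
var fs = require('fs');
var jsonfile = require('jsonfile');

var wb = XLSX.read(fs.readFileSync('sample.xlsx'), {type: 'buffer'});

// Convert each sheet from a raw cell map to plain row objects
var out = {};
wb.SheetNames.forEach(function (name) {
    out[name] = XLSX.utils.sheet_to_json(wb.Sheets[name]);
});

// Pretty-print so line-based Git diffs stay readable
jsonfile.writeFile('data.json', out, {spaces: 2}, function (err) {
    if (err) console.error(err);
});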
I'm using Node.js and developing a file upload system for my personal project.
On one blog (it's not an English site, so I won't link it here) I read that I have to re-encode files. If I don't, a skilled attacker can insert malicious shell code into a normal-looking file (the author gave JPEG as an example), even if I limit the extensions of uploaded files, change the original filename, and hide the file path.
My question is: how do I re-encode files in Node.js?
There are several types of files and I don't know how to do it.
Thanks in advance.
When handling a file upload, there are several hardening steps that increase security:
Rename the file, including its extension.
Use a whitelist to check the file's extension.
Use a whitelist to check the file's MIME type.
Use a whitelist to check the file's magic bytes.
Set Content-Type and no-sniff headers on the response if you serve your files back over HTTP.
What "re-encoding" means here is actually the fourth step above.
Determine the file type by reading its magic bytes:
const FileType = require('file-type');
const readChunk = require('read-chunk');

(async () => {
    // file-type needs at most the first 4100 bytes to detect the type
    const buffer = readChunk.sync('Unicorn.png', 0, 4100);
    console.log(await FileType.fromBuffer(buffer));
    //=> {ext: 'png', mime: 'image/png'}
})();
There are several packages you can choose from:
https://github.com/sindresorhus/file-type
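To make the list above concrete, here is a minimal sketch of steps 1-4 in an Express route. Note that multer, the /upload route, the tmp/ and uploads/ folders, and the two allowed types are all my own assumptions for illustration, not requirements:

const express = require('express');
const multer = require('multer');
const crypto = require('crypto');
const path = require('path');
const fs = require('fs');
const FileType = require('file-type');

// Whitelist: extension -> expected MIME type
const ALLOWED = { '.png': 'image/png', '.jpg': 'image/jpeg' };

const app = express();
const upload = multer({ dest: 'tmp/' });

app.post('/upload', upload.single('file'), async (req, res) => {
    const ext = path.extname(req.file.originalname).toLowerCase();

    // Steps 2 and 3: whitelist the extension and the claimed MIME type
    if (!(ext in ALLOWED) || req.file.mimetype !== ALLOWED[ext]) {
        fs.unlinkSync(req.file.path);
        return res.status(400).send('File type not allowed');
    }

    // Step 4: whitelist the magic bytes (what the file actually is)
    const type = await FileType.fromFile(req.file.path);
    if (!type || type.mime !== ALLOWED[ext]) {
        fs.unlinkSync(req.file.path);
        return res.status(400).send('File content does not match its extension');
    }

    // Step 1: rename the file, extension included
    // (assumes an uploads/ folder already exists)
    const safeName = crypto.randomBytes(16).toString('hex') + ext;
    fs.renameSync(req.file.path, path.join('uploads', safeName));
    res.send('OK');
});

app.listen(3000);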
I'm trying to implement a minimal version of .zip file generation, following this spec: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
I don't actually need compression; I just need a way to string together a bunch of files into a single, widely adopted archive format, with the capability to stream in file data while streaming out the zip.
So far I'm partially successful: 7-Zip and the Windows built-in zip extractor can extract my archives just fine, but WinRAR and the macOS built-in zip extractor give me corrupted-archive errors.
I can't for the life of me find the actual problem(s), though. As far as I can tell the .zips are built 100% to the specification, but the spec is a big wall of text, and with sweeping changes from one zip version to the next, along with legacy attributes taking on new functions, it is a tad confusing.
Does anyone know of an extraction tool that can give me more specific errors than just "archive is corrupt"?
Or perhaps a zip generation utility where I can pick and choose between all the different ways of building a zip file so I can go and compare the results byte by byte?
Does anyone know of an extraction tool that can give me more specific errors than just "archive is corrupt"?
The unzipada tool from the Zip-Ada project will do exactly that:
Testing archive ko.zip
raised ZIP.ARCHIVE_CORRUPTED : Bad (or no) end-of-central-directory
[C:\Ada\za\unzipada.exe]
Zip.Find_First_Offset at zip.adb:589
Unzip.Extract at unzip.adb:667
Unzipada at unzipada.adb:259
By browsing the code (e.g. zip.adb, line 589) you can narrow down the corrupt-archive issues. To build the tool, download the sources and follow the readme.txt file. There are also pre-built binaries for Windows.
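If you want a cross-check from your own code, here is a minimal Node.js sketch (my own illustration, not part of Zip-Ada) that looks for the end-of-central-directory record the error above complains about:

// Scan backwards for the end-of-central-directory (EOCD) signature
// 0x06054b50. The EOCD record is at least 22 bytes and may be followed
// by a comment of up to 65535 bytes, hence the backwards search.
const fs = require('fs');

function findEOCD(buf) {
    for (let i = buf.length - 22; i >= 0; i--) {
        if (buf.readUInt32LE(i) === 0x06054b50) return i;
    }
    return -1;
}

const buf = fs.readFileSync('ko.zip'); // placeholder file name
const eocd = findEOCD(buf);
if (eocd < 0) {
    console.log('No end-of-central-directory record found');
} else {
    console.log('EOCD at offset', eocd);
    console.log('total central directory entries:', buf.readUInt16LE(eocd + 10));
    console.log('central directory size:', buf.readUInt32LE(eocd + 12));
    console.log('central directory offset:', buf.readUInt32LE(eocd + 16));
}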
I'm researching a project on software-defined networking discussed on knowledgedefinednetworking.org, and they provide several datasets. Two of the three datasets are unzipping just fine (100K.csv.gz and train.csv.gz), but benchmark.csv.gz unzips into a new spreadsheet that comes back empty, even though it takes up 3.3 GB. I'm using WinZip to unzip the files and they're all going into the same folder, but only benchmark comes back empty. Is this a common issue, or is there something potentially wrong with the download of the file that causes it to unzip empty?
"Is this a common issue" <-- Simple answer : Yes, it is a coomn issue.
"or is there something potentially wrong with the download of the file that causes it to unzip empty?" <-- the download went ok.. I tried to save it as excel. went ok. The files (file1 file2) is not blank.
Note : try to use 7zip as your file uncompressor.
Hope that solves... (:
I'm processing a data set and running into a problem: although I xlswrite all the relevant output variables to a big, timestamped Excel file, I don't save the code that actually generated that result. So if I try to recreate a certain set of results, I can't do it without relying on memory (which is obviously not a good plan). I'd like to know if there's a command (or commands) that will help me save the m-files used to generate the output Excel file, along with the Excel file itself, in a folder I can name and timestamp, so I don't have to do this manually.
In my perfect world I would run the master code file that calls 4 or 5 other function m-files, and then all those m-files would be saved along with the Excel output to a folder named results_YYYYMMDDTIME. Does this functionality exist? I can't seem to find it.
There's no such functionality built in.
You could build a dependency tree of your main function by using depfun with mfilename.
depfun(mfilename()) will return a list of all functions/m-files that are called by the currently executing m-file.
This will include all files that come as MATLAB builtins; you might want to remove those (and only record the MATLAB version in your Excel sheet).
As a sketch:
% Get all files the currently executing m-file depends on
dependencies = depfun(mfilename());

% Copy everything that is not a MATLAB builtin (i.e. not under matlabroot)
for k = 1:numel(dependencies)
    if ~strncmp(dependencies{k}, matlabroot(), numel(matlabroot()))
        copyfile(dependencies{k}, your_folder);  % your_folder: destination folder
    end
end
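To get the results_YYYYMMDDTIME folder from the question, you could create it with a timestamp first. A sketch of my own ('output.xlsx' is a placeholder for your xlswrite target):

% Build a timestamped results folder and copy the Excel output into it
your_folder = ['results_' datestr(now, 30)];  % format 30 is yyyymmddTHHMMSS
mkdir(your_folder);
copyfile('output.xlsx', your_folder);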
As a "long term" solution you might want to check if using a version control system like subversion, mercurial (or one of many others) would be applicable in your case.
In larger projects this is preferred way to record the version of source code used to produce a certain result.
I am building the server part of a web app using Node.js. This involves getting data from thetvdb.com (API documentation of thetvdb).
The data comes as a zip file. The HTTP download is no problem; parsing the file, however, is. I actually never save the file, but just keep it in memory, as suggested in How to download and unzip a zip file in memory in NodeJs?
I have a buffer with valid data (the same data as when I download the file with a browser/curl...). However, adm-zip (I also tried other zip libraries; some suggest an invalid zip length) can't open it. It does not show an error, but the zipEntries in the end have a length of 0.
When I write the buffer out to the filesystem and open it with GUI or CLI tools, it works.
I can't give a direct link to the file, as it would involve my API key, but I re-uploaded it here.
I think I might have an answer for you:
Don't rely on npm install. I just ran the example that you linked to with the zip file you provided, and I got an output of "0".
I saw a comment on that other Stack Overflow page saying that the version of adm-zip on npm is not up to date. I grabbed a fresh copy of adm-zip from GitHub, overwrote the one in my node_modules folder, reran the example code, and now get the following:
...
<Actor>
<id>237811</id>
<Image>actors/237811.jpg</Image>
<Name>Peter Pratt</Name>
<Role>The Master</Role>
<SortOrder>3</SortOrder>
</Actor>
<Actor>
<id>23780s/237811.jpg</Image>
Give that a shot!
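For reference, the in-memory reading itself can stay as simple as this. A sketch assuming the up-to-date adm-zip is in place ('en.zip' is a placeholder; in the real app the buffer comes straight from the HTTP download):

const AdmZip = require('adm-zip');
const fs = require('fs');

// Reading from disk here only to keep the sketch self-contained
const buf = fs.readFileSync('en.zip');

const zip = new AdmZip(buf);

// With a working adm-zip this prints every entry instead of an empty list
zip.getEntries().forEach(function (entry) {
    console.log(entry.entryName);
    console.log(zip.readAsText(entry));
});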