Backup Google Drive to .zip file with file conversion - linux

I keep going in circles on this topic, and can't find an automated method that works for mass data on a Google Drive. Here is the goal I'm looking to achieve:
My company uses an unlimited Google Drive to store shared documents, and we are looking to back up the contents automatically. But we can't have the backup contain Google-native documents like ".gdoc" and ".gsheet"... we need the documents backed up in Microsoft/OpenOffice formats (".docx" and ".xlsx").
We currently use Google's Takeout page to zip all the contents of the Drive and save it on our Linux server (which has redundant storage), and it does zip and export the files to the correct formats.
Here: https://takeout.google.com/settings/takeout
Now that works... but it requires a bit of manual work on our part, and babysitting the zip, download and upload processes is becoming wasteful. I have searched and read that the Takeout API is not available through Apps Script, so that seems to be out of the question.
Using Google Apps Script, I have been able to convert single files... but I can't, for instance, convert a whole folder of ".gsheet" files to ".xlsx" format. Maybe copying and converting all the Google files into a new folder on the Drive would be possible; with access to the Drive and the converted "backup", we could then back up the collection of converted files via the server...
So here is the gist of it all:
Can you mass-convert an entire Google Drive and/or a specific folder on the Drive from ".gdoc" to ".docx" and ".gsheet" to ".xlsx"? Can this be done with gscript?
If the method in question one is not possible, is anyone familiar with a Linux or Mac app that could do such a directory conversion? (I doubt it, because of Google's proprietary file types.)
I'm stuck in a bit of a hole, and any insight into this problem would help. I really wish Google would allow users to convert and export Drive folders via a script selection.

1) Can you mass-convert an entire Google Drive and/or a specific folder on the Drive from ".gdoc" to ".docx" and ".gsheet" to ".xlsx"? Can this be done with gscript?
You can try this:
How to automatically convert files in Google Apps Script
Converting a file in Google Apps Script into a blob:
// Requires the Advanced Drive Service (Drive API) to be enabled for the script.
var documentId = DocumentApp.getActiveDocument().getId();

function getBlob(documentId) {
  var file = Drive.Files.get(documentId);
  // Export link for the .docx (Word) representation of the document.
  var url = file.exportLinks['application/vnd.openxmlformats-officedocument.wordprocessingml.document'];
  var oauthToken = ScriptApp.getOAuthToken();
  var response = UrlFetchApp.fetch(url, {
    headers: {
      'Authorization': 'Bearer ' + oauthToken
    }
  });
  return response.getBlob();
}
Saving the file as .docx in Drive:
function saveFile(blob) {
  var file = {
    title: 'Converted_into_MS_Word.docx',
    mimeType: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
  };
  // Insert the exported blob as a new file via the Advanced Drive Service.
  file = Drive.Files.insert(file, blob);
  Logger.log('ID: %s, File size (bytes): %s', file.id, file.fileSize);
  return file;
}
Time-driven triggers
A time-driven trigger (also called a clock trigger) is similar to a cron job in Unix. Time-driven triggers let scripts execute at a particular time or on a recurring interval, as frequently as every minute or as infrequently as once per month. (Note that an add-on can use a time-driven trigger once per hour at most.) The time may be slightly randomized — for example, if you create a recurring 9 a.m. trigger, Apps Script chooses a time between 9 a.m. and 10 a.m., then keeps that timing consistent from day to day so that 24 hours elapse before the trigger fires again.
function createTimeDrivenTriggers() {
  // Trigger every 6 hours.
  ScriptApp.newTrigger('myFunction')
      .timeBased()
      .everyHours(6)
      .create();
  // Trigger every Monday at 09:00.
  ScriptApp.newTrigger('myFunction')
      .timeBased()
      .onWeekDay(ScriptApp.WeekDay.MONDAY)
      .atHour(9)
      .create();
}
Process (a sketch combining these steps follows below):
List all file IDs inside a folder
Convert the files
Run the conversion code from a time-driven trigger
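Putting those three steps together, a minimal sketch (not the only way to do it) might look like the following Apps Script function. The folder IDs are placeholders, it only handles Docs and Sheets, and it does not recurse into subfolders; point a time-driven trigger at convertFolder to run it automatically.
// Sketch: export every Google Doc/Sheet in a source folder to .docx/.xlsx
// and save the copies into a backup folder. Folder IDs are placeholders.
function convertFolder() {
  var SOURCE_FOLDER_ID = 'SOURCE_FOLDER_ID';
  var BACKUP_FOLDER_ID = 'BACKUP_FOLDER_ID';
  var exportTypes = {
    'application/vnd.google-apps.document':
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    'application/vnd.google-apps.spreadsheet':
        'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
  };
  var backupFolder = DriveApp.getFolderById(BACKUP_FOLDER_ID);
  var files = DriveApp.getFolderById(SOURCE_FOLDER_ID).getFiles();
  while (files.hasNext()) {
    var f = files.next();
    var target = exportTypes[f.getMimeType()];
    if (!target) continue;  // skip anything that is not a Doc or a Sheet
    // Fetch the converted bytes from the Drive REST export endpoint.
    var url = 'https://www.googleapis.com/drive/v3/files/' + f.getId() +
        '/export?mimeType=' + encodeURIComponent(target);
    var blob = UrlFetchApp.fetch(url, {
      headers: { Authorization: 'Bearer ' + ScriptApp.getOAuthToken() }
    }).getBlob();
    var ext = target.indexOf('spreadsheetml') > -1 ? '.xlsx' : '.docx';
    backupFolder.createFile(blob.setName(f.getName() + ext));
  }
}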
2) If the method in question one is not possible, is anyone familiar with a Linux or Mac app that could do such a directory conversion? (I doubt it, because of Google's proprietary file types.)
If you want to save the files locally, try setting up a cron job and downloading the files via the Drive API:
The Drive API allows you to download files that are stored in Google Drive. Also, you can download exported versions of Google Documents (Documents, Spreadsheets, Presentations, etc.) in formats that your app can handle. Drive also supports providing users direct access to a file via the URL in the webViewLink property.
Depending on the type of download you'd like to perform — a file, a Google Document, or a content link — you'll use one of the following URLs:
Download a file — files.get with alt=media file resource
Download and export a Google Doc — files.export
Link a user to a file — webContentLink from the file resource
Sample code (PHP):
$fileId = '0BwwA4oUTeiV1UVNwOHItT0xfa2M';
$content = $driveService->files->get($fileId, array(
    'alt' => 'media'));
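Note that alt=media only works for binary content; Google-native Docs and Sheets have to go through files.export. If the Linux server is doing the downloading, a rough Node.js sketch using the googleapis client could look like this (FILE_ID, the output path, and how auth is obtained are placeholders); a cron entry can then run it on whatever schedule you like.
// Sketch: export one Google Sheet to a local .xlsx file using the Drive API
// (Node.js googleapis client). FILE_ID, the output path, and how 'auth' is
// obtained (OAuth client or service account) are placeholders.
const fs = require('fs');
const { google } = require('googleapis');

async function exportSheet(auth) {
  const drive = google.drive({ version: 'v3', auth });
  const out = fs.createWriteStream('/backups/spreadsheet.xlsx');
  const res = await drive.files.export(
    { fileId: 'FILE_ID',
      mimeType: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' },
    { responseType: 'stream' }
  );
  res.data.pipe(out);  // stream the exported bytes to disk
}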
Hope this helps and answers all your questions.

Related

Google Drive File Stream files creation date

I know it has been asked a few times already, but not in this context I think, and the other questions were asked a few years ago, so I'm hoping maybe something has changed.
So my issue is: I am uploading files to Google Drive using Google Drive File Stream. While the uploading goes smoothly, I have a problem with the file creation date - it is always changed to the timestamp of when the file got uploaded, not the actual local file creation date. This is a serious problem, as I am going to use this to back up huge amounts of data and preserve all the metadata I can, and the creation date is crucial. Is there a way to either upload a file with the creation date intact, or to change it after the upload? From what I've seen this seems not to be possible, but I have to try and make it work. Any help and insight will be appreciated. I'm using Drive File Stream with Python.
EDIT: I didn't make it clear enough - the issue here is that I don't want to use the Google Drive API at all, but rather deal with this using only the Google Drive File Stream interface, if that's possible.
files.create
If you check the documentation for files.create, you will find that the acceptable metadata for file creation does include a createdTime.
You should then just add this to the metadata you use when uploading the file. As you did not post your code, I have grabbed the standard example from the documentation and added the created time as follows.
file_metadata = {
    'name': 'photo.jpg',
    # createdTime is an RFC 3339 timestamp, e.g. '2020-05-17T15:00:41.000Z'
    'createdTime': 'THETIME'
}
media = MediaFileUpload('files/photo.jpg', mimetype='image/jpeg')
file = drive_service.files().create(body=file_metadata,
                                    media_body=media,
                                    fields='id').execute()
print('File ID: %s' % file.get('id'))
Update
In the event that you want to update the files you have already created, you could use the following method.
If you check the documentation for files.create, you will find that the response is just a File resource.
If you check the File resource, you will see that createdTime is writable.
You should run a files.update and reset the createdTime to the proper time.

Linux Split for tar.gz works well when joined but when tranferred to remote machine with help of S3 bucket

I have a few files which I packed into a tar.gz.
As this file can get quite big, I used the Linux split command on it.
As the pieces need to be transferred to a different machine, I used an S3 bucket to transfer them, uploading with the application/octet-stream content type.
The downloaded files show exactly the same size as the originals, so no bytes are lost.
Now when I do cat downloaded_files_* > tarball.tar.gz, the size is exactly the same as the original file,
but only the part ending in _aa gets extracted.
I checked the type of the files:
file downloaded_files_aa
reports a gzip archive (gzip compressed data, from Unix, last modified: Sun May 17 15:00:41 2020),
but all the other parts are reported as plain "data".
I am wondering how I can recover the files.
Note: the files were uploaded to S3 via HTTP upload through API Gateway.
================================
Just putting down my debugging findings in the hope that they will help someone facing the same problem.
As we wanted to use API Gateway, our upload calls were plain HTTP calls, i.e. not using the regular AWS SDK.
https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-post-example.html
Code samples: https://docs.aws.amazon.com/AmazonS3/latest/API/samples/AWSS3SigV4JavaSamples.zip
After some debugging, we found this leg was working fine.
As the machine to which we wanted to download the files had direct access to S3, we used the AWS SDK for downloading.
This is the URL:
https://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectUsingJava.html
That code did not work well: although the download showed exactly the same file size as the upload, the file lost some information, and the code also complained about bytes still pending. We made some changes to get rid of the error, but it never worked.
The code which I found here works like magic:
// 'object' is the S3Object returned by s3Client.getObject(...)
InputStream reader = new BufferedInputStream(object.getObjectContent());
File file = new File("localFilename");
OutputStream writer = new BufferedOutputStream(new FileOutputStream(file));
int read = -1;
// Copy the object byte by byte until the stream is exhausted.
while ((read = reader.read()) != -1) {
    writer.write(read);
}
writer.flush();
writer.close();
reader.close();
This code also made the download much faster than our previous approach.

Downloading with node modifies excel files and causes data loss

I am trying to create a script in Node.js which will download an Excel file. My code first makes an http.get request to the URL and then writes to a file using response.pipe and createWriteStream. My code is as follows:
const fs = require("fs");
const http = require("http");

let url = "http://www.functionalglycomics.org:80/glycomics/HFileServlet?operation=downloadRawFile&fileType=DAT&sideMenu=no&objId=1002183";

http.get(url, response => {
  let file = fs.createWriteStream('file.xls');
  let stream = response.pipe(file);
});
If you download the following file using Firefox, the file downloads appropriately, and if you open it, it works fine and Excel does not give any errors:
http://www.functionalglycomics.org:80/glycomics/HFileServlet?operation=downloadRawFile&fileType=DAT&sideMenu=no&objId=1002183
Note: the download link above will not work with Chrome due to this issue with the filename containing a comma, so I cannot use Puppeteer for this.
However, if I use my script above to download the file and try to open it in Excel, it gives me an error stating "data may have been lost" 5 times, but then eventually still opens the file.
My question is therefore, what is causing this data loss when downloading using nodejs?
Update
Some data about versions:
Node: v12.13.1
Excel: Office 2019
OS: Windows 10 latest
Update 2
Based on the comments below from jarmod, I tried using wget in Windows PowerShell. It downloads the file too, but also produces the Excel error.
I posted this as an issue on the Node.js GitHub. @Hakerh400 provided a good description of what is happening there, but briefly: on the Windows NTFS file system there is something called ADS (Alternate Data Streams), which keeps track of which files were downloaded from the internet so that security prompts can be shown. You can read more about it in @Hakerh400's comment here.
The workaround proposed is to add this Zone.Identifier ADS to the file after the download is complete using the following example:
http.get(url, response => {
  let file = fs.createWriteStream('file.xls');
  let stream = response.pipe(file);
  // Tag the file as downloaded from the internet (Zone.Identifier ADS).
  fs.writeFileSync(
    'file.xls:Zone.Identifier',
    `[ZoneTransfer]\r\nZoneId=3\r\nHostUrl=${url}`,
  );
});
Note: this workaround allows you to open the Excel file in "Protected View" without any concerns. However, if you click "Enable Editing" in the security prompt in Excel, the "File Error: data may have been lost" error still pops up (Excel 2019), though there is no real data loss in terms of the sheets or the data in cells.
I hope this answer helps anyone who faces anything similar.

Autorenaming duplicate filename downloads in chrome/puppeteer/ubuntu

I'm downloading PDF files using headFULL Chromium and Puppeteer. I call a JavaScript function in the browser context and the download starts. The file name comes as-is from the server. Issue: many of the files I download into a directory have the same name coming from the server, and Chrome, instead of auto-suffixing an index (1) to the file, overwrites the existing one.
Since the file is downloaded by calling a JS function, and I have inspected the function as well, I don't have access to the PDF URL. The download is triggered by the function call, and thus I have no control over the file names.
I have a list of the file names, but that in no way helps with changing a filename on the fly if a duplicate name already exists on the machine.
Config: Ubuntu 18.04, Puppeteer 1.18.1
I assume it's a config issue with either the Nautilus file manager or Chrome. Is it possible to configure either of the two?
I cannot find an option within Node.js to rename the file before it's downloaded. A workaround is to download each file into a temp folder, then move it to the required folder while checking whether it already exists, renaming it if so. But that adds a lot of time complexity. It would be great to have Chrome or Nautilus do the task.
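For reference, a minimal sketch of that temp-folder workaround in Node.js could look like the following; the directory paths and the uniqueTarget helper are hypothetical, and this assumes the browser's download directory already points at tempDir.
// Sketch of the temp-folder workaround: after each download finishes in
// tempDir, move it to targetDir, appending " (1)", " (2)", ... when a file
// with the same name already exists. Paths and helper names are hypothetical.
const fs = require('fs');
const path = require('path');

function uniqueTarget(targetDir, fileName) {
  const ext = path.extname(fileName);
  const base = path.basename(fileName, ext);
  let candidate = path.join(targetDir, fileName);
  let i = 1;
  while (fs.existsSync(candidate)) {
    candidate = path.join(targetDir, `${base} (${i})${ext}`);
    i++;
  }
  return candidate;
}

function moveDownload(tempDir, targetDir, fileName) {
  fs.renameSync(path.join(tempDir, fileName), uniqueTarget(targetDir, fileName));
}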
Function which triggers the download:
await page.evaluate(
  (doc_index, arg1, arg2) =>
    openDocument(String(doc_index), String(arg1), String(arg2), 'ABC', '', '', 'XYZ'),
  doc_index, arg1, arg2
);
Expected behaviour: when the above function is called and a PDF starts downloading in the set folder, if a PDF of the same name exists, the new PDF should be renamed to something like pdf_name.pdf(1) or the like.

Any way to import a saved search file over 10mb to File Cabinet using SuiteScript?

I use SuiteScript to execute saved searches and save CSV files to the File Cabinet. However, the saved files are limited to 10MB or the script fails. Is there any way to work around the 10MB limit? I'm able to upload a file over 10MB in size through the UI, and the ability to do so using SuiteScript would be very useful.
Thanks for any insight.
Like @bknights answered, you can use the N/task module to have NetSuite create a CSV for you:
// Assumes 'task' is the N/task module loaded via define(['N/task'], ...)
var searchTask = task.create({
    taskType: task.TaskType.SEARCH
});
searchTask.savedSearchId = 51;
searchTask.filePath = 'ExportFolder/export.csv';
var searchTaskId = searchTask.submit();
If for whatever reason you need more control over the output, you can create files larger than 10MB using N/file#File.appendLine() to set the contents of the file line by line.
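A rough sketch of that line-by-line approach is below, assuming File.appendLine is available to your script type; the folder id, file name, and getNextLine data source are placeholders, so check the N/file documentation for the exact constraints before relying on it.
// Rough sketch (SuiteScript 2.x): build a large CSV line by line with
// File.appendLine(), then save it to the File Cabinet. The folder id,
// file name, and getNextLine() data source are placeholders.
define(['N/file'], function (file) {
    function execute(context) {
        var csv = file.create({
            name: 'export.csv',
            fileType: file.Type.CSV,
            folder: 123  // placeholder File Cabinet folder internal id
        });
        csv.appendLine({ value: 'col1,col2' }); // header row
        var line;
        while ((line = getNextLine()) !== null) { // placeholder data source
            csv.appendLine({ value: line });
        }
        return csv.save(); // internal id of the saved file
    }
    return { execute: execute }; // e.g. as a scheduled script entry point
});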
Use a SS2.0 N/task method to schedule the script to be published to a file id or path.
