I'm writing a unit test for a piece of code that loads a file from an AWS S3 bucket and processes it. The processing is done by Papa.parse, fed through createReadStream. I fear I may have to mock the interaction between the S3 file and Papa.parse.
My code:
const { reader, s3 } = require('../util');
const Papa = require('papaparse');
const zlib = require('zlib');

file = await reader(
  SOURCE_BUCKET_NAME,
  SOURCE_BUCKET_PREFIX,
  FILE_PREFIX,
  FILE_SUFFIX,
);
const s3file = s3.getObject({ Bucket: SOURCE_BUCKET_NAME, Key: file.Key });
return new Promise((resolve, reject) => {
  Papa.parse(s3file.createReadStream().pipe(zlib.createGunzip()), {
    encoding: 'utf8',
    header: true,
    step: (line) => {
      const d = line.data[0];
      // handling irrelevant to the mock issue
    },
    complete: async () => {
      // more handling
    },
  });
});
reader() is a utility function that wraps some checks and the S3 request and returns the file we want to load. s3 is the actual AWS S3 client that's been instantiated by the imported utility.
In my tests I don't want to touch the real S3 at all, so I want to mock both the reader() function and the s3 object, of which I'm only calling s3.getObject.
This is how I'm mocking them:
const { Readable } = require('stream');
const util = require('../util');

describe('blah', () => {
  beforeEach(() => {
    jest.mock('../util', () => jest.fn());

    const mockReadStream = jest.fn().mockImplementation(() => {
      const readable = new Readable();
      readable.push('fieldA, fieldB\n');
      readable.push('value A1, value B1\n');
      readable.push('value A2, value B2\n');
      readable.push(null);
      return readable;
    });

    const s3GetObject = jest.fn(() => ({
      createReadStream: jest.fn(() => ({
        pipe: mockReadStream,
      })),
    }));

    util.reader = jest.fn((bucketName, bucketPrefix, filePrefix, fileSuffix) => ({
      Key: `/${filePrefix}__20201021.${fileSuffix}`,
    }));
    util.s3 = jest.fn(() => ({
      getObject: s3GetObject,
    }));
  });
});
As far as I can find online, this should work, but it doesn't. The code under test loads the actual file from the real S3 bucket, not my mock.
The thing is, I'm using this same way of mocking elsewhere (const { x } = require(y) in the code, and y.x = jest.fn() in the test), and there it works fine. Although I've also used it somewhere where it didn't work if I mocked one import, but it did work if I mocked a secondary import that the first import depended upon. I have no idea why, but my workaround worked, so I didn't worry about it. This time, though, it doesn't work at all, and I really don't want to mock the secondary dependency, because then I'd have to mock the entire S3 interface. (The S3 interface I'm importing here is a simple wrapper.)
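One likely reason (my reading, not stated in the original post): the code under test destructures reader and s3 at require time, so later reassignments of util.reader and util.s3 replace properties on the exports object but never reach those already-captured local bindings. A tiny illustration with hypothetical files:

// lib.js
module.exports = { greet: () => 'real' };

// consumer.js
const { greet } = require('./lib'); // binding captured at require time
module.exports = () => greet();

// test.js
const lib = require('./lib');
const consumer = require('./consumer');
lib.greet = jest.fn(() => 'mocked'); // replaces the property on the exports object...
console.log(consumer()); // ...but consumer still returns 'real'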
I found the solution myself: manual mocks.
Create a __mocks__ folder next to the file I want to mock, and put a file with the same name in it, with these contents:
const { Readable } = require('stream');

const mockReadStream = jest.fn().mockImplementation(() => {
  const readable = new Readable();
  readable.push('fieldA, fieldB\n');
  readable.push('value A1, value B1\n');
  readable.push('value A2, value B2\n');
  readable.push(null);
  return readable;
});

const s3GetObject = () => ({
  createReadStream: () => ({
    pipe: mockReadStream,
  }),
});

const s3 = {
  getObject: s3GetObject,
};

const reader = async (bucketName, dirPrefix = '/', filePrefix, fileSuffix) => ({
  Key: `/${filePrefix}__20201021_2020102112345.${fileSuffix}`,
});

module.exports = {
  reader,
  s3,
};
Then in the unit test file, start with:
jest.mock('../../datamigrations/util');
Remove all the other mocking code and the original require. Now Jest will load the mocked util instead of the real one.
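For illustration, a minimal sketch of what the slimmed-down test file might look like (the module under test and its path are hypothetical):

jest.mock('../../datamigrations/util'); // Jest picks up __mocks__/util.js next to the real module

// hypothetical module under test that internally calls reader() and s3.getObject()
const { runMigration } = require('../../datamigrations/migration');

describe('runMigration', () => {
  it('parses the mocked CSV stream instead of hitting S3', async () => {
    await runMigration();
    // assertions against the (also mocked) database layer go here
  });
});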
The primary downside: I can't check how often the various methods have been called, but for my specific case that's not a problem. (I'm also mocking access to the database I'm pushing this data to, and I can still pass that mock a jest.fn().)
Related
I am trying to mock the S3 class inside the aws-sdk module while making sure that the methods inside the S3 class can be spied on.
I am able to mock the S3 class inside aws-sdk, however I cannot spy on the methods inside the class.
Any ideas on how to approach this problem?
These are my code snippets:
services/s3.js
const AWS = require('aws-sdk');

const s3 = new AWS.S3();

const uploadAsset = async (param) => {
  try {
    const response = await s3.upload(param).promise();
    return response;
  } catch (e) {
    console.log(e);
  }
};

module.exports = { uploadAsset };
services.s3.test.js
const AWS = require('aws-sdk');
const { uploadAsset } = require('../services/s3');

jest.mock('aws-sdk', () => {
  return {
    S3: class {
      constructor() { }
      upload(param) { // 👈 I want to make sure that this method is called
        return {
          promise: () => {
            return Promise.resolve({
              Location: `http://${param.Bucket}.s3.amazonaws.com/${param.Key}`,
              Key: param.Key,
            });
          },
        };
      }
    },
  };
});

describe('uploadAsset() functionality', () => {
  it('should upload an asset', async () => {
    const uploadPath = 'users/profilePicture';
    const base64Str = '/9j/4AAQSkZJRgABAQAAAQABAAD/';
    const buffer = Buffer.from(base64Str, 'base64');

    const s3 = new AWS.S3();

    const response = await uploadAsset({
      Bucket: 'BucketName',
      Key: `KeyName`,
      Body: buffer,
    });

    const spy = jest.spyOn(s3, 'upload');
    expect(spy).toBeCalled(); // 🚨 This spy never gets called
  });
});
Any insights would be helpful.
Thanks.
I mocked the aws-sdk successfully. However, my spy on the S3 instance never gets called.
I am almost positive that this is a scoping problem. I think my spyOn call only affects my local S3 instance, not the one created inside the service. However, I still have no idea how to test this specific scenario.
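One way around that scoping issue (a sketch, not taken from the original post) is to skip spyOn entirely and expose the upload mock as a module-level jest.fn; Jest allows the mock factory to reference variables whose names start with "mock", and the test and the service then share the same mock instance:

// services/s3.test.js (sketch)
const mockUpload = jest.fn((param) => ({
  promise: () =>
    Promise.resolve({
      Location: `http://${param.Bucket}.s3.amazonaws.com/${param.Key}`,
      Key: param.Key,
    }),
}));

// Variables prefixed with "mock" may be referenced inside the jest.mock factory.
jest.mock('aws-sdk', () => ({
  S3: jest.fn(() => ({ upload: mockUpload })),
}));

const { uploadAsset } = require('../services/s3');

it('calls S3.upload with the given params', async () => {
  await uploadAsset({ Bucket: 'BucketName', Key: 'KeyName', Body: Buffer.from('') });

  expect(mockUpload).toHaveBeenCalledTimes(1);
  expect(mockUpload).toHaveBeenCalledWith(
    expect.objectContaining({ Bucket: 'BucketName', Key: 'KeyName' })
  );
});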
I'm using TypeScript + Node.js + the pdfkit library to create PDFs and verify that they're consistent.
However, even when creating the most basic PDF, the consistency check already fails. Here's my test.
import {readFileSync, createWriteStream} from "fs";

const PDFDocument = require('pdfkit');
const assert = require('assert').strict;

const fileName = '/tmp/tmp.pdf';

async function makeSimplePDF() {
  return new Promise(resolve => {
    const stream = createWriteStream(fileName);
    const doc = new PDFDocument();
    doc.pipe(stream);
    doc.end();
    stream.on('finish', resolve);
  });
}

describe('test that pdfs are consistent', () => {
  it('simple pdf test.', async () => {
    await makeSimplePDF();
    const data: Buffer = readFileSync(fileName);

    await makeSimplePDF(); // make PDF again
    const data2: Buffer = readFileSync(fileName);

    assert.deepStrictEqual(data, data2); // fails!
  });
});
Most of the values in the two Buffers are identical but a few of them are not. What's happening here?
I believe the bytes differ because the creation time is baked into the output: the PDF metadata includes a creation timestamp, so two runs produce slightly different files. When I used mockdate (https://www.npmjs.com/package/mockdate) to pin 'now', I ended up getting consistent Buffers.
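For reference, a minimal sketch of that mockdate approach (assuming the test file above; the pinned date is arbitrary):

import MockDate from 'mockdate';

beforeAll(() => {
  // Pin the clock so every run stamps the same creation time into the PDF.
  MockDate.set('2020-01-01T00:00:00.000Z');
});

afterAll(() => {
  MockDate.reset();
});

pdfkit also exposes document metadata through doc.info (and the info constructor option), which may offer a more targeted way to fix the timestamp.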
I receive an e-mail containing a single .zip file. Right now I'm trying to upload it to GCS and then bulk-decompress the file into its own folder. Running a bulk-decompression Dataflow job for such a small task seems like overkill to me.
I was thinking about using the "unzip-stream" package, but so far I have not come up with an efficient solution to my problem.
Is it even possible to process a file like this in a Cloud Function? Or is there no way around a dedicated server that handles decompression and then uploads the content to GCS?
Here's my code:
const os = require('os');
const fs = require('fs');
const path = require('path');
const { Storage } = require('@google-cloud/storage');

const storage = new Storage();

// Node.js doesn't have a built-in multipart/form-data parsing library.
// Instead, we can use the 'busboy' library from NPM to parse these requests.
const Busboy = require('busboy');

exports.uploadZIPFile = (req, res) => {
  if (req.method !== 'POST') {
    // Return a "method not allowed" error
    return res.status(405).end();
  }

  const busboy = new Busboy({ headers: req.headers });
  const tmpdir = os.tmpdir();

  for (const a in req.headers) {
    console.log(`Header: ${a}`);
  }

  // This object will accumulate all the fields, keyed by their name
  const fields = {};

  // This code will process each non-file field in the form.
  busboy.on('field', (fieldname, val) => {
    // TODO(developer): Process submitted field values here
    console.log(`Processed field ${fieldname}: ${val}.`);
    fields[fieldname] = val;
  });

  // This object will accumulate all the uploaded files, keyed by their name.
  const uploads = {};
  const fileWrites = [];

  // This code will process each file uploaded.
  busboy.on('file', (fieldname, file, filename) => {
    // Note: os.tmpdir() points to an in-memory file system on GCF
    // Thus, any files in it must fit in the instance's memory.
    console.log(`Processed file ${filename} - ${fieldname} - ${file}`);
    const filepath = path.join(tmpdir, filename);
    uploads[fieldname] = filepath;

    const writeStream = fs.createWriteStream(filepath);
    file.pipe(writeStream);

    // File was processed by Busboy; wait for it to be written to disk.
    const promise = new Promise((resolve, reject) => {
      file.on('end', () => {
        writeStream.end();
      });
      writeStream.on('finish', resolve);
      writeStream.on('error', reject);
    });
    fileWrites.push(promise);
  });

  // Triggered once all uploaded files are processed by Busboy.
  // We still need to wait for the disk writes (saves) to complete.
  busboy.on('finish', async () => {
    await Promise.all(fileWrites);

    // TODO(developer): Process saved files here
    for (const file in uploads) {
      async function upload2bucket() {
        // Uploads a local file to the bucket
        const bucketName = 'myBucket';
        const todayDate = new Date().toISOString().slice(0, 10);
        await storage.bucket(bucketName).upload(uploads[file], {
          // Support for HTTP requests made with `Accept-Encoding: gzip`
          gzip: true,
          // By setting the option `destination`, you can change the name of the
          // object you are uploading to a bucket.
          destination:
            'zip-inbox/' + todayDate + '_' + uploads[file].substring(5),
          metadata: {
            // Enable long-lived HTTP caching headers
            // Use only if the contents of the file will never change
            // (If the contents will change, use cacheControl: 'no-cache')
            cacheControl: 'no-cache',
          },
        });
        console.log(`${uploads[file]} uploaded to ${bucketName}.`);
      }

      if (uploads[file].endsWith('.zip')) {
        await upload2bucket();
      }
      console.log(`${file}: ${uploads[file]}`);
      //fs.unlinkSync(file);
    }

    res.status(200).send('Success');
  });

  busboy.end(req.rawBody);
};
Cloud Functions are great for small tasks that take little time. When you have a requirement like this one - you want to spin up an instance for a unit of work only when needed, but cannot predict how long it will take to execute - I suggest using Cloud Run; have a look at this use case.
You may also replace your "do-everything" Cloud Function with several functions and create a pipeline with Pub/Sub messages, or with Google Workflows (released in January 2021).
I also recommend using the firebase-functions package to simplify Cloud Function definitions.
The pipeline could look like this:
Step 1: Create and schedule a function which fetches attachments from the mailbox and uploads them to Cloud Storage. After each attachment has been uploaded, it sends a Pub/Sub message with the file path as payload for the next step.
Step 2: Create another function which runs on that Pub/Sub message and unpacks the previously uploaded file.
Note: a function cannot execute for more than 9 minutes. Split the work across functions if it takes longer than that limit.
Here is pseudocode for the proposed pipeline:
import * as functions from 'firebase-functions';
import { PubSub } from '@google-cloud/pubsub';

const REGION = 'your-google-cloud-region'; // e.g. europe-west1
const PUBSUB_TOPIC_UNPACK_ZIP = 'your-topic-name-for-step-2';

/**
 * Step 1.
 */
exports.getFromEmailAndUploadToStorage = functions
  .region(REGION)
  .runWith({
    memory: '256MB',
    timeoutSeconds: 540, // Max execution duration is 9 minutes
    maxInstances: 1
  })
  .pubsub
  .schedule('every 15 minutes')
  .timeZone('Asia/Vladivostok')
  .onRun(async _ => {
    const filepath = await yourLogicToFetchEmailAndUploadToStorage();
    if (filepath) {
      const messagePayload = { filepath };
      console.log('Publishing message to the topic:', PUBSUB_TOPIC_UNPACK_ZIP);
      console.log('Payload:', JSON.stringify(messagePayload));
      await new PubSub().topic(PUBSUB_TOPIC_UNPACK_ZIP).publishMessage({
        json: messagePayload
      });
    }
  });

/**
 * Step 2.
 */
exports.unzipFilesInStorage = functions
  .region(REGION)
  .runWith({
    memory: '256MB',
    timeoutSeconds: 540, // Max execution duration is 9 minutes
    maxInstances: 1
  })
  .pubsub
  .topic(PUBSUB_TOPIC_UNPACK_ZIP)
  .onPublish(async (message, context) => {
    const { filepath } = message?.json;
    if (!filepath) {
      throw new Error('filepath is not set.');
    }
    // your unzip logic here.
    // after unzipping you can pipe the results further via PubSub messages.
  });
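For the "your unzip logic here" part, here is one possible sketch using the unzip-stream package mentioned in the question (the bucket name, destination prefix and function name are made up; double-check the stream events against the unzip-stream docs before relying on this):

const { Storage } = require('@google-cloud/storage');
const unzip = require('unzip-stream');

const storage = new Storage();

// Streams a zip object out of GCS, unpacks it on the fly and writes each entry
// back to GCS, so the archive never has to be fully buffered in memory.
function unzipGcsObject(bucketName, filepath) {
  const bucket = storage.bucket(bucketName);
  const entryWrites = [];

  return new Promise((resolve, reject) => {
    bucket
      .file(filepath)
      .createReadStream()
      .pipe(unzip.Parse())
      .on('entry', (entry) => {
        if (entry.type !== 'File') {
          entry.autodrain(); // skip directory entries
          return;
        }
        const dest = bucket.file(`unzipped/${filepath}/${entry.path}`);
        entryWrites.push(
          new Promise((res, rej) =>
            entry.pipe(dest.createWriteStream()).on('finish', res).on('error', rej)
          )
        );
      })
      .on('error', reject)
      .on('finish', () => Promise.all(entryWrites).then(resolve, reject));
  });
}

Very large archives can still hit the function's memory or time limits, in which case the Cloud Run suggestion above is the safer option.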
I'm having trouble trying to mock a module with a constructor:
// code.js
const ServiceClass = require('service-library');

const serviceInstance = new ServiceClass('some param');

exports.myFunction = () => {
  serviceInstance.doSomething();
};
And the test code:
// code.test.js
const ServiceClass = require('service-library');
jest.mock('service-library');

const { myFunction } = require('../path/to/my/code');

test('check that the service does something', () => {
  // ????
});
It's not like the documentation example Mocking Modules, because you need to instantiate the class after importing the module. And it isn't like Mocking a Function either.
How can I mock this doSomething() function while testing?
For reference, I'm trying to mock @google-cloud/* packages here, and I have a few projects that could take advantage of this.
You need to mock the whole module first so that it returns a Jest mock function. Then import it into your test and set the mock's implementation to a function that returns an object holding the spy for doSomething. For the test there is no difference between a class called with new and a function called with new.
import ServiceLibrary from 'service-library'
jest.mock( 'service-library', () => jest.fn())
const doSomething = jest.fn()
ServiceLibrary.mockImplementation(() => ({doSomething}))
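To complete the picture, a hypothetical test body using that setup (the module under test is required only after the mock implementation is configured, because the instance is created at import time):

const { myFunction } = require('../path/to/my/code'); // load after the mock is wired up

test('check that the service does something', () => {
  myFunction();
  expect(ServiceLibrary).toHaveBeenCalledWith('some param'); // the constructor call from code.js
  expect(doSomething).toHaveBeenCalledTimes(1);
});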
Following @andreas-köberle's solution, I was able to mock @google-cloud/bigquery like so:
// mock bigquery library
const BigQuery = require('@google-cloud/bigquery');
jest.mock('@google-cloud/bigquery', () => jest.fn());

const load = jest.fn(() => ({ '#type': 'bigquery#load_job' }));
const table = jest.fn(() => ({ load }));
const dataset = jest.fn(() => ({ table }));
BigQuery.mockImplementation(() => ({ dataset }));

// mock cloud storage library
const { Storage } = require('@google-cloud/storage');
jest.mock('@google-cloud/storage');

const file = jest.fn(name => ({ '#type': 'storage#file', name }));
const bucket = jest.fn(() => ({ file }));
Storage.mockImplementation(() => ({ bucket }));
I'm leaving this here as a reference in case someone else googles something similar. But to be clear, that's just a particularization of @andreas-köberle's answer.
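As a hypothetical sanity check against that setup, a test exercising code that calls new BigQuery().dataset(...).table(...).load(...) can assert directly on the exposed mocks:

test('loads the file into the expected table', async () => {
  await runMyLoadJob(); // hypothetical function under test that uses the BigQuery client

  expect(dataset).toHaveBeenCalledWith('my_dataset'); // assumed dataset name
  expect(table).toHaveBeenCalledWith('my_table');     // assumed table name
  expect(load).toHaveBeenCalledTimes(1);
});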
I have an AWS Lambda that has to:
read a YAML file from S3,
transform the content into an object,
then do some wonderful stuff with it.
I can handle the last point, but I do not know how to read and parse that YAML file.
Here is the code I need to complete:
const AWS = require('aws-sdk');
const YAML = require('js-yaml');

const S3 = new AWS.S3();

module.exports.handler = (event, context, callback) => {
  const [record] = event.Records;
  const bucket = record.s3.bucket.name;
  const { key } = record.s3.object;
  const params = { Bucket: bucket, Key: key };
  console.log(params);

  S3.getObject(params).promise()
    .then((x) => {
      // ?????????????????????????????????????????
      // HOW TO DO SOME MAGIC AND GET THE OBJECT ?
    })
    .then((fileContentObject) => {
      // DO SOME WONDERFUL STUFF (example: console.log :) )
      console.log(JSON.stringify(fileContentObject))
    })
    .then(() => callback)
    .catch(callback);
};
Feel free to suggest another approach to read and parse the YAML file. I prefer a Promise-based approach if possible.
I've finally solved the problem. "It was easy", of course!
Here is the code in lambda:
S3.getObject(params).promise()
  .then((configYaml) => {
    // Get the content of the file as a string
    // It works on any text file, should your config be in another format
    const data = configYaml.Body.toString('utf-8');
    console.log(data);

    // Parse the string, which is a yaml in this case
    // Please note that `YAML.parse()` is deprecated
    return YAML.load(data);
  })
  .then((config) => {
    // Do some wonderful stuff with the config object
    console.log(`• Config: ${JSON.stringify(config)}`);
    return null;
  })
  .then(callback)
  .catch(callback);
All I was really asking for was: YAML.load(configYaml.Body.toString('utf-8')).
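For completeness, the same two steps in async/await form (a sketch; it assumes an async Lambda handler and the same S3 and YAML requires as above):

module.exports.handler = async (event) => {
  const [record] = event.Records;
  const params = { Bucket: record.s3.bucket.name, Key: record.s3.object.key };

  // Read the object and parse its body as YAML in one go
  const { Body } = await S3.getObject(params).promise();
  const config = YAML.load(Body.toString('utf-8'));

  console.log(`• Config: ${JSON.stringify(config)}`);
  return config;
};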