Deleting keys from Redis using Node.js

I am trying to delete keys in Redis using the code below, but for some reason it's not deleting the keys, even though the console logging works fine. Can someone please point out what I am missing here?
import * as redis from 'redis';
import { RedisClient } from 'redis';
let redisClient: RedisClient = redis.createClient(redisPort, redisHostName, {
auth_pass: authPass,
no_ready_check: true,
prefix: KEY_PREFIX,
retry_strategy: redisRetryStrategy,
tls: { servername: hostName },
});
let cursor = '0';
const scan = (pattern: string, callback: () => void) => {
redisClient.scan(
cursor,
'MATCH',
pattern,
'COUNT',
'1000',
async (err, reply) => {
console.log(err);
if (err) {
throw err;
}
cursor = reply[0];
const keys = reply[1];
console.log(keys);
console.log(keys.length);
console.log(keys[1]);
if (keys) {
await redisClient.del(keys[1], (deleteErr, deleteSuccess) => {
console.log(`err ==> ${deleteErr}`);
console.log(deleteSuccess);
});
console.log(` key 0 is : ${keys[0]}`);
redisClient.del(keys[0]);
// keys.forEach((key) => {
// redisClient.del(key, (deleteErr, deleteSuccess) => {
// console.log(`err ==> ${deleteErr}`);
// console.log(deleteSuccess);
// });
// });
}
if (cursor !== '0') {
console.log(cursor);
return scan(pattern, callback);
}
return callback();
}
);
};
export const deleteResetPin = (pattern: string) => {
scan(pattern, () => {
console.log('Scan Complete');
});
};
Requirement: I want to delete all keys matching the pattern using Node.js.

With the commented-out part (starting at keys.forEach) uncommented, running the scan function will delete all the keys that match the pattern, but there are a couple of things to fix/improve here:
the callback (and therefore also the log) will be called before the keys are deleted.
if scan replies with an error, the error will be uncaught and the process will exit.
you're mixing callbacks and promises.
you can delete a bunch of keys at once.
Here is a "promised" version of the function:
const { promisify } = require('util'),
  client = require('redis').createClient(),
  scanAsync = promisify(client.scan).bind(client),
  delAsync = promisify(client.del).bind(client);

async function scanAndDelete(pattern: string): Promise<void> {
  let cursor = '0';
  do {
    const reply = await scanAsync(cursor, 'MATCH', pattern, 'COUNT', '1000');
    cursor = reply[0];
    // SCAN can return an empty batch; DEL with no keys would raise an error
    if (reply[1].length) {
      await delAsync(reply[1]);
    }
  } while (cursor !== '0');
}
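A minimal way to call it (the pattern below is just a placeholder, not from the original post):

scanAndDelete('reset:pin:*')
  .then(() => console.log('Scan and delete complete'))
  .catch((err) => console.error('Scan and delete failed', err));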

For Node Redis >= 4
const redisClient = require('redis').createClient();

async function scanAndDelete(pattern) {
  let cursor = 0;
  do {
    // delete any paths with query string matches
    const reply = await redisClient.scan(cursor, { MATCH: pattern, COUNT: 1000 });
    cursor = reply.cursor;
    for (const key of reply.keys) {
      await redisClient.del(key);
    }
  } while (cursor !== 0);
}
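With Node Redis v4 the client also has to be connected before issuing commands. A minimal sketch of wiring it up (the pattern is a placeholder):

(async () => {
  await redisClient.connect();
  await scanAndDelete('reset:pin:*');
  console.log('Scan and delete complete');
  await redisClient.quit();
})();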

Related

How to do a ioredis scanStream on node-redis in TypeScript?

The following code works with the ioredis scanStream. I am trying to do the same with node-redis in TypeScript.
It scans the keyspace as a stream and, when the stream reaches 'end', returns the collected gameIds to an Express client.
const stream = redis.scanStream({
match: "kaboom:moves:*",
});
const gameIds: any = [];
This is my attempt:
const stream = client.scanIterator({
TYPE: "string", // `SCAN` only
MATCH: "kaboom:moves:*",
COUNT: 100,
});
const LIMIT = 3;
const asyncIterable = {
[Symbol.asyncIterator]() {
let i = 0;
return {
next() {
const done = i === LIMIT;
const value = done ? undefined : i++;
return Promise.resolve({ value, done });
},
return() {
// This will be reached if the consumer called 'break' or 'return' early in the loop.
return { done: true };
},
};
},
};
(async () => {
for await (const keys of stream) {
gameIds.push(keys.split(":")[2]);
}
})();
const gameIds: any = [];
How do I write this part (from the ioredis version below) so that it will work with node-redis in TypeScript?
stream.on('data', (keys: any) => {
//Extract the gameId from the key and append to gameIds array.
keys.forEach((key: any) => gameIds.push(key.split(':')[2]));
});
stream.on('end', () => {
res.status(200).json({
data: {
gameIds,
length: gameIds.length
},
status: 'success',
})
})
scanIterator was correct (note that the asyncIterable helper below isn't actually needed; iterating the scan iterator directly is enough):
const stream = client.scanIterator({
TYPE: 'string', // `SCAN` only
MATCH: 'kaboom:moves:*',
COUNT: 100
});
const LIMIT = 3;
const asyncIterable = {
[Symbol.asyncIterator]() {
let i = 0;
return {
next() {
const done = i === LIMIT;
const value = done ? undefined : i++;
return Promise.resolve({ value, done });
},
return() {
// This will be reached if the consumer called 'break' or 'return' early in the loop.
return { done: true };
}
};
}
};
(async () => {
for await (const keys of stream) {
gameIds.push(keys.split(':')[2]);
}
})();
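Putting it together, a rough sketch of the Express handler (the route path and the app and client variables are assumptions; client is a connected node-redis v4 client):

app.get('/game-ids', async (req, res, next) => {
  try {
    const gameIds = [];
    const stream = client.scanIterator({
      TYPE: 'string',
      MATCH: 'kaboom:moves:*',
      COUNT: 100
    });
    // scanIterator yields keys one at a time, so each value is a plain key string
    for await (const key of stream) {
      gameIds.push(key.split(':')[2]);
    }
    res.status(200).json({
      data: { gameIds, length: gameIds.length },
      status: 'success'
    });
  } catch (err) {
    next(err);
  }
});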

Feathersjs validation method using ldapjs module problem

I want to write a function that adds the params of an LDAP server only after checking that the bind with the server succeeds; otherwise the data should not be added to the DB. I have to use FeathersJS. I tried to write a couple of lines to validate the params before adding them to the DB:
const ldap = require("ldapjs"); — but the problem is that when I do console.log(errors) inside and outside of the client.bind callback, I noticed that the result is only visible inside the callback and not in the rest of the code! I need it to be visible to the other parts as well so that I can use it, and I want to know how to fix that.
This is the code:
module.exports = (ldapConnexion, mode = "create") => {
const errors = {
url_path: [],
browsingAccount: [],
password: [],
userSearchBase:[],
};
const client = ldap.createClient({ url: ldapConnexion.url_path })
client.bind(ldapConnexion.browsingAccount, ldapConnexion.password, function (err) {
if (err) {
errors.browsingAccount.push("unvalid credentials:" + err)
console.log(err)
}
})
console.log(errors)
const hasErrors = Object.keys(errors).reduce(
(res, errorType) =>
res || (errors[errorType] && errors[errorType].length > 0),
false
);
return { errors, hasErrors };
}
To do that, we can implement the code this way, if it can interest someone.
In a different file do:
const ldap = require('ldapjs');
module.exports = (ldapConnexion, mode = "create",cb) => {
const errors = {
url_path: [],
browsingAccount: [],
password: [],
userSearchBase: [],
};
var opts ={
rejectUnauthorized: false, // To force the client not to check self-signed certificates
}
//create the ldap client with the list of informations mentionned in the form
var client = ldap.createClient({
url:ldapConnexion.url_path, tlsOptions:opts
});
client.bind(ldapConnexion.browsingAccount, ldapConnexion.password, function (err) {
if (err) {
errors.browsingAccount.push("unvalid credentials:" + err.message);
//console.log(errors);
cb(errors, true);
}
// unbind once, on both the success and the error path
client.unbind(function(err){if(err){console.log(err)}});
})
// console.log(errors)
if (!ldapConnexion.url_path) {
errors.url_path.push("Url obligatoire");
}
if (!ldapConnexion.browsingAccount) {
errors.browsingAccount.push("le compte lecteur est obligatoire");
}
if (!ldapConnexion.password) {
errors.password.push("le mot de passe est obligatoire");
}
const hasErrors = Object.keys(errors).reduce(
(res, errorType) =>
res || (errors[errorType] && errors[errorType].length > 0),
false
);
cb(errors, hasErrors)
}
in the hooks :
const validateConnexionHook = function (mode = "create") {
return context => {
return new Promise((resolve, reject) => {
const ldapConnexion = context.data || {};
validateConnexion(ldapConnexion, mode, (errors, hasErrors) => {
console.log('**************')
// console.log(errors);
console.log(hasErrors);
if (hasErrors) {
context.error = errors;
reject(errors)
}
else setTimeout(function () { resolve(); }, 3000);
})
})
}
}
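As an alternative to the callback parameter, the bind itself can be wrapped in a Promise so a hook can await the validation directly. A minimal sketch (same ldapjs client API; the helper name and simplified error handling are my own):

const ldap = require('ldapjs');

function validateBind(ldapConnexion) {
  return new Promise((resolve) => {
    const client = ldap.createClient({ url: ldapConnexion.url_path });
    client.bind(ldapConnexion.browsingAccount, ldapConnexion.password, (err) => {
      client.unbind(() => {});
      // resolve with an error message (or null) instead of rejecting,
      // so the caller can merge it into its own errors object
      resolve(err ? 'unvalid credentials: ' + err.message : null);
    });
  });
}

// usage inside a hook:
// const bindError = await validateBind(context.data);
// if (bindError) errors.browsingAccount.push(bindError);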

Nodejs Streams - Help find my memory leak

So I have a process that selects from a table. I partition my select programmatically into 20 sub-selects. I then go through each one of those selects and stream its data to an indexing client (Solr). With every select, memory jumps up and holds until I get an OOM.
I logged when each query went off; those log timestamps (charts not included here) correlate with each jump in the memory graph.
14 of 20 queries ran before I OOMed.
I see this behavior with similar code that runs a delta every 15 minutes. Every delta holds some memory until it eventually causes the server to crash with an OOM (from which it recovers).
I have tried to track down issues with the delta in the past but gave up and just created a way to gracefully restart. What am I missing here?
Here is the entire code chain that makes this work... I know it's a lot to look through, but I figured as much detail as possible would help.
Library Stack:
"node": "~11.10.1"
"knex": "^0.20.9",
"oracledb": "^4.0.0"
"camelize2": "^1.0.0"
Knex - DB connection factory
'use strict'
const objection = require('objection')
const knex = require('knex')
module.exports = function ObjectionFactory(log) {
class MyObjection extends objection.Model {
constructor() {
super()
}
static get tableName() {
return ''
}
}
MyObjection.pickJsonSchemaProperties = true
log.info('Connecting to Oracle Pluggable...', {
host: 'myHost',
username: 'myUser',
database: 'myDatabase"
})
const knexInstance = knex({
client: 'oracledb',
connection: 'connectionInfo',
pool: {
min: 0,
max: 10
},
acquireConnectionTimeout: 10000
})
process.once('SIGINT', () => {
log.info('Disconnecting from Oracle Pluggable.')
knexInstance.destroy()
.then(() => process.exit(0))
.catch(() => process.exit(1))
})
// Shut down cleanly for nodemon
process.once('SIGUSR2', () => {
log.info('Disconnecting from Oracle Pluggable')
knexInstance.destroy()
.then(() => process.kill(process.pid, 'SIGUSR2'))
.catch(() => process.kill(process.pid, 'SIGUSR2'))
})
const knexBoundClass = MyObjection.bindKnex(knexInstance)
knexBoundClass.tag = 'Oracle Connection'
return knexBoundClass
}
My Select Stream Code:
const oracledb = require('oracledb')

module.exports = function oracleStream(log, MyObjection) {
const knex = MyObjection.knex()
const fetchArraySize = 10000
const outFormat = oracledb.OBJECT
return {
selectStream
}
async function selectStream(sql, bindings = [], fetchSize = fetchArraySize) {
let connection = await knex.client.acquireConnection()
log.info(`Fetch size is set to ${fetchSize}`)
let select = connection.queryStream(sql, bindings, {
fetchArraySize: fetchSize,
outFormat: outFormat
})
select.on('error', (err) => {
log.error('Oracle Error Event', err)
knex.client.releaseConnection(connection)
})
select.on('end', () => {
log.info('Destroying the Stream')
select.destroy()
})
select.on('close', () => {
log.info('Oracle Close Event')
knex.client.releaseConnection(connection)
select = null
connection = null
})
return select
}
}
My index/stream pipeline code
async function indexJob() {
const reindexStartTime = new moment().local()
let rowCount = 0
log.info('Reindex Started at', reindexStartTime.format())
let queryNumber = 1
const partitionedQueries = ['Select * from table where 1=1', 'Select * from table where 2=2', 'Select * from table where 3=3'] //There would be 20 queries in this array
let partitionedQueriesLength = partitionedQueries.length
while (partitionedQueries.length > 0) {
let query = partitionedQueries.pop()
log.info('RUNNING Query', {
queryNumber: `${queryNumber++} of ${partitionedQueriesLength}`,
query: query
})
let databaseStream = await oracleStream.selectStream(query, [], 10000) //10k represents the oracle fetch size
databaseStream.on('data', () => {
rowCount++
})
let logEveryFiveSec = setInterval(() => {
log.info('Status: ', getReindexInfo(reindexStartTime, rowCount))
}, 5000)
try {
let pipeline = util.promisify(stream.pipeline)
await pipeline(
databaseStream,
camelizeAndStringify(),
streamReindex(core)
)
} catch (err) {
databaseStream.destroy(err)
throw new JobFailedError(err)
} finally {
databaseStream.destroy()
clearInterval(logEveryFiveSec)
}
}
}
function camelizeAndStringify() {
let first = true
const serialize = new Transform({
objectMode: true,
highWaterMark: 1000,
transform(chunk, encoding, callback) {
if (first) {
this.push('[' + JSON.stringify(camelize(chunk)))
first = false
} else {
this.push(',' + JSON.stringify(camelize(chunk)))
}
callback()
chunk = null
},
flush(callback) {
this.push(']')
callback()
}
})
return serialize
}
function streamReindex(core) {
const updateUrl = baseUrl + core + '/update'
const options = {
method: 'POST',
headers: {
'Content-Type': 'application/json'
},
'auth': `${user.username}:${user.password}`,
}
let postStream = https.request(updateUrl, options, (res) => {
let response = {
status: {
code: res.statusCode,
message: res.statusMessage
},
headers: res.headers,
}
if (res.statusCode !== 200) {
postStream.destroy(new Error(JSON.stringify(response)))
}
})
postStream.on('error', (err)=>{
throw new Error(err)
})
postStream.on('socket', (socket) => {
socket.setKeepAlive(true, 110000)
})
return postStream
}
EDIT 1:
I tried removing knex from the equation by using a single connection to my DB with the oracledb library. Unfortunately, I still see the same behavior.
This is how I changed my select so it does not use knex:
async function selectStream(sql, bindings = [], fetchSize = fetchArraySize) {
const connectionInfo = {
user: info.user,
password: info.password,
connectString: info.host +'/'+info.database
}
const connection = await oracledb.getConnection(connectionInfo)
log.info('Connection was successful!')
log.info(`Fetch size is set to ${fetchSize}`)
let select = connection.queryStream(sql, bindings, {
fetchArraySize: fetchSize,
outFormat: outFormat
})
select.on('error', async (err) => {
log.error('Oracle Error Event', err)
await connection.close()
})
select.on('end', () => {
log.info('Destroying the Stream')
select.destroy()
})
select.on('close', async () => {
log.info('Oracle Close Event')
await connection.close()
})
return select
}

koa2 + koa-router + mysql keeps returning 'Not Found'

Background
I am using koa2 with some middleware to build a basic API framework. But when I use "ctx.body" to send the response in my router, the client side always receives "Not Found".
My code
./app.js
const Koa = require('koa');
const app = new Koa();
const config = require('./config');
//Middlewares
const loggerAsync = require('./middleware/logger-async')
const bodyParser = require('koa-bodyparser')
const jsonp = require('koa-jsonp')
app.use(loggerAsync())
app.use(bodyParser())
app.use(jsonp());
//Router
const gateway = require('./router/gateway')
app.use(gateway.routes(), gateway.allowedMethods());
app.use(async(ctx, next) => {
await next();
ctx.response.body = {
success: false,
code: config.code_system,
message: 'wrong path'
}
});
app.listen(3000);
./router/gateway.js
/**
* Created by Administrator on 2017/4/11.
*/
const Router = require('koa-router');
const gateway = new Router();
const df = require('../db/data-fetcher');
const config = require('../config');
const moment = require('moment');
const log4js = require('log4js');
// log4js.configure({
// appenders: { cheese: { type: 'file', filename: 'cheese.log' } },
// categories: { default: { appenders: ['cheese'], level: 'error' } }
// });
const logger = log4js.getLogger('cheese');
logger.setLevel('ERROR');
gateway.get('/gateway', async(ctx, next) => {
let time = ctx.query.time;
if (!time) {
ctx.body = {
success: false,
code: config.code_system,
message: 'Please input running times'
}
} else {
try {
let r = await df(`insert into gateway (g_time, g_result, g_date) values (${time}, '',now())`);
return ctx.body = {
success: true,
code: config.code_success
}
} catch (error) {
logger.error(error.message);
}
}
});
module.exports = gateway;
Then a DB wrapper (MySQL):
./db/async-db.js
const mysql = require('mysql');
const config = require('../config');
const pool = mysql.createPool({
host: config.database.HOST,
user: config.database.USERNAME,
password: config.database.PASSWORD,
database: config.database.DATABASE
})
let query = (sql, values) => {
return new Promise((resolve, reject) => {
pool.getConnection(function (err, connection) {
if (err) {
reject(err)
} else {
connection.query(sql, values, (err, rows) => {
if (err) {
reject(err)
} else {
resolve(rows)
}
connection.release()
})
}
})
})
}
module.exports = query
./db/data-fetcher.js
const query = require('./async-db')
async function performQuery(sql) {
let dataList = await query(sql)
return dataList
}
module.exports = performQuery;
My running result
When I launch the server on port 3000 and access it via http://localhost:3000/gateway?time=5, it always returns "Not Found". But as you can see, I have already used
return ctx.body = {
success: true,
code: config.code_success
}
to send the response. I debugged and found that the database processing went fine and the new data was inserted correctly.
When I remove the DB insert line below, it works and returns the success info.
let r = await df(`insert into gateway (g_time, g_result, g_date) values (${time}, '',now())`);
Is there anything wrong?
Thanks a lot!
Update 2017/04/27
Now I have found the problem. It's due to my custom middleware
const loggerAsync = require('./middleware/logger-async')
The code is like the following:
function log( ctx ) {
console.log( ctx.method, ctx.header.host + ctx.url )
}
module.exports = function () {
return function ( ctx, next ) {
return new Promise( ( resolve, reject ) => {
// perform the middleware's operation
log( ctx )
resolve()
return next()
}).catch(( err ) => {
return next()
})
}
}
I changed it to async/await way then everything is working well.
Could anyone please tell me what's wrong with this middleware?
I guess your problem is in the ./db/data-fetcher.js function. When you are calling
let r = await df(`insert ....`)
your df function should return a promise.
So try to rewrite your ./db/data-fetcher.js like this (not tested):
const query = require('./async-db')
function performQuery(sql) {
return new Promise((resolve, reject) => {
query(sql).then(
result => {
resolve(result)
}
)
})
}
module.exports = performQuery;
Hope that helps.
correct middleware:
function log( ctx ) {
console.log( ctx.method, ctx.header.host + ctx.url )
}
module.exports = function () {
return function ( ctx, next ) {
log( ctx );
return next()
}
}
Reason: once resolve() was called, the promise chain was considered complete and the response had already been sent to the client; although the remaining middleware still ran, the response was already gone.
It seems that if you want to use a plain function as middleware, you have to return the result of next().
See also: nodejs (koa): Can't set headers after they are sent.
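For reference, the async/await form of the same logger middleware (what the asker describes switching to) would look roughly like this:

module.exports = function () {
  return async function (ctx, next) {
    log(ctx) // same log() helper as above
    await next()
  }
}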

Node.js & Amazon S3: How to iterate through all files in a bucket?

Is there any Amazon S3 client library for Node.js that allows listing of all files in an S3 bucket?
The best-known ones, aws2js and knox, don't seem to have this functionality.
Using the official aws-sdk:
var allKeys = [];
function listAllKeys(marker, cb)
{
s3.listObjects({Bucket: s3bucket, Marker: marker}, function(err, data){
allKeys.push(data.Contents);
if(data.IsTruncated)
listAllKeys(data.NextMarker, cb);
else
cb();
});
}
see s3.listObjects
Edit 2017:
Same basic idea, but listObjectsV2( ... ) is now recommended and uses a ContinuationToken (see s3.listObjectsV2):
var allKeys = [];
function listAllKeys(token, cb)
{
var opts = { Bucket: s3bucket };
if(token) opts.ContinuationToken = token;
s3.listObjectsV2(opts, function(err, data){
allKeys = allKeys.concat(data.Contents);
if(data.IsTruncated)
listAllKeys(data.NextContinuationToken, cb);
else
cb();
});
}
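Either version can be kicked off the same way (assuming s3 and s3bucket are already configured as in the snippets above):

listAllKeys(undefined, function () {
  // allKeys has now been filled from every page of results
  console.log('Done listing; collected', allKeys.length, 'entries');
});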
Using AWS-SDK v3 and Typescript
import {
paginateListObjectsV2,
S3Client,
S3ClientConfig,
} from '@aws-sdk/client-s3';
/* // For Deno
import {
paginateListObjectsV2,
S3Client,
S3ClientConfig,
} from "https://deno.land/x/aws_sdk#v3.32.0-1/client-s3/mod.ts"; */
const s3Config: S3ClientConfig = {
credentials: {
accessKeyId: 'accessKeyId',
secretAccessKey: 'secretAccessKey',
},
region: 'us-east-1',
};
const getAllS3Files = async (client: S3Client, s3Opts) => {
const totalFiles = [];
for await (const data of paginateListObjectsV2({ client }, s3Opts)) {
totalFiles.push(...(data.Contents ?? []));
}
return totalFiles;
};
const main = async () => {
const client = new S3Client(s3Config);
const s3Opts = { Bucket: 'bucket-xyz' };
console.log(await getAllS3Files(client, s3Opts));
};
main();
For AWS-SDK v2 Using Async Generator
Import S3
const { S3 } = require('aws-sdk');
const s3 = new S3();
Create a generator function to retrieve the full list of files:
async function* listAllKeys(opts) {
opts = { ...opts };
do {
const data = await s3.listObjectsV2(opts).promise();
opts.ContinuationToken = data.NextContinuationToken;
yield data;
} while (opts.ContinuationToken);
}
Prepare the AWS parameters, based on the API docs:
const opts = {
Bucket: 'bucket-xyz' /* required */,
// ContinuationToken: 'STRING_VALUE',
// Delimiter: 'STRING_VALUE',
// EncodingType: url,
// FetchOwner: true || false,
// MaxKeys: 'NUMBER_VALUE',
// Prefix: 'STRING_VALUE',
// RequestPayer: requester,
// StartAfter: 'STRING_VALUE'
};
Use generator
async function main() {
// using for of await loop
for await (const data of listAllKeys(opts)) {
console.log(data.Contents);
}
}
main();
That's it.
Or Lazy Load
async function main() {
const keys = listAllKeys(opts);
console.log(await keys.next());
// {value: {…}, done: false}
console.log(await keys.next());
// {value: {…}, done: false}
console.log(await keys.next());
// {value: undefined, done: true}
}
main();
Or Use generator to make Observable function
const lister = (opts) => (o$) => {
let needMore = true;
const process = async () => {
for await (const data of listAllKeys(opts)) {
o$.next(data);
if (!needMore) break;
}
o$.complete();
};
process();
return () => (needMore = false);
};
Use this observable function with RxJS:
// Using Rxjs
const { Observable } = require('rxjs');
const { flatMap } = require('rxjs/operators');
function listAll() {
return Observable.create(lister(opts))
.pipe(flatMap((v) => v.Contents))
.subscribe(console.log);
}
listAll();
Or use this observable function with the Node.js EventEmitter:
const EventEmitter = require('events');
const _eve = new EventEmitter();
async function onData(data) {
// will be called for each set of data
console.log(data);
}
async function onError(error) {
// will be called if any error
console.log(error);
}
async function onComplete() {
// will be called when data completely received
}
_eve.on('next', onData);
_eve.on('error', onError);
_eve.on('complete', onComplete);
const stop = lister(opts)({
next: (v) => _eve.emit('next', v),
error: (e) => _eve.emit('error', e),
complete: (v) => _eve.emit('complete', v),
});
Here's Node code I wrote to assemble the S3 objects from truncated lists.
var params = {
Bucket: <yourbucket>,
Prefix: <yourprefix>,
};
var s3DataContents = []; // Single array of all combined S3 data.Contents
function s3Print() {
if (program.al) {
// --al: Print all objects
console.log(JSON.stringify(s3DataContents, null, " "));
} else {
// --b: Print key only, otherwise also print index
var i;
for (i = 0; i < s3DataContents.length; i++) {
var head = !program.b ? (i+1) + ': ' : '';
console.log(head + s3DataContents[i].Key);
}
}
}
function s3ListObjects(params, cb) {
s3.listObjects(params, function(err, data) {
if (err) {
console.log("listS3Objects Error:", err);
} else {
var contents = data.Contents;
s3DataContents = s3DataContents.concat(contents);
if (data.IsTruncated) {
// Set Marker to last returned key
params.Marker = contents[contents.length-1].Key;
s3ListObjects(params, cb);
} else {
cb();
}
}
});
}
s3ListObjects(params, s3Print);
Pay attention to listObject's documentation of NextMarker, which is NOT always present in the returned data object, so I don't use it at all in the above code ...
NextMarker — (String) When the response is truncated (the IsTruncated element value in the response is true), you can use the key name in this field as the marker in the subsequent request to get the next set of objects. Amazon S3 lists objects in alphabetical order. Note: this element is returned only if you have the delimiter request parameter specified. If the response does not include the NextMarker and it is truncated, you can use the value of the last Key in the response as the marker in the subsequent request to get the next set of object keys.
The entire program has now been pushed to https://github.com/kenklin/s3list.
In fact, aws2js supports listing of objects in a bucket at a low level via the s3.get() method call. To do it, one has to pass the prefix parameter, which is documented on the Amazon S3 REST API page:
var s3 = require('aws2js').load('s3', awsAccessKeyId, awsSecretAccessKey);
s3.setBucket(bucketName);
var folder = encodeURI('some/path/to/S3/folder');
var url = '?prefix=' + folder;
s3.get(url, 'xml', function (error, data) {
console.log(error);
console.log(data);
});
The data variable in the above snippet contains a list of all objects in the bucketName bucket.
Published knox-copy when I couldn't find a good existing solution. Wraps all the pagination details of the Rest API into a familiar node stream:
var knoxCopy = require('knox-copy');
var client = knoxCopy.createClient({
key: '<api-key-here>',
secret: '<secret-here>',
bucket: 'mrbucket'
});
client.streamKeys({
// omit the prefix to list the whole bucket
prefix: 'buckets/of/fun'
}).on('data', function(key) {
console.log(key);
});
If you're listing fewer than 1000 files a single page will work:
client.listPageOfKeys({
prefix: 'smaller/bucket/o/fun'
}, function(err, page) {
console.log(page.Contents); // <- Here's your list of files
});
Meekohi provided a very good answer, but the (new) documentation states that NextMarker can be undefined. When this is the case, you should use the last key as the marker.
So his codesample can be changed into:
var allKeys = [];
function listAllKeys(marker, cb) {
s3.listObjects({Bucket: s3bucket, Marker: marker}, function(err, data){
allKeys.push(data.Contents);
if(data.IsTruncated)
listAllKeys(data.NextMarker || data.Contents[data.Contents.length-1].Key, cb);
else
cb();
});
}
Couldn't comment on the original answer since I don't have the required reputation. Apologies for the bad mark-up btw.
I am using this version with async/await.
This function will return the content in an array.
I'm also using the NextContinuationToken instead of the Marker.
async function getFilesRecursivelySub(param) {
// Call the function to get list of items from S3.
let result = await s3.listObjectsV2(param).promise();
if(!result.IsTruncated) {
// Recursive terminating condition.
return result.Contents;
} else {
// Recurse it if results are truncated.
param.ContinuationToken = result.NextContinuationToken;
return result.Contents.concat(await getFilesRecursivelySub(param));
}
}
async function getFilesRecursively() {
let param = {
Bucket: 'YOUR_BUCKET_NAME'
// Can add more parameters here.
};
return await getFilesRecursivelySub(param);
}
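A quick usage sketch (assuming the s3 client is already configured):

getFilesRecursively()
  .then((contents) => console.log('Found', contents.length, 'objects'))
  .catch((err) => console.error(err));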
This is an old question and I guess the AWS JS SDK has changed a lot since it was asked. Here's yet another way to do it these days:
s3.listObjects({Bucket:'mybucket', Prefix:'some-pfx'}).
on('success', function handlePage(r) {
//... handle page of contents r.data.Contents
if(r.hasNextPage()) {
// There's another page; handle it
r.nextPage().on('success', handlePage).send();
} else {
// Finished!
}
}).
on('error', function(r) {
// Error!
}).
send();
If you want to get list of keys only within specific folder inside a S3 Bucket then this will be useful.
Basically, the listObjects function will start searching from the Marker we set and will search up to the MaxKeys: 1000 limit, so it goes folder by folder and returns the first 1000 keys it finds across the different folders in the bucket.
Consider that I have many folders inside my bucket with a prefix like prod/some date/, e.g. prod/2017/05/12/, prod/2017/05/13/, etc.
I want to fetch the list of objects (file names) only within the prod/2017/05/12/ folder, so I specify prod/2017/05/12/ as my start and prod/2017/05/13/ [your next folder name] as my end, and in the code I break the loop when I encounter the end.
Each Key in data.Contents will look like this.
{ Key: 'prod/2017/05/13/4bf2c675-a417-4c1f-a0b4-22fc45f99207.jpg',
LastModified: 2017-05-13T00:59:02.000Z,
ETag: '"630b2sdfsdfs49ef392bcc16c833004f94ae850"',
Size: 134236366,
StorageClass: 'STANDARD',
Owner: { }
}
Code:
var list = [];
function listAllKeys(s3bucket, start, end) {
s3.listObjects({
Bucket: s3bucket,
Marker: start,
MaxKeys: 1000,
}, function(err, data) {
if (data.Contents) {
for (var i = 0; i < data.Contents.length; i++) {
var key = data.Contents[i].Key; //See above code for the structure of data.Contents
if (key.substring(0, end.length) !== end) {
list.push(key);
} else {
break; // break the loop if end arrived
}
}
console.log(list);
console.log('Total - ', list.length);
}
});
}
listAllKeys('BucketName', 'prod/2017/05/12/', 'prod/2017/05/13/');
Output:
[ 'prod/2017/05/12/05/4bf2c675-a417-4c1f-a0b4-22fc45f99207.jpg',
'prod/2017/05/12/05/a36528b9-e071-4b83-a7e6-9b32d6bce6d8.jpg',
'prod/2017/05/12/05/bc4d6d4b-4455-48b3-a548-7a714c489060.jpg',
'prod/2017/05/12/05/f4b8d599-80d0-46fa-a996-e73b8fd0cd6d.jpg',
... 689 more items ]
Total - 692
I ended up building a wrapper function around listObjectsV2. It works the same way and takes the same parameters, but runs recursively until IsTruncated=false and returns all the keys found as an array in the second parameter of the callback function.
const AWS = require('aws-sdk')
const s3 = new AWS.S3()
function listAllKeys(params, cb)
{
var keys = []
if(params.data){
keys = keys.concat(params.data)
}
delete params['data']
s3.listObjectsV2(params, function(err, data){
if(err){
cb(err)
} else if (data.IsTruncated) {
params['ContinuationToken'] = data.NextContinuationToken
params['data'] = data.Contents
listAllKeys(params, cb)
} else {
keys = keys.concat(data.Contents)
cb(null,keys)
}
})
}
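It can be called like the plain SDK method; a small usage sketch (bucket name and prefix are placeholders):

listAllKeys({ Bucket: 'bucket-xyz', Prefix: 'some/prefix/' }, function (err, keys) {
  if (err) return console.error(err);
  console.log('Total keys:', keys.length);
});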
Here's what I came up with based on the other answers.
You can await listAllKeys() without having to use callbacks.
const listAllKeys = () =>
new Promise((resolve, reject) => {
let allKeys = [];
const list = marker => {
s3.listObjects({ Marker: marker }, (err, data) => {
if (err) {
reject(err);
} else if (data.IsTruncated) {
allKeys.push(data.Contents);
list(data.NextMarker || data.Contents[data.Contents.length - 1].Key);
} else {
allKeys.push(data.Contents);
resolve(allKeys);
}
});
};
list();
});
This assumes you've initialized the s3 variable like so
const s3 = new aws.S3({
apiVersion: API_VERSION,
params: { Bucket: BUCKET_NAME }
});
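Then, anywhere in async code, a minimal usage sketch:

(async () => {
  const pages = await listAllKeys();
  // each entry in pages is one data.Contents array; flatten to get a single list of objects
  const objects = pages.flat();
  console.log('Total objects:', objects.length);
})();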
I made it as simple as possible. You can iterate over the files and upload them using a for loop; it is quite simple, neat and easy to understand.
Packages required: fs, express-fileupload
server.js :-
router.post('/upload', function(req, res){
if(req.files){
var file = req.files.filename;
test(file);
res.render('test');
}
} );
test function () :-
function test(file){
// upload all
if(file.length){
for(var i =0; i < file.length; i++){
fileUP(file[i]);
}
}else{
fileUP(file);
}
// call fileUP() to upload 1 at once
function fileUP(fyl){
var filename = fyl.name;
var tempPath = './temp'+filename;
fyl.mv(tempPath, function(err){
fs.readFile(tempPath, function(err, data){
var params = {
Bucket: 'BUCKET_NAME',
Body: data,
Key: Date.now()+filename
};
s3.upload(params, function (err, data) {
if (data) {
fs.unlink(tempPath, (err) => {
if (err) {
console.error(err)
return
}
else{
console.log("file removed from temp loaction");
}
});
console.log("Uploaded in:", data.Location);
}
});
});
});
}
}
This should work,
var params = { Bucket: 'YOUR_BUCKET_NAME' }; // define the params object used by both functions below
var listAllKeys = async function (token) {
if(token) params.ContinuationToken = token;
return new Promise((resolve, reject) => {
s3.listObjectsV2(params, function (err, data) {
if (err){
reject(err)
}
resolve(data)
});
});
}
var collect_all_files = async function () {
var allkeys = []
conti = true
token = null
while (conti) {
data = await listAllKeys(token)
allkeys = allkeys.concat(data.Contents);
token = data.NextContinuationToken
conti = data.IsTruncated
}
return allkeys
};
Using the new API s3.listObjectsV2 the recursive solution will be:
S3Dataset.prototype.listFiles = function(params,callback) {
var self=this;
var options = {
};
for (var attrname in params) { options[attrname] = params[attrname]; }
var results=[];
var s3=self.s3Store.GetInstance();
function listAllKeys(token, callback) {
var opt={ Bucket: self._options.s3.Bucket, Prefix: self._options.s3.Key, MaxKeys: 1000 };
if(token) opt.ContinuationToken = token;
s3.listObjectsV2(opt, (error, data) => {
if (error) {
if(self.logger) this.logger.error("listFiles error:", error);
return callback(error);
} else {
for (var index in data.Contents) {
var bucket = data.Contents[index];
if(self.logger) self.logger.debug("listFiles Key: %s LastModified: %s Size: %s", bucket.Key, bucket.LastModified, bucket.Size);
if(bucket.Size>0) {
var Bucket=self._options.s3.Bucket;
var Key=bucket.Key;
var components=bucket.Key.split('/');
var name=components[components.length-1];
results.push({
name: name,
path: bucket.Key,
mtime: bucket.LastModified,
size: bucket.Size,
sizehr: formatSizeUnits(bucket.Size)
});
}
}
if( data.IsTruncated ) { // truncated page
return listAllKeys(data.NextContinuationToken, callback);
} else {
return callback(null,results);
}
}
});
}
return listAllKeys.apply(this,['',callback]);
};
where
function formatSizeUnits(bytes){
if (bytes>=1099511627776) {bytes=(bytes/1099511627776).toFixed(4)+' TB';}
else if (bytes>=1073741824) {bytes=(bytes/1073741824).toFixed(4)+' GB';}
else if (bytes>=1048576) {bytes=(bytes/1048576).toFixed(4)+' MB';}
else if (bytes>=1024) {bytes=(bytes/1024).toFixed(4)+' KB';}
else if (bytes>1) {bytes=bytes+' bytes';}
else if (bytes==1) {bytes=bytes+' byte';}
else {bytes='0 byte';}
return bytes;
}//formatSizeUnits
Although #Meekohi's answer does technically work, I've had enough heartache with the S3 portion of the AWS SDK for NodeJS. After all the previous struggling with modules such as aws-sdk, s3, knox, I decided to install s3cmd via the OS package manager and shell-out to it using child_process
Something like:
var s3cmd = new cmd_exec('s3cmd', ['ls', filepath, 's3://'+inputBucket],
function (me, data) {me.stdout += data.toString();},
function (me) {me.exit = 1;}
);
response.send(s3cmd.stdout);
(Using the cmd_exec implementation from this question)
This approach just works really well - including for other problematic things like file upload.
The cleanest way to do it for me was through execution of s3cmd from my node script like this (The example here is to delete files recursively):
var exec = require('child_process').exec;
var child;
var bucket = "myBucket";
var prefix = "myPrefix"; // this parameter is optional
var command = "s3cmd del -r s3://" + bucket + "/" + prefix;
child = exec(command, {maxBuffer: 5000 * 1024}, function (error, stdout, stderr) { // the maxBuffer is here to avoid the maxBuffer node process error
console.log('stdout: ' + stdout);
if (error !== null) {
console.log('exec error: ' + error);
}
});
