Error in Watson Classifier API reference nodejs example: source.on is not a function - node.js

I'm trying to use Watson Classifier from node. I've started by implementing the example in the API reference, found at https://www.ibm.com/watson/developercloud/natural-language-classifier/api/v1/node.html?node#create-classifier
My code (sensitive information replaced with stars):
58    create: function(args, cb) {
59        var params = {
60            metadata: {
61                language: 'en',
62                name: '*********************'
63            },
64            training_data: fs.createReadStream(config.data.prepared.training)
65        };
66
67        params.training_data.on("readable", function () {
68            nlc.createClassifier(params, function(err, response) {
69                if (err)
70                    return cb(err);
71                console.log(JSON.stringify(response, null, 2));
72                cb();
73            });
74        });
75    },
The file I am trying to make a stream from exists, and the stream itself works (I've managed to read from it on "readable"). I wrapped the call in on("readable") because it made sense to do all of this once the stream becomes available, and because I wanted to check that I can read from it; it does not change the outcome, however.
nlc is the natural_language_classifier instance.
I'm getting this:
octav@****************:~/watsonnlu$ node nlc.js create
/home/octav/watsonnlu/node_modules/delayed-stream/lib/delayed_stream.js:33
source.on('error', function() {});
^
TypeError: source.on is not a function
at Function.DelayedStream.create (/home/octav/watsonnlu/node_modules/delayed-stream/lib/delayed_stream.js:33:10)
at FormData.CombinedStream.append (/home/octav/watsonnlu/node_modules/combined-stream/lib/combined_stream.js:44:37)
at FormData.append (/home/octav/watsonnlu/node_modules/form-data/lib/form_data.js:74:3)
at appendFormValue (/home/octav/watsonnlu/node_modules/request/request.js:321:21)
at Request.init (/home/octav/watsonnlu/node_modules/request/request.js:334:11)
at new Request (/home/octav/watsonnlu/node_modules/request/request.js:128:8)
at request (/home/octav/watsonnlu/node_modules/request/index.js:53:10)
at Object.createRequest (/home/octav/watsonnlu/node_modules/watson-developer-cloud/lib/requestwrapper.js:208:12)
at NaturalLanguageClassifierV1.createClassifier (/home/octav/watsonnlu/node_modules/watson-developer-cloud/natural-language-classifier/v1-generated.js:143:33)
at ReadStream.<anonymous> (/home/octav/watsonnlu/nlc.js:68:8)
I tried debugging it myself for a while, but I'm not sure what this source is actually supposed to be. If I print it right before the offending line in delayed-stream.js, it's just an object composed of the metadata I put in plus an "emit" function:
{ language: 'en',
name: '*******************',
emit: [Function] }
This is my package.json file:
{
    "name": "watsonnlu",
    "version": "0.0.1",
    "dependencies": {
        "csv-parse": "2.0.0",
        "watson-developer-cloud": "3.2.1"
    }
}
Any ideas how to make the example work?
Cheers!
Octav

I got the answer in the meantime thanks to the good people at IBM. It seems you have to send the metadata as stringified JSON:
var params = {
    metadata: JSON.stringify({
        language: 'en',
        name: '*********************'
    }),
    training_data: fs.createReadStream(config.data.prepared.training)
};
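For completeness, the whole create function from the question with that fix applied looks roughly like this (same config, nlc and cb as above; the on("readable") wrapper can be dropped because the SDK consumes the stream itself):

create: function(args, cb) {
    var params = {
        // metadata has to be a JSON string, not a plain object
        metadata: JSON.stringify({
            language: 'en',
            name: '*********************'
        }),
        training_data: fs.createReadStream(config.data.prepared.training)
    };

    nlc.createClassifier(params, function(err, response) {
        if (err) return cb(err);
        console.log(JSON.stringify(response, null, 2));
        cb();
    });
},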

Related

Kafkajs - get statistics (lag)

In our nest.js application we use the kafkajs client for Kafka.
We need a way to monitor statistics.
One of the metrics is lag.
I'm trying to figure out whether kafkajs provides anything for this, and I haven't found anything interesting. (The most interesting things in the payload are: timestamp, offset, batchContext.firstOffset, batchContext.firstTimestamp, batchContext.maxTimestamp.)
Questions
Are there any ideas on how to log the lag value and other statistics provided by kafkajs?
Should I think about implementing my own statistics monitor to collect the required information in a node application that uses the kafkajs client?
New Details 1
Following the documentation I can get batch.highWatermark, where
batch.highWatermark is the last committed offset within the topic partition. It can be useful for calculating lag.
Trying:
await consumer.run({
    eachBatchAutoResolve: true,
    eachBatch: async (data) => {
        console.log('Received data.batch.messages: ', data.batch.messages)
        console.log('Received data.batch.highWatermark: ', data.batch.highWatermark)
    },
})
I can get information like the following:
Received data.batch.messages: [
  {
    magicByte: 2,
    attributes: 0,
    timestamp: '1628877419958',
    offset: '144',
    key: null,
    value: <Buffer 68 65 6c 6c 6f 21>,
    headers: {},
    isControlRecord: false,
    batchContext: {
      firstOffset: '144',
      firstTimestamp: '1628877419958',
      partitionLeaderEpoch: 0,
      inTransaction: false,
      isControlBatch: false,
      lastOffsetDelta: 2,
      producerId: '-1',
      producerEpoch: 0,
      firstSequence: 0,
      maxTimestamp: '1628877419958',
      timestampType: 0,
      magicByte: 2
    }
  },
  {
    magicByte: 2,
    attributes: 0,
    timestamp: '1628877419958',
    offset: '145',
    key: null,
    value: <Buffer 6f 74 68 65 72 20 6d 65 73 73 61 67 65>,
    headers: {},
    isControlRecord: false,
    batchContext: {
      firstOffset: '144',
      firstTimestamp: '1628877419958',
      partitionLeaderEpoch: 0,
      inTransaction: false,
      isControlBatch: false,
      lastOffsetDelta: 2,
      producerId: '-1',
      producerEpoch: 0,
      firstSequence: 0,
      maxTimestamp: '1628877419958',
      timestampType: 0,
      magicByte: 2
    }
  },
  {
    magicByte: 2,
    attributes: 0,
    timestamp: '1628877419958',
    offset: '146',
    key: null,
    value: <Buffer 6d 6f 72 65 20 6d 65 73 73 61 67 65 73>,
    headers: {},
    isControlRecord: false,
    batchContext: {
      firstOffset: '144',
      firstTimestamp: '1628877419958',
      partitionLeaderEpoch: 0,
      inTransaction: false,
      isControlBatch: false,
      lastOffsetDelta: 2,
      producerId: '-1',
      producerEpoch: 0,
      firstSequence: 0,
      maxTimestamp: '1628877419958',
      timestampType: 0,
      magicByte: 2
    }
  }
]
Received data.batch.highWatermark: 147
Are there any ideas on how to use batch.highWatermark in the lag calculation then?
It looks like the only way to get the offset lag metric is by using instrumentation events:
consumer.on(consumer.events.END_BATCH_PROCESS, (payload) =>
    console.log(payload.offsetLagLow),
);
offsetLagLow measures the offset delta between the first message in the batch and the last offset in the partition (highWatermark). You can also use offsetLag, but it is based on the last offset of the batch.
As @Sergii mentioned, there are some props available directly when you are using eachBatch (here are all the available methods on the batch prop). But you won't get those props if you are using eachMessage. So instrumentation events are the most universal approach.
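If you do want to derive a lag figure yourself inside eachBatch, a rough sketch (my own, not from the kafkajs docs; the exact off-by-one depends on how highWatermark is defined) could look like this:

await consumer.run({
    eachBatch: async ({ batch }) => {
        if (batch.messages.length === 0) return
        const lastOffset = Number(batch.messages[batch.messages.length - 1].offset)
        // in the sample output above, highWatermark (147) is lastOffset (146) + 1,
        // so the lag remaining after this batch is roughly:
        const lag = Number(batch.highWatermark) - 1 - lastOffset
        console.log(`partition ${batch.partition} lag after batch: ${lag}`)
    },
})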

AWS S3 getSignedUrl() returns a 403 Forbidden Error

I'm trying to get a pre-signed URL from s3.getSignedUrl so that I can upload a file/image directly from the React client side. I get a URL, but whenever I open that link, or when I try to make a PUT request to that URL from the client, I always get a 403 Forbidden error. I'm not doing this in a serverless setup.
// code to get a presigned url
const s3 = new AWS.S3({
    accessKeyId: keys.ACCESS_KEY,
    secretAccessKey: keys.SECRET_KEY,
});
const router = express.Router();
// this route returns a url configured with the Key we share with the client
router.get(
    '/api/image-upload/get-url',
    requireAuth,
    async (req: Request, res: Response) => {
        // we want the key to look like myUserId/12122113.jpeg, where the filename is a random unique string
        const key = `${req.currentUser!.id}/${uuid()}.jpeg`;
        s3.getSignedUrl(
            'putObject',
            {
                Bucket: 'my-first-s3-bucket-1234567',
                ContentType: 'image/jpeg',
                Key: key,
            },
            (err, url) => res.send({ key, url, err })
        );
    }
);
I get an object back with key and url properties, but if I open that URL in the browser, or if I try to make a PUT request to it from the client side (roughly like the sketch below), I get a 403 Forbidden error.
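A minimal sketch of that client-side PUT (the exact client code isn't shown here; file is whatever comes out of the file input):

// rough sketch of the client-side upload, not the exact code
const { url, key } = await (await fetch('/api/image-upload/get-url')).json();
await fetch(url, {
    method: 'PUT',
    headers: { 'Content-Type': 'image/jpeg' }, // must match the ContentType the URL was signed with
    body: file,
});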
// Error
<Error>
<link type="text/css" rel="stylesheet" id="dark-mode-custom-link"/>
<link type="text/css" rel="stylesheet" id="dark-mode-general-link"/>
<style lang="en" type="text/css" id="dark-mode-custom-style"/>
<style lang="en" type="text/css" id="dark-mode-native-style"/>
<Code>SignatureDoesNotMatch</Code>
<Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message>
<AWSAccessKeyId>AKIA2YLSQ26Z6T3PRLSR</AWSAccessKeyId>
<StringToSign>GET 1616261901 /my-first-s3-bucket-12345/60559b4dc123830023031184/f852ca00-89a0-11eb-a3dc-07c38e9b7626.jpeg</StringToSign>
<SignatureProvided>TK0TFR+I79t8PPbtRW37GYaOo5I=</SignatureProvided>
<StringToSignBytes>47 45 54 0a 0a 0a 31 36 31 36 32 36 31 39 30 31 0a 2f 6d 79 2d 66 69 72 73 74 2d 73 33 2d 62 75 63 6b 65 74 2d 31 32 33 34 35 2f 36 30 35 35 39 62 34 64 63 31 32 33 38 33 30 30 32 33 30 33 31 31 38 34 2f 66 38 35 32 63 61 30 30 2d 38 39 61 30 2d 31 31 65 62 2d 61 33 64 63 2d 30 37 63 33 38 65 39 62 37 36 32 36 2e 6a 70 65 67</StringToSignBytes>
<RequestId>56RQCDS1X5GMF4JH</RequestId>
<HostId>LTA1+vXnwzGcPo70GmuKg0J7QDzW4+t+Ai9mgVqcerRKDbXkHBOnqU/7ZTvMLpyDf1CLZMYwSMY=</HostId>
</Error>
Please have a look at my S3 bucket policy and CORS configuration:
// S3 Bucket Policy
{
    "Version": "2012-10-17",
    "Id": "Policy1616259705897",
    "Statement": [
        {
            "Sid": "Stmt1616259703206",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectAcl"
            ],
            "Resource": "arn:aws:s3:::my-first-s3-bucket-1234567/*"
        }
    ]
}
// S3 CORS
[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [],
        "MaxAgeSeconds": 3000
    },
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "PUT",
            "POST",
            "DELETE"
        ],
        "AllowedOrigins": [
            "https://ticketing.dev"
        ],
        "ExposeHeaders": [
            "x-amz-server-side-encryption",
            "x-amz-request-id",
            "x-amz-id-2",
            "ETag"
        ],
        "MaxAgeSeconds": 3000
    }
]
I'm unable to resolve this issue. Please bear with me, I'm not a good questioner. Thanks
After a lot of debugging, I had to give my IAM user AmazonS3FullAccess to make it work.
It probably just needs some specific permission to be allowed to make a PUT request to the S3 presigned URL.
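If you would rather not grant full access, a scoped-down policy on the signing IAM user should be enough in principle. This is an untested sketch, with the bucket name taken from the question:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::my-first-s3-bucket-1234567/*"
        }
    ]
}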

Azure functions messing up gzipped POST data

Currently I'm implementing a webhook whose documentation states that the request sent to the configured endpoint will be gzipped, and I'm experiencing a weird bug with that.
I created a middleware to handle the gunzip of the request data:
const buffer: Buffer[] = [];
request
    .on("data", (chunk) => {
        buffer.push(Buffer.from(chunk));
    })
    .on("end", () => {
        const concatBuff: Buffer = Buffer.concat(buffer);
        zlib.gunzip(concatBuff, (err, buff) => {
            if (err) {
                console.log("gunzip err", err);
                return next(err);
            }
            request.body = buff.toString();
            next();
        });
    });
I added this middleware before all the other body-parser middlewares to avoid any incompatibility with them (registered roughly as in the sketch below).
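A minimal sketch of how it is wired up (gunzipBody is just an assumed name for the handler above; the actual parsers may differ):

const express = require("express");
const { createAzureFunctionHandler } = require("azure-function-express");

const app = express();
app.use(gunzipBody);     // the gunzip middleware shown above, registered first
app.use(express.json()); // body parsers only see the already-gunzipped body

module.exports = createAzureFunctionHandler(app);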
So I'm testing it with this curl command:
cat webhook.txt | gzip | curl -v -i --data-binary @- -H "Content-Encoding: gzip" http://localhost:3334
On this server, which uses azure-function-express, I'm getting this error:
[1/9/2020 22:36:21] gunzip err Error: incorrect header check
[1/9/2020 22:36:21] at Zlib.zlibOnError [as onerror] (zlib.js:170:17) {
[1/9/2020 22:36:21] errno: -3,
[1/9/2020 22:36:21] code: 'Z_DATA_ERROR'
[1/9/2020 22:36:21] }
[1/9/2020 22:36:21]
It seems that the error is caused because the header does not start with the "magic number" of a gzip file (1f 8b). The ef bf bd sequences are UTF-8 replacement characters, as if the binary body had been decoded as a string somewhere before reaching the middleware:
<Buffer 1f ef bf bd 08 00 ef bf bd ef bf bd 4e 5f 00 03 ef bf bd 5d 6d 73 db b8 11 ef bf bd ef bf bd 5f ef bf bd e1 97 bb 6b 7d 16 ef bf bd 77 ef bf bd 73 ef ... 4589 more bytes>
But here is the weird thing: I created a new express application to test this using the exact same curl command, and it works perfectly there, so it seems there is some problem with createAzureFunctionHandler, or I'm missing something.
Have you experienced any of these problems using Azure Functions?
Any idea what Azure is doing to mess up the gzipped data?
I just got an answer from the Azure team. They recommended setting a proxy inside proxies.json as a workaround, so if anyone is having the same issue you can just set a new proxy to override the Content-Type.
In my case I was always expecting gzipped JSON, so if you don't know the type beforehand this might not work for you.
{
    "$schema": "http://json.schemastore.org/proxies",
    "proxies": {
        "RequireContentType": {
            "matchCondition": {
                "route": "/api/HttpTrigger"
            },
            "backendUri": "https://proxy-example.azurewebsites.net/api/HttpTrigger",
            "requestOverrides": {
                "backend.request.headers.content-type": "application/octet-stream",
                "backend.request.headers.request-content-type": "'{request.headers.content-type}'"
            }
        }
    }
}

Express Response.send() throwing TypeError

I have this simple express code:
const api = Router()
api.post('/some-point', async (req, res, next) => {
    const someStuffToSend = await Promise.resolve("hello");
    res.json({ someStuffToSend });
})
It works well in my dev environment, but in prod I get the error below:
TypeError: argument entity must be string, Buffer, or fs.Stats
at etag (/[...]/node_modules/etag/index.js:83:11)
at generateETag ([...]/node_modules/express/lib/utils.js:280:12)
at ServerResponse.send ([...]/node_modules/express/lib/response.js:200:17)
at ServerResponse.json ([...]/node_modules/express/lib/response.js:267:15)
at api.post (/the/code/above)
I checked at node_modules/etag/index.js:83:11 and saw
if (!isStats && typeof entity !== 'string' && !Buffer.isBuffer(entity)) {
    throw new TypeError('argument entity must be string, Buffer, or fs.Stats')
}
Before this code I added a printf to check the type of entity:
console.log("Entity contains", entity, "is of type", typeof entity, "with constructor", entity.constructor, "and is it a buffer?", Buffer.isBuffer(entity))
Which got me the output below:
Entity contains <Buffer 7b 22 70 72 65 64 69 63 74 69 6f 6e 5f 69 64 22 3a 22 63 4b 57 41 64 41 46 43 77 6e 55 43 22 2c 22 69 6e 69 74 69 61 6c 5f 70 72 65 64 69 63 74 69 6f ... > is of type object with constructor function Buffer(arg, encodingOrOffset, length) {
  if (!Buffer.TYPED_ARRAY_SUPPORT && !(this instanceof Buffer)) {
    return new Buffer(arg, encodingOrOffset, length)
  }
  // Common case.
  if (typeof arg === 'number') {
    if (typeof encodingOrOffset === 'string') {
      throw new Error(
        'If encoding is specified then the first argument must be a string'
      )
    }
    return allocUnsafe(this, arg)
  }
  return from(this, arg, encodingOrOffset, length)
} and is it a buffer? false
So it looks like the entity is a Buffer, but it is not recognized as such. If I comment the test out, it crashes at a different location:
TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be one of type string or Buffer
at ServerResponse.end (_http_outgoing.js:747:13)
at ServerResponse.send ([...]/node_modules/express/lib/response.js:221:10)
at ServerResponse.json ([...]/node_modules/express/lib/response.js:267:15)
at api.post (/the/code/above)
If you check at /node_modules/express/lib/response.js:221:10 you see
this.end(chunk, encoding);
where chunk is set a few lines above (l. 189, I checked with a printf):
chunk = Buffer.from(chunk, encoding)
I could hack the lib to make this work, but I suspected a broken node_modules folder. However, the error persists even after rm package-lock.json; rm -rf node_modules; npm i.
Any clue on how to solve this would be much appreciated.
Below are my version numbers:
node 9.8.0 (also tried with 8.4.0), installed locally with nvm
npm 5.6.0
express 4.16.3
ts-node 5.0.1
typescript 2.7.2
Edit 1
replace the async call by something simpler
specify express version
Edit 2
I removed the node_modules folder then npm i the packages one by one:
npm i aws-sdk body-parser bunyan check-types deepcopy duck-type express fast-csv glob handlebars http-auth md5 moment moment-timezone multer node-stream object-path randomstring
Still get the same error.
Edit 3
Add further information about the environment.
Edit 4
OK, figured out. It was a ts-node config related problem.
I was launching my server with
ts-node --harmony_async_iteration -r tsconfig-paths/register ./src/index.ts
With the following lines in my tsconfig.json:
{
    "compilerOptions": {
        "module": "commonjs",
        "target": "es2017",
        "lib": [ "es2017", "esnext.asynciterable" ],
        "noImplicitAny": true,
        "moduleResolution": "node",
        "sourceMap": true,
        "outDir": "dist",
        "baseUrl": ".",
        "pretty": true,
        "paths": {
            "*": [
                "*", "src/*", "src/types/*",
                "node_modules/*"
            ]
        }
    },
    "include": [ "src/**/*" ]
}
Because of the -r tsconfig-paths/register in the command line, the paths specified in the tsconfig.json were loaded, including the "node_modules/*" in paths['*'].
I'm not sure why, but it looks like this was causing the libs in node_modules to be loaded twice, breaking the type checks based on constructors (such as instanceof).
Question
I'm not sure I completely understand the reason for this. Can anyone shed some light?
I had a very similar issue with typeorm and later with express too. The hint was in this conversation. My solution was to get rid of * from paths.
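One way to read that for the tsconfig in the question is to drop the bare "*" and "node_modules/*" entries, so node_modules is resolved only by Node itself. A sketch (keep only the mappings you actually need):

{
    "compilerOptions": {
        "baseUrl": ".",
        "paths": {
            "*": [ "src/*", "src/types/*" ]
        }
    }
}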
I was getting this error,
TypeError: argument entity must be string, Buffer, or fs.Stats
at etag (/[...]/node_modules/etag/index.js:83:11)
at generateETag ([...]/node_modules/express/lib/utils.js:280:12)
.....
I was trying to run an express project, using webpack as the bundler.
My error got resolved when I set target to 'node' in the webpack config:
module.exports = (env) => {
    return {
        //....
        target: "node",
        //....,
        plugins: [
            // new NodePolyfillPlugin() // Don't add this plugin
        ]
    }
}
And make sure not to add NodePolyfillPlugin as a plugin.

How to replicate a curl command with the nodejs request module?

How can I replicate this curl request:
$ curl "https://s3-external-1.amazonaws.com/herokusources/..." \
-X PUT -H 'Content-Type:' --data-binary #temp/archive.tar.gz
With the node request module?
I need to do this to PUT a file up on AWS S3 and to match the signature provided by Heroku in the put_url from Heroku's sources endpoint API output.
I have tried this (where source is the Heroku sources endpoint API output):
// PUT tarball
function(source, cb){
    putUrl = source.source_blob.put_url;
    urlObj = url.parse(putUrl);
    var options = {
        headers: {},
        method : 'PUT',
        url    : urlObj
    }
    fs.createReadStream('temp/archive.tar.gz')
        .pipe(request(
            options,
            function(err, incoming, response){
                if (err){
                    cb(err);
                } else {
                    cb(null, source);
                }
            }
        ));
}
But I get the following SignatureDoesNotMatch error.
<?xml version="1.0"?>
<Error>
<Code>SignatureDoesNotMatch</Code>
<Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message>
<AWSAccessKeyId>AKIAJURUZ6XB34ESX54A</AWSAccessKeyId>
<StringToSign>PUT\n\nfalse\n1424204099\n/heroku-sources-production/heroku.com/d1ed2f1f-4c81-43c8-9997-01706805fab8</StringToSign>
<SignatureProvided>DKh8Y+c7nM/6vJr2pabvis3Gtsc=</SignatureProvided>
<StringToSignBytes>50 55 54 0a 0a 66 61 6c 73 65 0a 31 34 32 34 32 30 34 30 39 39 0a 2f 68 65 72 6f 6b 75 2d 73 6f 75 72 63 65 73 2d 70 72 6f 64 75 63 74 69 6f 6e 2f 68 65 72 6f 6b 75 2e 63 6f 6d 2f 64 31 65 64 32 66 31 66 2d 34 63 38 31 2d 34 33 63 38 2d 39 39 39 37 2d 30 31 37 30 36 38 30 35 66 61 62 38</StringToSignBytes>
<RequestId>A7F1C5F7A68613A9</RequestId>
<HostId>JGW6l8G9kFNfPgSuecFb6y9mh7IgJh28c5HKJbiP6qLLwvrHmESF1H5Y1PbFPAdv</HostId>
</Error>
Here is an example of what the Heroku sources endpoint API output looks like:
{ source_blob:
    { get_url: 'https://s3-external-1.amazonaws.com/heroku-sources-production/heroku.com/2c6641c3-af40-4d44-8cdb-c44ee5f670c2?AWSAccessKeyId=AKIAJURUZ6XB34ESX54A&Signature=hYYNQ1WjwHqyyO0QMtjVXYBvsJg%3D&Expires=1424156543',
      put_url: 'https://s3-external-1.amazonaws.com/heroku-sources-production/heroku.com/2c6641c3-af40-4d44-8cdb-c44ee5f670c2?AWSAccessKeyId=AKIAJURUZ6XB34ESX54A&Signature=ecj4bxLnQL%2FZr%2FSKx6URJMr6hPk%3D&Expires=1424156543'
    }
}
Update
The key issue here is that the PUT request I send with the request module should be the same as the one sent with curl because I know that the curl request matches the expectations of the AWS S3 Uploading Objects Using Pre-Signed URLs API. Heroku generates the PUT url so I have no control over its creation. I do know that the curl command works as I have tested it -- which is good since it is the example provided by Heroku.
I am using curl 7.35.0 and request 2.53.0.
The Amazon API doesn't like chunked uploads. The file needs to be sent unchunked. So here is the code that works:
// PUT tarball
function(source, cb){
    console.log('Uploading tarball...');
    putUrl = source.source_blob.put_url;
    urlObj = url.parse(putUrl);
    fs.readFile(config.build.temp + 'archive.tar.gz', function(err, data){
        if (err){ cb(err); }
        else {
            var options = {
                body   : data,
                method : 'PUT',
                url    : urlObj
            };
            request(options, function(err, incoming, response){
                if (err){ cb(err); } else { cb(null, source); }
            });
        }
    });
},
