Looking for a better way to parse RSS data - node.js

I wrote a basic API to parse RSS data from any source (in particular the Washington Post). The route I wrote is simple enough: it just checks what category you're looking for, then queries the newest data in that feed. However, the program runs slower than I need it to. For reasons explained below, it needs to respond in less than 8 seconds.
I haven't been an active software developer for years (I was a junior dev for 2 years, about 15 years ago), so I am rusty. I'm new to Node.js and I have never written an API before.
I'm working on this because I'm trying to integrate it with an online service that requires all API calls to return in under 8 seconds. However, my functions return in about 22 seconds. I switched parsing methods and got it down to about 14 seconds, but from here I'm stuck. Does anyone know a quicker way to parse large RSS feeds?
I'm using the rss-parser package and even switched to fast-xml-parser with no luck.
I don't even need all 15 or 20 posts, just the latest 3. I tried using trim to limit the results, but that doesn't seem to make any significant difference. Below is the code. I can also link to the repo on GitHub if more context is needed. Any help would be appreciated!
// Basic RSS parser returning an array of results.
const Parser = require('rss-parser');
const parser = new Parser();

async function fetchRssFeed(feedUrl) {
  const feed = await parser.parseURL(feedUrl);
  return feed.items.map(item => {
    return {
      title: item.title,
      link: item.link,
      date: item.pubDate
    };
  });
}
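
One direction that might help (a sketch from me, not the original code; it assumes the rss-parser package and Node 18+ for the built-in fetch and AbortSignal.timeout): keeping only the newest 3 items trims the mapping work, and a hard timeout on the HTTP request stops a slow feed from blowing the 8-second budget.

// Sketch: fetch the raw XML with a hard timeout, parse it, and keep only the
// newest 3 items. Assumes rss-parser and Node 18+ (global fetch, AbortSignal.timeout).
const Parser = require('rss-parser');
const parser = new Parser();

async function fetchLatestThree(feedUrl) {
  // Abort the request if the feed takes longer than 5 seconds to respond.
  const response = await fetch(feedUrl, { signal: AbortSignal.timeout(5000) });
  const xml = await response.text();

  const feed = await parser.parseString(xml);

  // Only the newest 3 posts are needed, so drop the rest before mapping.
  return feed.items.slice(0, 3).map(item => ({
    title: item.title,
    link: item.link,
    date: item.pubDate
  }));
}

If most of the 14 seconds is network time on the feed's side, caching the mapped result in memory for a minute or two between calls is probably the bigger win.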

Related

Get extended/full text tweets in Twitter API v2

The new Twitter v2 API was just released a couple of weeks ago, so this may just be an issue of the documentation not being done quite yet.
What I am trying to do is search recent tweets for "puppies" and return all that have some kind of media attached. However, when I run this search in Postman, not all of the returned tweets have attachments.media_keys. I noticed that the ones that do not have attachments.media_keys are tweets whose text ends in an ellipsis (...). I understand that in the v1.1 API this issue is solved by specifying tweet_mode=extended in the query params or tweet.fields=extended_tweet. However, these do not seem to work in the v2 API, and I have not seen any documentation about getting the full text of tweets (and the associated attachments). Does anyone know how to do this in v2?
My Postman query url: "https://api.twitter.com/2/tweets/search/recent?query=has:media puppies&tweet.fields=attachments&expansions=attachments.media_keys&media.fields=duration_ms,height,media_key,preview_image_url,public_metrics,type,url,width"
In my app, I am using Node.js Axios to perform the query:
var axios = require('axios');

var config = {
  method: 'get',
  url: 'https://api.twitter.com/2/tweets/search/recent?query=has:media puppies&tweet.fields=attachments&expansions=attachments.media_keys&media.fields=duration_ms,height,media_key,preview_image_url,public_metrics,type,url,width',
  headers: {
    'Authorization': 'Bearer {{my bearer token}}',
  }
};

axios(config)
  .then(function (response) {
    console.log(JSON.stringify(response.data));
  })
  .catch(function (error) {
    console.log(error);
  });
As of July 2021, this "problem" (or rather, strange behavior) definitely concerns retweets.
To get the full text of a retweet while fetching a user's recent tweets, I used the following trick:
First, I get the user's recent tweets, following the docs:
curl "https://api.twitter.com/2/users/2244994945/tweets?expansions=attachments.poll_ids,attachments.media_keys,author_id,entities.mentions.username,geo.place_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id&tweet.fields=attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,possibly_sensitive,public_metrics,referenced_tweets,reply_settings,source,text,withheld&user.fields=created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld&place.fields=contained_within,country,country_code,full_name,geo,id,name,place_type&poll.fields=duration_minutes,end_datetime,id,options,voting_status&media.fields=duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics,non_public_metrics,organic_metrics,promoted_metrics&max_results=5" -H "Authorization: Bearer $BEARER_TOKEN"
This is an all-fields query (not all of the fields are necessary), but it is necessary in order to get ['includes']['tweets'] within the structure of the returned JSON data. That is where you have to look for the full text of a retweet: it is at ['includes']['tweets'][0..n]['text'], while all the recent tweets (and retweets) are found at ['data'][0..n]['text'].
Then you have to match the shortened retweets from ['data'] with those from ['includes']['tweets']. I do it using ['data'][n]['referenced_tweets'][0]['id'], which should match ['includes']['tweets'][m]['id'], where n and m are some indexes.
To be 100% safe you can check that the referenced tweet entry has a matching type: retweeted key/value pair (confirming that this really is a retweet reference), but for me the 0 index works in all checked cases, so to keep things simple I left it this way for now :)
If that sounds complicated, just dump the whole parsed JSON with all the tweets and check the structure of the data.
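
To illustrate the matching step, here is a rough sketch of mine (not from the Twitter docs) of how the lookup between ['data'] and ['includes']['tweets'] could be done in Node, assuming response is the parsed JSON body of the recent-tweets request:

// Sketch: replace the truncated text of retweets in response.data with the full
// text found in response.includes.tweets. "response" is assumed to be the parsed
// JSON body of the recent-tweets request above.
function expandRetweetText(response) {
  const referenced = (response.includes && response.includes.tweets) || [];

  return response.data.map(tweet => {
    // In v2, retweets carry a referenced_tweets entry with type "retweeted".
    const ref = (tweet.referenced_tweets || []).find(r => r.type === 'retweeted');
    if (!ref) return tweet; // not a retweet, text is already complete

    // Look up the referenced tweet by id to get its untruncated text.
    const full = referenced.find(t => t.id === ref.id);
    return full ? { ...tweet, text: full.text } : tweet;
  });
}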
Great question, thank you. We’re discussing this on the Twitter Developer forums as well.
In v2 of the API we have eliminated the notion of an “extended Tweet” since we assume that all new apps understand the concept of 280 characters, so the complete text is in the Tweet text field.
The difference you’re finding is in retweets or quoted Tweets where the embedded text is truncated. This is (perhaps surprisingly) the same as v1.1 and the former premium and enterprise APIs as well. We are investigating whether to modify this, and the implications in doing so.
I don't by any means want to take traffic away from Stack, but you might find more ongoing updates and information on our developer forums. Thanks!

What Version of Javascript Engine, Rhino, is used for Bixby's Servers?

I am writing JavaScript code for one of my Actions, and it is complicated, as it manipulates the data structure of a JavaScript object (shown below) to build search queries. How do I debug the code to make sure it works as intended? It consumes a lot of my time, so I was wondering if I could set up an IDE for myself. Sure, I can use Bixby itself to view the data output, but sometimes it's convenient to use a console to check my code as I go along. I am not asking for a recommendation, but I need to clarify what the dev docs imply. They do mention that ES5.1 and some other features are supported, but I can't tell what those "some features" are just by looking at Mozilla's Rhino compatibility chart. In particular, I wanted to use the .reduce(callback, initialValue) function to output data objects, but Mozilla's Rhino chart shows an error for it.
PS: I hope I am not breaking rules this time.
// #DataGraph
[{
  $id: "Cold_Souls_1",
  animeTitle: "Fullmetal Alchemist Brotherhood",
  animePoster: {
    referenceImage: 'https://qph.fs.quoracdn.net/main-qimg-da58c837c7197acf364cb2ada34fc5fb.webp',
    imageTags: ["Grey", "Yellow", "Blue", "Metal Body", "Machine", "Robot", "Yellow Hair Boy"],
  },
  animeCharacters: {
    "Edward Elric": [
      {
        quote: "A lesson without pain is meaningless. For you cannot gain something without sacrificing something else in return. But once you have recovered it and made it your own... You will gain an irreplaceable Fullmetal heart.",
        keywords: ["lesson", "pain", "meaningless", "gain", "sacrificing", "recover"],
        category: "Life Lesson"
      }
    ]
  }
}]
As you can read at https://bixbydevelopers.com/dev/docs/dev-guide/developers/actions.js-actions, the Bixby IDE supports ECMAScript 5.1 plus some ES6 features, such as:
The arrow (=>) function operator
The const keyword
The let keyword
Array destructuring
It is fair to say the functions not listed above are not supported.
I would recommend that you raise a Feature Request in our community for unsupported features. This forum is open to other Bixby developers who can upvote it, leading to more visibility within the community and with the Product Management team.
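
Worth adding (my note, not from the Bixby docs): Array.prototype.reduce was standardized in ES5, not ES6, so a documented ES5.1 runtime should accept it; only the ES6 conveniences listed above are the special cases. As an illustration, here is a sketch of the kind of .reduce call in question, written against the sample #DataGraph above and restricted to ES5-style syntax (no arrow functions, no let/const) just to stay on the safe side:

// Sketch (hypothetical helper, not from the question): collect all quotes for a
// given category from the #DataGraph structure above, using only ES5-style syntax.
function quotesByCategory(dataGraph, category) {
  return dataGraph.reduce(function (acc, anime) {
    Object.keys(anime.animeCharacters).forEach(function (characterName) {
      anime.animeCharacters[characterName].forEach(function (entry) {
        if (entry.category === category) {
          acc.push({
            anime: anime.animeTitle,
            character: characterName,
            quote: entry.quote
          });
        }
      });
    });
    return acc;
  }, []);
}

// Example: quotesByCategory(dataGraph, "Life Lesson")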

Delaying execution of multiple HTTP requests in Google Cloud Function

I've implemented a web scraper with Node.js, cheerio and request-promise that scrapes an endpoint (a basic HTML page) and returns certain information. The content of the page I'm crawling differs based on a parameter at the end of the URL (http://some-url.com?value=12345, where 12345 is my dynamic value).
I need this crawler to run every x minutes and crawl multiple pages, and to do that I've set up a cron job using Google Cloud Scheduler. (I'm fetching the dynamic values I need from Firebase.)
There could be more than 50 different values for which I'd need to crawl the specific page, but I would like to spread out the requests I'm sending so the server doesn't choke. To accomplish this, I've tried to add a delay:
1) using setTimeout
2) using setInterval
3) using a custom sleep implementation:
const sleep = require('util').promisify(setTimeout);
All 3 of these methods work locally; all of the requests are made with a y-second delay, as intended.
But when run with Firebase Cloud Functions and Google Cloud Scheduler:
1) not all of the requests are sent
2) the delay is NOT consistent (some requests fire with the proper delay, then there are no requests made for a while and other requests are sent with a major delay)
I've tried many things but I wasn't able to solve this problem.
I was wondering if anyone could suggest a different theoretical approach, or a certain library, etc., that I could use for this scenario, since the one I have now doesn't seem to work as I intended. I'm adding one of the approaches that works locally below.
Cheers!
courseDataRefArray.forEach(async (dataRefObject: CourseDataRef, index: number) => {
  console.log(`Foreach index = ${index} -- Hello StackOverflow`);
  setTimeout(async () => {
    console.log(`Index in setTimeout = ${index} -- Hello StackOverflow`);
    await CourseUtil.initiateJobForCourse(dataRefObject.ref, dataRefObject.data);
  }, 2000 * index);
});
(Note: I can provide more code samples if necessary; but it's mostly following a loop & async/await & setTimeout pattern, and since it works locally I'm assuming that's not the main problem.)
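
One observation from my side (not part of the original question): in the snippet above the forEach callbacks are async and nothing awaits the setTimeout work, so the Cloud Function's returned promise can resolve before the delayed requests ever fire, and Cloud Functions doesn't guarantee that background work keeps running after that point. Here is a sketch of a sequential variant that keeps the promise pending until every request has been sent, assuming the same CourseUtil helper and courseDataRefArray as above:

// Sketch: run the requests one after another and await the delay, so the function
// only resolves once every request has been made. CourseUtil and courseDataRefArray
// are assumed to be the same as in the snippet above.
const sleep = require('util').promisify(setTimeout);

async function crawlAllCourses(courseDataRefArray) {
  for (const [index, dataRefObject] of courseDataRefArray.entries()) {
    console.log(`Sequential index = ${index}`);
    await CourseUtil.initiateJobForCourse(dataRefObject.ref, dataRefObject.data);
    await sleep(2000); // spread the load: wait 2 seconds between requests
  }
}

Keep in mind that the total runtime then has to fit within the function's configured timeout, which is worth checking for ~50 requests at 2-second spacing plus the scraping time itself.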

Strapi & react-admin : I'd like to set 'Content-Range' header dynamically when any fetchAll query fires

I'm still a novice web developer, so please bear with me if I miss something fundamental !
I'm creating a backoffice for a Strapi backend, using react-admin.
React-admin uses a 'data provider' to link itself to an API. Luckily, someone has already written a data provider for Strapi. I had no problem with steps 1 and 2 of this README, and I can authenticate to Strapi from within my React app.
I now want to fetch and display my Strapi data, starting with Users. In order to do that, quoting step 3 of the readme: 'In controllers I need to set the Content-Range header with the total number of results to build the pagination'.
So far I tried to do this in my User controller, with no success.
What I try to achieve:
First, I'd like it to simply work with the ctx.set('Content-Range', ...) hard-coded in the controller, as in the aforementioned step 3.
Second, I think it would be very dirty to copy/paste this logic into every controller (not to mention any future controllers), instead of having some callback function dynamically appending the Content-Range header to any fetchAll request. Ultimately that's what I'm aiming for, because with ~40 Strapi objects to administrate already and plenty more to come, it has to scale.
Technical infos
node -v: 11.13.0
npm -v: 6.7.0
strapi version: 3.0.0-alpha.25.2
uname -r output: Linux 4.14.106-97.85.amzn2.x86_64
DB: mySQL v2.16
So far I've tried accessing the count() method of the User model as in the aforementioned step 3, but my controller doesn't look like the example, as I'm working with the users-permissions plugin.
This is the action I've tried to edit (located in project/plugins/users-permissions/controllers/User.js):
find: async (ctx) => {
  let data = await strapi.plugins['users-permissions'].services.user.fetchAll(ctx.query);
  data.reduce((acc, user) => {
    acc.push(_.omit(user.toJSON ? user.toJSON() : user, ['password', 'resetPasswordToken']));
    return acc;
  }, []);

  // Send 200 `ok`
  ctx.send(data);
},
From what I've gathered in the Strapi documentation (here and also here), context is a sort of wrapper object. I've only worked with Express-generated APIs before, so I understood this snippet as 'use the fetchAll method of the User model object, with ctx.query as an argument', but I had no luck logging this ctx.query. And as I can't log stuff, I'm kind of blocked.
In my exploration, I naively tried to log the full ctx object and work from there:
  // Send 200 `ok`
  ctx.send(data);
  strapi.log.info(ctx.query, ' were query');
  strapi.log.info(ctx.request, 'were request');
  strapi.log.info(ctx.response, 'were response');
  strapi.log.info(ctx.res, 'were res');
  strapi.log.info(ctx.req, 'were req');
  strapi.log.info(ctx, 'is full context');
},
Unfortunately, I fear I'm missing something obvious, as it gives me no output at all. Making a fetchAll request from my React app with these console.logs prints this in my terminal:
[2019-09-19T12:43:03.409Z] info were query
[2019-09-19T12:43:03.410Z] info were request
[2019-09-19T12:43:03.418Z] info were response
[2019-09-19T12:43:03.419Z] info were res
[2019-09-19T12:43:03.419Z] info were req
[2019-09-19T12:43:03.419Z] info is full context
[2019-09-19T12:43:03.435Z] debug GET /users?_sort=id:DESC&_start=0&_limit=10& (74 ms)
Meanwhile, in my frontend I get the good ol' "The Content-Range header is missing in the HTTP Response" message that I'm trying to solve.
After writing this wall of text, I realize the logging issue is separate from my original problem, but if I were at least able to log ctx properly, maybe I'd be able to find the solution myself.
Trying to summarize:
Actual problem: how do I set Content-Range properly in my Strapi controller? (partially answered, cf. edit 3)
Collateral problem n°1: I can't even log the ctx object (cf. edit 2)
Collateral problem n°2: once I figure out the actual problem, is it feasible to address it dynamically (basically some callback function for index/fetchAll routes, in which the model is a variable, on which I'd call the appropriate count() method, and finally append the result to my response header)? I'm not asking for the code here, just whether you think it's feasible and/or know a more elegant way.
Thank you for reading through, and excuse me if it was confusing; I wasn't sure which information would be relevant, so I figured the more the better.
/edit1: Forgot to mention, in my controller I also tried to log the strapi.plugins['users-permissions'].services.user object to see if it actually has a count() method, but got no luck with that either. I also tried the original snippet (step 3 of the aforementioned README), but it failed as expected, since as far as I can tell the User model isn't imported anywhere (the only import in User.js being lodash).
/edit2: About the logs, my bad, I just misunderstood the documentation. I now do:
  ctx.send(data);
  strapi.log.info('ctx should be : ', { ctx });
  strapi.log.info('ctx.req = ', { ...ctx.req });
  strapi.log.info('ctx.res = ', { ...ctx.res });
  strapi.log.info('ctx.request = ', { ...ctx.request });
  strapi.log.info('ctx.response = ', { ...ctx.response });
ctx logs fine this way; also, it seems the spread operator is needed to display nested objects ({ctx.req} crashes the server, {...ctx.req} is okay). Cool, because it narrows the question down to what's interesting.
/edit3: As expected, having logs helps big time. I've managed to display my users (albeit in a dirty way). I couldn't find any count() method, but looking at the data object that is passed to ctx.send(), it's equivalent to your typical res.data, i.e. pure JSON with my user list. So a simple .length did the trick:
let data = await strapi.plugins['users-permissions'].services.user.fetchAll(ctx.query);
data.reduce((acc, user) => {
  acc.push(_.omit(user.toJSON ? user.toJSON() : user, ['password', 'resetPasswordToken']));
  return acc;
}, []);

ctx.set('Content-Range', data.length); // <-- it did the trick

// Send 200 `ok`
ctx.send(data);
Now starting to work on the hard part: the dynamic callback function that will do that for any index/fetchAll call. Will update once I figure it out
I'm using React Admin and Strapi together and installed ra-strapi-provider.
It was a little tedious to paste the Content-Range header into all of my controllers, so I searched for a better solution. Then I found the middleware concept and created one that fits my needs. It's probably not the best solution, but it does its job well:
const _ = require("lodash");

module.exports = strapi => {
  return {
    // can also be async
    initialize() {
      strapi.app.use(async (ctx, next) => {
        await next();
        if (_.isArray(ctx.response.body))
          ctx.set("Content-Range", ctx.response.body.length);
      });
    }
  };
};
I hope it helps
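
A note from me, not from the original answer: where this file lives and how it gets enabled depends on your exact Strapi version. In the v3 layout I've used, a custom middleware sits in its own folder (e.g. ./middlewares/content-range/index.js) and is switched on in config/middleware.js, roughly like the sketch below; double-check against the docs for your release.

// config/middleware.js -- sketch only; the file location and setting name are
// assumptions based on the Strapi v3 middleware docs, so verify for your version.
module.exports = {
  settings: {
    // "content-range" must match the folder name under ./middlewares/
    'content-range': {
      enabled: true,
    },
  },
};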
For people still landing on this page:
Strapi has been updated from #alpha to #beta. Be careful, as some of the code in my OP is no longer valid; also, some of their documentation is not up to date.
I failed to find a "clever" way to solve this problem; in the end I copied/pasted the ctx.set('Content-Range', data.length) bit into all relevant controllers and it just worked.
If somebody comes up with a clever solution to this problem, I'll happily accept their answer. With the current Strapi version I don't think it's doable with policies or lifecycle callbacks.
The "quick & easy fix" is still to customize each relevant Strapi controller.
With strapi#beta you don't have direct access to a controller's code: you'll first need to "rewrite" one with the help of this doc. Then add the ctx.set('Content-Range', data.length) bit. Test it properly with RA; then, for the other controllers, you'll just have to create the folder, name the file, copy/paste your code and "Search & Replace" on the model name.
The "longer & cleaner fix" would be to dive into the react-admin source code and refactor it so the lack of a "Content-Range" header doesn't break pagination.
You'd then have to maintain your own react-admin fork, so make sure you're already committed to this library and have A LOT of tables to manage through it (so many that customizing every Strapi controller would be too tedious).
Before forking RA, please remember all the stuff you can do with the Strapi backoffice alone (including embedding your custom React app into it) and make sure it will be worth the trouble.

How do I acquire and return the date using an AWS Lambda function

I am a Unity developer who has been hired to make a game on the Amazon Alexa. It has been a mostly smooth process so far, but I now need to pull the day's date into my IDE (PullString), and was told by PullString support that the solution is to make an endpoint API call to a Lambda function that I will create and host.
I have tried finding some basic tutorials along the lines of making a very basic Lambda function, but most of the videos and tutorials I've found assume I am already a web developer and know my way around Node.js and Amazon's service environment, which I do not.
The function that I need is simple enough (I think?), as I just need to get the current date as a number and then look it up in a JSON file I have set up (which is working; I just need the actual real date instead of a dummy variable). I have written this:
let dateObj = new Date();
let day = dateObj.getDate();
but within how a Lambda function works, I'm unsure how to set up the service and have this date (day) returned.
Does anyone have any resources that outline constructing and using a lambda function in the most basic, most essential way possible? Will the code I wrote serve the purpose I need?
I'm a little out of my element and just need a few gaps filled in before I can learn the rest on my own.
Thank you!
This Node.js/Lambda tutorial might help get you going: https://dev.to/adnanrahic/getting-started-with-aws-lambda-and-nodejs-1kcf
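
Beyond the tutorial, here is a minimal sketch of the kind of handler involved. The response shape below assumes the function sits behind API Gateway or a Lambda function URL; adjust it to whatever PullString expects on its end.

// Minimal Node.js Lambda handler sketch: returns today's day of the month as JSON.
// The statusCode/headers/body shape assumes an API Gateway / function URL integration.
exports.handler = async () => {
  const dateObj = new Date();
  const day = dateObj.getDate(); // day of the month, 1-31

  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ day: day })
  };
};

The two lines from the question map directly onto the handler body; the only Lambda-specific part is exporting a handler function and returning a response object.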
