Download files before build in gatsby wordpress - node.js

I'm working with a client who needs his PDFs to be viewable in the browser, without the user having to download them first. It turned out this can't be done through WordPress, so I thought I could download them in Gatsby before every build if they don't already exist. Is this possible?
I found this repo: https://github.com/jamstack-cms/jamstack-ecommerce
that shows a way to do it with this code:
function getImageKey(url) {
const split = url.split('/')
const key = split[split.length - 1]
const keyItems = key.split('?')
const imageKey = keyItems[0]
return imageKey
}
function getPathName(url, pathName = 'downloads') {
let reqPath = path.join(__dirname, '..')
let key = getImageKey(url)
key = key.replace(/%/g, "")
const rawPath = `${reqPath}/public/${pathName}/${key}`
return rawPath
}
async function downloadImage (url) {
return new Promise(async (resolve, reject) => {
const path = getPathName(url)
const writer = fs.createWriteStream(path)
const response = await axios({
url,
method: 'GET',
responseType: 'stream'
})
response.data.pipe(writer)
writer.on('finish', resolve)
writer.on('error', reject)
})
}
But it doesn't seem to work if I put it in my createPages, and I can't use it outside createPages either, because I don't have access to graphql to query the data first.
Any idea how to do this?

In the WordPress source example, createPages is defined as async:
exports.createPages = async ({ graphql, actions }) => {
... so you can already use await to download your file(s) right after querying the data (and before the createPage() call). It should (NOT TESTED) be as easy as:
// Check for any errors
if (result.errors) {
console.error(result.errors)
}
// Access query results via object destructuring
const { allWordpressPage, allWordpressPost } = result.data
const pageTemplate = path.resolve(`./src/templates/page.js`)
// await is not allowed in a non-async forEach callback, so use for...of
for (const edge of allWordpressPage.edges) {
// for one file per edge
// url taken/constructed from some edge property
await downloadImage(url);
createPage({
Of course, for multiple files you should use Promise.all to wait until all the returned download promises have resolved before creating the page:
for (const edge of allWordpressPage.edges) {
// for multiple files per edge (page)
// url taken/constructed from some edge properties in a loop
// adapt the 'path' of the iterable (edge.xxx.yyy...)
// and/or the downloadImage(image) argument, e.g. 'image.someUrl'
await Promise.all(
edge.node.someImageArrayNode.map(image => downloadImage(image))
);
createPage({
If you need to pass/update image nodes (for use in components), you should be able to mutate the nodes, e.g.:
await Promise.all(
edge.node.someImageArrayNode.map(image => {
image["fullUrl"] = `/publicPath/${image.url}`;
return downloadImage(image.url); // return the Promise at the end
})
);
createPage({
path: slugify(item.name),
component: ItemView,
context: {
content: item,
title: item.name,
firstImageUrl: edge.node.someImageArrayNode[0].fullUrl,
images: edge.node.someImageArrayNode
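
Since the question asks to download only when the files don't already exist, here is a minimal sketch of that check. It has not been tried inside Gatsby. The names downloadIfMissing, downloadAll and fileNameFromUrl are hypothetical, and fetchBuffer is an assumed function (url) => Promise<Buffer> that you would implement with your HTTP client of choice (for example axios with responseType 'arraybuffer').

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: derive a file name from the URL, ignoring any query string.
function fileNameFromUrl(url) {
  return path.basename(new URL(url).pathname);
}

// Hypothetical helper: download `url` into `dir` unless the file is already there.
// `fetchBuffer` is an assumption: an async function (url) => Buffer.
async function downloadIfMissing(url, dir, fetchBuffer) {
  const target = path.join(dir, fileNameFromUrl(url));
  if (fs.existsSync(target)) {
    return { target, downloaded: false }; // already on disk, skip the request
  }
  await fs.promises.mkdir(dir, { recursive: true });
  const data = await fetchBuffer(url);
  await fs.promises.writeFile(target, data);
  return { target, downloaded: true };
}

// Download several files in parallel and wait for all of them to finish.
function downloadAll(urls, dir, fetchBuffer) {
  return Promise.all(urls.map((url) => downloadIfMissing(url, dir, fetchBuffer)));
}
```

Inside createPages this could be awaited per page, e.g. await downloadAll(urls, targetDir, fetchBuffer) before the createPage() call, where urls and targetDir come from your own query results and project layout.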

remix passing args to async server side functions

I am a real beginner with this stuff, so I beg your pardon for my (silly) questions.
I want to use async functions inside tsx pages; specifically, those functions are fetch calls to Shopify to get data and ioredis calls to write and read some data.
I know that Remix uses action and loader functions, so to manage the Shopify calls I came up with this:
export const loader: LoaderFunction = async ({ params }) => {
return json(await GetProductById(params.id as string));
};
async function GetProductById(id: string) {
const ops = ...;
const endpoint = ...;
const response = await fetch(endpoint, ops);
const json = await response.json();
return json;
};
export function FetchGetProductById(id: number) {
const fetcher = useFetcher();
useEffect(() => {
fetcher.load(`/query/getproductid/${id}`);
}, []);
return fetcher.data;
}
With this solution I can get the data whenever I want just by calling FetchGetProductById, but my problem is that I need to send more complex data to the loader (like objects).
How can I do that?
In Remix, the loader only handles GET requests, so data must be in the URL, either via params or searchParams (query string).
If you want to pass data in the body of the request, then you'll need to use POST and create an action.
NOTE: Remix uses FormData and not JSON to send data. You will need to convert your JSON into a string.
export const action = async ({ request }: ActionArgs) => {
const formData = await request.formData();
const object = JSON.parse(formData.get("json") as string);
return json(object);
};
export default function Route() {
const fetcher = useFetcher();
useEffect(() => {
if (fetcher.state !== 'idle' || fetcher.data) return;
fetcher.submit(
{
json: JSON.stringify({ a: 1, message: "hello", b: true }),
},
{ method: "post" }
);
}, [fetcher]);
return <pre>{JSON.stringify(fetcher.data, null, 2)}</pre>
}
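
If the object is small and you would rather stay with a GET loader, the searchParams route mentioned above can carry it as a JSON string. This is only a sketch: toLoaderUrl and readPayload are hypothetical helper names, and the "json" parameter name is an assumption. Inside a loader you would pass request.url to readPayload.

```javascript
// Client side: serialize a small object into the query string.
function toLoaderUrl(basePath, payload) {
  const params = new URLSearchParams({ json: JSON.stringify(payload) });
  return `${basePath}?${params.toString()}`;
}

// Server side: read the object back from the full request URL.
function readPayload(requestUrl) {
  const { searchParams } = new URL(requestUrl);
  const raw = searchParams.get('json');
  return raw === null ? null : JSON.parse(raw);
}
```

URLs have practical length limits, so anything large is better sent with POST and an action as shown above.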

Asynchronous function in Node.js API not working as intended

As an exercise, I'm creating a simple API that allows users to provide a search term to retrieve links to appropriate news articles across a collection of resources. The relevant function and the route handler that uses it are as follows:
function GetArticles(searchTerm) {
const articles = [];
//Loop through each resource
resources.forEach(async resource => {
const result = await axios.get(resource.address);
const html = result.data;
//Use Cheerio: load the html document and create Cheerio selector/API
const $ = cheerio.load(html);
//Filter html to retrieve appropriate links
$(`a:contains(${searchTerm})`, html).each((i, el) => {
const title = $(el).text();
let url = $(el).attr('href');
articles.push(
{
title: title,
url: url,
source: resource.name
}
);
})
})
return articles; //Empty array is returned
}
And the route handler that uses the function:
app.get('/news/:searchTerm', async (req, res) => {
const searchTerm = req.params.searchTerm;
const articles = await GetArticles(searchTerm);
res.json(articles);
})
The problem I'm getting is that the returned "articles" array is empty. However, if I'm not "looping over each resource" as commented in the beginning of GetArticles, but instead perform the main logic on just a single "resource", "articles" is returned with the requested data and is not empty. In other words, if the function is the following:
async function GetArticles(searchTerm) {
const articles = [];
const result = await axios.get(resources[0].address);
const html = result.data;
const $ = cheerio.load(html);
$(`a:contains(${searchTerm})`, html).each((i, el) => {
const title = $(el).text();
let url = $(el).attr('href');
articles.push(
{
title: title,
url: url,
source: resources[0].name
}
);
})
return articles; //Populated array
}
Then "articles" is not empty, as intended.
I'm sure this has to do with how I'm dealing with the asynchronous nature of the code. I've tried refreshing my knowledge of asynchronous programming in JS but I still can't quite fix the function. Clearly, the "articles" array is being returned before it's populated, but how?
Could someone please help explain why my GetArticles function works with a single "resource" but not when looping over an array of "resources"?
Try this
function GetArticles(searchTerm) {
return Promise.all(resources.map(resource => axios.get(resource.address)))
.then(responses => responses.flatMap((result, index) => {
const html = result.data;
//Use Cheerio: load the html document and create Cheerio selector/API
const $ = cheerio.load(html);
const articles = []
//Filter html to retrieve appropriate links
$(`a:contains(${searchTerm})`, html).each((i, el) => {
const title = $(el).text();
const url = $(el).attr('href');
articles.push(
{
title: title,
url: url,
//Promise.all keeps the input order, so responses[index] belongs to resources[index]
source: resources[index].name
}
);
})
return articles;
}))
}
The problem in your implementation was here:
resources.forEach(async resource...
The callback is declared async, but forEach does not wait for the promises the callbacks return. It starts them and returns immediately, so GetArticles returns articles before any callback has pushed into it.
That is why the returned array is always empty.
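
To see the difference in isolation, here is a small sketch with no network involved. fakeFetch is a made-up stand-in for axios.get that resolves after a short delay, and the two collect functions are hypothetical names.

```javascript
// Stand-in for an HTTP request: resolves with `value` after a short delay.
const fakeFetch = (value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), 10));

// Broken: forEach ignores the promises returned by the async callbacks,
// so the function returns before anything has been pushed.
function collectWithForEach(items) {
  const out = [];
  items.forEach(async (item) => {
    out.push(await fakeFetch(item));
  });
  return out;
}

// Working: map each item to a promise, then wait for all of them.
function collectWithPromiseAll(items) {
  return Promise.all(items.map((item) => fakeFetch(item)));
}
```

The first function hands back an array that is still empty at the moment it returns; the second returns a promise that resolves only once every request has finished, with results in input order.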

fs.writeFile crashes node app after writing first json file

I'm trying to crawl several web pages to check for broken links and write the results to JSON files. However, after the first file is completed, the app crashes with no error popping up...
I'm using Puppeteer to crawl, Bluebird to run each link concurrently and fs to write the files.
What I've tried:
Switching the file type to '.txt' or '.php'. This works, but I would need another loop outside the current workflow to convert the files from '.txt' to '.json'. Renaming the file right after writing to it also causes the app to crash.
Using try/catch statements around fs.writeFile, but it never throws an error.
Running the entire app outside of Express. This worked at some point, but I'm trying to use it within the framework.
const express = require('express');
const router = express.Router();
const puppeteer = require('puppeteer');
const bluebird = require("bluebird");
const fs = require('fs');
router.get('/', function(req, res, next) {
(async () => {
// Our (multiple) URLs.
const urls = ['https://www.testing.com/allergy-test/', 'https://www.testing.com/genetic-testing/'];
const withBrowser = async (fn) => {
const browser = await puppeteer.launch();
try {
return await fn(browser);
} finally {
await browser.close();
}
}
const withPage = (browser) => async (fn) => {
const page = await browser.newPage();
// Turns request interceptor on.
await page.setRequestInterception(true);
// Ignore all the asset requests, just get the document.
page.on('request', request => {
if (request.resourceType() === 'document' ) {
request.continue();
} else {
request.abort();
}
});
try {
return await fn(page);
} finally {
await page.close();
}
}
const results = await withBrowser(async (browser) => {
return bluebird.map(urls, async (url) => {
return withPage(browser)(async (page) => {
await page.goto(url, {
waitUntil: 'domcontentloaded',
timeout: 0 // Removes timeout.
});
// Search for urls we want to "crawl".
const hrefs = await page.$$eval('a[href^="https://www.testing.com/"]', as => as.map(a => a.href));
// Predefine our arrays.
let links = [];
let redirect = [];
// Loops through each /goto/ url on page
for (const href of Object.entries(hrefs)) {
response = await page.goto(href[1], {
waitUntil: 'domcontentloaded',
timeout: 0 // Remove timeout.
});
const chain = response.request().redirectChain();
const link = {
'source_url': href[1],
'status': response.status(),
'final_url': response.url(),
'redirect_count': chain.length,
};
// Loops through the redirect chain for each href.
for ( const ch of chain) {
redirect = {
status: ch.response().status(),
url: ch.url(),
};
}
// Push all info of target link into links
links.push(link);
}
// JSONify the data.
const linksJson = JSON.stringify(links);
fileName = url.replace('https://www.testing.com/', '');
fileName = fileName.replace(/[^a-zA-Z0-9\-]/g, '');
// Write data to file in /tmp directory.
fs.writeFile(`./tmp/${fileName}.json`, linksJson, (err) => {
if (err) {
return console.log(err);
}
});
});
}, {concurrency: 4}); // How many pages to run at a time.
});
})();
});
module.exports = router;
UPDATE:
So there is nothing wrong with my code... I realized nodemon was stopping the process after each file was saved. Since nodemon detected a "file change", it restarted my server as soon as the first file was written.

What is the ideal way to loop API requests with fetch?

I'm relatively new to working with NodeJS, and I'm doing a practice project using the YouTube API to get some data on a user's videos. The YouTube API returns a list of videos with a page token. To collect all of a user's videos, you have to make several API requests, each with a different page token. When you reach the end of these requests, there is no new page token in the response, so you can move on. Doing it in a for or while loop seemed like the way to handle this, but a plain loop does not wait for the promises started inside it, so I had to look for an alternative.
I looked at a few previous answers to similar questions, including the ones here and here. I got the general idea of the code in the answers, but I couldn't quite figure out how to get it working fully myself. The request I am making is already chained in a .then() of a previous API call - I would like to complete the recursive fetch calls with new page tokens, and then move onto another .then(). Right now, when I run my code, it moves onto the next .then() without the requests that use the tokens being complete. Is there any way to stop this from happening? I know async/await may be a solution, but I've decided to post here just to see if there are any possible solutions without having to go down that route in the hope I learn a bit about fetch/promises in general. Any other suggestions/advice about the way the code is structured is welcome too, as I'm pretty conscious that this is probably not the best way to handle making all of these API calls.
Code :
let body = req.body
let resData = {}
let channelId = body.channelId
let videoData = []
let pageToken = ''
const fetchWithToken = (nextPageToken) => {
let uploadedVideosUrlWithToken = `https://youtube.googleapis.com/youtube/v3/playlistItems?part=ContentDetails&playlistId=${uploadedVideosPlaylistId}&pageToken=${nextPageToken}&maxResults=50&key=${apiKey}`
fetch(uploadedVideosUrlWithToken)
.then(res => res.json())
.then(uploadedVideosTokenPart => {
let {items} = uploadedVideosTokenPart
videoData.push(...items.map(v => v.contentDetails.videoId))
pageToken = (uploadedVideosTokenPart.nextPageToken) ? uploadedVideosTokenPart.nextPageToken : ''
if (pageToken) {
fetchWithToken(pageToken)
} else {
// tried to return a promise so I can chain .then() to it?
// return new Promise((resolve) => {
// return(resolve(true))
// })
}
})
}
const channelDataUrl = `https://youtube.googleapis.com/youtube/v3/channels?part=snippet%2CcontentDetails%2Cstatistics&id=${channelId}&key=${apiKey}`
// promise for channel data
// get channel data then store it in variable (resData) that will eventually be sent as a response,
// contentDetails.relatedPlaylists.uploads is the playlist ID which will be used to get individual video data.
fetch(channelDataUrl)
.then(res => res.json())
.then(channelData => {
let {snippet, contentDetails, statistics } = channelData.items[0]
resData.snippet = snippet
resData.statistics = statistics
resData.uploadedVideos = contentDetails.relatedPlaylists.uploads
return resData.uploadedVideos
})
.then(uploadedVideosPlaylistId => {
// initial call to get first set of videos + first page token
let uploadedVideosUrl = `https://youtube.googleapis.com/youtube/v3/playlistItems?part=ContentDetails&playlistId=${uploadedVideosPlaylistId}&maxResults=50&key=${apiKey}`
fetch(uploadedVideosUrl)
.then(res => res.json())
.then(uploadedVideosPart => {
let {nextPageToken, items} = uploadedVideosPart
videoData.push(...items.map(v => v.contentDetails.videoId))
// idea is to do api calls until pageToken is non existent, and add the video id's to the existing array.
fetchWithToken(nextPageToken)
})
})
.then(() => {
// can't seem to get here synchronously - code in this block will happen before all the fetchWithToken's are complete - need to figure this out
})
Thanks to anyone who takes the time out to read this.
Edit:
After some trial and error, this seemed to work - it is a complete mess. The way I understand it is that this function now recursively creates promises that resolve to true only when there is no page token from the api response allowing me to return this function from a .then() and move on to a new .then() synchronously. I am still interested in better solutions, or just suggestions to make this code more readable as I don't think it's very good at all.
const fetchWithToken = (playlistId, nextPageToken) => {
let uploadedVideosUrlWithToken = `https://youtube.googleapis.com/youtube/v3/playlistItems?part=ContentDetails&playlistId=${playlistId}&pageToken=${nextPageToken}&maxResults=50&key=${apiKey}`
return new Promise((resolve) => {
resolve( new Promise((res) => {
fetch(uploadedVideosUrlWithToken)
.then(res => res.json())
.then(uploadedVideosTokenPart => {
let {items} = uploadedVideosTokenPart
videoData.push(...items.map(v => v.contentDetails.videoId))
pageToken = (uploadedVideosTokenPart.nextPageToken) ? uploadedVideosTokenPart.nextPageToken : ''
// tried to return a promise so I can chain .then() to it?
if (pageToken) {
res(fetchWithToken(playlistId, pageToken))
} else {
res(new Promise(r => r(true)))
}
})
}))
})
}
You would be much better off using async/await, which is essentially syntactic sugar over promises. Promise chaining, which is what you are doing with the nested thens, can get messy and confusing...
I converted your code to use async/await so hopefully this will help you see how to solve your problem. Good luck!
Your initial code:
let { body } = req
let resData = {}
let { channelId } = body
let videoData = []
let pageToken = ''
// playlistId is passed in, because the variable from MainMethod is not in scope here
const fetchWithToken = async (playlistId, nextPageToken) => {
// response.json() returns a promise too, so it needs its own await
const someData = await (
await fetch(
`https://youtube.googleapis.com/youtube/v3/playlistItems?part=ContentDetails&playlistId=${playlistId}&pageToken=${nextPageToken}&maxResults=50&key=${apiKey}`,
)
).json()
let { items } = someData
videoData.push(...items.map((v) => v.contentDetails.videoId))
pageToken = someData.nextPageToken ? someData.nextPageToken : ''
if (pageToken) {
await fetchWithToken(playlistId, pageToken)
}
// no else branch needed: the async function resolves once it returns
}
const MainMethod = async () => {
const channelData = await (
await fetch(
`https://youtube.googleapis.com/youtube/v3/channels?part=snippet%2CcontentDetails%2Cstatistics&id=${channelId}&key=${apiKey}`,
)
).json()
let { snippet, contentDetails, statistics } = channelData.items[0]
resData.snippet = snippet
resData.statistics = statistics
resData.uploadedVideos = contentDetails.relatedPlaylists.uploads
const uploadedVideosPlaylistId = resData.uploadedVideos
const uploadedVideosPart = await (
await fetch(
`https://youtube.googleapis.com/youtube/v3/playlistItems?part=ContentDetails&playlistId=${uploadedVideosPlaylistId}&maxResults=50&key=${apiKey}`,
)
).json()
let { nextPageToken, items } = uploadedVideosPart
videoData.push(...items.map((v) => v.contentDetails.videoId))
// only keep paging if the first response actually returned a token
if (nextPageToken) {
await fetchWithToken(uploadedVideosPlaylistId, nextPageToken)
}
}
MainMethod()
Your Edit:
const fetchWithToken = (playlistId, nextPageToken) => {
return new Promise((resolve) => {
resolve(
new Promise(async (res) => {
const uploadedVideosTokenPart = await (
await fetch(
`https://youtube.googleapis.com/youtube/v3/playlistItems?part=ContentDetails&playlistId=${playlistId}&pageToken=${nextPageToken}&maxResults=50&key=${apiKey}`,
)
).json()
let { items } = uploadedVideosTokenPart
videoData.push(...items.map((v) => v.contentDetails.videoId))
pageToken = uploadedVideosTokenPart.nextPageToken
? uploadedVideosTokenPart.nextPageToken
: ''
if (pageToken) {
res(fetchWithToken(playlistId, pageToken))
} else {
res(new Promise((r) => r(true)))
}
}),
)
})
}
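
For comparison, here is a more compact sketch of the same paging idea, in both styles. It is not tied to the YouTube API: fetchPage is an assumed function (pageToken) => Promise<{ items, nextPageToken }> that you would write around your own fetch call, and both function names are hypothetical. The first version uses only .then(), as asked in the question; the key is that each .then() returns the promise for the next page, so no new Promise(...) wrapper is needed.

```javascript
// Promise-only version: returning the next request from .then() keeps the
// outer chain pending until the last page has arrived.
function collectAllPagesThen(fetchPage, pageToken = '', acc = []) {
  return fetchPage(pageToken).then((page) => {
    const items = acc.concat(page.items);
    return page.nextPageToken
      ? collectAllPagesThen(fetchPage, page.nextPageToken, items)
      : items;
  });
}

// async/await version: a plain loop that stops when no token comes back.
async function collectAllPages(fetchPage) {
  const items = [];
  let pageToken = '';
  do {
    const page = await fetchPage(pageToken);
    items.push(...page.items);
    pageToken = page.nextPageToken || '';
  } while (pageToken);
  return items;
}
```

Either function can be returned from a .then() in the existing chain, and the following .then() will run only after all pages have been collected.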

Pass query from Link to server, first time load query value undefined, after reload get correct query

I'm trying to create an API layer on top of the external Adobe Stock service.
As in the title: the first time, the query I get from the router Link is undefined, but after reloading the page it works correctly. My main page:
<Link
href={{
pathname: "/kategoria-zdjec",
query: images.zdjecia_kategoria
}}
as={`/kategoria-zdjec?temat=${images.zdjecia_kategoria}`}
className={classes.button}>
</Link>
and my server
app
.prepare()
.then(() => {
server.get("/kategoria-zdjec", async (req, res) => {
const temat = await req.query.temat;
console.log(temat)
const url = `https://stock.adobe.io/Rest/Media/1/Search/Files?locale=pl_PL&search_parameters[words]=${temat}&search_parameters[limit]=24&search_parameters[offset]=1`;
try {
const fetchData = await fetch(url, {
headers: { ... }
});
const objectAdobeStock = await fetchData.json();
res.json(objectAdobeStock);
const totalObj = await objectAdobeStock.nb_results;
const adobeImages = await objectAdobeStock.files;
} catch (error) {
console.log(error);
}
});
and this is what getInitialProps looks like on the Next page:
Zdjecia.getInitialProps = async ({req}) => {
const res = await fetch("/kategoria-zdjec");
const json = await res.json();
return { total: json.nb_results, images: json.files };
}
I think the problem is due to asynchronous behavior.
I think this might be because you are using fetch, which is part of the Web API, so the call fails when executed on the server.
You could either use isomorphic-fetch, which keeps the fetch API consistent between client and server, or use node-fetch when fetch is called on the server:
Zdjecia.getInitialProps = async ({ req }) => {
// window only exists in the browser, so this detects server-side execution
const isServer = typeof window === 'undefined';
const fetch = isServer ? require('node-fetch') : window.fetch;
const res = await fetch("/kategoria-zdjec");
const json = await res.json();
return { total: json.nb_results, images: json.files };
}
This problem is solved. The issue was in another part of my app, directly in the state management: I just created new variables and passed the state value to the Link.
