scrape hacker news via x-ray/node - node.js

How can I scrape Hacker News (https://news.ycombinator.com/) via x-ray/Node.js?
I would like to get something like this out of it:
[
{title1, comment1},
{title2, comment2},
...
{"‘Minimal’ cell raises stakes in race to harness synthetic life", 48}
...
{title 30, comment 30}
]
There is a news table, but I don't know how to scrape it...
Each of the stories on the website consists of three columns. These do not have a parent that is unique to them. So the structure looks like this
<tbody>
<tr class="spacer"> //Markup 1
<tr class="athing"> //Headline 1 ('.deadmark+ a' contains title)
<tr class> //Meta Information 1 (.age+ a contains comments)
<tr class="spacer"> //Markup 2
<tr class="athing"> //Headline 2 ('.deadmark+ a' contains title)
<tr class> //Meta Information 2 (.age+ a contains comments)
...
<tr class="spacer"> //Markup 30
<tr class="athing"> //Headline 30 ('.deadmark+ a' contains title)
<tr class> //Meta Information 30 (.age+ a contains comments)
So far I have tried:
x("https://news.ycombinator.com/", "tr", [{
title: [".deadmark+ a"],
comments: ".age+ a"
}])
and
x("https://news.ycombinator.com/", {
title: [".deadmark+ a"],
comments: [".age+ a"]
})
The 2nd approach returns 30 titles and 29 comment counts... I do not see any way to map them together, as there is no information about which of the 30 titles is missing a comment count...
Any help appreciated.

The markup is not easy to scrape with the x-ray package, since there is no way to reference the current context in a CSS selector. That would be useful for getting the next tr sibling after each tr.athing row, which is where the comments live.
We can still use the "next sibling" notation (the +) to get to the next row, but instead of targeting the optional comments link, we'll grab the complete row text and then extract the comments value with a regular expression. If no comments are present, the value is set to "0".
Complete working code:
var Xray = require('x-ray');
var x = Xray();
x("https://news.ycombinator.com/", {
title: ["tr.athing .deadmark+ a"],
comments: ["tr.athing + tr"]
})(function (err, obj) {
// extracting comments and mapping into an array of objects
var result = obj.comments.map(function (elm, index) {
var match = elm.match(/(\d+) comments?/);
return {
title: obj.title[index],
comments: match ? match[1]: "0"
};
});
console.log(result);
});
Currently prints:
[ { title: 'Follow the money: what Apple vs. the FBI is really about',
comments: '85' },
{ title: 'Unable to open links in Safari, Mail or Messages on iOS 9.3',
comments: '12' },
{ title: 'Gogs – Go Git Service', comments: '13' },
{ title: 'Ubuntu Tablet now available for pre-order',
comments: '56' },
...
{ title: 'American Tech Giants Face Fight in Europe Over Encrypted Data',
comments: '7' },
{ title: 'Moving Beyond the OOP Obsession', comments: '34' } ]
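The extraction step can also be tried in isolation. The following is a minimal sketch, not part of the answer above: parseCommentCount is a hypothetical helper name, the sample row texts are made up, and it returns a number rather than a string. It uses \s+ instead of a literal space on the assumption that the separator before "comments" may be a non-breaking space.

```javascript
// Hypothetical helper: pull the comment count out of a metadata row's text.
// Returns 0 when the row has no "N comments" link (for example "discuss").
function parseCommentCount(rowText) {
  const match = rowText.match(/(\d+)\s+comments?/);
  return match ? Number(match[1]) : 0;
}

// Made-up sample row texts, shaped like the text of the row after tr.athing.
console.log(parseCommentCount('85 points by someone 3 hours ago | 12 comments'));
console.log(parseCommentCount('10 points by someone 1 hour ago | 1 comment'));
console.log(parseCommentCount('3 points by someone 5 minutes ago | discuss'));
```

Returning a number makes it easier to sort or sum the counts later.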

Related

Iterate through an array to send in Sendgrid transactional email template?

I'm trying to iterate through an array to send a list of products that a customer has ordered. I have the handlebars template displaying, but my {{#each}} block isn't working. I'm using Node.js and Express, and I receive the data from my React frontend via a POST request.
Data is returned like this:
{
order_total: 1200,
customer_email: roger@mail.com,
customer_first_name: Roger,
customer_last_name: Smith,
customer_address: 1024 Cherry Street,
customer_city: Langley Falls,
customer_state: Virginia,
customer_zip: 11111,
customer_phone: 123456789,
status: in progress,
order_items: [
{
product_id: 12,
title: Slay bed,
image_url: url,
price: 400,
quantity: 1,
color_id: 12,
image_id: 7
},
{
product_id: 13,
title: dresser,
image_url: url,
price: 800,
quantity: 1,
color_id: 12,
image_id: 7
}
]
}
I store the order_items in a variable const order_items = req.body.order_items
My msg object looks like this:
const msg = {
to: [order.customer_email],
bcc: 'test@test.com',
from: 'test@test.com',
subject: `Test`,
html:
`<head>
<title></title>
</head>
<body>
<div data-role="module-unsubscribe" class="module" role="module" data-type="unsubscribe" style="color:#444444; font-size:12px; line-height:20px; padding:16px 16px 16px 16px; text-align:Left;" data-muid="4e838cf3-9892-4a6d-94d6-170e474d21e5">
<p>This is a confirmation that your order has been processed.
You will receive a delivery time within the next 24 hours. Thanks for being a valued customer. Please see order below. If any part of this order is incorrect, please reach out to us at test@mail.com</p>
<h3 style="margin-top:4rem;">Order #${addedOrder}</h3>
<p style="font-size:12px; line-height:20px;"> Name on order: ${order.customer_first_name} ${order.customer_last_name}</p>
<p style="text-transform:capitalize">Address: ${order.customer_address} ${order.customer_city} ${order.customer_state}, ${order.customer_zip}</p>
<p>Phone Number: ${order.customer_phone}</p>
<p>Order Status: ${order.status}</p>
{{#each order_items}
<div>
<img style="width:100%;" src="{{this.image_url}}"/>
</div>
<p style="font-size:1.3rem" >{{this.title}}</p>
<p style="font-size:1.5rem">{{this.price}}</p>
<p>Total: $800</p>
{{/each}}
</body>`
};
sgMail.send(msg).then(() => {
console.log('Message sent', msg)
}).catch((error) => {
console.log(error.response.body)
console.log(error.response.body.errors[0].message)
})
It seems you forgot a closing brace in this line:
{{#each order_items}
It should be:
{{#each order_items}}
Is it the same in your actual code?
Also (it's unlikely, but still), there might be a rule that forbids having several HTML tags inside a loop. Check whether you have to wrap them in a single div (again, I don't remember anything like this...).
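As a side note, here is a minimal sketch under one assumption that the question does not confirm: that the html string is sent as-is, without a template engine processing it. In that case the {{#each}} block would stay literal text, because a JavaScript template literal only expands ${...}. The item markup could instead be built in Node before sending. renderOrderItems and escapeHtml are hypothetical helper names, and the sample data only mirrors the shape of order_items in the question.

```javascript
// Hypothetical helper: escape text before placing it into HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: build the per-item markup that the {{#each}} block describes.
function renderOrderItems(orderItems) {
  return orderItems.map((item) => `
<div>
<img style="width:100%;" src="${escapeHtml(item.image_url)}"/>
</div>
<p style="font-size:1.3rem">${escapeHtml(item.title)}</p>
<p style="font-size:1.5rem">${escapeHtml(item.price)}</p>`).join('\n');
}

// Sample data shaped like the order_items array in the question.
const sampleItems = [
  { title: 'Slay bed', image_url: 'url', price: 400 },
  { title: 'dresser', image_url: 'url', price: 800 },
];
console.log(renderOrderItems(sampleItems));
```

The result could then be interpolated into the message body with ${renderOrderItems(order_items)} in place of the {{#each}} block.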

NodeJS Express & Mongoose: How to get data from array not in one but in multi table columns

I have a list in MongoDB, and on another page I store numbers in an array for each document of that list. For example:
I have an input form like this, where the all, valid and invalid inputs are fixed, but the first, second and third list inputs come from a loop, so they are dynamic, and in the model I store them as an array:
allVotes: {
type: Number,
required: true
},
validVotes: {
type: Number,
required: true
},
invalidVotes: {
type: Number,
required: true
},
partyVotes: {
type: [Number], // this goes as array
required: true
}
So in MongoDB Compass it looks like this:
Now, when I build a table from this content, I want the data from this array to appear in separate columns, one per list from the loop, not in a single column, but this is what it shows me:
I presume that maybe, instead of an array, it should be a Schema.Types.ObjectId that refers to that list, but I don't know.
This is code where I get all that with array:
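router.get('/', (req, res)=>{
// Assumed: the outer query that defines stateResultsLists is missing from the post
StateResult.find({}).then(stateResultsLists=>{
StateResult.count().then(stateResultListCount=>{
res.render('admin/stateResultsLists/index', {
stateResultsLists: stateResultsLists,
});
});
});
});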
and in view this is tbody of table:
<tbody>
{{#each stateResultsLists}}
<tr>
<td>{{allVotes}}</td>
<td>{{validVotes}}</td>
<td>{{invalidVotes}}</td>
<td>{{partyVotes}}</td>
{{/each}}
</tbody>
So, the simple question is how to get 5 to be in first list, 2 in second and 2 in third list?
I think the solution you are looking for involves iterating over your partyVotes array. You will need a similar loop for your <th> elements so that the number of header cells matches the length of the partyVotes array, but I'll assume you handle that elsewhere:
<tbody>
{{#each stateResultsLists}}
<tr>
<td>{{allVotes}}</td>
<td>{{validVotes}}</td>
<td>{{invalidVotes}}</td>
{{#each partyVotes}}
<td>{{this}}</td>
{{/each}}
</tr>
{{/each}}
</tbody>
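The effect of the nested {{#each}} can be sketched in plain JavaScript: each row gets its three fixed cells, followed by one cell per entry of partyVotes. toTableRow is a hypothetical helper name. The 5, 2, 2 party votes come from the question, while the other numbers are made up for illustration.

```javascript
// Hypothetical helper mirroring the template: three fixed cells,
// then one cell per entry in partyVotes.
function toTableRow(result) {
  return [
    result.allVotes,
    result.validVotes,
    result.invalidVotes,
    ...result.partyVotes,
  ];
}

// Sample document: partyVotes is from the question, the totals are invented.
const sample = { allVotes: 10, validVotes: 9, invalidVotes: 1, partyVotes: [5, 2, 2] };
console.log(toTableRow(sample)); // [ 10, 9, 1, 5, 2, 2 ]
```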

How to filter items inside “ngFor” loop, based on object property string

I need to filter items inside an ngFor loop, by changing the category in a drop-down list. Therefore, when a particular category is selected from the list, it should only list the items containing that same category.
HTML Template:
<select>
<option *ngFor="let model of models">{{model.category}}</option>
</select>
<ul class="models">
<li *ngFor="let model of models" (click)="gotoDetail(model)">
<img [src]="model.image"/>
{{model.name}},{{model.category}}
</li>
</ul>
Items Array:
export var MODELS: Model[] = [
{ id: 1,
name: 'Model 1',
image: 'img1',
category: 'Cat1',
},
{ id: 2,
name: 'Model 2',
image: 'img2',
category: 'Cat3',
},
{ id: 3,
name: 'Model 3',
image: 'img3',
category: 'Cat1',
},
{ id: 4,
name: 'Model 4',
image: 'img4',
category: 'Cat4',
},
...
];
Also, the drop-down list contains repeated category names. It is necessary for it to list only unique categories (strings).
I know that creating a custom pipe would be the right way to do this, but I don't know how to write one.
Plunker: http://plnkr.co/edit/tpl:2GZg5pLaPWKrsD2JRted?p=preview
Here is a sample pipe:
import { Pipe, PipeTransform } from '@angular/core';
@Pipe({
name: 'matchesCategory'
})
export class MatchesCategoryPipe implements PipeTransform {
transform(items: Array<any>, category: string): Array<any> {
return items.filter(item => item.category === category);
}
}
To use it:
<li *ngFor="let model of models | matchesCategory:model.category" (click)="gotoDetail(model)">
===== for the plunkr example ====
Your select changes need to be reflected in some variable.
First, define a member in your class:
selectedCategory: string;
then update your template:
<select (change)="selectedCategory = $event.target.value">
<option *ngFor="let model of models ">{{model.category}}</option>
</select>
Last, use the pipe:
<li *ngFor="let model of models | matchesCategory:selectedCategory" (click)="gotoDetail(model)">
==== comments after seeing the plunker ====
I noticed you used a Promise. Angular 2 is more RxJS-oriented, so the first thing I'd change is in your service. Replace:
getModels(): Promise<Model[]> {
return Promise.resolve(MODELS);
}
with:
getModels(): Observable<Array<Model>> {
// Observable.of needs import 'rxjs/add/observable/of' in RxJS 5
return Observable.of(MODELS);
}
and
getModel(id: number): Observable<Model> {
return this.getModels().map(models => models.find(model => model.id === id));
}
Then in your ModelsComponent:
models$: Observable<Array<Model>> = this.svc.getModels();
uniqueCategories$: Observable<Array<string>> = this.models$
.map(models => models.map(model => model.category))
.map(categories => Array.from(new Set(categories)));
Your options will become:
<option *ngFor="let category of uniqueCategories$ | async">{{category}}</option>
and your list:
<li *ngFor="let model of models$ | async | matchesCategory:selectedCategory" (click)="gotoDetail(model)">
This is a very rough solution, since you have many duplicates and you keep querying the service. Take it as a starting point: query the service only once, then derive the specific values from the result you get.
If you'd like to keep your code, just implement a UniqueValuesPipe. Its transform will take a single parameter and filter it to return unique categories using Array.from(new Set(...)). You will need to map the items to strings (categories) first, though.
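The logic of both pipes can be tried outside Angular. This is a framework-free sketch in plain JavaScript, not the pipe classes themselves: matchesCategory and uniqueCategories are hypothetical function names, and showing every item when no category is selected yet is an assumption that the answer does not state.

```javascript
// Framework-free version of the filtering pipe's transform.
function matchesCategory(items, category) {
  // Assumption: with no category selected yet, show every item.
  if (!category) return items;
  return items.filter((item) => item.category === category);
}

// Framework-free version of the UniqueValuesPipe idea:
// map items to category strings first, then de-duplicate with a Set.
function uniqueCategories(items) {
  return Array.from(new Set(items.map((item) => item.category)));
}

// Trimmed copy of the MODELS array from the question.
const MODELS = [
  { id: 1, name: 'Model 1', category: 'Cat1' },
  { id: 2, name: 'Model 2', category: 'Cat3' },
  { id: 3, name: 'Model 3', category: 'Cat1' },
  { id: 4, name: 'Model 4', category: 'Cat4' },
];

console.log(uniqueCategories(MODELS));        // [ 'Cat1', 'Cat3', 'Cat4' ]
console.log(matchesCategory(MODELS, 'Cat1')); // the two models in Cat1
```

Because a Set keeps insertion order, the drop-down lists categories in the order they first appear in the data.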

scraping items with x-ray that don't have a single root

I'm running into trouble scraping items that don't have a single root, which I believe x-ray requires.
Consider scraping hacker news where each headline is made up of two TRs:
<tbody>
<tr class="athing">content item 1</tr>
<tr>content item 1</tr>
<tr class="spacer"></tr>
<tr class="athing">content item 2</tr>
<tr>content item 2</tr>
<tr class="spacer"></tr>
</tbody>
As can be seen, there's no common root-node per item.
Does x-ray support scraping in such a case?
You could use + to select the adjacent sibling:
x(html, 'tbody ',
['tr.athing, tr.athing+tr:not(.athing):not(.spacer)']
)
(function (err, res) {
console.log(res)
})
result:
[ 'content item 1a',
'content item 1b',
'content item 2a',
'content item 2b' ]
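The flat result can then be grouped back into one object per item. This is a minimal sketch that assumes every tr.athing row is followed by exactly one matching row, so the array strictly alternates between headline and detail text. pairRows and the headline/meta property names are hypothetical.

```javascript
// Hypothetical helper: group a flat [headline, meta, headline, meta, ...]
// array into one object per item. A trailing unpaired entry is ignored.
function pairRows(rows) {
  const items = [];
  for (let i = 0; i + 1 < rows.length; i += 2) {
    items.push({ headline: rows[i], meta: rows[i + 1] });
  }
  return items;
}

const rows = [
  'content item 1a',
  'content item 1b',
  'content item 2a',
  'content item 2b',
];
console.log(pairRows(rows));
```

If a headline row can appear without a detail row, the alternation breaks and the pairs shift, so that case would need a different selector or a check on each entry.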

adding ngTable module, data content is empty

I integrated ng-table to my jhipster project, with
bower install ng-table
Everything installed fine. I added the module to index.html and declared it as a module dependency in app.js.
I tried the basic example from the official website, with the help of some simple examples found on the internet.
But no luck: the content of the table is ALWAYS empty, even with hard-coded data. My simplest example:
<div class="table-responsive">
<table ng-table="tableParams" class="table" show-filter="true">
<tr ng-repeat="myvar in $data">
<td title="'Myvar'" >
{{myvar}}</td>
<td title="'tableParams'" >
{{tableParams}}</td>
</tr>
</table>
</div>
In the controller :
console.log("test log1");
var data = [{name: "Moroni", age: 50} /*,*/];
$scope.tableParams = new NgTableParams(
{
},
{
total: 0, // length of data
counts: [], // hide page counts control
getData: function($defer, params)
{
console.log("test log3");
data = [{name: "Moroni", age: 50} /*,*/];
$defer.resolve(data.content);
},
data: [{name: "Moroni", age: 50} /*,*/],
dataset: [{name: "Moroni", age: 50} /*,*/]
}
);
The logs display fine: "test log3" is printed! But 'tableParams' and 'myvar' are always empty.
I need your help :(.
