I am trying to parse a CSV file using Node.js.
So far I have tried these packages:
Fast CSV
YA-CSV
I would like to parse a CSV file into objects based on the header. I have been able to accomplish this with fast-csv, but I have single-quote (') values in my CSV file that I would like to ignore. I can't seem to do this with fast-csv, even though I tried the option
{escape:'"'}
I used ya-csv to try to get around this, but no values are written when I try:
var reader = csv.createCsvFileReader('YT5.csv', {columnsFromHeader:true, 'separator': ','});
var writer = new csv.CsvWriter(process.stdout);
reader.addListener('YT5', function(data){
writer.writeRecord(data);
});
I get no output. Any help would be great.
Thanks.
Edit:
I want the output in this format:
{ 'Video ID': '---kAT_ejrw',
'Content Type': 'UGC',
Policy: 'monetize',
'Video Title': 'Battlefield 3 Multiplayer - TDM na KanaĆach Nouszahr (#3)',
'Video Duration (sec)': '1232',
Username: 'Indmusic',
Uploader: 'MrKacu13',
'Channel Display Name': 'MrKacu13',
'Channel ID': '9U6il2dwbKwE4SK3-qe35g',
'Claim Type': 'Audio',
'Claim Origin': 'Audio Match',
'Total Views': '11'
}
The header line is this.
Video ID,Content Type,Policy,Video Title,Video Duration (sec),Username,Uploader,Channel Display Name,Channel ID,Claim Type,Claim Origin,Total Views,Watch Page Views,Embedded Player Views,Channel Page Video Views,Live Views,Recorded Views,Ad-Enabled Views,Total Earnings,Gross YouTube-sold Revenue,Gross Partner-sold Revenue,Gross AdSense-sold Revenue,Estimated RPM,Net YouTube-sold Revenue,Net AdSense-sold Revenue,Multiple Claims?,Category,Asset ID,Channel,Custom ID,ISRC,GRid,UPC,Artist,Asset Title,Album,Label
All of these fields will be filled in, and some of them, such as the title, contain single quotes.
This looks like a mess, so let me know if you need another format.
This was resolved by using ya-csv.
It took a little more research: I did not know that I had to add a listener for the event that delivers the rows, and that this event is simply called 'data'.
var csv = require('ya-csv');
var reader = csv.createCsvFileReader('YT5.csv', {columnsFromHeader:true, 'separator': ','});
var writer = new csv.CsvWriter(process.stdout);
reader.addListener('data', function(data){
// do something with data
});
reader.addListener('end', function(){
console.log('thats it');
});
This read the file without any issues from the single quotes.
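To illustrate what header-keyed parsing amounts to, here is a minimal dependency-free sketch. It is not how ya-csv is implemented. The helper names parseLine and rowsToObjects are hypothetical, the sample rows are made up, and the parser assumes one record per line with only the double quote acting as a quote character, so single quotes pass through as ordinary text.

```javascript
// Hypothetical helper: split one CSV line into fields.
// Only the double quote (") acts as a quote character; a single quote (')
// is treated as ordinary data. Assumes no line breaks inside fields.
function parseLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // doubled quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Hypothetical helper: turn CSV text into objects keyed by the header row.
function rowsToObjects(text) {
  const lines = text.split(/\r?\n/).filter(function (l) { return l.length > 0; });
  const header = parseLine(lines[0]);
  return lines.slice(1).map(function (line) {
    const values = parseLine(line);
    const record = {};
    header.forEach(function (name, i) { record[name] = values[i]; });
    return record;
  });
}

// Made-up sample data, not taken from the real file.
const sample = 'Video ID,Video Title,Total Views\n' +
  "abc123,Don't Stop,11\n" +
  'def456,"Title, with comma",7';
console.log(rowsToObjects(sample));
```

For real files a maintained parser is still the safer choice, since this sketch ignores quoted line breaks and malformed input.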
Related
I'm trying to pull historical quarterly revenue data from Yahoo Finance. The default "Annual" page is updated when the "Quarterly" button is selected, but the URL stays the same, so it appears that the "Quarterly" option only updates the page in place. As a result, I could fetch only the annual data from the URL, not the quarterly data. By tracking the page's XHR requests, I found the underlying request URL, which is used in the code below. Now I face another challenge: how to extract quarterlyTotalRevenue from the complex object it returns, since I am not familiar with object handling. What I finally want to get is an array, something like this.
How can I do that? Thank you for any help!
function test() {
var ticker = 'JPM';
var url = 'https://query1.finance.yahoo.com/ws/fundamentals-timeseries/v1/finance/timeseries/'
+ ticker + '?lang=en-US&region=US&symbol=' + ticker + '&padTimeSeries=true&type=' + '%2cquarterlyTotalRevenue' +
'%2c&merge=false&period1=493590046&period2=1672016399&corsDomain=finance.yahoo.com';
var text = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
var obj = JSON.parse(text); // Convert the JSON string to an object
console.log(obj);
}
How about the following modification to your script?
Modified script:
var obj = ### // Please set your value.
var values = [["asOfDate", "raw"], ...obj.timeseries.result[0].quarterlyTotalRevenue.map(({ asOfDate, reportedValue: { raw } }) => [asOfDate, raw])];
var res = values[0].map((_, c) => values.map(r => r[c])); // Transpose rows and columns.
// return res; // If you want to use this as a custom function, please use this.
var sheet = SpreadsheetApp.getActiveSheet();
sheet.getRange(1, 1, res.length, res[0].length).setValues(res); // The range size must match the size of res.
In this modification, the values are written to the active sheet. If you want to use this script as a custom function, return the values instead, with return res;.
Reference:
map()
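The map-and-transpose step can be tried outside Apps Script with a small object. The sketch below assumes the response has the shape that the destructuring in the script implies (timeseries.result[0].quarterlyTotalRevenue holding entries with asOfDate and reportedValue.raw). The dates and numbers are placeholders, not real Yahoo Finance data.

```javascript
// Made-up sample shaped like the object the script above destructures.
// The dates and numbers are placeholders, not real Yahoo Finance data.
const obj = {
  timeseries: {
    result: [{
      quarterlyTotalRevenue: [
        { asOfDate: '2021-12-31', reportedValue: { raw: 100 } },
        { asOfDate: '2022-03-31', reportedValue: { raw: 120 } },
        { asOfDate: '2022-06-30', reportedValue: { raw: 130 } }
      ]
    }]
  }
};

// One [asOfDate, raw] row per quarter, preceded by a header row.
const rows = [
  ['asOfDate', 'raw'],
  ...obj.timeseries.result[0].quarterlyTotalRevenue.map(
    ({ asOfDate, reportedValue: { raw } }) => [asOfDate, raw]
  )
];

// Transpose: rows become columns, so each output row holds one field.
const transposed = rows[0].map((_, c) => rows.map(r => r[c]));

console.log(rows.length, rows[0].length);             // 4 2
console.log(transposed.length, transposed[0].length); // 2 4
console.log(transposed);
```

When such an array is written with setValues, the target range needs as many rows as the outer array and as many columns as each inner array.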
I have a problem with the byte length of data read from a file. I use the readFileSync method to read a text file, but the code below gives me two different results.
let data = fs.readFileSync('size.txt');
console.log(data.length);
console.log(JSON.stringify(JSON.parse(data)).length);
Result: 579859 (first console.log) and 409065 (second console.log).
I don't understand why the size decreases after I parse the data as JSON and then call the stringify method.
Thank you for any help!
JSON.stringify does not restore the whitespace that JSON.parse discarded, as in the example below:
const obj = `{
"keyA": "obiwan kenobi",
"testB": "foo"
}`;
console.log(obj);
const obj2 = JSON.stringify(JSON.parse(obj));
console.log(obj.length, obj2.length);
console.log(obj2);
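Two further points may be worth checking, sketched below with a made-up document. First, fs.readFileSync without an encoding returns a Buffer, whose length counts bytes, while the length of a string counts UTF-16 code units, so non-ASCII text alone can make the two numbers differ. Second, the optional third argument of JSON.stringify adds indentation back, although not necessarily the same whitespace as the original file.

```javascript
// Made-up JSON text with indentation and one non-ASCII character.
const text = '{\n  "name": "café",\n  "tags": [ "a", "b" ]\n}';

// fs.readFileSync(path) without an encoding returns a Buffer like this one.
const bytes = Buffer.from(text, 'utf8');

const compact = JSON.stringify(JSON.parse(text));          // no whitespace between tokens
const pretty = JSON.stringify(JSON.parse(text), null, 2);  // two-space indentation

console.log(bytes.length);   // length in bytes
console.log(text.length);    // length in UTF-16 code units
console.log(compact.length); // shorter: formatting whitespace is gone
console.log(pretty.length);  // indentation added back by the third argument
console.log(compact);
```

To compare like with like, measure the byte size of the re-serialized text with Buffer.byteLength(compact, 'utf8').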
I am trying to make a lyric project using discord.js, cheerio and the website called genius.com.
I have successfully found a way to scrape the lyrics from the website. I am now on the part where I need to split them, because Discord has a max word limit of 2000.
I can check how many characters/words are in the overall lyrics with lyrics.length. I just need to find a way to split the string and send both parts. In the future I might implement richEmbeds to make it more stylish, but for now I'm focusing on the basics.
var request = require('request');
var cheerio = require('cheerio');
/*
This is a project for my discord bot, the reason for the 2000 word limit is because
discords character limit is currently set to 2000, this means that i will have to add
a function to split the lyrics and send each part
*/
//Define the URL that we are going to be scraping the data from
var UR_L = "https://genius.com/Josh-a-and-jake-hill-not-afraid-of-dying-lyrics";
//send a request to the website and return the contents of the website
request(UR_L, function(err, resp, body) {
//load the website using cheerio
const $ = cheerio.load(body);
//define lyrics as the selector to text form
var lyrics = $('p').text();
if (lyrics.length > 2000 && lyrics.length < 4000) {
//split the lyrics and send them as two messages
} else if (lyrics.length > 4000 && lyrics.length < 6000) {
//split the lyrics and send them as three messages
} else {
//send the lyrics as one message
}
})
You can find a live version running here on repl.it.
You don't need a custom function for this: the feature is already built into discord.js (in v12 and earlier; the split option was removed in v13). You can attach options to a message, and MessageOptions.split is what you're looking for. When you want to send the text, do it like this:
channel.send(lyrics, { split: true });
If lyrics.length is greater than the limit, discord.js will cut your message and send the pieces one after the other, so they read like a single message.
channel is the TextChannel you want to send the messages to.
Discord has a 2000-character limit, not a 2000-word limit.
One solution to your problem could be this:
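// This will result in an array of strings, each at most 2000 characters long.
// [\s\S] is used instead of . so that line breaks in the lyrics are matched too.
const lyricsArr = lyrics.match(/[\s\S]{1,2000}/g);
lyricsArr.forEach(chunk => sendMessage(chunk));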
Given the async nature of sending messages, you might want to look into modules like p-iteration to ensure the chunks arrive in the correct order.
That being said, there exist APIs for getting song lyrics, which I would recommend over scraping. See the apiseeds lyrics API as an example.
UPDATE
const lyrics = 'These are my lyrics';
const lyricsArr = lyrics.match(/.{1,8}/g);
console.log(lyricsArr); // [ 'These ar', 'e my lyr', 'ics' ]
lyricsArr.forEach((chunk, i) => {
// Break if this is the last chunk.
if (i == lyricsArr.length -1) {
return;
}
// If last character is not a space, we split a word in two.
// Add additional non-wordbreaking symbols between the slashes (in the regex) if needed.
if (!chunk[chunk.length - 1].match(/[ ,.!]/)) {
const lastWord = chunk.match(/\s([^ .]+)$/)
lyricsArr[i + 1] = lastWord[1] + lyricsArr[i + 1];
lyricsArr[i] = lyricsArr[i].split(/\s[^ .]*$/)[0];
}
})
console.log(lyricsArr) // [ 'These', 'are my', 'lyrics' ]
Updated as per the comments.
This is some crude code that I did not spend much time on, but it does the job.
Some notes on this approach:
You need to add any symbols that should not be considered word-breaking to the regex in the second if.
This has not been tested thoroughly, so use at your own risk.
It will definitely break if the lyrics contain a word longer than the chunk size. Since the chunk size is around 2000, I imagine this will not be a problem.
This no longer ensures that the chunk length stays below the limit, so change the limit to around 1900 to be safe.
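As an alternative sketch that addresses the last two notes, the function below cuts at the last whitespace within the limit, so no chunk exceeds it, and falls back to a hard cut for a single word longer than the limit. The name splitIntoChunks is hypothetical and the code is independent of discord.js.

```javascript
// Hypothetical helper: split text into chunks of at most `limit` characters,
// breaking at whitespace where possible.
function splitIntoChunks(text, limit) {
  const chunks = [];
  let rest = text;
  while (rest.length > limit) {
    // Find the last whitespace character at index 1..limit.
    let cut = -1;
    for (let i = limit; i > 0; i--) {
      if (/\s/.test(rest[i])) { cut = i; break; }
    }
    if (cut === -1) cut = limit; // no whitespace found: hard cut inside the word
    const piece = rest.slice(0, cut).trimEnd();
    if (piece.length > 0) chunks.push(piece);
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

console.log(splitIntoChunks('These are my lyrics', 8)); // [ 'These', 'are my', 'lyrics' ]
```

Each returned chunk can then be sent in order, for example by awaiting one send before starting the next.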
You can use the JavaScript split() function:
const word_list = lyrics.split(" ");
Then use word_list.length to get the number of words in your message, and word_list[0] to select the first word, for instance.
I am trying to split a CSV into new CSV files based on the value of an attribute field. After testing a few different modules, it looks like fast-csv is a good choice. However, I need some help on how to split the files by the attribute.
I initially thought about doing a transform:
.transform(function(data){
if (data.AttributeValue == '10') {
return {
data
};
} else if (data.AttributeValue == '5') {
return {
data
};
} else {
return {
data
};
}
})
Or I could use the validate:
.validate(function(data){
return data.AttributeValue == '10';
})
What I need help with is which one to use and how to send the row of data to different writeStreams.
.pipe(fs.createWriteStream("10.csv", {encoding: "utf8"}));
.pipe(fs.createWriteStream("5.csv", {encoding: "utf8"}));
.pipe(fs.createWriteStream("Other.csv", {encoding: "utf8"}));
I have done this in Python but trying to migrate this to Node is proving trickier than I thought.
Thanks
You can use something like this to send the data from your validation test to another function.
.validate(function(data){
if(data.attribute === '10'){
ten(data);
} else {
notTen(data);
}
});
Then you can use a CSV writer to write the rows out. You'll need to open one CSV writer for each output file:
var csvStream = csv.createWriteStream({headers: true}),
writableStream = fs.createWriteStream("not10.csv");
writableStream.on("finish", function(){
console.log("DONE!");
});
csvStream.pipe(writableStream);
function writeToCsv(data){
csvStream.write(data);
}
Repeat for each CSV you need to write to.
Might not be the best way to do it, but I am fairly new to this, and it seems to work.
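The routing decision itself can be sketched without any CSV library. The snippet below assumes rows have already been parsed into objects with an AttributeValue field, as in the question. The helper names targetFile and partitionRows and the sample rows are made up for illustration.

```javascript
// Hypothetical helper: choose an output file name from a row's attribute value.
function targetFile(row) {
  if (row.AttributeValue === '10') return '10.csv';
  if (row.AttributeValue === '5') return '5.csv';
  return 'Other.csv';
}

// Hypothetical helper: group already-parsed rows by their target file.
function partitionRows(rows) {
  const buckets = {};
  rows.forEach(function (row) {
    const file = targetFile(row);
    if (!buckets[file]) buckets[file] = [];
    buckets[file].push(row);
  });
  return buckets;
}

// Made-up sample rows.
const rows = [
  { Name: 'a', AttributeValue: '10' },
  { Name: 'b', AttributeValue: '5' },
  { Name: 'c', AttributeValue: '7' },
  { Name: 'd', AttributeValue: '10' }
];
const buckets = partitionRows(rows);
console.log(Object.keys(buckets));
console.log(buckets['10.csv'].length);
```

In a streaming setup the same targetFile lookup could pick which already-open CSV write stream receives each row, so the whole file never has to be held in memory.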
I want to be able to do the reverse of this question/answer: What are the exact steps to read a csv file into an multidimensional array in Node.js?
First get ya-csv:
npm install ya-csv
Now you can do the following:
var csv = require('ya-csv');
var writer = csv.createCsvFileWriter('data.csv', {
'quote': ''
});
// people is the multidimensional array, with one inner array per row
people.forEach(function(item) {
writer.writeRecord(item);
});
If your multidimensional array looks like this:
[Smith][Sally][42]
[Jones][Don][55]
Then your .csv file will look like this:
Smith,Sally,42
Jones,Don,55
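For small arrays the same output can be produced without a library. The sketch below is an alternative to ya-csv, not its implementation. The helper names toCsvField and toCsv are hypothetical, and fields are quoted only when they contain a comma, a double quote, or a line break.

```javascript
// Hypothetical helper: quote a field only when it needs quoting,
// doubling any double quotes inside it.
function toCsvField(value) {
  const s = String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Hypothetical helper: turn an array of rows into CSV text.
function toCsv(rows) {
  return rows.map(function (row) {
    return row.map(toCsvField).join(',');
  }).join('\n');
}

const people = [
  ['Smith', 'Sally', 42],
  ['Jones', 'Don', 55]
];
console.log(toCsv(people));
// Smith,Sally,42
// Jones,Don,55
```

The resulting string can be saved with fs.writeFileSync('data.csv', toCsv(people)).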