Separating an HTML Element String into Multiple Strings

Separating an HTML Element String into Multiple Strings - node.js

I am webscraping using puppeteer and I am trying to extract the innerText of this h4 element.
<h4 class="loss">
(NA)
<br>
<span class="team-name">TEAMNAME</span>
<br>
<span class="win spoiler-wrap">0</span>
</h4>
I am able to get this element using:
const teamName = await matches.$eval('h4', (h4) => h4.innerHTML);
This will set teamName to:
(NA)<br><span class="team-name">TEAMNAME</span><br><span class="win spoiler-wrap">0</span>
I am trying to get only the inner text of each element.
I can get the (NA) using const s = teamName.substr(0, teamName.indexOf('<'));
But I cannot seem to figure out how to get "TEAMNAME" or "0" out of this string. I have thoughts of using regex, but I am not sure how I would accomplish this.
PS the inner text will not always be the same so I can't look for specific words.

With regex, you can do it like this:
teamName.match(/<span class="team-name">(.*)<\/span>/)[1]
match returns an array, where the first element is the match of the whole regex, the second element is the match of the first regex group, the third element is the match of the second regex group (there is none in this case), etc.
The /.../ marks a regex which matches the first biggest match it can find. . in a regex is any character. * specifies that any number of occurrences of the character is matched, including 0 occurences. (...) is a regex group, which is used by match. \ is an escape character, because / is a special character to start and end a regex.
I very much recommend reading the Mozilla docs on match and on regexes for details. You will often find them useful.
However, in the case of puppeteer there probably also is a way of directly matching the selector h4 span, which would be more straightforward than using regexes. I don't know enough about puppeteer to tell you the exact way of doing that. :/

With a bit more thinking, I was able to solve my issue.
Here is a solution:
const teamName = await matches.$eval('h4', (h4) => h4.innerHTML);
const openSpanGT = teamName.indexOf('>', 20);
const closeSpanLT = teamName.indexOf('<', openSpanGT);
const teamTitle = teamName.substr(openSpanGT + 1, closeSpanLT - openSpanGT - 1);
console.log(teamTitle);
This will output "TEAMNAME" no matter how long the string is.

Related

Find and replace text and wrap in "href"

I am trying to find specific word in a div (id="Test") that starts with "a04" (no case). I can find and replace the words found. But I am unable to correctly use the word found in a "href" link.
I am trying the following working code that correctly identifies my search criteria. My current code is working as expected but I would like help as i do not know how to used the found work as the url id?
var test = document.getElementById("test").innerHTML
function replacetxt(){
var str_rep = document.getElementById("test").innerHTML.replace(/a04(\w)+/g,'TEST');
var temp = str_rep;
//alert(temp);
document.getElementById("test").innerHTML = temp;
}
I would like to wrap the found word in an href but i do not know how to use the found word as the url id (url.com?id=found word).
Can someone help point out how to reference the found work please?
Thanks

If you want to use your pattern with the capturing group, you could move the quantifier + inside the group or else you would only get the value of the last iteration.
\ba04(\w+)
\b word boundary to prevent the match being part of a longer word
a04 Match literally
(\w+) Capture group 1, match 1+ times a word character
Regex demo
Then you could use the first capturing group in the replacement by referring to it with $1
If the string is a04word, you would capture word in group 1.
Your code might look like:
function replacetxt(){
var elm = document.getElementById("test");
if (elm) {
elm.innerHTML = elm.innerHTML.replace(/\ba04(\w+)/g,'TEST');
}
}
replacetxt();
<div id="test">This is text a04word more text here</div>
Note that you don't have to create extra variables like var temp = str_rep;

Code about replacing certain words in discord.js

I was trying to make the bot replace multiple words in one sentence with another word.
ex: User will say "Today is a great day"
and the bot shall answer "Today is a bad night"
the words "great" and "day" were replaced by the words "bad" and "night" in this example.
I've been searching in order to find a similar code, but unfortunately all I could find is "word-blacklisting" scripts.
//I tried to do some coding with it but I am not an expert with node.js the code is written really badly. It's not even worth showing really.
The user will say some sentence and the bot will recognize some predetermined words on the sentence and will replace those words with other words I'll decide in the script

We can use String.replace() combined with Regular Expressions to match and replace single words of your choosing.
Consider this example:
function antonyms(string) {
return string
.replace(/(?<![A-Z])terrible(?![A-Z])/gi, 'great')
.replace(/(?<![A-Z])today(?![A-Z])/gi, 'tonight')
.replace(/(?<![A-Z])day(?![A-Z])/gi, 'night');
}
const original = 'Today is a tErRiBlE day.';
console.log(original);
const altered = antonyms(original);
console.log(altered);
const testStr = 'Daylight is approaching.'; // Note that this contains 'day' *within* a word.
const testRes = antonyms(testStr); // The lookarounds in the regex prevent replacement.
console.log(testRes); // If this isn't the desired behavior, you can remove them.

Find index of a specific character in a string then parse the string

I have strings which looks like this [NAME LASTNAME/NAME.LAST#emailaddress/123456678]. What I want to do is parse strings which have the same format as shown above so I only get NAME LASTNAME. My psuedo idea is find the index of the first instance of /, then strip from index 1 to that index of / we found. I want this as a VBScript.

Your way should work. You can also Split() your string on / and just grab the first element of the resulting array:
Const SOME_STRING = "John Doe/John.Doe#example.com/12345678"
WScript.Echo Split(SOME_STRING, "/")(0)
Output:
John Doe
Edit, with respect to comments.
If your string contains the [, you can still Split(). Just use Mid() to grab the first element starting at character position 2:
Const SOME_STRING = "[John Doe/John.Doe#example.com/12345678]"
WScript.Echo Mid(Split(SOME_STRING, "/")(0), 2)

Your idea is good here, you should also need to grab index for "[".This will make script robust and flexible here.Below code will always return strings placed between first occurrence of "[" and "/".
var = "[John Doe/John.Doe#example.com/12345678]"
WScript.Echo Mid(var, (InStr(var,"[")+1),InStr(var,"/")-InStr(var,"[")-1)

Reading from a string using sscanf in Matlab

I'm trying to read a string in a specific format
RealSociedad
this is one example of string and what I want to extract is the name of the team.
I've tried something like this,
houseteam = sscanf(str, '%s');
but it does not work, why?

You can use regexprep like you did in your post above to do this for you. Even though your post says to use sscanf and from the comments in your post, you'd like to see this done using regexprep. You would have to do this using two nested regexprep calls, and you can retrieve the team name (i.e. RealSociedad) like so, given that str is in the format that you have provided:
str = 'RealSociedad';
houseteam = regexprep(regexprep(str, '^<a(.*)">', ''), '</a>$', '')
This looks very intimidating, but let's break this up. First, look at this statement:
regexprep(str, '^<a(.*)">', '')
How regexprep works is you specify the string you want to analyze, the pattern you are searching for, then what you want to replace this pattern with. The pattern we are looking for is:
^<a(.*)">
This says you are looking for patterns where the beginning of the string starts with a a<. After this, the (.*)"> is performing a greedy evaluation. This is saying that we want to find the longest sequence of characters until we reach the characters of ">. As such, what the regular expression will match is the following string:
<ahref="/teams/spain/real-sociedad-de-futbol/2028/">
We then replace this with a blank string. As such, the output of the first regexprep call will be this:
RealSociedad</a>
We want to get rid of the </a> string, and so we would make another regexprep call where we look for the </a> at the end of the string, then replace this with the blank string yet again. The pattern you are looking for is thus:
</a>$
The dollar sign ($) symbolizes that this pattern should appear at the end of the string. If we find such a pattern, we will replace it with the blank string. Therefore, what we get in the end is:
RealSociedad

Found a solution. So, %s stops when it finds a space.
str = regexprep(str, '<', ' <');
str = regexprep(str, '>', '> ');
houseteam = sscanf(str, '%*s %s %*s');
This will create a space between my desired string.

Lua: Search a specific string

Hi all tried all the string pattrens and library arguments but still stuck.
i want to get the name of the director from the following string i have tried the string.matcH but it matches the from the first character it finD from the string
the string is...
fixstrdirector = {id:39254,cast:[{id:15250,name:Hope Davis,character:Aunt Debra,order:5,cast_id:10,profile_path:/aIHF11Ss8P0A8JUfiWf8OHPVhOs.jpg},{id:53650,name:Anthony Mackie,character:Finn,order:3,cast_id:11,profile_path:/5VGGJ0Co8SC94iiedWb2o3C36T.jpg},{id:19034,name:Evangeline Lilly,character:Bailey Tallet,order:2,cast_id:12,profile_path:/oAOpJKgKEdW49jXrjvUcPcEQJb3.jpg},{id:6968,name:Hugh Jackman,character:Charlie Kenton,order:0,cast_id:13,profile_path:/wnl7esRbP3paALKn4bCr0k8qaFu.jpg},{id:79072,name:Kevin Durand,character:Ricky,order:4,cast_id:14,profile_path:/c95tTUjx5T0D0ROqTcINojpH6nB.jpg},{id:234479,name:Dakota Goyo,character:Max Kenton,order:1,cast_id:15,profile_path:/7PU6n4fhDuFwuwcYVyRNVEZE7ct.jpg},{id:8986,name:James Rebhorn,character:Marvin,order:6,cast_id:16,profile_path:/ezETMv0YM0Rg6YhKpu4vHuIY37D.jpg},{id:930729,name:Marco Ruggeri,character:Cliff,order:7,cast_id:17,profile_path:/1Ox63ukTd2yfOf1LVJOMXwmeQjO.jpg},{id:19860,name:Karl Yune,character:Tak Mashido,order:8,cast_id:18,profile_path:/qK315vPObCNdywdRN66971FtFez.jpg},{id:111206,name:Olga Fonda,character:Farra Lemkova,order:9,cast_id:19,profile_path:/j1qabOHf3Pf82f1lFpUmdF5XvSp.jpg},{id:53176,name:John Gatins,character:Kingpin,order:10,cast_id:41,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:1126350,name:Sophie Levy,character:Big Sister,order:11,cast_id:42,profile_path:null},{id:1126351,name:Tess Levy,character:Little Sister,order:12,cast_id:43,profile_path:null},{id:1126352,name:Charlie Levy,character:Littlest Sister,order:13,cast_id:44,profile_path:null},{id:187983,name:Gregory Sims,character:Bill Panner,order:14,cast_id:45,profile_path:null}],crew:[{id:58726,name:Leslie Bohem,department:Writing,job:Screenplay,profile_path:null},{id:53176,name:John Gatins,department:Writing,job:Screenplay,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:17825,name:Shawn Levy,department:Directing,job:Director,profile_path:/7f2f8EXdlWsPYN0HPGcIlG21xU.jpg},{id:12415,name:Richard Matheson,department:Writing,job:Story,profile_path:null},{id:57113,name:Dan Gilroy,department:Writing,job:Story,profile_path:null},{id:25210,name:Jeremy Leven,department:Writing,job:Story,profile_path:null},{id:17825,name:Shawn Levy,department:Production,job:Producer,profile_path:/7f2f8EXdlWsPYN0HPGcIlG21xU.jpg},{id:34970,name:Susan Montford,department:Production,job:Producer,profile_path:/1XJt51Y9ciPhkHrAYE0j6Jsmgji.jpg},{id:3183,name:Don Murphy,department:Production,job:Producer,profile_path:null},{id:34967,name:Rick Benattar,department:Production,job:Producer,profile_path:null},{id:1126348,name:Eric Hedayat,department:Production,job:Producer,profile_path:null},{id:186721,name:Ron Ames,department:Production,job:Producer,profile_path:null},{id:10956,name:Josh McLaglen,department:Production,job:Executive Producer,profile_path:null},{id:57634,name:Mary McLaglen,department:Production,job:Executive Producer,profile_path:null},{id:23779,name:Jack Rapke,department:Production,job:Executive Producer,profile_path:null},{id:488,name:Steven Spielberg,department:Production,job:Executive Producer,profile_path:/cuIYdFbEe89PHpoiOS9tmo84ED2.jpg},{id:30,name:Steve Starkey,department:Production,job:Executive Producer,profile_path:null},{id:24,name:Robert Zemeckis,department:Production,job:Executive Producer,profile_path:/isCuZ9PWIOyXzdf3ihodXzjIumL.jpg},{id:531,name:Danny Elfman,department:Sound,job:Original Music Composer,profile_path:/pWacZpYPos8io22nEiim7d3wp2j.jpg},{id:18265,name:Mauro Fiore,department:Crew,job:Cinematography,profile_path:null},{id:54271,name:Dean Zimmerman,department:Editing,job:Editor,profile_path:null},{id:25365,name:Richard Hicks,department:Production,job:Casting,profile_path:null},{id:5490,name:David Rubin,department:Production,job:Casting,profile_path:null},{id:52088,name:Tom Meyer,department:Art,job:Production Design,profile_path:null}]}
i have tried string.match(fixstrdirector,"name:(.+),department:Directing")
but it gives me the from the first occurace it find the name to the end of thr string
output:
Hope Davis,character:Aunt Debra,order:5,cast_id:10,profile_path:/aIHF11Ss8P0A8JUfiWf8OHPVhOs.jpg},{id:53650,name:Anthony Mackie,character:Finn,order:3,cast_id:11,profile_path:/5VGGJ0Co8SC94iiedWb2o3C36T.jpg},{id:19034,name:Evangeline Lilly,character:Bailey Tallet,order:2,cast_id:12,profile_path:/oAOpJKgKEdW49jXrjvUcPcEQJb3.jpg},{id:6968,name:Hugh Jackman,character:Charlie Kenton,order:0,cast_id:13,profile_path:/wnl7esRbP3paALKn4bCr0k8qaFu.jpg},{id:79072,name:Kevin Durand,character:Ricky,order:4,cast_id:14,profile_path:/c95tTUjx5T0D0ROqTcINojpH6nB.jpg},{id:234479,name:Dakota Goyo,character:Max Kenton,order:1,cast_id:15,profile_path:/7PU6n4fhDuFwuwcYVyRNVEZE7ct.jpg},{id:8986,name:James Rebhorn,character:Marvin,order:6,cast_id:16,profile_path:/ezETMv0YM0Rg6YhKpu4vHuIY37D.jpg},{id:930729,name:Marco Ruggeri,character:Cliff,order:7,cast_id:17,profile_path:/1Ox63ukTd2yfOf1LVJOMXwmeQjO.jpg},{id:19860,name:Karl Yune,character:Tak Mashido,order:8,cast_id:18,profile_path:/qK315vPObCNdywdRN66971FtFez.jpg},{id:111206,name:Olga Fonda,character:Farra Lemkova,order:9,cast_id:19,profile_path:/j1qabOHf3Pf82f1lFpUmdF5XvSp.jpg},{id:53176,name:John Gatins,character:Kingpin,order:10,cast_id:41,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:1126350,name:Sophie Levy,character:Big Sister,order:11,cast_id:42,profile_path:null},{id:1126351,name:Tess Levy,character:Little Sister,order:12,cast_id:43,profile_path:null},{id:1126352,name:Charlie Levy,character:Littlest Sister,order:13,cast_id:44,profile_path:null},{id:187983,name:Gregory Sims,character:Bill Panner,order:14,cast_id:45,profile_path:null}],crew:[{id:58726,name:Leslie Bohem,department:Writing,job:Screenplay,profile_path:null},{id:53176,name:John Gatins,department:Writing,job:Screenplay,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:17825,name:Shawn Levy

You're searching from the first occurrence of "name:" until the "department:Directing" with everything in between.
Instead, you need to restrict what can be between the two strings. Here for example I'm saying that the characters that make up the name can only be alphanumeric or a space:
string.match(fixstrdirector,"name:([%w ]+),department:Directing")
Alternatively, given that there's a comma separating the parameters, a better approach would be to search for "name:" followed by any characters other than a comma, followed by "department:Directing":
string.match(fixstrdirector,"name:([^,]+),department:Directing")
Of course that wouldn't work if the name had a comma it in!

Lua patterns provides - modifier for tasks as you have above. As stated on PiL - Section 20.2:
The + modifier matches one or more characters of the original class.
It will always get the longest sequence that matches the pattern.
Like *, the modifier - also matches zero or more occurrences of
characters of the original class. However, instead of matching the
longest sequence, it matches the shortest one.
Next, when you are using . to match, it'll find any and all characters satisfying the pattern. Therefore, you'll get the result from first occurence of name until the ,department:Directing is found. Since you know that it is a JSON data, you can try to match for [^,]; that is, non-comma characters.
So, for your case try:
local tAllNames = {}
for sName in fixstrdirector:gmatch( "name:([^,]-),department:Directing" ) do
tAllNames[ #tAllNames + 1 ] = sName
end
and all your required names will be stored in the table tAllNames. An example of the above can be seen at codepad.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Separating an HTML Element String into Multiple Strings - node.js

Related

Find and replace text and wrap in "href"

Code about replacing certain words in discord.js

Find index of a specific character in a string then parse the string

Reading from a string using sscanf in Matlab

Lua: Search a specific string

Categories

Resources