I have a simple Chrome extension, and I'm trying to do some page analysis through content.js. This is the code:
console.log("content.js running.."); //debug
var fromDOM = new XMLSerializer().serializeToString(document);
console.log(fromDOM)
var i = 0;
var item;
for (item in fromDOM) {
var x = fromDOM[item];
if (x == "/"){
i++;
console.log(i);
chrome.runtime.sendMessage({lala: i});
}
}
This code searches for every occurrence of "/" in the page and sends a message to a background script (which currently does nothing).
This for loop alone makes every tab I load slower than usual, hurting the user experience.
What am I doing wrong here? Can't I do heavy lifting in content scripts, or is there a better way I'm missing?
Assuming you want to process the current HTML of the page:
Use document.documentElement.innerHTML
Use string methods like indexOf to get the position of each / without enumerating the long HTML string character by character.
Accumulate all the positions in an array and send them in one message, since sending a message is an expensive operation that internally involves JSON.stringify + JSON.parse.
Don't use console.log while DevTools is open, as it does a lot of extra processing to format the messages. In general, prefer debugging interactively: there's a panel in DevTools to inspect and set breakpoints in content scripts, so you can step through the code, view the variables, and so on.
const html = document.documentElement.innerHTML;
const slashes = [];
let pos = -1;
do {
    pos = html.indexOf('/', pos + 1);
    if (pos >= 0) {
        slashes.push(pos);
    }
} while (pos >= 0);
chrome.runtime.sendMessage({lala: slashes});
Now your background listener will receive an array of character positions - not really useful per se, but that's just an example. You can put more info inside the array to make it more meaningful.
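If you want to verify what arrives, here's a minimal sketch of the receiving side (assuming a classic background page registered in the manifest; in Manifest V3 the same listener would live in the service worker):

chrome.runtime.onMessage.addListener(function (message, sender, sendResponse) {
    // "lala" is the property name used by the content script above
    if (Array.isArray(message.lala)) {
        console.log('Received ' + message.lala.length + ' slash positions from ' +
            (sender.tab ? sender.tab.url : 'an unknown sender'));
    }
});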
I've been trying to figure this out for the past day or two with minimal results. Essentially, what I want to do is send my selected comps in After Effects to Adobe Media Encoder via script and, using information about them (substrings of their comp names, widths, etc., all of which I already have figured out), specify the appropriate AME preset based on the conditions met. The two methods that I've found so far won't work for what I'm trying to do:
https://www.youtube.com/watch?v=K8_KWS3Gs80
https://blogs.adobe.com/creativecloud/new-changed-after-effects-cc-2014/?segment=dva
Both of these options more or less rely on the output module/render queue (the first option allows sending to AME without specifying a preset), which, at least to my knowledge, won't allow H.264 file types anymore (unless you can somehow trick the render queue with a pre-built set of settings before pushing the queue to AME?).
Another option that I've found involves using BridgeTalk to bypass the output module/render queue and go directly to AME... BUT that primarily involves specifying a file (rather than the currently selected comps) and requires having ONLY a single comp (to be rendered) at the root level of the project: https://community.adobe.com/t5/after-effects/app-project-renderqueue-queueiname-true/td-p/10551189?page=1
Now as far as code goes, here's the relevant, non-working portion of code:
function render_comps(){
    var mySelectedItems = [];
    for (var i = 1; i <= app.project.numItems; i++){
        if (app.project.item(i).selected)
            mySelectedItems[mySelectedItems.length] = app.project.item(i);
    }
    for (var i = 0; i < mySelectedItems.length; i++){
        var mySelection = mySelectedItems[i];
        //~ front = app.getFrontend();
        //~ front.addItemToBatch(mySelection);
        //~ enc = eHost.createEncoderForFormat("H.264");
        //~ flag = enc.loadPreset("HD 1080i 25");
        //app.getFrontend().addItemToBatch(mySelection);
        var bt = new BridgeTalk();
        bt.appName = "ame";
        bt.target = "ame";
        //var message = "alert('Hello')";
        //bt.body = message;
        bt.body = "app.getFrontend().addCompToBatch(mySelection)";
        bt.send();
    }
}
This encapsulates a number of different attempts and things that I've tried.
I've spent about 4-5 hours scouring the internet and various resources but have so far come up short. Thanks in advance for the help!
I'm using pdfkit to generate a PDF invoice.
When all my content fits on one page, I have no issue.
However, when it doesn't fit and needs an extra page, I get strange behaviour:
instead of adding the elements to the second page, it only adds one line and leaves the rest of the page blank.
Then on the 3rd page I have another element and the rest is blank, then the 4th page, 5th, etc.
Here is the code corresponding to this part:
for (let i = 0; i < data.items.length; i++) {
    const item = data.items[i];
    this.itemPositionY = this.itemPositionY + 20;
    if (item.bio) this.containBioProduct = true;
    let itemName = item.bio ? `${item.item}*` : item.item;
    this.generateTableRow(
        doc,
        this.itemPositionY,
        itemName,
        "",
        this.formatCurrency(item.itemPriceDf.toFixed(2)),
        item.quantity,
        this.formatCurrency(item.itemPriceTotalDf.toFixed(2))
    );
    this.generateHr(doc, this.itemPositionY + 15);
}
Basically I just iterate over an array of products; for each line the Y position increases by 20.
Thanks for your help.
In case someone has this issue, here is a solution:
Everywhere in the code I know that an extra page could be generated, I add this:
if (this.position > 680) {
    doc.addPage();
    this.position = 50;
}
This lets you control the creation of new pages yourself (instead of pdfkit doing it automatically, with the potential problems described above).
You just need to track the position from the point where "this.position" is initialized.
That way, every time it goes past a given Y position (680 in my case, which is a bit less than a full page in pdfkit), you call "doc.addPage()", which creates another page, and you reset your position to the top of the new page.
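Applied to the item loop from the question, a rough sketch (assuming this.position tracks the same Y coordinate as this.itemPositionY, and 50 is the top margin of a new page) would look like:

for (let i = 0; i < data.items.length; i++) {
    const item = data.items[i];
    this.itemPositionY = this.itemPositionY + 20;
    // start a new page manually before this row would overflow the current one
    if (this.itemPositionY > 680) {
        doc.addPage();
        this.itemPositionY = 50; // reset to the top of the new page
    }
    // ... then call generateTableRow / generateHr with this.itemPositionY as before
}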
First of all, I want to let you know that I understand the basic logic of how the Elasticsearch Scroll API works. To use the Scroll API, you first call the search method with some scroll value like 1m; it returns a _scroll_id that is then used for the consecutive calls to scroll, in a loop, until all of the docs have been returned. The problem is that I want to run the same process on a multi-threaded basis, not serially. For example:
If I have 300000 documents, then I want to process/get the docs this way:
The 1st thread will process the initial 100000 documents
The 2nd thread will process the next 100000 documents
The 3rd thread will process the remaining 100000 documents
So my question is: since I didn't find any way to set a from value on the Scroll API, how can I make the scrolling process faster with threading, rather than processing the documents serially?
My sample python code
if index_name is not None and doc_type is not None and body is not None:
    es = init_es()
    page = es.search(index_name, doc_type, scroll='30s', size=10, body=body)
    sid = page['_scroll_id']
    scroll_size = page['hits']['total']

    # Start scrolling
    while scroll_size > 0:
        print("Scrolling...")
        page = es.scroll(scroll_id=sid, scroll='30s')
        # Update the scroll ID
        sid = page['_scroll_id']
        print("scroll id: " + sid)
        # Get the number of results that we returned in the last scroll
        scroll_size = len(page['hits']['hits'])
        print("scroll size: " + str(scroll_size))
        print("scrolled data :")
        print(page['aggregations'])
Have you tried a sliced scroll? According to the linked docs:
For scroll queries that return a lot of documents it is possible to
split the scroll in multiple slices which can be consumed
independently.
and
Each scroll is independent and can be processed in parallel like any
scroll request.
I have not used this myself (the largest result set I need to process is ~50k documents) but this seems to be what you're looking for.
You should use sliced scroll for that; see https://github.com/elastic/elasticsearch-dsl-py/issues/817#issuecomment-372271460 for how to do it in Python.
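For reference, a rough sketch of the idea with the plain elasticsearch-py client (untested; the index name, page size, and slice count are placeholders to adjust to your setup):

from threading import Thread
from elasticsearch import Elasticsearch

es = Elasticsearch()
index_name = "your_index"  # placeholder
NUM_SLICES = 3             # e.g. one slice per ~100000 docs in your example

def scroll_slice(slice_id):
    # each slice is an independent scroll and can be consumed in its own thread
    page = es.search(index=index_name, scroll='30s', size=1000, body={
        "slice": {"id": slice_id, "max": NUM_SLICES},
        "query": {"match_all": {}}
    })
    sid = page['_scroll_id']
    while page['hits']['hits']:
        for doc in page['hits']['hits']:
            pass  # process each document here
        page = es.scroll(scroll_id=sid, scroll='30s')
        sid = page['_scroll_id']
    es.clear_scroll(scroll_id=sid)

threads = [Thread(target=scroll_slice, args=(i,)) for i in range(NUM_SLICES)]
for t in threads:
    t.start()
for t in threads:
    t.join()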
I ran into the same problem as you, but with a doc size of 1.4 million. I had to use a concurrent approach and used 10 threads for the data writing.
I wrote the code with a Java thread pool; you can do the same kind of thing in Python.
public class ControllerRunnable implements Runnable {
    private String i_res;
    private String i_scroll_id;
    private int i_index;
    private JSONArray i_hits;
    private JSONObject i_result;

    ControllerRunnable(int index_copy, String _scroll_id_copy) {
        i_index = index_copy;
        i_scroll_id = _scroll_id_copy;
    }

    @Override
    public void run(){
        try {
            s_logger.debug("index:{}", i_index);
            String nexturl = m_scrollUrl.replace("--", i_scroll_id);
            s_logger.debug("nexturl:{}", nexturl);
            i_res = get(nexturl);
            s_logger.debug("i_res:{}", i_res);
            i_result = JSONObject.parseObject(i_res);
            if (i_result == null) {
                s_logger.info("controller thread parsed result object NULL, res:{}", i_res);
                s_counter++;
                return;
            }
            i_scroll_id = (String) i_result.get("_scroll_id");
            i_hits = i_result.getJSONObject("hits").getJSONArray("hits");
            s_logger.debug("hits content:{}\n", i_hits.toString());
            s_logger.info("hits_size:{}", i_hits.size());
            if (i_hits.size() > 0) {
                int per_thread_data_num = i_hits.size() / s_threadnumber;
                for (int i = 0; i < s_threadnumber; i++) {
                    Runnable worker = new DataRunnable(i * per_thread_data_num,
                            (i + 1) * per_thread_data_num);
                    m_executor.execute(worker);
                }
                // Wait until all threads are finished
                m_executor.awaitTermination(1, TimeUnit.SECONDS);
            } else {
                s_counter++;
                return;
            }
        } catch (Exception e) {
            s_logger.error(e.getMessage(), e);
        }
    }
}
Scroll itself must be consumed sequentially; that is how it works.
But you can still use multiple threads, and that is exactly what Elasticsearch is good at: parallelism.
An Elasticsearch index is composed of shards; these are the physical storage of your data. Shards can be on the same node or not (the latter is better).
On the other side, the search API offers a very nice option: _preference (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-preference.html)
So back to your app:
Get the list of index shards (and nodes)
Create a thread per shard
Do the scroll search on each thread
Et voilà!
Also, you could use the elasticsearch4hadoop plugin, which does exactly that for Spark / Pig / MapReduce / Hive.
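A rough sketch of that idea in Python (hedged: the index name is a placeholder, and the '_shards:<n>' preference syntax should be checked against your Elasticsearch version):

from threading import Thread
from elasticsearch import Elasticsearch

es = Elasticsearch()
index_name = "your_index"  # placeholder

# 1. get the list of shards for the index
shard_count = len(es.search_shards(index=index_name)['shards'])

def scroll_shard(shard_id):
    # 3. do the scroll search on each thread, restricted to a single shard
    page = es.search(index=index_name, scroll='30s', size=1000,
                     preference='_shards:%d' % shard_id,
                     body={"query": {"match_all": {}}})
    sid = page['_scroll_id']
    while page['hits']['hits']:
        # process page['hits']['hits'] here
        page = es.scroll(scroll_id=sid, scroll='30s')
        sid = page['_scroll_id']

# 2. create a thread per shard
threads = [Thread(target=scroll_shard, args=(i,)) for i in range(shard_count)]
for t in threads:
    t.start()
for t in threads:
    t.join()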
How can I grab all the text on a website? And I don't just mean Ctrl+A/C. I'd like to be able to extract all the text from a website (and all the pages associated with it) and use it to build a concordance of words from that site. Any ideas?
I was intrigued by this so I've written the first part of a solution to this.
The code is written in PHP because of the convenient strip_tags function. It's also rough and procedural, but I feel it demonstrates my ideas.
<?php
$url = "http://www.stackoverflow.com";

//To use this you'll need to get a key for the Readability Parser API http://readability.com/developers/api/parser
$token = "";

//I make an HTTP GET request to the Readability API and then decode the returned JSON
$parserResponse = json_decode(file_get_contents("http://www.readability.com/api/content/v1/parser?url=$url&token=$token"));

//I'm only interested in the content string in the JSON object
$content = $parserResponse->content;

//I strip the HTML tags from the article content
$wordsOnPage = strip_tags($content);

$wordCounter = array();
$wordSplit = explode(" ", $wordsOnPage);

//I then loop through each word in the article, keeping count of how many times I've seen the word
foreach ($wordSplit as $word)
{
    incrementWordCounter($word);
}

//Then I sort the array so the most frequent words are at the end
asort($wordCounter);

//And dump the array
var_dump($wordCounter);

function incrementWordCounter($word)
{
    global $wordCounter;
    if (isset($wordCounter[$word]))
    {
        $wordCounter[$word] = $wordCounter[$word] + 1;
    }
    else
    {
        $wordCounter[$word] = 1;
    }
}
?>
I needed to do this to configure PHP for the SSL the readability API uses.
The next step in the solution would be to search for links in the page and call this recursively in an intelligent way to handle the "associated pages" requirement.
Also, the code above just gives the raw word-count data; you would want to process it some more to make it meaningful.
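For example, a rough sketch of that next step (just an idea, not tested) could use DOMDocument to collect the links on a page before feeding each one back through the word counter:

<?php
function extractLinks($html, $baseHost)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from malformed markup
    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor)
    {
        $href = $anchor->getAttribute('href');
        // only follow links that stay on the same site
        if (strpos($href, $baseHost) !== false)
        {
            $links[] = $href;
        }
    }
    return array_unique($links);
}
?>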
I am making a page that pulls the user's preferred language from their browser via Request.UserLanguages, which returns a two-letter code (e.g. "en") or a detailed code (e.g. "en-GB").
I basically get the string of user languages (they are in order of preference) and store them in a string array. Then I use a loop to check whether the language code in the first position of the array matches any of the codes for a certain language (another string array, hard-coded in).
Is there a better way to do this? I'm noticing increased load time and am worried additional languages will further slow the page load...
if (!IsPostBack)
{
    //Holds possible user language preferences to check the client machine against
    String[] compJapaneseLang = { "ja-jp", "ja", "jp", "jpn", "euc", "shift-jis" };
}

//Get the client machine's language preferences
String[] userLang = Request.UserLanguages;

//Loop through variations of preferences from possible user languages
for (int i = 0; i < compJapaneseLang.Length; i++)
{
    //IF JAPANESE
    if (userLang.GetValue(0).ToString().ToLowerInvariant().Equals(compJapaneseLang.GetValue(i).ToString().ToLowerInvariant()))
        cc.JapeneseObject();
}
Thanks!
Storing them in a list turned out best; there isn't really much else one can do...
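For example, a minimal sketch of that idea (the HashSet, the Page_Load placement, and the ;q= trimming are my own additions to illustrate it; cc.JapeneseObject() is the call from the original code, and the usual System / System.Collections.Generic usings are assumed):

private static readonly HashSet<string> JapaneseCodes =
    new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "ja-jp", "ja", "jp", "jpn", "euc", "shift-jis" };

protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        string[] userLang = Request.UserLanguages;
        if (userLang != null && userLang.Length > 0)
        {
            // entries can look like "ja-JP" or "ja;q=0.8", so strip any quality suffix
            string primary = userLang[0].Split(';')[0].Trim();
            if (JapaneseCodes.Contains(primary))
                cc.JapeneseObject();
        }
    }
}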