Extract specific portion of text from pdf using Javascript?


I need to do a modification. I am using this code that I found to extract all text in the pdf:
<!-- edit this; the PDF file must be on the same domain as this page -->
<iframe id="input" src="your-file.pdf"></iframe>
<!-- embed the pdftotext service as an iframe -->
<iframe id="processor" src="http://hubgit.github.com/2011/11/pdftotext/"></iframe>
<!-- a container for the output -->
<div id="output"></div>
<script>
var input = document.getElementById("input");
var processor = document.getElementById("processor");
var output = document.getElementById("output");
// listen for messages from the processor
window.addEventListener("message", function(event){
if (event.source != processor.contentWindow) return;
switch (event.data){
// "ready" = the processor is ready, so fetch the PDF file
case "ready":
var xhr = new XMLHttpRequest;
xhr.open('GET', input.getAttribute("src"), true);
xhr.responseType = "arraybuffer";
xhr.onload = function(event) {
processor.contentWindow.postMessage(this.response, "*");
};
xhr.send();
break;
// anything else = the processor has returned the text of the PDF
default:
output.textContent = event.data.replace(/\s+/g, " ");
break;
}
}, true);
</script>
The output is packed text without any paragraphs. All my PDFs have the word 'Datacover' somewhere near the beginning, followed by a big paragraph.
All I want to do is delete all the text from the beginning up to the first instance of the word 'Datacover', then, starting from 'Datacover', show all the text up to the third instance of '. ' (a dot followed by a space) and delete everything after that to the end.
Can you help? Thanks!

You could match Datacover between word boundaries (\b), then repeat three times a non-greedy match of any character, including a newline ([\s\S]*?), up to the next occurrence of a dot followed by a space (\. ):
\bDatacover\b(?:[\s\S]*?\. ){3}
To get the data, you could use
event.data.match(regex)
For example:
const regex = /\bDatacover\b(?:[\s\S]*?\. ){3}/g;
let event = {
data: `testhjgjhg hjg jhg jkgh kjhghjkg76t 76 tguygtf yr 6 rt6 gtyut 67 tuy yoty yutyu tyu yutyuit iyut iuytiyu tuiyt Datacover uytuy tuyt uyt uiytuiyt uytutest.
yu tuyt uyt uyt iutiuyt uiy
yuitui tuyt
test.
uiyt uiytuiyt
uyt ut ui
this is a test.
sjhdgfjsa.
hgwryuehrgfhrghw fsdfdfsfs sddsfdfs.`
};
console.log(event.data.match(regex));
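
To connect this to the question's message handler, the match can be wrapped in a small helper. This is only a sketch: the name extractDatacoverSection is hypothetical, and it assumes the text is whitespace-collapsed first, as the question's code already does. The "g" flag is dropped here because only the first occurrence is wanted, so match() returns the matched text at index 0.

```javascript
// Hypothetical helper (not part of the original answer): returns "Datacover"
// plus everything up to and including the third ". " that follows it,
// or an empty string when the pattern does not match.
function extractDatacoverSection(text) {
  // Collapse runs of whitespace, as the question's handler does.
  var normalized = text.replace(/\s+/g, " ");
  // No "g" flag: only the first occurrence is needed.
  var match = normalized.match(/\bDatacover\b(?:[\s\S]*?\. ){3}/);
  return match ? match[0].trim() : "";
}
```

In the question's handler, the default branch could then read output.textContent = extractDatacoverSection(event.data);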

Related

How to get char code of fontawesome icon?

I'd like to use Font Awesome icons inside SVG. I cannot do it the usual way, but I can add a <text> element containing the corresponding Unicode character, with its font set to FontAwesome, like this:
<text style="font-family: FontAwesome;">\uf0ac</text>
To keep this readable, I wrote a switch that returns the icons I need:
getFontAwesomeIcon(name) {
switch (name) {
case 'fa-globe':
return '\uf0ac'
case 'fa-lock':
return '\uf023'
case 'fa-users':
return '\uf0c0'
case 'fa-ellipsis-h':
return '\uf141'
default:
throw '# Wrong fontawesome icon name.'
}
}
But of course that's ugly, because I have to maintain the list myself in my code. How can I get these values directly from the Font Awesome library?

You can avoid maintaining such a list by extracting the information from the Font Awesome stylesheet on the fly. Include the stylesheet and set the classes as usual, i.e.
<tspan class="fa fa-globe"></tspan>
and you can do the following:
var icons = document.querySelectorAll(".fa");
var stylesheet = Array.from(document.styleSheets).find(function (s) {
// inline <style> sheets have a null href, so check it before calling endsWith
return s.href && s.href.endsWith("font-awesome.css");
});
var rules = Array.from(stylesheet.cssRules);
icons.forEach(function (icon) {
// extract the class name for the icon
var name = Array.from(icon.classList).find(function (c) {
return c.startsWith('fa-');
});
// get the ::before styles for that class
var style = rules.find(function (r) {
return r.selectorText && r.selectorText.endsWith(name + "::before");
}).style;
// insert the content into the element
// style.content returns '"\uf0ac"'
icon.textContent = style.content.substr(1,1);
});

Here are two approaches to the problem (both developed thanks to ccprog):
1. Setting the character via the class definition:
With this approach, the element is defined like this:
<text class="fa fa-globe"></text>
Then run this code:
var icons = document.querySelectorAll("text.fa");
// I want to modify only icons in SVG text elements
var stylesheets = Array.from(document.styleSheets);
// In my project FontAwesome styles are compiled with other file,
// so I search for rules in all CSS files
// Getting rules from stylesheets is slightly more complicated:
var rules = stylesheets.map(function(ss) {
return ss && ss.cssRules ? Array.from(ss.cssRules) : [];
})
rules = [].concat.apply([], rules);
// Rest the same:
icons.forEach(function (icon) {
var name = Array.from(icon.classList).find(function (c) {
return c.startsWith('fa-');
});
var style = rules.find(function (r) {
return r.selectorText && r.selectorText.endsWith(name + "::before");
}).style;
icon.textContent = style.content.substr(1,1);
});
But I had some problems with that approach, so I developed a second one.
2. Getting the character with a function:
const getFontAwesomeIconChar = (name) => {
var stylesheets = Array.from(document.styleSheets);
var rules = stylesheets.map(function(ss) {
return ss && ss.cssRules ? Array.from(ss.cssRules) : [];
})
rules = [].concat.apply([], rules);
var style = rules.find(function (r) {
return r.selectorText && r.selectorText.endsWith(name + "::before");
}).style;
return style.content.substr(1,1);
}
With that function defined, we can do something like this (example with React syntax):
<text>{getFontAwesomeIconChar('fa-globe')}</text>
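
One fragile spot in the snippets above is style.content.substr(1,1), which assumes the browser hands back the literal character wrapped in quotes. A browser may instead serialize the value as a CSS escape such as "\f0ac". The sketch below handles both forms; the helper name contentToChar is hypothetical and is not part of Font Awesome.

```javascript
// Hypothetical helper: turns a CSS `content` value into the icon character.
// Accepts either a quoted literal character or a quoted CSS hex escape.
function contentToChar(content) {
  // Strip one leading and one trailing quote, if present.
  var inner = content.replace(/^["']|["']$/g, "");
  // A CSS escape is a backslash followed by 1 to 6 hex digits.
  var escaped = /^\\([0-9a-fA-F]{1,6})\s?$/.exec(inner);
  if (escaped) {
    return String.fromCodePoint(parseInt(escaped[1], 16));
  }
  return inner;
}
```

With this in place, return style.content.substr(1,1); could become return contentToChar(style.content);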

How do I change a WAV to txt file of time vs amplitude?

I want to use the file input feature of LTSpice to simulate a circuit using a real world bit of audio. I need the data in a time vs amplitude version but not sure which software package can do this for me. Audacity can convert the MP3 to WAV but from what I see can't do it to a headerless text file.
So a .WAV file to a two column text file of time/amplitude.
Any ideas for a free way of doing it?

Here's a quick'n'nasty implementation using JavaScript.
I'll leave it as an exercise to convert the result string to a Blob that can easily be downloaded. The same goes for channel selection, stereo results, and the ability to process only a small portion of the audio track.
<!doctype html>
<html>
<head>
<script>
"use strict";
function byId(id){return document.getElementById(id)}
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
window.addEventListener('load', onDocLoaded, false);
var audioCtx;
function onDocLoaded(evt)
{
audioCtx = new AudioContext();
byId('fileInput').addEventListener('change', onFileInputChangedGeneric, false);
}
function onFileInputChangedGeneric(evt)
{
// load file if chosen
if (this.files.length != 0)
{
var fileObj = this.files[0];
loadAndTabulateAudioFile( fileObj, byId('output') );
}
// clear output otherwise
else
byId('output').textContent = '';
}
// processes channel 0 only
//
// creates a string that represents a 2 column table, where each row contains the time and amplitude of a sample
// columns are tab separated
function loadAndTabulateAudioFile( fileObj, tgtElement )
{
var a = new FileReader();
a.onload = loadedCallback;
a.readAsArrayBuffer( fileObj );
function loadedCallback(evt)
{
audioCtx.decodeAudioData(evt.target.result, onDataDecoded);
}
function onDataDecoded(buffer)
{
//console.log(buffer);
var leftChannel = buffer.getChannelData(0);
//var rightChannel = buffer.getChannelData(1);
console.log("# samples: " + buffer.length);
console.log(buffer);
var result = '';
var i, n = buffer.length, invSampleRate = 1.0 / buffer.sampleRate;
for (i=0; i<n; i++)
{
var curResult = (invSampleRate*i).toFixed(8) + "\t" + leftChannel[i] + "\n";
result += curResult;
}
tgtElement.textContent = result;
}
}
</script>
<style>
</style>
</head>
<body>
<label>Select audio file: <input type='file' id='fileInput'/></label>
<hr>
<pre id='output'></pre>
</body>
</html>
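
The exercise mentioned above (turning the result into a downloadable Blob) could look roughly like this. It is a sketch, not part of the original answer: tabulateSamples and downloadText are hypothetical names, and "samples.txt" is an arbitrary default file name. Building the rows in an array and joining them once also avoids growing one huge string sample by sample.

```javascript
// Builds the two-column table: time in seconds, a tab, then the amplitude.
// One row per sample, each row terminated by a newline.
function tabulateSamples(samples, sampleRate) {
  var rows = new Array(samples.length);
  for (var i = 0; i < samples.length; i++) {
    rows[i] = (i / sampleRate).toFixed(8) + "\t" + samples[i];
  }
  return rows.length ? rows.join("\n") + "\n" : "";
}

// Browser only: wraps the text in a Blob and triggers a download
// through a temporary link element.
function downloadText(text, fileName) {
  var blob = new Blob([text], { type: "text/plain" });
  var url = URL.createObjectURL(blob);
  var link = document.createElement("a");
  link.href = url;
  link.download = fileName || "samples.txt"; // hypothetical default name
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
  URL.revokeObjectURL(url);
}
```

Inside onDataDecoded, the loop could then be replaced by downloadText(tabulateSamples(buffer.getChannelData(0), buffer.sampleRate), "audio.txt");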

Open new window from quick launch in Sharepoint Online

<script type="text/javascript">
//add an entry to the _spBodyOnLoadFunctionNames array
//so that our function will run on the pageLoad event
_spBodyOnLoadFunctionNames.push("rewriteLinks");
function rewriteLinks() {
//create an array to store all the anchor elements in the page
var anchors = document.getElementsByTagName("a");
//loop through the array
for (var x=0; x<anchors.length; x++) {
//does this anchor element contain #openinnewwindow?
if (anchors[x].outerHTML.indexOf('#openinnewwindow')>0) {
//store the HTML for this anchor element
var oldText = anchors[x].outerHTML;
//rewrite the URL to remove our test text and add a target instead
var newText = oldText.replace(/#openinnewwindow/,'" target="_blank');
//write the HTML back to the browser
anchors[x].outerHTML = newText;
}
}
}
</script>
I put this code in the seattle.master file. Then, in Quick Launch, when I edit links, I put #openinnewwindow after the website address. "Try link" opens the website correctly. My problem comes after I save: when I click the link, it does not open in a new window. Any ideas why this might be happening?

I realized for this code to work that I needed Publishing enabled.
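
Separately from the Publishing requirement, the link rewrite itself can be done without round-tripping through outerHTML, by setting the anchor's href and target properties directly. This is an alternative sketch, not the code from the question; the names MARKER, retargetAnchor and retargetMarkedLinks are hypothetical.

```javascript
var MARKER = "#openinnewwindow";

// Removes the marker from one anchor's href and makes it open in a new window.
// Returns true when the anchor carried the marker, false otherwise.
function retargetAnchor(anchor) {
  var href = anchor.href || "";
  var at = href.indexOf(MARKER);
  if (at === -1) {
    return false;
  }
  anchor.href = href.slice(0, at) + href.slice(at + MARKER.length);
  anchor.target = "_blank";
  return true;
}

// Browser only: applies the rewrite to every anchor on the page.
function retargetMarkedLinks() {
  var anchors = document.getElementsByTagName("a");
  for (var i = 0; i < anchors.length; i++) {
    retargetAnchor(anchors[i]);
  }
}
```

It would be registered the same way as in the question, with _spBodyOnLoadFunctionNames.push("retargetMarkedLinks");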

How to convert the RadEditor content to image

I have a RadEditor containing spans and a background image. I also have some HTML input textboxes; when the text in a textbox changes, I bind it to a particular span like this:
<script>
function txtTitlechanged(x) {
var y = document.getElementById(x).value
var editor = $find("<%=RadEditor1.ClientID %>");
var oDocument = editor.get_document()
var img = oDocument.getElementById('span1');
if (y == '') {
img.innerHTML = 'UserName';
}
else {
img.innerHTML = y;
}
}
</script>
<input type='text' id='txtTitle' onchange="txtTitlechanged(this.id)" />
I want the RadEditor content as an image. I can get the content as HTML, but I need it as an image.

You could search for a third-party HTML-to-image or XHTML-to-image converter and pass the generated content to it
OR
export the content to PDF using the built-in PDF export feature of RadEditor: http://demos.telerik.com/aspnet-ajax/editor/examples/pdfexport/defaultcs.aspx

TinyMCE Setting focus in text part

Consider the following HTML:
<div id="block-container">
<div id="some-background"></div>
<div id="text-div">Focus should be here when this HTML goes into the editor</div>
</div>
I want the caret to be in the text-div (more precisely, in the first text element) when it opens in the TinyMCE editor.
One way would be to add a class like ".default-focused" to such an element and set focus based on the class. Is there any other (generalized) way to achieve this?
The reasons why I can't go with the ".default-focused" way:
1. It could be a huge task to add the class, considering the amount of data I have, and
2. More importantly, the user can change the HTML and remove the class.

Well, if you know which element the caret should be placed in, you can use this short function:
// sets the cursor to the specified element; ed is the editor instance
// start defines whether the cursor is set at the start (true) or at the end (false)
setCursor: function (ed, element, start) {
var doc = ed.getDoc();
if (typeof doc.createRange != "undefined") {
var range = doc.createRange();
range.selectNodeContents(element);
range.collapse(start);
var win = doc.defaultView || doc.parentWindow;
var sel = win.getSelection();
sel.removeAllRanges();
sel.addRange(range);
} else if (typeof doc.body.createTextRange != "undefined") {
var textRange = doc.body.createTextRange();
textRange.moveToElementText(element);
textRange.collapse(start);
textRange.select();
}
},
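
For the generalized case in the question, where no class marks the target, one option is to walk the editor body and pick the first text node that is not just whitespace. The sketch below assumes a TinyMCE version that provides editor.getBody(), editor.focus() and editor.selection.setCursorLocation(node, offset); the names firstTextNode and focusFirstText are hypothetical.

```javascript
// Depth-first search for the first text node that contains
// something other than whitespace. Returns null when none exists.
function firstTextNode(node) {
  if (node.nodeType === 3) { // 3 === Node.TEXT_NODE
    return /\S/.test(node.nodeValue) ? node : null;
  }
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    var found = firstTextNode(children[i]);
    if (found) {
      return found;
    }
  }
  return null;
}

// Places the caret at the start of the first real text in the editor.
// Assumes ed is a TinyMCE editor instance (see the assumptions above).
function focusFirstText(ed) {
  var target = firstTextNode(ed.getBody());
  if (!target) {
    return false;
  }
  ed.focus();
  ed.selection.setCursorLocation(target, 0);
  return true;
}
```

With the HTML from the question, the empty some-background div is skipped and the caret lands at the start of the text inside text-div.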
