Extracting string data from a PDF

I have run into a little problem. Basically, I want to extract string data from a PDF file.
More specifically, this PDF file:
http://www.midttrafik.dk/koereplaner/bybusser/aarhus/bybusser-aarhus/18-mejlbyelev-park-all%C3%A9-skaade-moesgaard/koereplan
My problem is that I don't know how to get the names and the times. The PDF lists bus stops and times: the street names are in the left column and the bus arrival times fill the rest. The information I want to save is the number before the street name (1-4), the street name, and all of the times.
Here is a translation of some of the terms in the PDF.
Faste minuttal - means that the bus times repeat at the same minutes past the hour throughout the interval given under 'Faste minuttal'.
6.56 - 8.11 - means that the minutes listed underneath apply within this interval.
So the bus will stop at 'Elev Skole, Høvej' at minutes 56, 11, 26, 41, meaning 6.56, 7.11, 7.26, 7.41, 7.56, 8.11.
I don't think I can describe my problem any better, so I hope one of you will be able to help. I don't need ready-made code, just point me in the right direction: tell me what I could do that might help, or good patterns to use.
Thanks

You can use the PDFBox library to extract the text you want from this PDF file. It works really well; I used it in one of my last projects to index PDF files for a full-text search.
Here is the URL of the project:
http://pdfbox.apache.org/index.html
There you will also find the documentation and some examples of how to extract text from PDFs.
Sample code:
import java.io.*;
import org.apache.pdfbox.pdmodel.*;
// PDFBox 1.x package; in PDFBox 2.x PDFTextStripper lives in org.apache.pdfbox.text
import org.apache.pdfbox.util.*;

public class LittleExample {
    public static void main(String[] args) {
        PDDocument pd;
        BufferedWriter wr;
        try {
            // this is your pdf from which you would like to extract the text
            File input = new File("/home/ottp/pdffiles/1.pdf");
            // this is the target file to store the extracted text
            File output = new File("/home/ottp/pdffiles/extracts/1.txt");
            pd = PDDocument.load(input);
            System.out.println(pd.getNumberOfPages());
            System.out.println(pd.isEncrypted());
            // optional: saves a copy of the document, not needed for text extraction
            pd.save("CopyOfInvoice.pdf");
            PDFTextStripper stripper = new PDFTextStripper();
            wr = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(output)));
            // writes the text of all pages to the output file
            stripper.writeText(pd, wr);
            if (pd != null) {
                pd.close();
            }
            // close and flush the output stream
            wr.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
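Once the text has been extracted, the "Faste minuttal" notation described in the question still has to be expanded into actual departure times. The following is only a sketch of that step in plain Java: the class name `FixedMinutes` and the method `expand` are made up for illustration and are not part of PDFBox, and it assumes the interval does not cross midnight.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: expands a "Faste minuttal" entry.
class FixedMinutes {

    // Returns every departure between first and last (inclusive) whose
    // minute-of-hour is one of the fixed minutes, formatted as "h.mm".
    // Assumes first and last are on the same day.
    static List<String> expand(LocalTime first, LocalTime last, int[] minutes) {
        int[] sorted = minutes.clone();
        Arrays.sort(sorted);
        List<String> times = new ArrayList<>();
        for (int hour = first.getHour(); hour <= last.getHour(); hour++) {
            for (int minute : sorted) {
                LocalTime t = LocalTime.of(hour, minute);
                if (!t.isBefore(first) && !t.isAfter(last)) {
                    times.add(hour + "." + (minute < 10 ? "0" : "") + minute);
                }
            }
        }
        return times;
    }

    public static void main(String[] args) {
        // Values taken from the 'Elev Skole, Høvej' example in the question.
        List<String> times = expand(LocalTime.of(6, 56), LocalTime.of(8, 11),
                new int[] {56, 11, 26, 41});
        System.out.println(times); // prints [6.56, 7.11, 7.26, 7.41, 7.56, 8.11]
    }
}
```

Parsing the interval and the minute list out of the extracted text is left open here, because the exact layout of the text PDFBox produces for this timetable is not known.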


Application skipping frames while accessing sound files

I have this for statement triggered by a button that iterates through a MutableList of strings.
For each string it builds a file path and checks whether that path is valid. If it is, the path is passed to a function that plays it as a sound file via the MediaPlayer. It should play the sound for every file it can find, with a pause at certain points (2, 5, 7).
Unfortunately, when I test it, the button animation is delayed, followed by a Logcat message about 363 skipped frames and the application doing too much work on its main thread.
I tried commenting out lines of the function one after another but was not able to identify the computationally intensive part. Could anybody tell me where the issue lies or how I could improve the function?
Here's the function itself:
btnStartReadingAloud.setOnClickListener {
    Toast.makeText(binding.root.context, "Reading the exercise out loud", Toast.LENGTH_SHORT).show()
    println("Reading the exercise out loud")
    for (i in 0 until newExercise.lastIndex) {
        val currentElement: Pair<String, Array<Any>> = newExercise[i]
        val currentDesignatedSoundFile: String = "R.id.trampolin_ansage_malte_" + replaceSpecialChars(currentElement.first)
        val path: Uri = Uri.parse(currentDesignatedSoundFile)
        val file = File(currentDesignatedSoundFile)
        if (doesFileExist(file)) {
            System.out.println("Playing file named $file")
            playSound(path)
            Toast.makeText(binding.root.context, "Playing sound for: $path", Toast.LENGTH_SHORT).show()
        }
        val pauseTimes = listOf<Int>(2, 5, 7)
        if (i in pauseTimes) {
            Thread.sleep(2000)
        }
    }
}
Here is the dedicated function to play the sound
fun playSound(soundFile: Uri) {
    if (mMediaPlayer == null) {
        mMediaPlayer = MediaPlayer.create(requireContext(), soundFile)
        mMediaPlayer!!.isLooping = false
        mMediaPlayer!!.start()
    } else mMediaPlayer!!.start()
}
Any hint is appreciated, thanks already for reading & brainstorming :)

File Handling in cpp

I am working on my project and am new to C++. I have a question about CSV files. I am working with multiple .cpp files in the same project (for example main.cpp, first.cpp and second.cpp). In main.cpp, I create two CSV files, which get different names every time I run the code. I open both CSV files, write the first row in each, and then close them. Now my question is: is it possible to open and write to both of these CSV files from first.cpp and second.cpp? If yes, how can I do that?
// main.cpp
#include <cstdlib>
void createcsv1()
{
    // creating csv file1 and writing first row
}
void createcsv2()
{
    // creating csv file2 and writing first row
}
int main()
{
    createcsv1();
    createcsv2();
    system("pause");
    return 0;
}
// first.cpp
// second.cpp
To open a file and perform file operations (read, write, seek, truncate), all you need is the path to the file and the appropriate permissions.
It doesn't matter whether you access it from class 1 or class 2.
Try to open these two CSV files in the same way you open them in main.cpp. You will get a file handle. Start writing to the file using that handle.
Don't forget to close the file handle once the file operation is completed.
Example Code:
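#include <fstream>
std::ofstream handle;
// std::ios::app appends to the existing file instead of truncating it
handle.open("example.txt", std::ios::app);
handle << "basic example.\n";
handle.close();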
You can define the two functions (createcsv1, createcsv2) in two separate .cpp files.
You have to add their declarations to a shared header such as stdafx.h.
Finally, you can call the functions from main().
int main()
{
    // call the functions; "void createcsv1();" would only declare them
    createcsv1();
    createcsv2();
    system("pause");
    return 0;
}

Sphinx Voice Activity Detection

I'm trying to write a simple program that will detect voice activity in a .wav file using the CMU Sphinx library.
So far, I have the following:
SpeechClassifier s = new SpeechClassifier();
s.setPredecessor(dataSource);
Data d = s.getData();
while (d != null) {
    if (s.isSpeech()) {
        System.out.println("Speech is detected");
    }
    else {
        System.out.println("Speech has not been detected");
    }
    System.out.println();
    d = s.getData();
}
I get the output "Speech has not been detected", but there is speech in the audio file. It seems as if the getData function is not working the way I want it to. I want it to get the frames and then determine whether each frame contains speech (s.isSpeech()) or not.
I'm trying to get a separate output ("Speech is detected" vs "Speech has not been detected") for each frame. How can I make my code better? Thanks!
You need to insert DataBlocker before SpeechClassifier:
DataBlocker b = new DataBlocker(10); // means 10ms
SpeechClassifier s = new SpeechClassifier(10, 0.003, 10, 0);
b.setPredecessor(dataSource);
s.setPredecessor(b);
Then it will process 10 millisecond frames.
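To make the idea of 10 millisecond frames concrete, here is a simplified illustration in plain Java. It does not use Sphinx and is not how SpeechClassifier works internally: it only cuts a sample array into fixed-size frames and flags each frame by its mean energy. The class name `FrameEnergyDemo`, the 16 kHz sample rate and the 0.01 threshold are all assumptions chosen for the example.

```java
// Simplified illustration only: this is not Sphinx's SpeechClassifier logic.
class FrameEnergyDemo {

    // Number of samples in one frame of the given length.
    static int samplesPerFrame(int sampleRateHz, int frameMs) {
        return sampleRateHz * frameMs / 1000;
    }

    // Splits the samples into consecutive frames and flags each frame whose
    // mean squared amplitude exceeds the threshold.
    // A trailing partial frame is dropped.
    static boolean[] classify(double[] samples, int frameSize, double threshold) {
        int frames = samples.length / frameSize;
        boolean[] speech = new boolean[frames];
        for (int f = 0; f < frames; f++) {
            double energy = 0;
            for (int i = 0; i < frameSize; i++) {
                double s = samples[f * frameSize + i];
                energy += s * s;
            }
            speech[f] = energy / frameSize > threshold;
        }
        return speech;
    }

    public static void main(String[] args) {
        // 10 ms at an assumed 16 kHz sample rate is 160 samples per frame.
        int frameSize = samplesPerFrame(16000, 10);
        double[] samples = new double[frameSize * 3];
        // Make only the middle frame loud.
        for (int i = frameSize; i < 2 * frameSize; i++) {
            samples[i] = 0.5;
        }
        boolean[] result = classify(samples, frameSize, 0.01);
        for (int f = 0; f < result.length; f++) {
            System.out.println("Frame " + f + ": " + (result[f] ? "speech" : "no speech"));
        }
    }
}
```

The point of the sketch is that a per-frame decision needs a stream of equally sized frames, which is the role DataBlocker plays in front of SpeechClassifier.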

JAudioTagger Deleting First Few Seconds of Track

I've written a simple Groovy script (below) to set the values of four of the ID3v1 and ID3v2 tag fields in mp3 files using the JAudioTagger library. The script successfully makes the changes, but it also deletes the first 5 to 10 seconds of some of the files, while other files are unaffected. It's not a big problem, but if anyone knows a simple fix, I would be grateful. All the files are from the same source and all have v1 and v2 tags; I can find no obvious difference in the source files to explain it.
import org.jaudiotagger.*
java.util.logging.Logger.getLogger("org.jaudiotagger").setLevel(java.util.logging.Level.OFF)
Integer trackNum = 0
Integer totalFiles = 0
Integer invalidFiles = 0
validMP3File = true
def dir = new File(/D:\Users\Jeremy\Music\Speech Radio\Unlistened\Z Temp Files to MP3 Tagged/)
dir.eachFile({curFile ->
    totalFiles ++
    try {
        mp3File = org.jaudiotagger.audio.AudioFileIO.read(curFile)
    } catch (org.jaudiotagger.audio.exceptions.CannotReadException e) {
        validMP3File = false
        invalidFiles ++
    }
    // Get the file name excluding the extension
    baseFilename = org.jaudiotagger.audio.AudioFile.getBaseFilename(curFile)
    // Check that it is an MP3 file
    if (validMP3File) {
        if (mp3File.getAudioHeader().getEncodingType() != 'mp3') {
            validMP3File = false
            invalidFiles ++
        }
    }
    if (validMP3File) {
        trackNum ++
        if (mp3File.hasID3v1Tag()) {
            curTagv1 = mp3File.getID3v1Tag()
        } else {
            curTagv1 = new org.jaudiotagger.tag.id3.ID3v1Tag()
        }
        if (mp3File.hasID3v2Tag()) {
            curTagv2 = mp3File.getID3v2TagAsv24()
        } else {
            curTagv2 = new org.jaudiotagger.tag.id3.ID3v23Tag()
        }
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.TITLE, baseFilename)
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.TITLE, baseFilename)
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.ARTIST, "BBC Radio")
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.ARTIST, "BBC Radio")
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.ALBUM, "BBC Radio - 20130205")
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.ALBUM, "BBC Radio - 20130205")
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.TRACK, trackNum.toString())
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.TRACK, trackNum.toString())
        mp3File.setID3v1Tag(curTagv1)
        mp3File.setID3v2Tag(curTagv2)
        mp3File.save()
    }
})
println """$trackNum tracks created from $totalFiles files with $invalidFiles invalid files"""
I'm still investigating and it appears that there is no problem with JAudioTagger. Before setting the tags, I use Total Recorder to reduce the quality of the download from 128kbps, 44,100Hz to 56kbps, 22,050Hz. This reduces the file size to less than half and the quality is fine for speech radio.
If I run my script on the original files, none of the audio track is deleted. The deletion of the first part of the audio track only occurs with the files that have been processed by Total Recorder.
Looking at the JAudioTagger logging for these files, there does appear to be a problem with the header:
Checking further because the ID3 Tag ends at 0x23f9 but the mp3 audio doesnt start until 0x7a77
Confirmed audio starts at 0x7a77 whether searching from start or from end of ID3 tag
This check is not performed for files that have not been processed by Total Recorder.
The log of the header read operation also shows (for a 27 minute track):
trackLength:06:52
It looks as though I shall have to find a new MP3 file editor!
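For anyone who wants to check where the ID3v2 tag ends without relying on the library's log, the offset can be computed from the 10-byte ID3v2 header: bytes 6 to 9 hold the tag size as a "syncsafe" integer (7 significant bits per byte), and the tag ends at 10 plus that size, plus another 10 if the ID3v2.4 footer flag is set. The sketch below uses plain Java rather than JAudioTagger. The class name `Id3v2Size` is made up, and the demo header bytes are invented so that the result matches the 0x23f9 offset from the log; they are not read from one of the affected files.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper, independent of JAudioTagger.
class Id3v2Size {

    // Decodes the 4-byte syncsafe size field: 7 significant bits per byte.
    static int syncsafe(byte b0, byte b1, byte b2, byte b3) {
        return ((b0 & 0x7F) << 21) | ((b1 & 0x7F) << 14) | ((b2 & 0x7F) << 7) | (b3 & 0x7F);
    }

    // Returns the offset just past the ID3v2 tag, or 0 if the header
    // does not start with "ID3".
    static int tagEnd(byte[] header) {
        if (header.length < 10 || header[0] != 'I' || header[1] != 'D' || header[2] != '3') {
            return 0;
        }
        int size = syncsafe(header[6], header[7], header[8], header[9]);
        boolean footer = (header[5] & 0x10) != 0; // ID3v2.4 footer flag
        return 10 + size + (footer ? 10 : 0);
    }

    public static void main(String[] args) throws IOException {
        // Invented demo header; pass a file path as the first argument
        // to inspect a real file instead.
        byte[] header = {'I', 'D', '3', 4, 0, 0, 0x00, 0x00, 0x47, 0x6F};
        if (args.length > 0) {
            try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
                in.readFully(header); // throws EOFException if the file is shorter than 10 bytes
            }
        }
        System.out.println("ID3v2 tag ends at 0x" + Integer.toHexString(tagEnd(header)));
    }
}
```

Run without arguments, this prints "ID3v2 tag ends at 0x23f9". Comparing that offset with the position of the first MPEG frame would show how much padding or other data sits between the tag and the audio.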
Instead of
mp3File.save()
could you try:
mp3File.commit()
No idea if it will help, but that seems to be the documented method?

Java ME: start playing an MP3 file at the 2nd minute and stop at the 4th minute (the file is 6 minutes long)

I need to start playing a local resource MP3 file from the 2nd minute and stop at the 4th minute. The MP3 file is 6 minutes long.
I'm new to this and couldn't find an example for the method below. Could someone please point me to one?
long setMediaTime(long now)
I also have other files that I want to do the same with, using different numbers, so it would be best if I could do this in milliseconds. I am using this code to play the file:
{
    try
    {
        InputStream is = getClass().getResourceAsStream("/nn.mp3");
        Player p = Manager.createPlayer(is, "audio/mpeg");
        p.realize();
        p.prefetch();
        p.start();
    }
    catch (Exception e)
    {}
}
Thanking you in advance! :)
You can use the p.setLoopCount() method.
Just have a look here
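A note on units for the setMediaTime signature quoted in the question: in MMAPI, media times are given in microseconds, not milliseconds. The sketch below is plain Java so it can run anywhere. The class name `MediaTimes` and its helper are made up for illustration, and the MMAPI calls appear only as comments because they need a Java ME device or emulator. Whether a given device supports seeking or StopTimeControl for MP3 is not guaranteed.

```java
// Hypothetical helper: converts millisecond positions to MMAPI microseconds.
class MediaTimes {

    // MMAPI media times are expressed in microseconds.
    static long millisToMicros(long millis) {
        return millis * 1000L;
    }

    public static void main(String[] args) {
        long startMs = 2 * 60 * 1000L; // start of the 2nd-minute mark
        long stopMs = 4 * 60 * 1000L;  // 4th-minute mark
        long startUs = millisToMicros(startMs);
        long stopUs = millisToMicros(stopMs);
        System.out.println("start = " + startUs + " us, stop = " + stopUs + " us");

        // On the device, after p.prefetch(), these values would be used roughly like this:
        //   p.setMediaTime(startUs);   // may throw MediaException if seeking is unsupported
        //   StopTimeControl stc = (StopTimeControl) p.getControl("StopTimeControl");
        //   if (stc != null) {         // getControl returns null if the control is unavailable
        //       stc.setStopTime(stopUs);
        //   }
        //   p.start();
    }
}
```

Keeping the positions in milliseconds and converting at the last moment makes it easy to reuse the same code for other files with different start and stop points.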
