PDFBox in Kotlin - Output of a Rectangle Field of a PDF File - string

This is my first time posting here, so sorry if I did something wrong.
My English is also not the best at the moment.
I'm new to coding and started with some useful tutorials on YouTube.
My first project should get some data out of a PDF file.
I've used PDFBox for this and it works well for me.
The only problem I can't solve at the moment is this:
The extracted data are 4 tables in the PDF; I extract them by area (PDFTextStripperByArea) and set the right ints (x, y, w, h).
The data is written to an output; up to this point it works well for me.
I get a string called "daten61". But if I use println(daten61), the string is
shown with line breaks. I want to use it like:
09.08 3931 15:59 00:46
and not like:
09.08
3931
15:59
00:46
I've tried trim, replace, whatever ... it won't work.
Are there any other solutions for it? Can someone who knows PDFBox well help me solve the problem?
Furthermore, I want to use the string like [09.08,3931,15:59,00:46] to start the next step of my project.
Thanks!
The code I've used:
val document = File("C://test//test1.pdf")
val doc = PDDocument.load(document)
val stripper = PDFTextStripperByArea()
stripper.sortByPosition = true
val output = ""
val rect = Rectangle(x, y, w, h)
stripper.addRegion(output, rect)
val firstPage = doc.getPage(0)
stripper.extractRegions(firstPage)
return (stripper.getTextForRegion(output))
// After println(output) I get the data in 4 rows

Related

How to split compound word in pandas?

I have a document that consists of many compound (or sometimes combined) words, such as:
document.csv
index text
0 my first java code was helloworld
1 my cardoor is totally broken
2 I will buy a screwdriver to fix my bike
As seen above, some words are combined or compound, and I am using the compound word splitter from here to fix this issue. However, I have trouble applying it to each row of my document (like a pandas Series) and converting the document into a clean form:
cleanDocument.csv
index text
0 my first java code was hello world
1 my car door is totally broken
2 I will buy a screw driver to fix my bike
(I am aware that a word such as screwdriver should stay together, but my goal is cleaning the document.) If you have a better idea for splitting only combined words, please let me know.
The splitter code may work as follows:
import pandas as pd
import splitter  # this uses an enchant dictionary (requires enchant to be installed)
data = pd.read_csv('document.csv')
then, it should use:
splitter.split(data) ## ???
I already looked into something like this, but it does not work in my case. Thanks.
You can use apply with axis=1. Can you try the following:
data.apply(lambda x: [splitter.split(j) for j in x['text'].split()], axis=1)
I do not have splitter installed on my system. Judging by the link you have provided, I wrote the following code. Can you try it:
def handle_list(m):
    ret_lst = []
    words = m['text'].split()
    for wrd in words:
        g = splitter.split(wrd)
        if g:
            ret_lst.extend(g)  # the word was split into parts
        else:
            ret_lst.append(wrd)  # keep the word as-is when no split is found
    return ret_lst
data.apply(handle_list, axis=1)
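For anyone who wants to try the per-row idea without installing splitter, here is a minimal sketch. The split_compound function and its tiny VOCAB set are hypothetical stand-ins for splitter.split and its enchant dictionary, not the real library, and the rows are plain strings rather than a DataFrame.

```python
# Hypothetical stand-in vocabulary; the real splitter uses an enchant dictionary.
VOCAB = {"hello", "world", "car", "door", "screw", "driver"}


def split_compound(word):
    """Return [left, right] if the word splits into two known words, else []."""
    lower = word.lower()
    for i in range(1, len(lower)):
        if lower[:i] in VOCAB and lower[i:] in VOCAB:
            return [word[:i], word[i:]]
    return []


def clean_text(text):
    """Split every compound word in a sentence and join the result again."""
    parts = []
    for word in text.split():
        pieces = split_compound(word)
        parts.extend(pieces if pieces else [word])
    return " ".join(parts)


rows = [
    "my first java code was helloworld",
    "my cardoor is totally broken",
    "I will buy a screwdriver to fix my bike",
]
cleaned = [clean_text(row) for row in rows]
for line in cleaned:
    print(line)
```

With pandas, the same function could presumably be applied to the column directly, e.g. data['text'].apply(clean_text), which returns strings instead of lists.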

connecting a variable output to send Keys string

I am trying to generate a random six-digit number and then submit that six-digit output to a website.
Here is my code:
send_code = self.chrome_browser.find_element_by_xpath("/html/body/div[1]/section/div/div/div[3]/form/span/button").click()
random_num = random.randrange(000000, 999999)
(f'{random_num:06}')
send_random_num = self.chrome_browser.find_elements_by_xpath('/html/body/div[1]/section/div/div/div[2]/form/div/input')
random_num.send_keys(Keys.random_num, Keys.ENTER)
Generating the random number works, but for some reason my send_random_num variable is not executing the right commands. I feel like I am formatting this wrong. Keys.(var) just isn't close to right, so any suggestions? Thanks (using Selenium).
To me, it looks like you are using Selenium. Maybe next time try adding that as a tag or clearly stating which module you are using. I believe this will work, and I took the liberty of clarifying some of your variable names:
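button = self.chrome_browser.find_element_by_xpath("/html/body/div[1]/section/div/div/div[3]/form/span/button")
button.click()
random_num = random.randrange(0, 1000000)  # 0 to 999999 inclusive
random_code = f'{random_num:06}'  # zero-padded to six digits
# find_element (singular) returns one element; find_elements returns a list
code_input = self.chrome_browser.find_element_by_xpath('/html/body/div[1]/section/div/div/div[2]/form/div/input')
code_input.send_keys(random_code, Keys.ENTER)  # type the code, then press Enter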
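The number formatting can be checked on its own, without a browser. This is a small sketch; the helper name six_digit_code is my own and not part of Selenium or the original code.

```python
import random


def six_digit_code(rng=random):
    """Return a random code from '000000' to '999999' as a zero-padded string."""
    # randrange excludes the upper bound, so 1_000_000 allows 999999.
    return f"{rng.randrange(1_000_000):06d}"


code = six_digit_code()
print(code)
print(f"{42:06d}")  # small numbers keep their leading zeros
```

The string returned here is what would be passed to send_keys; passing the bare integer would lose the leading zeros.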

loop to read multiple files

I am using the ObsPy _read_segy function to read a SEG-Y file with the following line of code:
line_1=_read_segy('st1.segy')
However, I have a large number of files in a folder, as follows:
st1.segy
st2.segy
st3.segy
.
.
st700.segy
I want to use a for loop to read the data, but I am new to this, so can anyone help me in this regard?
Currently I am using repeated lines to read the data, as follows:
line_1=_read_segy('st1.segy')
line_2=_read_segy('st2.segy')
The next step is to display the SEG-Y data using matplotlib, and again I am using the following code on individual lines, which makes for far too much repeated work. Can someone help me create a loop to display the data and save the figures?
data=np.stack([t.data for t in line_1.traces])
vm=np.percentile(data,99)
plt.figure(figsize=(60,30))
plt.imshow(data.T, cmap='seismic',vmin=-vm, vmax=vm, aspect='auto')
plt.title('Line_1')
plt.savefig('Line_1.png')
plt.show()
Your kind suggestions will help me a lot, as I am a beginner in Python programming.
Thank you
If you want to reduce code duplication, you can use functions. And if you want to do something repeatedly, you can use loops. So you can call a function in a loop if you want to do this for all files.
Now, for reading the files in a folder, you can use Python's glob module. Something like the following:
import glob, os
def save_fig(in_file_name, out_file_name):
    line_1 = _read_segy(in_file_name)
    data = np.stack([t.data for t in line_1.traces])  # np.stack needs a sequence, not a generator
    vm = np.percentile(data, 99)
    plt.figure(figsize=(60, 30))
    plt.imshow(data.T, cmap='seismic', vmin=-vm, vmax=vm, aspect='auto')
    plt.title(out_file_name)
    plt.savefig(out_file_name)
    plt.close()  # free the figure before the next file
segy_files = list(glob.glob(os.path.join(segy_files_path, "*.segy")))
for index, file in enumerate(segy_files):
    save_fig(file, "Line_{}.png".format(index + 1))
I have not added the other imports here; you know which ones to add. segy_files_path is the folder where your files reside.
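One caveat with the glob approach: glob.glob does not return the names in numeric order, so Line_2.png is not guaranteed to come from st2.segy. A possible way around this, sketched here with a hypothetical helper called numeric_key, is to sort the names by the number they contain before looping.

```python
import os
import re


def numeric_key(file_name):
    """Sort key: the first run of digits in the base name, as an int (-1 if none)."""
    match = re.search(r"\d+", os.path.basename(file_name))
    return int(match.group()) if match else -1


# Example names in the kind of arbitrary order glob may return.
names = ["st10.segy", "st2.segy", "st700.segy", "st1.segy"]
ordered = sorted(names, key=numeric_key)
print(ordered)
```

A plain sorted(names) would compare the names as text and put st10.segy before st2.segy, which is why the key function converts the digits to an integer.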
You just need to open the files dynamically in a loop. Fortunately, they all follow the same naming pattern.
N = 700
for n in range(1, N + 1):  # files are numbered st1 ... st700
    line_n = _read_segy(f"st{n}.segy")  # Dynamic name.
    data = np.stack([t.data for t in line_n.traces])
    vm = np.percentile(data, 99)
    plt.figure(figsize=(60, 30))
    plt.imshow(data.T, cmap="seismic", vmin=-vm, vmax=vm, aspect="auto")
    plt.title(f"Line_{n}")
    plt.savefig(f"Line_{n}.png")  # save before show(), otherwise the saved image may be blank
    plt.show()
    plt.close()  # Needed if you don't want to keep 700 figures open.
I'll focus on addressing the file looping, as you said you're new and I'm assuming simple loops are something you'd like to learn about (the first example is sufficient for this).
If you'd like an answer to your second question, it might be worth providing some example data, the output (graph) of your current attempt, and a description of your desired output. If you provide that reproducible example and a clear description of the problem you're having, it will be easier to answer.
Create a list (or other iterable) to hold the file names to read, and another container (maybe a dict) to hold the results of your _read_segy calls.
files = ['st1.segy', 'st2.segy']
lines = {}  # creates an empty dictionary; dictionaries consist of key: value pairs
for f in files:  # f will first be 'st1.segy', then 'st2.segy'
    lines[f] = _read_segy(f)
As stated in the comment by @Guimoute, if you want to generate the file names dynamically, you can create them by inserting integers into the base file name.
lines = {}  # creates an empty dictionary; dictionaries have key: value pairs
missing_files = []
for i in range(1, 701):
    f = f"st{i}.segy"  # would give "st1.segy" for i = 1
    try:  # in case one of the files is missing or can't be read
        lines[f] = _read_segy(f)
    except Exception:
        missing_files.append(f)  # store names of missing or unreadable files
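If you want to try the try/except pattern without any SEG-Y files at hand, the reading step can be passed in as a function. This is only a sketch: load_all and fake_reader are hypothetical names, and fake_reader merely stands in for _read_segy by pretending one file is missing.

```python
def load_all(reader, file_names):
    """Read every file with `reader`; return (loaded dict, list of failed names)."""
    loaded = {}
    failed = []
    for name in file_names:
        try:
            loaded[name] = reader(name)
        except Exception:
            failed.append(name)  # remember files that could not be read
    return loaded, failed


def fake_reader(name):
    """Hypothetical stand-in for _read_segy: pretends st3.segy is missing."""
    if name == "st3.segy":
        raise FileNotFoundError(name)
    return f"contents of {name}"


names = [f"st{i}.segy" for i in range(1, 5)]
lines, missing_files = load_all(fake_reader, names)
print(sorted(lines))
print(missing_files)
```

In the real script you would presumably call load_all(_read_segy, names) instead, and then loop over lines.items() to plot each file.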

Python loop list in function call

This is my first question on StackOverflow. I have always found what I was looking for just googling, but this time I'm stuck and can't figure it out.
I'm a beginner programmer with Python and still learning a lot.
I want to change a dateEdit box in a user interface with a small piece of code that sets the current date and time.
The code looks like this:
self.dateEdit_2.setDateTime(QtCore.QDateTime.currentDateTime())
Now I want to change every dateEdit box the same way, starting from 2 and going to 29, without typing out every single line.
I have tried to make a for loop with a filled list,
and I get it to print out what I want, but how do I get "set_date_numb" to be an attribute that does what I want?
I hope you understand. Thanks.
dateTimeList = ['2','3','4','5','6','7','8','9',
                '10','11','12','13','14','15','16','17','18','19','20',
                '21','22','23','24','25','26','27','28','29']
indexval = 0
for i in range(len(dateTimeList)):
    date_numb = (dateTimeList[indexval])
    set_date_numb ='self.dateEdit_{}.setDateTime(QtCore.QDateTime.currentDateTime())'.format(date_numb)
    print(set_date_numb)
    indexval += 1
You could use getattr(), see the documentation here. Since the widgets you are after are attributes of your instance, you can grab them by their names as strings (which I think is the main problem you are facing):
dateTimeList = [str(x) for x in range(2,30)]
for dt in dateTimeList:
    name = "dateEdit_{}".format(dt)
    currentDateEdit = getattr(self, name)
    currentDateEdit.setDateTime(QtCore.QDateTime.currentDateTime())
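To see the getattr() lookup in isolation, here is a small sketch that runs without Qt. The Box and Window classes and the box_N attribute names are made up for illustration; Box only stands in for a QDateEdit widget.

```python
class Box:
    """Hypothetical stand-in for a QDateEdit widget."""

    def __init__(self):
        self.value = None

    def set_value(self, value):
        self.value = value


class Window:
    def __init__(self):
        # Create attributes box_2 ... box_5, mirroring dateEdit_2 ... dateEdit_29.
        for n in range(2, 6):
            setattr(self, f"box_{n}", Box())

    def set_all(self, value):
        for n in range(2, 6):
            box = getattr(self, f"box_{n}")  # look the attribute up by its name
            box.set_value(value)


window = Window()
window.set_all("today")
print(window.box_2.value, window.box_5.value)
```

The key point is that getattr(self, "box_2") returns the same object as self.box_2, so a string built in a loop can be turned into a real attribute access instead of just being printed.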

Exporting uitable data to excel using matlab for macintosh

I am stuck trying to export MATLAB uitable data to Excel. I tried many things, and it has been impossible to solve this problem. After many days, I tried the code below on Windows and it works perfectly; however, the same code no longer works on a Macintosh. The output is as follows:
"Error using dlmwrite (line 118) The input cell array cannot be converted to a matrix"
Searching for more information, I found an answer here (Using "xlswrite" MATLABs for cell arrays containing strings of different size), which doesn't work perfectly. Finally, I found this method, which applies only to MATLAB on Windows (http://www.mathworks.es/matlabcentral/answers/20819-export-uitable-s-data-to-a-spreadsheet-excel).
I hope you can help me with this problem.
Thanks in advance
Hector
function Save_File
    hf = figure;
    hExportButton = uicontrol('Parent',hf,'Units',...
        'normalized','Position',[0 0.81 0.22 0.18],'Style','Pushbutton',...
        'String',' Export Data!','FontSize',20,'Callback',@ExportButton_Callback);
    dat = rand(5,5);
    t=uitable('Data',dat,'ColumnName',{'First','Second','Third','Fourth','Fifth'},...
        'Position',[7 10 500 300]);
    Data=get(t,'Data');
    ColumnName=get(t,'ColumnName');
    set(t,'ColumnWidth',{93.5})
    function ExportButton_Callback(~,~)
        NewData = num2cell(Data); % one cell per numeric value
        CombData=[ColumnName';NewData];
        FileName = uiputfile('*.xls','Save as');
        xlswrite(FileName,CombData);
    end
end
You should be able to convert the cell array into a numeric array with the cell2mat command and then use csvwrite or dlmwrite.
If the combination of numbers and strings is the issue, as stated in my comment above, you can use some simple looping to do all of this for you. I have posted some sample code below.
% Creating some temporary data for proof of concept
mat = randi([1,5],10,2);
header = {'Col1','Col2'};
cellVals = [header;num2cell(mat)];
% the real code that does the writing
fh = fopen('temp.csv','w'); % open a file with write privileges, will overwrite old versions
for ii = 1:size(cellVals,1)
    first = 1;
    for jj = 1:size(cellVals,2)
        if first
            fwrite(fh,num2str(cellVals{ii,jj},'%f'));
            first = 0;
        else
            fwrite(fh,[',',num2str(cellVals{ii,jj},'%f')]);
        end
    end
    fwrite(fh,sprintf('\r\n')); % print line break
end
fclose(fh); % close file out when done writing

Resources