pdftotext doesn't give the same result in a Python 3 script and in the shell

I am trying to use pdftotext. When I run it in my shell with the command
"pdftotext pdf1", it gives me:
Révision : 0 1
Page : 1 / 1
Test Synapture
PJK
Spécification
Document n°
PJLTest 00 PJT SPC 0201 01
25 Rue Marguerite
69100 Villeurbanne
etc..
But when I use it in a simple script:
import pdftotext

with open(file, "rb") as f:
    pdf = pdftotext.PDF(f)
text = "\n\n".join(pdf)
print(text)
it gives me:
PJLTest 00 PJT SPC 0201
Révision : 0 1
Test Synapture Page : 1 / 1
PJK
Spécification
Document n°
PJLTest 00 PJT SPC 0201 01
25 Rue Marguerite
69100 Villeurbanne
Tel : +33 (0)4 78.94.51.20 / Fax : +33 (0)4 78.94.51.21
01 BPE 16/09/2015 Première édition P. NOM1 P. NOM2 P. NOM3
REV. ETAT DATE OBJET DE LA REVISION REDACTION VERIFICATION APPROBATION
I searched the documentation but found nothing.
Thanks for your help!

Related

How can I convert an EDI 837 file to a DataFrame in Python?

ISA00 00 ZZ558881208 ZZNAVICURE 2206200220*^005011335627330P*:~GSHC558881208NAVICURE202206200220173562725X005010X222A1~ST837000107073005010X222A1~BHT001900309051202206200220CH~NM1412FLORIDA ARTHRIT46558881208~PERICEDI SUPPORTTE5088363663~NM1402NAVICURE46NAVICURE~HL1**201~NM1852FLORIDA ARTHRITIS RHEUMATISM, INC.*****XX1497731491~N31400 W OAK STSTE B~N4KISSIMMEEFL347414000~REFEI550881208~NM1872~N3PO BOX 421606~N4KISSIMMEEFL347421541~HL21220~SBRP18MC~NM1IL1IRIZARRYCHRISTINAMI726913467~N31021 LANDSTAR PARK DR~N4ORLANDOFL328248624~DMGD819850916F~NM1PR2*SIMPLY HEALTHCAREPI27094~N3PO BOX 21535~N4EAGANMN55121~CLM1704317811:B:1YAYY~DTP431D820220606~HIABK:M064ABF:M1990ABF:R5383ABF:M7918ABF:M797~NM1821AZIZABDULXX1811973696~PRVPEPXC207RR0500X~LX1~SV1HC:99213178UN11:2:3:4~DTP472RD820220606-20220606~REF6R19702~SE32000107073~GE1173562725~IEA1*133562733~
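
One possible starting point, as a minimal sketch only: it assumes the file uses "~" to end each segment and "*" to separate elements, which is common for X12 but is really declared in the ISA header of each file. The function name `parse_x12`, the column names `e01`, `e02`, ... and the sample string are all hypothetical and are not taken from the file above.

```python
def parse_x12(text, element_sep="*", segment_end="~"):
    """Split an X12 interchange into one dict per segment."""
    rows = []
    for raw in text.split(segment_end):
        segment = raw.strip()
        if not segment:
            continue  # skip the empty piece after the final terminator
        elements = segment.split(element_sep)
        row = {"segment": elements[0]}
        # Name the data elements e01, e02, ... so each can become a column.
        for position, value in enumerate(elements[1:], start=1):
            row[f"e{position:02d}"] = value
        rows.append(row)
    return rows


# Hypothetical sample for illustration, not the data from the question.
sample = "ST*837*0001~BHT*0019*00*REF123~SE*3*0001~"
rows = parse_x12(sample)
for row in rows:
    print(row)
```

A list of dicts like `rows` can be passed directly to `pandas.DataFrame(rows)`; segments with fewer elements simply get missing values in the extra columns.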

How to add tags like Author,Title and Thumbnail to an .m4a audio file using node.js?

I am using Node.js to download music files in .m4a format.
Issue: I cannot find a way to add tags and cover art (a thumbnail) to .m4a files.
I have done this before with another program, the MediaHuman YouTube to MP3 downloader (even though it downloads as m4a, my desired format): https://ufile.io/yzyzt
(P.S. I'm open to using another language, as long as it can be linked to Node, but I would much prefer a solution purely in Node.js.)
Any clues on this subject are much appreciated.
One way to do it is to use node-taglib2, a Node.js C++ addon based on taglib and available in the npm repository.
This module makes editing MPEG metadata trivial:
const fs = require('fs');
const taglib = require('taglib2');

let props = {
  artist: 'Howlin\' Wolf',
  title: 'Evil is goin\' on',
  pictures: [
    {
      "mimetype": 'image/jpeg',
      "picture": fs.readFileSync('./thumbnail.jpg')
    }
  ]
};

taglib.writeTagsSync('./file.m4a', props);
Now we can check that the metadata has been updated:
let tags = taglib.readTagsSync('./file.m4a')
console.log(tags.artist, '-', tags.title) // Howlin' Wolf - Evil is goin' on
console.log(tags.pictures) // [ { mimetype: 'image/jpeg', picture: <Buffer ff d8 ff e0 00 10 4a 46 49 46 00 01 01 00 00 01 00 01 00 00 ff db 00 43 00 03 02 02 03 02 02 03 03 03 03 04 03 03 04 05 08 05 05 04 04 05 0a 07 07 06 ... > } ]
Of course there are other options for doing the same thing, and I'm sure you could also use ffmpeg, as Brad mentioned in his comment. I would be interested in your feedback if you try it.
I finally solved my issue by using ffmpeg from Node!
https://www.ffmpeg.org/
https://www.npmjs.com/package/ffmpeg
The issue was that \node_modules\ffmpeg\lib\video.js skips duplicate commands, so my requests, which repeated the same option several times, were never passed to ffmpeg correctly. With a quick modification to video.js I was able to make it work, and I have successfully written both the tags and a thumbnail to my .m4a file.
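
To illustrate why the repeated options matter, here is a hedged sketch in Python that only builds an ffmpeg argument list. The helper name `build_ffmpeg_tag_command` and the file names are hypothetical, and whether a particular ffmpeg build accepts cover art in an .m4a output is an assumption to verify locally. Note that `-metadata` appears once per tag, which is exactly the kind of duplicate that a de-duplicating wrapper would drop.

```python
def build_ffmpeg_tag_command(audio_in, cover, audio_out, tags):
    """Return an ffmpeg argument list that copies the audio, attaches a
    cover image, and sets one -metadata option per tag."""
    cmd = [
        "ffmpeg",
        "-i", audio_in,      # input 0: the audio file
        "-i", cover,         # input 1: the cover image
        "-map", "0:a",       # keep the audio stream from input 0
        "-map", "1",         # add the image from input 1
        "-c", "copy",        # copy streams without re-encoding
        "-disposition:v:0", "attached_pic",  # mark the image as cover art
    ]
    for key, value in tags.items():
        # The same flag is repeated for every tag on purpose.
        cmd += ["-metadata", f"{key}={value}"]
    cmd.append(audio_out)
    return cmd


cmd = build_ffmpeg_tag_command(
    "file.m4a",
    "thumbnail.jpg",
    "tagged.m4a",
    {"artist": "Howlin' Wolf", "title": "Evil is goin' on"},
)
print(cmd)
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it without going through a shell, so the quotes in the tag values need no escaping.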

Mac Yosemite: Remove all space characters on lines that are starting with 0 (zero), in a standard text file

I have a text file, containing text copied from a subtitle file, that looks like this:
1
00 : 00 : 02 , 240 --> 00 : 00 : 04 , 240
(tadashi) <watashi no namae wa kanzaki jika.
2
00 : 00 : 04 , 240 --> 00 : 00 : 06 , 240
makikomare te shimatta watashi wa
tsuini?
...
It goes on for some ~300 more chunks like this.
How can I make it look like this without doing it manually?
1
00:00:02,240 --> 00:00:04,240
(tadashi) <watashi no namae wa kanzaki jika.
2
00:00:04,240 --> 00:00:06,240
makikomare te shimatta watashi wa
tsuini?
...
Basically, I would like to remove all spaces on lines that start with the number zero, except the spaces before and after the "arrow".
I am on OS X Yosemite, but if the only solution is on some other OS, I'd be glad to hear it regardless.
Since no one has answered you yet, here is a solution in python. You need to replace source and target filenames with what is appropriate for you.
#!/usr/bin/python
import re  # this is the regex library

# 'source.txt' is your source file, 'target.txt' is your target file
with open('source.txt', 'rt') as f, open('target.txt', 'wt') as fnew:
    for line in f:
        if line.startswith('0'):
            # remove the spaces around ':' and ','; the arrow is left alone
            line = re.sub(r' *([:,]) *', r'\1', line)
        fnew.write(line)
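
The substitution can be checked on the sample line from the question before running it on the whole file. This is a minimal sketch; the function name `tighten_timestamp` is hypothetical. It matches only the separator and its surrounding spaces, because a pattern that requires two digits on both sides of the separator can skip every other separator: `re.sub` replaces non-overlapping matches, and the digits consumed by one match are not available to the next.

```python
import re


def tighten_timestamp(line):
    """Remove spaces around ':' and ',' on lines that start with '0'."""
    if not line.startswith("0"):
        return line  # subtitle text and counters are left untouched
    return re.sub(r" *([:,]) *", r"\1", line)


sample = "00 : 00 : 02 , 240 --> 00 : 00 : 04 , 240\n"
print(tighten_timestamp(sample), end="")
print(tighten_timestamp("makikomare te shimatta watashi wa\n"), end="")
```

The spaces around the arrow survive because the pattern never touches anything other than a colon or comma and the spaces next to it.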

Fortran 77 read unformatted sequence data from old sun machine

I am porting an old mathematical model (written between 1995 and 2000) to a current Linux machine. For this, I adapted all the makefiles as shown:
FORTRAN = gfortran # f90 -f77 -ftrap=%none
OPTS = -O -u -lgfortran -g -fconvert="big-endian" # -O -u
NOOPT =
LOADER = gfortran #f90
LOADOPTS = #-lf77compat
and:
SYSFFLAGS = -O0 -u -g -fconvert="big-endian" # -f77=input
SYSCFLAGS = -DX_WCHAR
SYSLDFLAGS =
SYSCPPFLAGS = -DSYS_UNIX -DCODE_ASCII -DCODE_IEEE # -DSYS_Sun
SYSAUTODBL = -fdefault-real-8 #-r8
SYSDEBUG = -g
SYSCHECK = -C
LINKOPT =
CPPOPT =
SHELL = /bin/sh
CC = cc
FC = gfortran # f90
LD = gfortran # f90
AR = ar vru
RM = rm -f
CP = cp
MV = mv -f
LN = ln -s
So I replaced all the outdated compilers and options in order to compile the code. After that, it generates the binaries with no errors. Note that the options after the # symbol were the original options in the makefiles.
However, when running the program, it cannot read the sample data. I believe those files were written as unformatted sequential files on a Sun machine. The following hex dump is from the file that I need to read.
0000000: 0000 0400 2020 2020 2020 2020 2020 2020 ....
0000010: 3930 3130 7465 7374 2d63 3031 2020 2020 9010test-c01
0000020: 2020 2020 4741 5520 2020 2020 2020 2020 GAU
0000030: 2020 2020 2020 2020 2020 2020 2020 2020
0000040: 2020 2020 2020 2020 2020 2020 2020 2020
0000050: 2020 2020 2020 2020 2020 2020 2020 2020
...
...
0000390: 2020 2020 2020 2020 2020 2020 2020 2020
00003a0: 2020 2020 2020 2020 2020 2020 2020 2020
00003b0: 2020 2020 3139 3936 3037 3232 2032 3030 19960722 200
00003c0: 3434 3920 4147 434d 352e 3420 2020 2020 449 AGCM5.4
00003d0: 2020 2020 3230 3030 3036 3134 2031 3230 20000614 120
00003e0: 3831 3720 6869 726f 2020 2020 2020 2020 817 hiro
00003f0: 2020 2020 2020 2020 2020 2020 2020 2034 4
0000400: 3039 3630 0000 0400 0002 8000 bef7 21f3 0960..........!.
0000410: bf3c 55ab bf7a 8f71 bf99 e26a bfb2 db4e .<U..z.q...j...N
0000420: bfc7 425f bfd6 64b1 bfdf d44f bfe3 6a43 ..B_..d....O..jC
After analyzing the code, I found that it is able to read up to line 0000410. It cannot continue past the marker 0002 8000. The source code shown below is what actually reads this file.
...
* [INPUT]
INTEGER IFILE
CHARACTER HITEM *(*) !! name for identify
CHARACTER HDFMT *(*) !! data format
*
* [ENTRY INPUT]
REAL * 8 TIME1 !! time
REAL * 8 TIME2 !! time
REAL*8 DMIN
REAL*8 DMAX
REAL*8 DIVS
REAL*8 DIVL
INTEGER ISTYPE
INTEGER JFILE !! output file No.
INTEGER IMAXD
INTEGER JMAXD
*
* [WORK]
REAL * 8 DDATA ( NGDWRK )
REAL * 4 SDATA ( NGDWRK )
*
* [INTERNAL WORK]
INTEGER I, J, K, IJK, IJKNUM, IERR
...
...
READ ( IFILE, IOSTAT=IEOD ) HEAD
...
...
...
DO 2150 IJK = 1, IJKNUM
READ ( IFILE, END=2150 ) SDATA(IJK)
WRITE (6,*) ' IGTIO::GTZZRD: iteration=', IJK, SDATA(IJK)
2150 CONTINUE
To make the loop easier to debug, I replaced the original read with the loop above. The original read uses an implied DO loop:
READ ( IFILE, IOSTAT=IEOD)
& (SDATA(IJK), IJK=1, IJKNUM )
And the output for the loop is:
IGTIO::GTZZRD: iteration= 1 -0.48268089
IGTIO::GTZZRD: iteration= 2 1.35631564E-19
IGTIO::GTZZRD: iteration= 3 -0.48142704
IGTIO::GTZZRD: iteration= 4 1.35631564E-19
IGTIO::GTZZRD: iteration= 5 244.25270
IGTIO::GTZZRD: iteration= 6 1.35631564E-19
IGTIO::GTZZRD: iteration= 7 983.87988
IGTIO::GTZZRD: iteration= 8 1.35631564E-19
IGTIO::GTZZRD: iteration= 9 1.59284362E-04
IGTIO::GTZZRD: iteration= 10 1.35631564E-19
IGTIO::GTZZRD: iteration= 11 0.0000000
---error here---
I am definitely lost on this, so any help is appreciated.
Here is what's going on. First, this is definitely a big-endian file.
The first 4 bytes
00000400
are the big-endian 4-byte integer 1024, which is the length of your first record.
This agrees with the length of HEAD (per the comment).
Now note that 00000400 is repeated at byte position 1024+4 exactly (hex dump line 400), as you should expect for a Fortran unformatted file. So far so good.
Now the next 4 bytes
0002 8000
begin the second record. (Edit: correcting a mistake.) This is 163840 (2*16^4 + 8*16^3). You should find it repeated at position 1024+8+163840+4 in the hex dump (that should be line 0028400, I think).
Here is the problem: in your code you are reading that 160-kilobyte record into a single 4-byte variable, then moving on to the next record. My guess is that you are seeing that alternating 10^-19 because every other record is of type character.
In unformatted Fortran you must read a whole record in one shot. Try reading the entire array (without the loop):
READ ( IFILE )SDATA
This assumes SDATA is dimensioned to hold 160 KB, of course (e.g. real*4 with 40960 elements).
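
The record layout described above can be checked outside Fortran. The following Python sketch assumes 4-byte big-endian record markers before and after every record, as the hex dump suggests. The helper names `read_records` and `as_real4` are hypothetical, and the in-memory file is a shortened stand-in (3 values instead of 40960) built from the first data bytes shown in the dump.

```python
import io
import struct


def read_records(stream, endian=">"):
    """Yield each record's payload from a Fortran unformatted sequential
    file, assuming 4-byte length markers around every record."""
    while True:
        head = stream.read(4)
        if not head:
            return  # clean end of file
        (length,) = struct.unpack(endian + "i", head)
        payload = stream.read(length)
        (tail,) = struct.unpack(endian + "i", stream.read(4))
        if tail != length:
            raise ValueError("leading and trailing record markers differ")
        yield payload


def as_real4(payload, endian=">"):
    """Decode a whole record at once as 4-byte reals."""
    count = len(payload) // 4
    return struct.unpack(f"{endian}{count}f", payload)


# Stand-in file: a 1024-byte header record, then one short data record.
header = b" " * 1024
data = bytes.fromhex("bef721f3" "bf3c55ab" "bf7a8f71")
blob = b"".join(
    struct.pack(">i", len(rec)) + rec + struct.pack(">i", len(rec))
    for rec in (header, data)
)

records = list(read_records(io.BytesIO(blob)))
values = as_real4(records[1])
print(len(records[0]), len(records[1]), values[0])

# The offsets quoted above, recomputed for the real file.
second_length = 0x00028000
trailer_offset = 4 + 1024 + 4 + 4 + second_length
print(second_length, second_length // 4, hex(trailer_offset))
```

Each record is consumed as a unit, which mirrors the rule that one unformatted READ statement consumes one whole record regardless of how many variables it fills.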
The answer to your problem is in the edit to the question, which I missed. Thanks also to george for the arithmetic, which I hadn't bothered to do.
We can safely say that the record headers are correct, and that you can solve your problem with the endian conversion.
So, the problem is that reads with an implied-do loop are not equivalent to reads inside a do loop.
That is, read(unit) (i(j), j=1,5) is not the same as
do j=1,5
  read(unit) i(j)
end do
In the first, all five values are read from one record; in the second, each value is read from a distinct record.
You should, then, revert your change. If you want the same diagnostics, however, you can do something like
      READ ( IFILE, IOSTAT=IEOD) (SDATA(IJK), IJK=1, IJKNUM )
      WRITE (6, '("IGTIO::GTZZRD: iteration=", I0, 1X, F12.8)')
     &      (IJK, SDATA(IJK), IJK=1,IJKNUM)
Outside the scope of the original question: for some reason, a low-level routine in charge of reading the file was called repeatedly in the 1100 loop. That caused the file to be read more times than necessary. The fixed code is below:
* 1100 CONTINUE
CALL GDREDX !! read data
O ( GDATA , IEOD ,
O HITEMD, HTITL , HUNIT , HDSET ,
O TIME , TDUR , KLEVS ,
I IFILE , HITEM , HDFMT ,
I IMAXD , JMAXD ,
I IDIMD , JDIMD , KDIMD )
IF ( IEOD .EQ. 0 ) THEN
WRITE (6,*) ' IRWGD.F::GDRDTS: TSEL0=', TSEL0
WRITE (6,*) ' IRWGD.F::GDRDTS: TSEL1=', TSEL1
WRITE (6,*) ' IRWGD.F::GDRDTS: TIME=', TIME
* IF ( ((TSEL0.GE.0).AND.(TIME.LT.TSEL0))
* & .OR.((TSEL1.GE.0).AND.(TIME.GT.TSEL1)) ) THEN
* GOTO 1100
* ENDIF
ENDIF
*
RETURN
END
That misled me while I was trying to figure out what was going on.

When I show a subfile (SFL) help window, the main screen disappears (Power7, V7R1M0, ILE RPG program)

I'm showing an SFL screen in my RPG program. One field, WPROV, offers F4=Help. If the user presses F4, the program shows an SFL help window, but it erases the main screen and I can only see the help SFL screen. How can I display both the main screen and the SFL help window?
Another thing: I check the screen fields in my RPG program, and if there is an error I turn on *IN71 or *IN72, but I can't see the error message on my screen. Why?
Here is the main screen:
A*%%EC
A DSPSIZ(24 80 *DS3)
A R W1
A*%%TS SD 20130821 124511 ALCRUZ REL-V7R1M0 5770-WDS
A TEXT('ventana para ver detalles')
A CF03(03 'salir')
A CF05(05 'ACTUALIZAR')
A CF04(04 'AYUDA')
A CF06(06 'PROCEDER')
A CF12(12 'CANCELAR')
A KEEP
A BLINK
A ALARM
A OVERLAY
A WINDOW(2 2 18 75 *NORSTCSR)
A WDWBORDER((*DSPATR HI RI) (*CHAR '.-
A ..:::.:'))
A RMVWDW
A USRRSTDSP
A 1 22'ACME, S.A. de C.V.'
A DSPATR(HI)
A DSPATR(RI)
A 16 2'F3=Salir'
A DSPATR(HI)
A 1 63DATE
A EDTCDE(Y)
A 2 63TIME
A 1 2USER
A PGMA 10A O 2 2
A 16 39'F5=Actualizar'
A COLOR(WHT)
A 16 57'F12=Cancelar'
A COLOR(WHT)
A 2 19'Generación de Ventas Proveedores F-
A .F.S.'
A 7 26'No. de proveedor (F4).:'
A 10 26'Fecha Inicial(AAAAMMDD)'
A 13 26'Fecha Final..(AAAAMMDD)'
A WFI 8Y 0B 10 51EDTWRD(' / / ')
A COLOR(YLW)
A 72 ERRMSG('** Error en Fecha Inicial *-
A *' 72)
A WFF 8Y 0B 13 51EDTWRD(' / / ')
A COLOR(YLW)
A 73 ERRMSG('** Error en Fecha Final **'-
A 73)
A WNOMBP 30A O 8 26
A 16 13'F4=Ayuda'
A COLOR(WHT)
A WPROV 4A B 7 51COLOR(YLW)
A 71 ERRMSG('ERROR ESTE PROVEEDOR NO EXI-
A STE' 71)
A 16 24'F6=Proceder'
A COLOR(WHT)
And the sfl window screen is:
A*%%EC
A DSPSIZ(24 80 *DS3)
A R SWCCHK03 SFL
A*%%TS SD 20130819 102201 ALCRUZ REL-V7R1M0 5770-WDS
A S0AVAL 1Y 0H SFLCHCCTL
A S0OPTN 20A O 6 1
A R SWCCHK04 SFLCTL(SWCCHK03)
A*%%TS SD 20130819 104010 ALCRUZ REL-V7R1M0 5770-WDS
A SFLSIZ(0006)
A SFLPAG(0005)
A WINDOW(*DFT 13 32)
A OVERLAY
A 27 SFLDSP
A N28 SFLDSPCTL
A 28 SFLCLR
A 29 SFLEND
A CF12(12)
A SFLSNGCHC(*RSTCSR *AUTOSLT)
A*
A SFLRRN 4S 0H SFLRCDNBR(CURSOR)
A 1 10'PANTALLA DE AYUDA'
A COLOR(YLW)
A 4 1'Selecciona rengón,oprimiendo la'
A COLOR(WHT)
A CHOICE 20A O 3 1COLOR(BLU)
A 5 1'Barra espaciadora, F12= Salir'
A COLOR(WHT)
In the second display file, add a record format with the ASSUME keyword. You don't need to do anything with it in your RPG program, just define it.
A R DUMMY
A ASSUME
A 1 2' '
As for ERRMSG not working, it's because of RMVWDW. See "Restrictions and notes" under ERRMSG in the DDS Reference:
When the RMVWDW keyword is active, error messages are not displayed.
