How can I search the content of a pdf file in linux shell script? [closed]


Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Suppose I have been given some journal papers in PDF format. I want to extract the title and author list of each paper. How can I do that in a shell script?

I do not know whether this works for your journal, but it works on some PDF files:
strings "myjournal.pdf" | grep -E "/Author|/Title" | tr '/' '\n' | grep -E "Author|Title"
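The strings trick only finds metadata that is stored as plain text inside the file. A sturdier alternative, assuming poppler-utils is installed, is pdfinfo, which prints the document's information fields one per line (Title:, Author:, Pages:, ...). The helper name pdf_meta is hypothetical; it is a minimal sketch that keeps only the Title and Author lines and collapses pdfinfo's alignment padding:

```shell
# pdf_meta: read pdfinfo output on stdin and print only the Title and Author
# lines (hypothetical helper; assumes the input comes from poppler-utils' pdfinfo).
pdf_meta() {
    # -n: print nothing by default; each s///p prints only the lines it rewrote
    sed -n -e 's/^Title:[[:space:]]*/Title: /p' \
           -e 's/^Author:[[:space:]]*/Author: /p'
}
```

Usage: pdfinfo paper.pdf | pdf_meta. Journal PDFs often have empty or wrong metadata; in that case the title and authors are usually on the first page, which pdftotext -l 1 paper.pdf - | head prints. To search the body text of a text-based PDF, pdftotext paper.pdf - | grep -i 'pattern' works without any OCR.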

I worked on a project where we had to search the content of PDF files. This is the process we used:
First, we converted the PDF file to an image with the following ImageMagick command:
convert -density 500 "pdf_path.pdf" -depth 8 "image_output.png"
After the image has been created (for a multi-page PDF, convert writes one numbered image per page: image_output-0.png, image_output-1.png, ...), we ran tesseract on it to produce a text file with the PDF's content. Tesseract appends .txt to the output name, so this writes output_text.txt:
tesseract "image_output.png" "output_text" -l por
You will probably need to change the -l por argument: we used it because our texts were in Portuguese. For English, use -l eng.
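The two steps above can be wrapped in one function that OCRs every page and then searches the recognized text. This is a minimal sketch assuming ImageMagick's convert (with Ghostscript for PDF input) and tesseract with the requested language data are installed; the function name search_scanned_pdf is hypothetical, and it uses 300 dpi instead of 500 only to keep the intermediate images smaller:

```shell
# search_scanned_pdf PDF PATTERN [LANG]
# OCR each page of a scanned PDF and print the lines matching PATTERN.
# Assumes convert (ImageMagick + Ghostscript) and tesseract are installed.
search_scanned_pdf() {
    pdf=$1 pattern=$2 lang=${3:-eng}
    tmp=$(mktemp -d) || return 1
    # One PNG per page: page-000.png, page-001.png, ...
    convert -density 300 "$pdf" -depth 8 "$tmp/page-%03d.png"
    for img in "$tmp"/page-*.png; do
        # tesseract appends .txt to the output base name
        tesseract "$img" "${img%.png}" -l "$lang" >/dev/null 2>&1
    done
    # Search all recognized text case-insensitively
    cat "$tmp"/page-*.txt 2>/dev/null | grep -i -- "$pattern"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Usage: search_scanned_pdf paper.pdf 'keyword' por. OCR is slow and can misread characters, so it is worth trying pdftotext first and falling back to this only for scanned documents.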

Related

Add rows in an excel file with a shell script [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 days ago.
I'm trying to create a shell script that creates an Excel output file and adds data to it.
I don't know how to add rows to this Excel file; the goal is for the data to go into separate cells.
I tried many commands I found on the web, but none worked for me.
After running my script, the file should contain a header row followed by one row per record, with each value in its own cell.
Is there a command to do this?
Thanks for your help

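This code worked for me. It writes a tab-separated text file, and Excel puts each tab-separated value in its own cell when it opens the file (it may warn that the file's content does not match the .xls extension):
# Create (or overwrite) the file with the header row
printf "Date\tTotal number\tSuccessful number\tFailed number\n" > Output.xls
# Append one data row per record
printf "2016_05_11\t20\t5\t15\n" >> Output.xls
printf "2016_06_30\t30\t16\t14\n" >> Output.xls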
Opened in Excel, the file shows a header row followed by the two data rows, with one value per cell.
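To add rows from a script without typing the tabs by hand, a small helper can join its arguments with tabs. append_row is a hypothetical name; this is a minimal POSIX sh sketch:

```shell
# append_row FILE FIELD...
# Append one row to FILE, joining the FIELD arguments with tab characters.
append_row() {
    file=$1
    shift
    # Run in a subshell so the IFS change does not leak; "$*" joins the
    # arguments with the first character of IFS (here a tab).
    (IFS=$(printf '\t'); printf '%s\n' "$*") >> "$file"
}
```

Usage: append_row Output.xls 2016_07_01 25 20 5. If you do not need the .xls name, comma-separated values written to a .csv file also open directly in Excel.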

How to convert a temp folder( .tmp ) to a binary file ( .bin) in Linux [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I want to convert the folder into a binary file (.bin). I tried to write a shell script, but it does not generate the correct binary file. I do not know how to solve it; maybe my approach is wrong. Someone told me I could use the zip command, but I am not convinced.

From your remark about zip, it seems you are trying to make an archive. One simple way to create an archive of a directory is with tar:
tar -zcf foo.tgz /path/to/foo
Note that this creates a file named foo.tgz rather than *.bin, but you don't really want to use the .bin suffix for a tarball.
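A slightly more careful variant uses -C so the archive stores the directory under its own name rather than its full path (GNU tar otherwise strips the leading / and prints a warning). make_archive is a hypothetical helper name; this is one possible sketch, not the only way to do it:

```shell
# make_archive DIR ARCHIVE
# Create a gzip-compressed tarball ARCHIVE containing DIR, stored as its basename.
make_archive() {
    # -C switches to DIR's parent first, so member names start with "foo/"
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}
```

Usage: make_archive /path/to/foo foo.tgz. List the contents with tar -tzf foo.tgz and unpack them with tar -xzf foo.tgz.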

Replace URL in markdown files in python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a big Markdown file. Is there any way to change /foo in my URLs to /bla? That is, I want to replace
[text](/foo/some-long-url/a.html)
with
[text](/bla/some-long-url/a.html)
(all occurrences).
I know I could compile the Markdown file to HTML and use an HTML parser (like BeautifulSoup), but I want to do it on the source file.
Python or shell solutions preferred.

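You can always replace "/foo" with "/bla" directly in the source using sed. The g flag replaces every occurrence on a line, and > writes a fresh output file instead of appending to an existing one:
sed 's|/foo/|/bla/|g' source.md > destination.md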
If it catches anything unwanted, you can just tweak the regular expression a bit to be more specific.
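To edit several files in place while touching only Markdown link targets (text of the form ](/foo/), a loop over the files works without relying on GNU sed's -i option. rewrite_link_prefix is a hypothetical helper; it assumes OLD and NEW contain no |, &, backslash, or regex metacharacters such as . or *:

```shell
# rewrite_link_prefix OLD NEW FILE...
# Replace Markdown link targets beginning with "](OLD/" by "](NEW/" in each FILE.
# OLD/NEW are pasted into a sed regex and replacement, so keep them plain.
rewrite_link_prefix() {
    old=$1
    new=$2
    shift 2
    for f in "$@"; do
        # Write to a temporary file, then replace the original only on success
        sed "s|](${old}/|](${new}/|g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

Usage: rewrite_link_prefix /foo /bla *.md. A /foo/ that appears in ordinary prose, outside a link target, is left alone.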

I need a linux script to report all error lines from a log file, and export the results of error lines into a .csv file? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a log file for checking transactions, and it contains error lines. I need those error lines exported to a .csv file. Is there any way to do this with a Linux bash shell script?

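Suppose your error lines contain the string ERROR. Then:
grep "ERROR" errorfile.txt | tr -s '[:blank:]' ',' > errorfile.csv
The CSV conversion turns each run of blanks (spaces or tabs) into a single comma, so each blank-separated word goes into its own cell. You can replace the blank filter with any other separator.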
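Splitting on every blank also splits the free-text message into many cells. If your log lines have a fixed prefix, awk can keep the message in one quoted field. This sketch assumes a hypothetical line format of date, time and level followed by the message (for example 2016-05-11 10:15:02 ERROR message text); adjust the field numbers to your actual log:

```shell
# errors_to_csv LOGFILE
# Print a CSV with a header row and one row per ERROR line.
# Assumed (hypothetical) log format: DATE TIME LEVEL message text ...
errors_to_csv() {
    printf 'date,time,level,message\n'
    awk '$3 == "ERROR" {
        msg = $0
        # Strip the first three fields (and following blanks), keeping the message intact
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]*/, "", msg)
        # CSV rule: double any embedded quote characters
        gsub(/"/, "\"\"", msg)
        printf "%s,%s,%s,\"%s\"\n", $1, $2, $3, msg
    }' "$1"
}
```

Usage: errors_to_csv transactions.log > errors.csv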

Portable bourne shell script without using functions of modern shells as bash, ksh, zsh etc [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
First of all, I want to thank everyone who helps me solve this. I have an exam tomorrow and have to prepare this script for it. I am really new to Linux and to Bourne shell scripting.
My project should be a portable Bourne shell script that scans a directory for the following files: header.txt, footer.txt and content.txt. The content of these files should be read, ignoring lines that start with #, and used to generate an HTML page with that header, footer and content. The files can contain any text and/or HTML code, but they cannot contain head and body tags. When scanning the directory, the script has to compare the last-modification time of header.txt, footer.txt and content.txt with that of the HTML page (if one already exists); if any of the files was modified more recently than the HTML page, the script should generate a new HTML page with the latest content.
Thank you very much; this is very important to me. Please help me get this done.
Thank you very much!

To remove lines beginning with #, try this:
grep -v "^#" file
To also remove lines that have spaces or tabs (blank characters) before the #:
grep -v "^[[:blank:]]*#" file
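Putting the pieces together, the following is one possible sketch of the whole task in plain POSIX sh. It uses find -newer for the date comparison because the -nt test operator is not available in every Bourne-style shell. The function name build_page, the output name index.html and the page title are assumptions, since the question does not specify them:

```shell
# build_page DIR
# Regenerate DIR/index.html from header.txt, content.txt and footer.txt when
# the page is missing or any of the three files is newer than it.
# Lines starting with # in the input files are skipped.
build_page() {
    dir=$1
    page=$dir/index.html    # assumed output name
    for part in header content footer; do
        if [ ! -f "$dir/$part.txt" ]; then
            echo "build_page: missing $dir/$part.txt" >&2
            return 1
        fi
    done
    # find prints those input files that were modified more recently than the page
    if [ -f "$page" ] &&
        [ -z "$(find "$dir/header.txt" "$dir/content.txt" "$dir/footer.txt" -newer "$page")" ]
    then
        return 0    # page is already up to date
    fi
    {
        echo '<html>'
        echo '<head><title>Generated page</title></head>'
        echo '<body>'
        for part in header content footer; do
            # grep -v exits 1 when every line was a comment; that is not an error here
            grep -v '^#' "$dir/$part.txt" || :
        done
        echo '</body>'
        echo '</html>'
    } > "$page"
}
```

Usage: build_page /path/to/dir. Because it returns early when the page is current, it is cheap to run repeatedly, for example from cron.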
