How can I extract equations from docx files? - ms-office

I want to write a program to retrieves equations (formulas) from .docx files. I am using open xml sdk, but I can't see how an equation is inserted in .docx files. An image (which represent the equation) is the only thing that I can find there. How can I extract equations from .docx files? Is that possible?

The previous answer was not responsive to the question!
If you examine the xml document tree and open the file document.xml, you will see something like the following:
<m:oMath><m:r><w:rPr><w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/></w:rPr><m:t>x= y</m:t>
(In the above example, the equation is "x=y". The tag indicates an equation.

You might be able to select the text in each equation editor object an copy it out to another document or something by following the VBA section in this link:
http://www.extendoffice.com/documents/word/751-word-select-equation.html
That's what I'd try anyway!

Related

Update linked excel path in PowerPoint via Python

I want to automate creating of a powerpoint ppt via linking template charts to some Excel files. Updating the excel file values changes the powerpoint slides automatically. I have created my powerpoint template and linked charts to sample excel files data.
I want to send the folder with the powerpoint and excel files to someone else. But this will break the link to excel files due to change in the path. (As path is not relative). I can edit the paths manually by going under the "edit links to files" option under File Menu but this is tedious as charts are numerous with multiple files.
I want to update the same via Python code using the Python-Pptx package.
Please help!
There's no API support for this in the current version of python-pptx.
You would need to modify the underlying XML directly, perhaps using python-pptx internals as a starting point and using lxml calls on the appropriate element objects. If you search on "python-pptx workaround function" you will find some examples.
Another thing to consider is modifying the XML by cruder but still possibly effective means by accessing the XML files in the .pptx package directly (the .pptx file is a Zip archive of largely XML files) and using regular expressions or perhaps a command line tool like sed or awk to do simple text substitution.
Either way you're going to need to want it pretty badly, depending on your Python skill level. You'll also of course need to discover just which strings in which parts of the XML are the ones that need changing. opc-diag can be helpful for that, but it's a bit of detective work even with the best tools.

How can I edit a DXF in node.js?

I'd like to make a custom lasered label from a user's input on a website. I have a template dxf file and I'd like to replace placeholder text with the user input. My problem is the dxf file format is very unreadable in its text format. Is there any way to make sense of the numeric data? If not are there any other formats (svg, etc) that would be easier to work with?
EDIT: The reason I've found it unreadable in terms of text is that the program (Solidworks) converted the text to curves.) At this point I'm trying to figure out how to prevent that.
AutoDesk was nice enough to document DXF syntax in great detail. Spend a couple hours understanding the documentation from the link below, and I think you will find it quite easy to parse and edit using code.
To just replace some placeholder text, it should be just as simple as reading the DXF file into a string (a dxf file is no different than a txt file), performing a text replace operation and saving it back to file. Just make sure that your placeholder text is very unique and is not contained in any of the key words in the document below (otherwise your DXF file will get corrupted). Something like "PlaceHolderText" will do the trick.
http://images.autodesk.com/adsk/files/autocad_2012_pdf_dxf-reference_enu.pdf
Edit: More Info
I do a lot of work with AutoDesk Inventor which is in direct competition with SolidWorks, so they are effectively the same tool. We were faced with a similar problem of needing to place text onto sheet metal flat pattern DXFs that came out of Inventor in order to identify the part, but Inventor simply could not do it (see, exactly the same!). One of our developers had the idea to place a very precise geometry punch onto the flat pattern. After the DXF was generated he wrote some code that parsed the DXF file and replaced the geometry with a text entity. More specifically we used a triangle with sides having each length defined to something like the 7th decimal place. You can then use one of the vertices of the triangle to position the text, including rotation. This process would be automatic, so once you write the code with the help of the document above (which won't take the long), it will just work. If your engraver can handle text the way you want it, I'd say this is a very good solution. We generate hundreds of parts every day using this code. Hope this helps.

Add text to pdf using Excel -VBA

I don't have much knowledge about VBA.
But I have a problem which I think can be solved with VBA.
I have a PDF file of 400 pages. I have an excel with page numbers and some text. Now I want this text to be copy pasted (Add Text under drawing markup in PDF tools) in the PDF.
I can do it manually but it will take 3 to 4 days. so can anybody help me and make my work easier. I wanted to do this in Excel-VBA.
I have 2013 Excel and Acrobat xi Pro.
It depends.
If the pdf has forms in it, you are of course able to fill them in a programmatic way.
If your document does not contain forms you are not going to be able to solve this problem in a trivial manner.
Why, I hear you ask?
PDF documents, despite their reputation are more like containers of instructions than they are a WYSIWYG format
instructions are bundled in groups called "objects"
objects can be compressed (DEFLATE) into streams
objects are indexed so they can be re-used (this is called the xref)
the index uses byte-offsets to get a grip on which object is where in the document
Now what would happen if you wanted to add a single character somewhere in the document
you would need to decode the streams to figure out where you're actually placing content
Once you've found the right stream, and you've inserted your character, you have also screwed up the xref table.
Nothing will work anymore

using A Layout File(.lyt) and fixed width file to create a .CSV file output

I have been given a layout file, .lyt ,which gives me information on how to interpret a fixed width file. I need to convert this file into a comma delimited file. Normally this process is one or two files so i just do it using excels data tools by hand creating the delimitation based on reading the layout and applying it. Now however I have a multiple files with different layout files I need to apply to each and this could become tedious. I remember once seeing someone apply this type of file directly using either notepad ++ or excel but after an exhaustive google search I cannot find a tutorial on how to apply a layout file to a fixed width file to create a csv file without direct human intervention. Does anyone know how to do this?
below is a sample of what is in the layout file
Seq Position Name Length
1 1-3 Title Code Full 3
2 4-17 Given Name 14
3 18-18 Middle Initial 1
4 19-48 Surname 30
To all future users I could not find a solution but I did find a workaround using excels own data tools. Going to the data tab, go to get external data, From Text. From there follow the wizard. Im sorry I could not find the proper solution I know exists.

Serialized Printing Method

I am looking for a method by which I can print one document, and have a field that is incremented on each copy printed. I currently run linux, so bash in concert with several programs might be the way to go, but I'm just not sure where to start.
I have a document that is used for our business that currently is hand stamped for serialization... We would like to simply print them but cant find a method by which to increment a specific field. I would like to use either a PDF or an ODF/ODT for the document.
Thanks for any help you can give!
How is the document produced at the first place?
If you master that process, you could certainly add serialization at that level. For instance if using LibreOffice you could do that in LibreOffice. If using a text formatter (like LaTeX, Lout, ....) just emit the formatting instructions (e.g. the .tex or .lout source file) with some unique counting (perhaps simpler to do in some scripting language like Python or Ocaml).
Then run the relevant tool to get a .pdf file.

Resources