LibreOffice: determine the part of the source code responsible for printing - Linux

I am trying to implement some additional functionality in the LibreOffice printing process (some special info should be added automatically to the margins of every printed page). I am using RHEL 6.4 with LibreOffice 4.0.4 and GNOME 2.28.
My purpose is to research the data flow between LibreOffice and the system components and determine which parts of the source code are responsible for printing. After that I will have to modify those parts of the code.
Now I need advice on how to approach this source code research. I found plenty of tools, and from my point of view:
strace seems to be very low-level;
gprof requires binaries recompiled with the "-pg" CFLAGS; I have no idea how to do that for LibreOffice;
systemtap can probe syscalls only, can't it?
callgrind + Gprof2Dot work quite well together but produce strange results (see below).
For instance, here is the call graph from the callgrind output visualised with Gprof2Dot. I started callgrind with the following command:
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes /usr/lib64/libreoffice/program/soffice --writer
and received four output files:
-rw-------. 1 root root 0 Jan 9 21:04 callgrind.out.29808
-rw-------. 1 root root 427196 Jan 9 21:04 callgrind.out.29809
-rw-------. 1 root root 482134 Jan 9 21:04 callgrind.out.29811
-rw-------. 1 root root 521713 Jan 9 21:04 callgrind.out.29812
The last one (pid 29812) corresponds to the running LibreOffice Writer GUI application (I determined this with strace and ps aux). I pressed CTRL+P and the OK button, then closed the application, hoping to see the function responsible for initialising the printing process in the logs.
The callgrind output was processed with the Gprof2Dot tool according to this answer. Unfortunately, the resulting picture shows neither the actions I am interested in nor the call graph as such.
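For reference, the processing command was along these lines (the exact options came from the linked answer; this is the usual gprof2dot invocation for callgrind data):
python gprof2dot.py -f callgrind callgrind.out.29812 | dot -Tpng -o callgraph.png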
I would appreciate any info about the proper way to approach such a problem. Thank you.

The proper way to solve this problem is to remember that LibreOffice is open source. The whole source code is documented, and you can browse the documentation at docs.libreoffice.org. Don't do it the hard way :)
Besides, remember that the printer setup dialog is not LibreOffice-specific; rather, it is provided by the OS.

What you want is a tool to identify the source code of interest. Test Coverage (TC) tools can provide this information.
What TC tools do is determine what code fragments have run when the program is exercised; think of it as collecting a set of code regions. Normally TC tools are used in conjunction with (interactive/unit/integration/system) tests to determine how effective the tests are. If only a small amount of code has been executed (as detected by the TC tool), the tests are interpreted as ineffective or incomplete; if a large percentage has been covered, one has good tests and reasonable justification for shipping the product (assuming all the tests passed).
But you can use TC tools to find the code that implements features. First, you execute some test (or perhaps manually drive the software) to exercise the feature of interest, and collect TC data. This tells you the set of all the code exercised when the feature is used; it is an overestimate of the code of interest to you. Then you exercise the program again, asking it to do some similar activity which does not exercise the feature. This identifies the set of code that definitely does not implement the feature. Compute the set difference of the code-exercised-with-feature and the code-exercised-without to determine the code which is more focused on supporting the feature.
You can naturally get tighter bounds by running more exercises-the-feature and more doesn't-exercise-the-feature runs and computing differences over unions of those sets.
There are TC tools for C++, e.g. "gcov". Most of them, I think, won't let/help you compute such set differences over the results; many TC tools seem not to have any support for manipulating covered-sets. (My company makes a family of TC tools that do have this capability, including computing coverage-set differences, including for C++.)
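As a rough illustration of the set-difference step (independent of any particular TC tool), suppose each run's coverage report has already been reduced, e.g. from gcov output, to a plain list of file:line entries, one per line; the file names below are placeholders. A minimal C++ sketch could then be:

// Compute the set difference of two coverage runs.
// Each input file is assumed to hold one "file:line" entry per line,
// produced however you like from your coverage tool's output.
#include <fstream>
#include <iostream>
#include <set>
#include <string>

static std::set<std::string> readLocations(const char* path) {
    std::set<std::string> locations;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line))
        if (!line.empty())
            locations.insert(line);
    return locations;
}

int main() {
    const auto withFeature    = readLocations("with_feature.txt");     // placeholder name
    const auto withoutFeature = readLocations("without_feature.txt");  // placeholder name

    // Print every location executed with the feature but not without it:
    // a rough approximation of the code implementing the feature.
    for (const auto& loc : withFeature)
        if (withoutFeature.find(loc) == withoutFeature.end())
            std::cout << loc << '\n';
    return 0;
}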
If you actually want to extract the relevant code, TC tools don't do that.
They merely tell you what code ran, by designating text regions in source files. Most test coverage tools only report covered lines as such text regions; this is partly because the machinery many test coverage tools use is limited to the line numbers recorded by the compiler.
However, one can have test coverage tools that are precise in reporting text regions in terms of starting file/line/column to ending file/line/column (ahem, my company's tools happen to do this). With this information, it is fairly straightforward to build a simple program to read the source files and extract literally the code that was executed. (This does not mean that the extracted code is a well-formed program! For instance, the data declarations won't be included in the executed fragments even though they are necessary.)
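As a minimal sketch of that extraction idea, assuming a hypothetical region report with one "file start_line end_line" triple per line (column information ignored for brevity):

// Print the source text for each covered region listed in a report file.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

static std::vector<std::string> readLines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}

int main() {
    std::ifstream regions("covered_regions.txt");  // placeholder report name
    std::string file;
    std::size_t first = 0, last = 0;
    while (regions >> file >> first >> last) {
        const auto source = readLines(file);
        std::cout << "== " << file << ':' << first << '-' << last << " ==\n";
        // Line numbers in the report are assumed to be 1-based.
        for (std::size_t n = std::max<std::size_t>(first, 1);
             n <= last && n <= source.size(); ++n)
            std::cout << source[n - 1] << '\n';
    }
    return 0;
}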
OP doesn't say what he intends to do with such code, so the set of fragments may be all that is needed. If he wants to extract the code and the necessary declarations, he'll need more sophisticated tools that can determine which declarations are needed. Program transformation tools with full parsers and name resolvers for the source code can provide the necessary capability for this. This is considerably more complicated to use than just test coverage tools with ad hoc text extraction.

Related

Code Coverage Fuzzing Microsoft Office using WinAfl

I am looking for ways to fuzz Microsoft Office, let's say Winword.exe. I want to know which modules or functions parse file formats like RTF, .DOCX, .DOC, etc.
Yes, I know this can be done by reverse engineering, but Office doesn't ship with (public) symbols, which makes tracing and investigating very painful and hard.
I have done some RE work via WinDbg by setting breakpoints, analysing each function, and doing some stack analysis. I also looked into the RTF specification, relying on some of its structures being loaded into memory while debugging in WinDbg, but I got lost everywhere, and it is time-consuming.
I even ran DynamoRIO, hoping to get some results, but again failed.
WinAFL compatibility:
As per WinAFL, I need to find a function which takes some input and does something interesting with it, such as parsing in my case.
But in my case that is way too difficult to find due to the lack of symbols.
So I am asking: is it possible to do code coverage and instrumented fuzzing via WinAFL here?
And what is the best and easiest way to do RE work on symbol-less software like this?
I am asking in case anybody has experience with this.

GNU Fortran compiler write option

I use the GNU Fortran compiler to compile a piece of code written in Fortran (.f90). Unlike with other compilers, the output of the write statement is not displayed on the screen but is instead written to the output file.
For example, I have placed "write(*,*) 'Check it here'" in the middle of the source code so that this message is displayed on the screen when someone runs the compiled version of the code.
I don't understand why this message is not displayed in the terminal window while running the code but is instead written to the output file.
I would appreciate your help resolving this!
I am compiling this source code:
https://github.com/firemodels/fds/tree/master/Source
The makefile that I am using to compile the code is located here:
https://github.com/firemodels/fds/tree/master/Build/mpi_intel_linux_64
I run the program using the executable that the makefile creates.
The version of the compiler that I am using is
GNU Fortran (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
Thank you.
Way bigger picture: Is there a reason you're building FDS from source rather than downloading binaries directly from NIST, i.e. from https://pages.nist.gov/fds-smv/downloads.html ?
Granted, if you're qualifying the code for safety-related use, you may need to compile from source rather than use someone else's binaries. You may need to add specific info to a header page such as code version, date of run, etc. to satisfy QA requirements.
If you're just learning about FDS (practicing fire analysis, learning about CFD, evaluating the code), I'd strongly suggest using NIST's binaries. If you need/want to compile it from source, we'll need more info to diagnose the problem.
That said, operating on the assumption that you have a use case that requires that you build the code, your specific problem seems to be that writing to the default output unit * isn't putting the output where you expect.
Modern Fortran provides the iso_fortran_env module which formalizes a lot of the obscure trivia of Fortran, in this case, default input and output units.
In the module you're editing, look for something like:
use iso_fortran_env
or
use iso_fortran_env, only: output_unit
or
use, intrinsic:: iso_fortran_env, only: STDOUT => output_unit
If you see an import of output_unit or (as in the last case) an alias to it, write to that unit instead of to *.
If you don't see an import from iso_fortran_env, add the last line above to the routine or module you're printing from and write to STDOUT instead of *.
That may or may not fix things, depending on whether the FDS authors do something strange to redirect IO. They might; I'm not sure how writing to screen works in an MPI environment where the code may run in parallel on a number of networked machines (I'd write to a networked logging system in that case, but that's just me). But in the simple case of a single instance of the code running, writing to output_unit is more precise than writing to * and more portable and legible than writing to 6.
Good luck with FDS; I tried using it briefly to model layer formation from a plume of hydrogen gas in air. FDS brought my poor 8 CPU machine to its knees so I went back to estimating it by hand instead of trying to make CFD work...

What exactly is the Link-edit step

Question
What exactly does the link-edit step in my COBOL compiler do?
After the code is compiled, a link-edit step is performed. I am not really sure what this step does.
Background Information
Right out of school (3 years ago) I got a job as a mainframe application developer. Having learned nothing about mainframes in school, my knowledge has many gaps. Around my shop, we kind of have a "black box" attitude: we don't need to know how a lot of this stuff works, it just does. I am trying to understand why we need this link-edit step if the program has already compiled successfully.
The linkedit/binderer step makes an executable program out of the output from the compiler (or the Assembler).
If you look at the output data set on SYSLIN from your COBOL compile step (if it is a temporary dataset, you can override it to an FB, LRECL 80 sequential dataset to be able to look at it) you'll see "card images", which contain (amongst some other stuff) the machine-code generated by the compiler.
These card-images are not executable. The code is not even contiguous, and many things like necessary runtime modules are missing.
The Program Binder (PGM=HEWL) takes the object code (card images) from the compiler/assembler and does everything necessary (according to the options it was installed with, further options you provide, and other libraries which may contain object code, loadmodules or Program Objects) to create an executable program.
There used to be a thing called the Linkage Editor which accomplished this task. Hence linkedit, linkedited. Unfortunately, in English, bind does not conjugate in the same way as edit. There's no good word, so I use Binderer, and Bindered, partly to rail against the establishment which decided to call it the Program Binder (not to be so confused with Binding for DB2, either).
So, today, by linkedit people mean "use of the Program Binder". It is a process to make the output from your compile/assemble into an executable program, which can be a loadmodule, or a Program Object (Enterprise COBOL V5+ can only be bindered into Program Objects, not loadmodules), or a DLL (not to be confused with .dll).
It is worth looking at the output on SYSLIN, the SYSPRINT output from the binder step, and the manuals/presentations on the Program Binder, which will give you an idea of what goes in and what happens (look up any IEW messages, especially for non-zero-RC executions of the step, by sticking the message in a browser search box). From the documentary material you'll start to get an idea of the breadth of the subject as well. The Binder is able to do many useful things.
Here's a link to a useful diagram, some more detailed explanation, and the name of the main reference document for the binder for application programmes: z/OS MVS Program Management: User's Guide and Reference
The program management binder
As an end-note, the reason they are "card images" is because... back in olden times, the object deck from compiler/assembler would be punched onto physical cards. Which would then be used as input cards to the linkage editor. I'm not sorry that I missed out on having to do that ever...
In addition to Bill's (great) answer, I think it is worth also mentioning the related topics below ...
Static versus dynamic linking
If a (main) program 'calls' a subprogram, then you can have such a call happen either 'dynamically' or 'statically':
dynamic: at run time (when the main program is executed), the then-current content of the subprogram is loaded and performed.
static: at link time (when the main program is (re-)linked), the then-current content of the subprogram is included (= resolved) in the main program.
Link-edit control cards
The actual creation of the load module (output of the link-edit step) can be controlled by special directives for the link-editor, such as:
Entry points to be created.
Name of the load module to be created.
Includes (of statically linked subprograms) to be performed.
Alias-members to be created.
Storing the link-edit output in PDS or PDSE
The actual output (load module) can be stored in members located in either PDS or PDSE libraries. In doing so, you need to think ahead a bit about which format (PDS or PDSE) best fits your requirements, especially when it comes to concatenating multiple libraries (e.g. a preprod environment for testing purposes).

Time virtualisation on Linux

I'm attempting to test an application which has a heavy dependency on the time of day. I would like to have the ability to execute the program as if it was running in normal time (not accelerated) but on arbitrary date/time periods.
My first thought was to abstract the time retrieval function calls with my own library calls which would allow me to alter the behaviour for testing but I wondered whether it would be possible without adding conditional logic to my code base or building a test variant of the binary.
What I'm really looking for is some kind of localised time domain, is this possible with a container (like Docker) or using LD_PRELOAD to intercept the calls?
I also saw a patch that enabled time to be disconnected from the system time using unshare(COL_TIME) but it doesn't look like this got in.
It seems like a problem that must have been solved numerous times before; is anyone willing to share their solution(s)?
Thanks
AJ
Whilst alternative solutions and tricks are great, I think you're severely overcomplicating a simple problem. It's completely common and acceptable to include certain command-line switches in a program for testing/evaluation purposes. I would simply include a command line switch like this that accepts an ISO timestamp:
./myprogram --debug-override-time=2014-01-01Z12:34:56
Then at startup, if the switch is set, subtract it from the current system time to get an offset, and make a local apptime() function which corrects the output of the regular system time by that offset; call that everywhere in your code instead.
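A minimal sketch of that idea in C++ (command-line parsing omitted; the names initFakeTime and apptime are just illustrations, not an existing API):

#include <ctime>
#include <iostream>

// Offset between "pretend" time and real time, fixed once at startup.
static std::time_t g_offset = 0;

// Call this once from main() after parsing --debug-override-time.
void initFakeTime(std::time_t pretendNow) {
    g_offset = pretendNow - std::time(nullptr);
}

// Use apptime() everywhere the code would otherwise call time(nullptr).
std::time_t apptime() {
    return std::time(nullptr) + g_offset;
}

int main() {
    // In real code, parse the ISO timestamp given on the command line instead.
    std::tm t = {};
    t.tm_year = 2014 - 1900;  // 2014-01-01 12:34:56, interpreted as local time
    t.tm_mon  = 0;
    t.tm_mday = 1;
    t.tm_hour = 12;
    t.tm_min  = 34;
    t.tm_sec  = 56;
    initFakeTime(std::mktime(&t));

    std::cout << "apptime() reports: " << apptime() << "\n";
    return 0;
}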
The big advantage of this is that anyone can reproduce your testing results without a big read-up on custom Linux tricks, including an external testing team or a future co-developer who's good at coding but not at runtime tricks. When (unit) testing, it's a major advantage to be able to just call your code with a simple switch and test the results for equality against a sample set.
You don't even have to document it, lots of production tools in enterprise-grade products have hidden command line switches for this kind of behaviour that the 'general public' need not know about.
There are several ways to query the time on Linux. Read time(7); I know at least time(2), gettimeofday(2), clock_gettime(2).
So you could use LD_PRELOAD tricks to redefine each of these to, e.g., subtract a fixed number of seconds (given, e.g., by some environment variable) from the seconds part (not the microsecond or nanosecond part). See this example as a starting point.
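As a starting-point sketch of such a preload library, interposing only time(2) (the same pattern applies to gettimeofday(2) and clock_gettime(2); the environment variable name FAKE_TIME_OFFSET is made up for this example):

// Build: g++ -shared -fPIC -o libfaketime_sketch.so faketime_sketch.cpp -ldl
// Run:   LD_PRELOAD=$PWD/libfaketime_sketch.so FAKE_TIME_OFFSET=-86400 ./yourprogram
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for RTLD_NEXT
#endif
#include <cstdlib>
#include <ctime>
#include <dlfcn.h>

// Interpose time(): return the real time plus a fixed offset in seconds,
// taken from the FAKE_TIME_OFFSET environment variable.
extern "C" time_t time(time_t* tloc) {
    using time_fn = time_t (*)(time_t*);
    // Look up the real time() the first time we are called.
    static time_fn real_time =
        reinterpret_cast<time_fn>(dlsym(RTLD_NEXT, "time"));

    const char* env = std::getenv("FAKE_TIME_OFFSET");
    const long offset = env ? std::atol(env) : 0;

    time_t now = real_time(nullptr) + static_cast<time_t>(offset);
    if (tloc)
        *tloc = now;
    return now;
}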

Organizing Scientific Data and Code - Experiments, Models, Simulation, Implementation

I am working on a robotics research project, and would like to know: Does anyone have suggestions for best practices when organizing scientific data and code? Does anyone know of existing scientific libraries with source that I could examine?
Here are the elements of our 'suite':
Experiments - Two types:
Gathering data from existing, 'natural' system.
Data from running behaviors on robotic system.
Models
Description of the dynamical system - dynamics, kinematics, etc.
Parameters for said system, some of which are derived from type 1 experiments
Simulation - trying to simulate natural behaviors, simulating behaviors on robots
Implementation - code for controlling the robots. Granted this is a large undertaking and has a large infrastructure of its own.
Some design aspects of our 'suite':
Would be good if simulation environment allowed for 'rapid prototyping' (scripts / interactive prompt for simple hacks, quick data inspection, etc - definitely something hard to incorporate) - Currently satisfied through scripting language (Python, MATLAB)
Multiple programming languages
Distributed, collaborative setup - Will be using Git
Unit tests have not yet been incorporated, but will hopefully be later on
Cross Platform (unfortunately) - I am used to Linux, but my team members use Windows, and some of our tools are wed to that platform
I saw this post, and the books look interesting and I have ordered "Writing Scientific Software", but I feel like it will focus primarily on the implementation of the simulation code and less on the overall organization.
The situation you describe is very similar to what we have in our surface dynamics lab.
Some of the work involves keeping measurement data which is analysed in real time or saved for later analysis. Other work, on the other hand, involves running simulations and analysing their results.
The data management scheme, which the lab leader picked up at Cambridge while studying there, is centred around a main server which holds the personal files of all lab members. Each member accesses the files from his workstation by mounting the appropriate server folder using NFS. This has its merits and faults. It is easier to back up everything, but it is problematic when processing large amounts of data over the net. For this reason I am an exception in the lab, since the simulation I work with generates a large amount of data. This data is saved on my workstation, and only the code used to generate it (the source code of the simulation and the configuration files) is saved on the server.
I also keep my code in an online SVN service, since I cannot log into the lab server from home. This is a mandatory practice, which stems from the need to be able to reproduce older results on demand and trace changes to the code if some obscure bug appears. Hence the need to maintain older versions and configuration files.
We also employ low tech methods, such as lab notebooks to record results, modifications, etc.
This content can sometimes be more abstract (there is no point describing every changed line in the code - you have diff for that; just the purpose of the change, perhaps some notes about the implementation, and its date).
Work is done mostly with MATLAB. Again I am an exception, as I prefer Python. I also use C for the data-generating simulation. Testing is mostly of convergence, since my project is currently concerned with comparison to computational models. I just generate results with different configurations, saved in their own respective folders (which I track in my lab logbook). This has the benefit of being able to control and interface with the data exactly as I want to, instead of conforming to someone else's ideas and formats.
