Object Files/Executables: What's the difference between a segment and a section?

I am confused as to whether there is a difference between "segment" and "section" when referring to object files/executables.
According to https://en.wikipedia.org/wiki/Object_file:
Most object file formats are structured as separate sections of data, each section containing a certain type of data.
However, the article later goes on to talk about segments (e.g. code segment, data segment, etc.).
Additionally, the PE file format (.exe/.dll/.coff in Windows) refers to these different parts as sections (https://msdn.microsoft.com/en-us/library/windows/desktop/ms680547(v=vs.85).aspx).
So my question: Is there a difference between the two or are they practically synonyms?

The terminology may depend on the specific object file format, but typically a section is a more fine-grained "chunk" of code or data than a segment, in the sense that a segment might consist of multiple sections.
For example, the PE/COFF standard document does not have a concept of segments -- only sections, whereas the ELF object format has both. In the case of ELF, segments in the object file are analogous to what is known as segments in the context of a CPU or instruction set architecture, such as x86 -- that is, a segment is some contiguous partition of memory with a specific set of memory access rights (or similar) associated with it. The typical examples are executable "code segments" vs non-executable "data segments".
Sections, on the other hand, have more to do with how code or data are logically organized in an object file. For example, a table of exported symbols might be stored in a section separate from the data that is accessed by the application during its execution, although both are considered data.
If an object file format has a concept of both segments and sections, each section is typically fully contained within a single segment (at least that is the case with ELF).
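As a rough illustration of "each section lies inside one segment", the Python sketch below models sections and loadable segments by virtual address and reports which segment fully contains each section, similar in spirit to the "Section to Segment mapping" table that `readelf -l` prints. The classes and all addresses are hypothetical, not read from a real binary; a real tool would parse the ELF section and program headers (and would also look at file offsets).

```python
from dataclasses import dataclass

@dataclass
class Segment:      # roughly an ELF program header of type PT_LOAD
    vaddr: int
    memsz: int
    flags: str      # access rights, e.g. "R-X" for code, "RW-" for data

@dataclass
class Section:      # roughly an ELF section header
    name: str
    addr: int       # 0 for sections that are not loaded at run time
    size: int

def map_sections_to_segments(sections, segments):
    """Return {section name: index of the segment that fully contains it, or None}."""
    result = {}
    for sec in sections:
        result[sec.name] = None
        for i, seg in enumerate(segments):
            if seg.vaddr <= sec.addr and sec.addr + sec.size <= seg.vaddr + seg.memsz:
                result[sec.name] = i
                break
    return result

# Hypothetical layout: one executable segment, one writable data segment.
SEGMENTS = [Segment(0x400000, 0x2000, "R-X"), Segment(0x602000, 0x1000, "RW-")]
SECTIONS = [
    Section(".text",   0x401000, 0x800),
    Section(".rodata", 0x401800, 0x400),
    Section(".data",   0x602000, 0x100),
    Section(".bss",    0x602100, 0x200),
    Section(".symtab", 0x0,      0x500),   # symbol table: a section, but in no loaded segment
]
```

With this data, `.text` and `.rodata` both land in the read/execute segment and `.data` and `.bss` in the read/write segment, while `.symtab` maps to no segment at all: several logically distinct sections share one set of access rights, and some sections are never loaded.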


Data extraction from mainframe to Excel

How do I extract data from a mainframe into Excel? Currently I am fetching data from MS Access, but the requirements are for the mainframe.
Thanks in advance
First, please understand that saying "extract data from mainframe" is similar to saying "extract data from Intel." The following is not comprehensive but is intended to provide an idea of how to ask your question in a manner which can be meaningfully answered.
Please understand there is a big difference between...
what is technically possible
what is allowed in your shop
what is likely to provide a robust and maintainable solution given your requirements
These are three very different things. Some of us answering questions here on Stack Overflow have life experiences that make us reticent about answering questions regarding what is technically possible absent any mention of what is allowed in your shop or what the actual business requirement is that is being solved.
Mainframes have been around for over half a century, and many shops have standard solutions to technical problems. Sometimes the solution is "don't do that, and here's what we do instead." Working against the recommendations of your technical staff, or your shop standards, is career limiting.
What operating system?
z/OS is in common use on mainframes, but there do exist shops that still run one of its ancestors like MVS/XA. The mainframe operating system traces its roots back to OS/360 first available in 1965.
z/TPF
z/Linux usually runs on top of the z/VM hypervisor.
z/VSE
In what sort of file does the data reside?
QSAM or Queued Sequential Access Method, also commonly called flat files.
VSAM or Virtual Storage Access Method. There are several different kinds of VSAM files, including KSDS (Key-Sequenced Data Set), ESDS (Entry-Sequenced Data Set), RRDS (Relative Record Data Set) and Linear (LDS, conceptually similar to a memory-mapped file).
a DBMS like DB2 or IMS. A DBMS typically has extract facilities to allow writing a flat file from its own internal format. DB2, for example, stores data in Linear VSAM datasets.
Unix System Services files reside in a different file system than QSAM or VSAM. This will be more familiar, as it has a directory structure where the classic z/OS file system has none.
What does the data look like?
You must know the record layout of the data you wish to retrieve.
It is common for mainframe data to include both text and binary data in a single record, for example a name and a currency amount:
Hopper    Grace     ar%
...which would be...
x'C8969797859940404040C799818385404040404081996C'
...in hex. This is code page 37, commonly referred to as EBCDIC.
Without knowing that the family name is confined to the first 10 bytes, the given name to the subsequent 10 bytes, and the currency amount is in packed decimal (also known as binary coded decimal) in the next 3 bytes, you cannot accurately transfer the data, because code page conversion will destroy the currency amount, which is +819.96. Converting to code page 1250, commonly in use on Microsoft Windows, you would end up with...
x'486F707065722020202047726163652020202020617225'
...where the text data is translated but the packed data is destroyed. The packed data no longer has a valid sign in the last nibble (the lower half of the last byte) and the amount itself has been changed.
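To make this concrete, here is a minimal Python sketch that decodes the example record using the 10/10/3-byte layout described above, assuming two implied decimal places for the amount. The names `RECORD`, `unpack_packed_decimal` and `parse_record` are made up for illustration; only Python's built-in `cp037` and `cp1250` codecs are used. It converts the text fields from code page 37 but decodes the packed field from the raw bytes, and then reproduces the damage done by converting the whole record.

```python
from decimal import Decimal

# Example record from above: 10-byte family name, 10-byte given name (both
# EBCDIC code page 37), then a 3-byte packed-decimal amount, 2 implied decimals.
RECORD = bytes.fromhex("C8969797859940404040C799818385404040404081996C")

POSITIVE_SIGNS = {0xA, 0xC, 0xE, 0xF}   # usual packed-decimal sign conventions
NEGATIVE_SIGNS = {0xB, 0xD}

def unpack_packed_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode packed decimal: two digits per byte, sign in the last (low) nibble."""
    nibbles = []
    for byte in raw:
        nibbles.extend((byte >> 4, byte & 0x0F))
    sign_nibble = nibbles.pop()
    if any(d > 9 for d in nibbles):
        raise ValueError("invalid digit nibble")
    if sign_nibble in POSITIVE_SIGNS:
        sign = 1
    elif sign_nibble in NEGATIVE_SIGNS:
        sign = -1
    else:
        raise ValueError(f"invalid sign nibble {sign_nibble:#x}")
    digits = int("".join(map(str, nibbles)))
    return sign * Decimal(digits).scaleb(-scale)

def parse_record(rec: bytes):
    """Convert only the text fields; the packed field is decoded from raw bytes."""
    family = rec[0:10].decode("cp037").rstrip()
    given = rec[10:20].decode("cp037").rstrip()
    amount = unpack_packed_decimal(rec[20:23], scale=2)
    return family, given, amount

# What a naive whole-record code page conversion (37 -> 1250) does:
converted = RECORD.decode("cp037").encode("cp1250")
# converted[20:23] is now 0x61 0x72 0x25 ("ar%"): the last nibble is 5, which is
# not a valid sign, so unpack_packed_decimal(converted[20:23], 2) raises ValueError.
```

`parse_record(RECORD)` gives `('Hopper', 'Grace', Decimal('819.96'))`, and `converted` matches the code page 1250 hex string shown above. This is why the record layout has to be known before any transfer that involves code page conversion.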
Security
Is the data you wish to access covered by privacy legislation? You may have to provide some evidence that whatever protections are in place to guarantee that only authorized personnel have access to this data on the mainframe are also in place once you have transferred it off of the mainframe. Such guarantees may have to satisfy an auditor.
What you need
You need to know what operating system holds your data, you need to know what type of file holds your data (a DBMS isn't a type of file but let's let that go for now), and you need to know your record layout(s).
Typically, the easy way to retrieve data is to extract it from its existing data store (QSAM, VSAM, DBMS) into a flat file where all the data is in a text format. There are mainframe utilities to accomplish this. In extreme cases, a program can be written to accomplish this goal. Once it has been accomplished, you can transfer your data without fear of destroying packed or binary data.
You may be able to read data directly from a DBMS if that's where your data resides, but this may depend on shop standards, including security.
Modern mainframes can transfer data via FTP, FTPS, and SFTP. Which is recommended in your shop is something to ask your technical staff.

How to create custom config files in OpenSMILE

I am trying to extract some features from an audio sample using OpenSMILE, but I'm realizing how difficult it is to set up a config file.
The documentation is not very helpful. The best I could do was run some of the sample config files that are provided, see what came out, and then go into the config file and try to determine where the feature was specified. Here's what I did:
I used the default feature set from the INTERSPEECH 2010 Paralinguistic Challenge (IS10_paraling.conf).
I ran it over a sample audiofile.
I looked at what came out. Then I read the config file in depth, trying to find out where the feature was specified.
Here's a little markdown table showing the results of my exploration:
| Feature generated | Instruction in the conf file |
|-------------------|---------------------------------------------------------|
| pcm_loudness | I see: 'loudness=1' |
| mfcc | I see a section: [mfcc:cMfcc] |
| lspFreq | no matches for the text 'lspFreq' anywhere |
| F0finEnv | I see 'F0finalEnv = 1' under [pitchSmooth:cPitchSmoother] |
What I see is four different features, each generated by a different instruction in the config file. Well, for one of them there was no discernible instruction in the config file that I could find. With no pattern, intuitive syntax, or apparent system, I have no idea how I can eventually figure out how to specify the features I want to generate.
There are no tutorials, no YouTube videos, no StackOverflow questions and no blog posts out there talking about how this could be done, which is really surprising, since this is obviously a huge part of using OpenSMILE.
If anyone finds this, can you please advise me on how to create custom config files for OpenSMILE? Thanks!
Thanks for your interest in openSMILE and your eagerness to build your own configuration files.
thanks for your interest in openSMILE and your eagerness to build your own configuration files.
Most users in the scientific community actually use openSMILE for its pre-defined config files for the baseline feature sets, which in version 2.3 are even more flexible to use (more commandline options to output to different file formats etc.).
I admit that the documentation provided is not as good as it could be. However, openSMILE is a very complex piece of software with a lot of functionality, of which only the most important parts are currently well documented.
The best starting point would be to read the openSMILE book and the SIG'MM tutorials (all referenced at http://opensmile.audeering.com/). The book contains a section on how to write configuration files. The next important element is the online help of the binary:
SMILExtract -L lists the available components
SMILExtract -H cComponentName lists all options which a given component supports (and thus also features it can extract) with a short description for each
SMILExtract -configDflt cComponentName gives you a template configuration section for the component with all options listed and defaults set
Due to the architecture of openSMILE, which is centered on incremental processing of all audio features, there is no easy syntax (at least not yet) to define the features you want. Rather, you define the processing chain by adding components:
data sources will read in data (from audio files, csv files, or microphone, for example),
data processors will do signal processing and feature extraction in individual steps (windowing, window function, FFT, magnitudes, mel-spectrum, cepstral coefficients (MFCC), for example for extracting MFCC); for each step there is a data processor.
data sinks will write data to output files or send results to a server etc.
You connect the components via the "reader.dmLevel" and "writer.dmLevel" options. These define a name of a data memory level that the components use to exchange data. Only one component may write to one level, i.e. writer.dmLevel=levelName defines the level and may appear only once. Multiple components can read from this level by setting reader.dmLevel=levelName.
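To make the wiring rule concrete, here is a small Python sketch (not part of openSMILE) that generates the skeleton of a configuration file for an MFCC-style chain and checks that every data memory level has exactly one writer and that no component reads from a level nobody writes. The component type names are the standard openSMILE ones as far as I know, but the instance names, level names and the `CHAIN` table are made up, and a usable config also needs each component's own options (input/output filenames, frame sizes and so on), which `SMILExtract -configDflt cComponentName` prints.

```python
from collections import Counter

# (instance name, component type, reader.dmLevel, writer.dmLevel); hypothetical chain.
CHAIN = [
    ("waveIn", "cWaveSource",   None,        "wave"),
    ("framer", "cFramer",       "wave",      "frames"),
    ("win",    "cWindower",     "frames",    "winframes"),
    ("fft",    "cTransformFFT", "winframes", "fftc"),
    ("mag",    "cFFTmagphase",  "fftc",      "fftmag"),
    ("mel",    "cMelspec",      "fftmag",    "melspec"),
    ("mfcc",   "cMfcc",         "melspec",   "mfcc"),
    ("csvOut", "cCsvSink",      "mfcc",      None),
]

def level_problems(chain):
    """List violations of 'one writer per level, readers only of written levels'."""
    writers = Counter(w for _, _, _, w in chain if w)
    problems = [f"level '{lvl}' written {n} times" for lvl, n in writers.items() if n > 1]
    problems += [f"level '{r}' read by '{name}' but never written"
                 for name, _, r, _ in chain if r and r not in writers]
    return problems

def render_config(chain):
    """Emit the component list and reader/writer wiring in openSMILE's INI-like syntax."""
    problems = level_problems(chain)
    if problems:
        raise ValueError("; ".join(problems))
    lines = ["[componentInstances:cComponentManager]",
             "printLevelStats=5",
             "instance[dataMemory].type=cDataMemory"]
    lines += [f"instance[{name}].type={ctype}" for name, ctype, _, _ in chain]
    for name, ctype, reader, writer in chain:
        lines += ["", f"[{name}:{ctype}]"]
        if reader:
            lines.append(f"reader.dmLevel={reader}")
        if writer:
            lines.append(f"writer.dmLevel={writer}")
    return "\n".join(lines)
```

Each processing step from the list above becomes one entry in `CHAIN`, and the level names are the only thing that links consecutive steps; `printLevelStats=5` is the option mentioned below for inspecting what each level contains.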
In each component you then set the options to enable computation of features and set parameters for this. To answer your question about lspFreq: This is probably enabled by default in the cLsp component, so you don't see an explicit option for it. For future versions of openSMILE the practice of setting all options explicitly will and should be followed more tightly.
The names of the features in the output will be automatically defined by the components. Often each component adds a part to the name, so you can infer the full chain of processing from the name. The options nameAppend and copyInputName (available to most data processors) control this behaviour, although some components might internally override them or change the behaviour a bit.
To see the names (and other info) for each data memory level, including e.g. which features a component in the configuration produces, you can set the option "printLevelStats=5" in the section of componentInstances:cComponentManager.
As everything in openSMILE is built for real-time incremental processing, each data memory level has a buffer, which by default is a ring buffer to keep the memory footprint constant when the application runs for a long time.
Sometimes you might want to summarise features over a window of a given length (e.g. with the cFunctionals component). In this case you must ensure that the buffer size of the input level to this component is large enough to hold the full window. You do this via the following options:
writer.levelconf.isRb = 1/0 : sets the type of buffer to ring buffer (1) or fixed-size buffer (0)
writer.levelconf.growDyn = 1/0 : lets the buffer grow dynamically if more data is written to it (1)
writer.levelconf.nT = N : sets the size of the buffer in frames. Alternatively, you can use bufferSizeSec=x to set the size in seconds; it is converted to frames automatically.
In most cases the sizes will be set correctly automatically. Subsequent levels also inherit the configuration from the previous levels. The exceptions are when you set a cFunctionals component to read the full input (i.e. produce only one feature vector at the end of the file), in which case you must use growDyn=1 on the level that the functionals component reads from, or when you use a variable framing mode (see below).
The cFunctionals component provides frameMode, frameSize, and frameStep options. frameMode can be full (one vector produced at the end of the input/file), list (specify a list of frames), var (receive messages, e.g. from a cTurnDetector component, that define frames on the fly), or fix (fixed-length window). Only in fix mode does frameSize set the size of this window and frameStep the rate at which the window is shifted forward. In fix mode the buffer size of the input level is set correctly automatically; in the other cases you have to set it manually.
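The difference between the full and fix frame modes can be modelled in a few lines of Python. This is a conceptual model, not openSMILE code: `functionals` and its arguments are hypothetical, window sizes are counted in frames rather than seconds for simplicity, and the only functionals computed are mean and maximum. It also shows why the buffer requirements differ: fix mode only ever looks at `size` consecutive frames, while full mode needs the entire input at once.

```python
from statistics import mean

def functionals(frames, mode, size=None, step=None):
    """Summarise per-frame values as (mean, max) tuples; a model of cFunctionals framing.

    mode="full": one summary for the whole input (the input level must hold everything).
    mode="fix":  windows of `size` frames shifted by `step` frames
                 (the input level only needs to hold `size` frames).
    """
    if mode == "full":
        return [(mean(frames), max(frames))]
    if mode == "fix":
        if not size or not step or size < 1 or step < 1:
            raise ValueError("fix mode needs positive size and step")
        out = []
        for start in range(0, len(frames) - size + 1, step):
            window = frames[start:start + size]
            out.append((mean(window), max(window)))
        return out
    raise ValueError("only 'full' and 'fix' are modelled here")
```

For example, with per-frame values `[1, 2, 3, 4, 5, 6]`, full mode yields a single summary `(3.5, 6)`, while fix mode with `size=4, step=2` yields `(2.5, 4)` and `(4.5, 6)`.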
I hope this helps you to get started! With every new openSMILE release we at audEERING are trying to document things a bit better and unify things through various components.
We also welcome contributions from the community (e.g. is anybody willing to write a graphical configuration file editor where you drag and drop components and connect them graphically? ;)), although we know that more documentation would make this easier. Until then, you always have the source code to read ;)
Cheers,
Florian

Adding big chunks of custom data to a JPEG image file

I wonder if there is an obvious and elegant way to add additional data to a JPEG while keeping it readable for standard image viewers. More precisely, I would like to embed a picture of the back of a (scanned) photo into it. Old photos often have personal messages written on the back, be it the date or some notes. Sure, you could use EXIF and add some text, but an actual image of the back is preferable.
Sure I could also save 2 files xyz.jpg and xyz_back.jpg, or arrange both images side by side, always visible in one picture, but that's not what I'm looking for.
It is possible and has been done: on the Samsung Note 2 and 3, for example, you can add handwritten notes to the photos you've taken as an image. Some smartphones also let you embed voice recordings in image files while preserving the readability of those files on other devices.
There are two ways you can do this.
1) Use an Application Marker (APP0–APPF), the preferred method
2) Use a Comment Marker (COM)
If you use an APPn marker:
1) Do not make it the first APPn in the file. Every known JPEG file format expects some kind of format specific APPn marker right after the SOI marker. Make sure that your marker is not there.
2) Place a unique application identifier (null terminated string) at the start of the data (something done by convention).
All kinds of applications store additional data this way.
One issue is that the length field is only 16 bits (big-endian), and it counts its own two bytes, so a single segment can carry at most 65,533 bytes of data. If you have a lot of data, you will have to split it across multiple markers.
If you use a COM marker, make sure it comes after the first APPn marker in the file. However, I would discourage using a COM marker for something like this as it might choke applications that try to display the contents.
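Here is a minimal Python sketch of the APPn approach, using only the standard `struct` module. It splits the payload into chunks that fit the 16-bit length field, prefixes each chunk with a NUL-terminated identifier, inserts the segments after the first APPn segment that follows SOI, and reads them back. The identifier `b"PhotoBack"` and the default choice of APP11 (0xFFEB) are arbitrary assumptions; check that your marker does not collide with one other software uses, and test the result in the viewers you care about.

```python
import struct

SOI = b"\xff\xd8"

def build_app_segments(payload: bytes, ident: bytes, marker: int = 0xEB):
    """Split payload into APPn segments, each starting with a NUL-terminated identifier."""
    if not 0xE0 <= marker <= 0xEF:
        raise ValueError("APPn markers are 0xFFE0-0xFFEF")
    header = ident + b"\x00"
    # The big-endian length field counts itself (2 bytes) plus the segment data.
    max_chunk = 0xFFFF - 2 - len(header)
    segments = []
    for off in range(0, len(payload), max_chunk):
        data = header + payload[off:off + max_chunk]
        segments.append(struct.pack(">BBH", 0xFF, marker, len(data) + 2) + data)
    return segments

def embed_after_first_app(jpeg: bytes, segments) -> bytes:
    """Insert segments right after the APPn segment that must follow SOI."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG (missing SOI)")
    if not (jpeg[2] == 0xFF and 0xE0 <= jpeg[3] <= 0xEF):
        raise ValueError("expected an APPn segment right after SOI")
    (length,) = struct.unpack(">H", jpeg[4:6])
    insert_at = 4 + length  # length field starts at offset 4 and covers itself + data
    return jpeg[:insert_at] + b"".join(segments) + jpeg[insert_at:]

def extract_payload(jpeg: bytes, ident: bytes, marker: int = 0xEB) -> bytes:
    """Reassemble the payload from matching APPn segments, scanning up to SOS."""
    header = ident + b"\x00"
    pos, out = 2, []
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        code = jpeg[pos + 1]
        if code == 0xDA:  # SOS: entropy-coded image data follows, stop scanning
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        data = jpeg[pos + 4:pos + 2 + length]
        if code == marker and data.startswith(header):
            out.append(data[len(header):])
        pos += 2 + length
    return b"".join(out)
```

A standard viewer skips the unknown APP11 segments, so the front image still displays, while your own software can call `extract_payload(jpeg_bytes, b"PhotoBack")` to recover the embedded back-side image.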
An interesting question. There are file formats that support multiple images per file (multipage TIFF comes to mind) but JPEG doesn't support this natively.
One feature of the JPEG file format is the concept of APP segments. These are regions of the JPEG file that can contain arbitrary information (as a sequence of bytes). Exif is actually stored in one of these segments, and is identified by a preamble.
Take a look at this page: http://www.sno.phy.queensu.ca/~phil/exiftool/#JPEG
You'll see many segments there that start with APP such as APP0 (which can store JFIF data), APP1 (which can contain Exif) and so forth.
There's nothing stopping you from storing data in one of these segments. Conformant JPEG readers will ignore this unrecognised data, but you could write software to store/retrieve data from within there. It's even possible to embed another JPEG file within such a segment! There's no precedent I know of for doing this, however.
Another option would be to include the second image as the thumbnail of the first. Normally thumbnails are very small, but you could store a second image as the thumbnail of the first. Some software might replace or remove this though.
In general I think using two files and a naming convention would be the simplest and least confusing, but you do have options.

How does text processing work?

Consider the whole novel (e.g. The Da Vinci Code).
How does e-book reader software process and output the whole book?
Does it put the WHOLE book in one very large string? An array of strings? Or what?
One of the very first "real" programs I wrote (as part of a class exercise in high school) was a text editor. Part of the requirement for this exercise was for the program to be able to handle documents of arbitrary length (i.e. larger than the available system memory).
We achieved this by opening the file, but reading only the portion of it required to display the current page of data. When the user moves forward or backward in the file, we read that portion of the file and display it.
We can speed the program up by reading ahead to load pages which we anticipate that the user will want, and by retaining recently read pages in memory so that there is no obvious delay when the user moves forward or backward.
So basically, the answer to your question is: "No. With very large text files, it is unusual to load the whole thing into memory at once. A program that can handle files like that will load it in chunks as it needs to, and drop chunks it doesn't need any more."
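The paging scheme described above (read only the page being shown, read one page ahead, and keep a few recently used pages in memory) can be sketched in Python as follows. `PagedReader` is a hypothetical class; pages are fixed-size byte ranges rather than laid-out screen pages, which is a simplification (among other things, a page boundary can split a multi-byte UTF-8 character).

```python
import io
from collections import OrderedDict

class PagedReader:
    """Serve fixed-size pages of a large file, caching only a few recent pages."""

    def __init__(self, fileobj, page_size=4096, max_cached=8):
        if max_cached < 2:
            raise ValueError("need room for the current page and the read-ahead page")
        self.f = fileobj
        self.page_size = page_size
        self.max_cached = max_cached
        self.cache = OrderedDict()  # page number -> bytes, least recently used first
        self.disk_reads = 0

    def _load(self, n):
        if n in self.cache:
            self.cache.move_to_end(n)        # mark as most recently used
            return self.cache[n]
        self.f.seek(n * self.page_size)      # read only this page from the file
        data = self.f.read(self.page_size)
        self.disk_reads += 1
        self.cache[n] = data
        if len(self.cache) > self.max_cached:
            self.cache.popitem(last=False)   # drop the least recently used page
        return data

    def page(self, n):
        data = self._load(n)
        # Read ahead: the user will probably turn forward next.
        if len(data) == self.page_size:
            self._load(n + 1)
        return data

if __name__ == "__main__":
    # Demo on an in-memory "file"; for a real book use open(path, "rb").
    book = io.BytesIO(b"".join(b"line %d\n" % i for i in range(1000)))
    reader = PagedReader(book, page_size=100, max_cached=4)
    print(reader.page(0)[:14])
```

Memory use is bounded by `page_size * max_cached` no matter how long the book is; turning to the next page is usually a cache hit because of the read-ahead.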
Complex document formats (such as ebooks) may have lookup tables built into the file to allow the user to search or jump quickly to a given page or chapter. In this, they effectively work like a database.
I hope that helps.

Message Passing Arbitrary Object Graphs?

I'm looking to parallelize some code across a Beowulf cluster, such that the CPUs involved don't share address space. I want to parallelize a function call in the outer loop. The function calls do not have any "important" side effects (though they do use a random number generator, allocate memory, etc.).
I've looked at libs like MPI and the problem I see is that they seem to make it very non-trivial to pass complex object graphs between nodes. The input to my function is a this pointer that points to a very complex object graph. The return type of my function is another complex object graph.
At a language-agnostic level (I'm working in the D programming language, and I'm almost sure no canned solution is available here, but I'm willing to create one), is there a "typical" way that passing complex state across nodes is dealt with? Ideally, I want the details of how the state is copied to be completely abstracted away and for the calls to look almost like normal function calls. I don't care that copying this much state over a network isn't particularly efficient, as the level of parallelism in question is so coarse-grained that it probably won't matter.
Edit: If there is no easy way to pass complex state, then how is message passing typically used? It seems to me like anything involving copying data over a network requires coarse grained parallelism, yet coarse grained parallelism usually requires passing complex state so that a lot of work can be done in one work unit.
I do a fair bit of MPI programming but I don't know of any typical way of passing complex state (as you describe it) between processes. Here's how I've been thinking about your problem, it probably matches your own thinking ...
I surmise that your complex object graphs are represented, in memory, by blocks of data and pointers to other blocks of data -- a usual sort of implementation of a graph. How best can you move one of these COGs (to coin an abbreviation) from the address space of one process to the address space of another? To the extent that a pointer is a memory address, a pointer in one address space is no use in another address space, so you will have to translate it into some neutral form for transport (I think?).
To send a COG, therefore, it has to be put into some form from which the receiving process can build, in its own address space, a local version of the graph with the pointers pointing to local memory addresses. Do you ever write these COGs to file? If so, you already have a form in which one could be transported. I hate to suggest it, but you could even use files to communicate between processes -- and that might be easier to handle than the combination of D and MPI. Your choice!
If you don't have a file form for the COGs, can you easily represent them as adjacency matrices or lists? In other words, can you work out your own representation for transport?
I'll be very surprised (but pleased to learn) if you can pass a COG between processes without transforming it from pointer-based to some more static structure such as arrays or records.
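One common way to do that transformation is to replace every pointer with an index into an array of nodes before sending, and to turn the indices back into local references after receiving. The Python sketch below is a minimal illustration with a hypothetical `Node` type; it handles shared nodes and cycles, and uses JSON only as a stand-in for whatever byte format you would actually hand to MPI or another transport.

```python
import json
from dataclasses import dataclass, field

# eq=False keeps identity semantics; repr=False avoids printing huge or cyclic graphs.
@dataclass(eq=False, repr=False)
class Node:
    value: float
    children: list = field(default_factory=list)

def flatten(root):
    """Replace pointers with integer indices; the root always gets index 0."""
    index = {}   # id(node) -> position in the flat arrays
    order = []   # keeps nodes alive, so id() values stay unique during the walk
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in index:          # already visited: shared node or cycle
            continue
        index[id(node)] = len(order)
        order.append(node)
        stack.extend(node.children)
    values = [n.value for n in order]
    edges = [[index[id(c)] for c in n.children] for n in order]
    return values, edges

def rebuild(values, edges):
    """Recreate the graph in the receiver's address space from the flat form."""
    nodes = [Node(v) for v in values]
    for node, targets in zip(nodes, edges):
        node.children = [nodes[t] for t in targets]
    return nodes[0]

def to_wire(root) -> str:
    values, edges = flatten(root)
    return json.dumps({"values": values, "edges": edges})

def from_wire(text: str):
    msg = json.loads(text)
    return rebuild(msg["values"], msg["edges"])
```

The call site then looks almost like an ordinary function call, `result = from_wire(remote_call(to_wire(obj)))`, where `remote_call` is a placeholder for your transport; the cost is one full copy of the graph in each direction, which the question says is acceptable.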
Edit, in response to OP's edit. MPI does provide easy ways to pass complex state around, provided that the complex state is represented as values, not pointers. You can pass complex state around in either the intrinsic or customised MPI datatypes; as one of the other answers shows, these are flexible and capable. If your program does not keep the complex state in a form that MPI custom datatypes can handle, you'll have to write functions to pack/unpack to a message-friendly representation. If you can do that, then your message calls will look (for most purposes) like function calls.
As to the issues surrounding complex state and the graininess of parallelism, I'm not sure I quite follow you. We (include yourself in this sweeping generalisation if you want, or not) typically resort to MPI programming because we can't get enough performance out of a single processor; we know that we'll pay a penalty in terms of computation delayed by waiting for communication, we work hard to minimise that penalty, but in the end we accept the penalty as the cost of parallelisation. Certainly some jobs are too small or too short to benefit from parallelisation, but a lot of what we (parallel computationalists, that is) do is just too big and too long-running to avoid parallelisation.
You can do marvelous things with custom MPI datatypes. I'm currently working on a project where several MPI processes are tracking particles in a piece of virtual space, and when particles cross over from one process' territory into another one's, their data (position/speed/size/etc) has to be sent over the network.
The way I achieved this is the following:
1) All processes share an MPI Struct datatype for a single particle that contains all its relevant attributes, and their displacement in memory compared to the base address of the particle object.
2) On sending, the process iterates over whatever data structure it stores the particles in, notes down the memory address of each one that needs to be sent, and then builds a Hindexed datatype where each block is 1 long (of the above mentioned particle datatype) and starts at the memory addresses previously noted down. Sending 1 object of the resulting type will send all the necessary data over the network, in a type safe manner.
3) On the receiving end, things are slightly trickier. The receiving process first inserts "blank" particles into its own data structure: "blank" means that all the attributes that will be received from the other process are initialized to some default value. The memory addresses of the freshly inserted particles are noted down, and a datatype similar to that of the sender is created from these addresses. Receiving the sender's message as a single object of this type will automatically unpack all the data into all the right places, again, in a type safe manner.
This example is simpler in the sense that there are no relationships between particles (as there would be between nodes of a graph), but you could transmit that data in a similar way.
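The same idea can be imitated without MPI: a fixed record layout shared by both sides plays the role of the MPI struct datatype, the sender copies the selected particles into one contiguous buffer, and the receiver inserts blank particles and fills them from the buffer. This Python sketch uses the standard `struct` module instead of MPI derived datatypes (so it copies data explicitly rather than letting an Hindexed type gather it from scattered addresses); the `Particle` fields are assumptions based on the position/speed/size attributes mentioned above.

```python
import struct
from dataclasses import dataclass, astuple

# Shared record layout (little-endian, 7 doubles): stands in for the MPI struct datatype.
PARTICLE = struct.Struct("<7d")

@dataclass
class Particle:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    vx: float = 0.0
    vy: float = 0.0
    vz: float = 0.0
    size: float = 0.0

def pack_particles(selected):
    """Sender: copy each selected particle into one contiguous message buffer."""
    buf = bytearray(PARTICLE.size * len(selected))
    for i, p in enumerate(selected):
        PARTICLE.pack_into(buf, i * PARTICLE.size, *astuple(p))
    return bytes(buf)

def receive_particles(message, store):
    """Receiver: append blank particles to `store`, then fill them from the message."""
    count = len(message) // PARTICLE.size
    blanks = [Particle() for _ in range(count)]
    store.extend(blanks)
    for i, p in enumerate(blanks):
        p.x, p.y, p.z, p.vx, p.vy, p.vz, p.size = PARTICLE.unpack_from(
            message, i * PARTICLE.size)
    return blanks
```

Because both sides agree on the layout, the message carries no type information, just as with a committed MPI datatype; graph edges could be sent the same way as extra integer fields holding node indices.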
If the above description is not clear, I can post the C++ code that implements it.
I'm not sure I understand the question correctly so forgive me if my answer is off. From what I understand you want to send non-POD datatypes using MPI.
A library that can do this is Boost.MPI. It uses a serialization library to send even very complex data structures. There is a catch, though: you will have to provide code to serialize the data yourself if you use complicated structures that Boost.Serialization does not already know about.
I believe message passing is typically used to transmit POD datatypes.
I'm not allowed to post more links so here is what I wanted to include:
Explanation of POD: www.fnal.gov/docs/working-groups/fpcltf/Pkg/ISOcxx/doc/POD.html
Serialization Library: www.boost.org/libs/serialization/doc
It depends on the organization of your data. If you use pointers or automatic memory inside your objects, it will be difficult. If you can organize your objects to be contiguous in memory, you have two choices: send the memory as bytes and cast it back to the object type on the receiver, or define an MPI derived type for your object. If, however, you use inheritance, things will become complicated due to how objects are laid out in memory.
I do not know your problem, but maybe you can take a look at ARMCI if you manage memory manually.
