I wonder if there is an obvious and elegant way to add additional data to a JPEG while keeping it readable for standard image viewers. More precisely, I would like to embed a picture of the back side of a (scanned) photo into it. Old photos often have personal messages written on the back, be it the date or some notes. Sure, you could use EXIF and add some text, but an actual image of the back is preferable.
Sure, I could also save two files, xyz.jpg and xyz_back.jpg, or arrange both images side by side, always visible in one picture, but that's not what I'm looking for.
It is possible and has been done: on the Samsung Note 2 and 3, for example, you can add handwritten notes to the photos you've taken as an image. And some smartphones allow embedding voice recordings into the image files while preserving the readability of those files on other devices.
There are two ways you can do this.
1) Use an Application Marker (APP0–APPF), the preferred method
2) Use a Comment Marker (COM)
If you use an APPn marker:
1) Do not make it the first APPn in the file. Every known JPEG file format expects some kind of format-specific APPn marker right after the SOI marker. Make sure that your marker is not there.
2) Place a unique application identifier (null terminated string) at the start of the data (something done by convention).
All kinds of applications store additional data this way.
One issue is that the length field is only 16 bits (big-endian) and counts its own two bytes, so a single marker holds at most 65,533 bytes of data. If you have more, you will have to split it across multiple markers, as in the sketch below.
If you use a COM marker, make sure it comes after the first APPn marker in the file. However, I would discourage using a COM marker for something like this as it might choke applications that try to display the contents.
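For illustration, here is a minimal Python sketch of the APPn approach. It claims APP15 and a made-up, null-terminated "BACKSIDE" identifier; both are arbitrary choices for this example, not any standard:

```python
import struct

MARKER_APP15 = 0xFFEF          # APP15; pick a marker not used by common formats
IDENTIFIER = b"BACKSIDE\x00"   # hypothetical null-terminated application identifier

def build_app15_segments(payload: bytes) -> bytes:
    """Split the payload across as many APP15 segments as needed.
    The 16-bit length field counts itself, so each segment carries
    at most 65535 - 2 - len(IDENTIFIER) bytes of payload."""
    max_chunk = 65535 - 2 - len(IDENTIFIER)
    out = bytearray()
    for i in range(0, len(payload), max_chunk):
        chunk = payload[i:i + max_chunk]
        out += struct.pack(">HH", MARKER_APP15, 2 + len(IDENTIFIER) + len(chunk))
        out += IDENTIFIER + chunk
    return bytes(out)

def embed(jpeg: bytes, payload: bytes) -> bytes:
    """Insert our segments after the first APPn segment, never before it."""
    assert jpeg[:2] == b"\xff\xd8", "not a JPEG (missing SOI)"
    pos = 2
    marker, seglen = struct.unpack(">HH", jpeg[pos:pos + 4])
    if 0xFFE0 <= marker <= 0xFFEF:   # skip the JFIF/Exif APPn already in place
        pos += 2 + seglen
    return jpeg[:pos] + build_app15_segments(payload) + jpeg[pos:]

with open("xyz.jpg", "rb") as f, open("xyz_back.jpg", "rb") as g:
    combined = embed(f.read(), g.read())
with open("xyz_combined.jpg", "wb") as out:
    out.write(combined)
```

A viewer that does not know the identifier simply skips the unknown segments, which is what keeps the file displayable everywhere.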
An interesting question. There are file formats that support multiple images per file (multipage TIFF comes to mind) but JPEG doesn't support this natively.
One feature of the JPEG file format is the concept of APP segments. These are regions of the JPEG file that can contain arbitrary information (as a sequence of bytes). Exif is actually stored in one of these segments, and is identified by a preamble.
Take a look at this page: http://www.sno.phy.queensu.ca/~phil/exiftool/#JPEG
You'll see many segments there that start with APP such as APP0 (which can store JFIF data), APP1 (which can contain Exif) and so forth.
There's nothing stopping you from storing data in one of these segments. Conformant JPEG readers will ignore this unrecognised data, but you could write software to store/retrieve data from within there. It's even possible to embed another JPEG file within such a segment! There's no precedent I know of for doing this, however.
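To sketch the retrieval side, assuming the same hypothetical APP15 segment with a "BACKSIDE" identifier as in the answer above, a reader only has to walk the segment list until the entropy-coded data starts:

```python
import struct

def extract_payload(jpeg: bytes, identifier: bytes = b"BACKSIDE\x00") -> bytes:
    """Concatenate the data of every APP15 segment carrying our identifier."""
    assert jpeg[:2] == b"\xff\xd8", "not a JPEG (missing SOI)"
    pos, found = 2, bytearray()
    while pos + 4 <= len(jpeg):
        marker, seglen = struct.unpack(">HH", jpeg[pos:pos + 4])
        if marker == 0xFFDA:               # SOS: compressed image data follows
            break
        data = jpeg[pos + 4:pos + 2 + seglen]
        if marker == 0xFFEF and data.startswith(identifier):
            found += data[len(identifier):]
        pos += 2 + seglen
    return bytes(found)
```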
Another option would be to store the second image as the thumbnail of the first. Thumbnails are normally very small, but you could store a full second image there instead. Some software might replace or remove it, though.
In general I think using two files and a naming convention would be the simplest and least confusing, but you do have options.
I would like to implement a Lambda in AWS which receives pixel coordinates (x/y) as input, retrieves that pixel's RGB from one image, and then does something with it.
The catch is that the image is very large: 21600x10800 pixels (a 684 MB TIFF file).
Many of the image's pixels will likely never be accessed (it's a world map, so it includes e.g. oceans, for which no Lambda calls will happen, but I don't know in advance which pixels will be needed).
The result of the Lambda will be persisted so that the image operation is only done once per pixel.
My main concern is that I would like to avoid large unnecessary processing time and costs. I expect multiple calls per second to the Lambda. The naive way would be to throw the image into an S3 bucket and read it in the Lambda to get one pixel, but I would think that each Lambda invocation would then become very heavy. I could build a custom solution such as storing the rows separately, but I was wondering if there is some set of technologies that handles this more elegantly.
Right now I am using Node.js 14.x, but that's not a strong requirement.
The image is in TIFF format, but I could convert it to another image format beforehand if needed (just not as part of the Lambda's answer, as that is even bigger).
How can I design this Lambda efficiently?
As I said in the comments, I think Lambda is the wrong solution unless your traffic is very bursty. If you have continuous traffic with "multiple calls per second," it will be more cost-effective to use an alternate technology, such as EC2 or ECS. And these give you far more control over storage.
However, if you're set on using Lambda, then I think the best solution is to store the file on an EFS volume, then mount that filesystem onto your Lambda. In my experiments, it takes roughly 150 ms for a Lambda to read an arbitrary byte from a file on EFS, using Python and the mmap module.
Of course, if your TIFF library attempts to read the file into memory before performing any operations, this is moot. The TIFF format is designed so that shouldn't be necessary, but some libraries take the easy way out (because in most cases you're displaying or transforming the entire image). You may need to pre-process your file, to produce a raw byte format, in order to be able to make single-pixel gets.
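For example, if you pre-process the TIFF once into a headerless raw RGB file, a single-pixel get becomes one offset computation plus one page read. A minimal sketch, with the dimensions from the question and everything else assumed:

```python
import mmap

WIDTH, CHANNELS = 21600, 3    # 21600x10800 px, assuming 8-bit RGB with no header

def get_pixel(path: str, x: int, y: int) -> tuple:
    """Read exactly one pixel from a raw RGB file; mmap means only the
    touched page is actually fetched from the (EFS) filesystem."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            off = (y * WIDTH + x) * CHANNELS
            r, g, b = mm[off:off + CHANNELS]
            return (r, g, b)
```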
Thanks everyone for the useful information!
So after some testing I settled on the solution from luk2302's comment, with 100x100 pixel sub-images hosted on S3, but I can't flag a comment as the solution. My tests showed that the Lambda answers within 110 ms when accessing a pixel (from the now only 4 KB files), which I think is quite sufficient for my expectations. (The very first request took 1 s, but now even requests for sub-images that have never been touched before are answered within 110 ms.)
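For reference, that tile approach boils down to mapping (x, y) to a tile key and fetching only that small object. A sketch with a hypothetical bucket name and key scheme:

```python
import boto3

TILE = 100                    # 100x100 px sub-images, per luk2302's comment
s3 = boto3.client("s3")

def fetch_tile(x: int, y: int):
    """Download only the ~4 KB tile containing (x, y)."""
    key = f"tiles/{x // TILE}_{y // TILE}.png"      # produced once, offline
    body = s3.get_object(Bucket="my-map-tiles", Key=key)["Body"].read()
    # decode `body` with your image library, then index (x % TILE, y % TILE)
    return body, (x % TILE, y % TILE)
```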
Parsifal's solution is what I originally envisioned as ideal, in order to really only read the relevant data (the open question being which image library actually avoids loading the entire file), but I don't have the means to check the filesystem aspect more closely to see whether it has more potential. In my case the requests are indeed very much burst-driven (with long periods of expected inactivity after the bursts), so for the moment I will stay with a Lambda but will keep the mentioned alternatives in mind.
I am trying to extract some features from an audio sample using OpenSMILE, but I'm realizing how difficult it is to set up a config file.
The documentation is not very helpful. The best I could do was run some of the sample config files that are provided, see what came out, and then go into the config file and try to determine where the feature was specified. Here's what I did:
I used the default feature set from the INTERSPEECH 2010 Paralinguistic Challenge (IS10_paraling.conf).
I ran it over a sample audio file.
I looked at what came out. Then I read the config file in depth, trying to find out where the feature was specified.
Here's a little markdown table showing the results of my exploration:
| Feature generated | Instruction in the conf file |
|-------------------|------------------------------|
| pcm_loudness | I see: `loudness=1` |
| mfcc | I see a section: `[mfcc:cMfcc]` |
| lspFreq | no matches for the text `lspFreq` anywhere |
| F0finEnv | I see `F0finalEnv = 1` under `[pitchSmooth:cPitchSmoother]` |
What I see is four different features, each generated by a different instruction in the config file. Well, for one of them there was no discernible instruction in the config file that I could find. With no pattern, intuitive syntax or apparent system, I have no idea how I can eventually figure out how to specify the features I want to generate.
There are no tutorials, no YouTube videos, no Stack Overflow questions and no blog posts out there talking about how this could be done. Which is really surprising, since this is obviously a huge part of using OpenSMILE.
If anyone finds this, please, can you advise me on how to create custom config files of OpenSMILE? Thanks!
Thanks for your interest in openSMILE and your eagerness to build your own configuration files.
Most users in the scientific community actually use openSMILE for its pre-defined config files for the baseline feature sets, which in version 2.3 are even more flexible to use (more commandline options to output to different file formats etc.).
I admit that the documentation provided is not as good as it could be. However, openSMILE is a very complex piece of software with a lot of functionality, of which only the most important parts are currently well documented.
The best starting point would be to read the openSMILE book and the SIG'MM tutorials all referenced at http://opensmile.audeering.com/ . It contains a section on how to write configuration files. The next important element is the online help of the binary:
SMILExtract -L lists the available components
SMILExtract -H cComponentName lists all options which a given component supports (and thus also features it can extract) with a short description for each
SMILExtract -configDflt cComponentName gives you a template configuration section for the component with all options listed and defaults set
Due to the architecture of openSMILE, which is centered on incremental processing of all audio features, there is (at least not yet) no easy syntax to define the features you want. Rather, you define the processing chain by adding components:
data sources will read in data (from audio files, csv files, or microphone, for example),
data processors will do signal processing and feature extraction in individual steps (for extracting MFCC, for example: windowing, window function, FFT, magnitudes, mel-spectrum, cepstral coefficients); there is one data processor for each step,
data sinks will write data to output files or send results to a server etc.
You connect the components via the "reader.dmLevel" and "writer.dmLevel" options. These define a name of a data memory level that the components use to exchange data. Only one component may write to one level, i.e. writer.dmLevel=levelName defines the level and may appear only once. Multiple components can read from this level by setting reader.dmLevel=levelName.
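To make this concrete, here is a minimal illustrative fragment; the component and level names are arbitrary, and SMILExtract -configDflt gives you the real templates:

```
[componentInstances:cComponentManager]
instance[dataMemory].type = cDataMemory
instance[waveIn].type = cWaveSource
instance[framer].type = cFramer
instance[energy].type = cEnergy
instance[csvSink].type = cCsvSink

[waveIn:cWaveSource]
writer.dmLevel = wave
filename = input.wav

; chop the raw samples into 25 ms frames every 10 ms
[framer:cFramer]
reader.dmLevel = wave
writer.dmLevel = frames
frameSize = 0.025
frameStep = 0.010

; one RMS energy value per frame
[energy:cEnergy]
reader.dmLevel = frames
writer.dmLevel = energy
rms = 1

[csvSink:cCsvSink]
reader.dmLevel = energy
filename = output.csv
```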
In each component you then set the options to enable computation of features and set parameters for this. To answer your question about lspFreq: This is probably enabled by default in the cLsp component, so you don't see an explicit option for it. For future versions of openSMILE the practice of setting all options explicitly will and should be followed more tightly.
The names of the features in the output are defined automatically by the components. Often each component adds a part to the name, so you can infer the full chain of processing from the name. The options nameAppend and copyInputName (available to most data processors) control this behaviour, although some components may internally override them or change the behaviour a bit.
To see the names (and other info) for each data memory level, including e.g. which features a component in the configuration produces, you can set the option "printLevelStats=5" in the section of componentInstances:cComponentManager.
As everything in openSMILE is built for real-time incremental processing, each data memory level has a buffer, which by default is a ring buffer to keep the memory footprint constant when the application runs for a longer time.
Sometimes you might want to summarise features over a window of a given length (e.g. with the cFunctionals component). In this case you must ensure that the buffer size of the input level to this component is large enough to hold the full window. You do this via the following options:
writer.levelconf.isRb = 1/0 : sets the type of buffer to ring buffer (1) or fixed-size buffer (0)
writer.levelconf.growDyn = 1/0 : sets the buffer to grow dynamically if more data is written to it (1)
writer.levelconf.nT = x : sets the size of the buffer in frames. Alternatively you can use bufferSizeSec=x to set the size in seconds, which is converted to frames automatically.
In most cases the sizes will be set correctly automatically. Subsequent levels also inherit the configuration from the previous levels. Exceptions are when you set a cFunctionals component to read the full input (e.g. to produce only one feature vector at the end of the file), in which case you must use growDyn=1 on the level that the functionals component reads from, or when you use a variable framing mode (see below).
The cFunctionals component provides frameMode, frameSize, and frameStep options. frameMode can be full (one vector produced at the end of the input/file), list (specify a list of frames), var (receive messages, e.g. from a cTurnDetector component, that define frames on-the-fly), or fix (fixed-length window). Only in the case of fix do the options frameSize set the size of this window and frameStep the rate at which the window is shifted forward. In the case of fix the buffer size of the input level is set correctly automatically; in the other cases you have to set it manually.
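For instance, a summarising setup along these lines might look like the following fragment (a hedged sketch only; the cFunctionals instance also has to be registered in the component manager):

```
; one mean energy value over the whole file
[func:cFunctionals]
reader.dmLevel = energy
writer.dmLevel = func
frameMode = full
functionalsEnabled = Means
Means.amean = 1

; because frameMode = full consumes the entire input, the energy level
; from the earlier fragment must be allowed to grow:
[energy:cEnergy]
writer.levelconf.isRb = 0
writer.levelconf.growDyn = 1
```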
I hope this helps you to get started! With every new openSMILE release we at audEERING are trying to document things a bit better and unify things through various components.
We also welcome contributions from the community (e.g. is anybody willing to write a graphical configuration file editor where you drag/drop components and connect them graphically? ;)) - although we know that more documentation would make this easier. Until then, you always have the source code to read ;)
Cheers,
Florian
Consider the whole novel (e.g. The Da Vinci Code).
How does e-book reader software process and output the whole book?
Does it put the WHOLE book in one very large string? An array of strings? Or something else?
One of the very first "real" programs I wrote (as part of a class exercise in high school) was a text editor. Part of the requirement for this exercise was for the program to be able to handle documents of arbitrary length (i.e. larger than the available system memory).
We achieved this by opening the file, but reading only the portion of it required to display the current page of data. When the user moves forward or backward in the file, we read that portion of the file and display it.
We can speed the program up by reading ahead to load pages which we anticipate that the user will want, and by retaining recently read pages in memory so that there is no obvious delay when the user moves forward or backward.
So basically, the answer to your question is: "No. with very large text files, it is unusual to load the whole thing into memory at once. A program that can handle files like that will load it in chunks as it needs to, and drop chunks it doesn't need any more."
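In Python, the core of such a pager is just a seek and a bounded read. A minimal sketch (page size and cache size are arbitrary):

```python
from functools import lru_cache

PAGE_SIZE = 4096    # bytes per displayed "page"

def read_page(path: str, page_number: int) -> bytes:
    """Load only the requested chunk; the rest of the file stays on disk."""
    with open(path, "rb") as f:
        f.seek(page_number * PAGE_SIZE)
        return f.read(PAGE_SIZE)

@lru_cache(maxsize=16)
def cached_page(path: str, page_number: int) -> bytes:
    """Retain recently viewed pages so moving back and forth feels instant."""
    return read_page(path, page_number)
```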
Complex document formats (such as ebooks) may have lookup tables built into the file to allow the user to search or jump quickly to a given page or chapter. In this, they effectively work like a database.
I hope that helps.
I want to create an application to read and write DICOM files without using any third-party software.
How can I do that?
Can anyone help me?
"I my project, I need only to update pixel data. So it was not too tough to handle. I just parsde the DICOM file till I reach pixel data, and then I replaced the same with my own data. and It become success."
Even though there are quite a few research applications that do the same thing that you've done, it is precisely The Wrong Thing To Do(TM). Why is this such a bad practice? DICOM images are supposed to be uniquely identified by their SOP Instance UIDs. When you take an existing DICOM image and replace the pixel data, leaving the original header information unaltered, you are creating two data objects that share the same primary key.
Consider what will happen if you take this image and send it to a DICOM Storage SCP that already has a copy of the original image. The Storage SCP has to invoke a conflict resolution procedure because it can't have two SOP Instances with the same UID. Upon receipt of your new image, the Storage SCP detects that the new image has the same UID as an existing image and the required behavior of the SCP is not well defined. The Storage SCP can treat your new image as if it is just a retransmission of the original image and ignore your new image, or it can treat it as if it is a corrected version of the original image and replace the original image with your new image, or it can give up and admit that it has absolutely no idea what to do with this new image and throw it into a holding area and require a human being to interact with the application to decide what to do with the two images. You, the creator of the new image, have no way of knowing or controlling what the behavior of the Storage SCP will be when it receives your new image.
At a minimum, you need to generate a new valid SOP Instance UID when you create a new image. Your image type should also be one of the DERIVED\SECONDARY types because it is a post-processed image, not a primary acquisition generated by the modality. You should also look at the other DICOM tags present in the original header and seriously consider whether they accurately describe the new image that you've created.
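As an illustration of that minimum fix, here is a sketch using the third-party pydicom library. It is not what the asker wants to ship, but it shows which header fields have to change when you derive a new image:

```python
import pydicom
from pydicom.uid import generate_uid

ds = pydicom.dcmread("original.dcm")

new_pixel_bytes = ds.PixelData            # placeholder: your processed pixel data
ds.PixelData = new_pixel_bytes
ds.SOPInstanceUID = generate_uid()        # new primary key: a new object, not a copy
ds.file_meta.MediaStorageSOPInstanceUID = ds.SOPInstanceUID
ds.ImageType = ["DERIVED", "SECONDARY"]   # post-processed, not a primary acquisition

ds.save_as("derived.dcm")
```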
That would pretty much mean starting from the DICOM standard and writing a lot of code.
Scenario: Several people go on holiday together, armed with digital cameras, and snap away. Some people remembered to adjust their camera clocks to local time, some left them at their home time, some left them at local time of the country they were born in, and some left their cameras on factory time.
The Problem: Timestamps in the EXIF metadata of photos will not be synchronised, making it difficult to aggregate all the photos into one combined collection.
The Question: Assuming that you have discovered the deltas between all of the camera clocks, what is the simplest way to correct these timestamp differences in Windows Vista?
Use exiftool. Open source, written in Perl, but also available as a standalone .exe file. The author seems to have thought of everything EXIF-related. Mature code.
examples:
exiftool "-DateTimeOriginal+=5:10:2 10:48:0" DIR (shifts DateTimeOriginal forward by 5 years, 10 months, 2 days, 10 hours and 48 minutes)
exiftool -AllDates-=1 DIR (shifts all date/time tags back by one hour)
refs:
http://www.sno.phy.queensu.ca/~phil/exiftool/
http://www.sno.phy.queensu.ca/~phil/exiftool/#shift
Windows Live Photo Gallery Wave 3 Beta includes this feature. From the help:
If you change the date and time settings for more than one photo at the same time, each photo's time stamp is changed by the same amount, so that the time stamps of all the selected photos remain in their original chronological order.
Instructions:
Select Photos to change (you can use the search feature to limit by camera model, etc).
Right-Click and select 'Change Time Taken...'.
Select a new time and click OK.
Current download location is from LiveSide.net.
Easiest: probably a small Python script that uses something like os.walk to go through all the files below a folder and then uses pyexiv2 to actually read and modify the EXIF data. A tutorial on pyexiv2 can be found here.
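A rough sketch of that script, using the old pyexiv2 0.3.x API (the folder, extension filter and delta are placeholders):

```python
import datetime
import os
import pyexiv2

DELTA = datetime.timedelta(hours=-1)      # the clock offset you discovered

for dirpath, dirnames, filenames in os.walk(r"C:\photos"):
    for name in filenames:
        if not name.lower().endswith((".jpg", ".jpeg")):
            continue
        meta = pyexiv2.ImageMetadata(os.path.join(dirpath, name))
        meta.read()
        tag = meta["Exif.Photo.DateTimeOriginal"]
        tag.value = tag.value + DELTA     # tag.value is a datetime.datetime
        meta.write()
```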
I'd dare to recommend my own software for this purpose: EXIFTimeEdit. Open-source and simple, it supports all the variants I could imagine:
Shifting date part (year/month/day/hour/minute) by any value
Setting date part to any value
Determining necessary shift value
Copying resulting timestamp to EXIF DateTime field and last modified property