NodeJS Sharp Provides Incorrect Quality For JPEG Images

I am working with the sharp node package to calculate the quality of an image.
I couldn't find any API in the package that would calculate the quality, so I came up with an implementation that follows these steps:
1. Accept the incoming image buffer and create a sharp instance.
2. Create another instance from this instance by setting the quality to 100.
3. Compare the sizes of the original sharp instance and the new sharp instance.
4. If there is a difference in size, update the quality and execute step 2 with the updated quality.
5. Return the quality once the comparison in step 3 gives the smallest difference.
I tested this approach using an image of known quality, i.e. 50 (confirmed).
EDIT: I generated the images with different quality values using Photoshop.
However, the above logic returns the quality as 82 (expected is something close to 50).
Problem
So the problem is that I am not able to figure out the quality of the image.
It would be fine if the above logic returned a close value such as 49 or 51.
However, the result is totally different.
Results
As per this logic, I get the following results for a given quality:
Actual Quality 50 - Result 82
Actual Quality 60 - Result 90
Actual Quality 70 - Result 90
Actual Quality 80 - Result 94
Actual Quality 90 - Result 98
Actual Quality 100 - Result 98
Code
The following code snippet is used to calculate the quality.
I do understand that it needs improvements for precise results, but it should at least provide close values.
import sharp from 'sharp';

async function getJpegQuality(image: sharp.Sharp, min_quality: number, max_quality: number): Promise<number> {
    // stop once the search window has collapsed to a single step
    if (Math.abs(max_quality - min_quality) <= 1) {
        return max_quality;
    }
    // re-encode the image at the candidate quality
    const updated_image: sharp.Sharp = sharp(await image.jpeg({ quality: max_quality }).toBuffer());
    const [metadata, updated_metadata]: sharp.Metadata[] = await Promise.all([image.metadata(), updated_image.metadata()]);
    // update the quality window as per the size comparison
    if (metadata.size > updated_metadata.size) {
        // the re-encoded file is smaller, so search above the candidate
        const temp_max = Math.round(max_quality);
        max_quality = Math.round((max_quality * 2) - min_quality);
        min_quality = Math.round(temp_max);
    } else {
        // the re-encoded file is larger or equal, so bisect downwards
        max_quality = Math.round((min_quality + max_quality) / 2);
        min_quality = Math.round((min_quality + max_quality) / 2);
    }
    // recurse with the narrowed window
    return await getJpegQuality(image, min_quality, max_quality);
}
Usage
const image: sharp.Sharp = sharp(file.originalImage.buffer);
const quality = await getJpegQuality(image, 1, 100);
console.log(quality);
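For what it's worth, the same size-comparison idea can be cross-checked outside sharp. Here is a minimal Python sketch using Pillow (my own assumption, not part of the original setup) that brute-forces every candidate quality and reports the one whose re-encoded size is closest to the original file:

import io
import os
from PIL import Image

def estimate_jpeg_quality(path):
    # Re-encode at each candidate quality and return the one whose
    # output size is closest to the original file size.
    original_size = os.path.getsize(path)
    image = Image.open(path)
    best_quality, best_diff = None, None
    for quality in range(1, 101):
        buffer = io.BytesIO()
        image.save(buffer, format="JPEG", quality=quality)
        diff = abs(buffer.tell() - original_size)
        if best_diff is None or diff < best_diff:
            best_quality, best_diff = quality, diff
    return best_quality

print(estimate_jpeg_quality("photo_q50.jpg"))  # hypothetical test file

Note that even this exhaustive version can drift when the re-encoder's quantization tables differ from the ones the original encoder (here Photoshop) used, which is one plausible reason a size-based search lands on 82 rather than 50.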
Thanks!

Related

Converting video to images using OpenCV library problem

I have this code, which converts .mov videos to images at the specified frequency (e.g. every 30 seconds here).
import cv2

# the path to take the video from
vidcap = cv2.VideoCapture(r"C:\Users\me\Camera_videos\Images_30sec\PIR-206_7.MOV")

def getFrame(sec):
    vidcap.set(cv2.CAP_PROP_POS_MSEC, sec * 1000)
    hasFrames, image = vidcap.read()
    if hasFrames:
        cv2.imwrite("image" + str(count) + ".jpg", image)  # save frame as JPG file
    return hasFrames

sec = 0
frameRate = 30  # it will capture an image every 30 seconds
count = 1
success = getFrame(sec)
while success:
    count = count + 1
    sec = sec + frameRate
    sec = round(sec, 2)
    success = getFrame(sec)
I have no problem with smaller files. A 5 min long .mov file, for example, produces 11 images as expected (5 x 60 seconds / 30 seconds = about 10 images, with the first image taken at 0 seconds).
However, when I tried a bigger file, which is 483 MB and about 32 min long, I encountered a problem.
It is expected to generate some 32 x 60 / 30 = 64 images.
However, it runs and runs, generating some 40,000 images, until I manually stop the program. It seems to be stuck on one of the last frames.
I have uploaded both .mov files to my google drive, if anyone wants to have a look.
small file
https://drive.google.com/file/d/1guKtLgM-vwt-5fG3_suJrhVbtwMSjMQe/view?usp=sharing
large file
https://drive.google.com/file/d/1V_HVRM29qwlsU0vCyWiOuBP-tkjdokul/view?usp=sharing
Can somebody advise on what's going on here?
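For reference, one way to guard against a seek that never reports end-of-stream is to stop once the requested timestamp passes the clip's reported duration. A minimal sketch, assuming the container reports frame count and FPS correctly (which large or variable-frame-rate .mov files may not):

import cv2

vidcap = cv2.VideoCapture("PIR-206_7.MOV")  # hypothetical local path

# derive the clip length from the container metadata
fps = vidcap.get(cv2.CAP_PROP_FPS)
frame_count = vidcap.get(cv2.CAP_PROP_FRAME_COUNT)
duration_sec = frame_count / fps if fps else 0

sec, count = 0, 1
while sec <= duration_sec:
    vidcap.set(cv2.CAP_PROP_POS_MSEC, sec * 1000)
    has_frames, image = vidcap.read()
    if not has_frames:
        break
    cv2.imwrite("image" + str(count) + ".jpg", image)
    count += 1
    sec += 30  # one frame every 30 seconds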

Understanding BLE characteristic values for cycle power measurement 0x2A63

I am currently using a Dart/Flutter BLE plugin to better understand BLE devices.
Plugin:
https://pub.dartlang.org/packages/flutter_blue
When I connect to my virtual cycle trainer I select the 0x1818 service and then I subscribe to the 0x2A63 characteristic for Cycle Power Measurement.
I am struggling to align the response list I get with the GATT documentation for this service/characteristic below. There are 18 values in this list; however, there are only 17 in the GATT list. Also, the values don't seem to make any sense.
I also tried to convert the first two values, '52' and '24', to 16-bit binary to see if that aligns with the flags for the first field, but the result was the below, which again makes no sense.
0x3418 = 00110100 00011000
https://www.bluetooth.com/specifications/gatt/viewer?attributeXmlFile=org.bluetooth.characteristic.cycling_power_measurement.xml
This screenshot is when I first connect to the trainer.
This screenshot is when I am cycling lightly on the bike
This screenshot is when I stop cycling but the pedals and wheel are still turning.
The cycle trainer is the Cycleops Magnus, which doesn't have the Cycle Speed Cadence service 1816, but can provide virtual speed based on power.
My question is this:
Which of the values in the list correspond with the GATT characteristic? And as a bonus question, how would I infer speed or cadence from the values in this service?
Based on section 3.55 of the Bluetooth GATT specs:
DEC - [52, 24, 40, 0, 58, 29, 59, 0, 0, 0, 107, 136, 23, 0, 214, 81, 1, 0]
BYTE - 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Flags field = bytes 0 and 1 (52, 24). Multi-byte BLE values are little-endian (least significant byte first), so the 16-bit flags value is 52 + 24 * 256 = 6196:
0x1834 = 00011000 00110100
Per section 3.55.2.1, the set bits equate to:
- bit2 = Accumulated Torque Present
- bit4 = Wheel Revolution Data Present
- bit5 = Crank Revolution Data Present
- bit11 = Accumulated Energy Present
- bit12 = Offset Compensation Indicator
Then from section 3.55.2, you go down the list of fields based on the flags:
Instantaneous Power is byte 2 (40) and byte 3 (0):
[40, 0] == 0x0028 == 40 W
To decipher the rest of the bytes, we then have to refer to the flags field, since the fields after instantaneous power depend on what the flags field says the trainer is supporting.
Bit 2 of the flags field says "Accumulated Torque Present" (present if bit 2 of the flags field is set to 1), hence the next 2 bytes (4 and 5) represent Accumulated Torque: [58, 29] == 7482.
The next data would then be based on bit 4 of the flags field - Wheel Revolution Data Present (present if bit 4 of the flags field is set to 1). This is wheel speed, which translates into speed once you take the wheel circumference into account. Wheel Revolution Data is represented by the next 6 bytes:
Cumulative Wheel Revolutions - 4 bytes (bytes 6-9)
Last Wheel Event Time - 2 bytes (bytes 10-11)
Like you mentioned, this trainer does not offer the separate Cadence service (0x1816). However, bit 5 of the flags field (Crank Revolution Data Present) is set here, so the next 4 bytes (Cumulative Crank Revolutions, 2 bytes, and Last Crank Event Time, 2 bytes) can be used to derive cadence the same way the wheel data is used to derive speed. Together with the trailing 2-byte Accumulated Energy field, that accounts for all 18 bytes: 2 (flags) + 2 (power) + 2 (torque) + 6 (wheel) + 4 (crank) + 2 (energy).
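As an illustration of that field-by-field walk, here is a small Python sketch for this particular 18-byte packet (Python purely for demonstration of the offsets; the flag bits follow the reading above):

import struct

data = bytes([52, 24, 40, 0, 58, 29, 59, 0, 0, 0, 107, 136, 23, 0, 214, 81, 1, 0])

# all multi-byte BLE fields are little-endian ("<")
flags, inst_power = struct.unpack_from("<HH", data, 0)
offset = 4

if flags & (1 << 2):   # Accumulated Torque Present
    (acc_torque,) = struct.unpack_from("<H", data, offset)
    offset += 2
if flags & (1 << 4):   # Wheel Revolution Data Present
    wheel_revs, wheel_time = struct.unpack_from("<IH", data, offset)
    offset += 6
if flags & (1 << 5):   # Crank Revolution Data Present
    crank_revs, crank_time = struct.unpack_from("<HH", data, offset)
    offset += 4
if flags & (1 << 11):  # Accumulated Energy Present
    (acc_energy,) = struct.unpack_from("<H", data, offset)
    offset += 2

print(flags, inst_power, wheel_revs, wheel_time)  # 6196 40 59 34923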
For wheel speed, you would then decode those 6 bytes into Cumulative Wheel Revolutions and Last Wheel Event Time. I can't offer you code for decoding them as you're using Flutter and I have no experience with it (I do Swift), but you can likely take a look at this code from GoldenCheetah and convert accordingly.
void BT40Device::getWheelRpm(QDataStream& ds)
{
    quint32 wheelrevs;
    quint16 wheeltime;
    ds >> wheelrevs;
    ds >> wheeltime;
    double rpm = 0.0;
    if (!prevWheelStaleness) {
        quint16 time = wheeltime - prevWheelTime;
        quint32 revs = wheelrevs - prevWheelRevs;
        // Power sensor uses 1/2048 second time base and CSC sensor 1/1024
        if (time) rpm = (has_power ? 2048 : 1024) * 60 * revs / double(time);
    }
    else prevWheelStaleness = false;
    prevWheelRevs = wheelrevs;
    prevWheelTime = wheeltime;
    dynamic_cast<BT40Controller*>(parent)->setWheelRpm(rpm);
}
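The same arithmetic in a short Python sketch, purely for illustration (the function and names are hypothetical; the 2048 ticks-per-second time base for power sensors comes from the comment in the C++ above):

def wheel_rpm(prev_revs, prev_time, revs, time, ticks_per_sec=2048):
    # Event times are uint16 tick counters that wrap around, so take the
    # differences modulo 2**16 (and 2**32 for the revolution counter).
    dt = (time - prev_time) % 2**16
    drevs = (revs - prev_revs) % 2**32
    if dt == 0:
        return 0.0
    return ticks_per_sec * 60 * drevs / dt

# e.g. 4 revolutions in 4096 ticks (2 seconds) -> 120 rpm
print(wheel_rpm(100, 0, 104, 4096))  # 120.0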

Overwrite GPS coordinates in Image Exif using Python 3.6

I am trying to transform image geotags so that images and ground control points lie in the same coordinate system inside my software (Pix4D mapper).
The answer here says:
Exif data is standardized, and GPS data must be encoded using
geographical coordinates (minutes, seconds, etc) described above
instead of a fraction. Unless it's encoded in that format in the exif
tag, it won't stick.
Here is my code:
import os, piexif, pyproj
from PIL import Image

# dirPath, fn and outPath are defined elsewhere; the projections would be
# something like this (EPSG:31467 for Gauss-Krüger zone 3 is my assumption):
wgs84 = pyproj.Proj(init='epsg:4326')
gk3 = pyproj.Proj(init='epsg:31467')

img = Image.open(os.path.join(dirPath, fn))
exif_dict = piexif.load(img.info['exif'])
breite = exif_dict['GPS'][piexif.GPSIFD.GPSLatitude]
lange = exif_dict['GPS'][piexif.GPSIFD.GPSLongitude]
# each component is a (numerator, denominator) rational: deg + min/60 + sec/3600
breite = breite[0][0] / breite[0][1] + breite[1][0] / (breite[1][1] * 60) + breite[2][0] / (breite[2][1] * 3600)
lange = lange[0][0] / lange[0][1] + lange[1][0] / (lange[1][1] * 60) + lange[2][0] / (lange[2][1] * 3600)
print(breite)  # 48.81368778730952
print(lange)   # 9.954511162420633
x, y = pyproj.transform(wgs84, gk3, lange, breite)  # from WGS84 to Gauss-Krüger zone 3
print(x)  # 3570178.732528623
print(y)  # 5408908.20172699
exif_dict['GPS'][piexif.GPSIFD.GPSLatitude] = [(int(round(y, 6) * 1000000), 1000000), (0, 1), (0, 1)]
exif_bytes = piexif.dump(exif_dict)  # error here
img.save(os.path.join(outPath, fn), "jpeg", exif=exif_bytes)
I am getting struct.error: argument out of range in the dump method. The original GPSInfo tag looks like: {0: b'\x02\x03\x00\x00', 1: 'N', 2: ((48, 1), (48, 1), (3449322402, 70000000)), 3: 'E', 4: ((9, 1), (57, 1), (1136812930, 70000000)), 5: b'\x00', 6: (3659, 10)}
I am guessing I have to offset the values and encode them properly before writing, but have no idea what is to be done.
It looks like you are already using PIL and Python 3.x. I am not sure if you want to continue using piexif, but either way, you may find it easier to convert the degrees, minutes, and seconds into decimal first. It looks like you are trying to do that already, but putting it in a separate function may be clearer and can account for the direction reference.
Here's an example:
def get_decimal_from_dms(dms, ref):
    degrees = dms[0][0] / dms[0][1]
    minutes = dms[1][0] / dms[1][1] / 60.0
    seconds = dms[2][0] / dms[2][1] / 3600.0
    if ref in ['S', 'W']:
        degrees = -degrees
        minutes = -minutes
        seconds = -seconds
    return round(degrees + minutes + seconds, 5)

def get_coordinates(geotags):
    lat = get_decimal_from_dms(geotags['GPSLatitude'], geotags['GPSLatitudeRef'])
    lon = get_decimal_from_dms(geotags['GPSLongitude'], geotags['GPSLongitudeRef'])
    return (lat, lon)
The geotags in this example are a dictionary with the GPSTAGS as keys instead of the numeric codes, for readability. You can find more detail and the complete example in this blog post: Getting Started with Geocoding Exif Image Metadata in Python 3.
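Going the other direction (what the question ultimately needs) means packing a decimal value back into the rational ((numerator, denominator), ...) tuples piexif expects. As a side note, the struct.error above is consistent with the numerator round(y, 6) * 1000000 (about 5.4e12 for y = 5408908.2...) overflowing the unsigned 32-bit integers EXIF rationals are packed into. A minimal sketch, assuming you keep the degrees/minutes/seconds encoding rather than storing a projected coordinate directly:

def get_dms_from_decimal(value, precision=1000000):
    # Convert a decimal coordinate into ((deg, 1), (min, 1), (sec_num, sec_den)).
    value = abs(value)
    degrees = int(value)
    minutes = int((value - degrees) * 60)
    seconds = (value - degrees - minutes / 60.0) * 3600
    return ((degrees, 1), (minutes, 1), (int(round(seconds * precision)), precision))

# e.g. 48.81368778730952 -> ((48, 1), (48, 1), (49276034, 1000000))
print(get_dms_from_decimal(48.81368778730952))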
After much hemming and hawing, I reached the pages of the py3exiv2 image metadata manipulation library. You will find exhaustive lists of the metadata tags as you read through, but here is the list of EXIF tags, just to save a few clicks.
It runs smoothly on Linux and provides many opportunities to edit image headers. The documentation is also quite clear. I recommend this as a solution and am interested to know whether it solves everyone else's problems as well.

TFRecord format for multiple instances of the same or different classes on one training image

I am trying to train a Faster R-CNN on a grocery detection dataset using the new Object Detection API, but I do not quite understand the process of creating a TFRecord file for it. I am aware of the Oxford and VOC dataset examples and the scripts to create TFRecord files, and they work fine if there is only one object in a training image, which is what I see in all of the official examples and GitHub projects. But I have images where more than 20 objects are defined, and the objects have different classes. I don't want to iterate 20+ times per image and create 20 almost identical tf_examples, where the 20+ copies of img_encoded would take all my space.
tf_example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': dataset_util.int64_feature(height),
    'image/width': dataset_util.int64_feature(width),
    'image/filename': dataset_util.bytes_feature(filename),
    'image/source_id': dataset_util.bytes_feature(filename),
    'image/encoded': dataset_util.bytes_feature(encoded_image_data),
    'image/format': dataset_util.bytes_feature(image_format),
    'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
    'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
    'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
    'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
    'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
    'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
return tf_example
I believe the answer to my question is that, when creating tf_records, xmin, xmax, ymin, ymax, classes_text, and classes should all be lists with one value per bounding box, so that I can add different objects and their parameters into these lists for one image.
Maybe someone has experience and can advise: is the way I've described going to work, and if not, is there a way to create tf_records for multiple objects in one image in a clean and simple way?
Below I put some of the features (not all of them) for creating tfrecords the way I think has to work, because of what is said in the comments ("List of ... (1 per box)") in the link I attached. I hope the idea is clear from the attached dump.
To clarify the situation: xmin, for example, has 4 different normalized values [0.4056372549019608, 0.47794117647058826, 0.4840686274509804, 0.4877450980392157] for the 4 different bboxes in the attached feature example. Don't forget that the lists were converted into a serializable format using the dataset_util.float_list_feature method.
features {
  feature {
    key: "image/filename"
    value {
      bytes_list {
        value: "C4_P06_N1_S4_1.JPG"
      }
    }
  }
  feature {
    key: "image/format"
    value {
      bytes_list {
        value: "jpeg"
      }
    }
  }
  feature {
    key: "image/height"
    value {
      int64_list {
        value: 2112
      }
    }
  }
  feature {
    key: "image/key/sha256"
    value {
      bytes_list {
        value: "4e0b458e4537f87d72878af4201c55b0555f10a0e90decbd397fd60476e6e973"
      }
    }
  }
  feature {
    key: "image/object/bbox/xmax"
    value {
      float_list {
        value: 0.43323863636363635
        value: 0.4403409090909091
        value: 0.46448863636363635
        value: 0.5085227272727273
      }
    }
  }
  feature {
    key: "image/object/bbox/xmin"
    value {
      float_list {
        value: 0.3565340909090909
        value: 0.36363636363636365
        value: 0.39204545454545453
        value: 0.4318181818181818
      }
    }
  }
  feature {
    key: "image/object/bbox/ymax"
    value {
      float_list {
        value: 0.9943181818181818
        value: 0.7708333333333334
        value: 0.20265151515151514
        value: 0.9943181818181818
      }
    }
  }
  feature {
    key: "image/object/bbox/ymin"
    value {
      float_list {
        value: 0.8712121212121212
        value: 0.6174242424242424
        value: 0.06818181818181818
        value: 0.8712121212121212
      }
    }
  }
  feature {
    key: "image/object/class/label"
    value {
      int64_list {
        value: 1
        value: 0
        value: 3
        value: 0
      }
    }
  }
}
I did what I thought should help, but I got these numbers during training, and that's abnormal.
INFO:tensorflow:global step 204: loss = 1.4067 (1.177 sec/step)
INFO:tensorflow:global step 205: loss = 1.0570 (1.684 sec/step)
INFO:tensorflow:global step 206: loss = 1.0229 (0.916 sec/step)
INFO:tensorflow:global step 207: loss = 80484784668672.0000 (0.587 sec/step)
INFO:tensorflow:global step 208: loss = 981436265922560.0000 (0.560 sec/step)
INFO:tensorflow:global step 209: loss = 303916113723392.0000 (0.539 sec/step)
INFO:tensorflow:global step 210: loss = 4743170218786816.0000 (0.613 sec/step)
INFO:tensorflow:global step 211: loss = 2933532187951104.0000 (0.518 sec/step)
INFO:tensorflow:global step 212: loss = 1.8134 (1.513 sec/step)
INFO:tensorflow:global step 213: loss = 73507901414572032.0000 (0.553 sec/step)
INFO:tensorflow:global step 214: loss = 650799901688463360.0000 (0.622 sec/step)
P.S. Additional information: in the normal case, where 1 image has 1 object class from this dataset, everything works fine.
You are correct in that xmin, xmax, ymin, ymax, classes_text, and classes are all lists with one value per bounding box. There is no need to duplicate the image for each bounding box; that would indeed take up a lot of disk space. As @gautam-mistry pointed out, the records are streamed into TensorFlow; as long as each image fits into RAM you should be okay, even if you duplicated the images (so long as you have the disk space).
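A minimal sketch of that layout, with hypothetical coordinates and labels for two boxes on one image (assuming the dataset_util helpers from the Object Detection API, as in the snippet above):

# two boxes with two different classes, one encoded image
xmins = [0.1, 0.5]
xmaxs = [0.3, 0.7]
ymins = [0.2, 0.4]
ymaxs = [0.6, 0.9]
classes_text = [b'milk', b'cereal']  # hypothetical label names
classes = [1, 2]

tf_example = tf.train.Example(features=tf.train.Features(feature={
    'image/encoded': dataset_util.bytes_feature(encoded_image_data),
    'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
    'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
    'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
    'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
    'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
    'image/object/class/label': dataset_util.int64_list_feature(classes),
}))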
A TFRecords file represents a sequence of (binary) strings. The format is not random access, so it is suitable for streaming large amounts of data but not suitable if fast sharding or other non-sequential access is desired.
tf.python_io.TFRecordWriter
tf.python_io.tf_record_iterator
tf.python_io.TFRecordCompressionType
tf.python_io.TFRecordOptions
I found what the problem was: I had a mistake in my protobuf class file, with different class names mapped to the same class number. For example:
item {
  id: 1
  name: 'raccoon'
}
item {
  id: 1
  name: 'lion'
}
And so on. Because I had around 50 classes, the loss went tremendously high only at some steps. Maybe it will help someone; be cautious with the proto txt :)
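For reference, the fixed map simply gives each class its own id:

item {
  id: 1
  name: 'raccoon'
}
item {
  id: 2
  name: 'lion'
}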

AVPlayer rate property is inaccurate

Any idea why the playback tempo of audio files via an AVQueuePlayer is not an accurate product of the original audio file's tempo and the AVPlayer's rate property? E.g. the original tempo is 100 b.p.m. and I set the rate to 0.7, expecting output audio at 70 b.p.m., but what I in fact get is a tempo of around 65... (please excuse the inelegant code)
let assetQueue = [aVItem1, aVItem2, aVItem3, aVItem4, aVItem5, aVItem6, aVItem7, aVItem8, aVItem9, aVItem0]
var itemQueue: [AVPlayerItem] = []

for index in 0...9 {
    let nextItem: AVPlayerItem = AVPlayerItem(asset: assetQueue[index])
    itemQueue.append(nextItem)
}

player = AVQueuePlayer(items: itemQueue)
player.play()
player.rate = 0.7
It plays perfectly at 100 b.p.m. when player.rate = 1.0.
I need this to play accurately at all integer tempo values from 70 to 140 b.p.m., as it needs to synchronise with a tempo-controlled UI element (whose tempo is driven by an NSTimer). Or is there maybe a simpler way to achieve this (perhaps with the setRate() method)?
Any assistance would be much appreciated :)
