Using LAME function hip_decode in Android NDK to decode mp3 returns 0

I am using LAME's mpglib to decode mp3 to PCM in the Android NDK for playback. But when I call hip_decode(), it returns 0, meaning "need more data before we can complete the decode". I have no idea how to solve it. Can someone help me? Here is my code:
void CBufferWrapper::ConvertMp3toPCM (AAssetManager* mgr, const char *filename){
Print ("ConvertMp3toPCM:file:%s", filename);
AAsset* asset = AAssetManager_open (mgr, filename, AASSET_MODE_UNKNOWN);
// the asset might not be found
assert (asset != NULL);
// open asset as file descriptor
off_t start, length;
int fd = AAsset_openFileDescriptor (asset, &start, &length);
assert (0 <= fd);
long size = AAsset_getLength (asset);
char* buffer = (char*)malloc (sizeof(char)*size);
memset (buffer, 0, size*sizeof(char));
AAsset_read (asset, buffer, size);
AAsset_close (asset);
hip_t ht = hip_decode_init ();
int count = hip_decode (ht, (unsigned char*)buffer, size, pcm_l, pcm_r);
free (buffer);
Print ("ConvertMp3toPCM: length:%ld,pcmcount=%d",length, count);
}
I compiled LAME in the NDK with the macro "HAVE_MPGLIB" defined, so I think decoding should work.

Yesterday I had the same problem, but using lame_enc.dll. I did not know how to deal with the 0 return value either, which is the reason for this post.
Create a buffer to hold the mp3 data: unsigned char mp3Data[4096]
Create two buffers for the PCM data (left and right channels), each bigger than the mp3 one:
short lpcm[4096 * 100], rpcm[4096 * 100];
Open the mp3 file and initialize hip.
Now enter a do-while loop that runs until the number of bytes read is 0 (the end of file).
Inside the loop, read 4096 bytes into mp3Data and call hip_decode with
hip_decode(ht, mp3Data, bytesRead, lpcm, rpcm);
You are right, it returns 0. It is asking you for more data.
You need to keep reading 4096 bytes and calling hip_decode until it returns a positive sample count.
Here is the important part of my program:
int total = 0;
int hecho = 0;
int leido = 0;
int lon = 0;
int x;
do
{
total = fread(mp3b, 1, MAXIMO, fich);
leido += total;
x = hip_decode(hgf, mp3b, total, izquierda, derecha);
if(x > 0)
{
int tamanio;
int y;
tamanio = 1.45 * x + 9200;
unsigned char * bu = (unsigned char *) malloc(tamanio);
y = lame_encode_buffer(lamglofla, izquierda, derecha, x, bu, tamanio);
fwrite(bu, 1, y, fichs);
free(bu);
}
}while(total > 0);
My program decodes an mp3 file and encodes the output into another mp3 file.
I hope this is useful.

Related

sending audio via bluetooth a2dp source esp32

I am trying to send a measured I2S analogue signal (e.g. from a mic) to the sink device via Bluetooth instead of the default noise.
Currently I am trying to change bt_app_a2d_data_cb():
static int32_t bt_app_a2d_data_cb(uint8_t *data, int32_t i2s_read_len)
{
if (i2s_read_len < 0 || data == NULL) {
return 0;
}
char* i2s_read_buff = (char*) calloc(i2s_read_len, sizeof(char));
bytes_read = 0;
i2s_adc_enable(I2S_NUM_0);
while(bytes_read == 0)
{
i2s_read(I2S_NUM_0, i2s_read_buff, i2s_read_len,&bytes_read, portMAX_DELAY);
}
i2s_adc_disable(I2S_NUM_0);
// taking care of the watchdog//
TIMERG0.wdt_wprotect=TIMG_WDT_WKEY_VALUE;
TIMERG0.wdt_feed=1;
TIMERG0.wdt_wprotect=0;
uint32_t j = 0;
uint16_t dac_value = 0;
// change 16bit input signal to 8bit
for (int i = 0; i < i2s_read_len; i += 2) {
dac_value = ((((uint16_t) (i2s_read_buff[i + 1] & 0xf) << 8) | ((i2s_read_buff[i + 0]))));
data[j] = (uint8_t) dac_value * 256 / 4096;
j++;
}
// testing for loop
//uint8_t da = 0;
//for (int i = 0; i < i2s_read_len; i++) {
// data[i] = (uint8_t) (i2s_read_buff[i] >> 8);// & 0xff;
// da++;
// if(da>254) da=0;
//}
free(i2s_read_buff);
i2s_read_buff = NULL;
return i2s_read_len;
}
I can hear the sawtooth sound from the sink device.
Any ideas what to do?
Your data can be an array of floats representing the analog signal or its variations; for example, a 32 kHz sound signal contains 32000 samples for every second of captured sound. If the data is meant to be transmitted in offline mode, you can prepare the outgoing data as a buffer plus a terminator, then send the buffer through the Bluetooth module of the sender device, which is connected to the proper microcontroller. On the receiving device, once you get the terminator character, such as "\r", you can process the incoming buffer. In my case, I had to send a string array of numbers, but I often received one or two unknown characters at the start; to avoid that, I reject them while filling the receiving container.
See: how to trim unknown first characters of string in code vision
If you want online mode, i.e. your data must be transmitted and played concurrently, you must account for delays and allow reasonable processing time for all the microcontrollers and devices involved, such as Bluetooth modules, EEPROM ICs, and so on.
I'm also working on a project "a2dp source esp32".
I'm playing a wav-file from spiffs.
If the wav-file is 44100, 16-bit, stereo then you can directly write a stream of bytes from the file to the array data[ ].
When I tried to write less data than len and return a smaller value (for example 88), I got an error. Now I'm trying to figure out how to reduce this buffer because of the high latency (len=512).
Also, the data in the array data[ ] is stored as stereo.
Example: read data from file to data[ ]-array:
size_t read;
read = fread((void*) data, 1, len, fwave);//fwave is a file
if(read<len){//If get EOF, go to begin of the file
fseek(fwave , 0x2C , SEEK_SET);//skip wav-header 44 bytes
read = fread((void*) (&(data[read])), 1, len-read, fwave);//read up
}
If the file is mono, I convert it to stereo like this (I read half the length and then duplicate the data):
int32_t lenHalf=len/2;
read = fread((void*) data, 1, lenHalf, fwave);
if(read<lenHalf){
fseek(fwave , 0x2C , SEEK_SET);//skip wav-header 44 bytes
read = fread((void*) (&(data[read])), 1, lenHalf-read, fwave);//read up
}
//copy to the second channel
uint16_t *data16=(uint16_t*)data;
for (int i = lenHalf/2-1; i >= 0; i--) {
data16[(i << 1)] = data16[i];
data16[(i << 1) + 1] = data16[i];
}
I think you get the sawtooth sound because:
your data is mono?
the i2s_read_len you return in "return i2s_read_len;" is less than len
you change the 16-bit input signal to 8-bit, but the array data[ ] holds 16-bit data: 2ByteLeft-2ByteRight-2ByteLeft-2ByteRight-...
I'm not sure, it's a guess.
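If the guesses above are right, the callback has to hand over 16-bit signed stereo frames rather than 8-bit mono values. Here is a minimal sketch of that conversion in plain C. It assumes, as the question's code does, that each ADC reading is a 12-bit unsigned value stored little-endian in two bytes. The function name adc12_mono_to_s16_stereo and its parameters are hypothetical, not part of ESP-IDF.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand mono 12-bit unsigned ADC samples (little-endian, two bytes each)
 * into 16-bit signed stereo frames laid out as L, R, L, R, ...
 * Returns the number of bytes written to out (4 per input sample). */
size_t adc12_mono_to_s16_stereo(const uint8_t *in, size_t in_bytes,
                                uint8_t *out, size_t out_capacity)
{
    size_t written = 0;
    for (size_t i = 0; i + 1 < in_bytes && written + 4 <= out_capacity; i += 2) {
        /* Rebuild the 12-bit reading from the low byte and the low nibble
         * of the high byte (assumed layout, taken from the question). */
        uint16_t raw = (uint16_t)(((in[i + 1] & 0x0F) << 8) | in[i]);
        /* Centre around zero (0..4095 -> -2048..2047), then scale to 16 bits. */
        int16_t pcm = (int16_t)(((int32_t)raw - 2048) * 16);
        uint16_t bits = (uint16_t)pcm;
        uint8_t lo = (uint8_t)(bits & 0xFF);
        uint8_t hi = (uint8_t)(bits >> 8);
        out[written++] = lo;  /* left channel, low byte */
        out[written++] = hi;  /* left channel, high byte */
        out[written++] = lo;  /* right channel, low byte */
        out[written++] = hi;  /* right channel, high byte */
    }
    return written;
}
```

Each input sample becomes four output bytes, so to fill a callback buffer of len bytes you would read len / 4 samples (len / 2 bytes) from I2S. The I2S sample rate would also have to match what the sink expects; that part is not covered here.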

ALSA playback interrupted without snd_pcm_hw_params_get_* calls

I'm finding that a simple ALSA playback program behaves differently when I put in some calls to snd_pcm_hw_params_get_* functions. My program plays a sine wave from a buffer repeatedly. When I include the calls, I get a pure tone as I expected. When I remove the calls, however, I get a series of beeps. This worries me, because I would not expect calls that retrieve data to have anything to do with how the sound is played. I get this behavior both on a cheap USB sound card and my (presumably nicer) internal sound card.
Here is the code:
#include <alsa/asoundlib.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#define GETPARAMS
int main() {
snd_pcm_t *handle;
snd_pcm_hw_params_t *params;
const char name[] = "hw:0,0";
int dir;
snd_pcm_stream_t stream = SND_PCM_STREAM_PLAYBACK;
snd_pcm_access_t access = SND_PCM_ACCESS_RW_INTERLEAVED;
snd_pcm_format_t format = SND_PCM_FORMAT_S16_LE;
unsigned int rate = 48000;
unsigned int channels = 2;
unsigned int periods = 4;
snd_pcm_uframes_t periodsize = 2048;
int num_frames = 2*periodsize;
snd_pcm_hw_params_alloca(&params);
snd_pcm_open(&handle, name, stream, 0);
snd_pcm_hw_params_any(handle, params);
#ifdef GETPARAMS
printf("\nparameters before setting:\n");
snd_pcm_hw_params_get_rate(params, &rate, &dir);
printf(" rate = %d, dir = %d\n", rate, dir);
snd_pcm_hw_params_get_channels(params, &channels);
printf(" channels = %d\n", channels);
snd_pcm_hw_params_get_periods(params, &periods, &dir);
printf(" periods = %d, dir = %d\n", periods, dir);
snd_pcm_hw_params_get_buffer_size(params, &periodsize);
printf(" periodsize = %ld\n", periodsize);
#endif
snd_pcm_hw_params_set_access(handle, params, access);
snd_pcm_hw_params_set_format(handle, params, format);
snd_pcm_hw_params_set_rate_near(handle, params, &rate, &dir);
snd_pcm_hw_params_set_channels(handle, params, 2);
snd_pcm_hw_params_set_periods(handle, params, periods, 0);
snd_pcm_hw_params_set_buffer_size(handle, params, num_frames);
snd_pcm_hw_params(handle, params);
#ifdef GETPARAMS
printf("\nparameters after setting:\n");
snd_pcm_hw_params_get_rate(params, &rate, &dir);
printf(" rate = %d, dir = %d\n", rate, dir);
snd_pcm_hw_params_get_channels(params, &channels);
printf(" channels = %d\n", channels);
snd_pcm_hw_params_get_periods(params, &periods, &dir);
printf(" periods = %d, dir = %d\n", periods, dir);
snd_pcm_hw_params_get_buffer_size(params, &periodsize);
printf(" periodsize = %ld\n\n", periodsize);
#endif
int16_t *data = (int16_t*)calloc(2*periodsize, sizeof(int16_t));
loadpage(data, 2*periodsize);
snd_pcm_sframes_t frames;
snd_pcm_prepare(handle);
for (int i=0; i<8; i++) {
frames = snd_pcm_writei(handle, data, num_frames);
if (frames < 0)
frames = snd_pcm_recover(handle, frames, 0);
if (frames < 0) {
printf("snd_pcm_writei failed: %s\n", snd_strerror(frames));
}
if (frames > 0 && frames < num_frames)
printf("short write (expected %d, write %li)\n", num_frames, frames);
}
snd_pcm_close(handle);
free(data);
}
loadpage() fills the buffer. When I comment out the #define GETPARAMS I get a series of short beeps. When I include it I get a pure tone.
Here is the output when GETPARAMS is defined:
parameters before setting:
rate = 48000, dir = 32766
channels = 2
periods = 4, dir = 32766
periodsize = 2048
parameters after setting:
rate = 48000, dir = 0
channels = 2
periods = 4, dir = 0
periodsize = 4096
You must not call the snd_pcm_hw_params_get_*() functions if the parameters have not yet been set because at that time, the configuration space contains multiple potential values for the parameters.
To print the current state of the hw_params container, use snd_pcm_hw_params_dump():
snd_output_t *output;
snd_output_stdio_attach(&output, stdout, 0);
...
snd_pcm_hw_params_dump(params, output);
...
snd_output_close(output);
Anyway, the problem is that the initial values of periods, periodsize, and num_frames are inconsistent, and that the _get_ calls overwrite these variables with other values that happen to be consistent.
I do not know what values you actually want to use, but note that the period size and the buffer size are measured in frames, and that one frame contains all samples of all channels, i.e., in this case, one frame has four bytes.
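To make the frame arithmetic concrete, here is a small sketch in plain C. The helper names are hypothetical and nothing here calls ALSA; it only does the bookkeeping described above.

```c
#include <stddef.h>

/* Bytes in one frame: one sample for every channel. */
size_t frame_bytes(unsigned channels, unsigned bytes_per_sample)
{
    return (size_t)channels * bytes_per_sample;
}

/* Buffer size in frames implied by a period count and a period size. */
size_t buffer_frames(unsigned periods, size_t period_frames)
{
    return (size_t)periods * period_frames;
}

/* Number of sample-sized elements (e.g. int16_t) needed to hold
 * `frames` frames of interleaved audio. */
size_t samples_for_frames(size_t frames, unsigned channels)
{
    return frames * channels;
}
```

With the question's numbers, frame_bytes(2, 2) is 4 and buffer_frames(4, 2048) is 8192, while the code requests a buffer of only 4096 frames. Likewise samples_for_frames(4096, 2) is 8192, but calloc(2*periodsize, ...) with periodsize = 2048 provides only 4096 int16_t values for a 4096-frame write. This appears to be one of the inconsistencies that the _get_ call hides by overwriting periodsize with 4096.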

Driver programming : cat command not showing output

I am new to driver programming and I have written a simple char device driver code. When I wrote it without using pointers, it crashed.
When writing to a driver using echo, it works. But when reading from it, there is no output. Someone please help. File operations part of the code is shown below. 'p' and 'q' are normal character pointers. 'max' value was set as 10. 'ptr' is of static int type initialized as '0'.
int my_open(struct inode *inode,struct file *filp)
{
printk("In open.\n");
if((filp->f_flags & O_ACCMODE) == O_WRONLY){
p = (char *)buffer;
ptr = 0;
}
else if((filp->f_flags & O_ACCMODE) == O_RDONLY)
q = (char *)buffer;
return 0;
}
int my_close(struct inode *inode,struct file *filp)
{
printk("In close.\n");
return 0;
}
ssize_t my_read(struct file *filp,char *buff,size_t count,loff_t *pos)
{
long ret;
printk("In read.\n");
ret = copy_to_user(buff,q,max);
q += max;
*pos += max;
if(ptr -= max)
return max;
else
return 0;
}
ssize_t my_write(struct file *filp,const char *buff,size_t count,loff_t *pos)
{
long ret;
printk("In write.\n");
ret = copy_from_user(p,buff,max);
p += max;
*pos += max;
ptr += max;
return max;
}
module_init(my_init);
module_exit(my_exit);
In both read and write you ignore the "count" parameter: your code assumes that count >= max, which is not guaranteed. This alone can cause all sorts of trouble in the process calling read. You also call copy_to_user/copy_from_user before checking whether the current read or write position is past the end of the buffer. Moreover, the assignment/test if (ptr -= max) is zero only when ptr is exactly equal to max. After a single write ptr equals max, so the first read returns 0, which cat treats as end-of-file, and nothing is printed. For any other value of ptr the expression is nonzero, so read returns max even when no valid data remains, which is also not what you want if you execute the read more than once.
NOTE: since definitions of p, q, buffer, ptr and max are missing, I'll assume that they look like:
static char *p;
static char *q;
static int ptr = 0;
static char buffer[10];
static int max=10;
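The bounds handling described above can be modelled in user space. This is a sketch only: memcpy stands in for copy_to_user, and the function name bounded_read and the data_len parameter (the number of valid bytes currently stored) are hypothetical. In a real driver, a failed copy_to_user would return -EFAULT instead.

```c
#include <stddef.h>
#include <string.h>

/* User-space model of a bounded read.
 * dev_buf  : the device's storage
 * data_len : how many valid bytes it currently holds
 * dst/count: the caller's buffer and requested size
 * pos      : the caller's file position, advanced by the bytes copied
 * Returns the number of bytes copied, or 0 once all data has been read. */
long bounded_read(const char *dev_buf, size_t data_len,
                  char *dst, size_t count, long *pos)
{
    if (*pos < 0 || (size_t)*pos >= data_len)
        return 0;                      /* nothing left: the reader sees EOF */

    size_t remaining = data_len - (size_t)*pos;
    if (count > remaining)
        count = remaining;             /* never copy past the stored data */

    memcpy(dst, dev_buf + *pos, count);
    *pos += (long)count;
    return (long)count;
}
```

The first call returns the data that is actually stored and a later call returns 0, which is what a reader such as cat relies on to print the data and then stop. A write handler would clamp count against the free space in the same way and return the number of bytes it really stored.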

How to convert sample rate from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16?

I am decoding AAC to PCM with FFmpeg using avcodec_decode_audio3. However, it decodes into the AV_SAMPLE_FMT_FLTP sample format (PCM 32-bit float planar) and I need AV_SAMPLE_FMT_S16 (PCM 16-bit signed, S16LE).
I know that ffmpeg can do this easily with -sample_fmt. I want to do the same in code, but I still haven't figured it out.
audio_resample did not work for me: it fails with the error message: .... conversion failed.
EDIT 9th April 2013: Worked out how to use libswresample to do this... much faster!
At some point in the last 2-3 years, the output format of FFmpeg's AAC decoder changed from AV_SAMPLE_FMT_S16 to AV_SAMPLE_FMT_FLTP. This means that each audio channel has its own buffer, and each sample value is a 32-bit floating point value scaled from -1.0 to +1.0.
Whereas with AV_SAMPLE_FMT_S16 the data is in a single buffer, with the samples interleaved, and each sample is a signed 16-bit integer from -32768 to +32767.
And if you really need your audio as AV_SAMPLE_FMT_S16, then you have to do the conversion yourself. I figured out two ways to do it:
1. Use libswresample (recommended)
#include "libswresample/swresample.h"
...
SwrContext *swr;
...
// Set up SWR context once you've got codec information
swr = swr_alloc();
av_opt_set_int(swr, "in_channel_layout", audioCodec->channel_layout, 0);
av_opt_set_int(swr, "out_channel_layout", audioCodec->channel_layout, 0);
av_opt_set_int(swr, "in_sample_rate", audioCodec->sample_rate, 0);
av_opt_set_int(swr, "out_sample_rate", audioCodec->sample_rate, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt", AV_SAMPLE_FMT_FLTP, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
swr_init(swr);
...
// In your decoder loop, after decoding an audio frame:
AVFrame *audioFrame = ...;
int16_t* outputBuffer = ...;
swr_convert(&outputBuffer, audioFrame->nb_samples, audioFrame->extended_data, audioFrame->nb_samples);
And that's all you have to do!
2. Do it by hand in C (original answer, not recommended)
So in your decode loop, when you've got an audio packet you decode it like this:
AVCodecContext *audioCodec; // init'd elsewhere
AVFrame *audioFrame; // init'd elsewhere
AVPacket packet; // init'd elsewhere
int16_t* outputBuffer; // init'd elsewhere
int out_size = 0;
...
int len = avcodec_decode_audio4(audioCodec, audioFrame, &out_size, &packet);
And then, if you've got a full frame of audio, you can convert it fairly easily:
// Convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16
int in_samples = audioFrame->nb_samples;
int in_linesize = audioFrame->linesize[0];
int i=0;
float* inputChannel0 = (float*)audioFrame->extended_data[0];
// Mono
if (audioFrame->channels==1) {
for (i=0 ; i<in_samples ; i++) {
float sample = *inputChannel0++;
if (sample<-1.0f) sample=-1.0f; else if (sample>1.0f) sample=1.0f;
outputBuffer[i] = (int16_t) (sample * 32767.0f);
}
}
// Stereo
else {
float* inputChannel1 = (float*)audioFrame->extended_data[1];
for (i=0 ; i<in_samples ; i++) {
outputBuffer[i*2] = (int16_t) ((*inputChannel0++) * 32767.0f);
outputBuffer[i*2+1] = (int16_t) ((*inputChannel1++) * 32767.0f);
}
}
// outputBuffer now contains 16-bit PCM!
I've left a couple of things out for clarity... the clamping in the mono path should ideally be duplicated in the stereo path. And the code can be easily optimized.
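One way to share the clamping between the mono and stereo paths is to pull the per-sample conversion into a helper. The sketch below is plain C with no FFmpeg dependency, and the names float_to_s16 and planar_float_to_s16 are hypothetical. It rounds to the nearest integer instead of truncating, which is a choice of this sketch rather than something the code above does.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one float sample to signed 16-bit: clamp to [-1.0, 1.0],
 * scale by 32767 and round to the nearest integer. */
int16_t float_to_s16(float sample)
{
    if (sample < -1.0f) sample = -1.0f;
    else if (sample > 1.0f) sample = 1.0f;
    float scaled = sample * 32767.0f;
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Interleave `channels` planar float buffers into a single S16 buffer.
 * planes[c][i] is sample i of channel c.
 * out must have room for nb_samples * channels values. */
void planar_float_to_s16(const float *const *planes, int channels,
                         int nb_samples, int16_t *out)
{
    for (int i = 0; i < nb_samples; i++)
        for (int c = 0; c < channels; c++)
            out[(size_t)i * channels + c] = float_to_s16(planes[c][i]);
}
```

With a decoded frame, planes would be the frame's extended_data cast to the float pointer type, and channels and nb_samples would come from the frame as in the code above.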
I found two resampling functions in FFmpeg. Their performance may be better.
avresample_convert()
http://libav.org/doxygen/master/group__lavr.html
swr_convert() http://spirton.com/svn/MPlayer-SB/ffmpeg/libswresample/swresample_test.c
Thanks Reuben for a solution to this. I did find that some of the sample values were slightly off when compared with a straight ffmpeg -i file.wav. It seems that in the conversion, they use a round() on the value.
To do the conversion, I did what you did with a bit of modification to work for any number of channels:
if (audioCodecContext->sample_fmt == AV_SAMPLE_FMT_FLTP)
{
int nb_samples = decoded_frame->nb_samples;
int channels = decoded_frame->channels;
int outputBufferLen = nb_samples * channels * 2; // size in bytes
short* outputBuffer = new short[outputBufferLen/2];
for (int i = 0; i < nb_samples; i++)
{
for (int c = 0; c < channels; c++)
{
float* extended_data = (float*)decoded_frame->extended_data[c];
float sample = extended_data[i];
if (sample < -1.0f) sample = -1.0f;
else if (sample > 1.0f) sample = 1.0f;
outputBuffer[i * channels + c] = (short)round(sample * 32767.0f);
}
}
// Do what you want with the data etc.
}
I went from ffmpeg 0.11.1 -> 1.1.3 and found the change of sample format annoying. I looked at setting the request_sample_fmt to AV_SAMPLE_FMT_S16 but it seems the aac decoder doesn't support anything other than AV_SAMPLE_FMT_FLTP anyway.

How does one find the start of the "Central Directory" in zip files?

Wikipedia has an excellent description of the ZIP file format, but the "central directory" structure is confusing to me. Specifically this:
This ordering allows a ZIP file to be created in one pass, but it is usually decompressed by first reading the central directory at the end.
The problem is that even the trailing header for the central directory is variable length. How then, can someone get the start of the central directory to parse?
(Oh, and I did spend some time looking at APPNOTE.TXT in vain before coming here and asking :P)
My condolences, reading the wikipedia description gives me the very strong impression that you need to do a fair amount of guess + check work:
Hunt backwards from the end for the 0x06054b50 end-of-directory tag, look forward 16 bytes to find the offset for the start-of-directory tag 0x02014b50, and hope that is it. You could do some sanity checks like looking for the comment length and comment string tags after the end-of-directory tag, but it sure feels like Zip decoders work because people don't put funny characters into their zip comments, filenames, and so forth. Based entirely on the wikipedia page, anyhow.
I implemented zip archive support some time ago. I search the last few kilobytes of the file for the end of central directory signature (4 bytes). That works pretty well until somebody puts 50 KB of text into the comment, which is unlikely to happen. To be absolutely sure, you can search the last 64 KB plus a few bytes, since the comment length is a 16-bit field.
After that, I look for the zip64 end of central directory locator, which is easier since it has a fixed structure.
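A minimal sketch of that backward search in plain C, working on a buffer that holds the tail of the archive (at most 22 + 65535 bytes are needed). The function names are hypothetical. It also checks that the comment length stored in the record makes the record end exactly at the end of the file, which filters out most false matches inside a comment. Zip64 is not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers byte by byte, so alignment and host
 * endianness do not matter. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

#define EOCD_SIG      0x06054b50u
#define EOCD_MIN_SIZE 22u   /* fixed part of the record, without the comment */

/* Scan backwards through `tail` (the last `len` bytes of the archive) for
 * the end of central directory record.
 * Returns its index within tail, or -1 if it was not found. */
long find_eocd(const unsigned char *tail, size_t len)
{
    if (len < EOCD_MIN_SIZE)
        return -1;
    for (size_t i = len - EOCD_MIN_SIZE + 1; i-- > 0; ) {
        if (rd32(tail + i) != EOCD_SIG)
            continue;
        /* The comment (length stored at offset 20) must end at end of file. */
        if (i + EOCD_MIN_SIZE + rd16(tail + i + 20) == len)
            return (long)i;
    }
    return -1;
}

/* Offset of the central directory from the start of the archive,
 * stored 16 bytes into the record. */
uint32_t eocd_cd_offset(const unsigned char *eocd)
{
    return rd32(eocd + 16);
}
```

After find_eocd succeeds, eocd_cd_offset(tail + index) gives the position to seek to in the file to start reading central directory headers.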
Here is a solution I have just had to roll out, in case anybody needs it. It involves grabbing the central directory.
In my case I did not want any of the compression features that are offered in any of the zip solutions. I just wanted to know about the contents. The following code will return a ZipArchive of a listing of every entry in the zip.
It also uses a minimum amount of file access and memory allocation.
TinyZip.cpp
#include "TinyZip.h"
#include <cstdio>
namespace TinyZip
{
#define VALID_ZIP_SIGNATURE 0x04034b50
#define CENTRAL_DIRECTORY_EOCD 0x06054b50 //signature
#define CENTRAL_DIRECTORY_ENTRY_SIGNATURE 0x02014b50
#define PTR_OFFS(type, mem, offs) *((type*)(mem + offs)) //SHOULD BE OK
typedef struct {
unsigned int signature : 32;
unsigned int number_of_disk : 16;
unsigned int disk_where_cd_starts : 16;
unsigned int number_of_cd_records : 16;
unsigned int total_number_of_cd_records : 16;
unsigned int size_of_cd : 32;
unsigned int offset_of_start : 32;
unsigned int comment_length : 16;
} ZipEOCD;
ZipArchive* ZipArchive::GetArchive(const char *filepath)
{
FILE *pFile = nullptr;
#ifdef WIN32
errno_t err;
if ((err = fopen_s(&pFile, filepath, "rb")) == 0)
#else
if ((pFile = fopen(filepath, "rb")) != NULL) //proceed only if the file opened
#endif
{
int fileSignature = 0;
//Seek to start and read zip header
fread(&fileSignature, sizeof(int), 1, pFile);
if (fileSignature != VALID_ZIP_SIGNATURE) { fclose(pFile); return nullptr; }
//Grab the file size
long fileSize = 0;
long currPos = 0;
fseek(pFile, 0L, SEEK_END);
fileSize = ftell(pFile);
fseek(pFile, 0L, SEEK_SET);
//Step back the size of the ZipEOCD
//If it doesn't have any comments, should get an instant signature match
currPos = fileSize;
int signature = 0;
while (currPos > 0)
{
fseek(pFile, currPos, SEEK_SET);
fread(&signature, sizeof(int), 1, pFile);
if (signature == CENTRAL_DIRECTORY_EOCD)
{
break;
}
currPos -= sizeof(char); //step back one byte
}
if (currPos != 0)
{
ZipEOCD zipOECD;
fseek(pFile, currPos, SEEK_SET);
fread(&zipOECD, sizeof(ZipEOCD), 1, pFile);
long memBlockSize = fileSize - zipOECD.offset_of_start;
//Allocate zip archive of size
ZipArchive *pArchive = new ZipArchive(memBlockSize);
//Read in the whole central directory (also includes the ZipEOCD...)
fseek(pFile, zipOECD.offset_of_start, SEEK_SET);
fread((void*)pArchive->m_MemBlock, memBlockSize - 10, 1, pFile);
long currMemBlockPos = 0;
long currNullTerminatorPos = -1;
while (currMemBlockPos < memBlockSize)
{
int sig = PTR_OFFS(int, pArchive->m_MemBlock, currMemBlockPos);
//Terminate the previous filename first, so the last entry gets terminated too
if (currNullTerminatorPos > 0)
{
pArchive->m_MemBlock[currNullTerminatorPos] = '\0';
currNullTerminatorPos = -1;
}
if (sig != CENTRAL_DIRECTORY_ENTRY_SIGNATURE)
{
if (sig == CENTRAL_DIRECTORY_EOCD) return pArchive;
return nullptr; //something went wrong
}
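const long offsToFilenameLen = 28;
const long offsToFieldLen = 30;
const long offsToCommentLen = 32;
const long offsetToFilename = 46;
//These length fields are 16 bits wide, so read them as unsigned short
int filenameLength = PTR_OFFS(unsigned short, pArchive->m_MemBlock, currMemBlockPos + offsToFilenameLen);
int extraFieldLen = PTR_OFFS(unsigned short, pArchive->m_MemBlock, currMemBlockPos + offsToFieldLen);
int commentLen = PTR_OFFS(unsigned short, pArchive->m_MemBlock, currMemBlockPos + offsToCommentLen);
const char *pFilepath = &pArchive->m_MemBlock[currMemBlockPos + offsetToFilename];
currNullTerminatorPos = (currMemBlockPos + offsetToFilename) + filenameLength;
pArchive->m_Entries.push_back(pFilepath);
//Skip the fixed header plus all three variable-length fields
currMemBlockPos += (offsetToFilename + filenameLength + extraFieldLen + commentLen);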
}
return pArchive;
}
}
return nullptr;
}
ZipArchive::ZipArchive(long size)
{
m_MemBlock = new char[size];
}
ZipArchive::~ZipArchive()
{
delete[] m_MemBlock;
}
const std::vector<const char*> &ZipArchive::GetEntries()
{
return m_Entries;
}
}
TinyZip.h
#ifndef __TinyZip__
#define __TinyZip__
#include <vector>
#include <string>
namespace TinyZip
{
class ZipArchive
{
public:
ZipArchive(long memBlockSize);
~ZipArchive();
static ZipArchive* GetArchive(const char *filepath);
const std::vector<const char*> &GetEntries();
private:
std::vector<const char*> m_Entries;
char *m_MemBlock;
};
}
#endif
Usage:
TinyZip::ZipArchive *pArchive = TinyZip::ZipArchive::GetArchive("Scripts_unencrypt.pak");
if (pArchive != nullptr)
{
const std::vector<const char*> entries = pArchive->GetEntries();
for (auto entry : entries)
{
//do stuff
}
}
In case someone out there is still struggling with this problem, have a look at the repository I host on GitHub; my project there may answer your questions.
Zip file reader
Basically, it downloads the central directory part of the .zip file, which resides at the end of the file.
Then it reads every file and folder name, with its path, from the bytes and prints them to the console.
I have commented the more complicated steps in my source code.
The program only works for .zip files up to about 4 GB. Beyond that you will have to change the VM size and maybe more.
Enjoy :)
I recently encountered a similar use-case and figured I would share my solution for posterity since this post helped send me in the right direction.
Using the Zip file central directory offsets detailed on Wikipedia here, we can take the following approach to parse the central directory and retrieve a list of the contained files:
STEPS:
Find the end of the central directory record (EOCDR) by scanning the zip file in binary format for the EOCDR signature (0x06054b50), beginning at the end of the file (e.g. open the file positioned at the end with std::ios::ate if using an ifstream, then step backwards)
Use the offset located in the EOCDR (16 bytes from the EOCDR) to position the stream reader at the beginning of the central directory
Use the offset (46 bytes from the CD start) to position the stream reader at the file name and track its position start point
Scan until either another central directory header is found (0x02014b50) or the EOCDR is found, and track the position
Reset the reader to the start of the file name and read until the end
Position the reader over the next header, or terminate if the EOCDR is found
The key point here is that the EOCDR is uniquely identified by a signature (0x06054b50) that occurs only one time. Using the 16 byte offset, we can position ourselves to the first occurrence of the central directory header (0x02014b50). Each record will have the same 0x02014b50 header signature, so you just need to loop through occurrences of the header signatures until you hit the EOCDR ending signature (0x06054b50) again.
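As an alternative to scanning for the next signature, each central directory header stores the lengths of its three variable-size fields (file name at offset 28, extra field at offset 30, file comment at offset 32), so the position of the next header can be computed directly. The sketch below is plain C over an in-memory copy of the central directory. The function and callback names are hypothetical, and zip64 is not handled.

```c
#include <stddef.h>
#include <stdint.h>

#define CD_ENTRY_SIG   0x02014b50u
#define CD_HEADER_SIZE 46u   /* fixed part of a central directory header */

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Called once per entry. The name is NOT NUL-terminated. */
typedef void (*zip_name_cb)(const char *name, size_t name_len, void *user);

/* Walk `cd`, a buffer of cd_len bytes holding the central directory.
 * Stops at the first record that is not a central directory header
 * (normally the EOCD record) or at the end of the buffer.
 * Returns the number of entries visited, or -1 if an entry is truncated. */
int walk_central_directory(const unsigned char *cd, size_t cd_len,
                           zip_name_cb cb, void *user)
{
    size_t pos = 0;
    int count = 0;
    while (cd_len - pos >= CD_HEADER_SIZE && le32(cd + pos) == CD_ENTRY_SIG) {
        size_t name_len    = le16(cd + pos + 28);
        size_t extra_len   = le16(cd + pos + 30);
        size_t comment_len = le16(cd + pos + 32);
        size_t total = CD_HEADER_SIZE + name_len + extra_len + comment_len;
        if (total > cd_len - pos)
            return -1;                /* entry runs past the buffer */
        if (cb)
            cb((const char *)cd + pos + CD_HEADER_SIZE, name_len, user);
        pos += total;
        count++;
    }
    return count;
}
```

Using the stored lengths avoids a false match when a file name, extra field, or comment happens to contain the bytes of a header signature.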
SUMMARY:
If you want to see a working example of the above steps, you can check out my minimal implementation (ZipReader) on GitHub here. The implementation can be used like this:
ZipReader zr;
if (zr.SetInput("blah.zip") == ZipReaderStatus::S_FAIL)
std::cout << "set input error" << std::endl;
std::vector<std::string> entries;
if (zr.GetEntries(entries) == ZipReaderStatus::S_FAIL)
std::cout << "get entries error" << std::endl;
for (auto entry : entries)
std::cout << entry << std::endl;

Resources