"Xpressive leak" fixed, but not understood - memory-leaks

I know. Xpressive is (probably) not at fault here, but I've put a lot of effort into finding the memory leaks and I had to adapt the code layout to fix the haemorrhage.
Can someone explain to me why the change in layout fixed it? I don't see why the (correct/improved) use of "static const" fixes the leaks.
BTW, the leaks were occurring on a MIPs core, using boost version 1.49, and cross compiled with GCC 4.3.3.
Original "sieve" code:
// source.cpp
#include <boost/...
cregex token = keep(+set[ alnum|'!'|'%'|'_'|'*'|'+'|'.'|'\''|'`'|'~'|'-']);
cregex more = ...
bool foo(const char * begin, const char * end, size_t& length, std::string& dest)
{
mark_tag name_t(1);
cregex regx = bos >>
icase("name:") >>
(name_t= token) >> eos;
cmatch what;
bool ok = regex_search( begin, end, what, regx );
...
return ok;
}
Fixed "non-leaky" code:
// header.hpp
#include <boost/...
class Xpr {
public:
static const cregex token;
static const cregex more;
};
// source.cpp
#include "header.hpp"
const cregex Xpr::token = keep(+set[ alnum|'!'|'%'|'_'|'*'|'+'|'.'|'\''|'`'|'~'|'-']);
const cregex Xpr::more = ...
bool foo(const char * begin, const char * end, size_t& length, std::string& dest)
{
mark_tag name_t(1);
static const cregex regx = bos >>
icase("name:") >>
(name_t= Xpr::token) >> eos;
cmatch what;
bool ok = regex_search( begin, end, what, regx );
...
return ok;
}
The leaks seemed to be occurring upon every call of foo!

EDIT: After writing the below response, I tried to reproduce your problem and was unable to. Here is the code I'm using.
#include <boost/xpressive/xpressive.hpp>
using namespace boost;
using namespace xpressive;
cregex token = keep(+set[ alnum|'!'|'%'|'_'|'*'|'+'|'.'|'\''|'`'|'~'|'-']);
//cregex more = ...
bool foo(const char * begin, const char * end)
{
mark_tag name_t(1);
cregex regx = bos >>
icase("name:") >>
(name_t= token) >> eos;
cmatch what;
bool ok = regex_search( begin, end, what, regx );
//...
return ok;
}
int main()
{
char const buf[] = "name:value";
while(true)
foo(buf, buf + sizeof(buf) - 1);
}
This code doesn't leak. Is it possible you're using an earlier version of xpressive? Could you post a complete, self-contained example so I can investigate? Even better, file a bug and attach the code there. Thanks,
Eric
-----Begin Original Response-----
I suspect that you're running afoul of xpressive's cycle tracking code. See here for a warning against nesting global regex objects in function-local ones. I think what's happening is that, in order to prevent dangling references, the function-local regx must hold a reference to token, and token must hold a (weak) reference back to regx. What is growing is token's map of weak references. This isn't a leak in the strict technical sense, since the memory will get reclaimed when token gets destroyed. But it's obviously not ideal.
I fixed a similar "leak" in xpressive a while back by adding an opportunistic purge of the map to clear out weak references to expired regexes. I have to look into why it's not happening in this case. Please file a bug here. Thanks.
In the mean time, your fix is plenty good. Declaring the function-local regx static means it will only get constructed once, so token's weak-reference map will never grow beyond size 1.

Related

LPNMITEMACTIVATE and code analysis (C26462)

Why is it that in the source code in the SDK for LPNMITEMACTIVATE it is defined with the asterix to the left?
typedef struct tagNMITEMACTIVATE
{
NMHDR hdr;
int iItem;
int iSubItem;
UINT uNewState;
UINT uOldState;
UINT uChanged;
POINT ptAction;
LPARAM lParam;
UINT uKeyFlags;
} NMITEMACTIVATE, *LPNMITEMACTIVATE;
I am always used to the pointer being on the right. Either way, code like:
const LPNMITEMACTIVATE pNMItem = reinterpret_cast<LPNMITEMACTIVATE>(pNMHDR);
Will still flag a const (C26462) warning:
If I change the code to:
const NMITEMACTIVATE* pNMItem = reinterpret_cast<LPNMITEMACTIVATE>(pNMHDR);
The warning will go away.
I tried this with Visual Studio 2022, first of all, warning C26462 was not enabled by default. Perhaps you are using an earlier release, or there is something odd with my installation.
After manually enabling the warning, I could make that warning go away by assigning pNMItem more than once:
LPNMITEMACTIVATE pNMItem = nullptr;
pNMItem = reinterpret_cast<LPNMITEMACTIVATE>(pNMHDR);
How is this useful?
Or it can be fixed as suggested in other answers. But you may have additional problem because pNMHDR was probably declared as LPNMHDR, so you have to rewrite more lines:
NMHDR hdr = { 0 };
const NMHDR* pNMHDR = reinterpret_cast<NMHDR*>(&hdr);
const NMITEMACTIVATE* pNMItem = reinterpret_cast<const NMITEMACTIVATE*>(pNMHDR);
This can be a big waste of time. Note, the extra compliance is recommended if you are writing code that's supposed to run on any system. But MFC is tied to Windows so this isn't really an issue. MFC and Windows are still using that "long pointer" crap that's left over from 16-bit Windows, they are not compliant themselves, so consider turning off some of these warnings.
This is standard C/C++
Like in this (not runnable) code snippet:
typedef int *LPINT;
// typedef int* LPINT; // you could write this, it's exactly the
// the same as above
int main()
{
LPINT pint;
int* pint2;
*pint = *pint2;
}
pint and pint2 are both pointers to int. BTW this is hiding a pointer type behind a typedef, which is a bad idea (but was considered as a good idea in old MS days), but lots of Microsoft headers still have these typedef sometype *LPsometype; typedefs for compatibility reasons.
Another example which is closer to the MS header you're refering to:
This:
typedef struct tagNMITEMACTIVATE
{
int hdr;
int iItem;
} NMITEMACTIVATE, *LPNMITEMACTIVATE;
is equivalent to this:
typedef struct tagNMITEMACTIVATE
{
int hdr;
int iItem;
} NMITEMACTIVATE;
typedef struct tagNMITEMACTIVATE *LPNMITEMACTIVATE;
For pointer const can be applied to the type the pointer points at:
const NMITEMACTIVATE* p;
or
NMITEMACTIVATE const* p;
Or it can be applied to the pointer variable itself:
NMITEMACTIVATE* const p;
Now if you have typedef:
typedef NMITEMACTIVATE *PNMITEMACTIVATE;
The const would not apply to the type being pointed at. Either way it is the pointer itself is constant:
const PNMITEMACTIVATE p;
PNMITEMACTIVATE const p;
To avoid this confusion, prefer not to use raw pointer typedefs (and not to define them).

Converting between WinRT HttpBufferContent and unmanaged memory in C++cx

As part of a WinRT C++cx component, what's the most efficient way to convert an unmanaged buffer of bytes (say expressed as a std::string) back and forth with a Windows::Web::Http::HttpBufferContent?
This is what I ended up with, but it doesn't seem very optimal:
std::string to HttpBufferContent:
std::string m_body = ...;
auto writer = ref new DataWriter();
writer->WriteBytes(ArrayReference<unsigned char>(reinterpret_cast<unsigned char*>(const_cast<char*>(m_body.data())), m_body.length()));
auto content = ref new HttpBufferContent(writer->DetachBuffer());
HttpBufferContent to std::string:
HttpBufferContent^ content = ...
auto operation = content->ReadAsBufferAsync();
auto task = create_task(operation);
if (task.wait() == task_status::completed) {
auto buffer = task.get();
size_t length = buffer->Length;
if (length > 0) {
unsigned char* storage = static_cast<unsigned char*>(malloc(length));
DataReader::FromBuffer(buffer)->ReadBytes(ArrayReference<unsigned char>(storage, length));
auto m_body = std::string(reinterpret_cast<char*>(storage), length);
free(storage);
}
} else {
abort();
}
UPDATE: Here's the version I ended up using (you can trivially create a HttpBufferContent^ from an Windows::Storage::Streams::IBuffer^):
void IBufferToString(IBuffer^ buffer, std::string& string) {
Array<unsigned char>^ array = nullptr;
CryptographicBuffer::CopyToByteArray(buffer, &array); // TODO: Avoid copy
string.assign(reinterpret_cast<char*>(array->Data), array->Length);
}
IBuffer^ StringToIBuffer(const std::string& string) {
auto array = ArrayReference<unsigned char>(reinterpret_cast<unsigned char*>(const_cast<char*>(string.data())), string.length());
return CryptographicBuffer::CreateFromByteArray(array);
}
I think you are making at least one unnecessary copy of your data in your current approach for HttpBufferContent to std::string, you could improve this by accessing the IBuffer data directly, see the accepted answer here: Getting an array of bytes out of Windows::Storage::Streams::IBuffer
I think it's better to use smart pointer (no memory management needed) :
#include <wrl.h>
#include <robuffer.h>
#include <memory>
using namespace Windows::Storage::Streams;
using namespace Microsoft::WRL;
IBuffer^ buffer;
ComPtr<IBufferByteAccess> byte_access;
reinterpret_cast<IInspectable*>(buffer)->QueryInterface(IID_PPV_ARGS(&byte_access));
std::unique_ptr<byte[]> raw_buffer = std::make_unique<byte[]>(buffer->Length);
byte_access->Buffer(raw_buffer.get());
std::string str(reinterpret_cast<char*>(raw_buffer.get())); // just 1 copy

VS2012 Static Analysis: this pointer as an output pointer?

In this code snippet, the Init() function acts as a on-demand initializer that fills in all member variables of the structure. This is done to avoid calling default constructors all members of a large array on the stack:
struct Foo {
int m_Member;
void Init(int i);
};
void Foo::Init(int i) {
m_Member = i;
// Many other members initialized here.
}
void SomeFunction(int n) {
Foo buffer[64];
assert(n <= 64);
// Explicitly initialize what is needed.
for (int i = 0; i < n; ++i) {
buffer[i].Init(i * 3);
}
// Use buffer[0] - buffer[n-1] somehow.
}
This triggers a static analysis error in VS2012 with /analyze:
warning C6001: Using uninitialized memory 'buffer'.: Lines: 17, 19, 20
I'm looking for a way to annotate Foo::Init() so that this warning doesn't occur. There are plenty of other ways to make the warning go away, including:
Adding an empty constructor
Moving Init() to the constructor and calling placement new in the loop
But I'd like to avoid changing the structure of the code.
I've tried the following annotation without success:
void _At_(this, _Out_) Init();
This syntax is accepted, but only changes the warning to be:
warning C6001: Using uninitialized memory 'buffer'.: Lines: 18, 20, 21
warning C6001: Using uninitialized memory 'buffer[BYTE:0]'.: Lines: 18, 20, 21
Does anyone know how I can declare the intent of this Init() function to the static analysis engine?
Your question is somewhat elusive. You have shown SomeFunction taking int, but want annotation for method Init or constructor.
The warning shown is absolutely correct, assert won't hide the warning. You need to put if to check if n is greateer than 64 and reset n (or do something else, but not to loop when n>=64).
For annotation you need to use __in_bcount or similar alternative. An example:
bool SetBuffer(__in_bcount(8) const char* sBuffer);
Whichs says sBuffer is of 8 bytes (not elements).
You can read this this article for more information.
Too ugly to add an extra helper?
struct Foo {
int m_Member;
void Init(int i);
};
void Foo::Init(int i) {
m_Member = i;
// Many other members initialized here.
}
void Initialize(__in_bcount(sizeof(Foo) * n) Foo* buffer, int n) {
// Explicitly initialize what is needed.
for (int i = 0; i < n; ++i) {
buffer[i].Init(i * 3);
}
}
void SomeFunction(int n) {
Foo buffer[64];
assert(n <= 64);
Initialize(buffer, n);
// Use buffer[0] - buffer[n-1] somehow.
}
I found a work around by implementing a function to index the array. I flagged the return value as invalid so that this new function only escapes the uninitialized value check in the specific case where the return value is only used to initialize. I've only tested this in VS2017.
#define _Ret_invalid_ _SAL2_Source_(_Ret_invalid_, (), _Ret1_impl_(__notvalid_impl))
template <typename T>
_Ret_invalid_ T& UninitialzedIndex(T* pt, int index)
{
return pt[index];
}
Then, where the value is indexed, I call UninitialzedIndex instead of operator[]
void SomeFunction(int n) {
Foo buffer[64];
if (n <= 64)
return;
// Explicitly initialize what is needed.
for (int i = 0; i < n; ++i) {
UninitialzedIndex(buffer, i).Init(i * 3);
}
// Use buffer[0] - buffer[n-1] somehow.
}
Just add a default constructor (that calls Init()). What is wrong with that?
[Edit] The root problem is not how to lie to the static analyzer or your compiler. It is how to enforce that you don't leave foo in an uninitialized state. There is nothing wrong with adding a default constructor. I'd say the desire to NOT do it imposes risk.
Perhaps some client will use that poorly constructed foo class (Long after you wrote it and long after you are gone) and perhaps they will forget to call .Init() ?? What then? They will be left with data that is uninitialized.
If you are looking to enforce that rule, no amount of static analysis will help you there.
Take care of the foundation before you put on the roof.

Corrupted vector entries with LPCWSTR vector

I ahve the following piece of code. I get a correctly filled vector. But I am unable to print or use the vector contents which are file names from a directory. As soon as I do enter the first iteration. Everything gets lost. What am I doing wrong?
wprintf - This works OK
wcout-- here is where everything ends up corrupted
#include <windows.h>
#include <sstream>
#include <string>
#include <vector>
#include<iostream>
void GetAllFiles(vector<LPCWSTR>&, wstring);
using namespace std;
void main (void)
{
vector<LPCWSTR> files(0);
wstring path = L"Datasets\\Persons\\";
wstring ext = L"*.*";
wstring fullPath = path+ext;
GetAllFiles(files,fullPath);
for (unsigned i=0; i<files.size() ; i++)
{
try
{
wcout<<"::\n"<<files[i];
}
catch(exception &ex)
{
cout<<"Error:"<<ex.what();
}
}
}
void GetAllFiles(vector<LPCWSTR>& fileNames,wstring dir)
{
WIN32_FIND_DATA search_data;
memset(&search_data, 0, sizeof(WIN32_FIND_DATA));
HANDLE handle = FindFirstFile(dir.c_str(),&search_data);
while(handle != INVALID_HANDLE_VALUE)
{
wprintf(L"Found file: %s\r\n", search_data.cFileName);
fileNames.push_back(search_data.cFileName);
if(FindNextFile(handle, &search_data) == FALSE)
break;
}
}
I have attached a screen shots of the output.
search_data.cFileName is a pointer to memory controlled by the FindFirstFile/FindNextFile iterator interface; you cannot store this pointer value as the pointed-to memory could change from iteration to iteration (or even be freed after the iteration completes).
Instead, you must make a copy of the string to put in your vector, e.g. using wcsdup. Even better, define your vector as a vector<wstring>, so that push_back(search_data.cFileName); creates a wstring with the contents of search_data.cFileName.
Probably that's happening because you pass local variable to push_back(). I'm not sure here, but what could happen here:
push_back expects object of type LPCWSTR, while you passing char* instead. I don't know, how this conversion is done, but probably the pointer is just copied, and the value of this pointer becomes invalid whenyou return from the function - try explicit copying the strings before passing them to push_back.

Is it possible to change strings (content and size) in Lua bytecode so that it will still be correct?

Is it possible to change strings (content and size) in Lua bytecode so that it will still be correct?
It's about translating strings in Lua bytecode. Of course, not every language has the same size for each word...
Yes, it is if you know what you're doing. Strings are prefixed by their size stored as an int. The size and endianness of that int is platform-dependent. But why do you have to edit bytecode? Have you lost the sources?
After some diving throught Lua source-code I found such a solution:
#include "lua.h"
#include "lauxlib.h"
#include "lopcodes.h"
#include "lobject.h"
#include "lundump.h"
/* Definition from luac.c: */
#define toproto(L,i) (clvalue(L->top+(i))->l.p)
writer_function(lua_State* L, const void* p, size_t size, void* u)
{
UNUSED(L);
return (fwrite(p,size,1,(FILE*)u)!=1) && (size!=0);
}
static void
lua_bytecode_change_const(lua_State *l, Proto *f_proto,
int const_index, const char *new_const)
{
TValue *tmp_tv = NULL;
const TString *tmp_ts = NULL;
tmp_ts = luaS_newlstr(l, new_const, strlen(new_const));
tmp_tv = &f_proto->k[INDEXK(const_index)];
setsvalue(l, tmp_tv, tmp_ts);
return;
}
int main(void)
{
lua_State *l = NULL;
Proto *lua_function_prototype = NULL;
FILE *output_file_hnd = NULL;
l = lua_open();
luaL_loadfile(l, "some_input_file.lua");
lua_proto = toproto(l, -1);
output_file_hnd = fopen("some_output_file.luac", "w");
lua_bytecode_change_const(l, lua_function_prototype, some_const_index, "some_new_const");
lua_lock(l);
luaU_dump(l, lua_function_prototype, writer_function, output_file_hnd, 0);
lua_unlock(l);
return 0;
}
Firstly, we have start Lua VM and load the script we want to modify. Compiled or not, doesn't matter. Then build a Lua function prototype, parse and change it's constant table. Dump Prototype to a file.
I hope You got that for the basic idea.
You can try using the decompiler LuaDec. The decompiler would allow the strings to be modified in generated Lua code similar to the original source.
ChunkSpy has A No-Frills Introduction to Lua 5.1 VM Instructions that may help you understand the compiled chunk format and make the changes directly to bytecode if necessary.

Resources