TypeORM to_tsquery Injection prevention - node.js

I want to perform a full-text-search on 2 columns with partial queries included.
I've tried multiple options and this one seems the best to me:
1. Add <-> between the words of the query and :* at the end
2. Execute query
The problem is, that I have to execute the query in TypeORM. So when I use to_tsquery(:query) there might be invalid syntax in the query, which produces an error.
The function plainto_tsquery() would be perfect since it prevents invalid syntax in the argument, but at the same time it prevents the partial queries, which I can do as described.
Any idea how I could combine the best of the two worlds?

You could try something like
SELECT to_tsquery(quote_literal(query) || ':*')
This will add <-> between word and :* at the end of every word, while quote_literal should protect you from syntax issues by escaping the text.
Disadvantage of this method however is that the generated query might behave unexpectedly when encountering queries with symbols, e.g. o'reilley as query will yield 'o':* <-> 'reilley':* as tsquery, which likely won't give back the expected result. Unfortunately, the only solution I know for this is cleaning both the input and text data of any symbols.
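One way to get prefix matching without exposing to_tsquery syntax, as an added example that is not part of the original question or answer: reduce the input to bare words in the application, then bind the assembled tsquery as the :query parameter. The alias doc, the columns title and body, the 'simple' configuration and the word cap are assumptions.

```javascript
// Added example, not from the thread. Assumed names: entity alias "doc" with
// columns "title" and "body", and the 'simple' text search configuration.

const MAX_WORDS = 16; // assumed cap so a huge paste cannot build a huge query

// Reduce untrusted text to bare words. Every character that is not a Unicode
// letter or digit (including the tsquery operators & | ! < > ( ) : * and
// quotes) acts as a separator and is dropped.
function extractWords(input) {
  const text = String(input == null ? '' : input).normalize('NFKC').toLowerCase();
  return (text.match(/[\p{L}\p{N}]+/gu) || []).slice(0, MAX_WORDS);
}

// "foo bar" -> "foo:* <-> bar:*". Returns null when nothing searchable
// remains, so the caller can skip the search instead of sending an empty query.
function toPrefixTsquery(input) {
  const words = extractWords(input);
  return words.length ? words.map((w) => `${w}:*`).join(' <-> ') : null;
}

// Shaped for a query builder's .where(clause, params): the tsquery text
// travels as the bound parameter :query and is never spliced into the SQL.
function buildSearchWhere(input) {
  const query = toPrefixTsquery(input);
  if (query === null) return null;
  return {
    clause:
      "to_tsvector('simple', doc.title || ' ' || doc.body) " +
      "@@ to_tsquery('simple', :query)",
    params: { query },
  };
}
```

As in the answer above, o'reilley is still split into two words, so the indexed text has to be tokenised the same way for such names to match.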

Related

How do I make a WHERE clause with SQLalchemy to compare to a string?

Objective
All I am trying to do is retrieve a single record from a specific table where the primary key matches. I have a feeling I'm greatly over complicating this as it seems to be a simple enough task. I have a theory that it may not know the variable value because it isn't actually pulling it from the Python code but instead trying to find a variable by the same name in the database.
EDIT: Is it possible that I need to wrap my where clause in an expression statement?
Attempted
My Python code is
def get_single_record(name_to_search):
    my_engine = super_secret_inhouse_engine_constructor("sample_data.csv")
    print("Searching for " + name_to_search)
    statement = my_engine.tables["Users"].select().where(my_engine.tables["Users"].c.Name == name_to_search)
    # Print out the raw SQL so we can see what exactly it's checking for
    print("You are about to run: " + str(statement))
    # Print out each result (should only be one)
    print("Results:")
    for item in my_engine.execute(statement):
        print(item)
I tried hard coding a string in its place.
I tried using like instead of where.
All to the same end result.
Expected
I expect it to generate something along the lines of SELECT * FROM MyTable WHERE Name='Todd'.
Actual Result
Searching for Todd
STATEMENT: SELECT "Users"."Name", ...
FROM "Users"
WHERE "Users"."Name" = ?
That is an actual question mark appearing in my statement, not simply my own confusion. This is then followed by it printing out a collection of all the records from the table, as though it successfully matched everything.
EDIT 2: Running either my own hard coded SQL string or the generated query by Alchemy returns every record from the table. I'm beginning to think the issue may be with the engine I've set up not accepting the query.
Why I'm Confused
According to the official documentation and third party sources, I should be able to compare to hardcoded strings and then, by proxy, be able to compare to a variable.

Case sensitivity inconsistency in SharePoint + oData?

I am using jQuery to get information from SharePoint 2010's listData.svc. I noticed some inconsistencies with regards to case sensitivity in my queries:
The following command is case sensitive:
...&$filter=substringof('String', property) eq True
The following command is case insensitive
...&$filter=substringof(tolower('String'), tolower(property)) eq True
The following command is also case insensitive but much shorter:
...&$filter=substringof('String', property) or substringof('String', property2)
However, the case insensitivity using the short method is lost for the entire filter when one part is using a property more than two levels down. So in the following command the entire filter becomes case sensitive again:
...&$filter=substringof('String', property/property/property) or substringof('String', property2)
Is this an issue with SharePoint's service? Or am I just doing something wrong?
It seems like a bug in ListData.svc.
If comparisons (delegated to SQL server at the end of the day) are case-sensitive in any query they should always be case-sensitive.
Clearly the tolower call makes things match whether cases match or not, so we can ignore that.
However I have no idea why doing an OR on another property works.
Either it is a bug in SharePoint or perhaps you've inadvertently picked an OR clause that returns the data you were expecting by accident.

SSIS: Filtering Multiple GUIDs from String Variable as Parameter In Data Flow OLE Source

I have an SSIS package that obtains a list of new GUIDs from a SQL table. I then shred the GUIDs into a string variable so that I have them separated out by comma. An example of how they appear in the variable is:
'5f661168-aed2-4659-86ba-fd864ca341bc','f5ba6d28-7283-4bed-9f11-e8f6bef225c5'
The problem is in the data flow task. I use the variable as a parameter in a SQL query to get my source data and I cannot get my results. When the WHERE clause looks like:
WHERE [GUID] IN (?)
I get an invalid character error so I found out the implicit conversion doesn't work with the GUIDs like I thought they would. I could resolve this by putting {} around the GUID if this were a single GUID but there are a potential 4 or 5 different GUIDs this will need to retrieve at runtime.
Figuring I could get around it with this:
WHERE CAST([GUID] AS VARCHAR(50)) IN (?)
But this simply produces no results and there should be two in my current test.
I figure there must be a way to accomplish this... What am I missing?
You can't, at least not using the mechanics you have provided.
You cannot concatenate values and make that work with a parameter.
I'm open to being proven wrong on this point but I'll be damned if I can make it work.
How can I make it work?
The trick is to just go old school and make your query via string building/concatenation.
In my package, I defined two variables, filter and query. filter will be the concatenation you are already performing.
query will be an expression (right click, properties: set EvaluateAsExpression to True, Expression would be something like "SELECT * FROM dbo.RefData R WHERE R.refkey IN (" + @[User::filter] + ")")
In your data flow, then change your source to SQL Command from variable. No mapping required there.
Basic look and feel would be like
OLE Source query

replaceAll quotes with backslashed quotes -- Is that enough?

I'm using replaceAll to replace single quotes with "\\\\'" per a colleague's suggestion, but I'm pretty sure that's not enough to prevent all SQL injections.
I did some googling and found this: http://wiki.postgresql.org/wiki/8.1.4_et._al._Security_Release_Technical_Info
This explains it for PostgreSQL, but does the replacing not work for all SQL managers? (Like, MySQL, for example?)
Also, I think I understand how the explanation I linked works for single backslash, but does it extend to my situation where I'm using four backslashes?
Please note that I'm not very familiar with databases and how they parse input, but this is my chance to learn more! Any insight would be appreciated.
Edit: I've gotten some really helpful, useful answers. My next question is, what kind of input would break my implementation? That is, if you give me input and I prepend all single quotes with four backslashes, what kind of input would you give me to inject SQL code? While I am convinced that my approach is naive and wrong, maybe some examples would better teach me how easy it is to inject SQL against my "prevention".
No, because what about backslashes? For instance, if you turn ' into \' then the input \' will become \\' which is an unescaped single quote and a "character literal" backslash. For MySQL there is mysql_real_escape_string() which should exist for every platform because it's in the MySQL library bindings.
But there is another problem. And that is if you have no quote marks around the data segment. In php this looks like:
$query="select * from user where id=".$_GET[id];
The PoC exploit for this is very simple: http://localhost/vuln.php?id=sleep(10)
Even if you do a mysql_real_escape_string($_GET[id]) it's still vulnerable to SQLi because the attacker doesn't have to break out of quote marks in order to execute SQL. The best solution is Parameterized Queries.
No.
This is not enough, and this is not the way to go. And I can say it without even knowing anything about your data, your SQL or even anything about your application. You should never, ever include any user data directly into your SQL. You should use parameterized statements instead.
Besides if you are asking this question you shouldn't write your own SQL by hand in the first place. Use a good ORM instead. Asking if your home-grown regular expression would make your application safe from SQL injection is like asking if your home-grown memory allocation routine that you have written in Assembly language is safe from buffer overruns - to which I would say: if you are asking this question then you should use a memory-safe language in the first place.
A simple case of SQL injection works like this (in pseudocode):
name = form_params["name"]
year = 2011
sql = "INSERT INTO Students (name, year) " +
"VALUES ('" + name + "', " + year + ");"
database_handle.query(sql)
year is supplied by you, the programmer, so it's not tainted, and can be embedded in the query in any way you find suitable; in this case — as an unquoted number.
But name is supplied by the user and so can be anything. Along comes Bobby Tables and inputs this value:
name = "Robert'); DROP TABLE Students; -- "
And the query becomes
INSERT INTO Students (name, year) VALUES ('Robert');
DROP TABLE Students; -- ', 2011);
That substitution turned your one query into two.
The first one gives an error because of the mismatched row count, but that doesn't matter, because the database is able to unambiguously find and run the second query. The attacker can work around the error by fiddling with the input anyway. The -- is a comment so that the rest of the input is ignored.
Note how data suddenly became code — a typical sign of a security problem.
What the suggested replacement does is this:
name = form_params["name"].regex_replace("'", "\\\\'")
How this works is confusing, hence my earlier comment. The string literal "\\\\'" represents the string \\'. The regex_replace function interprets that as the string \'. The database then sees
... VALUES ('Robert\'); DROP TABLE Students; -- ', 2011);
and interprets that correctly as a quite unusual name.
Among other problems this approach is very fragile. If the strings you use in your language don't substitute \\ as \, if your string substitution function doesn't interpret \\ as \ (if it's not a regex function or it uses $1 instead of \1 for backreferences) you could end up with an even number of slashes like
... VALUES ('Robert\\'); DROP TABLE Students; -- ', 2011);
and no SQL injection will be prevented.
The solution is not to check what the language and library does with all possible input you can think of, or to anticipate what it might do in a future version, but rather to use the facilities provided by the database. These usually come in two flavours:
database-aware escaping, which does exactly the right escaping of any data because the client library matches the server and it knows what the character encoding of the database you are querying is:
sql = "... '" + database_handle.escape(name) + "' ..."
out-of-band data submission (usually with prepared statements), so the data isn't even in the same string as the code:
sql = "... VALUES (:n, :y);"
database_handle.query(sql, n = name, y = year)
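The fragility described in these answers can be checked mechanically. The following is an added example, not code from any of the posts: naiveEscape is the question's quote-prefixing scheme, and literalEnd is a deliberately small model (an assumption, not a real server parser) of where a quoted string ends, with and without backslash escapes being honoured.

```javascript
// Added example: the question's scheme puts one backslash before each quote.
function naiveEscape(value) {
  return value.replace(/'/g, "\\'");
}

// Simplified model of a SQL string scanner. sql[open] is the opening quote;
// returns the index of the quote that closes the literal, or -1 if it never
// closes. backslashEscapes=true models servers that treat \ as an escape
// inside '...'; false models standard SQL strings, where only '' escapes.
function literalEnd(sql, open, backslashEscapes) {
  for (let i = open + 1; i < sql.length; i++) {
    const c = sql[i];
    if (backslashEscapes && c === '\\') { i++; continue; } // skip escaped char
    if (c === "'") {
      if (sql[i + 1] === "'") { i++; continue; }           // doubled quote
      return i;
    }
  }
  return -1;
}

// True only if the escaped value stays entirely inside the string literal,
// i.e. the literal is closed by the quote the application appended.
function staysInsideLiteral(value, backslashEscapes) {
  const prefix = "SELECT 1 FROM users WHERE name = '"; // constructed query
  const sql = prefix + naiveEscape(value) + "'";
  return literalEnd(sql, prefix.length - 1, backslashEscapes) === sql.length - 1;
}

// Constructed sample values: an ordinary name, a value containing \' and a
// value ending in a backslash.
const SAMPLES = ["O'Brien", "a\\'b", "a\\"];

function report() {
  return SAMPLES.map((v) => ({
    value: v,
    withBackslashEscapes: staysInsideLiteral(v, true),
    standardStrings: staysInsideLiteral(v, false),
  }));
}
```

Only the ordinary name under backslash-escape rules stays contained; every other combination ends the literal early or leaves it open, which is why the answers point to bound parameters instead.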

Can I be vulnerable to SQL injection by appending input with no whitespace to my query?

I am taking in a string from user input, and splitting it on whitespace (using \w) into an array of strings. I then loop through the array, and append a part of the where clause like this:
query += " AND ( "
+ "field1 LIKE '%" + searchStrings[i] +"%' "
+ " OR field2 LIKE '%" + searchStrings[i] +"%' "
+ " OR field3 LIKE '%" + searchStrings[i] +"%' "
+ ") ";
I feel like this is dangerous, since I am appending user input to my query. However, I know that there isn't any whitespace in any of the search strings, since I split the initial input on whitespace.
Is it possible to attack this via a SQL injection? Giving Robert');DROP TABLE students;-- wouldn't actually drop anything, since there needs to be whitespace in there. In that example, it would not behave properly, but no damage would be done.
Can anyone with more experience fighting SQL injections help me either fix this, or put my mind at ease?
Thanks!
EDIT:
Wow, that is a lot of great input. Thank you everyone who responded. I will investigate full-text search and, at a minimum, parameterize my query.
Just so I can better understand the problem, would it be possible to inject if all whitespace AND single quotes were escaped?
Any time you allow a user to enter data into a query string like this you are vulnerable to SQL injection and it should be avoided like the plague!
You should be very careful how you allow your searchStrings[] array to be populated. You should always append variable data to your query using parameter objects:
+ field1 like @PropertyVal Or field2 like @PropertyVal Or field3 like @PropertyVal etc...
And if you're using SQL Server for example
Query.Parameters.Add(new SqlParameter("PropertyVal", '%' + searchStrings[i] + '%'));
Be very wary how you build a query string that you're going to run against a production server, especially if it has any data of consequence in it!
In your example you mentioned Little Bobby Tables
Robert');DROP TABLE students;--
And cited that because it needs white space, you couldn't do it - but if the malicious user encoded it using something like this:
Robert');Exec(Replace('Drop_Table_students','_',Char(32)));--
I would say - better to be safe and do it the right way. There's no simple way to make sure you catch every scenario otherwise...
Yes, this script did not contain any whitespaces, just encoded characters that SQL decoded back and executed:
http://www.f-secure.com/weblog/archives/00001427.html
The injected script was something like this:
Which SQL decoded to:
DECLARE @T varchar(255),@C
varchar(255) DECLARE Table_Cursor
CURSOR FOR select a.name,b.name from
sysobjects a,syscolumns b where
a.id=b.id and a.xtype='u' and
(b.xtype=99 or b.xtype=35 or b…
and so on, so it's possible to do whatever you want with every table in the database without using any whitespaces.
There are just too many ways to get this wrong, so I wouldn't rely on it if anyone told me "no, this will be safe, because ..."
What about escaping the whitespace in some form (URL-Encode or something)? What about using non-obvious Unicode whitespace characters that your simple tests don't check for? What if your DB supports some malicious operations that don't require a white space?
Do it the correct way: Use a PreparedStatement (or whatever your platform uses for injection-safe parameterization), append and prepend the "%" to the user input and use that as a parameter.
The rule of thumb is: If the string you're appending isn't SQL, it must be escaped using prepared statements or the correct 'escape' function from your DB client library.
I totally agree that query parameters are the absolutely safest way to go. With them you have no risk of SQL injection whatsoever (unless you do something stupid with the parameters) and there is no overhead of escaping.
If your DBMS does not support query parameters, then it MUST support string escaping. In the worst case you can try to escape single-quotes yourself, although there still is a Unicode exploit that can circumvent this. However, if your DBMS does not support query parameters, it probably doesn't support Unicode either. :)
Added: Also, queries like you wrote up there are killers for performance - no indexes can be used. I'd advise to look up your DBMS' full-text-indexing capabilities. They are meant exactly for cases such as this.
Yes, they could still inject items, without spaces it might not do much, but it is still a vulnerability.
In general blindly adding user input to a query is not a good idea.
Here's a trivial injection, if I set field1 to this I've listed all rows in your database. This may be bad for security...
'+field1+'
You should use parameters (these are valid in inline SQL too), e.g.
AND Field1 = @Field1
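The question's loop rewritten with bound parameters, as an added example that is not code from any of the answers: each search term becomes one parameter shared by the three LIKE tests, and LIKE wildcards typed by the user are escaped so they match literally. The $1-style numbered placeholders and the ! escape character are assumptions; the column names are the question's.

```javascript
// Added example. Column names come from a fixed list in code, never from the
// user; they are still checked because they are spliced into the SQL text.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Make user text match literally inside LIKE: prefix the wildcards % and _
// (and the escape character itself) with '!', then wrap in %...%.
function likePattern(term) {
  return '%' + term.replace(/[!%_]/g, (c) => '!' + c) + '%';
}

// Split on whitespace as the question does, and emit one AND (...) group per
// term. User text only ever appears in params, never in sql.
function buildSearchClause(input, fields = ['field1', 'field2', 'field3']) {
  for (const f of fields) {
    if (!IDENTIFIER.test(f)) throw new Error('unexpected column name: ' + f);
  }
  const terms = String(input).split(/\s+/).filter(Boolean);
  const params = [];
  const groups = terms.map((term) => {
    params.push(likePattern(term));
    const placeholder = '$' + params.length;
    const tests = fields.map((f) => `${f} LIKE ${placeholder} ESCAPE '!'`);
    return ' AND (' + tests.join(' OR ') + ')';
  });
  return { sql: groups.join(''), params };
}

// Constructed input reusing the string from the question plus a wildcard.
function demo() {
  return buildSearchClause("Robert');DROP 100%");
}
```

The returned sql is appended to the fixed part of the query and params is passed to the driver alongside it, for example as the second argument of a query(text, values) call.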
