Linux Sed won't work on large files (3GB)

Linux Sed won't work on large files (3GB) - linux

I have a large file (>3GB) which has a single line containing content similar to below
{"id","1","name":"one"},{"id","2","name":"two"},{"id","3","name":"three"},
I am using sed to search and replace }, with },\n so that each dictionary is on its own line.
sed 's/},/},\\n/g' FILE >> NEW_FILE
This doesn't work all the time and I had to split the file into smaller 1GB chunks, replace the text on each smaller file then combine it.
Is there any other way to do the search and replace on this large file.

Related

Data hidden in jpg

I am currently looking for hidden data in a jpg file but I have no clue on how to operate.
There is a jpg file containing text in a format I have never seen before :
-ne \xff\xd8\xff\xe0\x00\x10\x4a\x46\x49\x46\x00\x01\x01\x01\x00\x60\x00\x60\x00\x00\xff\xdb\x00\x43\x00\x06\x04\x04\x05\x04\x04\x06\x05\x05\x05\x06\x06\x06\x07\x09\x0e\x09\x09\x08\x08\x09\x12\x0d\x0d\x0a\x0e\x15\x12\x16\x16\x15\x12\x14\x14\x17\x1a\x21\x1c\x17\x18\x1f\x19\x14\x14\x1d\x27\x1d\x1f\x22\x23\x25\x25\x25\x16\x1c\x29\x2c\x28\x24\x2b\x21\x24\x25\x24\xff\xdb\x00\x43\x01\x06\x06\x06\x09\x08\x09\x11\x09\x09\x11\x24\x18\x14\x18\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\x24\xff\xc0\x00\x11\x08\x01\x8e\x03\x4e\x03\x01\x22\x00\x02\x11\x01\x03\x11\x01\xff\xc4\x00\x1f\x00\x00\x01\x05\x01\x01\x01\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\xff\xc4\x00\xb5\x10\x00\x02\x01\x03\x03\x02\x04\x03\x05\x05\x04\x04\x00\x00\x01\x7d\x01\x02\x03\x00\x04\x11\x05\x12\x21\x31\x41\x06\x13\x51\x61\x07\x22\x71\x14\x32\x81\x91\xa1\x08\x23
-ne \x42\xb1\xc1\x15\x52\xd1\xf0\x24\x33\x62\x72\x82\x09\x0a\x16\x17\x18\x19\x1a\x25\x26\x27\x28\x29\x2a\x34\x35\x36\x37\x38\x39\x3a\x43\x44\x45\x46\x47\x48\x49\x4a\x53\x54\x55\x56\x57\x58\x59\x5a\x63\x64\x65\x66\x67\x68\x69\x6a\x73\x74\x75\x76\x77\x78\x79\x7a\x83\x84\x85\x86\x87\x88\x89\x8a\x92\x93\x94\x95\x96\x97\x98\x99\x9a\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xff\xc4\x00\x1f\x01\x00\x03\x01\x01\x01\x01\x01\x01\x01\x01\x01\x00\x00\x00\x00\x00\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\xff\xc4\x00\xb5\x11\x00\x02\x01\x02\x04\x04\x03\x04\x07\x05\x04\x04\x00\x01\x02\x77\x00\x01\x02\x03\x11\x04\x05\x21\x31\x06\x12\x41\x51\x07\x61\x71\x13\x22\x32\x81\x08\x14\x42\x91\xa1\xb1\xc1\x09\x23\x33\x52\xf0\x15\x62\x72\xd1\x0a\x16\x24\x34\xe1\x25\xf1\x17\x18\x19\x1a\x26\x27\x28\x29\x2a\x35\x36\x37\x38\x39\x3a\x43\x44\x45\x46\x47\x48\x49
This is just the beggining of the file as there is at least a hundred lines.
The file type given by the command file : file.jpg: ASCII text, with very long lines
I tried some of the common tools to identify any patterns or hidden data like exiftools, strings, xxd but I found nothing.
If you have any idea on what to do it would be very much appreciated.

If it's a challenge of CTF, there are some common way to find out flag.
First try to find flag in file metadata, like description of file field
you can also try tool: stegsolve.jar.
In more advance sence, stego info hidden with some math calulation, give this tool a try: zsteg

Perhaps I'm misunderstanding the problem here, but if your file actually starts with a backslash character followed by the characters x, f, f, \, x, d, 8 and so on, then what you're looking at is the binary content of a JPG file that has been converted into ASCII text.
If so, you need to convert this back into binary data. For example, in Linux or MacOS, you could do this by entering the following on the command line:
echo -ne '\xff\xd8\xff\xe0\x00\x10\x4a\x46\x49\x46\x00\x01...etc...' > img.jpg
echo -ne '\x42\xb1\xc1\x15\x52\xd1\xf0\x24\x33\x62\x72\x82...etc...' >> img.jpg
(Note: > sends the results to a new file, and >> appends to the end of the file)
Or alternatively in Python:
with open("img.jpg","wb") as f:
f.write(b'\xff\xd8\xff\xe0\x00\x10\x4a\x46\x49\x46\x00\x01...etc...')
f.write(b'\x42\xb1\xc1\x15\x52\xd1\xf0\x24\x33\x62\x72\x82...etc...')
# and so on for all the other lines
Either way, you should end up with a file called img.jpg containing the image you're after.

Is there a linux command loop through files in a directory then write all files names and content into csv file?

I have a directory with 100 thousand files (each file has only one line), I need a fast way to go through all files to write one csv file that contains two columns: File Name, Content of the File.
File_A:
This is File one.
File_B:
This is file two.
I need to get a csv file:
File_A This is File one.
File_B This is file two.

Just do grep . * > ../output.csv to create a file with a : separator. If you want to use a different delimiter, pipe it to sed. Some grep provide a --null option to use a zero byte as the delimiter.

Print out file lines with special characters - Linux

I have a txt file with that is formatted like this:
xxxxxxxxx,xxx
with a few thousand lines.
I have this command right now to delete the special characters from the file
sed -i -e:a -e 's/[^0-9]\(.*,\)/\1/;ta' file.txt
instead of editing the actual file, is there a way to print out the line of text that each individual one lies on?
I only need to test the first 9 characters as every line has a comma in it. :)

It SOUNDS like all you want is:
grep '[^[:alnum:]]' file
but without sample input/output we're just guessing.

Ignore spaces, tabs and new line in SED

I tried to replace a string in a file that contains tabs and line breaks.
the command in the shell file looked something like this:
FILE="/Somewhere"
STRING_OLD="line 1[ \t\r\n]*line 2"
sed -i 's/'"$STRING_OLD"'/'"$STRING_NEW"'/' $FILE
if I manually remove the line breaks and the tabs and leave only the spaces then I can replace successfully the file. but if I leave the line breaks then SED is unable to locate the $STRING_OLD and unable to replace to the new string
thanks in advance
Kobi

sed reads lines one at a time, and usually lines are also processed one at a time, as they are read. However, sed does have facilities for reading additional lines and operating on the combined result. There are several ways that could be applied to your problem, such as:
FILE="/Somewhere"
STRING_OLD="line 1[ \t\r\n]*line 2"
sed -n "1h;2,\$H;\${g;s/$STRING_OLD/$STRING_NEW/g;p}"
That that does more or less what you describe doing manually: it concatenates all the lines of the file (but keeps newlines), and then performs the substitution on the overall buffer, all at once. That does assume, however, either that the file is short (POSIX does not require it to work if the overall file length exceeds 8192 bytes) or that you are using a sed that does not have buffer-size limitations, such as GNU sed. Since you tagged Linux, I'm supposing that GNU sed can be assumed.
In detail:
the -n option turns off line echoing, because we save everything up and print the modified text in one chunk at the end.
there are multiple sed commands, separated by semicolons, and with literal $ characters escaped (for the shell):
1h: when processing the first line of input, replace the "hold space" with the contents of the pattern space (i.e. the first line, excluding newline)
2,\$H: when processing any line from the second through the last, append a newline to the hold space, then the contents of the pattern space
\${g;s/$STRING_OLD/$STRING_NEW/g;p}: when processing the last line, perform this group of commands: copy the hold space into the pattern space; perform the substitution, globally; print the resulting contents of the pattern space.
That's one of the simpler approaches, but if you need to accommodate seds that are not as capable as GNU's with regard to buffer capacity then there are other ways to go about it. Those start to get ugly, though.

replacing fixed amount of text in a large file

I'm trying to replace a small amount of text on a specific line of a large log file (totaling ~40 mil lines):
sed -i '20000000s/.\{5\}$/zzzzz/' log_file
The purpose of this is to "mark" a line with an expected unique string, for later testing.
The above command works fine, but in-place editing of sed (and perl) creates a temp file, which is costly.
Is there a way to replace a fixed number of characters (i.e. 5 chars with 5 other chars) in a file without having to create a temp file, or a very large buffer, which would wind up becoming a temp file itself.

You could use dd to replace some bytes in place:
dd if=/dev/zero of=path/to/file bs=1 count=10 conv=notrunc skip=1000
would write 10 zeros (0x00) after the 1000s byte. You can put whatever you want to replace inside a file and write the path to it in the if parameter. Then you had to insert the size of the replacement file into the count parameter, so the whole file gets read.
the conv=notrunc parameter tells dd to leave the end of the file untruncated.
This should work well for any 1-byte file encoding.

ex is a scriptable file editor, so it will work in-place:
ex log_file << 'END_OF_COMMANDS'
20000000s/.\{5\}$/zzzzz/
w
q
END_OF_COMMANDS

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Linux Sed won't work on large files (3GB) - linux

Related

Data hidden in jpg

Is there a linux command loop through files in a directory then write all files names and content into csv file?

Print out file lines with special characters - Linux

Ignore spaces, tabs and new line in SED

replacing fixed amount of text in a large file

Categories

Resources