Split FASTQ file into 10G mini files, assembler not accepting them as FASTQ format - Linux

I split a 52G fastq file into 10G chunks with the following code:
split -b 10G /home/bilalm/H_glaber_quality_filtering/AfterQC/good_reads/SRR530529.good.fq outputfile
This produced the following files:
-rw-rw-r-- 1 bilalm bilalm 10G Aug 11 13:48 outputfileaa
-rw-rw-r-- 1 bilalm bilalm 10G Aug 11 13:49 outputfileab
-rw-rw-r-- 1 bilalm bilalm 10G Aug 11 13:50 outputfileac
-rw-rw-r-- 1 bilalm bilalm 10G Aug 11 13:51 outputfilead
-rw-rw-r-- 1 bilalm bilalm 10G Aug 11 13:52 outputfileae
-rw-rw-r-- 1 bilalm bilalm 1.6G Aug 11 13:53 outputfileaf
When I attempted to assemble "outputfileab" with Velvet, I got the following error message:
velveth: /home/bilalm/H_glaber_quality_filtering/AfterQC/good_reads/split_SRR530529_file/outputfileab does not seem to be in FastQ format
Strangely, both velveth and velvetg worked normally when assembling the first 10G chunk, i.e. "outputfileaa".
Anybody know what's going on?

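Splitting by file size rather than line count does exactly that: split will cut in the middle of a line as soon as the byte limit is reached. Velvet has a check that asserts every fourth line (each record header) starts with @, so this check fails for any chunk that does not begin exactly on a record boundary. That is why this happens on the second file and not the first: the first chunk starts at the beginning of the original file, while the second starts wherever the 10G cut happened to fall. I would suggest splitting this file by line count with the -l flag instead, using a number of lines that is a multiple of 4 so that every chunk holds whole records.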
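A minimal Python sketch of splitting on record boundaries, assuming the plain 4-line FASTQ layout (no wrapped sequence lines); split_fastq and the chunk file naming are hypothetical. Checking only the first line of each chunk for @ is safe here because, by construction, a chunk always starts on a header line; testing arbitrary lines for @ would not be, since @ is also a valid quality character.

```python
import itertools
import tempfile
from pathlib import Path

LINES_PER_RECORD = 4  # header, sequence, '+' separator, qualities


def split_fastq(src, prefix, records_per_chunk):
    """Split src into files of whole FASTQ records and return their paths.

    Assumes the plain 4-line layout (no wrapped sequence or quality lines).
    Streams line by line, so a 10G chunk is never held in memory.
    """
    lines_per_chunk = records_per_chunk * LINES_PER_RECORD
    paths = []
    with open(src, "rb") as fin:
        for index in itertools.count():
            header = fin.readline()
            if not header:
                break  # end of input
            if not header.startswith(b"@"):
                raise ValueError(f"chunk {index} would not start with a FASTQ header")
            path = Path(f"{prefix}{index:03d}.fq")
            with open(path, "wb") as fout:
                fout.write(header)
                # Copy the rest of this chunk: lines_per_chunk - 1 more lines.
                for line in itertools.islice(fin, lines_per_chunk - 1):
                    fout.write(line)
            paths.append(path)
    return paths


# Usage sketch on a tiny synthetic file: 5 records, 2 records per chunk.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.fq"
    sample.write_bytes(b"".join(b"@read%d\nACGT\n+\nIIII\n" % i for i in range(5)))
    original = sample.read_bytes()
    chunks = split_fastq(sample, f"{tmp}/chunk_", records_per_chunk=2)
    chunk_bytes = [p.read_bytes() for p in chunks]

print([len(c.splitlines()) for c in chunk_bytes])  # [8, 8, 4]
```

From the shell, split -l with a multiple of 4 gives the same record-aligned result; the sketch just makes the boundary rule explicit.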

Related

S-record files output by objcopy are smaller than the original binaries

After using arm-none-eabi-gcc to build a file in ELF format, I am using arm-none-eabi-objcopy to create an S-record file. The command that my makefile runs is:
$(TOOLCHAIN)-objcopy --srec-len 10 -O srec "$<" "$@"
The makefile can build with various different settings - with debug symbols, with optimization, and with neither.
With some information such as my username removed, the output of ls -la after doing all three builds is:
-rw-r--r-- 1 4096 270330 Oct 12 18:13 outfile_Debug.mot
-rw-r--r-- 1 4096 825888 Oct 12 18:13 outfile_Debug.out
-rw-r--r-- 1 4096 270334 Oct 12 17:06 outfile_Default.mot
-rw-r--r-- 1 4096 465928 Oct 12 17:06 outfile_Default.out
-rw-r--r-- 1 4096 184776 Oct 12 19:02 outfile_Optimized.mot
-rw-r--r-- 1 4096 395672 Oct 12 19:02 outfile_Optimized.out
Now, I have read an unsourced claim that S-record files cannot contain debugging information, which would explain why the Default and Debug .mot files are roughly the same size while the corresponding .out file sizes differ enormously. But otherwise, the ELF file is a binary representation of the executable, while the S-record file uses hex strings in ASCII text, so surely it should be larger than the binary ELF file for a non-debug build?
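The size arithmetic behind the question can be made concrete. A minimal sketch, assuming S3 data records (32-bit addresses) and ignoring the S0 header and S7 termination records: each record holds a count byte, a 4-byte address, the data and a checksum byte, all written as two ASCII hex characters per byte. The helper names are hypothetical, and the sketch does not try to reproduce how objcopy counts --srec-len.

```python
def s3_record(address, data):
    """Encode one S3 (32-bit address) data record as a text line without the newline."""
    count = 4 + len(data) + 1  # address bytes + data bytes + checksum byte
    body = bytes([count]) + address.to_bytes(4, "big") + data
    checksum = 0xFF - (sum(body) & 0xFF)  # ones' complement of the low byte of the sum
    return "S3" + body.hex().upper() + f"{checksum:02X}"


def srec_text_size(image, bytes_per_record):
    """Characters needed to encode image as S3 records, one per line with LF endings."""
    total = 0
    for offset in range(0, len(image), bytes_per_record):
        chunk = image[offset:offset + bytes_per_record]
        total += len(s3_record(offset, chunk)) + 1  # +1 for the newline
    return total


print(s3_record(0, b"\x01\x02"))          # S307000000000102F5
print(srec_text_size(bytes(100), 10))     # 350 characters for 100 image bytes
```

With 10 data bytes per record, every image byte costs about 3.5 characters of text, so the questioner's intuition about expansion holds for the loadable bytes. A .mot file smaller than its .out therefore suggests the ELF carries a lot besides the loadable image; an unstripped ELF normally also contains section headers and symbol and string tables even without -g, which would be consistent with the sizes shown.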

Directory given in a variable is never reached

There are two variables in my .gitlab-ci.yml file, and both are used in the same script line:
variables:
  TEST_SERVER: 10.11.12.13
  BUILD_DIR: "/var/www/distrib"
[...]
script:
  - ssh skipper@$TEST_SERVER 'ls -la $BUILD_DIR'
The server IP is picked up correctly, but the directory is never reached (and it does exist, of course). The listing below is clearly the contents of the user's home directory:
$ ssh skipper@$TEST_SERVER 'ls -la $BUILD_DIR'
Warning: Permanently added '10.11.12.13' (ECDSA) to the list of known hosts.
total 48
drwxr-xr-x 5 skipper skipper 4096 Mar 12 12:03 .
drwxr-xr-x 16 root root 4096 Mar 11 09:29 ..
-rw------- 1 skipper skipper 2056 Mar 18 09:43 .bash_history
-rw-r--r-- 1 skipper skipper 220 Mar 11 09:29 .bash_logout
-rw-r--r-- 1 skipper skipper 3771 Mar 11 09:29 .bashrc
drwx------ 2 skipper skipper 4096 Mar 11 11:38 .cache
drwx------ 3 skipper skipper 4096 Mar 11 11:38 .gnupg
-rw-r--r-- 1 skipper skipper 807 Mar 11 09:29 .profile
drwx------ 2 skipper root 4096 Mar 11 11:30 .ssh
-rw------- 1 skipper skipper 9800 Mar 12 12:03 .viminfo
I tried defining the directory variable with or without quotation marks, then calling it with double dollar sign ($$BUILD_DIR), but none of these attempts worked.
Any ideas what is wrong here?
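The single quotes are the problem: everything inside them is preserved literally, so the CI shell does not expand $BUILD_DIR and ssh sends the literal text $BUILD_DIR to the server. The remote shell does not have that variable, so it expands to nothing and ls -la lists the current (home) directory. $TEST_SERVER works because it is outside the quotes.
Using double quotes around the ls command lets the CI shell substitute the value before ssh runs:
ssh skipper@$TEST_SERVER "ls -la $BUILD_DIR"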
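The quoting difference can be reproduced locally without a server. In this sketch (a stand-in, not GitLab's actual mechanism), the outer /bin/sh plays the CI runner, which has BUILD_DIR set, and env -i /bin/sh plays the remote login shell, which does not; run_like_ci is a hypothetical helper.

```python
import os
import subprocess

# Stand-in for the remote host: `env -i` starts the inner shell with an empty
# environment, much as the remote session does not know the CI variable.
SINGLE_QUOTED = "env -i /bin/sh -c 'echo \"dir=$BUILD_DIR\"'"
DOUBLE_QUOTED = 'env -i /bin/sh -c "echo dir=$BUILD_DIR"'


def run_like_ci(command):
    """Run command in a local shell that has BUILD_DIR set, as the CI runner does."""
    env = {"BUILD_DIR": "/var/www/distrib", "PATH": os.environ.get("PATH", "/usr/bin:/bin")}
    result = subprocess.run(["/bin/sh", "-c", command], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(run_like_ci(SINGLE_QUOTED))  # dir=  (the inner shell saw the literal $BUILD_DIR)
print(run_like_ci(DOUBLE_QUOTED))  # dir=/var/www/distrib  (the outer shell substituted it)
```

If the path could ever contain spaces, quoting it once more inside the double quotes, as in "ls -la '$BUILD_DIR'", keeps it a single argument on the remote side.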

Total size of files in Kafka logs directory is less than the sum of their sizes

I'm testing a Kafka producer application and noticed something strange about the disk usage of the Kafka logs. When looking at the total size of a certain partition's log directory, while the application is writing to Kafka, I see this:
$ ls -l --block-size=kB kafka-logs/mytopic-0
total 52311kB
-rw-rw-r-- 1 app-data app-data 10486kB Oct 29 12:45 00000000000000000000.index
-rw-rw-r-- 1 app-data app-data 46505kB Oct 29 12:45 00000000000000000000.log
-rw-rw-r-- 1 app-data app-data 10486kB Oct 29 12:45 00000000000000000000.timeindex
-rw-rw-r-- 1 app-data app-data 1kB Oct 29 11:55 leader-epoch-checkpoint
Then I stop my application, and a few minutes later I repeat the above command, and get this:
$ ls -l --block-size=kB kafka-logs/mytopic-0
total 46519kB
-rw-rw-r-- 1 app-data app-data 10486kB Oct 29 12:45 00000000000000000000.index
-rw-rw-r-- 1 app-data app-data 46505kB Oct 29 12:45 00000000000000000000.log
-rw-rw-r-- 1 app-data app-data 10486kB Oct 29 12:45 00000000000000000000.timeindex
-rw-rw-r-- 1 app-data app-data 1kB Oct 29 11:55 leader-epoch-checkpoint
Questions: Why does the ls total figure not represent the sum of sizes of all the files in that directory? Why does the total decrease a few minutes after stopping the producer application, even though all the files in the directory remain the same size?
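The files might have holes (sparse files). The size column of ls -l shows each file's apparent size, while the total line counts the disk blocks actually allocated, so sparse files make the total smaller than the sum of the sizes. Can you run the following commands and compare their output:
du *
du --apparent-size *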
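The difference between apparent size and allocated blocks can be shown directly. The index files in the listing are 10486 kB, i.e. 10 MiB, which looks consistent with Kafka preallocating its index files as sparse files (an assumption here, not something the listing proves). A minimal sketch that creates such a file on Linux; the file name only mimics the listing and apparent_and_allocated is a hypothetical helper.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (size in bytes, bytes actually allocated on disk) for path."""
    st = os.stat(path)
    # On Linux, st_blocks counts 512-byte units regardless of the filesystem block size.
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "00000000000000000000.index")
    with open(path, "wb") as f:
        # Extend the file to 10 MiB without writing any data: the result is one big hole.
        f.truncate(10 * 1024 * 1024)
    apparent, allocated = apparent_and_allocated(path)

print(f"apparent size: {apparent} bytes, allocated: {allocated} bytes")
```

ls -l would show this file as 10486kB while contributing almost nothing to the total line. Because the total tracks allocation, it can also change over time while every file keeps the same size.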

jar -xvf test.jar not working

I tried to extract test.jar.
The command executes successfully but produces no output:
user@host:home/test->ll
drwxr-xr-x 107 user abc 6040 Apr 4 09:55 ..
drwxr-xr-x 2 user abc 26 Apr 4 10:06 .
-rw-r--r-- 1 user abc 51241 Apr 4 10:06 test.jar
user@host:home/test->jar -xvf test.jar
user@host:home/test->ll
total 262
drwxr-xr-x 107 user abc 6040 Apr 4 09:55 ..
drwxr-xr-x 2 user abc 26 Apr 4 10:06 .
-rw-r--r-- 1 user abc 51241 Apr 4 10:06 test.jar
Kindly help me resolve this.
Actual requirement:
I need to extract and access a resource in the JAR file.
According to the Oracle Java Tutorials:
https://docs.oracle.com/javase/tutorial/deployment/jar/unpack.html
You should unpack it with:
jar xfv test.jar
That is, without the '-' sign.
The x option indicates that you want to extract files from the JAR archive.
The f option indicates that the JAR file from which files are to be extracted is specified on the command line, rather than through stdin.
The v option produces verbose output.
You can also try unzip test.jar, and the command below lets you view a single file without extracting all the files:
unzip -q -c test.jar META-INF/MANIFEST.MF
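Since a JAR is a ZIP archive, the actual requirement (accessing a resource) can also be met without extracting anything. A minimal sketch using Python's standard zipfile module (our choice of tool, not part of the JDK); list_jar, read_jar_entry and the config/app.properties entry are hypothetical.

```python
import io
import zipfile


def list_jar(jar):
    """Return entry names, similar to `jar tf`; jar is a path or a binary file object."""
    with zipfile.ZipFile(jar) as archive:
        return archive.namelist()


def read_jar_entry(jar, entry_name):
    """Return the bytes of one entry, similar to `unzip -q -c`, without extracting the archive."""
    with zipfile.ZipFile(jar) as archive:
        return archive.read(entry_name)


# Usage sketch on an in-memory JAR; with a real file, pass "test.jar" instead.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\r\n\r\n")
    archive.writestr("config/app.properties", "greeting=hello\n")

print(list_jar(buffer))
print(read_jar_entry(buffer, "config/app.properties").decode())
```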

Linux: Finding Newly Added Files

I am trying to obtain a backup of 'newly' added files to a Fedora system. Files can be copied through a Windows Samba share and appear to retain the original created timestamp. However, because it retains this timestamp I am having issues identifying which files were newly added to the system.
Currently, the only way I can think of doing this is to have a master list snapshot of all the files on the system at a specific time. Then when I perform the backup I compare the previous snapshot with a current snapshot. It would detect files that were removed from the system but it seems excessive and I was thinking there must be an easier way to backup newly added files.
Terry
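The snapshot approach described in the question can be sketched in a few lines. Persisting the previous snapshot between runs (for example as a text file) is left out, and snapshot and diff_snapshots are hypothetical names. Because it compares paths rather than timestamps, it detects additions regardless of which timestamps the copy preserved.

```python
import os
import tempfile


def snapshot(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def diff_snapshots(previous, current):
    """Return (added, removed): paths only in current, and paths only in previous."""
    return current - previous, previous - current


# Usage sketch: take a snapshot, change the tree, then take another and compare.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "old.txt"), "w").close()
    before = snapshot(tmp)
    open(os.path.join(tmp, "new.txt"), "w").close()
    os.remove(os.path.join(tmp, "old.txt"))
    added, removed = diff_snapshots(before, snapshot(tmp))

print("added:", sorted(added), "removed:", sorted(removed))
```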
Try using find. Something like this:
find . -ctime -10
That will give you a list of files and directories, starting from your current directory, whose status (inode change time, ctime) changed within the last 10 days.
Example:
My Downloads directory looks like this:
kobus@akira:~/Downloads$ ll
total 2025284
drwxr-xr-x 4 kobus kobus 4096 Nov 4 11:25 ./
drwxr-xr-x 41 kobus kobus 4096 Oct 30 09:26 ../
-rw-rw-r-- 1 kobus kobus 8042383 Oct 28 14:08 apache-maven-3.3.3-bin.tar.gz
drwxrwxr-x 2 kobus kobus 4096 Oct 14 09:55 ELKImages/
-rw-rw-r-- 1 kobus kobus 1469054976 Nov 4 11:25 Fedora-Live-Workstation-x86_64-23-10.iso
-rw------- 1 kobus kobus 351004 Sep 21 14:07 GrokConstructor-master.zip
drwxrwxr-x 11 kobus kobus 4096 Jul 11 2014 jboss-eap-6.3/
-rw-rw-r-- 1 kobus kobus 183399393 Oct 19 16:26 jboss-eap-6.3.0-installer.jar
-rw-rw-r-- 1 kobus kobus 158177216 Oct 19 16:26 jboss-eap-6.3.0.zip
-rw-rw-r-- 1 kobus kobus 71680110 Oct 13 13:51 jre-8u60-linux-x64.tar.gz
-rw-r--r-- 1 kobus kobus 4680 Oct 12 12:34 nginx-release-centos-7-0.el7.ngx.noarch.rpm
-rw-r--r-- 1 kobus kobus 3479765 Oct 12 14:22 ngx_openresty-1.9.3.1.tar.gz
-rw------- 1 kobus kobus 16874455 Sep 15 16:49 Oracle_VM_VirtualBox_Extension_Pack-5.0.4-102546.vbox-extpack
-rw-r--r-- 1 kobus kobus 7505310 Oct 6 10:29 sublime_text_3_build_3083_x64.tar.bz2
-rw------- 1 kobus kobus 41467245 Sep 7 10:37 tagspaces-1.12.0-linux64.tar.gz
-rw-rw-r-- 1 kobus kobus 42658300 Nov 4 10:14 tagspaces-2.0.1-linux64.tar.gz
-rw------- 1 kobus kobus 70046668 Sep 15 16:49 VirtualBox-5.0-5.0.4_102546_el7-1.x86_64.rpm
Here's what the find returns:
kobus@akira:~/Downloads$ find . -ctime -10
.
./tagspaces-2.0.1-linux64.tar.gz
./apache-maven-3.3.3-bin.tar.gz
./Fedora-Live-Workstation-x86_64-23-10.iso
kobus@akira:~/Downloads$
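For use inside a backup script, a rough Python equivalent of find . -ctime -10 is sketched below; changed_within is a hypothetical name, and like find it reports the starting directory itself and does not follow symlinks.

```python
import os
import tempfile
import time


def changed_within(root, days):
    """Roughly mirror `find root -ctime -days`: paths whose status changed less than days*24h ago."""
    cutoff = time.time() - days * 24 * 3600
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself (find also reports ".") and then its files.
        for path in [dirpath] + [os.path.join(dirpath, n) for n in filenames]:
            try:
                if os.lstat(path).st_ctime > cutoff:
                    matches.append(path)
            except FileNotFoundError:
                continue  # removed between listing and checking
    return matches


# Usage sketch: a file created just now shows up as changed within 10 days.
with tempfile.TemporaryDirectory() as tmp:
    new_file = os.path.join(tmp, "new.iso")
    open(new_file, "w").close()
    recent = changed_within(tmp, 10)

print(recent)
```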
Most unices do not have a concept of file creation time. You can't make ls print it because the information is not recorded. If you need creation time, use a version control system: define creation time as the check-in time.
If your Unix variant has a creation time, look at its documentation. For example, on Mac OS X (the only example I know of), use ls -tU. Windows also stores a creation time, but it's not always exposed to ports of Unix utilities; for example, Cygwin ls doesn't have an option to show it. The stat utility can show the creation time, called “birth time” in GNU utilities, so under Cygwin you can show files sorted by birth time with stat -c '%W %n' * | sort -k1n.
Note that the ctime (ls -lc) is not the file creation time; it is the inode change time. The inode change time is updated whenever anything about the file changes (contents or metadata), except that it isn't updated when the file is merely read (even if the atime is updated). In particular, the ctime is always at least as recent as the mtime (file content modification time) unless the mtime has been explicitly set to a date in the future.
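A small Linux check of the ctime/mtime distinction, which is also why find -ctime can catch files whose original timestamps were preserved: the sketch backdates a new file's mtime with os.utime (a stand-in, and an assumption, for how a timestamp-preserving Samba copy behaves) and shows that the ctime still reflects when the change happened. The file name is hypothetical.

```python
import os
import tempfile
import time

ONE_DAY = 24 * 3600

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "copied_from_share.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("data\n")
    # Backdate access and modification times by a year, as a timestamp-preserving copy might.
    backdated = time.time() - 365 * ONE_DAY
    os.utime(path, (backdated, backdated))
    st = os.stat(path)

print("mtime age (days):", round((time.time() - st.st_mtime) / ONE_DAY))
print("ctime age (days):", round((time.time() - st.st_ctime) / ONE_DAY))
```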
For newly installed packages on Fedora, the examples below list packages with their installation date and time.
All installed packages, most recent first: $ rpm -qa --last
The latest 100 packages: $ rpm -qa --last | head -100
Append the latest 100 packages to a text file: $ rpm -qa --last | head -100 >> last-100-packages.txt
