Redis Fast File System Search

I am new to Redis, and
I would like to iterate over all files and folders on a given computer and save them in a Redis DB,
so I can search for files or folders by their name later.
I wonder how I should store the data in Redis and how I can make the search as fast as possible?
Thanks

Your requirement can be divided into two parts:
Iterate the file system and save it in Redis.
Get from Redis and search.
The choice of data type depends on how the data will be used.
Redis provides us a key/value relationship.
Taking some sample data:
File Name    Location
----------------------------
Sys.log      /root/tmp
info.txt     /var/log
redis.log    /var/log/redis
abc.log      /app/task
abc.log      /home/test
Note that there can be n files with the same name at different locations. This means we cannot use a plain key/value layout with file names as keys.
One relationship to keep in mind is parent-child: a directory (parent) contains files (children) or other directories.
There should also be a way to distinguish files from directories.
Solution:
Iterate the file system and store in Redis
(Directory design)
Create a Redis set per directory whose members are the file names and sub-directories it contains; each sub-directory in turn has its own set.
Every entry in a set should carry a prefix that identifies whether it is a file or a directory. If it is a directory, you can follow it to search for more files.
This lets us use the sets to print all of a directory's children.
127.0.0.1:6379> SADD "/var/log" "File:info.txt"
(integer) 1
127.0.0.1:6379> SMEMBERS "/var/log"
1) "File:info.txt"
127.0.0.1:6379> SADD "/var/log" "Dir:redis"
(integer) 1
127.0.0.1:6379> SMEMBERS "/var/log"
1) "Dir:redis"
2) "File:info.txt"
Contents of the redis directory's set:
127.0.0.1:6379> SADD "redis" "redis.log"
(integer) 1
127.0.0.1:6379> SADD "redis" "error.log"
(integer) 1
127.0.0.1:6379> SMEMBERS redis
1) "redis.log"
2) "error.log"
Search Redis
(Input a file name, print all possible locations where it is present.)
While iterating the file system and creating the directory sets, we also create, in parallel, a list per file name that stores every directory in which that file appears.
The contents of a file's list show all the locations where it is present.
127.0.0.1:6379> lpush "info.txt" "/var/log"
(integer) 1
127.0.0.1:6379> lpush "info.txt" "/tmp"
(integer) 2
127.0.0.1:6379> lrange "info.txt" 0 -1
1) "/tmp"
2) "/var/log"
Note: for better performance with Redis, try to execute groups of commands in one go, i.e. use MULTI or EVAL (Lua scripts).
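Putting both structures together, here is a minimal indexing sketch, again assuming the redis-py client; build_index is a hypothetical name, and here the directory sets are keyed by full paths. It batches the writes through a pipeline, in line with the note above.
import os
import redis

r = redis.Redis(decode_responses=True)  # assumes a local Redis instance

def build_index(root):
    pipe = r.pipeline()  # batch the writes so network round-trips don't dominate
    for dirpath, dirnames, filenames in os.walk(root):
        for d in dirnames:
            pipe.sadd(dirpath, "Dir:" + d)   # sub-directory entry in the parent's set
        for f in filenames:
            pipe.sadd(dirpath, "File:" + f)  # file entry in the parent's set
            pipe.lpush(f, dirpath)           # file name -> every location it appears in
    pipe.execute()

build_index("/var/log")
For a large tree you would execute the pipeline in chunks instead of all at once.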
Hope this gives you a start in your design.

Related

How to see key count of matching pattern in Azure Redis Cache Console

I want to see just the total number of keys in an Azure Redis cache that match a given pattern. I tried the following command, but it shows the count only after displaying all the keys (which caused server load); I need only the count.
>SCAN 0 COUNT 10000000 MATCH "{UID}*"
Besides the SCAN command, the command KEYS pattern can return the same result as your current command SCAN 0 COUNT 10000000 MATCH "{UID}*".
However, for your real need of getting the number of keys matching a pattern, there is an issue ("Add COUNT command") on the official Redis GitHub repo, which the author antirez answered as follows:
Hi, KEYS is only intended for debugging since it is O(N) and performs a full keyspace scan. COUNT has the same problem but without the excuse of being useful for debugging... (since you can simply use redis-cli keys ... | grep ...). So feature not accepted. Thanks for your interest.
So you cannot directly get the count of KEYS pattern, but there are some possible solutions for you.
1. Count the keys returned by KEYS pattern in your programming language when the number of matching keys is small, for example by running redis-cli KEYS "{UID}*" | wc -l on the Redis host.
2. Use the command EVAL script numkeys key [key ...] arg [arg ...] to run a Lua script that counts the keys matching the pattern; there are two scripts you can try.
2.1. Script 1
return #redis.call("keys", "{UID}*")
2.2. Script 2
return table.getn(redis.call('keys', ARGV[1]))
The complete command in redis-cli is EVAL "return table.getn(redis.call('keys', ARGV[1]))" 0 {UID}*
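If blocking the server with KEYS is a concern, a client-side SCAN loop is a non-blocking alternative; here is a sketch assuming the redis-py client.
import redis

r = redis.Redis(decode_responses=True)

# Walk the keyspace incrementally and count the matches client-side,
# so the server is never blocked by a full keyspace scan.
count = sum(1 for _ in r.scan_iter(match="{UID}*", count=1000))
print(count)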

Extracting a range of keys from leveldb or redis

I would like to extract a range of keys from either LevelDB or Redis. For example, I have the following key structure:
group:1/member:1
group:1/member:1/log:1
group:1/member:1/log:2
group:1/member:1/log:3
group:1/member:1/log:4
group:1/member:2
group:1/member:2/log:1
group:1/member:2/log:2
group:1/member:3
group:1/member:3/log:1
I would like to get all members (member:1, member:2, member:3) but I do not want their log entries to be included in the results (there may be thousands of logs). What is the best approach to achieving this with a KV store like Redis or LevelDB?
For LevelDB, you can use a leveldb::Iterator to iterate the key space and only keep the keys that match your pattern.
For Redis, you can use the SCAN command to scan the key space with a pattern.
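For the Redis side, here is a sketch assuming the redis-py client: glob patterns cannot express "member but not log", so the log keys are filtered out client-side.
import redis

r = redis.Redis(decode_responses=True)

# SCAN with a glob pattern, then drop the log entries client-side,
# since the MATCH pattern cannot exclude the "/log:" suffix by itself.
members = [key for key in r.scan_iter(match="group:1/member:*")
           if "/log:" not in key]
print(members)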

select all and truncate redis database

I'm looking for something similar to BLPOP, but instead of popping one element I want to get them all and run over them in a loop.
That is, I want to get all the records of a Redis list and then truncate it.
Consider using a Lua script to do the LRANGE+DEL atomically (a sketch follows the RENAME example below).
Or use RENAME to move the list to a temporary key which you will use to process the data.
RENAME yourlist temp-list
LRANGE temp-list 0 -1
... process the list
DEL temp-list
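For the Lua route, here is a minimal sketch assuming the redis-py client and a list named yourlist; the embedded script reads and deletes the list in one atomic server-side step.
import redis

r = redis.Redis(decode_responses=True)

# LRANGE + DEL executed atomically on the server: no other client can
# push to or read the list between the read and the delete.
drain_list = r.register_script("""
    local items = redis.call('LRANGE', KEYS[1], 0, -1)
    redis.call('DEL', KEYS[1])
    return items
""")

for item in drain_list(keys=["yourlist"]):
    print(item)  # process each record after the list has been truncated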

How to sum the values of all keys in Redis

In a Redis DB, I have many String-type keys that save download times for an application,
for example:
Key          Value
20131028:1   100
20131028:2   15
20131028:3   10
..........
I want to sum the values of all keys with a Redis command. Please help me solve it. Thank you so much.
Redis is not designed to do this kind of thing. You would be better served by an RDBMS, MongoDB, or something like Elasticsearch.
Still, if you need to do it (to be launched from a shell):
$ redis-cli keys '20131028:*' | awk '{print "get "$1}' | redis-cli | awk '{x+=$1} END { print x }'
An other way to do it is to use a Lua server-side script:
$ redis-cli eval "local keys = redis.call('keys',KEYS[1]) ; local sum=0 ; for _,k in ipairs(keys) do sum = sum + redis.call('get',k) end ; return sum" 1 '20131028:*'
In both cases, performance will suck if you have many keys in the Redis instance, and the instance will be blocked for all connections while the keys are scanned.
Available since Redis v2.6 is the most awesome ability to execute Lua scripts on the Redis server.
127.0.0.1:6379> EVAL "local sum = 0 local i=1 local a1 = redis.call('hvals','Key') while(a1[i]) do sum=sum+a1[i] i=i+1 end return sum" 0
An important note is that Redis server-side Lua scripts block EVERYTHING, which can be a dealbreaker in most cases. Source: stackoverflow.com/a/30896608/2440
I think it's better to use MGET and fetch all keys with one command instead of issuing one command for each key. That way you get all results with only one call to Redis and just have to sum them up. Of course this only works if you know the keys beforehand ...
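For example, here is a sketch of that MGET approach assuming the redis-py client; it discovers the keys with SCAN first, which relaxes the "know the keys beforehand" requirement.
import redis

r = redis.Redis(decode_responses=True)

# Collect the matching key names (SCAN avoids blocking the server),
# then fetch all values in a single MGET round-trip and sum them.
keys = list(r.scan_iter(match="20131028:*"))
values = r.mget(keys) if keys else []
total = sum(int(v) for v in values if v is not None)
print(total)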
Use redis lua:
eval "local s = 0 for _,v in ipairs(redis.call('hvals', KEYS[1])) do s = s + v end return s" 1 Key
You could save the script with SCRIPT LOAD, then reuse it with EVALSHA for any hash key.
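For instance, a sketch assuming the redis-py client, whose register_script helper wraps SCRIPT LOAD and EVALSHA:
import redis

r = redis.Redis(decode_responses=True)

# register_script loads the script once and invokes it via EVALSHA afterwards.
sum_hash = r.register_script(
    "local s = 0 "
    "for _, v in ipairs(redis.call('hvals', KEYS[1])) do s = s + v end "
    "return s"
)

print(sum_hash(keys=["Key"]))  # reusable for any hash key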

Matching text files from a list of system numbers

I have ~60K bibliographic records, which can be identified by system number. These records also have their full text (individual text files named by system number).
I have lists of system numbers in bunches of 5K and I need to find a way to copy only the text files from each 5K list.
All text files are stored in a directory (/fulltext) and are named something along these lines:
014776324.txt.
The 5K lists are plain text stored in separate directories (e.g. /5k_list_1, /5k_list_2, ...), where each system number corresponds to a .txt file.
For example: bibliographic record 014776324 corresponds to 014776324.txt.
I am struggling to find a way to copy into the 5k_list_* folders only the corresponding text files.
Any idea?
Thanks indeed,
Let's assume we invoke the following script this way:
./the-script.sh fulltext 5k_list_1 5k_list_2 [...]
Or more succinctly:
./the-script.sh fulltext 5k_list_*
Then try using this (totally untested) script:
#!/usr/bin/env bash
set -eu                  # abort on errors and on use of unset variables

src_dir=$1               # first argument: directory holding the full-text files
shift 1

for list_dir; do         # iterates over the remaining arguments (the list directories)
    # assumes each list directory holds a file named list.txt whose lines start with a system number
    while read -r sys_num _rest; do
        cp "$src_dir/$sys_num.txt" "$list_dir/"
    done < "$list_dir/list.txt"
done
