I want to create a two node cluster in Cassandra. I have done following changes in my yaml file -
Example:
Node 1
cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider: class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
‐ seeds: "10.168.66.41,10.176.170.59"
listen_address:10.168.66.41
rpc_address:10.168.66.41
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap : false
Node 2
cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider: class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
‐ seeds: "10.168.66.41"
listen_address:10.176.170.59
rpc_address:10.176.170.59
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap : false
But still I am not able to create two node cluster. Why am I facing this issue?
Well, it's hard to know without seeing an actual error message from your system.log, but I'll take a guess. It looks like you might have a chicken-before-the-egg problem, based on your seed nodes.
10.176.170.59 won't be able to start without 10.168.66.41 already running. And while .41 has itself specified as a seed node, it also has .59 specified, which might throw things off.
My recommendation is to change your seed list to be the same on all (both) nodes. Just set it to this on both:
seeds: "10.168.66.41"
Then, start .41, which should come up. Then start .59.
If that doesn't do it, look for exceptions in your system.log.
Auto bootstrap should be set to true when a new node is added in a cluster.
So set auto bootstrap to true and set your seed node as one node E.g. in your case it 10.168.66.41 (or) 10.176.70.59.
Start your seed node first
Telnet your seed node and storage port (default 7000) from your secondary node, if not able to telnet then check your firewall settings.
Start your secondary node now
Related
During a shuffle, the mappers dump their outputs to the local disk from where it gets picked up by the reducers. Where exactly on the disk are those files dumped? I am running pyspark cluster on YARN.
What I have tried so far:
I think the possible locations where the intermediate files could be are (In the decreasing order of likelihood):
hadoop/spark/tmp. As per the documentation at the LOCAL_DIRS env variable that gets defined by the yarn.
However, post starting the cluster (I am passing master --yarn) I couldn't find any LOCAL_DIRS env variable using os.environ but, I can see SPARK_LOCAL_DIRS which should happen only in case of mesos or standalone as per the documentation (Any idea why that might be the case?). Anyhow, my SPARK_LOCAL_DIRS is hadoop/spark/tmp
tmp. Default value of spark.local.dir
/home/username. I have tried sending custom value to spark.local.dir while starting the pyspark using --conf spark.local.dir=/home/username
hadoop/yarn/nm-local-dir. This is the value of yarn.nodemanager.local-dirs property in yarn-site.xml
I am running the following code and checking for any intermediate files being created at the above 4 locations by navigating to each location on a worker node.
The code I am running:
from pyspark import storagelevel
df_sales = spark.read.load("gs://monsoon-credittech.appspot.com/spark_datasets/sales_parquet")
df_products = spark.read.load("gs://monsoon-credittech.appspot.com/spark_datasets/products_parquet")
df_merged = df_sales.join(df_products,df_sales.product_id==df_products.product_id,'inner')
df_merged.persist(storagelevel.StorageLevel.DISK_ONLY)
df_merged.count()
There are no files that are being created at any of the 4 locations that I have listed above
As suggested in one of the answers, I have tried getting the directory info in the terminal the following way:
At the end of log4j.properties file located at $SPARK_HOME/conf/ add log4j.logger.or.apache.spark.api.python.PythonGatewayServer=INFO
This did not help. The following is the screenshot of my terminal with logging set to INFO
Where are the spark intermediate files (output of mappers, persist etc) stored?
Without getting into the weeds of Spark source, perhaps you can quickly check it live. Something like this:
>>> irdd = spark.sparkContext.range(0,100,1,10)
>>> def wherearemydirs(p):
... import os
... return os.getenv('LOCAL_DIRS')
...
>>>
>>> irdd.map(wherearemydirs).collect()
>>>
...will show local dirs in terminal
/data/1/yarn/nm/usercache//appcache/<application_xxxxxxxxxxx_xxxxxxx>,/data/10/yarn/nm/usercache//appcache/<application_xxxxxxxxxxx_xxxxxxx>,/data/11/yarn/nm/usercache//appcache/<application_xxxxxxxxxxx_xxxxxxx>,...
But yes, it will basically point to the parent dir (created by YARN) of UUID-randomized subdirs created by DiskBlockManager, as #KoedIt mentioned:
:
23/01/05 10:15:37 INFO storage.DiskBlockManager: Created local directory at /data/1/yarn/nm/usercache/<your-user-id>/appcache/application_xxxxxxxxx_xxxxxxx/blockmgr-d4df4512-d18b-4dcf-8197-4dfe781b526a
:
This is going to depend on what your cluster setup is and your Spark version, but you're more or less looking at the correct places.
For this explanation, I'll be talking about Spark v3.3.1. which is the latest version as of the time of this post.
There is an interesting method in org.apache.spark.util.Utils called getConfiguredLocalDirs and it looks like this:
/**
* Return the configured local directories where Spark can write files. This
* method does not create any directories on its own, it only encapsulates the
* logic of locating the local directories according to deployment mode.
*/
def getConfiguredLocalDirs(conf: SparkConf): Array[String] = {
val shuffleServiceEnabled = conf.get(config.SHUFFLE_SERVICE_ENABLED)
if (isRunningInYarnContainer(conf)) {
// If we are in yarn mode, systems can have different disk layouts so we must set it
// to what Yarn on this system said was available. Note this assumes that Yarn has
// created the directories already, and that they are secured so that only the
// user has access to them.
randomizeInPlace(getYarnLocalDirs(conf).split(","))
} else if (conf.getenv("SPARK_EXECUTOR_DIRS") != null) {
conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator)
} else if (conf.getenv("SPARK_LOCAL_DIRS") != null) {
conf.getenv("SPARK_LOCAL_DIRS").split(",")
} else if (conf.getenv("MESOS_SANDBOX") != null && !shuffleServiceEnabled) {
// Mesos already creates a directory per Mesos task. Spark should use that directory
// instead so all temporary files are automatically cleaned up when the Mesos task ends.
// Note that we don't want this if the shuffle service is enabled because we want to
// continue to serve shuffle files after the executors that wrote them have already exited.
Array(conf.getenv("MESOS_SANDBOX"))
} else {
if (conf.getenv("MESOS_SANDBOX") != null && shuffleServiceEnabled) {
logInfo("MESOS_SANDBOX available but not using provided Mesos sandbox because " +
s"${config.SHUFFLE_SERVICE_ENABLED.key} is enabled.")
}
// In non-Yarn mode (or for the driver in yarn-client mode), we cannot trust the user
// configuration to point to a secure directory. So create a subdirectory with restricted
// permissions under each listed directory.
conf.get("spark.local.dir", System.getProperty("java.io.tmpdir")).split(",")
}
}
This is interesting, because it makes us understand the order of precedence each config setting has. The order is:
if running in Yarn, getYarnLocalDirs should give you your local dir, which depends on the LOCAL_DIRS environment variable
if SPARK_EXECUTOR_DIRS is set, it's going to be one of those
if SPARK_LOCAL_DIRS is set, it's going to be one of those
if MESOS_SANDBOX and !shuffleServiceEnabled, it's going to be MESOS_SANDBOX
if spark.local.dir is set, it's going to be that
ELSE (catch-all) it's going to be java.io.tmpdir
IMPORTANT: In case you're using Kubernetes, all of this is disregarded and this logic is used.
Now, how do we find this directory?
Luckily, there is a nicely placed logging line in DiskBlockManager.createLocalDirs which prints out this directory if your logging level is INFO.
So, set your default logging level to INFO in log4j.properties (like so), restart your spark application and you should be getting a line saying something like
Created local directory at YOUR-DIR-HERE
[Question posted by a user on YugabyteDB Community Slack]
I'm just testing yugabyte under some negative tests and I'm facing a kind of issue. I'm running a 3 node cluster (master and tserver running on same node) When I stop one node and start it up again, T-server is not booting up under this log
F20220506 06:50:49 ../../src/yb/tserver/tablet_server_main.cc:220] Invalid argument (yb/util/universe_key_manager.cc:73): Could not init Tablet Manager: Failed to open tablet metadata for tablet: eb1e5457022f42c084148ca8fa4ba5c6: Failed to load tablet metadata for tablet id eb1e5457022f42c084148ca8fa4ba5c6: Co
uld not load Raft group metadata from /data/yugabyte/data/yb-data/tserver/tablet-meta/eb1e5457022f42c084148ca8fa4ba5c6: Key with version number c7b91fad-dd60-404f-8846-cab568e52468 does not exist.
# 0x7fcdace5ee4c yb::LogFatalHandlerSink::send()
# 0x7fcdaa5e28ee google::LogMessage::SendToLog()
# 0x7fcdaa5dfa7a google::LogMessage::Flush()
# 0x7fcdaa5e3169 google::LogMessageFatal::~LogMessageFatal()
# 0x4124ea yb::tserver::(anonymous namespace)::TabletServerMain()
# 0x7fcda6811825 __libc_start_main
# 0x410f99 _start
# (nil) (unknown)
The only way to start it up is remove the old data.
My steps were:
1.- Cluster up with 3 server
2.- Create a taable with 3 partition on different tablet id confirmed via UI)
3.- Insert 3 different row to diff partition
4.- Select * working fine
5.- Shut down one Table server
6.- Select * working fine
7.- Starting up the table server (error)
We are running using config file:
/usr/local/yugabyte/src/yugabyte-2.11.0.1/bin/./yb-tserver --flagfile /data/yugabyte/etc/tserver.conf
and config:
--tserver_master_addrs=ip1:7100,ip2:7100,ip3:7100
--rpc_bind_addresses=fqdn
--server_broadcast_addresses=ip1
--enable_ysql
--pgsql_proxy_bind_address=ip1:5433
--cql_proxy_bind_address=ip1:9042
--fs_data_dirs=/data/yugabyte/data
--placement_cloud=cloud
--placement_region=reg
--placement_zone=zone
--use_client_to_server_encryption=true
--certs_for_client_dir=/data/yugabyte/ssl
--certs_dir=/data/yugabyte/ssl
--use_node_to_node_encryption=true
--ysql_enable_auth=true
--log_dir=/data/yugabyte/logs
--ssl_protocols=tls12,tls13
--ysql_pg_conf=pgaudit.log='DDL',pgaudit.log_level=notice,pgaudit.log_client=ON,log_min_messages=notice,log_line_prefix='\%m \%r \%u \%d [\%p]'
Looks like the key is not in memory:
/usr/local/yugabyte/src/yugabyte-2.11.0.1/bin/yb-admin -master_addresses $master get_universe_config
{"version":2,"replicationInfo":{"liveReplicas":{"numReplicas":3,"placementBlocks":[{"cloudInfo":{"placementCloud":"cloud","placementRegion":"region","placementZone":"zone"},"minNumReplicas":1}]}},"clusterUuid":"dccea8cb-9790-48ba-8a05-6218a8e875a4","encryptionInfo":{"encryptionEnabled":true,"universeKeyRegistryEncoded":"sZTzNciYu6b1KxZonpJx6v7CDDvexiv1jh/HIEAOkpV4YRrIZbIK9jtajdEMmVEUy706+dmz8bmnZvy6/n33u+qS7fzRSOTPOlpxYI6+k1lSM6bu2DRTTffhZtaiKN15gy8a3ifaZV7xJ9QJ3z9SvFYzb96+KDWw","keyPath":"/data/yugabyte/rest/universe_key","latestVersionId":"c7b91fad-dd60-404f-8846-cab568e52468","keyInMemory":false}}
KeyInMemory: False
We are using encryption at rest but the the file with the key should be there.
Am I doing something wrong?
usr/local/yugabyte/src/yugabyte-2.11.0.1/bin/yb-admin -master_addresses fqdn:7100 all_masters_have_universe_key_in_memory 7e13c99e-5278-4abd-ab78-79f70d6c2679
Error running all_masters_have_universe_key_in_memory: Operation failed. Try again. (yb/tools/yb-admin_client_ent.cc:1027): Unable to check whether master has universe key in memory.: Node fqdn:7100 does not have universe key in memory
The above error seems to indicate the tserver is looking for a key - “c7b91fad-dd60-404f-8846-cab568e52468” in order to open the file, but can’t find it.
Realizing that there’s an actual key file on the masters. That’s been deprecated for using more secure in-memory keys. I just did a little digging through the code, and sure enough, we don’t actually support sending on-disk keys to tablet servers on restart.
The command you have to run is this one:
yb-admin -master_addresses <master-addresses> add_universe_keys_to_all_masters <key_id> <key_path>
and then right after that it should work:
/usr/local/yugabyte/src/yugabyte-2.11.0.1/bin/yb-admin -master_addresses $master all_masters_have_universe_key_in_memory 1
Node fqdn1:7100 has universe key in memory: 1
Node fqdn2:7100 has universe key in memory: 1
Node fqdn3:7100 has universe key in memory: 1
I'm trying to log the properties for each Spark application that run in one Yarn cluster ( properties like spark.shuffle.compress, spark.reducer.maxMbInFlight, spark.executor.instances and so on ).
However i don't know if this information is logged anywhere. I know that we can access to the yarn logs through the "yarn" command but the properties I'm talking about are not store there.
Is there anyway to access to this kind of info?. The idea is to have a trace of all the applications that run in the cluster together with its properties to identify which ones have the most impact in their execution time.
You could log it yourself... use sc.getConf.toDebugString, sqlContext.getConf("") or sqlContext.getAllConfs.
scala> sqlContext.getConf("spark.sql.shuffle.partitions")
res129: String = 200
scala> sqlContext.getAllConfs
res130: scala.collection.immutable.Map[String,String] = Map(hive.server2.thrift.http.cookie.is.httponly -> true, dfs.namenode.resource.check.interval ....
scala> sc.getConf.toDebugString
res132: String =
spark.app.id=local-1449607289874
spark.app.name=Spark shell
spark.driver.host=10.5.10.153
Edit: However, I could not find the properties you specified among the 1200+ properties in sqlContext.getAllConfs :( Otherwise the documentation says:
The application web UI at http://:4040 lists Spark properties
in the “Environment” tab. This is a useful place to check to make sure
that your properties have been set correctly. Note that only values
explicitly specified through spark-defaults.conf, SparkConf, or the
command line will appear. For all other configuration properties, you
can assume the default value is used.
I use Titan 0.4.0 All, running Rexster in shared VM mode on Ubuntu 12.04.
How could I properly delete a graph in Titan which is using the Cassandra storage backend?
I have tried the TitanCleanup.clear(graph), but it does not delete everything. The indices are still there. My real issue is that I have an index which I don't want (it crashes every query), however as I understand Titan's documentation it is impossible to remove an index once it is created.
You can clear all the edges/vertices with:
g.V.remove()
but as you have found that won't clear the types/indices previously created. The most cleanly option would be to just delete the Cassandra data directory.
If you are executing the delete via a unit test you might try to do this as part of your test setup:
this.config = new BaseConfiguration(){{
addProperty("storage.backend", "berkeleyje")
addProperty("storage.directory", "/tmp/titan-schema-test")
}}
GraphDatabaseConfiguration graphconfig = new GraphDatabaseConfiguration(config)
graphconfig.getBackend().clearStorage()
g = (StandardTitanGraph) TitanFactory.open(config)
Be sure to call g.shutdown() in your test teardown method.
Just to update this answer.
With Titan 1.0.0 this can be done programmatically in Java with:
TitanGraph graph = TitanFactory.open(config);
graph.close();
TitanCleanup.clear(graph);
For the continuation of Titan called JanusGraph, the command is JanusGraphFactory.clear(graph) but is soon to be JanusGraphCleanup.clear(graph).
As was mentioned in one of the comments to the earlier answer DROPping a keyspace titan using cqlsh should do it:
cqlsh> DROP KEYSPACE titan;
The name of the keyspace Titan uses is set up using storage.cassandra.keyspace configuration option. You can change it to whatever name you want and is acceptable by Cassandra.
storage.cassandra.keyspace=hello_titan
When Cassandra is getting up, it prints out the keyspace's name as follows:
INFO 19:50:32 Create new Keyspace: KSMetaData{name=hello_titan,
strategyClass=SimpleStrategy, strategyOptions={replication_factor=1},
cfMetaData={}, durableWrites=true,
userTypes=org.apache.cassandra.config.UTMetaData#767d6a9f}
In 0.9.0-M1, the name appears in Titan's log in DEBUG (set log4j.rootLogger=DEBUG, stdout in conf/log4j-server.properties):
[DEBUG] AstyanaxStoreManager - Found keyspace titan
or the following when it doesn't:
[DEBUG] AstyanaxStoreManager - Creating keyspace titan...
[DEBUG] AstyanaxStoreManager - Created keyspace titan
i use bigcouch as my project...
i open 3 node ( default )
everything fine until one node suddenly down ( one server crash )
why if one node down, input process stuck...?
i read the documentation...
i try to set N = 1 ( replicate constant ) , R = 1 (read qourum constant ), and W = 1 (write qourum constant)...
i think my conf mean if 1 write and 1 replicate happen to server that's enaugh to return 201 status.
and then i made issue in bigcouch github..
i get the answer that i must set setting to default...
i already set the setting into default but bigcouch still stuck if one from three node down...
this 3 node i input in "nodes" database:
bigcouch#bigserver1.server1
bigcouch#bigserver2.server2
bigcouch#bigserver3.server3
and this error i get if i create a database via futon on one node down condition...
{timeout,[{{shard,undefined,'bigcouch#bigserver1.server1',undefined,undefined, #Ref}, ok}, {{shard,undefined,'bigcouch#bigserver2.server2',undefined,undefined, #Ref}, ok}, {{shard,undefined,'bigcouch#bigserver3.server3',undefined,undefined, #Ref}, nil}]}
need 10 minutes until this error come out...
this happen to with my node.js apps, and made my node.js apps stuck for 10 minutes
This is a known limitation of BigCouch 0.3. In 0.4 you will be able to create and delete databases as long as a majority of nodes is online.