Python Guide¶
Elasticsearch API¶
CAST leverages the official Elasticsearch Python client library to interact with Elasticsearch. If the node running the API has internet access, the following process may be used to install this library.
A requirements file is provided in the RPM:
pip install -r /opt/ibm/csm/bigdata/python/requirements.txt
If the node doesn't have access to the internet, please refer to the official Python documentation for the installation of wheels: Installing Packages.
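All of the use-case scripts below resolve their Elasticsearch target the same way, defaulting to the CAST_ELASTIC environment variable. A minimal sketch of that convention (the localhost:9200 fallback here is illustrative, not a documented CAST default):

```python
import os

# The use-case scripts default their --target option to the CAST_ELASTIC
# environment variable, which holds a hostname:port pair.
target = os.environ.get("CAST_ELASTIC", "localhost:9200")
hostname, _, port = target.partition(":")

# With the requirements installed, a client would then be built roughly as:
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch([{"host": hostname, "port": int(port or 9200)}])
```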
Big Data Use Cases¶
CAST offers a collection of use case scripts designed to interact with the Big Data Store through the elasticsearch interface.
findJobTimeRange.py¶
This use case may be considered a building block for the remaining ones: it demonstrates the use of the cast-allocation transactional index to retrieve the time range of a job.
The usage of this use case is described by the --help option.
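A sketch of the kind of request body such a script sends to the cast-allocation index; the field names (data.allocation_id, data.begin_time, data.history.end_time) are assumptions here, not a documented schema:

```python
# Build the request body for looking up a job's time range.
# Field names below are assumptions about the cast-allocation index.
def time_range_query(allocation_id):
    return {
        "query": {
            "bool": {
                "must": [{"match": {"data.allocation_id": allocation_id}}]
            }
        },
        # Only the timing fields are needed to establish the range.
        "_source": ["data.begin_time", "data.history.end_time"],
    }

body = time_range_query(42)
```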
findJobKeys.py¶
This use case represents two commingled use cases. First, when supplied a job identifier (allocation ID or job ID) and a keyword (a case-insensitive regular expression), the script generates a listing of keywords and their occurrence rates in records associated with the supplied job. Association is filtered on the time range of the job and the hostnames that participated in the job.
A secondary use case is exposed by the verbose flag, which lets the user see a list of all entries matching the keyword.
usage: findJobKeys.py [-h] [-a int] [-j int] [-s int] [-t hostname:port]
[-k [key [key ...]]] [-v] [--size size]
[-H [host [host ...]]]
A tool for finding keywords in the "message" field during the run time of a job.
optional arguments:
-h, --help show this help message and exit
-a int, --allocationid int
The allocation ID of the job.
-j int, --jobid int The job ID of the job.
-s int, --jobidsecondary int
The secondary job ID of the job (default : 0).
-t hostname:port, --target hostname:port
An Elasticsearch server to be queried. This defaults
to the contents of environment variable
"CAST_ELASTIC".
-k [key [key ...]], --keywords [key [key ...]]
A list of keywords to search for in the Big Data
Store. Case insensitive regular expressions (default :
.*). If your keyword is a phrase (e.g. "xid 13")
regular expressions are not supported at this time.
-v, --verbose Displays any logs that matched the keyword search.
--size size The number of results to be returned. (default=30)
-H [host [host ...]], --hostnames [host [host ...]]
A list of hostnames to filter the results to (filters on the "hostname" field, job independent).
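The filtering described above maps naturally onto a bool query. A sketch under the assumption that logs carry "message", "@timestamp" and "hostname" fields (names mirror the help text, but treat them as assumptions); the hostnames in the example are illustrative:

```python
# Build a keyword search restricted to a job's time range and hosts.
def keyword_query(keyword, start, end, hostnames=None):
    musts = [
        {"regexp": {"message": keyword}},
        {"range": {"@timestamp": {"gte": start, "lte": end}}},
    ]
    if hostnames:
        # Restrict to the nodes that participated in the job.
        musts.append({"terms": {"hostname": hostnames}})
    return {"query": {"bool": {"must": musts}}}

q = keyword_query(".*error.*", "2019-01-01T00:00:00", "2019-01-01T02:00:00",
                  ["node01", "node02"])  # illustrative hostnames
```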
findJobsRunning.py¶
A use case for finding all jobs running at the supplied timestamp. This use case displays a list of jobs whose start time is less than the supplied time and which have either no end time or an end time greater than the supplied time.
usage: findJobsRunning.py [-h] [-t hostname:port] [-T YYYY-MM-DDTHH:MM:SS]
[-s size] [-H [host [host ...]]]
A tool for finding jobs running at the specified time.
optional arguments:
-h, --help show this help message and exit
-t hostname:port, --target hostname:port
An Elasticsearch server to be queried. This defaults
to the contents of environment variable
"CAST_ELASTIC".
-T YYYY-MM-DDTHH:MM:SS, --time YYYY-MM-DDTHH:MM:SS
A timestamp representing a point in time to search for
all running CSM Jobs. HH, MM, SS are optional, if not
set they will be initialized to 0. (default=now)
-s size, --size size The number of results to be returned. (default=1000)
-H [host [host ...]], --hostnames [host [host ...]]
A list of hostnames to filter the results to.
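The "no end time or an end time greater than the supplied time" condition is a classic bool/should pattern. A sketch (field names are assumptions about the cast-allocation index):

```python
# Find jobs running at `timestamp`: started before it, and either still
# running (no end time recorded) or ended after it.
def running_at_query(timestamp):
    return {
        "query": {
            "bool": {
                "must": [{"range": {"data.begin_time": {"lte": timestamp}}}],
                "should": [
                    {"bool": {"must_not":
                        {"exists": {"field": "data.history.end_time"}}}},
                    {"range": {"data.history.end_time": {"gt": timestamp}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }

body = running_at_query("2019-03-01T12:00:00")
```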
findJobMetrics.py¶
Leverages the built-in Elasticsearch statistics functionality. Takes a list of fields and a job identifier, then computes the min, max, average, and standard deviation of those fields. The calculations are computed against all records for the field during the running time of the job on the nodes that participated.
This use case also has the ability to generate correlations between the fields specified.
usage: findJobMetrics.py [-h] [-a int] [-j int] [-s int] [-t hostname:port]
[-H [host [host ...]]] [-f [field [field ...]]]
[-i index] [--correlation]
A tool for finding metrics about the nodes participating in the supplied job
id.
optional arguments:
-h, --help show this help message and exit
-a int, --allocationid int
The allocation ID of the job.
-j int, --jobid int The job ID of the job.
-s int, --jobidsecondary int
The secondary job ID of the job (default : 0).
-t hostname:port, --target hostname:port
An Elasticsearch server to be queried. This defaults
to the contents of environment variable
"CAST_ELASTIC".
-H [host [host ...]], --hostnames [host [host ...]]
A list of hostnames to filter the results to.
-f [field [field ...]], --fields [field [field ...]]
A list of fields to retrieve metrics for (REQUIRED).
-i index, --index index
The index to query for metrics records.
--correlation Displays the correlation between the supplied fields
over the job run.
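Elasticsearch's extended_stats aggregation covers min, max, average and standard deviation in one request, and the matrix_stats aggregation reports pairwise correlation. A sketch of the aggregation body such a script could build (the exact aggregations used by findJobMetrics.py are an assumption):

```python
# Build a stats-only request: size 0 suppresses the raw hits, leaving
# just the aggregation results.
def metrics_aggregation(fields, correlation=False):
    aggs = {f: {"extended_stats": {"field": f}} for f in fields}
    if correlation:
        # matrix_stats reports pairwise correlation across the fields.
        aggs["correlation"] = {"matrix_stats": {"fields": fields}}
    return {"size": 0, "aggs": aggs}

body = metrics_aggregation(["data.cpu_system", "data.mem_active"],
                           correlation=True)
```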
findUserJobs.py¶
Retrieves a list of all jobs owned by the supplied user. This list can be filtered to a time range or on the state of the allocation. If the --commonnodes argument is supplied, a list of nodes is displayed for nodes that participated in more jobs than the supplied threshold. The colliding nodes are sorted by the number of jobs they participated in.
usage: findUserJobs.py [-h] [-u username] [-U userid] [--size size]
[--state state] [--starttime YYYY-MM-DDTHH:MM:SS]
[--endtime YYYY-MM-DDTHH:MM:SS]
[--commonnodes threshold] [-v] [-t hostname:port]
A tool for finding a list of the supplied user's jobs.
optional arguments:
-h, --help show this help message and exit
-u username, --user username
The user name to perform the query on, either this or
-U must be set.
-U userid, --userid userid
The user id to perform the query on, either this or -u
must be set.
--size size The number of results to be returned. (default=1000)
--state state Searches for jobs matching the supplied state.
--starttime YYYY-MM-DDTHH:MM:SS
A timestamp representing the beginning of the absolute
range to look for failed jobs, if not set no lower
bound will be imposed on the search.
--endtime YYYY-MM-DDTHH:MM:SS
A timestamp representing the ending of the absolute
range to look for failed jobs, if not set no upper
bound will be imposed on the search.
--commonnodes threshold
Displays a list of nodes that the user jobs had in
common if set. Only nodes with collisions exceeding
the threshold are shown. (Default: -1)
-v, --verbose Displays all retrieved fields from the `cast-
allocation` index.
-t hostname:port, --target hostname:port
An Elasticsearch server to be queried. This defaults
to the contents of environment variable
"CAST_ELASTIC".
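The user, state and time-range filters compose into one bool query. A sketch (all field names and the username are assumptions for illustration):

```python
# Build a query for a user's jobs, optionally filtered by allocation
# state and a start-time window. Field names are assumptions.
def user_jobs_query(user_name, state=None, start=None, end=None):
    must = [{"match": {"data.user_name": user_name}}]
    if state:
        must.append({"match": {"data.state": state}})
    time_filter = {}
    if start:
        time_filter["gte"] = start
    if end:
        time_filter["lte"] = end
    if time_filter:
        must.append({"range": {"data.begin_time": time_filter}})
    # --commonnodes could then be served by a terms aggregation over the
    # compute-node field, keeping only buckets above the threshold.
    return {"query": {"bool": {"must": must}}}

body = user_jobs_query("someuser", state="complete",
                       start="2019-01-01T00:00:00")
```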
findWeightedErrors.py¶
An extension of the findJobKeys.py use case. This use case queries Elasticsearch for a job, then runs a predefined collection of mappings to assist in debugging a problem with the job.
usage: findWeightedErrors.py [-h] [-a int] [-j int] [-s int]
[-t hostname:port] [-k [key [key ...]]] [-v]
[--size size] [-H [host [host ...]]]
[--errormap file]
A tool which takes a weighted listing of keyword searches and presents
aggregations of this data to the user.
optional arguments:
-h, --help show this help message and exit
-a int, --allocationid int
The allocation ID of the job.
-j int, --jobid int The job ID of the job.
-s int, --jobidsecondary int
The secondary job ID of the job (default : 0).
-t hostname:port, --target hostname:port
An Elasticsearch server to be queried. This defaults
to the contents of environment variable
"CAST_ELASTIC".
-v, --verbose Displays the top --size logs matching the --errormap mappings.
--size size The number of results to be returned. (default=10)
-H [host [host ...]], --hostnames [host [host ...]]
A list of hostnames to filter the results to.
--errormap file A map of errors to scan the user jobs for, including
weights.
JSON Mapping Format¶
This use case utilizes a JSON mapping to define a collection of keywords and values to query the Elasticsearch cluster for. These values can leverage the native Elasticsearch boost feature to apply weights to the mappings, allowing a user to quickly identify high-priority items through scoring.
The format is defined as follows:
[
{
"category" : "A category, used for tagging the search in output. (Required)",
"index" : "Matches an index on the elasticsearch cluster, uses elasticsearch syntax. (Required)",
"source" : "The hostname source in the index.",
"mapping" : [
{
"field" : "The field in the index to check against. (Required)",
"value" : "A value to query for; can be a phrase, regex or number. (Required)",
"boost" : "The elasticsearch boost factor, may be thought of as a weight. (Required)",
"threshold" : "A range comparison operator: 'gte', 'gt', 'lte', 'lt'. (Optional)"
}
]
}
]
When applied to a real configuration, a mapping file will look something like this:
[
{
"index" : "*syslog*",
"source" : "hostname",
"category": "Syslog Errors" ,
"mapping" : [
{
"field" : "message",
"value" : "error",
"boost" : 50
},
{
"field" : "message",
"value" : "kdump",
"boost" : 60
},
{
"field" : "message",
"value" : "kernel",
"boost" : 10
}
]
},
{
"index" : "cast-zimon*",
"source" : "source",
"category" : "Zimon Counters",
"mapping" : [
{
"field" : "data.mem_active",
"value" : 12000000,
"boost" : 100,
"threshold" : "gte"
},
{
"field" : "data.cpu_system",
"value" : 10,
"boost" : 200,
"threshold" : "gte"
}
]
}
]
Note
The above configuration was designed for demonstration purposes; it is recommended that users create their own mappings based on this example.
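Each mapping entry translates roughly into one boosted query clause: keyword entries become boosted match clauses, and entries with a "threshold" become boosted range clauses. A sketch of that translation, mirroring the JSON format above rather than the script's exact implementation:

```python
# Turn one mapping entry into an Elasticsearch query clause.
def clause_from_mapping(entry):
    field, value, boost = entry["field"], entry["value"], entry["boost"]
    if "threshold" in entry:
        # Numeric entries compare against the value with the given
        # operator ('gte', 'gt', 'lte', 'lt').
        return {"range": {field: {entry["threshold"]: value,
                                  "boost": boost}}}
    # Keyword entries score matching records higher via the boost.
    return {"match": {field: {"query": value, "boost": boost}}}

clause = clause_from_mapping(
    {"field": "data.cpu_system", "value": 10,
     "boost": 200, "threshold": "gte"})
```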
UFM Collector¶
A tool for interacting with the UFM collector is provided in ibm-csm-bds-*.noarch.rpm.
This script performs three key operations:
- Connects to the UFM monitoring snapshot RESTful interface.
  - This connection specifies a collection of attributes and functions to execute against the interface.
- Processes and enriches the output of the REST connection.
  - Adds a type, timestamp and source field to the root of the JSON document.
- Opens a socket to a target logstash instance and writes the payload.
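The enrichment step can be sketched as follows, under the assumption that the snapshot payload is a JSON object; the type name and source hostname are illustrative:

```python
import time

# Add type, timestamp and source fields to the root of the document,
# as the collector's enrichment step does.
def enrich(payload, source):
    payload["type"] = "ufm-counters"  # illustrative type name
    payload["timestamp"] = time.strftime("%Y-%m-%dT%H:%M:%S")
    payload["source"] = source
    return payload

doc = enrich({"counters": {}}, "ufm-server-01")
# Shipping to logstash would then be a plain TCP socket write, e.g.:
#   with socket.create_connection((logstash_host, logstash_port)) as s:
#       s.sendall((json.dumps(doc) + "\n").encode())
```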
Beats¶
The following scripts are bundled in the /opt/ibm/csm/bigdata/beats/ directory. They are generally used to regenerate logs for filebeat ingestion.
csmTransactionRebuild.py¶
Script Location: /opt/ibm/csm/bigdata/beats/csmTransactionRebuild.py
RPM: ibm-csm-bds-*.noarch.rpm
This script is used to regenerate the CSM transaction log from the PostgreSQL database. When using this script for the first time, it is recommended to back up your original transaction logs.
The core objective of this script is to repair issues with the transactional index that were exposed in the transitory steps of the CSM Big Data development. As such, this script should only be run on clusters which were running pre-1.5.0 level code.
usage: csmTransactionRebuild.py [-h] [-d db] [-u user] [-o output]
A tool for regenerating the csm transactional logs from the database.
optional arguments:
-h, --help show this help message and exit
-d db, --database db Database to archive tables from. Default: csmdb
-u user, --user user The database user. Default: postgres
-o output, --output output
The output file, overwrites existing file. Default:
csm-transaction.log
Transition scripts¶
Note
The following scripts are NOT shipped in the RPMs.
Between major versions, fields may sometimes be renamed in the Big Data Store (this is generally only done in the event of a major bug). When CSM performs such a change, a transition script will be provided on the GitHub repository in the csm_big_data/transition-scripts directory.
metric-transaction_140-150.py¶
Performs the transition from the 1.4.0 metric and transaction logs to 1.5.0.
# ./metric-transaction_140-150.py -h
usage: metric-transaction_140-150.py [-h] -f file-glob [--overwrite]
A tool for converting 1.4.0 CSM BDS logs to 1.5.0 CSM BDS logs.
optional arguments:
-h, --help show this help message and exit
-f file-glob, --files file-glob
A file glob containing the bds logs to run the fix
operations on.
--overwrite If set the script will overwrite the old files.
Default writes new file *.fixed.
The following commands will migrate the old logs to the new format:
./metric-transaction_140-150.py -f '/var/log/ibm/csm/csm_transaction.log*' --overwrite
./metric-transaction_140-150.py -f '/var/log/ibm/csm/csm_allocation_metrics.log*' --overwrite
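At its core, a transition script of this kind streams each JSON log line, renames any old field names, and writes the result back out. A sketch of that shape; the "old_name"/"new_name" pair is purely hypothetical, as the real renames are defined by the script itself:

```python
import json

# Hypothetical rename table; the actual 1.4.0 -> 1.5.0 renames live in
# the transition script.
RENAMES = {"old_name": "new_name"}

def transition_line(line):
    record = json.loads(line)
    for old, new in RENAMES.items():
        if old in record:
            record[new] = record.pop(old)
    return json.dumps(record)

fixed = transition_line('{"old_name": 1, "kept": 2}')
```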
Note
If performing this transition, the old data may need to be purged from BDS (in the case of the metrics log especially).