OK, we have an ELK cluster with several nodes. When working with it, a couple of questions come up sooner or later:

  1. Which data stream uses the most disk space in the cluster?
  2. Which data stream uses the most disk space on a given node?

I propose answering these questions with bash.

1. Which data stream uses the most disk space in the cluster?

The following command lists data streams sorted by size in ascending order (disk usage in GB).

export elasticURL=https://olol.elasticsearch.com:9200
export elasticCRED=elastic:passwd
# Columns requested: index, store size in bytes, node (-k skips TLS certificate checks).
# gawk strips "-<date>-<generation>" from each backing index name and sums bytes per data stream.
curl -s -N -k -XGET "${elasticURL}/_cat/shards?h=i,sto,n&bytes=b" \
-u "${elasticCRED}" | grep "\.ds-" |
gawk '{sub(/-[^-]*-[^-]*$/,"",$1); a[$1]+=$2}
END {for (i in a) printf "%.5f %s\n", a[i]/1024/1024/1024, i}' |
sort -k 1 -n
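
To see what the aggregation above does without touching a live cluster, here is a minimal sketch that runs the same idea on a few made-up shard lines. The index names, sizes, node names, and the helper functions `sample_shards` and `sum_by_datastream` are all hypothetical; plain `awk` is used instead of `gawk` only for portability.

```shell
# Made-up sample of "index store_bytes node" lines, in the layout the
# _cat/shards query with h=i,sto,n&bytes=b is assumed to produce.
sample_shards() {
  cat <<'EOF'
.ds-logs-app-default-2024.01.01-000001 1073741824 data-1
.ds-logs-app-default-2024.01.01-000001 1073741824 data-2
.ds-logs-app-default-2024.02.01-000002 2147483648 data-1
.ds-metrics-cpu-default-2024.01.15-000001 536870912 data-2
plain-index 999999 data-1
EOF
}

# Keep backing indices only, strip the trailing "-<date>-<generation>",
# sum bytes per data stream, and print "<GiB> <data stream>", smallest first.
sum_by_datastream() {
  grep '^\.ds-' |
  awk '{ sub(/-[^-]*-[^-]*$/, "", $1); total[$1] += $2 }
       END { for (ds in total) printf "%.3f %s\n", total[ds] / 1024 / 1024 / 1024, ds }' |
  sort -k 1 -n
}

sample_shards | sum_by_datastream
```

With this sample, the three `logs-app` shards add up to 4 GiB and the single `metrics-cpu` shard to 0.5 GiB, so the smaller data stream is printed first. Note that primaries and replicas are both counted, which matches real disk usage.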

2. Which data stream uses the most disk space on this node?

This is the same as the cluster-wide query, but with an extra grep that keeps only the shards on one node:

grep "${elasticNode}"

The full command:
export elasticURL=https://olol.elasticsearch.com:9200
export elasticCRED=elastic:passwd
export elasticNode=data-1
# Same pipeline as before, restricted to lines that mention the chosen node.
curl -s -N -k -XGET "${elasticURL}/_cat/shards?h=i,sto,n&bytes=b" \
-u "${elasticCRED}" | grep "${elasticNode}" | grep "\.ds-" |
gawk '{sub(/-[^-]*-[^-]*$/,"",$1); a[$1]+=$2}
END {for (i in a) printf "%.5f %s\n", a[i]/1024/1024/1024, i}' |
sort -k 1 -n

Sometimes you need to add grep -v "node3" after the first grep to exclude node3, because while a shard is moving from one node to another, its line in the "_cat/shards" output shows both nodes.
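
An alternative to chaining grep and grep -v is to compare the node column exactly. The sketch below assumes that a relocating shard's line looks like `<source> -> <ip> <id> <target>` in the node column; the sample lines and the helper names `sample_shards_relocating` and `shards_on_node` are made up for illustration.

```shell
# Made-up sample: the second line is a shard relocating from data-1 to data-3.
sample_shards_relocating() {
  cat <<'EOF'
.ds-logs-app-default-2024.01.01-000001 1073741824 data-1
.ds-logs-app-default-2024.02.01-000002 2147483648 data-1 -> 10.0.0.3 xYz123 data-3
.ds-logs-app-default-2024.02.01-000002 2147483648 data-3
.ds-metrics-cpu-default-2024.01.15-000001 536870912 data-10
EOF
}

# Keep only lines whose third column (the node currently holding the shard)
# is exactly the requested node name.
shards_on_node() {
  awk -v node="$1" '$3 == node'
}

sample_shards_relocating | shards_on_node data-1
```

Matching on the third column means a shard that is leaving data-1 still counts for data-1, a shard that is arriving on data-1 does not, and a node called data-10 is not picked up by accident, which a plain `grep data-1` would do.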

P.S. Once you have found a data stream of interest, you can use the API to inspect its disk usage per field, etc.

curl -k -u "${elasticCRED}" -XPOST "${elasticURL}/my_datastream/_disk_usage?run_expensive_tasks=true"

Data stream sizes across the cluster:

curl -k -u "${elasticCRED}" -XGET "${elasticURL}/_data_stream/_stats"

Generate DELETE commands for empty data streams:

curl -s -XGET "${elasticURL}/_data_stream/_stats" -u elastic:OLOLOL | jq -r '.data_streams[] | select(.maximum_timestamp == 0) | "DELETE _data_stream/" + .data_stream'

Sort data streams by size:

curl -XGET -u ololo:ALALAL http://olol.local:9200/_data_stream/_stats | jq '[.data_streams[]| {ds: .data_stream, sts: .store_size_bytes}] | sort_by(.sts)'
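
The sizes in that output are raw bytes, which are hard to read at a glance. As a rough sketch, a small awk filter can turn "<bytes> <name>" lines into binary units; the helper name `human_sizes` and the sample values are made up for illustration.

```shell
# Convert "<bytes> <name>" lines to "<size> <unit> <name>" using binary units.
human_sizes() {
  awk '{
    split("B KiB MiB GiB TiB", unit, " ")
    size = $1; u = 1
    # Divide by 1024 until the value fits the current unit (capped at TiB).
    while (size >= 1024 && u < 5) { size /= 1024; u++ }
    printf "%.1f %s %s\n", size, unit[u], $2
  }'
}

printf '%s\n' "536870912 metrics-cpu-default" "4294967296 logs-app-default" | human_sizes
```

For the two sample lines this prints 512.0 MiB and 4.0 GiB. To feed it from the stats API, one option is to have jq emit plain lines, for example with jq -r '.data_streams[] | "\(.store_size_bytes) \(.data_stream)"', then pipe through sort -n before human_sizes.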

elasticsearch docs