Elasticsearch Calculate the Storage Size of Data streams... - Temporary Blog

OK, we have an ELK cluster with several nodes. When working with it, a few questions come up from time to time:

Which data stream uses the most disk space in the cluster?
Which data stream uses the most disk space on a given node?

I propose answering these questions with bash.

1. Which data stream uses the most disk space in the cluster?

The following command lists data streams sorted by disk usage in ascending order (sizes in GiB). The _cat/shards output includes replica shards as well as primaries, so the totals cover both.

export elasticURL=https://olol.elasticsearch.com:9200
export elasticCRED=elastic:passwd
# -k skips TLS certificate checks; bytes=b makes "sto" a plain byte count
curl -s -k -XGET "${elasticURL}/_cat/shards?h=i,sto,n&bytes=b" \
-u "${elasticCRED}" | grep "\.ds-" | \
gawk '{sub(/-[^-]*-[^-]*$/,"",$1); a[$1]+=$2}
END {for (i in a) printf "%.3f %s\n", a[i]/1024/1024/1024, i}' | \
sort -k 1 -n
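To see what the aggregation step does without a live cluster, here is a minimal sketch that runs the same logic on made-up input. The function name sum_by_stream and the sample stream names and sizes are hypothetical; it uses plain awk rather than gawk, and it assumes backing indices follow the usual .ds-<stream>-<date>-<generation> naming.

```shell
#!/usr/bin/env bash
# Sum shard sizes (bytes) per data stream and print GiB, smallest first.
# Input: "<index> <store_bytes> <node>" lines, the shape produced by
# _cat/shards?h=i,sto,n&bytes=b
sum_by_stream() {
  grep '^\.ds-' |
  awk '{
    # Strip the trailing "-<date>-<generation>" so that all backing
    # indices of one data stream share the same key.
    sub(/-[^-]*-[^-]*$/, "", $1)
    bytes[$1] += $2
  }
  END {
    for (name in bytes)
      printf "%.3f %s\n", bytes[name] / 1024 / 1024 / 1024, name
  }' |
  sort -k1,1n -k2,2
}

# Made-up sample rows (hypothetical names and sizes).
sample='.ds-logs-app-2024.01.01-000001 1073741824 data-1
.ds-logs-app-2024.01.02-000002 2147483648 data-2
.ds-metrics-sys-2024.01.01-000001 536870912 data-1
regular-index 999 data-1'

printf '%s\n' "$sample" | sum_by_stream
```

The two logs-app backing indices collapse into one row, and the index that is not a data stream backing index is ignored.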

2. Which data stream uses the most disk space on a given node?

This is the same as the cluster-wide command, with one extra filter added:

grep "${elasticNode}"

export elasticURL=https://olol.elasticsearch.com:9200
export elasticCRED=elastic:passwd
export elasticNode=data-1
# same pipeline as above, restricted to rows that mention ${elasticNode}
curl -s -k -XGET "${elasticURL}/_cat/shards?h=i,sto,n&bytes=b" \
-u "${elasticCRED}" | grep "${elasticNode}" | grep "\.ds-" | \
gawk '{sub(/-[^-]*-[^-]*$/,"",$1); a[$1]+=$2}
END {for (i in a) printf "%.3f %s\n", a[i]/1024/1024/1024, i}' | \
sort -k 1 -n

Sometimes you also need to add grep -v "node3" after the first grep to exclude node3: while a shard is being moved from one node to another, its row in the _cat/shards output lists both nodes.
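As an alternative to chaining grep and grep -v, the node filter can compare the node column exactly. This is only a sketch, not the method used above: the function name shards_on_node and the sample rows are hypothetical, and it assumes a relocating shard is shown as "source -> ip id target" in the node column, so the third field is the node that currently holds the data.

```shell
#!/usr/bin/env bash
# Keep only rows whose node column (3rd field) is exactly the given name.
# Input: "<index> <store_bytes> <node...>" lines from
# _cat/shards?h=i,sto,n&bytes=b
shards_on_node() {
  awk -v node="$1" '$3 == node'
}

# Made-up sample rows (hypothetical names, sizes, and relocation format).
sample='.ds-logs-app-2024.01.01-000001 100 data-1
.ds-logs-app-2024.01.02-000002 200 data-1 -> 10.0.0.12 aBcD1234 data-2
.ds-logs-app-2024.01.03-000003 300 data-10
.ds-logs-data-1-2024.01.01-000001 400 data-2'

printf '%s\n' "$sample" | shards_on_node data-1
```

Only the first two rows are kept. A plain grep data-1 would also match the node data-10 and the index whose name happens to contain data-1.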

P.S. Once you have found a data stream of interest, you can use the API to inspect its fields, and so on:

curl -XPOST "${elasticURL}/my_datastream/_disk_usage?run_expensive_tasks=true"

Data stream size across the cluster

curl -XGET ${elasticURL}/_data_stream/_stats

Generate the commands to delete empty data streams (this only prints DELETE requests; nothing is deleted until you run them)

curl -XGET ${elasticURL}/_data_stream/_stats -u elastic:OLOLOL|jq -r '.data_streams[]|select(.maximum_timestamp==0)|"DELETE _data_stream/\(.data_stream)"'

Sort data streams by size

curl -XGET -u ololo:ALALAL http://olol.local:9200/_data_stream/_stats | jq '[.data_streams[]| {ds: .data_stream, sts: .store_size_bytes}] | sort_by(.sts)'
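The sizes from _data_stream/_stats are raw byte counts, which are hard to read at a glance. Below is a small sketch that converts them to binary units. The function name human_sizes and the sample values are hypothetical, and it assumes the jq output has first been flattened to tab-separated "name, bytes" lines, for example with jq -r '.data_streams[] | [.data_stream, .store_size_bytes] | @tsv'.

```shell
#!/usr/bin/env bash
# Convert "<name>\t<bytes>" lines into "<size> <unit>\t<name>".
human_sizes() {
  awk -F'\t' '{
    split("B KiB MiB GiB TiB PiB", unit, " ")
    size = $2
    u = 1
    # Divide by 1024 until the value fits the current unit.
    while (size >= 1024 && u < 6) { size /= 1024; u++ }
    printf "%.1f %s\t%s\n", size, unit[u], $1
  }'
}

# Made-up sample values (hypothetical stream names).
printf 'logs-app\t3221225472\nmetrics-sys\t536870912\nempty-stream\t0\n' | human_sizes
```

Sort by the byte count before converting, since the converted values mix units and no longer sort numerically.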

elasticsearch docs
