news 2026/6/10 16:16:42

AWS EKS部署Prometheus和Grafana

作者头像

张小明

前端开发工程师

1.2k 24
文章封面图
AWS EKS部署Prometheus和Grafana

、创建Prometheus工作区

1.创建工作区

为了可以把Prometheus数据写入到AWS managed Prometheus,需要先在AWS Prometheus控制台中创建工作区

image

2.保存工作区配置

点击AWS Prometheus工作区ID进入详情,将提取/收集 中的配置保存为prometheus.yaml,后面会在安装prometheus时使用。

image

3.创建从EKS提取指标的role

使用以下内容创建名为 createIRSA-AMPIngest.sh 的文件。将 <my_amazon_eks_clustername> 替换为您集群的名称,并将 <my_prometheus_namespace> 替换为您的 Prometheus 命名空间

复制代码

#!/bin/bash -e

CLUSTER_NAME=<my_amazon_eks_clustername>

SERVICE_ACCOUNT_NAMESPACE=<my_prometheus_namespace>

AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)

OIDC_PROVIDER=$(aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")

SERVICE_ACCOUNT_AMP_INGEST_NAME=amp-iamproxy-ingest-service-account

SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE=amp-iamproxy-ingest-role

SERVICE_ACCOUNT_IAM_AMP_INGEST_POLICY=AMPIngestPolicy

#

# Set up a trust policy designed for a specific combination of K8s service account and namespace to sign in from a Kubernetes cluster which hosts the OIDC Idp.

#

cat <<EOF > TrustPolicy.json

{

"Version": "2012-10-17",

"Statement": [

{

"Effect": "Allow",

"Principal": {

"Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"

},

"Action": "sts:AssumeRoleWithWebIdentity",

"Condition": {

"StringEquals": {

"${OIDC_PROVIDER}:sub": "system:serviceaccount:${SERVICE_ACCOUNT_NAMESPACE}:${SERVICE_ACCOUNT_AMP_INGEST_NAME}"

}

}

}

]

}

EOF

#

# Set up the permission policy that grants ingest (remote write) permissions for all AMP workspaces

#

cat <<EOF > PermissionPolicyIngest.json

{

"Version": "2012-10-17",

"Statement": [

{"Effect": "Allow",

"Action": [

"aps:RemoteWrite",

"aps:GetSeries",

"aps:GetLabels",

"aps:GetMetricMetadata"

],

"Resource": "*"

}

]

}

EOF

function getRoleArn() {

OUTPUT=$(aws iam get-role --role-name $1 --query 'Role.Arn' --output text 2>&1)

# Check for an expected exception

if [[ $? -eq 0 ]]; then

echo $OUTPUT

elif [[ -n $(grep "NoSuchEntity" <<< $OUTPUT) ]]; then

echo ""

else

>&2 echo $OUTPUT

return 1

fi

}

#

# Create the IAM Role for ingest with the above trust policy

#

SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE_ARN=$(getRoleArn $SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE)

if [ "$SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE_ARN" = "" ];

then

#

# Create the IAM role for service account

#

SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE_ARN=$(aws iam create-role \

--role-name $SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE \

--assume-role-policy-document file://TrustPolicy.json \

--query "Role.Arn" --output text)

#

# Create an IAM permission policy

#

SERVICE_ACCOUNT_IAM_AMP_INGEST_ARN=$(aws iam create-policy --policy-name $SERVICE_ACCOUNT_IAM_AMP_INGEST_POLICY \

--policy-document file://PermissionPolicyIngest.json \

--query 'Policy.Arn' --output text)

#

# Attach the required IAM policies to the IAM role created above

#

aws iam attach-role-policy \

--role-name $SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE \

--policy-arn $SERVICE_ACCOUNT_IAM_AMP_INGEST_ARN

else

echo "$SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE_ARN IAM role for ingest already exists"

fi

echo $SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE_ARN

#

# EKS cluster hosts an OIDC provider with a public discovery endpoint.

# Associate this IdP with AWS IAM so that the latter can validate and accept the OIDC tokens issued by Kubernetes to service accounts.

# Doing this with eksctl is the easier and best approach.

#

eksctl utils associate-iam-oidc-provider --cluster $CLUSTER_NAME --approve

复制代码

执行以上脚本创建role

bash createIRSA-AMPIngest.sh

使用以下内容创建名为 createIRSA-AMPQuery.sh 的文件。将 <my_amazon_eks_clustername> 替换为集群的名称,并将 <my_prometheus_namespace> 替换为您的 Prometheus 命名空间。

复制代码

#!/bin/bash -e

CLUSTER_NAME=<my_amazon_eks_clustername>

SERVICE_ACCOUNT_NAMESPACE=<my_prometheus_namespace>

AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)

OIDC_PROVIDER=$(aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")

SERVICE_ACCOUNT_AMP_QUERY_NAME=amp-iamproxy-query-service-account

SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE=amp-iamproxy-query-role

SERVICE_ACCOUNT_IAM_AMP_QUERY_POLICY=AMPQueryPolicy

#

# Setup a trust policy designed for a specific combination of K8s service account and namespace to sign in from a Kubernetes cluster which hosts the OIDC Idp.

#

cat <<EOF > TrustPolicy.json

{

"Version": "2012-10-17",

"Statement": [

{

"Effect": "Allow",

"Principal": {

"Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"

},

"Action": "sts:AssumeRoleWithWebIdentity",

"Condition": {

"StringEquals": {

"${OIDC_PROVIDER}:sub": "system:serviceaccount:${SERVICE_ACCOUNT_NAMESPACE}:${SERVICE_ACCOUNT_AMP_QUERY_NAME}"

}

}

}

]

}

EOF

#

# Set up the permission policy that grants query permissions for all AMP workspaces

#

cat <<EOF > PermissionPolicyQuery.json

{

"Version": "2012-10-17",

"Statement": [

{"Effect": "Allow",

"Action": [

"aps:QueryMetrics",

"aps:GetSeries",

"aps:GetLabels",

"aps:GetMetricMetadata"

],

"Resource": "*"

}

]

}

EOF

function getRoleArn() {

OUTPUT=$(aws iam get-role --role-name $1 --query 'Role.Arn' --output text 2>&1)

# Check for an expected exception

if [[ $? -eq 0 ]]; then

echo $OUTPUT

elif [[ -n $(grep "NoSuchEntity" <<< $OUTPUT) ]]; then

echo ""

else

>&2 echo $OUTPUT

return 1

fi

}

#

# Create the IAM Role for query with the above trust policy

#

SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE_ARN=$(getRoleArn $SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE)

if [ "$SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE_ARN" = "" ];

then

#

# Create the IAM role for service account

#

SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE_ARN=$(aws iam create-role \

--role-name $SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE \

--assume-role-policy-document file://TrustPolicy.json \

--query "Role.Arn" --output text)

#

# Create an IAM permission policy

#

SERVICE_ACCOUNT_IAM_AMP_QUERY_ARN=$(aws iam create-policy --policy-name $SERVICE_ACCOUNT_IAM_AMP_QUERY_POLICY \

--policy-document file://PermissionPolicyQuery.json \

--query 'Policy.Arn' --output text)

#

# Attach the required IAM policies to the IAM role create above

#

aws iam attach-role-policy \

--role-name $SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE \

--policy-arn $SERVICE_ACCOUNT_IAM_AMP_QUERY_ARN

else

echo "$SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE_ARN IAM role for query already exists"

fi

echo $SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE_ARN

#

# EKS cluster hosts an OIDC provider with a public discovery endpoint.

# Associate this IdP with AWS IAM so that the latter can validate and accept the OIDC tokens issued by Kubernetes to service accounts.

# Doing this with eksctl is the easier and best approach.

#

eksctl utils associate-iam-oidc-provider --cluster $CLUSTER_NAME --approve

复制代码

执行以上脚本,创建role

bash createIRSA-AMPQuery.sh

二、部署Prometheus

1.添加helm仓库

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics

helm repo update

2.创建部署Prometheus的命名空间

kubectl create namespace monitoring

3.检查Amazon EBS CSI

如果EBS CSI组件没有附加对应的IAM role,需要在IAM 控制台中创建附有AmazonEBSCSIDriverPolicy权限且类型为AWS账号的role,否则EKS创建PVC时会报错

image

4.创建storageClass

复制代码

#cat sc.yaml

apiVersion: storage.k8s.io/v1

kind: StorageClass

metadata:

name: ebs-sc

annotations:

storageclass.kubernetes.io/is-default-class: "true"

provisioner: ebs.csi.aws.com

allowVolumeExpansion: true

volumeBindingMode: WaitForFirstConsumer

parameters:

type: gp3

#kubectl apply -f sc.yaml

复制代码

5.部署Prometheus

helm install prometheus prometheus -n monitoring -f prometheus.yaml

6.查看Prometheus是否部署成功

kubectl get pods -n monitoring

7.部署grafana

复制代码

#cat grafana.yaml

---

apiVersion: v1

kind: PersistentVolumeClaim

metadata:

name: grafana-pvc

spec:

accessModes:

- ReadWriteOnce

resources:

requests:

storage: 1Gi

---

apiVersion: apps/v1

kind: Deployment

metadata:

labels:

app: grafana

name: grafana

spec:

selector:

matchLabels:

app: grafana

template:

metadata:

labels:

app: grafana

spec:

securityContext:

fsGroup: 472

supplementalGroups:

- 0

containers:

- name: grafana

image: grafana/grafana:latest

imagePullPolicy: IfNotPresent

ports:

- containerPort: 3000

name: http-grafana

protocol: TCP

readinessProbe:

failureThreshold: 3

httpGet:

path: /robots.txt

port: 3000

scheme: HTTP

initialDelaySeconds: 10

periodSeconds: 30

successThreshold: 1

timeoutSeconds: 2

livenessProbe:

failureThreshold: 3

initialDelaySeconds: 30

periodSeconds: 10

successThreshold: 1

tcpSocket:

port: 3000

timeoutSeconds: 1

resources:

requests:

cpu: 250m

memory: 750Mi

volumeMounts:

- mountPath: /var/lib/grafana

name: grafana-pv

volumes:

- name: grafana-pv

persistentVolumeClaim:

claimName: grafana-pvc

---

apiVersion: v1

kind: Service

metadata:

name: grafana

spec:

ports:

- port: 3000

protocol: TCP

targetPort: http-grafana

selector:

app: grafana

sessionAffinity: None

type: ClusterIP

#kubectl apply -f grafana.yaml -n monitoring

复制代码

三、访问Prometheus和grafana

Prometheus和grafana部署完成以后,可以将SVC类型改为nodeport,然后通过ALB暴露出来,通过公网进行访问

grafana默认用户密码为admin/admin

image

版权声明: 本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若内容造成侵权/违法违规/事实不符,请联系邮箱:809451989@qq.com进行投诉反馈,一经查实,立即删除!
网站建设 2026/6/10 11:33:26

研究表明,量子引力修正后的转换机制可解释CMB动力学异常的微观起源,全域监测网络使拓扑参数捕捉覆盖率达98%,灾害链模型对复合灾害的预警准确率较单一灾害模型提升27%。

分形纤维丛超统一框架的量子引力融合、全域监测与灾害链预警深化研究 摘要&#xff08;续四&#xff09; 为突破地球拓扑动力学在量子-经典转换机制、全球监测覆盖、复合灾害预警等方面的核心瓶颈&#xff0c;本文从量子引力理论融合、全域量子监测网络部署、灾害链拓扑演化建…

作者头像 李华
网站建设 2026/6/9 20:49:47

Scrypted:重新定义智能家居视频监控体验

Scrypted&#xff1a;重新定义智能家居视频监控体验 【免费下载链接】scrypted Scrypted is a high performance home video integration and automation platform 项目地址: https://gitcode.com/gh_mirrors/sc/scrypted 想象一下&#xff0c;当你外出时&#xff0c;只…

作者头像 李华
网站建设 2026/6/10 11:52:30

Qwen3-VL-8B:重新定义多模态AI的应用边界

Qwen3-VL-8B&#xff1a;重新定义多模态AI的应用边界 【免费下载链接】Qwen3-VL-8B-Thinking-FP8 项目地址: https://ai.gitcode.com/hf_mirrors/Qwen/Qwen3-VL-8B-Thinking-FP8 当传统AI模型仍在文本、图像、视频等单一模态中挣扎时&#xff0c;一个革命性的突破正在悄…

作者头像 李华
网站建设 2026/6/10 11:57:32

AI如何用Sysbench优化数据库性能调优

快速体验 打开 InsCode(快马)平台 https://www.inscode.net输入框内输入如下内容&#xff1a; 创建一个AI辅助的数据库性能测试工具&#xff0c;集成Sysbench进行自动化基准测试。功能包括&#xff1a;1) 自动生成不同负载场景的Sysbench测试脚本&#xff1b;2) 实时分析测试结…

作者头像 李华
网站建设 2026/6/10 9:02:15

资产管理(EAM,Enterprise Asset Management)模块的核心场景围绕 设备全生命周期管控 展开,其中预防性维护计划、工单管理、设备生命周期跟踪是三大核心支柱

资产管理&#xff08;EAM&#xff0c;Enterprise Asset Management&#xff09;模块的核心场景围绕 设备全生命周期管控 展开&#xff0c;其中预防性维护计划、工单管理、设备生命周期跟踪是三大核心支柱。以下将从 配置逻辑、操作步骤、底层原理、表结构、业务流程 四个维度&a…

作者头像 李华