Amazon Athena

  • S3 ๊ฐ์ฒด์— ๋Œ€ํ•ด ๋ถ„์„์„ ์ˆ˜ํ–‰ํ•˜๋Š” ์„œ๋ฒ„๋ฆฌ์Šค ์ฟผ๋ฆฌ ์„œ๋น„์Šค
  • ์ฆ‰, SQL ์–ธ์–ด๋กœ ์ด๋Ÿฌํ•œ ํŒŒ์ผ๋“ค์„ ์ฟผ๋ฆฌํ•˜์ง€๋งŒ ๋กœ๋“œํ•  ํ•„์š”๋Š” ์—†๋‹ค.
  • ํŒŒ์ผ๋“ค์€ S3 ์— ์žˆ๊ณ , ๋‚˜๋จธ์ง€๋Š” Athena ๊ฐ€ ์ฒ˜๋ฆฌํ•ด์ค€๋‹ค.
  • CSV, Json, ORC, Avro, Parguet ๋“ฑ ๋‹ค์–‘ํ•œ ํŒŒ์ผ์˜ ํฌ๋งท์„ ์ง€์›ํ•œ๋‹ค. (Athena๋Š” Presto ์—”์ง„ ๊ธฐ๋ฐ˜)

  • ์‚ฌ์šฉ์ž๋“ค์ด ๋ฐ์ดํ„ฐ๋“ค์„ ์•„๋งˆ์กด S3์— ๋กœ๋“œํ•˜๋ฉด Athena๋Š” ์ด๋Ÿฌํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ์ฟผ๋ฆฌํ•˜๊ณ  ๋ถ„์„ํ•œ๋‹ค.
  • Amazon QuickSight ์„œ๋น„์Šค๋ฅผ ์ด์šฉํ•˜๋ฉด ๋ณด๊ณ ์„œ๋„ ๋ฐ›์•„๋ณผ ์ˆ˜ ์žˆ๋‹ค.

  • ๊ฐ€๊ฒฉ์€ ์Šค์บ”๋œ ๋ฐ์ดํ„ฐ TB ๋‹น 5๋‹ฌ๋Ÿฌ ์ด๋‹ค.
  • ์••์ถ•๋˜๊ฑฐ๋‚˜ ์ปฌ๋Ÿผํ˜•์œผ๋กœ ์ €์žฅ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ ๋น„์šฉ์„ ์ ˆ๊ฐํ•  ์ˆ˜ ์žˆ๋‹ค.
    • ๋ฐ์ดํ„ฐ๋ฅผ ์Šค์บ”ํ•˜๋Š” ์–‘์ด ์ ์–ด์ง€๊ธฐ ๋•Œ๋ฌธ
  • UseCase (์‚ฌ์šฉ์‚ฌ๋ก€)
    • BI (๋น„์ฆˆ๋‹ˆ์Šค ์ธํ…”๋ฆฌ์ „์Šค), ๋ถ„์„, ๋ณด๊ณ , VPC๋‚˜ ELB ๋กœ๊ทธ์˜ Flow Logs ๋ถ„์„
    • CloudTrail ๋กœ๊ทธ, ํ”Œ๋žซํผ ๋กœ๊ทธ ๋“ฑ์˜ AWS์˜ ๋กœ๊ทธ๋ฅผ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ Athena๊ฐ€ ์œ ์šฉํ•˜๋‹ค.

SQL ์„ ์‚ฌ์šฉํ•˜๋ฉฐ, ๋ฐ์ดํ„ฐ ๋ถ„์„ ์„œ๋ฒ„๋ฆฌ์Šค ๋“ฑ์˜ ํ‚ค์›Œ๋“œ๊ฐ€ ๋‚˜์˜ค๋ฉด Amazon Athena ๋ฅผ ๋– ์˜ฌ๋ฆฌ๋ฉด ๋œ๋‹ค.

1. ์‹ค์Šต

๋น„์šฉ์ด ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋Š” ๋ถ€๋ถ„์ด์–ด, ์‚ฌ๋‚ด ํ•™์Šต์šฉ Dev AWS ์—์„œ ์ง„ํ–‰ํ•˜๋ ค ํ–ˆ์œผ๋‚˜, IAM ๊ถŒํ•œ์ด ์—†์–ด์„œ ์‚ฌ์ง„์„ ๋งŽ์ด ์Šค์ƒท์„ ๋งŽ์ด ์ฒจ๋ถ€ํ•˜์ง€ ๋ชปํ•˜๋Š”์  ์–‘ํ•ด๋ฐ”๋ž๋‹ˆ๋‹ค. ( _ _ )โ€ฆ.

image-20220817011253770

create database s3_access_logs_db;

# ํ…Œ์ด๋ธ” ์ƒ์„ฑ 
CREATE EXTERNAL TABLE IF NOT EXISTS s3_access_logs_db.mybucket_logs(
         BucketOwner STRING,
         Bucket STRING,
         RequestDateTime STRING,
         RemoteIP STRING,
         Requester STRING,
         RequestID STRING,
         Operation STRING,
         Key STRING,
         RequestURI_operation STRING,
         RequestURI_key STRING,
         RequestURI_httpProtoversion STRING,
         HTTPstatus STRING,
         ErrorCode STRING,
         BytesSent BIGINT,
         ObjectSize BIGINT,
         TotalTime STRING,
         TurnAroundTime STRING,
         Referrer STRING,
         UserAgent STRING,
         VersionId STRING,
         HostId STRING,
         SigV STRING,
         CipherSuite STRING,
         AuthType STRING,
         EndPoint STRING,
         TLSVersion STRING
) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
         'serialization.format' = '1', 'input.regex' = '([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\") ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$' )
LOCATION 's3://target-bucket-name/prefix/';


# HTTP Method ๋ณ„ 200, 404 ๋“ฑ์ด ๋ช‡๋ฒˆ ๋ฐœ์ƒํ–ˆ๋Š” ์ง€ ์ง‘๊ณ„SQL ์กฐํšŒ
SELECT requesturi_operation, httpstatus, count(*) FROM "s3_access_logs_db"."mybucket_logs" 
GROUP BY requesturi_operation, httpstatus;

# ๊ถŒํ•œ์ด ์—†๋Š”์‚ฌ๋žŒ์ด S3 Objects ์— ์ ‘๊ทผํ•œ ํšŸ์ˆ˜ ์„ธ๊ณ  ๋ˆ„๊ตฌ์ธ์ง€ ํ™•์ธ
SELECT * FROM "s3_access_logs_db"."mybucket_logs"
where httpstatus='403';

Athena ์žฅ์ 

  • S3 ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ฟผ๋ฆฌํ•ด ์„œ๋ฒ„ ์„ค์ •๊ณผ ๋ฐ์ดํ„ฐ์˜ ๋ณ€ํ˜• ์—†์ด๋„ ์ ์ ˆํ•œ ๋ฐ์ดํ„ฐ ํฌ๋งท์„ ์„ค์ •ํ•จ์œผ๋กœ์จ ๋ณต์žกํ•œ ์ฟผ๋ฆฌ๋ฅผ ์ง์ ‘ ์‹คํ–‰ํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ํŠน์ •ํ•  ์ˆ˜ ์žˆ๋‹ค
  • ์ตœ๊ทผ ์ฟผ๋ฆฌ์™€ ์ €์žฅ๋œ ์ฟผ๋ฆฌ๋„ ํ™•์ธ ๊ฐ€๋Šฅ
  • ์ฟผ๋ฆฌ ๊ฒฐ๊ณผ๋ฅผ ๋Œ€์ƒ ๋ฒ„ํ‚ท์— ์•”ํ˜ธํ™”ํ•˜๊ธฐ ์œ„ํ•ด ์„ค์ •์„ ํŽธ์ง‘ํ•  ์ˆ˜๋„ ์žˆ๋‹ค.
  • ์„œ๋ฒ„๋ฆฌ์Šค ์„œ๋น„์Šค์ด๋‹ค. (๋”ฐ๋กœ DB์„ค์ •์ด๋‚˜ ํ•  ํ•„์š” ์—†์Œ)

ํƒœ๊ทธ: , , ,

์นดํ…Œ๊ณ ๋ฆฌ:

์—…๋ฐ์ดํŠธ:

๋Œ“๊ธ€๋‚จ๊ธฐ๊ธฐ