Why logs are kept longer than the retention time

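Unlike traditional message queues, Kafka does not delete a message as soon as it has been consumed; it retains messages for a configured period. Log retention is based on either time or log size. The two criteria are combined with OR: messages are deleted when the retention period expires or when the size limit is exceeded. Even so, it is common to find log messages that are older than the retention period, and this post looks at why.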

The retention period is usually defined with the log.retention.ms, log.retention.minutes, or log.retention.hours parameters (default: 7 days).

How long is data retained in Kafka? Besides the retention period, you can cap the amount of data kept by setting the maximum number of bytes with the log.retention.bytes parameter.

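For example, suppose a topic is configured with a retention time of 600000 ms (10 minutes) and a segment size of 16384 bytes. The expectation is that a segment is rolled when it reaches 16 KB. In addition, if the record being appended is larger than the space remaining in the active segment, the segment is rolled and the record is stored in the new segment.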
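The size-based rolling rule from this example can be sketched as a small simulation. This is a simplified model rather than Kafka's actual implementation: `assign_to_segments` and its parameters are hypothetical names, and only the size rule is modeled.

```python
def assign_to_segments(record_sizes, segment_bytes=16384):
    """Group record sizes into segments using a simplified size-based roll rule.

    A record that does not fit into the space left in the active segment
    causes a roll, and the record is stored in the new segment.
    """
    segments = [[]]   # the last list is the active segment
    used = 0          # bytes already used in the active segment
    for size in record_sizes:
        # Roll only if the active segment holds data and the record does not fit.
        if segments[-1] and used + size > segment_bytes:
            segments.append([])
            used = 0
        segments[-1].append(size)
        used += size
    return segments


# Three 6000-byte records: the third does not fit into the remaining 4384 bytes.
print(assign_to_segments([6000, 6000, 6000]))  # [[6000, 6000], [6000]]
```

Note that a roll is triggered by the record that does not fit, so a segment is usually closed slightly below the configured 16384 bytes.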

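As for log retention, the expectation is that a record lives for 10 minutes and is then deleted. However, a segment and the records it contains can only be deleted once the segment is closed. The following factors therefore affect when a record is actually deleted.

If the producer is slow and the 16 KB maximum size is not reached within 10 minutes, the old records are not deleted. In this case the effective retention time is longer than 10 minutes.

If the active segment fills up quickly, the segment is closed, but it is not deleted until the last record appended to it has been retained for 10 minutes. In this case too, the earlier records live longer than 10 minutes. For example, if a segment fills up and closes after 7 minutes, its last record is still kept for 10 minutes, so the first record written to the segment is actually retained for 17 minutes.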

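A log can live even longer than the retention period measured from the last record appended to the segment. How does that happen? The thread that checks whether log segments should be deleted runs every 5 minutes; this interval is set with log.retention.check.interval.ms. Depending on when the last record was appended, this cleanup thread can just miss the 10-minute retention deadline and only catch it on its next run. In the example above, the segment can therefore live for 22 minutes instead of 17.

Is this the maximum time a record persists in Kafka? No. The cleaner thread only marks the segment for deletion, and the log.segment.delete.delay.ms broker parameter defines when a file marked as "deleted" is actually removed from the file system (default: 1 minute). In the example above, the log can therefore still exist after 23 minutes, far longer than the 10-minute retention time.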
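The arithmetic of this example (7 + 10 + 5 + 1 minutes) can be written out as a short sketch. `retention_timeline` is a hypothetical helper, and the result is a worst-case estimate under the assumptions of the example, not a value reported by Kafka.

```python
def retention_timeline(fill_min, retention_min, check_interval_min=5, delete_delay_min=1):
    """Worst-case timeline, in minutes, for the first record of a segment.

    fill_min:            time between the first and the last record of the segment
    retention_min:       retention period, counted from the last record
    check_interval_min:  log.retention.check.interval.ms expressed in minutes
    delete_delay_min:    log.segment.delete.delay.ms expressed in minutes
    """
    eligible = fill_min + retention_min       # segment becomes eligible for deletion
    marked = eligible + check_interval_min    # latest point at which the check marks it
    removed = marked + delete_delay_min       # file is removed from the file system
    return eligible, marked, removed


print(retention_timeline(fill_min=7, retention_min=10))  # (17, 22, 23)
```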

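In other words, the retention limit set with log.retention.ms defines the minimum time a record persists on the file system, not the maximum.

Consumers can fetch records from closed segments, but not from deleted ones. A deleted segment may still be present on the file system, merely marked as "deleted".

Note: to keep the concept clear, the example above talks about single records being appended to a segment. In practice, multiple records (record batches) are appended to the segment file.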


Checking the current log retention period of a specific Kafka topic

bin/kafka-topics.sh --describe --topic <topic-name> --bootstrap-server <kafka-broker>

#bin/kafka-topics.sh --describe --topic estest --bootstrap-server 172.16.4.203:9092

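retention.ms=604800000 (7 days), so why do messages older than 7 days still exist?

The log segment has not fully expired: Kafka manages message storage in log segments. Retention is applied at the log segment level, not at the level of individual messages. If a log segment is still "young" (that is, not yet eligible for deletion), its messages are not deleted even if some of them are more than 7 days old. This can happen when the topic has little traffic or Kafka has not yet rolled the log segment.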
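The segment-level rule can be illustrated with a minimal sketch. It models the rule as this article states it (a segment must be closed, and its newest record must be older than the retention period); the `Segment` class and `is_deletable` function are hypothetical names, and the real broker logic may differ in detail.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Segment:
    closed: bool               # True once the segment has been rolled
    newest_record_ts_ms: int   # timestamp of the last record appended


def is_deletable(segment, now_ms, retention_ms=7 * DAY_MS):
    """Retention looks at the whole segment, not at individual messages."""
    age_of_newest_ms = now_ms - segment.newest_record_ts_ms
    return segment.closed and age_of_newest_ms > retention_ms


now = 30 * DAY_MS
active_low_traffic = Segment(closed=False, newest_record_ts_ms=now - 8 * DAY_MS)
closed_recent = Segment(closed=True, newest_record_ts_ms=now - 1 * DAY_MS)
closed_old = Segment(closed=True, newest_record_ts_ms=now - 8 * DAY_MS)

print(is_deletable(active_low_traffic, now))  # False: the segment was never rolled
print(is_deletable(closed_recent, now))       # False: newest record is only 1 day old
print(is_deletable(closed_old, now))          # True
```

The second case is the one that surprises people: a closed segment may contain messages far older than 7 days, yet it stays until its newest message has also aged out.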

When does Kafka roll over a log segment?


my-topic-0
├── 00000000000000000000.index
├── 00000000000000000000.log
├── 00000000000000000000.timeindex
├── 00000000000000001007.index
├── 00000000000000001007.log
├── 00000000000000001007.snapshot
├── 00000000000000001007.timeindex
└── leader-epoch-checkpoint

When the active segment becomes full (log.segment.bytes, default 1 GiB) or the configured time (log.roll.hours or log.roll.ms, default 7 days) has passed, the segment is rolled. The active segment is closed and reopened in read-only mode, and a new segment file (the new active segment) is created in read-write mode.

Rolling segments

The active segment is rolled as soon as any of these conditions is met:

Maximum segment size: configured by log.segment.bytes, defaults to 1 GiB
Rolling segment time: configured by log.roll.ms and log.roll.hours, defaults to 7 days
Index/timeindex is full: the index and timeindex share the same maximum size, defined by log.index.size.max.bytes, defaults to 10 MB
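The three conditions above can be combined into one check, sketched here with the defaults listed. `roll_reasons` and its parameters are hypothetical names; whether the index is full is passed in as a flag rather than computed.

```python
GIB = 1024 ** 3
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000


def roll_reasons(bytes_used, next_record_bytes, segment_age_ms, index_full,
                 max_segment_bytes=GIB, max_roll_ms=SEVEN_DAYS_MS):
    """Return the conditions that would roll the active segment (empty list: no roll)."""
    reasons = []
    if bytes_used + next_record_bytes > max_segment_bytes:
        reasons.append("size")    # log.segment.bytes would be exceeded
    if segment_age_ms > max_roll_ms:
        reasons.append("time")    # log.roll.ms / log.roll.hours has passed
    if index_full:
        reasons.append("index")   # log.index.size.max.bytes has been reached
    return reasons


# A nearly empty segment on a quiet topic still rolls once it is older than 7 days.
print(roll_reasons(1024, 100, 8 * 24 * 60 * 60 * 1000, False))  # ['time']
```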

Changing the 7-day retention

Check the value of retention.ms. If retention.ms is not set on the topic, the broker-level default is in use, and you can look it up in $KAFKA_HOME/config/server.properties.

The log retention settings live in $KAFKA_HOME/config/server.properties and have the following defaults.

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

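Kafka log retention is configured mainly by time and by size (default: 7 days). In other words, logs become eligible for deletion 7 days after they were stored.

The time and size limits are combined with OR: messages are deleted when the period expires or when the size limit is exceeded.

You can change the retention period by modifying the log.retention.{ms|minutes|hours} or log.retention.bytes settings.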

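log.retention.{ms|minutes|hours}: minimum age of a log file before it can be deleted (default: 168 hours, i.e. 7 days)
log.retention.bytes: maximum size a partition's log may reach before old segments are deleted (default: -1, no limit)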

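log.retention.{ms|minutes|hours}

Minimum age of a log file before it can be deleted
log.retention.ms: specified in milliseconds
log.retention.minutes: specified in minutes
log.retention.hours: specified in hours
(Caution) log.retention.check.interval.ms is the interval at which Kafka checks for segments to delete, and it defaults to 5 minutes. If log.retention.{ms|minutes|hours} is set to a value shorter than this interval, logs may not be deleted on the schedule you expect.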
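When more than one of the three settings is present, the finer-grained one wins. The sketch below resolves the effective retention in milliseconds; `effective_retention_ms` is a hypothetical helper, and the precedence (ms over minutes over hours) is taken from the Kafka broker configuration documentation rather than from this article.

```python
def effective_retention_ms(ms=None, minutes=None, hours=168):
    """Resolve the retention period in milliseconds.

    log.retention.ms overrides log.retention.minutes,
    which in turn overrides log.retention.hours.
    """
    if ms is not None:
        return ms
    if minutes is not None:
        return minutes * 60 * 1000
    return hours * 60 * 60 * 1000


print(effective_retention_ms())                       # 604800000 (7 days)
print(effective_retention_ms(minutes=10, hours=168))  # 600000 (10 minutes)
```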

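log.retention.bytes

Maximum size of the log before old segments are deleted
The limit applies per partition, so take the number of topics and partitions into account (e.g. with 3 physical servers, 3 partitions, and a replication factor of 2, each server stores 2 partitions, so setting log.retention.bytes to 1 GB means up to 2 GB of disk per server)
(Caution) log.retention.bytes deletes log files one whole segment at a time, so when there is only a single segment, nothing may be deleted
Therefore, set log.segment.bytes to a value no larger than log.retention.bytes so that the log is split into multiple segments and size-based retention works as expected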
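Both points above, the per-server disk estimate and the segment-by-segment deletion, can be sketched as follows. The function names are hypothetical, partitions are assumed to be spread evenly across servers, and the active segment is never deleted in this simplified model.

```python
def disk_per_server_bytes(partitions, replication_factor, servers, retention_bytes):
    """Approximate disk used per server, assuming replicas are spread evenly."""
    return partitions * replication_factor * retention_bytes // servers


def segments_to_delete(segment_sizes, retention_bytes):
    """Pick the oldest closed segments to delete under size-based retention.

    segment_sizes is ordered oldest first; the last entry is the active segment.
    A segment is deleted only if the log still holds at least retention_bytes
    after it is gone.
    """
    if retention_bytes < 0:              # -1 means no size limit
        return []
    excess = sum(segment_sizes) - retention_bytes
    deleted = []
    for size in segment_sizes[:-1]:      # skip the active segment
        if excess - size < 0:
            break
        deleted.append(size)
        excess -= size
    return deleted


GIB = 1024 ** 3
print(disk_per_server_bytes(3, 2, 3, GIB) // GIB)  # 2
print(segments_to_delete([1500], 1000))            # []: a single segment is never split
print(segments_to_delete([500, 500, 500], 1000))   # [500]
```

In the second call the log is 500 units over the limit, yet nothing is deleted because the only segment is still active; with smaller segments, as in the third call, the oldest one can be dropped.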

test

du -h

Re: log.retention.hours not working?-Apache Mail Archives

https://lists.apache.org/thread/tzcq7vkwlp0o63kxw0b249rf1k2gbv0w

Retention is going to be based on a combination of both the retention and segment size settings (as a side note, it's recommended to use log.retention.ms and log.segment.ms, not the hours config. That's there for legacy reasons, but the ms configs are more consistent). As messages are received by Kafka, they are written to the current open log segment for each partition. That segment is rotated when either the log.segment.bytes or the log.segment.ms limit is reached. Once that happens, the log segment is closed and a new one is opened. Only after a log segment is closed can it be deleted via the retention settings. Once the log segment is closed AND either all the messages in the segment are older than log.retention.ms OR the total partition size is greater than log.retention.bytes, then the log segment is purged. As a note, the default segment limit is 1 gibibyte. So if you've only written in 1k of messages, you have a long way to go before that segment gets rotated. This is why the retention is referred to as a minimum time. You can easily retain much more than you're expecting for slow topics.

Could you try setting log.retention.ms=3600000 instead of using the hours config?

7 days: 604,800,000 ms
