    The Case of Missing Elasticsearch Logs: A Midnight Mystery

    While debugging my Elasticsearch instance, I noticed a curious issue: logs consistently vanished around midnight, with nothing written between 23:40:00 and 00:00:05, leaving an unexplained gap. This guide walks through the debugging process, the root cause, and a simple fix.

    Initial Investigation: Where Did the Logs Go?

    At first glance, the following possibilities seemed likely:

    1. Log Rotation: Elasticsearch rotates its logs at midnight. Could this process be causing the missing lines?
    2. Marvel Indices: Marvel creates daily indices at midnight. Could this interfere with log generation?

    Neither explained the issue upon closer inspection, so I dug deeper.

    The Real Culprit: Log4j and DailyRollingFileAppender

    The issue turned out to be related to Log4j. Elasticsearch uses Log4j for logging, but instead of a traditional log4j.properties file it uses a YAML configuration that is translated into Log4j settings at startup. After reviewing the logging configuration, I found the culprit: DailyRollingFileAppender.
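
    For reference, the file appender section in my logging configuration looked roughly like the stock logging.yml that Elasticsearch ships with. Treat the sketch below as illustrative: exact keys, paths, and the config file name can vary between Elasticsearch versions.

    file:
      type: dailyRollingFile                     # the appender that turned out to be the problem
      file: ${path.logs}/${cluster.name}.log
      datePattern: "'.'yyyy-MM-dd"               # roll the log once per day, at midnight
      layout:
        type: pattern
        conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"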

    What’s Wrong with DailyRollingFileAppender?

    The DailyRollingFileAppender class extends Log4j's FileAppender and rolls the log file at a user-chosen time interval (controlled by its datePattern, typically once a day at midnight). The rollover itself has a major, well-known flaw and can cause:

    • Data Loss: Logs might not be written during the rolling process.
    • Synchronization Issues: Overlap between log files leads to missing data.

    This behavior is well-documented in the Apache DailyRollingFileAppender documentation.

    Root Cause: Why Were Logs Missing?

    The missing logs were a direct result of using DailyRollingFileAppender, which failed to properly handle log rotation at midnight. This caused gaps in logging during the critical period when the file was being rolled over.

    The Fix: Switch to RollingFileAppender

    To resolve this, I replaced DailyRollingFileAppender with RollingFileAppender, which rolls logs based on file size rather than a specific time. This eliminates the synchronization issues associated with the daily rolling behavior.

    Updated YAML Configuration

    Here’s how I updated the configuration:

    file:
      type: rollingFile                          # size-based rolling instead of daily rolling
      file: ${path.logs}/${cluster.name}.log
      maxFileSize: 100MB                         # roll when the file reaches 100 MB
      maxBackupIndex: 10                         # keep at most 10 rolled files
      layout:
        type: pattern
        conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

    Key Changes:

    • Type: Changed from dailyRollingFile to rollingFile.
    • File Size Limit: Set maxFileSize to 100MB.
    • Backup: Retain up to 10 backup log files.
    • Removed Date Pattern: Eliminated the problematic datePattern field used by DailyRollingFileAppender.
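
    For completeness, this file appender does not stand alone: in a typical logging.yml it sits under the appender section and is referenced by name from rootLogger. The sketch below shows roughly where the updated block fits; the exact surrounding structure varies by Elasticsearch version, and changes to this file generally take effect after a node restart.

    es.logger.level: INFO
    rootLogger: ${es.logger.level}, console, file   # "file" must match the appender key below

    appender:
      console:
        type: console
        layout:
          type: pattern
          conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

      file:
        type: rollingFile                           # the updated appender shown above
        file: ${path.logs}/${cluster.name}.log
        maxFileSize: 100MB
        maxBackupIndex: 10
        layout:
          type: pattern
          conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"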

    Happy Ending: Logs Restored

    After implementing the fix, Elasticsearch logs stopped disappearing. Interestingly, further investigation revealed that the midnight log gap was also related to Marvel indices transitioning into a new day, which caused brief latency while the new daily indices and their shards and replicas were created.

    Lessons Learned

    1. Understand Your Tools: Familiarity with Log4j’s appenders helped identify the issue quickly.
    2. Avoid Deprecated Features: DailyRollingFileAppender is prone to issues—switch to RollingFileAppender for modern setups.
    3. Analyze Related Systems: The Marvel index creation provided additional context for the midnight timing.

    Conclusion

    Debugging missing Elasticsearch logs required diving into the logging configuration and understanding how appenders handle file rolling. By switching to RollingFileAppender, I resolved the synchronization issues and restored the missing logs.

    If you’re experiencing similar issues, check your logging configuration and avoid using DailyRollingFileAppender in favor of RollingFileAppender. This can save hours of debugging in the future.

    For more insights, explore Log4j Appender Documentation.

    Also, to learn how to clean data coming into Elasticsearch, see Cleaning Elasticsearch Data Before Indexing.