Java IO Implementation of Unix/Linux "tail -F"

Execute shell script in Java and read Output

The primary reason this doesn't work is that `$2` is not the same as `ls -1 | tail -1`, even when `$2` is set to that string: the shell expands the variable into words, but it never re-parses the result, so the pipe stays a literal argument instead of becoming a pipeline.

If your script accepts a literal string with a command to execute, you can use `eval` to run it.
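
To see the difference, compare expanding such a string with eval'ing it (a hypothetical session, in the same directory as the example below):

$ cmd='ls -t | tail -n 1'
$ $cmd        # expands to the words: ls -t '|' tail -n 1  -- the pipe is a literal argument to ls
$ eval "$cmd" # the string is parsed again, so the pipeline actually runs
Templates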

I created a complete example. Please copy-paste it and verify that it works before you try applying any of it to your own code. Here's Test.java:

import java.io.*;

public class Test {
    public static void main(String[] args) throws Exception {
        // The whole pipeline is passed to myscript as a single argument
        String[] command = { "./myscript", "key", "ls -t | tail -n 1" };
        Process process = Runtime.getRuntime().exec(command);
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                process.getInputStream()));
        String s;
        while ((s = reader.readLine()) != null) {
            System.out.println("Script output: " + s);
        }
    }
}

And myscript:

#!/bin/bash                                
key="$1"
value=$(eval "$2")
echo "The command $2 evaluated to: $value"

Here's how we can run myscript separately:

$ ls -t | tail -n 1
Templates

$ ./myscript foo 'ls -t | tail -n 1'
The command ls -t | tail -n 1 evaluated to: Templates

And here's the result of running the Java code:

$ javac Test.java && java Test
Script output: The command ls -t | tail -n 1 evaluated to: Templates
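
As an aside, if you also want to see the script's stderr, ProcessBuilder makes it easy to merge the two streams. A sketch of the same call (same hypothetical myscript as above):

import java.io.*;

public class Test2 {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder("./myscript", "key", "ls -t | tail -n 1");
        pb.redirectErrorStream(true); // merge stderr into stdout so errors are not lost
        Process process = pb.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String s;
            while ((s = reader.readLine()) != null) {
                System.out.println("Script output: " + s);
            }
        }
        process.waitFor();
    }
}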

Scala way of Tailing files

From the looks of it, the Apache Commons IO Tailer is easy to wrap in Scala:

import java.io.File
import org.apache.commons.io.input.{Tailer, TailerListener}

object ScalaTailer {
  private val DefaultBufSize = 4096

  def apply(file: File,
            onFileNotFound: => Unit = (),
            onFileRotated: => Unit = (),
            handleException: Exception => Unit = _ => (),
            handleLine: String => Unit = _ => (),
            delayMillis: Long = 1000,
            end: Boolean = false,
            reOpen: Boolean = false,
            bufSize: Int = DefaultBufSize) = {
    val listener = new TailerListener {
      override def init(tailer: Tailer) = ()
      override def fileNotFound() = onFileNotFound
      override def fileRotated() = onFileRotated
      override def handle(ex: Exception) = handleException(ex)
      override def handle(line: String) = handleLine(line)
    }
    new Tailer(file, listener, delayMillis, end, reOpen, bufSize)
  }
}

val tailer = ScalaTailer(myFile, handleLine = println)
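
Note that constructing a Tailer does not start it. Tailer implements Runnable, so you run it on a thread and call stop() when you are done, e.g.:

val thread = new Thread(tailer)
thread.setDaemon(true) // don't let the tailer keep the JVM alive
thread.start()
// ... later, when done:
tailer.stop()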

This is probably the reason why there's no Scala implementation of it. Besides, the Apache Commons stuff is pretty robust, so it's probably a good idea to use it!

WatchService to tail Gzip log files

First I would like to answer the technical aspect of your question:

A WatchEvent just gives you the file name of a changed (or created or deleted) file and nothing more. So if you need any logic beyond that, you have to implement it yourself (or use an existing library, of course).

If you want to read only new lines, you have to remember the position for each file, and whenever the file changes you can move back to the last known position. To get the current position you could use a CountingInputStream from the Commons IO package (credits go to [1]). To jump to the last position, you can use skip; the update below shows this in full.

But you are using a GZIPInputStream, which means that skip will not give you a big performance boost: skipping within a compressed stream is not possible, so GZIPInputStream's skip has to decompress everything up to the target position, just as reading would. You will therefore see only a small improvement (try it!).

What I don't understand is why you are using compressed log files at all. Why not write uncompressed logs with a DailyRollingFileAppender and compress them at the end of the day, when the application no longer accesses them?

Another solution could be to keep the GZIPInputStream open (store it) so that you don't have to re-read the file each time. Whether that is reasonable may depend on how many log files you have to watch.

Now some questions on your requirements:

You didn't mention why you want to watch the log files in real time. Why not centralize your logs (see Centralised Java Logging)? For example, take a look at logstash and this presentation (see [2] and [3]), or at scribe, or at splunk, which is commercial (see [4]).

A centralized log would give you the opportunity to really have real time reactions based on your log data.

[1] https://stackoverflow.com/a/240740/734687

[2] Using elasticsearch, logstash & kibana to create realtime dashboards - slides

[3] Using elasticsearch, logstash & kibana to create realtime dashboards - video

[4] Log Aggregation with Splunk - slides

Update

First, a Groovy script to generate a zipped log file. I start this script from GroovyConsole each time I want to simulate a log file change:

// Run with GroovyConsole each time you want new entries
def file = new File('D:\\Projekte\\watcher_service\\data\\log.gz')

// reading previous content, since append is not possible
def content
if (file.exists()) {
    def inStream = new java.util.zip.GZIPInputStream(file.newInputStream())
    content = inStream.readLines()
}

// writing previous content and appending new data
def random = new java.util.Random()
def lineCount = random.nextInt(30) + 1
def outStream = new java.util.zip.GZIPOutputStream(file.newOutputStream())

outStream.withWriter('UTF-8') { writer ->
    if (content) {
        content.each { writer << "$it\n" }
    }
    (1..lineCount).each {
        writer.write "Writing line $it/$lineCount\n"
    }
    writer.write '---Finished---\n'
    writer.flush()
    writer.close()
}

println "Wrote ${lineCount + 1} lines."

Then the logfile reader:

import java.nio.file.FileSystems
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths
import java.nio.file.StandardOpenOption
import java.util.zip.GZIPInputStream
import org.apache.commons.io.input.CountingInputStream
import static java.nio.file.StandardWatchEventKinds.*

class LogReader
{
    private final Path dir = Paths.get('D:\\Projekte\\watcher_service\\data\\')
    private watcher
    private positionMap = [:]
    long lineCount = 0

    static void main(def args)
    {
        new LogReader().processEvents()
    }

    LogReader()
    {
        watcher = FileSystems.getDefault().newWatchService()
        dir.register(watcher, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY)
    }

    void processEvents()
    {
        def key = watcher.take()
        boolean doLeave = false

        while ((key != null) && (doLeave == false))
        {
            key.pollEvents().each { event ->
                def kind = event.kind()
                Path name = event.context()

                println "Event received $kind: $name"
                if (kind == ENTRY_MODIFY) {
                    // use position from the map, if entry is not there use default value 0
                    processChange(name, positionMap.get(name.toString(), 0))
                }
                else if (kind == ENTRY_CREATE) {
                    processChange(name, 0)
                }
                else {
                    doLeave = true
                    return
                }
            }
            key.reset()
            key = watcher.take()
        }
    }

    private void processChange(Path name, long position)
    {
        // open file and go to last position
        Path absolutePath = dir.resolve(name)
        def countingStream =
            new CountingInputStream(
                new GZIPInputStream(
                    Files.newInputStream(absolutePath, StandardOpenOption.READ)))
        position = countingStream.skip(position)
        println "Moving to position $position"

        // processing each new line
        // at the first start all lines are read
        int newLineCount = 0
        countingStream.withReader('UTF-8') { reader ->
            reader.eachLine { line ->
                println "${++lineCount}: $line"
                ++newLineCount
            }
        }
        println "${++lineCount}: $newLineCount new lines +++Finished+++"

        // store new position in map
        positionMap[name.toString()] = countingStream.count
        println "Storing new position $countingStream.count"
        countingStream.close()
    }
}

In the function processChange you can see the creation of the input streams. The line with .withReader creates the InputStreamReader and the BufferedReader. I always use Groovy; it is Java on steroids, and when you start using it you cannot stop. A Java developer should be able to read it, but if you have questions, just comment.

Terminal - run Java app in background and how to close it?

Normally, when starting the app you get the pid returned, like so:

~ $ nohup java -jar server.jar &
[1] 3305
~ $ nohup: ignoring input and appending output to ‘nohup.out’

To see if it is running, you can issue:

~ $ ps -ef | grep  server
user1 3305 2936 0 13:58 pts/1 00:00:00 java -jar server.jar

If you see a line like the above, it is running. You may also have a look at the nohup.out file, which is written to the directory you started the server in, by using

tail nohup.out

To kill the process, issue `kill <pid>`, where <pid> is the process id you either remembered or can find in the second column of the `ps -ef | grep server` output; in our case 3305:

kill 3305

kill without options tries to end the process gracefully. Read more about kill and ps by using

man kill 

and

man ps

respectively.

Split File - Java/Linux

One way is to use regular unix commands to split the file and then prepend the last 1000 bytes of the previous part to each subsequent part.

First split the file:

split -b 30000000 inputfile part.

Then, for each part (ignoring the first), make a new file starting with the last 1000 bytes of the previous one:

unset prev
for i in part.*
do
  if [ -n "${prev}" ]
  then
    tail -c 1000 ${prev} > part.temp
    cat ${i} >> part.temp
    mv part.temp ${i}
  fi
  prev=${i}
done

Before reassembling, we iterate over the files again, ignoring the first, and throw away the first 1000 bytes of each:

unset prev
for i in part.*
do
  if [ -n "${prev}" ]
  then
    tail -c +1001 ${i} > part.temp
    mv part.temp ${i}
  fi
  prev=${i}
done

The last step is to reassemble the files:

cat part.* >> newfile

Since there was no explanation of why the overlap was needed, I just created it and then threw it away.
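
If you need to do this from pure Java rather than shell, here is a minimal sketch of the same idea: write each part directly with its 1000-byte overlap instead of patching the files afterwards. The file names, part size, and class name are illustrative, not taken from the question:

import java.io.*;

public class OverlapSplit {
    static final long PART = 30_000_000L; // same chunk size as split -b 30000000
    static final int OVERLAP = 1000;      // same overlap as tail -c 1000

    public static void main(String[] args) throws IOException {
        File input = new File("inputfile"); // hypothetical input name
        long length = input.length();
        int parts = (int) ((length + PART - 1) / PART);
        byte[] buffer = new byte[64 * 1024];

        try (RandomAccessFile in = new RandomAccessFile(input, "r")) {
            for (int k = 0; k < parts; k++) {
                // part 0 covers [0, PART); part k covers [k*PART - OVERLAP, (k+1)*PART)
                long start = Math.max(0L, k * PART - OVERLAP);
                long end = Math.min(length, (k + 1) * PART);
                in.seek(start);
                try (OutputStream out = new BufferedOutputStream(
                        new FileOutputStream(String.format("part.%03d", k)))) {
                    long remaining = end - start;
                    while (remaining > 0) {
                        int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                        if (n < 0) break;
                        out.write(buffer, 0, n);
                        remaining -= n;
                    }
                }
            }
        }
    }
}

Reassembly then drops the first OVERLAP bytes of every part except the first, exactly as in the shell version above.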


