The developer API of Fitbit provides access to the data collected by it’s personal trackers for use with custom applications development. Besides read also write access can be used not only to it’s own but on behalf of other platform users via OAuth authentication. A comprehensive documentation of the Fitbit API can be found here: https://dev.fitbit.com/docs/ . Continue reading “Access to Fitbit API with Scala”
tf-idf is the approach of determine relevant documents by the count of words they contain. While this would emphasis common words like ‘the’, tf-idf takes for each word it’s ratio of the overall appearence in a set of documents – the inverse-doucment-frequence. Here I’ll try to give a simple MapReduce implemention. As a little quirk Avro will be used to model the representation of a document. We are going to need secondary sorting to reach an effective implementation.
Flume is a distributed system to aggregate log files into the Hadoop Distributed File System (HDFS). It has a simple design of Events, Sources, Sinks, and Channels which can be connected into a complex multi-hop architecture.
While Flume is designed to be resilient “with tunable reliability mechanisms for fail-over and recovery” in this blog post we’ll also look at the reliable forwarding of rsyslog, which we are going to use to store postfix logs in Amazon S3.
Web scraping is the process of extracting entities from web pages. This entities can either be news articles, blog posts, products, or any other information displayed on the web. Web pages consist of HTML (a XML like structure) and therefor present information in a structured form, in a DOM (Document Object Model), which can be extracted. The structure is usually not very strict or complaint to other web pages and object to change as the web page evolves. Using scripts over compiled programming languages has therefor a lot of advantages. On the other hand compiled languages have often a speed advantage as well as a broad foundation of fundamental libraries. The DOM reference implementation of the W3C is written in Java for example.
Java’s approach of opening up the JVM (Java Virtual Machine) for other (dynamically typed) languages is a great way to combine the benefits of both, a compiled and scripted language. With JDK 8 compiling dynamic languages to the JVM has become simpler with potentially improved implementations of compilers and runtime systems through the invokedynamic instruction. I found that this presentation currently explains best what invokedynamic is and how it works in general.