In part one, I showed how I keep track of my diet using Emacs. In this part I'll explain how I extract that data into something that can be used outside of Emacs.

The full source code is available at the end of the post.

What it does

I wrote a small Ruby script that converts all of my daily entries into a JSON file. It turns something like this:

** CAL-OUT Diet for day <2019-10-05 Sat>
  :PROPERTIES:
  :Weight:   167.3
  :END:

| Timestamp              | Food                   | Calories | Quantity |   Total |
|------------------------+------------------------+----------+----------+---------|
| [2019-10-05 Sat 08:42] | Cup of tea             |       54 |        1 |      54 |
| [2019-10-05 Sat 11:34] | Chocolate Chip Pancake |      2.7 |      202 |   545.4 |
| [2019-10-05 Sat 14:46] | Cup of tea             |       54 |        1 |      54 |
| [2019-10-05 Sat 14:46] | Vanilla yogurt raisins |     4.33 |       30 |   129.9 |
| [2019-10-05 Sat 17:45] | Beef meatballs         |     2.02 |      370 |   747.4 |
| [2019-10-05 Sat 17:45] | Fresh Pasta            |     1.31 |      306 |  400.86 |
| [2019-10-05 Sat 17:45] | Garlic bread           |     2.75 |       94 |   258.5 |
| [2019-10-05 Sat 19:58] | Cup of tea             |       54 |        1 |      54 |
|------------------------+------------------------+----------+----------+---------|
| Total                  |                        |          |          | 2244.06 |
#+TBLFM: $5=$3*$4::$LR5=vsum(@2$5..@-I$5)

into this:

[
  {
    "keyword": "CAL-OUT",
    "date": "2019-10-05 Sat",
    "properties": {
      "Weight": "167.3"
    },
    "entries": [
      [
	"Timestamp",
	"Food",
	"Calories",
	"Quantity",
	"Total"
      ],
      [
	"[2019-10-05 Sat 08:42]",
	"Cup of tea",
	"54",
	"1",
	"54"
      ],
      [
	"[2019-10-05 Sat 11:34]",
	"Chocolate Chip Pancake",
	"2.7",
	"202",
	"545.4"
      ],
      [
	"[2019-10-05 Sat 14:46]",
	"Cup of tea",
	"54",
	"1",
	"54"
      ],
      [
	"[2019-10-05 Sat 14:46]",
	"Vanilla yogurt raisins",
	"4.33",
	"30",
	"129.9"
      ],
      [
	"[2019-10-05 Sat 17:45]",
	"Beef meatballs",
	"2.02",
	"370",
	"747.4"
      ],
      [
	"[2019-10-05 Sat 17:45]",
	"Fresh Pasta",
	"1.31",
	"306",
	"400.86"
      ],
      [
	"[2019-10-05 Sat 17:45]",
	"Garlic bread",
	"2.75",
	"94",
	"258.5"
      ],
      [
	"[2019-10-05 Sat 19:58]",
	"Cup of tea",
	"54",
	"1",
	"54"
      ]
    ]
  }
]

I went with JSON because far more libraries can parse JSON than can parse org-mode, which makes it easier to use the data elsewhere.
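
As a rough illustration of how little code a consumer needs, here is a minimal sketch that parses one exported day and sums its Total column. The inline sample and the daily_total helper are hypothetical and not part of the extractor; in practice the data would be read from the exported file instead.

```ruby
require 'json'

# A trimmed copy of the exported structure shown above (sample data only).
sample = <<~SAMPLE
  [
    {
      "keyword": "CAL-OUT",
      "date": "2019-10-05 Sat",
      "properties": { "Weight": "167.3" },
      "entries": [
        ["Timestamp", "Food", "Calories", "Quantity", "Total"],
        ["[2019-10-05 Sat 08:42]", "Cup of tea", "54", "1", "54"],
        ["[2019-10-05 Sat 11:34]", "Chocolate Chip Pancake", "2.7", "202", "545.4"]
      ]
    }
  ]
SAMPLE

# Hypothetical helper: sum the "Total" column for one exported day.
# The first row holds the column names, and every cell is a string.
def daily_total(day)
  header, *rows = day['entries']
  total_index = header.index('Total')
  rows.sum { |row| row[total_index].to_f }.round(2)
end

days = JSON.parse(sample)
days.each { |day| puts "#{day['date']}: #{daily_total(day)} kcal" }
```

Every cell is exported as a string, so anything numeric needs converting (here with to_f) before doing arithmetic on it.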

Other potential solutions

There were a couple of other ideas I considered.

1. Write an elisp org-mode exporter

org-mode has a very flexible exporting system; it's what turns my .org blog posts into HTML so that Jekyll can publish them. There are ten exporters included with org-mode, including exporters for HTML, LaTeX, and OpenDocument.

This was my first idea, but my elisp knowledge in 2012 was limited at best. There's also the issue that Emacs has to start up and run to extract data this way. It's fine if I'm exporting data whilst using Emacs, but not quite as quick if I'm running it from the command line.

It might be something I try again in the future now that I have more lisp experience.

2. Write a script in a different language

I went with Ruby as it has an easy-to-use org-mode library that is fairly up-to-date. There are other org-mode parsers available, but not all of them are as robust as Ruby's.

I'd like to try the Common Lisp parser, cl-org-mode, at some point.

3. Export the content to SQLite instead

JSON is useful for passing data around, but a dedicated database with SQL is a better tool when it comes to querying. I've worked with SQLite for smaller projects and it's always quick to get started with.

Ruby has some nice libraries for interacting with SQLite, so it could be integrated fairly easily into my existing script. Emacs also has SQL mode, which can query databases and return the results into a buffer.
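
Most of the work in a SQLite export would be flattening each day's table into one row per food entry. The following is only a sketch of that flattening step: the diet_rows helper and the column names are hypothetical, and the actual insert into a database (for example through a SQLite gem) is left out.

```ruby
# Hypothetical helper: turn one exported day into flat row hashes,
# one per food entry, ready to be inserted into a SQL table.
def diet_rows(day)
  header, *rows = day[:entries]
  keys = header.map { |name| name.downcase.to_sym }

  rows.map do |row|
    record = keys.zip(row).to_h
    {
      date:      day[:date],
      timestamp: record[:timestamp].delete('[]'), # strip the org brackets
      food:      record[:food],
      calories:  record[:calories].to_f,
      quantity:  record[:quantity].to_f
    }
  end
end

# Sample data in the same shape the extractor builds before calling to_json.
day = {
  date: '2019-10-05 Sat',
  entries: [
    %w[Timestamp Food Calories Quantity Total],
    ['[2019-10-05 Sat 08:42]', 'Cup of tea', '54', '1', '54'],
    ['[2019-10-05 Sat 17:45]', 'Garlic bread', '2.75', '94', '258.5']
  ]
}

diet_rows(day).each { |row| p row }
```

The Total column is deliberately dropped here, since a query could recompute it from calories and quantity.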

The full script

The script requires the org-ruby gem to be installed. It also uses json, which ships with Ruby.

It wasn't designed to be used as a standalone script as it's part of a larger library that I use to build this site. It's also not as ruby-fied as I would like, but it does the job for now.

Usage

Add the following to the bottom of the full script, replacing f.input and f.output with the full input and output paths of the diet file.

extractor = OrgDietExtractor.new do |f|
  f.from     = '2020-01-01'
  f.to       = '2020-12-31'
  f.keywords = %w[CAL-OUT CAL-CANCEL]
  f.headline = 'Diet for day'
  f.input    = 'diet.org'
  f.output   = 'diet.json'
end

extractor.generate

Available configuration variables are:

from
Any entries made before this date will be excluded from the export file.
to
Like from, except it excludes entries made after this date.
keywords
An array of headline keywords to include in the export. Any headlines with a different keyword (such as CAL-IN) will be excluded.
headline
The starting words of headlines to include.
input
The full path of the diet file to parse.
output
The full path where the JSON data should go.
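
To make the from and to behaviour concrete, here is a small stand-alone sketch of the date check. The in_range? helper is a hypothetical simplification written for this example; it mirrors the logic of entry_in_range? in the script but takes plain strings instead of a headline.

```ruby
require 'date'

# Hypothetical stand-alone version of the date check: both bounds are
# inclusive, and a missing bound (nil) means "no limit" on that side.
def in_range?(entry_date, from: nil, to: nil)
  date = Date.parse(entry_date)
  return false if from && date < Date.parse(from)
  return false if to && date > Date.parse(to)

  true
end

puts in_range?('2019-10-05 Sat', from: '2020-01-01', to: '2020-12-31') # false
puts in_range?('2020-01-01 Wed', from: '2020-01-01', to: '2020-12-31') # true
puts in_range?('2020-06-15 Mon', from: '2020-01-01')                   # true
```

Date.parse copes with the trailing day name in org-mode dates, which is why the headline date can be passed through unchanged.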

The script

# frozen_string_literal: true

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

require 'date'
require 'json'
require 'org-ruby'

class OrgDietExtractor
  attr_accessor :format
  attr_accessor :input
  attr_accessor :output
  attr_accessor :from
  attr_accessor :to
  attr_accessor :headline
  attr_accessor :keywords
  attr_accessor :current_parent
  attr_accessor :current_level

  def initialize
    yield self
  end

  def headline_contents(headline)
    text_lines = headline.body_lines
    text_lines.shift

    # Strip properties stuff.
    text_lines = text_lines.map do |line|
      task_line = line.output_text.strip
      next if task_line.start_with?(':')

      task_line
    end

    text_lines = text_lines.reject { |c| c.nil? || c.empty? }
    text_lines.join("\n")
  end

  def belongs_to?(headline)
    @current_parent == headline
  end

  def entry_in_range?(headline)
    return true unless @from || @to

    # Extract the dates
    date       = Date.parse(diet_date(headline))
    start_date = Date.parse(@from) if @from
    end_date   = Date.parse(@to) if @to

    return false if start_date && date < start_date
    return false if end_date && date > end_date

    true
  end

  def extract_table_headers(content)
    header_line = content.split("\n").first
    return if header_line.nil?

    header_cols = header_line.split('|').map(&:strip)

    header_cols.reject(&:empty?)
  end

  def extract_table_rows(content)
    table_lines = content.split("\n")
    return if table_lines.nil?

    # Drop the header row and its separator, then the closing separator,
    # the Total row, and the #+TBLFM line.
    table_lines = table_lines.drop(2)
    table_lines.pop(3)

    table_lines.map do |line|
      line.split('|').map(&:strip).reject(&:empty?)
    end
  end

  def diet_node?(headline)
    @keywords.include?(headline.keyword) && headline.headline_text.start_with?(@headline)
  end

  def diet_date(headline)
    headline.headline_text.scan(/\<(.*?)\>/).last.first
  end

  def diet_properties(headline)
    headline.property_drawer
  end

  def diet_entries(contents)
    headers = extract_table_headers(contents)
    return nil unless headers

    table = []
    table << headers
    table += extract_table_rows(contents)

    table
  end

  def remap_property(task, from, to)
    return unless task[:properties].key?(from)

    task[to] = task[:properties][from]
    task[:properties].delete(from)
  end

  def generate
    # Get full paths.
    input_file  = File.absolute_path(@input)
    output_file = File.absolute_path(@output)

    # Load the org file and parse it.
    file_in = File.read(input_file)
    doc     = Orgmode::Parser.new(file_in)

    # Parse all headlines.
    diet_days = []

    doc.headlines.each do |headline|
      # Only keep diet headlines that fall within the date range.
      next unless diet_node?(headline)
      next unless entry_in_range?(headline)

      # Create diet info container.
      diet_entry = {
        keyword:    headline.keyword,
        date:       diet_date(headline),
        properties: diet_properties(headline),
        entries:    diet_entries(headline_contents(headline))
      }

      diet_days << diet_entry
    end

    # Generate the file.
    File.open(output_file, 'w') do |file_out|
      file_out.write(diet_days.to_json)
    end
  end
end
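
To see what the two string-handling steps do in isolation, here is a short sketch that runs them on a single table row and a single headline. The sample strings are copied from the example at the top of the post; the variable names are just for illustration.

```ruby
# One row of the org table, as it appears in the diet file.
line = '| [2019-10-05 Sat 11:34] | Chocolate Chip Pancake |      2.7 |      202 |   545.4 |'

# Same chain used in extract_table_rows: split on pipes, trim, drop blanks.
cells = line.split('|').map(&:strip).reject(&:empty?)
p cells

# Same idea as diet_date: capture the text between < and >.
headline_text = 'Diet for day <2019-10-05 Sat>'
date = headline_text.scan(/<(.*?)>/).last.first
p date
```

One thing to be aware of: because blank cells are dropped, a row with an empty cell would come out shorter than the header row.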