In part one, I showed how I keep track of my diet using Emacs. In this part I'll explain how I extract that data into something that can be used outside of Emacs.
The full source code is available at the end of the post.
What it does
I wrote a small ruby script that converts all of my daily entries into a json file. It turns something like this:
** CAL-OUT Diet for day <2019-10-05 Sat>
:PROPERTIES:
:Weight: 167.3
:END:
| Timestamp              | Food                   | Calories | Quantity |   Total |
|------------------------+------------------------+----------+----------+---------|
| [2019-10-05 Sat 08:42] | Cup of tea             |       54 |        1 |      54 |
| [2019-10-05 Sat 11:34] | Chocolate Chip Pancake |      2.7 |      202 |   545.4 |
| [2019-10-05 Sat 14:46] | Cup of tea             |       54 |        1 |      54 |
| [2019-10-05 Sat 14:46] | Vanilla yogurt raisins |     4.33 |       30 |   129.9 |
| [2019-10-05 Sat 17:45] | Beef meatballs         |     2.02 |      370 |   747.4 |
| [2019-10-05 Sat 17:45] | Fresh Pasta            |     1.31 |      306 |  400.86 |
| [2019-10-05 Sat 17:45] | Garlic bread           |     2.75 |       94 |   258.5 |
| [2019-10-05 Sat 19:58] | Cup of tea             |       54 |        1 |      54 |
|------------------------+------------------------+----------+----------+---------|
| Total                  |                        |          |          | 2244.06 |
#+TBLFM: $5=$3*$4::@>$5=vsum(@2$5..@-I$5)
into this:
[
  {
    "keyword": "CAL-OUT",
    "date": "2019-10-05 Sat",
    "properties": {
      "Weight": "167.3"
    },
    "entries": [
      ["Timestamp", "Food", "Calories", "Quantity", "Total"],
      ["[2019-10-05 Sat 08:42]", "Cup of tea", "54", "1", "54"],
      ["[2019-10-05 Sat 11:34]", "Chocolate Chip Pancake", "2.7", "202", "545.4"],
      ["[2019-10-05 Sat 14:46]", "Cup of tea", "54", "1", "54"],
      ["[2019-10-05 Sat 14:46]", "Vanilla yogurt raisins", "4.33", "30", "129.9"],
      ["[2019-10-05 Sat 17:45]", "Beef meatballs", "2.02", "370", "747.4"],
      ["[2019-10-05 Sat 17:45]", "Fresh Pasta", "1.31", "306", "400.86"],
      ["[2019-10-05 Sat 17:45]", "Garlic bread", "2.75", "94", "258.5"],
      ["[2019-10-05 Sat 19:58]", "Cup of tea", "54", "1", "54"]
    ]
  }
]
I went with json as there are far more libraries available that parse json compared to org-mode, which makes it easier to use the data elsewhere.
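As a rough sketch of what "using the data elsewhere" could look like, the snippet below parses a trimmed copy of the output above and sums the Total column for each day. The `day_total` helper is hypothetical and not part of my script; it assumes the first row of `entries` always holds the column names, and that every value arrives as a string.

```ruby
require 'json'

# A trimmed copy of the exported structure shown above (two rows only).
sample = <<~JSON
  [
    {
      "keyword": "CAL-OUT",
      "date": "2019-10-05 Sat",
      "properties": { "Weight": "167.3" },
      "entries": [
        ["Timestamp", "Food", "Calories", "Quantity", "Total"],
        ["[2019-10-05 Sat 08:42]", "Cup of tea", "54", "1", "54"],
        ["[2019-10-05 Sat 11:34]", "Chocolate Chip Pancake", "2.7", "202", "545.4"]
      ]
    }
  ]
JSON

# Hypothetical helper: sum the "Total" column for one exported day.
# The first row is the header; values are strings, so convert with to_f.
def day_total(day)
  header, *rows = day['entries']
  total_col = header.index('Total')
  rows.sum { |row| row[total_col].to_f }
end

days = JSON.parse(sample)
days.each do |day|
  puts format('%s: %.2f calories', day['date'], day_total(day))
end
```

Because the export keeps everything as strings, any consumer has to do its own numeric conversion, as `to_f` does here.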
Other potential solutions
There were a couple of other ideas I considered.
1. Write an elisp org-mode exporter
org-mode has a very flexible exporting system; it's what turns my .org blog posts into html so that Jekyll can publish them. There are ten exporters included with org-mode, including exporters for html, latex, and OpenDocument.
This was my first idea, but my elisp knowledge in 2012 was limited at best. There's also the issue that Emacs has to start up and run to extract data this way. It's fine if I'm exporting data whilst using Emacs, but not quite as quick if I'm running it from the command line.
It might be something I try again in the future now that I have more lisp experience.
2. Write a script in a different language
I went with ruby as it has an easy-to-use org-mode library that is fairly up-to-date. There are other org-mode parsers available, but not all of them are as robust as ruby's.
I'd like to try the common lisp parser, cl-org-mode, at some point.
3. Export the content to SQLite instead
Json is useful for passing data around, but a dedicated database with SQL is a better tool when it comes to querying. I've worked with SQLite for smaller projects and it's always quick to get started with.
Ruby has some nice libraries for interacting with SQLite, so it could be integrated fairly easily into my existing script. Emacs also has SQL mode which can query databases and return the results into a buffer.
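I haven't built this, but a minimal sketch of the idea might look like the following. It avoids any gem by printing plain SQL statements, which could then be piped into the `sqlite3` command-line tool. The `entries` table, its column names, and the `sql_quote` and `diet_sql` helpers are all assumptions made up for illustration.

```ruby
# Hypothetical helper: quote a value as a SQL string literal,
# doubling any single quotes inside it.
def sql_quote(value)
  "'#{value.to_s.gsub("'", "''")}'"
end

# Hypothetical helper: turn exported days into SQL text, one row per
# food entry. The schema is an assumption, not something my script uses.
def diet_sql(days)
  statements = [
    'CREATE TABLE IF NOT EXISTS entries ' \
    '(day TEXT, logged_at TEXT, food TEXT, calories REAL, quantity REAL, total REAL);'
  ]
  days.each do |day|
    _header, *rows = day['entries']
    rows.each do |timestamp, food, calories, quantity, total|
      values = [sql_quote(day['date']), sql_quote(timestamp), sql_quote(food),
                calories.to_f, quantity.to_f, total.to_f]
      statements << "INSERT INTO entries VALUES (#{values.join(', ')});"
    end
  end
  statements.join("\n")
end

days = [
  { 'date' => '2019-10-05 Sat',
    'entries' => [
      %w[Timestamp Food Calories Quantity Total],
      ['[2019-10-05 Sat 08:42]', 'Cup of tea', '54', '1', '54']
    ] }
]
puts diet_sql(days)
```

Storing the numeric columns as REAL would make queries like a weekly calorie average straightforward, which is the main thing json doesn't give me.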
The full script
The script requires the following gems to be installed:
- json
- org-ruby
It wasn't designed to be used as a standalone script as it's part of a larger library that I use to build this site. It's also not as ruby-fied as I would like, but it does the job for now.
Usage
Add the following to the bottom of the full script, replacing f.input and f.output with the full input and output paths of the diet file.
extractor = OrgDietExtractor.new do |f|
  f.from     = '2020-01-01'
  f.to       = '2020-12-31'
  f.keywords = %w[CAL-OUT CAL-CANCEL]
  f.headline = 'Diet for day'
  f.input    = 'diet.org'
  f.output   = 'diet.json'
end

extractor.generate
Available configuration variables are:
- from: Any entries made before this date will be excluded from the export file.
- to: Like from, except it excludes entries made after this date.
- keywords: An array of headline keywords to include in the export. Any headlines with a different keyword (such as CAL-IN) will be excluded.
- headline: The starting words of headlines to include.
- input: The full path of the diet file to parse.
- output: The full path where the json data should go.
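To make the from and to options concrete, here is a small standalone sketch of the date check. The `in_range?` helper is hypothetical; it mirrors the logic of the script's `entry_in_range?` method, but takes the date text directly instead of a headline. Both bounds are optional and inclusive.

```ruby
require 'date'

# Hypothetical helper mirroring the from/to options: an entry is kept
# when its date is on or after `from` and on or before `to`.
def in_range?(date_text, from: nil, to: nil)
  date = Date.parse(date_text)
  return false if from && date < Date.parse(from)
  return false if to && date > Date.parse(to)

  true
end

# The sample day from earlier falls outside the 2020 range used above.
puts in_range?('2019-10-05 Sat', from: '2020-01-01', to: '2020-12-31')
# The last day of the range is still included.
puts in_range?('2020-12-31', from: '2020-01-01', to: '2020-12-31')
```

Leaving out both options keeps every entry, which is what the script does when from and to are not set.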
The script
# frozen_string_literal: true

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

require 'date' # Needed for Date.parse in entry_in_range?
require 'json'
require 'org-ruby'

class OrgDietExtractor
  attr_accessor :format
  attr_accessor :input
  attr_accessor :output
  attr_accessor :from
  attr_accessor :to
  attr_accessor :headline
  attr_accessor :keywords
  attr_accessor :current_parent
  attr_accessor :current_level

  def initialize
    yield self
  end

  # Return the body of a headline as text, without the property drawer.
  def headline_contents(headline)
    text_lines = headline.body_lines
    text_lines.shift

    # Strip properties stuff.
    text_lines = text_lines.map do |line|
      task_line = line.output_text.strip
      next if task_line.start_with?(':')

      task_line
    end

    text_lines = text_lines.reject { |c| c.nil? || c.empty? }
    text_lines.join("\n")
  end

  def belongs_to?(headline)
    @current_parent == headline
  end

  def entry_in_range?(headline)
    return true unless @from || @to

    # Extract the dates
    date = Date.parse(diet_date(headline))
    start_date = Date.parse(@from) if @from
    end_date = Date.parse(@to) if @to

    return false if start_date && date < start_date
    return false if end_date && date > end_date

    true
  end

  def extract_table_headers(content)
    header_line = content.split("\n").first
    return if header_line.nil?

    header_cols = header_line.split('|').map(&:strip)
    header_cols.reject(&:empty?)
  end

  def extract_table_rows(content)
    table_lines = content.split("\n")
    return if table_lines.nil?

    # Drop the header row and its separator from the top, then the
    # closing separator, the Total row and the #+TBLFM line from the end.
    table_lines = table_lines.drop(2)
    table_lines.pop(3)

    table_lines.map do |line|
      line.split('|').map(&:strip).reject(&:empty?)
    end
  end

  def diet_node?(headline)
    @keywords.include?(headline.keyword) &&
      headline.headline_text.start_with?(@headline)
  end

  # The date is the last <...> timestamp in the headline text.
  def diet_date(headline)
    headline.headline_text.scan(/<(.*?)>/).last.first
  end

  def diet_properties(headline)
    headline.property_drawer
  end

  def diet_entries(contents)
    headers = extract_table_headers(contents)
    return nil unless headers

    table = []
    table << headers
    table += extract_table_rows(contents)
    table
  end

  def remap_property(task, from, to)
    return unless task[:properties].key?(from)

    task[to] = task[:properties][from]
    task[:properties].delete(from)
  end

  def generate
    # Get full paths.
    input_file  = File.absolute_path(@input)
    output_file = File.absolute_path(@output)

    # Load the org file and parse it.
    file_in = File.read(input_file)
    doc = Orgmode::Parser.new(file_in)

    # Parse all headlines.
    diet_days = []

    doc.headlines.each do |headline|
      # Want top-level headlines in date range only.
      next unless diet_node?(headline)
      next unless entry_in_range?(headline)

      # Create diet info container.
      diet_entry = {
        keyword: headline.keyword,
        date: diet_date(headline),
        properties: diet_properties(headline),
        entries: diet_entries(headline_contents(headline))
      }

      diet_days << diet_entry
    end

    # Generate the file.
    File.open(output_file, 'w') do |file_out|
      file_out.write(diet_days.to_json)
    end
  end
end
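One detail worth seeing in isolation is how a table line becomes an array of cells. The sketch below pulls the split, strip and reject chain out of `extract_table_rows` into a hypothetical `table_cells` helper so it can be run without org-ruby.

```ruby
# Hypothetical helper: split one org table line into its cells using the
# same chain as extract_table_rows.
def table_cells(line)
  line.split('|').map(&:strip).reject(&:empty?)
end

p table_cells('| [2019-10-05 Sat 08:42] | Cup of tea | 54 | 1 | 54 |')
# => ["[2019-10-05 Sat 08:42]", "Cup of tea", "54", "1", "54"]

# Blank cells are dropped rather than kept as empty strings.
p table_cells('| Total | | | | 2244.06 |')
# => ["Total", "2244.06"]
```

The second call shows the trade-off: empty cells disappear, so a data row with a blank column would come out shorter than the header. That doesn't affect the Total row in my files because the script removes the last three lines before splitting.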