Breaking through Roadblocks

Jan 2 2013

Hello, 2013! It has been ages since I last blogged, mainly because I enjoy Google's social site, Google+, way too much, despite (or perhaps because of) it being filled mostly with my geek friends.

I decided to post this on WordPress, but it made me think about ways to break out of the walled garden of G+ and somehow syndicate certain posts to this site. But that is perhaps material for another post.

What I wanted to share are two stumbling blocks, trivial for most of you, but very frustrating until you know the solution. A must for typical batch processing is the xargs utility. Typically fed by find, it reads a list of arguments from standard input and runs a command on them. By default it passes as many arguments as it can to a single invocation, so echo prints them all on one line:

find . | grep '\.svg$' | xargs echo
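To make that default concrete, here is a rough Python model of it. It is a simplification for illustration, not how xargs is actually implemented: arguments are packed into as few invocations as fit under a size limit, and a new invocation starts only when the next argument would overflow it. The name xargs_batches and the tiny max_chars default are made up; GNU xargs derives its real limit from the system and can report it with --show-limits.

```python
def xargs_batches(command, args, max_chars=4096):
    """Group args into argv lists the way xargs does by default (simplified).

    max_chars is a made-up stand-in for the system's command-line size limit;
    each word is counted as its length plus one separator byte.
    """
    batches = []
    current = list(command)
    size = sum(len(word) + 1 for word in current)
    for arg in args:
        cost = len(arg) + 1
        # Start a new invocation if this argument would overflow the limit,
        # but always put at least one argument in each invocation.
        if len(current) > len(command) and size + cost > max_chars:
            batches.append(current)
            current = list(command)
            size = sum(len(word) + 1 for word in current)
        current.append(arg)
        size += cost
    if len(current) > len(command):
        batches.append(current)
    return batches


print(xargs_batches(['echo'], ['a.svg', 'b.svg', 'c.svg']))
# [['echo', 'a.svg', 'b.svg', 'c.svg']]
print(xargs_batches(['echo'], ['a.svg', 'b.svg', 'c.svg'], max_chars=12))
# [['echo', 'a.svg'], ['echo', 'b.svg'], ['echo', 'c.svg']]
```

With the default limit all three names share one echo; with an artificially small limit each name gets its own invocation.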

Now, find itself has a million switches for filtering, but I prefer not to dive into the manpage if given the option :) The default behavior of xargs leaves a lot to be desired for this kind of work. The whole list gets crammed into as few command lines as possible (xargs only splits it when it hits the system's command-line length limit), while many tools, ffmpeg included, want one input file per run, usually followed by more arguments such as the output name. The magical parameter you're looking for is -i (the older GNU spelling of the portable -I {}), which runs the given command once per input line. You can place that argument anywhere on the command line using {}:

find . | grep '\.mp4$' | xargs -i ffmpeg -i {} -sameq {}.webm
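The effect of -i can be modeled the same way: one command per input line, with every {} in the template replaced by that line. This is a simplified sketch built around a hypothetical xargs_replace helper; it deliberately ignores xargs' own quote and backslash processing.

```python
def xargs_replace(template, lines, placeholder='{}'):
    """Build one argv list per input line, like xargs -I {} (simplified).

    Each line becomes a single argument, so names containing spaces stay whole.
    Empty lines are skipped; xargs' quote/backslash handling is not modeled.
    """
    commands = []
    for line in lines:
        item = line.rstrip('\n')
        if not item:
            continue
        # Substitute the placeholder inside every word, e.g. '{}.webm'
        commands.append([word.replace(placeholder, item) for word in template])
    return commands


template = ['ffmpeg', '-i', '{}', '-sameq', '{}.webm']
for argv in xargs_replace(template, ['./a.mp4\n', './my clip.mp4\n']):
    print(argv)
```

Because each line becomes one argument, './my clip.mp4' stays intact here, whereas the default mode of xargs would split it at the space.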

So while the manpage surely includes this info, I bet someone will find this through a Google query and appreciate it :)

The other big stumbling block, which I had also hit in Ruby, is XPath queries in Python. Big thanks to Patryk Zawadzki for the solution. Inkscape SVG files are XML documents full of namespaced elements, so simple queries like //rect will fail. You need to prefix every element name with the SVG namespace in ElementTree's {uri}tag notation (such as .//{http://www.w3.org/2000/svg}rect; ElementTree expects relative paths, hence the leading dot). Full example here:
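A minimal, self-contained illustration of the problem, using a tiny made-up SVG string instead of a real Inkscape file. It also shows ElementTree's namespaces mapping, which lets you pick a short prefix (here svg, an arbitrary choice) instead of spelling out the URI each time.

```python
from xml.etree import ElementTree

SVG_NS = 'http://www.w3.org/2000/svg'

# Made-up stand-in for an Inkscape document: the root declares a default namespace.
doc = ElementTree.fromstring(
    '<svg xmlns="%s"><g><rect id="r1" width="10" height="5"/></g></svg>' % SVG_NS
)

# A bare tag name only matches elements with no namespace, so this finds nothing.
print(doc.findall('.//rect'))  # []

# The {uri}tag form matches the namespaced elements.
print([r.get('id') for r in doc.findall('.//{%s}rect' % SVG_NS)])  # ['r1']

# Equivalent, shorter form: map a prefix of your choice to the URI.
ns = {'svg': SVG_NS}
print([r.get('id') for r in doc.findall('.//svg:rect', ns)])  # ['r1']
```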

#!/usr/bin/env python3

import csv
import os
import subprocess
from xml.etree import ElementTree

TEMPLATE = 'template.svg'
SVG_NS = 'http://www.w3.org/2000/svg'
SVG = '{%s}' % SVG_NS  # ElementTree's {uri} prefix for SVG tags

# Write SVG elements back out without "ns0:" prefixes
ElementTree.register_namespace('', SVG_NS)
os.makedirs('out', exist_ok=True)


def set_field(svg, field_id, value):
    # Inkscape keeps the visible text in a <tspan> inside each <text>
    svg.find(".//%stext[@id='%s']/%stspan" % (SVG, field_id, SVG)).text = value


with open('members.csv', newline='') as f:
    for row in csv.reader(f):
        memno, name, validto = row[:3]
        print(memno)
        svg = ElementTree.parse(TEMPLATE)
        set_field(svg, 'memno', memno)
        set_field(svg, 'name', name)
        set_field(svg, 'validto', validto)
        svg_path = './out/%s.svg' % memno
        svg.write(svg_path)
        # -A is Inkscape 0.x's --export-pdf; an argument list avoids shell quoting issues
        subprocess.check_call(['inkscape', '-A', './out/%s.pdf' % memno, svg_path])
        os.unlink(svg_path)
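The long {http://www.w3.org/2000/svg} prefixes in the script can also be replaced with ElementTree's namespaces mapping. Below is a sketch of just the template-filling step, run against a made-up in-memory template rather than template.svg so it can be tried without Inkscape; fill and TEMPLATE_XML are hypothetical names, and the text ids simply mirror the script.

```python
from xml.etree import ElementTree

SVG_NS = 'http://www.w3.org/2000/svg'
NS = {'svg': SVG_NS}
# Serialize SVG elements with a default xmlns instead of "ns0:" prefixes.
ElementTree.register_namespace('', SVG_NS)

# Hypothetical minimal template using the same text ids as the script.
TEMPLATE_XML = (
    '<svg xmlns="%s">'
    '<text id="memno"><tspan>000</tspan></text>'
    '<text id="name"><tspan>Name</tspan></text>'
    '<text id="validto"><tspan>Date</tspan></text>'
    '</svg>' % SVG_NS
)


def fill(template_xml, row):
    """Return the template as a string with the three fields set from row."""
    root = ElementTree.fromstring(template_xml)
    for field_id, value in zip(('memno', 'name', 'validto'), row):
        # 'svg:' is expanded to '{http://www.w3.org/2000/svg}' via NS
        root.find(".//svg:text[@id='%s']/svg:tspan" % field_id, NS).text = value
    return ElementTree.tostring(root, encoding='unicode')


out = fill(TEMPLATE_XML, ['42', 'Ada', '2013-12-31'])
print(out)
```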

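Update: Turns out the quoting in the xargs example was flawed; thanks for spotting it. Additionally, find has an iterator of its own, the -exec action, which runs the command once per match with {} replaced by the file name (the echo makes this a dry run that only prints the commands):

find . -name '*.avi' -exec echo ffmpeg -i '{}' -sameq '{}'.webm ';'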
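For comparison, here is a Python sketch of roughly what that find command does: walk the tree, match each entry's base name against the shell pattern (as -name does), and build one command per match. find_exec is a hypothetical helper; like the echo in the shell version it only returns the commands, and it skips details such as testing the start directory itself.

```python
import fnmatch
import os


def find_exec(top, pattern, template, placeholder='{}'):
    """Dry run of: find TOP -name PATTERN -exec TEMPLATE ';' (simplified).

    Returns one argv list per matching entry instead of running it.
    -name compares the pattern against the base name, not the whole path,
    and without -type it matches directories as well as files.
    """
    commands = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            if fnmatch.fnmatchcase(name, pattern):
                path = os.path.join(dirpath, name)
                commands.append([word.replace(placeholder, path) for word in template])
    return commands


for argv in find_exec('.', '*.avi', ['ffmpeg', '-i', '{}', '-sameq', '{}.webm']):
    print(' '.join(argv))
```

To actually run the commands, hand each list to subprocess.run(argv); since no shell is involved, spaces in file names need no quoting.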