Thursday, November 26, 2009

Get code blocks easily in your blogs

In my previous blog post I added some Python code, but when I published it, all the code was aligned to the left-hand side. Unfortunately, indentation is very important in Python, so some googling got me to this site:

Add your code, click Process, and then remove all br tags (e.g. by using vim).

Copy the result into the HTML version of the blog post.
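
If you would rather not remove the br tags by hand, a few lines of Python can do it as well. This is only a sketch: the helper name strip_br and the sample string are made up for illustration, and it assumes the tags look like <br>, <br/> or <br />.

```python
import re

# Hypothetical helper, not part of the original workflow (which used vim).
def strip_br(html):
    """Remove <br>, <br/> and <br /> tags (any letter case) from a string."""
    return re.sub(r"<br\s*/?>", "", html, flags=re.IGNORECASE)

sample = "for line in finput:<br />\n    print(line)<BR>\n"
print(strip_br(sample))
```

The newlines in the text are left alone, so the indentation of the pasted code survives.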

Python File Manipulation

Last week I ripped one of my DVD movies to have a backup. Unfortunately I forgot to add the Dutch subtitles, and I was too lazy to start all over again. A simple Google search gave me the Dutch subtitle files, but they were not in sync with my DVD rip.
The subs appeared a bit more than 1 minute too early, so my first intention was to edit the file manually. I gave up after 2 or 3 lines. This was going to take too much time: 1536 subtitles at approx. 10 s per subtitle is over 4 hours of work.
OK, what could I do to get the subs synced? I think it's fairly easy with bash scripting in combination with sed and awk, but I have no experience in bash, so that solution was out of the question.
What else could I do to get this file updated? That's where Python came to the foreground: it's the only programming language I know a little, but enough to give it a try. The code in this post may have lost its indentation when published, so if you want to use it, check and fix the indentation first.
The structure of the file looks like:

02:26:23,091 --> 02:26:24,978
subtitle here

The first thing to do was to detect the lines that contain the start and end time of a subtitle. In the example above it's line 2, but of course I cannot select the time lines by line number. The detection, however, is fairly easy to do with a regular expression.
This is the regex I use: ^(0[0-2])
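
To see what this regex accepts and rejects, here is a small sketch using the sample line from above. The line "007 is back" and the stricter pattern named strict are my own assumptions for illustration; they are not part of the original script.

```python
import re

regex = "^(0[0-2])"  # the pattern from the post

time_line = "02:26:23,091 --> 02:26:24,978"
text_line = "subtitle here"

# re.match already anchors at the start of the string,
# so the leading ^ is redundant but harmless.
is_time = bool(re.match(regex, time_line))   # starts with "02"
is_text = bool(re.match(regex, text_line))   # starts with "su"

# Hypothetical subtitle text that happens to start with "00".
odd_text = "007 is back"
odd_matches = bool(re.match(regex, odd_text))

# Hypothetical stricter pattern (an assumption, not from the post):
# it requires the full "hh:mm:ss,xxx --> hh:mm:ss,xxx" layout.
strict = r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}"
odd_strict = bool(re.match(strict, odd_text))
time_strict = bool(re.match(strict, time_line))

print(is_time, is_text, odd_matches, odd_strict, time_strict)
```

The short regex is enough as long as no subtitle text starts with 00, 01 or 02. The stricter pattern is just one way to rule that case out.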

The mechanism I use to update the file is as follows:
1. Read the source file
2. Check whether the line matches the regular expression
2.1. if match: update the line by adding time to both time-stamps and write to an output file
2.2. if no match: write the line to the same output file.

Fairly straightforward, isn't it?

When a line matches the regex, I need to extract the start and end time. Luckily these times are always in the same positions, so I slice the line to get them.
Once I have those times I calculate the new times and then write the updated line to the new output file.

import re

# regex and addTime are defined elsewhere in this post
finput = open("/home/input", 'r')
foutput = open("/home/newsubtitles", 'w') #new empty file
for line in finput:
    if re.match(regex, line):
        start = line[0:12]
        end = line[17:29]
        newstart = addTime(start)
        newend = addTime(end)
        #replace start and end by newstart and newend
        #(str.replace returns a new string, so the result must be assigned)
        line = line.replace(start, newstart)
        line = line.replace(end, newend)
    #write the line to the output file, updated or not
    foutput.write(line)
#when all lines are copied, close both files
finput.close()
foutput.close()
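
The slice positions can be checked on the sample line, as in the sketch below. It also shows a possible alternative to the two replace calls: rebuilding the line from its slices. The helpers shift_line and add_one_second and the line called tricky are hypothetical, made up only to show a corner case where the new start time equals the old end time.

```python
# Sample time line in the layout shown in the post.
line = "02:26:23,091 --> 02:26:24,978\n"

start = line[0:12]     # start time
arrow = line[12:17]    # the " --> " separator
end = line[17:29]      # end time
rest = line[29:]       # whatever follows, here just the newline

def shift_line(line, shift):
    """Rebuild a time line from its slices; shift is any str -> str function."""
    return shift(line[0:12]) + line[12:17] + shift(line[17:29]) + line[29:]

# Hypothetical stand-in for addTime: adds exactly one second, no carry handling.
def add_one_second(stamp):
    return stamp[0:6] + "%02d" % (int(stamp[6:8]) + 1) + stamp[8:]

tricky = "00:00:01,000 --> 00:00:02,000\n"

# Two chained replace calls: the second one also rewrites the new start.
chained = tricky.replace("00:00:01,000", "00:00:02,000")
chained = chained.replace("00:00:02,000", "00:00:03,000")

rebuilt = shift_line(tricky, add_one_second)
print(chained, end="")
print(rebuilt, end="")
```

With the real offset this corner case only occurs when a subtitle lasts exactly as long as the shift, so it is unlikely to show up in practice.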

In this section I use the addTime function. This function expects a string and returns a new string, always in the format hh:mm:ss,xxx, where xxx are the milliseconds. Unfortunately the separator isn't always a colon (there is a comma before the milliseconds), so I slice the input string to get the hours, minutes, seconds and milliseconds. Then I add the necessary time to the milliseconds, seconds and minutes, and check that the results stay within their normal ranges, since 64 s does not exist in the real world. For example, if the number of seconds exceeds 59, I add 1 to the number of minutes and subtract 60 from the number of seconds.
In the code below the "add*" variables are constants, defined somewhere else, so you can adjust them to your own needs.

def addTime(time):
    hh = int(time[0:2])
    mm = int(time[3:5])
    ss = int(time[6:8])
    milli = int(time[9:12])
    milli = milli + addmilli
    ss = ss + addss
    mm = mm + addmm
    if milli > 999:
        milli = milli - 1000
        ss = ss + 1
    if ss > 59:
        ss = ss - 60
        mm = mm + 1
    if mm > 59:
        mm = mm - 60
        hh = hh + 1
    newtime = updateTime(hh, mm, ss, milli)
    return newtime
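
For comparison, the same carry arithmetic can be done by converting everything to milliseconds first and splitting the total again with divmod. This is a sketch of an alternative, not the code I actually ran. The function name shift_timestamp is made up, and the offsets are the add* values from the final code (1 minute, 7 seconds, 500 milliseconds).

```python
# Offsets taken from the add* constants in the final code.
addmm, addss, addmilli = 1, 7, 500

def shift_timestamp(stamp):
    """Shift an 'hh:mm:ss,xxx' string by the offsets above (hypothetical helper)."""
    hh = int(stamp[0:2])
    mm = int(stamp[3:5])
    ss = int(stamp[6:8])
    milli = int(stamp[9:12])
    # Everything in milliseconds, then add the offset.
    total = ((hh * 60 + mm) * 60 + ss) * 1000 + milli
    total += (addmm * 60 + addss) * 1000 + addmilli
    # Split back into parts; divmod takes care of all the carries.
    total, milli = divmod(total, 1000)
    total, ss = divmod(total, 60)
    hh, mm = divmod(total, 60)
    return "%02d:%02d:%02d,%03d" % (hh, mm, ss, milli)

print(shift_timestamp("02:26:23,091"))
print(shift_timestamp("00:59:55,700"))
```

The second call is a case where the milliseconds, seconds and minutes all overflow at once.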

The updateTime function makes sure that hh, mm and ss are always 2 digits and that milli is always 3 digits.

def updateTime(hh, mm, ss, milli):
    milli = str(milli)
    if len(milli) == 1:
        milli = "00%s" %milli
    elif len(milli) == 2:
        milli = "0%s" %milli
    ss = str(ss)
    if len(ss) == 1:
        ss = "0%s" %ss
    mm = str(mm)
    if len(mm) == 1:
        mm = "0%s" %mm
    hh = str(hh)
    if len(hh) == 1:
        hh = "0%s" %hh
    newtime = "%s:%s:%s,%s" %(hh, mm, ss, milli)
    return newtime
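
The same zero-padding can probably be written more compactly with Python's format specifiers. A minimal sketch, where the name format_time is my own choice and not part of the script:

```python
def format_time(hh, mm, ss, milli):
    """Format the integer parts as 'hh:mm:ss,xxx' with leading zeros."""
    # %02d pads an integer to 2 digits, %03d to 3 digits.
    return "%02d:%02d:%02d,%03d" % (hh, mm, ss, milli)

print(format_time(2, 7, 5, 91))
```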

We're there. I'm quite sure that an experienced programmer, which I'm not, could do this in less code. But at least it worked for me, and it took me less than 1 hour to get it working, which left me time to write this blog post. Doing it manually, I would still be editing the file, and probably making more mistakes.

Final code:

import re

regex = "^(0[0-2])"
input = "/home/dewolfth/input.txt"
subs = "/home/dewolfth/mymovie.txt"
addhour = 1  # defined but never used by addTime below
addmm = 1
addss = 7
addmilli = 500

def updateTime(hh, mm, ss, milli):
    milli = str(milli)
    if len(milli) == 1:
        milli = "00%s" %milli
    elif len(milli) == 2:
        milli = "0%s" %milli
    ss = str(ss)
    if len(ss) == 1:
        ss = "0%s" %ss
    mm = str(mm)
    if len(mm) == 1:
        mm = "0%s" %mm
    hh = str(hh)
    if len(hh) == 1:
        hh = "0%s" %hh
    newtime = "%s:%s:%s,%s" %(hh, mm, ss, milli)
    return newtime

def addTime(time):        
    hh = int(time[0:2])
    mm = int(time[3:5])
    ss = int(time[6:8])
    milli = int(time[9:12])
    milli = milli + addmilli
    ss = ss + addss
    mm = mm + addmm
    if milli > 999:
        milli = milli - 1000
        ss = ss + 1
    if ss > 59:
        ss = ss - 60
        mm = mm + 1
    if mm > 59:
        mm = mm - 60
        hh = hh + 1
    newtime = updateTime(hh, mm, ss, milli)
    return newtime    

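def main():
    finput = open(input, 'r')
    foutput = open(subs, 'w')
    for line in finput:
        if re.match(regex,line):
            start = line[0:12]
            newstart = addTime(start)
            end = line[17:29]
            newend = addTime(end)
            line = line.replace(start, newstart)
            line = line.replace(end, newend)
        # write every line to the output file, updated or not
        foutput.write(line)
    # when all lines are copied, close both files
    finput.close()
    foutput.close()

if __name__ == "__main__":
    main()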

Wednesday, November 25, 2009

From tech writer to software developer and back and ...

Over the last few weeks I have been lucky enough to do some development. Even though the tasks I did would be minor work for a senior developer, the job helped me get a better view of our software. All the development I did was in Python.
My first job was to refactor a wizard so that it became consistent with the new specs of the software.
After this refactoring work, I refactored the same wizard again, along with two others, so that these wizards became available in the web interface.
A lot of development and testing later, I succeeded in all my tasks. I even did some bug fixes, w00t.
But the time has come: my engineering tasks are over, and I'm back in my natural habitat of writing documentation. I'm a tech writer after all...

So what did I learn during these last weeks as developer?
First I have to admit that I really enjoyed my time as an engineer. Not that I've become an experienced engineer (I'm pretty far away from that level), but I just like to learn new stuff. And as with everything, you learn by doing. So I'm glad that my manager gave me the opportunity to spend some time in development, since I don't have much private time to become a developer.

I have learned quite a lot about Q-layer's Q-Action/Workflow mechanism and about using variables in that system. Even though it's quite complex, imho it's a great system.
Once you get to know the terminology and the mechanism, creating your own Q-Actions and the accompanying workflows goes quite fast.

So before returning completely to tech writing mode, I was thinking, why not write a short message before I start documenting the things I learned over the past few weeks...

Bye bye till the next time!