In my previous blog post I added some Python code, but when I published it, all the code was aligned to the left-hand side. Unfortunately, indentation is very important in Python, so some googling got me to this site:
http://www.simplebits.com/cgi-bin/simplecode.pl
Add your code, click Process, and then remove all br tags (e.g. by using vim).
Copy the result into the HTML version of the blog post.
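The br-tag cleanup can also be scripted instead of done in vim. The following is only a minimal sketch: the helper name strip_br_tags and the sample string are made up for illustration, and it assumes the tags appear as <br>, <br/> or <br />.

```python
import re

# Hypothetical helper, not part of the original workflow (which used vim).
# Matches <br>, <br/> and <br />, in upper or lower case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)

def strip_br_tags(html):
    """Remove every br tag and leave all other text, including newlines, untouched."""
    return BR_TAG.sub("", html)

# Made-up sample of processed code that still contains br tags.
sample = "for line in finput:<br />\n    foutput.write(line)<br>\n"
print(strip_br_tags(sample))
```

Because only the tags themselves are removed, the leading spaces that carry the Python indentation stay in place.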
Thursday, November 26, 2009
Python File Manipulation
Last week I ripped one of my DVD movies to have a backup. Unfortunately I forgot to add the Dutch subtitles, but I was too lazy to start all over again. A simple Google search gave me the Dutch subtitle files, unfortunately not in sync with my DVD rip.
The subs appeared 1 min and something too early, so my first intention was to edit the file manually... to no avail: after 2 or 3 lines I had already given up. This was going to take too much time: 1536 subtitles at approx. 10 s per subtitle is over 4 hours of work.
OK, what could I do to get the subs synced? I think it's fairly easy with bash scripting in combination with sed and awk, but I have no experience in bash, so that solution was out of the question.
What else could I do to get this file updated? That's where Python came to the foreground: the only programming language I know a little, but enough to give it a try. If the indentation of the code in this post got lost in publishing, you need to restore it before you run the code, since Python depends on it.
The structure of the file looks like this:
1518
02:26:23,091 --> 02:26:24,978
subtitle here
The first thing to do was to detect the lines that contain the start and end time of a subtitle. In the example above it's line 2, but of course I cannot select the time lines by line number. The detection, however, is fairly easy with a regular expression.
This is the regex I use: ^(0[0-2])
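To see what this regex does and does not catch, here is a small check against the three sample lines from above. The pattern in strict is not from the original script; it is a hypothetical, stricter alternative that describes the whole time line, shown only as an assumption about how the match could be tightened.

```python
import re

regex = "^(0[0-2])"  # the pattern from the post: line starts with 00, 01 or 02

sample_lines = [
    "1518\n",
    "02:26:23,091 --> 02:26:24,978\n",
    "subtitle here\n",
]

# Print, for each sample line, whether the post's regex matches it.
for line in sample_lines:
    print(re.match(regex, line) is not None, repr(line))

# Assumption: a stricter pattern that requires two complete timestamps.
strict = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

# A subtitle text that happens to start with "00" fools the short regex,
# but not the stricter one.
tricky = "007 is a subtitle\n"
print(re.match(regex, tricky) is not None, strict.match(tricky) is not None)
```

The short regex is enough as long as no subtitle text starts with 00, 01 or 02, and as long as the movie is shorter than three hours.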
The mechanism I use to update the file is as follows:
1. Read the source file line by line
2. Check whether the line matches the regular expression
2.1. if it matches: update the line by adding time to both timestamps and write it to an output file
2.2. if it doesn't match: write the line unchanged to the same output file.
Fairly straightforward, isn't it?
When a line matches the regex, I need to extract the start and end time. Luckily these times are always in the same positions, so I slice the line to get the start and end time.
Once I have those times, I do some calculations to get the new times and then I write the updated line to the new output file.
# re and regex are defined at the top of the final code further below
finput = open("/home/input", 'r')
foutput = open("/home/newsubtitles", 'w')  # new empty file
for line in finput:
    if re.match(regex, line):
        start = line[0:12]
        end = line[17:29]
        newstart = addTime(start)
        newend = addTime(end)
        # replace start and end by newstart and newend;
        # replace() returns a new string, so assign the result back to line
        line = line.replace(start, newstart)
        line = line.replace(end, newend)
        # write updated line to output file
        foutput.write(line)
    else:
        # in case of no match, just write line to output
        foutput.write(line)
# when all lines are copied, close both files
finput.close()
foutput.close()
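The slice positions can be checked on the sample time line from earlier. This is just a small illustration; the replacement timestamp is the sample start time plus 1 min 7.5 s, the offset used by the constants in the final code.

```python
line = "02:26:23,091 --> 02:26:24,978\n"

# Characters 0-11 hold the start time, 12-16 hold " --> ",
# and characters 17-28 hold the end time.
start = line[0:12]
end = line[17:29]
print(start)  # 02:26:23,091
print(end)    # 02:26:24,978

# Strings are immutable: replace() returns a new string and leaves
# the original untouched, so the result has to be assigned.
updated = line.replace(start, "02:27:30,591")
print(line, end="")     # still the original line
print(updated, end="")  # the line with the new start time
```

This also shows why calling line.replace(...) without assigning the result would write the original, unshifted line to the output file.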
The file-handling section above uses the addTime function. This function expects a string and returns a new string. Both strings are always in the format hh:mm:ss,xxx where xxx are the milliseconds. Unfortunately the separator isn't always a colon (the milliseconds follow a comma), so I slice up the input string to get the hours, minutes, seconds and milliseconds. Then I add the necessary time to the milliseconds, seconds and minutes, and check that the results stay within their normal ranges, since 64 s does not exist in the real world. For example, if the number of seconds exceeds 59, I add 1 to the number of minutes and subtract 60 from the number of seconds.
In the code below the "add*" variables are constants, defined elsewhere, so you can adjust them to your own needs.
def addTime(time):
    # slice the hh:mm:ss,xxx string into its numeric parts
    hh = int(time[0:2])
    mm = int(time[3:5])
    ss = int(time[6:8])
    milli = int(time[9:12])
    # add the offset
    milli = milli + addmilli
    ss = ss + addss
    mm = mm + addmm
    # carry any overflow to the next larger unit
    if milli > 999:
        milli = milli - 1000
        ss = ss + 1
    if ss > 59:
        ss = ss - 60
        mm = mm + 1
    if mm > 59:
        mm = mm - 60
        hh = hh + 1
    newtime = updateTime(hh, mm, ss, milli)
    return newtime
The updateTime function makes sure that hh, mm and ss are always 2 digits and that milli is always 3 digits.
def updateTime(hh, mm, ss, milli):
    # pad milliseconds to 3 digits
    milli = str(milli)
    if len(milli) == 1:
        milli = "00%s" % milli
    elif len(milli) == 2:
        milli = "0%s" % milli
    # pad seconds, minutes and hours to 2 digits
    ss = str(ss)
    if len(ss) == 1:
        ss = "0%s" % ss
    mm = str(mm)
    if len(mm) == 1:
        mm = "0%s" % mm
    hh = str(hh)
    if len(hh) == 1:
        hh = "0%s" % hh
    newtime = "%s:%s:%s,%s" % (hh, mm, ss, milli)
    return newtime
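The same zero-padding can be had from Python's format specifiers, which pad numbers to a fixed width. A minimal sketch; the name format_time is made up here and is not part of the original script.

```python
def format_time(hh, mm, ss, milli):
    """Build an hh:mm:ss,xxx string; %02d and %03d pad with leading zeros."""
    return "%02d:%02d:%02d,%03d" % (hh, mm, ss, milli)

print(format_time(2, 6, 3, 91))  # 02:06:03,091
```

Unlike updateTime, this takes the integers as they are and does the padding in one step.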
We're there. I'm quite sure that an experienced programmer, which I'm not, could write this script in less code. But at least it worked for me, and it took me less than 1 hour to get it working, which left me time to write this blog post. Had I edited the file manually, I would still be at it, and probably with more mistakes.
Final code:
import re

regex = "^(0[0-2])"
input = "/home/dewolfth/input.txt"
subs = "/home/dewolfth/mymovie.txt"
addhour = 1  # defined but never used by addTime
addmm = 1
addss = 7
addmilli = 500

def updateTime(hh, mm, ss, milli):
    # pad milliseconds to 3 digits
    milli = str(milli)
    if len(milli) == 1:
        milli = "00%s" % milli
    elif len(milli) == 2:
        milli = "0%s" % milli
    # pad seconds, minutes and hours to 2 digits
    ss = str(ss)
    if len(ss) == 1:
        ss = "0%s" % ss
    mm = str(mm)
    if len(mm) == 1:
        mm = "0%s" % mm
    hh = str(hh)
    if len(hh) == 1:
        hh = "0%s" % hh
    newtime = "%s:%s:%s,%s" % (hh, mm, ss, milli)
    return newtime

def addTime(time):
    # slice the hh:mm:ss,xxx string into its numeric parts
    hh = int(time[0:2])
    mm = int(time[3:5])
    ss = int(time[6:8])
    milli = int(time[9:12])
    # add the offset
    milli = milli + addmilli
    ss = ss + addss
    mm = mm + addmm
    # carry any overflow to the next larger unit
    if milli > 999:
        milli = milli - 1000
        ss = ss + 1
    if ss > 59:
        ss = ss - 60
        mm = mm + 1
    if mm > 59:
        mm = mm - 60
        hh = hh + 1
    newtime = updateTime(hh, mm, ss, milli)
    return newtime

def main():
    finput = open(input, 'r')
    foutput = open(subs, 'w')
    for line in finput:
        if re.match(regex, line):
            # time line: shift both timestamps
            start = line[0:12]
            newstart = addTime(start)
            end = line[17:29]
            newend = addTime(end)
            line = line.replace(start, newstart)
            line = line.replace(end, newend)
            foutput.write(line)
        else:
            # any other line is copied unchanged
            foutput.write(line)
    finput.close()
    foutput.close()

if __name__ == "__main__":
    main()
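As for doing it in less code: one possible shorter variant, offered only as a sketch and not as the script that was actually used, converts each timestamp to milliseconds, adds the offset, and converts back. The names OFFSET_MS, TIMESTAMP, shift_timestamp and shift_line are made up for this example, and the offset assumes the same 1 min 7.5 s as the constants above.

```python
import re

# Assumed offset, matching addmm, addss and addmilli above: 1 min 7.5 s.
OFFSET_MS = (1 * 60 + 7) * 1000 + 500

# One hh:mm:ss,xxx timestamp, with a group for each part.
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_timestamp(match):
    """Return the matched timestamp shifted by OFFSET_MS."""
    hh, mm, ss, milli = (int(part) for part in match.groups())
    total = ((hh * 60 + mm) * 60 + ss) * 1000 + milli + OFFSET_MS
    # Split the total back into units; divmod handles every carry.
    total, milli = divmod(total, 1000)
    total, ss = divmod(total, 60)
    hh, mm = divmod(total, 60)
    return "%02d:%02d:%02d,%03d" % (hh, mm, ss, milli)

def shift_line(line):
    """Shift both timestamps on a time line; return other lines unchanged."""
    if "-->" in line:
        return TIMESTAMP.sub(shift_timestamp, line)
    return line

print(shift_line("02:26:23,091 --> 02:26:24,978\n"), end="")
# 02:27:30,591 --> 02:27:32,478
```

Working in milliseconds removes the separate overflow checks, and substituting each timestamp in place avoids replacing one timestamp by accident when the shifted start time happens to equal the original end time.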
Wednesday, November 25, 2009
From tech writer to software developer and back and ...
These last weeks I have been lucky enough to do some development. Even though the tasks I did would be minor work for a senior developer, the job helped me get a better view of our software. All the development I did was in Python.
My first job was to refactor a wizard so that it became consistent with the new specs of the software.
After this refactoring work, I refactored the same wizard again, along with two others, so that these wizards became available in the web interface.
A lot of development and testing later, I succeeded in all my tasks. I even did some bug fixes, w00t.
But the time has come: my engineering tasks are over, and I'm back in my natural habitat of writing documentation. I'm a tech writer after all...
So what did I learn during these last weeks as a developer?
First I have to admit that I really enjoyed my time as an engineer. Not that I've become an experienced engineer (I'm pretty far away from that level), but I just like to learn new stuff. And as with everything, you learn by doing. So I'm glad that my manager gave me the opportunity to spend some time in development, since I don't have much private time to become a developer.
I have learned quite a lot about Q-layer's Q-Action/Workflow mechanism and about using variables in that system. Even though it's quite complex, imho it's a great system.
Once you get to know the terminology and the mechanism, creating your own Q-Actions and the accompanying workflows goes quite fast.
So before returning completely to tech writing mode, I thought: why not write a short message before I start documenting the things I learned these past few weeks...
Bye bye till the next time!
Thursday, September 10, 2009
Reading manuals is not wasting time... sometimes
I wanted to test Oracle's Enterprise Linux (Unbreakable Linux). On Oracle's website I couldn't find one big iso, so I downloaded the 5 CD iso images.
As I don't have enough computers, I use VirtualBox. Since I thought that it was not possible to install an OS from multiple iso images in VirtualBox, I did the following:
I merged the 5 iso images into one big iso image. Don't worry, this doesn't require complex manipulations. This is what I did:
1. copied the first iso to a new iso image: "cp first.iso new.iso"
2. executed the next command: "cat second.iso >> new.iso"
This just appends the second iso image to the copy of the first iso image.
I repeated this for the other iso images ("cat third.iso >> new.iso", "cat fourth.iso >> new.iso", and "cat fifth.iso >> new.iso").
Unfortunately, during the installation of Enterprise Linux, the wizard kept asking for Installation CD 2... No luck for me.
Back to installing from the 5 iso images then. After a tip I found out that it is easy to switch iso images during the installation.
First make sure that all iso images are loaded in VirtualBox.
Then create a virtual machine and let it boot from the first iso image.
When the install wizard asks for the second iso image, click Devices in the VirtualBox menu, and select Unmount CD/DVD-ROM.
Then go to Devices > Mount CD/DVD-ROM > CD/DVD-ROM Image and select the second iso image.
Simple as that... perhaps I should try reading manuals...
Thursday, September 3, 2009
Turn Ubuntu Jaunty into Mac OSX Leopard
I just wanted to test whether the Mac layout is so much better than GNOME. Well, not really: imho they're equal, though I must admit that the Mac layout is just that bit fancier...
This site explains really well how you can turn your Ubuntu (Intrepid) into the Mac OS X Leopard layout. The tutorial has been tested on Ubuntu Jaunty.
The only part where I didn't follow the tutorial is the section Configuring usplash screen, where you need to download the .deb files (splashy_....deb and libsplashy....deb).
I had to execute this command to get splashy working:
dpkg --force-overwrite -i /var/cache/apt/archives/splashy_0.3.13-3ubuntu1_i386.deb
Hope to get a real Mac soon ...
Wednesday, September 2, 2009
Get to know the power of GIMP
I recently bought myself a DSLR (Sony Alpha-300). Of course the images taken with my new DSLR are too big to publish on websites such as picasaweb.
I used to start my Windows virtual machine in VirtualBox and then use Microsoft's Picture Manager (comes with Office) to resize my images to web quality. I have to admit that this worked fine, but I got annoyed by the slow access from my virtual Windows to my local drives.
I heard that GIMP could be used as a batch system, so I started looking around to see how I could compress multiple images to web quality in one go. And in fact, it's fairly easy. GIMP comes with the most common Linux distributions; you only need to install gimp-plugin-registry with your package manager (apt-get, yast, yum, ...).
Once the package is installed, start GIMP. The batch processor can be found under Filters > Batch > Batch Process.
In the window that appears you can select your images, and on each tab the actions to be executed. For example, to compress images, go to the Output tab, select the image type and select a quality level.
And yet another step away from Windows :-)
Friday, August 28, 2009
Numerical keypad doesn't work anymore
Every once in a while, my numerical keypad only works as a mouse on my Ubuntu. I have been experiencing this behavior since the Intrepid version (8.10). Luckily for me it's not a big deal to solve, but I had to google it every time. So perhaps writing this small blog post will make it stay in my mind.
To get the numerical keypad working again, go to System > Preferences > Keyboard.
Then go to the tab Mouse Keys and clear the option "Pointer can be controlled using the keypad".
I still need to figure out which update causes this behavior...
My thanks to the person who posted this solution (http://bit.ly/47sAQ).
Cheerio!