Musings of an anonymous geek

November 1, 2007

Python Magazine Defies Skeptics

Filed under: Big Ideas,Productivity,Python,Scripting,Technology — m0j0 @ 8:40 am

I was informed today by the publisher that Python Magazine has been deemed “viable” using all of the important business metrics that they use to evaluate the magazine. This is fantastic news, and speaks volumes about the viability of the magazine in *non* business terms, as well as the model we’ve been employing at MTA since 2002.

We’ve never (yet) done anything to market the magazine. We didn’t really do a whole bunch of market analysis and research. We don’t pitch old guys in suits to convince them to fund our work. Not with php|architect, and not with Python Magazine. In each case, there was someone with a passion for the language, who was plugged into the community, who could see that a magazine would be valued as a tool by the community. In each case, we could see that there were people with great knowledge, and people with relatively little knowledge, and that those people didn’t often get around to finding each other to share that knowledge.

What we found with php|architect was that the magazine served as a bridge between those who have knowledge, and those who want it. We’re finding the same exact thing happening with Python Magazine.

There are millions of things we’ve discovered about how people consume documentation and think about languages and lots of other things along the way. It’s an immensely interesting business to be involved in. We’ve learned a whole lot about how publishing, distribution, translation, and even weird things like banking work in other countries all over the world. We’ve learned about how communities organize themselves into subcommunities in the digital realm and how different communication mechanisms affect how information is perceived and consumed and used. It’s fascinating stuff.

In the end, though, I think the success of the magazines is owed to the fact that the people producing them have a passion for the content. We’re still plugged into the respective communities – and not for the sake of the magazines. We’re plugged into the communities to help us perform at our day jobs, and we produce the magazine to help ourselves and our friends in the community get at the information they need.


October 31, 2007

Getting at your Google Spreadsheets columns

Filed under: Productivity,Python,Scripting,Technology — m0j0 @ 8:29 am

Regular readers know that I’ve been working on a pet project to build a command line interface to Google Spreadsheets. Basically, I find working in a spreadsheet interface to be clunky and uncomfortable. If I need to put in a new row, I’d rather just be prompted at the CLI for the values I want to put in for each column. Later I’ll add the ability to edit and query my spreadsheet from the command line as well. The nice thing is that I can’t see any reason for this application to be specific to any *particular* spreadsheet, so anyone should be able to use this for whatever Google Spreadsheets document they want 😀

For now, though, in the event that other coders are struggling with their own project, here’s how I finally figured out how to print out the “column: value” pairs for every row in my spreadsheet:

spreadsheet_id = PromptForSpreadsheet(gd_client)
worksheet_id = PromptForWorksheet(gd_client, spreadsheet_id)
columnfeed = ListGetAction(gd_client, spreadsheet_id, worksheet_id)
# Each entry in the feed is one spreadsheet row.
for val in columnfeed.entry:
   # val.custom maps each column name to an object whose .text is the cell value.
   for key in val.custom.keys():
      print "%s:   %s" % (key, val.custom[key].text)
   print "\n"
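If you’d rather collect the rows as plain dictionaries than print them, here’s a minimal sketch of the same loop. It doesn’t talk to Google at all: `FakeCell` and `FakeEntry` are hypothetical stand-ins invented so the snippet runs on its own, and they assume only what the loop above relies on, namely that each entry has a `custom` mapping whose values carry a `.text` attribute. `rows_to_dicts` is an illustrative name too, not part of the gdata library.

```python
class FakeCell(object):
    """Hypothetical stand-in for one cell value; only .text is assumed."""
    def __init__(self, text):
        self.text = text


class FakeEntry(object):
    """Hypothetical stand-in for one row of the feed."""
    def __init__(self, **columns):
        self.custom = dict((name, FakeCell(value))
                           for name, value in columns.items())


def rows_to_dicts(entries):
    """Turn each row into a plain {column: value} dictionary."""
    rows = []
    for entry in entries:
        rows.append(dict((key, cell.text)
                         for key, cell in entry.custom.items()))
    return rows


entries = [FakeEntry(author="alice", title="Decorators"),
           FakeEntry(author="bob", title="Generators")]
rows = rows_to_dicts(entries)
```

With real data you would pass `columnfeed.entry` in place of the fake list, assuming the feed really has the shape shown above.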

I had initially hit a bit of a snag in getting to this point, because it’s not made clear in the Google Documentation how to reference your columns. They *do* tell you how to print “val.content.text”, but that prints all of the column:value combinations together in one big, long string. You can’t even parse that, because nothing is quoted, and column data can include anything you might use as a delimiter. I finally got around to looking at this again today, and with the help of IDLE (which I’ve now accepted as my saviour) finally poked and prodded ‘val’ until it spit out something close to what I was looking for.

With that task out of the way, it should be easy enough to start performing write operations, and I believe I remember query operations being documented by Google separately – so hopefully this will go more quickly now.

Wish me luck!

October 26, 2007

Python, regex, and IRC

Filed under: Python,Scripting,Technology — m0j0 @ 8:53 am

So, I’m on IRC a lot. I’m on a lot of channels, too. I’m on more than one Python channel. One scenario in these chans that comes up somewhat often is one in which a user converting from PHP, Perl, Ruby, or whatever walks in and wants to get a better understanding of how regex works in Python.

Flaming ensues.

Flaming the flamers is a topic for another blog post, but for some reason, Python users seem to really be resistant to regex. At one point, I actually suggested in one of the larger channels that someone write an article for Python Magazine about the proper use (or non-use) of regular expressions. That was weeks ago now. I got no bites.

So here again, I’ll put this in a very public place and say that if you can get me a proposal for an article that details when to use or not use regex, and how to use them properly when you *should* use them, we will pay you to write that article.

October 25, 2007

PlanetPlanet++

Filed under: Python,Scripting,Technology — m0j0 @ 9:01 pm

I have to admit that I have not really made friends with Python as a web scripting language. I use it for network, system, and database scripting, and I’ve done some web services stuff with it, but I haven’t been able to use it for things that have, say, a browser interface. Until the other night.

I got email that this guy who maintained a site was going to shut it down. This really annoyed me. Then I remembered that a new web host I’m using actually supports Python. Sure, it’s a really old, crusty version of Python, but Python nonetheless. The site being shut down was running ‘PlanetPlanet’, which is a feed aggregator website package. You tell it the URLs for all of the feeds that interest you, and it goes and grabs all of that content from the various feeds, formats it, and spits back something that looks pretty nice.

PlanetPlanet needs no database, and no Apache modules. I unzipped it, configured it, fed it the feed URLs, went to the site, and I was live. I got it running in under 10 minutes, and had templates in place and security accounted for in another 10 minutes. Very nice!!

October 23, 2007

I need a Google Apps Mashup

Filed under: Big Ideas,Productivity,Python,Scripting,Technology — m0j0 @ 7:31 am

Google Docs is nice. Calendar is really nice. Gmail is ok, too. The notion that you can more or less use any of the tools without going too far is pretty nice, and they’ve opened things up with the API just enough to get some useful plugin capabilities, *and* there’s a Python client available for the Google Data API, which is nice (my experience with Google Spreadsheets notwithstanding). The problem now is that I would like something that goes beyond a simple plugin.

Outside of my day job doing infrastructure architecture and sysadmin work (with some development thrown in for good measure), I run Python Magazine. I have a ton of communication and deadlines to track in working for the magazine; I get several article proposals per week (sometimes per day), I’m working with contract people, other editors, technical folks on the back end of things, layout folks, the people writing the checks and managing invoices, and whoever I need to talk to for business development tasks. I send emails to a great number of people every day, just for the magazine.

I use Gmail for my Python Magazine mail (my pythonmagazine.com addy is forwarded to Gmail). Gmail also lets me send mail using my pythonmagazine.com email address; otherwise, this would not be a usable solution for me.

I use Google Calendar to track deadlines. Each article deadline is a full day event in Google Calendar. I’d also *like* to use Google Calendar as something of a logging tool to track out-of-band conversations I have with people on IRC or (gasp!) in person.

The reason I haven’t gone this route yet is because there’s no interface where I can, say, search for a person’s name, and get a nice list of the things related to that person, grabbed from GMail *and* Google Calendar (not that you’d need to stop integration efforts at those two services – they’re just the two most useful to *me* right now).

For my purposes, it would even be OK if Google just added an “include calendar results” in the GMail search interface.  That would give me a list, ordered by date, of conversations via email, perhaps GTalk, out-of-band events logged with Calendar, and deadlines, also tracked via Google Calendar. It could essentially be a time line of my working relationship with a person, which can be very useful.

It might even be useful to get a time line of events and conversations related to a specific topic, rather than a specific person. If I could do a search for “contract request”, this hypothetical interface would actually spit out a time line showing all of the interactions between me and our contract person specifically in relation to commissioning articles, because I use the term “contract request” in the subject of all contract requests, and would naturally carry that consistency into notes I might take about contract requests in Google Calendar or other apps.
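To make the idea concrete, here’s a rough sketch of the merging step only. Everything in it is hypothetical: the sample records are made up, `timeline` is an illustrative name, and nothing here calls Gmail or Google Calendar. It simply assumes each service can hand back (date, source, text) records, then filters them by a search term and orders them by date.

```python
from datetime import date

# Hypothetical records; a real tool would fetch these from each service.
emails = [
    (date(2007, 10, 1), "email", "contract request: decorators article"),
    (date(2007, 10, 9), "email", "layout questions for issue 2"),
]
calendar = [
    (date(2007, 10, 5), "calendar", "IRC chat about contract request"),
    (date(2007, 10, 20), "calendar", "deadline: decorators article"),
]


def timeline(term, *sources):
    """Return matching events from every source, oldest first."""
    term = term.lower()
    matches = [event for source in sources for event in source
               if term in event[2].lower()]
    return sorted(matches)  # tuples sort by their first item, the date


hits = timeline("contract request", emails, calendar)
```

Searching on a person’s name instead of a topic would work the same way, as long as the name appears in the text of each record.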

Well, that’s my latest idea. I’m not sure what form the app would take. Ideally it would be a web page I can get to from anywhere, but I have yet to do anything significant with Python as a web scripting language (though I’ve rewritten a whole lot of old Perl code in Python for sysadmin-ish stuff). A fat client application would inevitably *not* be useful to a whole lot of folks… I dunno. Thoughts hereby solicited on that.

Let me know if something already exists that does this, or if I just wrote this whole post for nothing because Google already does this somehow. I don’t think it does. It seems to treat applications as separate entities, and to treat the same account in different apps as different entities as well. There needs to be a higher level vision of the user as a single object across all of the applications in order to get at the kinds of interesting uses of data that are possible and would add a lot of value to the individual services. My $.02.

October 17, 2007

For my next pet project…

Filed under: Me stuff,Productivity,Python,Scripting,Technology — m0j0 @ 9:24 am

Stand back!

running install_egg_info
Writing /usr/lib/python2.5/site-packages/gdata.py-1.0.9.egg-info
brj@dawg:~/working/gdata.py-1.0.9$ python
Python 2.5.1 (r251:54863, May  2 2007, 16:27:44)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gdata
>>> print 'yay!'
yay!
>>>

No good can come of this! ;-P

Seriously, though – I really really strongly dislike spreadsheet interfaces. I hate resizing cells so I can see what’s going on, I hate cell selection, copy/pasting, and doing anything in those little cells. However, I really *need* to use one to handle some administrivia at Python Magazine, because it’s already being used by some back end processes/departments, and I don’t have time to write code and overhaul that whole process, and I don’t want to rock the boat anyway – what they have works – I just hate spreadsheets. It’s my problem, not theirs 😉

The good news is they use Google Docs, and there’s a Google Data client library for Python. So I’m creating a command line interface to the spreadsheet 🙂

September 7, 2007

New Job!

Filed under: Linux,Me stuff,Python,Scripting,Sysadmin,Technology — m0j0 @ 7:32 am

I started a new job about 6 weeks ago. I’m now doing infrastructure architecture at http://gfdl.noaa.gov

GFDL stands for Geophysical Fluid Dynamics Lab. It’s a NOAA site that supports atmospheric and climatology research. So in other words, the work I do supports research into things ranging from global warming to what the atmosphere on Mars is like to the weather here on Earth to simulations of the shape and movement of Katrina. I think of it as sort of an Institute for Advanced Study devoted to climatology research. Great minds in the field are here.

The research actually takes place at three different sites, DC, Boulder and Princeton, and affiliations with academic institutions flourish as well. In fact, I knew at least 4 people who worked here because of interactions between this site and cs.princeton.edu, my former employer.

My job, as it’s been described to me, is to provide a vision as to the design and direction of the infrastructure which supports the rather enormous high performance compute (HPC) cluster. This involves something of a learning curve to understand what’s here, how the systems are used, what the needs are, what people like and hate, where the redundancies and inefficiencies exist, etc. It also involves having meetings and coordinating with people who manage the network, the facilities (power & cooling, etc), the security policy, etc. I’ll be grilled on my ideas, and create prototypes and demos to get my ideas across. Lots of communication.

Part of my job will also involve getting my hands on the HPC clusters themselves, which are at each site. All of the clusters were on top500.org last time I looked. Just go through the pages and search for GFDL and/or NOAA.

The systems here are all Linux. Even the standard-issue workstations are running Linux. Scripting is done in Perl and shell, but Python is everywhere, so I’ll be doing either Perl or Python if I have the choice (because “shell” == “csh” here, which I never took well to, honestly). Some aspects of the environment are pretty fascinating. For example, how exactly do you store (*and* easily retrieve, on the fly) 9 PETABYTES of data? How do you back that up? How do you recover from hiccups? How do you instrument systems consisting of thousands of CPUs to pinpoint problems and get them fixed? And, by the way, what’s the best way to tune a system’s network stack to use a 50MBps pipe (that’s Mega *bytes*) efficiently enough to move multiple terabytes of data every day between collaborators at different sites? How, exactly, do you consolidate services and provide failover across geographically dispersed sites?

So that’s it for now 🙂  It’s too early to tell how things are going, really. It’s certainly not the cushy environment that Princeton U. was, but there are bigger challenges and problems to be solved here, and that’s the part I’m looking forward to.

July 21, 2007

Where does that Python DB handle go?

Filed under: Python,Scripting,Technology — m0j0 @ 11:33 am

UPDATE: Well, that didn’t take long. My solution works, but there’s a better way. Create the connection in main(), and create separate *cursors* for each Host. Cursors are cheap, and you reuse the connection. Thanks to Brend on #python (irc.freenode.net) for the enlightenment.
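Here’s a minimal sketch of that arrangement. The assumptions are mine: it uses the standard library’s `sqlite3` with an in-memory database as a stand-in (the real script’s database module isn’t named here), the `hosts` table is made up, and the bodies of `record_name()` and `update_mac()` are guesses at what such SQL wrappers might look like.

```python
import sqlite3


class Host(object):
    def __init__(self, conn, name):
        self.name = name
        # One cheap cursor per Host; the connection itself is shared.
        self.cursor = conn.cursor()

    def record_name(self):
        self.cursor.execute("INSERT INTO hosts (name) VALUES (?)",
                            (self.name,))

    def update_mac(self, mac):
        self.cursor.execute("UPDATE hosts SET mac = ? WHERE name = ?",
                            (mac, self.name))


def main(lines):
    conn = sqlite3.connect(":memory:")  # opened once, in main()
    conn.execute("CREATE TABLE hosts (name TEXT PRIMARY KEY, mac TEXT)")
    for line in lines:
        name, mac = line.split()
        host = Host(conn, name)  # a new Host per line, same connection
        host.record_name()
        host.update_mac(mac)
    conn.commit()
    rows = conn.execute(
        "SELECT name, mac FROM hosts ORDER BY name").fetchall()
    conn.close()
    return rows


result = main(["alpha 00:11:22:33:44:55", "beta 66:77:88:99:aa:bb"])
```

The connection is set up and torn down exactly once per run, no matter how many lines the input file has.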

Python has been a wonderful language to get to know so far. However, one thing I didn’t really miss about Java and C++ was the set of decisions that are kind of forced upon you when you have various objects working together in a program, controlled by code in a “main” function. Here’s one decision I was faced with that took more thinking than I’d like to admit to make.

I have a class, we’ll call it “Host”. Of course, there are also methods for that class, like Host.record_name() and Host.update_mac() and the like. These methods are just wrappers around some SQL.

I also have, of course, a “main” function, which is where we create instances of Host and call the methods we need on the objects.

The question now is, where exactly should we create the handle to the database? There would appear to be 4 choices (without getting overly absurd):

  1. In main(), and then pass the handle to the class’s ‘__init__()’.
  2. In main(), but pass the handle to each method of any object that needs it.
  3. In Host.__init__(), so methods can refer to it without creating it themselves.
  4. Inside the individual methods themselves.

I chose option #3. The only part that bothers me about it is that a new “Host” is created, and then destroyed, for each of a couple thousand lines of a file. Hence, there is the overhead of setting up and tearing down a database connection a couple thousand times every time this script is run. I’m not sure (yet) what the cost of this will be.

Certainly, the cost will be less than if I did it for each *method* that was called, which suggests that option 4 is probably not the right way to go.

The semantics involved in creating the handle in main() and passing it to Host’s __init__() seemed less than straightforward to me. It seemed there was the potential for main to pass in a copy of the handle at object instantiation time, have the object be destroyed when it’s done doing its work, taking the connection with it, and then main still thinks it’s a useful handle and tries to pass it to the next object that needs to be created. If that’s not the case, then what is the status of this thing that main created? What will happen if main tries to pass it to another object?
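For what it’s worth, that worry can be checked directly. Python hands an object a reference to the same connection rather than a copy, so the connection lives on for as long as main holds on to it. The sketch below again uses `sqlite3` as a stand-in database, and it assumes the class never closes the connection itself (in a `__del__` method, say).

```python
import sqlite3


class Host(object):
    def __init__(self, conn):
        # Stores a reference to the caller's connection, not a copy.
        self.conn = conn


conn = sqlite3.connect(":memory:")
first = Host(conn)
same_object = first.conn is conn  # both names refer to one connection
del first                         # the Host goes away...
still_usable = conn.execute("SELECT 1").fetchone()  # ...the connection stays
second = Host(conn)               # and it can be handed to the next Host
```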

Of course, this problem would also exist if we passed it to each method that needed it. It would also be more overhead, and it would seem needlessly draconian, especially since *every* method of the object in question would need it.

If I’ve misunderstood something, or have made a poor choice, feel free to clarify/flame/enlighten me in the comments 🙂

July 11, 2007

Python 2.5’s “partition” saves my bacon

Filed under: Python,Scripting,Technology — m0j0 @ 8:42 pm

So I was on more than one IRC channel today asking a question that I got lots of answers to, all of which looked really messy to me, so I dug into the documentation and found that in Python 2.5, strings have a new method, called “partition”.

Here’s the trouble I had, and why “partition” helps:


>>> m = "parent:child"
>>> x = m.split(":")[1]
>>> x
'child'

This is actually pretty close to what I want. I want to split m, and assign the second element in the resulting list to x. The problem arises when there is no “:”. In that case, that second line is actually trying to assign to x by referencing an index that is beyond the end of the list. You get an “index out of range” error.
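To make the failure concrete, here’s a short sketch of the no-colon case, along with one way to catch the error and fall back to an empty string. The fallback value is my assumption about what you’d want; the error itself is the one described above.

```python
m = "parent"  # no colon this time

try:
    x = m.split(":")[1]  # split returns ['parent'], so index 1 is missing
except IndexError:
    x = ""               # fall back to an empty string
```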

So you’re left with a number of options. You can use an “if” to check for the existence of the colon.


if ":" in m:
    x = m.split(":")[1]

I guess. But then I have to use another “if” to check and see if x was ever defined, or I have to initialize x ahead of time. At this point I’m onto at least three lines of code for something that takes 1 line in Ruby, Perl, or PHP. That just *can’t* be right!
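One way to squeeze the check and the default into a single line is a conditional expression, which is also new in Python 2.5. This is only a sketch, and `after_colon` is a made-up helper name used to show both cases.

```python
def after_colon(m):
    # Take the second piece only when a colon exists; otherwise use "".
    return m.split(":")[1] if ":" in m else ""


with_colon = after_colon("parent:child")
without_colon = after_colon("parent")
```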

You can also go ahead and just add the “:” explicitly to “m”:


>>> m = "parent"
>>> y = m + ":"
>>> y
'parent:'
>>> x = y.split(":")[1]
>>> x
''

Nothing wrong with that except I still think I just shouldn’t have to do that. I should be able to, as a last resort, go with regex and do a one-liner that assigns everything after “:” to x, and if there isn’t a “:”, then x would wind up being empty. Something along the lines of x = re.search(":.*", m) – or something like that.
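For the record, a regex version that actually runs needs one more step, because `re.search` returns a match object (or None when nothing matches) rather than a string. Here’s a sketch, with `after_colon_re` as a made-up helper name:

```python
import re


def after_colon_re(m):
    # Capture everything after the first colon, if there is one.
    match = re.search(r":(.*)", m)
    return match.group(1) if match else ""


with_colon = after_colon_re("parent:child")
without_colon = after_colon_re("parent")
several = after_colon_re("a:b:c")
```

Note that with more than one colon this keeps everything after the first one, which is not what `split(":")[1]` would give you.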

Well, partition solves the problem rather nicely. From the documentation:

Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings. New in version 2.5.

Here’s what I did with it:


>>> y
'parent:'
>>> x = y.partition(":")[2]
>>> x
''

Of course, I got an empty string in this case, but “partition” did what I wanted “split” to do – namely, assign '' or None to “x” if the separator didn’t exist. I guess what makes this possible with “partition” is the fact that it returns a fixed-length tuple no matter what, while split returns an arbitrary-length list, which would introduce ambiguities in cases where multiple splits are done to a string.
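The case that started all this, a string with no separator at all, is worth seeing next to the others. A short sketch with made-up sample strings:

```python
found = "parent:child".partition(":")
missing = "parent".partition(":")   # no separator: still a 3-tuple
x = missing[2]                      # index 2 always exists

# With several separators, only the first one splits the string.
head, sep, tail = "a:b:c".partition(":")
```

Because the tuple always has three items, index 2 is safe whether or not the separator is there.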

July 3, 2007

Python Magazine Lives

Filed under: Big Ideas,Me stuff,Python,Scripting,Technology — m0j0 @ 10:58 pm

I have a confession to make: For the past 6 weeks, I’ve been leading a secret double life. By day, I’m a mild mannered system/network/database admin in academia. I also write some PHP, Perl, and Python code. By night, however, I’m an author and editor. My latest project is bigger than most. In fact, it’s an entire magazine. Devoted to Python.

I am the Editor in Chief of the newly launched Python Magazine.

Why on Earth Are You Doing This?

Python Magazine was created as a result of some rather unfortunate events in my own early experiences with Python. Getting started, of course, couldn’t be easier. It was what happened after I had been coding for a while that I had issues with. Once I needed to do something a little out of the ordinary with the language, it was hard to feel confident that the way I was going was the right way.

For example, I decided to wrap up a bunch of SQL calls in Python and expose them as an API using Python’s built in SimpleXMLRPCServer. I thought this was great, because then I could maintain a single back end API, and any language that could make an xmlrpc call could use it without me having to maintain APIs in several languages. Nice in theory, but people smarter than I questioned my decision to use the built in SimpleXMLRPCServer. The right road to take, though, was completely unclear.

As another example, I needed to get up to speed on using the python-ldap module, but found that a lot of the documentation covered only the most basic features, while I was trying to write a full-fledged LDAP management API (and accompanying command line and GUI tools). Other articles I found were outdated enough that people warned me not to bother with them, pointing to glaring issues with the code samples (which turned out to be true – some of what was in the code samples turned out to be completely deprecated!).

When I wanted to write code against a PostgreSQL server, the correct module to use was also not immediately obvious, so I had to hunt down the sites of various modules, see which ones were maintained, search for articles that weren’t 5 years old on how to use them… Gah!

What I really wanted was a resource that fed me information in a way that my brain likes to feed on information. I really wanted to learn to do things with Python the way I learned to do things with Linux, Solaris, PHP, and even non-technical things like photography, billiards, and brewing beer. I wanted a magazine.

There was no magazine. I was bummed.

How did you finagle this one, jonesy?

I have a friend named Marco Tabini. He’s a publisher. He runs Marco Tabini Associates in Toronto. He is the publisher of php|architect Magazine. He’s also a total geek. For fun he does things like writing lexical parsers… in PHP. Nobody should ever do that. He thinks it’s fun. I say pass on that if you are given the chance.

Marco and I met via email. I wrote to tell him that I had received my first issue of php|architect, and would not recommend it to a friend. I had found something like 15 errors (typos and grammatical issues) in the first two pages of the magazine. Marco wrote back and said “hey, we’re a small outfit. We’re an Italian immigrant and an Iranian immigrant, living in Canada, trying to edit technical articles written by people from all over the world with varying levels of experience with English… all for a largely American audience. Come help us out!” So I did.

Shortly thereafter I became Editor in Chief of php|architect. Now there were three of us. Oh joyous day.

Those were great times, and the magazine has since spawned its own online and on-site training, its own line of books, its own series of conferences, and even a cruise! It probably has stuff I don’t even know about because I haven’t worked directly for that particular publication since 2004.

The success of that magazine gave me the courage to go to Marco about 6 weeks ago and ask about letting me head up another magazine, this time about a topic of *my* choosing. We chatted on IRC for several hours over the course of about a week, bought a couple of domain names, settled on budgets and team members and all that, and set out to make Python Magazine a reality.

So… How’s it going?

Things are *REALLY* rolling now. There are columnists, there are tech editors, there are authors. Articles have been commissioned. Logos and trademarks are in place. The design team is rocking, the contract team is rolling, and the emails are flying. In the background, the sound of constant typewriter activity can be heard, just like on those old newscasts from the old Cronkite days. Exciting times!

That said, we still need LOTS of content. The behind-the-scenes of a magazine is that you’d really like to have something like 4 months worth of content “in the can” before “Volume 1 Issue 1” is released. I’m convinced that this has never happened in the history of publishing, but it’s a great goal to have, and I’d be pleased as punch to be the first person ever to achieve it 😉

If you’re a writer who is doing or has done something interesting with Python, or can illustrate high level concepts from the fields of computer science, research computing, or IT, using Python, we’d love to hear your thoughts!

Import This

In the end, I hope I can be a good steward to the language and community. I’ve already been in touch with a lot of wonderful people – authors and others – who’ve helped out in some way, either with the magazine, with my own buggy Python code, or both. That’s all the news that’s fit to print for now, but keep an eye here and on the Python Magazine website for more updates as they happen.

Oh yeah – and if you subscribe now, you get a discount, and a chance to win a MacBook!
