Dev Corner

Saturday, July 29, 2006

Postgresql part 1

There are two pieces of free software that I really love: Erlang and PostgreSQL. So, a while back I decided to marry the two and let myself issue queries from Erlang via a pure Erlang library for PostgreSQL. At the time, there were no PostgreSQL libraries in jungerl.
There is one now, but I had already created my own, and I think it is worthwhile posting anyway.

Originally, I had opted for a port program. It worked, but I didn't want to have to deal with SIGSEGVs coming from the port program, or with memory leaks, since it was written in C++. That's not to say that I couldn't periodically restart the port process, or use another language with a garbage collector and established PostgreSQL bindings. It just didn't feel right to have those kinds of potential dangers in a 24/7 system.

Also, I saw that implementing a pure erlang postgresql library wasn't terribly difficult, so I went ahead.

Here are some of the main features that I really think make it stand out:
  • Connection pooling built in
    • Pool connections can auto-regenerate after a certain number of queries
  • Easy query issuing (see the sketch after these lists)
    • Convenience functions for common queries, such as ones that return a single row or a single result
  • gen_server and gen_fsm implementation for each connection
  • Pool statistics for server monitoring
  • SQL statement logging and timing code built in
  • Rock solid from what I've seen so far
What's missing:
  • Ability to execute prepared statements
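To give a rough feel for what I mean by easy query issuing, here is a sketch of the kind of calling code I'm aiming for. Take it with a grain of salt: the module and function names below (pg_pool, squery, single_row) are placeholders, not the library's actual API; the real interface will come out in the design posts.

%% Hypothetical calling code -- the names are placeholders only.
example() ->
    %% start a named pool of five connections
    {ok, _} = pg_pool:start_link(mypool, [{host, "localhost"},
                                          {database, "test"},
                                          {user, "test"},
                                          {password, "secret"},
                                          {connections, 5}]),

    %% a plain query returns all rows
    {ok, Rows} = pg_pool:squery(mypool, "SELECT id, name FROM users"),

    %% a convenience wrapper for queries expected to return a single row
    {ok, Row} = pg_pool:single_row(mypool, "SELECT name FROM users WHERE id = 1"),
    {Rows, Row}.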
This library will soon be posted to jungerl as an alternative PostgreSQL library.

But before I do that, I wanted to create a set of blog entries that detail the design of the library, from start to finish, which will hopefully give people new to Erlang something to learn from, with a real-world purpose.

I absolutely advise anyone starting out in Erlang to read Joe Armstrong's thesis. Required reading before we start, ok?

There are really good tutorials for beginners on his page as well.

One more thing I wish to note: there is an ODBC port interface which works nicely. I just didn't want 20 or 50 port processes going in a PostgreSQL connection pool. That is another reason I wanted to go the pure Erlang route.

I'll try to make a new series post every week until done.
The next installment in this series will be posted soon...

Tuesday, February 21, 2006

ETS is handy

The other day I wanted to create a counter shared amongst the many clients connecting to a server. I wanted to assign a unique id to each one, so I quickly listed the ways I could do this in Erlang.

These are:
  1. Create a gen_server process that has private state that contains the counter variable
  2. Use ETS
  3. Make the accepting gen_server keep track of and assign incremental ids.
  4. ...
There are probably more, but those were a few that I quickly thought of.

So I started thinking about the "create a new gen_server" option. Perhaps it would have been a little overkill, and then I had to worry about what would happen if, by some odd chance, the gen_server died.
If it did, I would have to recover the state of the counter. It is true that the gen_server would be so simple that the chances of this happening are very slim, but still, you never know.
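For reference, here is roughly what that gen_server would have looked like; a minimal sketch of the option I decided against, not code I actually wrote:

%% Minimal counter gen_server -- the whole state is just the next id.
-module(counter_server).
-behaviour(gen_server).

-export([start_link/0, getnext/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

getnext() ->
    gen_server:call(?MODULE, getnext).

init([]) ->
    {ok, 1}.

handle_call(getnext, _From, Next) ->
    {reply, Next, Next + 1}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.

It works, but it is a whole module, another process to supervise, and the counter value dies with it.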

I also thought about putting the counter somewhere and having the socket-acceptor gen_server I already had assign the ids. I held off on that one; I didn't want to pollute the duties of that gen_server with lots of ancillary things.

So then I started looking into ETS. ETS stands for Erlang Term Storage. From the manual:
"... provide the ability to store very large quantities of data in an Erlang runtime system, and to have constant access time to the data."
I had used ETS before for storing server performance tables. I noticed that there was a function named update_counter, which updates a counter field in an ets table.
So I thought, cool, let me try creating some quick code to prototype this:

ETS stores tuples of information, so let's create a record that will be the data that is stored:

-record(counter_entry, {id, nextid=1}).

I added id in there so that I would be able to create several counters. Each counter sequence is identifiable by its id, so we can have many types of counters.

Next I added a function to initialize the counter table:

init(CounterID) ->
    ets:new(t_mycounters, [set, {keypos, 2}, public, named_table]),
    ets:insert(t_mycounters, #counter_entry{id=CounterID, nextid=1}).

When you create a new ets table, you can provide some options for how it is accessed and indexed. I passed in {keypos, 2}. This tells ets which tuple element will serve as the index field of the table; position 1 of a record's tuple is the record name, so position 2 is the id field.
I gave the table public access, meaning that any other process can query and manipulate the table. Otherwise, only the process creator can manipulate the table.

The table is also a named_table. This way, I can refer to the table by passing in the atom t_mycounters. If I hadn't done this, I would have had to stash away the table identifier returned by ets:new and use it each time I wanted to operate on the table.
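The unnamed version would look something like this (just an illustration, not part of the counter code):

%% Without named_table, the caller has to keep the table id around.
init_unnamed(CounterID) ->
    Tid = ets:new(mycounters, [set, {keypos, 2}, public]),
    ets:insert(Tid, #counter_entry{id=CounterID, nextid=1}),
    Tid.  %% pass Tid to every later ets call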

I then defined a function to get the next successive value for the counter:

getnext(CounterID) ->
    ets:update_counter(t_mycounters, CounterID, {3, 1}).

ets provides a useful function to atomically treat a table field as a counter. In this case I told ets to increment tuple position 3 by one; that is where the {3, 1} comes in. Position 3 of the tuple is the nextid field of the counter_entry record (the record name itself occupies position 1). ets:update_counter also returns the new value of the counter, which is exactly what getnext should hand back.

Here is how to use it (assuming the init and getnext functions live in a module called counters):

counters:init(chat_id_sequence).
NextSeqNo = counters:getnext(chat_id_sequence).

Kinda like the concept of postgresql sequences!
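Because update_counter is atomic, many processes can call getnext concurrently without any locking on my part. Here is a quick throwaway sketch to convince yourself; every id comes back unique:

%% Spawn ten processes that each grab an id from the shared counter.
demo_concurrent() ->
    counters:init(chat_id_sequence),
    Parent = self(),
    [spawn(fun() ->
                   Parent ! {id, counters:getnext(chat_id_sequence)}
           end) || _ <- lists:seq(1, 10)],
    Ids = [receive {id, Id} -> Id end || _ <- lists:seq(1, 10)],
    lists:sort(Ids).  %% -> [2,3,4,5,6,7,8,9,10,11]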

Well, that's it for now. If you were to take one thing away, it should be: ETS is handy and cool.

Sunday, January 22, 2006

List Comprehensions

As I was trekking through yaws country, I realized the value of list comprehensions, a (sort of) recent feature. I'm still getting used to them; I often find myself writing a recursive function to iterate over data instead of reaching for the functions in the lists module, or even forgetting about map, foldl/foldr, etc.

I was writing a page that contacts an erlang node in order to display memory data for the VM.

The function to get the remote data is:


get_memory_data(Server) ->
    case rpc:call(Server, erlang, memory, []) of
        {badrpc, Reason} ->
            {error, Reason};
        Data ->
            {ok, Data}
    end.


I use the most excellent rpc module to easily make a remote call to erlang:memory/0 on another Erlang node, whose name is passed in as a parameter.

What I wanted to do with this page is output the memory information in a simple table. In order to do this in yaws, I wrote an <erl> block that outputs this information dynamically.

<erl>

out(A) ->
    Data = yaws_api:parse_query(A),
    {ok, Server} = wsform:get_value_atom("server", Data),
    {ehtml, output_memory(Server)}.

</erl>

Notice that I'm using the form library ;)

What I did initially was this:


output_memory(Server) ->
    case get_memory_data(Server) of
        {ok, Data} ->
            Fun = fun({Name, Value}) ->
                          {tr, [], [{td, [], f("~p", [Name])},
                                    {td, [], f("~p", [Value])}]}
                  end,
            Rows = lists:map(Fun, Data),
            {'div', [{style, "float: left; padding-right: 10px;"}],
             {table, [], [{tr, [], {td, [{colspan, "2"}], "Memory Stats"}},
                          Rows]}};
        {error, Reason} ->
            {p, [], f("No memory data: ~p", [Reason])}
    end.

This function gets the memory data and loops over the list to create a tr and two td tags for each memory statistic. It uses lists:map to do this.

But then I remembered list comprehensions. I then wrote:

output_memory(Server) ->
    case get_memory_data(Server) of
        {ok, Data} ->
            Rows = [{tr, [], [{td, [{class, "dataHeader"}], f("~p", [Name])},
                              {td, [], f("~p", [Value])}]}
                    || {Name, Value} <- Data],
            {'div', [{style, "float: left; padding-right: 10px;"}],
             {table, [], [{tr, [], {td, [{colspan, "2"}, {class, "twocolDataHeader"}],
                                    "Memory Stats"}},
                          Rows]}};
        {error, Reason} ->
            {p, [], f("No memory data: ~p", [Reason])}
    end.

The list comprehension is the Rows = [...] line. Now I don't have to define my own anonymous fun.

Is there any difference? Semantically, no, I don't believe so, but I think you will find that list comprehensions simplify list iteration code in the long run. They are used in qlc queries too.
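As a throwaway illustration (nothing to do with the memory page), a comprehension can also filter with a condition, which would otherwise take a map plus a filter:

%% Squares of the even numbers from 1 to 10, two ways.
with_lists_module() ->
    lists:map(fun(X) -> X * X end,
              lists:filter(fun(X) -> X rem 2 =:= 0 end, lists:seq(1, 10))).

with_comprehension() ->
    [X * X || X <- lists:seq(1, 10), X rem 2 =:= 0].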

I will cover qlc and mnesia in another blog entry. I thought that I should put another yaws entry out there before I go on to postgresql and mnesia.

One more thing: I did write a parameter validation block using the form validation library, which I placed before the memory output block. Here it is:

<erl>

out(A) ->
    Rules = [{"server", [required]}],
    Data = yaws_api:parse_query(A),
    case wsform:validate_page_data(Data, Rules) of
        ok ->
            ok;
        Error ->
            [{html, f("Invalid data: ~p ~p", [Error, Data])},
             break]
    end.

</erl>

Returning ok from an out function is a no-op: it does nothing and the page continues. If there is an error, a cryptic, programmer-friendly error message is displayed and the page processing is stopped via the break atom.

See you next time.

Friday, January 13, 2006

Yaws and form parameters

The first post!

Well, I'll start with my obsession with Erlang. It is my favorite language for server-side apps. It is simply, in my opinion, the best concurrent programming language for server applications.
A lot of other people have written about its advantages; my hope for this blog is to demonstrate how to use Erlang for practical problems.

Enough praising, let's get down to the code.

I started playing with yaws, an Erlang-based, very fast and cool web server, and created a simple login form, but I was used to PHP's QuickForm. So I wanted to create some code to parse and validate form parameters.

With QuickForm, you create a form object, add parameter definitions, and attach validation rules to those parameters. QuickForm also supports rendering, but for now I figured I would start with validation only.

Here is how I wanted to define form parameters to process:

Rules = [{"username", [required, {regex, "^test$"}]},
{"password", [required,integer_rule()]},
{"id", [required, integer_rule()]}],

The rules are just a list of tuples of the form:
{form_parameter_name, [list_of_rules]}

Now the next step is to develop an api for processing form elements and checking them against the rules.

Let's start with:
validate_page_data(Data, Rules)
where Data is the parsed GET or POST data from yaws, and Rules is the list of rules we defined previously.
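For reference, the parsed data from yaws_api:parse_query/1 (or yaws_api:parse_post/1 for POST data) is just a list of {Key, Value} pairs of strings, so a submitted login form might arrive as:

Data = [{"username", "test"},
        {"password", "secret"},
        {"id", "23"}].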

Now let's define the meat of the function:
validate_page_data(Data, Rules) ->
    validate_page_data(Data, Rules, []).

validate_page_data(Data, [{Name, Rules} | Rest], Errors) ->
    case process_rules(Data, Name, Rules) of
        ok ->
            validate_page_data(Data, Rest, Errors);
        {errors, Reason} ->
            validate_page_data(Data, Rest, [Reason | Errors])
    end;
validate_page_data(_Data, [], Errors) ->
    case is_errors_empty(Errors) of
        true ->
            ok;
        false ->
            {errors, Errors}
    end.
The purpose of validate_page_data is to loop over all the parameter definitions and apply the rules to each parameter. It also collects the errors for every rule that failed validation.

What is a rule? Well, here is the example again:
 Rules = [{"username", [required, {regex, "^test$"}]},
{"password", [required, {callback, fun integer_rule_cb/2}]},
{"id", [required, integer_rule()]}],
So there is a required rule here, a regex rule, a callback rule, and a function called integer_rule.
The callback rule is very useful because, with it, you can define a whole set of custom rules; integer_rule, as you will see, is actually a callback.
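For example (not part of the library itself, just an illustration of the callback mechanism), a minimum-length rule could be written like this:

%% A hypothetical custom rule: fails unless the value is at least Min characters.
min_length_rule(Min) ->
    {callback,
     fun(Name, Value) ->
             case length(Value) >= Min of
                 true ->
                     ok;
                 false ->
                     {error, lists:flatten(
                               io_lib:format("~s must be at least ~p characters",
                                             [Name, Min]))}
             end
     end}.

You would then use it as {"password", [required, min_length_rule(8)]}.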
Now on to rule processing. What we want a rule processor to do is walk the list of rules for each parameter and call the rule handler against the data in the form.

Here it is:
%% Data here is the posted data
%% Name is the form parameter name
%% required is the rule atom
process_rule(Data, Name, required) ->
    case lists:keymember(Name, 1, Data) of
        true ->
            ok;
        false ->
            {error, lists:flatten(io_lib:format("~s is required", [Name]))}
    end;
process_rule(Data, Name, {regex, Regex}) ->
    case get_value(Name, Data) of
        {ok, Value} ->
            case regexp:match(Value, Regex) of
                {match, _Start, _Length} ->
                    ok;
                nomatch ->
                    {error, lists:flatten(io_lib:format("~s not valid", [Name]))};
                Error ->
                    {error, lists:flatten(io_lib:format("~s invalid regex ~p", [Name, Error]))}
            end;
        _Else ->
            ok
    end;
process_rule(Data, Name, {callback, Fun}) ->
    case lists:keysearch(Name, 1, Data) of
        {value, {K, V}} ->
            Fun(K, V);
        false ->
            ok
    end.

process_rules(Data, Name, Rules) ->
    process_rules(Data, Name, Rules, []).

process_rules(Data, Name, [Rule | Rest], Errors) ->
    case process_rule(Data, Name, Rule) of
        ok ->
            process_rules(Data, Name, Rest, Errors);
        {error, Reason} ->
            process_rules(Data, Name, Rest, [Reason | Errors])
    end;
process_rules(_Data, _Name, [], Errors) ->
    case is_errors_empty(Errors) of
        true ->
            ok;
        false ->
            {errors, Errors}
    end.
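One helper used above but not shown is is_errors_empty; it just needs to check whether the accumulated error list is empty, e.g.:

%% true when no rule produced an error
is_errors_empty([]) -> true;
is_errors_empty(_)  -> false.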
The required rule is a first-class rule instead of a callback because it actually has to run whether or not the argument is present in the form parameters.
If a parameter is not present in the data, the other rules are never run for it. I could have made the regex rule a callback, but decided to make it first class also. Just a choice, nothing more.

Now on to creating a rule that will be needed for numbers: integer_rule.
We can create a callback for that one. Here is how we would use it:
Rules = [{"id", [required, integer_rule()]}]
We want id to be a required integer. I've created an integer_rule convenience function as syntactic sugar.

Here it is:
integer_rule_cb(Name, Value) ->
    case string:to_integer(Value) of
        {_IntValue, []} ->
            ok;
        _Else ->
            %% covers both {error, Reason} and a trailing non-numeric part
            {error, lists:flatten(io_lib:format("~s must be numeric", [Name]))}
    end.

integer_rule() ->
    {callback, fun integer_rule_cb/2}.

regex_rule(Regex) ->
    {regex, Regex}.

I also created a regex_rule function for fun. Callbacks, like all rules, must return either ok or {error, Reason}.

I could also augment each of the rules to accept a custom reason string for each parameter, but I have left this out on purpose to be brief.

Here are some convenience functions to get at form data, after validation has been done:
get_value(Name, Data) ->
    case lists:keysearch(Name, 1, Data) of
        {value, {_K, V}} ->
            {ok, V};
        false ->
            {error, notfound}
    end.

eget_value_integer(Name, Data) ->
    case get_value_integer(Name, Data) of
        Result = {ok, _Value} ->
            Result;
        Error ->
            throw(Error)
    end.

get_value_integer(Name, Data) ->
    case lists:keysearch(Name, 1, Data) of
        {value, {_K, V}} ->
            case string:to_integer(V) of
                {error, Reason} ->
                    {error, Reason};
                {IntValue, _} ->
                    {ok, IntValue}
            end;
        false ->
            {error, notfound}
    end.

%% Here is a test for everything.
test() ->
    Data = [{"username", "test"}, {"password", "password"},
            {"id", "23"}],
    Rules = [{"username", [required, {regex, "^test$"}]},
             {"password", [required, integer_rule()]},
             {"id", [required, integer_rule()]}],
    io:format("~p~n", [validate_page_data(Data, Rules)]),
    io:format("~p~n", [get_value_integer("id", Data)]),
    io:format("~p~n", [get_value_integer("username", Data)]),
    io:format("~p~n", [get_value("username", Data)]).

In the coming week I'll polish up this library and post it somewhere, maybe the yaws mailing list. Hopefully, there is some interest in it.

I recommend that anyone programming in Erlang consider yaws for front-end management webapps for their Erlang servers, or even for full-fledged web applications. It is simply very cool, and it's easy to rpc:call into remote Erlang nodes.

That's it for this post. I will cover a lot of topics in coming posts. I wrote a native Erlang PostgreSQL access library; I will go over its structure in this blog, over many posts of course. I will be posting it to jungerl RSN.