Friday, April 29, 2011

Java library with subgraph isomorphism support?

Hi,

I'm trying to analyze the usage of "#include" in C files (what is included first, dependencies...).

To do so, I extract the "#include" directives from a C file and build a graph. I would like to identify common patterns in this graph...

So far, I'm using JGraphT as the graph engine (not sure that's the right term) and JGraph for the rendering (though JGraph is a bit problematic, since the layouts are no longer included in the free release).

I've been unable to find any isomorphism support in JGraphT. Do you know of any solution providing this kind of support (something like igraph, but for Java)?

I'm using Java 1.5, and the proposed solution must be free...

From stackoverflow
  • I've been pondering this problem myself lately (looking for common markup structures to factor out of JSPs into tags, in my case).

    A library for this would be great. I haven't found one yet. In the meantime, here are a couple of problems that may be related to yours (isomorphically?).

    • I was planning to research the technique mathematical software uses to analytically evaluate integrals in calculus problems. In this case, there are a bunch of known structural patterns, and the problem in question has to be matched to one of the known patterns. The best way to do this is not always obvious because it depends on what terms are grouped together, etc.

    • Algorithms used in biology to find corresponding structures in two complex molecules might also be adapted to this problem.

    LB : You're right about the biology thing... Did you know about this software: http://www.cytoscape.org/ ? There's a plugin to find isomorphisms for small molecules: DrugViz
    erickson : No, I hadn't seen that. Thanks for the pointer! GINY is a bit of an unfortunate name, though...
  • I sure don't know of a particular graph library with subgraph isomorphism code — since the problem is known to be NP-complete, you can't do much other than search anyway. It shows up a lot in graph-rewriting schemes, so AGG might help.

  • Looks like there was a mention of isomorphism in the "experimental" package of JGraphT a few months back, but apparently no documentation.

    Isomorphism comparison is a fundamental requirement in cheminformatics software (technically it's monomorphism that's used). Atoms are "nodes" and bonds are "edges". Molecular graphs are undirected and can be cyclic. A few open source cheminformatics libraries written in Java are available. You might be able to find some clues for solving your problem by looking at these libraries.

    For example, I've written a BSD-licensed cheminformatics library called MX that implements a monomorphism algorithm based on VF. I wrote a high-level overview of how the algorithm was implemented, and you can browse the source for the mapping package in my GitHub repository. Most of the work is done in the DefaultState class.

    MX also includes a fast exhaustive ring detector and other graph manipulations that might be applicable to your problem.
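Since a full VF implementation is a lot of code, the brute-force search fallback mentioned in this thread can at least be sketched in a page of plain Java. This is only an illustrative sketch (the class name is made up, graphs are adjacency maps, and the search is exponential in the worst case), not a substitute for a tuned algorithm like VF:

```java
import java.util.*;

// Brute-force check: can "pattern" be mapped injectively into "target" so
// that every pattern edge maps to a target edge? Fine for small graphs only.
public class SubgraphMatcher {

    // Graphs are adjacency maps: node -> set of neighbours (undirected).
    static boolean isSubgraphIsomorphic(Map<Integer, Set<Integer>> pattern,
                                        Map<Integer, Set<Integer>> target) {
        List<Integer> pNodes = new ArrayList<>(pattern.keySet());
        return extend(pNodes, 0, new HashMap<>(), pattern, target);
    }

    static boolean extend(List<Integer> pNodes, int idx, Map<Integer, Integer> map,
                          Map<Integer, Set<Integer>> pattern,
                          Map<Integer, Set<Integer>> target) {
        if (idx == pNodes.size()) return true;      // all pattern nodes mapped
        int p = pNodes.get(idx);
        for (int t : target.keySet()) {
            if (map.containsValue(t)) continue;     // keep the mapping injective
            if (consistent(p, t, map, pattern, target)) {
                map.put(p, t);
                if (extend(pNodes, idx + 1, map, pattern, target)) return true;
                map.remove(p);                      // backtrack
            }
        }
        return false;
    }

    // Every already-mapped pattern neighbour of p must map to a target
    // neighbour of t, otherwise some pattern edge would be lost.
    static boolean consistent(int p, int t, Map<Integer, Integer> map,
                              Map<Integer, Set<Integer>> pattern,
                              Map<Integer, Set<Integer>> target) {
        for (int q : pattern.get(p)) {
            Integer mapped = map.get(q);
            if (mapped != null && !target.get(t).contains(mapped)) return false;
        }
        return true;
    }
}
```

Pruning candidates by degree or labels (as VF does) is what turns this from a toy into something usable on real molecule- or include-graphs.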

  • Not sure one of them can do isomorphism but I've collected a couple of links to graph layout engines in my blog: http://blog.pdark.de/2009/02/11/graph-layout-in-java/

    You might want to look at graphviz, too. It's not Java but has a very powerful layout engine.

    As for isomorphism: You probably only need to check for patterns at level 0 (i.e. the direct includes), because anything below that must be isomorphic by definition: all the files included by a given include file will always be the same, unless someone used a lot of #if magic in the includes section.
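To make the starting point concrete: before any graph library is involved, the direct (level-0) includes can be pulled out of a source file with a regular expression. A minimal sketch in plain Java (the class name is hypothetical, and the pattern deliberately ignores #if conditions):

```java
import java.util.*;
import java.util.regex.*;

// Extracts the direct "#include" directives from one C source file's text.
public class IncludeGraph {

    // Matches both #include <header.h> and #include "header.h",
    // tolerating whitespace around the '#'.
    private static final Pattern INCLUDE =
        Pattern.compile("^\\s*#\\s*include\\s*[<\"]([^>\"]+)[>\"]",
                        Pattern.MULTILINE);

    // Returns the level-0 includes in order of appearance.
    static List<String> directIncludes(String source) {
        List<String> headers = new ArrayList<>();
        Matcher m = INCLUDE.matcher(source);
        while (m.find()) {
            headers.add(m.group(1));
        }
        return headers;
    }
}
```

Feeding each file's list into a graph as edges (file -> header) gives the structure on which pattern matching can then run.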

Avoiding cursor change over dynamic text fields in Flash CS3

I have a dynamic text field inside a MovieClip symbol. Whenever the mouse pointer hovers over the symbol, the cursor changes to the I-shaped text-editing cursor. This may be a very stupid question, but is there any way to avoid this? Not even using Mouse.hide() keeps the "I" cursor from appearing.

From stackoverflow
  • Have you tried setting the TextField's selectable property to false? This will prevent the user from dragging the mouse to select the text (thus they can't copy it to the clipboard), but I believe that should also prevent the cursor from changing.

    pypmannetjies : Thanks so much! I knew it had to be something like that, I just couldn't find it. Now I can go watch Mythbusters :)
    Herms : I'm glad I could help someone at such a worthy cause as watching Mythbusters!
  • mc.selectable = false;

  • I've tried this, it works when we roll over the textField but when we click the "I" reappears! :'(

    Any suggestion?

    pypmannetjies : I tried it, but it does not change when I click on it. Are you sure you are using a dynamic text field and not an input text field? Which version of Flash and Actionscript are you using?
  • For CS4: In the Properties box there is an icon (little button), available when you have the text field selected, which you can toggle on and off to determine whether the text is selectable. This button is located below the anti-alias setting in the Properties box.

Netbeans and Rails: Is there a way to make the Run option in netbeans also start thinking_sphinx?

I don't want to keep Sphinx running all the time on my dev machine. I'd like a way to automatically execute rake thinking_sphinx:start when I run my app via NetBeans. Is there a way to do this?

[also - I am using Windows]

Thinking Sphinx needs to run as a separate process.

From stackoverflow
  • You will need a third-party program called PsExec to do this. Otherwise you won't be able to start a background process using Ruby on Windows.

    First of all download PsExec here. Unpack (at least) psexec.exe and run it once manually - you have to agree to the license :-/.

    After that add the following line to script/server of your Rails app:

    system 'PATH_TO_PSEXEC/psexec -d rake.bat'
    

    Now you create the rake.bat with the commands to run in parallel to your server. Put the file with the following contents into your Rails app's root directory.

    rake thinking_sphinx:start
    

    This line should do the trick, but it may fail e.g. when NetBeans' JRuby version differs from Ruby installed on your host. Or if Ruby isn't installed at all. In that case you should call rake with the complete path of JRuby:

    "PATH_TO_NETBEANS/ruby2/jruby-1.2.0/bin/jruby" "PATH_TO_NETBEANS/ruby2/jruby-1.2.0/bin/rake" thinking_sphinx:start
    

    When you start the server now, an additional Windows command-line pops up with the running rake task.

    Needless to say, you shouldn't add this code to script/server on your production server.

How do I migrate a Struts application to Spring?

How do I migrate a Struts application to Spring?

From stackoverflow

Do GRANTs on [master] propagate to other DBs?

OK, I'm trying to make an "empty" version of a database on another instance of SQL Server 2005. To do this, I use the Scripting wizard to make a big script that CREATEs all the tables, views, SPs, etc., adds some users, and grants permissions for them. Now, in the permission section of the script, I have (among other things)

use [master]
GO
GRANT CREATE VIEW to [myUser]
GO
...
use [prodDb]
GO
GRANT REFERENCES on [tblMyTable] to [myUser]
GO
...

Remember, this is the script generated by the wizard. What I'm trying to figure out is, should the "use master ... GRANT CREATE VIEW" allow myUser to create views in prodDb? That's my understanding -- I'm saying "myUser can create views in any database on this instance" -- but I think I'm getting permission errors when I try to allow myUser to create a view in prodDb. When I look at permissions for prodDb in SQL Server Manager, the permissions are correct, but the target system where I'm executing the generated script doesn't have SQL Server Manager, and I'm not sure how to check per-user permissions on a given object from the command line (and obviously, the errors when I try to exercise those permissions are a big hint).

Is this a bug in the scripting wizard? Should I take away the permissions at Master level and grant them only at the per-DB level? If it makes any difference, this is for a local app on a standalone system that will never be networked.

ETA: OK, so why doesn't the scripting wizard script the correct GRANT statements for prodDb? I mean, if I look at the current permissions for myUser on prodDb, they can CREATE VIEW. But if I run the script on a fresh instance of SQL Server from the command line, myUser cannot CREATE VIEW on prodDb. I can fix this by hand, but it's pretty damn irritating to have to remember that every time I re-run the script wizard.

From stackoverflow
  • My understanding is that changes to Master do not propagate to other databases; it is just a holding ground for system-wide objects such as users, the system stored procedures, and the list of the other databases that are present.

    Changes to Model will propagate to new databases created after the change, but not already existing databases.

    Tom H. : logins are server-wide, users are database-specific
  • "idforstackoverflow" is exactly right:

    • Master is the "system" database so to speak
    • "Model" on the other hand is the template for new databases. Changes to Model will propagate to new databases when they're created - existing ones are unaffected by those changes

    Marc

gtkmm: update gui from other thread?

I am using gtkmm (and glibmm), and I would like to update the GUI from another thread. I have followed the example on

http://www.gtkmm.org/docs/glibmm-2.4/docs/reference/html/thread_2dispatcher_8cc-example.html#_a1

but I have 2 questions:

  • Instead of a Glib::MainLoop receiving the events, is it possible for a Gtk::Main (that runs my window) to accept signals?

  • Can I .connect() functions with arguments in some way? I know it is possible using SigCX but I thought the syntax there was a bit nasty.

Thanks!

From stackoverflow
  • You could have your Glib::MainLoop receive the events, then dispatch them to Gtk::Main.

    I believe the only way to connect with custom arguments is to use sigc::bind.

    For example, say you wanted to pass in an integer, as well as a pointer to the widget in question:

    pWidget->signal_foo().connect(
        sigc::bind<int, Gtk::SomeWidget*>(
            sigc::mem_fun(*this, &Bar::OnFooBar), i, pWidget));
    

How to name a collection of an interface?

The situation is as follows.

public interface IFoo { }

public abstract class FooBase : IFoo { }

Now I need a collection of IFoo with some additional methods.

public class IFooCollection : List<IFoo>
{
   public void UsefullMethod() { }
}

The problem is that IFooCollection looks like an interface while it is a class. The options are the following.

  1. Keep it IFooCollection - I don't like this because it looks like an interface.
  2. Name it FooCollection - I don't like this because it is not a collection of foos.
  3. Turn it into FooBaseCollection because all implementations of IFoo derive from FooBase - I don't like this because this might not be true forever.
  4. Don't create the class at all but provide extension methods for IList<IFoo>, since there are only a handful of methods - I don't like this because changing the code just because you cannot find a name for a class ... yes, that is nasty.
  5. Something I did not think about or forgot to write it down - I hope I will like it!

So what would you do? Is there a naming convention I missed? We are basically following the Microsoft .NET library standards.

UPDATE

The code will not become widespread - it is just inside a GUI tool to put some data into a server. So I don't care about using the methods with other collections or overlooking the methods.

From stackoverflow
  • I like FooCollection: you have a collection of the conceptual object "Foo", even if there is no actual Foo class or interface. This is in keeping with IFoo being an interface of a "Foo" even though there is no Foo class. SpecialFoo would be a special kind of "Foo" even though there is no Foo class.

    I definitely agree that IFooCollection is wrong because of the implied interface.

  • I actually like #4 more than building your own collection type, because ultimately users are going to want to stuff their IFoo-implementing objects into their own lists and other collections. This way, those collections will work as expected.

  • You have identified most of the choices really.

    The only additional one I could think of is CollectionOfIFoo, but that is not in line with the conventions.

    I'd probably go with IFooCollection.

  • Personally, I like the idea of using extension methods. If you are worried about people being able to find the extension methods easily, just put them in a static class in the same code file as the IFoo interface. Or create an "IFooExtensions" class in a separate file in the same namespace to make it easy to spot when people are looking at "IFoo"

  • FooCollection - it's just not obvious with 'Foo', as 'Foo' has no meaning, which makes it difficult to conceptualise. Try it with a 'real' class/interface name and it makes more sense - e.g.

    public class ErrorHandlerCollection : List<IErrorHandler>
    {
        public void PublishErrors() { /* ... */ }
    }
    

    This makes sense because an ErrorHandlerCollection is a collection of error handlers. Anything that implements IErrorHandler IS an error handler, so anything in the ErrorHandlerCollection will be an error handler.

  • IMO #2 is the correct naming, though I would also suggest FooList since you're deriving from a List.

  • How about CollectionOfIFoo?

    Personally, I don't like the convention of prefixing interfaces with 'I' - it's basically a form of the bad kind of hungarian notation, and this is an example of why it's bad.

Adding a column efficiently in SQL Server

I want to add an integer column to a table with a large number of rows and many indexes (it's a data warehouse fact table).

To keep the row width as narrow as possible all the columns in this table are defined as not null. So I want the new column to be not null with a default of zero.

From experience adding this column will take some time, presumably because the database will need to rewrite all the rows with a new column with a filled value. And this presumably will involve updating the clustered index and all the non-clustered indexes.

So should I drop all the indexes before adding the column and then recreate them all? Or is there an easier way to do this?

Also, I don't really understand why adding a nullable column is so much quicker. Why does this not involve re-writing the records with an extra is-null bit flipped for each row?

From stackoverflow
  • SQL Server is a row oriented database. This is in contrast to a column oriented database. This means that in SQL Server, all of the data for a given row is stored together on the disk. Let's have an example:

    Say you have a Customer table with 3 columns, FirstName, MiddleInitial, and LastName. Then, say you have 3 records in this table for Jabba T. Hutt, Dennis T. Menace, and George W. Bush.

    In a row oriented database (like SQL Server), the records will be stored on disk as such:

    Jabba, T, Hutt; Dennis, T, Menace; George, W, Bush;

    In contrast, a column oriented database would store the records on disk like this:

    Jabba, Dennis, George; T, T, W; Hutt, Menace, Bush;

    Where columns are grouped together instead of rows.

    Now, when you go to add a column to a table in a row oriented database (SQL Server, for example), the new data for each row has to be inserted alongside the existing rows, shifting the existing data and requiring a lot of read/write operations. So, if you were to insert a new column for the customer prefix that defaults to 'Mr', this is what you'd get:

    Mr, Jabba, T, Hutt; Mr, Dennis, T, Menace; Mr, George, W, Bush;

    As you can see, all of the original data has been shifted to the right. On the other hand, when you insert a new column that defaults to NULL, no new data has to be put into the existing rows. Thus, there is less shifting, requiring fewer disk read/write operations.

    Of course, this is an oversimplification of what's actually going on on disk. There are other things to take into account when dealing with indexes, pages, etc. But it should help you get the picture.

    For clarification: I'm not at all suggesting you move to a column oriented database; I just included that info to help explain what row oriented means.

  • It will require updating the clustered index, yes - this IS the table data, after all.

    But I don't see why any of the non-clustered indices would have to be updated - your new column won't be a member of any of the non-clustered indices.

    Also, I don't see how dropping and recreating the indices would benefit you in this scenario. If you were bulk-loading several million existing rows from another table or database - yes, then it might be faster (due to the INSERTs being much faster) - but adding a column doesn't really suffer from any indices or constraints being around, I don't think.

    Marc

    DJ : "But I don't see why any of the non-clustered indices would have to updated" - adding a not null column requires the dropping and recreation of the table so as a by-product all the indexes have to be dropped and recreated as well.
  • "Also I don't really understand why adding a column that is nullable is so much quicker. Why does this not involve re-writng the records with an extra Is Null bit flipped for each row."

    Adding a nullable column merely changes the definition of the table. The individual records are not affected.

The difference between assigning event handlers with bind() and each() in jQuery?

Hi, can someone tell me what the difference is between assigning event handlers using bind():

$(function() {
    $('someElement')
        .bind('mouseover', function(e) {
            $(this).css({ /* change color */ });
        })
        .bind('mouseout', function(e) {
            $(this).css({ /* return to previous state */ });
        })
        .bind('click', function(e) {
            $(this).css({ /* do smth. */ });
        });
});

and using each() for the same task:

$('someElement').each(function() {
    $(this)
        .mouseover(function() { $(this).css({ /* change color */ }); })
        .mouseout(function() { $(this).css({ /* return to previous state */ }); });
});

thank you.

From stackoverflow
  • From the examples you gave, I think you're actually asking what the difference, if any, is between using the 'bind' method and the 'event' methods.

    For example, what's the difference between:

    $('.some_element').bind('click',function() { /* do stuff */ });
    

    ... and this?

    $('.some_element').click(function() { /* do stuff */ });
    

    The answer is that it really doesn't matter. It's a matter of preference. The event methods are syntactically simpler and involve less typing, but, as far as I know there really isn't any difference. I prefer to use the bind methods because you can use shorthand event binding if you need to attach more than one event to the same action. It also makes it simpler to understand when/if you need to 'unbind' an event.

    See here: http://stackoverflow.com/questions/562548/jquery-difference-between-bind-and-other-events

    But, from what the actual question asks, "What's the difference between the 'each' method and the 'bind' method"... well, that's a totally different beast.

    You should never really use the 'each' method to attach events because the 'bind' and 'event' methods use the much quicker CSS selector engine (in jQuery's case, it uses the Sizzle engine).

    There's hardly ever (or never) a case where this:

    $('.some_element').each(function() { $(this).click(function() { /* do something */ }); });
    

    ... is better than this:

    $('.some_element').bind('click',function() { /* do stuff */ });
    
    chosta : that is what i needed to know, thank you.
    KyleFarris : Good to hear, you are welcome!

Strategies for showing a nice "Currently Offline" page when the server is down

How can I make a site automagically show a nice "Currently Offline" page when the server is down (I mean, the full server is down and the request can't reach IIS)?

Changing the DNS manually is not an option.

Edit: I'm looking to some kind of DNS trick to redirect to other server in case the main server is down. I can make permanent changes to the DNS, but not manually as the server goes down.

From stackoverflow
  • I believe that if the server is down, there is nothing you can do. The request will send up a 404 network error because when the web address is resolved to an IP, the IP that is being requested does not exist (because the server is down). If you can't change the DNS entry, then the client browser will continue to hit xxx.xxx.xxx.xxx and will never get a response.

    If the server is up, but the website is down, you have options.

    EDIT

    Your edit mentions that you can make a permanent change to the DNS. But you would still need a two-server setup in order to achieve what you are talking about. You can direct the DNS to a load balancer, which would be able to direct the request to a server that is currently active. However, this still requires 100% uptime for the server that the DNS points to.

    No matter what, if the server that the DNS is pointing to (which you must control, in order to redirect the traffic) is down, then all requests will receive a 404 network error.

    EDIT Thanks to brian for pointing out my 404 error error.

    Eduardo Molteni : I can change the DNS, but the change will be not fast enough if I do it manually when the server is down.
    : Any solution requires either changing the DNS, or having a server with 100% uptime.
    Brian Campbell : You will not get a 404. A 404 is an error is generated by the web server. If the server is down, you will get a network error; the exact form of the error depends on your browser.
    Eduardo Molteni : I know I will need two servers. I'm looking for a answer how can I configure them to achieve the result.
  • I'm thinking if the site is load balanced the load balancer itself would detect that the web servers it's trying to redirect clients to are down, therefore it would send the user to a backup server with a message dictating technical problems.

    Other than that.....

    Eduardo Molteni : But the problem with a load balancer is that I can't balance with the "offline" server in normal conditions, because half of the traffic will get the server-offline message.
    Peter D : I'm not 100% sure about this but there definitely has to be a way to have a condition. Here is some code that might make it clear: If ( server1.isDown() && server2.isDown() ) { redirectTo(server3); }
  • Some server needs to dish out the "currently offline" page, so if your server is completely down, some other server will have to serve the file(s). You can set up a cluster of servers (even if just 2), where while the first one is down, the 2nd is configured only to return the "currently offline" page. Once the 1st server is back up, you can take down the 2nd safely (as server 1 will take all the load).

    Eduardo Molteni : Yes, but, how do you configure your servers or DNS to do it?
    JonoW : If you're on Windows/IIS you can look at Network Load Balancing (NLB), where you point your DNS at a single IP which balances requests internally onto 2 (or more) internal IPs. So you can either disable one of the machines manually when you take it down, or perhaps investigate whether this can be automated
  • I have used the uptime services at DNSMadeEasy to great success. In effect, they set the DNS TTL to a very low number (5 minutes). They take care of pinging your server.

    In the event of outage, DNS queries get directed to the secondary IP. An excellent option for a "warm spare" in small shops with limited DNS requirements. I've used them for 3 years with not a single minute of downtime.

    EDIT:

    This allows for geographically redundant failover, which the NLB solution proposed does not address. If the network connection is down, both servers in a standard NLB configuration will be unreachable.

    epochwolf : Not all DNS Servers follow the rules. Your IP could be cached for 24 hours in some ISP's DNS Server.
    Peter J : I tested that exact scenario after deployment, and found that ~98% of traffic was redirected after the TTL expired. The shorter the TTL, the more likely DNS servers will ignore it. 5-10 minutes was the "polite" sweet spot.
  • The only thing I can think of is to control the calling page. Obviously that won't work in all circumstances... but if you know that most of your hits to this server will come from a particular source, then you could add a JavaScript test to the source, and redirect to a "server down" page that is generated on a different server.

    But if you are trying to handle all hits, from all sources (some of which you can't control), then I think you are out of luck. As other folks are saying - when a server is down, the browser gets a 404 error when it attempts a connection.

    ... perhaps there would be a way at a point in between to detect 404 errors being returned by servers and replacing them with a "server is down" web page. You'd need something like an HTML firewall or some other intermediate network gear between the server and the web client.

  • Seriously, DNS is not the right answer to server load-balancing or fail-over. Too many systems (including stub clients and ISP recursive resolvers) will cache records for much longer than the specified TTL.

    If both servers are on the same network, use routing protocols to achieve fail-over by having both servers present the same IP address to the network, but where the fail-over server only takes over if it detects that the (supposedly) live server is offline.

    If the servers are Unix, this is easily done by running Quagga on each server, and then using OSPF as the local routing protocol. I've personally used this for warm standby servers where the redundant system was actually in another data center, albeit one that was connected via a direct link to the main data center.

  • You probably need a second server with 100% uptime, and then add some kind of failover load balancer to it: if the main server is online, it redirects to that, and if it isn't, it redirects to itself, showing a page saying the server is down.

Which initializer(s) to override for UITableViewController subclass

I have a UITableViewController subclass that's instantiated, depending on where it's used, in a NIB or via code. In both cases I want to do customization in the initializer method. Does that mean I need to implement both initWithNibName:bundle: and initWithCoder:, and would each method call its respective super initializer?

While I don't need this right now, what if I also want to be able to instantiate the view controller with initWithStyle:? Would I then need 3 different init methods that replicate the same behavior?

It seems like this violates the whole designated initializer convention, as there would essentially be 3 separate initializers that don't end up calling a common init method. Or is there a way to create a common designated initializer while supporting the 3 different instantiate routes?

From stackoverflow
  • Implement:

    - (void) viewDidLoad
    

    and do your component initialization there.

    It has the advantage of only doing the initialization when the view is actually requested.

    Or just make a separate setup method invoked by all initializers.

    Daniel Dickison : I can't use viewDidLoad because, specifically, I need to set up self.navigationItem which may be used before the view is loaded. I could make a separate setup method. So is it just that NSCoding is fundamentally an exception to the "single designated initializer" rule?
  • My confusion was based on the mistaken belief that each class should have a single designated initializer. This is not true, and in the case of UITableViewController there are 3 designated initializers (as far as I can tell):

    1. initWithStyle: declared locally
    2. initWithNibName:bundle: inherited from UIViewController
    3. initWithCoder: from adopting NSCoding protocol

    You need to override 1 or more of these in your subclass depending on how your subclass gets instantiated. In my case I had to implement #2 and #3 since the class can be loaded from a NIB, or instantiated via code with reference to a NIB. (I imagine it's rare that you'll use both initWithStyle: and initWithNibName:bundle: for a single class.)

    I found Apple's Coding Guidelines for Cocoa helpful.

  • To clarify, initWithStyle:, being UITableViewController's only published initializer in the docs, is its one explicit designated initializer.

    initWithNibName:bundle: is inherited from UIViewController and is the designated initializer for that class. As such, in accordance with Cocoa guidelines, UITableViewController must override this method (by implementing it). However, this does not make it a designated initializer of UITableViewController.

    initWithCoder: is, as you point out, an implicit designated initializer from NSCoding.

  • Internally,

    • UITableViewController's -initWithStyle: calls the super's -init then set the _tableViewStyle ivar.
    • UIViewController's -init simply calls -initWithNibName:bundle: with default arguments.
    • UITableViewController does not override -initWithNibName:bundle:.

    Therefore, if you override -initWithNibName:bundle: then -initWithStyle: will adopt the change too. Of course, to play safe (as you shouldn't rely on implementation details), override both of them.

    (And no need to override -initWithCoder: unless you will un/archive the instances.)

Linq Select Compound From

Hi, I'm experimenting with LINQ compound selects. I find that the following query does not return any elements:

            Dim q = From s In d.GetChildRows("DossierSinistri") _
                    From i In s.GetChildRows("DossierIncarichi") _
                    Select s

while the following does:

            Dim q = From s In d.GetChildRows("DossierSinistri") _
                    Select s

Any help?

From stackoverflow
  • Doesn't each query have to have its own select statement? From what I see, you are not selecting anything in the second from statement. Unless you are trying to use a join, in which case you would need to use a join statement.

    I could be misinterpreting your intention altogether. If you are trying to select a subset of s that is qualified by your i statement, you will need to qualify which i statements are used as a filter with a where statement.

  • I'm not selecting anything from the second query; I'll add that later on. But shouldn't it work this way anyhow?

  • To get the obvious out of the way, have you tried "Select i" instead of "Select s"?

    pistacchio : it doesn't return a thing either

Best Hash function for detecting data changes?

We have a pricing dataset in which the contained values or the number of records change. The number of added or removed records is small compared to the changes in values. The dataset usually has between 50 and 500 items with 8 properties.

We currently use AJAX to return a JSON structure that represents the dataset and update a webpage using this structure with the new values and where necessary removing or adding items.

We make the request with two hash values, one for the values and another for the records. These are MD5 hashes returned with the JSON structure to be sent with a following request. If there is a change to the hashes we know we need a new JSON structure otherwise the hashes are just returned to save bandwidth and eliminate unnecessary client-side processing.

As MD5 is normally used for cryptography, is it the best choice of hashing algorithm for just detecting data changes?

What alternative ways can we detect a change to the values and update as well as detecting added or removed items and manipulating the page DOM accordingly?

From stackoverflow
  • MD5 is a reasonable algorithm to detect changes to a set of data. However, if you're not concerned with the cryptographic properties, and are very concerned with the performance of the algorithm, you could go with a simpler checksum-style algorithm that isn't designed to be cryptographically secure. (though weaknesses in MD5 have been discovered in recent years, it's still designed to be cryptographically secure, and hence does more work than may be required for your scenario).

    However, if you're happy with the computational performance of MD5, I'd just stick with it.

  • I think that any commonly used hash function will do what you want - provide a unique representation of an entity.

    For the problem you are trying to solve, my solution would be to have a backend table that records all changes. Not the changes themselves, but an identifier of the rows that have changed. On a periodic basis callback to the server and get a list of all the objects that have changed, and use this to decide on the client which rows need updating/deleting/adding.

    recursive : This is a common misunderstanding. Hash functions do not "provide a unique representation of an entity". In fact, it is guaranteed not to be the case for any hash function whose domain is larger than its range.
  • What you're doing sounds pretty good to me.

    If server-side capacity is cheap and minimising network usage is crucial, you could have the server remember, for each client, what its last dataset was, and send only the differences (as a list of insertions, deletions and edits) on each request. If you sort your data rows first, these differences can be calculated fairly efficiently using a differencing algorithm such as that used by diff.

    This approach is sensitive to network outages -- if one response is not received by the client, errors will accumulate. However, this can be remedied by having the client send the MD5 hash with each request: if it is different from what the server expects, an entire list will be sent instead of a list of changes.

  • MD5 is just fine. Should it have too low performance, you can try a fast checksum algorithm such as Adler-32.
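Both options are a few lines in Java using only the standard library (the "abc" payload here is a stand-in for the real JSON structure):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.zip.Adler32;

public class ChangeFingerprint {
    public static void main(String[] args) throws Exception {
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8); // stand-in for the JSON payload

        // MD5: cryptographic, 128-bit digest, slower.
        StringBuilder md5 = new StringBuilder();
        for (byte b : MessageDigest.getInstance("MD5").digest(data))
            md5.append(String.format("%02x", b));

        // Adler-32: plain checksum, 32 bits, much cheaper but far more collision-prone.
        Adler32 adler = new Adler32();
        adler.update(data);

        System.out.println(md5);                       // 900150983cd24fb0d6963f7d28e17f72
        System.out.printf("%08x%n", adler.getValue()); // 024d0127
    }
}
```

For change detection, the only property that matters is that a value change is overwhelmingly likely to change the fingerprint; with only 50-500 items per dataset, either choice is effectively instant.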

  • I agree with Jonathan's answer regarding MD5. As for alternative ways to detect changes, if you are willing to store (or already store) on the server the time/date of the most recent change, you could pass that back and forth to the client. You avoid the computation entirely and you might even be able to use most of your existing code.

    --
    bmb

string inserting error c++

WordChosenDuplicate.insert(0,WordChosen.length," _ ");
    cout <<WordChosenDuplicate<< endl;

I get the following error when I try to compile this code:

error C3867: 'std::basic_string<_Elem,_Traits,_Ax>::length': function call missing argument list; use '&std::basic_string<_Elem,_Traits,_Ax>::length' to create a pointer to member c:\documents and settings\main\my documents\uni\2nd year\tp2\hangman\hangman\hangman.cpp 119
From stackoverflow
  • It's a function: std::string::length()

    Please read the Manuals available on the net and the answers posted to your previous questions.

  • You need to use WordChosen.length(). Add the parentheses; this is a method call.

  • for (int f = 0; f <= WordChosen.length()-1;f++)
        {
    
          WordChosenDuplicate.insert(0,(WordChosen.length(),"_ "));
    
        }
    
        cout <<WordChosenDuplicate<< endl;
    

    Thanks guys got the answer.

    dirkgently : Do you know why, if at all, this works? (I take that works == compiles for the moment.)
    : yip, it compiles. It's for a hangman game, so that was just making the blank spaces for the word.
    dirkgently : :sigh: I wish people would treat the ',' as an operator and not neglect its ramifications.

Adjacency tree from single table

I've read a lot of people discussing nested lists, but I was wondering how to iterate through an adjacency list/tree in PHP.

I have a table with: id, title, parent_id

And I've selected all records out into an array called $pages.

Then using this php:

function makeList($pages, $used) {
    if (count($pages)) {
        echo "<ul>";
        foreach ($pages as $page) {
            echo "<li>".$page['pag_title'];
            $par_id = $page['pag_id'];
            $subsql("SELECT * FROM pages WHERE pag_parent = ".$par_id."");

            // running the new sql through an abstraction layer
            $childpages = $dbch->fetchAll();
            makeList($childpages, $used, $lastused);
            echo "</li>";
        }
        echo "</ul>";
    }
}

This sort of works but I end up with any sub menu being repeated e.g.

  • Home
    • News
      • Sub-news
    • Articles
      • Article
  • News
    • Sub-news
  • Articles
    • Article
  • Sub-news
  • Article

I've tried adding the current id into an array that gets passed through the function, and then using in_array to check if it's there, but I have had no joy doing that.

Any help would be much appreciated.

I need to parse the whole tree so choosing parent as 0 isn't an option

From stackoverflow
  • The simplest fix would just be, when you are doing the initial select to set $pages (which you don't show), add a WHERE clause like:

    WHERE pag_parent = 0
    

    (or IS NULL, depending how you're storing "top level" pages).

    That way you won't select all the children initially.

  • Where is $page coming from? You might have an sql injection vulnerability in your code if you're not escaping it or using a prepared statement.

    Also the SELECT statement inside a for loop jumps out as a bad practice. If the table is not that big, then select the contents of the entire table and then iterate through the result set in PHP to build the tree data structure. This could take up to n*(n-1)/2 iterations in the pathological case of your tree being a linked list. Stop when all nodes have been added to the tree, or the number of remaining nodes remains the same from one iteration to the next - this means the remaining nodes are not children of your root node.

    Alternatively, if your database supports recursive SQL queries, you can use that, and it will only select the nodes that are children of your parent node. You will still have to build the tree object yourself in PHP. The form of the query would be something like:

    WITH temptable(id, title, parent_id) AS (
      SELECT id, title, parent_id FROM pages WHERE id = ?
      UNION ALL
      SELECT a.id, a.title, a.parent_id FROM pages a, temptable t
       WHERE t.parent_id = a.id
    ) SELECT * FROM temptable
    

    Substitute the '?' on the second line with the starting page ID.

    Del : $page comes from the $pages array, which itself comes from the sql (the sql selection is done in a separate class, and everything is escaped to avoid sql injection). It's the PHP that I'm interested in, not the SQL; I'm fine with that, thanks
  • Since it already does the SQL, you don't have to do it outside before the first function call.

    function makeList($par_id = 0) {
        //your sql code here
        $subsql("SELECT * FROM pages WHERE pag_parent = $par_id");
        $pages = $dbch->fetchAll();
    
        if (count($pages)) {
            echo '<ul>';
            foreach ($pages as $page) {
                echo '<li>', $page['pag_title'];
                makeList($page['pag_id']);
                echo '</li>';
            }
            echo '</ul>';
        }
    }
    

    For storing it more tree like you might want to look at this site: Storing Hierarchical Data in a Database.

    Del : perfect thank you
  • If you create an array of pages grouped by parent id it is quite easy to recursively build the list. This will only require one database query.

    <?php
    
     //example data
     $items = array(
        array('id'=>1, 'title'=>'Home', 'parent_id'=>0),
        array('id'=>2, 'title'=>'News', 'parent_id'=>1),
        array('id'=>3, 'title'=>'Sub News', 'parent_id'=>2),
        array('id'=>4, 'title'=>'Articles', 'parent_id'=>0),
        array('id'=>5, 'title'=>'Article', 'parent_id'=>4),
        array('id'=>6, 'title'=>'Article2', 'parent_id'=>4)
     );
    
     //create new list grouped by parent id
     $itemsByParent = array();
     foreach ($items as $item) {
        if (!isset($itemsByParent[$item['parent_id']])) {
            $itemsByParent[$item['parent_id']] = array();
        }
    
        $itemsByParent[$item['parent_id']][] = $item;
     }
    
     //print list recursively 
     function printList($items, $parentId = 0) {
        echo '<ul>';
        foreach ($items[$parentId] as $item) {
            echo '<li>';
            echo $item['title'];
            $curId = $item['id'];
            //if there are children
            if (!empty($items[$curId])) {
                printList($items, $curId);
            }           
            echo '</li>';
        }
        echo '</ul>';
     }
    
    printList($itemsByParent);
    
  • When that table gets large, recursion can get unwieldy. I wrote a blog post about a recursion-less method: http://www.alandelevie.com/2008/07/12/recursion-less-storage-of-hierarchical-data-in-a-relational-database/

Java Network Programming. Question about sockets

I have a server with 2 clients connecting to it through TCP. These clients continuously send information to the server. This is a proxy server and relays the messages it receives from the clients. But it has to relay the messages alternately, i.e. a message from client A, then a message from client B, and again A then B and so on. This I can achieve by checking where the message is coming from, then relaying messages alternately and ignoring consecutive messages from the same client.

However, I also do not want the server to get bogged down if either of the clients disconnects or stops sending messages. If this happens, the proxy will wait forever for some message from the client which is now disconnected (or for some reason not sending messages). In this case I would want the server to relay the messages from the sole connected client.

Instead I am wondering if something like this is possible. If I get 2 consecutive messages from the same client, I would like to check if the other client is ready to send me a message. My question is whether it is possible to check from the other client's socket if there is a message buffered and ready to be sent. In that case I can ignore the consecutive message from the same client and instead first send the message from the other client (that I have checked).

Is this possible? Hope I have asked the question clearly.

Thanks

From stackoverflow
  • You could set the read timeout on the sockets to be something short (like a second maybe or whatever time you want to wait for each client) with the setSoTimeout() method; that way, when you get the second message from client A, you can just read() from client B's socket and if you get a SocketTimeoutException then you can process A's message.

    But you will have to modify your current code to catch any SocketTimeoutExceptions you may get whenever you're reading from the sockets.

  • For reasons I'll explain below, I would just treat the interesting event as being whether or not the message-- or part of a message-- has arrived to your server. So:

    • have a thread reading data from each client, as you've probably got
    • have another thread that does the message collation
    • have the client threads signal to the message collation thread when: the very first part of a new message arrives; when a message is complete from a client (at that point, it passes the actual message to the collation thread)
    • then, if your collation thread hasn't had the initial signal that the "first part of a message has arrived", just treat that as "the client isn't ready to send a message".

    You probably want to set a timeout on the sockets as another poster has suggested, although with this threading model, I don't think it's crucial-- it really depends on what policy you want to impose.

    Your other suggestion of asking the client is sort of possible, but it's probably not the solution you really want. You open another channel to the client to ask "have you just sent some data?". But if the client has, and you haven't got it, then in general Something Has Just Gone Wrong and it's not clear whether that knowledge will help you much. Or it might say "no" and then just at that moment send data. And will the other channel to the client work? If the client says it's sent something, how do you know whether it's just about to arrive or has "got lost" and will never arrive...?

  • The problem is this: To test if a client is dropped you must do a read on the socket, however, Java's Socket class does not allow for asynchronous reading on a socket. Therefore if you do a read without a timeout set, and there is nothing being sent, the application will hold the process hostage waiting for something to be read.

    So, after you instantiate your socket you need to use Socket.setSoTimeout(int) to give it a period of time to "wait" before it times out. What is going to happen is that you will time out every time you attempt to read, and if the read comes back -1 you know the client has disconnected.

        Socket clientSocket=theServerSocket.accept();
        clientSocket.setSoTimeout(1);
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
    
    while(serverIsRunning){
           try{
               if(bufferedReader.read()==-1){
                      logger.info("The server has disconnected!");
                      //Do whatever clean up you need, ect
               }
           }
           catch (SocketTimeoutException e){
               //the server is still running, yay?!
           }
    }
    
  • The way I am thinking of doing this is: if I get 2 consecutive messages from client A, I will get the InputStream of the socket connected to client B. Then I call the available() method on the InputStream and see if there are any bytes available to read. If there are bytes to read, then I read from the socket of client B. Else I relay the message from client A (the second consecutive message).

    Any comments on this.

    windfinder : So, both client A and client B should be sending the same message at the same time? You are just trying to make one a backup server?
  • If client A sends a second message before client B sends a message, do you print out the second message from client A? In that case, it's not really alternating at all, you're just printing whichever messages arrive in the order they arrive. Or, do you want to wait a bit to see if a message arrives from client B?

    I think either way, you'll want a separate thread for each client, running in addition to the "main" thread. Each client thread constantly tries to read a message from its respective client, which blocks until it receives a message. When a socket thread receives a message, it puts it onto a SynchronousQueue, of which there is one instance for all three threads. This put operation will block until the main thread calls take on the SynchronousQueue.

    The main thread can simply loop and call take on the SynchronousQueue. This will block until one of the client threads calls put.

    If you want to do alternating messages, you could have two SynchronousQueues instead, one for each client. You'd want to introduce some delay in your main loop, to give both clients a chance to write their messages before forwarding it on.
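A runnable sketch of this scheme, collapsed to one thread so the alternation logic is visible (LinkedBlockingQueue stands in for the per-client queues the socket-reader threads would feed; a short poll timeout plays the role of "wait a bit, then take whatever the other client has"):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class AlternatingRelay {
    // Relay n messages, alternating between the two queues but falling
    // through to the other queue if the current one stays silent.
    static String relay(BlockingQueue<String> a, BlockingQueue<String> b, int n)
            throws InterruptedException {
        List<BlockingQueue<String>> queues = Arrays.asList(a, b);
        StringBuilder out = new StringBuilder();
        int turn = 0;
        for (int i = 0; i < n; i++) {
            // Give the client whose turn it is a short window...
            String msg = queues.get(turn).poll(100, TimeUnit.MILLISECONDS);
            // ...then block on the other client instead of waiting forever.
            if (msg == null) msg = queues.get(1 - turn).take();
            out.append(msg).append(' ');
            turn = 1 - turn;
        }
        return out.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> fromA = new LinkedBlockingQueue<>();
        BlockingQueue<String> fromB = new LinkedBlockingQueue<>();
        // In the real server the two socket-reader threads would feed these.
        fromA.put("A1"); fromA.put("A2");
        fromB.put("B1");
        System.out.println(relay(fromA, fromB, 3)); // A1 B1 A2
    }
}
```

With a SynchronousQueue as the answer suggests, the structure is the same, except that each put blocks until the collating thread takes it, so the readers cannot run ahead of the relay.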

  • I am reading the problem as thus:

    You have:

    1 server

    2 clients

    Your server gets messages from client 1 and 2 and forwards them.

    The clients are different producers, meaning the messages they are sending could potentially be different. What you want is for the messages from the clients to be sent out alternately from your server, but not to "wait" if a client has dropped.

    In this scenario, I suggest that you have two queues (client1queue and client2queue) in your server.

    You will have to read from the sockets in two separate threads, and when a message comes in, add it to its corresponding queue: client1Socket -> client1queue, client2Socket -> client2queue.

    Then, in a third thread, have the server forward the messages, alternating pulling these messages from client1queue and client2queue.

    To solve your problem of "not waiting" if a queue is empty, simply skip that queue's "turn". This ensures sending all messages at the fastest possible rate while still getting all the messages across. The downside is that it only alternates if a message is ready to be sent. You could, of course, have it wait for X amount of time to see if a message comes for the other queue, but I don't see why you would want to if the system is supposed to work regardless of the client state.

  • I would look up the javadoc for the available() method. It doesn't do what you think it does.

  • You have 2 separate problems:

    1) Read messages from several clients, without blocking.

    2) Writing messages to the output in a fair way.

    The first problem can be solved if you use a separate thread for each client socket. If you have many clients, you might consider using nonblocking IO with SocketChannel and a Selector.

    Remember that TCP/IP streams might deliver several or only partial messages in each read operation, depending on the network and socket options, so you have to take care of that.

    The second problem can be solved by keeping a queue for each client, or perhaps using a priority queue and attaching a priority to each message depending on the client's activity.
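A minimal loopback sketch of the SocketChannel/Selector approach (the single connecting client stands in for clients A and B; names are illustrative):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class SelectorSketch {
    static String demo() throws IOException {
        // One selector watches any number of channels: no thread per client.
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        // A client connecting and writing, standing in for client A or B.
        SocketChannel client = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", server.socket().getLocalPort()));
        client.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));

        String received = "";
        while (received.isEmpty()) {
            selector.select(); // blocks until some channel is ready
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    ((SocketChannel) key.channel()).read(buf);
                    received = new String(buf.array(), 0, buf.position(),
                            StandardCharsets.UTF_8);
                }
            }
            selector.selectedKeys().clear();
        }
        client.close();
        server.close();
        selector.close();
        return received;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

As noted above, a real message protocol would have to reassemble partial reads; this sketch assumes the 5-byte payload arrives in one read, which holds on loopback but not in general.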

Best way to store a sparse matrix in .NET

We have an application that stores a sparse matrix. This matrix has entries that mostly exist around the main diagonal. I was wondering if there are any algorithms (or existing libraries) that can efficiently handle sparse matrices of this kind? Preferably, this would be a generic implementation where each matrix entry can be a user-defined type.

Edit in response to a question/response:

When I say mostly around the main diagonal I mean that for most of the matrices most entries are clustered around the main diagonal, but there could be zeroes close to the diagonal and there could be non-zero values far out from the diagonal. I want something efficient for 'most' cases here.

What will I be using this for? I need to be able to have efficient access to all values in a row or all values in a column. The values stored would be Boolean values. An example would be:

  1. For all true values in a row, foreach column a true appears in set all the entries of the column to something
  2. For all false values in a row, set the entry to something

This was all done with linked lists previously but was very confusing to implement. I was hoping that with a sparse matrix I could improve the algorithm but finding the 'right' type of sparse matrix algorithm has proved difficult.

p.s. Thanks for the responses thus far

From stackoverflow
  • I haven't used it, but Nmath Matrix handles these (not free).

    Also, Extreme Optimization Numerical Libraries for .NET (not free).

    Here's a free one: Math.NET Project (specifically MathNet.Numerics.LinearAlgebra.Sparse namespace)

  • I guess a Dictionary<int, Dictionary<int, object>> should suffice.

  • I think this could be done by using a class holding a plain array, storing the horizontal offset applied between matrix rows, and defining the stripe of a row, i.e. the number of valid entries. So for a large matrix where only the diagonal and its two neighbor elements are defined, you'd create an array of 3 * number of rows and store 3 as the stripe width. The offset depends on the size of the matrix.

    I'm not aware of anything free which already does this.

    Stefan Kendall : Good idea. I might implement it as such: Assuming only positive input, we could handle negative numbers as the number of 0 entries between entries. So the following... [1,2,-30,0,1,2,-29] Expands into [1,2,0,0...] [0,1,2,0...] To offset, array[m*row+column] is (row,column) of an mxn matrix
  • Here's a list of general data structure schemas. Each has its advantages and disadvantages, and are suitable for slightly different kinds of problems where sparse matrices arise. You'd probably want to implement them on top of existing data structures, such as List<> and Dictionary<>.

  • There are two questions here:

    • "Mostly around the main diagonal" is too vague. If the elements lie in bands, then use banded storage of the bands themselves, as vectors offset from the main diagonal. If the elements are scattered randomly in the vicinity of the main diagonal, then either use a banded form that may include some zeros in the bands, or use a pure sparse form that stores only the elements and their positions in the array.

    • What will you do with the matrix? If your goal is merely efficient storage, then a banded form will be efficient, with fast access to any element. If you will do linear algebra with the matrix, but never more than matrix*vector multiplies, then the banded form will still work splendidly. If you work with matrix*matrix multiplies or matrix factorizations, where fill-in becomes a problem, then a pure sparse form may be more appropriate. For example, the product of two banded matrices will have additional bands, so the product of two tridiagonal matrices will be pentadiagonal. For a factorization, reorderings will sometimes be useful to minimize fill-in. (AMD is one choice, Approximate Minimum Degree permutation, but there are other schemes.)
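A sketch of the banded form described above (names and the double element type are illustrative; the question's Boolean entries would work the same way): an n×n matrix with bandwidth k keeps only the 2k+1 diagonals, so storage is O(nk) and any element is addressed in O(1).

```java
// Banded storage sketch: only elements with |row - col| <= k are kept.
public class BandedMatrix {
    private final double[][] bands; // bands[row][col - row + k]
    private final int n, k;

    public BandedMatrix(int n, int k) {
        this.n = n;
        this.k = k;
        this.bands = new double[n][2 * k + 1];
    }

    public double get(int row, int col) {
        // Anything outside the band is an implicit zero.
        if (Math.abs(row - col) > k) return 0.0;
        return bands[row][col - row + k];
    }

    public void set(int row, int col, double v) {
        if (Math.abs(row - col) > k)
            throw new IllegalArgumentException("outside the band");
        bands[row][col - row + k] = v;
    }
}
```

This only pays off when the scattered far-off-diagonal values the question mentions are rare enough to handle separately (e.g. in a small dictionary alongside the bands).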

  • You could use an index based on the [row, col] of the cell. Since the data is on a diagonal, the typical approach of storing the row index and the associated column indices with data is not optimal. Here is some code you could use to do it:

        public class SparseMatrix<T>
        {
            public int Width { get; private set; }
            public int Height { get; private set; }
            public long Size { get; private set; }
    
            private Dictionary<long, T> _cells = new Dictionary<long, T>();
    
            public SparseMatrix(int w, int h)
            {
                this.Width = w;
                this.Height = h;
                this.Size = (long)w * h; // cast so the multiply happens in 64-bit
            }
    
            public bool IsCellEmpty(int row, int col)
            {
                long index = (long)row * Width + col;
                return !_cells.ContainsKey(index); // empty means the key is absent
            }
    
            public T this[int row, int col]
            {
                get
                {
                    long index = (long)row * Width + col;
                    T result;
                    _cells.TryGetValue(index, out result);
                    return result;
                }
                set
                {
                    long index = (long)row * Width + col;
                    _cells[index] = value;
                }
            }
        }
    
        static void Main()
        {
            var sm = new SparseMatrix<int>(512, 512);
            sm[42, 42] = 42;
            int val1 = sm[13, 13];
            int val2 = sm[42, 42];
    
            Console.WriteLine("VAL1 = " + val1); // prints out 0
            Console.WriteLine("VAL2 = " + val2); // prints out 42
    
            Console.ReadLine();
        }
    

    Note that when T is a struct, you might have to call IsCellEmpty, since getting the contents of a missing cell will not be null but the default value for that type. You can also expand the code to give you a quick "SparseRatio" based on the Size property and _cells.Count.

    EDIT:

    Well, if you are interested in speed, you can do the trade-off of space vs. speed. Instead of having only one dictionary, have three! It triples your space, but it makes enumerating any way you want real easy. Here is some new code that shows that:

        public class SparseMatrix<T>
        {
            public int Width { get; private set; }
            public int Height { get; private set; }
            public long MaxSize { get; private set; }
            public long Count { get { return _cells.Count; } }
    
            private Dictionary<long, T> _cells = new Dictionary<long, T>();
    
            private Dictionary<int, Dictionary<int, T>> _rows = 
                new Dictionary<int, Dictionary<int, T>>();
    
            private Dictionary<int, Dictionary<int, T>> _columns = 
                new Dictionary<int, Dictionary<int, T>>();
    
            public SparseMatrix(int w, int h)
            {
                this.Width = w;
                this.Height = h;
                this.MaxSize = (long)w * h; // cast so the multiply happens in 64-bit
            }
    
            public bool IsCellEmpty(int row, int col)
            {
                long index = (long)row * Width + col;
                return !_cells.ContainsKey(index); // empty means the key is absent
            }
    
            public T this[int row, int col]
            {
                get
                {
                    long index = (long)row * Width + col;
                    T result;
                    _cells.TryGetValue(index, out result);
                    return result;
                }
                set
                {
                    long index = (long)row * Width + col;
                    _cells[index] = value;
    
                    UpdateValue(col, row, _columns, value);
                    UpdateValue(row, col, _rows, value);
                }
            }
    
            private void UpdateValue(int index1, int index2, 
                Dictionary<int, Dictionary<int, T>> parent, T value)
            {
                Dictionary<int, T> dict;
                if (!parent.TryGetValue(index1, out dict))
                {
                    parent[index1] = dict = new Dictionary<int, T>(); // key on index1, not index2
                }
                dict[index2] = value;
            }
        }
    

    If you want to iterate over all the entries, use _cells. If you want all the rows for a given column use _columns. If you want all the columns in a given row use _rows.

    If you want to iterate in sorted order, you can start to add LINQ into the mix and/or use a sorted list with an inner class that encapsulates an entry (which would have to store the row or column and implement IComparable<T> for sorting to work).

    Jeffrey Cameron : Thank you, I like where you are going with this. Using dictionaries doesn't give me efficient access to entire rows or columns does it? (maybe using Linq it does ... ?). See my edit above.
    Erich Mirabal : See the update for another option. If space is not an issue, do the trade-off to get faster access by having multiple dictionaries.
    Jeffrey Cameron : Excellent suggestions, thank you very much