summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2010-09-15Merge remote branch 'bkl-config/config'Stephen Rothwell
2010-09-15Merge remote branch 'bkl-llseek/llseek'Stephen Rothwell
Conflicts: drivers/char/virtio_console.c
2010-09-15Merge remote branch 'rcu/rcu/next'Stephen Rothwell
2010-09-15Merge remote branch 'trivial/for-next'Stephen Rothwell
2010-09-15Merge remote branch 'wireless/master'Stephen Rothwell
2010-09-15Merge remote branch 'net/master'Stephen Rothwell
2010-09-15Merge remote branch 'v9fs/for-next'Stephen Rothwell
2010-09-15Merge remote branch 'nfsd/nfsd-next'Stephen Rothwell
2010-09-15Merge remote branch 'net-current/master'Stephen Rothwell
2010-09-14Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Fix the NFSv4 and RPCSEC_GSS Kconfig dependencies statfs() gives ESTALE error NFS: Fix a typo in nfs_sockaddr_match_ipaddr6 sunrpc: increase MAX_HASHTABLE_BITS to 14 gss:spkm3 miss returning error to caller when import security context gss:krb5 miss returning error to caller when import security context Remove incorrect do_vfs_lock message SUNRPC: cleanup state-machine ordering SUNRPC: Fix a race in rpc_info_open SUNRPC: Fix race corrupting rpc upcall Fix null dereference in call_allocate
2010-09-14net: use rcu_barrier() in rollback_registered_manyEric Dumazet
netdev_wait_allrefs() waits that all references to a device vanishes. It currently uses a _very_ pessimistic 250 ms delay between each probe. Some users reported that no more than 4 devices can be dismantled per second, this is a pretty serious problem for some setups. Most of the time, a refcount is about to be released by an RCU callback, that is still in flight because rollback_registered_many() uses a synchronize_rcu() call instead of rcu_barrier(). Problem is visible if number of online cpus is one, because synchronize_rcu() is then a no op. time to remove 50 ipip tunnels on a UP machine : before patch : real 11.910s after patch : real 1.250s Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: Octavian Purdila <opurdila@ixiacom.com> Reported-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-14mac80211: match only assigned bss in sta_info_get_bssJohannes Berg
sta_info_get_bss() is used to match STA pointers for VLAN/AP interfaces, but if the same station is also added to multiple other interfaces it will erroneously match because both pointers are NULL, fix this by ignoring NULL pointers here. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-09-14mac80211: wait for scan work complete before restarting hwStanislaw Gruszka
This is needed to avoid warning in ieee80211_restart_hw about hardware scan in progress. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Wey-Yi W Guy <wey-yi.w.guy@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-09-14mac80211: Fix dangling pointer in ieee80211_xmitSteve deRosier
hdr pointer is left dangling after call to ieee80211_skb_resize. This can cause guards around mesh path selection to fail. Signed-off-by: Steve deRosier <steve@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-09-14nl80211: Uninitialized variableBill Jordan
There is a path in nl80211_set_wiphy where result is tested but uninitialized. I am hitting this path when I attempt: sh# iw dev wlan0 set channel 10 command failed: Unknown error 1069727332 (-1069727332) Signed-off-by: William Jordan <bjordan@rajant.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-09-14net/wireless: use ARRAY_SIZE macro in radiotap.cNikitas Angelinas
Replace sizeof(rtap_namespace_sizes) / sizeof(rtap_namespace_sizes[0]) with ARRAY_SIZE(rtap_namespace_sizes) in net/wireless/radiotap.c Signed-off-by: Nikitas Angelinas <nikitasangelinas@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-09-14BKL: introduce CONFIG_BKL.Arnd Bergmann
With all the patches we have queued in the BKL removal tree, only a few dozen modules are left that actually rely on the BKL, and even there are lots of low-hanging fruit. We need to decide what to do about them, this patch illustrates one of the options: Every user of the BKL is marked as 'depends on BKL' in Kconfig, and the CONFIG_BKL becomes a user-visible option. If it gets disabled, no BKL using module can be built any more and the BKL code itself is compiled out. The one exception is file locking, which is practically always enabled and does a 'select BKL' instead. This effectively forces CONFIG_BKL to be enabled until we have solved the fs/lockd mess and can apply the patch that removes the BKL from fs/locks.c. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2010-09-13flow: better memory managementEric Dumazet
Allocate hash tables for every online cpus, not every possible ones. NUMA aware allocations. Dont use a full page on arches where PAGE_SIZE > 1024*sizeof(void *) misc: __percpu , __read_mostly, __cpuinit annotations flow_compare_t is just an "unsigned long" Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-13ipv4: enable getsockopt() for IP_NODEFRAGMichael Kerrisk
While integrating your man-pages patch for IP_NODEFRAG, I noticed that this option is settable by setsockopt(), but not gettable by getsockopt(). I suppose this is not intended. The (untested, trivial) patch below adds getsockopt() support. Signed-off-by: Michael kerrisk <mtk.manpages@gmail.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-13ipv4: force_igmp_version ignored when a IGMPv3 query receivedBob Arendt
After all these years, it turns out that the /proc/sys/net/ipv4/conf/*/force_igmp_version parameter isn't fully implemented. *Symptom*: When set force_igmp_version to a value of 2, the kernel should only perform multicast IGMPv2 operations (IETF rfc2236). An host-initiated Join message will be sent as a IGMPv2 Join message. But if a IGMPv3 query message is received, the host responds with a IGMPv3 join message. Per rfc3376 and rfc2236, a IGMPv2 host should treat a IGMPv3 query as a IGMPv2 query and respond with an IGMPv2 Join message. *Consequences*: This is an issue when a IGMPv3 capable switch is the querier and will only issue IGMPv3 queries (which double as IGMPv2 querys) and there's an intermediate switch that is only IGMPv2 capable. The intermediate switch processes the initial v2 Join, but fails to recognize the IGMPv3 Join responses to the Query, resulting in a dropped connection when the intermediate v2-only switch times it out. *Identifying issue in the kernel source*: The issue is in this section of code (in net/ipv4/igmp.c), which is called when an IGMP query is received (from mainline 2.6.36-rc3 gitweb): ... A IGMPv3 query has a length >= 12 and no sources. This routine will exit after line 880, setting the general query timer (random timeout between 0 and query response time). This calls igmp_gq_timer_expire(): ... .. which only sends a v3 response. So if a v3 query is received, the kernel always sends a v3 response. IGMP queries happen once every 60 sec (per vlan), so the traffic is low. A IGMPv3 query *is* a strict superset of a IGMPv2 query, so this patch properly short circuit's the v3 behaviour. One issue is that this does not address force_igmp_version=1. Then again, I've never seen any IGMPv1 multicast equipment in the wild. However there is a lot of v2-only equipment. If it's necessary to support the IGMPv1 case as well: 837 if (len == 8 || IGMP_V2_SEEN(in_dev) || IGMP_V1_SEEN(in_dev)) { Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-13net/llc: make opt unsigned in llc_ui_setsockopt()Dan Carpenter
The members of struct llc_sock are unsigned so if we pass a negative value for "opt" it can cause a sign bug. Also it can cause an integer overflow when we multiply "opt * HZ". CC: stable@kernel.org Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-13fs/9p, net/9p: memory leak fixesLatchesar Ionkov
Four memory leak fixes in the 9P code. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-09-12sunrpc: increase MAX_HASHTABLE_BITS to 14Miquel van Smoorenburg
The maximum size of the authcache is now set to 1024 (10 bits), but on our server we need at least 4096 (12 bits). Increase MAX_HASHTABLE_BITS to 14. This is a maximum of 16384 entries, each containing a pointer (8 bytes on x86_64). This is exactly the limit of kmalloc() (128K). Signed-off-by: Miquel van Smoorenburg <mikevs@xs4all.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-12gss:spkm3 miss returning error to caller when import security contextBian Naimeng
spkm3 miss returning error to up layer when import security context, it may be return ok though it has failed to import security context. Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-12gss:krb5 miss returning error to caller when import security contextBian Naimeng
krb5 miss returning error to up layer when import security context, it may be return ok though it has failed to import security context. Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-12SUNRPC: cleanup state-machine orderingJ. Bruce Fields
This is just a minor cleanup: net/sunrpc/clnt.c clarifies the rpc client state machine by commenting each state and by laying out the functions implementing each state in the order that each state is normally executed (in the absence of errors). The previous patch "Fix null dereference in call_allocate" changed the order of the states. Move the functions and update the comments to reflect the change. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-12SUNRPC: Fix a race in rpc_info_openTrond Myklebust
There is a race between rpc_info_open and rpc_release_client() in that nothing stops a process from opening the file after the clnt->cl_kref goes to zero. Fix this by using atomic_inc_unless_zero()... Reported-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2010-09-12SUNRPC: Fix race corrupting rpc upcallTrond Myklebust
If rpc_queue_upcall() adds a new upcall to the rpci->pipe list just after rpc_pipe_release calls rpc_purge_list(), but before it calls gss_pipe_release (as rpci->ops->release_pipe(inode)), then the latter will free a message without deleting it from the rpci->pipe list. We will be left with a freed object on the rpc->pipe list. Most frequent symptoms are kernel crashes in rpc.gssd system calls on the pipe in question. Reported-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2010-09-12Fix null dereference in call_allocateJ. Bruce Fields
In call_allocate we need to reach the auth in order to factor au_cslack into the allocation. As of a17c2153d2e271b0cbacae9bed83b0eaa41db7e1 "SUNRPC: Move the bound cred to struct rpc_rqst", call_allocate attempts to do this by dereferencing tk_client->cl_auth, however this is not guaranteed to be defined--cl_auth can be zero in the case of gss context destruction (see rpc_free_auth). Reorder the client state machine to bind credentials before allocating, so that we can instead reach the auth through the cred. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2010-09-12sch_atm: Fix potential NULL deref.David S. Miller
The list_head conversion unearther an unnecessary flow check. Since flow is always NULL here we don't need to see if a matching flow exists already. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-11llseek: automatically add .llseek fopArnd Bergmann
All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
2010-09-11mac80211: disallow seeks in minstrel debug codeArnd Bergmann
No need for seek here, so let's just use nonseekable_open. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2010-09-11irda/irnet: use noop_llseekArnd Bergmann
There may be applications trying to seek on the irnet character device, so we should use noop_llseek to avoid returning an error when the default llseek changes to no_llseek. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: netdev@vger.kernel.org
2010-09-11net/wireless: use generic_file_llseek in debugfsArnd Bergmann
The default llseek operation is changing from default_llseek to no_llseek, so all code relying on the current behaviour needs to make that explicit. The wireless driver infrastructure and some of the drivers make use of generated debugfs files, so they cannot be converted by our script that automatically determines the right operation. All these files use debugfs and they typically rely on simple_read_from_buffer, so the best llseek operation here is generic_file_llseek. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org
2010-09-10pkt_sched: remov unnecessary bh_disablestephen hemminger
Now that est_tree_lock is acquired with BH protection, the other call is unnecessary. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-10fib: cleanupsEric Dumazet
Use rcu_dereference_rtnl() helper Change hard coded constants in fib_flag_trans() 7 -> RTN_UNREACHABLE 8 -> RTN_PROHIBIT Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-09Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/main.c
2010-09-09tipc: Optimize handling excess content on incoming messagesPaul Gortmaker
Remove code that trimmed excess trailing info from incoming messages arriving over an Ethernet interface. TIPC now ignores the extra info while the message is being processed by the node, and only trims it off if the message is retransmitted to another node. (This latter step is done to ensure the extra info doesn't cause the sk_buff to exceed the outgoing interface's MTU limit.) The outgoing buffer is guaranteed to be linear. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-09tunnels: missing rcu_assign_pointer()Eric Dumazet
xfrm4_tunnel_register() & xfrm6_tunnel_register() should use rcu_assign_pointer() to make sure previous writes (to handler->next) are committed to memory before chain insertion. deregister functions dont need a particular barrier. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-09net/core: add lock context change annotations in net/core/sock.cNamhyung Kim
__lock_sock() and __release_sock() releases and regrabs lock but were missing proper annotations. Add it. This removes following warning from sparse. (Currently __lock_sock() does not emit any warning about it but I think it is better to add also.) net/core/sock.c:1580:17: warning: context imbalance in '__release_sock' - unexpected unlock Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-09net/core: remove address space warnings on verify_iovec()Namhyung Kim
move_addr_to_kernel() and copy_from_user() requires their argument as __user pointer but were missing proper markups. Add it. This removes following warnings from sparse. net/core/iovec.c:44:52: warning: incorrect type in argument 1 (different address spaces) net/core/iovec.c:44:52: expected void [noderef] <asn:1>*uaddr net/core/iovec.c:44:52: got void *msg_name net/core/iovec.c:55:34: warning: incorrect type in argument 2 (different address spaces) net/core/iovec.c:55:34: expected void const [noderef] <asn:1>*from net/core/iovec.c:55:34: got struct iovec *msg_iov Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-09sctp: fix test for end of loopJoe Perches
Add a list_has_sctp_addr function to simplify loop Based on a patches by Dan Carpenter and David Miller Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-09Merge branch 'for-davem' of git://oss.oracle.com/git/agrover/linux-2.6David S. Miller
2010-09-08Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
2010-09-08udp: add rehash on connect()Eric Dumazet
commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation) added a secondary hash on UDP, hashed on (local addr, local port). Problem is that following sequence : fd = socket(...) connect(fd, &remote, ...) not only selects remote end point (address and port), but also sets local address, while UDP stack stored in secondary hash table the socket while its local address was INADDR_ANY (or ipv6 equivalent) Sequence is : - autobind() : choose a random local port, insert socket in hash tables [while local address is INADDR_ANY] - connect() : set remote address and port, change local address to IP given by a route lookup. When an incoming UDP frame comes, if more than 10 sockets are found in primary hash table, we switch to secondary table, and fail to find socket because its local address changed. One solution to this problem is to rehash datagram socket if needed. We add a new rehash(struct socket *) method in "struct proto", and implement this method for UDP v4 & v6, using a common helper. This rehashing only takes care of secondary hash table, since primary hash (based on local port only) is not changed. Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08net: inet_add_protocol() can use cmpxchg()Eric Dumazet
Use cmpxchg() to get rid of spinlocks in inet_add_protocol() and friends. inet_protos[] & inet6_protos[] are moved to read_mostly section Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08RDS: Implement masked atomic operationsAndy Grover
Add two CMSGs for masked versions of cswp and fadd. args struct modified to use a union for different atomic op type's arguments. Change IB to do masked atomic ops. Atomic op type in rds_message similarly unionized. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS/IB: print string constants in more placesZach Brown
This prints the constant identifier for work completion status and rdma cm event types, like we already do for IB event types. A core string array helper is added that each string type uses. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS: cancel connection work structs as we shut downZach Brown
Nothing was canceling the send and receive work that might have been queued as a conn was being destroyed. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS: don't call rds_conn_shutdown() from rds_conn_destroy()Zach Brown
rds_conn_shutdown() can return before the connection is shut down when it encounters an existing state that it doesn't understand. This lets rds_conn_destroy() then start tearing down the conn from under paths that are still using it. It's more reliable the shutdown work and wait for krdsd to complete the shutdown callback. This stopped some hangs I was seeing where krdsd was trying to shut down a freed conn. Signed-off-by: Zach Brown <zach.brown@oracle.com>