``sets`` --- Unordered collections of unique elements
*****************************************************
New in version 2.3.
Deprecated since version 2.6: The built-in ``set``/``frozenset`` types
replace this module.
The ``sets`` module provides classes for constructing and manipulating
unordered collections of unique elements. Common uses include
membership testing, removing duplicates from a sequence, and computing
standard math operations on sets such as intersection, union,
difference, and symmetric difference.
Like other collections, sets support ``x in set``, ``len(set)``, and
``for x in set``. Being an unordered collection, sets do not record
element position or order of insertion. Accordingly, sets do not
support indexing, slicing, or other sequence-like behavior.
Most set applications use the ``Set`` class which provides every set
method except for ``__hash__()``. For advanced applications requiring
a hash method, the ``ImmutableSet`` class adds a ``__hash__()`` method
but omits methods which alter the contents of the set. Both ``Set``
and ``ImmutableSet`` derive from ``BaseSet``, an abstract class useful
for determining whether something is a set: ``isinstance(obj,
BaseSet)``.
The set classes are implemented using dictionaries. Accordingly, the
requirements for set elements are the same as those for dictionary
keys; namely, that the element defines both ``__eq__()`` and
``__hash__()``. As a result, sets cannot contain mutable elements such
as lists or dictionaries. However, they can contain immutable
collections such as tuples or instances of ``ImmutableSet``. For
convenience in implementing sets of sets, inner sets are automatically
converted to immutable form, for example, ``Set([Set(['dog'])])`` is
transformed to ``Set([ImmutableSet(['dog'])])``.
class class sets.Set([iterable])
Constructs a new empty ``Set`` object. If the optional *iterable*
parameter is supplied, updates the set with elements obtained from
iteration. All of the elements in *iterable* should be immutable or
be transformable to an immutable using the protocol described in
section *Protocol for automatic conversion to immutable*.
class class sets.ImmutableSet([iterable])
Constructs a new empty ``ImmutableSet`` object. If the optional
*iterable* parameter is supplied, updates the set with elements
obtained from iteration. All of the elements in *iterable* should
be immutable or be transformable to an immutable using the protocol
described in section *Protocol for automatic conversion to
immutable*.
Because ``ImmutableSet`` objects provide a ``__hash__()`` method,
they can be used as set elements or as dictionary keys.
``ImmutableSet`` objects do not have methods for adding or removing
elements, so all of the elements must be known when the constructor
is called.
Set Objects
===========
Instances of ``Set`` and ``ImmutableSet`` both provide the following
operations:
+---------------------------------+--------------+-----------------------------------+
| Operation | Equivalent | Result |
+=================================+==============+===================================+
| ``len(s)`` | | cardinality of set *s* |
+---------------------------------+--------------+-----------------------------------+
| ``x in s`` | | test *x* for membership in *s* |
+---------------------------------+--------------+-----------------------------------+
| ``x not in s`` | | test *x* for non-membership in |
| | | *s* |
+---------------------------------+--------------+-----------------------------------+
| ``s.issubset(t)`` | ``s <= t`` | test whether every element in *s* |
| | | is in *t* |
+---------------------------------+--------------+-----------------------------------+
| ``s.issuperset(t)`` | ``s >= t`` | test whether every element in *t* |
| | | is in *s* |
+---------------------------------+--------------+-----------------------------------+
| ``s.union(t)`` | ``s | t`` | new set with elements from both |
| | | *s* and *t* |
+---------------------------------+--------------+-----------------------------------+
| ``s.intersection(t)`` | ``s & t`` | new set with elements common to |
| | | *s* and *t* |
+---------------------------------+--------------+-----------------------------------+
| ``s.difference(t)`` | ``s - t`` | new set with elements in *s* but |
| | | not in *t* |
+---------------------------------+--------------+-----------------------------------+
| ``s.symmetric_difference(t)`` | ``s ^ t`` | new set with elements in either |
| | | *s* or *t* but not both |
+---------------------------------+--------------+-----------------------------------+
| ``s.copy()`` | | new set with a shallow copy of |
| | | *s* |
+---------------------------------+--------------+-----------------------------------+
Note, the non-operator versions of ``union()``, ``intersection()``,
``difference()``, and ``symmetric_difference()`` will accept any
iterable as an argument. In contrast, their operator based
counterparts require their arguments to be sets. This precludes
error-prone constructions like ``Set('abc') & 'cbs'`` in favor of the
more readable ``Set('abc').intersection('cbs')``.
Changed in version 2.3.1: Formerly all arguments were required to be
sets.
In addition, both ``Set`` and ``ImmutableSet`` support set to set
comparisons. Two sets are equal if and only if every element of each
set is contained in the other (each is a subset of the other). A set
is less than another set if and only if the first set is a proper
subset of the second set (is a subset, but is not equal). A set is
greater than another set if and only if the first set is a proper
superset of the second set (is a superset, but is not equal).
The subset and equality comparisons do not generalize to a complete
ordering function. For example, any two disjoint sets are not equal
and are not subsets of each other, so *all* of the following return
``False``: ``a**b``. Accordingly, sets do not
implement the ``__cmp__()`` method.
Since sets only define partial ordering (subset relationships), the
output of the ``list.sort()`` method is undefined for lists of sets.
The following table lists operations available in ``ImmutableSet`` but
not found in ``Set``:
+---------------+--------------------------------+
| Operation | Result |
+===============+================================+
| ``hash(s)`` | returns a hash value for *s* |
+---------------+--------------------------------+
The following table lists operations available in ``Set`` but not
found in ``ImmutableSet``:
+----------------------------------------+---------------+-----------------------------------+
| Operation | Equivalent | Result |
+========================================+===============+===================================+
| ``s.update(t)`` | *s* |= *t* | return set *s* with elements |
| | | added from *t* |
+----------------------------------------+---------------+-----------------------------------+
| ``s.intersection_update(t)`` | *s* &= *t* | return set *s* keeping only |
| | | elements also found in *t* |
+----------------------------------------+---------------+-----------------------------------+
| ``s.difference_update(t)`` | *s* -= *t* | return set *s* after removing |
| | | elements found in *t* |
+----------------------------------------+---------------+-----------------------------------+
| ``s.symmetric_difference_update(t)`` | *s* ^= *t* | return set *s* with elements from |
| | | *s* or *t* but not both |
+----------------------------------------+---------------+-----------------------------------+
| ``s.add(x)`` | | add element *x* to set *s* |
+----------------------------------------+---------------+-----------------------------------+
| ``s.remove(x)`` | | remove *x* from set *s*; raises |
| | | ``KeyError`` if not present |
+----------------------------------------+---------------+-----------------------------------+
| ``s.discard(x)`` | | removes *x* from set *s* if |
| | | present |
+----------------------------------------+---------------+-----------------------------------+
| ``s.pop()`` | | remove and return an arbitrary |
| | | element from *s*; raises |
| | | ``KeyError`` if empty |
+----------------------------------------+---------------+-----------------------------------+
| ``s.clear()`` | | remove all elements from set *s* |
+----------------------------------------+---------------+-----------------------------------+
Note, the non-operator versions of ``update()``,
``intersection_update()``, ``difference_update()``, and
``symmetric_difference_update()`` will accept any iterable as an
argument.
Changed in version 2.3.1: Formerly all arguments were required to be
sets.
Also note, the module also includes a ``union_update()`` method which
is an alias for ``update()``. The method is included for backwards
compatibility. Programmers should prefer the ``update()`` method
because it is supported by the builtin ``set()`` and ``frozenset()``
types.
Example
=======
>>> from sets import Set
>>> engineers = Set(['John', 'Jane', 'Jack', 'Janice'])
>>> programmers = Set(['Jack', 'Sam', 'Susan', 'Janice'])
>>> managers = Set(['Jane', 'Jack', 'Susan', 'Zack'])
>>> employees = engineers | programmers | managers # union
>>> engineering_management = engineers & managers # intersection
>>> fulltime_management = managers - engineers - programmers # difference
>>> engineers.add('Marvin') # add element
>>> print engineers # doctest: +SKIP
Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
>>> employees.issuperset(engineers) # superset test
False
>>> employees.update(engineers) # update from another set
>>> employees.issuperset(engineers)
True
>>> for group in [engineers, programmers, managers, employees]: # doctest: +SKIP
... group.discard('Susan') # unconditionally remove element
... print group
...
Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
Set(['Janice', 'Jack', 'Sam'])
Set(['Jane', 'Zack', 'Jack'])
Set(['Jack', 'Sam', 'Jane', 'Marvin', 'Janice', 'John', 'Zack'])
Protocol for automatic conversion to immutable
==============================================
Sets can only contain immutable elements. For convenience, mutable
``Set`` objects are automatically copied to an ``ImmutableSet`` before
being added as a set element.
The mechanism is to always add a *hashable* element, or if it is not
hashable, the element is checked to see if it has an
``__as_immutable__()`` method which returns an immutable equivalent.
Since ``Set`` objects have a ``__as_immutable__()`` method returning
an instance of ``ImmutableSet``, it is possible to construct sets of
sets.
A similar mechanism is needed by the ``__contains__()`` and
``remove()`` methods which need to hash an element to check for
membership in a set. Those methods check an element for hashability
and, if not, check for a ``__as_temporarily_immutable__()`` method
which returns the element wrapped by a class that provides temporary
methods for ``__hash__()``, ``__eq__()``, and ``__ne__()``.
The alternate mechanism spares the need to build a separate copy of
the original mutable object.
``Set`` objects implement the ``__as_temporarily_immutable__()``
method which returns the ``Set`` object wrapped by a new class
``_TemporarilyImmutableSet``.
The two mechanisms for adding hashability are normally invisible to
the user; however, a conflict can arise in a multi-threaded
environment where one thread is updating a set while another has
temporarily wrapped it in ``_TemporarilyImmutableSet``. In other
words, sets of mutable sets are not thread-safe.
Comparison to the built-in ``set`` types
========================================
The built-in ``set`` and ``frozenset`` types were designed based on
lessons learned from the ``sets`` module. The key differences are:
* ``Set`` and ``ImmutableSet`` were renamed to ``set`` and
``frozenset``.
* There is no equivalent to ``BaseSet``. Instead, use ``isinstance(x,
(set, frozenset))``.
* The hash algorithm for the built-ins performs significantly better
(fewer collisions) for most datasets.
* The built-in versions have more space efficient pickles.
* The built-in versions do not have a ``union_update()`` method.
Instead, use the ``update()`` method which is equivalent.
* The built-in versions do not have a ``_repr(sorted=True)`` method.
Instead, use the built-in ``repr()`` and ``sorted()`` functions:
``repr(sorted(s))``.
* The built-in version does not have a protocol for automatic
conversion to immutable. Many found this feature to be confusing
and no one in the community reported having found real uses for it.
**