BitVector-2.1.html

Python: module BitVector BitVector (version 2.1, 2011-February-13) BitVector.py Version: 2.1 Author: Avinash Kak (kak@purdue.edu) Date: 2011-February-13 Download Version 2.1 gztar bztar View version 2.1 code in browser Switch to Version 2.2 Switch to Version 3.0 (Version 3.0 works with both Python 2.x and Python 3.x) CHANGE LOG: Version 2.1: Includes enhanced support for folks who use this class for computer security and cryptography work. You can now call on the methods of the BitVector class to do Galois Field GF(2^n) arithmetic on bit arrays. This should save the users of this class the bother of having to write their own routines for finding multiplicative inverses in GF(2^n) finite fields. Version 2.0.1: Fixed numerous typos and other errors in the documentation page for the module. The implementation code remains unchanged. Version 2.0: To address the needs of the folks who are using the BitVector class in data mining research, the new version of the class includes several additional methods. Since the bit vectors used by these folks can be extremely long, possibly involving millions of bits, the new version of the class includes a much faster method for counting the total number of set bits when a bit vector is sparse. [But note that this new bit counting method may perform poorly for dense bit vectors. So the old bit counting method has been retained.] Also for data mining folks, the new version of the class is provided with similarity and distance calculation metrics such as the Jaccard similarity coefficient, the Jaccard distance, and the Hamming distance. Again for the same folks, the class now also has a next_set_bit(from_index) method. Other enhancements to the class include methods for folks who do research in cryptography. Now you can directly calculate the greatest common divisor of two bit vectors, or find the multiplicative inverse of one bit vector modulo another bit vector. Version 1.5.1: Removed a bug from the implementation of the right circular shift operator. Version 1.5: This version should prove to be much more efficient for long bit vectors. Efficiency in BitVector construction when only its size is specified was achieved by eliminating calls to _setbit(). The application of logical operators to two BitVectors of equal length was also made efficient by eliminating calls to the padding function. Another feature of this version is the count_bits() method that returns the total number of bits set in a BitVector instance. Yet another feature of this version is the setValue() method that alters the bit pattern associated with a previously constructed BitVector. Version 1.4.1: The reset() method now returns 'self' to allow for cascaded invocation with the slicing operator. Also removed the discrepancy between the value of the __copyright__ variable in the module and the value of license variable in setup.py. Version 1.4: This version includes the following two upgrades: 1) code for slice assignment; and 2) A reset function to reinitialize a previously constructed BitVector. Additionally, the code was cleaned up with the help of pychecker. Version 1.3.2: Fixed a potentially misleading documentation issue for the Windows users of the BitVector class. If you are writing an internally generated BitVector to a disk file, you must open the file in the binary mode. If you don't, the bit patterns that correspond to line breaks will be misinterpreted. On a Windows machine in the text mode, the bit pattern 000001010 ('\n') will be written out to the disk as 0000110100001010 ('\r\n'). Version 1.3.1: Removed the inconsistency in the internal representation of bit vectors produced by logical bitwise operations vis-a-vis the bit vectors created by the constructor. Previously, the logical bitwise operations resulted in bit vectors that had their bits packed into lists of ints, as opposed to arrays of unsigned shorts. Version 1.3: (a) One more constructor mode included: When initializing a new bit vector with an integer value, you can now also specify a size for the bit vector. The constructor zero-pads the bit vector from the left with zeros. (b) The BitVector class now supports 'if x in y' syntax to test if the bit pattern 'x' is contained in the bit pattern 'y'. (c) Improved syntax to conform to well-established Python idioms. (d) What used to be a comment before the beginning of each method definition is now a docstring. Version 1.2: (a) One more constructor mode included: You can now construct a bit vector directly from a string of 1's and 0's. (b) The class now constructs a shortest possible bit vector from an integer value. So the bit vector for the integer value 0 is just one bit of value 0, and so on. (c) All the rich comparison operators are now overloaded. (d) The class now includes a new method 'intValue()' that returns the unsigned integer value of a bit vector. This can also be done through '__int__'. (e) The package now includes a unittest based framework for testing out an installation. This is in a separate directory called "TestBitVector". Version 1.1.1: The function that does block reads from a disk file now peeks ahead at the end of each block to see if there is anything remaining to be read in the file. If nothing remains, the more_to_read attribute of the BitVector object is set to False. This simplifies reading loops. This version also allows BitVectors of size 0 to be constructed Version 1.1: I have changed the API significantly to provide more ways for constructing a bit vector. As a result, it is now necessary to supply a keyword argument to the constructor. INSTALLATION: The BitVector class was packaged using Distutils. For installation, execute the following command-line in the source directory (this is the directory that contains the setup.py file after you have downloaded and uncompressed the tar archive): python setup.py install You have to have root privileges for this to work. On Linux distributions, this will install the module file at a location that looks like /usr/lib/python2.6/site-packages/ If you do not have root access, you have the option of working directly off the directory in which you downloaded the software by simply placing the following statements at the top of your scripts that use the BitVector class import sys sys.path.append( "pathname_to_BitVector_directory" ) To uninstall the module, simply delete the source directory, locate where BitVector was installed with "locate BitVector" and delete those files. As mentioned above, the full pathname to the installed version is likely to look like /usr/lib/python2.6/site-packages/BitVector* If you want to carry out a non-standard install of BitVector, look up the on-line information on Disutils by pointing your browser to http://docs.python.org/dist/dist.html INTRODUCTION: The BitVector class for a memory-efficient packed representation of bit arrays and for logical operations on such arrays. The core idea used in this Python script for bin packing is based on an internet posting by Josiah Carlson to the Pyrex mailing list. Operations supported on bit vectors: __getitem__ __setitem__ __len__ __iter__ __contains__ __getslice__ __str__ __int__ __add__ __eq__, __ne__, __lt__, __le__, __gt__, __ge__ | for bitwise or & for bitwise and ^ for bitwise xor ~ for bitwise inversion << for circular rotation to the left >> for circular rotation to the right + for concatenation count_bits count_bit_sparse faster for sparse bit vectors deep_copy divide_into_two gcd gf_divide for divisions in GF(2^n) gf_MI for multiplicative inverse in GF(2^n) gf_multiply for multiplications in GF(2) gf_multiply_modular for multiplications in GF(2^n) hamming_distance intValue for returning the integer value isPowerOf2 isPowerOf2_sparse faster for sparse bit vectors jaccard_distance jaccard_similarity length multiplicative_inverse next_set_bit pad_from_left pad_from_right permute rank_of_bit_set_at_index read_bits_from_file read_bits_from_fileobject reset reverse shift_left for non-circular left shift shift_right for non-circular right shift slice assignment setValue unpermute write_to_file write_bits_to_fileobject CONSTRUCTING BIT VECTORS: You can construct a bit vector in seven different ways. (1) You can construct a bit vector directly from either a tuple or a list of bits, as in bv = BitVector( bitlist = [1,0,1,0,0,1,0,1,0,0,1,0,1,0,0,1] ) (2) You can construct a bit vector from an integer by bv = BitVector( intVal = 56789 ) The bits stored now will correspond to the binary representation of the integer. The resulting bit vector is the shortest possible bit vector for the integer value supplied. For example, when intVal is 0, the bit vector constructed will consist of just the bit 0. (3) When initializing a bit vector with an intVal as shown above, you can also specify a size for the bit vector: bv = BitVector( intVal = 0, size = 8 ) will return the bit vector consisting of the bit pattern 00000000. The zero padding needed for meeting the size requirement is always on the left. If the size supplied is smaller than what it takes to create the shortest possible bit vector for intVal, an exception is thrown. (4) You can create a zero-initialized bit vector of a given size by bv = BitVector( size = 62 ) This bit vector will hold exactly 62 bits, all initialized to the 0 bit value. (5) You can construct a bit vector from a disk file by a two-step procedure. First you construct an instance of bit vector by bv = BitVector( filename = 'somefile' ) This bit vector itself is incapable of holding the bits. To now create bit vectors that actually hold the bits, you need to make the following sort of a call on the above variable bv: bv1 = bv.read_bits_from_file( 64 ) bv1 will be a regular bit vector containing 64 bits from the disk file. If you want to re-read a file from the beginning for some reason, you must obviously first close the file object that was acquired with a call to the BitVector constructor with a filename argument. This can be accomplished by bv.close_file_object() (6) You can construct a bit vector from a string of 1's and 0's by bv = BitVector( bitstring = '110011110000' ) (7) Yet another way to construct a bit vector is to read the bits directly from a file-like object, as in x = "111100001111" fileobj = StringIO.StringIO( x ) bv = BitVector( fp = fileobj ) OPERATIONS SUPPORTED BY THE BITVECTOR CLASS: DISPLAYING BIT VECTORS: 1) Since the BitVector class implements the __str__ method, a bit vector can be displayed on a terminal by print bitvec Basically, you can always obtain the string representation of a bit vector by str( bitvec ) and integer value by int( bitvec ) ACCESSING AND SETTING INDIVIDUAL BITS AND SLICES: 2) Any single bit of a bit vector bv can be set to 1 or 0 by bv[M] = 1_or_0 print bv[M] for accessing (and setting) the bit at the position that is indexed M. You can retrieve the bit at position M by bv[M]. Note that the index 0 corresponds to the first bit at the left end of a bit pattern. 3) A slice of a bit vector obtained by bv[i:j] is a bit vector constructed from the bits at index positions from i through j-1. 4) You can also carry out slice assignment: bv1 = BitVector( size = 25 ) bv2 = BitVector( bitstring = '1010001' ) bv1[6:9] = bv2[0:3] bv3 = BitVector( bitstring = '101' ) bv1[0:3] = bv3 The first slice assignment will set the 6th, 7th, and the 8th bits of the bit vector bv1 according to the first three bits of bv2. The second slice assignment will set the first three bits of bv1 according to the three bits in bv3. 5) You can iterate over a bit vector, as illustrated by for bit in bitvec: print bit, This is made possible by the override definition for the special __iter__() method. 6) Negative subscripts for array-like indexing are supported. Therefore, bitvec[ -i ] is legal assuming that the index range is not violated. A negative index carries the usual Python interpretation: The last element of a bit vector is indexed -1 and the first element -(n+1) if n is the total number of bits in the bit vector. 7) You can reset a previously constructed bit vector to either the all-zeros state or the all-ones state by bv1 = BitVector( size = 25 ) ... ... bv1.reset( 1 ) ... ... bv1.reset( 0 ) The first call to reset() will set all the bits of bv1 to 1's and the second call all the bits to 0's. LOGICAL OPERATIONS ON BIT VECTORS: 8) Given two bit vectors bv1 and bv2, you can perform bitwise logical operations on them by result_bv = bv1 ^ bv2 # for bitwise XOR result_bv = bv1 & bv2 # for bitwise AND result_bv = bv1 | bv2 # for bitwise OR result_bv = ~bv1 # for bitwise negation COMPARING BIT VECTORS: 9) Given two bit vectors bv1 and bv2, you can carry out the following comparisons that return Boolean values: bv1 == bv2 bv1 != bv2 bv1 < bv2 bv1 <= bv2 bv1 > bv2 bv1 >= bv2 The equalities and inequalities are determined by the integer values associated with the bit vectors. OTHER SUPPORTED OPERATIONS: 10) You can permute and unpermute bit vectors: bv_permuted = bv.permute( permutation_list ) bv_unpermuted = bv.unpermute( permutation_list ) 11) Left and right circular rotations can be carried out by bitvec << N bitvec >> N for circular rotations to the left and to the right by N bit positions. 12) If you want to shift a bitvector non-circularly: bitvec = BitVector( bitstring = '10010000' ) bitvec.shift_left(3) # 10000000 bitvec.shift_right(3) # 00010000 Obviously, for a sufficient large left or right non-circular shift, you will end up with a bitvector that is all zeros. 13) A bit vector containing an even number of bits can be divided into two equal parts by [left_half, right_half] = bitvec.divide_into_two() where left_half and right_half hold references to the two returned bit vectors. 14) You can find the integer value of a bit array by bitvec.intValue() or by int( bitvec ) 15) You can convert a bit vector into its string representation by str( bitvec ) 16) Because __add__ is supplied, you can always join two bit vectors by bitvec3 = bitvec1 + bitvec2 bitvec3 is a new bit vector that contains all the bits of bitvec1 followed by all the bits of bitvec2. 17) You can find the length of a bitvector by len = bitvec.length() 18) You can make a deep copy of a bitvector by bitvec_copy = bitvec.deep_copy() 19) You can write a bit vector directly to a file, as illustrated by the following example that reads one bit vector from a file and then writes it to another file bv = BitVector( filename = 'input.txt' ) bv1 = bv.read_bits_from_file(64) print bv1 FILEOUT = open( 'output.bits', 'wb' ) bv1.write_to_file( FILEOUT ) FILEOUT.close() bv = BitVector( filename = 'output.bits' ) bv2 = bv.read_bits_from_file( 64 ) print bv2 IMPORTANT: The size of a bit vector must be a multiple of of 8 for this write function to work. If this condition is not met, the function will throw an exception. IMPORTANT FOR WINDOWS USERS: When writing an internally generated bit vector out to a disk file, it is important to open the file in the binary mode as shown. Otherwise, the bit pattern 00001010 ('\n') in your bitstring will be written out as 0000110100001010 ('\r\n'), which is the linebreak on Windows machines. 20) You can also write a bit vector directly to a stream object, as illustrated by fp_write = StringIO.StringIO() bitvec.write_bits_to_fileobject( fp_write ) print fp_write.getvalue() # 111100001111 21) You can pad a bit vector from the left or from the right with a designated number of zeros bitvec.pad_from_left( n ) bitvec.pad_from_right( n ) In the first case, the new bit vector will be the same as the old bit vector except for the additional n zeros on the left. The same thing happens in the second case except that now the additional n zeros will be on the right. 22) You can test if a bit vector x is contained in another bit vector y by using the syntax 'if x in y'. This is made possible by the override definition for the special __contains__() method. 23) You can change the bit pattern associated with a previously constructed BitVector instance: bv = BitVector( intVal = 7, size =16 ) print bv # 0000000000000111 bv.setValue( intVal = 45 ) print bv # 101101 24) You can count the number of bits set in a BitVector instance by bv = BitVector( bitstring = '100111' ) print bv.count_bits() # 4 25) For folks who use bit vectors with millions of bits in them but with only a few bits set, your bit counting will go much, much faster if you call count_bits_sparse() instead of count_bits(): # a BitVector with 2 million bits: bv = BitVector( size = 2000000 ) bv[345234] = 1 bv[233]=1 bv[243]=1 bv[18]=1 bv[785] =1 print bv.count_bits_sparse() 26) You can calculate the similarity and the distance between two bit vectors using the Jaccard similarity coefficient and the Jaccard distance. Also, you can calculate the Hamming distance between two bit vectors: bv1 = BitVector( bitstring = '11111111' ) bv2 = BitVector( bitstring = '00101011' ) print bv1.jaccard_similarity( bv2 ) print bv1.jaccard_distance( bv2 ) print bv1.hamming_distance( bv2 ) 27) Starting from a given bit position, you can find the position index of the next set bit: bv = BitVector( bitstring = '00000000000001' ) print bv.next_set_bit( 5 ) # 13 since the position index of the SET bit after the bit whose position index 5 is 13. 28) You can measure the "rank" of a bit that is set at a given position. Rank is the number of bits that are set up to the position of the bit you are interested in. bv = BitVector( bitstring = '01010101011100' ) print bv.rank_of_bit_set_at_index( 10 ) # 6 29) You can test whether the integer value of a bit vector is a power of two. The sparse version of this method will work much faster for very long bit vectors. However, the regular version may work faster for small bit vectors. bv = BitVector( bitstring = '10000000001110' ) print bv.isPowerOf2() print bv.isPowerOf2_sparse() 30) Given a bit vector, you can construct a bit vector with all the bits reversed, in the sense that what was left to right before now becomes right to left. bv = BitVector( bitstring = '0001100000000000001' ) print bv.reverse() 31) You can find the greatest common divisor of two bit vectors: bv1 = BitVector( bitstring = '01100110' ) bv2 = BitVector( bitstring = '011' ) bv = bv1.gcd( bv2 ) print int(bv) 32) You can find the multiplicative inverse of a bit vector vis-a-vis a given modulus: bv_modulus = BitVector( intVal = 32 ) bv = BitVector( intVal = 17 ) bv_result = bv.multiplicative_inverse( bv_modulus ) if bv_result is not None: print int(bv_result) else: print "No multiplicative inverse in this case" This multiplicative inverse is calculated using normal integer arithmetic. For multiplicative inverses in GF(2^n), use the gf_MI() method described below. 33) To find the multiplicative inverse of a bit vector in GF(2^n) with respect to a modulus polynomial, you can do the following: modulus = BitVector( bitstring = '100011011' ) n = 8 a = BitVector( bitstring = '00110011' ) multi_inverse = a.gf_MI( modulus, n ) print multi_inverse # 01101100 34) If you just want to multiply two bit patterns in GF(2): a = BitVector( bitstring='0110001' ) b = BitVector( bitstring='0110' ) c = a.gf_multiply(b) print c # 00010100110 35) On the other hand, if you want to carry out modular multiplications in GF(2^n): modulus = BitVector( bitstring='100011011' ) # AES modulus n = 8 a = BitVector( bitstring='0110001' ) b = BitVector( bitstring='0110' ) c = a.gf_multiply_modular(b, modulus, n) print c # 10100110 36) To divide by a modulus bitvector in GF(2^n), mod = BitVector( bitstring='100011011' ) # AES modulus n = 8 bitvec = BitVector( bitstring='11100010110001' ) quotient, remainder = bitvec.gf_divide(mod, n) print quotient # 00000000111010 print remainder # 10001111 HOW THE BIT VECTORS ARE STORED: The bits of a bit array are stored in 16-bit unsigned ints. After resolving the argument with which the constructor is called (which happens in lines (A2) through (A82) of the file BitVector.py), the very first thing that the constructor does is to figure out in line (A83) as to how many of those 2-byte ints it needs for the bits. For example, if you wanted to store a 64-bit array, the variable 'two_byte_ints_needed' in line (A83) would be set to 4. (This does not mean that the size of a bit vector must be a multiple of 16. Any sized bit vectors can constructed using the required number of two-byte ints.) Line (A84) creates an array of 2-byte ints and initializes it with the required number of zeros. Lines (A85) then shifts the bits into the array of two-byte ints. As mentioned above, note that it is not necessary for the size of the vector to be a multiple of 16 even though we are using C's unsigned short as as a basic unit for storing the bit arrays. The class BitVector keeps track of the actual number of bits in the bit vector through the "size" instance attribute. With regard to the code in lines (A2) through (A82) of the file BitVector.py, note that, except for one case, the constructor must be called with a single keyword argument, which determines how the bit vector will be constructed. The single exception to this rule is for the keyword argument 'intVal' which can be used along with the 'size' keyword argument. When 'intVal' is used with the 'size' option, the bit vector constructed for the integer is the shortest possible bit vector. On the other hand, when 'size' is also specified, the bit vector is padded with zeroes from the left so that it has the specified size. Lines (A19) through (A25) are for the following sort of a call bv = BitVector( filename = 'myfilename' ) This call returns a bit vector on which you must subsequently invoke the 'read_bits_from_file()' method to actually obtain a bit vector consisting of the bits that constitute the information stored in the file. Lines (A26) through (A31) are for the case when you want to construct a bit vector by reading the bits off a file-like object, as in x = "111100001111" fileobj = StringIO.StringIO( x ) bv = BitVector( fp = fileobj ) Lines (A32) through (A64) are for the case when you want to construct a bit vector from an integer, as in bv = BitVector( intVal = 123456 ) The bits stored in the bit vector will correspond to the binary representation of the integer argument provided. The bit vector constructed with the above call will be the shortest possible bit vector for the integer supplied. As a case in point, when the intVal is 0, the bit vector will consist of a single bit which will be 0 also. The code in lines (A32) through (A64) can also handle the following sort of a call bv = BitVector( intVal = 46, size = 16 ) which returns a bit vector of a specific size by padding the shortest possible bit vector the the intVal with zeros from the left. Lines (A65) through (A71) are for constructing a bit vector with just the size information, as in bv = BitVector( size = 61 ) This returns a bit vector that will hold exactly 61 bits, all initialized to the zero value. Lines (A72) through (A76) are for constructing a bit vector from a bitstring, as in bv = BitVector( bitstring = '00110011111' ) Finally, lines (A77) through (A80) are for constructing a bit vector from a list or a tuple of the individual bits: bv = BitVector( bitlist = (1, 0, 1, 1, 0, 0, 1) ) The bit vector constructed is initialized with the supplied bits. ACKNOWLEDGMENTS: The author is grateful to Oleg Broytmann for suggesting many improvements that were incorporated in Version 1.1 of this package. The author would like to thank Kurt Schwehr whose email resulted in the creation of Version 1.2. Kurt also caught an error in my earlier version of 'setup.py' and suggested a unittest based approach to the testing of the package. Kurt also supplied the Makefile that is included in this distribution. The author would also like to thank all (Scott Daniels, Blair Houghton, and Steven D'Aprano) for their responses to my comp.lang.python query concerning how to make a Python input stream peekable. This feature was included in Version 1.1.1. With regard to the changes incorporated in Version 1.3, thanks are owed to Kurt Schwehr and Gabriel Ricardo for bringing to my attention the bug related to the intVal method of initializing a bit vector when the value of intVal exceeded sys.maxint. This problem is fixed in Version 1.3. Version 1.3 also includes many other improvements that make the syntax better conform to the standard idioms of Python. These changes and the addition of the new constructor mode (that allows a bit vector of a given size to be constructed from an integer value) are also owing to Kurt's suggestions. With regard to the changes incorporated in Version 1.3.1, I would like to thank Michael Haggerty for noticing that the bitwise logical operators resulted in bit vectors that had their bits packed into lists of ints, as opposed to arrays of unsigned shorts. This inconsistency in representation has been removed in version 1.3.1. Michael has also suggested that since BitVector is mutable, I should be overloading __iand__(), __ior__(), etc., for in-place modifications of bit vectors. Michael certainly makes a good point. But I am afraid that this change will break the code for the existing users of the BitVector class. I thank Mathieu Roy for bringing to my attention the problem with writing bitstrings out to a disk files on Windows machines. This turned out to be a problem more with the documentation than with the BitVector class itself. On a Windows machine, it is particularly important that a file you are writing a bitstring into be opened in binary mode since otherwise the bit pattern 00001010 ('\n') will be written out as 0000110100001010 ('\r\n'). This documentation fix resulted in Version 1.3.2. With regard to Version 1.4, the suggestions/bug reports made by John Kominek, Bob Morse, and Steve Ward contributed to this version. I wish to thank all three. John wanted me to equip the class with a reset() method so that a previously constructed class could be reset to either all 0's or all 1's. Bob spotted loose local variables in the implementation --- presumably left over from a debugging phase of the code. Bob recommended that I clean up the code with pychecker. That has been done. Steve noticed that slice assignment was not working. It should work now. Version 1.4.1 was prompted by John Kominek suggesting that if reset() returned self, then the slice operation could be combined with the reset operation. Thanks John! Another reason for 1.4.1 was to remove the discrepancy between the value of the __copyright__ variable in the module and the value of license variable in setup.py. This discrepancy was brought to my attention by David Eyk. Thanks David! Version 1.5 has benefited greatly by the suggestions made by Ryan Cox. By examining the BitVector execution with cProfile, Ryan observed that my implementation was making unnecessary method calls to _setbit() when just the size option is used for constructing a BitVector instance. Since Python allocates cleaned up memory, it is unnecessary to set the individual bits of a vector if it is known in advance that they are all zero. Ryan made a similar observation for the logical operations applied to two BitVector instances of equal length. He noticed that I was making unnecessary calls to _resize_pad_from_left() for the case of equal arguments to logical operations. Ryan also recommended that I include a method that returns the total number of bits set in a BitVector instance. The new method count_bits() does exactly that. Thanks Ryan for all your suggestions. Version 1.5 also includes the method setValue() that allows the internally stored bit pattern associated with a previously constructed BitVector to be changed. A need for this method was expressed by Aleix Conchillo. Thanks Aleix. Version 1.5.1 is a quick release to fix a bug in the right circular shift operator. This bug was discovered by Jasper Spaans. Thanks very much Jasper. Version 2.0 was prompted mostly by the needs of the folks who play with very long bit vectors that may contain millions of bits. I believe such bit vectors are encountered in data mining research and development. Towards that end, among the new methods in Version 2.0, the count_bits_sparse() was provided by Rhiannon Weaver. She says when a bit vector contains over 2 million bits and only, say, five bits are set, her method is faster than the older count_bits() method by a factor of roughly 18. Thanks Rhiannon. [The logic of the new implementation works best for very sparse bit vectors. For very dense vectors, it may perform more slowly than the regular count_bits() method. For that reason, I have retained the original method.] Rhiannon's implementation is based on what has been called the Kernighan way at the web site http://graphics.stanford.edu/~seander/bithacks.html. Version 2.0 also includes a few additional functions posted at this web site for extracting information from bit fields. Also included in this new version is the next_set_bit() method supplied by Jason Allum. I believe this method is also useful for data mining folks. Thanks Jason. Additional methods in Version 2.0 include the similarity and the distance metrics for comparing two bit vectors, method for finding the greatest common divisor of two bit vectors, and a method that determines the multiplicative inverse of a bit vector vis-a-vis a modulus. The last two methods should prove useful to folks in cryptography. ABOUT THE AUTHOR: Avi Kak is the author of "Programming with Objects: A Comparative Presentation of Object-Oriented Programming with C++ and Java", published by John-Wiley in 2003. This book presents a new approach to the combined learning of two large object-oriented languages, C++ and Java. It is being used as a text in a number of educational programs around the world. This book has also been translated into Chinese. Avi Kak is also the author of "Scripting with Objects: A Comparative Presentation of Object-Oriented Scripting with Perl and Python," published in 2008 by John-Wiley. SOME EXAMPLE CODE: #!/usr/bin/env python import BitVector # Construct a bit vector from a list or tuple of bits: bv = BitVector.BitVector( bitlist = (1, 0, 0, 1) ) print bv # 1001 # Construct a bit vector from an integer: bv = BitVector.BitVector( intVal = 5678 ) print bv # 0001011000101110 # Construct a bit vector of a given size from a given # integer: bv = BitVector( intVal = 45, size = 16 ) print bv # 0000000000101101 # Construct a zero-initialized bit vector of a given size: bv = BitVector.BitVector( size = 5 ) print bv # 00000 # Construct a bit vector from a bit string: bv = BitVector.BitVector( bitstring = '110001' ) print bv[0], bv[1], bv[2], bv[3], bv[4], bv[5] # 1 1 0 0 0 1 print bv[-1], bv[-2], bv[-3], bv[-4], bv[-5], bv[-6] # 1 0 0 0 1 1 # Construct a bit vector from a file like object: import StringIO x = "111100001111" fp_read = StringIO.StringIO( x ) bv = BitVector.BitVector( fp = fp_read ) print bv # 111100001111 # Experiments with bitwise logical operations: bv3 = bv1 | bv2 bv3 = bv1 & bv2 bv3 = bv1 ^ bv2 bv6 = ~bv5 # Find the length of a bit vector print len( bitvec ) # Find the integer value of a bit vector print int( bitvec ) # Open a file for reading bit vectors from bv = BitVector.BitVector( filename = 'TestBitVector/testinput1.txt' ) print bv # nothing yet bv1 = bv.read_bits_from_file(64) print bv1 # first 64 bits from the file # Divide a bit vector into two equal sub-vectors: [bv1, bv2] = bitvec.divide_into_two() # Permute and Un-Permute a bit vector: bv2 = bitvec.permute( permutation_list ) bv2 = bitvec.unpermute( permutation_list ) # Try circular shifts to the left and to the right bitvec << 7 bitvec >> 7 # Try 'if x in y' syntax for bit vectors: bv1 = BitVector( bitstring = '0011001100' ) bv2 = BitVector( bitstring = '110011' ) if bv2 in bv1: print "%s is in %s" % (bv2, bv1) else: print "%s is not in %s" % (bv2, bv1) ..... ..... (For a more complete working example, see the example code in the BitVectorDemo.py file in the Examples sub-directory.) Imported Modules array operator Classes BitVectorIterator __builtin__.object BitVector class BitVector(__builtin__.object) Methods defined here: __add__(self, other) Concatenate the argument bit vector with the bit vector on which the method is invoked. Return the concatenated bit vector as a new BitVector object. __and__(self, other) Take a bitwise 'and' of the bit vector on which the method is invoked with the argument bit vector. Return the result as a new bit vector. If the two bit vectors are not of the same size, pad the shorter one with zeros from the left. __contains__(self, otherBitVec) This supports 'if x in y' and 'if x not in y' syntax for bit vectors. __eq__(self, other) # Compare two bit vectors: __ge__(self, other) __getitem__ = _getbit(self, posn) __getslice__(self, i, j) Allow slicing with [i:j], [:], etc. __gt__(self, other) __init__(self, *args, **kwargs) __int__ = intValue(self) __invert__(self) Invert the bits in the bit vector on which the method is invoked and return the result as a new bit vector. __iter__(self) To allow iterations over a bit vector by supporting the 'for bit in bit_vector' syntax: __le__(self, other) __len__ = _getsize(self) __lshift__(self, n) For an in-place left circular shift by n bit positions __lt__(self, other) __ne__(self, other) __or__(self, other) Take a bitwise 'or' of the bit vector on which the method is invoked with the argument bit vector. Return the result as a new bit vector. If the two bit vectors are not of the same size, pad the shorter one with zero's from the left. __rshift__(self, n) For an in-place right circular shift by n bit positions. __setitem__(self, pos, item) This is needed for both slice assignments and for index assignments. It checks the types of pos and item to see if the call is for slice assignment. For slice assignment, pos must be of type 'slice' and item of type BitVector. For index assignment, the argument types are checked in the _setbit() method. __str__(self) To create a print representation __xor__(self, other) Take a bitwise 'xor' of the bit vector on which the method is invoked with the argument bit vector. Return the result as a new bit vector. If the two bit vectors are not of the same size, pad the shorter one with zeros from the left. circular_rot_left(self) This is merely another implementation of the method circular_rotate_left_by_one() shown above. This one does NOT use map functions. This method carries out a one-bit left circular shift of a bit vector. circular_rot_right(self) This is merely another implementation of the method circular_rotate_right_by_one() shown above. This one does NOT use map functions. This method does a one-bit right circular shift of a bit vector. circular_rotate_left_by_one(self) For a one-bit in-place left circular shift circular_rotate_right_by_one(self) For a one-bit in-place right circular shift close_file_object(self) For closing a file object that was used for reading the bits into one or more BitVector objects. count_bits(self) Return the number of bits set in a BitVector instance. count_bits_sparse(self) For sparse bit vectors, this method, contributed by Rhiannon, will be much faster. She estimates that if a bit vector with over 2 millions bits has only five bits set, this will return the answer in 1/18 of the time taken by the count_bits() method. Note however, that count_bits() may work much faster for dense-packed bit vectors. Rhianon's implementation is based on an algorithm generally known as the Brian Kernighan's way, although its antecedents predate its mention by Kernighan and Ritchie. deep_copy(self) divide_into_two(self) Divides an even-sized bit vector into two and returns the two halves as a list of two bit vectors. gcd(self, other) Using Euclid's Algorithm, returns the greatest common divisor of the integer value of the bit vector on which the method is invoked and the integer value of the argument bit vector. gf_MI(self, mod, n) Returns the multiplicative inverse of a vector in the GF(2^n) finite field with the modulus polynomial set to mod gf_divide(self, mod, n) Carries out modular division of a bitvector by the modulus bitvector mod in GF(2^n) finite field. Returns both the quotient and the remainder. gf_multiply(self, b) In the set of polynomials defined over GF(2), multiplies the bitvector on which the method is invoked with the bitvector b. Returns the product bitvector. gf_multiply_modular(self, b, mod, n) Multiplies a bitvector with the bitvector b in GF(2^n) finite field with the modulus bit pattern set to mod hamming_distance(self, other) Computes the Hamming distance between two bit vectors intValue(self) Return the integer value of a bitvector isPowerOf2(self) Determines whether the integer value of a bit vector is a power of 2. isPowerOf2_sparse(self) Faster version of isPowerOf2() for sparse bit vectors jaccard_distance(self, other) Computes the Jaccard distance between two bit vectors jaccard_similarity(self, other) Computes the Jaccard similarity coefficient between two bit vectors length(self) multiplicative_inverse(self, modulus) Calculates the multiplicative inverse of a bit vector modulo the bit vector that is supplied as the argument. Code based on the Extended Euclid's Algorithm. next_set_bit(self, from_index=0) This method, contributed by Jason Allum, calculates the number of bit positions from the current position index to the next set bit. pad_from_left(self, n) Pad a bit vector with n zeros from the left pad_from_right(self, n) Pad a bit vector with n zeros from the right permute(self, permute_list) Permute a bit vector according to the indices shown in the second argument list. Return the permuted bit vector as a new bit vector. rank_of_bit_set_at_index(self, position) For a bit that is set at the argument 'position', this method returns how many bits are set to the left of that bit. For example, in the bit pattern 000101100100, a call to this method with position set to 9 will return 4. read_bits_from_file(self, blocksize) Read blocksize bits from a disk file and return a BitVector object containing the bits. If the file contains fewer bits than blocksize, construct the BitVector object from however many bits there are in the file. If the file contains zero bits, return a BitVector object of size attribute set to 0. read_bits_from_fileobject(self, fp) This function is meant to read a bit string from a file like object. reset(self, val) Resets a previously created BitVector to either all zeros or all ones depending on the argument val. Returns self to allow for syntax like bv = bv1[3:6].reset(1) or bv = bv1[:].reset(1) reverse(self) Returns a new bit vector by reversing the bits in the bit vector on which the method is invoked. setValue(self, *args, **kwargs) Changes the bit pattern associated with a previously constructed BitVector instance. The allowable modes for changing the internally stored bit pattern are the same as for the constructor. shift_left(self, n) For an in-place left non-circular shift by n bit positions shift_left_by_one(self) For a one-bit in-place left non-circular shift. Note that bitvector size does not change. The leftmost bit that moves past the first element of the bitvector is discarded and rightmost bit of the returned vector is set to zero. shift_right(self, n) For an in-place right non-circular shift by n bit positions. shift_right_by_one(self) For a one-bit in-place right non-circular shift. Note that bitvector size does not change. The rightmost bit that moves past the last element of the bitvector is discarded and leftmost bit of the returned vector is set to zero. unpermute(self, permute_list) Unpermute the bit vector according to the permutation list supplied as the second argument. If you first permute a bit vector by using permute() and then unpermute() it using the same permutation list, you will get back the original bit vector. write_bits_to_fileobject(self, fp) This function is meant to write a bit vector directly to a file like object. Note that whereas 'write_to_file' method creates a memory footprint that corresponds exactly to the bit vector, the 'write_bits_to_fileobject' actually writes out the 1's and 0's as individual items to the file object. That makes this method convenient for creating a string representation of a bit vector, especially if you use the StringIO class, as shown in the test code. write_to_file(self, file_out) (Contributed by Joe Davidson) Write the bitvector to the file object file_out. (A file object is returned by a call to open()). Since all file I/O is byte oriented, the bitvector must be multiple of 8 bits. Each byte treated as MSB first (0th index). Data descriptors defined here: __dict__ dictionary for instance variables (if defined) __weakref__ list of weak references to the object (if defined) class BitVectorIterator Methods defined here: __init__(self, bitvec) __iter__(self) next(self) Data __author__ = 'Avinash Kak (kak@purdue.edu)' __copyright__ = '(C) 2011 Avinash Kak. Python Software Foundation.' __date__ = '2011-February-13' __url__ = 'http://RVL4.ecn.purdue.edu/~kak/dist/BitVector-2.1.html' __version__ = '2.1'

Author Avinash Kak (kak@purdue.edu)

Data
		__author__ = 'Avinash Kak (kak@purdue.edu)' __copyright__ = '(C) 2011 Avinash Kak. Python Software Foundation.' __date__ = '2011-February-13' __url__ = 'http://RVL4.ecn.purdue.edu/~kak/dist/BitVector-2.1.html' __version__ = '2.1'