deface package¶
deface.cli¶
- deface.cli.create_parser() argparse.ArgumentParser [source]¶
Create the argument parser for the deface command line tool.
- deface.cli.main() None [source]¶
A command line tool to convert Facebook posts from their personal archive format to a simpler, cleaner version. The tool reads in one or more files with possibly overlapping post data, simplifies the structure of the data, eliminates redundant information, reconciles the records into a single timeline, and then exports that timeline of posts as JSON.
deface.error¶
- exception deface.error.DefaceError[source]¶
Bases:
Exception
The base class for errors specific to this package.
- exception deface.error.ValidationError[source]¶
Bases:
deface.error.DefaceError
An error indicating that JSON data does not have expected fields or type.
- exception deface.error.MergeError[source]¶
Bases:
deface.error.DefaceError
An error indicating that two posts are unrelated and cannot be merged.
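The hierarchy above lets callers handle all package errors with one handler or tell them apart by subclass. A minimal sketch, using stand-in classes that mirror the documented names (they are not imported from deface) and a hypothetical `classify` helper:

```python
class DefaceError(Exception):
    """Stand-in for deface.error.DefaceError."""


class ValidationError(DefaceError):
    """Stand-in for deface.error.ValidationError."""


class MergeError(DefaceError):
    """Stand-in for deface.error.MergeError."""


def classify(error: Exception) -> str:
    """Hypothetical helper: check the subclasses before the base class."""
    if isinstance(error, ValidationError):
        return "malformed input"
    if isinstance(error, MergeError):
        return "unrelated posts"
    if isinstance(error, DefaceError):
        return "other deface error"
    return "not a deface error"
```

Because both specific errors derive from the base class, an `except DefaceError` clause would catch either of them.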
deface.ingest¶
- deface.ingest.ingest_into_history(data: deface.validator.Validator[Any], history: deface.model.PostHistory) list[deface.error.DefaceError] [source]¶
Ingest the JSON data value wrapped by the validator as a list of posts into the given history. This function returns a list of ingestion errors.
- deface.ingest.ingest_post(data: deface.validator.Validator[Any]) deface.model.Post [source]¶
Ingest the JSON data value wrapped by the validator as a post.
- deface.ingest.ingest_media(data: deface.validator.Validator[Any]) deface.model.Media [source]¶
Ingest the JSON data value wrapped by the validator as a media descriptor.
- deface.ingest.ingest_location(data: deface.validator.Validator[Any]) deface.model.Location [source]¶
Ingest the JSON data value wrapped by the validator as a location.
- deface.ingest.ingest_external_context(data: deface.validator.Validator[Any]) deface.model.ExternalContext [source]¶
Ingest the JSON data value wrapped by the validator as an external context.
- deface.ingest.ingest_event(data: deface.validator.Validator[Any]) deface.model.Event [source]¶
Ingest the JSON data value wrapped by the validator as an event.
- deface.ingest.ingest_comment(data: deface.validator.Validator[Any]) deface.model.Comment [source]¶
Ingest the JSON data value wrapped by the validator as a comment.
deface.logger¶
- class deface.logger.Level(value)[source]¶
Bases:
enum.Enum
An enumeration of log levels. Each member's value is the emoji prefix for messages at that level.
- ERROR = '🛑 '¶
- WARN = '⚠️ '¶
- INFO = 'ℹ️ '¶
- class deface.logger.Logger(stream: TextIO = <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>, prefix: str = '', use_color: bool = True, use_emoji: bool = True)[source]¶
Bases:
object
A simple console logger. By default, the logger prefixes messages with the given prefix followed by an appropriate emoji. If the underlying stream is a TTY, it also uses ANSI escape codes to style messages. The use of color or emoji can be disabled by setting the corresponding argument to False.
- print_json(value: Any, **kwargs: Any) None [source]¶
Log a nicely indented JSON representation of the given value.
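The formatting rules described for the logger can be sketched as a single function. This is an illustration only: `format_message`, the `ANSI` table, and the specific color codes are assumptions, not part of the deface API; only the `Level` values are taken from the listing above.

```python
import enum
import sys
from typing import TextIO


class Level(enum.Enum):
    # Values copied from the documented enumeration.
    ERROR = '🛑 '
    WARN = '⚠️ '
    INFO = 'ℹ️ '


# Hypothetical styling; the real logger's escape codes are not documented here.
ANSI = {Level.ERROR: '\x1b[1;31m', Level.WARN: '\x1b[1;33m', Level.INFO: '\x1b[1m'}
RESET = '\x1b[0m'


def format_message(
    level: Level,
    message: str,
    stream: TextIO = sys.stderr,
    prefix: str = '',
    use_color: bool = True,
    use_emoji: bool = True,
) -> str:
    """Build the text a logger of this kind might write to the stream."""
    # Prefix first, then the emoji for the level, then the message.
    text = prefix + (level.value if use_emoji else '') + message
    # Style with ANSI escape codes only when writing to a terminal.
    if use_color and stream.isatty():
        text = ANSI[level] + text + RESET
    return text
```

When the stream is redirected to a file or pipe, `isatty()` returns false and the message is emitted without escape codes.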
deface.model¶
The data model for posts. This module defines deface’s own post schema, which captures all Facebook post data in a much simpler fashion. The main type is the Post dataclass. It depends on the Comment, Event, ExternalContext, Location, Media, and MediaMetaData dataclasses as well as the MediaType enumeration. This module also defines the PostHistory and find_simultaneous_posts() helpers for building up a coherent timeline from Facebook post data.
The schema uses Python tuples instead of lists because the former are immutable and thus do not get in the way of all model classes being both equatable and hashable.
The model’s JSON serialization follows directly from its definition, with every dataclass instance becoming an object in the JSON text that has the same fields, with one important exception: If an attribute has None or the empty tuple () as its value, deface.serde.prepare() removes it from the JSON representation. Since the schema needs to capture all information contained in Facebook post data, it includes a relatively large number of optional attributes. Including them in the serialized representation seems to have little benefit while cluttering the JSON text.
The model can easily be reinstated from its JSON text post-by-post by passing the deserialized dictionary to Post.from_dict(). The method patches the representation of nested model types and also fills in None and () values. For uniformity of mechanism, all model classes implement from_dict, even if they do not need to patch fields before invoking the constructor.
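The from_dict mechanism described above can be illustrated with two simplified dataclasses. This is a sketch under assumptions: `Photo` is a hypothetical, reduced stand-in for the real media type, and the `Comment` class here is redefined locally rather than imported from deface.

```python
import dataclasses
import json
from typing import Any, Optional


@dataclasses.dataclass(frozen=True)
class Comment:
    author: str
    comment: str
    timestamp: int

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Comment":
        # Nothing to patch; present only for uniformity of mechanism.
        return cls(**data)


@dataclasses.dataclass(frozen=True)
class Photo:  # hypothetical, simplified model class
    uri: str
    description: Optional[str] = None
    comments: tuple[Comment, ...] = ()

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Photo":
        patched = dict(data)
        # Patch the nested type: JSON lists of dicts become tuples of comments.
        if "comments" in patched:
            patched["comments"] = tuple(
                Comment.from_dict(item) for item in patched["comments"]
            )
        # Omitted attributes fall back to the None and () defaults.
        return cls(**patched)


text = (
    '{"uri": "photos/1.jpg", "comments": '
    '[{"author": "Alice", "comment": "Nice!", "timestamp": 1600000000}]}'
)
photo = Photo.from_dict(json.loads(text))
```

Because the sketch uses frozen dataclasses with tuple fields, the reinstated `photo` is both equatable and hashable, in line with the rationale given for preferring tuples over lists.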
- class deface.model.MediaType(value)[source]¶
Bases:
enum.Enum
An enumeration of media types.
- PHOTO = 'PHOTO'¶
- VIDEO = 'VIDEO'¶
- class deface.model.Comment(author: str, comment: str, timestamp: int)[source]¶
Bases:
object
A comment on a post, photo, or video.
- author: str¶
The comment’s author.
- comment: str¶
The comment’s text.
- timestamp: int¶
The comment’s timestamp.
- classmethod from_dict(data: dict[str, typing.Any]) deface.model.Comment [source]¶
Create a new comment from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.
- class deface.model.Event(name: str, start_timestamp: int, end_timestamp: int)[source]¶
Bases:
object
An event.
- name: str¶
The event’s name.
- start_timestamp: int¶
The beginning of the event.
- end_timestamp: int¶
The end of the event or zero for events without a defined duration.
- classmethod from_dict(data: dict[str, typing.Any]) deface.model.Event [source]¶
Create a new event from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.
- class deface.model.ExternalContext(url: str, name: Optional[str] = None, source: Optional[str] = None)[source]¶
Bases:
object
The external context for a post. In the original Facebook post data, a post’s external context is part of the attachments:
{
    "attachments": [
        {
            "data": [
                {
                    "external_context": {
                        "name": "Instagram Post by Ro\u00cc\u0081isi\u00cc\u0081n Murphy",
                        "source": "instagram.com",
                        "url": "https://www.instagram.com/p/B_13ojcD6Fh/"
                    }
                }
            ]
        }
    ]
}
Unusually, the example includes a name and source in addition to the url. It also illustrates the mojibake resulting from Facebook erroneously double encoding all text. The name should read Instagram Post by Róisín Murphy.
- url: str¶
A URL linking to external content.
- name: Optional[str] = None¶
The name of the website or, for an article, its title. Not a common attribute.
- source: Optional[str] = None¶
The name of the website or, for an article, the publication’s name. Not a common attribute.
- classmethod from_dict(data: dict[str, typing.Any]) deface.model.ExternalContext [source]¶
Create a new external context from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.
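The mojibake in the external context example can be reproduced and undone at the string level. This sketch is not how deface itself repairs the data (deface.serde.restore_utf8() works on the raw bytes before parsing); it only shows why the garbled name decodes to the expected one. The final NFC normalization step is an assumption added so that the result compares equal to precomposed text.

```python
import json
import unicodedata

# The name exactly as it appears in the Facebook export (a JSON string literal).
raw = r'"Instagram Post by Ro\u00cc\u0081isi\u00cc\u0081n Murphy"'

# Parsing as-is yields one character per UTF-8 byte, i.e., mojibake.
garbled = json.loads(raw)

# Each character is really a byte value, so map the characters back to
# bytes via Latin-1 and decode those bytes as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")

# The bytes 0xCC 0x81 encode U+0301, the combining acute accent.
# Normalizing to NFC (an assumption of this sketch) composes it with
# the preceding letter.
name = unicodedata.normalize("NFC", repaired)
print(name)
```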
- class deface.model.Location(name: str, address: Optional[str] = None, latitude: Optional[float] = None, longitude: Optional[float] = None, url: Optional[str] = None)[source]¶
Bases:
object
A location in the real world. In the original Facebook post data, a post’s place is part of the attachments:
{
    "attachments": [
        {
            "data": [
                {
                    "place": {
                        "name": "Whitney Museum of American Art",
                        "coordinate": {
                            "latitude": 40.739541735,
                            "longitude": -74.009095020556
                        },
                        "address": "",
                        "url": "https://www.facebook.com/whitneymuseum/"
                    }
                }
            ]
        }
    ]
}
The coordinate is stripped during ingestion to hoist latitude and longitude into the location record. In rare cases, the coordinate may be missing from the original Facebook data, hence both the latitude and longitude attributes are optional.
- name: str¶
The location’s name.
- address: Optional[str] = None¶
The location’s address.
- latitude: Optional[float] = None¶
The location’s latitude. In the original Facebook post data, this attribute is nested inside the coordinate attribute.
- longitude: Optional[float] = None¶
The location’s longitude. In the original Facebook post data, this attribute is nested inside the coordinate attribute.
- url: Optional[str] = None¶
The URL for the location on https://www.facebook.com.
- is_mergeable_with(other: deface.model.Location) bool [source]¶
Determine whether this location can be merged with the other location. For two locations to be mergeable, they must have identical name, address, latitude, and longitude attributes. Furthermore, they must either have identical url attributes or one location must have a string value while the other location has None.
- merge(other: deface.model.Location) deface.model.Location [source]¶
Merge this location with the given location. In case of identical URLs, this method returns self. In case of divergent URLs, this method returns the instance with the URL value.
- Raises
MergeError – indicates that the locations differ in more than their URLs and thus cannot be merged.
- classmethod from_dict(data: dict[str, typing.Any]) deface.model.Location [source]¶
Create a new location from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.
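The mergeability and merge rules documented for locations can be sketched with a stand-in dataclass. The class below mirrors the documented attributes but is written from the prose description alone, so it is an approximation and not the library's code.

```python
import dataclasses
from typing import Optional


class MergeError(Exception):
    """Stand-in for deface.error.MergeError."""


@dataclasses.dataclass(frozen=True)
class Location:
    name: str
    address: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    url: Optional[str] = None

    def is_mergeable_with(self, other: "Location") -> bool:
        # Everything except the URL must be identical.
        same_place = (
            (self.name, self.address, self.latitude, self.longitude)
            == (other.name, other.address, other.latitude, other.longitude)
        )
        # URLs must match, or exactly one side may be missing its URL.
        compatible_urls = (
            self.url == other.url or self.url is None or other.url is None
        )
        return same_place and compatible_urls

    def merge(self, other: "Location") -> "Location":
        if not self.is_mergeable_with(other):
            raise MergeError("locations differ in more than their URLs")
        # Identical URLs: keep self. Otherwise keep whichever has the URL.
        if self.url == other.url or other.url is None:
            return self
        return other
```

For example, merging a location without a URL with an otherwise identical location that has one returns the latter, so no information is lost.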
- class deface.model.MediaMetaData(camera_make: Optional[str] = None, camera_model: Optional[str] = None, exposure: Optional[str] = None, focal_length: Optional[str] = None, f_stop: Optional[str] = None, iso_speed: Optional[int] = None, latitude: Optional[float] = None, longitude: Optional[float] = None, modified_timestamp: Optional[int] = None, orientation: Optional[int] = None, original_height: Optional[int] = None, original_width: Optional[int] = None, taken_timestamp: Optional[int] = None)[source]¶
Bases:
object
The metadata for a photo or video. In the original Facebook post data, this object also includes the upload_ip and upload_timestamp, but since both attributes describe the use of the photo or video on Facebook and not the photo or video itself, they are hoisted into the Media record. The remaining attributes, even if present in the original Facebook post data, tend to be meaningless, i.e., are either the empty string or zero. Also, while the remaining attributes would be meaningful for both photos and videos, they are found only on photos.
- camera_make: Optional[str] = None¶
- camera_model: Optional[str] = None¶
- exposure: Optional[str] = None¶
- focal_length: Optional[str] = None¶
- f_stop: Optional[str] = None¶
- iso_speed: Optional[int] = None¶
- latitude: Optional[float] = None¶
- longitude: Optional[float] = None¶
- modified_timestamp: Optional[int] = None¶
- orientation: Optional[int] = None¶
- original_height: Optional[int] = None¶
- original_width: Optional[int] = None¶
- taken_timestamp: Optional[int] = None¶
- classmethod from_dict(data: dict[str, typing.Any]) deface.model.MediaMetaData [source]¶
Create new media metadata from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.
- class deface.model.Media(media_type: deface.model.MediaType, uri: str, description: Optional[str] = None, title: Optional[str] = None, thumbnail: Optional[str] = None, metadata: Optional[deface.model.MediaMetaData] = None, creation_timestamp: Optional[int] = None, upload_timestamp: Optional[int] = None, upload_ip: str = '', comments: tuple[deface.model.Comment, ...] = <factory>)[source]¶
Bases:
object
A posted photo or video.
- media_type: deface.model.MediaType¶
The media type, which is derived from the metadata key in the original data.
- uri: str¶
The path to the photo or video file within the personal data archive. In terms of RFC 3986, the attribute provides a relative-path reference, i.e., it lacks a scheme such as file: and does not start with a slash /. However, it should not be resolved relative to the file containing the field but rather from the root of the personal data archive.
- description: Optional[str] = None¶
A description of the photo or video. In the original Facebook post data, the value for this attribute may be duplicated amongst all of a post’s media objects as well as the post’s body. Wherever safe, such redundancy is resolved in favor of the post’s body. As a result, any remaining description on a media record is unique to that photo or video.
- title: Optional[str] = None¶
The title for the photo or video. This field is filled in automatically and hence generic. Common variations are Mobile Uploads or Timeline Photos for photos and the empty string for videos.
- thumbnail: Optional[str] = None¶
The thumbnail for a photo or video. If present in the original Facebook data, the value is an object with uri as its only field. Just like Media.uri, the thumbnail URI is a relative-path reference that should be resolved from the root of the personal data archive.
- metadata: Optional[deface.model.MediaMetaData] = None¶
The metadata for the photo or video.
- creation_timestamp: Optional[int] = None¶
Seemingly the timestamp for when the media object was created on Facebook. In the original Facebook post data, this timestamp differs from the post’s timestamp by less than 30 seconds.
- upload_timestamp: Optional[int] = None¶
The timestamp at which the photo or video was uploaded. In the original Facebook post data, this field is part of the photo_metadata or video_metadata object nested inside the media object’s media_metadata. However, since it really is part of Facebook’s data on the use of the photo or video, it is hoisted into the media record during ingestion.
- upload_ip: str = ''¶
The IP address from which the photo or video was uploaded. In the original Facebook post data, this attribute is part of the photo_metadata or video_metadata object nested inside the media object’s media_metadata. It also is the only attribute reliably included with that object. However, since upload_ip really is part of Facebook’s data on the use of the photo or video, it is hoisted into the media record during ingestion.
- comments: tuple[deface.model.Comment, ...]¶
Comments specifically on the photo or video.
- is_mergeable_with(other: deface.model.Media) bool [source]¶
Determine whether this media object can be merged with the other media object. That is the case if both media objects have the same field values with the exception of comments, which may be omitted from one of the two media objects.
- merge(other: deface.model.Media) deface.model.Media [source]¶
Merge this media object with the other media object.
- Raises
MergeError – indicates that the two media objects are not mergeable.
- classmethod from_dict(data: dict[str, typing.Any]) deface.model.Media [source]¶
Create a new media descriptor from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.
- class deface.model.Post(timestamp: int, backdated_timestamp: Optional[int] = None, update_timestamp: Optional[int] = None, post: Optional[str] = None, name: Optional[str] = None, title: Optional[str] = None, text: tuple[str, ...] = <factory>, external_context: Optional[deface.model.ExternalContext] = None, event: Optional[deface.model.Event] = None, places: tuple[deface.model.Location, ...] = <factory>, tags: tuple[str, ...] = <factory>, media: tuple[deface.model.Media, ...] = <factory>)[source]¶
Bases:
object
A post on Facebook.
- timestamp: int¶
The time a post was made in seconds since the beginning of the Unix epoch on January 1, 1970 at midnight.
- backdated_timestamp: Optional[int] = None¶
A backdated timestamp. Its semantics are unclear.
- update_timestamp: Optional[int] = None¶
Nominally, the time of an update. In practice, if a post includes this field, its value appears to be the same as that of timestamp. In other words, the field has devolved to a flag indicating whether a post was updated.
- post: Optional[str] = None¶
The post’s textual body.
- name: Optional[str] = None¶
The name for a recommendation.
- title: Optional[str] = None¶
The title of a post. This field is filled in automatically and hence generic. Starting with more common ones, variations include:
Alice
Alice updated her status.
Alice shared a memory.
Alice wrote on Bob's timeline.
Alice is feeling blessed.
Alice was with Bob.
- text: tuple[str, ...]¶
The text introducing a shared memory.
- external_context: Optional[deface.model.ExternalContext] = None¶
An external context, typically with URL only.
- event: Optional[deface.model.Event] = None¶
The event this post is about.
- places: tuple[deface.model.Location, ...]¶
The places for this post. Almost all posts have at most one deface.model.Location. Occasionally, a post has two locations that share the same address, latitude, longitude, and name but differ on deface.model.Location.url, with one location having None and the other having some value. In that case, deface.ingest.ingest_post() eliminates the redundant location object while keeping url’s value. Posts with two or more distinct locations seem rare but do occur.
- tags: tuple[str, ...]¶
The tags for a post, including friends and pages.
- media: tuple[deface.model.Media, ...]¶
The photos and videos attached to a post.
- is_simultaneous(other: deface.model.Post) bool [source]¶
Determine whether this post and the other post have the same timestamp.
- is_mergeable_with(other: deface.model.Post) bool [source]¶
Determine whether this post can be merged with the given post. The two posts are mergeable if they differ at most in their media.
- merge(other: deface.model.Post) deface.model.Post [source]¶
Merge this post with the other post. If the two posts differ only in their media, this method returns a new post that combines the media from both posts.
- Raises
MergeError – indicates that the two posts differ in more than their media or have different media descriptors for the same photo or video.
- classmethod from_dict(data: dict[str, typing.Any]) deface.model.Post [source]¶
Create a new post from deserialized JSON text. This method assumes that the JSON text was created by serializing the result of deface.serde.prepare(), just as deface.serde.dumps() does.
- class deface.model.PostHistory[source]¶
Bases:
object
A history of posts. Use add() to add posts one-by-one, as they are ingested. This class organizes them by Post.timestamp. That lets it easily merge posts that only differ in media as well as eliminate duplicate posts. The latter is particularly important when ingesting posts from more than one personal data archive, since archives may overlap in time. Once all posts have been added to the history, timeline() returns a list of all unique posts sorted by timestamp.
- add(post: deface.model.Post) None [source]¶
Add the post to the history of posts. If the history already includes one or more posts with the same timestamp, this method tries merging the given post with each of those posts and replaces the post upon a successful merge. Otherwise, this method adds the post to the history.
- timeline() list[deface.model.Post] [source]¶
Get a timeline for the history of posts. The timeline includes all posts from the history in chronological order.
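The add-then-timeline workflow can be sketched with reduced stand-ins. Here `Post` carries only a timestamp, a body, and media given as plain strings, and its merge rule is a simplification of the documented one; the internal dictionary keyed by timestamp is an assumption about how such a history could be organized.

```python
import dataclasses
from collections import defaultdict
from typing import Optional


class MergeError(Exception):
    """Stand-in for deface.error.MergeError."""


@dataclasses.dataclass(frozen=True)
class Post:  # simplified stand-in; media are plain strings here
    timestamp: int
    post: Optional[str] = None
    media: tuple[str, ...] = ()

    def merge(self, other: "Post") -> "Post":
        if (self.timestamp, self.post) != (other.timestamp, other.post):
            raise MergeError("posts differ in more than their media")
        # Combine media from both posts without duplicating entries.
        extra = tuple(m for m in other.media if m not in self.media)
        return dataclasses.replace(self, media=self.media + extra)


class PostHistory:
    def __init__(self) -> None:
        # Assumed layout: posts grouped by timestamp.
        self._posts: dict[int, list[Post]] = defaultdict(list)

    def add(self, post: Post) -> None:
        candidates = self._posts[post.timestamp]
        for index, existing in enumerate(candidates):
            try:
                # Replace the stored post upon a successful merge.
                candidates[index] = existing.merge(post)
                return
            except MergeError:
                continue
        candidates.append(post)

    def timeline(self) -> list[Post]:
        return [p for ts in sorted(self._posts) for p in self._posts[ts]]


history = PostHistory()
history.add(Post(20, "b"))
history.add(Post(10, "a", ("x.jpg",)))
history.add(Post(10, "a", ("y.jpg",)))  # merges with the previous post
history.add(Post(20, "b"))              # exact duplicate, absorbed
history.add(Post(20, "c"))              # same time, different body, kept
```

In this sketch an exact duplicate merges into the stored post without changing it, which is how duplicates from overlapping archives disappear.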
- deface.model.find_simultaneous_posts(timeline: list[deface.model.Post]) list[range] [source]¶
Find all simultaneous posts on the given timeline and return the ranges of their indexes.
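One way to compute such index ranges is a single pass over a timeline that is already sorted by timestamp. The sketch below makes two assumptions not stated above: only runs of two or more posts are reported, and each range is half-open in the usual Python sense. The `Post` class is a minimal stand-in.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Post:  # minimal stand-in with only the field this sketch needs
    timestamp: int


def find_simultaneous_posts(timeline: list[Post]) -> list[range]:
    """Return index ranges of adjacent posts sharing a timestamp."""
    ranges: list[range] = []
    start = 0
    for index in range(1, len(timeline) + 1):
        at_end = index == len(timeline)
        if at_end or timeline[index].timestamp != timeline[start].timestamp:
            # A run ended just before index; report it if it has 2+ posts.
            if index - start > 1:
                ranges.append(range(start, index))
            start = index
    return ranges


posts = [Post(t) for t in (1, 2, 2, 3, 4, 4, 4)]
runs = find_simultaneous_posts(posts)
```

Each returned range can be used directly to slice the timeline, for example `posts[runs[0].start:runs[0].stop]`.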
deface.serde¶
- deface.serde.restore_utf8(data: bytes) bytes [source]¶
Restore the UTF-8 encoding for files exported from Facebook. Such files may appear to be valid JSON at first but nonetheless encode all non-ASCII characters incorrectly. Notably, what should just be UTF-8 byte values are Unicode escape sequences of the form \u00xx. This function replaces such sequences with the byte value given by the last two hexadecimal digits. It leaves all other escape sequences in place.
NB: If an arbitrary but odd number of backslashes precedes u00xx, the final backslash together with the u00xx forms a Unicode escape sequence. However, if an even number of backslashes precedes u00xx, there is no Unicode escape sequence but text discussing Unicode escape sequences.
This function should be invoked on the bytes of JSON text, before parsing.
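The replacement rule, including the odd versus even backslash distinction, can be sketched with a regular expression over bytes. This is an independent illustration written from the description above, not the library's implementation. Like the description, it replaces every \u00xx sequence and does not treat escaped quotes or control characters specially.

```python
import re

# One or more backslashes, then u00 and two hexadecimal digits.
_ESCAPE = re.compile(rb'(\\+)u00([0-9a-fA-F]{2})')


def restore_utf8(data: bytes) -> bytes:
    """Replace \\u00xx escapes with the raw byte 0xXX."""

    def replace(match: "re.Match[bytes]") -> bytes:
        slashes, digits = match.group(1), match.group(2)
        # An even number of backslashes means they escape each other,
        # so the u00xx that follows is ordinary text: leave it alone.
        if len(slashes) % 2 == 0:
            return match.group(0)
        # Odd count: the last backslash starts the escape sequence.
        # Keep the preceding backslashes and emit the raw byte.
        return slashes[:-1] + bytes([int(digits.decode('ascii'), 16)])

    return _ESCAPE.sub(replace, data)


fixed = restore_utf8(rb'"R\u00c3\u00b3is\u00c3\u00adn"')
```

After the replacement, the bytes form proper UTF-8, so a standard JSON parser applied to `fixed` would yield the intended text.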
- deface.serde.loads(data: bytes, **kwargs: Any) Union[None, bool, int, float, str, list[typing.Any], dict[str, typing.Any]] [source]¶
Return the result of deserializing a value from the given JSON text. This function simply wraps an invocation of the eponymous function in Python’s json package, after applying restore_utf8() to the given data. It passes the keyword arguments through.
- deface.serde.prepare(data: Any) Any [source]¶
Prepare the given value for serialization to JSON. This function recursively replaces enumeration constants with their names, lists and tuples with equivalent lists, and dataclasses and dictionaries with equivalent dictionaries. While generating equivalent dictionaries, it also filters out entries that are None, the empty list [], or the empty tuple (). All other values remain unchanged.
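The recursive preparation described above can be sketched as follows. The function is written from the prose alone and may differ from the real one in details; `MediaType` and `Media` here are reduced stand-ins defined only for the example, and `_is_empty` is a hypothetical helper.

```python
import dataclasses
import enum
import json
from typing import Any, Optional


def _is_empty(value: Any) -> bool:
    """True for None, the empty list, and the empty tuple."""
    return value is None or (isinstance(value, (list, tuple)) and len(value) == 0)


def prepare(data: Any) -> Any:
    if isinstance(data, enum.Enum):
        return data.name  # enumeration constants become their names
    if isinstance(data, (list, tuple)):
        return [prepare(item) for item in data]
    if dataclasses.is_dataclass(data) and not isinstance(data, type):
        # Shallow conversion; nested values are handled by the recursion below.
        data = {f.name: getattr(data, f.name) for f in dataclasses.fields(data)}
    if isinstance(data, dict):
        return {k: prepare(v) for k, v in data.items() if not _is_empty(v)}
    return data  # all other values remain unchanged


class MediaType(enum.Enum):
    PHOTO = 'PHOTO'
    VIDEO = 'VIDEO'


@dataclasses.dataclass(frozen=True)
class Media:  # reduced stand-in for illustration
    media_type: MediaType
    uri: str
    description: Optional[str] = None
    comments: tuple[str, ...] = ()


prepared = prepare(Media(MediaType.PHOTO, "photos/1.jpg"))
text = json.dumps(prepared)
```

Note that only None and empty sequences are dropped; falsy values such as 0 or the empty string are kept.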
- deface.serde.dumps(data: Any, **kwargs: Any) str [source]¶
Return the result of serializing the given value as JSON text. This function simply wraps an invocation of the eponymous function in Python’s json package, after applying prepare() to the given data. It passes the keyword arguments through.
deface.validator¶
- class deface.validator.Validator(value: deface.validator.T, filename: str = '', key: Optional[Union[int, str]] = None, parent: Optional[deface.validator.Validator[deface.validator.T]] = None)[source]¶
Bases:
Generic[deface.validator.T]
- property filename: str¶
Get the filename for the file with the JSON data.
- property only_key: str¶
Get the only key. If the current value is a singleton object, this method returns the only key. Otherwise, it raises an assertion error.
- property keypath: str¶
Determine the key path for this validator value. The key path is composed from list items, formatted as, say, [42], and object fields, formatted like .answer for fields named with Python identifiers or like ["42"] otherwise.
- property value: deface.validator.T¶
Get the current value.
- raise_invalid(message: str) NoReturn [source]¶
Raise a validation error for the current value. The error message is automatically formatted as the character sequence consisting of filename, keypath, a space, and the given message string.
- Raises
ValidationError – indicates a malformed JSON object.
- to_integer() deface.validator.Validator[int] [source]¶
Coerce the current value to an integer.
- Raises
ValidationError – indicates that the current value is not an integer.
- to_float() deface.validator.Validator[float] [source]¶
Coerce the current value to an integral or floating point number.
- Raises
ValidationError – indicates that the current value is neither an integer nor a floating point number.
- to_string() deface.validator.Validator[str] [source]¶
Coerce the current value to a string.
- Raises
ValidationError – indicates that the current value is not a string.
- to_list() deface.validator.Validator[list[typing.Any]] [source]¶
Coerce the current value to a list.
- Raises
ValidationError – indicates that the current value is not a list.
- items() collections.abc.Iterator[deface.validator.Validator[deface.validator.T]] [source]¶
Get an iterator over the current list value’s items. Each item is wrapped in the appropriate validator to continue validating the JSON data. If the current value is not a list, this method raises an assertion error.
- to_object(valid_keys: Optional[set[str]] = None, singleton: bool = False) deface.validator.Validator[dict[str, typing.Any]] [source]¶
Coerce the current value to an object. If valid_keys are given, this method validates the object’s fields against the given field names. If singleton is True, the object must have exactly one field.
- Raises
ValidationError – indicates that the current value is not an object, not an object with a single key, or has a field with unknown name.
- __getitem__(key: Union[int, str]) deface.validator.Validator[Any] [source]¶
Index the current value with the given key to create a new child validator. The given key becomes the new validator’s key and the result of the indexing operation becomes the new validator’s value. This validator becomes the new validator’s parent.
- Raises
TypeError – indicates that the current value is neither list nor object, that the key is not an integer even though the current value is a list, or that the key is not a string even though the current value is an object.
IndexError – indicates that the integer key is out of bounds for the current list value.
ValidationError – indicates that the required field named by the given key for the current object value is missing.
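How indexing, key paths, and error messages fit together can be sketched with a reduced validator. The class below is a stand-in written from the descriptions above: it covers only value, keypath, raise_invalid(), to_integer(), and indexing. The empty key path for the root, the wording of the error messages, and the rejection of booleans as integers are assumptions.

```python
from typing import Any, Generic, NoReturn, Optional, TypeVar, Union

T = TypeVar("T")


class ValidationError(Exception):
    """Stand-in for deface.error.ValidationError."""


class Validator(Generic[T]):
    def __init__(self, value: T, filename: str = '',
                 key: Optional[Union[int, str]] = None,
                 parent: Optional["Validator[Any]"] = None) -> None:
        self._value, self._filename = value, filename
        self._key, self._parent = key, parent

    @property
    def value(self) -> T:
        return self._value

    @property
    def keypath(self) -> str:
        if self._parent is None:
            return ''  # assumption: the root contributes nothing
        if isinstance(self._key, int):
            segment = f'[{self._key}]'
        elif str(self._key).isidentifier():
            segment = f'.{self._key}'
        else:
            segment = f'["{self._key}"]'
        return self._parent.keypath + segment

    def raise_invalid(self, message: str) -> NoReturn:
        # Filename, key path, a space, then the message.
        raise ValidationError(f'{self._filename}{self.keypath} {message}')

    def to_integer(self) -> "Validator[int]":
        # Assumption: booleans do not count as integers.
        if isinstance(self._value, bool) or not isinstance(self._value, int):
            self.raise_invalid('is not an integer')
        return self  # type: ignore[return-value]

    def __getitem__(self, key: Union[int, str]) -> "Validator[Any]":
        if isinstance(self._value, list):
            if not isinstance(key, int):
                raise TypeError('list index must be an integer')
            child = self._value[key]  # may raise IndexError
        elif isinstance(self._value, dict):
            if not isinstance(key, str):
                raise TypeError('object key must be a string')
            if key not in self._value:
                self.raise_invalid(f'is missing required field "{key}"')
            child = self._value[key]
        else:
            raise TypeError('value is neither list nor object')
        return Validator(child, self._filename, key, self)


root = Validator({"posts": [{"timestamp": 1}, {"42": "x"}]}, filename="posts.json")
first = root["posts"][0]["timestamp"].to_integer()
```

Because every child keeps a reference to its parent, an error raised deep inside nested data can still report the full path to the offending value.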